CN110691202A - Video editing method, device and computer storage medium - Google Patents

Video editing method, device and computer storage medium Download PDF

Info

Publication number
CN110691202A
Authority
CN
China
Prior art keywords
video
video frame
target person
overall
edited
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910804184.XA
Other languages
Chinese (zh)
Inventor
马丹
马晓琳
张进
莫东松
张健
钟宜峰
赵璐
王科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Culture Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201910804184.XA priority Critical patent/CN110691202A/en
Publication of CN110691202A publication Critical patent/CN110691202A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Abstract

The embodiment of the invention relates to the field of computer technology and discloses a video clipping method comprising the following steps: acquiring a video to be clipped, and acquiring the face features and the overall features of a target person; determining the video frames in which the target person appears in the video to be clipped according to the face features and the overall features of the target person; and clipping the video to be clipped according to the video frames in which the target person appears. The video clipping method, video clipping device and storage medium disclosed by the embodiment of the invention can improve both the accuracy and the efficiency of video clipping.

Description

Video editing method, device and computer storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a video clipping method, a video clipping device and a storage medium.
Background
Existing video clipping is usually done manually or with a clipping system based on face recognition. Manual clipping is highly accurate, but videos of concerts, movies, sports competitions and the like are characterized by frequent alternation of characters: large numbers of target and non-target persons appear in turn within a given time period, so clipping personnel need professional visual identification ability and sustained concentration. This imposes great cost and pressure on manual clipping, and the clipping efficiency is low;
a clipping system based on face recognition can improve clipping efficiency, but it has technical limits. In complex scenes, such as live streams of galas or sports events, constraints such as illumination and viewing angle are significant, the recognition accuracy is low, and the clipping accuracy is therefore affected.
Disclosure of Invention
An object of embodiments of the present invention is to provide a video clipping method, apparatus and storage medium, which can improve the accuracy and efficiency of video clipping at the same time.
To solve the above technical problem, an embodiment of the present invention provides a video editing method, including: acquiring a video to be edited, and acquiring the human face characteristics and the overall characteristics of a target person; determining a video frame of the target person appearing in the video to be edited according to the face characteristics and the overall characteristics of the target person; and editing the video to be edited according to the video frames of the target person appearing in the video to be edited.
Embodiments of the present invention also provide a video editing apparatus, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video clipping method described above.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described video clipping method.
Compared with the prior art, the embodiment of the invention provides a video editing method, which comprises the steps of obtaining a video to be edited and obtaining the human face characteristics and the overall characteristics of a target person; determining a video frame of the target person appearing in the video to be edited according to the face characteristics and the overall characteristics of the target person; and editing the video to be edited according to the video frames of the target person appearing in the video to be edited. In the embodiment, the video frame of the target person in the video to be edited is determined through the face characteristics and the overall characteristics of the target person, so that the defect that the face characteristics are captured inaccurately due to the fact that the face area in the video frame of the video to be edited is small or the face is not exposed is overcome, and the determined video frame of the target person is more accurate; and the video to be edited is edited according to the determined video frame of the target character in the video to be edited, so that automatic editing is realized, and the efficiency and accuracy of video editing are improved.
In addition, determining a video frame of the target person appearing in the video to be edited according to the face characteristics and the overall characteristics of the target person specifically comprises the following steps: identifying the human face characteristics of the character to be identified in the video frame of the video to be edited; judging whether the face features of the figure to be recognized are successfully matched with the face features of the target figure or not; if the matching is not successful, identifying the overall characteristics of the character to be identified in the video frame of the video to be edited; judging whether the overall characteristics of the figure to be identified and the overall characteristics of the target figure are successfully matched or not; and if the matching is successful, determining the video frame as the video frame of the target person appearing in the video to be clipped. When the face feature matching is unsuccessful in the scheme, whether the person to be identified in the video frame is the target person is determined according to the overall feature value, so that the defect that the face feature capturing is inaccurate due to the fact that the face area in the video frame of the video to be edited is small or the face is not exposed is overcome, and the video frame of the determined target person is more accurate.
In addition, identifying the overall features of the person to be identified in the video frame of the video to be clipped specifically comprises: segmenting the person to be identified in the video frame of the video to be clipped to obtain a sub-image of the person to be identified; and inputting the sub-image into a 19-layer convolutional neural network to obtain the overall features of the person to be identified, wherein the last layer of the 19-layer convolutional neural network is a fully connected layer. This scheme provides a specific implementation for acquiring the overall features of the person to be identified.
In addition, clipping the video to be clipped according to the video frames in which the target person appears specifically comprises the following steps: determining the video frame in which the face features of the target person are successfully matched for the first time, and taking that video frame as the first entrance video frame; determining, in time order, the interval duration between a previous video frame and a next video frame in which the face features or the overall features of the target person are successfully matched; if the interval duration is longer than a first preset duration, taking the previous video frame as an exit video frame and the next video frame as the next entrance video frame; and clipping the video to be clipped according to the entrance video frames and the exit video frames.
In addition, before the video to be clipped is clipped according to the entrance video frames and the exit video frames, the method further comprises the following steps: calculating the appearance duration of the target person according to the entrance video frame and the exit video frame; judging whether the appearance duration is greater than a second preset duration; and if the appearance duration is greater than the second preset duration, clipping the video to be clipped according to the entrance video frame and the exit video frame. In this scheme the appearance duration of the target person is calculated from the entrance and exit video frames, so segments whose appearance duration is shorter than the second preset duration are not processed; misjudgment caused by the target person flashing briefly through the video is avoided, and the accuracy of clipping is further improved.
In addition, determining a video frame with successfully matched face features of the target person for the first time specifically comprises the following steps: acquiring a confidence value of the face features of the to-be-recognized figure successfully matched with the face features of the target figure; judging whether the confidence value is larger than a preset threshold value or not; and determining the video frame with the first confidence value larger than the preset threshold value as the video frame successfully matched for the first time. According to the scheme, the video frame with the first confidence value larger than the preset threshold value is determined as the video frame successfully matched for the first time, so that the accuracy of determining the video frame of the target person which is on the scene for the first time is improved.
In addition, whether the overall characteristics of the figure to be recognized and the overall characteristics of the target figure are successfully matched is judged, and the method specifically comprises the following steps: calculating the difference value between the value of the overall characteristic of the character to be recognized and the value of the overall characteristic of the target character; judging whether the difference value is smaller than a preset difference value or not; and if the difference is smaller than the preset difference, the matching is successful.
In addition, the overall features of the person to be identified include at least: pixel color features of the person to be identified in the video frame.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings; like reference numerals denote similar elements, and the figures are not drawn to scale unless otherwise specified.
FIG. 1 is a schematic flow diagram of a video clipping method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of identifying global features according to a first embodiment of the present invention;
FIG. 3 is a schematic flow chart of a video clipping method according to a second embodiment of the present invention;
fig. 4 is a schematic configuration diagram of a video clipping device according to a third embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments in order to provide a better understanding of the present application; however, the technical solutions claimed in the present application can be implemented without these technical details, and with various changes and modifications based on the following embodiments.
A first embodiment of the present invention relates to a video clipping method, and the core of the present embodiment is a video clipping method including: acquiring a video to be edited, and acquiring the human face characteristics and the overall characteristics of a target person; determining a video frame of the target person appearing in the video to be edited according to the face characteristics and the overall characteristics of the target person; and editing the video to be edited according to the video frames of the target person appearing in the video to be edited. In the embodiment, the video frame of the target person in the video to be edited is determined through the face characteristics and the overall characteristics of the target person, so that the defect that the face characteristics are captured inaccurately due to the fact that the face area in the video frame of the video to be edited is small or the face is not exposed is overcome, and the determined video frame of the target person is more accurate; and the video to be edited is edited according to the determined video frame of the target character in the video to be edited, so that automatic editing is realized, and the efficiency and accuracy of video editing are improved.
The following describes the implementation details of the video editing method of the present embodiment in detail, and the following is only provided for the convenience of understanding and is not necessary for implementing the present embodiment.
A flow diagram of a video clipping method in the present embodiment is shown in fig. 1:
step 101: and acquiring a video to be edited, and acquiring the human face characteristics and the overall characteristics of the target person.
Specifically, a video to be clipped is acquired first, and the target person or persons to be clipped are determined; there may be one target person or several. This embodiment takes several target persons as an example: the face features and the overall features of each target person are obtained in advance. Concretely, a picture containing the target person is obtained in advance; face feature detection and extraction are performed on this picture to obtain the face features of the target person, and overall person feature extraction is performed on the same picture to obtain the overall features of the target person. It should be noted that the extraction of the overall features of the target person may instead be performed while persons in the video to be clipped are being identified: the overall feature value of the target person is obtained from a video frame in which the target person has already been determined according to the face features, so that both the face features and the overall features of the target person are available at that point.
Step 102: and identifying the human face characteristics of the character to be identified in the video frame of the video to be edited.
Specifically, the face of each person to be identified is first detected and located in every frame of the video using the open-source Multi-task Cascaded Convolutional Network (MTCNN), which determines whether the current video frame of the video to be clipped contains a face and, if so, further detects the specific position of the face and its bounding box; secondly, affine transformation is applied to the face image in the bounding box using Procrustes analysis to align the face, where the Procrustes analysis automatically locates face features such as the eyes, nose tip, mouth corners, eyebrows and contour points of each part of the face from the input face image; finally, the aligned face features are extracted with a Residual Network (ResNet) and represented as a 128-dimensional vector, thereby obtaining the face features of the persons to be identified in the current video frame of the video to be clipped. If the current video frame contains several persons, the face features of each of them are obtained. Optionally, the face feature recognition result of each frame of the video to be clipped can be stored in a cache database for subsequent use.
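As a minimal illustration of this per-frame face feature extraction, the sketch below uses the open-source face_recognition library (built on dlib) as a stand-in for the MTCNN detection, Procrustes alignment and ResNet embedding described above; like the embodiment, it yields 128-dimensional face feature vectors. The library choice and function usage are assumptions for illustration, not part of the patent.

```python
# Illustrative sketch: face_recognition (dlib) stands in for the MTCNN + Procrustes + ResNet
# pipeline of the embodiment; it also produces 128-dimensional face encodings.
import face_recognition

def frame_face_features(frame_rgb):
    """Return one 128-dimensional face feature vector per face found in an RGB frame."""
    locations = face_recognition.face_locations(frame_rgb)  # detect faces and their bounding boxes
    return face_recognition.face_encodings(frame_rgb, known_face_locations=locations)

# Usage (hypothetical file name):
# features = frame_face_features(face_recognition.load_image_file("frame_0001.jpg"))
```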
It is worth noting that this embodiment uses the open-source Multi-task Cascaded Convolutional Network (MTCNN) for face detection; compared with conventional face detection methods, MTCNN is better suited to detecting faces in complex scenes under a variety of natural conditions.
Step 103: and judging whether the face features of the person to be recognized are successfully matched with the face features of the target person. If yes, go to step 106; if the determination is no, the process proceeds to step 104.
Specifically, the face features of the video frames of the video to be clipped are read from the cache database, and the face features of each person to be identified in each video frame are matched against the face features of the target person obtained in advance. Concretely, the difference between the face feature value of the person to be identified and that of the target person can be computed with a distance metric (Euclidean, cosine, and the like), and whether the two match is decided by judging whether the difference is smaller than a preset threshold. If it is, the person to be identified in the current video frame is determined to be the target person, i.e. the target person appears in this video frame; step 106 is then entered and the frame is determined to be a video frame in which the target person appears. If not, it may be that the target person does not appear in the current frame, or that the person appears but the face is not exposed or only partially exposed; in that case it cannot be decided from the face features alone whether the target person appears, so step 104 is entered: the overall features of the person to be identified in the video frame are identified, and whether that person is the target person is decided from the overall feature value. This compensates for inaccurate face feature capture caused by a small face area or an unexposed face in the video frame, and makes the determined video frames of the target person more accurate. The preset threshold characterizes how close the face features of the person to be identified must be to those of the target person, and can be set according to actual requirements.
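As an illustration of the distance-based matching just described, the following sketch compares a candidate 128-dimensional face feature with the target person's face feature using Euclidean distance; the 0.6 threshold stands in for the preset threshold and is only an assumed value.

```python
import numpy as np

def face_matches(candidate: np.ndarray, target: np.ndarray, preset_threshold: float = 0.6) -> bool:
    """True if the candidate face feature is close enough to the target person's face feature.

    Euclidean distance is used here; a cosine distance works equally well.
    The 0.6 threshold is illustrative, not a value taken from the patent.
    """
    return float(np.linalg.norm(candidate - target)) < preset_threshold
```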
Step 104: and identifying the overall characteristics of the character to be identified in the video frame of the video to be edited.
In this embodiment, the identifying of the overall characteristics of the character to be identified in the video frame of the video to be edited specifically includes: segmenting a character to be identified in a video frame of a video to be edited to obtain a sub-image of the character to be identified; and inputting the sub-images into 19 layers of convolutional neural networks to obtain the overall characteristics of the person to be recognized, wherein the last layer in the 19 layers of convolutional neural networks is a full connection layer.
Specifically, the overall features of the person to be identified in this embodiment at least include: pixel color features of the person to be identified in the video frame, which represent the color of the person's clothing. It is understood that the overall features are not limited to pixel color features and may also include height features, body-type features, posture features, and the like. In this embodiment, a Fully Convolutional Network (FCN) performs semantic recognition of the persons in the video frame, and the sub-image of the person to be identified is segmented out and extracted; the extracted sub-image is then fed into a pre-trained 19-layer convolutional neural network (VGG-19). In the VGG-19 network of this embodiment, the last softmax classification layer is replaced with a fully connected layer that outputs a 64-dimensional vector, so inputting the sub-image into the 19-layer convolutional neural network yields the 64-dimensional overall feature f_body of the person to be identified. Fig. 2 is a schematic diagram of identifying the overall features of the person to be identified in a video frame. Optionally, the overall-feature recognition result of each video frame of the video to be clipped can be stored in the cache database for subsequent use.
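A minimal sketch of such an overall-feature extractor, assuming torchvision's VGG-19 with its final classification layer swapped for a 64-dimensional fully connected layer; the weight loading, the upstream FCN person segmentation and the preprocessing values are assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# VGG-19 with the last softmax/classification layer replaced by a 64-d fully connected layer,
# mirroring the f_body extractor described above (weight loading/training omitted in this sketch).
vgg19 = models.vgg19(weights=None)
vgg19.classifier[6] = nn.Linear(4096, 64)
vgg19.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def body_feature(person_subimage) -> torch.Tensor:
    """Compute the 64-dimensional overall feature f_body of a segmented person sub-image (PIL image)."""
    x = preprocess(person_subimage).unsqueeze(0)  # shape: (1, 3, 224, 224)
    with torch.no_grad():
        return vgg19(x).squeeze(0)                # shape: (64,)
```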
It is worth mentioning that, before the sub-image is input into the 19-layer convolutional neural network, the position vector (pos_x, pos_y) of the person to be identified in the sub-image can be obtained and used in identifying the overall features of the person in the sub-image, thereby improving the confidence value of the overall features of the person to be identified.
Step 105: and judging whether the overall characteristics of the character to be identified are successfully matched with the overall characteristics of the target character. If yes, go to step 106; if the determination is no, the flow ends.
In this embodiment, the determining whether the overall characteristic of the character to be recognized is successfully matched with the overall characteristic of the target character specifically includes: calculating the difference value between the value of the overall characteristic of the character to be recognized and the value of the overall characteristic of the target character; judging whether the difference value is smaller than a preset difference value or not; and if the difference is smaller than the preset difference, the matching is successful.
Specifically, the overall features of the person to be identified are read from the cache database, and the difference between the value of the overall features of the person to be identified and that of the target person is calculated. If the difference is smaller than a preset difference, the overall features of the person to be identified in the video frame are determined to match those of the target person, so the person can be determined to be the target person; step 106 is then entered and the frame is determined to be a video frame in which the target person appears in the video to be clipped. If the difference is not smaller than the preset difference, the overall features of the person to be identified in the video frame do not match those of the target person, so the person can be determined not to be the target person and the flow ends. The preset difference characterizes how close the overall features of the person to be identified must be to those of the target person, and can be set according to actual requirements.
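For completeness, a short sketch of the overall-feature comparison under the same assumptions; the 0.5 value stands in for the preset difference and is illustrative only.

```python
import torch

def body_matches(candidate: torch.Tensor, target: torch.Tensor, preset_difference: float = 0.5) -> bool:
    """True if the 64-d overall features of the person to be identified and of the target person match."""
    return torch.norm(candidate - target).item() < preset_difference
```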
Step 106: and determining the video frame as the video frame of the target person appearing in the video to be edited.
Step 107: determining the video frame in which the face features of the target person are successfully matched for the first time, and taking that video frame as the first entrance video frame.
In this embodiment, after the video frames in which the target person appears have been determined, the video frame in which the face features of the target person are matched successfully for the first time is determined according to the timestamps of the video frames, and this frame is taken as the video frame in which the target person first enters the scene.
Further, in this embodiment, determining the video frame in which the target person is matched successfully for the first time in the video to be clipped specifically includes: acquiring the confidence value with which the face features of the person to be identified match the face features of the target person; judging whether the confidence value is greater than a preset threshold; and determining the first video frame whose confidence value is greater than the preset threshold as the video frame matched successfully for the first time. Determining the first frame whose confidence value exceeds the preset threshold as the first successful match improves the accuracy of locating the video frame in which the target person first enters the scene.
Step 108: determining, in time order, the interval duration between a previous video frame and a next video frame in which the face features or the overall features of the target person are successfully matched.
Step 109: judging whether the interval duration is greater than a first preset duration. If yes, go to step 110; if no, return to step 108.
Step 110: taking the previous video frame as an exit video frame, and taking the next video frame as the next entrance video frame.
Step 111: clipping the video to be clipped according to the entrance video frames and the exit video frames.
Specifically, for steps 108 to 111, the interval duration between each pair of consecutive video frames in which the face features or the overall features of the target person are successfully matched is determined frame by frame in time order. If the interval duration is greater than the first preset duration, the target person has disappeared for a period of time and can be regarded as having left the scene; the earlier video frame is then taken as an exit video frame, and the later video frame as the next entrance video frame. This continues until all video frames have been processed, yielding a number of entrance video frames and exit video frames according to which the video to be clipped can be clipped: starting from the first entrance video frame of the target person, the nearest following exit video frame is the first exit video frame, the segment of the target person's first appearance is obtained from this pair of frames, and so on for the subsequent appearances. A sketch of this segmentation step is shown below.
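A minimal sketch of this segmentation step under stated assumptions: the timestamps (in seconds) of the frames in which the target person was matched are already known, and the 2.0-second value standing in for the first preset duration is illustrative.

```python
from typing import List, Tuple

def appearance_segments(matched_times: List[float],
                        first_preset_duration: float = 2.0) -> List[Tuple[float, float]]:
    """Split sorted matched-frame timestamps into (entrance_time, exit_time) segments."""
    segments: List[Tuple[float, float]] = []
    if not matched_times:
        return segments
    entrance = prev = matched_times[0]            # first match is the first entrance video frame
    for t in matched_times[1:]:
        if t - prev > first_preset_duration:      # long gap: the previous frame is an exit frame
            segments.append((entrance, prev))
            entrance = t                          # the current frame is the next entrance frame
        prev = t
    segments.append((entrance, prev))             # close the last appearance segment
    return segments
```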
Compared with the prior art, the embodiment of the invention provides a video editing method, which comprises the steps of obtaining a video to be edited and obtaining the human face characteristics and the overall characteristics of a target person; determining a video frame of the target person appearing in the video to be edited according to the face characteristics and the overall characteristics of the target person; and editing the video to be edited according to the video frames of the target person appearing in the video to be edited. In the embodiment, the video frame of the target person in the video to be edited is determined through the face characteristics and the overall characteristics of the target person, so that the defect that the face characteristics are captured inaccurately due to the fact that the face area in the video frame of the video to be edited is small or the face is not exposed is overcome, and the determined video frame of the target person is more accurate; and the video to be edited is edited according to the determined video frame of the target character in the video to be edited, so that automatic editing is realized, and the efficiency and accuracy of video editing are improved.
A second embodiment of the present invention relates to a video clipping method. The second embodiment is an improvement on the first embodiment, the main improvement being that the appearance duration of the target person is calculated from the entrance video frame and the exit video frame, so that segments whose appearance duration is shorter than a second preset duration are not processed; misjudgment caused by the target person flashing briefly through the video is avoided, and the accuracy of clipping is further improved.
A flow diagram of the video clipping method in this embodiment is shown in fig. 3, and specifically includes:
step 201: and acquiring a video to be edited, and acquiring the human face characteristics and the overall characteristics of the target person.
Step 202: and identifying the human face characteristics of the character to be identified in the video frame of the video to be edited.
Step 203: and judging whether the face features of the person to be recognized are successfully matched with the face features of the target person. If yes, go to step 206; if the determination is no, the process proceeds to step 204.
Step 204: and identifying the overall characteristics of the character to be identified in the video frame of the video to be edited.
Step 205: and judging whether the overall characteristics of the character to be identified are successfully matched with the overall characteristics of the target character. If yes, go to step 206; if the determination is no, the flow ends.
Step 206: and determining the video frame as the video frame of the target person appearing in the video to be edited.
Step 207: determining the video frame in which the face features of the target person are successfully matched for the first time, and taking that video frame as the first entrance video frame.
Step 208: and determining the interval duration between the previous video frame and the next video frame, which are successfully matched with the face features or the overall features of the target person, according to the time sequence.
Step 209: and judging whether the interval duration is greater than a first preset duration. If yes, go to step 210; if the determination is no, the process returns to step 208.
Step 210: taking the previous video frame as an exit video frame, and taking the next video frame as the next entrance video frame.
Steps 201 to 210 in this embodiment are the same as steps 101 to 110 in the first embodiment, and are not described again in this embodiment to avoid repetition.
Step 211: calculating the appearance duration of the target person according to the entrance video frame and the exit video frame.
Step 212: judging whether the appearance duration is greater than a second preset duration. If yes, go to step 213; if no, go to step 214.
Step 213: clipping the video to be clipped according to the entrance video frame and the exit video frame.
Step 214: discarding the entrance video frame and the exit video frame.
Specifically, for steps 211 to 214, the appearance duration of the target person is calculated from an entrance video frame and the nearest exit video frame that follows it, and it is judged whether this duration is greater than the second preset duration. If it is, the target person is on screen for a relatively long time and is likely to be a leading character of the video segment, so the corresponding segment of the video to be clipped is clipped according to that entrance video frame and exit video frame. If the appearance duration is not greater than the second preset duration, the target person appears only briefly, merely flashing through the segment rather than playing a leading role in it; the entrance video frame and exit video frame are then discarded, and step 211 is repeated until all entrance and exit video frames have been processed. A minimal sketch of this filtering and clipping step follows.
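A minimal sketch of this filtering and clipping step under stated assumptions: segments shorter than the second preset duration are dropped, and the remaining ones are cut out with the ffmpeg command-line tool (assumed to be installed); the 3.0-second value, the output file names and the stream-copy cutting are illustrative choices only.

```python
import subprocess
from typing import List, Tuple

def clip_appearances(video_path: str, segments: List[Tuple[float, float]],
                     second_preset_duration: float = 3.0) -> List[Tuple[float, float]]:
    """Discard short appearances and clip the remaining (entrance, exit) segments with ffmpeg."""
    kept = [(start, end) for start, end in segments if end - start > second_preset_duration]
    for i, (start, end) in enumerate(kept):
        subprocess.run([
            "ffmpeg", "-y",
            "-ss", f"{start:.3f}",           # seek to the entrance video frame
            "-i", video_path,
            "-t", f"{end - start:.3f}",      # keep only the appearance duration
            "-c", "copy",                    # stream copy; re-encode if frame-accurate cuts are needed
            f"target_person_{i:02d}.mp4",
        ], check=True)
    return kept
```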
Compared with the prior art, the video clipping method in this embodiment of the invention, before clipping the video to be clipped according to the entrance video frame and the exit video frame, further comprises: calculating the appearance duration of the target person according to the entrance video frame and the exit video frame; judging whether the appearance duration is greater than a second preset duration; and if so, clipping the video to be clipped according to the entrance video frame and the exit video frame. Because the appearance duration of the target person is calculated from the entrance and exit video frames, segments whose appearance duration is shorter than the second preset duration are not processed; misjudgment caused by the target person flashing briefly through the video is avoided, and the accuracy of clipping is further improved.
The steps of the above methods are divided only for clarity of description; in implementation they may be combined into a single step, or a step may be split into several steps, and all such divisions fall within the protection scope of this patent as long as the same logical relationship is preserved. Adding insignificant modifications to the algorithms or processes, or introducing insignificant design changes without altering their core design, also falls within the protection scope of this patent.
A third embodiment of the invention relates to a video clipping device, as shown in fig. 4, comprising at least one processor 301; and, a memory 302 communicatively coupled to the at least one processor 301; the memory 302 stores instructions executable by the at least one processor 301, the instructions being executable by the at least one processor 301 to enable the at least one processor 301 to perform the video clipping method described above.
Where the memory 302 and the processor 301 are coupled in a bus, the bus may comprise any number of interconnected buses and bridges, the buses coupling one or more of the various circuits of the processor 301 and the memory 302. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 301 is transmitted over a wireless medium through an antenna, which further receives the data and transmits the data to the processor 301.
The processor 301 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 302 may be used to store data used by processor 301 in performing operations.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described video clipping method.
That is, as can be understood by those skilled in the art, all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (10)

1. A video clipping method, comprising:
acquiring a video to be edited, and acquiring the human face characteristics and the overall characteristics of a target person;
determining a video frame of the target person appearing in the video to be edited according to the face feature and the overall feature of the target person;
and clipping the video to be clipped according to the video frames of the target person appearing in the video to be clipped.
2. The video clipping method according to claim 1, wherein the determining a video frame of the target person appearing in the video to be clipped according to the facial feature and the overall feature of the target person specifically comprises:
identifying the human face characteristics of the figure to be identified in the video frame of the video to be edited;
judging whether the face features of the people to be recognized are successfully matched with the face features of the target people or not;
if the matching is not successful, identifying the overall characteristics of the character to be identified in the video frame of the video to be edited;
judging whether the overall characteristics of the figure to be identified and the overall characteristics of the target figure are successfully matched or not;
and if the matching is successful, determining the video frame as the video frame of the target person in the video to be clipped.
3. The video clipping method according to claim 2, wherein the identifying of the overall features of the person to be identified in the video frame of the video to be clipped specifically comprises:
segmenting the character to be identified in the video frame of the video to be edited to obtain a sub-image of the character to be identified;
and inputting the sub-images into 19 layers of convolutional neural networks to obtain the overall characteristics of the person to be recognized, wherein the last layer in the 19 layers of convolutional neural networks is a full connection layer.
4. The video clipping method according to claim 2, wherein the clipping the video to be clipped according to the video frame of the target person appearing in the video to be clipped specifically comprises:
determining a video frame in which the face characteristics of the target person are successfully matched for the first time, and taking the video frame successfully matched for the first time as a first entrance video frame;
determining, according to the time sequence, the interval duration between a previous video frame and a next video frame in which the face features or the overall features of the target person are successfully matched;
if the interval duration is greater than a first preset duration, taking the previous video frame as an exit video frame, and taking the next video frame as a next entrance video frame;
and clipping the video to be clipped according to the entrance video frames and the exit video frames.
5. The video clipping method according to claim 4, wherein before clipping the video to be clipped according to the entrance video frame and the exit video frame, the method further comprises:
calculating the appearance duration of the target person according to the entrance video frame and the exit video frame;
judging whether the appearance duration is greater than a second preset duration;
and if the appearance duration is greater than the second preset duration, clipping the video to be clipped according to the entrance video frame and the exit video frame.
6. The video clipping method according to claim 4, wherein the determining of the video frame in which the first matching of the facial features of the target person is successful specifically includes:
obtaining a confidence value of the face features of the to-be-recognized person successfully matched with the face features of the target person;
judging whether the confidence value is larger than a preset threshold value or not;
and determining the first video frame with the confidence value larger than the preset threshold value as the video frame successfully matched for the first time.
7. The video clipping method according to claim 2, wherein the determining whether the overall feature of the person to be identified and the overall feature of the target person are successfully matched is specifically:
calculating the difference value between the value of the overall characteristic of the character to be recognized and the value of the overall characteristic of the target character;
judging whether the difference value is smaller than a preset difference value or not;
and if the difference is smaller than the preset difference, the matching is successful.
8. The video clipping method according to claim 2, wherein the overall characteristics of the person to be identified include at least: pixel color characteristics of the person to be identified in the video frame.
9. A video clipping apparatus, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video clipping method of any of claims 1 to 8.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the video clipping method of any one of claims 1 to 8.
CN201910804184.XA 2019-08-28 2019-08-28 Video editing method, device and computer storage medium Pending CN110691202A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910804184.XA CN110691202A (en) 2019-08-28 2019-08-28 Video editing method, device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910804184.XA CN110691202A (en) 2019-08-28 2019-08-28 Video editing method, device and computer storage medium

Publications (1)

Publication Number Publication Date
CN110691202A true CN110691202A (en) 2020-01-14

Family

ID=69108514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910804184.XA Pending CN110691202A (en) 2019-08-28 2019-08-28 Video editing method, device and computer storage medium

Country Status (1)

Country Link
CN (1) CN110691202A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140253560A1 (en) * 2013-03-08 2014-09-11 Apple Inc. Editing Animated Objects in Video
CN104796781A (en) * 2015-03-31 2015-07-22 小米科技有限责任公司 Video clip extraction method and device
CN106534967A (en) * 2016-10-25 2017-03-22 司马大大(北京)智能系统有限公司 Video editing method and device
EP3410353A1 (en) * 2017-06-01 2018-12-05 eyecandylab Corp. Method for estimating a timestamp in a video stream and method of augmenting a video stream with information

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652037A (en) * 2020-04-09 2020-09-11 北京奇艺世纪科技有限公司 Expression package generation method and device, terminal equipment and storage medium
CN111627470A (en) * 2020-05-29 2020-09-04 深圳市天一智联科技有限公司 Video editing method, device, storage medium and equipment
CN113038271A (en) * 2021-03-25 2021-06-25 深圳市人工智能与机器人研究院 Video automatic editing method, device and computer storage medium
CN113038271B (en) * 2021-03-25 2023-09-08 深圳市人工智能与机器人研究院 Video automatic editing method, device and computer storage medium
WO2022206312A1 (en) * 2021-03-31 2022-10-06 影石创新科技股份有限公司 Automatic cropping method and apparatus for panoramic video, and terminal and storage medium
CN113099267A (en) * 2021-06-04 2021-07-09 武汉卓尔数字传媒科技有限公司 Video generation method and device, electronic equipment and storage medium
CN113490049A (en) * 2021-08-10 2021-10-08 深圳市前海动竞体育科技有限公司 Sports event video editing method and system based on artificial intelligence
CN113490049B (en) * 2021-08-10 2023-04-21 深圳市前海动竞体育科技有限公司 Sports event video editing method and system based on artificial intelligence
CN114401440A (en) * 2021-12-14 2022-04-26 北京达佳互联信息技术有限公司 Video clip and clip model generation method, device, apparatus, program, and medium
CN117651159A (en) * 2024-01-29 2024-03-05 杭州锐颖科技有限公司 Automatic editing and pushing method and system for motion real-time video
CN117651159B (en) * 2024-01-29 2024-04-23 杭州锐颖科技有限公司 Automatic editing and pushing method and system for motion real-time video

Similar Documents

Publication Publication Date Title
CN110691202A (en) Video editing method, device and computer storage medium
CN109670429B (en) Method and system for detecting multiple targets of human faces of surveillance videos based on instance segmentation
CN108921782B (en) Image processing method, device and storage medium
CN109344742B (en) Feature point positioning method and device, storage medium and computer equipment
US8737740B2 (en) Information processing apparatus, information processing method, and non-transitory computer-readable storage medium
CN110705405B (en) Target labeling method and device
CN106991370B (en) Pedestrian retrieval method based on color and depth
CN109389086B (en) Method and system for detecting unmanned aerial vehicle image target
CN108154086B (en) Image extraction method and device and electronic equipment
WO2020125057A1 (en) Livestock quantity identification method and apparatus
US20200258236A1 (en) Person segmentations for background replacements
CN103971131A (en) Preset facial expression recognition method and device
CN113627402B (en) Image identification method and related device
CN103810473A (en) Hidden Markov model based human body object target identification method
CN110942456B (en) Tamper image detection method, device, equipment and storage medium
CN109583414B (en) Indoor road occupation detection method, device, medium and processor based on video detection
CN114359333A (en) Moving object extraction method and device, computer equipment and storage medium
CN112712051A (en) Object tracking method and device, computer equipment and storage medium
CN111582278B (en) Portrait segmentation method and device and electronic equipment
WO2022228325A1 (en) Behavior detection method, electronic device, and computer readable storage medium
CN116052090A (en) Image quality evaluation method, model training method, device, equipment and medium
CN115239551A (en) Video enhancement method and device
CN114663721A (en) Data augmentation method for fish identification and fish identification method and system
CN114581956A (en) Multi-branch fine-grained feature fusion pedestrian re-identification method
CN113779290A (en) Camera face recognition aggregation optimization method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200114

RJ01 Rejection of invention patent application after publication