WO2021003825A1

WO2021003825A1 - Video shot cutting method and apparatus, and computer device

Info

Publication number: WO2021003825A1
Application number: PCT/CN2019/103528
Authority: WO
Inventors: 雷晨雨
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-07-11
Filing date: 2019-08-30
Publication date: 2021-01-14
Also published as: CN110430443A; CN110430443B

Abstract

Disclosed are a video shot cutting method and apparatus, and a computer device, relating to the technical field of computers, and being able to solve the problems of a cumbersome cutting operation, a low efficiency, and time and labor being consumed during video cutting by using a manual software tool. The method comprises: extracting each single frame of a picture in a video to be cut; screening, on the basis of a variance change value, candidate frames of pictures from the single frame of a picture; determining, by using a target detection algorithm, all shot cut frames of pictures included in the candidate frames of pictures; and cutting said video into a plurality of video clips according to the shot cut frames of pictures. The present application is applicable to automatic splitting for video fragments under different shot scenarios.

Description

Method, apparatus and computer device for video shot cutting

Technical field

This application claims priority to Chinese patent application No. 2019106249186, filed with the Chinese Patent Office on July 11, 2019 and entitled "Method, Apparatus and Computer Equipment for Cutting Video Shots", the entire contents of which are incorporated herein by reference.

Background art

Shot switching is an important step in video editing. It serves the narrative composition and artistic expression of TV programs as well as the viewing needs of the audience. In long videos such as sports games or TV programs, shots are switched frequently, so the long video often needs to be cut into multiple video clips, each covering a single shot scene. As living standards improve, audiences place increasingly strict quality requirements on entertainment content. Improving video cutting technology so that edited video better meets viewers' expectations is therefore particularly important.

At present, this kind of video cutting is generally done manually with video cutting software. This approach is cumbersome and inefficient, and it consumes considerable time and labor.

Summary of the invention

In view of this, the present application discloses a method, apparatus and computer device for video shot cutting. The main purpose is to solve the problem that cutting video with manual software tools is cumbersome, inefficient, time-consuming and labor-intensive.

According to one aspect of the present application, there is provided a method for cutting a video shot, the method including:

extracting each single-frame picture from the video to be cut;

screening candidate frame pictures from the single-frame pictures based on a variance change value;

determining, by using a target detection algorithm, all shot switching frame pictures included in the candidate frame pictures; and

cutting the video to be cut into multiple video clips according to the shot switching frame pictures.

According to another aspect of the present application, there is provided an apparatus for cutting a video shot, the apparatus including:

The extraction module is used to extract each single frame picture in the video to be cut;

A screening module, configured to screen out candidate frame pictures from the single frame pictures based on the variance change value;

A determining module, configured to determine all shot switching frame pictures included in the candidate frame picture by using a target detection algorithm;

The cutting module is configured to cut the to-be-cut video into multiple video clips according to the shot switching frame picture.

According to yet another aspect of the present application, there is provided a non-volatile readable storage medium having computer readable instructions stored thereon, and the computer readable instructions are executed by a processor to implement the above-mentioned video shot cutting method.

According to another aspect of the present application, there is provided a computer device, including a non-volatile readable storage medium, a processor, and computer-readable instructions that are stored on the non-volatile readable storage medium and can run on the processor. When the processor executes the computer-readable instructions, the above-mentioned video shot cutting method is implemented.

With the above technical solutions, and in contrast to the current practice of cutting video manually with software tools, the method, apparatus and computer device for video shot cutting provided by this application can extract each single-frame picture from the video to be cut; preliminarily screen candidate frame pictures from the single-frame pictures based on the variance change value; then use the target detection algorithm to identify adjacent frames that differ substantially, so as to determine the shot switching frame pictures among the candidate frame pictures; and finally cut the video to be cut automatically into multiple video clips according to the shot switching frame pictures. With this technical solution, shot switching frames can be extracted automatically from the video to be cut according to the variance calculation result and the detection result of the yolo target detection model, and the video is cut at those frames. This avoids the detection errors that easily occur in manual detection and effectively improves both the detection accuracy of shot switching frames and the efficiency of shot cutting.

Description of the drawings

The drawings described here are used to provide a further understanding of the application and constitute a part of the application. The exemplary embodiments of the application and their descriptions are used to explain the application and do not constitute an improper limitation of the application. In the drawings:

FIG. 1 shows a schematic flowchart of a method for cutting a video shot provided by an embodiment of the present application;

FIG. 2 shows a schematic flowchart of another video shot cutting method provided by an embodiment of the present application;

FIG. 3 shows a schematic structural diagram of a video shot cutting apparatus provided by an embodiment of the present application;

FIG. 4 shows a schematic structural diagram of another video shot cutting apparatus provided by an embodiment of the present application.

Detailed description

Hereinafter, the application will be described in detail with reference to the drawings and in conjunction with embodiments. It should be noted that the embodiments in this application and the features in the embodiments can be combined with each other if there is no conflict.

In view of the problem that cutting video with manual software tools is cumbersome, inefficient, time-consuming and labor-intensive, an embodiment of the present application provides a method for cutting a video shot. As shown in FIG. 1, the method includes:

101. Extract each single frame picture in the video to be cut.

In a specific application scenario, in order to facilitate precise cutting, the video to be cut should run for at least three minutes. The first step of the cutting operation is to extract each single-frame picture from the video to be cut, so that all the shot switching frames contained in the video can be determined by comparing and analyzing the single-frame pictures.

102. Filter candidate frame pictures from a single frame picture based on the variance change value.

In a specific application scenario, the variance of a picture reflects how strongly its pixel values fluctuate. By calculating the variance change value between each single-frame picture and the adjacent single-frame picture, the change in the high-frequency part of the pixels between two adjacent single-frame pictures can be preliminarily determined. The greater the variance change value, the greater the fluctuation of the pixels, which indicates that different pixel aggregation points appear in the two single-frame pictures; such a single-frame picture can be preliminarily determined to be a candidate frame picture. At the same time, single-frame pictures whose variance change value is small are determined to be non-shot switching frame pictures and are removed, so that all retained single-frame pictures are candidate frame pictures for finer screening.

103. Use a target detection algorithm to determine all shot switching frame pictures included in the candidate frame picture.

In this embodiment, the target detection algorithm is the yolo target detection method. That is, the task of detecting the connected components in a candidate frame picture is treated as a regression problem: from all the pixels of the entire picture, the coordinates of each bounding box, the confidence that the bounding box contains an object, and the conditional category probabilities are obtained directly. The position of each bounding box is given as (x, y, w, h), where x and y are the coordinates of the center point of the bounding box and w and h are its width and height. By detecting targets with yolo, it is possible to determine which objects are present in the candidate frame picture and where they are located.
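The (x, y, w, h) convention described above can be illustrated with a small sketch. The `Box` type and the `to_corners` helper are hypothetical names introduced only for this illustration; they are not part of the application or of any yolo library.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Bounding box in centre format: (x, y) is the centre point."""
    x: float
    y: float
    w: float  # width
    h: float  # height


def to_corners(box: Box) -> Tuple[float, float, float, float]:
    """Convert a centre-format box to (left, top, right, bottom)."""
    half_w, half_h = box.w / 2, box.h / 2
    return (box.x - half_w, box.y - half_h, box.x + half_w, box.y + half_h)


corners = to_corners(Box(x=50, y=40, w=20, h=10))
print(corners)
```

Corner coordinates are usually more convenient when overlaps between boxes have to be computed, while the centre format matches the position data used in the later steps.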

104. Cut the video to be cut into multiple video clips according to the shot switching frame pictures.

In a specific application scenario, after all the shot switching frame pictures are determined, the video to be cut can be automatically cut, and then multiple video clips in a single shot scene can be obtained.

Through the method of cutting a video shot in this embodiment, each single-frame picture can be extracted from the video to be cut; candidate frame pictures are preliminarily screened from the single-frame pictures based on the variance change value; the target detection algorithm is then used to identify adjacent frames that differ substantially, so as to determine the shot switching frame pictures among the candidate frame pictures; finally, the video to be cut is automatically cut into multiple video clips according to the shot switching frame pictures. With this technical solution, shot switching frames can be extracted automatically from the video to be cut according to the variance calculation result and the detection result of the yolo target detection model, and the video is cut at those frames. This avoids the detection errors that easily occur in manual detection and effectively improves both the detection accuracy of shot switching frames and the efficiency of shot cutting.

Further, as a refinement and extension of the specific implementation of the foregoing embodiment, in order to fully describe the specific implementation process in this embodiment, another method of video shot cutting is provided. As shown in FIG. 2, the method includes:

201. Extract each single frame picture in the video to be cut.

In a specific application scenario, the single-frame pictures of a video undergo a transition during scene switching, and this transition can be divided into two categories according to its duration: fast shot switching and slow shot switching. The speed of shot switching can be determined from the number of different single-frame pictures played per second. When the number of different single-frame pictures played per second is greater than the picture transition threshold, the video segment played within that second is a fast shot switch; otherwise it is a slow shot switch.

In this embodiment, for fast shot switching, since different single-frame pictures change quickly, the picture corresponding to every consecutive frame in the video to be cut can be extracted as the single-frame pictures to be analyzed, and the analysis and cutting operations in steps 202 to 214 are then performed.

Correspondingly, as a preferred approach for slow shot switching, different single-frame pictures change slowly, so many consecutive single-frame pictures differ very little. To reduce the amount of calculation, a sampling frequency (a sampling interval greater than 20 frames) can be set, the pictures are sparsely sampled at that frequency, and one sampled picture is acquired in each sampling period as a single-frame picture to be analyzed in this embodiment. For example, in this solution the sampling frequency can be set to 32. If a video has 300 frames, frame 0, frame 32, frame 32*2, frame 32*3, frame 32*4, and so on can be extracted according to the sampling frequency as the single-frame pictures in this embodiment.
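The sparse sampling in this example can be sketched as follows. The function name `sampled_frame_indices` is a hypothetical helper chosen for illustration; only the interval of 32 and the 300-frame video come from the text.

```python
def sampled_frame_indices(total_frames, interval=32):
    """Return the indices of the frames kept by sparse sampling.

    One frame is kept per sampling period, starting from frame 0.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(0, total_frames, interval))


indices = sampled_frame_indices(300, 32)
print(indices)
```

For the 300-frame video, the last frame kept is frame 288, so ten single-frame pictures are analyzed instead of 300.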

202. Scale each single frame picture to a preset size.

In a specific application scenario, in order to analyze the extracted single-frame pictures uniformly and ensure the accuracy of the analysis, the single-frame pictures can be processed into a uniform format and size. In this embodiment, the preset size is set to 256*256. When a single-frame picture is obtained, it is scaled to a size of 256*256 pixels.
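A minimal sketch of this scaling step is given below. The text does not say which interpolation method is used, so nearest-neighbour sampling is an assumption made here for simplicity, and `resize_nearest` is a hypothetical helper that treats a picture as a list of pixel rows.

```python
def resize_nearest(image, out_h=256, out_w=256):
    """Scale a picture (list of rows) to out_h x out_w pixels.

    Nearest-neighbour sampling is assumed; the application does not
    specify the interpolation method.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# A synthetic 360 x 640 picture stands in for an extracted frame.
frame = [[(r + c) % 256 for c in range(640)] for r in range(360)]
scaled = resize_nearest(frame)
print(len(scaled), len(scaled[0]))
```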

203. Perform grayscale processing on the scaled single-frame picture.

Correspondingly, the single-frame pictures extracted from the video to be cut are mostly color images in the RGB color mode. To eliminate the interference of irrelevant information in the single-frame picture with image detection, enhance the detectability of relevant information, and simplify the data as far as possible, grayscale processing is performed on each single-frame picture during initial processing, so as to ensure the reliability of image detection.
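The grayscale step might look like the sketch below. The text does not give the conversion formula; the weights 0.299, 0.587 and 0.114 are the widely used luma coefficients and are an assumption here, as is the helper name `to_gray`.

```python
def to_gray(rgb_image):
    """Convert a picture of (r, g, b) tuples to integer gray values.

    The luma weights are an assumption; the application only states
    that grayscale processing is performed.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


picture = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 255, 0)]]
gray = to_gray(picture)
print(gray)
```

After this step each pixel carries a single gray value, which is the quantity used in the variance calculation that follows.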

204. Calculate the variance values of all pixels in each single frame picture.

For this embodiment, the formula for calculating the variance of each single-frame picture is:

$$S(t) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

where S(t) is the variance value of the single-frame picture at frame t, $x_i$ is the gray value of the i-th pixel in the single-frame picture, $\bar{x}$ is the average gray value of all pixels in the single-frame picture, and n is the total number of pixels in the single-frame picture that participate in the variance comparison.
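As a sketch of step 204, the variance of a grayscale picture can be computed from the quantities defined above: the gray value of each pixel, the average gray value, and the pixel count n. The population form that divides by n is assumed, and `frame_variance` is a hypothetical helper name.

```python
def frame_variance(gray_image):
    """Variance of all gray values in one single-frame picture.

    Assumes the population form: the sum of squared deviations from
    the mean gray value, divided by the number of pixels n.
    """
    pixels = [p for row in gray_image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    return sum((p - mean) ** 2 for p in pixels) / n


flat = [[128, 128], [128, 128]]     # uniform picture, no fluctuation
contrast = [[0, 0], [255, 255]]     # half black, half white
print(frame_variance(flat), frame_variance(contrast))
```

A uniform picture has zero variance, while a high-contrast picture has a large one, which is why a jump in variance between adjacent frames hints at a change of scene.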

205. Calculate the variance change value between each single frame picture and the corresponding single frame picture of the next frame.

In a specific application scenario, the variance change between each single-frame picture and the adjacent next single-frame picture can be used to preliminarily determine the change in the high-frequency part of the pixels in the two adjacent single-frame pictures. Therefore, by calculating the variance change value, the size of the change between the current single-frame picture and the next single-frame picture can be preliminarily determined, so as to distinguish whether the current single-frame picture is a non-shot switching frame picture or a candidate frame picture.

206. If it is determined that the variance change value is less than the first preset threshold, determine that the single frame picture is a non-shot switching frame picture.

Wherein, the first preset threshold is a minimum variance change value used to determine that the current single frame picture is a candidate frame picture.

Correspondingly, for this embodiment, if the variance change value between the current single-frame picture and the corresponding next single-frame picture is less than the first preset threshold, the difference between the two pictures is not obvious. It can then be determined that no shot scene transition occurs between the current frame and the next frame in the video to be cut, so no cut is needed there, and the current single-frame picture is determined to be a non-shot switching frame picture and is filtered out.

For example, suppose the variance value of the current single-frame picture is S(t), the variance value of the next single-frame picture is S(t+1), and the first preset threshold is N1. If |S(t)-S(t+1)|<N1, the current single-frame picture is determined to be a non-shot switching frame picture.

207. If it is determined that the variance change value is greater than or equal to the first preset threshold, determine that a single frame picture is a candidate frame picture.

In a specific application scenario, for this embodiment, if the variance change value between the current single-frame picture and the corresponding next single-frame picture is greater than or equal to the first preset threshold, the difference between the two pictures is relatively large. Whether the two belong to the same shot scene still needs to be determined accurately in the next step, so the current single-frame picture is saved as a candidate frame picture for further comparison and detection.

For example, suppose the variance value of the current single-frame picture is S(t), the variance value of the next single-frame picture is S(t+1), and the first preset threshold is N1. If |S(t)-S(t+1)|≥N1, the current single-frame picture is determined to be a candidate frame picture.
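Steps 205 to 207 can be sketched as a single pass over the per-frame variance values. The function name `split_candidates`, the sample variances and the threshold value 50 are illustrative assumptions; only the comparison of |S(t)-S(t+1)| against N1 comes from the text.

```python
def split_candidates(variances, n1):
    """Split frame indices into candidates and non-shot-switching frames.

    variances[t] is S(t). Frame t is a candidate when
    |S(t) - S(t+1)| >= n1. The last frame has no successor and is
    therefore not classified.
    """
    candidates, rejected = [], []
    for t in range(len(variances) - 1):
        change = abs(variances[t] - variances[t + 1])
        if change >= n1:
            candidates.append(t)
        else:
            rejected.append(t)
    return candidates, rejected


variances = [10.0, 12.0, 400.0, 405.0, 90.0]  # illustrative values
candidates, rejected = split_candidates(variances, n1=50)
print(candidates, rejected)
```

Only the candidate frames are passed on to the more expensive target detection stage, which is what keeps the second stage cheap.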

208. Train, based on the target detection algorithm, a target detection model whose training result meets a preset standard.

For this embodiment, in a specific application scenario, step 208 may specifically include: collecting multiple single-frame pictures as sample images; labeling the position coordinates and category information of each connected component in the sample images; inputting the sample images, as the training set, into an initial target detection model created in advance based on the yolo target detection algorithm; using the initial target detection model to extract the image features of the various connected components in the sample images and, based on the image features, generating a suggestion window for each connected component together with the conditional category probabilities of that suggestion window for the various connected component categories; determining the connected component category with the largest conditional category probability as the category recognition result for the connected component in the suggestion window; if the confidence of every suggestion window is greater than the second preset threshold and the category recognition results match the labeled category information, determining that the initial target detection model has passed the training; and if the initial target detection model has not passed the training, using the position coordinates and category information labeled for each connected component in the sample images to correct and retrain the initial target detection model until its judgment results meet the preset standard.

The confidence is used to determine whether there is an object in the detection box and the probability that the object exists. The calculation formula is:

$$\text{confidence} = \Pr(\text{Object}) \times \mathrm{IOU}_{\text{pred}}^{\text{truth}}$$

Pr(Object) indicates whether there is an object in the detection box, with Pr(Object)∈{0,1}. When Pr(Object)=0, the detection box contains no object, so the confidence is 0 and no object is recognized. When Pr(Object)=1, the detection box contains an object and the confidence equals the intersection over union $\mathrm{IOU}_{\text{pred}}^{\text{truth}}$.

$\mathrm{IOU}_{\text{pred}}^{\text{truth}}$ is the overlap ratio between the detected candidate bound and the ground truth bound, that is, the ratio of their intersection to their union. The ideal case is complete overlap, where the ratio is 1. The second preset threshold is the criterion used to evaluate whether the initial target detection model has passed the training. Each non-zero confidence is compared with the second preset threshold: when the confidence is greater than the second preset threshold, the initial target detection model is determined to have passed the training; otherwise it has failed. Since the confidence lies between 0 and 1, the maximum value of the second preset threshold is 1. The larger the second preset threshold, the more accurate the trained model; the specific value can be determined according to application standards. The category information is the categories of connected components contained in the video to be cut, such as people of different body shapes and appearances, fixed buildings, and equipment; in specific application scenarios, different category information to be recognized can be set according to the actual video recording scene. The initial target detection model is created in advance according to design needs. It differs from the target detection model as follows: the initial target detection model has only just been created, has not passed model training, and does not meet the preset standard, whereas the target detection model has passed model training, has reached the preset standard, and can be applied to detect connected components in each single-frame picture.
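The overlap ratio and the confidence can be sketched as follows, assuming boxes in the (x, y, w, h) center format described for step 103. The helper names `iou` and `confidence` are hypothetical, and the box values are illustrative.

```python
def iou(a, b):
    """Intersection over union of two centre-format boxes (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def confidence(pr_object, predicted, truth):
    """Confidence = Pr(Object) * IOU, with Pr(Object) equal to 0 or 1."""
    return pr_object * iou(predicted, truth)


predicted = (10, 5, 10, 10)  # shifted 5 pixels right of the ground truth
truth = (5, 5, 10, 10)
print(confidence(1, predicted, truth))
```

In this example the two boxes share half of their width, so the intersection is 50, the union is 150, and the confidence is one third.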

In specific application scenarios, the confidence is defined for each suggestion window, while the conditional category probability information is defined for each grid, that is, the probability that the object in each suggestion window belongs to each category. For example, suppose the model is trained to recognize five categories a, b, c, d and e, and the confidence indicates that suggestion window A contains an object. The conditional category probabilities of suggestion window A for the five categories are then predicted, for example 80%, 55%, 50%, 37% and 15% respectively. Category a, which has the highest conditional category probability, is taken as the recognition result. It is then necessary to verify whether the object category actually labeled in the detection box is category a; if so, the initial target detection model is determined to have recognized the category information in this suggestion window correctly. When the confidence of every recognized suggestion window is greater than the second preset threshold and the category recognition results match the labeled category information, the initial target detection model is determined to have passed the training.
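The training pass criterion in this example can be sketched as below. The function names, the tuple layout of a suggestion window, and the threshold values are assumptions made for illustration; the five category probabilities are the ones given in the text.

```python
def recognised_category(class_probs):
    """Return the category with the largest conditional probability."""
    return max(class_probs, key=class_probs.get)


def model_passes_training(windows, second_threshold):
    """Check the pass criterion over suggestion windows that contain objects.

    Each window is (confidence, class_probs, labelled_category). The
    model passes only if every confidence exceeds the threshold and
    every recognised category matches its label.
    """
    for conf, class_probs, labelled in windows:
        if conf <= second_threshold:
            return False
        if recognised_category(class_probs) != labelled:
            return False
    return True


probs_a = {"a": 0.80, "b": 0.55, "c": 0.50, "d": 0.37, "e": 0.15}
window_a = (0.9, probs_a, "a")  # the confidence 0.9 is illustrative
print(recognised_category(probs_a), model_passes_training([window_a], 0.7))
```

If the check fails, the labeled position coordinates and category information are used to correct and retrain the initial model, as described in step 208.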

209. Input the candidate frame picture into the target detection model, and obtain first detection data information corresponding to the candidate frame picture.

Here, the first detection data information is the category and quantity of all connected components contained in the candidate frame picture, together with data such as the position, height and width of each connected component.

210. Input the next single frame picture corresponding to the candidate frame picture into the target detection model, and obtain the second detection data information corresponding to the next single frame picture.

Among them, the next single frame picture is a single frame picture corresponding to the next frame of the current candidate frame picture in the video to be cut, and the next single frame picture may be a non-shot switching frame picture or a candidate frame picture. The second detection data information is the category and quantity of all connected components contained in the next single frame picture, and data information such as position information, height, and width corresponding to each connected component.

211. If it is determined that the first detection data information and the second detection data information do not contain the same connected component, determine that the candidate frame picture is a shot switching frame picture.

In a specific application scenario, for this embodiment, if the first detection data information and the second detection data information do not contain the same connected component, the current candidate frame picture and the corresponding next single-frame picture belong to two completely different shot scenes. That is, a shot scene switch occurs between the candidate frame and the next frame, so the current candidate frame picture is retained as a shot switching frame picture. Conversely, if the first detection data information and the second detection data information contain at least one identical connected component, the candidate frame picture is examined further in steps 212 and 213, and it is filtered out as a non-shot switching frame picture if the difference values do not meet the preset condition.

212. If it is determined that the first detection data information and the second detection data information contain the same connected component, calculate the difference value of the same connected component.

In a specific application scenario, for this embodiment, step 212 may specifically include: calculating a first difference value based on the position coordinate information of the same connected component in the first detection data information and the second detection data information; and calculating a second difference value based on the height and width information of the same connected component in the first detection data information and the second detection data information.

For example, suppose the current candidate frame picture and the corresponding next single-frame picture are detected to contain the same connected component, denoted s1 in the candidate frame picture and s2 in the next single-frame picture. The size and position data of s1 obtained from the first detection data information is {x1, y1, w1, h1}, and the size and position data of s2 obtained from the second detection data information is {x2, y2, w2, h2}. Here x1 and y1 are the position coordinates of s1 in the current candidate frame picture, x2 and y2 are the position coordinates of s2 in the next single-frame picture, w1 and h1 are the width and height of s1, and w2 and h2 are the width and height of s2. The first difference value is then d1=(x1-x2)^2+(y1-y2)^2, and the second difference value is d2=(w1-w2)^2+(h1-h2)^2.
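The two difference values can be computed directly from the position and size data. In this sketch, `difference_values` is a hypothetical helper name and the sample coordinates are illustrative.

```python
def difference_values(first, second):
    """Return (d1, d2) for one connected component seen in two frames.

    first = (x1, y1, w1, h1) from the candidate frame picture;
    second = (x2, y2, w2, h2) from the next single-frame picture.
    d1 measures the change in position, d2 the change in size.
    """
    x1, y1, w1, h1 = first
    x2, y2, w2, h2 = second
    d1 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    d2 = (w1 - w2) ** 2 + (h1 - h2) ** 2
    return d1, d2


s1 = (10, 20, 30, 40)
s2 = (13, 24, 30, 45)
d1, d2 = difference_values(s1, s2)
print(d1, d2)
```

Here the component moved by 3 and 4 pixels and grew by 5 pixels in height, so d1 is 9 + 16 = 25 and d2 is 0 + 25 = 25.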

213. When the difference value meets the preset condition, determine that the candidate frame picture is a shot switching frame picture.

Correspondingly, for this embodiment, step 213 of the embodiment may specifically include: if the first difference value and/or the second difference value is greater than the third preset threshold, determining that the candidate frame picture is a shot switching frame picture.

The preset condition is that at least one of the first difference value and the second difference value is greater than the third preset threshold. The third preset threshold is the minimum difference value used to determine that the candidate frame picture is a shot switching frame picture, and its specific value can be set according to the actual situation.

For example, based on the example in step 212, the first difference value is d1, the second difference value is d2, and the third preset threshold is set to N2. If d1>N2, or d2>N2, or both d1>N2 and d2>N2, the candidate frame picture is determined to be a shot switching frame picture.
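Steps 211 to 213 can be combined into one decision function, sketched below. Several points are assumptions not fixed by the text: detections are represented as a mapping from a category label to one (x, y, w, h) box, components are matched across frames by that label, and a single shared component exceeding N2 is enough to report a shot switch. The name `is_shot_switch` is hypothetical.

```python
def is_shot_switch(first_detections, second_detections, n2):
    """Decide whether a candidate frame picture is a shot switching frame.

    Both arguments map a category label to an (x, y, w, h) box.
    Matching components by label is an assumption of this sketch.
    """
    shared = set(first_detections) & set(second_detections)
    if not shared:
        return True  # step 211: no common connected component
    for label in shared:
        x1, y1, w1, h1 = first_detections[label]
        x2, y2, w2, h2 = second_detections[label]
        d1 = (x1 - x2) ** 2 + (y1 - y2) ** 2  # position change
        d2 = (w1 - w2) ** 2 + (h1 - h2) ** 2  # size change
        if d1 > n2 or d2 > n2:
            return True  # step 213: preset condition met
    return False


current = {"person": (100, 120, 40, 90), "car": (200, 150, 80, 50)}
same_shot = {"person": (103, 121, 40, 91)}
new_scene = {"building": (128, 128, 200, 220)}
moved = {"person": (220, 60, 90, 180)}
print(is_shot_switch(current, same_shot, n2=100))
```

A practical implementation would need a more careful rule for matching components, for example when several objects of the same category appear in one picture.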

214. Cut the to-be-cut video into multiple video clips according to the shot switching frame picture.

In a specific application scenario, for this embodiment, step 214 of the embodiment may specifically include: determining a shot switching frame corresponding to each shot switching frame picture; and cutting the video to be cut at the shot switching frame.

For example, suppose the sequence of all single-frame pictures extracted from the video to be cut is [t0,...,tn], and the shot switching frames corresponding to the extracted shot switching frame pictures are tx1, tx2, ..., txm, where t0<tx1<tx2<...<txm<tn. The video to be cut can then be cut into the video segments [t0, tx1], [tx1+1, tx2], ..., [txm+1, tn], where each video segment is a single-shot segment.
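The segmentation in this example can be sketched as follows, with frames identified by integer indices 0 to n. The function name `cut_segments` and the sample values are illustrative assumptions.

```python
def cut_segments(last_index, switch_frames):
    """Split frames 0..last_index into single-shot segments.

    switch_frames are the indices tx1 < tx2 < ... < txm of the shot
    switching frames, all strictly between 0 and last_index. Each
    segment is an inclusive (start, end) pair that ends at a switch
    frame; the final segment ends at last_index.
    """
    segments = []
    start = 0
    for tx in sorted(switch_frames):
        segments.append((start, tx))
        start = tx + 1
    segments.append((start, last_index))
    return segments


segments = cut_segments(9, [3, 6])
print(segments)
```

With ten frames and switches at frames 3 and 6, the result corresponds to the segments [t0, tx1], [tx1+1, tx2] and [tx2+1, tn] from the text.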

Through the above method of video shot cutting, each single-frame picture can be extracted from the video to be cut. After each single-frame picture is preprocessed, the variance change value between each single-frame picture and the corresponding next single-frame picture is calculated; when the variance change value is greater than or equal to the first preset threshold, the single-frame picture is determined to be a candidate frame picture. After all candidate frame pictures are extracted, each candidate frame picture is compared with the corresponding next single-frame picture based on the yolo target detection algorithm; when the connected components of the two differ substantially, the candidate frame picture is determined to be a shot switching frame picture. Finally, the video to be cut is cut at the shot switching frames corresponding to the shot switching frame pictures. In this embodiment, through this double detection of shot switching frames, all the shot switching frames included in the video to be cut can be determined accurately and efficiently, so that each single-shot scene is cut accurately, which improves cutting efficiency while also reducing the labor cost of video cutting.

Further, as a specific implementation of the methods shown in FIG. 1 and FIG. 2, an embodiment of the present application provides an apparatus for cutting a video shot. As shown in FIG. 3, the apparatus includes: an extraction module 31, a screening module 32, a determining module 33, and a cutting module 34.

The extraction module 31 is used to extract each single frame picture in the video to be cut;

The screening module 32 is used for screening candidate frame pictures from a single frame picture based on the variance change value;

The determining module 33 is configured to determine all shot switching frame pictures included in the candidate frame pictures by using a target detection algorithm;

The cutting module 34 is used to cut the video to be cut into multiple video clips according to the shot switching frame pictures.

In a specific application scenario, in order to eliminate interference and improve the detection accuracy of the single-frame pictures, as shown in FIG. 4, the apparatus further includes a scaling module 35 and a processing module 36.

The scaling module 35 is used to scale each single-frame picture to a preset size;

The processing module 36 is used to perform grayscale processing on the scaled single frame picture.
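The preprocessing performed by the scaling module and the processing module can be illustrated with a minimal sketch. The text does not specify the scaling method or the grayscale formula, so nearest-neighbor sampling and the common 0.299/0.587/0.114 luminance weights are assumptions made here for illustration, as are the function names and the representation of a frame as nested lists of (R, G, B) tuples.

```python
def scale_nearest(frame, out_w, out_h):
    """Scale a frame (rows of (R, G, B) tuples) to out_w x out_h.

    Nearest-neighbor sampling is an assumption; the text only says
    "scale to a preset size".
    """
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def to_gray(frame):
    """Convert an RGB frame to grayscale with assumed luminance weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in frame
    ]


def preprocess(frame, size=(4, 4)):
    """Scale first, then convert to grayscale, in the order described above."""
    out_w, out_h = size
    return to_gray(scale_nearest(frame, out_w, out_h))
```

Bringing every frame to the same size and a single channel means the per-frame variance values computed afterwards are comparable from frame to frame.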

Correspondingly, in order to screen candidate frame pictures out of the single frame pictures based on the variance change value, the screening module 32 is specifically used to: calculate the variance value of all pixels in each single frame picture; calculate the variance change value between each single frame picture and the corresponding next single frame picture; if the variance change value is determined to be less than the first preset threshold, determine that the single frame picture is a non-shot switching frame picture; and if the variance change value is determined to be greater than or equal to the first preset threshold, determine that the single frame picture is a candidate frame picture.
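The screening step can be sketched as follows. This is an illustration under stated assumptions rather than the applicant's implementation: the variance change value is taken to be the absolute difference between the pixel variances of two consecutive grayscale frames, and the names `frame_variance` and `screen_candidates` are hypothetical.

```python
from statistics import pvariance


def frame_variance(gray):
    """Population variance of all pixels in one grayscale frame."""
    return pvariance([p for row in gray for p in row])


def screen_candidates(gray_frames, first_threshold):
    """Return indices of frames whose variance change to the next frame
    is greater than or equal to the first preset threshold.

    Assumption: "variance change value" = |var(next) - var(current)|.
    """
    variances = [frame_variance(f) for f in gray_frames]
    candidates = []
    for i in range(len(variances) - 1):
        change = abs(variances[i + 1] - variances[i])
        if change >= first_threshold:
            candidates.append(i)  # frame i is a candidate frame picture
    return candidates
```

Frames below the threshold are treated as non-shot switching frames and are never passed to the detector, which is what keeps the second, more expensive detection stage small.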

In a specific application scenario, in order to use the target detection algorithm to determine all the shot switching frame pictures included in the candidate frame pictures, the determining module 33 is specifically used to: obtain, through training based on the target detection algorithm, a target detection model whose training result meets the preset standard; input the candidate frame picture into the target detection model to obtain the first detection data information corresponding to the candidate frame picture; input the next single frame picture corresponding to the candidate frame picture into the target detection model to obtain the second detection data information corresponding to the next single frame picture; if it is determined that the first detection data information and the second detection data information do not contain the same connected component, determine that the candidate frame picture is a shot switching frame picture; if it is determined that the first detection data information and the second detection data information contain the same connected component, calculate the difference value of the same connected component; and when the difference value meets the preset condition, determine that the candidate frame picture is a shot switching frame picture.
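The branching logic of the determining module can be sketched as below. Several points are assumptions made only for illustration: detection data is represented as a list of dictionaries with a `category` key, "the same connected component" is interpreted as a shared category label, one detection per category is kept, and a switch is reported if any shared component meets the condition. The `difference` and `meets_condition` callables are hypothetical placeholders for the difference calculation and the preset condition.

```python
def shared_categories(first, second):
    """Categories detected in both the candidate frame and the next frame."""
    return {d["category"] for d in first} & {d["category"] for d in second}


def is_shot_switch(first, second, difference, meets_condition):
    """Decide whether a candidate frame is a shot switching frame.

    first / second: detection data for the candidate frame and the next frame.
    difference(a, b): hypothetical function returning a difference value.
    meets_condition(v): hypothetical predicate for the preset condition.
    """
    common = shared_categories(first, second)
    if not common:
        # No shared connected component: the scene content changed entirely.
        return True
    # Simplification: keep one detection per category.
    by_cat_1 = {d["category"]: d for d in first}
    by_cat_2 = {d["category"]: d for d in second}
    return any(
        meets_condition(difference(by_cat_1[c], by_cat_2[c])) for c in common
    )
```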

Correspondingly, in order to obtain, through training based on the target detection algorithm, a target detection model whose training result meets the preset standard, the determining module 33 is specifically used to: collect multiple single frame pictures as sample images; label the position coordinates and category information of each connected component in the sample images; use the sample images with labeled coordinate positions as the training set and input them into an initial target detection model created in advance based on the YOLO target detection algorithm; use the initial target detection model to extract the image features of the various connected components in the sample images and, based on the image features, generate a suggestion window for each connected component and the conditional category probabilities of the various connected components corresponding to the suggestion window; determine the connected component category with the highest conditional category probability as the category recognition result of the connected component in the suggestion window; if it is determined that the confidence levels of all the suggestion windows are greater than the second preset threshold and the category recognition results match the labeled category information, determine that the initial target detection model has passed the training; and if it is determined that the initial target detection model has not passed the training, correct and train the initial target detection model using the position coordinates and category information of each connected component labeled in the sample images, so that the judgment result of the initial target detection model meets the preset standard.
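The pass criterion described above can be expressed as a small check. This sketch does not train or call a YOLO model; it only illustrates the decision rule, and the data layout (a `confidence` value and a `class_probs` dictionary per suggestion window) and the function names are hypothetical.

```python
def recognized_category(class_probs):
    """Category with the highest conditional category probability."""
    return max(class_probs, key=class_probs.get)


def passes_training(windows, labeled_categories, second_threshold):
    """True if every suggestion window is confident enough and its
    recognized category matches the labeled category.

    windows: list of {"confidence": float, "class_probs": {category: prob}}
    labeled_categories: labeled category for each window, in the same order.
    """
    if len(windows) != len(labeled_categories):
        return False
    return all(
        w["confidence"] > second_threshold
        and recognized_category(w["class_probs"]) == label
        for w, label in zip(windows, labeled_categories)
    )
```

If this check fails, the text calls for further corrective training against the labeled position coordinates and categories until the check passes.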

In a specific application scenario, when it is determined that the first detection data information and the second detection data information contain the same connected component, the determining module 33 is specifically configured to: calculate the first difference value based on the position coordinate information of the same connected component in the first detection data information and the second detection data information; and calculate the second difference value based on the height and width information of the same connected component in the first detection data information and the second detection data information.

Correspondingly, when the difference value meets the preset condition, the determining module 33 is specifically configured to determine that the candidate frame picture is a shot switching frame picture if the first difference value and/or the second difference value is greater than the third preset threshold.
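The two difference values and the threshold test can be sketched as follows. The text does not give the formulas, so the Euclidean distance between positions and the sum of absolute width and height differences are assumptions chosen for illustration, and "and/or" is implemented as a logical or. The keys `x`, `y`, `w`, `h` and the function names are hypothetical.

```python
import math


def first_difference(a, b):
    """Position difference of the same connected component (assumed: Euclidean)."""
    return math.hypot(a["x"] - b["x"], a["y"] - b["y"])


def second_difference(a, b):
    """Size difference of the same connected component
    (assumed: |width change| + |height change|)."""
    return abs(a["w"] - b["w"]) + abs(a["h"] - b["h"])


def exceeds_third_threshold(a, b, third_threshold):
    """True if either difference value is greater than the third preset threshold."""
    return (
        first_difference(a, b) > third_threshold
        or second_difference(a, b) > third_threshold
    )
```

A single threshold is shared by both difference values here because the text names only one third preset threshold; in practice the two quantities may need separate scaling.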

In a specific application scenario, in order to cut the video to be cut into multiple video clips, the cutting module 34 is specifically used to: determine the shot switching frame corresponding to each shot switching frame picture; and cut the video to be cut at the shot switching frame.
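The cutting step can be illustrated by turning shot switching frame indices into clip boundaries. This sketch assumes that a shot switching frame is the last frame of its shot (since it was compared with the frame that follows it) and that the video has at least one frame; the function name and the inclusive `(start, end)` convention are hypothetical.

```python
def split_into_clips(total_frames, switch_frames):
    """Return inclusive (start, end) frame ranges, one per single-shot clip.

    Assumption: each shot switching frame closes a clip, and the next
    clip starts at the following frame.
    """
    clips, start = [], 0
    for idx in sorted(set(switch_frames)):
        if 0 <= idx < total_frames - 1:  # ignore out-of-range or final-frame cuts
            clips.append((start, idx))
            start = idx + 1
    clips.append((start, total_frames - 1))
    return clips
```

The resulting ranges can then be handed to whatever video tool performs the actual cut; that tool is outside the scope of this sketch.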

It should be noted that, for other corresponding descriptions of the functional units involved in the apparatus for cutting video shots provided in this embodiment, reference may be made to the corresponding descriptions in FIGS. 1 to 2, and details are not repeated here.

Based on the above-mentioned method shown in FIG. 1 and FIG. 2, correspondingly, an embodiment of the present application also provides a non-volatile readable storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor, the video shot cutting method shown in FIG. 1 and FIG. 2 is implemented.

Based on this understanding, the technical solution of this application can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods in each implementation scenario of the present application.

Based on the methods shown in FIG. 1 and FIG. 2 and the virtual device embodiments shown in FIG. 3 and FIG. 4, in order to achieve the above objectives, an embodiment of the present application also provides a computer device, which may specifically be a personal computer, a server, a network device, or the like. The physical device includes a non-volatile readable storage medium and a processor: the non-volatile readable storage medium is used to store computer-readable instructions, and the processor is used to execute the computer-readable instructions to implement the video shot cutting method shown in FIG. 1 and FIG. 2.

Optionally, the computer device may also include a user interface, a network interface, a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a WI-FI module, and so on. The user interface may include a display screen (Display), an input unit such as a keyboard (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, and the like. The network interface can optionally include a standard wired interface, a wireless interface (such as a Bluetooth interface, a WI-FI interface), etc.

Those skilled in the art can understand that the computer device structure provided in this embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.

The non-volatile readable storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the physical device for video shot cutting, and supports the operation of information processing programs and other software and/or programs. The network communication module is used to implement communication between various components in the non-volatile readable storage medium and communication with other hardware and software in the physical device.

Through the description of the foregoing implementation manners, those skilled in the art can clearly understand that this application can be implemented by means of software plus a necessary general hardware platform, or by hardware. By applying the technical solution of this application, compared with the prior art, each single frame picture can be extracted from the video to be cut. After each single frame picture is preprocessed, the variance change value between each single frame picture and the corresponding next single frame picture is calculated; when the variance change value is greater than the first preset threshold, the single frame picture is determined to be a candidate frame picture. After all candidate frame pictures have been extracted, the degree of difference between the connected components of each candidate frame picture and those of the corresponding next single frame picture is compared based on the YOLO target detection algorithm; when the degree of difference is large, the candidate frame picture is determined to be a shot switching frame picture. Finally, the video to be cut is cut at the shot switching frame corresponding to each shot switching frame picture. In this embodiment, the double detection of shot switching frames allows all shot switching frames contained in the video to be cut to be determined accurately and efficiently, so that each single-shot scene is cut accurately, which improves cutting efficiency and also reduces the labor cost of video cutting.

Those skilled in the art can understand that the accompanying drawings are only schematic diagrams of preferred implementation scenarios, and the modules or processes in the accompanying drawings are not necessarily necessary for implementing this application. Those skilled in the art can understand that the modules in the device in the implementation scenario can be distributed in the device in the implementation scenario according to the description of the implementation scenario, or can be changed to be located in one or more devices different from the implementation scenario. The modules of the above implementation scenarios can be combined into one module or further divided into multiple sub-modules.

The serial numbers used in this application are for description only and do not represent the relative merits of the implementation scenarios. The above disclosures are only a few specific implementation scenarios of the application, but the application is not limited to them, and any changes that can be conceived by those skilled in the art should fall within the protection scope of the application.

Claims

A method for cutting a video shot, characterized in that it comprises:

Extracting each single frame picture in the video to be cut;

Screening candidate frame pictures out of the single frame pictures based on the variance change value;

Using a target detection algorithm to determine all shot switching frame pictures included in the candidate frame pictures;

Cutting the video to be cut into multiple video clips according to the shot switching frame pictures.
The method according to claim 1, wherein before the screening of candidate frame pictures from the single frame pictures based on the variance change value, the method further specifically comprises:

Scaling each single frame picture to a preset size;

Perform grayscale processing on the single frame picture after scaling.
The method according to claim 2, wherein the filtering out candidate frame pictures from the single frame pictures based on the variance change value specifically comprises:

Calculating the variance value of all pixels in each single frame picture;

Calculating the variance change value between each single frame picture and the corresponding single frame picture of the next frame;

If it is determined that the variance change value is less than the first preset threshold, determining that the single frame picture is a non-shot switching frame picture;

If it is determined that the variance change value is greater than or equal to the first preset threshold, it is determined that the single frame picture is a candidate frame picture.
The method according to claim 3, wherein the determining all shot switching frame pictures included in the candidate frame picture by using a target detection algorithm specifically comprises:

Based on the target detection algorithm training, the target detection model whose training result meets the preset standard is obtained;

Inputting the candidate frame picture into the target detection model, and obtaining first detection data information corresponding to the candidate frame picture;

Input the next single frame picture corresponding to the candidate frame picture into the target detection model, and obtain the second detection data information corresponding to the next single frame picture;

If it is determined that the first detection data information and the second detection data information do not contain the same connected component, determining that the candidate frame picture is a shot switching frame picture;

If it is determined that the first detection data information and the second detection data information contain the same connected component, calculating the difference value of the same connected component;

When the difference value meets a preset condition, it is determined that the candidate frame picture is the shot switching frame picture.
The method according to claim 4, wherein the target detection model training based on the target detection algorithm to obtain a training result that meets a preset standard specifically comprises:

Collect multiple single-frame pictures as sample images;

Label the position coordinates and category information of each connected component in the sample image;

Use the sample images with marked coordinate positions as a training set, and input them into the initial target detection model created in advance based on the yolo target detection algorithm;

The initial target detection model is used to extract the image features of various connected components in the sample image, and based on the image features, a suggestion window for each connected component and the conditional category probability of each connected component corresponding to the suggestion window are generated;

Determining the connected component category with the largest probability of the conditional category as the category recognition result of the connected component in the suggestion window;

If it is determined that the confidence levels of all the suggestion windows are greater than the second preset threshold, and the category recognition result matches the labeled category information, then it is determined that the initial target detection model has passed training;

If it is determined that the initial target detection model has not passed the training, the position coordinates and category information of each connected component marked in the sample image are used to correct and train the initial target detection model so that the judgment result of the initial target detection model meets the preset standard.
The method according to claim 5, wherein the calculating of the difference value of the same connected component, if it is determined that the first detection data information and the second detection data information contain the same connected component, specifically includes:

Calculating a first difference value based on the position coordinate information of the same connected component in the first detection data information and the second detection data information;

Calculating a second difference value based on the height and width information of the same connected component in the first detection data information and the second detection data information;

The determining that the candidate frame picture is the shot switching frame picture when the difference value meets a preset condition specifically includes:

If the first difference value and/or the second difference value is greater than a third preset threshold, it is determined that the candidate frame picture is a shot switching frame picture.
The method according to claim 6, wherein the cutting the to-be-cut video into multiple video clips according to the shot switching frame picture specifically comprises:

Determining the shot switching frame corresponding to each of the shot switching frame pictures;

The video to be cut is cut at the shot switching frame.
A device for cutting video footage, characterized in that it comprises:

The extraction module is used to extract each single frame picture in the video to be cut;

A screening module, configured to screen out candidate frame pictures from the single frame pictures based on the variance change value;

A determining module, configured to determine all shot switching frame pictures included in the candidate frame picture by using a target detection algorithm;

The cutting module is configured to cut the to-be-cut video into multiple video clips according to the shot switching frame picture.
The apparatus according to claim 8, wherein the apparatus further comprises: a scaling module and a processing module;

The scaling module is configured to scale each single frame picture to a preset size;

The processing module is configured to perform grayscale processing on the single-frame picture after scaling.
The device according to claim 9, wherein the screening module is specifically configured to: calculate the variance value of all pixels in each single frame picture; calculate the variance change value between each single frame picture and the corresponding next single frame picture; if it is determined that the variance change value is less than the first preset threshold, determine that the single frame picture is a non-shot switching frame picture; and if it is determined that the variance change value is greater than or equal to the first preset threshold, determine that the single frame picture is a candidate frame picture.
The device according to claim 10, wherein the determining module is specifically configured to: obtain, through training based on the target detection algorithm, a target detection model whose training result meets a preset standard; input the candidate frame picture into the target detection model to obtain the first detection data information corresponding to the candidate frame picture; input the next single frame picture corresponding to the candidate frame picture into the target detection model to obtain the second detection data information corresponding to the next single frame picture; if it is determined that the first detection data information and the second detection data information do not contain the same connected component, determine that the candidate frame picture is a shot switching frame picture; if it is determined that the first detection data information and the second detection data information contain the same connected component, calculate the difference value of the same connected component; and when the difference value meets a preset condition, determine that the candidate frame picture is the shot switching frame picture.
The device according to claim 11, wherein the determining module is specifically configured to: collect multiple single frame pictures as sample images; label the position coordinates and category information of each connected component in the sample image; use the sample images with labeled coordinate positions as a training set and input them into the initial target detection model created in advance based on the YOLO target detection algorithm; use the initial target detection model to extract the image features of various connected components in the sample image and, based on the image features, generate the suggestion window of each connected component and the conditional category probabilities of the various connected components corresponding to the suggestion window; determine the connected component category with the highest conditional category probability as the category recognition result of the connected component in the suggestion window; if it is determined that the confidence levels of all the suggestion windows are greater than the second preset threshold and the category recognition result matches the labeled category information, determine that the initial target detection model has passed the training; and if it is determined that the initial target detection model has not passed the training, use the position coordinates and category information of each connected component labeled in the sample image to correct and train the initial target detection model so that the judgment result of the initial target detection model meets the preset standard.
The device according to claim 12, wherein the determining module is specifically configured to: calculate a first difference value based on the position coordinate information of the same connected component in the first detection data information and the second detection data information; calculate a second difference value based on the height and width information of the same connected component in the first detection data information and the second detection data information; and if the first difference value and/or the second difference value is greater than a third preset threshold, determine that the candidate frame picture is a shot switching frame picture.
The device according to claim 13, wherein the cutting module is specifically configured to: determine the shot switching frame corresponding to each of the shot switching frame pictures; and cut the video to be cut at the shot switching frame.
A non-volatile readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement a method for cutting a video shot, including: extracting each single frame picture in the video to be cut; screening candidate frame pictures out of the single frame pictures based on the variance change value; using a target detection algorithm to determine all shot switching frame pictures included in the candidate frame pictures; and cutting the video to be cut into multiple video clips according to the shot switching frame pictures.
The non-volatile readable storage medium according to claim 15, wherein, when the computer-readable instructions are executed by a processor, the screening of candidate frame pictures out of the single frame pictures based on the variance change value includes: calculating the variance value of all pixels in each single frame picture; calculating the variance change value between each single frame picture and the corresponding next single frame picture; if it is determined that the variance change value is less than the first preset threshold, determining that the single frame picture is a non-shot switching frame picture; and if it is determined that the variance change value is greater than or equal to the first preset threshold, determining that the single frame picture is a candidate frame picture.
The non-volatile readable storage medium according to claim 16, wherein, when the computer-readable instructions are executed by a processor, the determining of all shot switching frame pictures included in the candidate frame pictures by using the target detection algorithm includes: obtaining, through training based on the target detection algorithm, a target detection model whose training result meets a preset standard; inputting the candidate frame picture into the target detection model to obtain first detection data information corresponding to the candidate frame picture; inputting the next single frame picture corresponding to the candidate frame picture into the target detection model to obtain the second detection data information corresponding to the next single frame picture; if it is determined that the first detection data information and the second detection data information do not contain the same connected component, determining that the candidate frame picture is a shot switching frame picture; if it is determined that the first detection data information and the second detection data information contain the same connected component, calculating the difference value of the same connected component; and when the difference value meets a preset condition, determining that the candidate frame picture is the shot switching frame picture.
A computer device, including a non-volatile readable storage medium, a processor, and computer-readable instructions stored on the non-volatile readable storage medium and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements a method for cutting a video shot, including: extracting each single frame picture in the video to be cut; screening candidate frame pictures out of the single frame pictures based on the variance change value; using a target detection algorithm to determine all shot switching frame pictures included in the candidate frame pictures; and cutting the video to be cut into multiple video clips according to the shot switching frame pictures.
The computer device according to claim 18, wherein, when the processor executes the computer-readable instructions, the screening of candidate frame pictures out of the single frame pictures based on the variance change value includes: calculating the variance value of all pixels in each single frame picture; calculating the variance change value between each single frame picture and the corresponding next single frame picture; if it is determined that the variance change value is less than the first preset threshold, determining that the single frame picture is a non-shot switching frame picture; and if it is determined that the variance change value is greater than or equal to the first preset threshold, determining that the single frame picture is a candidate frame picture.
The computer device according to claim 19, wherein, when the processor executes the computer-readable instructions, the determining of all shot switching frame pictures included in the candidate frame pictures by using the target detection algorithm includes: obtaining, through training based on the target detection algorithm, a target detection model whose training result meets a preset standard; inputting the candidate frame picture into the target detection model to obtain first detection data information corresponding to the candidate frame picture; inputting the next single frame picture corresponding to the candidate frame picture into the target detection model to obtain the second detection data information corresponding to the next single frame picture; if it is determined that the first detection data information and the second detection data information do not contain the same connected component, determining that the candidate frame picture is a shot switching frame picture; if it is determined that the first detection data information and the second detection data information contain the same connected component, calculating the difference value of the same connected component; and when the difference value meets a preset condition, determining that the candidate frame picture is the shot switching frame picture.