WO2021003825A1 - Video shot cutting method and apparatus, and computer device - Google Patents

Video shot cutting method and apparatus, and computer device

Info

Publication number
WO2021003825A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame picture
target detection
determined
data information
video
Prior art date
Application number
PCT/CN2019/103528
Other languages
French (fr)
Chinese (zh)
Inventor
雷晨雨
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021003825A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Definitions

  • Shot switching is an important step in video editing: it is required both for the narrative composition or artistic expression of TV programs and for the viewing experience of the audience. Long videos such as sports games or TV programs often switch shots frequently, so such a long video needs to be cut into multiple video clips, each covering a single shot scene. As living standards improve, viewers' quality requirements for entertainment content are becoming more stringent. Strengthening video cutting technology so that video editing better meets the user experience expected by consumers is therefore particularly important.
  • The present application discloses a method, an apparatus and a computer device for cutting video shots.
  • The main purpose is to solve the problem of cumbersome, inefficient and time-consuming cutting operations when manual software tools are used to cut video.
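  • A method for cutting a video shot, including:
  • extracting each single frame picture in the video to be cut;
  • screening out candidate frame pictures from the single frame pictures based on the variance change value;
  • determining, by using a target detection algorithm, all shot switching frame pictures included in the candidate frame pictures;
  • cutting the video to be cut into multiple video clips according to the shot switching frame pictures.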
  • An apparatus for cutting a video shot, including:
  • an extraction module, used to extract each single frame picture in the video to be cut;
  • a screening module, configured to screen out candidate frame pictures from the single frame pictures based on the variance change value;
  • a determining module, configured to determine all shot switching frame pictures included in the candidate frame pictures by using a target detection algorithm;
  • a cutting module, configured to cut the video to be cut into multiple video clips according to the shot switching frame pictures.
  • a non-volatile readable storage medium having computer readable instructions stored thereon, and the computer readable instructions are executed by a processor to implement the above-mentioned video shot cutting method.
  • A computer device including a non-volatile readable storage medium, a processor, and computer-readable instructions that are stored on the non-volatile readable storage medium and can run on the processor; when the processor executes the computer-readable instructions, the video shot cutting method is implemented.
  • Compared with the current way of cutting video with manual software tools, the method, apparatus and computer device for cutting video shots provided by this application can extract each single frame picture from the video to be cut; preliminarily select candidate frame pictures from the single frame pictures based on the variance change value; then use the target detection algorithm to identify adjacent candidate frames with large differences, so as to determine the shot switching frame pictures from the candidate frame pictures; and finally cut the video to be cut automatically into multiple video clips according to the shot switching frame pictures.
  • FIG. 1 shows a schematic flowchart of a method for cutting a video shot provided by an embodiment of the present application.
  • FIG. 2 shows a schematic flowchart of another video shot cutting method provided by an embodiment of the present application.
  • FIG. 3 shows a schematic structural diagram of a video shot cutting apparatus provided by an embodiment of the present application.
  • FIG. 4 shows a schematic structural diagram of another video shot cutting apparatus provided by an embodiment of the present application.
  • An embodiment of the present application provides a method for cutting a video shot, as shown in FIG. 1. The method includes:
  • the pre-cut video to be cut must be shown for at least three minutes.
  • the first step of performing the cutting operation is to extract each single frame of pictures from the to-be-cut video, so as to determine all the shot switching frames contained in the to-be-cut video by comparing and analyzing each single frame of pictures.
  • By calculating the variance change value between each single frame picture and the adjacent single frame picture, the change in the high-frequency part of the pixels in the two adjacent single frame pictures can be preliminarily determined.
  • The greater the variance change value, the greater the fluctuation of the pixels.
  • When the variance change value is large, the single frame picture can be preliminarily determined as a candidate frame picture.
  • At the same time, single frame pictures whose variance change value is small are determined to be non-shot-switching frame pictures and removed, so that all retained single frame pictures are candidate frame pictures on which finer screening is performed.
  • The target detection algorithm uses the YOLO target detection method: the task of detecting the connected components in the candidate frame picture is treated as a regression problem, and the coordinates of each bounding box, the confidence that the bounding box contains an object, and the conditional category probabilities are obtained directly from all the pixels of the entire picture.
  • the position coordinates of each bounding box are (x, y, w, h), x and y represent the coordinates of the center point of the bounding box, and w and h represent the width and height of the bounding box.
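  • As a minimal illustration of the (x, y, w, h) convention described above, the following sketch converts a center-based bounding box into corner coordinates. The `BoundingBox` class and its `corners` method are hypothetical names introduced here for illustration; they are not part of the application.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Center-based box: (x, y) is the center point, w and h are width and height."""
    x: float
    y: float
    w: float
    h: float

    def corners(self) -> Tuple[float, float, float, float]:
        """Return (left, top, right, bottom) derived from the center form."""
        half_w, half_h = self.w / 2.0, self.h / 2.0
        return (self.x - half_w, self.y - half_h,
                self.x + half_w, self.y + half_h)


box = BoundingBox(x=10, y=20, w=4, h=6)
print(box.corners())
```

Keeping the center form as the stored representation matches the description, while the corner form is often more convenient for drawing or overlap checks.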
  • the video to be cut can be automatically cut, and then multiple video clips in a single shot scene can be obtained.
  • Each single frame picture can be extracted from the video to be cut; candidate frame pictures are preliminarily selected from the single frame pictures based on the variance change value; the target detection algorithm is then used to identify adjacent candidate frames with large differences, so as to determine the shot switching frame pictures from the candidate frame pictures; finally, the video to be cut is automatically cut into multiple video clips according to the shot switching frame pictures.
  • Through the technical solution in this application, the shot switching frames can be extracted automatically from the video to be cut according to the variance calculation result and the detection result of the YOLO target detection model, and the video to be cut is cut at the shot switching frames. This avoids the detection errors that easily occur during manual detection, and effectively improves both the detection accuracy of shot switching frames and the efficiency of shot cutting.
  • the method includes:
  • The speed of shot switching can be determined by the number of different single frame pictures played per second. When the number of different single frame pictures played per second is greater than the picture transition threshold, the video segment played within that second is a fast shot switch; otherwise it is a slow shot switch.
  • The picture corresponding to each consecutive frame in the video to be cut can be extracted as the single frame pictures to be analyzed in this embodiment, and the analysis and cutting operations in steps 202 to 214 of the embodiment are then performed on them.
  • sampling frequency greater than 20 frames
  • The pictures are sparsely sampled at the sampling frequency, and one sampled picture is acquired in each sampling period as a single frame picture to be analyzed in this embodiment.
  • For example, the sampling frequency of single frame pictures can be set to 32 (that is, one picture is taken every 32 frames), and the pictures can be sparsely sampled at this frequency to reduce the amount of calculation. If a video has 300 frames, the 0th frame, the 32nd frame, the 32*2th frame, the 32*3th frame, the 32*4th frame, and so on can be extracted according to the sampling frequency as the single frame pictures in this embodiment.
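  • The sparse sampling in the 300-frame example can be sketched as follows. The function name `sample_frame_indices` and its parameter names are hypothetical; the sketch only assumes that one frame is kept at the start of every sampling period.

```python
from typing import List


def sample_frame_indices(total_frames: int, sampling_period: int = 32) -> List[int]:
    """Return the frame indices 0, p, 2p, ... that fall inside the video."""
    if total_frames < 0 or sampling_period <= 0:
        raise ValueError("total_frames must be >= 0 and sampling_period must be > 0")
    return list(range(0, total_frames, sampling_period))


indices = sample_frame_indices(300, 32)
print(indices)  # [0, 32, 64, 96, 128, 160, 192, 224, 256, 288]
```

With 300 frames and a period of 32, ten pictures are analyzed instead of 300, which is where the reduction in calculation comes from.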
  • the single-frame pictures can be processed into a uniform format and size.
  • Set the preset size to 256*256.
  • each single frame image needs to be scaled to a pixel size of 256*256.
  • Since the single frame pictures extracted from the video to be cut are mostly color images that adopt the RGB color mode, grayscale processing is performed on the scaled single frame pictures.
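  • The preprocessing described above (scaling to 256*256, then grayscale conversion) can be sketched in plain Python. This is an illustration under stated assumptions, not the application's implementation: the nearest-neighbour scaling and the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are choices made here, since the text does not specify either, and all function names are hypothetical. A real pipeline would more likely use an image library.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]           # (R, G, B), each in 0..255
RgbImage = Sequence[Sequence[Pixel]]   # rows of pixels


def resize_nearest(image: RgbImage, size: int = 256) -> List[List[Pixel]]:
    """Scale an image to size x size by nearest-neighbour sampling (assumed method)."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


def to_grayscale(image: RgbImage) -> List[List[float]]:
    """Convert RGB pixels to gray values using BT.601 weights (assumed weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]


def preprocess(frame: RgbImage, size: int = 256) -> List[List[float]]:
    """Scale first, then convert to grayscale, following the order in the text."""
    return to_grayscale(resize_nearest(frame, size))


# A tiny all-red 4 x 8 frame stands in for a decoded video frame.
frame = [[(255, 0, 0)] * 8 for _ in range(4)]
gray = preprocess(frame)
print(len(gray), len(gray[0]))
```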
  • the formula for calculating the variance of each single frame picture is:
  • S(t) is the variance value of each single frame picture;
  • xi is the gray value of each pixel in the single frame picture;
  • x̄ is the average gray value of all pixels in the single frame picture;
  • n is the total number of pixels in the single frame picture that participate in the variance comparison.
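  • The formula itself is missing from this text. Assuming the conventional population variance over the gray values, and writing \bar{x} for the average gray value (a symbol assumed here), the definitions above would correspond to:

```latex
S(t) = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

Whether the original normalizes by n or by n - 1 cannot be recovered from the text; the choice does not affect the thresholding logic as long as it is applied consistently to every frame.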
  • the variance change between each single frame picture and the next single frame picture adjacent to each other can be used to preliminarily determine the changes in the high frequency part of the pixels in two adjacent single frame pictures. Therefore, by calculating the variance change value, the size of the change between the current single frame picture and the next frame picture can be preliminarily determined, so as to distinguish whether the current single frame picture is a non-shot switching frame picture or a candidate frame picture.
  • the first preset threshold is a minimum variance change value used to determine that the current single frame picture is a candidate frame picture.
  • If the variance change value is less than the first preset threshold, the change between the current single frame picture and the next single frame picture is not obvious. It can then be determined that no shot scene transition occurs between the current frame and the next frame in the video to be cut, so no cutting is needed, and the current single frame picture can be determined to be a non-shot-switching frame picture and filtered out.
  • For example, if the variance value of the current single frame picture is calculated as S(t), the variance value of the next single frame picture is S(t+1), and the first preset threshold is set to N1, then the current single frame picture is a non-shot-switching frame picture when the variance change value between S(t) and S(t+1) is less than N1.
  • If the variance change value is greater than or equal to the first preset threshold, the current single frame picture can be saved as a candidate frame picture to be subjected to the next step of comparison and detection.
  • For example, with S(t), S(t+1) and N1 defined as above, the current single frame picture is a candidate frame picture when the variance change value between S(t) and S(t+1) is greater than or equal to N1.
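  • The screening by the first preset threshold N1 can be sketched as follows. The exact form of the variance change value is not given in this text; the sketch assumes it is the absolute difference |S(t+1) - S(t)| and assumes a population variance. The function names are hypothetical.

```python
from typing import List, Sequence


def variance(gray_pixels: Sequence[float]) -> float:
    """Population variance of the gray values of one frame (assumed normalization by n)."""
    n = len(gray_pixels)
    mean = sum(gray_pixels) / n
    return sum((x - mean) ** 2 for x in gray_pixels) / n


def screen_candidates(variances: Sequence[float], n1: float) -> List[int]:
    """Return indices t where |S(t+1) - S(t)| >= n1 (assumed form of the change value).

    The last frame has no next frame, so it is never returned.
    """
    return [
        t for t in range(len(variances) - 1)
        if abs(variances[t + 1] - variances[t]) >= n1
    ]


# Three tiny "frames" given as flat lists of gray values.
frames = [[10, 10, 10, 10], [10, 12, 10, 12], [0, 255, 0, 255]]
s = [variance(f) for f in frames]
print(screen_candidates(s, n1=100.0))  # [1]
```

Frames 0 and 1 differ only slightly in variance and are filtered out, while the jump between frames 1 and 2 makes frame 1 a candidate.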
  • a target detection model whose training result meets a preset standard is obtained based on the target detection algorithm training.
  • Step 208 of the embodiment may specifically include: collecting multiple single frame pictures as sample images; labeling the position coordinates and category information of each connected component in the sample images;
  • using the sample images with labeled position coordinates as the training set and inputting them into the initial target detection model created in advance based on the YOLO target detection algorithm;
  • using the initial target detection model to extract the image features of the various connected components in the sample images, and generating, based on the image features, the suggestion window of each connected component and the conditional category probabilities of the various connected components corresponding to the suggestion window;
  • determining the connected component category with the largest conditional category probability as the category recognition result of the connected component in the suggestion window; if the confidence of all suggestion windows is greater than the second preset threshold and the category recognition result matches the labeled category information, determining that the initial target detection model has passed the training; if the initial target detection model has not passed the training, using the position coordinates and category information of each connected component labeled in the sample images to correct and train the initial target detection model.
  • the confidence degree is used to determine whether there is an object in the recognition detection frame and the probability of the existence of the object.
  • the second preset threshold is a criterion used to evaluate whether the initial target detection model has passed the training.
  • A confidence that is determined to be non-zero is compared with the second preset threshold.
  • If the confidence is greater than the second preset threshold and the category recognition result matches the labeled category information, the initial target detection model is determined to have passed the training; otherwise it fails the training. Since the confidence takes a value between 0 and 1, the maximum value of the second preset threshold is 1. The larger the second preset threshold is, the more accurate the model training is.
  • The specific value can be set according to the application standard.
  • The category information is the categories of the connected components contained in the video to be cut, such as people of different body shapes and appearances, fixed buildings, equipment, etc. In specific application scenarios, the category information to be recognized can be set according to the actual video recording scene.
  • the initial target detection model is created in advance according to the design needs.
  • The initial target detection model has only been created: it has not passed model training and does not meet the preset standard. The target detection model, by contrast, is the model that has passed model training, has reached the preset standard, and can be applied to the detection of connected components in each single frame picture.
  • The conditional category probability information is predicted for each grid, that is, the probability that the object in each suggestion window belongs to each category. For example, suppose the model is trained to recognize five categories a, b, c, d and e, and the confidence indicates that suggestion window A contains an object. The conditional category probabilities of suggestion window A for the five categories are then predicted, for example 80%, 55%, 50%, 37% and 15% respectively. Category a, which has the highest conditional category probability, is taken as the recognition result, and it is then verified whether the object category actually labeled in the detection frame is category a; if it is category a, the initial target detection model is determined to have recognized the category information in this suggestion window correctly.
  • If the confidence of all the recognized suggestion windows is greater than the second preset threshold, and the category recognition result matches the labeled category information, it is determined that the initial target detection model has passed the training.
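  • The pass criterion described above can be sketched as a small check over the suggestion windows. The `Suggestion` structure and the function names are hypothetical, and the sketch covers only the decision rule, not the YOLO model or its training loop.

```python
from typing import Dict, List, NamedTuple


class Suggestion(NamedTuple):
    confidence: float              # confidence that the window contains an object
    class_probs: Dict[str, float]  # conditional category probabilities
    labeled_category: str          # category labeled in the sample image


def recognized_category(class_probs: Dict[str, float]) -> str:
    """Pick the category with the largest conditional category probability."""
    return max(class_probs, key=class_probs.get)


def passes_training(suggestions: List[Suggestion], second_threshold: float) -> bool:
    """True only if every window is confident enough and correctly classified."""
    return all(
        s.confidence > second_threshold
        and recognized_category(s.class_probs) == s.labeled_category
        for s in suggestions
    )


# Suggestion window A from the example, with an assumed confidence of 0.9.
window_a = Suggestion(
    confidence=0.9,
    class_probs={"a": 0.80, "b": 0.55, "c": 0.50, "d": 0.37, "e": 0.15},
    labeled_category="a",
)
print(recognized_category(window_a.class_probs))  # a
print(passes_training([window_a], second_threshold=0.7))  # True
```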
  • The first detection data information is the category and quantity of all connected components contained in the candidate frame picture, and data information such as position information, height, and width corresponding to each connected component.
  • the next single frame picture is a single frame picture corresponding to the next frame of the current candidate frame picture in the video to be cut, and the next single frame picture may be a non-shot switching frame picture or a candidate frame picture.
  • the second detection data information is the category and quantity of all connected components contained in the next single frame picture, and data information such as position information, height, and width corresponding to each connected component.
  • If the first detection data information and the second detection data information do not contain the same connected component, the current candidate frame picture and the corresponding next single frame picture are in two completely different shot scenes; that is, a shot scene switch occurs between the candidate frame and the next frame, so the current candidate frame picture is retained as a shot switching frame picture.
  • If the first detection data information and the second detection data information contain at least one identical connected component, it can be determined that the current candidate frame picture is a non-shot-switching frame picture, and the candidate frame is filtered out.
  • Step 212 of the embodiment may specifically include: calculating a first difference value based on the position coordinate information of the same connected component in the first detection data information and the second detection data information; and calculating a second difference value based on the height and width information of the same connected component in the first detection data information and the second detection data information.
  • For example, the current candidate frame picture and the corresponding next single frame picture contain the same connected component, denoted s1 in the current candidate frame picture and s2 in the next single frame picture. The size and position data of s1 obtained through the first detection data information is {x1, y1, w1, h1}, and the size and position data of s2 obtained through the second detection data information is {x2, y2, w2, h2}.
  • x1 and y1 are respectively the position coordinate information of s1 in the current candidate frame picture
  • x2 and y2 are respectively the position coordinate information of s2 in the next single frame picture
  • w1 and h1 are the width and height of s1 respectively
  • w2 and h2 are the width and height of s2 respectively.
  • step 213 of the embodiment may specifically include: if the first difference value and/or the second difference value is greater than the third preset threshold, determining that the candidate frame picture is a shot switching frame picture.
  • The preset condition is that at least one of the first difference value and the second difference value is greater than the third preset threshold. The third preset threshold is the smallest difference value used to determine that the candidate frame picture is a shot switching frame picture, and its specific value can be set according to the actual situation.
  • For example, if the first difference value is calculated as d1, the second difference value is d2, and the third preset threshold is set to N2, then the candidate frame picture can be determined to be a shot switching frame picture when d1 > N2, or d2 > N2, or both d1 > N2 and d2 > N2.
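  • The decision in steps 212 and 213 can be sketched as follows. The formulas for d1 and d2 are not given in this text, so the sketch assumes Euclidean distances over (x, y) and over (w, h). It also assumes that each detection result is a mapping from a hypothetical component label to its box, and that one shared component exceeding N2 is enough; none of these choices is confirmed by the application.

```python
import math
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), center-based


def difference_values(box1: Box, box2: Box) -> Tuple[float, float]:
    """Return (d1, d2): position difference and size difference (assumed Euclidean)."""
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    d1 = math.hypot(x2 - x1, y2 - y1)
    d2 = math.hypot(w2 - w1, h2 - h1)
    return d1, d2


def is_shot_switch(first: Dict[str, Box], second: Dict[str, Box], n2: float) -> bool:
    """Decide whether the candidate frame is a shot switching frame."""
    shared = first.keys() & second.keys()
    if not shared:
        return True  # no common connected component: completely different scenes
    for label in shared:
        d1, d2 = difference_values(first[label], second[label])
        if d1 > n2 or d2 > n2:  # preset condition: at least one value exceeds N2
            return True
    return False


current = {"person": (0.0, 0.0, 10.0, 10.0)}
following = {"person": (3.0, 4.0, 10.0, 10.0)}
print(difference_values(current["person"], following["person"]))  # (5.0, 0.0)
print(is_shot_switch(current, following, n2=20.0))  # False
```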
  • step 214 of the embodiment may specifically include: determining a shot switching frame corresponding to each shot switching frame picture; and cutting the video to be cut at the shot switching frame.
  • For example, the sequence of all single frame pictures extracted from the video to be cut is [t0, ..., tn], and the shot switching frames corresponding to the extracted shot switching frame pictures are determined to be tx1, tx2, ..., txm, where the frames are ordered t0, tx1, tx2, ..., txm, tn.
  • the video to be cut can be cut into [t0, tx1], [tx1+1, tx2], ... [txm+1, tn] video segments, where each video segment is a single shot segment.
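  • The segment boundaries in this example can be computed with a short helper. The function name `cut_segments` is hypothetical; the sketch returns inclusive (start, end) frame ranges and leaves the actual video cutting to whatever tool is used.

```python
from typing import List, Sequence, Tuple


def cut_segments(first: int, last: int,
                 switch_frames: Sequence[int]) -> List[Tuple[int, int]]:
    """Split [first, last] into [t0, tx1], [tx1+1, tx2], ..., [txm+1, tn]."""
    segments = []
    start = first
    for tx in sorted(switch_frames):
        segments.append((start, tx))
        start = tx + 1
    if start <= last:  # skip an empty tail when the last switch frame is tn
        segments.append((start, last))
    return segments


print(cut_segments(0, 10, [3, 6]))  # [(0, 3), (4, 6), (7, 10)]
```

Each returned range corresponds to one single-shot segment, and a video with no shot switching frames yields the whole range as one segment.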
  • Each single frame picture can be extracted from the video to be cut; after each single frame picture is preprocessed, the variance change value between each single frame picture and the corresponding next single frame picture is calculated, and when the variance change value is greater than the first preset threshold, the single frame picture is determined to be a candidate frame picture. After all candidate frame pictures are extracted, the degree of difference between the connected components of each candidate frame picture and those of the corresponding next single frame picture is compared based on the YOLO target detection algorithm; when the degree of difference is large, the candidate frame picture can be determined as a shot switching frame picture. Finally, the video to be cut is cut at the shot switching frame corresponding to each shot switching frame picture.
  • In this way, all the shot switching frames included in the video to be cut can be determined accurately and efficiently, so that each single shot scene is cut accurately; this improves the cutting efficiency and also reduces the labor cost of video cutting.
  • an embodiment of the present application provides a device for cutting a video shot.
  • The apparatus includes: an extraction module 31, a screening module 32, a determining module 33, and a cutting module 34.
  • The extraction module 31 is used to extract each single frame picture in the video to be cut;
  • the screening module 32 is used to screen out candidate frame pictures from the single frame pictures based on the variance change value;
  • the determining module 33 is configured to determine all shot switching frame pictures included in the candidate frame pictures by using a target detection algorithm;
  • the cutting module 34 is used to cut the video to be cut into multiple video clips according to the shot switching frame pictures.
  • the device further includes a scaling module 35 and a processing module 36.
  • The scaling module 35 is used to scale each single frame picture to a preset size;
  • the processing module 36 is used to perform grayscale processing on the scaled single frame pictures.
  • The screening module 32 is specifically used to calculate the variance value of all pixels in each single frame picture; calculate the variance change value between each single frame picture and the corresponding next single frame picture; if the variance change value is less than the first preset threshold, determine that the single frame picture is a non-shot-switching frame picture; and if the variance change value is greater than or equal to the first preset threshold, determine that the single frame picture is a candidate frame picture.
  • The determining module 33 is specifically used to obtain, by training based on the target detection algorithm, a target detection model whose training result meets the preset standard;
  • input the candidate frame picture into the target detection model to obtain the first detection data information corresponding to the candidate frame picture;
  • input the next single frame picture corresponding to the candidate frame picture into the target detection model to obtain the second detection data information corresponding to the next single frame picture; if the first detection data information and the second detection data information do not contain the same connected component, determine that the candidate frame picture is a shot switching frame picture; if the first detection data information and the second detection data information contain the same connected component, calculate the difference value of the same connected component; and when the difference value meets the preset condition, determine that the candidate frame picture is a shot switching frame picture.
  • The determining module 33 is specifically used to collect multiple single frame pictures as sample images; label the position coordinates and category information of each connected component in the sample images; use the sample images with labeled position coordinates as the training set and input them into the initial target detection model created in advance based on the YOLO target detection algorithm; use the initial target detection model to extract the image features of the various connected components in the sample images, and generate, based on the image features, the suggestion window of each connected component and the conditional category probabilities of the various connected components corresponding to the suggestion window; determine the connected component category with the highest conditional category probability as the category recognition result of the connected component in the suggestion window; if the confidence of all suggestion windows is greater than the second preset threshold and the category recognition result matches the labeled category information, determine that the initial target detection model has passed the training; if the initial target detection model has not passed the training, use the position coordinates and category information of each connected component labeled in the sample images to correct and train the initial target detection model.
  • The determining module 33 is specifically configured to calculate a first difference value based on the position coordinate information of the same connected component in the first detection data information and the second detection data information, and to calculate a second difference value based on the height and width information of the same connected component in the first detection data information and the second detection data information.
  • the determining module 33 is specifically configured to determine that the candidate frame picture is a shot switching frame picture if the first difference value and/or the second difference value is greater than the third preset threshold.
  • The cutting module 34 is specifically used to determine the shot switching frame corresponding to each shot switching frame picture, and to cut the video to be cut at the shot switching frame.
  • An embodiment of the present application also provides a non-volatile readable storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor, the video shot cutting method shown in FIG. 1 and FIG. 2 is implemented.
  • the technical solution of this application can be embodied in the form of a software product.
  • The software product can be stored in a non-volatile storage medium (which can be a CD-ROM, USB flash drive, removable hard disk, etc.) and includes several instructions used to make a computer device (which may be a personal computer, a server, a network device, etc.) execute the methods in each implementation scenario of the present application.
  • An embodiment of the present application also provides a computer device, which may be a personal computer, a server, a network device, etc. The physical device includes a non-volatile readable storage medium and a processor: the non-volatile readable storage medium is used for storing computer-readable instructions, and the processor is used for executing the computer-readable instructions to implement the video shot cutting method shown in FIG. 1 and FIG. 2.
  • the computer device may also include a user interface, a network interface, a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a WI-FI module, and so on.
  • the user interface may include a display screen (Display), an input unit such as a keyboard (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, and the like.
  • the network interface can optionally include a standard wired interface, a wireless interface (such as a Bluetooth interface, a WI-FI interface), etc.
  • the computer device structure provided in this embodiment does not constitute a limitation on the physical device, and may include more or fewer components, or combine certain components, or arrange different components.
  • the non-volatile readable storage medium may also include an operating system and a network communication module.
  • The operating system is a program that manages the hardware and software resources of the physical device for video shot cutting, and supports the operation of information processing programs and other software and/or programs.
  • the network communication module is used to implement communication between various components in the non-volatile readable storage medium and communication with other hardware and software in the physical device.
  • this application can be implemented by means of software plus a necessary general hardware platform, or by hardware.
  • this application can extract each single-frame picture from the video to be cut; after preprocessing each single-frame picture, calculate the variance change value between each single-frame picture and the corresponding next single-frame picture; when the variance change value is greater than the first preset threshold, determine the single-frame picture as a candidate frame picture; after extracting all the candidate frame pictures, compare, based on the yolo target detection algorithm, the degree of difference between the connected components of each candidate frame picture and those of the corresponding next single-frame picture.
  • the candidate frame picture can be determined as the shot switching frame picture; finally, the video to be cut is cut at the shot switching frame corresponding to the shot switching frame picture.
  • all the shot switching frames included in the video to be cut can be accurately and efficiently determined, thereby realizing accurate cutting of each single-shot scene, improving cutting efficiency while also reducing the labor cost of video cutting.

Abstract

Disclosed are a video shot cutting method and apparatus, and a computer device, relating to the technical field of computers, which can solve the problem that cutting a video manually with a software tool is cumbersome, inefficient, and time- and labor-consuming. The method comprises: extracting each single-frame picture from a video to be cut; screening candidate frame pictures from the single-frame pictures on the basis of a variance change value; determining, by using a target detection algorithm, all shot switching frame pictures included in the candidate frame pictures; and cutting said video into a plurality of video clips according to the shot switching frame pictures. The present application is applicable to the automatic splitting of video into clips under different shot scenes.

Description

Method, apparatus and computer device for video shot cutting
Technical field
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on July 11, 2019, with application number 2019106249186 and entitled "Method, Apparatus and Computer Device for Video Shot Cutting", the entire contents of which are incorporated herein by reference.
Background
Shot switching is a very important step in video editing. It is required not only for the narrative structure or artistic expression of TV programs, but also for the audience's viewing. In long videos such as sports games or TV programs, shots are usually switched fairly frequently, and the long video then needs to be cut into multiple video clips, each covering a single shot scene. As living standards improve, the quality requirements for viewing-oriented entertainment are becoming increasingly strict, so strengthening video cutting technology so that video editing better meets the consumer's user experience is particularly important in the current environment.
At present, this kind of video cutting is generally still done manually with video cutting software, and this approach is usually cumbersome, inefficient, and time- and labor-consuming.
Summary of the invention
In view of this, the present application discloses a method, apparatus and computer device for video shot cutting, whose main purpose is to solve the problem that cutting operations are cumbersome, inefficient, and time- and labor-consuming when video is cut manually with software tools.
According to one aspect of the present application, a method for video shot cutting is provided, the method including:
extracting each single-frame picture from the video to be cut;
screening candidate frame pictures from the single-frame pictures based on a variance change value;
using a target detection algorithm to determine all shot switching frame pictures included in the candidate frame pictures;
cutting the video to be cut into multiple video clips according to the shot switching frame pictures.
According to another aspect of the present application, an apparatus for video shot cutting is provided, the apparatus including:
an extraction module, configured to extract each single-frame picture from the video to be cut;
a screening module, configured to screen candidate frame pictures from the single-frame pictures based on a variance change value;
a determining module, configured to determine, by using a target detection algorithm, all shot switching frame pictures included in the candidate frame pictures;
a cutting module, configured to cut the video to be cut into multiple video clips according to the shot switching frame pictures.
According to yet another aspect of the present application, a non-volatile readable storage medium is provided, on which computer-readable instructions are stored; when executed by a processor, the computer-readable instructions implement the above method for video shot cutting.
According to still another aspect of the present application, a computer device is provided, including a non-volatile readable storage medium, a processor, and computer-readable instructions that are stored on the non-volatile readable storage medium and can run on the processor; when the processor executes the computer-readable instructions, the above method for video shot cutting is implemented.
By means of the above technical solution, and compared with the current practice of cutting video manually with software tools, the method, apparatus and computer device for video shot cutting provided by this application extract each single-frame picture from the video to be cut; preliminarily screen candidate frame pictures from the single-frame pictures based on the variance change value; then use a target detection algorithm to identify adjacent candidate frames that differ substantially, so as to determine the shot switching frame pictures among the candidate frame pictures; and finally cut the video to be cut into multiple video clips automatically according to the shot switching frame pictures. With the technical solution of this application, shot switching frames can be extracted automatically from the video to be cut according to the variance calculation results and the detection results of the yolo target detection model, and the video is cut at the shot switching frames. This avoids the detection errors that easily occur in manual detection, and effectively improves both the detection accuracy of shot switching frames and the efficiency of shot cutting.
Description of the drawings
The drawings described here are used to provide a further understanding of the present application and constitute a part of it. The exemplary embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation of the present application. In the drawings:
Fig. 1 shows a schematic flowchart of a method for video shot cutting provided by an embodiment of the present application;
Fig. 2 shows a schematic flowchart of another method for video shot cutting provided by an embodiment of the present application;
Fig. 3 shows a schematic structural diagram of an apparatus for video shot cutting provided by an embodiment of the present application;
Fig. 4 shows a schematic structural diagram of another apparatus for video shot cutting provided by an embodiment of the present application.
Detailed description
The present application is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, where there is no conflict, the embodiments in this application and the features in the embodiments can be combined with each other.
To address the problem that cutting operations are cumbersome, inefficient, and time- and labor-consuming when video is cut manually with software tools, an embodiment of the present application provides a method for video shot cutting. As shown in Fig. 1, the method includes:
101. Extract each single-frame picture from the video to be cut.
In a specific application scenario, to facilitate precise cutting, the video to be cut should have a running time of at least three minutes. The first step of the cutting operation is to extract each single-frame picture from the video to be cut, so that all shot switching frames contained in the video can be determined by comparing and analyzing the single-frame pictures.
102. Screen candidate frame pictures from the single-frame pictures based on the variance change value.
In a specific application scenario, because the variance of a picture reflects how strongly its pixel values fluctuate, the change in the high-frequency content between two adjacent single-frame pictures can be preliminarily determined by calculating the difference between the variance of each single-frame picture and that of its adjacent single-frame picture. The larger the variance change value, the larger the fluctuation of the pixels, which indicates that different pixel clusters appear in the two single-frame pictures; the single-frame picture can then be preliminarily determined as a candidate frame picture. Non-shot-switching frame pictures, identified by a small variance change value, are removed at the same time, so that all retained single-frame pictures are candidate frame pictures for finer screening.
103. Use a target detection algorithm to determine all shot switching frame pictures included in the candidate frame pictures.
In this embodiment, the target detection algorithm is the yolo target detection method: the task of detecting connected components in a candidate frame picture is treated as a regression problem, and the coordinates of each bounding box, the confidence that the bounding box contains an object, and the conditional class probabilities are obtained directly from all pixels of the whole picture. The position of each bounding box is (x, y, w, h), where x and y are the coordinates of the center point of the bounding box, and w and h are its width and height. By detecting targets with yolo, the objects present in a candidate frame picture and their positions can be determined from the picture alone.
104. Cut the video to be cut into multiple video clips according to the shot switching frame pictures.
In a specific application scenario, once all shot switching frame pictures have been determined, the video to be cut can be cut automatically, yielding multiple video clips that each cover a single shot scene.
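As an illustration of step 104, the following minimal sketch turns a list of shot switching frame indices into per-clip frame ranges. The function name `split_into_segments` and the convention that a shot switching frame is the last frame of its clip are assumptions made for this example; the application itself only states that the video is cut at the shot switching frames.

```python
def split_into_segments(total_frames, cut_frames):
    """Return (start, end) frame ranges, end inclusive, one per single-shot clip.

    Assumption (not stated in the source): a shot switching frame at index t
    is treated as the last frame of its shot, so the next clip starts at t + 1.
    """
    if total_frames <= 0:
        return []
    segments = []
    start = 0
    for t in sorted(set(cut_frames)):
        # A cut after the final frame would produce an empty clip, so skip it.
        if start <= t < total_frames - 1:
            segments.append((start, t))
            start = t + 1
    segments.append((start, total_frames - 1))
    return segments
```

For instance, a 10-frame video with shot switching frames 3 and 6 would yield three ranges covering frames 0 to 3, 4 to 6, and 7 to 9.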
With the method for video shot cutting in this embodiment, each single-frame picture is extracted from the video to be cut; candidate frame pictures are preliminarily screened from the single-frame pictures based on the variance change value; a target detection algorithm is then used to identify adjacent candidate frames that differ substantially, so as to determine the shot switching frame pictures among the candidate frame pictures; and finally the video to be cut is cut into multiple video clips automatically according to the shot switching frame pictures. With the technical solution of this application, shot switching frames can be extracted automatically from the video to be cut according to the variance calculation results and the detection results of the yolo target detection model, and the video is cut at the shot switching frames. This avoids the detection errors that easily occur in manual detection, and effectively improves both the detection accuracy of shot switching frames and the efficiency of shot cutting.
Further, as a refinement and extension of the specific implementation of the foregoing embodiment, and in order to fully describe the implementation process of this embodiment, another method for video shot cutting is provided. As shown in Fig. 2, the method includes:
201. Extract each single-frame picture from the video to be cut.
In a specific application scenario, the single-frame pictures of a video go through a transition during a scene switch, and according to the transition duration this process can be divided into two classes: fast shot switching and slow shot switching. Whether a shot switch is fast or slow can be determined by the number of different single-frame pictures played per second: when the number of different single-frame pictures played per second is greater than a set picture-transition threshold, the video segment played within that second is a fast shot switch; otherwise it is a slow shot switch.
In this embodiment, for fast shot switching, since different single-frame pictures change quickly, the picture corresponding to every consecutive frame in the video to be cut can be extracted as the single-frame pictures to be analyzed, and the analysis and cutting operations in steps 202 to 214 of the embodiment are then performed.
Correspondingly, as a preferred approach for slow shot switching, since different single-frame pictures change slowly, many consecutive single-frame pictures may differ very little. To reduce the amount of computation, a sampling frequency (greater than 20 frames) can be set and the pictures sparsely sampled accordingly, with one sampled picture taken per sampling period as a single-frame picture to be analyzed in this embodiment. For example, in practice the sampling frequency of the single-frame pictures can be set to 32 in this solution. If a video has 300 frames, then frame 0, frame 32, frame 32*2, frame 32*3, frame 32*4, and so on are extracted according to the sampling frequency as the single-frame pictures of this embodiment.
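The frame selection described above can be sketched as follows. The function name `frames_to_analyze` and its `fast_switching` flag are hypothetical; the default interval of 32 is the example value given in the text, interpreted here as a spacing in frames.

```python
def frames_to_analyze(total_frames, fast_switching, sampling_interval=32):
    """Return the indices of the single-frame pictures to analyze.

    Fast shot switching: every frame is kept.
    Slow shot switching: one frame is kept per sampling period
    (frames 0, 32, 64, ... with the example interval of 32).
    """
    if fast_switching:
        return list(range(total_frames))
    return list(range(0, total_frames, sampling_interval))
```

With the 300-frame example from the text and slow shot switching, this keeps frames 0, 32, 64, and so on up to 288, i.e. 10 pictures instead of 300.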
202. Scale each single-frame picture to a preset size.
In a specific application scenario, to allow the extracted single-frame pictures to be analyzed uniformly and thereby ensure the accuracy of the analysis, the single-frame pictures can be processed into a uniform size. In this embodiment, the preset size can be set to 256*256 as needed; each single-frame picture obtained is then scaled to 256*256 pixels.
203. Perform grayscale processing on the scaled single-frame pictures.
Correspondingly, the single-frame pictures extracted from the video to be cut are mostly color images in the RGB color mode. To eliminate the interference of irrelevant information with image detection, enhance the detectability of relevant information, and simplify the data as far as possible, the single-frame pictures to be recognized are converted to grayscale during initial processing, thereby ensuring the reliability of picture detection.
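Steps 202 and 203 can be sketched in plain Python as below. The source does not say which interpolation or which grayscale formula is used, so nearest-neighbor scaling and the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are assumptions chosen for illustration, as are the function names `resize_nearest` and `to_gray`.

```python
def resize_nearest(image, size=256):
    """Scale a 2-D image (list of rows) to size x size by nearest-neighbor sampling.

    Assumption: the source only fixes the target size (256*256), not the method.
    """
    height = len(image)
    width = len(image[0])
    return [
        [image[y * height // size][x * width // size] for x in range(size)]
        for y in range(size)
    ]


def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of integer gray values.

    Assumption: BT.601 luma weights; the source does not specify the formula.
    """
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb_image
    ]
```

A production pipeline would more likely rely on an image library for both operations; the pure-Python version is only meant to make the two preprocessing steps concrete.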
204. Calculate the variance of all pixels in each single-frame picture.
For this embodiment, the variance of each single-frame picture is calculated as:
$$S(t) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$
where $S(t)$ is the variance of each single-frame picture, $x_i$ is the gray value of each pixel in the single-frame picture, $\bar{x}$ is the average gray value of all pixels in the single-frame picture, and $n$ is the total number of pixels in the single-frame picture participating in the variance comparison.
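A minimal sketch of the per-frame variance in step 204 follows, using the symbols defined above (gray values x_i, their mean, and the pixel count n). It assumes the population form of the variance, that is, the mean of the squared deviations; the function name `frame_variance` is hypothetical.

```python
def frame_variance(gray_image):
    """Variance S(t) of the gray values of one single-frame picture.

    gray_image is a list of rows of gray values. Assumption: the squared
    deviations from the mean gray value are averaged over all n pixels.
    """
    pixels = [p for row in gray_image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    return sum((p - mean) ** 2 for p in pixels) / n
```

A uniformly gray picture has variance 0, while a picture split evenly between gray values 0 and 10 has mean 5 and variance 25.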
205. Calculate the variance change value between each single-frame picture and the corresponding next single-frame picture.
In a specific application scenario, the difference between the variance of each single-frame picture and that of the adjacent next single-frame picture gives a preliminary indication of how the high-frequency content changes between the two adjacent pictures. Calculating the variance change value therefore gives a preliminary measure of how much the current single-frame picture differs from the next one, which distinguishes whether the current single-frame picture is a non-shot-switching frame picture or a candidate frame picture.
206. If the variance change value is determined to be less than the first preset threshold, determine that the single-frame picture is a non-shot-switching frame picture.
Here, the first preset threshold is the minimum variance change value at which the current single-frame picture is determined to be a candidate frame picture.
Correspondingly, for this embodiment, if the variance change value between the current single-frame picture and the corresponding next single-frame picture is less than the first preset threshold, the difference between the two pictures is not significant, and it can be determined that no shot scene transition occurs between the current frame and the next frame of the video to be cut. No cut is needed there, so the current single-frame picture is determined to be a non-shot-switching frame picture and is filtered out.
For example, let the variance of the current single-frame picture be S(t), the variance of the corresponding next single-frame picture be S(t+1), and the first preset threshold be N1. If |S(t)-S(t+1)| < N1, the current single-frame picture is determined to be a non-shot-switching frame picture.
207. If the variance change value is determined to be greater than or equal to the first preset threshold, determine that the single-frame picture is a candidate frame picture.
In a specific application scenario, for this embodiment, if the variance change value between the current single-frame picture and the corresponding next single-frame picture is greater than or equal to the first preset threshold, the difference between the two pictures is relatively large, and whether they belong to the same shot scene still needs to be determined precisely in the next step. The current single-frame picture is therefore saved as a candidate frame picture for the subsequent comparison and detection.
For example, let the variance of the current single-frame picture be S(t), the variance of the corresponding next single-frame picture be S(t+1), and the first preset threshold be N1. If |S(t)-S(t+1)| ≥ N1, the current single-frame picture is determined to be a candidate frame picture.
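The decision in steps 205 to 207 can be sketched as a single pass over the per-frame variances. The function name `select_candidates` is hypothetical, and the threshold value used in the usage note is an arbitrary illustration rather than a value from the source.

```python
def select_candidates(variances, n1):
    """Return indices t whose picture is a candidate frame picture.

    variances[t] is S(t). Frame t is a candidate when
    |S(t) - S(t+1)| >= n1 (step 207); otherwise it is treated as a
    non-shot-switching frame and dropped (step 206).
    The last frame has no successor and is never a candidate.
    """
    candidates = []
    for t in range(len(variances) - 1):
        if abs(variances[t] - variances[t + 1]) >= n1:
            candidates.append(t)
    return candidates
```

For variances 10.0, 10.5, 40.0, 41.0, 5.0 and an illustrative N1 of 20.0, the successive changes are 0.5, 29.5, 1.0 and 36.0, so frames 1 and 3 become candidates.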
208. Train, based on the target detection algorithm, a target detection model whose training result meets a preset standard.
For this embodiment, in a specific application scenario, step 208 may specifically include: collecting multiple single-frame pictures as sample images; labeling the position coordinates and category information of each connected component in the sample images; inputting the labeled sample images, as a training set, into an initial target detection model created in advance based on the yolo target detection algorithm; using the initial target detection model to extract the image features of the various connected components in the sample images, and generating, based on the image features, a suggestion window for each connected component together with the conditional class probabilities of the suggestion window for the various connected component categories; determining the connected component category with the largest conditional class probability as the category recognition result for the connected component in the suggestion window; if the confidence of every suggestion window is greater than the second preset threshold and the category recognition results match the labeled category information, determining that the initial target detection model has passed training; and if the initial target detection model has not passed training, using the labeled position coordinates and category information of each connected component in the sample images to correct and retrain the initial target detection model until its results meet the preset standard.
Here, the confidence is used to determine whether a detection box contains an object and the probability that an object is present. It is calculated as:
$$\text{confidence} = \Pr(\text{Object}) \times \mathrm{IOU}^{\text{truth}}_{\text{pred}}$$
Pr(Object) indicates whether there is an object in the detection box, with Pr(Object) ∈ {0, 1}. When Pr(Object) = 0, the detection box contains no object and confidence = 0, meaning that no object is recognized; when Pr(Object) = 1, the detection box contains an object and the confidence equals the intersection over union $\mathrm{IOU}^{\text{truth}}_{\text{pred}}$, which is the overlap ratio between the detected candidate box (candidate bound) and the actual labeled box (ground truth bound), that is, the ratio of their intersection to their union. The ideal case is complete overlap, where the ratio is 1. The second preset threshold is the criterion for evaluating whether the initial target detection model has passed training: each non-zero confidence is compared with the second preset threshold, and the initial target detection model passes training when the confidence is greater than the second preset threshold and fails otherwise. Since the confidence lies between 0 and 1, the maximum value of the second preset threshold is 1; the larger the second preset threshold, the more accurately the model is trained, and the specific value can be determined according to application requirements. The category information is the categories of the connected components contained in the video to be cut, such as people of different builds and appearances, fixed buildings, equipment, and so on; in a specific application scenario, different categories to be recognized can be set according to the actual video recording scene. The initial target detection model is created in advance according to design needs. It differs from the target detection model in that the initial target detection model has only been created and has not yet passed training or met the preset standard, whereas the target detection model has passed training, meets the preset standard, and can be applied to detecting connected components in each single-frame picture.
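The intersection over union and the confidence described above can be sketched as follows, using the (x, y, w, h) box format from step 103, where x and y are the center coordinates. The function names `to_corners`, `iou` and `confidence` are hypothetical, and the sketch covers only the geometric definition, not how a yolo model predicts the boxes.

```python
def to_corners(box):
    """Convert a center-format box (x, y, w, h) to (x1, y1, x2, y2) corners."""
    x, y, w, h = box
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou(box_a, box_b):
    """Intersection over union of two center-format boxes."""
    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)
    # Overlap extents are clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def confidence(pr_object, pred_box, truth_box):
    """confidence = Pr(Object) * IOU, with pr_object equal to 0 or 1."""
    return pr_object * iou(pred_box, truth_box)
```

Two identical boxes give an IOU of 1, and two 10 by 10 boxes whose centers are 5 apart horizontally overlap in a 5 by 10 strip, giving 50 / 150, or one third.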
In a specific application scenario, the confidence applies to each suggestion window, while the conditional class probability information applies to each grid cell, that is, it gives the probability that the object in each suggestion window belongs to each category. For example, suppose five categories a, b, c, d and e are trained for recognition. If suggestion window A is determined from its confidence to contain an object, the conditional class probabilities of suggestion window A for the five categories a, b, c, d and e are predicted. If the predictions are 80%, 55%, 50%, 37% and 15% respectively, category a, which has the highest conditional class probability, is taken as the recognition result. It is then verified whether the object category actually labeled in the detection box is category a; if so, the initial target detection model is determined to have recognized the category information in this suggestion window correctly. When the confidence of every recognized suggestion window is greater than the second preset threshold and the category recognition results match the labeled category information, the initial target detection model is determined to have passed training.
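The pass criterion described above can be sketched as below. The dictionary layout of a suggestion window (keys `confidence`, `class_probs`, `label`) and the function names are assumptions made for this example only.

```python
def classify_window(class_probs):
    """Return the category with the largest conditional class probability."""
    return max(class_probs, key=class_probs.get)


def model_passes_training(windows, second_threshold):
    """Check the pass criterion over all suggestion windows.

    Each window is a dict with hypothetical keys:
      "confidence"  - confidence of the suggestion window
      "class_probs" - mapping from category to conditional class probability
      "label"       - the category annotated in the sample image
    The model passes only if every window has confidence above the second
    preset threshold and its predicted category matches the label.
    """
    return all(
        w["confidence"] > second_threshold
        and classify_window(w["class_probs"]) == w["label"]
        for w in windows
    )
```

With the probabilities from the example in the text (80%, 55%, 50%, 37%, 15% for a to e), `classify_window` selects category a.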
209. Input the candidate frame picture into the target detection model to obtain the first detection data information corresponding to the candidate frame picture.
Here, the first detection data information is the categories and number of all connected components contained in the candidate frame picture, together with data such as the position, height and width of each connected component.
210. Input the next single-frame picture corresponding to the candidate frame picture into the target detection model to obtain the second detection data information corresponding to the next single-frame picture.
Here, the next single-frame picture is the single-frame picture that follows the current candidate frame picture in the video to be cut; it may be a non-shot-switching frame picture or a candidate frame picture. The second detection data information is the categories and number of all connected components contained in the next single-frame picture, together with data such as the position, height and width of each connected component.
211. If it is determined that the first detection data information and the second detection data information do not contain the same connected component, determine that the candidate frame picture is a shot switching frame picture.
In a specific application scenario, for this embodiment, if it is determined that the first detection data information and the second detection data information do not contain any common connected component, the current candidate frame picture and the corresponding next single frame picture belong to two completely different shot scenes. That is, a shot scene switch has occurred between the candidate frame and the next frame, so the current candidate frame picture is retained as a shot switching frame picture. Conversely, if it is determined that the first detection data information and the second detection data information contain at least one common connected component, the current candidate frame picture can be determined to be a non-shot switching frame picture, and the candidate frame is filtered out.
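The check in step 211 can be sketched as below. The application does not state how two detections are recognized as the same connected component, so this sketch assumes, purely for illustration, that they are matched by category label; the field names and sample values are likewise hypothetical.

```python
def shared_categories(first_detections, second_detections):
    """Return the categories present in both sets of detection data.

    Assumption: a connected component counts as 'the same' in both frames
    when its category label matches.
    """
    first = {d["category"] for d in first_detections}
    second = {d["category"] for d in second_detections}
    return first & second


def no_common_component(first_detections, second_detections):
    """True when the two frames share no connected component (step 211)."""
    return len(shared_categories(first_detections, second_detections)) == 0


# Hypothetical detection data for a candidate frame and its next frame.
candidate = [{"category": "person", "x": 40, "y": 60, "w": 30, "h": 80}]
next_frame = [{"category": "car", "x": 10, "y": 90, "w": 120, "h": 60}]

print(no_common_component(candidate, next_frame))
```

When the function returns true, the candidate frame picture would be retained as a shot switching frame picture under step 211.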
212. If it is determined that the first detection data information and the second detection data information contain the same connected component, calculate the difference value of the same connected component.
In a specific application scenario, for this embodiment, step 212 may specifically include: calculating a first difference value based on the position coordinate information of the same connected component in the first detection data information and the second detection data information; and calculating a second difference value based on the height and width information of the same connected component in the first detection data information and the second detection data information.
For example, suppose the same connected component is detected in both the current candidate frame picture and the corresponding next single frame picture, and its two instances are denoted s1 and s2. The size and position data of s1 obtained from the first detection data information are {x1, y1, w1, h1}, and the size and position data of s2 obtained from the second detection data information are {x2, y2, w2, h2}. Here, x1 and y1 are the position coordinates of s1 in the current candidate frame picture, x2 and y2 are the position coordinates of s2 in the next single frame picture, w1 and h1 are the width and height of s1, and w2 and h2 are the width and height of s2. The first difference value is then d1 = (x1-x2)^2 + (y1-y2)^2, and the second difference value is d2 = (w1-w2)^2 + (h1-h2)^2.
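The two formulas can be written directly as a short function. The tuple layout (x, y, w, h) follows the example above; the function name and the sample numbers are assumptions introduced for illustration.

```python
def difference_values(s1, s2):
    """Compute the first (position) and second (size) difference values.

    Each argument is a tuple (x, y, w, h) for the same connected component
    in the candidate frame picture and in the next single frame picture.
    """
    x1, y1, w1, h1 = s1
    x2, y2, w2, h2 = s2
    d1 = (x1 - x2) ** 2 + (y1 - y2) ** 2  # squared position offset
    d2 = (w1 - w2) ** 2 + (h1 - h2) ** 2  # squared size change
    return d1, d2


# Hypothetical component: moved by (3, 4) and grew 5 pixels in height.
d1, d2 = difference_values((10, 20, 30, 40), (13, 24, 30, 45))
print(d1, d2)  # 25 25
```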
213. When the difference value meets a preset condition, determine that the candidate frame picture is a shot switching frame picture.
Correspondingly, for this embodiment, step 213 may specifically include: if the first difference value and/or the second difference value is greater than a third preset threshold, determining that the candidate frame picture is a shot switching frame picture.
Here, the preset condition is that at least one of the first difference value and the second difference value is greater than the third preset threshold. The third preset threshold is the minimum difference value for determining that a candidate frame picture is a shot switching frame picture, and its specific value can be set according to the actual situation.
For example, following the example in step 212, the first difference value is d1, the second difference value is d2, and the third preset threshold is set to N2. If d1 > N2, or d2 > N2, or both d1 and d2 are greater than N2, the candidate frame picture can be determined to be a shot switching frame picture.
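The preset condition reduces to a single logical "or" over the two difference values. The sketch below uses the strict "greater than" comparison stated in step 213; the function name and the sample values are hypothetical.

```python
def is_shot_switch(d1, d2, n2):
    """Return True when at least one difference value exceeds threshold N2."""
    return d1 > n2 or d2 > n2


# Hypothetical values: only the position difference exceeds the threshold.
print(is_shot_switch(d1=25, d2=4, n2=10))   # True
print(is_shot_switch(d1=4, d2=4, n2=10))    # False
```

Because the comparison is strict, a difference value exactly equal to N2 does not by itself mark the candidate frame picture as a shot switching frame picture.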
214. Cut the video to be cut into multiple video clips according to the shot switching frame pictures.
In a specific application scenario, for this embodiment, step 214 may specifically include: determining the shot switching frame corresponding to each shot switching frame picture; and cutting the video to be cut at the shot switching frames.
For example, suppose the sequence of all single frame pictures extracted from the video to be cut is [t0, …, tn], and the shot switching frames corresponding to the extracted shot switching frame pictures are tx1, tx2, …, txm, where t0 < tx1 < tx2 < … < txm < tn. The video to be cut can then be cut into the video clips [t0, tx1], [tx1+1, tx2], …, [txm+1, tn], each of which is a single shot.
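The segmentation in this example can be sketched with frame indices standing in for t0 … tn. The function name and the use of integer indices are assumptions for illustration; the switching frames are assumed to lie strictly between the first and last frame, as in the example.

```python
def cut_segments(last_index, switch_frames):
    """Split frames 0..last_index into inclusive (start, end) clips.

    Each shot switching frame closes the current clip, and the next clip
    starts at the following frame.
    """
    segments = []
    start = 0
    for tx in sorted(switch_frames):
        segments.append((start, tx))
        start = tx + 1
    segments.append((start, last_index))
    return segments


# Hypothetical video with frames 0..9 and shot switching frames 3 and 6.
print(cut_segments(9, [3, 6]))  # [(0, 3), (4, 6), (7, 9)]
```

With m shot switching frames the function returns m + 1 clips, matching the list [t0, tx1], [tx1+1, tx2], …, [txm+1, tn].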
With the above video shot cutting method, each single frame picture is extracted from the video to be cut. After the single frame pictures are preprocessed, the variance change value between each single frame picture and the corresponding next single frame picture is calculated, and a single frame picture is determined to be a candidate frame picture when its variance change value is greater than the first preset threshold. After all candidate frame pictures have been extracted, the degree of difference between the connected components of each candidate frame picture and those of the corresponding next single frame picture is compared based on the yolo target detection algorithm; when the difference is large, the candidate frame picture is determined to be a shot switching frame picture. Finally, the video to be cut is cut at the shot switching frames corresponding to the shot switching frame pictures. In this embodiment, this two-stage detection of shot switching frames allows all shot switching frames contained in the video to be cut to be determined accurately and efficiently, so that each single-shot scene is cut accurately, which improves cutting efficiency and reduces the labor cost of video cutting.
Further, as a specific implementation of the methods shown in FIG. 1 and FIG. 2, an embodiment of the present application provides a video shot cutting apparatus. As shown in FIG. 3, the apparatus includes: an extraction module 31, a screening module 32, a determining module 33, and a cutting module 34.
The extraction module 31 is configured to extract each single frame picture from the video to be cut;
The screening module 32 is configured to screen out candidate frame pictures from the single frame pictures based on the variance change value;
The determining module 33 is configured to determine, by using a target detection algorithm, all shot switching frame pictures contained in the candidate frame pictures;
The cutting module 34 is configured to cut the video to be cut into multiple video clips according to the shot switching frame pictures.
In a specific application scenario, in order to eliminate interference and improve the detection accuracy for single frame pictures, as shown in FIG. 4, the apparatus further includes: a scaling module 35 and a processing module 36.
The scaling module 35 is configured to scale each single frame picture to a preset size;
The processing module 36 is configured to perform grayscale processing on the scaled single frame pictures.
Correspondingly, in order to screen out candidate frame pictures from the single frame pictures based on the variance change value, the screening module 32 is specifically configured to: calculate the variance value of all pixels in each single frame picture; calculate the variance change value between each single frame picture and the corresponding next single frame picture; if the variance change value is determined to be less than the first preset threshold, determine that the single frame picture is a non-shot switching frame picture; and if the variance change value is determined to be greater than or equal to the first preset threshold, determine that the single frame picture is a candidate frame picture.
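The screening performed by module 32 can be sketched as follows. This passage does not define how the variance change value is computed, so the sketch assumes it is the absolute difference between the pixel variances of a frame and the next frame. Frames are represented as nested lists of grayscale values, and all names and sample values are hypothetical.

```python
from statistics import pvariance


def pixel_variance(frame):
    """Population variance over all pixels of one grayscale frame."""
    return pvariance([pixel for row in frame for pixel in row])


def candidate_frame_indices(frames, first_threshold):
    """Indices of frames whose variance change to the next frame reaches
    the first preset threshold (assumed: absolute variance difference)."""
    variances = [pixel_variance(frame) for frame in frames]
    return [
        i
        for i in range(len(frames) - 1)
        if abs(variances[i + 1] - variances[i]) >= first_threshold
    ]


# Hypothetical 2x2 frames: two flat frames followed by two high-contrast ones.
flat = [[10, 10], [10, 10]]
contrast = [[0, 255], [0, 255]]
print(candidate_frame_indices([flat, flat, contrast, contrast], 100))  # [1]
```

Only the frame just before the change in pixel statistics is reported, since it is the only one whose variance differs from that of its successor by at least the assumed threshold.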
In a specific application scenario, in order to determine, by using the target detection algorithm, all shot switching frame pictures contained in the candidate frame pictures, the determining module 33 is specifically configured to: train, based on the target detection algorithm, a target detection model whose training result meets a preset standard; input the candidate frame picture into the target detection model and obtain the first detection data information corresponding to the candidate frame picture; input the next single frame picture corresponding to the candidate frame picture into the target detection model and obtain the second detection data information corresponding to the next single frame picture; if it is determined that the first detection data information and the second detection data information do not contain the same connected component, determine that the candidate frame picture is a shot switching frame picture; if it is determined that the first detection data information and the second detection data information contain the same connected component, calculate the difference value of the same connected component; and when the difference value meets the preset condition, determine that the candidate frame picture is a shot switching frame picture.
Correspondingly, in order to train, based on the target detection algorithm, a target detection model whose training result meets the preset standard, the determining module 33 is specifically configured to: collect multiple single frame pictures as sample images; label the position coordinates and category information of each connected component in the sample images; use the sample images with labeled coordinate positions as a training set and input them into an initial target detection model created in advance based on the yolo target detection algorithm; extract, by using the initial target detection model, the image features of the various connected components in the sample images, and generate, based on the image features, a suggestion window for each connected component and the conditional class probabilities of the suggestion window for the various connected component categories; determine the connected component category with the highest conditional class probability as the category recognition result for the connected component in the suggestion window; if it is determined that the confidence of every suggestion window is greater than the second preset threshold and the category recognition results match the labeled category information, determine that the initial target detection model has passed training; and if it is determined that the initial target detection model has not passed training, correct and retrain the initial target detection model by using the labeled position coordinates and category information of each connected component in the sample images, so that the determination result of the initial target detection model meets the preset standard.
In a specific application scenario, when it is determined that the first detection data information and the second detection data information contain the same connected component, the determining module 33 is specifically configured to: calculate the first difference value based on the position coordinate information of the same connected component in the first detection data information and the second detection data information; and calculate the second difference value based on the height and width information of the same connected component in the first detection data information and the second detection data information.
Correspondingly, when the difference value meets the preset condition, the determining module 33 is specifically configured to determine that the candidate frame picture is a shot switching frame picture if the first difference value and/or the second difference value is greater than the third preset threshold.
In a specific application scenario, in order to cut the video to be cut into multiple video clips, the cutting module 34 is specifically configured to: determine the shot switching frame corresponding to each shot switching frame picture; and cut the video to be cut at the shot switching frames.
It should be noted that, for other descriptions of the functional units of the video shot cutting apparatus provided in this embodiment, reference may be made to the corresponding descriptions of FIG. 1 and FIG. 2, which are not repeated here.
Based on the methods shown in FIG. 1 and FIG. 2, an embodiment of the present application correspondingly further provides a non-volatile readable storage medium on which computer-readable instructions are stored; when executed by a processor, the computer-readable instructions implement the video shot cutting method shown in FIG. 1 and FIG. 2.
Based on this understanding, the technical solution of the present application can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a portable hard drive, or the like) and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods of the implementation scenarios of the present application.
Based on the methods shown in FIG. 1 and FIG. 2 and the virtual apparatus embodiments shown in FIG. 3 and FIG. 4, and in order to achieve the above objective, an embodiment of the present application further provides a computer device, which may specifically be a personal computer, a server, a network device, or the like. The physical device includes a non-volatile readable storage medium and a processor; the non-volatile readable storage medium is used to store computer-readable instructions; and the processor is used to execute the computer-readable instructions to implement the video shot cutting method shown in FIG. 1 and FIG. 2.
Optionally, the computer device may further include a user interface, a network interface, a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a WI-FI module, and so on. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (such as a Bluetooth interface or a WI-FI interface), and the like.
Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.
The non-volatile readable storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the physical device for video shot cutting, and supports the running of information processing programs and other software and/or programs. The network communication module is used to implement communication among the components inside the non-volatile readable storage medium, as well as communication with other hardware and software in the physical device.
From the above description of the implementations, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Compared with the existing technology, the technical solution of the present application extracts each single frame picture from the video to be cut. After the single frame pictures are preprocessed, the variance change value between each single frame picture and the corresponding next single frame picture is calculated, and a single frame picture is determined to be a candidate frame picture when its variance change value is greater than the first preset threshold. After all candidate frame pictures have been extracted, the degree of difference between the connected components of each candidate frame picture and those of the corresponding next single frame picture is compared based on the yolo target detection algorithm; when the difference is large, the candidate frame picture is determined to be a shot switching frame picture. Finally, the video to be cut is cut at the shot switching frames corresponding to the shot switching frame pictures. In this embodiment, this two-stage detection of shot switching frames allows all shot switching frames contained in the video to be cut to be determined accurately and efficiently, so that each single-shot scene is cut accurately, which improves cutting efficiency and reduces the labor cost of video cutting.
Those skilled in the art will understand that the accompanying drawings are only schematic diagrams of a preferred implementation scenario, and that the modules or processes in the drawings are not necessarily required for implementing the present application. Those skilled in the art will also understand that the modules of the apparatus in an implementation scenario may be distributed in the apparatus as described for that scenario, or may be changed accordingly and located in one or more apparatuses different from that of the present implementation scenario. The modules of the above implementation scenarios may be combined into one module or further split into multiple sub-modules.
The serial numbers used in the present application are for description only and do not indicate the relative merits of the implementation scenarios. The above disclosure covers only a few specific implementation scenarios of the present application; however, the present application is not limited to them, and any variation that those skilled in the art can conceive of shall fall within the protection scope of the present application.

Claims (20)

  1. A video shot cutting method, characterized by comprising:
    extracting each single frame picture from a video to be cut;
    screening out candidate frame pictures from the single frame pictures based on a variance change value;
    determining, by using a target detection algorithm, all shot switching frame pictures contained in the candidate frame pictures; and
    cutting the video to be cut into multiple video clips according to the shot switching frame pictures.
  2. The method according to claim 1, characterized in that, before the screening out of candidate frame pictures from the single frame pictures based on the variance change value, the method further comprises:
    scaling each single frame picture to a preset size; and
    performing grayscale processing on the scaled single frame pictures.
  3. The method according to claim 2, characterized in that the screening out of candidate frame pictures from the single frame pictures based on the variance change value specifically comprises:
    calculating the variance value of all pixels in each single frame picture;
    calculating the variance change value between each single frame picture and the corresponding next single frame picture;
    if the variance change value is determined to be less than a first preset threshold, determining that the single frame picture is a non-shot switching frame picture; and
    if the variance change value is determined to be greater than or equal to the first preset threshold, determining that the single frame picture is a candidate frame picture.
  4. The method according to claim 3, characterized in that the determining, by using a target detection algorithm, of all shot switching frame pictures contained in the candidate frame pictures specifically comprises:
    training, based on the target detection algorithm, a target detection model whose training result meets a preset standard;
    inputting the candidate frame picture into the target detection model, and obtaining first detection data information corresponding to the candidate frame picture;
    inputting the next single frame picture corresponding to the candidate frame picture into the target detection model, and obtaining second detection data information corresponding to the next single frame picture;
    if it is determined that the first detection data information and the second detection data information do not contain the same connected component, determining that the candidate frame picture is a shot switching frame picture;
    if it is determined that the first detection data information and the second detection data information contain the same connected component, calculating a difference value of the same connected component; and
    when the difference value meets a preset condition, determining that the candidate frame picture is the shot switching frame picture.
  5. The method according to claim 4, characterized in that the training, based on the target detection algorithm, of a target detection model whose training result meets a preset standard specifically comprises:
    collecting multiple single frame pictures as sample images;
    labeling the position coordinates and category information of each connected component in the sample images;
    using the sample images with labeled coordinate positions as a training set, and inputting them into an initial target detection model created in advance based on the yolo target detection algorithm;
    extracting, by using the initial target detection model, the image features of the various connected components in the sample images, and generating, based on the image features, a suggestion window for each connected component and the conditional class probabilities of the suggestion window for the various connected component categories;
    determining the connected component category with the highest conditional class probability as the category recognition result for the connected component in the suggestion window;
    if it is determined that the confidence of every suggestion window is greater than a second preset threshold and the category recognition results match the labeled category information, determining that the initial target detection model has passed training; and
    if it is determined that the initial target detection model has not passed training, correcting and retraining the initial target detection model by using the labeled position coordinates and category information of each connected component in the sample images, so that the determination result of the initial target detection model meets the preset standard.
  6. The method according to claim 5, characterized in that the calculating of the difference value of the same connected component if it is determined that the first detection data information and the second detection data information contain the same connected component specifically comprises:
    calculating a first difference value based on the position coordinate information of the same connected component in the first detection data information and the second detection data information;
    calculating a second difference value based on the height and width information of the same connected component in the first detection data information and the second detection data information;
    and the determining that the candidate frame picture is the shot switching frame picture when the difference value meets a preset condition specifically comprises:
    if the first difference value and/or the second difference value is greater than a third preset threshold, determining that the candidate frame picture is a shot switching frame picture.
  7. The method according to claim 6, characterized in that the cutting of the video to be cut into multiple video clips according to the shot switching frame pictures specifically comprises:
    determining the shot switching frame corresponding to each shot switching frame picture; and
    cutting the video to be cut at the shot switching frames.
  8. A video shot cutting apparatus, characterized by comprising:
    an extraction module, configured to extract each single frame picture from a video to be cut;
    a screening module, configured to screen out candidate frame pictures from the single frame pictures based on a variance change value;
    a determining module, configured to determine, by using a target detection algorithm, all shot switching frame pictures contained in the candidate frame pictures; and
    a cutting module, configured to cut the video to be cut into multiple video clips according to the shot switching frame pictures.
  9. The apparatus according to claim 8, characterized in that the apparatus further comprises a scaling module and a processing module;
    the scaling module is configured to scale each single frame picture to a preset size; and
    the processing module is configured to perform grayscale processing on the scaled single frame pictures.
  10. The apparatus according to claim 9, characterized in that the screening module is specifically configured to: calculate the variance value of all pixels in each single frame picture; calculate the variance change value between each single frame picture and the corresponding next single frame picture; if the variance change value is determined to be less than a first preset threshold, determine that the single frame picture is a non-shot switching frame picture; and if the variance change value is determined to be greater than or equal to the first preset threshold, determine that the single frame picture is a candidate frame picture.
  11. The apparatus according to claim 10, characterized in that the determining module is specifically configured to: train, based on the target detection algorithm, a target detection model whose training result meets a preset standard; input the candidate frame picture into the target detection model and obtain first detection data information corresponding to the candidate frame picture; input the next single frame picture corresponding to the candidate frame picture into the target detection model and obtain second detection data information corresponding to the next single frame picture; if it is determined that the first detection data information and the second detection data information do not contain the same connected component, determine that the candidate frame picture is a shot switching frame picture; if it is determined that the first detection data information and the second detection data information contain the same connected component, calculate a difference value of the same connected component; and when the difference value meets a preset condition, determine that the candidate frame picture is the shot switching frame picture.
  12. The apparatus according to claim 11, wherein the determining module is specifically configured to: collect a plurality of single-frame pictures as sample images; label the position coordinates and category information of each connected component in the sample images; input the sample images with labeled coordinate positions, as a training set, into an initial target detection model created in advance based on the YOLO target detection algorithm; use the initial target detection model to extract image features of each category of connected component in the sample images, and generate, based on the image features, a proposal window for each connected component and the conditional class probabilities of the proposal window for each category of connected component; determine the connected-component category with the highest conditional class probability as the category recognition result for the connected component in the proposal window; if it is determined that the confidence of every proposal window is greater than a second preset threshold and the category recognition result matches the labeled category information, determine that the initial target detection model has passed training; and if it is determined that the initial target detection model has not passed training, use the position coordinates and category information of each connected component labeled in the sample images to correct and retrain the initial target detection model, so that the determination result of the initial target detection model meets the preset standard.
  13. The apparatus according to claim 12, wherein the determining module is specifically configured to: calculate a first difference value based on the position coordinate information of the same connected component in the first detection data information and the second detection data information; and calculate a second difference value based on the height and width information of the same connected component in the first detection data information and the second detection data information; wherein determining, when the difference value meets the preset condition, that the candidate frame picture is the shot-switching frame picture specifically comprises: if the first difference value and/or the second difference value is greater than a third preset threshold, determining that the candidate frame picture is a shot-switching frame picture.
  14. The apparatus according to claim 13, wherein the cutting module is specifically configured to: determine the shot-switching frame corresponding to each shot-switching frame picture; and cut the video to be cut at the shot-switching frames.
  15. A non-volatile readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement a video shot cutting method comprising: extracting each single-frame picture from a video to be cut; screening candidate frame pictures out of the single-frame pictures based on a variance change value; determining, by using a target detection algorithm, all shot-switching frame pictures contained in the candidate frame pictures; and cutting the video to be cut into a plurality of video segments according to the shot-switching frame pictures.
  16. The non-volatile readable storage medium according to claim 15, wherein the computer-readable instructions, when executed by a processor, implement the screening of candidate frame pictures out of the single-frame pictures based on a variance change value, comprising: calculating the variance value of all pixels in each single-frame picture; calculating the variance change value between each single-frame picture and its corresponding next single-frame picture; if the variance change value is determined to be less than a first preset threshold, determining that the single-frame picture is a non-shot-switching frame picture; and if the variance change value is determined to be greater than or equal to the first preset threshold, determining that the single-frame picture is a candidate frame picture.
  17. The non-volatile readable storage medium according to claim 16, wherein the computer-readable instructions, when executed by a processor, implement the determining, by using a target detection algorithm, of all shot-switching frame pictures contained in the candidate frame pictures, comprising: training, based on a target detection algorithm, a target detection model whose training result meets a preset standard; inputting the candidate frame picture into the target detection model to obtain first detection data information corresponding to the candidate frame picture; inputting the next single-frame picture corresponding to the candidate frame picture into the target detection model to obtain second detection data information corresponding to the next single-frame picture; if it is determined that the first detection data information and the second detection data information do not contain a same connected component, determining that the candidate frame picture is a shot-switching frame picture; if it is determined that the first detection data information and the second detection data information contain a same connected component, calculating a difference value of the same connected component; and when the difference value meets a preset condition, determining that the candidate frame picture is the shot-switching frame picture.
  18. A computer device, comprising a non-volatile readable storage medium, a processor, and computer-readable instructions stored on the non-volatile readable storage medium and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements a video shot cutting method comprising: extracting each single-frame picture from a video to be cut; screening candidate frame pictures out of the single-frame pictures based on a variance change value; determining, by using a target detection algorithm, all shot-switching frame pictures contained in the candidate frame pictures; and cutting the video to be cut into a plurality of video segments according to the shot-switching frame pictures.
  19. The non-volatile readable storage medium according to claim 18, wherein the computer-readable instructions, when executed by a processor, implement the screening of candidate frame pictures out of the single-frame pictures based on a variance change value, comprising: calculating the variance value of all pixels in each single-frame picture; calculating the variance change value between each single-frame picture and its corresponding next single-frame picture; if the variance change value is determined to be less than a first preset threshold, determining that the single-frame picture is a non-shot-switching frame picture; and if the variance change value is determined to be greater than or equal to the first preset threshold, determining that the single-frame picture is a candidate frame picture.
  20. The non-volatile readable storage medium according to claim 19, wherein the computer-readable instructions, when executed by a processor, implement the determining, by using a target detection algorithm, of all shot-switching frame pictures contained in the candidate frame pictures, comprising: training, based on a target detection algorithm, a target detection model whose training result meets a preset standard; inputting the candidate frame picture into the target detection model to obtain first detection data information corresponding to the candidate frame picture; inputting the next single-frame picture corresponding to the candidate frame picture into the target detection model to obtain second detection data information corresponding to the next single-frame picture; if it is determined that the first detection data information and the second detection data information do not contain a same connected component, determining that the candidate frame picture is a shot-switching frame picture; if it is determined that the first detection data information and the second detection data information contain a same connected component, calculating a difference value of the same connected component; and when the difference value meets a preset condition, determining that the candidate frame picture is the shot-switching frame picture.
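
The screening, confirmation, and cutting steps recited in the claims above can be illustrated with a minimal Python sketch. It is not the applicant's implementation, and several details are assumptions the claims leave open: frames are taken to be 2-D lists of grayscale values; the variance change value is taken as the absolute difference between the variances of adjacent frames; detection data is modelled as a hypothetical dictionary mapping a component label to a box `(x, y, w, h)`; a "same connected component" is matched by label; both difference values are sums of absolute differences; and the cut is placed after the shot-switching frame. The function names and thresholds are illustrative only, and the target detection model itself is outside the sketch.

```python
from statistics import pvariance


def pixel_variance(frame):
    """Variance over all pixel values of one frame (2-D list of grayscale values)."""
    return pvariance([p for row in frame for p in row])


def candidate_indices(frames, first_threshold):
    """Indices i whose variance change to frame i + 1 is >= first_threshold.

    Assumption: the variance change value is the absolute difference of variances.
    """
    variances = [pixel_variance(f) for f in frames]
    return [
        i for i in range(len(frames) - 1)
        if abs(variances[i + 1] - variances[i]) >= first_threshold
    ]


def is_shot_switch(det_a, det_b, third_threshold):
    """Compare detections of a candidate frame (det_a) and its next frame (det_b).

    Assumption: each argument maps a component label to a box (x, y, w, h),
    and components with the same label count as the same connected component.
    """
    shared = set(det_a) & set(det_b)
    if not shared:
        return True  # no common component: treated as a shot switch
    for label in shared:
        xa, ya, wa, ha = det_a[label]
        xb, yb, wb, hb = det_b[label]
        position_diff = abs(xa - xb) + abs(ya - yb)  # first difference value (assumed form)
        size_diff = abs(wa - wb) + abs(ha - hb)      # second difference value (assumed form)
        if position_diff > third_threshold or size_diff > third_threshold:
            return True
    return False


def cut_segments(num_frames, switch_indices):
    """Split frame indices into (start, end) segments, cutting after each switch frame."""
    segments, start = [], 0
    for i in sorted(switch_indices):
        segments.append((start, i))
        start = i + 1
    if start < num_frames:
        segments.append((start, num_frames - 1))
    return segments
```

In use, `candidate_indices` would narrow the frames sent to the detector, `is_shot_switch` would be called on the detector output for each candidate and its following frame, and the confirmed indices would be passed to `cut_segments`.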
PCT/CN2019/103528 2019-07-11 2019-08-30 Video shot cutting method and apparatus, and computer device WO2021003825A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910624918.6A CN110430443B (en) 2019-07-11 2019-07-11 Method and device for cutting video shot, computer equipment and storage medium
CN201910624918.6 2019-07-11

Publications (1)

Publication Number Publication Date
WO2021003825A1 (en) 2021-01-14

Family

ID=68410483

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103528 WO2021003825A1 (en) 2019-07-11 2019-08-30 Video shot cutting method and apparatus, and computer device

Country Status (2)

Country Link
CN (1) CN110430443B (en)
WO (1) WO2021003825A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113825012A (en) * 2021-06-04 2021-12-21 腾讯科技(深圳)有限公司 Video data processing method and computer device
CN113840159A (en) * 2021-09-26 2021-12-24 北京沃东天骏信息技术有限公司 Video processing method, device, computer system and readable storage medium
CN114120250A (en) * 2021-11-30 2022-03-01 北京文安智能技术股份有限公司 Video-based method for detecting illegal people carried by motor vehicle
CN114140461A (en) * 2021-12-09 2022-03-04 成都智元汇信息技术股份有限公司 Picture cutting method based on edge picture recognition box, electronic equipment and medium
CN114189754A (en) * 2021-12-08 2022-03-15 湖南快乐阳光互动娱乐传媒有限公司 Video plot segmentation method and system
CN114363695A (en) * 2021-11-11 2022-04-15 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer equipment and storage medium
CN115022711A (en) * 2022-04-28 2022-09-06 之江实验室 System and method for ordering lens videos in movie scene
CN115119050A (en) * 2022-06-30 2022-09-27 北京奇艺世纪科技有限公司 Video clipping method and device, electronic equipment and storage medium
CN115174957A (en) * 2022-06-27 2022-10-11 咪咕文化科技有限公司 Bullet screen calling method and device, computer equipment and readable storage medium
CN115457447A (en) * 2022-11-07 2022-12-09 浙江莲荷科技有限公司 Moving object identification method, device and system, electronic equipment and storage medium
CN115861914A (en) * 2022-10-24 2023-03-28 广东魅视科技股份有限公司 Method for assisting user in searching specific target

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444819B (en) * 2020-03-24 2024-01-23 北京百度网讯科技有限公司 Cut frame determining method, network training method, device, equipment and storage medium
CN111491183B (en) * 2020-04-23 2022-07-12 百度在线网络技术(北京)有限公司 Video processing method, device, equipment and storage medium
CN112584073B (en) * 2020-12-24 2022-08-02 杭州叙简科技股份有限公司 5G-based law enforcement recorder distributed assistance calculation method
CN114286171B (en) * 2021-08-19 2023-04-07 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN114155473B (en) * 2021-12-09 2022-11-08 成都智元汇信息技术股份有限公司 Picture cutting method based on frame compensation, electronic equipment and medium
CN114446331B (en) * 2022-04-07 2022-06-24 深圳爱卓软科技有限公司 Video editing software system capable of rapidly cutting video

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090141178A1 (en) * 2007-11-30 2009-06-04 Kerofsky Louis J Methods and Systems for Backlight Modulation with Scene-Cut Detection
CN102497556A (en) * 2011-12-26 2012-06-13 深圳市融创天下科技股份有限公司 Time-variation-degree-based scene switching detection method, device and equipment
CN105612535A (en) * 2013-08-29 2016-05-25 匹斯奥特(以色列)有限公司 Efficient content-based video retrieval
CN106162222A (en) * 2015-04-22 2016-11-23 无锡天脉聚源传媒科技有限公司 A kind of method and device of video lens cutting
CN106937114A (en) * 2015-12-30 2017-07-07 株式会社日立制作所 Method and apparatus for being detected to video scene switching
CN109510919A (en) * 2011-10-11 2019-03-22 瑞典爱立信有限公司 Scene change detection for the perceived quality assessment in video sequence
CN109740499A (en) * 2018-12-28 2019-05-10 北京旷视科技有限公司 Methods of video segmentation, video actions recognition methods, device, equipment and medium

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100559880C (en) * 2007-08-10 2009-11-11 中国传媒大学 A kind of highly-clear video image quality evaluation method and device based on self-adapted ST area
US8744122B2 (en) * 2008-10-22 2014-06-03 Sri International System and method for object detection from a moving platform
US8509526B2 (en) * 2010-04-13 2013-08-13 International Business Machines Corporation Detection of objects in digital images
CN103227963A (en) * 2013-03-20 2013-07-31 西交利物浦大学 Static surveillance video abstraction method based on video moving target detection and tracing
CN103426176B (en) * 2013-08-27 2017-03-01 重庆邮电大学 Based on the shot detection method improving rectangular histogram and clustering algorithm
CN103945281B (en) * 2014-04-29 2018-04-17 中国联合网络通信集团有限公司 Transmission of video processing method, device and system
CN104394422B (en) * 2014-11-12 2017-11-17 华为软件技术有限公司 A kind of Video segmentation point acquisition methods and device
CN104410867A (en) * 2014-11-17 2015-03-11 北京京东尚科信息技术有限公司 Improved video shot detection method
CN104715023B (en) * 2015-03-02 2018-08-03 北京奇艺世纪科技有限公司 Method of Commodity Recommendation based on video content and system
CN105025360B (en) * 2015-07-17 2018-07-17 江西洪都航空工业集团有限责任公司 A kind of method of improved fast video concentration
CN106331524B (en) * 2016-08-18 2019-07-26 无锡天脉聚源传媒科技有限公司 A kind of method and device identifying Shot change
US11004209B2 (en) * 2017-10-26 2021-05-11 Qualcomm Incorporated Methods and systems for applying complex object detection in a video analytics system
CN108205657A (en) * 2017-11-24 2018-06-26 中国电子科技集团公司电子科学研究院 Method, storage medium and the mobile terminal of video lens segmentation
CN108182421B (en) * 2018-01-24 2020-07-14 北京影谱科技股份有限公司 Video segmentation method and device
CN108769731B (en) * 2018-05-25 2021-09-24 北京奇艺世纪科技有限公司 Method and device for detecting target video clip in video and electronic equipment
CN108470077B (en) * 2018-05-28 2023-07-28 广东工业大学 Video key frame extraction method, system and device and storage medium
CN109819338B (en) * 2019-02-22 2021-09-14 影石创新科技股份有限公司 Automatic video editing method and device and portable terminal
CN109934131A (en) * 2019-02-28 2019-06-25 南京航空航天大学 A kind of small target detecting method based on unmanned plane

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113825012A (en) * 2021-06-04 2021-12-21 腾讯科技(深圳)有限公司 Video data processing method and computer device
CN113840159A (en) * 2021-09-26 2021-12-24 北京沃东天骏信息技术有限公司 Video processing method, device, computer system and readable storage medium
CN114363695B (en) * 2021-11-11 2023-06-13 腾讯科技(深圳)有限公司 Video processing method, device, computer equipment and storage medium
CN114363695A (en) * 2021-11-11 2022-04-15 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer equipment and storage medium
CN114120250A (en) * 2021-11-30 2022-03-01 北京文安智能技术股份有限公司 Video-based method for detecting illegal people carried by motor vehicle
CN114120250B (en) * 2021-11-30 2024-04-05 北京文安智能技术股份有限公司 Video-based motor vehicle illegal manned detection method
CN114189754A (en) * 2021-12-08 2022-03-15 湖南快乐阳光互动娱乐传媒有限公司 Video plot segmentation method and system
CN114140461B (en) * 2021-12-09 2023-02-14 成都智元汇信息技术股份有限公司 Picture cutting method based on edge picture recognition box, electronic equipment and medium
CN114140461A (en) * 2021-12-09 2022-03-04 成都智元汇信息技术股份有限公司 Picture cutting method based on edge picture recognition box, electronic equipment and medium
CN115022711A (en) * 2022-04-28 2022-09-06 之江实验室 System and method for ordering lens videos in movie scene
CN115174957A (en) * 2022-06-27 2022-10-11 咪咕文化科技有限公司 Bullet screen calling method and device, computer equipment and readable storage medium
CN115174957B (en) * 2022-06-27 2023-08-15 咪咕文化科技有限公司 Barrage calling method and device, computer equipment and readable storage medium
CN115119050A (en) * 2022-06-30 2022-09-27 北京奇艺世纪科技有限公司 Video clipping method and device, electronic equipment and storage medium
CN115119050B (en) * 2022-06-30 2023-12-15 北京奇艺世纪科技有限公司 Video editing method and device, electronic equipment and storage medium
CN115861914A (en) * 2022-10-24 2023-03-28 广东魅视科技股份有限公司 Method for assisting user in searching specific target
CN115457447A (en) * 2022-11-07 2022-12-09 浙江莲荷科技有限公司 Moving object identification method, device and system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110430443A (en) 2019-11-08
CN110430443B (en) 2022-01-25

Similar Documents

Publication Publication Date Title
WO2021003825A1 (en) Video shot cutting method and apparatus, and computer device
EP3826317B1 (en) Method and device for identifying key time point of video, computer apparatus and storage medium
US20200167554A1 (en) Gesture Recognition Method, Apparatus, And Device
US9721387B2 (en) Systems and methods for implementing augmented reality
RU2637989C2 (en) Method and device for identifying target object in image
US9756261B2 (en) Method for synthesizing images and electronic device thereof
EP2374089B1 (en) Method, apparatus and computer program product for providing hand segmentation for gesture analysis
US20170154238A1 (en) Method and electronic device for skin color detection
US20150358549A1 (en) Image capturing parameter adjustment in preview mode
US8417026B2 (en) Gesture recognition methods and systems
US20180101949A1 (en) Automated nuclei area/number estimation for ihc image analysis
CN110460838B (en) Lens switching detection method and device and computer equipment
CN103106388B (en) Method and system of image recognition
CN111695540A (en) Video frame identification method, video frame cutting device, electronic equipment and medium
JP2014044461A (en) Image processing device and method, and program
CN112633313B (en) Bad information identification method of network terminal and local area network terminal equipment
JP7111873B2 (en) SIGNAL LAMP IDENTIFICATION METHOD, APPARATUS, DEVICE, STORAGE MEDIUM AND PROGRAM
CN108665769B (en) Network teaching method and device based on convolutional neural network
JP2019517079A (en) Shape detection
CN114613006A (en) Remote gesture recognition method and device
US9727145B2 (en) Detecting device and detecting method
CN113992976B (en) Video playing method, device, equipment and computer storage medium
WO2013104322A1 (en) Object recognizing method and object recognizing device
JP2016045744A (en) Image processor, image processing method, and program
CN112102147B (en) Background blurring identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19936643

Country of ref document: EP

Kind code of ref document: A1

122 EP: PCT application non-entry in European phase

Ref document number: 19936643

Country of ref document: EP

Kind code of ref document: A1