WO2021003825A1 - Method and apparatus for cutting a video into shot sequences, and computer device - Google Patents

Method and apparatus for cutting a video into shot sequences, and computer device

Info

Publication number
WO2021003825A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame picture
target detection
determined
data information
video
Prior art date
Application number
PCT/CN2019/103528
Other languages
English (en)
Chinese (zh)
Inventor
雷晨雨
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021003825A1 publication Critical patent/WO2021003825A1/fr

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83: Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845: Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments

Definitions

  • Shot switching is a very important step in video editing. It is required not only for the narrative composition or artistic expression of TV programs, but also for the viewing experience of the audience. In long videos such as sports games or TV programs, shots usually need to be switched frequently, so such a long video must be cut into multiple video clips, each covering a single shot scene. As people's living standards improve, the quality requirements for entertainment content become more and more stringent. Therefore, improving video cutting technology so that video editing better meets viewers' expectations is particularly important in the current environment.
  • the present application discloses a method, an apparatus and computer equipment for cutting video shots.
  • their main purpose is to solve the problem that cutting video with manual software tools is cumbersome, inefficient and time-consuming.
  • a method for cutting video shots, including: extracting each single frame picture in the video to be cut; screening out candidate frame pictures from the single frame pictures based on a variance change value; determining all shot switching frame pictures included in the candidate frame pictures by using a target detection algorithm; and cutting the video to be cut into multiple video clips according to the shot switching frame pictures.
  • an apparatus for cutting video shots, including:
  • the extraction module is used to extract each single frame picture in the video to be cut;
  • a screening module configured to screen out candidate frame pictures from the single frame pictures based on the variance change value
  • a determining module configured to determine all shot switching frame pictures included in the candidate frame picture by using a target detection algorithm
  • the cutting module is configured to cut the to-be-cut video into multiple video clips according to the shot switching frame picture.
  • a non-volatile readable storage medium having computer-readable instructions stored thereon, where the computer-readable instructions, when executed by a processor, implement the above video shot cutting method.
  • a computer device including a non-volatile readable storage medium, a processor, and computer-readable instructions that are stored on the non-volatile readable storage medium and can run on the processor; when the processor executes the computer-readable instructions, the video shot cutting method is implemented.
  • Compared with the current way of cutting video with manual software tools, the method, apparatus and computer equipment for cutting video shots provided by this application can extract each single frame picture from the video to be cut; initially select candidate frame pictures from the single frame pictures based on the variance change value; then use the target detection algorithm to find adjacent candidate frames with large differences, so as to determine the shot switching frame pictures from the candidate frame pictures; and finally cut the video to be cut automatically into multiple video clips according to the shot switching frame pictures.
  • FIG. 1 shows a schematic flowchart of a method for cutting a video shot provided by an embodiment of the present application
  • FIG. 2 shows a schematic flowchart of another video shot cutting method provided by an embodiment of the present application
  • FIG. 3 shows a schematic structural diagram of a video shot cutting apparatus provided by an embodiment of the present application
  • FIG. 4 shows a schematic structural diagram of another video shot cutting apparatus provided by an embodiment of the present application.
  • an embodiment of the present application provides a method for cutting video shots; as shown in FIG. 1, the method includes:
  • for example, the video to be cut is a long video lasting at least three minutes.
  • the first step of the cutting operation is to extract each single frame picture from the video to be cut, so that all the shot switching frames contained in the video to be cut can be determined by comparing and analyzing each single frame picture.
  • whether two adjacent frames belong to the same shot can be preliminarily determined by calculating the variance change between each single frame picture and its adjacent single frame picture.
  • the variance change value reflects the change of the high-frequency part of the pixels in adjacent single frame pictures: the greater the variance change value, the greater the fluctuation of the pixel values.
  • single frame pictures whose variance change is small are preliminarily determined to be non-shot-switching frame pictures and are removed, so that all the retained single frame pictures are candidate frame pictures, on which finer screening is then performed.
  • the target detection algorithm uses the yolo target detection method; that is, the detection of connected components in the candidate frame picture is treated as a regression problem, and the coordinates of the bounding boxes, the confidence that each bounding box contains an object, and the conditional category probabilities are obtained directly from all the pixels of the entire picture.
  • the position coordinates of each bounding box are (x, y, w, h), x and y represent the coordinates of the center point of the bounding box, and w and h represent the width and height of the bounding box.
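As a concrete illustration of this (x, y, w, h) representation, the following Python sketch converts a center-based box to corner coordinates and shows one detection record; the helper name, numbers and category labels are illustrative, not from the patent:

```python
def box_corners(x, y, w, h):
    """Convert a yolo-style bounding box (x, y, w, h), where x and y are
    the coordinates of the center point, into (x_min, y_min, x_max, y_max)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)

# One detection as described above: box coordinates, a confidence that an
# object is present, and conditional category probabilities (the category
# names here are illustrative assumptions).
detection = {
    "box": (128.0, 128.0, 64.0, 32.0),
    "confidence": 0.9,
    "class_probs": {"person": 0.8, "building": 0.2},
}

corners = box_corners(*detection["box"])  # (96.0, 112.0, 160.0, 144.0)
```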
  • the video to be cut can be automatically cut, and then multiple video clips in a single shot scene can be obtained.
  • each single frame picture can be extracted from the video to be cut; candidate frame pictures are initially selected from the single frame pictures based on the variance change value; then the target detection algorithm is used to find adjacent candidate frames with large differences, so as to determine the shot switching frame pictures from the candidate frame pictures; finally, the video to be cut is automatically cut into multiple video clips according to the shot switching frame pictures.
  • with the technical solution in this application, the shot switching frames can be automatically extracted from the video to be cut according to the variance calculation results and the detection results of the yolo target detection model, and the video to be cut is cut at the shot switching frames. This avoids detection errors that easily occur during manual inspection, and effectively improves the detection accuracy of shot switching frames and the efficiency of shot cutting.
  • the method includes:
  • the speed of shot switching can be measured by the number of different single frame pictures played per second: when the number of different single frame pictures played per second is greater than the screen transition threshold, the video segment played within that second is a fast shot switch; otherwise it is a slow shot switch.
  • the pictures corresponding to each continuous frame in the video to be cut can be extracted as the single frame pictures to be analyzed in this embodiment, and the analysis and cutting operations in steps 202 to 214 of the embodiment are then performed on them.
  • when the sampling frequency is greater than 20 frames, the pictures are sparsely sampled at that sampling frequency, and one sampled picture is acquired in each sampling period as a single frame picture to be analyzed in this embodiment.
  • for example, the sampling frequency of single frame pictures can be set to 32, and the pictures can be sparsely sampled at this frequency to reduce the amount of computation. If a video has 300 frames, the 0th frame, the 32nd frame, the 32*2-th frame, the 32*3-th frame, the 32*4-th frame, and so on, can be extracted according to the sampling frequency as the single frame pictures in this embodiment.
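The sparse sampling above can be sketched as follows (a minimal illustration; `sparse_sample_indices` is a hypothetical helper name):

```python
def sparse_sample_indices(total_frames, period=32):
    """Frame indices retained when one picture is taken per sampling
    period, matching the example: a 300-frame video sampled every 32
    frames yields frames 0, 32, 64, ..., 288."""
    return list(range(0, total_frames, period))

indices = sparse_sample_indices(300, 32)
```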
  • before analysis, the single frame pictures are processed into a uniform format and size; for example, with the preset size set to 256*256, each single frame picture is scaled to a pixel size of 256*256.
  • the single-frame pictures extracted from the video to be cut are mostly color images, they all adopt the RGB color mode.
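The preprocessing (uniform size, then grayscale for the later variance calculation) might look like the following sketch; nearest-neighbour resizing and BT.601 luma weights are illustrative choices not mandated by the patent:

```python
def to_gray(frame, size=256):
    """Scale an RGB frame (a list of rows of (r, g, b) tuples) to
    size x size and convert it to gray values. Nearest-neighbour
    resizing and the 0.299/0.587/0.114 BT.601 luma weights are
    assumptions; the patent only requires a uniform preset size and
    gray values for the variance step."""
    h, w = len(frame), len(frame[0])
    return [[0.299 * frame[i * h // size][j * w // size][0]
             + 0.587 * frame[i * h // size][j * w // size][1]
             + 0.114 * frame[i * h // size][j * w // size][2]
             for j in range(size)]
            for i in range(size)]

# A 2x2 black-and-white checkerboard keeps its gray values when scaled.
gray = to_gray([[(255, 255, 255), (0, 0, 0)],
                [(0, 0, 0), (255, 255, 255)]], size=2)
```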
  • the formula for calculating the variance of each single frame picture is:
  S(t) = (1/n) · Σᵢ₌₁ⁿ (xᵢ − x̄)²
  • S(t) is the variance value of the single frame picture;
  • xᵢ is the gray value of each pixel in the single frame picture;
  • x̄ is the average gray value of all pixels in the single frame picture;
  • n is the total number of pixels in the single frame picture participating in the variance comparison.
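In code, the per-frame variance S(t) is a direct transcription of the formula above (the values used are illustrative):

```python
def frame_variance(gray_values):
    """S(t) = (1/n) * sum((x_i - x_bar)^2) over a frame's gray values."""
    n = len(gray_values)
    x_bar = sum(gray_values) / n
    return sum((x - x_bar) ** 2 for x in gray_values) / n

s_t = frame_variance([10, 10, 20, 20])  # mean 15 -> variance 25.0
```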
  • the variance change between each single frame picture and the adjacent next single frame picture reflects the changes in the high-frequency part of the pixels of the two pictures. Therefore, by calculating the variance change value, the magnitude of the change between the current single frame picture and the next frame picture can be preliminarily determined, so as to distinguish whether the current single frame picture is a non-shot-switching frame picture or a candidate frame picture.
  • the first preset threshold is a minimum variance change value used to determine that the current single frame picture is a candidate frame picture.
  • if the difference between the current single frame picture and the next single frame picture is not obvious, it can be determined that no shot scene transition occurs between the current frame and the next frame in the video to be cut, so no cut is needed; the current single frame picture can be determined to be a non-shot-switching frame picture and filtered out.
  • Suppose the variance value of the current single frame picture is S(t), the variance value of the next single frame picture is S(t+1), and the first preset threshold is set to N1: if |S(t+1) − S(t)| < N1, the current single frame picture is determined to be a non-shot-switching frame picture.
  • the current single frame picture can be saved as a candidate frame picture to be subjected to the next step of comparison and detection.
  • If instead it is determined that |S(t+1) − S(t)| ≥ N1, the current single frame picture is determined to be a candidate frame picture.
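The threshold rule can be sketched as follows (the variance values and N1 are illustrative):

```python
def screen_candidates(variances, n1):
    """Keep frame t as a candidate when the variance change
    |S(t+1) - S(t)| reaches the first preset threshold N1; frames with
    a smaller change are non-shot-switching frames and are dropped."""
    return [t for t in range(len(variances) - 1)
            if abs(variances[t + 1] - variances[t]) >= n1]

# Only frame 1 shows a large variance jump and is kept as a candidate.
candidates = screen_candidates([5.0, 5.2, 40.0, 41.0], n1=10.0)
```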
  • a target detection model whose training result meets a preset standard is obtained by training based on the target detection algorithm.
  • step 208 of the embodiment may specifically include: collecting multiple single-frame pictures as sample images; labeling the position coordinates and category information of each connected component in the sample image;
  • the sample image is used as the training set and input into the initial target detection model created in advance based on the yolo target detection algorithm;
  • the initial target detection model is used to extract the image features of the various connected components in the sample image, and to generate, based on the image features, the suggestion window of each connected component and the conditional category probabilities of the various connected components corresponding to the suggestion window;
  • the connected component category with the largest conditional category probability is determined as the category recognition result for the connected component in the suggestion window; if it is determined that the confidence of all suggestion windows is greater than the second preset threshold and the category recognition results match the labeled category information, it is determined that the initial target detection model has passed the training; if it is determined that the initial target detection model has not passed the training, the position coordinates and category information of each connected component labeled in the sample image are used to correct and retrain the initial target detection model.
  • the confidence is used to determine whether an object exists in the recognition detection frame and the probability that the object exists.
  • the second preset threshold is a criterion used to evaluate whether the initial target detection model has passed the training.
  • Each confidence determined to be non-zero is compared with the second preset threshold; if all of them are greater than the threshold, the initial target detection model is determined to have passed the training, otherwise it fails the training. Since the confidence takes a value between 0 and 1, the maximum value of the second preset threshold is 1; the larger the second preset threshold, the more accurate the trained model. The specific value can be set according to the application's requirements.
  • the category information describes the categories of connected components contained in the video to be cut, such as people of different body shapes and appearances, fixed buildings, equipment, etc. In specific application scenarios, the category information to be recognized can be set according to the actual video recording scene.
  • the initial target detection model is created in advance according to the design needs.
  • the initial target detection model has only just been created, has not passed model training, and does not meet the preset standard, while the target detection model refers to the model that has passed training, reached the preset standard, and can be applied to the detection of connected components in each single frame picture.
  • The conditional category probability is defined per grid cell, that is, the probability that the object in each suggestion window belongs to each category. For example, suppose the model is trained to recognize five categories a, b, c, d and e, and suggestion window A is determined, according to its confidence, to contain an object. The conditional category probabilities of suggestion window A for the five categories are then predicted; if the prediction results are 80%, 55%, 50%, 37% and 15% respectively, the category a with the highest conditional category probability is taken as the recognition result. It is then verified whether the object actually labeled in the detection frame belongs to category a; if it does, the initial target detection model is determined to have recognized the category information in this suggestion window correctly.
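The confidence check and the argmax over conditional category probabilities from the example above can be sketched as follows (the threshold value here is an illustrative assumption):

```python
def recognize_category(cond_probs, confidence, second_threshold):
    """Return the category with the largest conditional probability for a
    suggestion window, or None when the window's confidence does not
    exceed the second preset threshold (no object deemed present)."""
    if confidence <= second_threshold:
        return None
    return max(cond_probs, key=cond_probs.get)

# The five-category example above: category "a" (80%) wins.
probs = {"a": 0.80, "b": 0.55, "c": 0.50, "d": 0.37, "e": 0.15}
result = recognize_category(probs, confidence=0.9, second_threshold=0.5)
```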
  • the confidence of all the recognized suggestion windows is greater than the second preset threshold, and the category recognition result matches the labeled category information, it is determined that the initial target detection model has passed the training.
  • the first detection data information is the category and quantity of all connected components contained in the candidate frame picture, together with data such as the position, height and width of each connected component.
  • the next single frame picture is a single frame picture corresponding to the next frame of the current candidate frame picture in the video to be cut, and the next single frame picture may be a non-shot switching frame picture or a candidate frame picture.
  • the second detection data information is the category and quantity of all connected components contained in the next single frame picture, and data information such as position information, height, and width corresponding to each connected component.
  • if the first detection data information and the second detection data information do not contain any common connected component, the current candidate frame picture and the corresponding next single frame picture belong to two completely different shot scenes; that is, a shot scene switch occurs between the candidate frame and the next frame, so the current candidate frame picture is retained as a shot switching frame picture.
  • if the first detection data information and the second detection data information contain at least one common connected component, the difference of that connected component is further compared; if the difference does not meet the preset condition, the current candidate frame picture is determined to be a non-shot-switching frame picture and is filtered out.
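A rough sketch of the common-component test follows; matching components by category name alone is a simplifying assumption, since the patent goes on to compare position and size before a frame is finally kept or filtered out:

```python
def shares_component(first_categories, second_categories):
    """Whether the two frames' detection data contain at least one common
    connected-component category."""
    return bool(set(first_categories) & set(second_categories))

# Disjoint component sets -> the candidate frame is a shot switching frame.
is_switch = not shares_component({"person", "car"}, {"building"})
```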
  • step 212 of the embodiment may specifically include: calculating a first difference value based on the position coordinate information of the same connected component in the first detection data information and the second detection data information; and calculating a second difference value based on the height and width information of the same connected component in the first detection data information and the second detection data information.
  • For example, suppose the current candidate frame picture and the corresponding next single frame picture contain the same connected component, detected as s1 in the former and s2 in the latter. The size and position data of s1 obtained from the first detection data information is {x1, y1, w1, h1}, and the size and position data of s2 obtained from the second detection data information is {x2, y2, w2, h2}. Here x1 and y1 are the position coordinates of s1 in the current candidate frame picture, x2 and y2 are the position coordinates of s2 in the next single frame picture, w1 and h1 are the width and height of s1, and w2 and h2 are the width and height of s2.
  • step 213 of the embodiment may specifically include: if the first difference value and/or the second difference value is greater than the third preset threshold, determining that the candidate frame picture is a shot switching frame picture.
  • the preset condition is that at least one of the first difference value and the second difference value is greater than the third preset threshold; the third preset threshold is the smallest difference value used to determine that the candidate frame picture is a shot switching frame picture, and its specific value can be set according to the actual situation.
  • Suppose the first difference value is calculated as d1, the second difference value as d2, and the third preset threshold is set to N2. If d1 > N2, or d2 > N2, or both d1 and d2 are greater than N2, the candidate frame picture is determined to be a shot switching frame picture.
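The difference test can be sketched as follows; the patent does not fix the distance measures, so the Euclidean center distance for d1 and the summed width/height change for d2 used here are assumptions:

```python
import math

def is_shot_switch_frame(box1, box2, n2):
    """Compare one matched connected component across the candidate frame
    (box1) and the next frame (box2), each given as (x, y, w, h), and
    decide whether the candidate is a shot switching frame."""
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    d1 = math.hypot(x2 - x1, y2 - y1)   # first difference value (position)
    d2 = abs(w2 - w1) + abs(h2 - h1)    # second difference value (size)
    return d1 > n2 or d2 > n2

# A large jump in position makes the candidate a shot switching frame.
switched = is_shot_switch_frame((10, 10, 20, 20), (100, 80, 22, 20), n2=50.0)
```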
  • step 214 of the embodiment may specifically include: determining a shot switching frame corresponding to each shot switching frame picture; and cutting the video to be cut at the shot switching frame.
  • Suppose the sequence of all single frame pictures extracted from the video to be cut is [t0, ..., tn], and the shot switching frames corresponding to the extracted shot switching frame pictures are determined to be tx1, tx2, ..., txm, with t0 ≤ tx1 ≤ tx2 ≤ ... ≤ txm ≤ tn.
  • the video to be cut can be cut into [t0, tx1], [tx1+1, tx2], ... [txm+1, tn] video segments, where each video segment is a single shot segment.
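The segmenting rule above maps directly to the following sketch (the frame counts and switch positions are illustrative):

```python
def cut_segments(n_frames, switch_frames):
    """Split frames [t0, tn] = [0, n_frames - 1] at the shot switching
    frames tx1 <= ... <= txm into [t0, tx1], [tx1 + 1, tx2], ...,
    [txm + 1, tn], each a single-shot segment."""
    segments, start = [], 0
    for tx in switch_frames:
        segments.append((start, tx))
        start = tx + 1
    segments.append((start, n_frames - 1))
    return segments

segs = cut_segments(10, [3, 6])  # [(0, 3), (4, 6), (7, 9)]
```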
  • each single frame picture can be extracted from the video to be cut; after preprocessing each single frame picture, the variance change value between each single frame picture and the corresponding next single frame picture is calculated, and when the variance change value is greater than the first preset threshold, the single frame picture is determined to be a candidate frame picture. After all candidate frame pictures have been extracted, the connected components of each candidate frame picture and the corresponding next single frame picture are compared based on the yolo target detection algorithm; when the difference between the connected components is large, the candidate frame picture is determined to be a shot switching frame picture. Finally, the video to be cut is cut at the shot switching frames corresponding to the shot switching frame pictures.
  • in this way, all the shot switching frames contained in the video to be cut can be determined accurately and efficiently, thereby realizing accurate cutting of each single shot scene; this improves cutting efficiency while also reducing the labor cost of video cutting.
  • an embodiment of the present application provides an apparatus for cutting video shots. The apparatus includes: an extraction module 31, a screening module 32, a determining module 33, and a cutting module 34.
  • the extraction module 31 is used to extract each single frame picture in the video to be cut;
  • the screening module 32 is used for screening candidate frame pictures from a single frame picture based on the variance change value
  • the determining module 33 is configured to determine all shot switching frame pictures included in the candidate frame pictures by using a target detection algorithm
  • the cutting module 34 is used to cut the video to be cut into multiple video clips according to the shot switching frame pictures.
  • the device further includes a scaling module 35 and a processing module 36.
  • the scaling module 35 is used to scale each single frame picture to a preset size
  • the processing module 36 is used to perform grayscale processing on the scaled single frame picture.
  • the filtering module 32 is specifically used to calculate the variance value of all pixels in each single frame picture; calculate each single frame picture and the corresponding next frame The variance change value between single frames of pictures; if it is determined that the variance change value is less than the first preset threshold, then the single frame picture is determined to be a non-shot switching frame picture; if the variance change value is determined to be greater than or equal to the first preset threshold, then it is determined A single frame picture is a candidate frame picture.
  • the determining module 33 is specifically used to obtain, by training based on the target detection algorithm, a target detection model whose training result meets the preset standard;
  • the candidate frame picture is input into the target detection model to obtain the first detection data information corresponding to the candidate frame picture;
  • the next single frame picture corresponding to the candidate frame picture is input into the target detection model to obtain the second detection data information corresponding to the next single frame picture; if it is determined that the first detection data information and the second detection data information do not contain the same connected component, it is determined that the candidate frame picture is a shot switching frame picture; if it is determined that they contain the same connected component, the difference value of the same connected component is calculated, and when the difference value meets the preset condition, it is determined that the candidate frame picture is a shot switching frame picture.
  • the determining module 33 is specifically used to: collect multiple single frame pictures as sample images; label the position coordinates and category information of each connected component in the sample images; use the labeled sample images as the training set and input them into the initial target detection model created in advance based on the yolo target detection algorithm; use the initial target detection model to extract the image features of the various connected components in the sample images, and generate, based on the image features, the suggestion window of each connected component and the conditional category probabilities of the various connected components corresponding to the suggestion window; determine the connected component category with the highest conditional category probability as the category recognition result of the connected component in the suggestion window; if the confidence of all the suggestion windows is greater than the second preset threshold and the category recognition results match the labeled category information, determine that the initial target detection model has passed the training; and if the initial target detection model has not passed the training, use the position coordinates and category information of each connected component labeled in the sample images to correct and retrain the initial target detection model.
  • the determining module 33 is specifically configured to calculate the first difference value based on the position coordinate information of the same connected component in the first detection data information and the second detection data information, and to calculate the second difference value based on the height and width information of the same connected component in the first detection data information and the second detection data information.
  • the determining module 33 is specifically configured to determine that the candidate frame picture is a shot switching frame picture if the first difference value and/or the second difference value is greater than the third preset threshold.
  • the cutting module 34 is specifically used to determine the shot switching frame corresponding to each shot switching frame picture, and to cut the video to be cut at the shot switching frames.
  • an embodiment of the present application also provides a non-volatile readable storage medium on which computer-readable instructions are stored; when executed, the computer-readable instructions implement the video shot cutting method shown in FIG. 1 and FIG. 2.
  • the technical solution of this application can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (which can be a CD-ROM, USB flash drive, removable hard disk, etc.) and includes several instructions used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute the methods of each implementation scenario of the present application.
  • an embodiment of the present application also provides a computer device, which may be a personal computer, a server, a network device, etc. The physical device includes a non-volatile readable storage medium and a processor; the non-volatile readable storage medium stores computer-readable instructions, and the processor executes the computer-readable instructions to implement the video shot cutting method shown in FIG. 1 and FIG. 2.
  • the computer device may also include a user interface, a network interface, a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a WI-FI module, and so on.
  • the user interface may include a display screen (Display), an input unit such as a keyboard (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, and the like.
  • the network interface can optionally include a standard wired interface, a wireless interface (such as a Bluetooth interface, a WI-FI interface), etc.
  • the computer device structure provided in this embodiment does not constitute a limitation on the physical device, and may include more or fewer components, or combine certain components, or arrange different components.
  • the non-volatile readable storage medium may also include an operating system and a network communication module.
  • the operating system is a program that manages the hardware and software resources of the above video shot cutting physical device, and supports the operation of the information processing program and other software and/or programs.
  • the network communication module is used to implement communication between various components in the non-volatile readable storage medium and communication with other hardware and software in the physical device.
  • This application can be implemented by means of software plus a necessary general-purpose hardware platform, or by hardware.
  • This application can extract each single frame picture from the video to be cut; after preprocessing each single frame picture, it calculates the variance change value between that single frame picture and the corresponding next single frame picture, and when the variance change value is greater than the first preset threshold, the single frame picture is determined to be a candidate frame picture.
  • After all candidate frame pictures are extracted, the YOLO target detection algorithm is used to compare the degree of difference between the connected components of each candidate frame picture and its corresponding next single frame picture; on that basis, the candidate frame picture can be determined to be a shot switching frame picture.
  • Finally, the video to be cut is cut at each shot switching frame corresponding to a shot switching frame picture.
  • In this way, all the shot switching frames contained in the video to be cut can be determined accurately and efficiently, thereby realizing accurate cutting of each single-shot scene, improving cutting efficiency, and reducing the labor cost of video cutting.
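As a rough illustration of the first, variance-based stage described above, the sketch below flags candidate frames by comparing the variance of each frame's pixel values with that of the next frame. This is a minimal sketch under assumptions, not the patented implementation: frames are represented as flat lists of grayscale pixel values, the function name and the default `threshold` are invented here, and the second-stage YOLO connected-component comparison is omitted.

```python
from statistics import pvariance

def candidate_frame_indices(frames, threshold=100.0):
    """Return indices of frames whose variance jumps sharply at the next frame.

    Each frame is a flat list of grayscale pixel values; `threshold` stands in
    for the "first preset threshold" of the description. Both the data layout
    and the threshold value are illustrative assumptions.
    """
    variances = [pvariance(frame) for frame in frames]
    return [i for i in range(len(frames) - 1)
            if abs(variances[i + 1] - variances[i]) > threshold]

# Synthetic demo: three near-uniform frames, then one high-contrast frame.
flat_frames = [[128] * 64 for _ in range(3)]
contrast_frame = [4 * i for i in range(64)]  # pixel values 0..252
print(candidate_frame_indices(flat_frames + [contrast_frame]))  # [2]
```

In a real pipeline the flagged indices would then be passed to the target detection stage, which confirms or rejects each candidate before any cut is made.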

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention relates to the technical field of computers and concerns a video shot cutting method and apparatus, and a computer device, which solve the problems of complicated cutting operations, low efficiency, and the time and labor consumed when video cutting is performed with a manual software tool. The method comprises: extracting each single frame picture from a video to be cut; selecting candidate frame pictures from the single frame pictures on the basis of a variance change value; determining, by means of a target detection algorithm, all the shot-cut frame pictures included in the candidate frame pictures; and cutting the video to be cut into a plurality of video clips according to the shot-cut frame pictures. The present invention can be applied to the automatic division of video fragments in different shot scenarios.
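Once the shot-cut frame pictures are determined, cutting the video into clips reduces to splitting its frame range at those indices. The sketch below assumes each detected shot-cut frame starts a new clip and represents clips as half-open (start, end) frame intervals; the function name and interval representation are illustrative, not taken from the application.

```python
def split_at_shot_frames(num_frames, shot_frames):
    """Split the frame range [0, num_frames) into clips, one per shot.

    `shot_frames` holds the indices of detected shot-cut frames; each one
    begins a new clip. Returns half-open (start, end) frame intervals.
    """
    bounds = [b for b in sorted(set(shot_frames)) if 0 < b < num_frames]
    starts = [0] + bounds
    ends = bounds + [num_frames]
    return list(zip(starts, ends))

print(split_at_shot_frames(100, [30, 70]))  # [(0, 30), (30, 70), (70, 100)]
```

Each resulting interval can then be handed to a video muxer or frame writer to emit one clip per single-shot scene.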
PCT/CN2019/103528 2019-07-11 2019-08-30 Method and apparatus for video shot cutting, and computer device WO2021003825A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910624918.6 2019-07-11
CN201910624918.6A CN110430443B (zh) 2019-07-11 2019-07-11 Video shot cutting method, apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2021003825A1 true WO2021003825A1 (fr) 2021-01-14

Family

ID=68410483

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103528 WO2021003825A1 (fr) 2019-07-11 2019-08-30 Method and apparatus for video shot cutting, and computer device

Country Status (2)

Country Link
CN (1) CN110430443B (fr)
WO (1) WO2021003825A1 (fr)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113825012A (zh) * 2021-06-04 2021-12-21 腾讯科技(深圳)有限公司 Video data processing method and computer device
CN113840159A (zh) * 2021-09-26 2021-12-24 北京沃东天骏信息技术有限公司 Video processing method and apparatus, computer system, and readable storage medium
CN114120250A (zh) * 2021-11-30 2022-03-01 北京文安智能技术股份有限公司 Video-based method for detecting illegal passenger carrying by motor vehicles
CN114140461A (zh) * 2021-12-09 2022-03-04 成都智元汇信息技术股份有限公司 Image cutting method based on an edge image-recognition box, electronic device, and medium
CN114189754A (zh) * 2021-12-08 2022-03-15 湖南快乐阳光互动娱乐传媒有限公司 Video plot segmentation method and system
CN114363695A (zh) * 2021-11-11 2022-04-15 腾讯科技(深圳)有限公司 Video processing method and apparatus, computer device, and storage medium
CN115022711A (zh) * 2022-04-28 2022-09-06 之江实验室 System and method for ordering shot videos within a movie scene
CN115119050A (zh) * 2022-06-30 2022-09-27 北京奇艺世纪科技有限公司 Video editing method and apparatus, electronic device, and storage medium
CN115174957A (zh) * 2022-06-27 2022-10-11 咪咕文化科技有限公司 Bullet-screen comment invocation method and apparatus, computer device, and readable storage medium
CN115457447A (zh) * 2022-11-07 2022-12-09 浙江莲荷科技有限公司 Moving object recognition method, apparatus, and system, electronic device, and storage medium
CN115861914A (zh) * 2022-10-24 2023-03-28 广东魅视科技股份有限公司 Method for assisting a user in finding a specific target

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444819B (zh) * 2020-03-24 2024-01-23 北京百度网讯科技有限公司 Cut frame determination method, network training method, apparatus, device, and storage medium
CN111491183B (zh) * 2020-04-23 2022-07-12 百度在线网络技术(北京)有限公司 Video processing method, apparatus, device, and storage medium
CN112584073B (zh) * 2020-12-24 2022-08-02 杭州叙简科技股份有限公司 5G-based distributed assisted computing method for law enforcement recorders
CN114286171B (zh) * 2021-08-19 2023-04-07 腾讯科技(深圳)有限公司 Video processing method, apparatus, device, and storage medium
CN114155473B (zh) * 2021-12-09 2022-11-08 成都智元汇信息技术股份有限公司 Image cutting method based on frame compensation, electronic device, and medium
CN114446331B (zh) * 2022-04-07 2022-06-24 深圳爱卓软科技有限公司 Video editing software system for rapid video trimming

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090141178A1 (en) * 2007-11-30 2009-06-04 Kerofsky Louis J Methods and Systems for Backlight Modulation with Scene-Cut Detection
CN102497556A (zh) * 2011-12-26 2012-06-13 深圳市融创天下科技股份有限公司 Scene change detection method, apparatus, and device based on temporal variation degree
CN105612535A (zh) * 2013-08-29 2016-05-25 匹斯奥特(以色列)有限公司 Efficient content-based video retrieval
CN106162222A (zh) * 2015-04-22 2016-11-23 无锡天脉聚源传媒科技有限公司 Video shot segmentation method and apparatus
CN106937114A (zh) * 2015-12-30 2017-07-07 株式会社日立制作所 Method and apparatus for detecting video scene switching
CN109510919A (zh) * 2011-10-11 2019-03-22 瑞典爱立信有限公司 Scene change detection for perceptual quality evaluation in video sequences
CN109740499A (zh) * 2018-12-28 2019-05-10 北京旷视科技有限公司 Video segmentation method, video action recognition method, apparatus, device, and medium

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100559880C (zh) * 2007-08-10 2009-11-11 中国传媒大学 High-definition video image quality evaluation method and apparatus based on adaptive ST regions
US8744122B2 (en) * 2008-10-22 2014-06-03 Sri International System and method for object detection from a moving platform
US8509526B2 (en) * 2010-04-13 2013-08-13 International Business Machines Corporation Detection of objects in digital images
CN103227963A (zh) * 2013-03-20 2013-07-31 西交利物浦大学 Static surveillance video summarization method based on moving object detection and tracking
CN103426176B (zh) * 2013-08-27 2017-03-01 重庆邮电大学 Video shot detection method based on improved histogram and clustering algorithms
CN103945281B (zh) * 2014-04-29 2018-04-17 中国联合网络通信集团有限公司 Video transmission processing method, apparatus, and system
CN104394422B (zh) * 2014-11-12 2017-11-17 华为软件技术有限公司 Video segmentation point acquisition method and apparatus
CN104410867A (zh) * 2014-11-17 2015-03-11 北京京东尚科信息技术有限公司 Improved video shot detection method
CN104715023B (zh) * 2015-03-02 2018-08-03 北京奇艺世纪科技有限公司 Commodity recommendation method and system based on video content
CN105025360B (zh) * 2015-07-17 2018-07-17 江西洪都航空工业集团有限责任公司 Improved fast video synopsis method
CN106331524B (zh) * 2016-08-18 2019-07-26 无锡天脉聚源传媒科技有限公司 Method and apparatus for recognizing shot switching
US11004209B2 (en) * 2017-10-26 2021-05-11 Qualcomm Incorporated Methods and systems for applying complex object detection in a video analytics system
CN108205657A (zh) * 2017-11-24 2018-06-26 中国电子科技集团公司电子科学研究院 Video shot segmentation method, storage medium, and mobile terminal
CN108182421B (zh) * 2018-01-24 2020-07-14 北京影谱科技股份有限公司 Video segmentation method and apparatus
CN108769731B (zh) * 2018-05-25 2021-09-24 北京奇艺世纪科技有限公司 Method, apparatus, and electronic device for detecting target video clips in a video
CN108470077B (zh) * 2018-05-28 2023-07-28 广东工业大学 Video key frame extraction method, system, device, and storage medium
CN109819338B (zh) * 2019-02-22 2021-09-14 影石创新科技股份有限公司 Automatic video editing method, apparatus, and portable terminal
CN109934131A (zh) * 2019-02-28 2019-06-25 南京航空航天大学 UAV-based small target detection method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090141178A1 (en) * 2007-11-30 2009-06-04 Kerofsky Louis J Methods and Systems for Backlight Modulation with Scene-Cut Detection
CN109510919A (zh) * 2011-10-11 2019-03-22 瑞典爱立信有限公司 Scene change detection for perceptual quality evaluation in video sequences
CN102497556A (zh) * 2011-12-26 2012-06-13 深圳市融创天下科技股份有限公司 Scene change detection method, apparatus, and device based on temporal variation degree
CN105612535A (zh) * 2013-08-29 2016-05-25 匹斯奥特(以色列)有限公司 Efficient content-based video retrieval
CN106162222A (zh) * 2015-04-22 2016-11-23 无锡天脉聚源传媒科技有限公司 Video shot segmentation method and apparatus
CN106937114A (zh) * 2015-12-30 2017-07-07 株式会社日立制作所 Method and apparatus for detecting video scene switching
CN109740499A (zh) * 2018-12-28 2019-05-10 北京旷视科技有限公司 Video segmentation method, video action recognition method, apparatus, device, and medium

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113825012A (zh) * 2021-06-04 2021-12-21 腾讯科技(深圳)有限公司 Video data processing method and computer device
CN113840159A (zh) * 2021-09-26 2021-12-24 北京沃东天骏信息技术有限公司 Video processing method and apparatus, computer system, and readable storage medium
CN114363695B (zh) * 2021-11-11 2023-06-13 腾讯科技(深圳)有限公司 Video processing method and apparatus, computer device, and storage medium
CN114363695A (zh) * 2021-11-11 2022-04-15 腾讯科技(深圳)有限公司 Video processing method and apparatus, computer device, and storage medium
CN114120250A (zh) * 2021-11-30 2022-03-01 北京文安智能技术股份有限公司 Video-based method for detecting illegal passenger carrying by motor vehicles
CN114120250B (zh) * 2021-11-30 2024-04-05 北京文安智能技术股份有限公司 Video-based method for detecting illegal passenger carrying by motor vehicles
CN114189754A (zh) * 2021-12-08 2022-03-15 湖南快乐阳光互动娱乐传媒有限公司 Video plot segmentation method and system
CN114140461B (zh) * 2021-12-09 2023-02-14 成都智元汇信息技术股份有限公司 Image cutting method based on an edge image-recognition box, electronic device, and medium
CN114140461A (zh) * 2021-12-09 2022-03-04 成都智元汇信息技术股份有限公司 Image cutting method based on an edge image-recognition box, electronic device, and medium
CN115022711A (zh) * 2022-04-28 2022-09-06 之江实验室 System and method for ordering shot videos within a movie scene
CN115022711B (zh) * 2022-04-28 2024-05-31 之江实验室 System and method for ordering shot videos within a movie scene
CN115174957A (zh) * 2022-06-27 2022-10-11 咪咕文化科技有限公司 Bullet-screen comment invocation method and apparatus, computer device, and readable storage medium
CN115174957B (zh) * 2022-06-27 2023-08-15 咪咕文化科技有限公司 Bullet-screen comment invocation method and apparatus, computer device, and readable storage medium
CN115119050A (zh) * 2022-06-30 2022-09-27 北京奇艺世纪科技有限公司 Video editing method and apparatus, electronic device, and storage medium
CN115119050B (zh) * 2022-06-30 2023-12-15 北京奇艺世纪科技有限公司 Video editing method and apparatus, electronic device, and storage medium
CN115861914A (zh) * 2022-10-24 2023-03-28 广东魅视科技股份有限公司 Method for assisting a user in finding a specific target
CN115457447A (zh) * 2022-11-07 2022-12-09 浙江莲荷科技有限公司 Moving object recognition method, apparatus, and system, electronic device, and storage medium

Also Published As

Publication number Publication date
CN110430443A (zh) 2019-11-08
CN110430443B (zh) 2022-01-25

Similar Documents

Publication Publication Date Title
WO2021003825A1 (fr) Method and apparatus for video shot cutting, and computer device
EP3826317B1 (fr) Procédé et dispositif d'identification de point temporel clé de vidéo, appareil informatique et support d'informations
US20200167554A1 (en) Gesture Recognition Method, Apparatus, And Device
US9721387B2 (en) Systems and methods for implementing augmented reality
US9104242B2 (en) Palm gesture recognition method and device as well as human-machine interaction method and apparatus
US8379987B2 (en) Method, apparatus and computer program product for providing hand segmentation for gesture analysis
RU2637989C2 (ru) Способ и устройство для идентификации целевого объекта на изображении
US9756261B2 (en) Method for synthesizing images and electronic device thereof
US20170154238A1 (en) Method and electronic device for skin color detection
US20150358549A1 (en) Image capturing parameter adjustment in preview mode
WO2014156425A1 (fr) Procédé de partitionnement de zone et dispositif d'inspection
CN110460838B (zh) Shot change detection method, apparatus, and computer device
US20110156999A1 (en) Gesture recognition methods and systems
CN112633313B (zh) Method for identifying harmful information on a network terminal, and local area network terminal device
US20180101949A1 (en) Automated nuclei area/number estimation for ihc image analysis
CN103106388B (zh) Image recognition method and system
CN111695540A (zh) Video border recognition method, cropping method, apparatus, electronic device, and medium
CN108665769B (zh) Online teaching method and apparatus based on convolutional neural networks
JP2014044461A (ja) Image processing apparatus and method, and program
JP2019517079A (ja) Shape detection
JP7111873B2 (ja) Traffic light identification method, apparatus, device, storage medium, and program
US9727145B2 (en) Detecting device and detecting method
CN112380940B (zh) Method and apparatus for processing surveillance images of objects thrown from height, electronic device, and storage medium
CN104850819B (zh) Information processing method and electronic device
CN113992976B (zh) Video playback method, apparatus, device, and computer storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19936643

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 19936643

Country of ref document: EP

Kind code of ref document: A1