CN115529483A - Video processing method and device, electronic equipment and storage medium - Google Patents

Video processing method and device, electronic equipment and storage medium

Info

Publication number
CN115529483A
Authority
CN
China
Prior art keywords
image frame
video
processed
image
position information
Prior art date
Legal status
Pending
Application number
CN202211025874.3A
Other languages
Chinese (zh)
Inventor
蔡佳音
陶鑫
戴宇荣
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202211025874.3A
Publication of CN115529483A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4318Generation of visual interfaces for content selection or interaction; Content or additional data rendering by altering the content in the rendering process, e.g. blanking, blurring or masking an image region
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The disclosure relates to a video processing method and apparatus, an electronic device, and a storage medium. The method includes: segmenting an original video and extracting key frames to obtain at least one video segment and a key frame in each video segment, where the position information, on the image frame to which it belongs, of the object to be processed contained in the image frames of the same video segment satisfies preset position information; determining, in each video segment, the object to be processed of the key frame and the position information of that object; generating a mask of the object to be processed of each key frame based on the object to be processed of each key frame and its position information; and performing object processing on the at least one video segment based on the at least one video segment and the mask of the object to be processed of each key frame to obtain a target video. The method and apparatus reduce the software and hardware resources required for subsequently removing the object to be processed from the video and are generally applicable.

Description

Video processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a video processing method and apparatus, an electronic device, and a storage medium.
Background
Videos, photographs, and the like are increasingly shared on the internet for other users to obtain information from and enjoy. However, before a video or photograph is published, it may pick up stains or smudges during shooting or storage, which affects its publication.
A typical stain removal process is as follows: a large number of images containing the same stain are collected for evaluation and registration detection of the stain. The stain is then localized by detecting gradients across the entire image set to obtain a reliable initial estimate, where registration detection is used to estimate an initial alpha mask and to refine the estimated stain layer; these are then used as the initialization input of a multi-image removal algorithm, yielding stain-free images.
However, the above removal method can only remove one specific stain at a time and lacks generality.
Disclosure of Invention
The present disclosure provides a video processing method, a video processing apparatus, an electronic device, and a storage medium. The technical solutions of the present disclosure are as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a video processing method, including:
segmenting an original video and extracting key frames to obtain at least one video clip and the key frames in each video clip; the position information of an object to be processed contained in image frames in the same video clip on the image frame to which the object to be processed belongs meets preset position information; the preset position information is information corresponding to a video clip to which the image frame belongs;
determining the object to be processed of the key frame and the position information of the object to be processed of the key frame in each video clip;
generating a mask of the object to be processed of each key frame based on the object to be processed of each key frame and the position information of the object to be processed of each key frame;
and performing object processing on the at least one video clip based on the at least one video clip and the mask of the object to be processed of each key frame to obtain the target video.
In some possible embodiments, segmenting and extracting key frames from the original video to obtain at least one video segment and key frames in each video segment includes:
carrying out object identification on an original video to obtain the type information of an object to be processed of the original video;
determining a display rule of the object to be processed on the original video based on the type information of the object to be processed;
segmenting an original video based on a display rule to obtain at least one video segment;
and determining a preset image frame in each video clip of the at least one video clip as a key frame in each video clip.
In some possible embodiments, determining a display rule of the object to be processed on the original video based on the type information of the object to be processed, and segmenting the original video based on the display rule to obtain at least one video segment, includes:
determining a display area and a display duration of the object to be processed on the original video based on the type information of the object to be processed;
and segmenting the original video based on the display area and the display duration to obtain at least one video segment.
In some possible embodiments, segmenting and extracting key frames from the original video to obtain at least one video segment and key frames in each video segment includes:
carrying out object identification on a first image frame in an original video to obtain an object to be processed of the original video; the first image frame is an image frame of a to-be-processed object appearing for the first time in an original video;
determining first position information of an object to be processed on a first image frame;
cropping the first image frame based on the first position information to obtain a first sub-image corresponding to the first position information;
determining similarity data corresponding to each second image frame based on the first sub-image and each second image frame in the second image frame set; the second image frame set comprises image frames except the first image frame in the original video;
segmenting the original video based on the similarity data corresponding to each second image frame to obtain at least one video segment;
and determining a preset image frame in each video clip of the at least one video clip as a key frame in each video clip.
In some possible embodiments, determining similarity data corresponding to each second image frame based on the first sub-image and each second image frame, and segmenting the original video based on the similarity data corresponding to each second image frame to obtain at least one video segment includes:
acquiring a second sub-image corresponding to the first position information in each second image frame;
determining similarity data corresponding to each second image frame based on the similarity degree of each second sub-image and the first sub-image;
if the similarity data corresponding to each second image frame meet preset data, obtaining a video clip; the first image frame of the video segment is the first image frame.
In some possible embodiments, after determining the similarity data corresponding to each second image frame based on the similarity between each second sub-image and the first sub-image, the method further includes:
if a first target image frame set exists in the second image frame set and a first target image frame positioned at the first position in the first target image frame set is adjacent to the first image frame and positioned behind the first image frame in the original video, determining a first video segment based on the first image frame and the first target image frame set;
the first target image frame set comprises a first target image frame or a plurality of continuous first target image frames; the first target image frame set is not equal to the second image frame set, and the similarity data corresponding to the first target image frame in the first target image frame set meets preset data.
In some possible embodiments, the method further comprises:
taking a difference set between the second image frame set and the first target image frame set as a video to be segmented;
taking a first image frame of a video to be segmented as a new first image frame; taking image frames except the new first image frame in the video to be segmented as a new second image frame set;
determining new first position information of the object to be processed on a new first image frame;
obtaining a new first sub-image corresponding to the new first position information from the new first image frame, and obtaining a new second sub-image corresponding to the new first position information from each second image frame in the new second image frame set;
determining similarity data for each new second sub-image based on the similarity of the new first sub-image and each new second sub-image;
if a new first target image frame set exists in the new second image frame set and the new first target image frame positioned first in the new first target image frame set is adjacent to the new first image frame and positioned behind the new first image frame in the original video, determining a second video segment based on the new first image frame and the new first target image frame set;
the new first target image frame set comprises one new first target image frame or a plurality of consecutive new first target image frames; the new first target image frame set is not equal to the new second image frame set, and the similarity data corresponding to the new first target image frames in the new first target image frame set meet the preset data.
In some possible embodiments, the method further comprises:
carrying out preset color proportion detection on the original video to obtain proportion data of preset colors of each image frame in the original video;
if the proportion data of the preset colors of a plurality of image frames in the original video meet third preset data, determining the starting and ending time of the plurality of image frames; the plurality of image frames are continuous image frames, and the last image frame of the plurality of image frames is a video end image frame.
In some possible embodiments, if the objects to be processed include a first class of objects to be processed and a second class of objects to be processed, determining the objects to be processed of the key frames and the location information of the objects to be processed of the key frames in each video clip includes:
performing first-class object detection on key frames in each video clip based on a first detection model in the object detection models, and determining first-class objects to be processed of the key frames in each video clip and position information of the first-class objects to be processed;
and performing second-class object detection on the key frame in each video clip based on a second detection model in the object detection model and the position information of the first-class object to be processed, and determining the second-class object to be processed and the position information of the second-class object to be processed of the key frame in each video clip.
In some possible embodiments, generating a mask of the object to be processed for each key frame based on the object to be processed for each key frame and the position information of the object to be processed for each key frame comprises:
and carrying out binarization processing on the pixels of the object to be processed of each key frame based on the position information of the object to be processed of each key frame to obtain a mask of the object to be processed of each key frame.
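For illustration only, the binarization step can be sketched as follows. This is a minimal sketch, not part of the claimed solution; it assumes the position information of the object to be processed is an axis-aligned bounding box (x1, y1, x2, y2) in pixel coordinates, and the function name and the convention of 1 inside the box and 0 outside are assumptions made for the example.

```python
import numpy as np

def make_object_mask(frame_height, frame_width, bbox):
    """Build a binary mask for one key frame: pixels inside the bounding box
    of the object to be processed are set to 1, all other pixels to 0.
    bbox is assumed to be (x1, y1, x2, y2) in pixel coordinates."""
    x1, y1, x2, y2 = bbox
    mask = np.zeros((frame_height, frame_width), dtype=np.uint8)
    mask[y1:y2, x1:x2] = 1
    return mask

# Example: a 640 x 320 frame with the object in a 32 x 16 top-left region.
mask = make_object_mask(320, 640, (0, 0, 32, 16))
```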
According to a second aspect of the embodiments of the present disclosure, there is provided a video processing apparatus including:
the segmentation module is configured to segment an original video and extract key frames to obtain at least one video clip and key frames in each video clip; the position information of an object to be processed contained in image frames in the same video clip on the image frame to which the object to be processed belongs meets preset position information; the preset position information is information corresponding to a video clip to which the image frame belongs;
a determining module configured to perform determining an object to be processed of a key frame and position information of the object to be processed of the key frame in each video clip;
a mask generation module configured to perform a mask generation of the object to be processed for each key frame based on the object to be processed for each key frame and the position information of the object to be processed for each key frame;
and the object removing module is configured to execute object processing on the at least one video segment based on the at least one video segment and the mask of the object to be processed of each key frame to obtain a target video.
In some possible embodiments, the segmentation module is configured to perform:
carrying out object identification on an original video to obtain the type information of an object to be processed of the original video;
determining a display rule of the object to be processed on the original video based on the type information of the object to be processed;
segmenting an original video based on a display rule to obtain at least one video segment;
and determining a preset image frame in each video clip of the at least one video clip as a key frame in each video clip.
In some possible embodiments, the segmentation module is configured to perform:
determining a display area and a display duration of the object to be processed on the original video based on the type information of the object to be processed;
and segmenting the original video based on the display area and the display duration to obtain at least one video segment.
In some possible embodiments, the segmentation module is configured to perform:
carrying out object identification on a first image frame in an original video to obtain an object to be processed of the original video; the first image frame is an image frame of an object to be processed appearing for the first time in an original video;
determining first position information of an object to be processed on a first image frame;
cropping the first image frame based on the first position information to obtain a first sub-image corresponding to the first position information;
determining similarity data corresponding to each second image frame based on the first sub-image and each second image frame in the second image frame set; the second image frame set comprises image frames except the first image frame in the original video;
segmenting the original video based on the similarity data corresponding to each second image frame to obtain at least one video segment;
and determining a preset image frame in each video clip of the at least one video clip as a key frame in each video clip.
In some possible embodiments, the segmentation module is configured to perform:
acquiring a second sub-image corresponding to the first position information in each second image frame;
determining similarity data corresponding to each second image frame based on the similarity degree of each second sub-image and the first sub-image;
if the similarity data corresponding to each second image frame meet preset data, obtaining a video clip; the first image frame of the video segment is the first image frame.
In some possible embodiments, the segmentation module is configured to perform:
if a first target image frame set exists in the second image frame set and a first target image frame positioned at the first position in the first target image frame set is adjacent to the first image frame and positioned behind the first image frame in the original video, determining a first video segment based on the first image frame and the first target image frame set;
the first target image frame set comprises a first target image frame or a plurality of continuous first target image frames; the first target image frame set is not equal to the second image frame set, and the similarity data corresponding to the first target image frame in the first target image frame set meets preset data.
In some possible embodiments, the segmentation module is configured to perform:
taking a difference set between the second image frame set and the first target image frame set as a video to be segmented;
taking a first image frame of a video to be segmented as a new first image frame; taking image frames except the new first image frame in the video to be segmented as a new second image frame set;
determining new first position information of the object to be processed on a new first image frame;
obtaining a new first sub-image corresponding to the new first position information from the new first image frame, and obtaining a new second sub-image corresponding to the new first position information from each second image frame in the new second image frame set;
determining similarity data for each new second sub-image based on the similarity of the new first sub-image and each new second sub-image;
if a new first target image frame set exists in the new second image frame set and the new first target image frame positioned first in the new first target image frame set is adjacent to the new first image frame and positioned behind the new first image frame in the original video, determining a second video segment based on the new first image frame and the new first target image frame set;
the new first target image frame set comprises one new first target image frame or a plurality of consecutive new first target image frames; the new first target image frame set is not equal to the new second image frame set, and the similarity data corresponding to the new first target image frames in the new first target image frame set meet the preset data.
In some possible embodiments, the apparatus further comprises a time determination module configured to perform:
carrying out preset color proportion detection on the original video to obtain proportion data of preset colors of each image frame in the original video;
if the proportion data of the preset colors of a plurality of image frames in the original video meet third preset data, determining the starting and ending time of the plurality of image frames; the plurality of image frames are continuous image frames, and the last image frame of the plurality of image frames is a video end image frame.
In some possible embodiments, the determining module is configured to perform:
performing first-class object detection on key frames in each video clip based on a first detection model in the object detection models, and determining first-class objects to be processed of the key frames in each video clip and position information of the first-class objects to be processed;
and performing second-class object detection on the key frame in each video clip based on a second detection model in the object detection model and the position information of the first-class object to be processed, and determining the second-class object to be processed and the position information of the second-class object to be processed of the key frame in each video clip.
In some possible embodiments, the mask generation module is configured to perform:
and carrying out binarization processing on the pixels of the object to be processed of each key frame based on the position information of the object to be processed of each key frame to obtain a mask of the object to be processed of each key frame.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to execute the instructions to implement the method of any one of the first aspect as described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of the first aspects of the embodiments of the present disclosure.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program, the computer program being stored in a readable storage medium, from which at least one processor of a computer device reads and executes the computer program, causing the computer device to perform the method of any one of the first aspect of embodiments of the present disclosure.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:
An original video is segmented and key frames are extracted to obtain at least one video segment and a key frame in each video segment, where the position information, on the image frame to which it belongs, of the object to be processed contained in the image frames of the same video segment satisfies preset position information, and the preset position information corresponds to the video segment to which the image frame belongs. In each video segment, the object to be processed of the key frame and the position information of that object are determined; a mask of the object to be processed of each key frame is generated based on that object and its position information; and object processing is performed on the at least one video segment based on the at least one video segment and the mask of the object to be processed of each key frame to obtain the target video. With the video segments and key frames, the positions of the objects to be processed in each video segment can be accurately located, the computing resources required for subsequently removing the objects to be processed from the video are reduced, and the approach can be applied to videos containing many kinds of objects to be processed and is therefore generally applicable.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings needed in the description of the embodiments are briefly introduced below. The following drawings show only some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings based on them without creative effort.
FIG. 1 is a schematic diagram of an application environment shown in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a video processing method according to an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method of video segmentation and key frame extraction in accordance with one illustrative embodiment;
FIG. 4 is a flow diagram illustrating a method of video segmentation and key frame extraction in accordance with one illustrative embodiment;
FIG. 5 is a flow diagram illustrating a method for segmenting an original video based on similarity data in accordance with an exemplary embodiment;
FIG. 6 is a flow diagram illustrating a method for segmenting an original video based on similarity data in accordance with an exemplary embodiment;
FIG. 7 is a flowchart illustrating a method for determining a key frame object to be processed and location information of the key frame object to be processed according to an example embodiment;
FIG. 8 is a flowchart illustrating a method of deleting a trailer in accordance with an illustrative embodiment;
FIG. 9 is a block diagram illustrating a video processing device according to an example embodiment;
FIG. 10 is a block diagram illustrating an electronic device for video processing in accordance with an exemplary embodiment.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. The described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of the present disclosure.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in other sequences than those illustrated or described herein. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment of a video processing method according to an exemplary embodiment, and as shown in fig. 1, the application environment may include a server 01 and a client 02.
In some possible embodiments, the server 01 may include an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform. The operating system running on the server may include, but is not limited to, an android system, an IOS system, linux, windows, unix, and the like.
In some possible embodiments, the client 02 may include, but is not limited to, a smartphone, a desktop computer, a tablet computer, a laptop computer, a smart speaker, a digital assistant, an Augmented Reality (AR)/Virtual Reality (VR) device, a smart wearable device, and the like. The software running on the client may also be an application, an applet, or the like. Alternatively, the operating system running on the client may include, but is not limited to, an android system, an IOS system, linux, windows, unix, and the like.
In some possible embodiments, the server 01 or the client 02 may segment an original video and extract key frames to obtain at least one video segment and a key frame in each video segment, where the position information, on the image frame to which it belongs, of the object to be processed contained in the image frames of the same video segment satisfies preset position information, and the preset position information corresponds to the video segment to which the image frame belongs. The server or client then determines, in each video segment, the object to be processed of the key frame and the position information of that object, generates a mask of the object to be processed of each key frame based on that object and its position information, and performs object processing on the at least one video segment based on the at least one video segment and the mask of the object to be processed of each key frame to obtain a target video.
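The overall flow above can be summarized in a short sketch. The helper functions are passed in as placeholders for the steps detailed in the embodiments below; none of the names are part of the disclosure.

```python
def process_video(original_video, segment_and_extract_key_frames, detect_object,
                  make_object_mask, remove_object, concatenate):
    """End-to-end flow: segment the video, detect the object to be processed on
    each key frame, build a mask from its position information, and remove the
    object from every segment.  All helpers are placeholders supplied by the
    caller."""
    segments, key_frames = segment_and_extract_key_frames(original_video)

    processed_segments = []
    for segment, key_frame in zip(segments, key_frames):
        obj, position = detect_object(key_frame)                 # object + position info
        mask = make_object_mask(key_frame, position)             # binary mask of the object
        processed_segments.append(remove_object(segment, mask))  # object processing

    return concatenate(processed_segments)                       # target video
```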
In some possible embodiments, the client 02 and the server 01 may be connected through a wired link or a wireless link.
In an exemplary embodiment, the client, the server, and the databases corresponding to the server may be node devices in a blockchain system and can share the information they acquire and generate with other node devices in the blockchain system, thereby implementing information sharing among multiple node devices. The multiple node devices in the blockchain system may be configured with the same blockchain, which consists of multiple blocks linked to their neighbors, so that tampering with the data in any block can be detected by the next block. This prevents the data in the blockchain from being tampered with and ensures the security and reliability of the data in the blockchain.
Fig. 2 is a flowchart illustrating a video processing method according to an exemplary embodiment, and as shown in fig. 2, the video processing method may be applied to a server, and may also be applied to other node devices, such as a client, and the method is described below by taking the server as an example, and includes the following steps:
in step S201, segmenting and extracting key frames from an original video to obtain at least one video segment and a key frame in each video segment; the position information of an object to be processed contained in image frames in the same video clip on the image frame to which the object to be processed belongs meets preset position information; the preset position information is information corresponding to the video clip to which the image frame belongs.
In the embodiment of the application, the server can perform segmentation and key frame extraction on the original video to obtain at least one video clip and a key frame in each video clip.
So that the key frame in each video segment can better assist the removal of the object to be processed from the video segment to which it belongs, the position information, on the image frame to which it belongs, of the object to be processed contained in the image frames of the same video segment satisfies preset position information, and the preset position information corresponds to the video segment to which the image frame belongs.
Optionally, the statement that the position information of the object to be processed contained in the image frames of the same video segment on the image frame to which it belongs satisfies the preset position information may mean: the position information of the object to be processed contained in all image frames of the same video segment on the image frame to which it belongs satisfies the preset position information.
Alternatively, it may mean: the position information of the object to be processed contained in some of the image frames of the same video segment on the image frame to which it belongs satisfies the preset position information, while the remaining image frames do not contain the object to be processed. For example, in one video segment, 80% of the image frames contain the object to be processed and satisfy the preset position information, and the remaining 20% of the image frames do not contain the object to be processed.
In this embodiment of the present application, each video segment has corresponding preset position information, and the position information of the object to be processed contained in the image frames belonging to that video segment, on the image frame to which it belongs, falls within the preset position information.
In the following, the fact that each video segment has corresponding preset position information is illustrated with a first video segment. Assume that the first video segment contains 60 video frames and that each of those frames has a resolution of 640 × 320, i.e., each frame is 640 pixels wide and 320 pixels high. The preset position information corresponding to the first video segment may then be a rectangular region defined by the following four vertices, each written as (width pixel, height pixel): (0, 0), (32, 0), (0, 16), and (32, 16).
In an alternative embodiment, the position information of the object to be processed contained in the image frames of the video segment falls within the preset position information as follows: in the first video segment, the object to be processed contained in each of the 60 image frames occupies, on the image frame to which it belongs, a first rectangular region defined by the four vertices (0, 0), (0, 16), (32, 0), and (32, 16).
In another alternative embodiment: in the first video segment, the object to be processed contained in some of the 60 image frames occupies, on the image frame to which it belongs, a second rectangular region defined by the four vertices (0, 0), (0, 15), (32, 0), and (32, 15), while the object to be processed contained in the other image frames occupies a third rectangular region defined by the four vertices (0, 0), (0, 16), (28, 0), and (28, 16). Both the second rectangular region and the third rectangular region are contained in the first rectangular region.
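The containment check implied by the two embodiments above can be illustrated with a minimal sketch. This is illustrative only and not part of the disclosed solution; it assumes both the object's position information and the preset position information are axis-aligned rectangles given as (x1, y1, x2, y2) pixel tuples, and the function name is invented for the example.

```python
def bbox_within_preset(bbox, preset):
    """Return True if the object's bounding box lies inside the segment's
    preset region.  Both arguments are assumed to be (x1, y1, x2, y2) tuples."""
    x1, y1, x2, y2 = bbox
    px1, py1, px2, py2 = preset
    return px1 <= x1 and py1 <= y1 and x2 <= px2 and y2 <= py2

# The second and third rectangles from the example above both fall
# inside the first video segment's preset region (0, 0, 32, 16):
print(bbox_within_preset((0, 0, 32, 15), (0, 0, 32, 16)))  # True
print(bbox_within_preset((0, 0, 28, 16), (0, 0, 32, 16)))  # True
```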
There are many ways for the server to obtain the at least one video segment and the key frame in each of those segments; several of them are described in the following embodiments.
In an alternative embodiment, the server may obtain a preset time interval and obtain a plurality of key frames from the original video based on the preset time interval. The server may then divide the original video into a plurality of video segments based on the locations of the plurality of key frames in the original video.
For example, assume the frame rate of the original video is 30 frames per second, the duration of the original video is 10 seconds, and the preset time interval is 2 seconds. The key frames the server acquires from the original video based on the preset time interval are then the 1st, 61st, 121st, 181st, and 241st image frames. The server may then determine all image frames from the 1st to the 60th as a first video segment, all image frames from the 61st to the 120th as a second video segment, all image frames from the 121st to the 180th as a third video segment, and so on. In this way, the server divides the original video into 5 video segments and determines the first image frame of each video segment as the key frame of that segment.
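For illustration, the fixed-interval segmentation and key-frame selection just described can be sketched as follows; the function name and 1-based frame indexing are assumptions made for the example, not part of the disclosure.

```python
def split_by_interval(total_frames, fps, interval_seconds):
    """Split a video into fixed-length segments and take the first frame of
    each segment as its key frame.  Frame indices are 1-based to match the
    description above."""
    frames_per_segment = fps * interval_seconds
    segments, key_frames = [], []
    for start in range(1, total_frames + 1, frames_per_segment):
        end = min(start + frames_per_segment - 1, total_frames)
        segments.append((start, end))
        key_frames.append(start)
    return segments, key_frames

# 30 fps, 10-second video, 2-second interval:
segments, key_frames = split_by_interval(300, 30, 2)
# segments   -> [(1, 60), (61, 120), (121, 180), (181, 240), (241, 300)]
# key_frames -> [1, 61, 121, 181, 241]
```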
In another alternative embodiment, the server may segment and extract the key frames from the original video based on the display rules of the object to be processed in the original video, so as to obtain at least one video clip and the key frames in each video clip. Fig. 3 is a flow chart illustrating a video segmentation and key frame extraction method according to an exemplary embodiment, as shown in fig. 3, including:
in step S301, object recognition is performed on the original video to obtain type information of an object to be processed of the original video.
In this embodiment, the type information of the object to be processed may refer to company information or organization information to which the object to be processed belongs. Therefore, the server can perform object identification on the original video to obtain company information or organization information to which the object to be processed belongs in the original video.
In step S303, a display rule of the object to be processed on the original video is determined based on the type information of the object to be processed.
In this embodiment of the present application, the display rule of the object to be processed on the original video refers to the display area and display duration of the object to be processed on the original video. For example, the display rule of object A to be processed on the original video, based on display parameters set in advance, is: object A is always displayed in the upper-left corner region of each image frame of the original video (a 5 × 5 region with the upper-left corner as a vertex). The display rule of object B to be processed on the original video, based on display parameters set in advance, is: object B is displayed in the upper-left corner region of the image frames (a 5 × 5 region with the upper-left corner as a vertex) between 0 and 2 seconds of the original video, in the lower-right corner region of the image frames (a 5 × 5 region with the lower-right corner as a vertex) between 2 and 4 seconds, in the upper-right corner region of the image frames (a 5 × 5 region with the upper-right corner as a vertex) between 4 and 6 seconds, and so on.
In step S305, the original video is segmented based on the display rule to obtain at least one video segment.
For the above object A to be processed, since its display area is the upper-left corner region and its display duration is "always", the server may not segment the original video based on the display area and display duration; that is, the original video is treated as a single whole.
For the above object B to be processed, the server may segment the original video according to the display area and display duration of object B on the original video, taking the video segment corresponding to 0-2 seconds as the first video segment, the video segment corresponding to 2-4 seconds as the second video segment, the video segment corresponding to 4-6 seconds as the third video segment, and so on.
In step S307, a preset image frame in each of at least one video clip is determined as a key frame in each video clip.
Optionally, for the segmentation result corresponding to the object a to be processed, the server may determine the first image frame as a key frame in the original video, or may determine any one image frame in the original video as a key frame in the original video.
Optionally, for the segmentation result corresponding to object B to be processed, the server may determine the first image frame of the first video segment as the key frame of the first video segment, or may determine any one image frame of the first video segment as its key frame; likewise, the server may determine the first image frame of the second video segment as the key frame of the second video segment, or any one image frame of the second video segment as its key frame, and so on. Key frames in the other video segments are determined in the same way as in the first video segment, and the details are not repeated here.
In summary, since the display rule is associated with the type information of the object to be processed, the display of the object to be processed on the image frames follows a certain regularity. The server can therefore accurately segment the original video and determine the key frame of each video segment according to the display rule of the object to be processed, such as its display area and display duration, so that the key frame in each video segment can better assist the removal of the object from the video segment to which it belongs.
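As an illustration of rule-based segmentation, the sketch below splits a video according to a hypothetical table of display rules keyed by the object's type information; the rule table, function name, and region labels are all invented for the example and are not part of the disclosure.

```python
# Hypothetical display rules keyed by the object's type information.
# Each rule lists (start_second, end_second, display_area) entries;
# end_second None means the object is displayed for the whole video.
DISPLAY_RULES = {
    "object_A": [(0, None, "top_left")],
    "object_B": [(0, 2, "top_left"), (2, 4, "bottom_right"), (4, 6, "top_right")],
}

def split_by_display_rule(object_type, fps):
    """One segment per display window of the object; 1-based frame ranges."""
    segments = []
    for start_s, end_s, area in DISPLAY_RULES[object_type]:
        if end_s is None:                 # displayed throughout: no split
            return [("whole_video", area)]
        start = int(start_s * fps) + 1
        end = int(end_s * fps)
        segments.append(((start, end), area))
    return segments

print(split_by_display_rule("object_B", 30))
# [((1, 60), 'top_left'), ((61, 120), 'bottom_right'), ((121, 180), 'top_right')]
```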
In another alternative embodiment, the server may segment and key-frame extract the original video based on similarity data between image frames, resulting in at least one video clip and key-frames in each video clip. FIG. 4 is a flowchart illustrating a video segmentation and key frame extraction method according to an exemplary embodiment, as shown in FIG. 4, including:
in step S401, performing object identification on a first image frame in an original video to obtain an object to be processed of the original video; the first image frame is an image frame of an object to be processed appearing for the first time in the original video.
In this embodiment of the present application, the original video may be a video published on a video platform, so the objects to be processed in the original video may be uniform throughout the original video, for example all of them belonging to the video platform, or to the video platform plus the publisher of the video.
Optionally, the resolution of each image frame in the original video is consistent, such as 640 × 320.
In an alternative embodiment, the object to be processed in the original video may start from the first image frame, and therefore, the server may perform object recognition on the first image frame to obtain the object to be processed in the original video.
In another alternative embodiment, if the object to be processed in the original video does not start from the first image frame, the server may perform object recognition on the image frames in sequence until the object to be processed is found. In this way, the server determines the image frame in which the object to be processed first appears and uses it as the first image frame.
In this embodiment of the present application, the server may perform object recognition on the first image frame of the original video through an object detection model to obtain the object to be processed of the original video. Optionally, the object detection model may include, but is not limited to, a deep learning model using a convolutional neural network, a recurrent neural network, or a recursive neural network.
In step S403, first position information of the object to be processed on the first image frame is determined.
In an alternative embodiment, the server may not only perform object identification on the first image frame in the original video through the object detection model to obtain the object to be processed of the original video, but also obtain first position information of the object to be processed on the first image frame.
In another alternative embodiment, after the server determines the object to be processed in the original video, a rectangular frame may be marked around the object to be processed in the first image frame, and the first position information of the object to be processed on the first image frame is determined based on the pixels of the first image frame enclosed by the rectangular frame.
Optionally, the first position information indicates four pixel pairs ((X1, Y1), (X1, Y2), (X2, Y1), (X2, Y2)) corresponding to four corners of the rectangular frame, or the first position information indicates two pixel pairs ((X1, Y1), (X2, Y2)) corresponding to two opposite corners of the rectangular frame, and the two pixel pairs ((X1, Y1), (X2, Y2)) corresponding to two opposite corners can locate the four pixel pairs ((X1, Y1), (X1, Y2), (X2, Y1), (X2, Y2)) corresponding to four corners of the rectangular frame.
In step S405, the first image frame is cropped based on the first position information to obtain a first sub-image corresponding to the first position information.
In this embodiment, the server may crop the first image frame based on the first position information, for example the four pixel pairs ((X1, Y1), (X1, Y2), (X2, Y1), (X2, Y2)) corresponding to the four corners of the rectangular frame, to obtain the first sub-image corresponding to the first position information.
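A minimal cropping sketch, assuming the frame is held as an H × W × C array and the first position information is given as the two opposite-corner pixel pairs ((X1, Y1), (X2, Y2)); the function name is illustrative only.

```python
import numpy as np

def crop_sub_image(frame, position):
    """Crop the region indicated by the first position information.
    `frame` is an H x W x C array; `position` is assumed to be the two
    opposite-corner pixel pairs ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = position
    return frame[y1:y2, x1:x2]

# Example on a 320 x 640 frame with the object in the top-left 32 x 16 region:
frame = np.zeros((320, 640, 3), dtype=np.uint8)
first_sub_image = crop_sub_image(frame, ((0, 0), (32, 16)))
print(first_sub_image.shape)  # (16, 32, 3)
```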
In step S407, determining similarity data corresponding to each second image frame based on the first sub-image and each second image frame in the second image frame set; the second image frame set includes image frames in the original video other than the first image frame.
Alternatively, the server may group image frames other than the first image frame in the original video into a second image frame set. And determining similarity data corresponding to each second image frame in the set of second image frames based on the first sub-image and each second image frame.
In step S409, the original video is segmented based on the similarity data corresponding to each second image frame, so as to obtain at least one video segment.
Optionally, the server may segment the original video based on the similarity data corresponding to each second image frame to obtain at least one video segment.
In step S411, a preset image frame in each of at least one video clip is determined as a key frame in each video clip.
Alternatively, the server may determine the first image frame in each video segment as a key frame in each video segment, or may determine any one image frame in each video segment as a key frame in each video segment.
As described above, in the embodiment of the present application, image frames with the same object to be processed may be accurately divided into one video segment according to the similarity data between the image frames, so that when the mask of the object to be processed of the key frame of the video segment is required to be used to remove the object to be processed from the video segment in the final stage, the mask of the object to be processed of the key frame of the video segment may be quickly positioned to the position of the object to be processed in the image frame of the video segment, and the object to be processed may be removed from the video segment uniformly.
Fig. 5 is a flowchart illustrating a method of segmenting an original video based on similarity data, according to an exemplary embodiment, as shown in fig. 5, including:
in step S501, a second sub-image corresponding to the first position information in each second image frame is acquired.
Based on the above first position information, i.e., the four pixel pairs ((X1, Y1), (X1, Y2), (X2, Y1), (X2, Y2)) corresponding to the four corners of the rectangular frame, the server may obtain, from each second image frame, the second sub-image corresponding to the first position information. That is, the second sub-image of each second image frame is obtained by cutting out the region of that frame corresponding to the rectangular frame.
In step S502, similarity data corresponding to each second image frame is determined based on the similarity between each second sub-image and the first sub-image.
Optionally, assuming that the first sub-image and the second sub-image are sub-images corresponding to 5 × 5 pixels, the server may determine pixel pairs of pixels in the first sub-image and the second sub-image, so as to obtain 25 pixel pairs. For example, pixels with a row position of 1 and a column position of 1 in the first sub-image and pixels with a row position of 1 and a column position of 1 in the second sub-image form a pixel pair, and pixels with a row position of 5 and a column position of 5 in the first sub-image and pixels with a row position of 5 and a column position of 5 in the second sub-image form a pixel pair.
Then, the server may compare the similarity degrees based on each pixel pair to obtain similarity data corresponding to each pixel pair, and optionally, the similarity data may be percentage data. Further, the server may determine similarity data corresponding to each second image frame based on an average of the similarity data of the 25 pixel pairs.
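The per-pixel-pair comparison and averaging can be sketched as follows. The disclosure does not specify how the similarity of a single pixel pair is scored, so the sketch assumes one possible measure (100% minus the normalized absolute intensity difference); the function name is illustrative.

```python
import numpy as np

def frame_similarity(first_sub, second_sub):
    """Average per-pixel similarity between the first sub-image and one second
    sub-image.  Each pixel pair is scored as a percentage (here: 100% minus
    the normalized absolute intensity difference), then the scores are
    averaged into the similarity data of the second image frame."""
    a = first_sub.astype(np.float32)
    b = second_sub.astype(np.float32)
    per_pixel = 100.0 * (1.0 - np.abs(a - b) / 255.0)
    return float(per_pixel.mean())

# Two 5 x 5 grayscale sub-images -> 25 pixel pairs averaged into one score.
first = np.full((5, 5), 200, dtype=np.uint8)
second = np.full((5, 5), 190, dtype=np.uint8)
print(round(frame_similarity(first, second), 1))  # ~96.1
```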
In step S503, comparing the similarity data corresponding to each second image frame with preset data, and if the similarity data corresponding to each second image frame meets the preset data, going to step S504; otherwise, go to step S505.
In step S504, a video clip is obtained; the first image frame of the video segment is the first image frame.
Optionally, the server may compare the similarity data corresponding to each second image frame with preset data. Assuming the preset data is 80%, if the similarity data corresponding to every second image frame is greater than or equal to the preset data, a single video segment is obtained; that is, the original video is treated as one video segment, and the leading image frame of this video segment is the aforementioned first image frame.
In step S505, if there is a first target image frame set in the second image frame set, and the first target image frame positioned in the first target image frame set is adjacent to the first image frame and positioned behind the first image frame in the original video, a first video segment is determined based on the first image frame and the first target image frame set; the first target image frame set comprises a first target image frame or a plurality of continuous first target image frames; the first target image frame set is not equal to the second image frame set, and the similarity data corresponding to the first target image frame in the first target image frame set meets preset data.
Optionally, the server may compare the similarity data corresponding to each second image frame with the preset data. Assume the preset data is 80%, the first image frame is the first image frame of the original video, and the original video contains 300 image frames. If a first target image frame set (the 2nd through 60th image frames) exists in the second image frame set (the 2nd through 300th image frames), and the first target image frame positioned first in that set (the 2nd image frame) is adjacent to the first image frame and positioned after it in the original video, the server may determine a first video segment based on the first image frame and the first target image frame set.
Wherein the first target image frame set is not equal to the second image frame set. The similarity data (for example, all greater than or equal to 80%) corresponding to the first target image frame in the first target image frame set satisfies the preset data.
In this way, the server can determine the first video clip.
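For illustration, the search for the first target image frame set, i.e. the leading run of second image frames whose similarity data meets the preset data, can be sketched as follows; the frame indexing and function name are assumptions made for the example.

```python
def first_video_segment(similarities, threshold=80.0):
    """Find the first target image frame set: the run of consecutive second
    image frames, starting right after the first image frame, whose
    similarity data meets the preset data (threshold).

    `similarities[i]` is the score of the (i + 2)-th frame of the original
    video (the second image frame set starts at frame 2).  Returns the
    1-based (start, end) frame range of the first video segment."""
    run_length = 0
    for score in similarities:
        if score >= threshold:
            run_length += 1
        else:
            break
    # First segment = first image frame + the leading run of matching frames.
    return (1, 1 + run_length)

# Frames 2-60 match the first sub-image, frame 61 does not:
scores = [95.0] * 59 + [40.0] + [90.0] * 10
print(first_video_segment(scores))  # (1, 60)
```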
In step S506, a difference set between the second image frame set and the first target image frame set is used as a video to be segmented.
Continuing with the above embodiment, the server may use the difference set (the 61st through 300th image frames) between the second image frame set (the 2nd through 300th image frames) and the first target image frame set (the 2nd through 60th image frames) as the video to be segmented.
In step S507, a first image frame of the video to be segmented is taken as a new first image frame; and taking the image frames except the new first image frame in the video to be segmented as a new second image frame set.
Continuing with the above embodiment, the server may use the first image frame of the video to be segmented (the 61st image frame) as the new first image frame, and use the image frames of the video to be segmented other than the new first image frame (the 62nd through 300th image frames) as the new second image frame set.
In step S508, new first position information of the object to be processed on a new first image frame is determined.
Optionally, in this embodiment of the application, the server may determine new first position information of the object to be processed on the new first image frame through the object detection model, where the new first position information may be four pixel pairs ((X3, Y3), (X3, Y4), (X4, Y3), (X4, Y4)) corresponding to four corners of the rectangular frame. Optionally, the object detection model may include, but is not limited to, a deep learning model using a convolutional neural network, a cyclic neural network, or a recurrent neural network.
In step S509, a new first sub-image corresponding to the new first position information is obtained from the new first image frame, and a new second sub-image corresponding to the new first position information is obtained from each second image frame in the new second image frame set.
Alternatively, the server may obtain a new first sub-image corresponding to the new first position information from the new first image frame, and obtain a new second sub-image corresponding to the new first position information from each second image frame in the new second image frame set.
In step S510, similarity data for each new second sub-image is determined based on the degree of similarity of the new first sub-image and each new second sub-image.
Optionally, assuming that the new first sub-image and the new second sub-image are sub-images of 6 × 6 pixels, the server may pair the pixels in the new first sub-image with the pixels in the new second sub-image to obtain 36 pixel pairs. For example, the pixel at row 1, column 1 of the new first sub-image and the pixel at row 1, column 1 of the new second sub-image form one pixel pair, and the pixel at row 6, column 6 of the new first sub-image and the pixel at row 6, column 6 of the new second sub-image form another pixel pair.
Then, the server may compare the two pixels in each pixel pair to obtain similarity data for that pixel pair; optionally, the similarity data may be expressed as a percentage. Further, the server may determine the similarity data corresponding to each new second image frame based on the average of the similarity data of the 36 pixel pairs.
In step S511, if a new first target image frame set exists in the new second image frame set, and the new first target image frame positioned at the first position in the new first target image frame set is adjacent to the new first image frame and positioned after the new first image frame in the original video, a second video segment is determined based on the new first image frame and the new first target image frame set; the new first target image frame set comprises one new first target image frame or a plurality of consecutive new first target image frames; the new first target image frame set is not equal to the new second image frame set, and the similarity data corresponding to each new first target image frame in the new first target image frame set meets the preset data.
Optionally, the server may compare the similarity data corresponding to each new second image frame with the preset data. If a new first target image frame set (the 62nd to 120th image frames) exists in the new second image frame set (the 62nd to 300th image frames), and the new first target image frame (the 62nd image frame) positioned at the first position in the new first target image frame set is adjacent to the new first image frame (the 61st image frame) and positioned after it in the original video, the server may determine a second video segment (the 61st to 120th image frames) based on the new first image frame and the new first target image frame set.
Wherein the new first target image frame set is not equal to the new second image frame set, and the similarity data corresponding to each new first target image frame in the new first target image frame set (for example, all greater than or equal to 80%) satisfies the preset data.
In this way, the server may determine the second video segment. The server may then determine a third video segment, a fourth video segment, and so on, with reference to the determination of the second video segment, until all image frames in the original video are divided into video segments.
As described above, image frames in which the object to be processed is in the same condition are accurately divided into one video segment by means of the similarity data between the first sub-image located at the first position information in the first image frame and the second sub-images corresponding to the first position information in the other image frames, and the original video is thereby divided into a plurality of video segments. Therefore, in the final stage, when the mask of the object to be processed of the key frame of a video segment is used to remove the object to be processed from that video segment, the mask can be quickly positioned at the location of the object to be processed in each image frame of the video segment, and the object to be processed can then be removed from the video segment in a unified manner.
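As an illustration of the procedure of steps S505 to S511 described above, the sketch below treats segmentation as a loop: starting from the first image frame of the (remaining) video, the run of consecutive following frames whose similarity data meets the preset data forms one video segment, and the difference set is then treated as a new video to be segmented. The helper names `detect_position`, `crop` and `similarity`, and the 80% threshold, are assumptions made only for this sketch.

```python
def segment_by_similarity(frames, detect_position, crop, similarity, preset=80.0):
    """Split `frames` into segments of consecutive frames whose sub-image at the
    segment's first-frame object position stays similar to the first sub-image."""
    segments, start = [], 0
    while start < len(frames):
        pos = detect_position(frames[start])          # first position information
        first_sub = crop(frames[start], pos)          # first sub-image
        end = start + 1
        # extend the segment while the similarity data meets the preset data
        while end < len(frames) and similarity(first_sub, crop(frames[end], pos)) >= preset:
            end += 1
        segments.append(frames[start:end])            # e.g. frames 1-60, then 61-...
        start = end                                   # the difference set becomes the new video to be segmented
    return segments
```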
Fig. 6 is a flowchart illustrating a method of segmenting an original video based on similarity data, according to an exemplary embodiment, as shown in fig. 6, including:
in step S601, the first image frame is determined as the current image frame.
The server determines the first image frame as the current image frame. As stated above, the first image frame is the first image frame in the original video, and the original video contains 300 image frames.
In step S602, in the original video, image frames that are a preset interval from the current image frame are determined as execution image frames according to the video playing sequence.
In this embodiment of the application, the server may regard, as the execution image frame, an image frame that is apart from the current image frame by a preset interval in the original video according to the video playing sequence.
Alternatively, to ensure the accuracy of the subsequent video segmentation, the preset interval may be one image frame. That is to say, according to the video playing sequence, the server may regard the image frame that is one image frame away from the current image frame in the original video, that is, the second image frame of the original video, as the execution image frame.
Alternatively, in practical applications, the position of the object to be processed on the image frame in the original video may be constant for a short time. Based on this, in order to guarantee a certain software processing speed, the preset interval may be several image frames apart, such as 5 image frames apart, at the expense of a certain segmentation accuracy. That is, the server may regard, as the execution image frame, an image frame that is five image frames away from the current image frame in the original video, that is, a sixth image frame of the original video, in the video playing order.
In step S603, execution position information in the execution image frame is determined based on the first position information, and image interception is performed on the execution image frame based on the execution position information to obtain an execution sub-image corresponding to the execution position information.
As mentioned above, the resolution of each image frame in the original video is the same, for example, 640 × 320. Assuming that the first position information is two pixel pairs ((X1, Y1), (X2, Y2)) corresponding to two opposite corners of a rectangular frame, the execution position information in the execution image frame may be the two pixel pairs ((X1, Y1), (X2, Y2)) corresponding to the two opposite corners of the same rectangular frame in the execution image frame. Then, the server may perform image interception on the execution image frame based on the execution position information to obtain the execution sub-image corresponding to the execution position information.
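Because all image frames share the same resolution, the same pixel coordinates can be reused to crop the execution sub-image from the execution image frame. A minimal sketch, assuming frames are numpy arrays indexed as (row, column) and the position information is given as two opposite corners:

```python
import numpy as np

def crop_by_position(frame: np.ndarray, position) -> np.ndarray:
    """Image interception: cut out the rectangle given by two opposite corners.

    `position` is assumed to be ((x1, y1), (x2, y2)) in pixel coordinates,
    with x as the column index and y as the row index.
    """
    (x1, y1), (x2, y2) = position
    top, bottom = min(y1, y2), max(y1, y2)
    left, right = min(x1, x2), max(x1, x2)
    return frame[top:bottom, left:right]

# The first sub-image and every execution sub-image are cropped with the same
# position information, so they have identical shapes and can be compared pixel by pixel.
```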
In step S604, similarity data corresponding to the execution sub-image is determined based on the first sub-image and the execution sub-image.
Alternatively, assuming that the first sub-image and the execution sub-image are sub-images of 5 × 5 pixels, the server may pair the pixels in the first sub-image with the pixels in the execution sub-image to obtain 25 pixel pairs. For example, the pixel at row 1, column 1 of the first sub-image and the pixel at row 1, column 1 of the execution sub-image form one pixel pair, and the pixel at row 5, column 5 of the first sub-image and the pixel at row 5, column 5 of the execution sub-image form another pixel pair.
Then, the server may compare the two pixels in each pixel pair to obtain similarity data for that pixel pair; optionally, the similarity data may be expressed as a percentage. Further, the server may determine the similarity data corresponding to the execution sub-image based on the average of the similarity data of the 25 pixel pairs.
In step S605, the similarity data corresponding to the execution sub-image is compared with first preset data; if the similarity data corresponding to the execution sub-image meets the first preset data, the execution image frame is determined as the current image frame and the process goes to step S602; otherwise, the process goes to step S606.
In this embodiment of the present application, the first preset data may be preset, for example, as a similarity of greater than or equal to 95%. That is, if the similarity data is greater than or equal to 95%, it is determined that the similarity between the execution sub-image and the first sub-image is very high, and therefore the execution sub-image contains the same object to be processed as the first sub-image.
Based on this, the first image frame and the execution image frame (the second image frame), or all image frames from the first image frame to the execution image frame (the sixth image frame when the interval is five image frames), can be regarded as image frames of one video segment. It is then determined whether the image frames following the execution image frame also belong to this video segment.
Taking the execution image frame as the second image frame as an example, the server may regard the execution image frame as the current image frame, and then repeat steps S602-S605, that is, regarding an image frame (the third image frame in the original video) in the original video that is one image frame away from the current image frame as the execution image frame in the video order. And acquiring an execution sub-image corresponding to the first position information in the third image frame, determining similarity data based on the first sub-image and the execution sub-image, and if the similarity data is greater than or equal to 95%, determining that the similarity between the execution sub-image and the first sub-image is very high, so that the execution sub-image in the third image frame comprises the same object to be processed as that in the first sub-image.
Based on this, the server may regard the execution image frame (the third image frame) as the current image frame, and continue to repeat steps S602-S605, and the process of repeating steps is referred to above and will not be repeated here.
In step S606, until the similarity data corresponding to the executed sub-image satisfies the second preset data, the first image frame is determined as the start image frame of the first video segment, and the previous image frame of the executed image frame is determined as the end image frame of the first video segment, so as to obtain the first video segment.
Alternatively, corresponding to the first preset data above, the second preset data may be a similarity of less than 95%. For example, if, after several rounds of the above steps, the execution image frame is the thirty-first image frame in the original video, and the similarity data between the execution sub-image corresponding to the thirty-first image frame and the first sub-image is less than 95%, the server may regard the first image frame as the start image frame of the first video segment and the previous image frame of the execution image frame as the end image frame of the first video segment, thereby obtaining the first video segment, which includes 30 image frames in total, from the first image frame to the thirtieth image frame. In this way, the server determines the first video segment, and the position of the object to be processed of each image frame in the first video segment is within the preset position information corresponding to the first video segment, for example the upper left corner (a 5 × 5 region with the upper left corner as a vertex).
In step S607, the execution image frame is determined as the second image frame.
At this time, the execution image frame is a thirty-first image frame, and the server regards the thirty-first image frame as a second image frame.
In step S608, second position information of the object to be processed on the second image frame is determined, and the second image frame is subjected to image capture based on the second position information, so as to obtain a second sub-image corresponding to the second position information.
Alternatively, the server may determine second position information of the object to be processed on the second image frame through the object detection model, where the second position information may be four pixel pairs ((X3, Y3), (X3, Y4), (X4, Y3), (X4, Y4)) corresponding to four corners of the rectangular frame. Optionally, the object detection model may include, but is not limited to, a deep learning model using a convolutional neural network, a cyclic neural network, or a recurrent neural network. And the server may perform image capturing on the second image frame based on the second position information to obtain a second sub-image corresponding to the second position information.
In step S609, the second image frame is determined as the current image frame.
Since the process now enters the loop for determining the second video segment, the server may determine the second image frame as the current image frame.
In step S610, in the original video, image frames that are a preset interval from the current image frame are determined as execution image frames according to the video playing sequence.
In this embodiment, the server may regard, as the execution image frame, an image frame that is one image frame away from the current image frame in the original video according to the video playing sequence. That is, the server may regard the thirty-second image frame as the execution image frame.
In step S611, the execution position information in the execution image frame is determined based on the second position information, and the execution image frame is subjected to image capture based on the execution position information in the execution image frame, so as to obtain an execution sub-image corresponding to the execution position information.
In this embodiment of the application, the server may determine execution position information in the execution image frame based on the second position information, and perform image interception on the execution image frame based on the execution position information in the execution image frame to obtain an execution sub-image corresponding to the execution position information.
In step S612, similarity data corresponding to the execution sub-image is determined based on the second sub-image and the execution sub-image.
Optionally, assuming that the second sub-image and the execution sub-image are sub-images of 5 × 5 pixels, the server may pair the pixels in the second sub-image with the pixels in the execution sub-image to obtain 25 pixel pairs. For example, the pixel at row 1, column 1 of the second sub-image and the pixel at row 1, column 1 of the execution sub-image form one pixel pair, and the pixel at row 5, column 5 of the second sub-image and the pixel at row 5, column 5 of the execution sub-image form another pixel pair.
Then, the server may compare the two pixels in each pixel pair to obtain similarity data for that pixel pair; optionally, the similarity data may be expressed as a percentage. Further, the server may determine the similarity data corresponding to the execution sub-image based on the average of the similarity data of the 25 pixel pairs.
In step S613, the similarity data corresponding to the execution sub-image is compared with the first preset data; if the similarity data corresponding to the execution sub-image meets the first preset data, the execution image frame is determined as the current image frame and the process goes to step S610; otherwise, the process goes to step S614.
In the embodiment of the present application, if the similarity data is greater than or equal to 95%, it is determined that the similarity between the execution sub-image and the second sub-image is very high, and therefore the execution sub-image contains the same object to be processed as the second sub-image.
Based on this, the second image frame (31 st image frame) and the execution image frame (32 nd image frame) can be regarded as image frames in one video clip. Subsequently, it is determined whether the image frame following the execution image frame also belongs to the image frame in the video segment.
Taking the execution image frame as the 32 nd image frame as an example, the server may take the execution image frame as the current image frame, and then repeat steps S610-S613, that is, in the video order, an image frame (the 33 rd image frame in the original video) in the original video that is one image frame away from the current image frame is taken as the execution image frame. And acquiring an execution sub-image corresponding to the second position information in the 33 th image frame, determining similarity data based on the second sub-image and the execution sub-image, and if the similarity data is greater than or equal to 95%, determining that the similarity between the execution sub-image and the second sub-image is very high, so that the execution sub-image in the 33 th image frame comprises the same object to be processed as that in the second sub-image.
Based on this, the server may regard the execution image frame (33 rd image frame) as the current image frame, and continue to repeat steps S610-S613, and the process of repeating steps is referred to above and will not be repeated here.
In step S614, until the similarity data corresponding to the executed sub-image satisfies the second preset data, the second image frame is determined as the start image frame of the second video segment, and the previous image frame of the executed image frame is determined as the end image frame of the second video segment, so as to obtain the second video segment.
Alternatively, corresponding to the first preset data above, the second preset data may be a similarity of less than 95%. For example, if, after several rounds of the above steps, the execution image frame is the 61st image frame in the original video, and the similarity data between the execution sub-image corresponding to the 61st image frame and the second sub-image is less than 95%, the server may regard the second image frame as the start image frame of the second video segment and the previous image frame of the execution image frame as the end image frame of the second video segment, thereby obtaining the second video segment, which includes the 31st to 60th image frames, 30 image frames in total. In this way, the server determines the second video segment, and the position of the object to be processed of each image frame in the second video segment is within the preset position information corresponding to the second video segment, for example the lower right corner (a 5 × 5 region with the lower right corner as a vertex).
In step S615, at least one video segment is obtained until the original video is traversed; the at least one video segment includes a first video segment and a second video segment.
Then, the server may perform the above steps until each image frame in the original video has been traversed, so as to obtain at least one video segment. The first image frame in each video segment may be determined as the key frame of that video segment, or any image frame in each video segment may be determined as its key frame.
As described above, image frames in which the object to be processed is in the same condition are accurately divided into one video segment by means of the similarity data between the first sub-image located at the first position information in the first image frame and the execution sub-images corresponding to the execution position information in the other image frames, and the original video is divided into a plurality of video segments through a large loop nested with a small loop. Therefore, in the final stage, when the mask of the object to be processed of the key frame of a video segment is used to remove the object to be processed from that video segment, the mask can be quickly positioned at the location of the object to be processed in each image frame of the video segment, and the object to be processed can then be removed from the video segment in a unified manner. Moreover, since a large loop nested with a small loop can be realized by loop statements, this embodiment of the present application can implement the function with fewer lines of code.
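As an illustration of the large-loop/small-loop structure of Fig. 6, the sketch below walks the original video with a preset interval: the inner (small) loop extends the current segment while the execution sub-image still meets the first preset data, and the outer (large) loop starts a new segment, and re-detects the object position, as soon as it does not. The interval, thresholds and helper names are assumptions for the sketch only.

```python
def segment_video(frames, detect_position, crop, similarity,
                  first_preset=95.0, interval=1):
    """Fig. 6 style segmentation: an outer (large) loop over segments and an
    inner (small) loop over execution image frames inside one segment."""
    segments, start = [], 0
    while start < len(frames):                         # large loop: one pass per segment
        pos = detect_position(frames[start])           # position info on the segment's start frame
        ref_sub = crop(frames[start], pos)             # reference sub-image of the segment
        current = start
        while True:                                    # small loop: extend the segment
            execute = current + interval               # execution image frame index
            if execute >= len(frames):
                execute = len(frames)
                break
            exec_sub = crop(frames[execute], pos)      # execution sub-image
            if similarity(ref_sub, exec_sub) >= first_preset:
                current = execute                      # execution frame becomes the current frame
            else:
                break                                  # second preset data is met; segment ends
        segments.append((start, execute - 1))          # start / end image frame indices
        start = execute                                # the execution frame starts the next segment
    return segments
```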
In step S203, the object to be processed of the key frame and the position information of the object to be processed of the key frame in each video clip are determined.
In the embodiment of the application, the server may determine the object to be processed of the key frame and the position information of the object to be processed of the key frame in each video clip based on the object detection model.
Fig. 7 is a flowchart illustrating a method for determining a to-be-processed object of a key frame and position information of the to-be-processed object of the key frame according to an exemplary embodiment, as shown in fig. 7, including:
in step S701, a first class object detection is performed on the key frame in each video segment based on a first detection model in the object detection models, and a first class object to be processed and position information of the first class object to be processed of the key frame in each video segment are determined.
In this embodiment, the object detection model may include two modules, which are a first detection model and a second detection model. Wherein the object detection model is a trained model structure. Optionally, the server may invoke an object detection model, perform first class object detection on the key frame in each video segment based on a first detection model in the object detection model, and determine the first class object to be processed of the key frame in each video segment and the position information of the first class object to be processed.
In step S703, second class object detection is performed on the key frame in each video segment based on the second detection model in the object detection model and the position information of the first class object to be processed, and the second class object to be processed and the position information of the second class object to be processed of the key frame in each video segment are determined.
Optionally, the objects to be processed may include different types of objects to be processed, such as a first type of objects to be processed and a second type of objects to be processed.
Since the objects to be processed may include not only the first type of objects to be processed but also the second type of objects to be processed, the second type of object detection may be performed on the key frames in each video clip based on the second detection model in the object detection model, and the second type of objects to be processed and the position information of the second type of objects to be processed of the key frames in each video clip may be determined.
However, if only the second detection model of the object detection models is used to perform second-class object detection on the key frames of each video segment, a second-class object in the key frame that does not belong to the object to be processed may be mistakenly regarded as the second-class object to be processed. For example, when the second class of object to be processed is text, subtitles in the key frame may be detected as text belonging to the object to be processed. Therefore, the server may perform second-class object detection on the key frames in each video segment based on the second detection model in the object detection model and the position information of the first-class object to be processed, and determine the second-class object to be processed and the position information of the second-class object to be processed of the key frame in each video segment.
In the embodiment of the application, the position information of the first-class object to be processed serves to constrain the position information of the second-class object to be processed. Specifically, in the process of detecting second-class objects by the second detection model, if a plurality of second-class objects exist on one key frame, only the second-class objects within a preset distance of the first-class object to be processed are determined as second-class objects to be processed, that is, objects to be removed in the later stage; the other second-class objects do not belong to the objects to be removed, as illustrated by the sketch below.
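The distance constraint can be sketched as a simple filter over detection boxes. The box format, the center-distance measure and the threshold value are assumptions made only for illustration.

```python
def filter_second_class(first_boxes, second_boxes, max_distance=50.0):
    """Keep only second-class detections within a preset distance of some
    first-class detection; the others are not treated as objects to be removed.

    Boxes are assumed to be (x1, y1, x2, y2) rectangles in pixel coordinates.
    """
    def center(box):
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

    kept = []
    for sb in second_boxes:
        sx, sy = center(sb)
        for fb in first_boxes:
            fx, fy = center(fb)
            if ((sx - fx) ** 2 + (sy - fy) ** 2) ** 0.5 <= max_distance:
                kept.append(sb)      # e.g. text close to a logo is kept as part of the object
                break
    return kept
```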
Optionally, the server may determine a set of the first class of objects to be processed and the second class of objects to be processed as the objects to be processed. Optionally, the server may determine a set of the position information of the first type of object to be processed and the position information of the second type of object to be processed as the position information of the object to be processed.
The first detection model may include, but is not limited to, a deep learning model using a convolutional neural network, a cyclic neural network, or a recurrent neural network. Such deep learning models relate to Machine Learning (ML), a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specifically studies how a computer simulates or implements human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental approach to making computers intelligent, and is applied in various fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formula learning. Machine learning can be divided into supervised machine learning, unsupervised machine learning, and semi-supervised machine learning.
Through this embodiment, second-class objects in the key frame that do not belong to the object to be processed are prevented from being treated as second-class objects to be processed, the object to be processed containing both the first-class and second-class objects can be accurately located, and the possibility of errors in identifying the object to be processed is reduced.
In step S205, a mask of the object to be processed for each key frame is generated based on the object to be processed for each key frame and the position information of the object to be processed for each key frame.
In the embodiment of the application, in order to retain more background information, so that the video with the object to be processed removed looks natural and the loss of video information is reduced, the server may perform binarization processing on the pixels of the object to be processed of each key frame based on the position information of the object to be processed of each key frame, so as to obtain the mask of the object to be processed of each key frame.
Alternatively, the position information of the object to be processed may not be a complete rectangular region but a partial region in a complete rectangular region in the key frame. Therefore, the server can determine a rectangular area where the object to be processed is located based on the position information of the object to be processed, and the rectangular area is an area in the key frame.
Then, the server may cut out the rectangular region from the key frame, and determine the position information of the object to be processed on the rectangular region based on the position information of the object to be processed and the position information of the rectangular region on the key frame. The server may then perform binarization processing on the pixels of the rectangular region based on the position information of the object to be processed on the rectangular region, for example, setting the grayscale value of the pixels belonging to the object to be processed to 0 and the grayscale value of the pixels not belonging to the object to be processed to 255. In this way, the server obtains the processed rectangular region, that is, the mask of the object to be processed for each key frame.
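A minimal sketch of the binarization described above, assuming the rectangular region and the object position are given as corner rectangles in key-frame coordinates and that grayscale values of 0 and 255 mark object and background pixels respectively:

```python
import numpy as np

def build_mask(rect, object_box):
    """Create the mask of the object to be processed for one key frame.

    `rect` and `object_box` are (x1, y1, x2, y2) rectangles in key-frame
    coordinates, with `object_box` assumed to lie inside `rect`.
    """
    rx1, ry1, rx2, ry2 = rect
    ox1, oy1, ox2, oy2 = object_box
    mask = np.full((ry2 - ry1, rx2 - rx1), 255, dtype=np.uint8)  # background pixels -> 255
    # translate the object position into the rectangle's local coordinates
    mask[oy1 - ry1:oy2 - ry1, ox1 - rx1:ox2 - rx1] = 0           # object pixels -> 0
    return mask
```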
In step S207, object removal is performed on at least one video clip based on the at least one video clip and the mask of the object to be processed of each key frame, so as to obtain a target video.
In the embodiment of the application, the server can remove the object to be processed from at least one video clip based on the mask of the object to be processed of at least one video clip and each key frame by using the object removal model to obtain the target video; the target video does not contain the object to be processed.
Optionally, the server inputs the mask of the object to be processed of at least one video segment and each key frame into the object removal model, where the mask of the object to be processed carries the position information of the rectangular region in the key frame. And then, on the basis of the object removal model, removing the object to be processed from the video clip corresponding to the key frame by using the mask of the object to be processed of the key frame of each video clip to obtain the target video without the object to be processed.
The object removal model may include, but is not limited to, a deep learning model using a convolutional neural network, a cyclic neural network, or a recurrent neural network. Such deep learning models relate to Machine Learning (ML), a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specifically studies how a computer simulates or implements human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental approach to making computers intelligent, and is applied in various fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formula learning. Machine learning can be divided into supervised machine learning, unsupervised machine learning, and semi-supervised machine learning.
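For illustration, applying the key frame's mask to every frame of its video segment might look like the sketch below. The function `inpaint_frame` stands in for whatever object removal model is used and is purely an assumed interface; because all frames of the segment share the same object position, the same mask and rectangle are reused for each frame.

```python
def remove_object_from_clip(clip_frames, mask, rect, inpaint_frame):
    """Remove the object to be processed from every frame of one video segment.

    `mask` is the key frame's binarized mask and `rect` its (x1, y1, x2, y2)
    position on the key frame.
    """
    x1, y1, x2, y2 = rect
    cleaned = []
    for frame in clip_frames:
        patch = inpaint_frame(frame[y1:y2, x1:x2], mask)  # the model fills the masked area
        out = frame.copy()
        out[y1:y2, x1:x2] = patch
        cleaned.append(out)
    return cleaned
```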
In the embodiment of the present application, the original video may contain a video trailer segment, and the trailer segment may not contain any content, consisting, for example, only of black image frames lasting approximately half a second. If the server acquires a trailer deletion instruction, the server can identify the trailer of the original video and then delete it from the original video.
Fig. 8 is a flowchart illustrating a method of deleting a trailer according to an example embodiment, as shown in fig. 8, including:
in step S801, preset color proportion detection is performed on the original video, so as to obtain proportion data of a preset color of each image frame in the original video.
In the embodiment of the application, the server can perform preset color proportion detection on the original video to obtain proportion data of the preset color of each image frame in the original video. For example, assuming that the preset color is black, the server may perform proportion detection of black pixels on pixels in each image frame to obtain proportion data of the black pixels in each image frame.
In step S803, if the ratio data of the preset colors of a plurality of image frames in the original video satisfies the third preset data, start and end times of the plurality of image frames are determined, wherein the plurality of image frames are consecutive image frames, and the last image frame of the plurality of image frames is a video end image frame.
In this embodiment, if the ratio data of the preset colors of a plurality of image frames in the original video satisfies a third preset data (for example, 80%), where the plurality of image frames are consecutive image frames, and a last image frame of the plurality of image frames is a video end image frame, the server may determine that the plurality of image frames constitute a video end segment. The server may then determine the start-stop time of the video trailer segment.
In step S805, based on the end-of-segment deletion instruction, the plurality of image frames are deleted from the original video based on the start-stop time, resulting in an updated original video.
In the embodiment of the application, the server may delete the plurality of image frames from the original video based on the start-stop time based on the trailer deletion instruction, so as to obtain the updated original video.
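A sketch of the preset-color proportion test for locating the trailer and its start-stop time, assuming grayscale frames, black defined as intensity below a small threshold, and a known frame rate for converting frame indices to times; all thresholds are illustrative. The frames between the returned start and end times are the ones deleted in step S805.

```python
import numpy as np

def find_black_trailer(frames, fps=25.0, ratio_preset=0.8, black_level=10):
    """Return (start_time, end_time) of a run of nearly black frames at the end
    of the video, or None if no such trailer exists."""
    start = len(frames)
    for i in range(len(frames) - 1, -1, -1):           # walk backwards from the last frame
        black_ratio = float(np.mean(frames[i] < black_level))
        if black_ratio >= ratio_preset:                # proportion data meets the preset data
            start = i
        else:
            break
    if start == len(frames):
        return None
    return start / fps, len(frames) / fps              # start-stop time of the trailer segment
```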
Optionally, the deletion of the trailer may be performed before the original video is segmented and the key frame is extracted to obtain at least one video segment and the key frame in each video segment, or may be performed after the original video is segmented and the key frame is extracted to obtain at least one video segment and the key frame in each video segment.
As described above, by locating the trailer formed by the plurality of image frames, the trailer can be deleted according to the trailer deletion instruction, so that detection and removal of the object to be processed no longer need to be performed on the trailer, which saves computing resources.
In summary, according to the present application, the video segments and key frames can be accurately positioned according to the locations where the object to be processed appears in each video segment, and image frames containing the same object to be processed can be accurately divided into one video segment according to the similarity data between image frames. Therefore, in the final stage, when the object to be processed needs to be removed from a video segment by using the mask of the object to be processed of that segment's key frame, the mask can be quickly positioned at the location of the object to be processed in the image frames of the video segment, and the object to be processed can be removed from the video segment in a unified manner. This reduces the computing resources required for removing the object to be processed in subsequent videos, and the method can be applied to videos containing many objects to be processed, so it has high universality.
Fig. 9 is a block diagram illustrating a video processing device according to an example embodiment. The device has the function of realizing the data processing method in the method embodiment, and the function can be realized by hardware or by hardware executing corresponding software. Referring to fig. 9, the apparatus includes:
a segmenting module 901 configured to segment an original video and extract key frames to obtain at least one video segment and key frames in each video segment; the method comprises the steps that position information of an object to be processed contained in image frames in the same video clip on the image frame where the object to be processed belongs meets preset position information; the preset position information is information corresponding to a video clip to which the image frame belongs;
a determining module 902 configured to perform determining an object to be processed of a key frame and position information of the object to be processed of the key frame in each video clip;
a mask generation module 903 configured to perform generating a mask of the object to be processed of each key frame based on the object to be processed of each key frame and the position information of the object to be processed of each key frame;
and an object removing module 904 configured to perform object processing on the at least one video segment based on the at least one video segment and the mask of the object to be processed of each key frame to obtain a target video.
In some possible embodiments, the segmentation module is configured to perform:
carrying out object identification on an original video to obtain the type information of an object to be processed of the original video;
determining a display rule of the object to be processed on the original video based on the type information of the object to be processed;
segmenting an original video based on a display rule to obtain at least one video segment;
and determining a preset image frame in each video clip of the at least one video clip as a key frame in each video clip.
In some possible embodiments, the segmentation module is configured to perform:
determining a display area and a display duration of the object to be processed on the original video based on the type information of the object to be processed;
and segmenting the original video based on the display area and the display duration to obtain at least one video segment.
In some possible embodiments, the segmentation module is configured to perform:
carrying out object recognition on a first image frame in an original video to obtain an object to be processed of the original video; the first image frame is an image frame of an object to be processed appearing for the first time in an original video;
determining first position information of an object to be processed on a first image frame;
image interception is carried out on the first image frame based on the first position information, and a first sub-image corresponding to the first position information is obtained;
determining similarity data corresponding to each second image frame based on the first sub-image and each second image frame in the second image frame set; the second image frame set comprises image frames except the first image frame in the original video;
segmenting the original video based on the similarity data corresponding to each second image frame to obtain at least one video segment;
and determining a preset image frame in each video clip of the at least one video clip as a key frame in each video clip.
In some possible embodiments, the segmentation module is configured to perform:
acquiring a second sub-image corresponding to the first position information in each second image frame;
determining similarity data corresponding to each second image frame based on the similarity degree of each second sub-image and the first sub-image;
if the similarity data corresponding to each second image frame meet preset data, obtaining a video clip; the first image frame of the video clip is the first image frame.
In some possible embodiments, the segmentation module is configured to perform:
if a first target image frame set exists in the second image frame set and a first target image frame positioned at the first position in the first target image frame set is adjacent to the first image frame and positioned behind the first image frame in the original video, determining a first video segment based on the first image frame and the first target image frame set;
the first target image frame set comprises a first target image frame or a plurality of continuous first target image frames; the first target image frame set is not equal to the second image frame set, and the similarity data corresponding to the first target image frame in the first target image frame set meets preset data.
In some possible embodiments, the segmentation module is configured to perform:
taking a difference set between the second image frame set and the first target image frame set as a video to be segmented;
taking a first image frame of a video to be segmented as a new first image frame; taking image frames except the new first image frame in the video to be segmented as a new second image frame set;
determining new first position information of the object to be processed on a new first image frame;
obtaining a new first sub-image corresponding to the new first position information from the new first image frame, and obtaining a new second sub-image corresponding to the new first position information from each second image frame in the new second image frame set;
determining similarity data for each new second sub-image based on the similarity of the new first sub-image and each new second sub-image;
if a new first target image frame set exists in the new second image frame set, and the new first target image frame positioned at the first position in the new first target image frame set is adjacent to the new first image frame and positioned after the new first image frame in the original video, determining a second video segment based on the new first image frame and the new first target image frame set;
the new first target image frame set comprises a new first target image frame or a plurality of consecutive new first target image frames; the new first target image frame set is not equal to the new second image frame set, and the similarity data corresponding to each new first target image frame in the new first target image frame set meets the preset data.
In some possible embodiments, the apparatus further comprises a time determination module configured to perform:
carrying out preset color proportion detection on the original video to obtain proportion data of preset colors of each image frame in the original video;
if the proportion data of the preset colors of a plurality of image frames in the original video meet third preset data, determining the starting and ending time of the plurality of image frames; the plurality of image frames are continuous image frames, and the last image frame of the plurality of image frames is a video end image frame.
In some possible embodiments, the determining module is configured to perform:
performing first-class object detection on key frames in each video clip based on a first detection model in the object detection models, and determining first-class objects to be processed of the key frames in each video clip and position information of the first-class objects to be processed;
and performing second-class object detection on the key frame in each video clip based on a second detection model in the object detection model and the position information of the first-class object to be processed, and determining the second-class object to be processed and the position information of the second-class object to be processed of the key frame in each video clip.
In some possible embodiments, the mask generation module is configured to perform:
and performing binarization processing on the pixels of the object to be processed of each key frame based on the position information of the object to be processed of each key frame to obtain a mask of the object to be processed of each key frame.
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Fig. 10 is a block diagram illustrating an apparatus 3000 for video processing according to an example embodiment. For example, the apparatus 3000 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 10, the apparatus 3000 may include one or more of the following components: processing component 3002, memory 3004, power component 3006, multimedia component 3008, audio component 3010, input/output (I/O) interface 3012, sensor component 3014, and communications component 3016.
The processing component 3002 typically controls the overall operation of the device 3000, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 3002 may include one or more processors 3020 to execute instructions to perform all or part of the steps of the methods described above. Further, processing component 3002 may include one or more modules that facilitate interaction between processing component 3002 and other components. For example, the processing component 3002 may include a multimedia module to facilitate interaction between the multimedia component 3008 and the processing component 3002.
The memory 3004 is configured to store various types of data to support operations at the device 3000. Examples of such data include instructions for any application or method operating on device 3000, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 3004 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 3006 provides power to the various components of the device 3000. The power components 3006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 3000.
The multimedia component 3008 comprises a screen providing an output interface between the device 3000 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, multimedia component 3008 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 3000 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 3010 is configured to output and/or input an audio signal. For example, the audio component 3010 may include a Microphone (MIC) configured to receive external audio signals when the apparatus 3000 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 3004 or transmitted via the communication component 3016. In some embodiments, the audio component 3010 further includes a speaker for outputting audio signals.
I/O interface 3012 provides an interface between processing component 3002 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 3014 includes one or more sensors for providing status assessments of various aspects of the device 3000. For example, the sensor component 3014 can detect the open/closed state of the device 3000 and the relative positioning of components, such as the display and keypad of the apparatus 3000; it can also detect a change in position of the apparatus 3000 or a component of the apparatus 3000, the presence or absence of user contact with the apparatus 3000, the orientation or acceleration/deceleration of the apparatus 3000, and a change in temperature of the apparatus 3000. The sensor component 3014 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 3014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 3014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 3016 is configured to facilitate wired or wireless communication between the apparatus 3000 and other devices. The device 3000 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 3016 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 3016 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 3000 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
Embodiments of the present invention further provide a computer-readable storage medium, which may be disposed in an electronic device to store at least one instruction or at least one program for implementing a video processing method, where the at least one instruction or the at least one program is loaded and executed by the processor to implement the video processing method provided in the foregoing method embodiments.
Embodiments of the present invention also provide a computer program product comprising a computer program stored in a readable storage medium, from which at least one processor of a computer device reads and executes the computer program, causing the computer device to perform the method of any one of the first aspect of the disclosed embodiments.
It should be noted that the order of the above embodiments of the present invention is only for description and does not represent the merits of the embodiments. Specific embodiments have been described above; other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (14)

1. A video processing method, comprising:
segmenting an original video and extracting key frames to obtain at least one video clip and the key frames in each video clip; the position information of an object to be processed contained in image frames in the same video clip on the image frame to which the object to be processed belongs meets preset position information; the preset position information is information corresponding to a video clip to which the image frame belongs;
determining an object to be processed of a key frame in each video clip and position information of the object to be processed of the key frame;
generating a mask of the object to be processed of each key frame based on the object to be processed of each key frame and the position information of the object to be processed of each key frame;
and carrying out object processing on the at least one video clip based on the at least one video clip and the mask of the object to be processed of each key frame to obtain a target video.
2. The video processing method according to claim 1, wherein the segmenting and key frame extracting the original video to obtain at least one video segment and a key frame in each video segment comprises:
carrying out object identification on the original video to obtain the type information of an object to be processed of the original video;
determining a display rule of the object to be processed on the original video based on the type information of the object to be processed;
segmenting the original video based on the display rule to obtain at least one video segment;
determining a preset image frame in each of the at least one video clip as a key frame in each of the at least one video clip.
3. The video processing method according to claim 2, wherein the determining a display rule of the object to be processed on the original video based on the type information of the object to be processed, and segmenting the original video based on the display rule to obtain the at least one video segment comprises:
determining a display area and a display duration of the object to be processed on the original video based on the type information of the object to be processed;
and segmenting the original video based on the display area and the display duration to obtain the at least one video segment.
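(Illustrative note, not part of the claims.) A sketch of the rule-based split in claims 2-3, assuming a hypothetical table that maps an object type to its display region and display duration; the example object types and values are invented for illustration.

```python
# Illustrative sketch only; the rule table, object types and values are hypothetical.
DISPLAY_RULES = {
    "logo":    {"region": (0, 0, 200, 100), "duration_s": None},  # shown for the whole video
    "credits": {"region": None,             "duration_s": 30},    # shown only in the last 30 s
}

def split_by_rule(num_frames, fps, object_type):
    """Return (start_frame, end_frame) pairs cut according to the display rule."""
    rule = DISPLAY_RULES[object_type]
    if rule["duration_s"] is None:
        return [(0, num_frames)]                           # one segment covering the video
    cut = max(0, num_frames - int(rule["duration_s"] * fps))
    return [(0, cut), (cut, num_frames)]                   # split at the display-duration boundary
```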
4. The video processing method according to claim 1, wherein the segmenting of the original video and extracting of key frames to obtain at least one video segment and a key frame in each video segment comprises:
carrying out object identification on a first image frame in the original video to obtain an object to be processed of the original video; the first image frame is an image frame of the original video where the object to be processed appears for the first time;
determining first position information of the object to be processed on the first image frame;
cropping the first image frame based on the first position information to obtain a first sub-image corresponding to the first position information;
determining similarity data corresponding to each second image frame in a second image frame set based on the first sub-image and each second image frame; wherein the second image frame set comprises the image frames of the original video other than the first image frame;
segmenting the original video based on the similarity data corresponding to each second image frame to obtain at least one video segment;
determining a preset image frame in each of the at least one video clip as a key frame in each of the at least one video clip.
5. The video processing method according to claim 4, wherein determining similarity data corresponding to each second image frame based on the first sub-image and each second image frame, and segmenting the original video based on the similarity data corresponding to each second image frame to obtain the at least one video segment comprises:
acquiring a second sub-image corresponding to the first position information in each second image frame;
determining the similarity data corresponding to each second image frame based on the degree of similarity between each second sub-image and the first sub-image;
if the similarity data corresponding to each second image frame meets preset data, obtaining a video clip whose starting image frame is the first image frame.
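(Illustrative note, not part of the claims.) A sketch of the sub-image comparison in claims 4-5, assuming the similarity data is a histogram correlation over the object's region and the preset data is a fixed threshold of 0.9; both the metric and the threshold are assumptions.

```python
# Illustrative sketch only; the histogram metric and the 0.9 threshold are assumptions.
import cv2

def region_similarity(frame_a, frame_b, box):
    """Compare the same rectangular region (x, y, w, h) of two BGR frames."""
    x, y, w, h = box
    a = cv2.cvtColor(frame_a[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    b = cv2.cvtColor(frame_b[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    hist_a = cv2.calcHist([a], [0], None, [64], [0, 256])
    hist_b = cv2.calcHist([b], [0], None, [64], [0, 256])
    return cv2.compareHist(hist_a, hist_b, cv2.HISTCMP_CORREL)

def first_segment(frames, box, threshold=0.9):
    """Grow a clip from the first image frame while the object region stays similar."""
    end = 1
    while end < len(frames) and region_similarity(frames[0], frames[end], box) >= threshold:
        end += 1
    return frames[:end]      # the first video clip; frames[end:] still need segmenting
```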
6. The video processing method according to claim 5, wherein after determining the similarity data corresponding to each second image frame based on the similarity between each second sub-image and the first sub-image, the method further comprises:
if a first target image frame set exists in the second image frame set, the first target image frame at the first position of the first target image frame set is adjacent to the first image frame, and the first target image frame set is located after the first image frame in the original video, determining a first video segment based on the first image frame and the first target image frame set;
the first target image frame set comprises one first target image frame or a plurality of consecutive first target image frames; the first target image frame set is not equal to the second image frame set, and the similarity data corresponding to a first target image frame in the first target image frame set meets the preset data.
7. The video processing method of claim 6, wherein the method further comprises:
taking a difference set between the second image frame set and the first target image frame set as a video to be segmented;
taking a first image frame of the video to be segmented as a new first image frame; taking image frames except the new first image frame in the video to be segmented as a new second image frame set;
determining new first position information of the object to be processed on the new first image frame;
obtaining a new first sub-image corresponding to the new first position information from the new first image frame, and obtaining a new second sub-image corresponding to the new first position information from each second image frame in the new second image frame set;
determining similarity data for each new second sub-image based on the similarity of the new first sub-image and each new second sub-image;
if a new first target image frame set exists in the new second image frame set, the new first target image frame at the first position of the new first target image frame set is adjacent to the new first image frame, and the new first target image frame set is located after the new first image frame in the original video, determining a second video segment based on the new first image frame and the new first target image frame set;
the new first target image frame set comprises a new first target image frame or a plurality of consecutive new first target image frames; the new first target image frame set is not equal to the new second image frame set, and the similarity data corresponding to the first target image frame in the new first target image frame set meets the preset data.
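(Illustrative note, not part of the claims.) A sketch of the iteration in claims 6-7: the frames left over after the first cut become the new video to be segmented, and the process repeats. Here detect_object and cut_first_segment are hypothetical callbacks; the latter plays the role of the per-clip cut shown in the previous sketch.

```python
# Illustrative sketch only; detect_object(frame) -> (x, y, w, h) and
# cut_first_segment(frames, box) -> leading run of frames are hypothetical callbacks.
def segment_all(frames, detect_object, cut_first_segment):
    """Repeatedly cut off the leading segment until no frames remain."""
    segments = []
    remaining = list(frames)
    while remaining:
        box = detect_object(remaining[0])                # new first position information
        segment = cut_first_segment(remaining, box)      # first clip of the remaining video
        if not segment:                                   # safeguard against an empty cut
            segment = remaining[:1]
        segments.append(segment)
        remaining = remaining[len(segment):]              # the difference set is segmented next
    return segments
```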
8. The video processing method according to any of claims 1-7, wherein the method further comprises:
performing preset color proportion detection on the original video to obtain proportion data of preset colors of each image frame in the original video;
if the proportion data of the preset colors of a plurality of image frames in the original video meets third preset data, determining a start time and an end time of the plurality of image frames; wherein the plurality of image frames are consecutive image frames, and the last image frame of the plurality of image frames is the final image frame of the video.
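(Illustrative note, not part of the claims.) A sketch of the colour-proportion check in claim 8, assuming the preset colour is near-black (for example a fade-out before the end of the video) and the third preset data is a 95% pixel proportion; both values are assumptions.

```python
# Illustrative sketch only; the near-black colour and the 0.95 proportion are assumptions.
import numpy as np

def trailing_color_run(frames, proportion=0.95, darkness=16):
    """Return (start_index, end_index) of the trailing run of near-black frames, or None."""
    start = None
    for i, frame in enumerate(frames):
        ratio = np.mean(np.all(frame < darkness, axis=-1))  # share of near-black pixels
        if ratio >= proportion:
            if start is None:
                start = i                                    # a qualifying run begins here
        else:
            start = None                                     # run broken; it must reach the last frame
    return (start, len(frames) - 1) if start is not None else None
```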
9. The video processing method according to claim 1, wherein the objects to be processed include a first class of objects to be processed and a second class of objects to be processed, and the determining the objects to be processed of the key frames and the position information of the objects to be processed of the key frames in each video clip includes:
performing first-class object detection on key frames in each video clip based on a first detection model in object detection models, and determining first-class objects to be processed of the key frames in each video clip and position information of the first-class objects to be processed;
and performing second-class object detection on the key frame in each video clip based on a second detection model in the object detection model and the position information of the first-class object to be processed, and determining the second-class object to be processed and the position information of the second-class object to be processed of the key frame in each video clip.
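(Illustrative note, not part of the claims.) A sketch of the two-stage detection in claim 9, assuming two hypothetical detector callables, the second of which is constrained by the first-stage positions (here simply by discarding boxes that overlap them); the detector interfaces and the overlap rule are assumptions.

```python
# Illustrative sketch only; first_detector and second_detector are hypothetical
# callables returning lists of bounding boxes (x, y, w, h).
def detect_two_classes(key_frame, first_detector, second_detector):
    first_boxes = first_detector(key_frame)          # first-class objects, e.g. logos
    second_boxes = [                                 # second-class objects, constrained by stage one
        box for box in second_detector(key_frame)
        if not any(_overlaps(box, fb) for fb in first_boxes)
    ]
    return first_boxes, second_boxes

def _overlaps(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah
```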
10. The video processing method according to any one of claims 1 to 7 and 9, wherein the generating a mask of the object to be processed of each key frame based on the object to be processed of each key frame and the position information of the object to be processed of each key frame comprises:
performing binarization processing on the pixels of the object to be processed of each key frame based on the position information of the object to be processed of each key frame, to obtain a mask of the object to be processed of each key frame.
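(Illustrative note, not part of the claims.) A sketch of the mask generation in claim 10: a binary image that is white inside the detected object's position and black elsewhere. Using a plain rectangle over the bounding box, rather than per-pixel thresholding of the object, is an assumption.

```python
# Illustrative sketch only; a rectangular mask over each bounding box is an assumption.
import numpy as np

def make_mask(frame_shape, boxes):
    """frame_shape: (H, W) or (H, W, C); boxes: list of (x, y, w, h); returns a uint8 mask."""
    mask = np.zeros(frame_shape[:2], dtype=np.uint8)
    for x, y, w, h in boxes:
        mask[y:y + h, x:x + w] = 255                  # binarized: 255 inside the object, 0 elsewhere
    return mask
```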
11. A video processing apparatus, comprising:
a segmentation module configured to segment an original video and extract key frames to obtain at least one video segment and a key frame in each video segment; wherein the position information, on the image frame to which it belongs, of an object to be processed contained in the image frames of a same video segment meets preset position information, the preset position information being information corresponding to the video segment to which the image frame belongs;
a determining module configured to determine, for the key frame in each video segment, an object to be processed of the key frame and position information of the object to be processed of the key frame;
a mask generation module configured to generate a mask of the object to be processed of each key frame based on the object to be processed of each key frame and the position information of the object to be processed of each key frame;
and an object removal module configured to perform object processing on the at least one video segment based on the at least one video segment and the mask of the object to be processed of each key frame, to obtain a target video.
12. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video processing method of any of claims 1 to 10.
13. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the video processing method of any of claims 1 to 10.
14. A computer program product, characterized in that the computer program product comprises a computer program, the computer program being stored in a readable storage medium, from which at least one processor of a computer device reads and executes the computer program, causing the computer device to perform the video processing method according to any one of claims 1 to 10.
CN202211025874.3A 2022-08-25 2022-08-25 Video processing method and device, electronic equipment and storage medium Pending CN115529483A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211025874.3A CN115529483A (en) 2022-08-25 2022-08-25 Video processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115529483A 2022-12-27

Family

ID=84696852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211025874.3A Pending CN115529483A (en) 2022-08-25 2022-08-25 Video processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115529483A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106454155A (en) * 2016-09-26 2017-02-22 新奥特(北京)视频技术有限公司 Video shade trick processing method and device
CN111526421A (en) * 2019-02-01 2020-08-11 网宿科技股份有限公司 Method for generating video mask information and preventing bullet screen from being shielded, server and client
CN112070047A (en) * 2020-09-15 2020-12-11 北京金山云网络技术有限公司 Video processing method and device and electronic equipment
CN112672033A (en) * 2019-10-15 2021-04-16 中兴通讯股份有限公司 Image processing method and device, storage medium and electronic device
CN114071184A (en) * 2021-11-11 2022-02-18 腾讯音乐娱乐科技(深圳)有限公司 Subtitle positioning method, electronic equipment and medium
CN114598919A (en) * 2022-03-01 2022-06-07 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
TWI777162B (en) Image processing method and apparatus, electronic device and computer-readable storage medium
CN109740516B (en) User identification method and device, electronic equipment and storage medium
US20210089799A1 (en) Pedestrian Recognition Method and Apparatus and Storage Medium
CN111783756B (en) Text recognition method and device, electronic equipment and storage medium
CN110569777B (en) Image processing method and device, electronic device and storage medium
CN110633700B (en) Video processing method and device, electronic equipment and storage medium
CN110472091B (en) Image processing method and device, electronic equipment and storage medium
CN111340733B (en) Image processing method and device, electronic equipment and storage medium
CN109635142B (en) Image selection method and device, electronic equipment and storage medium
CN109543536B (en) Image identification method and device, electronic equipment and storage medium
CN105574857B (en) Image analysis method and device
CN110781957A (en) Image processing method and device, electronic equipment and storage medium
CN112836801A (en) Deep learning network determination method and device, electronic equipment and storage medium
CN112911239B (en) Video processing method and device, electronic equipment and storage medium
CN105354793A (en) Facial image processing method and device
CN108171222B (en) Real-time video classification method and device based on multi-stream neural network
CN109671051B (en) Image quality detection model training method and device, electronic equipment and storage medium
CN104077597A (en) Image classifying method and device
CN111680646A (en) Motion detection method and device, electronic device and storage medium
CN114187498A (en) Occlusion detection method and device, electronic equipment and storage medium
CN110415258B (en) Image processing method and device, electronic equipment and storage medium
CN110619325A (en) Text recognition method and device
CN113506229B (en) Neural network training and image generating method and device
CN118368430A (en) Video data processing method and device, and training method and device for enhanced network
CN110765943A (en) Network training and recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination