CN116017049A - Video processing method and device and electronic equipment - Google Patents

Info

Publication number
CN116017049A
Authority
CN
China
Prior art keywords
original video
frame
target
video frame
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211697993.3A
Other languages
Chinese (zh)
Inventor
刘芳龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211697993.3A
Publication of CN116017049A
Legal status: Pending


Landscapes

  • Television Signal Processing For Recording (AREA)

Abstract

The disclosure provides a video processing method, a video processing device, and electronic equipment, and relates to the field of artificial intelligence, in particular to the technical fields of computer vision, video processing, deep learning, and the like. The specific implementation scheme is as follows: acquiring an original video segment to be processed, and extracting at least one original video frame from the original video segment; performing information detection on the original video frame to obtain an information detection result, where the information detection result characterizes feature information of a target object to be removed from the original video segment; determining, based on the information detection result, at least one original video frame to be processed from the original video segment, where the original video frame to be processed is a video frame of the original video segment that contains the information detection result; performing information removal, based on the information detection result, on the extracted original video frame and the original video frame to be processed, respectively, to obtain multiple target video frames; and merging the multiple target video frames to generate a target video segment.

Description

Video processing method and device and electronic equipment
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to the technical fields of computer vision, video processing, deep learning, and the like.
Background
In the related art, objects in a video, such as icons (logos) and subtitles, are mainly processed with image processing software (PS for short). However, this places excessively high professional demands on operators, and every frame of the video must be processed manually, which consumes substantial manpower and material resources.
Disclosure of Invention
The disclosure provides a video processing method, a video processing device and electronic equipment.
According to an aspect of the present disclosure, there is provided a video processing method including: acquiring an original video segment to be processed, and extracting at least one frame of original video frame from the original video segment; carrying out information detection on the original video frame to obtain an information detection result, wherein the information detection result is used for representing characteristic information of a target object to be removed from the original video segment; determining at least one frame of original video frame to be processed from the original video segment based on the information detection result, wherein the original video frame to be processed is a video frame containing the information detection result in the original video segment; based on the information detection result, respectively carrying out information removal on the extracted original video frame and the original video frame to be processed to obtain a multi-frame target video frame, wherein the target video frame is the video frame from which the target object is removed; and combining the multi-frame target video frames to generate a target video segment.
According to another aspect of the present disclosure, there is provided another video processing method including: displaying an original video segment to be processed on an operation interface, wherein the original video segment comprises at least one frame of original video frame to be extracted; responding to an information removing operation acting on an operation interface, and displaying a target video segment on the operation interface, wherein the target video segment is generated by combining multiple frames of target video frames, the multiple frames of target video frames are obtained by respectively removing information of at least one frame of original video frame and at least one frame of original video frame to be processed, which are extracted based on an information detection result, the information detection result is used for representing characteristic information of a target object to be removed from the original video segment, the at least one frame of original video frame to be processed is determined from the original video segment based on the information detection result, and the original video frame to be processed is a video frame containing the information detection result in the original video segment; and responding to the video editing operation acted on the operation interface, and displaying a video editing result for editing the target video clip on the operation interface.
According to an aspect of the present disclosure, there is provided a video processing apparatus including: the acquisition unit is used for acquiring an original video segment to be processed and extracting at least one frame of original video frame from the original video segment; the detection unit is used for carrying out information detection on the original video frame to obtain an information detection result, wherein the information detection result is used for representing characteristic information of a target object which needs to be removed from the original video segment; the determining unit is used for determining at least one frame of original video frame to be processed from the original video segment based on the information detection result, wherein the original video frame to be processed is a video frame containing the information detection result in the original video segment; the first removing unit is used for respectively removing information from the extracted original video frame and the original video frame to be processed based on the information detection result to obtain a multi-frame target video frame, wherein the target video frame is the video frame from which the target object is removed; and the merging unit is used for merging the multi-frame target video frames to generate a target video segment.
According to an aspect of the present disclosure, there is provided another video processing apparatus including: the first display unit is used for displaying an original video clip to be processed on the operation interface, wherein the original video clip comprises at least one frame of original video frame which needs to be extracted; the second removing unit is used for responding to the information removing operation acted on the operation interface, displaying a target video segment on the operation interface, wherein the target video segment is generated by combining multiple frames of target video frames, the multiple frames of target video frames are obtained by respectively removing information from at least one frame of original video frame and at least one frame of original video frame to be processed, which are extracted based on an information detection result, the information detection result is used for representing characteristic information of a target object to be removed from the original video segment, the at least one frame of original video frame to be processed is determined from the original video segment based on the information detection result, and the original video frame to be processed is a video frame containing the information detection result in the original video segment; and the editing unit is used for responding to the video editing operation acted on the operation interface and displaying a video editing result for editing the target video clip on the operation interface.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video processing method of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the video processing method of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the video processing method of the embodiments of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a video processing method according to an embodiment of the present disclosure;
FIG. 2 (a) is a flow chart of another video processing method according to an embodiment of the present disclosure;
FIG. 2 (b) is a schematic diagram of an operator interface for a product for performing a video processing method according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of a method for intelligently removing icons and subtitles from a video in accordance with an embodiment of the present disclosure;
fig. 4 is a schematic diagram of a video processing apparatus according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another video processing device according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of an electronic device for implementing a video processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
A video processing method according to an embodiment of the present disclosure is described below.
Fig. 1 is a flowchart of a video processing method according to an embodiment of the present disclosure, as shown in fig. 1, the method may include the steps of:
step S102, an original video segment to be processed is obtained, and at least one frame of original video frame is extracted from the original video segment.
In the technical solution provided in the above step S102 of the present disclosure, the original video segment to be processed may be a video segment from which icons are to be erased and subtitles cut, for example, a video with icons and subtitles downloaded from a video website or a short-video platform. At least one original video frame may be extracted from the original video segment to be processed, and the information detection results of the other original video frames are then determined from the information detection results of the extracted original video frames, so as to save detection resources and improve detection efficiency.
Optionally, the number of original video frames extracted from the original video segment may be an empirical value, for example, 3 frames per second, that is, 3 frames are extracted from each second of the original video segment for processing. The value of 3 frames per second is chosen because the information detection results of those 3 original video frames can, to the greatest extent, cover the information detection result of the entire 1-second video segment. The number of extracted original video frames may differ in different application scenarios; 3 frames per second is only an example and is not specifically limited here.
Alternatively, the original video frames extracted from the original video clip may be frames separated by a certain interval. For example, for a clip at 25 frames per second from which 3 frames are extracted, the extracted frames may be the 1st, 16th, and 25th original video frames. This is only an example and is not specifically limited here.
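The per-second sampling described above can be written as a small helper. This is an illustrative sketch only: the function name and the evenly-spaced picking rule are assumptions, since the disclosure fixes neither the spacing rule nor the sampling rate.

```python
def sample_frame_indices(total_frames: int, fps: int, per_second: int = 3) -> list[int]:
    """Pick `per_second` evenly spaced frame indices from each 1-second window.

    Illustrative only: the disclosure treats 3 frames/second as an
    empirical example, not a fixed rule.
    """
    indices = []
    for start in range(0, total_frames, fps):
        window = min(fps, total_frames - start)  # frames in this 1-second window
        if window <= per_second:
            picks = list(range(window))  # short tail window: take every frame
        else:
            step = (window - 1) / (per_second - 1)
            picks = sorted({round(k * step) for k in range(per_second)})
        indices.extend(start + p for p in picks)
    return indices
```

For a 25 fps clip this yields frames 0, 12, and 24 of each second (0-based); the 1st, 16th, and 25th frames mentioned above are simply another valid choice of interval.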
Step S104, information detection is carried out on the original video frame, and an information detection result is obtained.
In the technical solution provided in step S104 of the present disclosure, the information detection result may be used to characterize the feature information of the target object that needs to be removed from the original video clip. The target object may be an icon and/or a subtitle to be removed from the original video clip, and the feature information may include icon information of the icon to be removed from the original video clip and subtitle information of the subtitle to be removed from the original video clip.
Optionally, after at least one frame of the original video frame is extracted from the original video segment, information detection may be performed on the at least one extracted frame of the original video frame to obtain an information detection result, where the information detection may include icon detection and/or subtitle detection.
Alternatively, when the information detection includes icon detection and the target object includes an icon that needs to be removed from the original video clip, the information detection result may include an icon detection result of the original video frame; when the information detection includes subtitle detection and the target object includes a subtitle to be removed from the original video clip, the information detection result may include a subtitle detection result; and when the information detection includes both icon detection and subtitle detection, and the target object includes an icon and a subtitle that need to be removed from the original video clip, the information detection result may include both an icon detection result and a subtitle detection result of the original video frame.
Optionally, when information detection is performed on an original video frame, icon detection may be performed through an icon detection model and subtitle detection through a subtitle detection model; the two detections may be performed in either order, or even synchronously, so as to improve the speed of information detection on the original video frame. However, when the icons and subtitles are actually removed, the icons may be erased first and the subtitles cut afterwards, in order to avoid changing the proportion of the video pictures of the video clip; the order of icon detection and subtitle detection itself is therefore not specifically limited.
The icon detection model and the subtitle detection model are models obtained, based on a deep learning framework, by training a target detection model (PP-YOLO) on specific data, using pre-labeled icon and subtitle data together with detection-box information.
Alternatively, the information detection result may be used to indicate whether the original video frame contains an icon and/or a subtitle. For example, the information detection result may indicate that the original video frame contains no icon, contains no subtitle, contains neither an icon nor a subtitle, contains an icon, contains a subtitle, or contains both an icon and a subtitle. These are only examples and are not limiting.
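The per-frame information detection result enumerated above can be modeled as a small record. The following dataclass is an illustrative assumption; the disclosure does not define a concrete data structure:

```python
from dataclasses import dataclass, field

@dataclass
class DetectionResult:
    """Information detection result for one original video frame.

    Boxes are (x, y, w, h) tuples; empty lists mean "not detected".
    Illustrative only -- field names are assumptions, not from the patent.
    """
    icon_boxes: list = field(default_factory=list)
    subtitle_boxes: list = field(default_factory=list)

    @property
    def has_icon(self) -> bool:
        return bool(self.icon_boxes)

    @property
    def has_subtitle(self) -> bool:
        return bool(self.subtitle_boxes)
```

The cases listed above (icon only, subtitle only, both, neither) map onto the truth values of `has_icon` and `has_subtitle`.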
Step S106, determining at least one frame of the original video frame to be processed from the original video segments based on the information detection result.
In the technical solution provided in step S106 of the present disclosure, the original video segment may include multiple original video frames. According to the information detection result of the at least one original video frame extracted in step S102, at least one original video frame to be processed may be determined from the original video segment, where the original video frame to be processed may be a video frame of the original video segment containing the information detection result; such a video frame may contain an icon, a subtitle, or both an icon and a subtitle, which is not limited here.
Optionally, the information detection result of the at least one original video frame to be processed may be the same as that of the at least one original video frame extracted in step S102, so that the remaining original video frames may be processed based on the information detection result of the previously extracted original video frames, achieving the technical effect of improving the efficiency of information detection on the original video segment.
For example, if the information detection result of the original video frames extracted in step S102 is that the 1st original video frame contains an icon to be removed from the original video segment and the 5th original video frame contains an icon to be removed from the original video segment, it may be determined that the original video frames between the 1st and the 5th also contain that icon; that is, the original video frames between the 1st and the 5th are the original video frames to be processed in the original video segment.
It should be noted that the foregoing is merely illustrative; the specific implementation of determining the original video frame to be processed from the original video segment is not limited here, and other implementations capable of determining the original video frame to be processed from the original video segment all fall within the scope of the present disclosure.
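The determination step can be sketched as follows: a minimal, assumed implementation in which a frame lying between two extracted frames that both contain the target object inherits their detection result (the function name and rule are illustrative, not mandated by the disclosure):

```python
def frames_to_process(sampled: dict) -> set:
    """Given {extracted frame index: detected?}, return every frame index
    that carries the detection result: the detected extracted frames plus
    all frames lying between two consecutive detected extracted frames.

    Illustrative sketch; the disclosure allows other determination rules.
    """
    keys = sorted(sampled)
    marked = {i for i in keys if sampled[i]}
    for a, b in zip(keys, keys[1:]):
        if sampled[a] and sampled[b]:
            # Frames between two detected extracted frames are assumed
            # to contain the target object as well.
            marked.update(range(a + 1, b))
    return marked
```

With the example above (the 1st and 5th frames both containing the icon), frames 2 to 4 become original video frames to be processed.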
Step S108, based on the information detection result, respectively carrying out information removal on the extracted original video frame and the original video frame to be processed to obtain a multi-frame target video frame.
In the technical solution provided in step S108 of the present disclosure, whether the extracted at least one original video frame contains an icon and/or a subtitle may be determined according to its information detection result from step S102. When it does, the original video frame to be processed determined in step S106 and the extracted original video frame are each subjected to information removal to obtain multiple target video frames, where a target video frame is a video frame from which the target object, for example an icon and/or a subtitle, has been removed. Performing information removal on the remaining original video frames based on the information detection result of the extracted frames achieves the technical effect of improving the efficiency of information removal on the original video segment.
Optionally, if the information detection result indicates that the extracted original video frames contain an icon, the icons in the extracted original video frames and in the original video frames to be processed may be erased to obtain multiple target video frames with the icon erased. If the information detection result indicates that the extracted original video frames contain subtitles, the subtitles in the extracted original video frames and in the original video frames to be processed may be cut to obtain multiple target video frames with the subtitles cut. If the information detection result indicates that the extracted original video frames contain both an icon and subtitles, the icons in the extracted original video frames and in the original video frames to be processed may be erased first, and the subtitles then cut, to obtain multiple target video frames with the icons and subtitles removed, so that persons and important text information in the original video segment are not cut away.
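The removal order described above (erase the icon in place, then crop the subtitle band) can be sketched on a toy grayscale frame. The zero-fill "erase" below is purely an assumed stand-in for a real inpainting model, and all names are illustrative:

```python
def remove_icon_then_crop(frame, icon_box=None, subtitle_rows=None):
    """frame: 2D list of pixel values (rows of columns).

    Erasing happens before cropping so that the icon box coordinates,
    detected on the uncropped frame, remain valid. Zero-fill is a
    placeholder for real inpainting.
    """
    frame = [row[:] for row in frame]  # work on a copy
    if icon_box is not None:
        x, y, w, h = icon_box  # (left, top, width, height)
        for r in range(y, y + h):
            for c in range(x, x + w):
                frame[r][c] = 0  # "erase" the icon pixel
    if subtitle_rows is not None:
        top, bottom = subtitle_rows  # rows [top, bottom) hold the subtitle
        frame = frame[:top] + frame[bottom:]  # crop the subtitle band away
    return frame
```

On a 4x4 frame with an icon at the top-left and a subtitle in the last row, the result is a 3-row frame with the icon region blanked, which matches the erase-then-crop order above.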
Step S110, combining the multi-frame target video frames to generate a target video segment.
In the technical solution provided in step S110 of the present disclosure, the multi-frame target video frames obtained after removing the icons and/or the subtitles may be combined to generate a target video segment, where the target video segment may be a video segment after removing the icons and/or the subtitles.
Optionally, the target video clip may be used as a video material for secondary creation of the user, that is, the user may implement secondary creation of the video by re-adding subtitles and dubbing, which is only illustrated herein and not specifically limited.
Through the above steps S102 to S110, an original video segment to be processed is acquired, and at least one original video frame is extracted from it; information detection is performed on the extracted original video frames to obtain an information detection result; at least one original video frame to be processed is determined from the original video segment based on the information detection result; based on the information detection result, information removal is performed on the extracted original video frames and the original video frames to be processed, respectively, to obtain multiple target video frames; and the target video frames are merged to generate a target video segment. That is, in the embodiment of the present disclosure, information detection is performed on original video frames extracted from the original video segment, information removal is then driven by the information detection result, and finally the target video frames obtained after information removal are merged into the target video segment. The information detection result of the extracted frames thereby serves to remove information from the other original video frames of the segment, which solves the technical problem of low efficiency of information removal from video segments and achieves the technical effect of improving that efficiency.
The above-described method of this embodiment is described in further detail below.
As an optional implementation manner, step S106, determining, based on the information detection result, at least one original video frame to be processed from the original video segment, includes: in the original video clip, determining an original video frame having an association relationship with the extracted original video frame as an original video frame to be processed containing the information detection result.
In this embodiment, an original video frame of the original video clip that has an association relationship with an extracted original video frame containing the information detection result may be determined as an original video frame to be processed containing the information detection result. In this way the information detection results of the remaining original video frames are determined from those of the previously extracted frames, achieving the technical effect of improving the efficiency of information detection on the original video clip. Here, an original video frame having an association relationship with the extracted original video frames may be a video frame lying between two adjacent frames among the multiple extracted original video frames.
For example, the extracted original video frames may be the 1st, 3rd, and 5th video frames. If the detection result is that the 1st video frame contains no icon while the 3rd and 5th video frames contain an icon, the original video frame to be processed may be determined to be the video frame between the 3rd and the 5th, that is, the 4th video frame, which has a high probability of also containing the icon.
It should be noted that the foregoing is only one embodiment of determining the original video frame to be processed from the original video segment; the specific method is not limited here, and other methods of determining the original video frame to be processed from the original video segment also fall within the scope of the present disclosure.
As an alternative embodiment, determining, in the original video clip, the original video frame having an association relationship with the extracted original video frame as the original video frame to be processed containing the information detection result includes: among the original video frames of the original video segment, determining the original video frames separated from the extracted original video frames by a number of frames smaller than a frame-number threshold as the original video frames to be processed containing the information detection result.
In this embodiment, an original video frame of the original video clip whose separation from the extracted original video frames is smaller than the frame-number threshold may be determined as an original video frame to be processed containing the information detection result, so that the information detection results of the remaining original video frames are determined from those of the previously extracted frames, achieving the technical effect of improving the efficiency of information detection on the original video clip. Here, the frame-number threshold may be the number of frames separating the extracted original video frames. For example, if the extracted original video frames containing the information detection result are the 1st and the 6th video frames, the number of frames between them is 4; with a frame-number threshold of 4, all 4 video frames between the 1st and the 6th video frames may be original video frames to be processed. This is only an example and is not specifically limited here.
Alternatively, the information detection result of the original video frame to be processed may directly adopt that of the extracted original video frames. For example, if the information detection result of the extracted frames is that the 1st original video frame contains an icon to be removed from the original video segment and the 4th original video frame contains an icon to be removed from the original video segment, it may be determined that the information detection result of the original video frames to be processed between the 1st and the 4th also contains that icon. This is only an example and is not limiting.
Alternatively, if the extracted original video frames containing the information detection result are adjacent, for example the 1st and the 2nd video frames with 0 frames between them, there is no original video frame to be processed that can directly adopt the information detection result of the extracted frames. This is only an example and is not specifically limited here.
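The frame-number-threshold rule can be sketched in the same illustrative style; the function name and the use of `<=` for the comparison are assumptions layered on the example above:

```python
def gap_frames_to_process(a: int, b: int, threshold: int) -> list[int]:
    """Frames strictly between extracted frames `a` and `b` (both of which
    contain the information detection result) count as original video
    frames to be processed when their number does not exceed the
    frame-number threshold.

    Illustrative sketch of the rule described above, not the patent's code.
    """
    gap = list(range(a + 1, b))
    return gap if 0 < len(gap) <= threshold else []
```

With extracted frames 1 and 6 and a threshold of 4, the four in-between frames 2 to 5 are to be processed; adjacent extracted frames (0 frames between) yield none, matching the adjacent-frame case above.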
As an optional implementation manner, step S108, performing information removal on the extracted original video frame and the original video frame to be processed based on the information detection result to obtain a plurality of target video frames, includes: in response to the information detection result being an icon feature of a target icon, respectively performing icon erasing on the extracted original video frame and the original video frame to be processed to obtain the target video frames.
In this embodiment, if the information detection result is the icon feature of the target icon, it may be determined that the original video frame contains the target icon; the target icon may then be erased from the extracted original video frame and from the original video frame to be processed, respectively, to obtain the target video frames, achieving the technical effect of removing the icon from the original video segment. Here, the target video frame may be the video frame with the target icon removed, and the icon feature may be used to identify the target icon in the original video frame; for example, the icon feature may be the color, icon style, or icon content of the target icon.
It should be noted that, in the embodiment of the present disclosure, the above icon feature is only one way of identifying the target icon; any method that can identify the target icon falls within the scope of protection of this embodiment, and such methods are not enumerated here.
As an optional implementation manner, step S108, based on the information detection result, respectively performing information removal on the extracted original video frame and the original video frame to be processed to obtain the multi-frame target video frames includes: in response to the information detection result being the subtitle feature of the target subtitle, respectively cropping the subtitle regions of the extracted original video frame and the original video frame to be processed to obtain the target video frames, where the target video frame is the video frame from which the target subtitle is removed.
In this embodiment, if the information detection result is the subtitle feature of the target subtitle, it may be determined that the original video frame contains the target subtitle, and the subtitle region of the target subtitle in the extracted original video frame and in the original video frame to be processed may then be cropped to obtain the target video frames, thereby achieving the technical effect of removing the subtitle from the original video segment. The target video frame may be the video frame from which the target subtitle has been removed, and the subtitle feature may be used to identify the target subtitle in the original video frame; for example, the subtitle feature may be information such as the font style or the text content of the target subtitle.
It should be noted that, in the embodiment of the present disclosure, the above caption feature is only one implementation manner for identifying the target caption, and any method that may be used for identifying the target caption is within the scope of protection of the embodiment, which is not illustrated herein.
As an optional implementation manner, step S108, in this embodiment, based on the information detection result, respectively performing information removal on the extracted original video frame and the original video frame to be processed to obtain the multi-frame target video frames includes: in response to the information detection result being both the icon feature of the target icon and the subtitle feature of the target subtitle, respectively performing icon erasing on the extracted original video frame and the original video frame to be processed to obtain multi-frame intermediate target video frames; and cropping the subtitle region of the target subtitle in the intermediate target video frames to obtain the target video frames.
In this embodiment, when the information detection result includes both the icon feature of the target icon and the subtitle feature of the target subtitle, icon erasing may first be performed on the extracted original video frame and the original video frame to be processed respectively to obtain the multi-frame intermediate target video frames, and the subtitle region of the target subtitle in the intermediate target video frames may then be cropped to obtain the target video frames. This avoids the situation in which the size of a video frame whose subtitle has been cropped is inconsistent with the size assumed by the icon erasing, thereby achieving the technical effect of ensuring the display effect of the video segment after the icon and the subtitle are removed. The intermediate target video frame may be the video frame from which the target icon has been removed, and the target video frame may be the video frame from which both the target icon and the target subtitle have been removed.
Optionally, in the case that the information detection result is the subtitle feature of the target subtitle, the information removal operation adopted is to crop the subtitle region of the target subtitle. To ensure the playing effect of the video segment, after the subtitle region is cropped, the video frame is usually also cropped on the left and right; at this point the proportion of the video frame changes, and part of a target icon originally present on the video frame may be cut off. However, removal of the target icon is an operation performed at the original proportion of the video frame. Therefore, if the information detection result includes both the icon feature of the target icon and the subtitle feature of the target subtitle, and the subtitle region of the target subtitle were cropped first and the icon erased afterwards, the size of the cropped video frame would be inconsistent with that expected by the icon erasing, seriously affecting the display effect of the video segment. In the steps of the present disclosure, the icon erasing is performed first, and the subtitle region of the target subtitle is cropped afterwards, thereby avoiding the size inconsistency and ensuring the display effect of the video segment after the icon and the subtitle are removed.
It should be noted that, in the case that the information detection result includes both the icon feature of the target icon and the subtitle feature of the target subtitle, the method of first performing the icon erasing and then cropping the subtitle region is only a preferred embodiment, and the sequence of the icon erasing and the subtitle region cropping is not specifically limited herein; that is, any method and process for implementing the icon erasing and the subtitle region cropping by changing their sequence are within the protection scope of the present disclosure, and are not illustrated herein one by one.
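The erase-then-crop ordering described above can be sketched in a few lines of Python. The zero-fill erase (a stand-in for real in-painting), the (x, y, w, h) box layout, and the 0.38 crop ratio are illustrative assumptions, not the disclosure's actual implementation:

```python
def erase_icon(frame, box):
    """Erase the icon region by filling it with a flat placeholder value.

    A real system would use in-painting; zero-fill is an illustrative
    stand-in. `frame` is a list of pixel rows, `box` is (x, y, w, h).
    """
    x, y, w, h = box
    out = [row[:] for row in frame]
    for r in range(y, y + h):
        for c in range(x, x + w):
            out[r][c] = 0
    return out

def crop_subtitle(frame, subtitle_ratio=0.38):
    """Crop away the bottom `subtitle_ratio` portion of the frame height."""
    keep = int(len(frame) * (1 - subtitle_ratio))
    return frame[:keep]

def remove_icon_and_subtitle(frame, icon_box, subtitle_ratio=0.38):
    # Erase first, at the original frame size, then crop: cropping first
    # would change the frame proportion and invalidate the icon box
    # coordinates, which were detected at the original proportion.
    return crop_subtitle(erase_icon(frame, icon_box), subtitle_ratio)
```

Because the erase runs before the crop, every output frame ends up with the same dimensions, which is the size-consistency property this embodiment relies on.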
As an optional implementation manner, step S108, based on the information detection result, respectively performing information removal on the extracted original video frame and the original video frame to be processed to obtain the multi-frame target video frames includes: in response to detecting that the display area of the target object does not display the target content, respectively removing information from the extracted original video frame and the original video frame to be processed based on the information detection result to obtain the multi-frame target video frames.
In this embodiment, the target content may be the picture content of the original video segment, for example, the characters and important text information in the original video segment. When it is detected that the display area of the target object on the video picture of the original video segment does not display the target content, information may be removed from the extracted original video frame and the original video frame to be processed according to the information detection result to obtain the multi-frame target video frames, thereby achieving the technical effect of cropping the subtitle region while ensuring that the characters and important text information in the picture are not cut off.
Alternatively, the target content may be detected by a face detection model and a text detection model; if no face or text is detected in the display area of the target object on the video picture of the original video segment, it may be determined that the display area of the target object does not display the target content. It is to be noted that this is only a preferred embodiment of detecting the target content; any method and process for detecting the target content to determine that the display area of the target object on the video picture of the original video segment does not display the target content are within the protection scope of the present disclosure, and are not illustrated herein.
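A minimal sketch of this guard, assuming the face and text detectors return axis-aligned (x, y, w, h) boxes — a hypothetical interface chosen for illustration:

```python
def safe_to_crop(subtitle_box, detected_boxes):
    """Return True when no detected face/text box overlaps the subtitle
    region, i.e. cropping the region would not cut important content.

    All boxes are (x, y, w, h) tuples; `detected_boxes` would come from
    face and text detection models in a real pipeline.
    """
    def overlaps(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    return not any(overlaps(subtitle_box, d) for d in detected_boxes)
```

The crop of the subtitle region is then performed only when `safe_to_crop` reports the region clear, matching the guard described in this embodiment.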
As an optional implementation manner, step S102, extracting at least one frame of an original video frame from an original video clip includes: at least one original video frame is extracted from a plurality of original video frames in each time period in the original video clip.
In this embodiment, the video duration of the original video segment may include a plurality of time periods, and at least one frame of the original video frame may be extracted from the plurality of original video frames in each time period in the original video segment, where each time period includes the same number of video frames; for example, the time period may be 1 second, and 1 second may include 25 frames of video, which is only for illustration, and the time period is not specifically limited herein.
As an optional implementation manner, step S104, performing information detection on the original video frame to obtain the information detection result includes: determining at least one target area in the video picture of the extracted original video frame; and performing information detection on the target area to obtain the information detection result.
In this embodiment, when information detection is performed on an original video frame, at least one target area may be determined in the video picture of the extracted original video frame according to priori knowledge, and information detection may then be performed on the target area to obtain the information detection result, so as to exclude unnecessary detection areas and save detection time, thereby achieving the technical effect of improving the efficiency of information detection. The target area may be the area in the video picture where, as predicted according to the priori knowledge, an icon or subtitle is located.
It should be noted that, the method for determining the target area in the video frame of the extracted original video frame according to the prior knowledge is only a preferred embodiment, and the method for determining the target area is not specifically limited herein, and any method and process for determining the target area in the video frame of the extracted original video frame are within the scope of the present disclosure, which is not illustrated herein.
As an alternative embodiment, determining at least one target area in the video frame of the extracted original video frame includes: in the video picture of the extracted original video frame, a corner area associated with the target icon or an edge area associated with the target subtitle is determined.
In this embodiment, when the target object is the target icon, a corner area associated with the target icon may be determined in the video picture of the extracted original video frame and determined as the target area, for example, the four corner areas each of 1/3 of the width and height of the video picture, where the corner areas may be the four corners of the video picture, including the upper left corner, the lower left corner, the upper right corner, and the lower right corner. When the target object is the target subtitle, an edge area associated with the target subtitle may be determined in the video picture of the extracted original video frame and determined as the target area, for example, the edge area below the video picture accounting for 0.38 of the height of the entire video picture, where the edge areas may be the upper area and the lower area of the video picture. It should be noted that the above 1/3 and 0.38 are only illustrative and are not specifically limited herein.
Optionally, by determining the corner area associated with the target icon or the edge area associated with the target subtitle in the video frame of the extracted original video frame by the method, unnecessary detection areas can be eliminated when the original video frame is subjected to information detection, so that the purpose of saving detection time is achieved, and the technical effect of improving the detection efficiency of information detection is achieved.
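The prior-knowledge regions can be computed as below; the 1/3 and 0.38 ratios are the example values from this embodiment, and the (x, y, w, h) box convention is an assumption made for illustration:

```python
def candidate_regions(width, height, corner_ratio=1 / 3, subtitle_ratio=0.38):
    """Return the candidate detection regions as (x, y, w, h) boxes.

    `corner_ratio` bounds the four corner areas searched for icons;
    `subtitle_ratio` is the bottom band searched for subtitles. Both
    defaults are the illustrative values given in the disclosure.
    """
    cw, ch = int(width * corner_ratio), int(height * corner_ratio)
    corners = [
        (0, 0, cw, ch),                     # upper left
        (width - cw, 0, cw, ch),            # upper right
        (0, height - ch, cw, ch),           # lower left
        (width - cw, height - ch, cw, ch),  # lower right
    ]
    sub_h = int(height * subtitle_ratio)
    subtitle = (0, height - sub_h, width, sub_h)
    return corners, subtitle
```

Restricting detection to these regions is what excludes the unnecessary detection areas mentioned above and saves detection time.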
As an optional implementation manner, step S104, performing information detection on the original video frame to obtain an information detection result, includes: and inputting the extracted original video frame into an information detection model to carry out information detection, so as to obtain an information detection result.
In this embodiment, when information detection is performed on an original video frame, the extracted original video frame may be input into an information detection model to perform information detection and obtain the information detection result, so as to achieve the technical effect of performing information detection on the original video frame. The information detection model may be a model obtained by training an initial detection model based on video frame samples and feature information samples of icons and subtitles marked in advance in the video frame samples, and the initial detection model may be a PP-YOLO model, which is not specifically limited herein.
Optionally, the information detection model may be used to detect the icon and the subtitle in the original video frame at the same time, and the information detection model may also include an icon detection model and a subtitle detection model, where the icon detection model is used to detect the icon in the original video frame, and the subtitle detection model is used to detect the subtitle in the original video frame, where the type of the information detection model is not specifically limited, and any type of the information detection model used to detect the icon and the subtitle in the original video frame is within the scope of the disclosure, which is not illustrated one by one.
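As a sketch of the second variant above, a combined information detection model could simply wrap an icon detection model and a subtitle detection model; both sub-models here are placeholder callables standing in for trained PP-YOLO detectors, and the dict result shape is an assumption for illustration:

```python
class InformationDetector:
    """Hypothetical wrapper pairing an icon detector with a subtitle
    detector. Each sub-detector is any callable taking a frame and
    returning a list of detection boxes (empty when nothing is found).
    """

    def __init__(self, icon_model, subtitle_model):
        self.icon_model = icon_model
        self.subtitle_model = subtitle_model

    def detect(self, frame):
        # Run both detectors on the same frame and bundle the results
        # into one information detection result.
        return {
            "icons": self.icon_model(frame),
            "subtitles": self.subtitle_model(frame),
        }
```

The single-model variant would collapse the two calls into one detector trained on both label types; the wrapper form merely makes the two-model variant explicit.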
As an optional implementation manner, in step S102, a target frame number is spaced between every two adjacent original video frames in the extracted multi-frame original video frames.
In this embodiment, a target frame number is spaced between every two adjacent original video frames in the multi-frame original video frames extracted from the original video segment, so as to improve the coverage, in the original video segment, of the information detection results of the extracted original video frames, so that the obtained information detection results are more comprehensive, thereby achieving the technical effects of saving detection resources and improving the efficiency of information detection.
Alternatively, if the target frame number spaced between every two adjacent original video frames in the extracted multi-frame original video frames were zero, that is, if the extracted multi-frame original video frames were adjacent original video frames in the original video segment, the information detection results of the other original video frames could not be determined from the information detection results of the extracted original video frames. For example, suppose 1 second of the original video segment contains 10 frames, the original video frames extracted from the segment are the 4th and 5th original video frames, and the information detection result of each is an icon that needs to be removed from the segment; since the detection results of adjacent video frames are essentially unchanged, only the information detection results of the 4th to 5th original video frames can be determined on this basis, and the results of the original video frames other than the 4th and 5th cannot be confirmed. By spacing a target frame number between every two adjacent extracted original video frames, the method not only avoids the problem that the information detection results of other video frames cannot be inferred because adjacent video frames carry unchanged detection results, but also improves the coverage of the information detection results of the extracted original video frames in the original video segment, so that the obtained information detection results are more comprehensive, detection resources are saved, the efficiency of information detection is improved, and the information removal operation applied to the other video frames is more accurate.
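The coverage argument above can be made concrete: when two sampled frames share a detection result, the frames between them are assumed to carry it as well, while adjacent samples cover only themselves. A minimal sketch (the dict-based representation is an illustrative choice, not part of the disclosure):

```python
def frames_covered(sampled, results):
    """Propagate detection results from sampled frames to the frames
    between them.

    `sampled` is a sorted list of 1-based frame indices; `results` the
    corresponding detection results. When two consecutive sampled frames
    share a result, every intermediate frame is assumed to carry it too;
    these become the "original video frames to be processed".
    """
    covered = {}
    pairs = list(zip(sampled, results))
    for (f0, r0), (f1, r1) in zip(pairs, pairs[1:]):
        covered[f0] = r0
        covered[f1] = r1
        if r0 == r1:
            for f in range(f0 + 1, f1):
                covered[f] = r0
    return covered
```

With spaced samples such as frames 1 and 16 sharing an icon result, all sixteen frames are covered; with the adjacent frames 4 and 5 from the example, only those two are.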
Another video processing method according to an embodiment of the present disclosure is described below.
Fig. 2 (a) is a flowchart of another video processing method according to an embodiment of the present disclosure, as shown in fig. 2 (a), the method may include the steps of:
step S202, displaying the original video segment to be processed on the operation interface.
In the technical solution provided in step S202 of the present disclosure, the original video clip to be processed may be displayed on an operation interface, where the operation interface may be a display interface for processing the original video clip, for example, the display interface of software for processing the original video clip. The original video clip may include at least one frame of original video frame to be extracted, the original video frame may be a video frame of the original video clip, and the original video clip may be a video clip from which icons and subtitles are to be removed, for example, a video with icons and subtitles downloaded from a video website or a short video platform. This is merely illustrative, and the method for obtaining the original video clip is not specifically limited herein.
Step S204, in response to the information removal operation on the operation interface, displaying the target video clip on the operation interface.
In the technical solution provided in step S204 of the present disclosure, a control for triggering an information removal operation may be set on the operation interface; if the information removal operation is triggered on the operation interface, the target video clip with the information removed may be displayed on the operation interface. The target video clip may be a video clip with icons and/or subtitles removed, generated by merging multiple frames of target video frames; the multiple frames of target video frames may be video frames obtained by respectively performing information removal on the extracted at least one frame of original video frame and the at least one frame of original video frame to be processed according to an information detection result; the information detection result may be used to characterize feature information of a target object to be removed from the original video clip, and the target object may include an icon and/or a subtitle. The at least one frame of original video frame to be processed may be an original video frame in the original video clip that includes the information detection result, for example, an original video frame containing an icon, an original video frame containing a subtitle, or an original video frame containing both, which are not illustrated one by one herein.
Step S206, responding to the video editing operation on the operation interface, and displaying the video editing result of editing the target video clip on the operation interface.
In the technical solution provided in step S206 of the present disclosure, after the target video clip with the icon and/or the subtitle removed is obtained, further editing and creation may be performed on the target video clip through a video editing operation on the operation interface, and a video editing result for editing the target video clip is displayed on the operation interface, where the video editing operation may include a video editing operation, a subtitle adding operation, an audio adding operation, and the like, which are only for illustration and not limiting the specific content of the video editing operation.
Fig. 2 (b) is a schematic diagram of an operation interface of a product for performing a video processing method according to an embodiment of the present disclosure. As shown in fig. 2 (b), an original video clip 201 includes a subtitle 2011 and an icon 2012, and an information removal operation on the operation interface may be triggered by a control 202, so as to obtain a target video clip 203 after the subtitle 2011 and the icon 2012 are removed.
It should be noted that, the display mode of the operation interface is not specifically limited herein, and any display mode of the operation interface for removing subtitles and icons from the original video clip is within the protection scope of the present disclosure, which is not illustrated herein.
Through the steps S202 to S206, displaying the original video clip to be processed on the operation interface; responding to the information removing operation acted on the operation interface, and displaying a target video clip on the operation interface; and responding to video editing operation acted on the operation interface, and displaying a video editing result for editing the target video clip on the operation interface, thereby solving the technical problem of low efficiency of removing the information in the video clip and realizing the technical effect of improving the efficiency of removing the information in the video clip.
The above-described method of this embodiment is described in further detail below.
As an optional implementation manner, step S206, in response to a video editing operation acting on the operation interface, displaying on the operation interface a video editing result of editing the target video clip includes: in response to a subtitle adding operation acting on the operation interface, displaying on the operation interface a video editing result of adding the target subtitle to the target video clip; and/or, in response to an audio adding operation acting on the operation interface, displaying on the operation interface a video editing result of adding the target audio to the target video clip.
In this embodiment, subtitles may be added again to the target video clip through a subtitle adding operation on the operation interface, and the video editing result of adding the target subtitle to the target video clip may be displayed on the operation interface; and/or audio may be added again to the target video clip through an audio adding operation on the operation interface, and the video editing result of adding the target audio to the target video clip may be displayed on the operation interface, so as to achieve the technical effect of secondary creation on the target video clip. The target subtitle may be used to add subtitles again to the target video clip, and the target audio may be used to re-dub the target video clip; for example, the target video clip may be used as video material, and subtitles and dubbing may be added to it again through the subtitle adding operation and the audio adding operation, so as to realize secondary creation on the target video clip, which is merely illustrative and not specifically limited herein.
As an alternative embodiment, the method further comprises: responding to video acquisition operation acted on an operation interface, and calling an original video clip from a short video platform; and/or transmitting the video editing result to the short video platform in response to the video transmitting operation acting on the operation interface.
In this embodiment, an original video clip may be called directly from a short video platform through a video obtaining operation on the operation interface, and after the original video clip is processed, the video editing result may be sent to the short video platform through a video sending operation on the operation interface, thereby improving the convenience of obtaining the original video clip and sharing the video editing result, and providing convenience for user creation. The short video platform may be software or a website for playing short videos, etc. It is to be noted that the above description of the short video platform is only illustrative; the short video platform is not specifically limited herein, and any platform from which the original video clip can be obtained and to which it can be sent is within the protection scope of the present disclosure, which is not illustrated herein.
The video processing method of the embodiments of the present disclosure is further described below in connection with the preferred embodiments.
In the related art, icons and subtitles in a video are mainly erased by professional staff using professional video software or image editing software, or are cropped manually frame by frame by the user, resulting in low efficiency of removing icons and subtitles from the video.
In contrast, the embodiment of the present disclosure provides a method for intelligently removing icons and subtitles from a video, which can automatically erase icons and crop subtitles with a network through a deep learning method, thereby achieving the technical effect of improving the efficiency of removing icons and subtitles from the video. A user only needs to input the video to be processed to quickly obtain the video with the icons and subtitles removed; the processed video can then be used as video material for the user's secondary creation, and the user can realize secondary creation of the video by adding subtitles and dubbing again.
For icons and subtitles in videos, the embodiment of the present disclosure may adopt pre-marked data of the icons and subtitles together with detection frame information, based on a deep learning framework, and perform dedicated training with PP-YOLO to obtain a subtitle detection model and an icon detection model. The detection frame information may include icon frame information and subtitle frame information; the detection frames may be used to guide supervised training by marking the places in the video where icons and subtitles appear, that is, the detection frames may be used to label the icons and subtitles in the video so that model training can be carried out.
Fig. 3 is a flowchart of a method for intelligently removing icons and subtitles from a video according to an embodiment of the disclosure, as shown in fig. 3, the method may include the steps of:
in step S301, a video to be processed is input.
In the technical solution provided in step S301 of the present disclosure, the video to be processed may be a video with icons and subtitles that needs to be removed, for example, a video with subtitles and icons downloaded from a website or a short video platform, which is only for illustration and not particularly limited.
Step S302, 3 frames/second are extracted from the video to be processed.
In the technical solution provided in step S302 of the present disclosure, 3 frames per second may be extracted from the video to be processed input in step S301 to perform parallel processing, that is, 3 frames per second may be extracted to perform icon detection and subtitle detection synchronously, so as to achieve the purpose of improving the video processing speed.
Alternatively, the 3 frames extracted above refer to frames with a certain interval, for example, 25 frames in 1 second, and the 3 frames extracted may be 1 st, 16 th and 25 th frames, so as to avoid that the icon detection result and the caption detection result, which are changed later, cannot be detected according to the previous icon detection result and caption detection result, because the icon detection result and the caption detection result of the three adjacent frames are not changed.
It should be noted that, the above-mentioned extracted frame number "3 frames/second" is an empirical value, and the extracted frame number in practical application may be a custom value, which is only illustrated herein, and the "extracted frame number" is not specifically limited.
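Spaced sampling can be sketched as below; even spacing is one simple choice, so with 25 fps it yields frames 1, 13, and 25 rather than the 1, 16, 25 of the example above, either of which satisfies the interval requirement:

```python
def sample_indices(fps, frames_per_second=3):
    """Pick `frames_per_second` 1-based frame indices spread across one
    second of video, so sampled frames are spaced apart, not adjacent.

    The default of 3 frames/second mirrors the empirical value used in
    step S302; any custom value works.
    """
    if frames_per_second == 1:
        return [1]
    step = (fps - 1) / (frames_per_second - 1)
    return [round(1 + i * step) for i in range(frames_per_second)]
```

The first and last frames of the second are always included, and the remaining samples fall evenly between them.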
In step S303, icon detection is performed on the content extracted from the video to be processed.
In the technical solution provided in the above step S303 of the present disclosure, when performing icon detection, according to the priori knowledge that icons in a video are generally located in corners, icon detection results in the four corners each of 1/3 of the width and height of the video picture are selected as the icons to be removed finally. If the detection result is that an icon exists in the video to be processed, step S304 is entered and the icon is erased frame by frame, that is, the detected icon position is used to erase the icon in each frame of the video; step S305 is then entered and subtitle detection is continued on the content extracted from the video to be processed. If the detection result is that no icon exists in the video to be processed, step S305 is entered directly and subtitle detection is performed on the content extracted from the video to be processed.
Alternatively, the icon detection result of the video to be processed may be a detection result obtained by integrating the icon detection results of the respective frames into one frame.
Step S305 performs subtitle detection on content extracted from the video to be processed.
In the technical solution provided in the above step S305 of the present disclosure, when performing subtitle detection on the video to be processed, the detection result in the region below the video picture accounting for 0.38 of the height of the entire video picture may be selected as the subtitle to be finally cropped. If the detection result is that a subtitle exists in the video to be processed, step S306 is entered and text detection and face detection are performed on the video content after the icon is erased; after the detection results are obtained, step S307 is entered and the subtitle in the video to be processed is cropped according to the results of the text detection and the face detection; step S308 is then entered and the video result is output, so as to achieve the purpose of cropping the subtitle region while ensuring that the characters and important text information in the picture are not cut off. If the detection result is that no subtitle exists in the video to be processed, step S308 is entered directly and the video result is output.
Alternatively, the video result may be a video synthesized after the icon in the video to be processed is erased and the subtitle is cut.
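The flow of steps S301 to S308 can be condensed into a single function; every callable argument is a placeholder for a trained model or editing operation, not the disclosure's actual components:

```python
def process_video(frames, detect_icon, detect_subtitle, erase, crop, region_clear):
    """Sketch of the fig. 3 flow.

    `detect_icon`/`detect_subtitle` stand in for the detection models,
    `erase`/`crop` for the editing operations, and `region_clear` for the
    face/text-detection guard of steps S306-S307.
    """
    icon_box = detect_icon(frames)
    if icon_box is not None:
        frames = [erase(f, icon_box) for f in frames]       # step S304
    sub_box = detect_subtitle(frames)
    if sub_box is not None and region_clear(frames, sub_box):
        frames = [crop(f, sub_box) for f in frames]         # step S307
    return frames                                           # step S308
```

When no icon or subtitle is detected, the corresponding branch is skipped and the frames pass through unchanged, matching the fallthrough paths in fig. 3.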
The method for intelligently removing the icons and the subtitles in the video recorded in the steps S301 to S308 can be applied to various occasions requiring video editing for removing the icons and the subtitles in the video clips, namely, a user can directly obtain the video from which the icons and the subtitles are removed after inputting the video clips requiring processing and waiting for network processing to complete, so that the technical problem of low efficiency of removing the information in the video clips is solved, and the technical effect of improving the efficiency of removing the information in the video clips is realized.
It should be noted that, the sequence of performing icon detection and subtitle detection on the video to be processed in the above embodiment is only a preferred example, and the sequence of performing icon detection and subtitle detection is not specifically limited herein, that is, a method of performing subtitle detection on the video to be processed first and then performing icon detection, and a method of performing subtitle detection and icon detection on the video to be processed simultaneously are all within the scope of protection of the present disclosure.
The embodiment of the disclosure also provides a video processing device for executing the video processing method of the embodiment shown in fig. 1.
Fig. 4 is a schematic diagram of a video processing apparatus according to an embodiment of the present disclosure, and as shown in fig. 4, the video processing apparatus 400 may include: an acquisition unit 401, a detection unit 402, a determination unit 403, a first removal unit 404, and a merging unit 405.
The obtaining unit 401 is configured to obtain an original video segment to be processed, and extract at least one frame of original video frame from the original video segment.
The detecting unit 402 is configured to perform information detection on an original video frame to obtain an information detection result, where the information detection result is used to characterize feature information of a target object that needs to be removed from the original video segment.
A determining unit 403, configured to determine, based on the information detection result, at least one frame of an original video frame to be processed from the original video segment, where the original video frame to be processed is a video frame including the information detection result in the original video segment.
The first removing unit 404 is configured to remove information from the extracted original video frame and the original video frame to be processed, respectively, based on the information detection result, to obtain a multi-frame target video frame, where the target video frame is a video frame from which the target object is removed.
And the merging unit 405 is configured to merge multiple frames of target video frames to generate a target video segment.
Optionally, the determining unit 403 includes: a determining module, configured to determine, in the original video segment, the original video frame having an association relationship with the extracted original video frame as the original video frame to be processed that includes the information detection result.
Optionally, the determining module includes: a determining submodule, configured to determine, among the multiple original video frames of the original video segment, the original video frames whose frame-number distance from the extracted original video frame is smaller than a frame number threshold as the original video frames to be processed that include the information detection result.
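By way of a non-limiting illustration of the determining submodule's frame-number-threshold rule, the selection of to-be-processed frames may be sketched as follows. The function name and parameters are hypothetical and do not appear in the disclosure; frames are identified by integer indices.

```python
def frames_to_process(extracted_indices, total_frames, frame_threshold):
    """Collect indices of original video frames whose frame-number distance
    to any extracted frame is smaller than the threshold, on the assumption
    that nearby frames contain the same detected icon/subtitle."""
    selected = set()
    for idx in extracted_indices:
        lo = max(0, idx - frame_threshold + 1)          # clamp at clip start
        hi = min(total_frames, idx + frame_threshold)   # clamp at clip end
        selected.update(range(lo, hi))
    return sorted(selected)
```

For example, with a threshold of 3 frames, an extracted frame at index 10 marks frames 8 through 12 as frames to be processed.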
Optionally, the first removing unit 404 includes: the first icon erasing module is used for respectively carrying out icon erasing on the extracted original video frame and the original video frame to be processed to obtain a target video frame in response to the icon characteristics of which the information detection result is the target icon, wherein the target video frame is the video frame from which the target icon is removed.
Optionally, the first removing unit 404 includes: the first clipping module is used for respectively clipping the subtitle areas of the extracted original video frames and the original video frames to be processed to obtain target video frames in response to the information detection result as the subtitle characteristics of the target subtitles, wherein the target video frames are video frames from which the target subtitles are removed.
Optionally, the first removing unit 404 includes: the second icon erasing module is used for respectively carrying out icon erasing on the extracted original video frame and the original video frame to be processed to obtain multi-frame intermediate target video frames in response to the icon characteristics of the target icon and the subtitle characteristics of the target subtitle which are obtained by the information detection result, wherein the intermediate target video frames are video frames from which the target icon is removed; and the second clipping module is used for clipping the subtitle region of the target subtitle in the intermediate target video frame to obtain the target video frame, wherein the target video frame is the video frame from which the target icon and the target subtitle are removed.
Optionally, the first removing unit 404 includes: and the information removing module is used for detecting that the display area does not display target content, and respectively removing information of the extracted original video frame and the original video frame to be processed based on the information detection result to obtain multi-frame target video frames.
Alternatively, the acquisition unit 401 includes: and the extraction module is used for extracting at least one frame of original video frame from a plurality of original video frames in each time period in the original video segment.
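The extraction module's per-period sampling can be illustrated as picking the first frame index of every time period, so that at least one original video frame is extracted per period. Function name and parameters are hypothetical; the period is expressed in frames via the frame rate.

```python
def extract_per_period(num_frames, fps, period_seconds=1):
    """Return one frame index per time period of the clip (the first frame
    of each period), assuming a constant frame rate."""
    period_len = fps * period_seconds
    return list(range(0, num_frames, period_len))
```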
Optionally, the detection unit 402 includes: the first determining module is used for determining at least one target area in the video picture of the extracted original video frame; and the first information detection module is used for detecting information of the target area to obtain an information detection result.
Optionally, the first determining module includes: and the first determining submodule is used for determining a corner area associated with the target icon or an edge area associated with the target subtitle in the video picture of the extracted original video frame.
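The corner and edge areas examined by the first determining submodule can be sketched as fixed fractions of the frame, since station icons and watermarks commonly sit in corners while subtitles occupy a bottom band. The fractions and function name are illustrative assumptions, not values from the disclosure.

```python
def candidate_regions(width, height, corner_frac=0.25, edge_frac=0.2):
    """Return the four corner areas (candidate icon positions) and the
    bottom edge band (candidate subtitle position) as (x, y, w, h) boxes."""
    cw, ch = int(width * corner_frac), int(height * corner_frac)
    corners = [
        (0, 0, cw, ch),                           # top-left
        (width - cw, 0, cw, ch),                  # top-right
        (0, height - ch, cw, ch),                 # bottom-left
        (width - cw, height - ch, cw, ch),        # bottom-right
    ]
    band_h = int(height * edge_frac)
    bottom_edge = (0, height - band_h, width, band_h)
    return corners, bottom_edge
```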
Optionally, the detection unit 402 includes: the second information detection module is used for inputting the extracted original video frames into the information detection model to carry out information detection to obtain an information detection result, wherein the information detection model is obtained by training the initial detection model based on the video frame samples and characteristic information samples of target objects marked in the video frame samples.
In the video processing device of the embodiment of the disclosure, an obtaining unit is configured to obtain an original video segment to be processed, and extract at least one frame of original video frame from the original video segment; the detection unit is used for carrying out information detection on the original video frame to obtain an information detection result, wherein the information detection result is used for representing characteristic information of a target object which needs to be removed from the original video segment; the determining unit is used for determining at least one frame of original video frame to be processed from the original video segment based on the information detection result, wherein the original video frame to be processed is a video frame containing the information detection result in the original video segment; the first removing unit is used for respectively removing information from the extracted original video frame and the original video frame to be processed based on the information detection result to obtain a multi-frame target video frame, wherein the target video frame is the video frame from which the target object is removed; the merging unit is used for merging the multi-frame target video frames to generate a target video segment, so that the technical problem of low efficiency of removing the information in the video segment is solved, and the technical effect of improving the efficiency of removing the information in the video segment is realized.
The embodiment of the disclosure also provides a video processing device for executing the video processing method of the embodiment shown in fig. 2.
Fig. 5 is a schematic diagram of another video processing apparatus according to an embodiment of the present disclosure, as shown in fig. 5, the video processing apparatus 500 may include: a first display unit 501, a second removal unit 502, and an editing unit 503.
The first display unit 501 is configured to display an original video segment to be processed on an operation interface, where the original video segment includes at least one frame of an original video frame that needs to be extracted.
The second removing unit 502 is configured to respond to an information removing operation applied to the operation interface, and display a target video segment on the operation interface, where the target video segment is generated by combining multiple frames of target video frames, the multiple frames of target video frames are obtained by respectively removing information from at least one extracted frame of original video frame and at least one frame of original video frame to be processed based on an information detection result, the information detection result is used to characterize feature information of a target object to be removed from the original video segment, the at least one frame of original video frame to be processed is determined from the original video segment based on the information detection result, and the original video frame to be processed is a video frame including the information detection result in the original video segment.
An editing unit 503 for displaying a video editing result of editing the target video clip on the operation interface in response to a video editing operation acting on the operation interface.
Alternatively, the editing unit 503 includes: the first adding module is used for responding to the subtitle adding operation acted on the operation interface and displaying a video editing result of adding the target subtitle to the target video clip on the operation interface; and/or a second adding module, which is used for responding to the audio adding operation acted on the operation interface and displaying the video editing result of adding the target audio to the target video clip on the operation interface.
In the video processing device of the embodiment of the disclosure, a first display unit is configured to display an original video segment to be processed on an operation interface, where the original video segment includes at least one frame of original video frame that needs to be extracted; the second removing unit is used for responding to the information removing operation acted on the operation interface, displaying a target video segment on the operation interface, wherein the target video segment is generated by combining multiple frames of target video frames, the multiple frames of target video frames are obtained by respectively removing information from at least one frame of original video frame and at least one frame of original video frame to be processed, which are extracted based on an information detection result, the information detection result is used for representing characteristic information of a target object to be removed from the original video segment, the at least one frame of original video frame to be processed is determined from the original video segment based on the information detection result, and the original video frame to be processed is a video frame containing the information detection result in the original video segment; the editing unit is used for responding to the video editing operation acted on the operation interface and displaying the video editing result for editing the target video clip on the operation interface, thereby solving the technical problem of low efficiency of removing the information in the video clip and realizing the technical effect of improving the efficiency of removing the information in the video clip.
In the technical scheme of the disclosure, the acquisition, storage, application and the like of the related user personal information all conform to the regulations of related laws and regulations, and the public sequence is not violated.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 is a schematic diagram of an electronic device for implementing a video processing method of an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, for example, a video processing method. For example, in some embodiments, the video processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the video processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the video processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps in the various forms of flow shown above may be reordered, added, or deleted. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, which is not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions, improvements, etc. that are within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (22)

1. A video processing method, comprising:
acquiring an original video segment to be processed, and extracting at least one frame of original video frame from the original video segment;
performing information detection on the original video frame to obtain an information detection result, wherein the information detection result is used for representing characteristic information of a target object to be removed from the original video segment;
determining at least one frame of original video frame to be processed from the original video segment based on the information detection result, wherein the original video frame to be processed is a video frame containing the information detection result in the original video segment;
Based on the information detection result, respectively carrying out information removal on the extracted original video frame and the original video frame to be processed to obtain a multi-frame target video frame, wherein the target video frame is the video frame from which the target object is removed;
and merging the multi-frame target video frames to generate a target video segment.
2. The method of claim 1, wherein determining at least one frame of original video frame to be processed from the original video segment based on the information detection result comprises:
and in the original video segment, determining the original video frame with the association relation with the extracted original video frame as the original video frame to be processed comprising the information detection result.
3. The method of claim 2, wherein determining, in the original video clip, an original video frame having an association with the extracted original video frame as the original video frame to be processed including the information detection result comprises:
and determining the original video frames with the frame number smaller than a frame number threshold value between the original video frames of the original video segments and the extracted original video frames as the original video frames to be processed, wherein the original video frames comprise the information detection result.
4. The method of claim 1, wherein based on the information detection result, respectively performing information removal on the extracted original video frame and the original video frame to be processed to obtain a plurality of target video frames includes:
and respectively carrying out icon erasing on the extracted original video frame and the original video frame to be processed to obtain the target video frame in response to the information detection result as the icon characteristic of the target icon, wherein the target video frame is the video frame from which the target icon is removed.
5. The method of claim 1, wherein based on the information detection result, respectively performing information removal on the extracted original video frame and the original video frame to be processed to obtain a plurality of target video frames includes:
and respectively cutting subtitle areas of the extracted original video frames and the original video frames to be processed to obtain target video frames in response to the information detection result as subtitle characteristics of the target subtitles, wherein the target video frames are video frames from which the target subtitles are removed.
6. The method of claim 1, wherein based on the information detection result, respectively performing information removal on the extracted original video frame and the original video frame to be processed to obtain a plurality of target video frames includes:
Respectively carrying out icon erasing on the extracted original video frame and the original video frame to be processed to obtain multi-frame intermediate target video frames in response to the information detection result as the icon characteristic of the target icon and the subtitle characteristic of the target subtitle, wherein the intermediate target video frames are video frames from which the target icon is removed;
and cutting the subtitle region of the target subtitle in the intermediate target video frame to obtain the target video frame, wherein the target video frame is a video frame from which the target icon and the target subtitle are removed.
7. The method of claim 1, wherein the information detection result includes a display area of the target object on a video frame of the original video segment, and based on the information detection result, respectively performing information removal on the extracted original video frame and the original video frame to be processed to obtain a multi-frame target video frame, including:
and detecting that the display area does not display target content, and respectively carrying out information removal on the extracted original video frame and the original video frame to be processed based on the information detection result to obtain a multi-frame target video frame.
8. The method of claim 1, wherein the video duration of the original video clip comprises a plurality of time periods, and extracting at least one original video frame from the original video clip comprises:
and extracting the at least one frame of original video frame from the plurality of original video frames in each time period in the original video segment.
9. The method of claim 1, wherein performing information detection on the original video frame to obtain an information detection result comprises:
determining at least one target area in the video picture of the extracted original video frame;
and detecting information of the target area to obtain the information detection result.
10. The method of claim 9, wherein the target object comprises a target icon and/or a target subtitle, and determining at least one target area in the video frame of the extracted original video frame comprises:
and determining a corner area associated with the target icon or an edge area associated with the target subtitle in the video picture of the extracted original video frame.
11. The method according to any one of claims 1 to 10, wherein performing information detection on the original video frame to obtain an information detection result includes:
And inputting the extracted original video frame into an information detection model for information detection to obtain the information detection result, wherein the information detection model is obtained by training the initial detection model based on a video frame sample and a characteristic information sample of a target object marked in the video frame sample.
12. The method of any of claims 1 to 10, wherein a target number of frames is spaced between each adjacent two of the extracted multi-frame original video frames.
13. A video processing method, comprising:
displaying an original video segment to be processed on an operation interface, wherein the original video segment comprises at least one frame of original video frame to be extracted;
responding to an information removing operation acting on the operation interface, and displaying a target video segment on the operation interface, wherein the target video segment is generated by combining multiple frames of target video frames, the multiple frames of target video frames are obtained by respectively removing information of at least one extracted original video frame and at least one original video frame to be processed based on an information detection result, the information detection result is used for representing characteristic information of a target object to be removed from the original video segment, the at least one original video frame to be processed is determined from the original video segment based on the information detection result, and the original video frame to be processed is a video frame containing the information detection result in the original video segment;
And responding to video editing operation acted on the operation interface, and displaying a video editing result for editing the target video clip on the operation interface.
14. The method of claim 13, wherein displaying, on the operation interface, a video editing result of editing the target video clip in response to a video editing operation acting on the operation interface comprises:
responding to the subtitle adding operation acted on the operation interface, and displaying a video editing result of adding a target subtitle to the target video segment on the operation interface; and/or
And responding to the audio adding operation acted on the operation interface, and displaying a video editing result of adding the target audio to the target video clip on the operation interface.
15. The method of claim 13, further comprising:
responding to video acquisition operation acted on the operation interface, and calling the original video clip from a short video platform; and/or
And responding to video sending operation acted on the operation interface, and sending the video editing result to the short video platform.
16. A video processing apparatus comprising:
The device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring an original video segment to be processed and extracting at least one frame of original video frame from the original video segment;
the detection unit is used for carrying out information detection on the original video frame to obtain an information detection result, wherein the information detection result is used for representing characteristic information of a target object which needs to be removed from the original video segment;
the determining unit is used for determining at least one frame of original video frame to be processed from the original video fragments based on the information detection result, wherein the original video frame to be processed is a video frame containing the information detection result in the original video fragments;
the first removing unit is used for removing information from the extracted original video frame and the original video frame to be processed respectively based on the information detection result to obtain a multi-frame target video frame, wherein the target video frame is the video frame from which the target object is removed;
and the merging unit is used for merging the multi-frame target video frames to generate a target video segment.
17. The apparatus of claim 16, wherein the determining unit comprises:
And the determining module is used for determining the original video frame with the association relation with the extracted original video frame in the original video fragment as the original video frame to be processed comprising the information detection result.
18. A video processing apparatus comprising:
the first display unit is used for displaying an original video clip to be processed on the operation interface, wherein the original video clip comprises at least one frame of original video frame which needs to be extracted;
the second removing unit is used for responding to the information removing operation acted on the operation interface, displaying a target video segment on the operation interface, wherein the target video segment is generated by combining multiple frames of target video frames, the multiple frames of target video frames are obtained by respectively removing information of at least one frame of original video frame and at least one frame of original video frame to be processed, which are extracted based on an information detection result, the information detection result is used for representing characteristic information of a target object to be removed from the original video segment, the at least one frame of original video frame to be processed is determined from the original video segment based on the information detection result, and the original video frame to be processed is a video frame containing the information detection result in the original video segment;
And the editing unit is used for responding to the video editing operation acted on the operation interface and displaying a video editing result for editing the target video clip on the operation interface.
19. The apparatus of claim 18, wherein the editing unit comprises:
a first adding module, configured to display, on the operation interface, a video editing result of adding a target subtitle to the target video segment in response to a subtitle adding operation acting on the operation interface; and/or
a second adding module, configured to display, on the operation interface, a video editing result of adding target audio to the target video segment in response to an audio adding operation acting on the operation interface.
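The editing operations of claim 19 can likewise be sketched as operations that attach a subtitle or an audio track to the generated target clip, with the updated clip being what the operation interface then displays. This is a minimal illustration under assumed names (`TargetClip`, `add_subtitle`, `add_audio`), not the claimed implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TargetClip:
    frames: List[str]                                   # merged target video frames
    subtitles: List[str] = field(default_factory=list)  # attached target subtitles
    audio_tracks: List[str] = field(default_factory=list)  # attached target audio


def add_subtitle(clip: TargetClip, text: str) -> TargetClip:
    # Subtitle-adding operation: the video editing result shown on the
    # operation interface is the clip with the target subtitle attached.
    clip.subtitles.append(text)
    return clip


def add_audio(clip: TargetClip, track: str) -> TargetClip:
    # Audio-adding operation: attaches the target audio to the clip.
    clip.audio_tracks.append(track)
    return clip
```

The two operations are independent, matching the "and/or" structure of the claim: either one alone, or both, can be applied to the same target clip.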
20. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-15.
21. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-15.
22. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-15.
CN202211697993.3A 2022-12-28 2022-12-28 Video processing method and device and electronic equipment Pending CN116017049A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211697993.3A CN116017049A (en) 2022-12-28 2022-12-28 Video processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116017049A true CN116017049A (en) 2023-04-25

Family

ID=86020445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211697993.3A Pending CN116017049A (en) 2022-12-28 2022-12-28 Video processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116017049A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1496114A (en) * 2002-09-24 2004-05-12 Matsushita Electric Industrial Co Ltd Method for processing video signal and video processing unit
CN111460219A (en) * 2020-04-01 2020-07-28 百度在线网络技术(北京)有限公司 Video processing method and device and short video platform
CN112118483A (en) * 2020-06-19 2020-12-22 中兴通讯股份有限公司 Video processing method, device, equipment and storage medium
CN112383821A (en) * 2020-11-17 2021-02-19 有米科技股份有限公司 Intelligent combination method and device for similar videos
CN113361462A (en) * 2021-06-30 2021-09-07 北京百度网讯科技有限公司 Method and device for video processing and caption detection model
CN113613065A (en) * 2021-08-02 2021-11-05 北京百度网讯科技有限公司 Video editing method and device, electronic equipment and storage medium
CN114095782A (en) * 2021-11-12 2022-02-25 广州博冠信息科技有限公司 Video processing method and device, computer equipment and storage medium
CN115250372A (en) * 2021-04-27 2022-10-28 腾讯科技(深圳)有限公司 Video processing method, device, equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
WO2021003825A1 (en) Video shot cutting method and apparatus, and computer device
CN109375983B (en) Method for automatically adjusting input method window in terminal
US20210350541A1 (en) Portrait extracting method and apparatus, and storage medium
CN113642584A (en) Character recognition method, device, equipment, storage medium and intelligent dictionary pen
CN112560862A (en) Text recognition method and device and electronic equipment
CN111914102A (en) Method for editing multimedia data, electronic device and computer storage medium
CN112995535A (en) Method, apparatus, device and storage medium for processing video
CN111901536A (en) Video editing method, system, device and storage medium based on scene recognition
CN113780297B (en) Image processing method, device, equipment and storage medium
CN115719356A (en) Image processing method, apparatus, device and medium
CN113361462B (en) Method and device for video processing and caption detection model
CN114374885A (en) Video key segment determination method and device, electronic equipment and readable storage medium
CN113836462A (en) Page description file generation method, device, equipment and storage medium
CN116017049A (en) Video processing method and device and electronic equipment
CN116644476A (en) Image shielding method and device, electronic equipment and storage medium
CN113807410B (en) Image recognition method and device and electronic equipment
CN113112472A (en) Image processing method and device
CN113873323A (en) Video playing method and device, electronic equipment and medium
CN115690649A (en) Subtitle processing method and device, electronic equipment and storage medium
CN113283305B (en) Face recognition method, device, electronic equipment and computer readable storage medium
CN112584226B (en) Screen capturing method and device for sharing desktop
CN115050086B (en) Sample image generation method, model training method, image processing method and device
CN115643456A (en) Video playing method, device, equipment, storage medium and program product
CN114390249B (en) Video processing method, device, electronic equipment and storage medium
CN114786051B (en) Video rendering method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination