CN113259713A - Video processing method and device, terminal equipment and storage medium - Google Patents

Video processing method and device, terminal equipment and storage medium

Info

Publication number
CN113259713A
Authority
CN
China
Prior art keywords
video
video frame
target
frame
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110441021.7A
Other languages
Chinese (zh)
Inventor
王丹丹
赵学华
张平安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Information Technology
Original Assignee
Shenzhen Institute of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Information Technology
Priority to CN202110441021.7A
Publication of CN113259713A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234345 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/812 Monomedia components thereof involving advertisement data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a video processing method and apparatus, a terminal device, and a storage medium, relating to the technical field of video processing. The method can embed target pictures, such as advertisement pictures, into a video efficiently and appropriately, so that the information in the target pictures is pushed to a user without affecting the user's viewing experience. The video processing method comprises the following steps: determining at least one first video frame from a video to be processed, and determining a first target area in the first video frame, wherein the area edge corresponding to the first target area meets a preset edge condition; for each first video frame, obtaining a second video frame according to a target picture and edge information of the first target area in the first video frame, wherein the second video frame comprises the target picture; and obtaining a target video according to each second video frame.

Description

Video processing method and device, terminal equipment and storage medium
Technical Field
The present application belongs to the field of video processing technologies, and in particular relates to a video processing method and apparatus, a terminal device, and a storage medium.
Background
With the development of the mobile internet, people have generally begun to search for and watch video resources online. In particular, with the rise of various video applications, massive amounts of video content have emerged, and internet users have become increasingly accustomed to obtaining information by watching videos.
Disclosure of Invention
The embodiments of the present application provide a video processing method and apparatus, a terminal device, and a storage medium, which can embed target pictures, such as advertisement pictures, into a video efficiently and appropriately, so that the information in the target pictures is pushed to a user without affecting the user's viewing experience.
In a first aspect, an embodiment of the present application provides a video processing method, including:
determining at least one first video frame from a video to be processed, and determining a first target area in the first video frame, wherein an area edge corresponding to the first target area meets a preset edge condition;
for each first video frame, obtaining a second video frame according to a target picture and edge information of a first target area in the first video frame;
and obtaining a target video according to each second video frame.
In a second aspect, an embodiment of the present application provides a video processing apparatus, including:
a determining module, configured to determine at least one first video frame from a video to be processed, and to determine a first target area in the first video frame, wherein the area edge corresponding to the first target area meets a preset edge condition;
a first obtaining module, configured to obtain, for each of the first video frames, a second video frame according to a target picture and edge information of a first target area in the first video frame, where the second video frame includes the target picture;
and a second obtaining module, configured to obtain the target video according to each of the second video frames.
In a third aspect, an embodiment of the present application provides a terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the video processing method described above.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the video processing method described above.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the video processing method according to any one of the above first aspects.
The video processing method provided by the embodiments of the present application determines, from a video to be processed, at least one first video frame containing a first target area, and obtains, for each first video frame, a second video frame according to a target picture and the edge information of the first target area in the frame. The first target area can be an area that is expected to be usable for fusing the target picture, so a second video frame containing the target picture can be obtained based on the edge information of the first target area in the first video frame; finally, the target video is obtained according to each second video frame. Because the target picture is embedded into the target area of each first video frame of the target video, the information in the target picture can be pushed to the user synchronously while the target video is played; the user does not need to wait for the pushing of the target picture to finish before watching the video, which better improves the user's viewing experience.
It is understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the related description of the first aspect, which is not repeated here.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a video processing method according to an embodiment of the present application.
Fig. 2 is a flowchart illustrating a specific implementation of step S11 of the video processing method according to an embodiment of the present application.
Fig. 3 is a flowchart illustrating a specific implementation of step S11 of the video processing method according to another embodiment of the present application.
Fig. 4 is a flowchart illustrating a specific implementation of step S11 of the video processing method according to another embodiment of the present application.
Fig. 5 is a flowchart illustrating a specific implementation of step S45 of the video processing method according to another embodiment of the present application.
Fig. 6 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a terminal device according to another embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation rather than limitation, specific details such as particular system structures and techniques are set forth in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details.
In the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
The video processing method provided by the present application improves on current video processing algorithms, so that existing terminal devices can embed target pictures, such as advertisement pictures, into videos more efficiently and appropriately.
The following describes an exemplary video processing method provided by the present application with a specific embodiment.
Referring to fig. 1, a schematic flowchart of a video processing method provided in an embodiment of the present application is shown. The execution body of the video processing method in this embodiment is a terminal device. The method is applicable to terminal devices such as mobile phones, tablet computers, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA); the embodiments of the present application do not limit the specific type of the terminal device.
The video processing method as shown in fig. 1 comprises the following steps:
s11: determining at least one first video frame from a video to be processed, and determining a first target area in the first video frame, wherein an area edge corresponding to the first target area meets a preset edge condition;
in step S11, the video to be processed is composed of at least two frames of video frames, and the content of each frame of video may be different, i.e., there may be an area in each frame of video that can be used to embed the target picture.
The first video frame is a video frame, among the video frames, that contains a first target area into which a target picture can be embedded. That is, the first target area is at least a partial image area of the first video frame, and the area edge corresponding to the first target area meets a preset edge condition. For example, the first target area is an image area that has undergone at least one edge correction process, and the edge of the image area satisfies an edge condition set in advance as required; for instance, after the edge correction process, the distance difference between the edges corresponding to two successive corrections is less than or equal to a preset difference.
Preferably, the first target area is at least a partial image area of the first video frame that does not affect viewing, so that the information in the target picture can be pushed to the user without affecting the user's viewing experience. For example, the first target area may be a television screen appearing in the first video frame, a conference desk, a smooth wall, the exterior facade of a building, or the like.
Depending on the target picture embedding requirements, the shape of the first target area may be at least one of a circle, an ellipse, a triangle, a quadrangle, a pentagon, or another polygon.
Referring to fig. 2, as a possible implementation manner of this embodiment, the determining at least one first video frame from a video to be processed and determining a first target area in the first video frame includes:
s21: and determining at least one frame key frame from all video frames of the video to be processed.
S22: and inputting the key frame into a preset detection model for detection aiming at each key frame to obtain a detection result corresponding to the key frame.
S23: and if the key frame is determined to contain the image area meeting the preset condition according to the detection result, taking the key frame as the first video frame, and taking the image area meeting the preset condition in the key frame as the first target area.
In this embodiment, a key frame is a video frame extracted from the video frames of the video to be processed according to a preset extraction policy, which describes the process of extracting key frames from the video to be processed. It is understood that a key frame may contain an image region for embedding the target picture. For example, if a video frame extracted according to the preset extraction policy contains a smooth rectangular wall, that video frame may be used as a key frame.
The detection result describes the outcome of detecting each image region contained in the key frame with the preset detection model, in order to determine whether the key frame contains a first target region into which a target picture can be embedded. From the detection result, it can be known whether the key frame contains an image region usable for embedding the target picture, which provides a data basis for deciding whether the region needs further processing. For example, if the detection result of a key frame indicates that the image region it contains is circular rather than rectangular, then in a case where the first target region is required to be circular, the detection result shows that this key frame can be used as a first video frame.
For example, if the preset condition requires the first target area to be rectangular, but the detection result shows that the image area contained in the key frame is circular, the key frame cannot be used as a first video frame. Further, in some examples, the preset condition may be that the image area in the first video frame contains an object of a preset category whose texture features meet preset requirements. For example, a white wall surface with few texture features is an image area meeting the preset condition.
In this embodiment, the preset detection model is trained on a set of video frame samples containing image regions that meet the preset condition. Therefore, when a key frame is input into the preset detection model for detection, the model analyzes the image regions contained in the key frame and determines, from the analysis result, whether they meet the preset condition, thereby producing the corresponding detection result, from which it is further determined whether the key frame can be used as a first video frame. When a key frame is detected by the preset detection model, only whether an image region meets the preset condition needs to be considered, i.e., whether it can be used as the first target region and the edge condition corresponding to the region. Compared with the current approach of searching for a sufficient amount of matching feature information in neighboring frames to decide whether a key frame can be a first video frame, the amount of computation is greatly reduced.
In practical applications, a pre-trained preset detection model is stored in the terminal device in advance. The preset detection model is obtained by training an initial detection model on a sample training set using an object detection algorithm. It can be trained by the terminal device itself, or trained in advance on another device and the corresponding model file then transplanted to the terminal device. That is, the execution body that trains the preset detection model may be the same as or different from the execution body that performs first-target-region detection with the model. For example, when the initial detection model is trained by another device, the model parameters are fixed after training to obtain the file corresponding to the preset detection model, and the file is then transplanted to the terminal device.
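The embodiments do not prescribe a detector architecture for the preset detection model. Purely as an illustration, the detection of step S22 could be approximated with an off-the-shelf object detector; the model choice, score threshold, and function name below are assumptions, not the patent's method:

```python
import torch
import torchvision

# Sketch of the key-frame detection step (S22), assuming a generic
# off-the-shelf detector; the patent trains its own "preset detection model".
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_candidate_regions(frame_tensor, score_threshold=0.8):
    """Return boxes that may satisfy the preset condition for a key frame.

    frame_tensor: float tensor of shape (3, H, W) with values in [0, 1].
    score_threshold is an assumed value, not specified by the patent.
    """
    with torch.no_grad():
        output = model([frame_tensor])[0]
    keep = output["scores"] >= score_threshold
    return output["boxes"][keep], output["labels"][keep]
```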
As a possible implementation manner of this embodiment, the determining at least one frame key frame from among the video frames of the video to be processed includes:
and determining the sampling rate according to the frame rate of the video to be processed.
And sampling the video to be processed according to the sampling rate to obtain at least one frame of the key frame.
In the present embodiment, the frame rate is the frequency at which bitmap images, in units of frames, appear continuously on the display. For example, the frame rate of the video to be processed may be 25 fps or 30 fps.
The sampling rate describes the number of video frames extracted from the video to be processed per second. For example, a sampling rate of 5 frames per second means that, within one second, 5 consecutive video frames are extracted from the video to be processed, and each extracted video frame is used as a key frame.
It can be understood that, to facilitate rapid sampling of the video to be processed, the correspondence between video frame rate and sampling rate is determined in advance. Therefore, once the frame rate of a video to be processed is known, the sampling rate can be determined from it, and the video can be sampled at that rate to obtain at least one key frame. For example, when the frame rate of a video to be processed is 25 fps and the corresponding sampling rate is 5 frames per second, the video can be sampled at that rate to obtain the key frames.
In some embodiments, at least one key frame is determined from the video frames of the video to be processed by uniform sampling, in which a number of consecutive video frames are extracted from the video frames at a set sampling rate.

For example, at a sampling rate of 5 frames per second, 5 consecutive video frames are extracted from the video frames of the video to be processed every second, i.e., 5 key frames are extracted.
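As a concrete reading of this sampling step, the sketch below extracts key frames with OpenCV. Only the 25 fps to 5 frames-per-second pairing comes from the text; the mapping table, function name, and fallback values are assumptions:

```python
import cv2

# Assumed frame-rate-to-sampling-rate table; only 25 -> 5 is given in the text.
SAMPLING_RATE_BY_FPS = {25: 5, 30: 6}

def extract_key_frames(video_path):
    cap = cv2.VideoCapture(video_path)
    fps = round(cap.get(cv2.CAP_PROP_FPS)) or 25   # fallback is an assumption
    sampling_rate = SAMPLING_RATE_BY_FPS.get(fps, 5)
    key_frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Uniform sampling as in the example: within each one-second window,
        # keep the first `sampling_rate` consecutive frames as key frames.
        if index % fps < sampling_rate:
            key_frames.append(frame)
        index += 1
    cap.release()
    return key_frames
```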
In an embodiment, if, according to the detection results corresponding to the key frames, at least two first video frames are determined to be consecutive video frames in the video to be processed, the at least two first video frames are taken as a group of target video frame set.
Referring to fig. 3, as a possible implementation manner of this embodiment, the determining at least one first video frame from a video to be processed and determining a first target area in the first video frame includes:
s31: from the video to be processed, at least one first video frame is determined.
S32: and if at least two first video frames exist as continuous video frames in the video to be processed, taking the at least two first video frames as a group of target video frame sets.
S33: For each group of target video frame sets, determining first feature points of the first video frame of the first frame of the set, where the first feature points are used to identify a first initial region in the first video frame of the first frame.
S34: And obtaining, through a visual tracking algorithm, second feature points respectively corresponding to the first feature points in the other first video frames, wherein the other first video frames are the first video frames in the target video frame set other than the first video frame of the first frame.
S35: and for each first feature point, determining the coordinates of the target feature point corresponding to the first feature point according to the coordinates of the first feature point in the first video frame of the first frame and the coordinates of the second feature point corresponding to the first feature point in other corresponding first video frames.
S36: and determining a first target area of each first video frame in the target video frame set according to the coordinates of the target feature point corresponding to each first feature point.
In this embodiment, the first feature points are used to identify the first initial region in the first video frame of the first frame. It can be understood that a first feature point is an edge pixel of the first initial region in the first video frame of the first frame and can be represented by the horizontal and vertical coordinates of the pixel. For example, a rectangular first initial region may be identified by the 4 first feature points at its four corners.
In this embodiment, for each group of target video frame sets, the first feature points of the first video frame of the first frame are determined; the image region in which a target picture can be embedded in that frame, i.e., the first initial region, is thereby preliminarily determined. Further, the second feature points respectively corresponding to the first feature points in the other first video frames are obtained through a visual tracking algorithm, whereby the first initial regions of the other first video frames are determined. In addition, because the visual tracking algorithm maps each first feature point in the first video frame of the first frame to the other first video frames, the edge points around the same image region in each video frame can be processed rapidly when determining the first target region in each first video frame, which reduces the difficulty of determining the first target regions.
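The embodiments name only "a visual tracking algorithm" for step S34. As a hedged sketch, a pyramidal Lucas-Kanade optical flow tracker is one common stand-in:

```python
import cv2
import numpy as np

def track_feature_points(first_frame, other_frames, first_points):
    """Map first feature points to second feature points in later frames (S34).

    first_points: float32 array of shape (N, 1, 2), e.g. the 4 corner points.
    Lucas-Kanade optical flow is an assumed stand-in for the unspecified
    visual tracking algorithm.
    """
    prev_gray = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
    prev_pts = np.asarray(first_points, dtype=np.float32)
    second_points = []
    for frame in other_frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray,
                                                       prev_pts, None)
        second_points.append(next_pts)
        prev_gray, prev_pts = gray, next_pts
    return second_points
```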
In addition, because the first initial region of the first video frame of the first frame is determined from the first feature points, and the first initial regions of the other first video frames are determined from the corresponding second feature points, the video content context between successive frames has not yet been taken into account; when the first video frames are played, the content may therefore not transition smoothly between frames, producing a certain sense of incongruity. Therefore, for each first feature point, the coordinates of the corresponding target feature point are determined according to the coordinates of the first feature point in the first video frame of the first frame and the coordinates of the corresponding second feature points in the other first video frames; that is, the coordinates of the target feature point corresponding to the first feature point are determined for each first video frame, and the first target area of each first video frame is then determined from those coordinates.
It can be understood that, within a set of target video frames, the coordinates of the target feature points are determined as follows: for the first video frame of the first frame, the coordinates of its target feature points are determined from the coordinates of its first feature points; for each other first video frame, the coordinates of its target feature points are determined from the coordinates of the second feature points corresponding to the first feature points in that frame.
In some embodiments, the first target region of each first video frame is determined to be an image region satisfying a preset condition according to the coordinates of the target feature point corresponding to the first video frame, that is, the first video frame is a key frame including the image region satisfying the preset condition.
In some embodiments, for each first feature point, quadratic curve fitting is performed on the coordinates of the first feature point in the first video frame of the first frame and the coordinates of the corresponding second feature points in the other first video frames, yielding a quadratic curve equation. The abscissa or ordinate of the first feature point is substituted into the fitted equation to calculate the coordinates of the target feature point of the first video frame of the first frame; the coordinates of the target feature points of the other first video frames are calculated analogously from the quadratic curve equation and the coordinates of the corresponding second feature points.
It can be understood that, when the abscissa of the first feature point is substituted into the quadratic fitting equation, the ordinate is obtained by calculation, and the target feature point corresponding to the first feature point is represented by the abscissa of the first feature point together with the calculated ordinate; the coordinates of the target feature points in the other video frames are determined in the same way.
In this embodiment, quadratic curve fitting is performed on the coordinates of the first feature point in the first video frame of the first frame and the coordinates of the corresponding second feature points in the other first video frames; the continuity of the same first feature point across different frames is thus treated as a curve fitting problem.

The coordinates of the first feature point in the first video frame of the first frame and the coordinates of the corresponding second feature points in the other first video frames are all obtained in the same coordinate system.
In practical applications, the process of fitting a quadratic curve from the coordinates of the first feature points and the second feature points, and of determining the coordinates of the target feature points of each first video frame from the resulting equation, may follow related schemes in the prior art and is not described again here.
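A minimal sketch of this fitting step with NumPy; it follows the "substitute the abscissa" reading above and assumes the same feature point has been collected from at least three frames:

```python
import numpy as np

def smooth_feature_point(xs, ys):
    """Fit y = a*x^2 + b*x + c over one feature point's per-frame coordinates.

    xs, ys: the abscissas/ordinates of the first feature point and its
    corresponding second feature points, one pair per first video frame
    (at least three frames are needed for a quadratic fit).
    """
    a, b, c = np.polyfit(xs, ys, 2)            # quadratic curve equation
    xs = np.asarray(xs, dtype=float)
    smoothed_ys = a * xs ** 2 + b * xs + c
    # The target feature point of frame i is (xs[i], smoothed_ys[i]).
    return list(zip(xs, smoothed_ys))
```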
In an embodiment, if it is determined that at least one key frame is a first video frame from the video to be processed, for each first video frame, edge point information associated with the second initial region is obtained through an edge detection algorithm.
Referring to fig. 4, as a possible implementation manner of this embodiment, the determining at least one first video frame from a video to be processed and determining a first target area in the first video frame includes:
s41: determining at least one first video frame from the video to be processed, wherein the first video frame comprises a second initial area.
S42: and obtaining edge point information associated with the second initial area by an edge detection algorithm aiming at each first video frame.
S43: and obtaining a third initial area in the first video frame according to the edge point information.
S44: and obtaining a first edge point according to the edge point information, wherein the first edge point is an edge point of which the distance from the edge of the third initial area is less than a first distance threshold value.
S45: and determining a first target area in the first video frame according to the first edge point and the edge of the third initial area.
In this embodiment, the second initial region is, as at least a partial image area of the corresponding first video frame, the image region to be used for embedding the target picture before rectification. In some embodiments, the second initial region is the region corresponding to the embeddable target picture marked by the user in the first video frame; for example, 4 feature points are manually marked in the first video frame, and the second initial region is determined by these 4 feature points.
The third initial region is, as at least a partial image area of the corresponding first video frame, the image region obtained by correcting the boundary of the second initial region.
The edge point information describes the boundary corresponding to the second initial region in the first video frame; for example, it describes the contour line of the region into which the target picture can be embedded.
The first distance threshold is the maximum distance, determined as required, between a first edge point and the edge of the third initial region. Using the first distance threshold and the edge point information, every first edge point whose distance from the edge of the third initial region does not exceed the threshold can be determined, so that the edge of the third initial region can be fitted jointly with these first edge points to obtain a new edge of the third initial region. In the scheme of the present application, updating the edge of the third initial region in this way brings it closer to the true boundary of the image region in the first video frame into which the target picture can be embedded; that is, the boundary of the third initial region is rectified again.
It can be understood that when the edge detection algorithm detects the edge of the image region corresponding to the embeddable target picture in the first video frame, the edge point information describing that region is obtained; therefore, when the third initial region is determined from the edge point information, it may be larger or smaller than the second initial region.
In this embodiment, when at least one first video frame is determined from the video to be processed, the second initial region contained in each first video frame may be only a rough region, i.e., not yet an image region that fully satisfies the target picture embedding requirement. Therefore, for each first video frame, the edge point information associated with the second initial region is obtained through an edge detection algorithm; that is, the edge point information of the image region into which a target picture can be embedded is detected. A third initial region in the first video frame is then obtained from this edge point information, which rectifies the image region in the first video frame.
In addition, because the edge detection algorithm may itself carry a certain error, the located image region corresponding to the embeddable target picture may still not be accurate enough to satisfy the picture embedding requirement. Therefore, the first edge points, i.e., the edge points near the boundary of the third initial region, are determined from the edge point information, and a new edge of the third initial region is obtained by fitting the current edge together with the corresponding first edge points. This further refines the boundary of the image region into which the target picture can be embedded, improving the accuracy of the determined boundary.
The edge detection algorithm may be the Canny edge detection algorithm, a second-order edge detection algorithm, the Laplacian algorithm, or another edge detection algorithm. Preferably, the Canny edge detection algorithm is used for edge detection in the embodiments of the present application.
Illustratively, the second initial region contained in a first video frame is a rectangular region determined by manually labeling 4 feature points A, B, C, and D in sequence. Edge point information between every two adjacent feature points is then obtained through the edge detection algorithm; for example, the rectangular region consists of 4 sides (AB, BC, CD, and DA), the edge point information corresponding to each side is obtained, the longest edge line segment corresponding to each side is found, and the 4 edge line segments are extended to obtain 4 intersection points, which determine the third initial region. Further, according to the edge point information corresponding to each side, the first edge points whose distance from each edge of the third initial region is smaller than the first distance threshold are determined; straight-line fitting is performed on the first edge points and each edge of the third initial region, yielding 4 new straight lines and 4 new intersection points, which determine the first target region in the first video frame.
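For the extend-and-intersect step in this example, each corner of the region is the intersection of two straight lines. A small helper, assuming the point-plus-direction line representation produced by a least-squares line fit:

```python
import numpy as np

def line_intersection(p1, d1, p2, d2):
    """Intersect two lines, each given by a point p and a direction d.

    Solves p1 + t*d1 = p2 + s*d2; used four times to recover the four
    corner points of the third initial region (or the first target region).
    """
    A = np.array([[d1[0], -d2[0]],
                  [d1[1], -d2[1]]], dtype=float)
    b = np.array([p2[0] - p1[0], p2[1] - p1[1]], dtype=float)
    t, _ = np.linalg.solve(A, b)
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])
```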
In some embodiments, the first and second initial regions are the same, and the first initial region is different from the third and fourth initial regions.
For each first video frame included in each set of target video frames: the edge point information associated with the second initial region in the first video frame is obtained through an edge detection algorithm; a third initial region in the first video frame is obtained according to the edge point information; the coordinates of the target feature points of the first video frame are then determined according to the first edge points and the edge of the third initial region, and the first target area is determined according to those coordinates.
In one embodiment, the second initial area is a rectangular area.
When the edge point information associated with the second initial region is obtained through the edge detection algorithm for each first video frame, in order to obtain better edge point information, the edges of the first video frame in the vertical and horizontal directions are processed separately through the edge detection algorithm; that is, the vertical and horizontal edges of the second initial region can each be processed repeatedly, so that more edge point information associated with the second initial region is obtained. Further, from the edge point information, the longest edge corresponding to each side of the second initial region is obtained; the longest edges are extended to obtain the intersection point between every two adjacent sides, and the third initial region in the first video frame is determined from the resulting 4 intersection points, i.e., the determined third initial region is enclosed by 4 straight edges.
Further, in order to improve the accuracy of the first target area determined in the first video frame, for each straight edge of the third initial region, the first edge points whose distance from that edge is smaller than the first distance threshold are obtained, and straight-line fitting is performed on the first edge points and the edge according to a straight-line fitting algorithm to obtain a new straight edge. The intersection points of the four new straight edges are then obtained, i.e., four new intersection points, and the first target region is determined from these four new intersection points.
The first edge points and the edge of the third initial region may be fitted to a straight line using the least squares method. For the specific fitting process, reference may be made to least-squares line fitting schemes in the prior art, which are not described again here.
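A condensed sketch of the Canny-plus-least-squares refinement of one side of the region; the OpenCV calls are real, while the thresholds and the brute-force distance computation are illustrative choices, not the patent's:

```python
import cv2
import numpy as np

def refit_side(frame, side_pts, first_distance_threshold):
    """Refit one straight side of the third initial region (S44/S45).

    side_pts: (x, y) samples on the current straight side. Returns the
    fitted line as (vx, vy, x0, y0), matching cv2.fitLine's output.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)           # assumed Canny thresholds
    ys, xs = np.nonzero(edges)
    edge_pts = np.stack([xs, ys], axis=1).astype(np.float32)

    # First edge points: detected edge pixels whose distance from the
    # current side is below the first distance threshold (clear, not fast).
    side = np.asarray(side_pts, dtype=np.float32)
    dists = np.min(np.linalg.norm(edge_pts[:, None] - side[None], axis=2), axis=1)
    first_edge_pts = edge_pts[dists < first_distance_threshold]

    # Least-squares line fit over the side points plus the first edge points.
    pts = np.vstack([side, first_edge_pts])
    return cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01).ravel()
```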
Referring to fig. 5, as a possible implementation manner of this embodiment, the determining a first target area in the first video frame according to the first edge point and the edge of the third initial area includes:
s51: and determining a fourth initial area in the first video frame according to the first edge point and the edge of the third initial area.
S52: updating the first distance threshold after determining the fourth initial region.
S53: And according to the updated first distance threshold, re-executing the step of obtaining the first edge points according to the edge point information and the step of determining the fourth initial area in the first video frame according to the first edge points and the edge of the third initial area, until the number of re-executions reaches a preset number or the obtained fourth initial area meets a preset area condition; the obtained fourth initial area is then taken as the first target area in the first video frame.
In this embodiment, the fourth initial region is at least a part of the image region in the first video frame.
After the fourth initial area is determined, the first distance threshold is updated according to a preset update policy, which describes the process of updating the first distance threshold. For example, after the fourth initial region is determined in the first video frame, the first distance threshold is updated according to the policy "updated first distance threshold = first distance threshold / 1.5"; if the first distance threshold before updating is 3, the updated first distance threshold is 2.
In this embodiment, after the fourth initial region is determined, the first distance threshold is updated, and the step of obtaining the first edge points from the edge point information and the step of determining the fourth initial region in the first video frame are re-executed with the updated threshold. Updating the edge of the fourth initial region in this way brings it closer to the boundary of the image region in the first video frame into which the target picture can be embedded, i.e., improves the accuracy of the region determined for the target picture. The preset number is the maximum number of times, determined as required, that the step of obtaining the first edge points from the edge point information and the step of determining the fourth initial region from the first edge points and the edge of the third initial region may be re-executed; when the number of re-executions reaches the preset number, the operation of determining the fourth initial area stops.
The preset region condition describes when the fourth initial region can be used as the first target region in the first video frame: the difference between the updated first distance threshold and the first distance threshold before updating is less than or equal to a preset difference.
For example, if the first distance threshold before updating is 3 and the updated threshold is 2, their difference is 1; if the preset difference is set to 1, the fourth initial area determined in the next iteration can be used as the first target area in the first video frame, and the operation of updating the edge of the fourth initial area stops.
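Putting the update policy and the two stop conditions together, the refinement loop of S51 to S53 can be sketched as follows; `determine_region` is a hypothetical helper standing in for the fit-and-intersect step, and the default values are taken from, or assumed around, the examples above:

```python
def refine_first_target_area(edge_point_info, third_region,
                             first_distance_threshold=3.0,
                             preset_times=10, preset_difference=1.0):
    """Iteratively re-fit the region edge with a shrinking distance threshold.

    determine_region() is a hypothetical helper for the fit-and-intersect
    step (S51); preset_times=10 is assumed, 3.0 and 1.0 follow the examples.
    """
    region = third_region
    for _ in range(preset_times):                   # first stop condition
        region = determine_region(edge_point_info, region,
                                  first_distance_threshold)
        updated = first_distance_threshold / 1.5    # preset update policy
        # Preset region condition (second stop condition): the threshold
        # change has become no larger than the preset difference.
        if first_distance_threshold - updated <= preset_difference:
            break
        first_distance_threshold = updated
    return region                                   # first target area
```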
In some embodiments, the preset region condition is the same as the preset edge condition.
In some embodiments, the first and third initial regions are the same, and the first initial region is different from the second and fourth initial regions.
For each first video frame included in each set of target video frames: the edge point information associated with the second initial region in the first video frame is obtained through an edge detection algorithm; a third initial region in the first video frame is obtained according to the edge point information; the coordinates of the target feature points of the first video frame are then determined according to the first edge points and the edge of the third initial region, and the first target area is determined according to those coordinates.
In some embodiments, the first and fourth initial regions are the same, and the first initial region is different from the second and third initial regions.
And determining the coordinates of the target feature point of the first video frame according to the first edge point and the edge of the third initial region, and determining a fourth initial region according to the coordinates of the target feature point of the first video frame.
S12: For each first video frame, obtaining a second video frame according to a target picture and edge information of the first target area in the first video frame, wherein the second video frame comprises the target picture.
In step S12, the target picture is a picture selected by the user to be embedded into the video frames, for example, an advertisement picture to be embedded in the first target area of a first video frame.
In this embodiment, for each first video frame, the position and size of the embeddable target picture in the frame are determined according to the edge information of the first target area; the target picture is adaptively processed according to the determined position and size, and the processed target picture is embedded into the first target area of the first video frame to obtain the second video frame.
As a possible implementation manner of this embodiment, the obtaining, for each of the first video frames, a second video frame according to a target picture and edge information of a first target area in the first video frame includes:
and for each first video frame, performing target processing on the target picture according to the image effect information of the first target area in the first video frame to obtain a processed target picture, wherein the target processing is used for matching the image effect of the target picture with the image effect of the first target area.
And obtaining a second video frame corresponding to the first video frame according to the processed target picture and the edge information of the first target area in the first video frame.
In this embodiment, the image effect information describes the image effect of the first target region before the target picture is embedded into the first target region of the first video frame; for example, the first target region presents a clear, wrinkle-free image effect.
Therefore, to avoid the embedded region exhibiting an unnatural image effect, such as blur, wrinkles, or missing light and shadow, when the target picture is embedded directly into the first target region of the first video frame, target processing is performed on the target picture according to the image effect information of the first target region, so that the image effect of the target picture matches that of the first target region. When the second video frame is then obtained from the processed target picture and the edge information of the first target region, the second video frame looks more natural and without a sense of incongruity.
In some embodiments, after the first target region of the first video frame is determined, the fusion of the target picture and the first video frame may be accomplished using a Poisson fusion algorithm and a free-form field algorithm.
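A sketch of the embedding and fusion step using a perspective warp plus OpenCV's Poisson-based seamless cloning; the free-form field step mentioned above is omitted, and the function name, corner ordering, and parameters are assumptions:

```python
import cv2
import numpy as np

def embed_target_picture(first_frame, target_picture, corner_pts):
    """Warp the target picture into the first target area and Poisson-blend it.

    corner_pts: the 4 corner coordinates of the first target area, ordered
    top-left, top-right, bottom-right, bottom-left (an assumed convention).
    """
    h, w = target_picture.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = np.float32(corner_pts)
    M = cv2.getPerspectiveTransform(src, dst)
    size = (first_frame.shape[1], first_frame.shape[0])
    warped = cv2.warpPerspective(target_picture, M, size)
    mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), M, size)
    x, y, bw, bh = cv2.boundingRect(mask)
    center = (x + bw // 2, y + bh // 2)
    # Poisson fusion of the warped picture into the frame (second video frame).
    return cv2.seamlessClone(warped, first_frame, mask, center, cv2.NORMAL_CLONE)
```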
S13: and obtaining a target video according to each second video frame.
In one embodiment, the target picture may be an advertisement picture.
With this scheme, advertisement scenes can be automatically extracted from the video to be processed: at least one first video frame into which an advertisement picture can be embedded is extracted, the first target area in the frame is determined and used as the advertisement scene; second video frames are then obtained according to the target picture and the edge information of the first target area, and the target video is obtained according to each second video frame, completing the embedding of the advertisement picture into the video. In addition, this embodiment can be applied to a variety of occasions, such as in-video advertisement insertion, post-production advertisement placement, and product placement, and therefore has a wide application range.
The video processing method provided by the embodiments of the present application determines, from a video to be processed, at least one first video frame containing a first target area, and obtains, for each first video frame, a second video frame according to a target picture and the edge information of the first target area in the frame. The first target area can be an area that is expected to be usable for fusing the target picture, so a second video frame containing the target picture can be obtained based on the edge information of the first target area in the first video frame; finally, the target video is obtained according to each second video frame. Because the target picture is embedded into the target area of each first video frame of the target video, the information in the target picture can be pushed to the user synchronously while the target video is played; the user does not need to wait for the pushing of the target picture to finish before watching the video, which better improves the user's viewing experience.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
Fig. 6 shows a block diagram of a video processing apparatus provided in an embodiment of the present application, corresponding to the video processing method described in the above embodiments; for convenience of description, only the parts relevant to this embodiment are shown.
Referring to fig. 6, the apparatus 100 includes:
a determining module 101, configured to determine at least one first video frame from a video to be processed, and determine a first target area in the first video frame, where an area edge corresponding to the first target area meets a preset edge condition;
a first obtaining module 102, configured to, for each first video frame, obtain a second video frame according to a target picture and edge information of a first target area in the first video frame, where the second video frame includes the target picture;
and the second obtaining module 103 is configured to obtain a target video according to each of the second video frames.
Optionally, the determining module 101 is further configured to: determine at least one key frame from the video frames of the video to be processed; for each key frame, input the key frame into a preset detection model for detection to obtain a detection result corresponding to the key frame; and, if the key frame is determined according to the detection result to contain an image area meeting the preset condition, take the key frame as the first video frame and the image area meeting the preset condition in the key frame as the first target area.
Optionally, the determining module 101 is further configured to determine a sampling rate according to the frame rate of the video to be processed, and to sample the video to be processed at that sampling rate to obtain at least one key frame.
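As a concrete illustration of this sampling step, the following Python sketch (using OpenCV) derives the sampling step from the frame rate. The policy of one key frame per second is an assumption, since the patent does not fix the rule, and all names are illustrative.

```python
import cv2

def sample_key_frames(video_path):
    """Sample key frames at a rate derived from the video's frame rate."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unreported
    step = max(1, round(fps))                # assumed policy: one key frame per second
    key_frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            key_frames.append((index, frame))
        index += 1
    cap.release()
    return key_frames
```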
Optionally, the determining module 101 is further configured to: determine at least one first video frame from the video to be processed; if at least two first video frames are consecutive video frames in the video to be processed, take the at least two first video frames as a target video frame set; for each target video frame set, determine first feature points of the leading first video frame of the set, where the first feature points identify a first initial area in the leading first video frame; obtain, through a visual tracking algorithm, second feature points corresponding to the first feature points in the other first video frames, where the other first video frames are the first video frames in the target video frame set other than the leading first video frame; for each first feature point, determine the coordinates of the corresponding target feature point according to the coordinates of the first feature point in the leading first video frame and the coordinates of the corresponding second feature points in the other first video frames; and determine the first target area of each first video frame in the target video frame set according to the coordinates of the target feature points.
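The patent does not name the visual tracking algorithm; the sketch below uses pyramidal Lucas-Kanade optical flow (cv2.calcOpticalFlowPyrLK) as one common choice, tracking the feature points that identify the first initial area from the leading frame into the remaining frames of the set. Names and the corner-point representation are illustrative.

```python
import cv2
import numpy as np

def track_region_points(frames, points):
    """Propagate the feature points of a first initial area through a set of
    consecutive first video frames (frames[0] is the leading frame)."""
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    pts = np.asarray(points, dtype=np.float32).reshape(-1, 1, 2)
    per_frame = [pts.reshape(-1, 2).copy()]
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Pyramidal Lucas-Kanade optical flow: one common visual tracking choice.
        nxt, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        per_frame.append(nxt.reshape(-1, 2).copy())
        pts, prev_gray = nxt, gray
    return per_frame
```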
Optionally, the determining module 101 is further configured to: determine at least one first video frame from the video to be processed, where the first video frame contains a second initial area; for each first video frame, obtain edge point information associated with the second initial area through an edge detection algorithm; obtain a third initial area in the first video frame according to the edge point information; obtain first edge points according to the edge point information, where a first edge point is an edge point whose distance from the edge of the third initial area is less than a first distance threshold; and determine the first target area in the first video frame according to the first edge points and the edge of the third initial area.
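As an illustration of this step, the sketch below uses Canny as the (unnamed) edge detection algorithm and a distance transform to keep only edge points closer to the region edge than the first distance threshold; the Canny thresholds and the default distance value are assumptions.

```python
import cv2
import numpy as np

def first_edge_points(frame_gray, region_mask, first_distance_threshold=5.0):
    """Collect edge points whose distance to the region edge is below the
    first distance threshold."""
    edges = cv2.Canny(frame_gray, 50, 150)  # assumed thresholds
    # Region edge as a morphological gradient of the mask, then the distance
    # of every pixel to that edge via a distance transform.
    edge_of_region = cv2.morphologyEx(region_mask, cv2.MORPH_GRADIENT,
                                      np.ones((3, 3), np.uint8))
    dist_to_edge = cv2.distanceTransform(cv2.bitwise_not(edge_of_region),
                                         cv2.DIST_L2, 3)
    ys, xs = np.nonzero((edges > 0) & (dist_to_edge < first_distance_threshold))
    return np.stack([xs, ys], axis=1)  # (N, 2) points as (x, y)
```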
Optionally, the determining module 101 is further configured to: determine a fourth initial area in the first video frame according to the first edge points and the edge of the third initial area; update the first distance threshold after the fourth initial area is determined; and, according to the updated first distance threshold, re-execute the step of obtaining the first edge points from the edge point information and the step of determining the fourth initial area from the first edge points and the edge of the third initial area, until the number of repetitions reaches a preset number or the obtained fourth initial area meets a preset area condition, at which point the obtained fourth initial area is taken as the first target area in the first video frame.
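A minimal sketch of the iterative refinement, reusing first_edge_points from the previous sketch; the halving update rule for the threshold, the pass limit, and the convex-hull merge of new edge points into the area are all assumptions, since the patent only requires that the threshold be updated and the steps re-executed.

```python
import cv2
import numpy as np

def refine_target_area(frame_gray, region_mask, threshold=8.0, max_passes=3):
    """Iteratively re-collect first edge points under a shrinking threshold."""
    for _ in range(max_passes):
        points = first_edge_points(frame_gray, region_mask, threshold)
        if len(points) > 0:
            # Merge the found edge points into the area: here the convex hull
            # of the old mask plus the new points (an assumed merge rule).
            ys, xs = np.nonzero(region_mask)
            all_pts = np.concatenate(
                [np.stack([xs, ys], axis=1), points]).astype(np.int32)
            hull = cv2.convexHull(all_pts)
            region_mask = np.zeros_like(region_mask)
            cv2.fillConvexPoly(region_mask, hull, 255)
        threshold /= 2.0   # assumed update rule for the first distance threshold
        if threshold < 1.0:  # stand-in for the preset area condition
            break
    return region_mask
```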
Optionally, the first obtaining module 102 is configured to: for each first video frame, perform target processing on the target picture according to the image effect information of the first target area in the first video frame to obtain a processed target picture, where the target processing matches the image effect of the target picture to that of the first target area; and obtain the second video frame corresponding to the first video frame according to the processed target picture and the edge information of the first target area.
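The patent does not specify how the image effects are matched; the sketch below illustrates one plausible instance of the target processing, blurring the target picture until its Laplacian variance (a common sharpness proxy) no longer exceeds that of the first target area. Function names and the kernel schedule are assumptions.

```python
import cv2

def match_sharpness(target_picture, region_pixels):
    """Blur the target picture until its sharpness does not exceed
    that of the first target area."""
    region_sharpness = cv2.Laplacian(region_pixels, cv2.CV_64F).var()
    processed = target_picture
    for k in (3, 5, 7, 9, 11):  # progressively stronger Gaussian blur
        if cv2.Laplacian(processed, cv2.CV_64F).var() <= region_sharpness:
            break
        processed = cv2.GaussianBlur(target_picture, (k, k), 0)
    return processed
```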
The video processing apparatus provided in this embodiment is configured to implement the video processing method of any of the method embodiments; for the functions of each module, reference may be made to the corresponding descriptions in the method embodiments. The implementation principle and technical effect are similar and are not repeated here.
Fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 7, the terminal device 7 of this embodiment includes: at least one processor 70 (only one processor is shown in fig. 7), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70, the processor 70 implementing the steps in any of the various video processing method embodiments described above when executing the computer program 72.
The terminal device 7 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will appreciate that fig. 7 is only an example of the terminal device 7 and does not constitute a limitation on it: the device may include more or fewer components than shown, combine some components, or use different components, and may further include input/output devices, network access devices, and the like.
The processor 70 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 71 may, in some embodiments, be an internal storage unit of the terminal device 7, such as a hard disk or internal memory of the terminal device 7. In other embodiments, the memory 71 may be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used to store the operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program, and may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application further provide a computer program product which, when run on a terminal device, causes the terminal device to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code to the apparatus/terminal device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random-Access Memory (RAM), an electrical carrier signal, a telecommunications signal, or a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, a computer-readable medium may not include electrical carrier signals or telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A video processing method, comprising:
determining at least one first video frame from a video to be processed, and determining a first target area in the first video frame, wherein an area edge corresponding to the first target area meets a preset edge condition;
for each first video frame, obtaining a second video frame according to a target picture and edge information of a first target area in the first video frame, wherein the second video frame comprises the target picture;
and obtaining a target video according to each second video frame.
2. The video processing method of claim 1, wherein said determining at least one first video frame from the video to be processed and determining a first target area in the first video frame comprises:
determining at least one key frame from the video frames of the video to be processed;
for each key frame, inputting the key frame into a preset detection model for detection to obtain a detection result corresponding to the key frame;
and if the key frame is determined, according to the detection result, to contain an image area meeting the preset condition, taking the key frame as the first video frame, and taking the image area meeting the preset condition in the key frame as the first target area.
3. The video processing method according to claim 2, wherein said determining at least one key frame from among the video frames of the video to be processed comprises:
determining a sampling rate according to the frame rate of the video to be processed;
and sampling the video to be processed according to the sampling rate to obtain at least one frame of the key frame.
4. The video processing method of claim 1, wherein said determining at least one first video frame from the video to be processed and determining a first target area in the first video frame comprises:
determining at least one first video frame from the video to be processed;
if at least two first video frames are consecutive video frames in the video to be processed, taking the at least two first video frames as a target video frame set;
for each target video frame set, determining a first feature point of the leading first video frame of the set, wherein the first feature point is used to identify a first initial area in the leading first video frame;
obtaining, through a visual tracking algorithm, second feature points respectively corresponding to the first feature points in other first video frames, wherein the other first video frames are the first video frames in the target video frame set other than the leading first video frame;
for each first feature point, determining coordinates of a target feature point corresponding to the first feature point according to the coordinates of the first feature point in the leading first video frame and the coordinates of the second feature point corresponding to the first feature point in the corresponding other first video frames;
and determining a first target area of each first video frame in the target video frame set according to the coordinates of the target feature point corresponding to each first feature point.
5. The video processing method of claim 1, wherein said determining at least one first video frame from the video to be processed and determining a first target area in the first video frame comprises:
determining at least one first video frame from a video to be processed, wherein the first video frame comprises a second initial area;
for each first video frame, obtaining edge point information associated with the second initial area through an edge detection algorithm;
obtaining a third initial area in the first video frame according to the edge point information;
obtaining a first edge point according to the edge point information, wherein the first edge point is an edge point whose distance from the edge of the third initial area is less than a first distance threshold;
and determining a first target area in the first video frame according to the first edge point and the edge of the third initial area.
6. The video processing method of claim 5, wherein said determining a first target region in the first video frame based on the first edge point and the edge of the third initial region comprises:
determining a fourth initial area in the first video frame according to the first edge point and the edge of the third initial area;
after determining the fourth initial region, updating the first distance threshold;
and, according to the updated first distance threshold, re-executing the step of obtaining the first edge point according to the edge point information and the step of determining the fourth initial area in the first video frame according to the first edge point and the edge of the third initial area, until the number of repetitions reaches a preset number or the obtained fourth initial area meets a preset area condition, and taking the fourth initial area obtained at that point as the first target area in the first video frame.
7. The video processing method according to any of claims 1 to 6, wherein said obtaining, for each of the first video frames, a second video frame according to a target picture and edge information of a first target area in the first video frame comprises:
for each first video frame, performing target processing on the target picture according to image effect information of a first target area in the first video frame to obtain a processed target picture, wherein the target processing is used for enabling the image effect of the target picture to be matched with the image effect of the first target area;
and obtaining a second video frame corresponding to the first video frame according to the processed target picture and the edge information of the first target area in the first video frame.
8. A video processing apparatus, comprising:
a determining module, configured to determine at least one first video frame from a video to be processed and determine a first target area in the first video frame, wherein an area edge corresponding to the first target area meets a preset edge condition;
a first obtaining module, configured to obtain, for each of the first video frames, a second video frame according to a target picture and edge information of a first target area in the first video frame, where the second video frame includes the target picture;
and a second obtaining module, configured to obtain a target video according to each of the second video frames.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the video processing method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the video processing method according to any one of claims 1 to 7.
CN202110441021.7A 2021-04-23 2021-04-23 Video processing method and device, terminal equipment and storage medium Pending CN113259713A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110441021.7A CN113259713A (en) 2021-04-23 2021-04-23 Video processing method and device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113259713A (en) 2021-08-13

Family

ID=77221392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110441021.7A Pending CN113259713A (en) 2021-04-23 2021-04-23 Video processing method and device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113259713A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104822069A (en) * 2015-04-30 2015-08-05 北京奇艺世纪科技有限公司 Image information detection method and apparatus
CN106331745A (en) * 2016-08-31 2017-01-11 杭州探索文化传媒有限公司 Dynamic video advertisement implanting method
CN110225389A (en) * 2019-06-20 2019-09-10 北京小度互娱科技有限公司 The method for being inserted into advertisement in video, device and medium
CN110381369A (en) * 2019-07-19 2019-10-25 腾讯科技(深圳)有限公司 Determination method, apparatus, equipment and the storage medium of recommendation information implantation position
CN111179315A (en) * 2019-12-31 2020-05-19 湖南快乐阳光互动娱乐传媒有限公司 Video target area tracking method and video plane advertisement implanting method
CN111372122A (en) * 2020-02-27 2020-07-03 腾讯科技(深圳)有限公司 Media content implantation method, model training method and related device
CN111556336A (en) * 2020-05-12 2020-08-18 腾讯科技(深圳)有限公司 Multimedia file processing method, device, terminal equipment and medium
CN111556338A (en) * 2020-05-25 2020-08-18 腾讯科技(深圳)有限公司 Method for detecting region in video, method and device for fusing information and storage medium

Similar Documents

Publication Publication Date Title
US11595737B2 (en) Method for embedding advertisement in video and computer device
US10929648B2 (en) Apparatus and method for data processing
CN108921782B (en) Image processing method, device and storage medium
US20190335155A1 (en) Image Processing Method and Apparatus
US9396569B2 (en) Digital image manipulation
CN110300316B (en) Method and device for implanting push information into video, electronic equipment and storage medium
CN104066003B (en) Method and device for playing advertisement in video
CN109753971B (en) Correction method and device for distorted text lines, character recognition method and device
CN110033463B (en) Foreground data generation and application method thereof, and related device and system
CN111145135B (en) Image descrambling processing method, device, equipment and storage medium
CN111556336B (en) Multimedia file processing method, device, terminal equipment and medium
WO2021169396A1 (en) Media content placement method and related device
CN109712082B (en) Method and device for collaboratively repairing picture
CN110781823B (en) Screen recording detection method and device, readable medium and electronic equipment
CN110837781B (en) Face recognition method, face recognition device and electronic equipment
CN108805838B (en) Image processing method, mobile terminal and computer readable storage medium
CN116761037B (en) Method, device, equipment and medium for video implantation of multimedia information
CN107770487B (en) Feature extraction and optimization method, system and terminal equipment
CN112712571B (en) Object plane mapping method, device and equipment based on video
CN113259713A (en) Video processing method and device, terminal equipment and storage medium
US20230368324A1 (en) Method and apparatus of embedding image in video, and method and apparatus of acquiring plane prediction model
CN113706553B (en) Image processing method and device and electronic equipment
CN116805957A (en) Video frame rate improving method and device
CN117746175A (en) Model training method, image processing method, device and equipment
AU768455B2 (en) A method for analysing apparent motion in digital video sequences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210813)