CN110290425B - Video processing method, device and storage medium

Video processing method, device and storage medium

Info

Publication number
CN110290425B
CN110290425B CN201910691577.4A CN201910691577A CN110290425B CN 110290425 B CN110290425 B CN 110290425B CN 201910691577 A CN201910691577 A CN 201910691577A CN 110290425 B CN110290425 B CN 110290425B
Authority
CN
China
Prior art keywords
video
foreground
background
target
video frame
Prior art date
Legal status
Active
Application number
CN201910691577.4A
Other languages
Chinese (zh)
Other versions
CN110290425A (en)
Inventor
段聪
吴江红
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910691577.4A
Publication of CN110290425A (application publication)
Application granted; publication of CN110290425B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/23424 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N21/47205 End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N21/8547 Content authoring involving timestamps for synchronizing content
    • H04N5/2628 Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation
    • H04N5/265 Mixing
    • H04N5/272 Means for inserting a foreground image in a background image, i.e. inlay, outlay

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Studio Circuits (AREA)

Abstract

The invention provides a video processing method, apparatus, and storage medium. The method includes: acquiring a target video; in response to a segmentation operation for a target object in the target video, acquiring from the target video a foreground video in which the target object is the foreground, the foreground video comprising at least one foreground video frame; acquiring a background video comprising at least one background video frame; and in response to a compositing operation for the foreground video and the background video, superimposing the foreground video frames of the foreground video with the background video frames of the background video and encapsulating the superimposed video frames into a composite video. The invention enables dynamic videos to be composited.

Description

Video processing method, device and storage medium
Technical Field
The present application relates to multimedia technologies, and in particular, to a video processing method and apparatus, and a storage medium.
Background
With the continued development of communication and the mobile internet, the era dominated by text and pictures has passed: webcast live streaming and short-video services have grown rapidly, the emergence of various video applications has greatly lowered the threshold for making videos, and more and more users have begun to participate in video creation.
However, the video creation schemes in the related art can only composite a static object into a template video; they cannot composite dynamic videos.
Disclosure of Invention
The embodiments of the invention provide a video processing method, apparatus, and storage medium capable of compositing dynamic videos.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides a video processing method, which comprises the following steps:
acquiring a target video;
in response to a segmentation operation for a target object in the target video, acquiring from the target video a foreground video in which the target object is the foreground; the foreground video comprises at least one foreground video frame;
acquiring a background video; the background video comprises at least one background video frame;
in response to a compositing operation for the foreground video and the background video, superimposing foreground video frames of the foreground video with background video frames of the background video, and
encapsulating the superimposed video frames into a composite video.
An embodiment of the present invention provides a video processing apparatus, including:
a first acquisition unit configured to acquire a target video;
a segmentation unit, configured to, in response to a segmentation operation for a target object in the target video, obtain a foreground video with the target object as a foreground from the target video; the foreground video comprises at least one foreground video frame;
a second obtaining unit, configured to obtain a background video; the background video comprises at least one background video frame;
a synthesizing unit, configured to, in response to a compositing operation for the foreground video and the background video, superimpose foreground video frames of the foreground video with background video frames of the background video, and
encapsulate the superimposed video frames into a composite video.
In the foregoing solution, the segmentation unit is further configured to:
receiving a batch segmentation operation for at least two target videos;
and in response to the batch segmentation operation, acquiring from each target video a video clip in which the target object is the foreground, and determining the video clip as the corresponding foreground video.
In the foregoing solution, the synthesizing unit is further configured to:
receiving a batch compositing operation for the at least two foreground videos and the background video;
and in response to the batch compositing operation, respectively superimposing the foreground video frames of the at least two foreground videos onto the background video frames of the background video.
In the foregoing solution, the second obtaining unit is further configured to:
loading a video selection window displaying an alternative background video;
receiving a video selection operation for the video selection window;
and acquiring the background video selected by the video selection operation.
In the above solution, the apparatus further comprises: a preview unit configured to:
presenting an overlay effect of the foreground video frames and the background video frames in response to a preview operation for the foreground video and the background video.
In the foregoing solution, the segmentation unit is further configured to:
identifying, from a video frame of the target video, a target area where the target object is located, and transparentizing the area outside the target area in the video frame;
and encapsulating the transparentized video frames into the foreground video.
In the foregoing solution, the segmentation unit is further configured to:
identifying a target area where the target object is located in a video frame of the target video, and obtaining an image matrix corresponding to the video frame of the target video according to the target area; elements in the image matrix respectively represent the probability that the pixels of the corresponding video frame belong to the target area;
and performing mask processing on the image matrix and the corresponding video frame to transparentize the region except the target region in the video frame.
In the foregoing solution, the synthesizing unit is further configured to:
acquiring the timestamp alignment relationship between the foreground video frames and the background video frames;
and superimposing the foreground video frames of the foreground video onto the background video frames of the background video that correspond to them under the timestamp alignment relationship.
In the foregoing solution, the synthesizing unit is further configured to:
in response to an editing operation that sets synthesis parameters for the foreground video and the background video, overlaying the foreground video frame on the background video frame, where the coverage area of the foreground video frame in the background video frame conforms to the set synthesis parameters.
In the foregoing solution, the synthesizing unit is further configured to:
constructing an initial matrix with the same size as the foreground video frame;
and adjusting elements in the initial matrix according to the editing operation to obtain a target matrix representing the variation of the set synthesis parameters.
In the foregoing solution, the synthesizing unit is further configured to:
multiplying the target matrix with a foreground video frame in the foreground video to obtain an adjusted foreground video frame;
and covering the background video frame with the adjusted foreground video frame.
An embodiment of the present invention provides a video processing apparatus, including:
a memory for storing executable instructions;
and the processor is used for realizing the video processing method provided by the embodiment of the invention when executing the executable instructions stored in the memory.
The embodiment of the invention provides a storage medium, which stores executable instructions and is used for causing a processor to execute the executable instructions so as to realize the video processing method provided by the embodiment of the invention.
In the embodiments of the invention, a foreground video in which the target object is the foreground is segmented from the target video, and the video frames obtained by superimposing the foreground video frames of the segmented foreground video with the background video frames of a background video are encapsulated into a composite video. A new video is thus synthesized, based on the content of the videos, with the target object of the target video as the foreground and the frames of the background video as the background, yielding a dynamic video with coordinated picture content.
Drawings
FIG. 1 is an alternative architectural diagram of a video processing system architecture provided by embodiments of the present invention;
fig. 2 is a schematic diagram of an alternative structure of a video processing apparatus according to an embodiment of the present invention;
fig. 3 is a schematic flow chart of an alternative video processing method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an alternative display interface provided by an embodiment of the invention;
FIG. 5A is a schematic diagram of an alternative overlay effect provided by an embodiment of the present invention;
FIG. 5B is a schematic diagram of an alternative overlay effect provided by an embodiment of the present invention;
fig. 6 is a schematic flow chart of an alternative video processing method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an alternative training sample provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of an alternative editing interface provided by embodiments of the present invention;
FIG. 9 is a schematic diagram of an alternative editing interface provided by embodiments of the present invention;
fig. 10 is a schematic diagram of an alternative coding and decoding architecture of a video encoder according to an embodiment of the present invention;
fig. 11 is a schematic flow chart of an alternative video processing method in the related art;
fig. 12 is a diagram illustrating a composite effect of a video processing method in the related art;
fig. 13 is a schematic flow chart of an alternative video processing method in the related art;
fig. 14 is a schematic flow chart of an alternative video processing method according to an embodiment of the present invention;
fig. 15 is a schematic flow chart of an alternative video processing method according to an embodiment of the present invention;
fig. 16 is a schematic diagram of an alternative display interface provided by the embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described below in further detail with reference to the accompanying drawings. The described embodiments should not be construed as limiting the invention, and all other embodiments obtainable by a person skilled in the art without creative effort fall within the protection scope of the invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before the embodiments of the invention are described in further detail, the terms and expressions used in the embodiments are explained; the following interpretations apply to them.
1) Background: the scene behind the subject in a video picture, which can represent the spatiotemporal environment of a person or event, e.g., a building, wall, or floor behind a person.
2) Foreground: content in a video frame that is closer to the camera than the background; it is the subject of the video presentation, e.g., a person standing in front of a building.
3) Target video: the video from which the foreground video is extracted during video compositing.
4) Background video: the video used as the background during video compositing.
5) Superimposing: compositing a new image by taking partial areas of one (or more) images as the foreground and another image as the background. For example: a certain area of image A is composited with image B to obtain image C. Here, an image may be a video frame of a video.
6) Mask: an image matrix used to mask (some or all) pixels of an image to be processed so as to highlight a specific part of the image. The mask may be a two-dimensional matrix array, and sometimes a multi-valued matrix.
7) Mask processing: masking out (e.g., transparentizing) certain regions of an image based on a mask. Each pixel of the image is ANDed with the binary number (also called the mask bit) at the same location in the mask, e.g., 1 & 1 = 1; 1 & 0 = 0.
8) Encapsulating: converting a number of video frames into a video file based on a certain frame rate and video format. The frame rate is the number of frames per second, e.g., 25 frames per second (fps) or 60 fps. Video formats include the Matroska Multimedia Container (MKV), Audio Video Interleaved (AVI), Moving Picture Experts Group (MPEG)-4, and other video file formats.
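As an illustration of definitions 5)–7), the following Java sketch applies a binary mask to a tiny grayscale image; the values and names are invented for the example and do not come from the patent:

```java
public class MaskDemo {
    public static void main(String[] args) {
        // A 2x2 grayscale "image" and a binary mask of the same size (hypothetical values).
        int[][] image = {{200, 50}, {120, 255}};
        int[][] mask  = {{1, 0}, {0, 1}};

        for (int y = 0; y < image.length; y++) {
            for (int x = 0; x < image[y].length; x++) {
                // As in definition 7): 1 & 1 = 1 keeps the pixel, 1 & 0 = 0 masks it out.
                image[y][x] = (mask[y][x] == 1) ? image[y][x] : 0;
            }
        }
        // image is now {{200, 0}, {0, 255}}: only the masked-in pixels survive.
    }
}
```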
First, a technical solution for video synthesis in the related art is explained.
Technical scheme 1) Compositing a static image with a dynamic video
AI segmentation is performed on a static image to segment the region corresponding to the target object, and the segmented region, as the foreground, is fused with a background video, as the background, to obtain a composite video. Here, the fused objects are a static image and a dynamic video, so the target object in the composite video is static; that is, the target object is still in every video frame of the composite video.
Technical scheme 2) Splicing whole video pictures
The video frames of two videos are spliced side by side into a larger video; the pictures of the combined video are not fused in content, and the backgrounds of the videos are not processed.
In view of the problems of the foregoing technical schemes, the embodiments of the invention provide a video processing method in which the target object is segmented from the target video, and the video frames obtained by combining the foreground video frames of the segmented foreground video with the background video frames of a background video are encapsulated into a composite video, so that a dynamic video is composited based on the content of the videos and a dynamic video with coordinated picture content is obtained.
An exemplary application of a video processing apparatus implementing an embodiment of the invention is described below. The video processing apparatus provided by the embodiments may be integrated in electronic devices of various forms, such as mobile terminals with wireless communication capability (e.g., mobile phones, tablet computers, and notebook computers) or desktop computers. The electronic device may also be implemented as a server or as a server cluster comprising a plurality of servers; no limitation is imposed here.
Referring to fig. 1, fig. 1 is an alternative architecture diagram of a video processing system 100 according to an embodiment of the invention. A terminal 400 is connected to a server 200 through a network 300; the network 300 may be a wide area network, a local area network, or a combination of the two, and uses wireless links for data transmission. A video processing application runs in the terminal 400 and provides an interface 410 to receive the user's operations related to compositing video.
In an exemplary application in which the video processing apparatus provided by the embodiments of the invention is provided in the server 200, when the terminal 400 needs to composite a video, the target video and the background video may be videos recorded by the terminal; the terminal 400 may send them to the server 200 to request video compositing. After receiving the target video and the background video, the server 200 segments the target object of the target video using the video processing method provided by the embodiments of the invention, superimposes the foreground video frames of the segmented foreground video (as the foreground) with the background video frames of the background video (as the background), encapsulates the superimposed video frames into a composite video, and finally sends the encapsulated composite video to the terminal 400.
For example: as shown in fig. 1, the target video is a video 101, the background video is a video 102, the terminal 400 sends the video 101 and the video 102 to the server 200, the server 200 extracts a foreground video 104 taking a portrait 103 as a foreground from the video 101, and superimposes a foreground video frame 1041 (including 1041-1 to 1041-n) of the foreground video 104 and a background video frame 1021 (including 1021-1 to 1021-n) of the background video 102 respectively to obtain a video frame 1051 (including 1051-1 to 1051-n) of the composite video 105, where n is an integer greater than 1.
In still another exemplary application in which the video processing apparatus provided by the embodiment of the present invention is provided in the server 200, when the terminal 400 needs to synthesize a video, the identification information of the target video and the background video may be transmitted to the server 200. The server 200 determines the corresponding target video and the background video based on the received identification information, and by using the video processing method provided by the embodiment of the present invention, the target object in the target video is divided, the divided foreground video is used as the foreground, the background video is used as the background, the foreground video frame in the foreground video and the background video frame in the background video are overlapped, the overlapped video frames are encapsulated to obtain the composite video, and finally, the encapsulated video is sent to the terminal 400. The terminal 400 may release the synthesized video.
In an example in which the terminal 400 is used as an electronic device, the target video and the background video may be video files already packaged in the terminal 400, and the terminal 400 superimposes a foreground video frame in the foreground video and a background video frame in the background video by using the method for video processing provided in the embodiment of the present invention, and packages the superimposed video frames to obtain a synthesized video file.
The above describes, by way of example, the video processing apparatus being provided in the server and in the terminal respectively. It can be understood that the video processing apparatus provided by the embodiments may also be distributed across the terminal and the server, which then cooperate to complete the video processing method provided by the embodiments of the invention.
It should be noted that, in the embodiment of the present invention, the types of the target video and the background video may be the same or different. Such as: the target video and the background video are both packaged video files. For another example: the target video is a video stream, and the background video is an encapsulated video file.
The video processing apparatus provided by the embodiment of the present invention may be implemented as hardware, software, or a combination of hardware and software.
As an example of a software implementation, the video processing apparatus may include one or more software modules, which are used to implement the video processing method provided by the embodiment of the present invention, individually or cooperatively, and the software modules may adopt various programming languages of various front-end or back-end.
As an example of a hardware implementation, the video processing apparatus may include one or more hardware modules, which may employ a hardware decoder such as an Application-Specific Integrated Circuit (ASIC), a Complex Programmable Logic Device (CPLD), or a Field-Programmable Gate Array (FPGA), programmed to implement, individually or cooperatively, the video processing method provided by the embodiments of the invention.
An exemplary implementation of the video processing apparatus provided in the embodiments of the present invention is described below by combining software and hardware.
Referring to fig. 2, fig. 2 is a schematic diagram of an alternative structure of the video processing apparatus 20 according to an embodiment of the invention. Other exemplary structures of the video processing apparatus 20 can be foreseen from the structure shown in fig. 2, so the structure described here should not be considered limiting; for example, some of the components described below may be omitted, or components not described below may be added to meet the special requirements of certain applications.
The video processing apparatus 20 shown in fig. 2 includes: at least one processor 210, memory 240, at least one network interface 220, and a user interface 230. The various components in the video processing device 20 are coupled together by a bus system 250. It will be appreciated that the bus system 250 is used to enable communications among the components. The bus system 250 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are designated as bus system 250 in FIG. 2.
The memory 240 may be either volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The non-volatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 240 described in connection with embodiments of the present invention is intended to comprise any suitable type of memory.
The memory 240 in the embodiment of the present invention can store data to support the operation of the server 200. Examples of such data include: any computer program for operating on video processing device 20, such as an operating system and application programs. The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application program may include various application programs.
As an example of the video processing method provided by the embodiment of the present invention implemented by combining software and hardware, the method provided by the embodiment of the present invention may be directly embodied as a combination of software modules executed by the processor 210, where the software modules may be located in a storage medium located in the memory 240, and the processor 210 reads executable instructions included in the software modules in the memory 240, and completes the video processing method provided by the embodiment of the present invention in combination with necessary hardware (for example, including the processor 210 and other components connected to the bus 250).
A video processing method implementing an embodiment of the present invention will be described in connection with the foregoing exemplary application and implementation of a video processing apparatus implementing an embodiment of the present invention. It is to be understood that the video processing method shown in fig. 3 may be executed by various electronic devices, such as a terminal or a server, or alternatively, by the terminal and the server in cooperation.
Referring to fig. 3, fig. 3 is an alternative flow chart of a video processing method according to an embodiment of the present invention, which will be described with reference to the steps shown in fig. 3.
Step S301, a target video is acquired.
The target video may be an already packaged video file. The target video may also be a video stream, such as: streaming media data of live video.
The number of target videos may be one or more.
A video processing application runs on the terminal. A target video selection window is provided in the application, and identification information of candidate target videos, such as video thumbnails and video names, is presented in the window. The terminal receives the user's selection operation and takes the video corresponding to the selected identification information as the target video. When the electronic device is a server, the video processing application running on the terminal is a client of the server.
The selected target video can be presented on the terminal so that the user can preview it and confirm whether it is the desired target video; if not, the target video can be reselected through the target video selection window.
For example, the target video selection window may be as shown as window 401 in fig. 4. Icon 401, icon 402, and icon 403 in window 401 are icons of candidate target videos; when a selection operation selects icon 402, the video corresponding to icon 402 becomes the target video. Window 401 further includes a "more" option 404 for presenting more candidate target videos; when option 404 receives the user's touch operation, identification information of more candidate target videos is presented. After the target video is acquired, window 405 may serve as a preview window in which a picture of the target video is presented.
Step S302, in response to a segmentation operation for a target object in the target video, a foreground video with the target object as a foreground is acquired from the target video.
A segmentation portal for receiving a segmentation operation for a target object may be loaded on the terminal. The terminal can generate a segmentation instruction for acquiring a foreground video taking the target object as a foreground from the target video based on the received segmentation operation.
In an example, when the electronic device is a terminal, the terminal locally acquires a foreground video from a target video based on the segmentation instruction.
In another example, when the electronic device is a server, the terminal sends the segmentation instruction to the server, and the server acquires the foreground video from the target video based on the received segmentation instruction.
The electronic device may invoke an interface of a video encoder based on the segmentation instruction, input the target video into the video encoder through the invoked interface, and decompose the target video into video frames through the video encoder. The electronic device performs image recognition on each video frame, identifies the target object in it, and obtains, based on the area where the target object is located, the foreground video frames that make up the foreground video. The target object may be an object in the foreground of a video frame of the target video, such as a person or an animal. The video frames that make up the foreground video are herein referred to as foreground video frames; the foreground video includes at least one foreground video frame.
When identifying, from the target video, a video in which the target object is the foreground, the target object in the video frames of the target video can be identified in at least one of the following ways:
Identification mode 1: calibration
A calibration operation of the user for a video frame of the target video is received, and the object calibrated by the user's calibration operation is determined as the target object. The target calibrated by the user's operation may be a specific target, e.g., one of several persons, or a type of target, e.g., males or females.
Identification mode 2: automatic identification by an image recognition model
The foreground of the video frame (e.g., a person or an animal) is automatically identified as the target object by an image recognition model.
Step S303, a background video is acquired.
The background video may be an already packaged video file. The background video may also be a video stream.
The video processing application program running on the terminal can be provided with a video selection window to receive a video selection operation of selecting the background video by a user, and the identification information of the background video is determined based on the video selection operation of the user.
Herein, a video frame in a background video is referred to as a background video frame, and the background video includes at least one background video frame.
It should be noted that, in the embodiments of the invention, there is no fixed execution order among step S301, step S302, and step S303; for example, step S303 may also be executed first.
Step S304, responding to the synthesis operation aiming at the foreground video and the background video, overlapping the foreground video frame in the foreground video and the background video frame in the background video, and packaging the video frame obtained by overlapping into a synthetic video.
A video processing application running on the terminal may be provided with an interactive entry that triggers video compositing, so as to receive a compositing operation indicating that the foreground video is to be composited with the background video, and to generate a compositing instruction based on the compositing operation.
When the electronic equipment is a server, the terminal sends the synthesis instruction to the server, and the server performs superposition of foreground video frames in the foreground video and background video frames in the background video based on the synthesis instruction so as to realize synthesis of the foreground video and the background video.
For example: when the foreground video is video A' and the background video is video D, the video frames of video A' are superimposed with the background video frames of video D. The superimposition effect can be as shown in fig. 5A, where the background area 501 is a picture of video D and the object 502 is the area corresponding to object A, the foreground of video A.
The electronic device superimposes the foreground video frames of the foreground video with the background video frames of the background video according to the synthesis parameters. Here, the relative position and/or relative imaging size of the target object in the target video may be used as the synthesis parameters; an editing operation of the user may also be received through an editing page so that the user adjusts the synthesis parameters.
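As a sketch of this superimposition step, the following Android-flavored Java assumes the foreground frame has already been transparentized (alpha 0 outside the target area) and that the synthesis parameters are carried in a Matrix; the class and method names are hypothetical, not taken from the patent:

```java
import android.graphics.Bitmap;
import android.graphics.Canvas;
import android.graphics.Matrix;

public final class FrameCompositor {

    /**
     * Overlays one transparentized foreground frame onto one background frame.
     * The Matrix carries the synthesis parameters (relative position / size).
     */
    public static Bitmap superimpose(Bitmap background, Bitmap foreground, Matrix synthesisParams) {
        Bitmap out = background.copy(Bitmap.Config.ARGB_8888, true); // mutable copy of the background
        Canvas canvas = new Canvas(out);
        // Transparent foreground pixels let the background show through.
        canvas.drawBitmap(foreground, synthesisParams, null);
        return out;
    }
}
```

Repeating this for each pair of aligned frames, and then encapsulating the results, would yield the composite video described in step S304.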
In the embodiments of the invention, the operation modes of user operations such as the selection operation, the segmentation operation, and the compositing operation may be touch, voice, gesture, etc.; the embodiments impose no limitation on the user's operation mode.
In the video processing method provided by the embodiments of the invention, a foreground video in which the target object is the foreground is segmented from the target video, and the video frames obtained by superimposing the foreground video frames of the segmented foreground video with the background video frames of a background video are encapsulated into a composite video; dynamic videos are thus composited based on their content, yielding a dynamic video with coordinated picture content. Based on image segmentation, the target object is extracted from one video in real time and composited with another video, achieving automatic fusion of the two videos. This can greatly improve users' video production efficiency, encourage users to create more interesting videos, and let ordinary users produce videos resembling movie special effects.
In some embodiments, when the number of target videos is at least two, step S302 may be performed as: receiving a batch segmentation operation for the at least two target videos; and in response to the batch segmentation operation, acquiring from each target video a video clip in which the target object is the foreground, and determining the video clip as the corresponding foreground video.
When there are multiple target videos, the segmentation operation may be a batch segmentation operation. The target object serving as the foreground may be the same or different for each target video, and the target objects of different target videos may be objects of the same type. Here, each segmented foreground video is a video clip made up of the target object in the corresponding target video.
Accordingly, step S304 may be performed as: receiving a batch compositing operation for the at least two foreground videos and the background video; and in response to the batch compositing operation, respectively superimposing the foreground video frames of the at least two foreground videos with the background video frames of the background video.
For example: when the foreground videos are video A', video B', and video C' and the background video is video D, the video frames of video A', video B', and video C' are superimposed together with the background video frames of video D. The superimposition effect can be as shown in fig. 5B, where the background area 501 is a picture of video D, and the object 502, object 503, and object 504 are the areas corresponding to object A, object B, and object C, the foregrounds of video A, video B, and video C, respectively.
In some embodiments, step S303 may be performed as: loading a video selection window displaying alternative background videos; receiving a video selection operation for the video selection window; and acquiring the background video selected by the video selection operation.
A video selection window is provided in the video processing application running on the terminal, and identification information of alternative background videos is displayed in it. The identification information of the alternative background videos may be obtained locally or from the network side. The terminal receives a video selection operation through the video selection window, so that the user selects, from the alternative background videos, the background video to be used for compositing.
The selected background video can be presented on the terminal so that the user can preview it and confirm whether it is the desired background video; if not, the background video can be reselected through the video selection window to replace the current selection. For example, the video selection window may be as shown as window 401 in fig. 4; the selection process of the background video is not repeated here.
In some embodiments, in response to a preview operation for the foreground video and the background video, an overlay effect of the foreground video frames and the background video frames is presented.
The video processing application running on the terminal may be provided with an interactive portal for receiving a preview operation to receive the preview operation indicating the effect of previewing the superposition of the foreground video and the background video.
In some embodiments, as shown in fig. 6, after receiving the segmentation operation in step S302, the foreground video may be segmented from the target video frame by the following steps:
step S3021, identifying a target area where the target object is located from a video frame of the target video, and performing a transparency process on an area other than the target area in the video frame;
A target area where the target object is located is identified from a video frame of the target video by an image recognition model or by calibration. After the target area is identified, the pixel values of pixels belonging to the target area are kept unchanged, and the pixel values of pixels outside the target area are set to 0, so that the area outside the target area is transparentized and the target object is segmented from the video frame of the target video.
Step S3022, encapsulating the transparentized video frames into the foreground video.
The transparentized foreground video frames are encapsulated into the foreground video based on a video codec.
In some embodiments, step S3021 may be implemented as follows:
identifying a target area where the target object is located in a video frame of the target video, and obtaining an image matrix corresponding to the video frame of the target video according to the target area; elements in the image matrix respectively represent the probability that the pixels of the corresponding video frame belong to the target area; and carrying out mask processing on the image matrix and the corresponding video frame so as to transparentize the area except the target area in the video frame.
Here, the target area where the target object is located in the target video frame may be identified by an image recognition model, which outputs a binarized image matrix based on the identified target area. Alternatively, the target area may be identified through user calibration, and the binarized image matrix is obtained from the determined target area. In the image matrix, the element corresponding to a pixel outside the target area is 0, indicating that the pixel does not belong to the target area, and the element corresponding to a pixel of the target area is 1, indicating that the pixel belongs to it. The image matrix is masked with the video frame of the target video: the pixel values of pixels in the target area are unchanged, while the pixel values of pixels outside the target area become 0, so that the area other than the target area in the video frame is transparentized.
Here, the image recognition model may be trained on a sample set annotated with the target object. When the target object is a portrait, the training samples in the sample set may be annotated as in the portrait picture 701 shown in fig. 7.
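A minimal sketch of the transparentization described above, assuming the recognition model's per-pixel probability matrix is thresholded at 0.5 into the binarized mask (the threshold, the float[][] output format, and the class name are assumptions for illustration):

```java
import android.graphics.Bitmap;

public final class ForegroundSegmenter {

    /** Transparentizes every pixel that falls outside the identified target area. */
    public static Bitmap transparentize(Bitmap frame, float[][] probability) {
        int w = frame.getWidth(), h = frame.getHeight();
        int[] pixels = new int[w * h];
        frame.getPixels(pixels, 0, w, 0, 0, w, h);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Threshold the probability matrix into a binary mask; pixels
                // masked out (outside the target area) become fully transparent.
                if (probability[y][x] < 0.5f) {
                    pixels[y * w + x] = 0; // ARGB value 0: alpha 0
                }
            }
        }
        Bitmap out = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888);
        out.setPixels(pixels, 0, w, 0, 0, w, h);
        return out;
    }
}
```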
In some embodiments, said overlaying a foreground video frame in said foreground video with a background video frame in said background video comprises:
acquiring the timestamp alignment relationship between the foreground video frames and the background video frames; and superimposing the foreground video frames of the foreground video onto the background video frames of the background video that conform to the timestamp alignment relationship.
Before the foreground video frames and the background video frames are superimposed, the timestamp of each foreground video frame and the timestamp of each background video frame are acquired. From the acquired timestamps, the timestamp alignment relationship between the foreground video frames and the background video frames, i.e., the relationship between the time period of the foreground video and the time period of the background video, is determined, and the foreground video frames and background video frames having the timestamp alignment relationship are superimposed. The timestamp alignment relationship may be determined automatically from the positions of the foreground video frames and background video frames on the time axis, or determined through an editing function provided by the video processing application, which can adjust the position of the foreground video frames or background video frames on the time axis based on the user's timestamp adjustment operation.
For example: the background video is 2 minutes long and occupies the period 0–2 minutes on the time axis, and the foreground video is 30 seconds long, with its timestamps aligned to the period from 1 minute 16 seconds to 1 minute 45 seconds of the background video. The first foreground video frame then has a timestamp alignment relationship with the first background video frame at 1 minute 16 seconds, the frames correspond frame by frame, and each foreground video frame is superimposed with the corresponding background video frame between 1 minute 16 seconds and 1 minute 45 seconds. Here, the frame rates of the foreground video and the background video may be the same.
For another example, continuing the above example, the timestamp alignment relationship between the foreground video frames and the background video frames is adjusted as shown in fig. 8. Before the adjustment, the start time of the foreground video is aligned with T1 of the background video, where T1 is 1 minute 16 seconds, so the foreground video is aligned with the period from 1 minute 16 seconds to 1 minute 45 seconds of the background video. The user drags the slidable control shown by the dotted line in the direction of the arrow to align the start time of the foreground video with T2 of the background video, where T2 is 1 minute 06 seconds; this time adjustment operation moves the start position of the foreground video on the background video's time axis from 1 minute 16 seconds to 1 minute 06 seconds. The foreground video is now aligned with the period from 1 minute 06 seconds to 1 minute 35 seconds of the background video, and each foreground video frame is superimposed with the corresponding background video frame in that period.
In the embodiments of the invention, the terminal can provide a time adjustment interface, such as a slidable control, on the user interface, so that the user can select the composition start time and composition end time of the foreground video and the background video. Note that the composition start time and composition end time lie between the start time and end time of the background video. When superimposing the foreground video frames and the background video frames, the electronic device decodes the background video into background video frames starting from the selected composition start time and superimposes the decoded background video frames with the foreground video frames frame by frame until the composition end time. If the interval between the composition start time and the composition end time is longer than the duration of the foreground video, the end time of the foreground video governs. If the interval is shorter than the foreground video's duration, the composition ends at the selected composition end time, i.e., before the end of the foreground video is reached.
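Since the passage assumes equal frame rates, the timestamp alignment reduces to an index offset on the background's time axis; the following sketch is one way to express it (the names and the rounding choice are assumptions):

```java
public final class TimestampAligner {

    /**
     * Maps a foreground frame index to the background frame it is superimposed on,
     * given the composition start time selected on the background's time axis.
     */
    public static int backgroundIndexFor(int foregroundIndex,
                                         double compositionStartSeconds,
                                         double frameRate) {
        int startOffset = (int) Math.round(compositionStartSeconds * frameRate);
        return startOffset + foregroundIndex; // frame-by-frame correspondence
    }
}
```

With a frame rate of 25 fps and a composition start of 1 minute 16 seconds (76 s), foreground frame 0 would map to background frame 1900, matching the frame-by-frame correspondence described above.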
In some embodiments, said overlaying foreground video frames in said foreground video with background video frames in said background video comprises:
in response to an editing operation that sets synthesis parameters for the foreground video and the background video, overlaying the foreground video frame on the background video frame, where the coverage area of the foreground video frame in the background video frame conforms to the set synthesis parameters. The synthesis parameters include at least one of the following: position and size, characterizing the relative position, relative size, etc., of the foreground video frame in the background video frame.
A video processing application running on the terminal may provide an editing page in which foreground video frames of the foreground video and background video frames of the background video may be displayed, where the foreground video frames and the background video frames having a timestamp alignment relationship may be displayed.
An editing interaction interface is loaded on the editing page and receives the editing operation for setting the synthesis parameters. The editing operation can be translation, rotation, zooming, and other operations.
In practical applications, as shown in fig. 9, an editing control for performing editing operations may be provided in the editing interface 901: a rectangular frame 902 of the same size as the target object in the foreground video frame is provided, and the user's editing operations on the foreground video are received through the rectangular frame.
When it is determined that the user has finished the editing operation, the compositing operation can be triggered automatically, or the compositing operation can be received through an interactive entry operated by the user on the display interface. In response to the compositing operation, the foreground video frame is overlaid on the background video frame based on the synthesis parameters set by the editing operation, so that the coverage of the foreground video in the background video frame conforms to the set synthesis parameters.
In some embodiments, determining the synthesis parameters of the foreground video frame in the background video frame comprises: constructing an initial matrix of the same size as the foreground video frame; and adjusting elements of the initial matrix according to the editing operation to obtain a target matrix representing the variation of the synthesis parameters.
Here, the constructed matrix with the same height and width as the target object in the foreground video frame is referred to as the initial matrix. The initial matrix is adjusted according to the editing operation to obtain the target matrix representing the variation of the synthesis parameters. When the editing operation is translation, the elements at the translation positions of the matrix are set to the displacement; when the editing operation is zooming, the elements at the scaling positions are set to the scaling factor; when the editing operation is rotation, the elements at the rotation positions are set to trigonometric functions of the rotation angle.
For example, when the height and width of the foreground video frame are 3, the initial matrix may be the 3 × 3 identity matrix

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

The target matrix for translation may be

\[ \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix} \]

the target matrix for scaling may be

\[ \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

and the target matrix for rotation may be

\[ \begin{pmatrix} \cos q & -\sin q & 0 \\ \sin q & \cos q & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
where t_x and t_y denote the displacements along the x and y directions respectively, s_x and s_y denote the scaling factors along the x and y directions respectively, and q in sin(q)/cos(q) denotes the rotation angle. Scaling by s_x and s_y means that a two-dimensional coordinate (x, y) is scaled, with (0, 0) as the center, by s_x in the horizontal direction and s_y in the vertical direction; that is, the horizontal distance of the transformed position from (0, 0) becomes s_x times the horizontal distance of the original coordinate from the center point, and the vertical distance becomes s_y times the original vertical distance. The entries 1 and 0 have no special meaning; they are the default parameters obtained when the transform is expressed as a mathematical matrix.
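On Android, these target matrices correspond directly to android.graphics.Matrix, whose backing store is exactly such a 3 × 3 homogeneous matrix; the sketch below is written under that assumption (the patent does not mandate this API):

```java
import android.graphics.Matrix;

public final class TargetMatrices {

    public static Matrix translation(float tx, float ty) {
        Matrix m = new Matrix();   // a new Matrix is the identity (the initial matrix)
        m.setTranslate(tx, ty);    // [1 0 tx; 0 1 ty; 0 0 1]
        return m;
    }

    public static Matrix scaling(float sx, float sy) {
        Matrix m = new Matrix();
        m.setScale(sx, sy);        // [sx 0 0; 0 sy 0; 0 0 1], centered at (0, 0)
        return m;
    }

    public static Matrix rotation(float degrees) {
        Matrix m = new Matrix();
        m.setRotate(degrees);      // [cos q  -sin q  0; sin q  cos q  0; 0 0 1]
        return m;
    }
}
```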
In some embodiments, overlaying the foreground video frame on the background video frame such that the coverage area of the foreground video frame in the background video frame conforms to the set synthesis parameters includes:
multiplying the target matrix with a foreground video frame in the foreground video to obtain an adjusted foreground video frame; and covering the background video frame with the adjusted foreground video frame.
Here, the target matrix may be multiplied with the bitmap of the foreground video frame to obtain the adjusted bitmap of the foreground video frame. A Bitmap is stored as a two-dimensional array of RGBA pixels. When a pixel at coordinate position p0(x0, y0) in the foreground video frame is transformed, the transformation parameters (displacement, scaling factor, and so on) are filled into the initial matrix to obtain the corresponding target matrix M(x0, y0); the coordinate position of the pixel in the adjusted foreground video frame is then p1(x1, y1), calculated as follows:
p1(x1,y1)=p0(x0,y0)*M(x0,y0);
where p0(x0, y0) enters the calculation as the column vector [x y]^T, i.e., the transpose of the row vector [x y].
For example, when a spatial coordinate p0(x0, y0) is first translated by t_x along the x direction and then by t_y along the y direction, the resulting coordinate is p1(x1, y1) = (x0 + t_x, y0 + t_y). Expressed in matrix form, this may be:

$$\begin{pmatrix} x1 \\ y1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x0 \\ y0 \\ 1 \end{pmatrix} = \begin{pmatrix} x0 + t_x \\ y0 + t_y \\ 1 \end{pmatrix}$$
Each pixel in the foreground video frame thus obtains a new coordinate position, yielding a new two-dimensional pixel array, which can be restored into a new Bitmap, i.e., the adjusted Bitmap.
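A minimal sketch of this step on Android, assuming the target Matrix was built as above; Bitmap.createBitmap applies the Matrix to every pixel of the source and returns the new, adjusted Bitmap:

```java
import android.graphics.Bitmap;
import android.graphics.Matrix;

public final class ForegroundAdjuster {
    // Apply the target matrix to the foreground frame's bitmap.
    // Every pixel gets a new coordinate; the result is the adjusted Bitmap.
    public static Bitmap adjust(Bitmap foregroundFrame, Matrix targetMatrix) {
        return Bitmap.createBitmap(
                foregroundFrame,
                0, 0,
                foregroundFrame.getWidth(),
                foregroundFrame.getHeight(),
                targetMatrix,
                true /* filter: bilinear filtering for smoother scaling/rotation */);
    }
}
```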
The video processing method provided by the embodiment of the present invention can provide an editing page, receive the user's editing operations on the foreground video frame through the editing page, and, when the foreground video frame and the background video frame are synthesized, adjust the relative position and imaging size of the foreground video frame with respect to the background video frame.
In the following, taking the Android platform as an example, the video codec involved in the embodiment of the present invention is described; its encoding and decoding architecture is shown in fig. 10:
A codec processes input data to produce output data, using a set of input buffers and output buffers to process data asynchronously. The client requests an empty input buffer, fills it with data, and sends it to the codec for processing. The codec converts the input data provided by the client and outputs the converted data to an empty output buffer. Finally, the client obtains the output buffer, consumes the data in it, and releases the buffer back to the codec. If subsequent data needs to be processed, the codec repeats these operations.
The types of data a codec can handle include compressed data and raw video data. Both can be processed through buffers (ByteBuffers); for raw video data, a screen buffer (Surface) can be used to display the data, which also improves encoding and decoding performance. A Surface uses a native video buffer that is not mapped or copied to ByteBuffers, a mechanism that makes the codec more efficient. Typically, when a Surface is used, the raw video data cannot be accessed directly, but the decoded raw video frames can be accessed using an image reader (ImageReader).
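As a sketch of this buffer loop, the following decodes one video track using MediaExtractor and MediaCodec; the file path, track index, and timeout values are illustrative assumptions:

```java
import android.media.MediaCodec;
import android.media.MediaExtractor;
import android.media.MediaFormat;
import java.nio.ByteBuffer;

public final class FrameDecoder {
    public static void decode(String videoPath) throws Exception {
        MediaExtractor extractor = new MediaExtractor();
        extractor.setDataSource(videoPath);               // e.g. a local .mp4 (assumed)
        MediaFormat format = extractor.getTrackFormat(0); // assume track 0 is video
        extractor.selectTrack(0);

        MediaCodec codec = MediaCodec.createDecoderByType(
                format.getString(MediaFormat.KEY_MIME));
        codec.configure(format, null /* no Surface: output via ByteBuffers */, null, 0);
        codec.start();

        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        boolean inputDone = false, outputDone = false;
        while (!outputDone) {
            if (!inputDone) {
                int inIndex = codec.dequeueInputBuffer(10_000);
                if (inIndex >= 0) {
                    ByteBuffer inBuf = codec.getInputBuffer(inIndex);
                    int size = extractor.readSampleData(inBuf, 0);
                    if (size < 0) {                       // end of stream
                        codec.queueInputBuffer(inIndex, 0, 0, 0,
                                MediaCodec.BUFFER_FLAG_END_OF_STREAM);
                        inputDone = true;
                    } else {
                        codec.queueInputBuffer(inIndex, 0, size,
                                extractor.getSampleTime(), 0);
                        extractor.advance();
                    }
                }
            }
            int outIndex = codec.dequeueOutputBuffer(info, 10_000);
            if (outIndex >= 0) {
                // A decoded single frame is available here (consume or render it).
                codec.releaseOutputBuffer(outIndex, false);
                if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                    outputDone = true;
                }
            }
        }
        codec.stop(); codec.release(); extractor.release();
    }
}
```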
In the following, an exemplary application of the embodiment of the present invention in an actual application scenario will be described, where the target object is a portrait and the electronic device is a terminal.
In the related art, a video composition scheme may be as shown in fig. 11, and includes: selecting a background template 1101, which is a background video; selecting a portrait picture 1102 whose image content includes a portrait; selecting the portrait area of the portrait picture 1102 by user smearing; based on the user-selected area, segmenting the portrait picture 1102 into portrait and background by AI segmentation so as to extract the portrait 1103; displaying the portrait 1103 on the background template 1101 to form an edited image 1104; editing the positions at which the portrait 1103 and the background template 1101 are displayed in the edited image 1104; and, after editing is finished, fusing the portrait 1103 and the background template 1101 based on the edited synthesis parameters to obtain a synthesized video 1105 whose image content includes the portrait.
The composite effect of the video composition scheme shown in fig. 11 is shown in fig. 12: the portrait 1103 in the still portrait picture 1102 is composited into the background template 1101, resulting in the display page of the synthesized video 1105. This scheme segments the portrait from the background of a static picture and then composites the image into a video, which has great limitations: first, a specially made template background video is required; second, segmentation can only be performed on a static picture, so the extracted portrait is static and much of the interest is lost. In addition, portrait segmentation of the picture requires manual smearing to select the region, which is inefficient for processing a video containing many frames.
In the related art, when watching a short video, a user can also initiate a video co-shooting function that composites two videos into one co-framed video. This co-shooting scheme directly splices the two videos side by side, and because their scenes differ, the result looks rather stiff. The effect of video co-shooting is shown in fig. 13, where screen 1301 is the picture of one of the co-shot videos and screen 1302 is the picture of the other.
Thus, the video co-shooting scheme merely splices two videos left and right into one larger video without processing their backgrounds; the synthesized video contains two scenes and therefore looks abrupt.
To address the technical defects that a video composition scheme able to composite only a static picture into a video is severely limited, and that a scheme splicing two videos produces an abrupt scene transition, an embodiment of the present invention provides a video processing method including steps of video selection, video decoding, portrait segmentation, picture editing, video synthesis, and the like. As shown in fig. 14, the method includes:
video selection is performed from local videos to obtain a background video 1401 and a portrait video 1402, namely a target video. The background video 1401 is video-decoded to obtain a video frame 1403, i.e. a background video frame. Video decoding is performed on the portrait video 1402 to obtain a video frame 1404. The video frame 1404 is input into a neural network model 1405 for portrait segmentation, a mask image 1406 of the portrait is output, and a portrait image 1407, namely a foreground video frame, is obtained through mask processing of the mask image 1406 of the portrait and the video frame 1404.
When a start-editing operation is received, a background video frame 1403 and a portrait image 1407 are displayed on an editing interface. The portrait image 1407 receives the user's editing operations on the editing interface; based on these operations, the relative position and relative size of the portrait image 1407 with respect to the background video frame 1403 are adjusted to obtain the relative relationship, and editing proceeds based on the portrait image under that relative relationship. The editing operations performed on the portrait image 1407 may include translation, scaling, rotation, and the like. Upon receiving a preview operation, the edited portrait image 1408 and the background video frame 1403 are rendered, and the superimposed effect 1409 of the portrait image and the background video frame is output. Upon receiving a synthesis operation, the edited portrait image 1408 and the background video frame 1403 are rendered again, resulting in a composite frame 1410, which is then encapsulated by the multimedia encoder into a synthesized video 1411. After the superimposed effect 1409 is output, editing operations can continue to be received, and the relative position and relative size of the portrait image with respect to the background video frame can be further adjusted.
The terminal device may display video selection options through the system album or a custom album page, and the background video 1401 and the portrait video 1402 are selected based on the displayed options. The terminal device decodes the background video 1401 and the portrait video 1402 into multiple single-frame images, respectively, through MediaCodec. Each frame decoded from the portrait video 1402 is segmented to obtain a portrait image 1407. The terminal device performs a Matrix transformation on the Bitmap of each segmented portrait image 1407 through the target Matrix representing the synthesis parameters to obtain the Bitmap of the edited portrait image, and uploads the edited Bitmap to a texture unit of the Graphics Processing Unit (GPU) through the OpenGL ES API; the GPU then performs an image blending operation on the texture corresponding to the background video frame 1403 and the texture of the edited portrait image through a shader to obtain the final composite frame, which is encoded into the synthesized video through MediaCodec.
The stages of the video processing method provided by the embodiment of the present invention are described below: portrait segmentation, picture editing, rendering, and video decoding and synthesis.
1. Portrait segmentation
On the server, a set of manually labeled portrait pictures is used as the training set to train a neural network model; the trained model is saved and ported to the terminal device.
The server can collect the portrait pictures, label the collected portrait pictures in a manual mode, take the area corresponding to the portrait in the portrait pictures as the foreground, take the area except the portrait as the background, and separate each pixel point of the foreground and the background. The manually labeled portrait picture can be as shown in fig. 7, and a portrait 702 is labeled in a portrait picture 701.
For a dynamic target video, the target video is decoded into static frames in real time through a video decoder (MediaCodec); each static frame is then input into the trained neural network model, which returns a segmented picture mask (a binary image). Transparency mixing of the mask with the original image in the target video frame extracts the segmented portrait, i.e., a foreground video frame of the foreground video.
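A minimal sketch of the transparency-mixing step, assuming the model returns a binary mask Bitmap of the same size as the frame (value 255 for portrait pixels, 0 for background):

```java
import android.graphics.Bitmap;
import android.graphics.Color;

public final class MaskBlender {
    // Blend a binary mask with the original frame: portrait pixels keep their
    // color, background pixels become fully transparent.
    public static Bitmap apply(Bitmap frame, Bitmap mask) {
        int w = frame.getWidth(), h = frame.getHeight();
        int[] framePx = new int[w * h];
        int[] maskPx = new int[w * h];
        frame.getPixels(framePx, 0, w, 0, 0, w, h);
        mask.getPixels(maskPx, 0, w, 0, 0, w, h);
        for (int i = 0; i < w * h; i++) {
            int alpha = Color.red(maskPx[i]); // 255 = portrait, 0 = background
            framePx[i] = (alpha << 24) | (framePx[i] & 0x00FFFFFF);
        }
        Bitmap out = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888);
        out.setPixels(framePx, 0, w, 0, 0, w, h);
        return out;
    }
}
```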
2. Picture editing
The user can edit the segmented portrait picture through translation, scaling, rotation, and the like, controlling the edited position and size as needed.
The segmented portrait pictures are stored in memory as Bitmaps, and the stored Bitmaps can be transformed through a Matrix. By constructing a rectangular frame with the same width and height as the portrait picture and providing an interactive entry on the graphical interface for the user to drag and rotate it, the Matrix generated by the user's edits to the rectangular frame, i.e., the target matrix, can be obtained; multiplying the pixels of the original portrait picture by this Matrix yields the translated, scaled, or otherwise transformed Bitmap.
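A minimal sketch of how such an interactive entry might accumulate drag gestures into the target Matrix; the view class and single-finger handling are illustrative assumptions (a full editor would also map pinch and rotate gestures to setScale/setRotate):

```java
import android.content.Context;
import android.graphics.Matrix;
import android.view.MotionEvent;
import android.view.View;

// Hypothetical editing view: drags on the rectangular frame accumulate
// into the target Matrix later applied to the portrait Bitmap.
public class PortraitEditView extends View {
    private final Matrix targetMatrix = new Matrix();
    private float lastX, lastY;

    public PortraitEditView(Context context) { super(context); }

    @Override
    public boolean onTouchEvent(MotionEvent event) {
        switch (event.getActionMasked()) {
            case MotionEvent.ACTION_DOWN:
                lastX = event.getX();
                lastY = event.getY();
                return true;
            case MotionEvent.ACTION_MOVE:
                // Accumulate the drag into the target matrix as a translation.
                targetMatrix.postTranslate(event.getX() - lastX,
                                           event.getY() - lastY);
                lastX = event.getX();
                lastY = event.getY();
                invalidate(); // redraw the preview with the updated matrix
                return true;
        }
        return super.onTouchEvent(event);
    }

    public Matrix getTargetMatrix() { return targetMatrix; }
}
```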
3. Rendering
After the portrait is segmented from the target video, it may be previewed in real time; likewise, after the size and position information is edited, real-time preview is possible. During real-time preview, OpenGL ES is used for rendering on the terminal: the RGB data of each frame's image is uploaded to a texture unit of the GPU, rendered through the GPU's rendering pipeline, and finally the GPU renders the output image data to the screen's frame buffer for display.
Because the GPU has an efficient parallel processing and rendering architecture that is well suited to image processing and rendering, using the GPU for rendering through the OpenGL ES API achieves real-time special-effect rendering.
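A minimal sketch of the per-frame texture upload through the OpenGL ES 2.0 API (must run on a thread with a current GL context; the class and method names are assumptions):

```java
import android.graphics.Bitmap;
import android.opengl.GLES20;
import android.opengl.GLUtils;

public final class TextureUploader {
    // Upload a frame's Bitmap into a new GPU texture and return its handle.
    public static int upload(Bitmap frame) {
        int[] handles = new int[1];
        GLES20.glGenTextures(1, handles, 0);
        GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, handles[0]);
        GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D,
                GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR);
        GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D,
                GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
        // Copies the bitmap's pixel data into the bound texture.
        GLUtils.texImage2D(GLES20.GL_TEXTURE_2D, 0, frame, 0);
        return handles[0];
    }
}
```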
4. Video decoding and synthesis
The video frames in a video are obtained by decoding it, so the video is processed frame by frame. When the final video is fused, a video synthesis technique, i.e., video encoding, is required.
For example, the Android platform can perform video encoding and decoding based on its MediaCodec module.
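A minimal sketch of setting up a MediaCodec H.264 encoder with a MediaMuxer to package composite frames into an MP4 file; the resolution, bitrate, and output path are illustrative assumptions:

```java
import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import android.media.MediaMuxer;
import android.view.Surface;

public final class CompositeVideoEncoder {
    public static void setUp() throws Exception {
        // Assumed parameters: 720x1280 at 30 fps, 4 Mbps.
        MediaFormat format = MediaFormat.createVideoFormat("video/avc", 720, 1280);
        format.setInteger(MediaFormat.KEY_BIT_RATE, 4_000_000);
        format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
        format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);
        format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
                MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);

        MediaCodec encoder = MediaCodec.createEncoderByType("video/avc");
        encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        // Composite frames rendered by the GPU are fed in through this Surface.
        Surface inputSurface = encoder.createInputSurface();
        encoder.start();

        // The muxer packages the encoded stream into the composite .mp4 file.
        MediaMuxer muxer = new MediaMuxer("/sdcard/composite.mp4",
                MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
        // ...drain the encoder's output buffers into muxer.writeSampleData(...),
        // then stop and release the encoder and muxer.
    }
}
```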
The video processing method provided by the embodiment of the present invention can quickly synthesize multiple videos in video processing applications for mobile video and live streaming, improving the efficiency of creative video editing. As shown in fig. 15, the user can select multiple videos in turn, such as video 151-1, video 151-2, …, video 151-m, where video 151-2 is the template video (i.e., the background video). The user clicks the one-click portrait matting interactive interface provided by the video processing application; the application performs portrait segmentation on every video except video 151-2, and each such video correspondingly generates a portrait video: portrait video 152-1, …, portrait video 152-m. In a portrait video, the non-portrait background area is transparent. The user can adjust in turn the size and relative position of each portrait video with respect to the template video (which can contain a background), and preview the fusion effect in real time. When the application receives the user's click on the composition button, video synthesis is performed: portrait video 152-1, …, portrait video 152-m are composited together into video 151-2 to obtain a synthesized video 153, which can be saved locally.
The video matting interface provided by the video processing application may be as shown at 1601 in fig. 16, where window 1602 is the list of all videos currently awaiting matting in the work area, together with a "matte portrait" button 1603 and a "start editing" button 1604. When the user clicks the "matte portrait" button 1603, the portrait in the currently selected video is extracted and displayed in the preview area in real time; clicking the "start editing" button 1604 enters the editing interface 1605. The editing interface 1605 is used to edit the relative position and size of the portrait video 1606 and the background video 1607, and provides a "replace background video" button 1608 for replacing the currently selected background video and a "start synthesis" button 1609 for starting the final video synthesis.
The video processing method provided by the embodiment of the present invention automatically segments the portrait from the background of a dynamic video, so the portrait is a moving, lively image, and allows the user to select any background video and composite the portrait into it, realizing the fusion of two or even multiple video segments. For example, from two videos of users dancing indoors, the two separate portraits can be extracted and composited into the scene video of a stage, achieving the effect of the two people performing together from different places. Because the portrait is segmented from its background and the finally synthesized video scene is unified, the scheme opens more creative space, can fully stimulate the user's imagination and creativity, and improves the overall playability and fun of the software.
An exemplary structure of software modules is described below, and in some embodiments, as shown in fig. 2, the software modules in the video processing apparatus may include:
a first acquisition unit 2401 for acquiring a target video;
a segmentation unit 2402, configured to, in response to a segmentation operation for a target object in the target video, obtain a foreground video with the target object as a foreground from the target video; the foreground video comprises at least one foreground video frame;
a second obtaining unit 2403, configured to obtain a background video; the background video comprises at least one background video frame;
a synthesizing unit 2404, configured to, in response to a synthesizing operation for the foreground video and the background video, overlay a foreground video frame in the foreground video and a background video frame in the background video, and encapsulate a video frame obtained by the overlay as a synthesized video.
In some embodiments, the splitting unit 2402 is further configured to:
receiving a batch segmentation operation for at least two target videos; and responding to the batch segmentation operation, acquiring a video fragment taking the target object as a foreground from each target video, and determining the video fragment as a corresponding foreground video.
In some embodiments, synthesis unit 2404 is further configured to:
receiving a batch compositing operation for at least two of the foreground videos and the background videos; and responding to the batch synthesis operation, and respectively overlaying the foreground video frames in the at least two foreground videos to the background video frames in the background video.
In some embodiments, the second obtaining unit 2403 is further configured to:
loading a video selection window displaying an alternative background video; receiving a video selection operation for the video selection window; and acquiring the background video selected by the video selection operation.
In some embodiments, the apparatus further comprises: a preview unit configured to:
and presenting the superposition effect of the foreground video frame and the background video frame in response to the preview operation of the foreground video and the background video.
In some embodiments, the splitting unit 2402 is further configured to:
identifying a target area where the target object is located from a video frame of the target video, and carrying out transparency processing on an area outside the target area in the video frame; and packaging the video frame after the transparentization processing into the foreground video.
In some embodiments, the splitting unit 2402 is further configured to:
identifying a target area where the target object is located in a video frame of the target video, and obtaining an image matrix corresponding to the video frame of the target video according to the target area; elements in the image matrix respectively represent the probability that the pixels of the corresponding video frame belong to the target area; and performing mask processing on the image matrix and the corresponding video frame to transparentize the region except the target region in the video frame.
In some embodiments, synthesis unit 2404 is further configured to:
acquiring the timestamp alignment relation of the foreground video frame and the background video frame; and overlapping the foreground video frame in the foreground video and the background video frame in the background video, which accords with the timestamp alignment relationship.
In some embodiments, the synthesis unit 2404 is further configured to:
and in response to the editing operation for setting the synthesis parameters for the foreground video and the background video, covering the foreground video frame on the background video frame, wherein the covering area of the foreground video frame in the background video frame conforms to the set synthesis parameters.
In some embodiments, synthesis unit 2404 is further configured to:
constructing an initial matrix with the same size as the foreground video frame; and adjusting elements in the initial matrix according to the editing operation to obtain a target matrix representing the variation of the synthesis parameters.
In some embodiments, synthesis unit 2404 is further configured to:
multiplying the target matrix with a foreground video frame in the foreground video to obtain an adjusted foreground video frame; and covering the background video frame with the adjusted foreground video frame.
As an example of implementing the method provided by the embodiment of the present invention in hardware, the method may be directly executed by the processor 410 in the form of a hardware decoding processor, for example implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
Embodiments of the present invention provide a storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform a method provided by embodiments of the present invention, for example, the method shown in fig. 3.
In some embodiments, the executable instructions may be in the form of a program, software module, script, or code written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, in the embodiments of the present invention, a target object in a target video is segmented from that video; a foreground video frame of the segmented foreground video serves as the foreground and a background video frame of a background video serves as the background; and the composited video frames are encapsulated into a synthesized video, so that dynamic video is synthesized based on video content and a dynamic video with coordinated picture content is obtained. Batch target-object segmentation of multiple target videos is performed based on a one-click operation on the display interface, and an editing interface is provided for the user so that the position and imaging size of the foreground video relative to the background video can be edited based on the user's editing operations.
The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention shall fall within the protection scope of the present invention.

Claims (13)

1. A video processing method, comprising:
acquiring a target video;
in response to a calibration operation for a video frame in the target video, determining an object calibrated by the calibration operation as a target object;
in response to a segmentation operation for the target object in the target video, obtaining a foreground video taking the target object as a foreground from the target video; the foreground video comprises at least one foreground video frame;
acquiring a background video; the background video comprises at least one background video frame;
in response to a timestamp adjustment operation for the foreground video frame, adjusting the position of the foreground video frame on a time axis, and displaying the foreground video frame and the background video frame with a timestamp alignment relation;
in response to a timestamp adjustment operation for the background video frame, adjusting the position of the background video frame on a time axis, and displaying the foreground video frame and the background video frame with a timestamp alignment relation;
in response to the synthesis operation for the foreground video and the background video, overlapping foreground video frames in the foreground video and background video frames in the background video, which conform to a timestamp alignment relationship, wherein the timestamp alignment relationship is used for aligning positions of the foreground video frames and the background video frames on a time axis;
and packaging the video frames obtained by superposition into a synthesized video.
2. The method of claim 1, wherein the obtaining a foreground video with the target object as a foreground from the target video in response to the segmentation operation for the target object in the target video comprises:
receiving a batch segmentation operation for at least two target videos;
and responding to the batch segmentation operation, acquiring a video fragment taking the target object as a foreground from each target video, and determining the video fragment as a corresponding foreground video.
3. The method of claim 1, wherein the overlaying foreground video frames in the foreground video and background video frames in the background video according to a timestamp alignment relationship in response to the composition operation for the foreground video and the background video comprises:
receiving a batch composition operation for at least two of the foreground video and the background video;
and responding to the batch synthesis operation, and respectively overlapping foreground video frames in the at least two foreground videos to background video frames in the background video which accord with the timestamp alignment relationship.
4. The method of claim 1, wherein the obtaining the background video comprises:
loading a video selection window displaying an alternative background video;
receiving a video selection operation for the video selection window;
and acquiring the background video selected by the video selection operation.
5. The method of claim 1, further comprising:
and presenting the superposition effect of the foreground video frame and the background video frame in response to the preview operation of the foreground video and the background video.
6. The method of claim 1, wherein the obtaining, from the target video, a foreground video with the target object as a foreground comprises:
identifying a target area where the target object is located from a video frame of the target video, and carrying out transparency processing on an area outside the target area in the video frame;
and packaging the video frame after the transparentization processing into the foreground video.
7. The method according to claim 6, wherein the identifying a target area in which the target object is located from a video frame of the target video and performing a transparency process on an area outside the target area in the video frame comprises:
identifying a target area where the target object is located in a video frame of the target video, and obtaining an image matrix corresponding to the video frame of the target video according to the target area; elements in the image matrix respectively represent the probability that the pixels of the corresponding video frame belong to the target area;
and carrying out mask processing on the image matrix and the corresponding video frame so as to transparentize the area except the target area in the video frame.
8. The method according to any one of claims 1 to 7, further comprising:
and in response to the editing operation for setting the synthesis parameters for the foreground video and the background video, overlaying the foreground video frame on the background video frame, wherein the covering area of the foreground video frame in the background video frame conforms to the set synthesis parameters.
9. The method of claim 8, further comprising:
constructing an initial matrix with the same size as the foreground video frame;
and adjusting elements in the initial matrix according to the editing operation to obtain a target matrix representing the variation of the synthesis parameters.
10. The method according to claim 9, wherein the overlaying the foreground video frame on the background video frame, and the overlaying area of the foreground video frame in the background video frame meets the set composition parameters, comprises:
multiplying the target matrix with a foreground video frame in the foreground video to obtain an adjusted foreground video frame;
and covering the background video frame with the adjusted foreground video frame.
11. A video processing apparatus, comprising:
a first acquisition unit configured to acquire a target video;
a dividing unit, configured to determine, in response to a calibration operation for a video frame in the target video, an object calibrated by the calibration operation as a target object; in response to a segmentation operation for the target object in the target video, acquiring a foreground video with the target object as a foreground from the target video; the foreground video comprises at least one foreground video frame;
a second obtaining unit, configured to obtain a background video; the background video comprises at least one background video frame;
a synthesizing unit, configured to adjust a position of the foreground video frame on a time axis in response to a timestamp adjustment operation for the foreground video frame, and display the foreground video frame and the background video frame having a timestamp alignment relationship;
the synthesis unit is used for responding to the timestamp adjustment operation of the background video frame, adjusting the position of the background video frame on a time axis, and displaying the foreground video frame and the background video frame with timestamp alignment relation;
the synthesis unit is used for responding to the synthesis operation of the foreground video and the background video, and overlapping a foreground video frame in the foreground video and a background video frame in the background video, wherein the background video frame conforms to a timestamp alignment relation, and the timestamp alignment relation is used for aligning the positions of the foreground video frame and the background video frame on a time axis; and packaging the video frames obtained by superposition into a composite video.
12. A video processing apparatus, comprising:
a memory for storing executable instructions;
a processor for implementing the video processing method of any of claims 1 to 10 when executing executable instructions stored in the memory.
13. A storage medium having stored thereon executable instructions for causing a processor to perform the video processing method of any of claims 1 to 10 when executed.
CN201910691577.4A 2019-07-29 2019-07-29 Video processing method, device and storage medium Active CN110290425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910691577.4A CN110290425B (en) 2019-07-29 2019-07-29 Video processing method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910691577.4A CN110290425B (en) 2019-07-29 2019-07-29 Video processing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110290425A CN110290425A (en) 2019-09-27
CN110290425B true CN110290425B (en) 2023-04-07

Family

ID=68024154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910691577.4A Active CN110290425B (en) 2019-07-29 2019-07-29 Video processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110290425B (en)

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110891181B (en) * 2019-12-03 2022-02-18 广州酷狗计算机科技有限公司 Live broadcast picture display method and device, storage medium and terminal
CN111145192B (en) * 2019-12-30 2023-07-28 维沃移动通信有限公司 Image processing method and electronic equipment
CN111179386A (en) * 2020-01-03 2020-05-19 广州虎牙科技有限公司 Animation generation method, device, equipment and storage medium
CN111669638B (en) * 2020-02-28 2022-07-15 海信视像科技股份有限公司 Video rotation playing method and display device
CN111464761A (en) * 2020-04-07 2020-07-28 北京字节跳动网络技术有限公司 Video processing method and device, electronic equipment and computer readable storage medium
CN111447389B (en) * 2020-04-22 2022-11-04 广州酷狗计算机科技有限公司 Video generation method, device, terminal and storage medium
CN111629151B (en) * 2020-06-12 2023-01-24 北京字节跳动网络技术有限公司 Video co-shooting method and device, electronic equipment and computer readable medium
CN111722902A (en) * 2020-06-15 2020-09-29 朱利戈 Method and system for realizing rich media interactive teaching based on window transparentization processing
CN111464865B (en) * 2020-06-18 2020-11-17 北京美摄网络科技有限公司 Video generation method and device, electronic equipment and computer readable storage medium
CN111726536B (en) * 2020-07-03 2024-01-05 腾讯科技(深圳)有限公司 Video generation method, device, storage medium and computer equipment
CN111783729A (en) * 2020-07-17 2020-10-16 商汤集团有限公司 Video classification method, device, equipment and storage medium
CN112822542A (en) * 2020-08-27 2021-05-18 腾讯科技(深圳)有限公司 Video synthesis method and device, computer equipment and storage medium
CN112037227B (en) * 2020-09-09 2024-02-20 脸萌有限公司 Video shooting method, device, equipment and storage medium
CN112087663B (en) * 2020-09-10 2021-09-28 北京小糖科技有限责任公司 Method for generating dance video with adaptive light and shade environment by mobile terminal
CN112087662B (en) * 2020-09-10 2021-09-24 北京小糖科技有限责任公司 Method for generating dance combination dance video by mobile terminal and mobile terminal
CN116055800A (en) * 2020-09-10 2023-05-02 北京小糖科技有限责任公司 Method for mobile terminal to obtain customized background real-time dance video
CN112261313A (en) * 2020-09-22 2021-01-22 网娱互动科技(北京)股份有限公司 Method for making video with replaceable foreground
CN112118397B (en) * 2020-09-23 2021-06-22 腾讯科技(深圳)有限公司 Video synthesis method, related device, equipment and storage medium
CN112132931B (en) * 2020-09-29 2023-12-19 新华智云科技有限公司 Processing method, device and system for templated video synthesis
CN112235520B (en) * 2020-12-07 2021-05-04 腾讯科技(深圳)有限公司 Image processing method and device, electronic equipment and storage medium
CN112653851A (en) * 2020-12-22 2021-04-13 维沃移动通信有限公司 Video processing method and device and electronic equipment
CN112752038B (en) * 2020-12-28 2024-04-19 广州虎牙科技有限公司 Background replacement method, device, electronic equipment and computer readable storage medium
CN112752116A (en) * 2020-12-30 2021-05-04 广州繁星互娱信息科技有限公司 Display method, device, terminal and storage medium of live video picture
CN112860209B (en) * 2021-02-03 2024-07-12 宏晶微电子科技股份有限公司 Video overlapping method and device, electronic equipment and computer readable storage medium
CN113067983B (en) * 2021-03-29 2022-11-15 维沃移动通信(杭州)有限公司 Video processing method and device, electronic equipment and storage medium
CN115379105B (en) * 2021-05-20 2024-02-27 北京字跳网络技术有限公司 Video shooting method, device, electronic equipment and storage medium
CN113256499B (en) * 2021-07-01 2021-10-08 北京世纪好未来教育科技有限公司 Image splicing method, device and system
CN113490050B (en) * 2021-09-07 2021-12-17 北京市商汤科技开发有限公司 Video processing method and device, computer readable storage medium and computer equipment
CN113810624A (en) * 2021-09-18 2021-12-17 维沃移动通信有限公司 Video generation method and device and electronic equipment
CN113873319A (en) * 2021-09-27 2021-12-31 维沃移动通信有限公司 Video processing method and device, electronic equipment and storage medium
CN114040129B (en) * 2021-11-30 2023-12-05 北京字节跳动网络技术有限公司 Video generation method, device, equipment and storage medium
CN114189635A (en) * 2021-12-01 2022-03-15 惠州Tcl移动通信有限公司 Video processing method and device, mobile terminal and storage medium
CN114339401A (en) * 2021-12-30 2022-04-12 北京翼鸥教育科技有限公司 Video background processing method and device
CN114363697B (en) * 2022-01-06 2024-04-26 上海哔哩哔哩科技有限公司 Video file generation and playing method and device
CN114900736A (en) * 2022-03-28 2022-08-12 网易(杭州)网络有限公司 Video generation method and device and electronic equipment
CN117956214A (en) * 2022-10-28 2024-04-30 影石创新科技股份有限公司 Video display method, device, video display equipment and storage medium
CN116229337B (en) * 2023-05-10 2023-09-26 瀚博半导体(上海)有限公司 Method, apparatus, system, device and medium for video processing
CN117714787B (en) * 2024-02-05 2024-05-07 哈尔滨学院 Video data processing method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101588445A (en) * 2009-06-09 2009-11-25 宁波大学 Video area-of-interest exacting method based on depth
CN103279494A (en) * 2013-05-03 2013-09-04 吴军 Dynamic video analysis moving target retrieval system
WO2018059206A1 (en) * 2016-09-29 2018-04-05 努比亚技术有限公司 Terminal, method of acquiring video, and data storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI381717B (en) * 2008-03-31 2013-01-01 Univ Nat Taiwan Method of processing partition of dynamic target object in digital video and system thereof
US20130170760A1 (en) * 2011-12-29 2013-07-04 Pelco, Inc. Method and System for Video Composition
TWI502977B (en) * 2012-02-13 2015-10-01 Acer Inc Audio/video playing device, audio/video processing device, systems, and method thereof
US20160198097A1 (en) * 2015-01-05 2016-07-07 GenMe, Inc. System and method for inserting objects into an image or sequence of images
US9930271B2 (en) * 2015-09-28 2018-03-27 Gopro, Inc. Automatic composition of video with dynamic background and composite frames selected based on frame criteria
CN106572385A (en) * 2015-10-10 2017-04-19 北京佳讯飞鸿电气股份有限公司 Image overlaying method for remote training video presentation
US10089793B2 (en) * 2016-09-02 2018-10-02 Russell Holmes Systems and methods for providing real-time composite video from multiple source devices featuring augmented reality elements
US10489915B2 (en) * 2017-04-01 2019-11-26 Intel Corporation Decouple multi-layer render fequency
CN108200359A (en) * 2017-12-13 2018-06-22 苏州长风航空电子有限公司 A kind of multi-standard video frequency superimposer for airborne indicator
CN108259781B (en) * 2017-12-27 2021-01-26 努比亚技术有限公司 Video synthesis method, terminal and computer-readable storage medium
CN109146827A (en) * 2018-08-24 2019-01-04 合肥景彰科技有限公司 A kind of image processing method and device in video fusion

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101588445A (en) * 2009-06-09 2009-11-25 宁波大学 Video area-of-interest exacting method based on depth
CN103279494A (en) * 2013-05-03 2013-09-04 吴军 Dynamic video analysis moving target retrieval system
WO2018059206A1 (en) * 2016-09-29 2018-04-05 努比亚技术有限公司 Terminal, method of acquiring video, and data storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"复杂场景下的交通视频显著性前景目标提取";郎洪;《中国图像图形学报》;20190116;全文 *

Also Published As

Publication number Publication date
CN110290425A (en) 2019-09-27

Similar Documents

Publication Publication Date Title
CN110290425B (en) Video processing method, device and storage medium
US10210907B2 (en) Systems and methods for adding content to video/multimedia based on metadata
CN106230841B (en) Terminal-based real-time video beautifying and streaming method in live webcasting
CN107197341B (en) Dazzle screen display method and device based on GPU and storage equipment
CN110809173B (en) Virtual live broadcast method and system based on AR augmented reality of smart phone
US11943489B2 (en) Method and system for automatic real-time frame segmentation of high resolution video streams into constituent features and modifications of features in each frame to simultaneously create multiple different linear views from same video source
CN112511896A (en) Video rendering method and device
CN117063473A (en) Streaming method, device and program for 3D object
CN107580228B (en) Monitoring video processing method, device and equipment
US20230421740A1 (en) Video processing device and video generating system for virtual reality
CN116962744A (en) Live webcast link interaction method, device and live broadcast system
CN113037947B (en) Method for coding spatial information in continuous dynamic image
US9131252B2 (en) Transmission of 3D models
KR102642583B1 (en) Apparatus and method for composing image and broadcasting system having the same
CN117493263A (en) Method and device for generating multimedia resources, computer equipment and storage medium
CN117812193A (en) Watermark superposition method and electronic equipment
CN117173291A (en) Animation method, device, apparatus, storage medium, and computer program product
JPH10234035A (en) Image encoding/decoding method and device
CN116996629A (en) Multi-video equipment acquisition display method and system thereof
CN114513614A (en) Device and method for special effect rendering of video
CN118018664A (en) Video processing method, device, electronic equipment and storage medium
CN115766673A (en) Method and system for realizing VR video transmission display
CN117274140A (en) Image synthesis method and device based on chroma matting and video live broadcast system
CN118822832A (en) Panoramic roaming map generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant