CN112601029B - Video segmentation method, terminal and storage medium with known background prior information - Google Patents
- Publication number
- CN112601029B (application CN202011340968.0A)
- Authority
- CN
- China
- Prior art keywords
- background
- current frame
- frame
- video
- entering
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/174—Segmentation; Edge detection involving the use of two or more images
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Studio Circuits (AREA)
Abstract
The invention relates to a video segmentation method with known background prior information, which comprises first matching the current frame of a video against the background prior information to predict the complete background of the current frame, and then segmenting the target foreground of the current frame. The invention can segment accurately even when the camera moves substantially, thereby ensuring the video segmentation effect.
Description
Technical Field
The present invention relates to the field of video processing technologies, and in particular to a video segmentation method, a terminal, and a storage medium with known background prior information.
Background
Existing video foreground/background segmentation generally acquires an image through a camera and then extracts the foreground region by manual matting or chroma keying. However, manual matting is complex to operate, making video segmentation inconvenient. Chroma keying can key out the foreground region directly, but it requires a relatively large solid-color background behind the foreground.
Disclosure of Invention
The invention aims to provide a video segmentation method, a terminal and a storage medium with known background prior information, which are convenient, applicable to video foreground/background segmentation against any background, and able to segment accurately even when the camera moves substantially, thereby ensuring the video segmentation effect.
The technical scheme adopted by the invention to solve the technical problem is as follows: a video segmentation method with known background prior information is provided, in which the current frame of a video is matched against the background prior information, the complete background of the current frame is obtained by prediction, and the target foreground of the current frame is then segmented.
The video segmentation method comprises the following steps:
(1) Setting a background frame and storing the background frame;
(2) Extracting a current frame of the video stream;
(3) Judging whether the current frame matches the background frame; if not, entering step (4), otherwise entering step (5);
(4) Matching the background frame to the background of the current frame in a correction mode;
(5) And segmenting the current frame to obtain the foreground of the current frame.
The background frame in the step (1) is a panoramic picture, and the panoramic picture is obtained by synthesizing a plurality of pictures at different angles.
The step (3) is specifically: calculating the similarity between the current frame and the background frame over the region outside the segmentation mask; if the similarity is lower than a threshold, entering step (4), otherwise entering step (5). The similarity may be a picture difference, structural similarity, feature-map similarity, or the like.
The step (4) is specifically: using a key-point matching algorithm to extract and match key points in the pre-stored background frame and the current frame respectively, selecting well-matched key points, computing a transformation matrix, cropping the corresponding background part out of the pre-stored background frame, transforming it with the transformation matrix to the same viewing angle as the current frame, and using the result as the new background input for the current frame.
The step (4) may alternatively be: inputting the pre-stored background frame and the current frame into a convolutional neural network whose output is a series of spatial-transformation relation maps, then cropping the corresponding background part out of the pre-stored background frame, transforming it with the spatial-transformation relation maps to the same viewing angle as the current frame, and using the result as the new background input for the current frame.
The step (5) is specifically: inputting the pre-stored background frame into a coding model to obtain a background feature map; inputting the current frame into the coding model for feature decomposition to obtain a current-frame feature map; fusing the current-frame feature map with the background feature map, decoding the fused feature map through a decoding model, and outputting an alpha mask map; and segmenting the current frame based on the alpha mask map to obtain the foreground of the current frame.
The technical scheme adopted by the invention for solving the technical problems is as follows: there is provided a terminal comprising a memory and a processor, the memory having stored thereon a video processing program executable on the processor, the video processing program when executed by the processor implementing the steps of the video segmentation method described above.
The technical scheme adopted by the invention for solving the technical problems is as follows: there is provided a computer readable storage medium having stored thereon a video processing program which, when executed by a processor, implements the steps of the video segmentation method described above.
Advantageous effects
Owing to the above technical scheme, the invention has the following advantages and positive effects compared with the prior art: the invention checks whether the feature points of the current frame match those of the background frame and, when they do not match, automatically corrects the background frame to match the background of the current frame, so that segmentation remains accurate even when the camera moves substantially, ensuring the video segmentation effect.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
fig. 2 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
The embodiment of the invention relates to a video segmentation method with known background prior information, which comprises matching the current frame of a video against the background prior information, predicting the complete background of the current frame, and then segmenting the target foreground of the current frame. As shown in fig. 1, the method comprises the following steps: setting and storing a background frame; extracting the current frame of the video stream; judging whether the current frame matches the background frame and, if not, correcting the background frame to match the background of the current frame; segmenting the current frame to obtain its foreground; and synthesizing the foreground with a background into the output video.
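The per-frame loop described above can be sketched as follows. Here `matcher`, `corrector` and `segmenter` are hypothetical stand-ins for the concrete matching, correction and segmentation algorithms detailed in the steps below; they are not names from the patent:

```python
def segment_video(frames, bg_frame, matcher, corrector, segmenter):
    """Per-frame loop of the method: reuse the stored background when the
    current frame still matches it, otherwise correct the background to the
    current viewpoint first, then segment the foreground."""
    for frame in frames:
        # Step 3: match check; step 4: correction only when needed.
        bg = bg_frame if matcher(frame, bg_frame) else corrector(frame, bg_frame)
        # Step 5: segment the frame against the (possibly corrected) background.
        yield segmenter(frame, bg)
```

The generator form keeps the loop streaming-friendly, matching the "extract a current frame of the video stream" formulation.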
Fig. 2 is a schematic diagram illustrating a terminal configuration of a hardware operating environment according to the present embodiment. The terminal of the embodiment can be a terminal device with a video shooting function, such as a smart phone, a tablet computer and a PC terminal.
The terminal includes: a processor (e.g., a CPU), a communication bus, a user interface, a network interface, and memory. The communication bus realizes connection and communication among these components. The user interface may include interfaces for connecting input and output devices. The network interface may include standard wired and wireless interfaces. The memory may be high-speed RAM or stable storage such as disk storage, and may also be a storage device independent of the processor.
The terminal can also comprise a camera, an RF circuit, a sensor, an audio circuit, a WIFI module and the like.
A memory, which is a kind of computer-readable storage medium, may include therein an operating system, a network communication module, a user interface module, and a video processing program. The operating system is a program for managing and controlling the terminal and software resources, and supports the running of a network communication module, a user interface module, a video processing program and other programs or software; the network communication module is used for managing and controlling the network interface; the user interface module is used for managing and controlling the user interface.
In the terminal, a network interface is mainly used for connecting a server or external equipment and carrying out data communication with the server or external equipment; the user interface is mainly used for connecting a terminal interface; the terminal calls the video processing program stored in the memory through the processor to realize the following steps:
step 1, setting a background frame and storing the background frame. The method comprises the steps of obtaining a clear background picture shot when a person leaves the background or obtaining a panoramic picture through synthesis of a plurality of pictures at different angles.
Step 2, extracting the current frame of a video stream, where the video stream is a video stream containing a foreground, and may also be an unordered picture sequence containing a foreground.
Step 3, judging whether the current frame matches the pre-stored background frame; if not, entering step 4, otherwise entering step 5. Specifically: calculate the similarity between the current frame and the pre-stored background frame over the region outside the segmentation mask; if the similarity is below a set threshold, the two do not match, and step 4 is entered for correction; otherwise they match, and step 5 is entered directly for segmentation. In this embodiment, the similarity may be the difference between the pictures, the structural similarity between the pictures, or the similarity between their feature maps.
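As a minimal sketch of this check, the snippet below scores similarity with a simple inverse mean-squared-error over the region outside the mask. This is only the first of the measures the method allows (picture difference; structural and feature-map similarity would need more machinery), and the 0.9 threshold is an assumed value, not from the patent:

```python
import numpy as np

def background_matches(frame, bg_frame, mask, threshold=0.9):
    """Return True when frame and the stored background agree outside the
    foreground mask (mask is True on foreground pixels).

    Similarity is 1 / (1 + MSE) over the non-masked region -- a simple
    picture-difference measure; the threshold is an assumption.
    """
    outside = ~mask
    diff = (frame[outside].astype(float) - bg_frame[outside].astype(float)) ** 2
    mse = diff.mean() if diff.size else 0.0
    similarity = 1.0 / (1.0 + mse)  # maps MSE in [0, inf) into (0, 1]
    return bool(similarity >= threshold)
```

A frame identical to the background outside the mask scores 1.0 and matches; any sizeable photometric or geometric change drives the score toward 0 and triggers the correction step.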
Step 4, correcting the pre-stored background frame to match the background of the current frame. In this step, either a key-point matching algorithm or a convolutional neural network may be used for the correction, as follows:
when a key point matching algorithm is used, key point extraction and matching are respectively carried out on a pre-stored background frame and a current frame, part of key points which are well matched are selected, a transformation matrix is calculated, a corresponding background part in the pre-stored background frame is cut out, the transformation matrix is used for transforming to a visual angle which is the same as that of the current frame, and the cut background part is used as a new background input of the current frame.
When a convolutional neural network is used, the pre-stored background frame and the current frame are input into the network, whose output is a series of spatial-transformation relation maps; the corresponding background part is then cropped out of the pre-stored background frame and transformed with the spatial-transformation relation maps to the same viewing angle as the current frame, and the cropped background part is used as the new background input for the current frame.
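Applying such a spatial-transformation relation map amounts to resampling the stored background through a per-pixel offset field. The patent does not specify the network architecture, so the offset field is taken as given here, and sampling is nearest-neighbour for brevity:

```python
import numpy as np

def warp_with_offsets(bg, offsets):
    """Resample the stored background through a per-pixel offset field.

    offsets has shape (H, W, 2): output pixel (y, x) is taken from the
    background at (y + dy, x + dx), rounded to the nearest neighbour and
    clamped to the image. A CNN as described would predict this field.
    """
    h, w = bg.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + offsets[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + offsets[..., 1]).astype(int), 0, h - 1)
    return bg[src_y, src_x]
```

A production version would use bilinear sampling (e.g. `grid_sample` in a deep-learning framework) so the warp stays differentiable end to end.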
Step 5, segmenting the current frame to obtain its foreground. Specifically: the pre-stored background frame is input into a coding model to obtain a background feature map; the current frame is input into the same coding model for feature decomposition to obtain a current-frame feature map (the background part of which is identical to the background feature map); the two feature maps are fused (i.e., their features are matched and compared in feature spaces of different scales), the fused feature map is reconstructed through a decoding model, and an alpha mask map is output; the current frame is then segmented with this alpha mask to obtain its foreground. To improve the segmented foreground, its edges may be post-processed, e.g. sharpened.
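Once the decoding model has produced the alpha mask, cutting out the foreground and compositing it over a new background are straightforward array operations. The sketch below takes the predicted matte as an input; the encoder/decoder network itself is not reproduced here:

```python
import numpy as np

def extract_foreground(frame, alpha):
    """Cut the foreground out of the frame with a predicted alpha matte
    (values in [0, 1], as output by the decoding model)."""
    return frame * alpha[..., None]

def composite(frame, alpha, new_bg):
    """Alpha-blend the frame's foreground over a replacement background --
    the final foreground/background synthesis step."""
    a = alpha[..., None]
    return frame * a + new_bg * (1.0 - a)
```

Because the matte is continuous rather than binary, soft edges (hair, motion blur) blend naturally into the new background instead of showing a hard cut-out boundary.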
In summary, this embodiment matches the background picture to the region covered by the current video frame and segments out the objects that differ from the background, so that segmentation remains accurate even when the camera moves substantially, ensuring the video segmentation effect.
Claims (8)
1. A video segmentation method of known background prior information is characterized in that a current frame of a video is matched with the background prior information, a complete background of the current frame is obtained through prediction, and then a target foreground of the current frame is segmented, and the method comprises the following steps:
(1) Setting a background frame and storing the background frame;
(2) Extracting a current frame of the video stream;
(3) Judging whether the current frame is matched with the background frame, if not, entering the step (4), otherwise, entering the step (5);
(4) Matching the background frame to the background of the current frame in a correction mode; the method specifically comprises the following steps: respectively extracting and matching key points of a pre-stored background frame and a current frame by using a key point matching algorithm, selecting some key points with good matching, calculating a transformation matrix, cutting out the corresponding background part in the pre-stored background frame, transforming to the same visual angle as the current frame by using the transformation matrix, and inputting as a new background of the current frame;
(5) Segmenting the current frame to obtain the foreground of the current frame; the method specifically comprises the following steps: inputting a pre-stored background frame into a coding model to obtain a background characteristic diagram; inputting the current frame into the coding model to carry out feature decomposition to obtain a current frame feature map; fusing the current frame feature map and the background feature map, performing feature decoding on the fused feature map through a decoding model, and outputting an alpha mask map; and segmenting the current frame based on the alpha mask image to obtain the foreground of the current frame.
2. The video segmentation method according to claim 1, wherein the background frame in step (1) is a panoramic picture, and the panoramic picture is synthesized from a plurality of pictures at different angles.
3. The video segmentation method according to claim 1, wherein the step (3) is specifically: and (4) calculating the similarity of the areas outside the segmentation mask area of the current frame and the background frame, if the similarity is lower than a threshold value, entering the step (4), and if not, entering the step (5).
4. A video segmentation method of known background prior information is characterized in that a current frame of a video is matched with the background prior information, a complete background of the current frame is obtained through prediction, and then a target foreground of the current frame is segmented, and the method comprises the following steps:
(1) Setting a background frame and storing the background frame;
(2) Extracting a current frame of a video stream;
(3) Judging whether the current frame is matched with the background frame, if not, entering the step (4), otherwise, entering the step (5);
(4) Matching the background frame to the background of the current frame in a correction mode; the method comprises the following specific steps: inputting a pre-stored background frame and a current frame into a convolutional neural network, wherein the output of the convolutional neural network is a series of space transformation relation mapping images, then cutting out a corresponding background part in the pre-stored background frame, transforming the background part into the same visual angle as the current frame by using the space transformation relation mapping images, and inputting the background part as a new background of the current frame;
(5) Segmenting the current frame to obtain the foreground of the current frame; the method specifically comprises the following steps: inputting a pre-stored background frame into a coding model to obtain a background characteristic diagram; inputting the current frame into the coding model to carry out feature decomposition to obtain a current frame feature map; fusing the current frame feature map and the background feature map, performing feature decoding on the fused feature map through a decoding model, and outputting an alpha mask map; and segmenting the current frame based on the alpha mask image to obtain the foreground of the current frame.
5. The video segmentation method according to claim 4, wherein the background frame in step (1) is a panoramic picture, and the panoramic picture is obtained by synthesizing a plurality of pictures at different angles.
6. The video segmentation method according to claim 4, wherein the step (3) is specifically: and (4) calculating the similarity of the areas outside the segmentation mask area of the current frame and the background frame, if the similarity is lower than a threshold value, entering the step (4), and if not, entering the step (5).
7. A terminal comprising a memory and a processor, the memory having stored thereon a video processing program executable on the processor, the video processing program when executed by the processor implementing the steps of the video segmentation method as claimed in any one of claims 1 to 6.
8. A computer-readable storage medium, having stored thereon a video processing program which, when executed by a processor, implements the steps of the video segmentation method as claimed in any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011340968.0A CN112601029B (en) | 2020-11-25 | 2020-11-25 | Video segmentation method, terminal and storage medium with known background prior information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112601029A CN112601029A (en) | 2021-04-02 |
CN112601029B (en) | 2023-01-03
Family
ID=75183962
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011340968.0A Active CN112601029B (en) | 2020-11-25 | 2020-11-25 | Video segmentation method, terminal and storage medium with known background prior information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112601029B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114821399B (en) * | 2022-04-07 | 2024-06-04 | Xiamen University | Intelligent classroom-oriented blackboard-writing automatic extraction method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101216888A (en) * | 2008-01-14 | 2008-07-09 | Zhejiang University | A video foreground extracting method under conditions of view angle variety based on fast image registration |
CN101676953A (en) * | 2008-08-22 | 2010-03-24 | Adobe Inc. | Automatic video image segmentation |
WO2017181892A1 (en) * | 2016-04-19 | 2017-10-26 | ZTE Corporation | Foreground segmentation method and device |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040032906A1 (en) * | 2002-08-19 | 2004-02-19 | Lillig Thomas M. | Foreground segmentation for digital video |
GB0818561D0 (en) * | 2008-10-09 | 2008-11-19 | Isis Innovation | Visual tracking of objects in images, and segmentation of images |
CN104268866B (en) * | 2014-09-19 | 2017-03-01 | Xidian University | The video sequence method for registering being combined with background information based on movable information |
US20170116741A1 (en) * | 2015-10-26 | 2017-04-27 | Futurewei Technologies, Inc. | Apparatus and Methods for Video Foreground-Background Segmentation with Multi-View Spatial Temporal Graph Cuts |
CN106846336B (en) * | 2017-02-06 | 2022-07-15 | Tencent Technology (Shanghai) Co., Ltd. | Method and device for extracting foreground image and replacing image background |
CN111553923B (en) * | 2019-04-01 | 2024-02-23 | Shanghai Weisha Network Technology Co., Ltd. | Image processing method, electronic equipment and computer readable storage medium |
- 2020-11-25: Application CN202011340968.0A filed; granted as CN112601029B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN112601029A (en) | 2021-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108921782B (en) | Image processing method, device and storage medium | |
CN106797451B (en) | Visual object tracking system with model validation and management | |
TWI543610B (en) | Electronic device and image selection method thereof | |
WO2022078041A1 (en) | Occlusion detection model training method and facial image beautification method | |
CN108154086B (en) | Image extraction method and device and electronic equipment | |
US10477220B1 (en) | Object segmentation in a sequence of color image frames based on adaptive foreground mask upsampling | |
CN112053417B (en) | Image processing method, device and system and computer readable storage medium | |
CN112288816B (en) | Pose optimization method, pose optimization device, storage medium and electronic equipment | |
CN112381828A (en) | Positioning method, device, medium and equipment based on semantic and depth information | |
CN112270755A (en) | Three-dimensional scene construction method and device, storage medium and electronic equipment | |
WO2022194079A1 (en) | Sky region segmentation method and apparatus, computer device, and storage medium | |
CN112990197A (en) | License plate recognition method and device, electronic equipment and storage medium | |
CN111080665B (en) | Image frame recognition method, device, equipment and computer storage medium | |
CN112601029B (en) | Video segmentation method, terminal and storage medium with known background prior information | |
CN113205011B (en) | Image mask determining method and device, storage medium and electronic equipment | |
CN111079624B (en) | Sample information acquisition method and device, electronic equipment and medium | |
WO2023174063A1 (en) | Background replacement method and electronic device | |
CN116485944A (en) | Image processing method and device, computer readable storage medium and electronic equipment | |
US20230131418A1 (en) | Two-dimensional (2d) feature database generation | |
CN114119405A (en) | Image processing method and device, computer readable storage medium and electronic device | |
CN113613024A (en) | Video preprocessing method and device | |
CN113538462A (en) | Image processing method and device, computer readable storage medium and electronic device | |
CN116228607B (en) | Image processing method and electronic device | |
CN113284077A (en) | Image processing method, image processing device, communication equipment and readable storage medium | |
CN112308809A (en) | Image synthesis method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||