CN112258384A - Method and system for removing background of real-time character video - Google Patents
Method and system for removing background of real-time character video
- Publication number
- CN112258384A
- Authority
- CN
- China
- Prior art keywords
- image
- student model
- real
- outline
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 17
- 238000004821 distillation Methods 0.000 claims abstract description 18
- 238000007781 pre-processing Methods 0.000 claims abstract description 9
- 238000005457 optimization Methods 0.000 claims description 4
- 230000015572 biosynthetic process Effects 0.000 abstract description 6
- 238000003786 synthesis reaction Methods 0.000 abstract description 6
- 238000000605 extraction Methods 0.000 abstract description 3
- 238000007499 fusion processing Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000011218 segmentation Effects 0.000 description 2
- 230000002123 temporal effect Effects 0.000 description 2
- 238000001514 detection method Methods 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/265—Mixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/272—Means for inserting a foreground image in a background image, i.e. inlay, outlay
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The embodiment of the invention provides a method and a system for removing the background of a real-time character video, comprising the following: acquiring a video stream in real time; preprocessing images in the video stream in real time; running a pre-trained student model in real time to extract the person outline image from each frame, fuse it with a preset virtual image to form and display a fused image, and update the student model; and running a teacher model asynchronously with the student model to distill the video stream online, updating the student model according to the distillation result. Embodiments of the invention can effectively improve the virtual-background synthesis speed and the person-extraction accuracy while reducing the device-memory requirement, making the method well suited to everyday use by ordinary users.
Description
Technical Field
The invention relates to the technical field of image data processing, in particular to a method and a system for removing background of a real-time character video.
Background
With the popularity of online video conferencing, virtual backgrounds have become an interesting technological phenomenon. A virtual background system divides each input video frame into a foreground (mostly people) and a background in real time. The user selects an image or video that replaces the background pixels, and this replacement is composited with the foreground layer to generate an artificial video stream.
In the prior art, a green screen is generally used as the background. That approach, however, prioritises compositing precision and quality over convenience and efficiency, and since a green-screen backdrop is often unavailable, it is ill-suited to everyday use by ordinary users.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a system for removing background from a real-time character video, so as to solve the above technical problems.
In order to achieve the above technical object, an embodiment of the present invention provides a method for removing the background of a real-time character video, the improvement comprising the following steps:
acquiring a video stream in real time;
preprocessing images in the video stream in real time;
running a pre-trained student model in real time to extract the person outline image from each frame and fuse it with a preset virtual image, forming and displaying a fused image, and updating the student model;
and running a teacher model asynchronously with the student model to distill the video stream online, updating the student model according to the distillation result.
In order to achieve the above technical object, an embodiment of the present invention provides a system for removing the background of a real-time character video, the improvement comprising:
the acquisition module is used for acquiring the video stream in real time;
the processing module is used for preprocessing images in the video stream in real time;
the execution module is used for running a pre-trained student model in real time to extract the person outline image from each frame and fuse it with a preset virtual image, forming and displaying a fused image, and updating the student model;
and the distillation module is used for running the teacher model asynchronously with the student model to distill the video stream online and update the student model according to the distillation result.
Due to the adoption of the above technical scheme, the invention has the following advantages over the prior art:
1. The invention performs virtual-background fusion through a pre-trained student model; because the student model runs quickly and efficiently, each frame can be processed and inferred fast enough to keep the real-time video fluent.
2. The invention performs online distillation through a teacher model, which balances controllable accuracy against throughput and reduces computational cost while still producing sufficiently high-quality predictions; by running the teacher model asynchronously with the student model, high performance can be achieved in low-resource settings.
The invention is convenient to operate and low in cost, and can be widely applied in the technical field of image data processing, in particular to virtual-background image processing in video conferences.
Drawings
FIG. 1 is a flow diagram of one embodiment of a method for real-time character video background removal in accordance with the present invention;
FIG. 2 is a schematic diagram of one embodiment of a real-time character video background removal system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
As shown in FIG. 1, an embodiment of the present invention provides a method for removing the background of a real-time character video, comprising the following steps:
acquiring a video stream in real time;
preprocessing images in the video stream in real time;
running a pre-trained student model in real time to extract the person outline image from each frame and fuse it with a preset virtual image, forming and displaying a fused image, and updating the student model;
and running a teacher model asynchronously with the student model to distill the video stream online, updating the student model according to the distillation result.
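The patent discloses no concrete implementation of these four steps, but one iteration of the loop can be sketched as follows. The downscaling, the placeholder `student_predict` network, and the alpha-blend compositing are all illustrative assumptions rather than the patented implementation:

```python
import numpy as np

def preprocess(frame, size=(256, 256)):
    # Nearest-neighbour downscale plus normalisation to [0, 1]; a stand-in
    # for the patent's "reduce the image size and tensorise" step.
    ys = np.linspace(0, frame.shape[0] - 1, size[0]).astype(int)
    xs = np.linspace(0, frame.shape[1] - 1, size[1]).astype(int)
    return frame[ys][:, xs].astype(np.float32) / 255.0

def student_predict(tensor):
    # Placeholder for the student network: returns a person mask in [0, 1].
    # Here it simply marks the centre region as "person".
    mask = np.zeros(tensor.shape[:2], dtype=np.float32)
    h, w = mask.shape
    mask[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4] = 1.0
    return mask

def compose(tensor, mask, background):
    # Alpha-blend the extracted person onto the preset virtual background.
    alpha = mask[..., None]
    return alpha * tensor + (1.0 - alpha) * background

# One iteration of the real-time loop on a synthetic grey frame.
frame = np.full((480, 640, 3), 200, dtype=np.uint8)      # "acquired" frame
background = np.zeros((256, 256, 3), dtype=np.float32)   # preset virtual image
tensor = preprocess(frame)                               # preprocessing step
mask = student_predict(tensor)                           # student inference
fused = compose(tensor, mask, background)                # fusion for display
```

In a real deployment the loop repeats for every captured frame, with the student's weights periodically refreshed from the teacher's distillation results.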
Here, online distillation is a video-segmentation framework that exploits the temporal consistency between frames to reduce computational cost. Most video streams view a very limited scene (for example, a fixed intersection or a particular room), so performing online distillation through a high-quality teacher model can reduce the computational cost while still achieving sufficiently high-quality predictions.
Obviously, because the virtual-background fusion is performed by the pre-trained student model, which runs quickly and efficiently, each frame can be processed and inferred fast enough to keep the real-time video fluent.
Meanwhile, because a teacher model is set up to perform online distillation, controllable accuracy can be traded for better throughput, reducing the computational cost while still achieving sufficiently high-quality predictions; and because the teacher model runs asynchronously with the student model, high performance can be achieved in low-resource settings.
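A minimal sketch of this asynchronous arrangement uses a worker thread and two queues, so that the heavyweight teacher never blocks the real-time student loop. The `teacher_label` logic below is a hypothetical placeholder, not Mask R-CNN:

```python
import queue
import threading

def teacher_label(frame):
    # Stand-in for the heavyweight teacher: threshold each value to produce
    # a pseudo-label "mask". Purely illustrative.
    return [p > 0.5 for p in frame]

def distill_worker(frames_in, labels_out):
    # Runs in its own thread; labelled frames are queued back so the
    # student can take a gradient step on them when convenient.
    while True:
        frame = frames_in.get()
        if frame is None:          # sentinel: shut down
            break
        labels_out.put((frame, teacher_label(frame)))

frames_in, labels_out = queue.Queue(), queue.Queue()
worker = threading.Thread(target=distill_worker, args=(frames_in, labels_out))
worker.start()

# The real-time loop samples an occasional frame to the teacher and keeps
# going; it only consumes a pseudo-label once one is ready.
frames_in.put([0.1, 0.9, 0.7])
frames_in.put(None)                # stop the worker for this sketch
worker.join()
frame, label = labels_out.get()
```

Because the teacher only ever sees a sampled subset of frames, its cost is amortised over many student inferences, which is what makes the low-resource setting workable.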
Therefore, the scheme can effectively improve the virtual-background synthesis speed and the person-extraction accuracy while reducing the device-memory requirement, making it well suited to everyday use by ordinary users.
In one embodiment, the images in the video stream are preprocessed in real time, the preprocessing including reducing the image size and converting the images to tensors, which further shortens the virtual-background synthesis time and improves efficiency.
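The description leaves "reduce the image size and perform tensor processing" unspecified; one common reading (downscale, then convert the HWC uint8 image to a CHW float tensor in [0, 1]) can be sketched as follows, where the 256x256 target size is an assumption:

```python
import numpy as np

def to_tensor(frame, size=(256, 256)):
    # Nearest-neighbour downscale, then HWC uint8 -> CHW float32 in [0, 1].
    ys = np.linspace(0, frame.shape[0] - 1, size[0]).astype(int)
    xs = np.linspace(0, frame.shape[1] - 1, size[1]).astype(int)
    small = frame[ys][:, xs]
    return small.astype(np.float32).transpose(2, 0, 1) / 255.0

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # a captured frame
t = to_tensor(frame)                              # shape (3, 256, 256)
```

A smaller input tensor is what lets the student model keep up with the camera frame rate on commodity hardware.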
In one embodiment, running a pre-trained student model in real time to extract the person outline image, fuse it with a preset virtual image, form and display a fused image, and update the student model comprises the following steps: running the pre-trained student model in real time and judging whether a person outline image exists in the image according to a preset mask-area threshold; if not, outputting a pure background image and updating the student model; if so, extracting the person outline image, fusing it with the preset virtual image, forming and displaying the fused image, and updating the student model.
The preset mask-area threshold may be set to 5%, but is not limited thereto, and may be adjusted as needed in actual use.
Obviously, this arrangement effectively prevents unexpected situations such as the student model overfitting and outputting blank frames in videos that contain no person.
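The mask-area check can be sketched directly, using the 5% default named above; the `select_output` helper and the alpha-blend compositing rule are illustrative assumptions:

```python
import numpy as np

MASK_AREA_THRESHOLD = 0.05  # the 5% default from the description

def select_output(mask, frame, virtual_bg):
    # If the predicted person mask covers less of the frame than the
    # threshold, treat the frame as person-free and emit the virtual
    # background unchanged; otherwise composite person over background.
    if mask.mean() < MASK_AREA_THRESHOLD:
        return virtual_bg.copy()
    alpha = mask[..., None]
    return alpha * frame + (1.0 - alpha) * virtual_bg

h = w = 100
frame = np.ones((h, w, 3), dtype=np.float32)        # all-white "person" frame
virtual_bg = np.zeros((h, w, 3), dtype=np.float32)  # all-black virtual image

empty_mask = np.zeros((h, w), dtype=np.float32)     # 0% coverage: below 5%
person_mask = np.zeros((h, w), dtype=np.float32)
person_mask[:50, :] = 1.0                           # 50% coverage: above 5%
```

The early return on a near-empty mask is what prevents a few stray false-positive pixels from being composited as a "person" in person-free footage.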
In one embodiment, for the step of running a pre-trained student model in real time to extract the person outline image, fuse it with a preset virtual image, form and display a fused image, and update the student model, the student model is pre-trained with the BCE loss function and the Adam optimization algorithm to ensure that it can output a reasonable fused image from the start.
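The BCE (binary cross-entropy) objective named here can be written out directly; the sketch below implements it in NumPy, leaving the Adam update itself out for brevity:

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    # Binary cross-entropy over the predicted mask, clipped for stability.
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))

# A confident correct prediction yields a near-zero loss; a confident
# wrong one yields a large loss, which is what drives pre-training.
target = np.array([1.0, 0.0, 1.0])
good = np.array([0.99, 0.01, 0.99])
bad = np.array([0.01, 0.99, 0.01])
```

During pre-training this loss would be minimised with the Adam optimizer over labelled person/background masks.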
The student model may employ the JITNet network architecture.
In one embodiment, the teacher model may employ Mask R-CNN (an object detection and instance segmentation algorithm).
Based on the same inventive concept, an embodiment of the present invention further provides a system for removing the background of a real-time character video, as shown in FIG. 2, comprising:
the acquisition module 1 is used for acquiring a video stream in real time;
the processing module 2 is used for preprocessing images in the video stream in real time;
the execution module 3 is used for running a pre-trained student model in real time to extract the person outline image from each frame and fuse it with a preset virtual image, forming and displaying a fused image, and updating the student model;
and the distillation module 4 is used for running the teacher model asynchronously with the student model to distill the video stream online and update the student model according to the distillation result.
Here, online distillation is a video-segmentation framework that exploits the temporal consistency between frames to reduce computational cost. Most video streams view a very limited scene (for example, a fixed intersection or a particular room), so performing online distillation through a high-quality teacher model can reduce the computational cost while still achieving sufficiently high-quality predictions.
Obviously, because the virtual-background fusion is performed by the pre-trained student model, which runs quickly and efficiently, each frame can be processed and inferred fast enough to keep the real-time video fluent.
Meanwhile, because a teacher model is set up to perform online distillation, controllable accuracy can be traded for better throughput, reducing the computational cost while still achieving sufficiently high-quality predictions; and because the teacher model runs asynchronously with the student model, high performance can be achieved in low-resource settings.
Therefore, the scheme can effectively improve the virtual-background synthesis speed and the person-extraction accuracy while reducing the device-memory requirement, making it well suited to everyday use by ordinary users.
In one embodiment, the processing module 2 comprises:
the size processing submodule, used for reducing the size of the image;
and the tensor processing submodule, used for performing tensor processing on the image.
Obviously, the virtual background synthesis time can be further shortened and the efficiency can be improved through the scheme.
In one embodiment, the execution module 3 includes:
the operation module is used for operating the pre-trained student model in real time;
the judging module is used for judging whether a person outline image exists in the image according to the preset mask-area threshold;
and the execution submodule is used for outputting a pure background image and updating the student model when no person outline image exists, and for extracting the person outline image, fusing it with the preset virtual image, forming and displaying a fused image, and updating the student model when a person outline image exists.
Obviously, this arrangement effectively prevents unexpected situations such as the student model overfitting and outputting blank frames in videos that contain no person.
In one embodiment, the execution module 3 includes:
and the training module is used for pre-training the student model according to the BCE Loss function and the Adam optimization algorithm so as to ensure that the student model can output a reasonable fusion image at the beginning.
It will be apparent to those skilled in the art that the modules or steps of the embodiments described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. They may also be fabricated as individual integrated-circuit modules, or multiple modules or steps among them may be fabricated as a single integrated-circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A method for background removal of a real-time character video is characterized by comprising the following steps:
acquiring a video stream in real time;
preprocessing images in the video stream in real time;
running a pre-trained student model in real time to extract the person outline image from the image and fuse it with a preset virtual image, forming and displaying a fused image, and updating the student model;
and running a teacher model asynchronously with the student model to distill the video stream online, and updating the student model according to the distillation result.
2. The method of claim 1, wherein the preprocessing comprises reducing the size of the image and performing tensor processing on the image.
3. The method for real-time character video background removal according to claim 1, wherein the running of a pre-trained student model in real time to extract the person outline image from the image and fuse it with a preset virtual image, form and display a fused image, and update the student model comprises the following steps:
running a pre-trained student model in real time and judging whether a person outline image exists in the image according to a preset mask-area threshold; if not, outputting a pure background image and updating the student model; if so, extracting the person outline image, fusing it with the preset virtual image, forming and displaying a fused image, and updating the student model.
4. The method for real-time background removal of character videos as claimed in claim 1, wherein the real-time running of a pre-trained student model to blend the outline image of the character in the image with a preset virtual image, form and display a blended image, and update the student model, wherein:
the student model is pre-trained according to a loss function and an optimization algorithm.
5. A system for background removal of a live character video, comprising:
the acquisition module is used for acquiring the video stream in real time;
the processing module is used for preprocessing images in the video stream in real time;
the execution module is used for running a pre-trained student model in real time to extract the person outline image from the image and fuse it with a preset virtual image, form and display a fused image, and update the student model;
and the distillation module is used for running the teacher model asynchronously with the student model to distill the video stream online and update the student model according to the distillation result.
6. The system for real-time human video background removal as claimed in claim 5, wherein said processing module comprises:
a size processing module for reducing the size of the image;
and the tensor processing module is used for carrying out tensor processing on the image.
7. The system for real-time human video background removal as claimed in claim 5, wherein the execution module comprises:
the operation module is used for operating the pre-trained student model in real time;
the judging module is used for judging whether a person outline image exists in the image according to the preset mask-area threshold;
and the execution submodule is used for outputting a pure background image and updating the student model when no person outline image exists, and for extracting the person outline image, fusing it with the preset virtual image, forming and displaying a fused image, and updating the student model when a person outline image exists.
8. The system for real-time human video background removal as claimed in claim 5, wherein the execution module comprises:
and the training module is used for training the student model in advance according to the loss function and the optimization algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011132128.5A CN112258384A (en) | 2020-10-22 | 2020-10-22 | Method and system for removing background of real-time character video |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011132128.5A CN112258384A (en) | 2020-10-22 | 2020-10-22 | Method and system for removing background of real-time character video |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112258384A true CN112258384A (en) | 2021-01-22 |
Family
ID=74263708
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011132128.5A Pending CN112258384A (en) | 2020-10-22 | 2020-10-22 | Method and system for removing background of real-time character video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112258384A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106954034A (en) * | 2017-03-28 | 2017-07-14 | 宇龙计算机通信科技(深圳)有限公司 | A kind of image processing method and device |
CN108875764A (en) * | 2017-07-12 | 2018-11-23 | 北京旷视科技有限公司 | Model training method, device, system and computer-readable medium |
CN110348496A (en) * | 2019-06-27 | 2019-10-18 | 广州久邦世纪科技有限公司 | A kind of method and system of facial image fusion |
CN111723697A (en) * | 2020-06-05 | 2020-09-29 | 广东海洋大学 | Improved driver background segmentation method based on Mask-RCNN |
-
2020
- 2020-10-22 CN CN202011132128.5A patent/CN112258384A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106954034A (en) * | 2017-03-28 | 2017-07-14 | 宇龙计算机通信科技(深圳)有限公司 | A kind of image processing method and device |
CN108875764A (en) * | 2017-07-12 | 2018-11-23 | 北京旷视科技有限公司 | Model training method, device, system and computer-readable medium |
CN110348496A (en) * | 2019-06-27 | 2019-10-18 | 广州久邦世纪科技有限公司 | A kind of method and system of facial image fusion |
CN111723697A (en) * | 2020-06-05 | 2020-09-29 | 广东海洋大学 | Improved driver background segmentation method based on Mask-RCNN |
Non-Patent Citations (1)
Title |
---|
JO CHUANG, QIAN DONG: "JIT-Masker: Efficient Online Distillation for Background Matting", 《ARXIV:2006.06185V1》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11967151B2 (en) | Video classification method and apparatus, model training method and apparatus, device, and storage medium | |
US11830288B2 (en) | Method and apparatus for training face fusion model and electronic device | |
US10599914B2 (en) | Method and apparatus for human face image processing | |
US20230237841A1 (en) | Occlusion Detection | |
CN112235520B (en) | Image processing method and device, electronic equipment and storage medium | |
US11409794B2 (en) | Image deformation control method and device and hardware device | |
US11961237B2 (en) | Foreground data generation method and method for applying same, related apparatus, and system | |
CN111090778B (en) | Picture generation method, device, equipment and storage medium | |
CN111832745A (en) | Data augmentation method and device and electronic equipment | |
JP7401606B2 (en) | Virtual object lip driving method, model training method, related equipment and electronic equipment | |
CN113160244B (en) | Video processing method, device, electronic equipment and storage medium | |
CN109035147B (en) | Image processing method and device, electronic device, storage medium and computer equipment | |
CN111383232A (en) | Matting method, matting device, terminal equipment and computer-readable storage medium | |
CN114821734A (en) | Method and device for driving expression of virtual character | |
JP2023539620A (en) | Facial image processing method, display method, device and computer program | |
CN110555334A (en) | face feature determination method and device, storage medium and electronic equipment | |
WO2022148248A1 (en) | Image processing model training method, image processing method and apparatus, electronic device, and computer program product | |
JP2023543964A (en) | Image processing method, image processing device, electronic device, storage medium and computer program | |
CN115967823A (en) | Video cover generation method and device, electronic equipment and readable medium | |
CN116664603B (en) | Image processing method, device, electronic equipment and storage medium | |
CN112714337A (en) | Video processing method and device, electronic equipment and storage medium | |
CN111861954A (en) | Method and device for editing human face, electronic equipment and readable storage medium | |
CN112258384A (en) | Method and system for removing background of real-time character video | |
CN111787389B (en) | Transposed video identification method, device, equipment and storage medium | |
US20230131418A1 (en) | Two-dimensional (2d) feature database generation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210122 |