CN112258384A - Method and system for removing background of real-time character video - Google Patents

Method and system for removing background of real-time character video

Info

Publication number
CN112258384A
CN112258384A (application CN202011132128.5A)
Authority
CN
China
Prior art keywords
image
student model
real
outline
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011132128.5A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Shenzhi Technology Co ltd
Original Assignee
Beijing Zhongke Shenzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Shenzhi Technology Co ltd filed Critical Beijing Zhongke Shenzhi Technology Co ltd
Priority to CN202011132128.5A priority Critical patent/CN112258384A/en
Publication of CN112258384A publication Critical patent/CN112258384A/en
Pending legal-status Critical Current

Classifications

    • G06T 3/04 — Geometric image transformations in the plane of the image; context-preserving transformations, e.g. by using an importance map
    • G06T 7/194 — Image analysis; segmentation; edge detection involving foreground-background segmentation
    • H04N 5/265 — Studio circuitry; studio circuits for mixing, switching-over, change of character of image, other special effects; mixing
    • H04N 5/272 — Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • G06T 2207/10016 — Image acquisition modality: video; image sequence
    • G06T 2207/20081 — Special algorithmic details: training; learning
    • G06T 2207/20084 — Special algorithmic details: artificial neural networks [ANN]
    • G06T 2207/20221 — Image combination: image fusion; image merging
    • G06T 2207/30196 — Subject of image: human being; person

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention provide a method and a system for removing the background from a real-time character video, comprising: acquiring a video stream in real time; preprocessing images in the video stream in real time; running a pre-trained student model in real time to extract a figure outline image from the image, fuse it with a preset virtual image, form and display a fused image, and update the student model; and running a teacher model asynchronously with the student model to distill the video stream online and update the student model according to the distillation result. Embodiments of the invention effectively improve the virtual-background synthesis speed and the accuracy of character extraction while reducing the demand on device memory, making the method well suited to daily use by ordinary users.

Description

Method and system for removing background of real-time character video
Technical Field
The invention relates to the technical field of image data processing, in particular to a method and a system for removing background of a real-time character video.
Background
With the popularity of online video conferencing, virtual backgrounds have become an interesting technological phenomenon. A virtual background system divides each input video frame into a foreground (usually a person) and a background in real time. The user selects an image or video to replace the background pixels, and the replacement is composited with the foreground layer to generate an artificial video stream.
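For clarity, the compositing described above can be written as a simple per-pixel alpha blend. The following is an illustrative sketch only, not the patent's implementation; the function and variable names are assumptions.

```python
# Illustrative sketch of per-pixel foreground/background compositing (not the
# patent's implementation); names and dtypes are assumptions.
import numpy as np

def composite(frame: np.ndarray, background: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """frame, background: HxWx3 uint8 images; mask: HxW float in [0, 1], 1 = person."""
    alpha = mask[..., None].astype(np.float32)            # HxWx1, broadcast over the 3 channels
    out = alpha * frame.astype(np.float32) + (1.0 - alpha) * background.astype(np.float32)
    return out.astype(np.uint8)
```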
In the prior art, a green screen is generally used as the background. That approach prioritizes compositing precision and quality over convenience and efficiency, and because a green-screen backdrop is often unavailable, it is also unsuitable for daily use by ordinary users.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a system for removing background from a real-time character video, so as to solve the above technical problems.
In order to achieve the above technical object, an embodiment of the present invention provides a method for removing the background from a real-time character video, the method comprising the following steps:
acquiring a video stream in real time;
preprocessing images in the video stream in real time;
running a pre-trained student model in real time to extract a figure outline image in the image and fuse the figure outline image with a preset virtual image to form and display a fused image, and updating the student model;
and operating the teacher model asynchronously with the student model to distill the video stream on line, and updating the student model according to the distillation result.
In order to achieve the above technical object, an embodiment of the present invention provides a system for removing the background from a real-time character video, the system comprising:
the acquisition module is used for acquiring the video stream in real time;
the processing module is used for preprocessing images in the video stream in real time;
the execution module is used for operating a pre-trained student model in real time so as to extract a figure outline image in the image and fuse the figure outline image with a preset virtual image to form and display a fused image and update the student model;
and the distillation module is used for operating the teacher model asynchronously with the student model so as to distill the video stream on line and update the student model according to the distillation result.
By adopting the above technical scheme, the invention has the following advantages over the prior art:
1. The invention performs virtual-background fusion with a pre-trained student model; because the student model runs quickly and efficiently, each frame can be processed and inferred fast enough to keep the real-time video fluent.
2. The invention performs online distillation with a teacher model, trading controllable accuracy for better throughput and reducing computational cost while still achieving sufficiently high-quality predictions; by running the teacher model asynchronously with the student model, high performance can be achieved even in low-resource settings.
The invention has convenient operation and low cost, and can be widely applied to the technical field of image data processing, in particular to the virtual background image processing in a video conference.
Drawings
FIG. 1 is a flow diagram of one embodiment of a method for real-time character video background removal in accordance with the present invention;
FIG. 2 is a schematic diagram of one embodiment of a real-time character video background removal system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
As shown in fig. 1, an embodiment of the present invention provides a method for background removal of a live character video, including the following steps:
acquiring a video stream in real time;
preprocessing images in the video stream in real time;
running a pre-trained student model in real time to extract a figure outline image in the image and fuse the figure outline image with a preset virtual image to form and display a fused image, and updating the student model;
and operating the teacher model asynchronously with the student model to distill the video stream on line, and updating the student model according to the distillation result.
Online distillation is a video segmentation framework that exploits temporal consistency between frames to reduce computational cost. Most video streams observe only a very small subset of the visual world, e.g., a fixed intersection or a particular room, so performing online distillation with a high-quality teacher model reduces the computational cost needed to achieve sufficiently high-quality predictions.
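As a concrete illustration of one online-distillation update, the sketch below assumes a PyTorch student network that outputs a single-channel mask logit and a teacher that supplies a binary person mask for the same frame; the names (distill_step, student, teacher_mask) are placeholders, not the patent's API.

```python
# Minimal sketch of one online-distillation step, assuming a PyTorch student
# segmentation network; the teacher's mask serves as the training target.
import torch
import torch.nn.functional as F

def distill_step(student, optimizer, frame_tensor, teacher_mask):
    """frame_tensor: 1x3xHxW float; teacher_mask: 1x1xHxW float in {0, 1}."""
    student.train()
    logits = student(frame_tensor)                                   # 1x1xHxW raw mask scores
    loss = F.binary_cross_entropy_with_logits(logits, teacher_mask)  # match the teacher's label
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                                 # student adapts to the current scene
    return loss.item()
```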
Obviously, because the virtual-background fusion is performed by the pre-trained student model, which runs quickly and efficiently, each frame can be processed and inferred fast enough to keep the real-time video fluent.
Meanwhile, because a teacher model is used for online distillation, controllable accuracy can be traded for better throughput and computational cost can be reduced while still achieving sufficiently high-quality predictions; and because the teacher model runs asynchronously with the student model, high performance can be achieved in low-resource settings.
Therefore, the scheme effectively improves the virtual-background synthesis speed and the accuracy of character extraction while reducing the demand on device memory, making it well suited to daily use by ordinary users.
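The overall real-time flow can be outlined as follows, assuming a PyTorch student and an OpenCV-style capture object. It reuses, as placeholders, the helper functions sketched elsewhere in this description (composite above; preprocess, segment_or_passthrough, distill_step and teacher_person_mask in the embodiments below). This is a sketch under stated assumptions, not the patent's implementation.

```python
# High-level sketch of the asynchronous student/teacher loop: the student segments
# and composites every frame, while a background thread lets the teacher label
# sampled frames so the student can be updated without blocking playback.
import queue
import threading

def run_pipeline(capture, student, student_optimizer, virtual_bg, display):
    frames_for_teacher = queue.Queue(maxsize=4)

    def teacher_worker():
        while True:
            frame_tensor = frames_for_teacher.get()
            mask = teacher_person_mask(frame_tensor[0])               # slow, high-quality label
            distill_step(student, student_optimizer, frame_tensor, mask[None, None])

    threading.Thread(target=teacher_worker, daemon=True).start()      # teacher runs asynchronously

    while True:
        ok, frame = capture.read()                                    # acquire the video stream in real time
        if not ok:
            break
        frame_tensor = preprocess(frame)                              # reduce size + tensor conversion
        fused = segment_or_passthrough(student, frame_tensor, virtual_bg, frame)
        display(fused)                                                # show the fused image
        if not frames_for_teacher.full():
            frames_for_teacher.put(frame_tensor)                      # sample frames for online distillation
```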
In one embodiment, the images in the video stream are preprocessed in real time, wherein the preprocessing includes reducing the image size and converting the images to tensors, which further shortens the virtual-background synthesis time and improves efficiency.
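A minimal sketch of this preprocessing, assuming OpenCV and PyTorch; the target resolution and normalization are assumptions, not values taken from the patent.

```python
# Hedged sketch of the described preprocessing: shrink the frame, then convert it
# to a normalized CHW tensor with a batch dimension.
import cv2
import torch

def preprocess(frame_bgr, size=(320, 180)):
    small = cv2.resize(frame_bgr, size, interpolation=cv2.INTER_AREA)   # reduce the image size
    rgb = cv2.cvtColor(small, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0     # 3xHxW in [0, 1]
    return tensor.unsqueeze(0)                                          # 1x3xHxW
```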
In one embodiment, running the pre-trained student model in real time to extract the figure outline image in the image, fuse it with the preset virtual image, form and display a fused image, and update the student model comprises the following steps: running the pre-trained student model in real time and judging, according to a preset mask area threshold, whether a figure outline image exists in the image; if not, outputting a pure background image and updating the student model; if so, extracting the figure outline image, fusing it with the preset virtual image, forming and displaying the fused image, and updating the student model.
The preset mask area threshold may be set to 5%, but is not limited thereto, and may be adjusted as needed in actual use.
Obviously, this arrangement effectively avoids unexpected situations such as the student model overfitting on person-free video and producing blank output.
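The mask-area check can be sketched as below, assuming the student outputs a single-channel mask logit; the 5% value follows the example threshold above, and composite() is the compositing helper sketched in the background section. The function names are placeholders.

```python
# Sketch of the mask-area threshold decision: if too little of the frame is
# classified as a person, output the pure virtual background; otherwise fuse.
import torch
import torch.nn.functional as F

def segment_or_passthrough(student, frame_tensor, virtual_bg, frame, area_threshold=0.05):
    """frame: original HxWx3 uint8; frame_tensor: its preprocessed 1x3xhxw version."""
    with torch.no_grad():
        prob = torch.sigmoid(student(frame_tensor))                  # 1x1xhxw person probability
    if (prob > 0.5).float().mean().item() < area_threshold:          # figure outline too small / absent
        return virtual_bg                                            # output the pure background image
    mask = F.interpolate(prob, size=frame.shape[:2], mode="bilinear")[0, 0]
    return composite(frame, virtual_bg, mask.cpu().numpy())          # fuse the figure with the virtual image
```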
In one embodiment, when running the pre-trained student model in real time to extract the figure outline image, fuse it with the preset virtual image, form and display a fused image, and update the student model, the student model is pre-trained with a BCE (binary cross-entropy) loss function and the Adam optimization algorithm, to ensure that it can output a reasonable fused image from the start.
The student model may employ JITNet, a lightweight segmentation network.
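A minimal pre-training sketch under these assumptions (a segmentation student network and labelled image/mask pairs); the hyperparameters and names are illustrative, not values specified by the patent.

```python
# Hedged sketch of pre-training the student with a BCE loss and the Adam optimizer.
import torch
import torch.nn.functional as F

def pretrain_student(student, dataloader, epochs=10, lr=1e-3):
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)         # Adam optimization algorithm
    student.train()
    for _ in range(epochs):
        for images, masks in dataloader:                              # images: Bx3xHxW, masks: Bx1xHxW
            logits = student(images)
            loss = F.binary_cross_entropy_with_logits(logits, masks)  # BCE loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```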
In one embodiment, the teacher model may employ Mask R-CNN (MRCNN), an instance segmentation and object detection algorithm.
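One possible teacher is torchvision's pre-trained Mask R-CNN; the patent only names MRCNN, so this concrete model choice, and the score threshold, are assumptions.

```python
# Sketch of a Mask R-CNN teacher that returns the union of detected person masks.
import torch
import torchvision

teacher = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True).eval()

def teacher_person_mask(frame_tensor, score_threshold=0.5):
    """frame_tensor: 3xHxW float in [0, 1]; returns an HxW binary person mask."""
    with torch.no_grad():
        out = teacher([frame_tensor])[0]
    keep = (out["labels"] == 1) & (out["scores"] > score_threshold)   # COCO class 1 = person
    if keep.sum() == 0:
        return torch.zeros(frame_tensor.shape[1:], dtype=torch.float32)
    masks = out["masks"][keep][:, 0]                                  # K x H x W soft masks
    return (masks.max(dim=0).values > 0.5).float()                    # union of all person masks
```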
Based on the same inventive concept, an embodiment of the present invention further provides a system for removing a background from a live character video, as shown in fig. 2, including:
the acquisition module 1 is used for acquiring a video stream in real time;
the processing module 2 is used for preprocessing images in the video stream in real time;
the execution module 3 is used for running a pre-trained student model in real time so as to extract the figure outline image in the image and fuse the figure outline image with a preset virtual image to form and display a fused image and update the student model;
and the distillation module 4 is used for operating the teacher model asynchronously with the student model so as to carry out online distillation on the video stream and update the student model according to the distillation result.
Online distillation is a video segmentation framework that exploits temporal consistency between frames to reduce computational cost. Most video streams observe only a very small subset of the visual world, e.g., a fixed intersection or a particular room, so performing online distillation with a high-quality teacher model reduces the computational cost needed to achieve sufficiently high-quality predictions.
Obviously, because the virtual-background fusion is performed by the pre-trained student model, which runs quickly and efficiently, each frame can be processed and inferred fast enough to keep the real-time video fluent.
Meanwhile, because a teacher model is used for online distillation, controllable accuracy can be traded for better throughput and computational cost can be reduced while still achieving sufficiently high-quality predictions; and because the teacher model runs asynchronously with the student model, high performance can be achieved in low-resource settings.
Therefore, the scheme effectively improves the virtual-background synthesis speed and the accuracy of character extraction while reducing the demand on device memory, making it well suited to daily use by ordinary users.
In one embodiment, the processing module 2 comprises:
a size processing module for reducing the size of the image;
and a tensor processing module for performing tensor processing on the image.
Obviously, the virtual background synthesis time can be further shortened and the efficiency can be improved through the scheme.
In one embodiment, the execution module 3 includes:
the operation module is used for operating the pre-trained student model in real time;
the judging module is used for judging whether the figure outline image exists in the image according to the preset mask area threshold value;
and the execution submodule is used for outputting a pure background image and updating the student model when no figure outline image exists, and for extracting the figure outline image, fusing it with the preset virtual image, forming and displaying the fused image, and updating the student model when the figure outline image exists.
Obviously, this arrangement effectively avoids unexpected situations such as the student model overfitting on person-free video and producing blank output.
In one embodiment, the execution module 3 includes:
and the training module is used for pre-training the student model according to the BCE Loss function and the Adam optimization algorithm so as to ensure that the student model can output a reasonable fusion image at the beginning.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. They may also be fabricated separately as individual integrated circuit modules, or multiple of them may be fabricated as a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A method for background removal of a real-time character video is characterized by comprising the following steps:
acquiring a video stream in real time;
preprocessing images in the video stream in real time;
running a pre-trained student model in real time to extract a figure outline image in the image and fuse the figure outline image with a preset virtual image to form and display a fused image, and updating the student model;
and operating the teacher model asynchronously with the student model to distill the video stream on line, and updating the student model according to the distillation result.
2. The method of claim 1, wherein the preprocessing comprises reducing the size of the image and performing tensor processing on the image.
3. The method for real-time character video background removal according to claim 1, wherein running a pre-trained student model in real time to extract a figure outline image in the image and fuse the figure outline image with a preset virtual image, form and display a fused image, and update the student model comprises the following steps:
and running a pre-trained student model in real time, judging whether a figure outline image exists in the image according to a preset mask area threshold value, if not, outputting a pure background image, updating the student model, if so, extracting the figure outline image, fusing the figure outline image with a preset virtual image, forming and displaying a fused image, and updating the student model.
4. The method for real-time character video background removal according to claim 1, wherein, in running the pre-trained student model in real time to extract the figure outline image in the image, fuse it with the preset virtual image, form and display a fused image, and update the student model:
the student models are pre-trained according to a loss function and an optimization algorithm.
5. A system for background removal of a live character video, comprising:
the acquisition module is used for acquiring the video stream in real time;
the processing module is used for preprocessing images in the video stream in real time;
the execution module is used for operating a pre-trained student model in real time so as to extract a figure outline image in the image and fuse the figure outline image with a preset virtual image to form and display a fused image and update the student model;
and the distillation module is used for operating the teacher model asynchronously with the student model so as to distill the video stream on line and update the student model according to the distillation result.
6. The system for real-time human video background removal as claimed in claim 5, wherein said processing module comprises:
a size processing module for reducing the size of the image;
and the tensor processing module is used for carrying out tensor processing on the image.
7. The system for real-time human video background removal as claimed in claim 5, wherein the execution module comprises:
the operation module is used for operating the pre-trained student model in real time;
the judging module is used for judging whether the figure outline image exists in the image according to the preset mask area threshold value;
and the execution submodule is used for outputting a pure background image and updating the student model when the figure outline image does not exist, and extracting the figure outline image and fusing the figure outline image with a preset virtual image to form and display a fused image and update the student model when the figure outline image exists.
8. The system for real-time human video background removal as claimed in claim 5, wherein the execution module comprises:
and the training module is used for training the student model in advance according to the loss function and the optimization algorithm.
CN202011132128.5A 2020-10-22 2020-10-22 Method and system for removing background of real-time character video Pending CN112258384A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011132128.5A CN112258384A (en) 2020-10-22 2020-10-22 Method and system for removing background of real-time character video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011132128.5A CN112258384A (en) 2020-10-22 2020-10-22 Method and system for removing background of real-time character video

Publications (1)

Publication Number Publication Date
CN112258384A 2021-01-22

Family

ID=74263708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011132128.5A Pending CN112258384A (en) 2020-10-22 2020-10-22 Method and system for removing background of real-time character video

Country Status (1)

Country Link
CN (1) CN112258384A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106954034A (en) * 2017-03-28 2017-07-14 宇龙计算机通信科技(深圳)有限公司 A kind of image processing method and device
CN108875764A (en) * 2017-07-12 2018-11-23 北京旷视科技有限公司 Model training method, device, system and computer-readable medium
CN110348496A (en) * 2019-06-27 2019-10-18 广州久邦世纪科技有限公司 A kind of method and system of facial image fusion
CN111723697A (en) * 2020-06-05 2020-09-29 广东海洋大学 Improved driver background segmentation method based on Mask-RCNN

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jo Chuang, Qian Dong: "JIT-Masker: Efficient Online Distillation for Background Matting", arXiv:2006.06185v1 *

Similar Documents

Publication Publication Date Title
US11967151B2 (en) Video classification method and apparatus, model training method and apparatus, device, and storage medium
US11830288B2 (en) Method and apparatus for training face fusion model and electronic device
US10599914B2 (en) Method and apparatus for human face image processing
US20230237841A1 (en) Occlusion Detection
CN112235520B (en) Image processing method and device, electronic equipment and storage medium
US11409794B2 (en) Image deformation control method and device and hardware device
US11961237B2 (en) Foreground data generation method and method for applying same, related apparatus, and system
CN111090778B (en) Picture generation method, device, equipment and storage medium
CN111832745A (en) Data augmentation method and device and electronic equipment
JP7401606B2 (en) Virtual object lip driving method, model training method, related equipment and electronic equipment
CN113160244B (en) Video processing method, device, electronic equipment and storage medium
CN109035147B (en) Image processing method and device, electronic device, storage medium and computer equipment
CN111383232A (en) Matting method, matting device, terminal equipment and computer-readable storage medium
CN114821734A (en) Method and device for driving expression of virtual character
JP2023539620A (en) Facial image processing method, display method, device and computer program
CN110555334A (en) face feature determination method and device, storage medium and electronic equipment
WO2022148248A1 (en) Image processing model training method, image processing method and apparatus, electronic device, and computer program product
JP2023543964A (en) Image processing method, image processing device, electronic device, storage medium and computer program
CN115967823A (en) Video cover generation method and device, electronic equipment and readable medium
CN116664603B (en) Image processing method, device, electronic equipment and storage medium
CN112714337A (en) Video processing method and device, electronic equipment and storage medium
CN111861954A (en) Method and device for editing human face, electronic equipment and readable storage medium
CN112258384A (en) Method and system for removing background of real-time character video
CN111787389B (en) Transposed video identification method, device, equipment and storage medium
US20230131418A1 (en) Two-dimensional (2d) feature database generation

Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
RJ01 — Rejection of invention patent application after publication (application publication date: 20210122)