CN112258384A - Method and system for removing background of real-time character video - Google Patents
Method and system for removing background of real-time character video
- Publication number
- CN112258384A
- Authority
- CN
- China
- Prior art keywords
- image
- student model
- real
- outline
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 17
- 238000004821 distillation Methods 0.000 claims abstract description 18
- 238000007781 pre-processing Methods 0.000 claims abstract description 9
- 238000005457 optimization Methods 0.000 claims description 4
- 230000015572 biosynthetic process Effects 0.000 abstract description 6
- 238000003786 synthesis reaction Methods 0.000 abstract description 6
- 238000000605 extraction Methods 0.000 abstract description 3
- 238000007499 fusion processing Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000011218 segmentation Effects 0.000 description 2
- 230000002123 temporal effect Effects 0.000 description 2
- 238000001514 detection method Methods 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/265—Mixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/272—Means for inserting a foreground image in a background image, i.e. inlay, outlay
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The embodiment of the invention provides a method and a system for removing the background of a real-time character video, comprising the following: acquiring a video stream in real time; preprocessing images in the video stream in real time; running a pre-trained student model in real time to extract the person outline image from each frame, fuse it with a preset virtual image to form and display a fused image, and update the student model; and running a teacher model asynchronously with the student model to distill the video stream online, updating the student model according to the distillation result. Embodiments of the invention can effectively improve the virtual-background synthesis speed and the person-extraction accuracy while reducing the device-memory requirement, making the method well suited to everyday use by ordinary users.
Description
Technical Field
The invention relates to the technical field of image data processing, in particular to a method and a system for removing background of a real-time character video.
Background
With the popularity of online video conferencing, virtual backgrounds have become an interesting technological phenomenon. A virtual background system divides each input video frame into a foreground (mostly people) and a background in real time. The user selects an image or video that replaces the background pixels, and this replacement is composited with the foreground layer to generate an artificial video stream.
In the prior art, a green screen is generally used as the background. That approach, however, prioritises compositing precision and quality over convenience and efficiency, and since a green-screen backdrop is often unavailable, it is ill-suited to everyday use by ordinary users.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a system for removing background from a real-time character video, so as to solve the above technical problems.
In order to achieve the above technical object, an embodiment of the present invention provides a method for removing the background of a real-time character video, the improvement comprising the following steps:
acquiring a video stream in real time;
preprocessing images in the video stream in real time;
running a pre-trained student model in real time to extract the person outline image from each frame and fuse it with a preset virtual image, forming and displaying a fused image, and updating the student model;
and running a teacher model asynchronously with the student model to distill the video stream online, updating the student model according to the distillation result.
In order to achieve the above technical object, an embodiment of the present invention provides a system for removing the background of a real-time character video, the improvement comprising:
the acquisition module is used for acquiring the video stream in real time;
the processing module is used for preprocessing images in the video stream in real time;
the execution module is used for running a pre-trained student model in real time to extract the person outline image from each frame and fuse it with a preset virtual image, forming and displaying a fused image, and updating the student model;
and the distillation module is used for running the teacher model asynchronously with the student model to distill the video stream online and update the student model according to the distillation result.
Due to the adoption of the above technical scheme, the invention has the following advantages over the prior art:
1. The invention performs virtual-background fusion through a pre-trained student model; because the student model runs quickly and efficiently, each frame can be processed and inferred fast enough to keep the real-time video fluent.
2. The invention performs online distillation through a teacher model, which balances controllable accuracy against throughput and reduces computational cost while still producing sufficiently high-quality predictions; by running the teacher model asynchronously with the student model, high performance can be achieved in low-resource settings.
The invention is convenient to operate and low in cost, and can be widely applied in the technical field of image data processing, in particular to virtual-background image processing in video conferences.
Drawings
FIG. 1 is a flow diagram of one embodiment of a method for real-time character video background removal in accordance with the present invention;
FIG. 2 is a schematic diagram of one embodiment of a real-time character video background removal system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
As shown in FIG. 1, an embodiment of the present invention provides a method for removing the background of a real-time character video, comprising the following steps:
acquiring a video stream in real time;
preprocessing images in the video stream in real time;
running a pre-trained student model in real time to extract the person outline image from each frame and fuse it with a preset virtual image, forming and displaying a fused image, and updating the student model;
and running a teacher model asynchronously with the student model to distill the video stream online, updating the student model according to the distillation result.
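The patent discloses no concrete implementation of these four steps, but one iteration of the loop can be sketched as follows. The downscaling, the placeholder `student_predict` network, and the alpha-blend compositing are all illustrative assumptions rather than the patented implementation:

```python
import numpy as np

def preprocess(frame, size=(256, 256)):
    # Nearest-neighbour downscale plus normalisation to [0, 1]; a stand-in
    # for the patent's "reduce the image size and tensorise" step.
    ys = np.linspace(0, frame.shape[0] - 1, size[0]).astype(int)
    xs = np.linspace(0, frame.shape[1] - 1, size[1]).astype(int)
    return frame[ys][:, xs].astype(np.float32) / 255.0

def student_predict(tensor):
    # Placeholder for the student network: returns a person mask in [0, 1].
    # Here it simply marks the centre region as "person".
    mask = np.zeros(tensor.shape[:2], dtype=np.float32)
    h, w = mask.shape
    mask[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4] = 1.0
    return mask

def compose(tensor, mask, background):
    # Alpha-blend the extracted person onto the preset virtual background.
    alpha = mask[..., None]
    return alpha * tensor + (1.0 - alpha) * background

# One iteration of the real-time loop on a synthetic grey frame.
frame = np.full((480, 640, 3), 200, dtype=np.uint8)      # "acquired" frame
background = np.zeros((256, 256, 3), dtype=np.float32)   # preset virtual image
tensor = preprocess(frame)                               # preprocessing step
mask = student_predict(tensor)                           # student inference
fused = compose(tensor, mask, background)                # fusion for display
```

In a real deployment the loop repeats for every captured frame, with the student's weights periodically refreshed from the teacher's distillation results.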
Here, online distillation is a video-segmentation framework that exploits the temporal consistency between frames to reduce computational cost. Most video streams view a very limited scene (for example, a fixed intersection or a particular room), so performing online distillation through a high-quality teacher model can reduce the computational cost while still achieving sufficiently high-quality predictions.
Obviously, because the virtual-background fusion is performed by the pre-trained student model, which runs quickly and efficiently, each frame can be processed and inferred fast enough to keep the real-time video fluent.
Meanwhile, because a teacher model is set up to perform online distillation, controllable accuracy can be traded for better throughput, reducing the computational cost while still achieving sufficiently high-quality predictions; and because the teacher model runs asynchronously with the student model, high performance can be achieved in low-resource settings.
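A minimal sketch of this asynchronous arrangement uses a worker thread and two queues, so that the heavyweight teacher never blocks the real-time student loop. The `teacher_label` logic below is a hypothetical placeholder, not Mask R-CNN:

```python
import queue
import threading

def teacher_label(frame):
    # Stand-in for the heavyweight teacher: threshold each value to produce
    # a pseudo-label "mask". Purely illustrative.
    return [p > 0.5 for p in frame]

def distill_worker(frames_in, labels_out):
    # Runs in its own thread; labelled frames are queued back so the
    # student can take a gradient step on them when convenient.
    while True:
        frame = frames_in.get()
        if frame is None:          # sentinel: shut down
            break
        labels_out.put((frame, teacher_label(frame)))

frames_in, labels_out = queue.Queue(), queue.Queue()
worker = threading.Thread(target=distill_worker, args=(frames_in, labels_out))
worker.start()

# The real-time loop samples an occasional frame to the teacher and keeps
# going; it only consumes a pseudo-label once one is ready.
frames_in.put([0.1, 0.9, 0.7])
frames_in.put(None)                # stop the worker for this sketch
worker.join()
frame, label = labels_out.get()
```

Because the teacher only ever sees a sampled subset of frames, its cost is amortised over many student inferences, which is what makes the low-resource setting workable.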
Therefore, the scheme can effectively improve the virtual-background synthesis speed and the person-extraction accuracy while reducing the device-memory requirement, making it well suited to everyday use by ordinary users.
In one embodiment, the images in the video stream are preprocessed in real time, the preprocessing including reducing the image size and converting the images to tensors, which further shortens the virtual-background synthesis time and improves efficiency.
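The description leaves "reduce the image size and perform tensor processing" unspecified; one common reading (downscale, then convert the HWC uint8 image to a CHW float tensor in [0, 1]) can be sketched as follows, where the 256x256 target size is an assumption:

```python
import numpy as np

def to_tensor(frame, size=(256, 256)):
    # Nearest-neighbour downscale, then HWC uint8 -> CHW float32 in [0, 1].
    ys = np.linspace(0, frame.shape[0] - 1, size[0]).astype(int)
    xs = np.linspace(0, frame.shape[1] - 1, size[1]).astype(int)
    small = frame[ys][:, xs]
    return small.astype(np.float32).transpose(2, 0, 1) / 255.0

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # a captured frame
t = to_tensor(frame)                              # shape (3, 256, 256)
```

A smaller input tensor is what lets the student model keep up with the camera frame rate on commodity hardware.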
In one embodiment, running a pre-trained student model in real time to extract the person outline image, fuse it with a preset virtual image, form and display a fused image, and update the student model comprises the following steps: running the pre-trained student model in real time and judging whether a person outline image exists in the image according to a preset mask-area threshold; if not, outputting a pure background image and updating the student model; if so, extracting the person outline image, fusing it with the preset virtual image, forming and displaying the fused image, and updating the student model.
The preset mask-area threshold may be set to 5%, but is not limited thereto, and may be adjusted as needed in actual use.
Obviously, this arrangement effectively prevents unexpected situations such as the student model overfitting and outputting blank frames in videos that contain no person.
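The mask-area check can be sketched directly, using the 5% default named above; the `select_output` helper and the alpha-blend compositing rule are illustrative assumptions:

```python
import numpy as np

MASK_AREA_THRESHOLD = 0.05  # the 5% default from the description

def select_output(mask, frame, virtual_bg):
    # If the predicted person mask covers less of the frame than the
    # threshold, treat the frame as person-free and emit the virtual
    # background unchanged; otherwise composite person over background.
    if mask.mean() < MASK_AREA_THRESHOLD:
        return virtual_bg.copy()
    alpha = mask[..., None]
    return alpha * frame + (1.0 - alpha) * virtual_bg

h = w = 100
frame = np.ones((h, w, 3), dtype=np.float32)        # all-white "person" frame
virtual_bg = np.zeros((h, w, 3), dtype=np.float32)  # all-black virtual image

empty_mask = np.zeros((h, w), dtype=np.float32)     # 0% coverage: below 5%
person_mask = np.zeros((h, w), dtype=np.float32)
person_mask[:50, :] = 1.0                           # 50% coverage: above 5%
```

The early return on a near-empty mask is what prevents a few stray false-positive pixels from being composited as a "person" in person-free footage.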
In one embodiment, for the step of running a pre-trained student model in real time to extract the person outline image, fuse it with a preset virtual image, form and display a fused image, and update the student model, the student model is pre-trained with the BCE loss function and the Adam optimization algorithm to ensure that it can output a reasonable fused image from the start.
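The BCE (binary cross-entropy) objective named here can be written out directly; the sketch below implements it in NumPy, leaving the Adam update itself out for brevity:

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    # Binary cross-entropy over the predicted mask, clipped for stability.
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))

# A confident correct prediction yields a near-zero loss; a confident
# wrong one yields a large loss, which is what drives pre-training.
target = np.array([1.0, 0.0, 1.0])
good = np.array([0.99, 0.01, 0.99])
bad = np.array([0.01, 0.99, 0.01])
```

During pre-training this loss would be minimised with the Adam optimizer over labelled person/background masks.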
The student model may employ the JITNet network architecture.
In one embodiment, the teacher model may employ Mask R-CNN (an object detection and instance segmentation algorithm).
Based on the same inventive concept, an embodiment of the present invention further provides a system for removing the background of a real-time character video, as shown in FIG. 2, comprising:
the acquisition module 1 is used for acquiring a video stream in real time;
the processing module 2 is used for preprocessing images in the video stream in real time;
the execution module 3 is used for running a pre-trained student model in real time to extract the person outline image from each frame and fuse it with a preset virtual image, forming and displaying a fused image, and updating the student model;
and the distillation module 4 is used for running the teacher model asynchronously with the student model to distill the video stream online and update the student model according to the distillation result.
Here, online distillation is a video-segmentation framework that exploits the temporal consistency between frames to reduce computational cost. Most video streams view a very limited scene (for example, a fixed intersection or a particular room), so performing online distillation through a high-quality teacher model can reduce the computational cost while still achieving sufficiently high-quality predictions.
Obviously, because the virtual-background fusion is performed by the pre-trained student model, which runs quickly and efficiently, each frame can be processed and inferred fast enough to keep the real-time video fluent.
Meanwhile, because a teacher model is set up to perform online distillation, controllable accuracy can be traded for better throughput, reducing the computational cost while still achieving sufficiently high-quality predictions; and because the teacher model runs asynchronously with the student model, high performance can be achieved in low-resource settings.
Therefore, the scheme can effectively improve the virtual-background synthesis speed and the person-extraction accuracy while reducing the device-memory requirement, making it well suited to everyday use by ordinary users.
In one embodiment, the processing module 2 comprises:
the size processing submodule, used for reducing the size of the image;
and the tensor processing submodule, used for performing tensor processing on the image.
Obviously, the virtual background synthesis time can be further shortened and the efficiency can be improved through the scheme.
In one embodiment, the execution module 3 includes:
the operation module is used for operating the pre-trained student model in real time;
the judging module is used for judging whether a person outline image exists in the image according to the preset mask-area threshold;
and the execution submodule is used for outputting a pure background image and updating the student model when no person outline image exists, and for extracting the person outline image, fusing it with the preset virtual image, forming and displaying a fused image, and updating the student model when a person outline image exists.
Obviously, this arrangement effectively prevents unexpected situations such as the student model overfitting and outputting blank frames in videos that contain no person.
In one embodiment, the execution module 3 includes:
and the training module is used for pre-training the student model according to the BCE Loss function and the Adam optimization algorithm so as to ensure that the student model can output a reasonable fusion image at the beginning.
It will be apparent to those skilled in the art that the modules or steps of the embodiments described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. They may also be fabricated as individual integrated-circuit modules, or multiple modules or steps among them may be fabricated as a single integrated-circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A method for background removal of a real-time character video is characterized by comprising the following steps:
acquiring a video stream in real time;
preprocessing images in the video stream in real time;
running a pre-trained student model in real time to extract the person outline image from the image and fuse it with a preset virtual image, forming and displaying a fused image, and updating the student model;
and running a teacher model asynchronously with the student model to distill the video stream online, and updating the student model according to the distillation result.
2. The method of claim 1, wherein the preprocessing comprises reducing the size of the image and performing tensor processing on the image.
3. The method for real-time character video background removal according to claim 1, wherein the running of a pre-trained student model in real time to extract the person outline image from the image and fuse it with a preset virtual image, form and display a fused image, and update the student model comprises the following steps:
running a pre-trained student model in real time and judging whether a person outline image exists in the image according to a preset mask-area threshold; if not, outputting a pure background image and updating the student model; if so, extracting the person outline image, fusing it with the preset virtual image, forming and displaying a fused image, and updating the student model.
4. The method for real-time background removal of character videos as claimed in claim 1, wherein the real-time running of a pre-trained student model to blend the outline image of the character in the image with a preset virtual image, form and display a blended image, and update the student model, wherein:
the student model is pre-trained according to a loss function and an optimization algorithm.
5. A system for background removal of a live character video, comprising:
the acquisition module is used for acquiring the video stream in real time;
the processing module is used for preprocessing images in the video stream in real time;
the execution module is used for running a pre-trained student model in real time to extract the person outline image from the image and fuse it with a preset virtual image, form and display a fused image, and update the student model;
and the distillation module is used for running the teacher model asynchronously with the student model to distill the video stream online and update the student model according to the distillation result.
6. The system for real-time human video background removal as claimed in claim 5, wherein said processing module comprises:
a size processing module for reducing the size of the image;
and the tensor processing module is used for carrying out tensor processing on the image.
7. The system for real-time human video background removal as claimed in claim 5, wherein the execution module comprises:
the operation module is used for operating the pre-trained student model in real time;
the judging module is used for judging whether a person outline image exists in the image according to the preset mask-area threshold;
and the execution submodule is used for outputting a pure background image and updating the student model when no person outline image exists, and for extracting the person outline image, fusing it with the preset virtual image, forming and displaying a fused image, and updating the student model when a person outline image exists.
8. The system for real-time human video background removal as claimed in claim 5, wherein the execution module comprises:
and the training module is used for training the student model in advance according to the loss function and the optimization algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011132128.5A CN112258384A (en) | 2020-10-22 | 2020-10-22 | Method and system for removing background of real-time character video |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011132128.5A CN112258384A (en) | 2020-10-22 | 2020-10-22 | Method and system for removing background of real-time character video |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112258384A true CN112258384A (en) | 2021-01-22 |
Family
ID=74263708
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011132128.5A Pending CN112258384A (en) | 2020-10-22 | 2020-10-22 | Method and system for removing background of real-time character video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112258384A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106954034A (en) * | 2017-03-28 | 2017-07-14 | 宇龙计算机通信科技(深圳)有限公司 | A kind of image processing method and device |
CN108875764A (en) * | 2017-07-12 | 2018-11-23 | 北京旷视科技有限公司 | Model training method, device, system and computer-readable medium |
CN110348496A (en) * | 2019-06-27 | 2019-10-18 | 广州久邦世纪科技有限公司 | A kind of method and system of facial image fusion |
CN111723697A (en) * | 2020-06-05 | 2020-09-29 | 广东海洋大学 | Improved driver background segmentation method based on Mask-RCNN |
-
2020
- 2020-10-22 CN CN202011132128.5A patent/CN112258384A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106954034A (en) * | 2017-03-28 | 2017-07-14 | 宇龙计算机通信科技(深圳)有限公司 | A kind of image processing method and device |
CN108875764A (en) * | 2017-07-12 | 2018-11-23 | 北京旷视科技有限公司 | Model training method, device, system and computer-readable medium |
CN110348496A (en) * | 2019-06-27 | 2019-10-18 | 广州久邦世纪科技有限公司 | A kind of method and system of facial image fusion |
CN111723697A (en) * | 2020-06-05 | 2020-09-29 | 广东海洋大学 | Improved driver background segmentation method based on Mask-RCNN |
Non-Patent Citations (1)
Title |
---|
JO CHUANG, QIAN DONG: "JIT-Masker: Efficient Online Distillation for Background Matting", 《ARXIV:2006.06185V1》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11967151B2 (en) | Video classification method and apparatus, model training method and apparatus, device, and storage medium | |
US11830288B2 (en) | Method and apparatus for training face fusion model and electronic device | |
US10599914B2 (en) | Method and apparatus for human face image processing | |
US20230237841A1 (en) | Occlusion Detection | |
CN112235520B (en) | Image processing method and device, electronic equipment and storage medium | |
US11409794B2 (en) | Image deformation control method and device and hardware device | |
US11961237B2 (en) | Foreground data generation method and method for applying same, related apparatus, and system | |
CN111090778B (en) | Picture generation method, device, equipment and storage medium | |
CN111832745A (en) | Data augmentation method and device and electronic equipment | |
JP7401606B2 (en) | Virtual object lip driving method, model training method, related equipment and electronic equipment | |
CN113160244B (en) | Video processing method, device, electronic equipment and storage medium | |
CN109035147B (en) | Image processing method and device, electronic device, storage medium and computer equipment | |
CN111383232A (en) | Matting method, matting device, terminal equipment and computer-readable storage medium | |
CN114821734A (en) | Method and device for driving expression of virtual character | |
JP2023539620A (en) | Facial image processing method, display method, device and computer program | |
CN110555334A (en) | face feature determination method and device, storage medium and electronic equipment | |
WO2022148248A1 (en) | Image processing model training method, image processing method and apparatus, electronic device, and computer program product | |
JP2023543964A (en) | Image processing method, image processing device, electronic device, storage medium and computer program | |
CN115967823A (en) | Video cover generation method and device, electronic equipment and readable medium | |
CN116664603B (en) | Image processing method, device, electronic equipment and storage medium | |
CN112714337A (en) | Video processing method and device, electronic equipment and storage medium | |
CN111861954A (en) | Method and device for editing human face, electronic equipment and readable storage medium | |
CN112258384A (en) | Method and system for removing background of real-time character video | |
CN111787389B (en) | Transposed video identification method, device, equipment and storage medium | |
US20230131418A1 (en) | Two-dimensional (2d) feature database generation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210122 |