CN117351118A - Lightweight fixed background matting method and system combined with depth information - Google Patents

Lightweight fixed background matting method and system combined with depth information

Info

Publication number
CN117351118A
CN117351118A
Authority
CN
China
Prior art keywords
image
matting
error
network
background
Prior art date
Legal status
Granted
Application number
CN202311641622.8A
Other languages
Chinese (zh)
Other versions
CN117351118B (en)
Inventor
李汉曦
李国锋
邱欣延
李波
武林
程艳
Current Assignee
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date
Filing date
Publication date
Application filed by Jiangxi Normal University filed Critical Jiangxi Normal University
Priority to CN202311641622.8A
Publication of CN117351118A
Application granted
Publication of CN117351118B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/84 Arrangements for image or video recognition or understanding using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20076 Probabilistic image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20112 Image segmentation details
    • G06T 2207/20132 Image cropping
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a lightweight fixed background matting method and system combined with depth information. The method comprises the following steps: collecting image data; inputting the foreground and background images into a basic matting network to obtain a first rough matting prediction and a first error image; inputting the depth images into a Bayesian predictor to obtain a second rough matting prediction; obtaining a second error image from the absolute value of the matting error between the two predictions and adding it to the first error image to obtain a final error image; obtaining image blocks to be refined; and inputting the image blocks into a refinement network to obtain a refined prediction image. In this method, distillation makes the matting network lightweight, reducing computing resource and configuration requirements and increasing matting speed; the added depth information improves matting precision; and the error regions are input into the refinement network block by block, which accelerates refinement and further improves precision, achieving low-cost, high-speed and high-precision fixed background matting.

Description

Lightweight fixed background matting method and system combined with depth information
Technical Field
The invention relates to the field of image data processing, in particular to a lightweight fixed background matting method and system combined with depth information.
Background
With the rapid development of artificial intelligence, automatic image matting technology has advanced quickly and is used ever more widely in entertainment scenarios. Image matting is a widely applied computer vision technique that aims to separate a foreground image from a single image or video stream and then composite it onto a new background. Its broad use in many vision tasks, such as image and video editing, has attracted wide attention in the computer vision field.
Existing matting techniques emerge endlessly. Fixed background matting targets application scenarios, such as lecture recording, where the background is known and remains fixed for a long period of use, and it fully exploits this fixed background. However, existing fixed background matting techniques demand too many computing resources, place high requirements on device configuration, and need specialized hardware. They integrate poorly with the shooting functions of current cameras and mobile phones and cannot make good use of the many existing capture devices that simultaneously acquire RGB images and corresponding depth images, so the objective requirement for real-time, fast matting in practical applications cannot be met.
Therefore, developing a fixed background matting algorithm that has low hardware and computing resource requirements and can perform high-precision matting rapidly in real time has become an urgent problem to be solved.
Disclosure of Invention
Based on the above, the invention aims to provide a lightweight fixed background matting method and system combined with depth information. Network distillation is used to make the basic matting network lightweight, reducing the computing resource and hardware configuration requirements and accelerating the network's matting speed; depth information is added to the algorithm to improve matting precision; and error regions are input into a refinement network block by block, which accelerates refinement and greatly improves matting precision, achieving low-cost, high-speed and high-precision fixed background matting.
The lightweight fixed background matting method combined with depth information provided by the invention comprises the following steps:
collecting image data, wherein the image data comprises a foreground image, a foreground depth image, a background image and a background depth image;
inputting the foreground image and the background image into a basic matting network to obtain a first rough matting prediction image and a first error image;
inputting the foreground depth image and the background depth image into a Bayesian predictor to obtain a second rough matting prediction image;
acquiring a second error image according to the first rough matting prediction image and the second rough matting prediction image and adding the second error image and the first error image to acquire a final error image;
and acquiring an image block to be refined according to the foreground image, the foreground depth image, the background depth image and the final error image, and inputting the image block to be refined into a refinement network to acquire a refined prediction image.
In summary, according to the lightweight fixed background matting method and system combined with depth information, network distillation makes the basic matting network lightweight, reducing the computing resource and hardware configuration requirements and accelerating the network's matting speed; depth information added to the algorithm improves matting precision; and error regions input into the refinement network block by block accelerate refinement and greatly improve matting precision, achieving low-cost, high-speed and high-precision fixed background matting. Specifically, neural network distillation of a well pre-trained matting network lets the lightweight network learn the information possessed by the complex network, reducing the computing resources and hardware configuration required for matting while keeping matting precision and increasing matting speed. Depth information is then introduced: the Bayesian predictor obtains a depth-based matting prediction from the Bayes formula, further improving matting accuracy. Finally, cropped error blocks are stacked and input into the refinement network for fine repair, which greatly improves matting precision while further increasing matting speed, realizing high-precision real-time matting.
Further, before the step of inputting the foreground image and the background image into the basic matting network to obtain the first rough matting prediction image and the first error image, the method further includes:
a training data set is acquired, and a teacher network and a student network for pre-training of a basic image matting network are set;
inputting the training data set into the teacher network, predicting the training data set by the teacher network, outputting a teacher network prediction result, and setting the teacher network prediction result as a soft label;
inputting the training data set into the student network, and performing predictive training on the training data set by the student network according to the soft label, and calculating training loss of the student network according to the following formula:
$$L_{total} = \alpha L_{student} + \beta L_{distill}$$
wherein $L_{total}$ is the total regression loss of the student model, $L_{student}$ is the loss of the student network, $L_{distill}$ is the distillation loss, and $\alpha$ and $\beta$ are the weights of the student network loss and the distillation loss, respectively;
and updating parameters of the student network according to the training loss to acquire the basic image matting network.
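For illustration, the training loss above can be sketched as follows; this is a minimal sketch assuming L1 losses and example weights, none of which are fixed by the disclosure:

```python
# Minimal sketch of the distillation objective: a weighted sum of the
# student's supervised loss (against the hard label) and a distillation
# loss (against the teacher's soft label). Loss types and default weights
# are illustrative assumptions.
import torch.nn.functional as F

def total_regression_loss(student_pred, teacher_soft_label, ground_truth,
                          alpha=0.5, beta=0.5):
    student_loss = F.l1_loss(student_pred, ground_truth)        # L_student
    distill_loss = F.l1_loss(student_pred, teacher_soft_label)  # L_distill
    return alpha * student_loss + beta * distill_loss           # L_total
```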
Further, the step of inputting the foreground image and the background image into a basic matting network to obtain a first rough matting prediction image and a first error image includes:
inputting a foreground picture and a background picture into a basic matting network;
extracting image features through the backbone network and the ASPP (Atrous Spatial Pyramid Pooling) module in the basic matting network;
and acquiring a first rough matting prediction image and a first error image through the basic matting decoder.
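A minimal sketch of such a network layout follows, assuming MobileNetV2 as the backbone (per the embodiment below); the widened input stem for the concatenated foreground/background pair, the decoder head and all channel sizes are assumptions:

```python
# Hedged sketch of a backbone + ASPP + decoder matting network. The two
# output channels (coarse alpha matte and error map) mirror the two outputs
# named in the text; the exact architecture is an assumption.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2
from torchvision.models.segmentation.deeplabv3 import ASPP

class BaseMattingNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = mobilenet_v2(weights=None).features
        # Widen the first conv to accept foreground + background (3 + 3 channels).
        self.backbone[0][0] = nn.Conv2d(6, 32, 3, stride=2, padding=1, bias=False)
        self.aspp = ASPP(in_channels=1280, atrous_rates=[3, 6, 9])
        self.head = nn.Conv2d(256, 2, 1)  # channel 0: coarse alpha, channel 1: error

    def forward(self, fg, bg):
        x = torch.cat([fg, bg], dim=1)                # (N, 6, H, W)
        out = self.head(self.aspp(self.backbone(x)))  # 1/32-resolution maps
        out = nn.functional.interpolate(out, scale_factor=32, mode="bilinear")
        return torch.sigmoid(out[:, 0:1]), torch.relu(out[:, 1:2])
```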
Further, the step of inputting the foreground depth image and the background depth image into a Bayesian predictor and obtaining a second rough matting prediction image includes:
inputting the acquired foreground depth image and background depth image into a Bayesian predictor;
calculating the mean value and variance of each pixel point in the background according to the background depth map;
constructing Gaussian probability distribution of background depth values, and setting the foreground depth values to be uniform distribution;
and acquiring the second rough matting prediction image through the Bayes formula according to the foreground depth image and the background depth image.
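A minimal sketch of the per-pixel background statistics step above, assuming several background-only depth frames stacked as a (num_frames, H, W) array:

```python
# Per-pixel mean and variance of the fixed-background depth, later used to
# build the Gaussian probability distribution of background depth values.
import numpy as np

def background_depth_stats(depth_frames: np.ndarray):
    mean = depth_frames.mean(axis=0)        # per-pixel mean depth
    var = depth_frames.var(axis=0) + 1e-6   # per-pixel variance (stabilized)
    return mean, var
```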
Further, the step of obtaining a second error image according to the first rough matting prediction image and the second rough matting prediction image and adding the second error image to the first error image to obtain a final error image includes:
subtracting the first rough matting prediction image from the second rough matting prediction image;
acquiring an absolute value of an error of the first rough matting prediction image and the second rough matting prediction image;
and acquiring the second error image according to the absolute value of the error, and adding the second error image and the first error image in proportion so as to acquire the final error image.
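These steps reduce to a few array operations; a sketch follows, with the mixing weights as assumptions:

```python
# The second error image is the absolute difference of the two coarse mattes;
# the final error image is a weighted (proportional) sum with the error map
# predicted by the basic matting network.
import numpy as np

def fuse_error_images(coarse1, coarse2, error1, w1=0.5, w2=0.5):
    error2 = np.abs(coarse1 - coarse2)   # second error image
    return w1 * error1 + w2 * error2     # final error image
```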
Further, the step of obtaining the image block to be refined according to the foreground image, the foreground depth image, the background depth image and the final error image includes:
marking an error region with an image error higher than a preset error index in the final error image;
clipping each image region corresponding to the error region in the foreground image, the background image, the foreground depth image and the background depth image to obtain a plurality of error image blocks corresponding to each error region;
and stacking the error image blocks of the single error region to obtain image blocks to be refined of each error region.
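A sketch of this marking, cropping and stacking procedure, with the tile size and error threshold as illustrative assumptions:

```python
# Tile the final error image; for tiles whose error exceeds the preset index,
# crop the same window from the foreground, background and both depth images
# and stack the crops channel-wise into one block to be refined.
import numpy as np

def extract_refine_blocks(error, fg, bg, fg_depth, bg_depth, thresh=0.1, size=64):
    blocks, origins = [], []
    h, w = error.shape
    for y0 in range(0, h - size + 1, size):
        for x0 in range(0, w - size + 1, size):
            if error[y0:y0 + size, x0:x0 + size].max() <= thresh:
                continue  # low-error tile: no refinement needed
            crops = [fg[y0:y0 + size, x0:x0 + size],           # (size, size, 3)
                     bg[y0:y0 + size, x0:x0 + size],           # (size, size, 3)
                     fg_depth[y0:y0 + size, x0:x0 + size, None],
                     bg_depth[y0:y0 + size, x0:x0 + size, None]]
            blocks.append(np.concatenate(crops, axis=-1))      # 8-channel block
            origins.append((y0, x0))
    return blocks, origins
```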
Further, the step of inputting the image block to be refined into a refinement network to obtain a refined prediction image includes:
inputting the image block to be refined into the refinement network;
the refinement network refines the image block to be refined and performs the corresponding error correction according to the preset error index so as to obtain a refined image block;
and pasting the refined image block into a corresponding image area of the first rough matting prediction image to obtain the refined prediction image.
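The paste-back step is a window copy; a sketch, reusing the block origins recorded during cropping:

```python
# Replace the matching windows of the coarse matte with the refined blocks
# to obtain the refined prediction image.
def paste_refined_blocks(coarse_matte, refined_blocks, origins, size=64):
    refined = coarse_matte.copy()
    for block, (y0, x0) in zip(refined_blocks, origins):
        refined[y0:y0 + size, x0:x0 + size] = block
    return refined
```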
According to an embodiment of the invention, a lightweight fixed background matting system combined with depth information comprises:
the acquisition module is used for acquiring image data, wherein the image data comprises a foreground image, a foreground depth image, a background image and a background depth image;
the image matting module is used for inputting the foreground image and the background image into a basic image matting network to obtain a first rough image matting prediction image and a first error image, inputting the foreground depth image and the background depth image into a Bayesian predictor to obtain a second rough image matting prediction image, obtaining a second error image according to the first rough image matting prediction image and the second rough image matting prediction image, and adding the second error image and the first error image to obtain a final error image;
and the refinement module is used for acquiring an image block to be refined according to the foreground image, the foreground depth image, the background depth image and the final error image, and inputting the image block to be refined into a refinement network to acquire a refined prediction image.
In another aspect of the present invention, there is further provided a storage medium storing one or more programs which, when executed, implement the lightweight fixed background matting method combined with depth information described above.
Another aspect of the invention also provides a computer device comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor is used for realizing the light-weight fixed background matting method combining the depth information when executing the computer program stored on the memory.
Drawings
Fig. 1 is a flowchart of a lightweight fixed background matting method combined with depth information according to a first embodiment of the present invention;
fig. 2 is a flowchart of a light-weight fixed background matting method combined with depth information according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a lightweight fixed background matting system combined with depth information according to a third embodiment of the present invention;
fig. 4 is a flowchart of the algorithm for obtaining the fixed background depth probability distribution in the lightweight fixed background matting method combined with depth information according to the first embodiment of the present invention;
fig. 5 is a flowchart of the Bayesian predictor algorithm of the lightweight fixed background matting method combined with depth information according to the first embodiment of the present invention;
fig. 6 is a matting prediction diagram before refinement for the lightweight fixed background matting method combined with depth information according to the second embodiment of the present invention;
fig. 7 is a matting prediction diagram after refinement for the lightweight fixed background matting method combined with depth information according to the second embodiment of the present invention.
The invention will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1, a flowchart of a lightweight fixed background matting method combined with depth information according to a first embodiment of the present invention is shown; the method includes steps S01 to S05:
step S01: collecting image data;
it should be noted that, in this embodiment, an Azure Kinect camera is used to collect image data by using a camera capable of collecting RGB and depth information, so that the matting effect is more stable, an RGB picture of fifty blank backgrounds and a corresponding depth map are collected, and an RGB foreground image requiring matting is circularly collected.
Step S02: inputting the foreground image and the corresponding background image into a basic matting network to obtain a first rough matting prediction image and a first error image;
in this embodiment, the pre-trained res net50 network is used as a teacher network, the mobilenet v2 network is used as a student network, the learning of the student network is guided by the output of the teacher network, i.e. the information learned by the res net50 network is distilled to the lightweight mobiletv 2 network through a neural network distillation technology, specifically, the pre-trained res net50 network on the ImageNet data set is pre-trained in a picture digging mode, firstly, the foreground images and the background data set disclosed by the network are extracted, different foreground images are attached to different backgrounds, random affine transformation, horizontal overturning, dirt, sharpening and the like are performed, the picture is more in accordance with the data enhancement technology of the actual situation, the res net50 network predicts the training set, the output result is used as a soft label of the teacher network, the soft label is richer than the traditional hard label, more information and the learning of the guidance student network can be provided, the mobilenet 2 network is used as the training set of the student network, the loss is calculated by the training network during the training set, the conventional training network is used for the learning network, the loss is more similar to the conventional teacher network, the learning network is further calculated, and the loss is more similar to the conventional training network, and the learning network is further calculated, and the loss is more similar to the conventional training network.
It will be appreciated that ResNet50 and MobileNetV2 were chosen in this embodiment for the following reasons. In model structure, ResNet50 is a classical convolutional neural network containing 50 convolutional layers, while MobileNetV2 is a lightweight network with fewer convolutional layers and fewer parameters. In terms of precision, ResNet50 can generally achieve higher accuracy on large image datasets, while MobileNetV2 is better suited to lightweight applications. In terms of computational cost, MobileNetV2 has fewer parameters and therefore lower cost, making it suitable for scenarios with limited computing resources. ResNet50 thus fits fields requiring high precision, MobileNetV2 fits lightweight application scenarios, and guiding the lightweight student network with a high-precision teacher network gives the lightweight network higher precision while maintaining matting speed, which matches the practical problem the invention aims to solve.
Step S03: inputting the front depth image and the background depth image into a Bayesian predictor to obtain a second rough matting prediction image;
it should be noted that, the matting effect obtained by the natural background matting network is not accurate enough for our requirements, in order to improve the matting accuracy, the depth image information is introduced under the framework of the bayesian theory to improve the matting accuracy, the bayesian decision is to estimate the subjective probability (i.e. prior probability) of a part of unknown states under incomplete information, then correct the occurrence probability by using a bayesian formula, and finally make an optimal decision by using the correction probability, and the bayesian classification formula is as follows:
where P (A) is the prior probability or edge probability of A, called "prior" because it does not consider any factors in B, P (A|B) is the conditional probability of A after the occurrence of B is known, also called A's posterior probability due to the value of B, P (B|A) is the conditional probability of B after the occurrence of A is known, also called B's posterior probability due to the value of A, P (B) is the prior probability or edge probability of B, also called normalization constant (normalized constant);
the Bayes formula makes a final prediction by combining prior information with observed evidence. For a natural picture, each pixel is a priori equally likely to be foreground portrait or background, so in this embodiment the probability of foreground portrait under an unknown scene is taken as the prior probability, the prior is corrected with depth information to obtain the posterior probability, and the posterior probability is taken as the final matting result. Combining the prior with the depth information gives:
$$P(F \mid d) = \frac{P(d \mid F)\,P(F)}{P(d \mid F)\,P(F) + P(d \mid B)\,P(B)}$$
wherein $P(F)$ denotes the probability that the corresponding pixel is foreground portrait, $P(d \mid F)$ the probability that the pixel takes the corresponding depth value given that it is foreground portrait, $P(B)$ the probability that the pixel is background, $P(d \mid B)$ the probability that the pixel takes the corresponding depth value given that it is background, and $P(F \mid d)$ the probability that the pixel is foreground portrait given the current depth value, i.e., the foreground probability we require;
the event of whether the unit pixel point is a foreground human image is recorded as,/>Event indicating that the unit pixel point is a foreground person, +.>Representing the probability that the unit pixel point is a foreground portrait, then:
representing the probability that the unit pixel point is background, then:
the depth of each pixel of the depth map is denoted $d$; $P(F \mid d)$ denotes the probability that the current pixel is foreground portrait given that its depth is $d$; $P(d \mid B)$ denotes the probability that the depth is $d$ given that the current pixel is background; and $P(d \mid F)$ denotes the probability that the depth is $d$ given that the current pixel is foreground portrait. In this embodiment, the depth value of each pixel of the fixed background is assumed to obey an independent Gaussian distribution: multiple frames of background depth are first acquired, and the per-pixel mean $\mu$ and variance $\sigma^2$ of the background depth are computed, giving the probability distribution of background depth values (the concrete algorithm is shown in fig. 4). The conditional probability distribution of the background depth values is:
$$P(d \mid B) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(d-\mu)^2}{2\sigma^2}\right)$$
in the fixed background application scenario of this embodiment, given that the current pixel is foreground portrait, the depth values of foreground portrait positions are assumed to obey a uniform distribution; that is, the portrait is equally likely to appear at any position between the background and the camera, and it is almost impossible for the portrait to be right against the camera or merged into the background. Therefore a margin of $T$ depth units ($T$ set to 50) is reserved before the background, and the conditional probability distribution of the foreground portrait depth values is:
$$P(d \mid F) = \begin{cases} \dfrac{1}{\mu - T}, & 0 < d < \mu - T, \\[4pt] 0, & \text{otherwise;} \end{cases}$$
combining the above probability distributions gives the posterior probability that the pixel is foreground portrait at the depth corresponding to the current pixel. To make the matting result more accurate, a hyper-parameter $\gamma$ is added to adjust the ratio of foreground to background in the prior probability, so the final matting formula is:
$$P(F \mid d) = \frac{\gamma\,P(d \mid F)}{\gamma\,P(d \mid F) + (1-\gamma)\,P(d \mid B)}$$
Detailed algorithm steps of the Bayesian predictor are shown in fig. 5; a matting prediction image based on depth information is thereby obtained through the Bayesian predictor.
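A minimal numeric sketch of the predictor derived above; the Gaussian background model, uniform foreground model, offset $T$ and prior weight $\gamma$ follow the description, while the function name and default values are assumptions:

```python
# Bayesian predictor: posterior probability that each pixel is foreground
# portrait given its observed depth, under a per-pixel Gaussian background
# depth model and a uniform foreground depth model.
import numpy as np

def bayesian_matting_prediction(depth, bg_mean, bg_var, T=50.0, gamma=0.5):
    # P(d | B): Gaussian likelihood of the observed depth under the background.
    p_d_bg = (np.exp(-(depth - bg_mean) ** 2 / (2.0 * bg_var))
              / np.sqrt(2.0 * np.pi * bg_var))
    # P(d | F): uniform over depths between the camera and (background - T).
    near_range = np.maximum(bg_mean - T, 1e-6)
    p_d_fg = np.where((depth > 0) & (depth < near_range), 1.0 / near_range, 0.0)
    # Posterior P(F | d) with gamma weighting the foreground prior.
    num = gamma * p_d_fg
    return num / (num + (1.0 - gamma) * p_d_bg + 1e-12)  # second coarse matte
```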
Step S04: acquiring a second error image according to the first rough matting prediction image and the second rough matting prediction image, and adding the second error image and the first error image to acquire a final error image;
step S05: and acquiring an image block to be refined according to the foreground image, the foreground depth image, the background depth image and the final error image, and inputting the image block to be refined into a refining network to acquire a refined prediction image.
It should be noted that in this embodiment the two rough matting images are subtracted and the absolute value is taken to obtain another matting error image, and the two error images are then added in proportion, yielding a more accurate final error image, because the matting prediction from the basic matting network alone is insufficient for the use requirements. Then, according to the larger-error parts of the error image, corresponding squares of each error region are cropped from the RGB foreground image, the RGB background image and the corresponding depth images, stacked together and input into the refinement network, which yields a finer matting result for the positions of those squares. Finally, the refined image blocks are pasted back into the previous rough matte. The error regions are fed as cropped blocks rather than as a whole image in order to accelerate refinement, so a more accurate matting prediction is obtained that meets the use requirements well and can achieve real-time matting.
In summary, in this embodiment, network distillation makes the basic matting network lightweight and fast, the Bayesian predictor introduces depth information to further improve matting accuracy, and cropped error blocks are stacked and input into the refinement network for fine repair, realizing low-cost, high-speed, high-precision fixed background matting in real time.
Referring to fig. 2, a flowchart of a lightweight fixed background matting method combined with depth information according to a second embodiment of the present invention is shown; the method includes steps S11 to S19:
step S11: collecting image data;
step S12: obtaining a first rough matting prediction image and a first error image through a basic matting network after neural network distillation;
step S13: inputting the foreground depth image and the background depth image into a Bayesian predictor;
step S14: acquiring a second rough matting prediction image through a Bayesian formula;
step S15: subtracting the first rough matting prediction image from the second rough matting prediction image to obtain an error absolute value;
step S16: acquiring a second error image according to the absolute value of the error, and adding the second error image and the first error image according to the proportion to acquire a final error image;
step S17: marking an error region with an image error higher than a preset error index in the final error image;
step S18: cutting and stacking each image area corresponding to the error area in the foreground image, the background image, the foreground depth image and the background depth image to obtain an image block to be refined;
step S19: inputting the image block to be refined into a refinement network, refining the image block to be refined and performing the corresponding error correction according to the preset error index to obtain a refined image block, and pasting the refined image block into the corresponding image area of the first rough matting prediction image to obtain a refined prediction image;
note that, in this embodiment, the preset error index for the matting prediction is SAD: 5. MSE:2, grad:8. conn:5, index interpretation: SAD (Sum of Absolute Differences, absolute difference) sum, MSE (Mean Squared Error, mean square error), conn (Connectivity error), grad (Gradient error), the lower the above data index is, the better the above data index is, the error index is set based on 1080p resolution, the verification experiment is performed at 1080p resolution, at which the basic matting error index in the embodiment of the invention reaches SAD:6.95 MSE:2.67, grad:8.51, conn:6.41, the index after passing through the refining network is: SAD:3.39 MSE:1.22, grad:6.95, conn:3.33, the image matting precision is greatly improved after finishing, the comparison between the finishing and the finishing is shown in fig. 4 and 5, and the network image matting speed before distillation and light weight is FPS:19, which is insufficient to achieve real-time matting, cannot be applied to an actual scene, and the network speed after distillation is FPS:33, a real-time image matting effect can be achieved, because when the frame number FPS is greater than 25, the naked eyes of a person can be identified as videos, and no obvious problem of blocking can occur.
In summary, the second embodiment follows the same pipeline: the distilled lightweight network produces the coarse matte and error image, the Bayesian predictor corrects them with depth information, and block-wise refinement of the error regions achieves low-cost, high-speed, high-precision fixed background matting in real time.
Referring to fig. 3, a schematic structural diagram of a light-weight fixed background matting system combining depth information according to a third embodiment of the present invention is shown, where the system includes:
the acquisition module 10 is used for acquiring image data, wherein the image data comprises a foreground image, a foreground depth image, a background image and a background depth image;
the matting module 20 is configured to input the foreground image and the background image into a basic matting network, obtain a first rough matting prediction image and a first error image, input the foreground depth image and the background depth image into a bayesian predictor, obtain a second rough matting prediction image, obtain a second error image according to the first rough matting prediction image and the second rough matting prediction image, and add the second error image and the first error image to obtain a final error image;
the trimming module 30 is configured to obtain an image block to be trimmed according to the foreground image, the foreground depth image, the background depth image and the final error image, and input the image block to be trimmed into a trimming network to obtain a trimming prediction image.
Further, the acquisition module 10 includes:
the image acquisition unit 101 is configured to acquire image data, where the image data includes a foreground image and a foreground depth image, a background image and a background depth image.
Further, the matting module 20 includes:
a basic image matting unit 201, configured to input the foreground image and the background image into a basic image matting network, and obtain a first rough image matting prediction image and a first error image;
the depth matting unit 202 is configured to input the front depth image and the background depth image into a bayesian predictor, obtain a second rough matting prediction image, obtain a second error image according to the first rough matting prediction image and the second rough matting prediction image, and add the second error image and the first error image to obtain a final error image.
Further, the refinement module 30 includes:
a block clipping unit 301, configured to obtain an image block to be refined according to the foreground image, the foreground depth image, the background depth image, and the final error image;
and the refinement unit 302 is configured to input the image block to be refined into a refinement network to obtain a refined prediction image.
In another aspect, the present invention further provides a computer storage medium, on which one or more programs are stored, where the programs implement the light-weight fixed background matting method combined with depth information when executed by a processor.
The invention also provides computer equipment, which comprises a memory and a processor, wherein the memory is used for storing a computer program, and the processor is used for executing the computer program stored on the memory so as to realize the light-weight fixed background matting method combined with the depth information.
Those of skill in the art will appreciate that the logic and/or steps represented in the flow diagrams or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain or store the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one of, or a combination of, the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (10)

1. The light-weight fixed background matting method combined with the depth information is characterized by comprising the following steps of:
collecting image data, wherein the image data comprises a foreground image, a foreground depth image, a background image and a background depth image;
inputting the foreground image and the background image into a basic matting network to obtain a first rough matting prediction image and a first error image;
inputting the foreground depth image and the background depth image into a Bayesian predictor to obtain a second rough matting prediction image;
acquiring a second error image according to the first rough matting prediction image and the second rough matting prediction image and adding the second error image and the first error image to acquire a final error image;
and acquiring an image block to be refined according to the foreground image, the foreground depth image, the background depth image and the final error image, and inputting the image block to be refined into a refinement network to acquire a refined prediction image.
2. The method of claim 1, wherein the step of inputting the foreground image and the background image into a base matting network to obtain a first rough matting prediction image and a first error image further comprises:
a training data set is acquired, and a teacher network and a student network for pre-training of a basic image matting network are set;
inputting the training data set into the teacher network, predicting the training data set by the teacher network, outputting a teacher network prediction result, and setting the teacher network prediction result as a soft label;
inputting the training data set into the student network, and performing predictive training on the training data set by the student network according to the soft label, and calculating training loss of the student network according to the following formula:
$$L_{total} = \alpha L_{student} + \beta L_{distill}$$
wherein $L_{total}$ is the total regression loss of the student model, $L_{student}$ is the loss of the student network, $L_{distill}$ is the distillation loss, and $\alpha$ and $\beta$ are the weights of the student network loss and the distillation loss, respectively;
and updating parameters of the student network according to the training loss to acquire the basic image matting network.
3. The method of claim 1, wherein the step of inputting the foreground image and the background image into a base matting network to obtain a first rough matting prediction image and a first error image comprises:
inputting a foreground picture and a background picture into a basic matting network;
extracting image features through the backbone network and the ASPP (Atrous Spatial Pyramid Pooling) module in the basic matting network;
and acquiring a first rough matting prediction image and a first error image through the basic matting decoder.
4. The method of claim 1, wherein the step of inputting the foreground depth image and the background depth image into a Bayesian predictor and obtaining a second rough matting prediction image comprises:
inputting the acquired foreground depth image and background depth image into a Bayesian predictor;
calculating the mean value and variance of each pixel point in the background according to the background depth map;
constructing Gaussian probability distribution of background depth values, and setting the foreground depth values to be uniform distribution;
and acquiring the second rough matting prediction image through the Bayes formula according to the foreground depth image and the background depth image.
5. The method of claim 1, wherein the step of obtaining a second error image according to the first rough matting prediction image and the second rough matting prediction image and adding the second error image to the first error image to obtain a final error image comprises:
subtracting the first rough matting prediction image from the second rough matting prediction image;
acquiring an absolute value of an error of the first rough matting prediction image and the second rough matting prediction image;
and acquiring the second error image according to the absolute value of the error, and adding the second error image and the first error image in proportion so as to acquire the final error image.
6. The method of claim 1, wherein the step of obtaining the image block to be refined from the foreground image, the foreground depth image, the background depth image, and the final error image comprises:
marking an error region with an image error higher than a preset error index in the final error image;
clipping each image region corresponding to the error region in the foreground image, the background image, the foreground depth image and the background depth image to obtain a plurality of error image blocks corresponding to each error region;
and stacking the error image blocks of the single error region to obtain image blocks to be refined of each error region.
7. The method of claim 1, wherein the step of inputting the image block to be refined into a refinement network to obtain a refined prediction image comprises:
inputting the image block to be refined into the refinement network;
the refinement network refines the image block to be refined and performs the corresponding error correction according to the preset error index so as to obtain a refined image block;
and pasting the refined image block into a corresponding image area of the first rough matting prediction image to obtain the refined prediction image.
8. A lightweight fixed background matting system incorporating depth information, comprising:
the acquisition module is used for acquiring image data, wherein the image data comprises a foreground image, a foreground depth image, a background image and a background depth image;
the image matting module is used for inputting the foreground image and the background image into a basic image matting network to obtain a first rough image matting prediction image and a first error image, inputting the foreground depth image and the background depth image into a Bayesian predictor to obtain a second rough image matting prediction image, obtaining a second error image according to the first rough image matting prediction image and the second rough image matting prediction image, and adding the second error image and the first error image to obtain a final error image;
and the refinement module is used for acquiring an image block to be refined according to the foreground image, the foreground depth image, the background depth image and the final error image, and inputting the image block to be refined into a refinement network to acquire a refined prediction image.
9. A storage medium, comprising: the storage medium stores one or more programs that when executed by a processor implement a lightweight fixed background matting method incorporating depth information as claimed in any one of claims 1 to 7.
10. A computer device comprising a memory and a processor, wherein:
the memory stores a computer program;
the processor, when executing the computer program stored on the memory, implements a lightweight fixed background matting method incorporating depth information as claimed in any one of claims 1 to 7.
CN202311641622.8A 2023-12-04 2023-12-04 Lightweight fixed background matting method and system combined with depth information Active CN117351118B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311641622.8A CN117351118B (en) 2023-12-04 2023-12-04 Lightweight fixed background matting method and system combined with depth information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311641622.8A CN117351118B (en) 2023-12-04 2023-12-04 Lightweight fixed background matting method and system combined with depth information

Publications (2)

Publication Number Publication Date
CN117351118A 2024-01-05
CN117351118B (en) 2024-02-23

Family

ID=89367720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311641622.8A Active CN117351118B (en) 2023-12-04 2023-12-04 Lightweight fixed background matting method and system combined with depth information

Country Status (1)

Country Link
CN (1) CN117351118B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2919190A1 (en) * 2014-03-14 2015-09-16 Thomson Licensing Method and apparatus for alpha matting
CN105590312A (en) * 2014-11-12 2016-05-18 株式会社理光 Foreground image segmentation method and apparatus
CN106952276A (en) * 2017-03-20 2017-07-14 成都通甲优博科技有限责任公司 A kind of image matting method and device
CN109035253A (en) * 2018-07-04 2018-12-18 长沙全度影像科技有限公司 A kind of stingy drawing method of the deep learning automated graphics of semantic segmentation information guiding
CN110189339A (en) * 2019-06-03 2019-08-30 重庆大学 The active profile of depth map auxiliary scratches drawing method and system
CN112241960A (en) * 2020-10-01 2021-01-19 深圳奥比中光科技有限公司 Matting method and system based on depth information
CN112819848A (en) * 2021-02-04 2021-05-18 Oppo广东移动通信有限公司 Matting method, matting device and electronic equipment
CN112884776A (en) * 2021-01-22 2021-06-01 浙江大学 Deep learning cutout method based on synthesis data set augmentation
CN114038006A (en) * 2021-08-09 2022-02-11 奥比中光科技集团股份有限公司 Matting network training method and matting method
CN116342632A (en) * 2023-02-23 2023-06-27 奥比中光科技集团股份有限公司 Depth information-based matting method and matting network training method
CN117097853A (en) * 2023-08-16 2023-11-21 浙江理工大学 Real-time image matting method and system based on deep learning

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2919190A1 (en) * 2014-03-14 2015-09-16 Thomson Licensing Method and apparatus for alpha matting
CN105590312A (en) * 2014-11-12 2016-05-18 株式会社理光 Foreground image segmentation method and apparatus
CN106952276A (en) * 2017-03-20 2017-07-14 成都通甲优博科技有限责任公司 A kind of image matting method and device
CN109035253A (en) * 2018-07-04 2018-12-18 长沙全度影像科技有限公司 A kind of stingy drawing method of the deep learning automated graphics of semantic segmentation information guiding
CN110189339A (en) * 2019-06-03 2019-08-30 重庆大学 The active profile of depth map auxiliary scratches drawing method and system
CN112241960A (en) * 2020-10-01 2021-01-19 深圳奥比中光科技有限公司 Matting method and system based on depth information
CN112884776A (en) * 2021-01-22 2021-06-01 浙江大学 Deep learning cutout method based on synthesis data set augmentation
CN112819848A (en) * 2021-02-04 2021-05-18 Oppo广东移动通信有限公司 Matting method, matting device and electronic equipment
CN114038006A (en) * 2021-08-09 2022-02-11 奥比中光科技集团股份有限公司 Matting network training method and matting method
WO2023015755A1 (en) * 2021-08-09 2023-02-16 奥比中光科技集团股份有限公司 Matting network training method and matting method
CN116342632A (en) * 2023-02-23 2023-06-27 奥比中光科技集团股份有限公司 Depth information-based matting method and matting network training method
CN117097853A (en) * 2023-08-16 2023-11-21 浙江理工大学 Real-time image matting method and system based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LI, H. ET AL.: "Automatic, Illumination-Invariant and Real-Time Green-Screen Keying Using Deeply Guided Linear Models", SYMMETRY 2021, vol. 13, no. 8, 9 August 2021 (2021-08-09) *
朱小琴 (ZHU Xiaoqin): "Research and Implementation of Portrait Segmentation and Matting Technology Based on Lightweight Neural Networks", China Master's Theses Full-text Database, Information Science and Technology, 15 May 2022 (2022-05-15) *
董振 (DONG Zhen): "Research on Key Technologies of Image Background Removal Based on Generative Adversarial Networks", China Master's Theses Full-text Database, Information Science and Technology, 15 February 2023 (2023-02-15) *

Also Published As

Publication number Publication date
CN117351118B (en) 2024-02-23

Similar Documents

Publication Publication Date Title
Yang et al. Seeing deeply and bidirectionally: A deep learning approach for single image reflection removal
CN110188760B (en) Image processing model training method, image processing method and electronic equipment
CN109614921B (en) Cell segmentation method based on semi-supervised learning of confrontation generation network
CN111881720B (en) Automatic enhancement and expansion method, recognition method and system for data for deep learning
DE102017223559B4 (en) DEVICE FOR FOCUSING A CAMERA AND CONTROL PROCEDURES FOR THIS
US9152926B2 (en) Systems, methods, and media for updating a classifier
CN110807757B (en) Image quality evaluation method and device based on artificial intelligence and computer equipment
CN110148088B (en) Image processing method, image rain removing method, device, terminal and medium
CN108197669B (en) Feature training method and device of convolutional neural network
CN114723643B (en) Low-light image enhancement method based on reinforcement learning and aesthetic evaluation
CN113128369A (en) Lightweight network facial expression recognition method fusing balance loss
DE112019007393T5 (en) Method and system for training a model for image generation
Chira et al. Image super-resolution with deep variational autoencoders
CN112200887A (en) Multi-focus image fusion method based on gradient perception
CN115496925A (en) Image processing method, apparatus, storage medium, and program product
CN111242176A (en) Computer vision task processing method and device and electronic system
CN111428730B (en) Weak supervision fine-grained object classification method
CN111583282A (en) Image segmentation method, device, equipment and storage medium
CN110580696A (en) Multi-exposure image fast fusion method for detail preservation
CN112507981B (en) Model generation method, iris image quality evaluation method and electronic equipment
CN117351118B (en) Lightweight fixed background matting method and system combined with depth information
CN112463999A (en) Visual position identification method and device, computer equipment and readable storage medium
CN112084936A (en) Face image preprocessing method, device, equipment and storage medium
CN116664867A (en) Feature extraction method and device for selecting training samples based on multi-evidence fusion
CN114782507B (en) Asymmetric binocular stereo matching method and system based on unsupervised learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant