CN112862838A - Natural image matting method based on real-time click interaction of user - Google Patents
Natural image matting method based on real-time click interaction of user
- Publication number
- CN112862838A (application number CN202110158221.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- mask
- user
- uncertainty
- image mask
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/40—Filling a planar surface by adding surface attributes, e.g. colour or texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
Abstract
The invention discloses a natural image matting method based on real-time click interaction with a user, comprising the following steps: acquiring an input original image and an indicator map, obtained through user interaction, that encodes foreground and background information; according to the indicator map, extracting from the complete image mask of the original image an image mask containing only the foreground indicated in the map, to serve as a preliminary image mask; performing uncertainty estimation on the preliminary image mask to obtain an uncertainty map; under the guidance of the uncertainty map, cropping pixel blocks whose uncertainty exceeds a preset value from corresponding positions in the preliminary image mask and the original image, and locally refining them with a fully convolutional network without downsampling; and, after the local refinement result is obtained, pasting it back to the corresponding positions of the preliminary image mask to obtain the refined image mask. With only a few user interactions, the method substantially outperforms existing fully automatic matting methods and is comparable to the state-of-the-art trimap-based matting methods.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a natural image matting method based on real-time click interaction of a user.
Background
Image matting is a fundamental and challenging problem in the field of computer vision. It requires accurately separating foreground objects from the background while precisely estimating the per-pixel transparency (alpha) near the separation edges. Because it has a wide range of application scenarios, such as image composition and editing, movie production, and virtual backgrounds in video conferencing, it has been studied by academia and industry for many years.
An image can be formally expressed as a mathematical formula from the perspective of image composition as follows:
I_i = α_i · F_i + (1 − α_i) · B_i,  α_i ∈ [0, 1]    (1)
In the above equation, i = (x, y) denotes a pixel position in the input image I, and α_i, F_i, B_i respectively denote the foreground transparency, foreground value, and background value at pixel i. This formula gives the pixel-level interpretation of image formation: each pixel in the image is a linear combination of a foreground value and a background value, and α_i represents the mixing proportion of foreground to background, i.e., the transparency. When α_i = 1, the pixel consists entirely of foreground pixel values, i.e., it is completely opaque; when α_i = 0, the pixel consists entirely of background pixel values, i.e., it is completely transparent; when α_i ∈ (0, 1), the pixel is a linear combination of foreground and background pixel values and lies in the boundary region between foreground and background, such as animal hair or plant branches and leaves.
The aim of image matting is to solve the optimization problem defined by equation (1) for α_i, yielding a single-channel image mask (alpha matte). For each pixel, 7 unknowns must be recovered from the known 3-channel pixel values of image I: the single-channel transparency, the 3-channel foreground value, and the 3-channel background value. This is evidently a highly under-constrained problem.
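For illustration, the compositing model of equation (1) can be written as a minimal NumPy sketch (the function name and array shapes are illustrative, not part of the claimed method):

```python
import numpy as np

def composite(alpha, fg, bg):
    """Per-pixel linear compositing: I = alpha * F + (1 - alpha) * B.

    alpha: (H, W) array in [0, 1]; fg, bg: (H, W, 3) color arrays.
    """
    a = alpha[..., None]          # broadcast alpha over the 3 color channels
    return a * fg + (1.0 - a) * bg

# alpha = 1 -> pure foreground (opaque), alpha = 0 -> pure background
# (transparent), alpha in (0, 1) -> soft boundary pixels (hair, foliage, ...)
```

Matting inverts this map: given only I, it must recover alpha (and implicitly F and B), which is why the problem is under-constrained.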
To address this, many classical approaches rely on a pre-drawn trimap as an additional input to constrain the solution space. The trimap divides an image into three regions: a foreground region, a background region, and a transition region. The foreground region indicates that all pixels within it consist entirely of foreground pixel values, and the background region indicates that all pixels within it consist entirely of background pixel values. The matting task is thereby simplified to regressing the transparency α_i only for pixels in the transition region of the trimap. As a result, matting methods that take a trimap as auxiliary input can generally achieve better performance. However, drawing a suitable trimap is very time-consuming and laborious; for some complex examples the drawing time can exceed ten minutes, which is extremely unfriendly to users, especially non-professional users.
With the rapid development of deep learning, some matting methods have recently emerged that do not require a trimap as an additional auxiliary input. However, their performance is far inferior to trimap-based matting approaches. The main reason is that, for some images, the lack of the trimap's guiding constraint leaves the deep network ambiguous about which foreground object to matte. To resolve this ambiguity, some approaches collect large-scale matting datasets for only certain object classes (e.g., faces) to train the deep network. However, this solution does not scale and is costly; in particular, if the user wants to matte categories that do not appear in the training set, the results are often poor. Moreover, if multiple objects (i.e., candidate foregrounds) appear in an image, the user may not want to extract all of them, so some user interaction is inevitable. The key is how to minimize the user's interaction cost while accurately extracting the foreground the user specifies, and no effective scheme currently exists.
Disclosure of Invention
The invention aims to provide a natural image matting method based on real-time click interaction with a user, which achieves performance comparable to high-performing trimap-based image matting methods while requiring only a small number of user clicks indicating whether a position is foreground or background (in most cases, when the image has no foreground ambiguity, no user clicks are needed at all).
The purpose of the invention is realized by the following technical scheme:
a natural image matting method based on real-time click interaction of a user comprises the following steps:
Interactive matting stage: acquiring an input original image and an indicator map, obtained through user interaction, that encodes foreground and background information; according to the indicator map, extracting from the complete image mask of the original image an image mask containing only the foreground indicated in the map, to serve as a preliminary image mask;
Uncertainty-guided local refinement stage: performing uncertainty estimation on the preliminary image mask to obtain an uncertainty map; under the guidance of the uncertainty map, cropping pixel blocks whose uncertainty exceeds a preset value from corresponding positions in the preliminary image mask and the original image, and locally refining them with a fully convolutional network without downsampling; and, after the local refinement result is obtained, pasting it back to the corresponding positions of the preliminary image mask to obtain the refined image mask, which serves as the complete image mask of this iteration.
It can be seen from the above technical scheme that, with only a few user interactions, the method substantially outperforms existing fully automatic matting methods and is comparable to the current state-of-the-art trimap-based matting methods.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
FIG. 1 is a framework diagram of the natural image matting method based on real-time user click interaction according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the real-time user click interaction process according to an embodiment of the present invention;
FIG. 3 is a visual test comparison on real portrait images provided by an embodiment of the present invention;
FIG. 4 is a visual comparison before and after local refinement according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a natural image matting method based on real-time click interaction with a user, which, as shown in FIG. 1, mainly comprises the following two stages.
First, the interactive matting stage.
In this stage, the input original image and an indicator map encoding foreground and background information, obtained through user interaction, are acquired; according to the indicator map, an image mask containing only the foreground indicated in the map is extracted from the complete image mask of the original image to serve as the preliminary image mask.
As shown in fig. 1, the operation of the interactive matting phase is accomplished by an encoder in cooperation with a mask decoder.
The encoder is used for encoding the input original image and the indication map.
And the mask decoder is used for predicting the preliminary image mask according to the coding result of the coder.
Illustratively, the encoder and the mask decoder may be implemented by a U-Net network.
In the embodiment of the invention, the original image can be a three-channel RGB image, and the indication image is a single-channel image.
In the embodiment of the invention, before any interaction all pixel values in the indicator map are 0. During interaction, if a click indicating foreground is received from the user, a dot with radius r and pixel value 1 is filled in at the corresponding position of the indicator map; if a click indicating background is received, a dot with radius r and pixel value −1 is filled in at the corresponding position.
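The click-to-indicator-map encoding described above can be sketched as follows (a minimal NumPy illustration; the function name and the disc rasterization are assumptions, since the patent only specifies a dot of radius r with value +1 or −1):

```python
import numpy as np

def add_click(indicator, y, x, is_foreground, r=15):
    """Stamp a disc of radius r into the single-channel indicator map.

    Foreground clicks write +1, background clicks write -1; the map
    starts as all zeros before any interaction.
    """
    h, w = indicator.shape
    ys, xs = np.ogrid[:h, :w]
    disc = (ys - y) ** 2 + (xs - x) ** 2 <= r ** 2
    indicator[disc] = 1.0 if is_foreground else -1.0
    return indicator
```

The resulting single-channel map is concatenated with the RGB image as network input.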
In the embodiment of the invention, a main difficulty in enabling the deep network to adapt its behavior to user clicks is collecting training data with real clicks, which is quite expensive. The invention therefore innovatively proposes to simulate user clicks during training: in the training phase, for each training image, a number of foreground or background points (e.g., 0 to 6) of a specified radius (e.g., 15 pixels) are randomly sampled to generate the indicator map.
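The training-time click simulation can be sketched as below. Sampling foreground clicks from fully opaque pixels and background clicks from fully transparent ones is an assumption; the patent only states that a number of points of fixed radius is randomly sampled:

```python
import numpy as np

def simulate_clicks(alpha_gt, max_clicks=6, r=15, rng=None):
    """Simulate user clicks from a ground-truth matte during training.

    Randomly samples up to max_clicks points: foreground clicks from
    fully opaque pixels (alpha == 1), background clicks from fully
    transparent ones (alpha == 0), stamped as +1 / -1 discs. The exact
    sampling distribution is an assumption made for this sketch.
    """
    rng = rng if rng is not None else np.random.default_rng()
    indicator = np.zeros_like(alpha_gt, dtype=np.float32)
    n = rng.integers(0, max_clicks + 1)   # 0..6 clicks per image
    h, w = indicator.shape
    for _ in range(n):
        want_fg = rng.random() < 0.5
        region = np.argwhere(alpha_gt == (1.0 if want_fg else 0.0))
        if len(region) == 0:
            continue                      # no such region in this image
        y, x = region[rng.integers(len(region))]
        ys, xs = np.ogrid[:h, :w]
        disc = (ys - y) ** 2 + (xs - x) ** 2 <= r ** 2
        indicator[disc] = 1.0 if want_fg else -1.0
    return indicator
```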
In the training stage, an error function composed of an image-space loss (L_reg) and a gradient-space loss (L_grad) is designed.
The image-space loss applies an L1 loss to the transition region T of the original image and an L2 loss to the foreground and background regions S = {F, B} of the original image:
L_reg = (1/|T|) Σ_{i∈T} |α_p^i − α_g^i| + (1/|S|) Σ_{j∈S} (α_p^j − α_g^j)²
In the above formula, α_p and α_g respectively denote the predicted preliminary image mask and the given supervision mask; with a pixel superscript they denote the mask value and the supervision value at the corresponding pixel; |·| denotes the number of elements in a set; i and j are pixel indices.
The gradient-space loss is the L1 loss between the spatial gradients of the predicted preliminary image mask and the supervision mask:
L_grad = (1/|Ω|) Σ_{i∈Ω} |∇α_p^i − ∇α_g^i|
where Ω denotes all pixels of the original image I and ∇ is the spatial gradient operator. Introducing the gradient-space loss effectively encourages the network to generate sharper matting results.
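A minimal PyTorch sketch of these two losses follows (the function names and the finite-difference gradient are assumptions; the patent does not specify a particular gradient operator):

```python
import torch

def image_space_loss(alpha_p, alpha_g, trans_mask):
    """L1 on the transition region T, L2 on foreground/background S = ~T."""
    t = trans_mask.bool()
    l1 = (alpha_p[t] - alpha_g[t]).abs().mean() if t.any() else alpha_p.sum() * 0.0
    s = ~t
    l2 = ((alpha_p[s] - alpha_g[s]) ** 2).mean() if s.any() else alpha_p.sum() * 0.0
    return l1 + l2

def gradient_loss(alpha_p, alpha_g):
    """L1 between spatial gradients (forward differences); encourages
    sharper matte edges. alpha_*: (B, 1, H, W) tensors."""
    def grad(a):
        gx = a[..., :, 1:] - a[..., :, :-1]
        gy = a[..., 1:, :] - a[..., :-1, :]
        return gx, gy
    gxp, gyp = grad(alpha_p)
    gxg, gyg = grad(alpha_g)
    return (gxp - gxg).abs().mean() + (gyp - gyg).abs().mean()
```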
Second, the uncertainty-guided local refinement stage.
This stage aims to automatically and locally refine the preliminary image mask output by the interactive matting stage so as to output a finer image mask. It mainly comprises: performing uncertainty estimation on the preliminary image mask to obtain an uncertainty map; under the guidance of the uncertainty map, cropping pixel blocks whose uncertainty exceeds a preset value from corresponding positions in the preliminary image mask and the original image, and locally refining them with a fully convolutional network without downsampling; and, after the local refinement result is obtained, pasting it back to the corresponding positions of the preliminary image mask to obtain the refined image mask, which serves as the complete image mask of this iteration. The uncertainty estimation and local refinement processes are described below.
1) Uncertainty estimation.
As shown in FIG. 1, the uncertainty map is estimated by the encoder working in conjunction with an uncertainty estimation module. The uncertainty estimation module is parallel to the mask decoder and shares the same encoder; it describes the preliminary image mask prediction with a univariate Laplace distribution, from which the uncertainty map is estimated:
f(x | μ, σ) = (1 / 2σ) · exp(−|x − μ| / σ)
where μ is the preliminary image mask α_p; σ is the uncertainty map σ_p output by the uncertainty estimation module, a larger value indicating greater uncertainty about the matting network's output; x is the supervision mask α_g. f(x | μ, σ) is the Laplace distribution characterizing the preliminary image mask with parameters μ and σ, and the ultimate goal is to estimate the uncertainty map.
As will be understood by those skilled in the art, the above equation models the preliminary mask prediction with a univariate Laplace distribution: the true value at each pixel i is the supervision value α_g^i, the preliminary image mask predicted by the mask decoder plays the role of the mean μ, and the output of the uncertainty estimation module corresponds to the scale σ of the predicted Laplace distribution, i.e., the uncertainty.
In the embodiment of the invention, the uncertainty estimation module is trained by minimizing the negative log-likelihood:
L_ue = (1/|Ω|) Σ_{i∈Ω} [ log(2σ_p^i) + |α_g^i − α_p^i| / σ_p^i ]
where Ω denotes all pixels of the original image I and i is a pixel index.
In the implementation of the invention, the uncertainty map quantifies how uncertain each pixel value of the preliminary image mask is: for a pixel at any position, a larger uncertainty means the preliminary mask value output at that position is less reliable and needs further refinement.
After the uncertainty estimation module is trained with the above loss function L_ue, the uncertainty map of the preliminary image mask can be accurately estimated, thereby guiding the subsequent local refinement process.
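The Laplace negative log-likelihood used to train the uncertainty estimation module can be sketched in PyTorch as follows (the clamping constant is an added numerical-safety assumption, not part of the patent text):

```python
import torch

def laplace_nll(alpha_p, alpha_g, sigma_p, eps=1e-6):
    """Negative log-likelihood of a per-pixel univariate Laplace:
    f(x | mu, sigma) = 1/(2 sigma) * exp(-|x - mu| / sigma),
    with mu = alpha_p (mask decoder output) and sigma = sigma_p
    (uncertainty head output). Minimizing this trains sigma_p to be
    large exactly where the preliminary matte is unreliable.
    """
    sigma = sigma_p.clamp(min=eps)        # numerical safety
    nll = torch.log(2.0 * sigma) + (alpha_g - alpha_p).abs() / sigma
    return nll.mean()
```

Note the trade-off this loss encodes: a confident-but-wrong prediction (small σ, large error) is penalized much more heavily than an uncertain one.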
2) Local refinement process.
Under the guidance of the uncertainty map, pixel blocks whose uncertainty exceeds a preset value (the specific threshold can be set freely) are cropped from corresponding positions in the preliminary image mask and the original image (the default block size can be set to 64 × 64), and then fed together into a fully convolutional network without downsampling for local refinement.
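The uncertainty-guided cropping step can be sketched as below. The greedy, non-overlapping selection of patch centers is an assumption made for this sketch; the patent only requires cropping blocks whose uncertainty exceeds a preset value:

```python
import numpy as np

def crop_uncertain_patches(uncertainty, image, alpha_p, thresh,
                           patch=64, max_patches=8):
    """Pick patch centers where uncertainty is highest (above thresh)
    and cut aligned patch x patch blocks from the image and the
    preliminary matte. Returns (slices, image_patch, alpha_patch)
    tuples; the slices allow pasting refined results back in place.
    """
    u = uncertainty.copy()
    h, w = u.shape
    half = patch // 2
    patches = []
    for _ in range(max_patches):
        idx = np.unravel_index(np.argmax(u), u.shape)
        if u[idx] <= thresh:
            break                         # remaining pixels are confident
        y = int(np.clip(idx[0], half, h - half))
        x = int(np.clip(idx[1], half, w - half))
        sl = (slice(y - half, y + half), slice(x - half, x + half))
        patches.append((sl, image[sl], alpha_p[sl]))
        u[sl] = -np.inf                   # suppress this region
    return patches
```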
In the embodiment of the present invention, the fully convolutional network without downsampling is the refinement network shown in FIG. 1. Since the cropped pixel blocks are generally much smaller than the original image, the computational overhead is much lower than global refinement approaches. Because the transparency of most pixels in a cropped block has already been predicted accurately, only a small fraction of "stubborn" pixels need heavy refinement. To make the refinement network pay more attention to these "stubborn" pixels, a hard sample mining objective function is employed for training:
L_refine = (1/|C|) Σ_{i∈C} |α_p^i − α_g^i| + λ · (1/|H|) Σ_{j∈H} |α_p^j − α_g^j|
where C denotes the entire pixel set, α_p is the preliminary image mask, α_g is the supervision information, H denotes the hard pixel set consisting of the top K% of pixels ranked by supervision error, λ is the weight applied to the hard pixel set H, and i, j are pixel indices.
For example, K may be 20 and λ may be 1. Of course, the numerical values of the parameters in the embodiments of the present invention are all examples rather than limitations; in practical applications, a user may set them according to needs or experience.
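A minimal PyTorch sketch of the hard sample mining objective, using K = 20 and λ = 1 as defaults (the function name is illustrative):

```python
import torch

def hard_mining_loss(alpha_p, alpha_g, top_k=0.2, lam=1.0):
    """L1 over all patch pixels C plus a lambda-weighted extra L1 over
    the hardest K% of pixels H (largest absolute error), so the
    refinement network focuses on the few 'stubborn' pixels."""
    err = (alpha_p - alpha_g).abs().flatten()
    base = err.mean()                     # term over the full set C
    k = max(1, int(top_k * err.numel()))  # size of the hard set H
    hard = torch.topk(err, k).values.mean()
    return base + lam * hard
```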
The whole framework shown in FIG. 1 is trained in stages: 1) train the encoder and the mask decoder with the loss function L_alpha = L_reg + L_grad; 2) fix the encoder and the mask decoder and train the uncertainty estimation module with the loss function L_ue described above; 3) train the refinement network alone with the loss function L_refine introduced above.
FIG. 2 summarizes the real-time user click interaction process; the matting framework in FIG. 2 is the one shown in FIG. 1. At the initial moment, for the input original image, the encoder and the mask decoder jointly predict a preliminary image mask, and local refinement then produces a refined image mask that serves as the complete image mask at the initial moment. If this does not meet the user's expectations, the foreground and background information specified by the user in the original image is determined through interaction and an indicator map is generated; after the indicator map is encoded by the encoder, it is passed to the mask decoder through skip connections, and the mask decoder extracts a preliminary image mask from the complete image mask; local refinement then yields a refined image mask that serves as the complete image mask of this iteration. Each iteration comprises the two stages above, and the complete image mask used in the interactive matting stage is always the one obtained in the previous iteration; in actual operation, the iteration can be repeated until the desired complete image mask is obtained. The input image in FIG. 2 contains two foreground objects; in the user interaction the left object is indicated as background and the right object as foreground, and finally an image mask containing only the right object is output.
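The iterative two-stage loop of FIG. 2 can be sketched as a toy driver (all callables and names here are hypothetical stand-ins for the trained networks and the user interface, assumed for illustration only):

```python
import numpy as np

def interactive_session(image, predict_mask, refine, get_clicks, max_rounds=5):
    """Toy driver for the two-stage loop:
      predict_mask(image, indicator, prev_mask) -> (alpha_p, sigma_p)
      refine(image, alpha_p, sigma_p)           -> refined alpha
      get_clicks(alpha)                         -> [(y, x, is_fg), ...] or []
    The previous iteration's complete mask is fed back each round.
    """
    indicator = np.zeros(image.shape[:2], dtype=np.float32)
    full_mask = None
    for _ in range(max_rounds):
        alpha_p, sigma_p = predict_mask(image, indicator, full_mask)
        full_mask = refine(image, alpha_p, sigma_p)   # this round's result
        clicks = get_clicks(full_mask)
        if not clicks:                    # user satisfied -> stop iterating
            break
        for y, x, is_fg in clicks:
            indicator[y, x] = 1.0 if is_fg else -1.0  # simplified point stamp
    return full_mask
```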
Compared with the prior art, the method has the following advantages:
1. The method provides a brand-new interaction mode that achieves performance comparable to trimap-based methods with only a few click interactions. Compared with fully automatic matting methods that take no additional input, it avoids the semantic ambiguity problem and greatly improves performance; with only a few clicks, the deep network can generalize to categories not seen in the training set and output high-quality image masks.
2. The proposed uncertainty-guided local refinement lets a user flexibly choose the number of blocks to refine according to their own computational budget. This local refinement is more flexible and efficient than existing global refinement methods and avoids the computational overhead of reprocessing the majority of regions that are already predicted correctly.
To illustrate the performance advantages of the method, it is compared quantitatively with other existing methods on the DIM test set, as shown in Table 1. In the table, LF-Matting is a fully automatic matting method, and all other baselines are trimap-based matting methods. The quantitative evaluation metrics are: Sum of Absolute Differences (SAD), Mean Squared Error (MSE), Gradient error (Grad), and Connectivity error (Conn); for all four metrics, smaller is better. As can be seen from Table 1, the method achieves performance comparable to trimap-based matting methods while greatly reducing the interaction cost.
TABLE 1 quantitative comparison results on DIM test set
FIG. 3 shows the visual test comparison of the invention on real portrait images; for this test, the method was trained only on a portrait dataset. As can be seen from the figure, although only the portrait category exists in the training set, with a few user clicks indicating foreground or background the method easily generalizes to categories not seen in the training set; and by giving a background click on an unwanted foreground object, the method retains only the foreground object desired by the user. The advantages of the invention are therefore evident.
FIG. 4 shows a visual comparison before and after local refinement. In each row, from left to right: the original image, the result before local refinement, the result after local refinement, and the supervision information. It can be clearly seen that local refinement significantly improves edge detail and eliminates blurring artifacts.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (6)
1. A natural image matting method based on user real-time click interaction is characterized by comprising the following steps:
an interactive matting stage: acquiring an input original image and an indicator map, obtained through user interaction, that encodes foreground and background information; according to the indicator map, extracting from the complete image mask of the original image an image mask containing only the foreground indicated in the map, to serve as a preliminary image mask;
an uncertainty-guided local refinement stage: performing uncertainty estimation on the preliminary image mask to obtain an uncertainty map; under the guidance of the uncertainty map, cropping pixel blocks whose uncertainty exceeds a preset value from corresponding positions in the preliminary image mask and the original image, and locally refining them with a fully convolutional network without downsampling; and, after the local refinement result is obtained, pasting it back to the corresponding positions of the preliminary image mask to obtain the refined image mask, which serves as the complete image mask of this iteration.
2. The natural image matting method based on real-time user click interaction according to claim 1, characterized in that the operations of the interactive matting stage are completed by an encoder in cooperation with a mask decoder;
the encoder is used for encoding the input original image and the indicator map;
the mask decoder is used for predicting the preliminary image mask according to the encoding result of the encoder;
at the initial moment, for an input original image, the encoder and the mask decoder jointly predict a preliminary image mask, and a refined image mask is then obtained through local refinement as the complete image mask at the initial moment; then, through interaction with the user, the foreground and background information specified by the user in the original image is determined and an indicator map is generated; after being encoded by the encoder, the indicator map is passed to the mask decoder through skip connections, and the mask decoder extracts a preliminary image mask from the complete image mask.
3. The natural image matting method based on real-time user click interaction according to claim 1 or 2, characterized in that the loss functions of the interactive matting stage include an image-space loss and a gradient-space loss;
the image-space loss applies an L1 loss to the transition region T of the original image and an L2 loss to the foreground and background regions S of the original image:
L_reg = (1/|T|) Σ_{i∈T} |α_p^i − α_g^i| + (1/|S|) Σ_{j∈S} (α_p^j − α_g^j)²
in the above formula, α_p and α_g respectively denote the predicted preliminary image mask and the given supervision mask; with a pixel superscript they denote the mask value and the supervision value at the corresponding pixel; |·| denotes the number of elements in a set; i and j are pixel indices;
the gradient-space loss is the L1 loss between the spatial gradients of the predicted preliminary image mask and the supervision mask:
L_grad = (1/|Ω|) Σ_{i∈Ω} |∇α_p^i − ∇α_g^i|
where Ω denotes all pixels of the original image I and ∇ is the spatial gradient operator.
4. The natural image matting method based on real-time click interaction of a user according to claim 1 or 2, characterized in that all pixel values in the indication map are 0 before any user interaction; during interaction, if a click adding a foreground indication is received from the user, a disc of radius r with pixel value 1 is filled in at the corresponding position of the indication map; if a click adding a background indication is received from the user, a disc of radius r with pixel value -1 is filled in at the corresponding position of the indication map;
in the training phase, for each training image, several foreground or background points with a specified radius are randomly sampled to generate the indication map.
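The indication-map construction of claim 4, including the training-time random sampling of clicks, can be sketched as follows; the helper names (`add_click`, `sample_training_hints`) and the default click count are illustrative assumptions:

```python
import numpy as np

def add_click(hint, y, x, r, value):
    """Stamp a disc of radius r centred at (y, x) with `value`:
    +1 for a foreground click, -1 for a background click."""
    yy, xx = np.ogrid[:hint.shape[0], :hint.shape[1]]
    hint[(yy - y) ** 2 + (xx - x) ** 2 <= r ** 2] = value
    return hint

def sample_training_hints(fg_mask, n_clicks=3, r=5, rng=None):
    """Training-time simulation of user clicks: sample random
    positions and stamp +1 / -1 discs depending on whether each
    position falls on the foreground mask."""
    if rng is None:
        rng = np.random.default_rng()
    # all zeros before any interaction, as required by claim 4
    hint = np.zeros(fg_mask.shape, dtype=np.float32)
    for _ in range(n_clicks):
        y = int(rng.integers(fg_mask.shape[0]))
        x = int(rng.integers(fg_mask.shape[1]))
        add_click(hint, y, x, r, 1.0 if fg_mask[y, x] else -1.0)
    return hint
```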
5. The natural image matting method based on real-time click interaction of a user according to claim 2, characterized in that the uncertainty map is estimated by the encoder working in cooperation with an uncertainty estimation module;
the uncertainty estimation network is parallel to a mask decoder and shares the same encoder, and primary image mask prediction is described by using univariate Laplace distribution, so that an uncertainty map is estimated:
where μ is the preliminary image mask α_p, σ is the uncertainty map σ_p output by the uncertainty estimation module, and x is the supervision information α_g;
the uncertainty estimation module is trained by minimizing the negative log-likelihood:

$$\mathcal{L}_{u} = \frac{1}{|\Omega|}\sum_{i \in \Omega}\left(\frac{\left|\alpha_g^i - \alpha_p^i\right|}{\sigma_p^i} + \log \sigma_p^i\right)$$
where Ω represents all pixels of the original image I and i represents the pixel index.
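The negative log-likelihood of claim 5 follows from the Laplace density: -log P = log(2σ) + |x-μ|/σ, with the constant log 2 dropped. A minimal sketch (the function name and the `eps` clamp are illustrative assumptions, added for numerical safety):

```python
import numpy as np

def laplace_nll(alpha_p, sigma_p, alpha_g, eps=1e-6):
    """Per-pixel Laplace negative log-likelihood, averaged over all
    pixels: |alpha_g - alpha_p| / sigma + log(sigma). Large sigma
    down-weights the data term, so sigma learns to flag uncertain
    pixels while the log term penalises blanket uncertainty."""
    s = np.maximum(sigma_p, eps)  # avoid division by zero
    return float((np.abs(alpha_g - alpha_p) / s + np.log(s)).mean())
```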
6. The method according to claim 1, characterized in that the local refinement is performed by a fully convolutional network without downsampling, which serves as the refinement network and is trained with a hard-sample-mining objective function:

$$\mathcal{L} = \frac{1}{|C|}\sum_{i \in C}\left|\alpha_p^i - \alpha_g^i\right| + \lambda\,\frac{1}{|H|}\sum_{j \in H}\left|\alpha_p^j - \alpha_g^j\right|$$
where C represents the set of all pixels, α_p is the preliminary image mask, α_g is the supervision information, H is the set of hard pixels whose error against the supervision information ranks in the top K% of all pixels, λ is the extra weight applied to the hard-pixel set H, and i, j both represent pixel indices.
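The hard-sample-mining objective of claim 6 can be sketched as below; the names `hard_mining_loss`, `top_k`, and `lam`, and their default values, are illustrative assumptions (the patent leaves K and λ unspecified):

```python
import numpy as np

def hard_mining_loss(alpha_p, alpha_g, top_k=0.1, lam=0.5):
    """L1 loss over all pixels C, plus a lambda-weighted extra L1
    term over the hard set H: the top-K% of pixels ranked by
    absolute error against the supervision information."""
    err = np.abs(alpha_p - alpha_g).ravel()
    base = err.mean()                       # term over C
    k = max(int(len(err) * top_k), 1)
    hard = np.sort(err)[-k:]                # K% largest errors = H
    return float(base + lam * hard.mean())  # extra term over H
```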
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110158221.1A CN112862838A (en) | 2021-02-04 | 2021-02-04 | Natural image matting method based on real-time click interaction of user |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112862838A true CN112862838A (en) | 2021-05-28 |
Family
ID=75988837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110158221.1A Pending CN112862838A (en) | 2021-02-04 | 2021-02-04 | Natural image matting method based on real-time click interaction of user |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112862838A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111223106A (en) * | 2019-10-28 | 2020-06-02 | Gaoding (Xiamen) Technology Co., Ltd. | Full-automatic portrait mask matting method and system |
US20200311946A1 (en) * | 2019-03-26 | 2020-10-01 | Adobe Inc. | Interactive image matting using neural networks |
Non-Patent Citations (1)
Title |
---|
Tianyi Wei et al., "Improved Image Matting via Real-time User Clicks and Uncertainty Estimation", arXiv:2012.08323v1 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113608805A (en) * | 2021-07-08 | 2021-11-05 | Alibaba Singapore Holding Ltd. | Mask prediction method, image processing method, display method and equipment |
CN113608805B (en) * | 2021-07-08 | 2024-04-12 | Alibaba Innovation Company | Mask prediction method, image processing method, display method and device |
CN113838084A (en) * | 2021-09-26 | 2021-12-24 | Shanghai University | Matting method based on codec network and guide map |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | PDR-Net: Perception-inspired single image dehazing network with refinement | |
CN109712145B (en) | Image matting method and system | |
CN107818554B (en) | Information processing apparatus and information processing method | |
CN112862838A (en) | Natural image matting method based on real-time click interaction of user | |
CN114187624B (en) | Image generation method, device, electronic equipment and storage medium | |
Zhang et al. | GAIN: Gradient augmented inpainting network for irregular holes | |
Shahrian et al. | Temporally coherent and spatially accurate video matting | |
CN116205962B (en) | Monocular depth estimation method and system based on complete context information | |
Zheng et al. | Truncated low-rank and total p variation constrained color image completion and its moreau approximation algorithm | |
Zhang et al. | Hierarchical attention aggregation with multi-resolution feature learning for GAN-based underwater image enhancement | |
Cui et al. | Progressive dual-branch network for low-light image enhancement | |
CN112686830B (en) | Super-resolution method of single depth map based on image decomposition | |
CN115587967B (en) | Fundus image optic disk detection method based on HA-UNet network | |
CN116524307A (en) | Self-supervision pre-training method based on diffusion model | |
CN115731447A (en) | Decompressed image target detection method and system based on attention mechanism distillation | |
CN113516604B (en) | Image restoration method | |
Su et al. | Physical model and image translation fused network for single-image dehazing | |
CN112102216B (en) | Self-adaptive weight total variation image fusion method | |
CN112634331A (en) | Optical flow prediction method and device | |
CN113962332A (en) | Salient target identification method based on self-optimization fusion feedback | |
Hörentrup et al. | Confidence-aware guided image filter | |
CN113315995A (en) | Method and device for improving video quality, readable storage medium and electronic equipment | |
EP3032497A2 (en) | Method and apparatus for color correction | |
Saxena et al. | An efficient single image haze removal algorithm for computer vision applications | |
Zhuang et al. | Dimensional transformation mixer for ultra-high-definition industrial camera dehazing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication ||
Application publication date: 2021-05-28 |