CN112862838A - Natural image matting method based on real-time click interaction of user - Google Patents

Natural image matting method based on real-time click interaction of user

Info

Publication number
CN112862838A
CN112862838A (Application CN202110158221.1A)
Authority
CN
China
Prior art keywords
image
mask
user
uncertainty
image mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110158221.1A
Other languages
Chinese (zh)
Inventor
周文柏 (Wenbo Zhou)
张卫明 (Weiming Zhang)
俞能海 (Nenghai Yu)
韦天一 (Tianyi Wei)
陈冬冬 (Dongdong Chen)
廖菁 (Jing Liao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202110158221.1A priority Critical patent/CN112862838A/en
Publication of CN112862838A publication Critical patent/CN112862838A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/40Filling a planar surface by adding surface attributes, e.g. colour or texture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping

Abstract

The invention discloses a natural image matting method based on real-time click interaction with a user, comprising the following steps: acquiring an input original image and an indication map containing foreground and background information obtained through interaction with the user; extracting, from the complete image mask of the original image and according to the indication map, an image mask that contains only the foreground indicated in the indication map, as a preliminary image mask; performing uncertainty estimation on the preliminary image mask to obtain an uncertainty map, cropping, under the guidance of the uncertainty map, pixel blocks at positions whose uncertainty exceeds a preset value from the preliminary image mask and the original image, and refining them locally with a fully convolutional network without downsampling; and pasting the local refinement results back into the corresponding positions of the preliminary image mask to obtain the refined image mask. With only a few user interactions, the method substantially outperforms existing fully automatic matting methods and is comparable to the most advanced trimap-based matting methods.

Description

Natural image matting method based on real-time click interaction of user
Technical Field
The invention relates to the technical field of computer vision, in particular to a natural image matting method based on real-time click interaction of a user.
Background
Image matting is a fundamental and challenging problem in the field of computer vision. It requires accurate separation of foreground objects from the background while accurately estimating the per-pixel transparency (alpha) near the separation edge. Because it has a wide range of application scenarios, such as image composition and editing, movie production, and virtual backgrounds in video conferencing, it has been studied by academia and industry for many years.
An image can be formally expressed as a mathematical formula from the perspective of image composition as follows:
I_i = α_i F_i + (1 - α_i) B_i,  α_i ∈ [0, 1]   (1)
In the above formula, i = (x, y) denotes a pixel position in the input image I, and α_i, F_i, B_i respectively denote the foreground transparency, the foreground value and the background value at pixel i. This formula gives a pixel-level interpretation of image formation: each pixel in the image is a linear combination of a foreground and a background, and α_i represents the proportion of foreground to background, i.e. the transparency. When α_i = 1, the pixel consists entirely of foreground pixel values, i.e. it is completely opaque; when α_i = 0, the pixel consists entirely of background pixel values, i.e. it is completely transparent; when α_i ∈ (0, 1), the pixel is a linear combination of foreground and background pixel values and lies in the junction area between the foreground and background regions, such as animal hair or plant branches and leaves.
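As a quick check of formula (1), the following minimal Python sketch composites a single pixel; the numerical values are chosen purely for illustration and are not taken from the patent.

```python
import numpy as np

# Hypothetical pixel values, chosen only to illustrate formula (1).
alpha = 0.3                          # transparency alpha_i at pixel i
F = np.array([200.0, 180.0, 160.0])  # foreground RGB value F_i
B = np.array([50.0, 60.0, 70.0])     # background RGB value B_i

# I_i = alpha_i * F_i + (1 - alpha_i) * B_i
I = alpha * F + (1.0 - alpha) * B
print(I)  # -> [95. 96. 97.]
```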
The aim of image matting is to solve for α_i by treating formula (1) as an optimization problem, forming a single-channel image mask (alpha matte). For each pixel, 7 unknowns (the single-channel transparency, the 3-channel foreground pixel value and the 3-channel background pixel value) must be solved from the known 3-channel pixel value of the image I; this is obviously a highly under-constrained problem.
To solve this problem, many classical approaches rely on a pre-defined trimap as an additional input to constrain the solution space. The trimap divides an image into three regions: a foreground region, a background region and a transition region. The foreground region indicates that all pixels within it consist entirely of foreground pixel values, and the background region indicates that all pixels within it consist entirely of background pixel values. The task of image matting is thereby simplified to regressing the transparency α_i of each pixel only within the transition region of the trimap. Consequently, image matting methods that use a trimap as auxiliary input can generally achieve better performance. However, drawing a suitable trimap is very time-consuming and laborious, with drawing times for some complex examples even exceeding ten minutes, which is extremely unfriendly for users, especially non-professional users.
With the booming development of deep learning, some matting methods that do not require a trimap as an additional auxiliary input have recently emerged. However, their performance is far inferior to trimap-based matting approaches. The main reason is that, for some images, the deep learning network is ambiguous about which foreground object to matte due to the lack of the guiding constraint of the trimap. To address this ambiguity, some approaches collect large-scale matting datasets for only certain object classes (e.g., faces) to train the deep learning networks. However, this solution is not extensible and is costly; in particular, if the user wants to matte categories that do not appear in the training set, the results are often poor. In addition, if several objects (i.e., candidate foregrounds) appear in an image, the user may not want to extract all of them, so some user interaction is inevitable. The key is how to minimize the user's interaction cost while accurately extracting the foreground specified by the user, but at present no effective scheme exists.
Disclosure of Invention
The invention aims to provide a natural image matting method based on real-time click interaction of a user, which achieves performance comparable to high-performing trimap-based image matting methods while only requiring the user to provide a small number of clicks indicating whether a position is foreground or background (in most cases, when the image has no foreground-ambiguity problem, no user clicks are needed).
The purpose of the invention is realized by the following technical scheme:
a natural image matting method based on real-time click interaction of a user comprises the following steps:
Interactive matting stage: acquiring an input original image and an indication map containing foreground and background information obtained through interaction with the user; extracting, from the complete image mask of the original image and according to the indication map, an image mask that contains only the foreground indicated in the indication map, as a preliminary image mask;
Uncertainty-guided local refinement stage: performing uncertainty estimation on the preliminary image mask to obtain an uncertainty map; under the guidance of the uncertainty map, cropping pixel blocks at positions whose uncertainty exceeds a preset value from the preliminary image mask and the original image, and refining them locally with a fully convolutional network without downsampling; and, after the local refinement results are obtained, pasting them back into the corresponding positions of the preliminary image mask to obtain the refined image mask, which serves as the complete image mask of this iteration.
The technical scheme provided by the invention shows that, using only a few user interactions, the method substantially outperforms existing fully automatic matting methods and is comparable to the most advanced trimap-based matting methods.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
FIG. 1 is a frame diagram of a natural image matting method based on user real-time click interaction according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a process of real-time user click interaction according to an embodiment of the present invention;
FIG. 3 is a visual comparison on real portrait images provided by an embodiment of the present invention;
FIG. 4 is a visual comparison before and after local refinement according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a natural image matting method based on real-time click interaction of a user, which mainly comprises the following two stages as shown in fig. 1.
First, the interactive matting stage.
In this stage, the input original image and an indication map containing foreground and background information obtained through interaction with the user are acquired; an image mask containing only the foreground indicated in the indication map is extracted from the complete image mask of the original image according to the indication map and serves as the preliminary image mask.
As shown in fig. 1, the operation of the interactive matting phase is accomplished by an encoder in cooperation with a mask decoder.
The encoder is used for encoding the input original image and the indication map.
The mask decoder is used for predicting the preliminary image mask from the encoding result of the encoder.
Illustratively, the encoder and the mask decoder may be implemented by a U-Net network.
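As a rough illustration of this pairing, the PyTorch sketch below concatenates the 3-channel image with the 1-channel indication map and feeds the result through a small encoder and mask-decoding head. The layer sizes, channel counts and the omission of U-Net skip connections are all assumptions made only for brevity; the patent specifies only a U-Net-style encoder and mask decoder.

```python
import torch
import torch.nn as nn

class MattingNet(nn.Module):
    """Minimal sketch of a shared encoder plus mask decoder (layer sizes assumed)."""
    def __init__(self):
        super().__init__()
        # The 3-channel RGB image and the 1-channel indication map are concatenated (4 channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.mask_decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, image, indication_map):
        features = self.encoder(torch.cat([image, indication_map], dim=1))
        return self.mask_decoder(features)  # preliminary image mask in [0, 1]
```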
In the embodiment of the invention, the original image can be a three-channel RGB image, and the indication map is a single-channel image.
In the embodiment of the invention, before any interaction with the user, all pixel values in the indication map are 0. During interaction, if a click indicating foreground is received from the user, a dot with radius r and pixel value 1 is filled in at the corresponding position of the indication map; if a click indicating background is received, a dot with radius r and pixel value -1 is filled in at the corresponding position of the indication map.
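A minimal sketch of how such an indication map could be rasterized from user clicks is given below; the add_click helper and the default radius are illustrative assumptions, and only the radius-r dot with value +1 or -1 comes from the text above.

```python
import numpy as np

def add_click(indication_map, x, y, is_foreground, r=15):
    """Fill a dot of radius r around (x, y) with +1 (foreground click) or -1 (background click)."""
    h, w = indication_map.shape
    ys, xs = np.ogrid[:h, :w]
    dot = (xs - x) ** 2 + (ys - y) ** 2 <= r ** 2
    indication_map[dot] = 1.0 if is_foreground else -1.0
    return indication_map

# Before any interaction every pixel of the indication map is 0.
indication = np.zeros((512, 512), dtype=np.float32)
indication = add_click(indication, 120, 300, is_foreground=True)   # user marks a foreground point
indication = add_click(indication, 400, 100, is_foreground=False)  # user marks a background point
```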
In the embodiment of the invention, in order for the deep learning network to adjust its behaviour according to user clicks, a main difficulty is collecting training data with clicks, which is quite expensive. The invention therefore innovatively proposes to simulate user clicks during training: in the training phase, for each training image, a number (e.g., 0 to 6) of foreground or background points of a specified radius (e.g., a radius of 15 pixels) are randomly sampled to generate the indication map.
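The click simulation could look like the sketch below, which reuses the hypothetical add_click helper from the previous sketch; sampling foreground and background points with equal probability is an assumption beyond the random sampling described above.

```python
import numpy as np

def simulate_clicks(alpha_gt, max_clicks=6, radius=15):
    """Randomly sample 0 to max_clicks simulated clicks from the ground-truth mask alpha_gt."""
    h, w = alpha_gt.shape
    indication = np.zeros((h, w), dtype=np.float32)
    fg = np.argwhere(alpha_gt == 1.0)   # pure-foreground pixels
    bg = np.argwhere(alpha_gt == 0.0)   # pure-background pixels
    for _ in range(np.random.randint(0, max_clicks + 1)):
        pool = fg if (np.random.rand() < 0.5 and len(fg) > 0) else bg
        if len(pool) == 0:
            continue
        y, x = pool[np.random.randint(len(pool))]
        indication = add_click(indication, x, y, is_foreground=(pool is fg), r=radius)
    return indication
```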
In the training stage, an error function composed of an image-space loss (L_reg) and a gradient-space loss (L_grad) is designed.
The image-space loss applies an L1 loss to the transition region T of the original image and an L2 loss to the foreground and background regions S = {F, B} of the original image:

L_reg = (1/|T|) Σ_{i∈T} |α_p^i - α_g^i| + (1/|S|) Σ_{j∈S} (α_p^j - α_g^j)²

In the above formula, α_p and α_g respectively denote the predicted preliminary image mask and the given supervision mask, a superscript picks out the mask value or supervision value of the corresponding pixel, |·| denotes the number of elements of a set, and i, j are pixel indices.
The gradient-space loss is the L1 loss between the predicted preliminary image mask and the supervision mask in the spatial-gradient domain:

L_grad = (1/|Ω|) Σ_{i∈Ω} |∇α_p^i - ∇α_g^i|

where Ω denotes all pixels of the original image I and ∇ is the gradient operator; introducing the gradient-space loss effectively encourages the network to generate sharper matting results.
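A direct PyTorch translation of these two losses might look like the sketch below; the forward-difference gradient is an assumption, since the text does not fix a particular discrete gradient operator.

```python
def image_space_loss(alpha_p, alpha_g, transition_mask):
    """L1 loss over the transition region T, L2 loss over the known region S = {F, B}.
    alpha_p, alpha_g, transition_mask: torch tensors of shape (N, 1, H, W)."""
    T = transition_mask.bool()
    S = ~T
    l1 = (alpha_p[T] - alpha_g[T]).abs().mean() if T.any() else alpha_p.new_zeros(())
    l2 = ((alpha_p[S] - alpha_g[S]) ** 2).mean() if S.any() else alpha_p.new_zeros(())
    return l1 + l2

def gradient_space_loss(alpha_p, alpha_g):
    """L1 loss between the spatial gradients of the predicted and supervision masks."""
    def grad(a):  # forward differences along x and y
        return a[..., :, 1:] - a[..., :, :-1], a[..., 1:, :] - a[..., :-1, :]
    gx_p, gy_p = grad(alpha_p)
    gx_g, gy_g = grad(alpha_g)
    return (gx_p - gx_g).abs().mean() + (gy_p - gy_g).abs().mean()
```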
Second, the uncertainty-guided local refinement stage.
This stage aims to automatically and locally refine the preliminary image mask output by the interactive matting stage, so as to output a finer image mask. It mainly includes: performing uncertainty estimation on the preliminary image mask to obtain an uncertainty map; under the guidance of the uncertainty map, cropping pixel blocks at positions whose uncertainty exceeds a preset value from the preliminary image mask and the original image, and refining them locally with a fully convolutional network without downsampling; and, after the local refinement results are obtained, pasting them back into the corresponding positions of the preliminary image mask to obtain the refined image mask, which serves as the complete image mask of this iteration. The uncertainty estimation and local refinement processes are described below.
1) Uncertainty estimation.
As shown in fig. 1, the uncertainty map is estimated by the encoder working in conjunction with an uncertainty estimation module. The uncertainty estimation module is parallel to the mask decoder and shares the same encoder, and the preliminary image mask prediction is modelled with a univariate Laplace distribution, from which the uncertainty map is estimated:

f(x | μ, σ) = (1/(2σ)) exp(-|x - μ| / σ)

where μ is the preliminary image mask α_p; σ is the uncertainty map σ_p output by the uncertainty estimation module, and the larger its value, the more uncertain the matting network's output at that position is considered to be; x is the supervision mask α_g; f(x | μ, σ) is the Laplace distribution, parameterized by μ and σ, that characterizes the preliminary image mask, and the ultimate goal is to estimate the uncertainty map.
As will be understood by those skilled in the art, the above formula describes the preliminary image mask prediction with a univariate Laplace distribution: the true value at each pixel i is the supervision value α_g^i, the preliminary image mask predicted by the mask decoder plays the role of the mean μ, and the output of the uncertainty estimation module corresponds to the scale parameter σ of the predicted Laplace distribution, i.e. the uncertainty.
In the embodiment of the invention, the uncertainty estimation module is trained by minimizing the negative log-likelihood:

L_ue = (1/|Ω|) Σ_{i∈Ω} [ log(2σ^i) + |α_g^i - μ^i| / σ^i ]

where Ω denotes all pixels of the original image I and i is the pixel index.
In the implementation of the invention, the uncertainty map represents the uncertainty of each pixel value of the preliminary image mask; for a pixel at any position, a larger uncertainty means the preliminary mask value output at that position is less certain and therefore needs further refinement.
After the uncertainty estimation module is trained with the above loss function L_ue, the uncertainty map of the preliminary image mask can be accurately estimated and used to guide the subsequent local refinement process.
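The negative log-likelihood of the Laplace model can be written down directly, as in the sketch below; the eps clamp is an implementation detail assumed here for numerical stability.

```python
import torch

def uncertainty_loss(mu, sigma, alpha_g, eps=1e-6):
    """Per-pixel negative log-likelihood of the univariate Laplace distribution.
    mu: preliminary image mask, sigma: uncertainty map, alpha_g: supervision mask."""
    sigma = sigma.clamp(min=eps)  # keep the division and the log numerically stable
    nll = torch.log(2.0 * sigma) + (alpha_g - mu).abs() / sigma
    return nll.mean()
```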
2) Local refinement process.
Under the guidance of the uncertainty map, pixel blocks at positions whose uncertainty exceeds a preset value (the specific threshold can be chosen freely) are cropped from the preliminary image mask and the original image (the default block size can be set to 64 × 64); the cropped mask blocks and image blocks are then fed together into a fully convolutional network without downsampling for local refinement.
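One way the uncertainty-guided cropping and the later paste-back step could be implemented is sketched below; the non-overlapping grid scan and the use of the per-block peak uncertainty as the selection criterion are assumptions, since the text only states that blocks whose uncertainty exceeds a preset value are cropped.

```python
def crop_uncertain_patches(image, alpha_p, uncertainty, thresh, patch=64):
    """Collect (image block, mask block, top-left corner) for every block whose peak
    uncertainty exceeds thresh; inputs are tensors of shape (N, C, H, W)."""
    h, w = alpha_p.shape[-2:]
    crops = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            if uncertainty[..., top:top + patch, left:left + patch].max() > thresh:
                crops.append((image[..., top:top + patch, left:left + patch],
                              alpha_p[..., top:top + patch, left:left + patch],
                              (top, left)))
    return crops

def paste_back(alpha_p, refined_patches):
    """Write each locally refined block back into the preliminary mask at its original position."""
    out = alpha_p.clone()
    for refined, (top, left) in refined_patches:
        out[..., top:top + refined.shape[-2], left:left + refined.shape[-1]] = refined
    return out
```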
In the embodiment of the present invention, the fully convolutional network without downsampling is the refinement network shown in fig. 1. Since the cropped pixel blocks are generally much smaller than the original image, the computational overhead is much smaller than in global refinement approaches. Because the transparency of most pixels in a cropped block has already been predicted accurately, only a small fraction of "stubborn" pixels need heavy refinement. To make the refinement network pay more attention to these "stubborn" pixels, a hard-sample-mining objective function is employed for training:
L_refine = (1/|C|) Σ_{i∈C} |α_p^i - α_g^i| + λ · (1/|H|) Σ_{j∈H} |α_p^j - α_g^j|

where C denotes the entire pixel set, α_p is the preliminary image mask, α_g is the supervision information, H denotes the set of hard pixels whose supervision error ranks in the top K% of the entire pixel set, λ is the weight that emphasizes the hard pixel set H, and i, j are pixel indices.
For example, K may be 20 and λ may be 1. Of course, the numerical values of the parameters in the embodiments of the present invention are only examples and are not limiting; in practical applications, a user may set them according to needs or experience.
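With the example values above, the hard-sample-mining objective could be implemented as in the following sketch; selecting the top-K% pixels by absolute error follows the formula above, while everything else is an assumption.

```python
import torch

def refine_loss(alpha_p, alpha_g, k_percent=20.0, lam=1.0):
    """L1 over all pixels of the block plus a lambda-weighted L1 over the top-K% hardest pixels."""
    err = (alpha_p - alpha_g).abs().flatten()
    k = max(1, int(err.numel() * k_percent / 100.0))
    hard, _ = torch.topk(err, k)  # pixels with the largest supervision error
    return err.mean() + lam * hard.mean()
```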
The whole framework shown in fig. 1 is trained in stages: 1) train the encoder and the mask decoder with the loss function L_alpha = L_reg + L_grad; 2) fix the encoder and the mask decoder and train the uncertainty estimation module with the loss function L_ue described above; 3) train the refinement network alone with the loss function L_refine introduced above.
Fig. 2 summarizes the real-time user click interaction process; the matting framework in fig. 2 is the framework shown in fig. 1. At the initial moment, for the input original image, the encoder and the mask decoder together predict a preliminary image mask, which is then locally refined to obtain a refined image mask serving as the complete image mask at the initial moment. If this does not meet the user's expectation, the foreground and background information specified by the user in the original image is determined through interaction with the user, and an indication map is generated; after the indication map is encoded by the encoder, it is passed to the mask decoder through skip connections, and the mask decoder extracts a preliminary image mask from the complete image mask; local refinement is then performed to obtain the refined image mask, which serves as the complete image mask of this iteration. Each iteration comprises the two stages of operation; the complete image mask used in the interactive matting stage is always the one obtained in the previous iteration, and in actual operation the iteration can be repeated until the required complete image mask is obtained. The original input image in fig. 2 contains two foreground objects; in the user interaction the left object is indicated as background and the right object as foreground, and finally an image mask containing only the right object is output.
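A possible driver loop tying the two stages together in the manner of fig. 2 is sketched below; every helper name here (make_blank_indication_map, predict_mask_and_uncertainty, ask_user_for_clicks) is hypothetical, and the loop reuses the add_click, crop_uncertain_patches and paste_back sketches from above.

```python
def interactive_matting_session(image, predict_mask_and_uncertainty, refine_net, thresh):
    """Illustrative driver loop for the iterative process of fig. 2."""
    indication = make_blank_indication_map(image)          # hypothetical helper: all-zero map
    while True:
        # Interactive matting stage: preliminary mask and uncertainty map from the shared encoder.
        prelim, uncertainty = predict_mask_and_uncertainty(image, indication)
        # Uncertainty-guided local refinement stage.
        crops = crop_uncertain_patches(image, prelim, uncertainty, thresh)
        refined = [(refine_net(img_blk, mask_blk), pos) for img_blk, mask_blk, pos in crops]
        full_mask = paste_back(prelim, refined)            # complete image mask of this iteration
        clicks = ask_user_for_clicks(full_mask)            # hypothetical UI call; empty once satisfied
        if not clicks:
            return full_mask
        for x, y, is_fg in clicks:                         # fold new clicks into the indication map
            indication = add_click(indication, x, y, is_foreground=is_fg)
```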
Compared with the prior art, the method of the invention has the following advantages:
1. The method of the invention provides a brand-new interaction mode and achieves performance comparable to trimap-based methods with only a few click interactions. Compared with fully automatic image matting methods that take no additional input, it does not suffer from the semantic ambiguity problem and performs much better; with only a few clicks, the deep learning network can generalize to categories not seen in the training set and output high-quality image masks.
2. The uncertainty-guided local refinement proposed by the method allows a user to flexibly choose the number of blocks to refine locally according to the available computational budget. The local refinement is more flexible and efficient than existing global refinement methods, and avoids the computational overhead for the majority of regions that have already been predicted correctly.
In addition, to illustrate the performance advantages of the method of the present invention, the method of the present invention and existing methods are compared quantitatively on the DIM test set, as shown in Table 1. In the table, LF-Matting is a fully automatic matting method, and all other methods are trimap-based matting methods. The quantitative evaluation indices include: Sum of Absolute Differences (SAD), Mean Square Error (MSE), gradient error (Grad) and connectivity error (Conn); for all four indices, smaller is better. As can be seen from Table 1, the method of the present invention achieves performance comparable to the trimap-based matting methods while greatly reducing the interaction cost.
Table 1. Quantitative comparison results on the DIM test set
Fig. 3 shows the visual comparison of the present invention on real portrait images; for this test, the method was trained only on the portrait dataset. As can be seen from the figure, although only the portrait category exists in the training set, by providing a few user clicks indicating foreground or background, the method easily generalizes to categories not seen in the training set; and by giving a background click on an unwanted foreground object, the method retains only the foreground object desired by the user. The advantages of the invention are therefore evident.
Fig. 4 shows a visual comparison of the present invention before and after local refinement. In each row, from left to right, are: the original image, the result before local refinement, the result after local refinement, and the supervision information. Local refinement clearly improves edge detail and eliminates blurring artifacts.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A natural image matting method based on real-time click interaction of a user, characterized by comprising the following steps:
interactive matting stage: acquiring an input original image and an indication map containing foreground and background information obtained through interaction with the user; extracting, from the complete image mask of the original image and according to the indication map, an image mask that contains only the foreground indicated in the indication map, as a preliminary image mask;
uncertainty-guided local refinement stage: performing uncertainty estimation on the preliminary image mask to obtain an uncertainty map; under the guidance of the uncertainty map, cropping pixel blocks at positions whose uncertainty exceeds a preset value from the preliminary image mask and the original image, and refining them locally with a fully convolutional network without downsampling; and, after the local refinement results are obtained, pasting them back into the corresponding positions of the preliminary image mask to obtain the refined image mask, which serves as the complete image mask of this iteration.
2. The natural image matting method based on real-time click interaction of a user according to claim 1, characterized in that the operation of the interactive matting stage is completed by an encoder in cooperation with a mask decoder;
the encoder is used for encoding the input original image and the indication map;
the mask decoder is used for predicting the preliminary image mask from the encoding result of the encoder;
at the initial moment, for the input original image, the encoder and the mask decoder together predict a preliminary image mask, which is then locally refined to obtain a refined image mask serving as the complete image mask at the initial moment; then, through interaction with the user, the foreground and background information specified by the user in the original image is determined and an indication map is generated; after being encoded by the encoder, the indication map is passed to the mask decoder through skip connections, and the mask decoder extracts a preliminary image mask from the complete image mask.
3. The natural image matting method based on real-time click interaction of a user according to claim 1 or 2, characterized in that the loss function of the interactive matting stage comprises an image-space loss and a gradient-space loss;
the image-space loss applies an L1 loss to the transition region T of the original image and an L2 loss to the foreground and background regions S of the original image:

L_reg = (1/|T|) Σ_{i∈T} |α_p^i - α_g^i| + (1/|S|) Σ_{j∈S} (α_p^j - α_g^j)²

in the above formula, α_p and α_g respectively denote the predicted preliminary image mask and the given supervision mask, a superscript picks out the mask value or supervision value of the corresponding pixel, |·| denotes the number of elements of a set, and i, j are pixel indices;
the gradient-space loss is the L1 loss between the predicted preliminary image mask and the supervision mask in the spatial-gradient domain:

L_grad = (1/|Ω|) Σ_{i∈Ω} |∇α_p^i - ∇α_g^i|

where Ω denotes all pixels of the original image I and ∇ is the gradient operator.
4. The natural image matting method based on real-time click interaction of a user according to claim 1 or 2, characterized in that, before any interaction with the user, all pixel values in the indication map are 0, and during interaction with the user, if a click indicating foreground is received from the user, a dot with radius r and pixel value 1 is filled in at the corresponding position of the indication map, and if a click indicating background is received from the user, a dot with radius r and pixel value -1 is filled in at the corresponding position of the indication map;
in the training phase, for each training image, a number of foreground or background points of a specified radius are randomly sampled to generate the indication map.
5. The natural image matting method based on real-time click interaction of a user according to claim 2, characterized in that the uncertainty map is estimated by the encoder working in cooperation with an uncertainty estimation module;
the uncertainty estimation module is parallel to the mask decoder and shares the same encoder, and the preliminary image mask prediction is modelled with a univariate Laplace distribution, from which the uncertainty map is estimated:

f(x | μ, σ) = (1/(2σ)) exp(-|x - μ| / σ)

where μ is the preliminary image mask α_p, σ is the uncertainty map σ_p output by the uncertainty estimation module, and x is the supervision information α_g;
the uncertainty estimation module is trained by minimizing the negative log-likelihood:

L_ue = (1/|Ω|) Σ_{i∈Ω} [ log(2σ^i) + |α_g^i - μ^i| / σ^i ]

where Ω denotes all pixels of the original image I and i is the pixel index.
6. The natural image matting method based on real-time click interaction of a user according to claim 1, characterized in that the fully convolutional network without downsampling is a refinement network trained with a hard-sample-mining objective function:

L_refine = (1/|C|) Σ_{i∈C} |α_p^i - α_g^i| + λ · (1/|H|) Σ_{j∈H} |α_p^j - α_g^j|

where C denotes the entire pixel set, α_p is the preliminary image mask, α_g is the supervision information, H denotes the set of hard pixels whose supervision error ranks in the top K% of the entire pixel set, λ is the weight that emphasizes the hard pixel set H, and i, j are pixel indices.
CN202110158221.1A 2021-02-04 2021-02-04 Natural image matting method based on real-time click interaction of user Pending CN112862838A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110158221.1A CN112862838A (en) 2021-02-04 2021-02-04 Natural image matting method based on real-time click interaction of user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110158221.1A CN112862838A (en) 2021-02-04 2021-02-04 Natural image matting method based on real-time click interaction of user

Publications (1)

Publication Number Publication Date
CN112862838A true CN112862838A (en) 2021-05-28

Family

ID=75988837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110158221.1A Pending CN112862838A (en) 2021-02-04 2021-02-04 Natural image matting method based on real-time click interaction of user

Country Status (1)

Country Link
CN (1) CN112862838A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113608805A (en) * 2021-07-08 2021-11-05 阿里巴巴新加坡控股有限公司 Mask prediction method, image processing method, display method and equipment
CN113838084A (en) * 2021-09-26 2021-12-24 上海大学 Matting method based on codec network and guide map

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111223106A (en) * 2019-10-28 2020-06-02 稿定(厦门)科技有限公司 Full-automatic portrait mask matting method and system
US20200311946A1 (en) * 2019-03-26 2020-10-01 Adobe Inc. Interactive image matting using neural networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200311946A1 (en) * 2019-03-26 2020-10-01 Adobe Inc. Interactive image matting using neural networks
CN111223106A (en) * 2019-10-28 2020-06-02 稿定(厦门)科技有限公司 Full-automatic portrait mask matting method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tianyi Wei et al.: "Improved Image Matting via Real-time User Clicks and Uncertainty Estimation", arXiv:2012.08323v1 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113608805A (en) * 2021-07-08 2021-11-05 阿里巴巴新加坡控股有限公司 Mask prediction method, image processing method, display method and equipment
CN113608805B (en) * 2021-07-08 2024-04-12 阿里巴巴创新公司 Mask prediction method, image processing method, display method and device
CN113838084A (en) * 2021-09-26 2021-12-24 上海大学 Matting method based on codec network and guide map


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210528