CN111815649A - Image matting method and computer readable storage medium - Google Patents

Image matting method and computer readable storage medium

Info

Publication number
CN111815649A
CN111815649A (application CN202010621083.1A)
Authority
CN
China
Prior art keywords
portrait
transparency
matting
data
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010621083.1A
Other languages
Chinese (zh)
Other versions
CN111815649B (en)
Inventor
董宇涵
王克
张凯
李志德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University filed Critical Shenzhen International Graduate School of Tsinghua University
Priority to CN202010621083.1A priority Critical patent/CN111815649B/en
Publication of CN111815649A publication Critical patent/CN111815649A/en
Application granted granted Critical
Publication of CN111815649B publication Critical patent/CN111815649B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a portrait matting method and a computer-readable storage medium. The method comprises: acquiring portrait data, the portrait data comprising a portrait picture or a portrait video; computing the transparency of the portrait in the portrait data as a first transparency with a trained deep learning network; obtaining a first portrait foreground from the first transparency and the portrait data; adjusting the transparency of the portrait to a second transparency; and obtaining a second portrait foreground from the second transparency to complete the portrait matting. Portrait matting is performed automatically by the deep learning network; once the automatic matting result is obtained, an interactive matting function allows the user to further refine it through interactive operations and reach a higher-quality result. The method runs fast and occupies little memory, and can be deployed on a variety of smart devices for real-time portrait matting.

Description

Image matting method and computer readable storage medium
Technical Field
The invention relates to the technical field of image matting, in particular to an image matting method and a computer-readable storage medium.
Background
Matting is a basic image-editing technique with wide application and significant economic value. It is a fundamental operation in visual effects, art design, film and television post-production, and similar work, and many commercial products dedicated to matting, or integrating matting functions, are available in daily life.
Current matting techniques fall mainly into two categories:
The first category is interactive matting, represented by the design software Photoshop. Methods of this type require the user to guide the matting algorithm through interactive operations. The matting quality is good, but the interaction is tedious, consumes a great deal of the user's time, and imposes a learning cost and skill requirements on the user. Matting is an ill-posed problem, so additional interaction information is introduced to constrain it; from this information the algorithm learns part of the foreground and part of the background. Sampling-based methods rest on mathematical statistics: the known foreground and the known background are sampled separately, a foreground distribution model and a background distribution model are built, and the distribution of the unknown region is taken as a mixture of the two. They include parametric methods such as the Ruzon-Tomasi matting algorithm and Bayesian matting, and non-parametric methods that represent the data distribution with frequency histograms. Affinity-based methods interpret the foreground transparency as the affinity of the pixels in the unknown region to the image background and the image foreground; examples are Poisson matting, random-walk matting, geodesic-distance matting and closed-form matting. However, when the interaction information is coarse, the sampled information is insufficient and the error becomes large; when the image texture is complex, the estimation error of the model also grows.
The second category is fully automatic portrait matting, represented by intelligent photo applications. It requires no user guidance and performs portrait matting automatically. The operation is simple and fast, but the matting quality is poorer. Fully automatic portrait matting is mainly built on deep learning, for example the DAPM, SHM, LDN+FB, MMNet and SDPN models, and no longer requires the user to provide interaction information; however, the matting quality still needs improvement.
The prior art therefore lacks a matting method that is both simple to use and produces good matting results.
The above background disclosure is only for the purpose of assisting understanding of the concept and technical solution of the present invention and does not necessarily belong to the prior art of the present patent application, and should not be used for evaluating the novelty and inventive step of the present application in the case that there is no clear evidence that the above content is disclosed at the filing date of the present patent application.
Disclosure of Invention
The invention provides a portrait matting method and a computer-readable storage medium for solving the existing problems.
In order to solve the above problems, the technical solution adopted by the present invention is as follows:
A portrait matting method comprises the following steps: S1: acquiring portrait data, the portrait data comprising a portrait picture or a portrait video; S2: computing the transparency of the portrait in the portrait data as a first transparency with a trained deep learning network; S3: obtaining a first portrait foreground using the first transparency and the portrait data; S4: adjusting the transparency of the portrait to a second transparency; S5: obtaining a second portrait foreground using the second transparency, thereby completing the portrait matting.
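For orientation only, the following is a minimal PyTorch sketch of steps S1-S5, assuming a trained network `matting_net` that outputs decoder logits; the names are illustrative and the adjustment in S4 is shown here with the P-controlled sigmoid described later (for pictures the transparency can instead be adjusted via an edited grayscale map), so this is not the patented implementation itself.

```python
# Minimal sketch of steps S1-S5 (illustrative names; not the patented implementation).
import cv2
import numpy as np
import torch

def portrait_matting(image_bgr: np.ndarray, matting_net: torch.nn.Module, p_value: float = 1.0):
    # S1: acquire portrait data (one video frame is handled the same way as a picture).
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    x = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)        # 1 x 3 x H x W

    # S2: first transparency from the trained deep learning network.
    with torch.no_grad():
        logits = matting_net(x)                                    # 1 x 1 x H x W
    alpha1 = torch.sigmoid(logits)

    # S3: first portrait foreground = transparency * input data (element-wise).
    fg1 = alpha1 * x

    # S4: adjust the transparency, here via the P-controlled sigmoid described later.
    alpha2 = torch.sigmoid(p_value * logits)

    # S5: second portrait foreground from the adjusted transparency.
    fg2 = alpha2 * x
    return alpha1, fg1, alpha2, fg2
```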
Preferably, the deep learning network structure comprises encoder units and decoder units; each encoder unit comprises two branches, a coding branch and a spatial attention branch; the coding branch encodes its input and passes the result to the next encoder unit; the spatial attention branch generates a spatial attention map, which is fused into the corresponding decoder unit by element-wise (corresponding-element) addition; each decoder unit has only one branch: the input features are first upsampled, the spatial attention map output by the corresponding encoder unit is then fused in by element-wise addition, decoding is performed, and the result is finally output to the next decoder unit; the output data of the decoder unit is normalized to 0-1 by a Sigmoid function to obtain the first transparency of the portrait.
Preferably, the coding branch of the encoder unit comprises, in order: a two-dimensional convolution layer, a batch normalization layer, a rectified linear activation (ReLU) layer and a max pooling layer; the spatial attention branch comprises, in order: a two-dimensional convolution layer, a batch normalization layer and a rectified linear activation layer; the decoder unit comprises: a 2x upsampling layer, a two-dimensional convolution layer, a batch normalization layer and a rectified linear activation layer.
Preferably, the number of the encoder units is 5, and the number of the decoder units is 5.
Preferably, the deep learning network is trained by using a data set, the data set includes portrait pictures or portrait videos and corresponding labels, and a loss function during training is as follows:
L(A, A^{gt}) = \gamma L_{mse}(A, A^{gt}) + t L_{rgb}(A, A^{gt}) + w L_{grad}(A, A^{gt})
where L_{mse}(A, A^{gt}) is the mean square error loss, L_{rgb}(A, A^{gt}) is the synthesis loss, L_{grad}(A, A^{gt}) is the gradient loss, and \gamma, t and w are the weight coefficients of the mean square error loss, the synthesis loss and the gradient loss, respectively;
the loss of mean square error is:
L_{mse}(A, A^{gt}) = \frac{1}{k} \sum_{z} \left( A_z - A_z^{gt} \right)^2
the synthesis loss is:
L_{rgb}(A, A^{gt}) = \frac{1}{k} \sum_{z} \left( A_z I_z - A_z^{gt} I_z \right)^2
the gradient loss is:
L_{grad}(A, A^{gt}) = \frac{1}{k} \sum_{z} \left| \nabla A_z - \nabla A_z^{gt} \right|
where z denotes a pixel of the portrait picture or portrait video frame, A denotes the portrait transparency output by the deep learning network, A_z denotes the value of that transparency at pixel z, A^{gt} denotes the label of the portrait picture or portrait video frame, A_z^{gt} denotes the value of the label at pixel z, I_z denotes the value of the input portrait picture or video frame at pixel z, \nabla is the gradient operator, and k denotes the number of pixels of the picture or video frame.
Preferably, adaptive moment estimation is employed as the optimizer.
Preferably, adjusting the transparency of the portrait to a second transparency comprises: displaying the portrait picture as a first gray-scale image according to the first transparency of the portrait in the portrait picture, adjusting a black area and a white area of the first gray-scale image to obtain an adjusted second gray-scale image, and normalizing the second gray-scale image to 0-1 to obtain the second transparency of the portrait.
Preferably, adjusting the transparency of the portrait to a second transparency comprises: obtaining the first transparency of the portrait in the portrait video frame with the Sigmoid function
\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}
and adjusting the first transparency of the portrait to the second transparency according to
\mathrm{PSigmoid}(x) = \frac{1}{1 + e^{-Px}}
where x is the output data of decoder unit 5 and P is a user-definable coefficient of x.
Preferably, the method also comprises the step of evaluating the effect of the portrait matting by adopting a gradient error, a connectivity error, a mean absolute value error and a mean square error;
the average absolute value error is:
\mathrm{MAD} = \frac{1}{k} \sum_{z} \left| A_z - A_z^{gt} \right|
the mean square error is:
\mathrm{MSE} = \frac{1}{k} \sum_{z} \left( A_z - A_z^{gt} \right)^2
the gradient error is:
\mathrm{GE} = \frac{1}{k} \sum_{z} \left| \nabla A_z - \nabla A_z^{gt} \right|
the connectivity error is:
\mathrm{CE} = \frac{1}{k} \sum_{z} \left| \varphi(A_z, \Omega_z) - \varphi(A_z^{gt}, \Omega_z) \right|
where z denotes a pixel of the portrait picture or portrait video frame, A denotes the portrait transparency output by the deep learning network, A_z denotes the value of that transparency at pixel z, A^{gt} denotes the label of the portrait picture or portrait video frame, A_z^{gt} denotes the value of the label at pixel z, \nabla is the gradient operator, k denotes the number of pixels of the portrait picture or video frame, \varphi(A_z, \Omega_z) denotes the connectivity of the output transparency A_z at pixel z computed over the neighborhood \Omega_z of pixel z, and \varphi(A_z^{gt}, \Omega_z) denotes the connectivity of the label of the portrait picture or video frame at pixel z computed over the same neighborhood.
The invention also provides a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of the above.
The beneficial effects of the invention are as follows: a portrait matting method and a computer-readable storage medium are provided in which portrait matting is performed automatically by a deep learning network; once the automatic matting result is obtained, an interactive matting function allows the user to further refine it through interactive operations and reach a higher-quality result.
Furthermore, the method runs fast and occupies little memory, and can be deployed on various smart devices for real-time portrait matting.
Drawings
Fig. 1 is a schematic diagram of a method for image matting according to an embodiment of the present invention.
Fig. 2(a) and fig. 2(b) are schematic diagrams of a deep learning network structure including an encoder unit and a decoder unit according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a deep learning network structure according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of the symbolic representations of different deep learning layers in an embodiment of the present invention.
Fig. 5 is a schematic diagram of a network structure and specific parameters of a deep learning-based image matting algorithm in the embodiment of the present invention.
FIG. 6 is a diagram illustrating the PSigmoid function at different P values in an embodiment of the present invention.
Fig. 7(a) -7 (d) are schematic diagrams illustrating the effect of an image matting algorithm in the embodiment of the present invention.
Fig. 8(a) -8 (c) are schematic diagrams illustrating the adjustment effect of "P value" in the embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the embodiments of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element. In addition, the connection may be for either a fixing function or a circuit connection function.
It is to be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on those shown in the drawings; they are used only for convenience in describing the embodiments of the present invention and to simplify the description, and are not intended to indicate or imply that the referenced device or element must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be construed as limiting the present invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present invention, "a plurality" means two or more unless specifically limited otherwise.
As shown in fig. 1, the present invention provides a method for image matting, comprising the following steps:
s1: acquiring portrait data, wherein the portrait data comprises a portrait picture or a portrait video;
s2: calculating to obtain the transparency of the portrait in the portrait data as a first transparency based on the trained deep learning network;
s3: obtaining a first portrait foreground by using the first transparency and the portrait data;
s4: adjusting the transparency of the portrait to a second transparency;
s5: and obtaining a second portrait foreground by utilizing the second transparency, and finishing portrait matting.
The method of the invention obtains the first portrait foreground through the deep learning network without user interaction, which is simple and convenient. For pictures with a simple background or simple texture, the first portrait foreground is already good, and the second portrait foreground can be obtained without any interactive operation; for pictures with a complex background or complex texture, the first portrait foreground may contain larger errors, and in that case the second portrait foreground is obtained by further repairing the first portrait foreground through interactive operations. This avoids interactive operation from scratch and saves interaction cost.
The invention performs portrait matting automatically through a deep learning network; once the automatic matting result is obtained, an interactive matting function allows the user to further refine it through interactive operations on the automatic result to achieve a higher-quality matting result.
Furthermore, the method has high running speed and small occupied memory, and can be deployed on various intelligent devices to carry out real-time portrait matting processing.
Different deep learning-based portrait matting algorithms have different network structures. Different network architectures exhibit different algorithmic performance. The deep learning-based portrait matting algorithm provided by the invention has a unique deep learning-based network structure.
As shown in fig. 2(a) and 2(b), the deep learning network structure includes an encoder unit and a decoder unit;
each of the encoder units comprises two branches: a coding branch and a spatial attention branch;
the coding branch is used for coding and inputting the result to the next encoder unit;
the spatial attention branch is used for generating a spatial attention distribution map, and the spatial attention distribution map is fused into a corresponding decoder unit in a point-to-point corresponding addition mode;
each decoder unit only has one branch, the input characteristics are sampled firstly, then the spatial attention distribution maps output by the corresponding encoder units are fused according to the operation of adding the corresponding elements, then decoding is carried out, and finally the spatial attention distribution maps are output to the next decoder unit;
and normalizing the output data of the decoder unit to 0-1 through a Sigmoid function, namely obtaining the first transparency of the portrait.
In one embodiment of the present invention, the coding branch of the encoder unit comprises, in order: a two-dimensional convolution layer, a batch normalization layer, a rectified linear activation (ReLU) layer and a max pooling layer;
the spatial attention branch comprises, in order: a two-dimensional convolution layer, a batch normalization layer and a rectified linear activation layer;
the decoder unit comprises: a 2x upsampling layer, a two-dimensional convolution layer, a batch normalization layer and a rectified linear activation layer.
Input data entering an encoder unit is processed by the coding branch with the operations "Conv", "BN", "ReLU", "Conv", "BN", "ReLU" and "Pooling", in order from top to bottom, and the result is output; the spatial attention branch processes the same input data with "Conv", "BN" and "ReLU", in order from top to bottom, and outputs its result. As shown in fig. 2(b), input data entering a decoder unit is processed in the order "Up-sampling", "Conv", "BN", "ReLU", "Conv", "BN", "ReLU", and the result is then output.
The abbreviations used above are the names of standard deep learning layer types:
Conv: two-dimensional convolution layer (2D Convolution Layer);
BN: batch normalization layer (Batch Normalization Layer);
ReLU: rectified linear activation unit (Rectified Linear Unit);
Pooling: max pooling layer (Max Pooling Layer);
Up-sampling: 2x upsampling layer.
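Under the stated layer orders, a minimal PyTorch sketch of one encoder unit and one decoder unit could look as follows. The channel counts are passed in as parameters, and the assumption that the attention branch consumes the unit's raw input (and that its map is added right after the decoder's upsampling) is taken from the description, not from released code.

```python
import torch
import torch.nn as nn

class EncoderUnit(nn.Module):
    """Coding branch: Conv-BN-ReLU-Conv-BN-ReLU-Pooling; attention branch: Conv-BN-ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.coding = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
        )
        self.attention = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # returns (encoded features for the next encoder, spatial attention map for a decoder)
        return self.coding(x), self.attention(x)

class DecoderUnit(nn.Module):
    """Up-sampling-Conv-BN-ReLU-Conv-BN-ReLU; the attention map is added right after upsampling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.decode = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x, attention=None):
        x = self.up(x)
        if attention is not None:
            x = x + attention            # element-wise fusion of the spatial attention map
        return self.decode(x)
```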
As shown in fig. 3, there are 5 encoder units and 5 decoder units. The input portrait data passes through the 5 encoder units and then enters the decoder units, and the transparency of the portrait is output after the 5 decoder units. The input portrait data goes directly into encoder unit 1 and, at the same time, into decoder unit 5. The data output by the coding branch of encoder unit 1 enters encoder unit 2; encoder unit 1 has no spatial attention branch. The data output by the coding branch of encoder unit 2 enters encoder unit 3, and the data output by its spatial attention branch is fused into the input data of decoder unit 4 by adding corresponding elements. The data output by the coding branch of encoder unit 3 enters encoder unit 4, and the data output by its spatial attention branch is fused into the input data of decoder unit 3 in the same way. The data output by the coding branch of encoder unit 4 enters encoder unit 5, and the data output by its spatial attention branch is fused into the input data of decoder unit 2. The data output by the coding branch of encoder unit 5 is fused with the data output by the spatial attention branch of encoder unit 5 by adding corresponding elements and then enters decoder unit 1. The output data of decoder unit 1 is fused with the data output by the spatial attention branch of encoder unit 4 and then enters decoder unit 2. The output data of decoder unit 2 is fused with the output data of the spatial attention branch of encoder unit 3 and then enters decoder unit 3. The output data of decoder unit 3 is fused with the data output by the spatial attention branch of encoder unit 2 and then enters decoder unit 4. The output data of decoder unit 4 and the input portrait data are concatenated along the channel dimension and then enter decoder unit 5. The output data of decoder unit 5 passes through the defined PSigmoid function and the transparency of the portrait is output. In the training and testing stages, the P value of the PSigmoid function is set to 1; in the interactive repair stage of deployment, the P value can be adjusted according to the output of the algorithm.
As shown in fig. 4 and fig. 5, the input portrait data is scaled to a specification of 256 × 256 × 3 before entering encoder unit 1. The detailed parameters in encoder unit 1 to encoder unit 5 are set as follows: "kernel_size" of all convolution layers "Conv" is set to 3 × 3, "stride" is set to 1, "padding" is set to 1, and no bias term is added; the size of all max pooling layers "Pooling" is set to 2. Encoder unit 1 inputs 3-channel data and outputs 8-channel data; encoder unit 2 inputs 8-channel data and outputs 16-channel data; encoder unit 3 inputs 16-channel data and outputs 32-channel data; encoder unit 4 inputs 32-channel data and outputs 48-channel data; encoder unit 5 inputs 48-channel data and outputs 64-channel data. After passing through encoder units 1 to 5, the input portrait data is changed from the 256 × 256 × 3 specification to a 16 × 16 × 64 specification and then enters decoder unit 1.
The detailed parameters in decoder unit 1 to decoder unit 5 are set as follows: the size of all upsampling layers "Up-sampling" is set to 2, using bilinear interpolation; "kernel_size" of all convolution layers "Conv" is set to 3 × 3, "stride" is set to 1, "padding" is set to 1, and no bias term is added. Decoder unit 1 inputs 64-channel data and outputs 48-channel data; decoder unit 2 inputs 48-channel data and outputs 32-channel data; decoder unit 3 inputs 32-channel data and outputs 16-channel data; decoder unit 4 inputs 16-channel data and outputs 8-channel data; decoder unit 5 inputs 11-channel data and outputs 1-channel data. The output data of decoder unit 5 is passed through the PSigmoid function to obtain the portrait transparency with a specification of 256 × 256 × 1. The portrait transparency is then scaled back to the size of the originally input portrait data and multiplied element-wise with the original input data to obtain the portrait foreground.
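A possible end-to-end wiring of these ten units, reusing the EncoderUnit and DecoderUnit sketch above and the channel counts just listed, is shown below. Each attention map is fused after the decoder's upsampling step, as described for the decoder units, and the input image is bilinearly resized before the channel concatenation in front of decoder unit 5; these spatial-alignment details, like the class and variable names, are assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
# EncoderUnit and DecoderUnit are the classes sketched above.

class PortraitMattingNet(nn.Module):
    """Five encoder units and five decoder units with attention maps fused by addition (cf. fig. 3)."""
    def __init__(self):
        super().__init__()
        enc_ch = [(3, 8), (8, 16), (16, 32), (32, 48), (48, 64)]
        dec_ch = [(64, 48), (48, 32), (32, 16), (16, 8), (8 + 3, 1)]   # decoder 5 takes the concat
        self.encoders = nn.ModuleList(EncoderUnit(i, o) for i, o in enc_ch)
        self.decoders = nn.ModuleList(DecoderUnit(i, o) for i, o in dec_ch)

    def forward(self, img):
        x, attn = img, []
        for k, enc in enumerate(self.encoders):
            x, a = enc(x)
            if k > 0:                       # the attention map of encoder unit 1 is not used
                attn.append(a)              # attention maps of encoder units 2..5
        for dec, a in zip(self.decoders[:4], reversed(attn)):
            x = dec(x, a)                   # decoders 1-4 fuse the attention of encoders 5-2
        x = self.decoders[4].up(x)          # decoder 5: upsample, then concatenate the input image
        img_r = F.interpolate(img, size=x.shape[-2:], mode="bilinear", align_corners=False)
        x = self.decoders[4].decode(torch.cat([x, img_r], dim=1))
        return x                            # logits; Sigmoid / PSigmoid is applied outside

# smoke test: alpha = torch.sigmoid(PortraitMattingNet()(torch.rand(1, 3, 256, 256)))
```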
On the basis of the above embodiment, 6 encoder units and corresponding 6 decoder units are adopted, and the specific structure and parameter setting of each encoder unit and decoder unit are the same as those in embodiment 1. The other portions are the same as in the above embodiment.
On the basis of the above-described embodiment, 4 encoder units and corresponding 4 decoder units are employed, and the specific structure and parameter settings of each encoder unit and decoder unit are the same as those in embodiment 1. The other portions are the same as in example 1.
In one embodiment of the present invention, the deep learning network needs to be pre-trained. Training data is prepared, a loss function and an optimizer are set, and training of the deep learning network can be completed in an iterative updating mode.
The training data may come from a public data set, with data enhancement applied. First, the image size is randomly re-scaled, with the scaling ratio uniformly distributed in 0.8-1.2. Then the image is rotated with a probability of 50%, with the rotation angle uniformly distributed in -30 to 30 degrees. Finally, a brightness transformation with parameter 0.1, a contrast transformation with parameter 0.1, a saturation transformation with parameter 0.1 and a hue transformation with parameter 0.05 are applied. Twenty pictures are generated in this way for each picture in the training set. During training, the input image is additionally flipped horizontally with a probability of 50%. These data enhancement methods are merely exemplary, and other data enhancement methods may be employed.
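A sketch of this offline augmentation for one image/label pair using torchvision follows; applying the geometric transforms to the picture and its transparency label together, and the photometric jitter to the picture only, is an assumption, since the text does not spell out how the labels are transformed.

```python
# Offline augmentation sketch for one (image, alpha-label) pair, following the parameters above.
import random
from PIL import Image
import torchvision.transforms.functional as TF
from torchvision import transforms

color_jitter = transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.05)

def augment_once(image: Image.Image, alpha: Image.Image):
    # random re-scaling, ratio uniform in [0.8, 1.2]
    s = random.uniform(0.8, 1.2)
    w, h = image.size
    image = TF.resize(image, [int(h * s), int(w * s)])
    alpha = TF.resize(alpha, [int(h * s), int(w * s)])

    # rotation with 50% probability, angle uniform in [-30, 30] degrees
    if random.random() < 0.5:
        angle = random.uniform(-30.0, 30.0)
        image = TF.rotate(image, angle)
        alpha = TF.rotate(alpha, angle)

    # brightness / contrast / saturation / hue jitter on the colour picture only
    image = color_jitter(image)
    return image, alpha

# 20 augmented copies are generated per training picture; the horizontal flip (p = 0.5)
# is applied later, at training time.
```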
The deep learning network is trained with a data set comprising portrait pictures or portrait videos and their corresponding labels; the loss function used during training is:
L(A, A^{gt}) = \gamma L_{mse}(A, A^{gt}) + t L_{rgb}(A, A^{gt}) + w L_{grad}(A, A^{gt})
where L_{mse}(A, A^{gt}) is the mean square error loss, L_{rgb}(A, A^{gt}) is the synthesis loss, L_{grad}(A, A^{gt}) is the gradient loss, and \gamma, t and w are the weight coefficients of the mean square error loss, the synthesis loss and the gradient loss, respectively;
the mean square error loss utilizes the mean square error between the output result of the deep learning network and the label as a loss function:
L_{mse}(A, A^{gt}) = \frac{1}{k} \sum_{z} \left( A_z - A_z^{gt} \right)^2
the synthesis loss utilizes the mean square error of the color picture synthesized by transparency as a loss function:
L_{rgb}(A, A^{gt}) = \frac{1}{k} \sum_{z} \left( A_z I_z - A_z^{gt} I_z \right)^2
the gradient loss utilizes the gradient of the output result of the deep learning network and the gradient of the label, and utilizes the absolute value error between the two as a loss function:
L_{grad}(A, A^{gt}) = \frac{1}{k} \sum_{z} \left| \nabla A_z - \nabla A_z^{gt} \right|
where z denotes a pixel of the portrait picture or portrait video frame, A denotes the portrait transparency output by the deep learning network, A_z denotes the value of that transparency at pixel z, A^{gt} denotes the label of the portrait picture or portrait video frame, A_z^{gt} denotes the value of the label at pixel z, I_z denotes the value of the input portrait picture or video frame at pixel z, \nabla is the gradient operator, and k denotes the number of pixels of the picture or video frame.
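A minimal PyTorch sketch of this combined loss, with γ = t = 1 and w = 4 as used in the embodiment below, is given here; the gradient operator is not specified in the text, so simple forward differences are assumed.

```python
# Sketch of L = gamma*L_mse + t*L_rgb + w*L_grad as reconstructed above (not the authors' code).
import torch
import torch.nn.functional as F

def image_gradients(a: torch.Tensor):
    """Forward-difference gradients of an N x 1 x H x W map (assumed gradient operator)."""
    dx = a[..., :, 1:] - a[..., :, :-1]
    dy = a[..., 1:, :] - a[..., :-1, :]
    return dx, dy

def matting_loss(alpha, alpha_gt, image, gamma=1.0, t=1.0, w=4.0):
    l_mse = F.mse_loss(alpha, alpha_gt)                              # mean square error loss
    l_rgb = F.mse_loss(alpha * image, alpha_gt * image)              # synthesis (composition) loss
    dx, dy = image_gradients(alpha)
    gx, gy = image_gradients(alpha_gt)
    l_grad = (dx - gx).abs().mean() + (dy - gy).abs().mean()         # gradient loss (absolute error)
    return gamma * l_mse + t * l_rgb + w * l_grad
```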
In order to complete the training of the portrait matting algorithm based on deep learning, the invention adopts Adaptive Moment Estimation (Adam) as an optimizer. It is understood that other optimizers may be used to perform the training of the algorithm proposed by the present invention, and the present invention is not limited thereto.
In one embodiment of the present invention, adjusting the transparency of the portrait to a second transparency comprises:
displaying the portrait picture as a first gray-scale image according to the first transparency of the portrait in the portrait picture, adjusting a black area and a white area of the first gray-scale image to obtain an adjusted second gray-scale image, and normalizing the second gray-scale image to 0-1 to obtain the second transparency of the portrait.
Specifically, after the portrait transparency and the portrait foreground are automatically obtained by the portrait matting algorithm based on deep learning, the user can further repair the portrait foreground through interactive operation. There are two specific interactive operations:
the first interaction is for a portrait picture: and displaying the transparency of the portrait output by the deep learning network as a gray scale map, and modifying the gray scale map of the transparency of the portrait through 'smearing' and 'erasing' operations. "painting" black indicates that the area is modified into a foreground area of a portrait; "painted" white indicates that the area is modified to be the background area of the portrait. And normalizing the gray level diagram of the modified portrait transparency to 0-1 to obtain the corrected portrait transparency. And calculating to obtain the portrait foreground according to the corrected portrait transparency and the original input data.
The second interaction is for a portrait video; it adjusts the transparency of the portrait to a second transparency as follows:
output data of a decoder unit in the portrait matting algorithm based on deep learning is normalized to 0-1 through a Sigmoid function, and therefore the transparency of the portrait is obtained. The Sigmoid function is a nonlinear activation unit in deep learning theory, and is defined as follows:
\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}
the human image transparency output by the human image matting algorithm based on deep learning is influenced by the data distribution of the data output by the decoder unit and the characteristics of the Sigmoid function. Then changing the characteristics of the Sigmoid function itself will also change the transparency of the portrait output by the algorithm.
Adjusting the first transparency of the portrait to the second transparency according to the following formula:
\mathrm{PSigmoid}(x) = \frac{1}{1 + e^{-Px}}
where x is the output data of decoder unit 5 and P is a coefficient of x that can be set by the user. When P = 1, PSigmoid equals Sigmoid. PSigmoid is used directly, with the P value set to 1 in the training stage, the testing stage and the fully automatic portrait matting stage; the Sigmoid function in the deep-learning-based portrait matting algorithm is replaced by the PSigmoid function, and the user can improve the output of video matting by adjusting the P value.
As shown in fig. 6, the PSigmoid function is plotted for P = 0.5, P = 1 and P = 5.0. When P = 1, PSigmoid equals Sigmoid; when P is greater than 1, the slope of the PSigmoid function increases and the semi-transparent region of the output portrait transparency shrinks, so the edges of the output portrait foreground become smoother; when P is smaller than 1, the slope decreases and the semi-transparent region grows, so the details of the output portrait foreground become richer. In particular, as P approaches infinity, portrait matting degenerates into portrait segmentation. The P value is set to 1 during training and testing; during interactive repair the user can influence the output of the algorithm by setting the P value.
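In code, the PSigmoid adjustment amounts to a one-line rescaling of the stored decoder output; the following sketch uses illustrative variable names.

```python
import torch

def psigmoid(x: torch.Tensor, p: float = 1.0) -> torch.Tensor:
    """PSigmoid(x) = 1 / (1 + exp(-P*x)); with P = 1 this is the ordinary Sigmoid."""
    return torch.sigmoid(p * x)

# e.g. recompute the video-frame transparency from cached decoder-5 output with a user-chosen P
# alpha2 = psigmoid(decoder5_logits, p=0.07)
```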
In a specific embodiment of the present invention, the image matting method proposed by the present invention is implemented using a PyTorch deep learning framework and Python programming language. The PyTorch deep learning framework and Python programming language used in this example are not limitations of the present invention. The method provided by the invention is realized by using any other deep learning framework and any programming language, and the method belongs to the coverage of the invention. Where the deep learning network architecture of the present invention is fixed, the specific parameter settings are allowed to vary. This embodiment gives only a special case of one of the algorithm parameter settings. Any specific parameter setting under the network architecture described in the present method is within the scope of the present invention.
To train the deep learning network of the invention, the public DAPM natural portrait data set is used. The data set is divided into a training set of 1700 portrait images and a test set of 300 portrait images. Data enhancement is applied to the training set. First, the image size is randomly re-scaled, with the scaling ratio uniformly distributed in 0.8-1.2. Then the image is rotated with a probability of 50%, with the rotation angle uniformly distributed in -30 to 30 degrees. Finally, a brightness transformation with parameter 0.1, a contrast transformation with parameter 0.1, a saturation transformation with parameter 0.1 and a hue transformation with parameter 0.05 are applied. Twenty pictures are generated in this way for each picture in the training set, so the 1700 pictures yield 34000 training samples. During training, the input image is additionally flipped horizontally with a probability of 50%.
The above-described loss function is implemented using Python programming language, where γ is 1, t is 1, and w is 4. The Python programming language used and the specific settings of the parameters γ, t and w are not limitations of the present invention. It is within the scope of the present invention to implement the loss function proposed by the present invention using any other programming language and parameters γ, t, and w.
Adaptive Moment Estimation (Adam) in the PyTorch deep learning framework is used as the optimizer, with the "weight_decay" parameter set to 0.0005. The algorithm is trained for 200 rounds with an initial learning rate of 0.01, and the learning rate is multiplied by 0.1 every 50 rounds.
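These settings correspond to the following training-schedule sketch, where `PortraitMattingNet` and `matting_loss` refer to the earlier sketches and `train_loader` stands for an assumed data loader over the augmented training set.

```python
import torch
from torch.utils.data import DataLoader

model = PortraitMattingNet()                      # the network sketched earlier
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
train_loader: DataLoader = ...                    # assumed loader yielding (image, alpha_gt) pairs

for epoch in range(200):                          # 200 training rounds
    for image, alpha_gt in train_loader:
        optimizer.zero_grad()
        alpha = torch.sigmoid(model(image))       # P = 1 during training
        loss = matting_loss(alpha, alpha_gt, image, gamma=1.0, t=1.0, w=4.0)
        loss.backward()
        optimizer.step()
    scheduler.step()                              # learning rate x0.1 every 50 rounds
```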
In this embodiment, the PyTorch deep learning framework, the Python programming language and the OpenCV function library are used to implement the portrait matting method. Portrait data is input. When the input portrait data is a picture, the picture is converted to RGB format and its longest edge is scaled to 256 pixels. It is then converted into a 3-channel tensor and fed into the deep-learning-based network structure, which operates on the input tensor, and the deep-learning-based portrait matting algorithm outputs a single-channel tensor. This tensor is normalized to 0-1 by the Sigmoid function to obtain the portrait transparency, which is scaled back to the size of the original input data. The single-channel tensor representing the portrait transparency and the 3-channel tensor of the original input data are combined into a 4-channel tensor, with the transparency in the 4th channel. The 4-channel tensor is saved as a PNG picture to obtain the portrait foreground picture. The single-channel tensor output by the portrait matting algorithm is also replicated into a 3-channel tensor and saved as a JPEG picture to obtain the grayscale map of the portrait transparency. When the input portrait data is a video, the video is processed frame by frame as pictures in the same way as above; the portrait foreground sequence obtained frame by frame is assembled in temporal order and saved as a video to obtain the video matting result.
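A condensed sketch of this picture pipeline with OpenCV and PyTorch is shown below; the file paths, function name and model variable are illustrative, and a video would be handled by applying the same function frame by frame.

```python
import cv2
import numpy as np
import torch

def matte_picture(path_in: str, path_out_png: str, path_out_gray: str, model: torch.nn.Module):
    bgr = cv2.imread(path_in)
    h, w = bgr.shape[:2]
    s = 256.0 / max(h, w)                                      # longest edge scaled to 256 pixels
    small = cv2.resize(bgr, (int(round(w * s)), int(round(h * s))))
    rgb = cv2.cvtColor(small, cv2.COLOR_BGR2RGB)
    x = torch.from_numpy(rgb).float().div(255).permute(2, 0, 1).unsqueeze(0)
    with torch.no_grad():
        alpha = torch.sigmoid(model(x))[0, 0].numpy()          # transparency in 0-1
    alpha = cv2.resize(alpha, (w, h))                          # scale back to the original size
    bgra = cv2.cvtColor(bgr, cv2.COLOR_BGR2BGRA)
    bgra[:, :, 3] = (alpha * 255).astype(np.uint8)             # transparency as the 4th channel
    cv2.imwrite(path_out_png, bgra)                            # PNG portrait foreground picture
    cv2.imwrite(path_out_gray, (alpha * 255).astype(np.uint8)) # grayscale map of the transparency
```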
When the input portrait data is a picture, the portrait foreground picture and the grayscale map of the portrait transparency are obtained by the portrait matting algorithm. The user can repair the grayscale map of the portrait transparency with "paint" and "erase" operations: an area painted black is modified to foreground, and an area painted white is modified to background. After the grayscale map of the portrait transparency has been modified, it is converted into a single-channel tensor. The PNG portrait foreground picture is converted into a 4-channel tensor, and its 4th channel is replaced with the modified single-channel transparency tensor. The updated 4-channel tensor is saved back as the PNG portrait foreground picture, and the modified grayscale map is saved as the new grayscale map of the portrait transparency. These operations may be repeated until the user completes the interactive repair.
When the input data is a portrait video, a video of the portrait foreground is obtained by the portrait matting algorithm. The Sigmoid function in the deep-learning-based portrait matting algorithm is replaced by the PSigmoid function defined by the invention. The P value in the PSigmoid function is adjusted, and the portrait transparency is recomputed by the deep-learning-based portrait matting algorithm; the repaired portrait foreground video is then obtained from the portrait transparency and the original input data. These operations may be repeated until the user completes the interactive repair.
This example completes the test on the DAPM natural portrait dataset. The data set is a commonly used public test set in the field of image matting and is used for evaluating and comparing the processing effects of different image matting algorithms.
The method adopts gradient error, connectivity error, mean absolute value error and mean square error to evaluate the effect of portrait matting;
the average absolute value error is:
\mathrm{MAD} = \frac{1}{k} \sum_{z} \left| A_z - A_z^{gt} \right|
the mean square error is:
\mathrm{MSE} = \frac{1}{k} \sum_{z} \left( A_z - A_z^{gt} \right)^2
the gradient error is:
\mathrm{GE} = \frac{1}{k} \sum_{z} \left| \nabla A_z - \nabla A_z^{gt} \right|
the connectivity error is:
\mathrm{CE} = \frac{1}{k} \sum_{z} \left| \varphi(A_z, \Omega_z) - \varphi(A_z^{gt}, \Omega_z) \right|
where z denotes a pixel of the portrait picture or portrait video frame, A denotes the portrait transparency output by the deep learning network, A_z denotes the value of that transparency at pixel z, A^{gt} denotes the label of the portrait picture or portrait video frame, A_z^{gt} denotes the value of the label at pixel z, \nabla is the gradient operator, k denotes the number of pixels of the portrait picture or video frame, \varphi(A_z, \Omega_z) denotes the connectivity of the output transparency A_z at pixel z computed over the neighborhood \Omega_z of pixel z, and \varphi(A_z^{gt}, \Omega_z) denotes the connectivity of the label of the portrait picture or video frame at pixel z computed over the same neighborhood.
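A NumPy sketch of the first three metrics on a pair of transparency maps in 0-1 follows; the connectivity error requires the neighborhood-based connectivity measure, which is not detailed in the text and is therefore omitted here, and the choice of np.gradient as the gradient operator is an assumption.

```python
import numpy as np

def mad(alpha: np.ndarray, alpha_gt: np.ndarray) -> float:
    # mean absolute error between predicted and ground-truth transparency
    return float(np.abs(alpha - alpha_gt).mean())

def mse(alpha: np.ndarray, alpha_gt: np.ndarray) -> float:
    # mean square error
    return float(((alpha - alpha_gt) ** 2).mean())

def gradient_error(alpha: np.ndarray, alpha_gt: np.ndarray) -> float:
    # mean absolute difference of the image gradients (assumed gradient operator)
    gx, gy = np.gradient(alpha)
    tx, ty = np.gradient(alpha_gt)
    return float((np.abs(gx - tx) + np.abs(gy - ty)).mean())
```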
The portrait matting method has the function of automatically extracting the portrait foreground. As shown in fig. 7(a)-7(d), from left to right are the original image from the data set, the first portrait foreground obtained by the method of the invention, and the image composited onto a new background. The figures show that the invention extracts the portrait foreground well and handles portrait boundaries, especially hair, well. The portrait matting method of the invention also provides the function of interactively repairing the matting result.
As shown in fig. 8(a)-8(c), from left to right are the original picture, the first portrait foreground and the second portrait foreground obtained by the method of the invention; from fig. 8(a) to fig. 8(c) the P value is adjusted to 0.1, 0.07 and 0.03 respectively. It can be seen that the output can be further improved by adjusting the P value: the portrait foreground automatically extracted by the deep-learning-based matting algorithm has errors in the hair region, and changing the P value weakens the influence of these errors on the matting result. For repairing video matting results, the P-value method is simple to operate and easy to implement.
As shown in Table 1, the method of the invention was tested on the DAPM natural portrait data set, giving a mean absolute error MAD of 22.071 × 10⁻³, a mean square error MSE of 11.806 × 10⁻³, a gradient error GE of 2.043 × 10⁻³ and a connectivity error CE of 18.937 × 10⁻³.
TABLE 1 evaluation results of the portrait matting algorithm
MAD: 22.071 × 10⁻³    MSE: 11.806 × 10⁻³    GE: 2.043 × 10⁻³    CE: 18.937 × 10⁻³
As shown in Table 2, the evaluation result of the invention improves on the best result of other prior-art portrait matting algorithms by about 18%, with the gradient error GE reduced from 2.48 × 10⁻³ to 2.043 × 10⁻³.
TABLE 2 comparison of portrait matting algorithms
On a "DELL Inspiron 15-7572" notebook computer, the method of the invention takes approximately 17 milliseconds to process one picture. The notebook is equipped with an "Intel UHD Graphics 620" integrated graphics card with 128 MB of display memory and 4044 MB of shared memory.
The model size of the PyTorch implementation of the method is 1.52 MB. The method was also tested on an "OPPO Find X" smartphone equipped with a "Snapdragon 845" CPU and an "Adreno 630" GPU. The test results show that the method takes about 60 milliseconds to process one picture on the phone, i.e. it achieves real-time processing speed on the mobile phone.
An embodiment of the present application further provides a control apparatus, including a processor and a storage medium for storing a computer program; wherein a processor is adapted to perform at least the method as described above when executing the computer program.
Embodiments of the present application also provide a storage medium for storing a computer program, which when executed performs at least the method described above.
Embodiments of the present application further provide a processor, where the processor executes a computer program to perform at least the method described above.
The storage medium may be implemented by any type of volatile or non-volatile storage device, or a combination thereof. The non-volatile memory may be a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a ferromagnetic random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM) and Direct Rambus Random Access Memory (DRRAM). The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention is not to be considered limited to these descriptions. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications can be made without departing from the concept of the invention, and all of them are considered to fall within the scope of protection of the invention.

Claims (10)

1. A method for portrait matting is characterized by comprising the following steps:
s1: acquiring portrait data, wherein the portrait data comprises a portrait picture or a portrait video;
s2: calculating to obtain the transparency of the portrait in the portrait data as a first transparency based on the trained deep learning network;
s3: obtaining a first portrait foreground by using the first transparency and the portrait data;
s4: adjusting the transparency of the portrait to a second transparency;
s5: and obtaining a second portrait foreground by utilizing the second transparency, and finishing portrait matting.
2. The method of portrait matting according to claim 1, characterized in that the deep-learned network structure comprises an encoder unit and a decoder unit;
each of the encoder units comprises two branches: a coding branch and a spatial attention branch;
the coding branch is used for coding and inputting the result to the next encoder unit;
the spatial attention branch is used for generating a spatial attention distribution map, and the spatial attention distribution map is fused into a corresponding decoder unit in a point-to-point corresponding addition mode;
each decoder unit has only one branch: the input features are first upsampled, the spatial attention map output by the corresponding encoder unit is then fused in by adding corresponding elements, decoding is performed, and the result is finally output to the next decoder unit;
and normalizing the output data of the decoder unit to 0-1 through a Sigmoid function, namely obtaining the first transparency of the portrait.
3. The portrait matting method according to claim 2, characterized in that the coding branch of the encoder unit comprises, in order: a two-dimensional convolution layer, a batch normalization layer, a rectified linear activation layer and a max pooling layer;
the spatial attention branch comprises, in order: a two-dimensional convolution layer, a batch normalization layer and a rectified linear activation layer;
the decoder unit comprises: a 2x upsampling layer, a two-dimensional convolution layer, a batch normalization layer and a rectified linear activation layer.
4. The portrait matting method according to claim 3, wherein the number of said encoder units is 5 and the number of said decoder units is 5.
5. The portrait matting method according to claim 4, characterized in that the deep learning network is trained with a data set, the data set comprises portrait pictures or portrait videos and corresponding labels, and the loss function during training is:
L(A, A^{gt}) = \gamma L_{mse}(A, A^{gt}) + t L_{rgb}(A, A^{gt}) + w L_{grad}(A, A^{gt})
where L_{mse}(A, A^{gt}) is the mean square error loss, L_{rgb}(A, A^{gt}) is the synthesis loss, L_{grad}(A, A^{gt}) is the gradient loss, and \gamma, t and w are the weight coefficients of the mean square error loss, the synthesis loss and the gradient loss, respectively;
the loss of mean square error is:
L_{mse}(A, A^{gt}) = \frac{1}{k} \sum_{z} \left( A_z - A_z^{gt} \right)^2
the synthesis loss is:
L_{rgb}(A, A^{gt}) = \frac{1}{k} \sum_{z} \left( A_z I_z - A_z^{gt} I_z \right)^2
the gradient loss is:
L_{grad}(A, A^{gt}) = \frac{1}{k} \sum_{z} \left| \nabla A_z - \nabla A_z^{gt} \right|
where z denotes a pixel of the portrait picture or portrait video frame, A denotes the portrait transparency output by the deep learning network, A_z denotes the value of that transparency at pixel z, A^{gt} denotes the label of the portrait picture or portrait video frame, A_z^{gt} denotes the value of the label at pixel z, I_z denotes the value of the input portrait picture or video frame at pixel z, \nabla is the gradient operator, and k denotes the number of pixels of the picture or video frame.
6. The portrait matting method according to claim 5, characterized in that adaptive moment estimation (Adam) is used as the optimizer.
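As a hedged illustration of claims 5 and 6, the combined loss could be written in PyTorch as below; because the exact loss formulas are given only in image form in the published claims, the particular norms, the finite-difference gradient operator and the default weights are assumptions of this sketch.

import torch
import torch.nn.functional as F

def matting_loss(alpha, alpha_gt, image, gamma=1.0, t=1.0, w=1.0):
    """alpha, alpha_gt: Nx1xHxW predicted and ground-truth transparency in [0, 1]; image: Nx3xHxW input pictures."""
    l_mse = F.mse_loss(alpha, alpha_gt)                       # mean square error term
    l_rgb = F.l1_loss(alpha * image, alpha_gt * image)        # synthesis term on composited foregrounds
    # gradient term, with simple finite differences standing in for the gradient operator
    dxp = alpha[..., :, 1:] - alpha[..., :, :-1]
    dyp = alpha[..., 1:, :] - alpha[..., :-1, :]
    dxg = alpha_gt[..., :, 1:] - alpha_gt[..., :, :-1]
    dyg = alpha_gt[..., 1:, :] - alpha_gt[..., :-1, :]
    l_grad = F.l1_loss(dxp, dxg) + F.l1_loss(dyp, dyg)
    return gamma * l_mse + t * l_rgb + w * l_grad

# adaptive moment estimation (Adam) as the optimizer, per claim 6:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)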
7. The portrait matting method according to claim 6, wherein adjusting the transparency of the portrait to a second transparency comprises:
displaying the portrait picture as a first gray-scale image according to the first transparency of the portrait in the portrait picture, adjusting a black area and a white area of the first gray-scale image to obtain an adjusted second gray-scale image, and normalizing the second gray-scale image to 0-1 to obtain the second transparency of the portrait.
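One concrete and purely illustrative way to realize the black-region and white-region adjustment of claim 7 is a levels-style remapping of the grayscale transparency; the thresholds below are assumed values, not part of the claim.

import numpy as np

def adjust_levels(alpha, black=0.1, white=0.9):
    """Force values below `black` to 0 and values above `white` to 1, and rescale the rest linearly; the result is the second transparency, again normalized to 0-1."""
    return np.clip((alpha - black) / (white - black), 0.0, 1.0)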
8. The portrait matting method according to claim 7, wherein adjusting the transparency of the portrait to a second transparency comprises:
the first transparency of the portrait in the portrait video frame is obtained by a Sigmoid function as follows:
1/(1 + e^(−x))
adjusting the first transparency of the portrait to the second transparency according to the following formula:
1/(1 + e^(−P·x))
wherein x is the output data of the fifth decoder unit, and P is a user-defined coefficient of x.
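A short NumPy sketch of this Sigmoid-based adjustment follows; the reading that the user-defined coefficient P scales the decoder output x inside the Sigmoid, and the example value of P, are assumptions for illustration rather than the claimed formula itself.

import numpy as np

def first_transparency(x):
    """Plain Sigmoid of the decoder output x."""
    return 1.0 / (1.0 + np.exp(-x))

def second_transparency(x, p=2.0):
    """Assumed parametric variant: p > 1 sharpens and p < 1 softens the foreground/background transition."""
    return 1.0 / (1.0 + np.exp(-p * x))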
9. The portrait matting method according to any one of claims 1 to 8, characterized by further comprising evaluating the portrait matting effect using a gradient error, a connectivity error, a mean absolute error and a mean square error;
the mean absolute error is:
(1/k)·Σ_z |A_z − A_z^gt|
the mean square error is:
(1/k)·Σ_z (A_z − A_z^gt)²
the gradient error is:
(1/k)·Σ_z |∇A_z − ∇A_z^gt|
the connectivity error is:
(1/k)·Σ_z |φ(A_z, Ω_z) − φ(A_z^gt, Ω_z)|
wherein z denotes a pixel point of the portrait picture or portrait video frame, A denotes the portrait transparency output by the deep learning network, A_z denotes the value of the output transparency at pixel point z, A^gt denotes the label of the portrait picture or portrait video frame, A_z^gt denotes the value of the label at pixel point z, ∇ is the gradient operator, k denotes the number of pixels of the portrait picture or portrait video frame, φ(A_z, Ω_z) denotes the connectivity of the output transparency A_z at pixel point z computed over the neighborhood Ω_z of pixel point z, and φ(A_z^gt, Ω_z) denotes the connectivity of the label value A_z^gt at pixel point z computed over the same neighborhood Ω_z.
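For illustration, three of the four evaluation metrics of claim 9 can be computed with NumPy as below; np.gradient stands in for the gradient operator, and the connectivity error is omitted because the connectivity function over the neighborhood Ω_z admits several definitions, so this sketch should not be read as the claimed metrics themselves.

import numpy as np

def matting_errors(alpha, alpha_gt):
    """alpha, alpha_gt: HxW transparency maps in [0, 1]; returns mean absolute error, mean square error and a finite-difference gradient error."""
    k = alpha.size
    mae = np.abs(alpha - alpha_gt).sum() / k
    mse = ((alpha - alpha_gt) ** 2).sum() / k
    gy_p, gx_p = np.gradient(alpha)
    gy_g, gx_g = np.gradient(alpha_gt)
    grad = (np.abs(gx_p - gx_g) + np.abs(gy_p - gy_g)).sum() / k
    return mae, mse, grad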
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 9.
CN202010621083.1A 2020-06-30 2020-06-30 Portrait matting method and computer readable storage medium Active CN111815649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010621083.1A CN111815649B (en) 2020-06-30 2020-06-30 Portrait matting method and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111815649A true CN111815649A (en) 2020-10-23
CN111815649B CN111815649B (en) 2023-12-01

Family

ID=72856626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010621083.1A Active CN111815649B (en) 2020-06-30 2020-06-30 Portrait matting method and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111815649B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108961279A (en) * 2018-06-28 2018-12-07 Oppo(重庆)智能科技有限公司 Image processing method, device and mobile terminal
CN110400323A (en) * 2019-07-30 2019-11-01 上海艾麒信息科技有限公司 Automatic image matting system, method and device
CN110570429A (en) * 2019-08-30 2019-12-13 华南理工大学 Lightweight real-time semantic segmentation method based on three-dimensional point cloud

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Rishab Sharma et al.: "AlphaNet: An Attention Guided Deep Network for Automatic Image Matting", arXiv:2003.03613v1 [cs.CV], pages 3-7 *
Shengchuan Zhang et al.: "Robust Face Sketch Synthesis via Generative Adversarial Fusion of Priors and Parametric Sigmoid", Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), pages 1163-1169 *
郝华颖 et al.: "A Corneal Nerve Segmentation Algorithm Based on Improved ResU-Net" (一种基于改进ResU-Net的角膜神经分割算法), Computer Engineering (计算机工程), pages 2-7 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112396598A (en) * 2020-12-03 2021-02-23 中山大学 Image matting method and system based on single-stage multi-task collaborative learning
CN112396598B (en) * 2020-12-03 2023-08-15 中山大学 Portrait matting method and system based on single-stage multitask collaborative learning
CN112929743A (en) * 2021-01-22 2021-06-08 广州光锥元信息科技有限公司 Method and device for adding video special effect to specified object in video and mobile terminal
CN113838084A (en) * 2021-09-26 2021-12-24 上海大学 Matting method based on codec network and guide map
WO2023098649A1 (en) * 2021-11-30 2023-06-08 北京字节跳动网络技术有限公司 Video generation method and apparatus, and device and storage medium
CN114187317A (en) * 2021-12-10 2022-03-15 北京百度网讯科技有限公司 Image matting method and device, electronic equipment and storage medium
CN114187317B (en) * 2021-12-10 2023-01-31 北京百度网讯科技有限公司 Image matting method and device, electronic equipment and storage medium
WO2023159746A1 (en) * 2022-02-23 2023-08-31 平安科技(深圳)有限公司 Image matting method and apparatus based on image segmentation, computer device, and medium
CN114786040A (en) * 2022-06-15 2022-07-22 阿里巴巴(中国)有限公司 Data communication method, system, electronic device and storage medium
CN114786040B (en) * 2022-06-15 2022-09-23 阿里巴巴(中国)有限公司 Data communication method, system, electronic device and storage medium
WO2023241459A1 (en) * 2022-06-15 2023-12-21 阿里巴巴(中国)有限公司 Data communication method and system, and electronic device and storage medium

Also Published As

Publication number Publication date
CN111815649B (en) 2023-12-01

Similar Documents

Publication Publication Date Title
CN111815649A (en) Image matting method and computer readable storage medium
Bashir et al. A comprehensive review of deep learning-based single image super-resolution
Kim et al. Global and local enhancement networks for paired and unpaired image enhancement
CN111369582B (en) Image segmentation method, background replacement method, device, equipment and storage medium
Panetta et al. Tmo-net: A parameter-free tone mapping operator using generative adversarial network, and performance benchmarking on large scale hdr dataset
US11727628B2 (en) Neural opacity point cloud
CN111861886B (en) Image super-resolution reconstruction method based on multi-scale feedback network
CN112308866A (en) Image processing method, image processing device, electronic equipment and storage medium
WO2022156621A1 (en) Artificial intelligence-based image coloring method and apparatus, electronic device, computer readable storage medium, and computer program product
US20220392025A1 (en) Restoring degraded digital images through a deep learning framework
US20230368339A1 (en) Object class inpainting in digital images utilizing class-specific inpainting neural networks
CN116416342A (en) Image processing method, apparatus, computer device, and computer-readable storage medium
López-Tapia et al. A single video super-resolution GAN for multiple downsampling operators based on pseudo-inverse image formation models
CN114612289A (en) Stylized image generation method and device and image processing equipment
CN116645592A (en) Crack detection method based on image processing and storage medium
CN116071300A (en) Cell nucleus segmentation method based on context feature fusion and related equipment
CN112132232A (en) Medical image classification labeling method and system and server
CN116188649A (en) Three-dimensional face model driving method based on voice and related device
US11887277B2 (en) Removing compression artifacts from digital images and videos utilizing generative machine-learning models
CN114494786A (en) Fine-grained image classification method based on multilayer coordination convolutional neural network
CN112967373B (en) Facial image feature coding method based on nonlinear 3DMM
CN116912148B (en) Image enhancement method, device, computer equipment and computer readable storage medium
CN108401104A (en) Bifocal camera digital zooming method based on frequency band reparation and super-resolution
CN110348339B (en) Method for extracting handwritten document text lines based on case segmentation
CN115187768A (en) Fisheye image target detection method based on improved YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant