CN112581480A - Automatic image matting method, system and readable storage medium thereof - Google Patents
- Publication number: CN112581480A
- Application number: CN202011531661.9A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/11 — Image analysis; segmentation: region-based segmentation
- G06N3/045 — Neural networks: combinations of networks
- G06N3/08 — Neural networks: learning methods
- G06T7/187 — Segmentation; edge detection involving region growing, region merging or connected component labelling
- G06T2207/10004 — Image acquisition modality: still image; photographic image
- G06T2207/20081 — Special algorithmic details: training; learning
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
- G06T2207/30196 — Subject of image: human being; person
Abstract
The invention relates to the field of image processing, and in particular to an automatic matting method, an automatic matting system and a readable storage medium. The method comprises the following steps: acquiring data of an original image; performing semantic segmentation on the original image data to obtain a Trimap image divided into foreground, background and uncertain regions; introducing the original image data together with the parameters of the Trimap image into a convolution network for fine segmentation to obtain a preliminary extracted image; and fusing the parameters of any two or more of the foreground, background and uncertain regions of the Trimap image into the preliminary extracted image, adjusting it to obtain the final extracted image. Because the parameters of the Trimap image are introduced into the preliminary extracted image, the matting result is more accurate, no manual operation is needed, labor cost is effectively saved, and working efficiency is higher.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to an automatic matting method, an automatic matting system, and a readable storage medium storing the method.
Background
With the development of science and technology, digital images are widely used in social life. Before a digital image is used, some processing is often required, and matting is one of the most common operations in image processing: extracting the foreground part of a picture from its background to form a separate layer, mainly in preparation for later image composition.
In current practice, matting mainly relies on manually designating all or part of the foreground and background regions. Because the composited background of an ID photo is a solid color, tiny matting flaws are easily amplified in the composite image, so ID photos place a high precision requirement on the foreground boundary. Therefore, ID photos are usually cut out manually with a matting tool and the background replaced by hand. The work must be fine down to individual hairs so that hair and background show no visible seam, which typically requires a skilled professional and is cumbersome, time-consuming and laborious. Existing automatic matting methods are generally deep-learning-based image segmentation schemes that produce only relatively coarse segmentation results, making it difficult to generate a fine extracted image directly; they cannot meet the requirements of ID photos.
Disclosure of Invention
In order to overcome the above drawbacks, the present invention provides a fast, efficient and high-precision automatic matting method and system, and a readable storage medium storing the method.
The purpose of the invention is realized by the following technical scheme:
the invention relates to an automatic image matting method, which comprises the following steps:
acquiring data of an original image;
performing semantic segmentation on the data of the original image to obtain a Trimap image, and dividing a foreground, a background and an uncertain region in the Trimap image;
introducing the data of the original image together with the parameters of the Trimap image into a convolution network for fine segmentation to obtain a preliminary extracted image;
and fusing the parameters of any two or more of the foreground, background and uncertain regions of the Trimap image into the preliminary extracted image, and adjusting the preliminary extracted image to obtain the final extracted image.
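As an illustration only, the four claimed steps can be sketched as a simple pipeline; `segment_trimap`, `refine` and `fuse` are hypothetical placeholders for the semantic-segmentation network, the fine-segmentation convolution network and the fusion stage, not functions defined by the patent.

```python
import numpy as np

def auto_matte(image, segment_trimap, refine, fuse):
    """Run the four claimed steps on an HxWx3 RGB array in [0, 1]."""
    trimap = segment_trimap(image)      # semantic segmentation -> F/B/U regions
    alpha_p = refine(image, trimap)     # fine segmentation -> preliminary image
    alpha = fuse(trimap, alpha_p)       # fusion -> final extracted image
    return alpha
```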
In the invention, the step of introducing the data of the original image and the parameters of the Trimap image into a convolution network together for fine segmentation to obtain a preliminary extracted image comprises:
setting the parameters of the convolution network, and introducing the data of the original image together with the parameters of the Trimap image into the network for convolution to obtain the parameters of the first convolution layer;
performing convolution, activation and pooling operations on the parameters in each convolution layer from top to bottom, starting from the parameters of the first convolution layer, to obtain the parameters of the second convolution layer, the third convolution layer, the fourth convolution layer and the bottom layer respectively;
performing deconvolution, activation and unpooling operations on the parameters of the bottom layer together with the parameters of the fourth convolution layer to obtain the parameters of the fourth deconvolution layer;
performing deconvolution, activation and unpooling operations on the parameters of the fourth deconvolution layer together with the parameters of the third convolution layer to obtain the parameters of the third deconvolution layer;
performing deconvolution, activation and unpooling operations on the parameters of the third deconvolution layer together with the parameters of the second convolution layer to obtain the parameters of the second deconvolution layer;
performing deconvolution, activation and unpooling operations on the parameters of the second deconvolution layer together with the parameters of the first convolution layer to obtain the parameters of the first deconvolution layer;
and adjusting the number of output channels of the parameters of the first deconvolution layer to obtain the preliminary extracted image.
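The pooling and unpooling operations in the steps above pair each max-pool with recorded argmax positions, so the decoder can restore values to their original locations. A minimal NumPy sketch of one 2×2 pool/unpool pair (illustrative only, not the patent's implementation):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling that also records argmax indices for later unpooling."""
    h, w = x.shape
    # split into 2x2 blocks: one row of `blocks` per spatial block
    blocks = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(-1, 4)
    idx = blocks.argmax(axis=1)                     # winner position per block
    pooled = blocks.max(axis=1).reshape(h // 2, w // 2)
    return pooled, idx

def max_unpool_2x2(pooled, idx):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    h, w = pooled.shape
    blocks = np.zeros((h * w, 4))
    blocks[np.arange(h * w), idx] = pooled.ravel()
    return blocks.reshape(h, w, 2, 2).transpose(0, 2, 1, 3).reshape(h * 2, w * 2)
```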
In the present invention, both the convolution and the deconvolution operations include a normalization step.
In the invention, fusing the parameters of any two or more of the foreground, background and uncertain regions of the Trimap image into the preliminary extracted image, and adjusting the preliminary extracted image to obtain the final extracted image, comprises:
fusing the parameters of the foreground and the uncertain region of the Trimap image into the preliminary extracted image according to the following fusion formula to obtain the final extracted image:

alpha = F_s + U_s · alpha_p

where alpha is the final extracted image, F_s is the foreground parameter, U_s is the uncertain-region parameter, and alpha_p is the preliminary extracted image.
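The fusion formula can be sketched directly; this assumes F_s, U_s and alpha_p are per-pixel arrays in [0, 1], which the patent does not state explicitly:

```python
import numpy as np

def fuse(alpha_p, F_s, U_s):
    """Final extracted image: alpha = F_s + U_s * alpha_p, clipped to [0, 1]."""
    return np.clip(F_s + U_s * alpha_p, 0.0, 1.0)
```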
In the present invention, the acquiring data of the original image includes:
and importing a portrait image, detecting the binocular position and the head height position of the portrait in the portrait image, determining a cutting area according to the binocular position and the head height position, and acquiring an image in the cutting area to obtain data of an original image.
In the present invention, the acquiring data of the original image comprises:
and performing data enhancement processing on the data of the original image.
In the present invention, after the adjusting the preliminary extracted image to obtain the final extracted image, the method includes:
and extracting the final extracted image, and combining the extracted final extracted image with a new background image to form a new person image.
Based on the same conception, the invention also provides an automatic matting system, which comprises:
the image acquisition module is used for acquiring data of an original image;
the semantic segmentation module is connected with the image acquisition module and used for performing semantic segmentation on the data of the original image to obtain a Trimap image and dividing a foreground, a background and an uncertain region in the Trimap image;
the fine segmentation module is respectively connected with the image acquisition module and the semantic segmentation module and is used for introducing the data of the original image and the parameters of the Trimap image into a convolution network together for fine segmentation to obtain a primary extracted image;
and the image fusion module, connected to the fine segmentation module and the semantic segmentation module respectively, for fusing the parameters of any two or more of the foreground, background and uncertain regions of the Trimap image into the preliminary extracted image and adjusting it to obtain the final extracted image.
In the present invention, the fine segmentation module includes:
the convolution network comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a bottom layer, which sequentially convolve the data of the original image together with the parameters of the Trimap image from top to bottom; from the bottom layer upward, a fourth deconvolution layer, a third deconvolution layer, a second deconvolution layer and a first deconvolution layer perform deconvolution in sequence; the fourth convolution layer is connected to the fourth deconvolution layer to provide it with the fourth convolution layer's parameters; the third convolution layer is connected to the third deconvolution layer to provide it with the third convolution layer's parameters; and the second convolution layer is connected to the second deconvolution layer to provide it with the second convolution layer's parameters.
Based on the same concept, the present invention also provides a computer-readable program storage medium storing computer program instructions which, when executed by a computer, cause the computer to perform the method as described above.
In the automatic matting method of the invention, semantic segmentation is first performed on the original image to obtain a Trimap image; fine segmentation then yields a preliminary extracted image; the parameters of the Trimap image are then imported into the preliminary extracted image, which is adjusted to obtain the final extracted image. Because the parameters of the Trimap image are introduced into the preliminary extracted image, the matting result is more accurate, no manual operation is needed, labor cost is effectively saved, and working efficiency is higher.
Drawings
For the purpose of easy explanation, the present invention will be described in detail with reference to the following preferred embodiments and the accompanying drawings.
FIG. 1 is a schematic view of the workflow of one embodiment of the automatic matting method of the present invention;
FIG. 2 is a schematic view of the work flow of another embodiment of the automatic matting method of the present invention;
FIG. 3 is a schematic diagram of the operation principle of fine segmentation in the automatic matting method according to the present invention;
fig. 4 is a schematic diagram of a logical structure of an embodiment of the automatic matting system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like indicate orientations and positional relationships based on those shown in the drawings, are used only for convenience and simplicity of description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features; thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the description of the present invention, it should be noted that, unless otherwise explicitly stated or limited, the terms "mounted" and "connected" are to be construed broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
An embodiment of the present invention will be described in detail below with reference to Fig. 1, and comprises:
S101, acquiring data of an original image
Importing a portrait image; if the imported image already meets the ID-photo standard, it is used directly as the data of the original image. This case corresponds to the common real-life need to change the background color of an existing ID photo. The portrait image contains the head and upper-shoulder regions and is an RGB image.
S102, performing semantic segmentation on the original image to obtain a Trimap image
Performing semantic segmentation on the data of the original image to obtain a Trimap image divided into foreground, background and an uncertain region. A Trimap is a coarse three-way partition used in still-image matting: it roughly divides a given image into foreground, background and an uncertain region. Specifically, each pixel of the input original image is semantically segmented by a T-net network to generate the Trimap image.
S103, finely dividing the original image and the Trimap image together
Introducing the original image together with the parameters of the Trimap image into a convolution network for fine segmentation to obtain a preliminary extracted image alpha_p. In this embodiment, an M-net network performs the fine segmentation: taking the Trimap image output by the T-net and the RGB original image as input, it generates a preliminary extracted image alpha_p that describes the detail information of the portrait. The M-net adopts an encoder-decoder network similar to the U-net structure.
S104, fusing the preliminary extracted image with the Trimap image to obtain a final extracted image
The preliminary extracted image alpha_p obtained in the previous step can basically represent the true final extracted image, but some flaws remain and fusion fine-tuning is still needed. Therefore, in this step, the parameters of any two or more of the foreground, background and uncertain regions of the Trimap image are fused into the preliminary extracted image alpha_p, and the image is adjusted to obtain the final extracted image. This step mainly fuses the semantic information generated by the T-net with the structure and texture detail generated by the M-net to obtain a fine final extracted image. The final extracted image contains an alpha channel, which describes the transparency of a picture. Because the coarsely segmented Trimap image and the finely segmented preliminary extracted image alpha_p are fused to generate the final extracted image, the result appears finer and more natural; it is refined down to the hair, so the user sees no incongruity after the background is replaced.
In another embodiment, an automatic matting method according to the present invention is described in detail below with reference to fig. 2 to 3, which includes:
S201, acquiring data of the original image through cropping
Importing a portrait image; if it is a photo taken by the user, a face detection algorithm and a facial key-point localization algorithm are used to detect the positions of both eyes and the top of the head of the portrait, a cropping area is determined from these positions, and the image within the cropping area is captured so that it contains the head and shoulder regions, yielding the data of the original image. The foreground part of the original image data then meets the portrait requirements of an ID photo.
S202, data enhancement is carried out on data of the original image
The cropped original image data is used as the input RGB image data for model training, and data enhancement processing is applied to it. To increase the generalization ability of the model, a random-erasing method and image enhancement methods such as random cropping and scaling are adopted. Random cropping and scaling expand the data set and increase its expressive capacity. Random erasing selects a rectangular area of the original image at random and replaces the pixel values of that area with random values. In this way, images participating in training are occluded to different degrees, simulating real cases where part of the portrait is occluded or affected by different clothing patterns and colors. This reduces the risk of over-fitting; and because a partially occluded portrait region must still be segmented correctly, the network is forced to extract more robust local features, improving the generalization ability of the model to a certain degree.
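The random-erasing step can be sketched as follows; the size fractions are illustrative assumptions, not values given in the patent:

```python
import numpy as np

def random_erase(img, min_frac=0.05, max_frac=0.3, seed=None):
    """Replace one random rectangle of a uint8 image with random pixel values."""
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    eh = max(1, int(h * rng.uniform(min_frac, max_frac)))  # erase height
    ew = max(1, int(w * rng.uniform(min_frac, max_frac)))  # erase width
    y = int(rng.integers(0, h - eh + 1))                   # top-left corner
    x = int(rng.integers(0, w - ew + 1))
    out = img.copy()
    out[y:y + eh, x:x + ew] = rng.integers(0, 256, size=(eh, ew) + img.shape[2:],
                                           dtype=img.dtype)
    return out
```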
S203, performing semantic segmentation on the original image to obtain a Trimap image
Performing semantic segmentation on the data of the original image to obtain a Trimap image divided into foreground, background and an uncertain region. A Trimap is a coarse three-way partition used in still-image matting: it roughly divides a given image into foreground, background and an uncertain region. Specifically, each pixel of the input original image is semantically segmented by a T-net network to generate the Trimap image.
To increase the running speed of the model and reduce its size, this embodiment selects MobileNetV2 as the backbone network of the T-net. MobileNetV2 is a lightweight network that can be deployed on mobile terminals such as phones; using it as the backbone greatly reduces model size and algorithmic complexity. The method computes a cross-entropy loss between Trimap label data, generated by applying dilation and erosion to the ground-truth mask, and the Trimap generated by the network, to guide model training and parameter tuning. The cross-entropy loss function, formula (1), is:

L = -Σ_{c=1}^{M} y_{i,c} · log(p_{i,c})    (1)

where M is the number of classes, y_{i,c} is an indicator variable that is 1 if class c is the true class of sample i and 0 otherwise, and p_{i,c} is the predicted probability that sample i belongs to class c.
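Formula (1) in NumPy form, here applied per pixel to the three trimap classes (a sketch, not the patent's training code):

```python
import math
import numpy as np

def cross_entropy(p, y):
    """Mean cross entropy: -sum_c y_c * log(p_c), averaged over samples/pixels.
    p: predicted probabilities with classes on the last axis; y: one-hot labels."""
    eps = 1e-12  # avoid log(0)
    return float(-(y * np.log(p + eps)).sum(axis=-1).mean())
```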
S204, finely dividing the original image and the Trimap image together
Introducing the original image together with the parameters of the Trimap image into a convolution network for fine segmentation to obtain a preliminary extracted image alpha_p. In this embodiment, an M-net network performs the fine segmentation: taking the Trimap image output by the T-net and the RGB original image as input, it generates a preliminary extracted image alpha_p that describes the detail information of the portrait. The M-net adopts an encoder-decoder network similar to the U-net structure.
Preferably, the step of introducing the data of the original image and the parameters of the Trimap image into a convolution network together for fine segmentation to obtain the preliminary extracted image alpha_p includes:
setting the parameters of the convolution network, and introducing the data of the original image together with the parameters of the Trimap image into the network for convolution to obtain the parameters of the first convolution layer S1; performing convolution, activation and pooling operations layer by layer from top to bottom to obtain the parameters of the second convolution layer S2, the third convolution layer S3, the fourth convolution layer S4 and the bottom layer X; performing deconvolution, activation and unpooling operations on the parameters of the bottom layer X together with the parameters of the fourth convolution layer S4 to obtain the parameters of the fourth deconvolution layer S4'; performing the same operations on the parameters of S4' together with the parameters of the third convolution layer S3 to obtain the parameters of the third deconvolution layer S3', on the parameters of S3' together with the parameters of the second convolution layer S2 to obtain the parameters of the second deconvolution layer S2', and on the parameters of S2' together with the parameters of the first convolution layer S1 to obtain the parameters of the first deconvolution layer S1'; and finally adjusting the number of output channels of S1' to obtain the preliminary extracted image alpha_p. In the above steps, normalization is applied after each convolution and deconvolution.
Since the encoding stage has many parameters and is prone to over-fitting, this embodiment adds a Batch Normalization layer after each convolution layer to speed up model convergence. Batch normalization forcibly pulls the distribution of each layer's inputs back to a standard normal distribution with mean 0 and variance 1. Specifically, the encoding stage receives 320 × 320 images with 6 input channels: the three channels of the RGB image and the three channels of the Trimap image. Top-down multi-layer feature fusion is performed through a series of convolution layers with different filter counts and pooling; in the decoding stage, step-by-step up-sampling (unpooling) is performed through a series of layers with different filter counts, so that supervision and loss regression do not rely only on high-level features and the feature map of each layer is used effectively. For example, when generating the third deconvolution layer S3', the up-sampled fourth deconvolution layer S4' is added to the third convolution layer S3; the other layers are handled in the same way.
In this embodiment, the encoding stage uses 3 × 3 convolutions with 16, 24, 32, 96 and 320 filters, and the feature dimensions output by each layer are: first convolution layer S1: 320 × 320 × 16; second convolution layer S2: 160 × 160 × 24; third convolution layer S3: 80 × 80 × 32; fourth convolution layer S4: 40 × 40 × 96; bottom layer X: 20 × 20 × 320. The decoding stage also uses 3 × 3 convolutions, with 96, 32, 24 and 16 channels per layer, and the output feature dimensions are: fourth deconvolution layer S4': 40 × 40 × 96; third deconvolution layer S3': 80 × 80 × 32; second deconvolution layer S2': 160 × 160 × 24; first deconvolution layer S1': 320 × 320 × 16. Finally a convolution layer sets the output to 3 channels, generating the preliminary extracted image alpha_p with final output size 320 × 320 × 3.
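The dimensions listed above can be verified with a small shape-bookkeeping helper (illustrative; it tracks sizes only, not actual weights):

```python
def mnet_shapes(size=320):
    """Return (encoder, decoder, output) feature shapes for the dims above."""
    enc_ch = [16, 24, 32, 96, 320]   # S1..S4 and bottom layer X
    dec_ch = [96, 32, 24, 16]        # S4'..S1'
    encoder, s = [], size
    for c in enc_ch:
        encoder.append((s, s, c))
        s //= 2                      # 2x pooling between encoder levels
    decoder, s = [], size // 16      # bottom resolution (20 when size=320)
    for c in dec_ch:
        s *= 2                       # 2x unpooling between decoder levels
        decoder.append((s, s, c))
    return encoder, decoder, (size, size, 3)
```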
S205, fusing the preliminary extracted image with the Trimap image to obtain a final extracted image
The preliminary extracted image alpha_p obtained in the previous step can basically represent the true final extracted image, but some flaws remain and fusion fine-tuning is still needed. Therefore, in this step, the parameters of any two or more of the foreground, background and uncertain regions of the Trimap image are fused into the preliminary extracted image alpha_p, and the image is adjusted to obtain the final extracted image. This step mainly fuses the semantic information generated by the T-net with the structure and texture detail generated by the M-net to obtain a fine final extracted image. The final extracted image contains an alpha channel, which describes the transparency of a picture. Because the coarsely segmented Trimap image and the finely segmented preliminary extracted image alpha_p are fused to generate the final extracted image, the result appears finer and more natural; it is refined down to the hair, so the user sees no incongruity after the background is replaced.
In the fusion process of this embodiment, the result generated by the M-net is primarily used, and the result generated by the T-net serves as an auxiliary. The specific fusion mode is shown in formula 2 below:

alpha = F_s × 1 + B_s × 0 + U_s × alpha_p    (formula 2)

where alpha_p is the result generated by the M-net, and F_s, B_s and U_s are the results generated by the T-net: F_s represents the foreground, B_s represents the background, and U_s represents the uncertain region. Since F_s + B_s = 1 − U_s,
formula 2 can be written as:

alpha = F_s + U_s × alpha_p    (formula 3)

From formula 3 it can be analyzed that when U_s tends to 1, F_s tends to 0 and the final result approximates the output of the M-net; when U_s tends to 0, the term U_s × alpha_p vanishes and the final result approximates the T-net result F_s. Fusing in this way combines the coarse segmentation with the fine details very naturally.
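A minimal numerical sketch of this fusion, with illustrative per-pixel values (NumPy is used only for elementwise arithmetic):

```python
import numpy as np

def fuse(F_s, U_s, alpha_p):
    """Formula 3: alpha = F_s + U_s * alpha_p.
    Where U_s -> 1 the M-net result alpha_p dominates;
    where U_s -> 0 the T-net foreground F_s dominates."""
    return F_s + U_s * alpha_p

# Three example pixels: certain foreground, certain background,
# and an uncertain pixel (e.g. hair or an edge).
F_s     = np.array([1.0, 0.0, 0.0])
U_s     = np.array([0.0, 0.0, 1.0])
alpha_p = np.array([0.9, 0.1, 0.6])   # M-net prediction

alpha = fuse(F_s, U_s, alpha_p)       # -> [1.0, 0.0, 0.6]
```

The certain pixels take the T-net's hard decision, while the uncertain pixel takes the M-net's soft alpha value, matching the analysis of formula 3 above.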
S206, integrating the final extracted image and a new background image to form a new character image
The final extracted image is extracted and combined with a new background image to form a new portrait image. The new background image may be, for example, a solid red, white or blue image.
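The combination step can be sketched with standard alpha blending; the solid-colour background and the image sizes below are illustrative, not prescribed by the method:

```python
import numpy as np

def composite(foreground, new_background, alpha):
    """Standard alpha compositing: out = alpha * F + (1 - alpha) * B.
    alpha has shape (H, W, 1); images are float RGB in [0, 1]."""
    return alpha * foreground + (1.0 - alpha) * new_background

H, W = 4, 4
person = np.full((H, W, 3), 0.8)               # illustrative portrait pixels
red_bg = np.zeros((H, W, 3)); red_bg[..., 0] = 1.0  # solid red certificate-photo background
alpha = np.ones((H, W, 1))                     # fully foreground in this toy example

out = composite(person, red_bg, alpha)
```

With a fractional alpha (e.g. along hair strands), each output pixel is a proportional blend of the portrait and the new solid background, which is why a refined alpha channel avoids visible seams after replacement.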
The present invention includes a computer readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on the above readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
An embodiment of an automatic matting system according to the invention is described in detail below with reference to fig. 3 to 4. The system includes:
the image acquisition module 101, which is used for acquiring data of an original image. The original image comprises the head region and upper-shoulder region of a portrait. A portrait image that already meets the portrait requirements of a certificate photo can be used directly as the original image. For a portrait image that does not meet those requirements, a face detection algorithm and a face key-point positioning algorithm are used to detect the binocular position and the head height position of the portrait in the image; a cutting region is determined from the binocular position and the head height position, and the image within the cutting region is acquired, so that the cropped image contains the head region and upper-shoulder region and yields the data of the original image.
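One way to derive such a cutting region from the detected landmarks is sketched below; the proportions (`widen`, `head_margin`) are hypothetical illustration values, since the embodiment does not fix them:

```python
def crop_region(eye_left, eye_right, head_top_y, widen=2.5, head_margin=0.3):
    """Hypothetical crop: a square box centred horizontally between the eyes,
    with width proportional to the inter-eye distance and the top anchored
    slightly above the head apex, extending down to the upper shoulders.
    eye_* are (x, y) pixel coordinates; head_top_y is the y of the head apex."""
    cx = (eye_left[0] + eye_right[0]) / 2.0
    eye_dist = float(abs(eye_right[0] - eye_left[0]))
    width = eye_dist * 2 * widen                 # box side length
    y0 = head_top_y - eye_dist * head_margin     # small margin above the head
    return (cx - width / 2, y0, cx + width / 2, y0 + width)

# Example: eyes at (100, 200) and (140, 200), head apex at y = 160.
box = crop_region((100, 200), (140, 200), 160)   # -> (20.0, 148.0, 220.0, 348.0)
```

The returned (x0, y0, x1, y1) box would then be clamped to the image bounds and used to slice out the head-and-shoulders region before matting.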
The semantic segmentation module 102 is connected to the image acquisition module 101 and is configured to perform semantic segmentation on the data of the original image to obtain a Trimap image, partitioning the Trimap image into a foreground, a background and an uncertain region. A Trimap is used in static image matting to coarsely divide a given image, that is, into a foreground, a background and an uncertain region. In this embodiment, the Trimap image accordingly includes a foreground, a background and an uncertain region. Specifically, each pixel of the input original image is semantically segmented by the T-net network to generate the Trimap image.
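If the T-net outputs per-pixel class probabilities over the three regions, the Trimap image can be derived by a per-pixel argmax, sketched below; the class ordering and the pixel values 0/128/255 are illustrative conventions, not fixed by the embodiment:

```python
import numpy as np

TRIMAP_VALUES = np.array([0, 128, 255], dtype=np.uint8)  # background, uncertain, foreground

def trimap_from_probs(probs):
    """probs: (H, W, 3) softmax output over (background, uncertain, foreground),
    as a 3-class segmentation net such as the T-net here might produce.
    Returns a uint8 trimap: 0 background, 128 uncertain band, 255 foreground."""
    labels = probs.argmax(axis=-1)          # per-pixel class index in {0, 1, 2}
    return TRIMAP_VALUES[labels]

# One row of three pixels: background-ish, uncertain-ish, foreground-ish.
probs = np.array([[[0.8, 0.1, 0.1],
                   [0.1, 0.7, 0.2],
                   [0.1, 0.2, 0.7]]])
trimap = trimap_from_probs(probs)           # -> [[0, 128, 255]]
```

The 128-valued band marks the pixels that the M-net will refine in the fine-segmentation stage.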
The fine segmentation module 103 is respectively connected with the image acquisition module 101 and the semantic segmentation module 102, and is configured to introduce the data of the original image and the parameters of the Trimap image into a convolution network together for fine segmentation to obtain a preliminary extracted image alpha_p. Specifically, the M-net network performs the fine segmentation, taking the Trimap image output by the T-net and the RGB image of the original image as input to generate a rough preliminary extracted image alpha_p that describes the detail information of the portrait.
An image fusion module 104, respectively connected to the fine segmentation module 103 and the semantic segmentation module 102, configured to fuse parameters of any two or more of the foreground, background and uncertain regions in the Trimap image into the preliminary extracted image alpha_p, and to adjust the preliminary extracted image alpha_p to obtain a final extracted image. This module mainly fuses the semantic information generated by the T-net with the structure and texture detail information generated by the M-net to obtain a fine final extracted image. The final extracted image contains an alpha channel, which describes the transparency or translucency of a picture. Specifically, parameters of the foreground and the uncertain region in the Trimap image are fused into the preliminary extracted image according to the fusion formula alpha = F_s + U_s × alpha_p, where alpha is the final extracted image, F_s is the foreground parameter, U_s is the uncertain region parameter, and alpha_p is the preliminary extracted image.
In the present invention, the fine segmentation module 103 includes:
a convolution network, wherein the convolution network comprises a first convolution layer S1, a second convolution layer S2, a third convolution layer S3, a fourth convolution layer S4 and a bottom layer X, through which the data of the original image and the parameters of the Trimap image are convolved in sequence from top to bottom; above the bottom layer X, a fourth deconvolution layer S4', a third deconvolution layer S3', a second deconvolution layer S2' and a first deconvolution layer S1' are arranged in sequence from bottom to top. The fourth convolution layer S4 is connected to the fourth deconvolution layer S4' to provide it with the parameters of S4; the third convolution layer S3 is connected to the third deconvolution layer S3' to provide it with the parameters of S3; the second convolution layer S2 is connected to the second deconvolution layer S2' to provide it with the parameters of S2. For example, in generating the third deconvolution layer S3', the upsampled features of the fourth deconvolution layer S4' and the features of the fourth convolution layer S4 are added together; the other layers are handled in the same way.
In this embodiment, the encoding stage uses 3 × 3 convolutions; the number of channels of each layer is 16, 24, 32, 96 and 320, and the dimensions of the features output by each layer are: first convolution layer S1: 320 × 320 × 16; second convolution layer S2: 160 × 160 × 24; third convolution layer S3: 80 × 80 × 32; fourth convolution layer S4: 40 × 40 × 96; bottom layer X: 20 × 20 × 320. The decoding stage also uses 3 × 3 convolution layers; the number of channels of each layer is 96, 32, 24 and 16, and the dimensions of the features output by each layer are: fourth deconvolution layer S4': 40 × 40 × 96; third deconvolution layer S3': 80 × 80 × 32; second deconvolution layer S2': 160 × 160 × 24; first deconvolution layer S1': 320 × 320 × 16. Finally, a convolution layer sets the number of output channels to 3, generating the preliminary extracted image alpha_p with a final output size of 320 × 320 × 3.
In the description of the present specification, reference to the description of the terms "one embodiment", "some embodiments", "an illustrative embodiment", "an example", "a specific example", or "some examples", etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (10)
1. An automatic matting method, characterized by comprising:
acquiring data of an original image;
performing semantic segmentation on the data of the original image to obtain a Trimap image, and dividing a foreground, a background and an uncertain region in the Trimap image;
introducing the data of the original image and the parameters of the Trimap image into a convolution network together for fine segmentation to obtain a preliminary extracted image;
and fusing parameters of any more than two areas of the foreground, the background and the uncertain area in the Trimap image in the preliminarily extracted image, and adjusting the preliminarily extracted image to obtain a finally extracted image.
2. The automatic matting method according to claim 1, wherein the step of introducing the data of the original image and the parameters of the Trimap image into a convolution network for fine segmentation to obtain a preliminary extracted image comprises:
setting parameters of a convolution network, and introducing the data of the original image and the parameters of the Trimap image into the convolution network together for convolution to obtain parameters of a first convolution layer;
performing convolution, activation and pooling operations on the parameters in each convolution layer from top to bottom according to the parameters of the first convolution layer to respectively obtain parameters of a second convolution layer, parameters of a third convolution layer, parameters of a fourth convolution layer and parameters of the bottom layer;
carrying out deconvolution, activation and unpooling operations on the parameters of the bottommost layer and the parameters of the fourth convolution layer together to obtain parameters of the fourth deconvolution layer;
carrying out deconvolution, activation and unpooling operations on the parameters of the fourth deconvolution layer and the parameters of the third convolution layer together to obtain parameters of the third deconvolution layer;
carrying out deconvolution, activation and unpooling operations on the parameters of the third deconvolution layer and the parameters of the second convolution layer together to obtain parameters of the second deconvolution layer;
carrying out deconvolution, activation and unpooling operations on the parameters of the second deconvolution layer and the parameters of the first convolution layer together to obtain parameters of the first deconvolution layer;
and adjusting the number of output channels of the parameters of the first deconvolution layer to obtain a preliminary extracted image.
3. The automatic matting method according to claim 2, characterized in that both said convolution and said deconvolution operations are followed by normalization processing.
4. The automatic matting method according to claim 1, wherein fusing parameters of any two or more of a foreground, a background and an uncertain region in the Trimap image in the preliminary extracted image, and adjusting the preliminary extracted image to obtain a final extracted image comprises:
fusing parameters of the foreground and the uncertain region in the Trimap image in the preliminary extracted image according to a fusion formula to obtain a final extracted image, wherein the fusion formula is: alpha = F_s + U_s × alpha_p, where alpha is the final extracted image, F_s is the foreground parameter, U_s is the uncertain region parameter, and alpha_p is the preliminary extracted image.
5. The automatic matting method according to claim 4, wherein the acquiring data of an original image includes:
and importing a portrait image, detecting the binocular position and the head height position of the portrait in the portrait image, determining a cutting area according to the binocular position and the head height position, and acquiring an image in the cutting area to obtain data of an original image.
6. The automatic matting method according to claim 5, characterized in that after said acquiring data of an original image, the method further comprises:
and performing data enhancement processing on the data of the original image.
7. The automatic matting method according to claim 6, characterized in that after said adjusting the preliminary extracted image to obtain the final extracted image, the method further comprises:
and extracting the final extracted image, and combining the extracted final extracted image with a new background image to form a new person image.
8. An automatic matting system, comprising:
the image acquisition module is used for acquiring data of an original image;
the semantic segmentation module is connected with the image acquisition module and used for performing semantic segmentation on the data of the original image to obtain a Trimap image and dividing a foreground, a background and an uncertain region in the Trimap image;
the fine segmentation module is respectively connected with the image acquisition module and the semantic segmentation module and is used for introducing the data of the original image and the parameters of the Trimap image into a convolution network together for fine segmentation to obtain a preliminary extracted image;
and the image fusion module is respectively connected with the fine segmentation module and the semantic segmentation module and is used for fusing parameters of any two or more of the foreground, the background and the uncertain region in the Trimap image in the preliminary extracted image, and adjusting the preliminary extracted image to obtain a final extracted image.
9. The automatic matting system according to claim 8, wherein the fine segmentation module includes:
the convolution network comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a bottom layer, through which the data of the original image and the parameters of the Trimap image are convolved in sequence from top to bottom; and a fourth deconvolution layer, a third deconvolution layer, a second deconvolution layer and a first deconvolution layer for deconvolution are sequentially arranged above the bottommost layer from bottom to top; the fourth convolution layer is connected with the fourth deconvolution layer and used for providing parameters of the fourth convolution layer for the fourth deconvolution layer; the third convolution layer is connected with the third deconvolution layer and used for providing parameters of the third convolution layer for the third deconvolution layer; the second convolution layer is connected with the second deconvolution layer and used for providing parameters of the second convolution layer for the second deconvolution layer.
10. A computer-readable program storage medium storing computer program instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011531661.9A CN112581480A (en) | 2020-12-22 | 2020-12-22 | Automatic image matting method, system and readable storage medium thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112581480A true CN112581480A (en) | 2021-03-30 |
Family
ID=75139410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011531661.9A Pending CN112581480A (en) | 2020-12-22 | 2020-12-22 | Automatic image matting method, system and readable storage medium thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112581480A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018107825A1 (en) * | 2016-12-13 | 2018-06-21 | 华为技术有限公司 | Matting method and device |
US20190080456A1 (en) * | 2017-09-12 | 2019-03-14 | Shenzhen Keya Medical Technology Corporation | Method and system for performing segmentation of image having a sparsely distributed object |
CN109948562A (en) * | 2019-03-25 | 2019-06-28 | 浙江啄云智能科技有限公司 | A kind of safe examination system deep learning sample generating method based on radioscopic image |
CN110008832A (en) * | 2019-02-27 | 2019-07-12 | 西安电子科技大学 | Based on deep learning character image automatic division method, information data processing terminal |
CN110610509A (en) * | 2019-09-18 | 2019-12-24 | 上海大学 | Optimized matting method and system capable of assigning categories |
CN111223106A (en) * | 2019-10-28 | 2020-06-02 | 稿定(厦门)科技有限公司 | Full-automatic portrait mask matting method and system |
Non-Patent Citations (2)
Title |
---|
QUAN CHEN ET AL.: "Semantic Human Matting", ACM Multimedia 2018, pages 618-626 *
ZHANG Shenglin et al.: "Multi-focus image fusion method based on image matting technology", Journal of Computer Applications (《计算机应用》), vol. 36, page 1949 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113409224A (en) * | 2021-07-09 | 2021-09-17 | 浙江大学 | Image target pertinence enhancing method, device, equipment and storage medium |
CN113409224B (en) * | 2021-07-09 | 2023-07-04 | 浙江大学 | Image target pertinence enhancement method, device, equipment and storage medium |
WO2023137905A1 (en) * | 2022-01-21 | 2023-07-27 | 小米科技(武汉)有限公司 | Image processing method and apparatus, and electronic device and storage medium |
CN114820666A (en) * | 2022-04-29 | 2022-07-29 | 深圳万兴软件有限公司 | Method and device for increasing matting accuracy, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112581480A (en) | Automatic image matting method, system and readable storage medium thereof | |
WO2021164534A1 (en) | Image processing method and apparatus, device, and storage medium | |
CN110400323B (en) | Automatic cutout system, method and device | |
CN110889855B (en) | Certificate photo matting method and system based on end-to-end convolution neural network | |
US8655069B2 (en) | Updating image segmentation following user input | |
Xiao et al. | Example‐Based Colourization Via Dense Encoding Pyramids | |
CN114283164B (en) | Breast cancer pathological section image segmentation prediction system based on UNet3+ | |
CN110866938B (en) | Full-automatic video moving object segmentation method | |
CN105701489A (en) | Novel digital extraction and identification method and system thereof | |
CN112949754B (en) | Text recognition data synthesis method based on image fusion | |
CN110827371A (en) | Certificate photo generation method and device, electronic equipment and storage medium | |
CN112784849B (en) | Glandular segmentation method based on multi-scale attention selection | |
CN113392791A (en) | Skin prediction processing method, device, equipment and storage medium | |
CN105580050A (en) | Providing control points in images | |
CN113158856B (en) | Processing method and device for extracting target area in remote sensing image | |
CN115471901B (en) | Multi-pose face frontization method and system based on generation of confrontation network | |
CN111325263A (en) | Image processing method and device, intelligent microscope, readable storage medium and equipment | |
CN116485944A (en) | Image processing method and device, computer readable storage medium and electronic equipment | |
CN115641317A (en) | Pathological image-oriented dynamic knowledge backtracking multi-example learning and image classification method | |
CN114882282A (en) | Neural network prediction method for colorectal cancer treatment effect based on MRI and CT images | |
CN111932557B (en) | Image semantic segmentation method and device based on ensemble learning and probability map model | |
CN114494272A (en) | Metal part fast segmentation method based on deep learning | |
CN114373109A (en) | Natural image matting method and natural image matting device based on deep learning | |
CN114708274A (en) | Image segmentation method and system of T-CutMix data enhancement and three-dimensional convolution neural network based on real-time selection mechanism | |
EP2698693B1 (en) | Local image translating method and terminal with touch screen |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||