CN112581480A - Automatic image matting method, system and readable storage medium thereof


Info

Publication number
CN112581480A
CN112581480A
Authority
CN
China
Prior art keywords
image
parameters
layer
convolution
deconvolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011531661.9A
Other languages
Chinese (zh)
Inventor
李增前
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Emperor Technology Co Ltd
Original Assignee
Shenzhen Emperor Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Emperor Technology Co Ltd filed Critical Shenzhen Emperor Technology Co Ltd
Priority to CN202011531661.9A
Publication of CN112581480A
Legal status: Pending

Classifications

    • G06T 7/11: Region-based segmentation (G06T 7/00 Image analysis; G06T 7/10 Segmentation; Edge detection)
    • G06N 3/045: Combinations of networks (G06N 3/04 Neural network architectures)
    • G06N 3/08: Learning methods (G06N 3/02 Neural networks)
    • G06T 7/187: Segmentation or edge detection involving region growing, region merging or connected component labelling
    • G06T 2207/10004: Still image; Photographic image (image acquisition modality)
    • G06T 2207/20081: Training; Learning (special algorithmic details)
    • G06T 2207/20084: Artificial neural networks [ANN] (special algorithmic details)
    • G06T 2207/30196: Human being; Person (subject of image)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the field of image processing, and in particular to an automatic matting method, an automatic matting system and a readable storage medium. The method comprises the following steps: acquiring data of an original image; performing semantic segmentation on the data of the original image to obtain a Trimap image in which a foreground, a background and an uncertain region are delimited; introducing the data of the original image and the parameters of the Trimap image together into a convolution network for fine segmentation to obtain a preliminary extracted image; and fusing, in the preliminary extracted image, the parameters of any two or more of the foreground, the background and the uncertain region of the Trimap image, and adjusting the preliminary extracted image to obtain a final extracted image. Because the parameters of the Trimap image are introduced into the preliminary extracted image, the matting result is more accurate, no manual operation is needed, labor cost is effectively saved, and working efficiency is higher.

Description

Automatic image matting method, system and readable storage medium thereof
Technical Field
The invention relates to the technical field of image processing, and in particular to an automatic matting method, an automatic matting system and a readable storage medium storing the method.
Background
With the development of science and technology, digital images are widely used in social life. Before a digital image is used, some image processing is usually required, and matting is one of the most frequently performed operations in image processing: the foreground part of a picture or image is extracted from the background of the original picture or image to form a separate layer, mainly in preparation for later image synthesis.
In current practice, matting is mainly done by designating all or part of the foreground and background regions manually. Because the composite background of a certificate photo is a pure colour, tiny matting flaws are easily magnified in the composite image, so the foreground boundary of a certificate photo must be extracted with high precision. Therefore, when a certificate photo is produced, the captured photo is usually matted by hand with a matting tool and the background is replaced manually. The matting must be fine down to individual hairs so that the hair and the new background show no visible discordance; this usually requires a highly skilled professional, and the process is cumbersome, time-consuming and laborious. The automatic matting methods currently on the market are generally image segmentation schemes based on deep learning; they usually obtain only relatively coarse segmentation results, so it is difficult to generate a fine extracted image directly, and the requirements of certificate photos cannot be met.
Disclosure of Invention
In order to overcome the above-mentioned drawbacks, the present invention provides a method and a system for automatic image matting with high speed, high efficiency and high precision, and a readable storage medium storing the method.
The purpose of the invention is realized by the following technical scheme:
the invention relates to an automatic image matting method, which comprises the following steps:
acquiring data of an original image;
performing semantic segmentation on the data of the original image to obtain a Trimap image, and dividing a foreground, a background and an uncertain region in the Trimap image;
introducing the data of the original image and the parameters of the Trimap image together into a convolution network for fine segmentation to obtain a preliminary extracted image;
and fusing, in the preliminary extracted image, the parameters of any two or more of the foreground, the background and the uncertain region of the Trimap image, and adjusting the preliminary extracted image to obtain a final extracted image.
In the invention, the step of introducing the data of the original image and the parameters of the Trimap image together into a convolution network for fine segmentation to obtain a preliminary extracted image comprises the following steps:
setting parameters of a convolution network, and introducing the data of the original image and the parameters of the Trimap image into the convolution network together for convolution to obtain parameters of a first convolution layer;
performing convolution, activation and pooling operations on the parameters in each convolution layer from top to bottom according to the parameters of the first convolution layer to respectively obtain parameters of a second convolution layer, parameters of a third convolution layer, parameters of a fourth convolution layer and parameters of a bottommost layer;
performing deconvolution, activation and unpooling operations on the parameters of the bottommost layer together with the parameters of the fourth convolution layer to obtain parameters of a fourth deconvolution layer;
performing deconvolution, activation and unpooling operations on the parameters of the fourth deconvolution layer together with the parameters of the third convolution layer to obtain parameters of a third deconvolution layer;
performing deconvolution, activation and unpooling operations on the parameters of the third deconvolution layer together with the parameters of the second convolution layer to obtain parameters of a second deconvolution layer;
performing deconvolution, activation and unpooling operations on the parameters of the second deconvolution layer together with the parameters of the first convolution layer to obtain parameters of a first deconvolution layer;
and adjusting the number of output channels of the parameters of the first deconvolution layer to obtain a preliminary extracted image.
In the present invention, each of the convolution operations and each of the deconvolution operations is followed by a normalization process.
In the invention, fusing, in the preliminary extracted image, the parameters of any two or more of the foreground, the background and the uncertain region of the Trimap image, and adjusting the preliminary extracted image to obtain the final extracted image comprises:
fusing the parameters of the foreground and the uncertain region of the Trimap image in the preliminary extracted image according to a fusion formula to obtain the final extracted image, wherein the fusion formula is as follows:
alpha_f = F_s + U_s · alpha_p
wherein alpha_f is the final extracted image; F_s is the foreground parameter; U_s is the uncertain region parameter; and alpha_p is the preliminary extracted image.
In the present invention, the acquiring of the data of the original image comprises:
importing a portrait image, detecting the positions of both eyes and the head-top position of the portrait in the portrait image, determining a cropping region according to the eye positions and the head-top position, and acquiring the image within the cropping region to obtain the data of the original image.
In the present invention, after the acquiring of the data of the original image, the method comprises:
performing data enhancement processing on the data of the original image.
In the present invention, after the adjusting of the preliminary extracted image to obtain the final extracted image, the method comprises:
extracting the final extracted image, and combining the extracted final extracted image with a new background image to form a new portrait image.
Based on the same conception, the invention also provides an automatic matting system, which comprises:
the image acquisition module, which is used for acquiring the data of an original image;
the semantic segmentation module, which is connected with the image acquisition module and used for performing semantic segmentation on the data of the original image to obtain a Trimap image and dividing a foreground, a background and an uncertain region in the Trimap image;
the fine segmentation module, which is respectively connected with the image acquisition module and the semantic segmentation module and is used for introducing the data of the original image and the parameters of the Trimap image together into a convolution network for fine segmentation to obtain a preliminary extracted image;
and the image fusion module, which is respectively connected with the fine segmentation module and the semantic segmentation module and is used for fusing, in the preliminary extracted image, the parameters of any two or more of the foreground, the background and the uncertain region of the Trimap image and adjusting the preliminary extracted image to obtain a final extracted image.
In the present invention, the fine segmentation module includes:
the convolution network, which comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a bottommost layer that successively convolve, from top to bottom, the data of the original image together with the parameters of the Trimap image; above the bottommost layer, a fourth deconvolution layer, a third deconvolution layer, a second deconvolution layer and a first deconvolution layer perform deconvolution in sequence from bottom to top; the fourth convolution layer is connected with the fourth deconvolution layer and is used for providing the parameters of the fourth convolution layer to the fourth deconvolution layer; the third convolution layer is connected with the third deconvolution layer and is used for providing the parameters of the third convolution layer to the third deconvolution layer; and the second convolution layer is connected with the second deconvolution layer and is used for providing the parameters of the second convolution layer to the second deconvolution layer.
Based on the same concept, the present invention also provides a computer-readable program storage medium storing computer program instructions which, when executed by a computer, cause the computer to perform the method as described above.
In the automatic image matting method of the invention, semantic segmentation is first performed on the original image to obtain a Trimap image; the Trimap image is then finely segmented together with the original image to obtain a preliminary extracted image; the parameters of the Trimap image are then introduced into the preliminary extracted image, and the preliminary extracted image is adjusted to obtain the final extracted image. Because the parameters of the Trimap image are introduced into the preliminary extracted image, the matting result is more accurate, no manual operation is needed, labor cost is effectively saved, and working efficiency is higher.
Drawings
For the purpose of easy explanation, the present invention will be described in detail with reference to the following preferred embodiments and the accompanying drawings.
FIG. 1 is a schematic view of the workflow of one embodiment of the automatic matting method of the present invention;
FIG. 2 is a schematic view of the work flow of another embodiment of the automatic matting method of the present invention;
FIG. 3 is a schematic diagram of the operation principle of fine segmentation in the automatic matting method according to the present invention;
fig. 4 is a schematic diagram of a logical structure of an embodiment of the automatic matting system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, it is to be understood that terms indicating orientation or positional relationship, such as "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise" and "counterclockwise", are based on the orientations or positional relationships shown in the drawings, are used only for convenience and simplicity of description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted", "connected" and "coupled" are to be construed broadly: the connection may be, for example, a fixed connection, a detachable connection or an integral connection; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
An embodiment of the present invention will be described in detail below with reference to fig. 1, which includes:
S101, acquiring data of an original image
Importing a portrait image: if the portrait image already meets the certificate-photo standard, the imported portrait image is directly taken as the data of the original image. This covers the common real-life situation in which only the background colour of an existing certificate photo needs to be changed. The portrait image comprises the head and upper-shoulder area of the person and is an RGB image.
S102, performing semantic segmentation on the original image to obtain a Trimap image
Performing semantic segmentation on the data of the original image yields a Trimap image in which a foreground, a background and an uncertain region are delimited. A Trimap is used in static image matting to roughly divide a given image, that is, to divide it into a foreground, a background and an uncertain region; accordingly, in this implementation the Trimap image contains a foreground, a background and an uncertain region. Specifically, each pixel of the input original image is semantically segmented by a T-net network to generate the Trimap image.
S103, finely segmenting the original image together with the Trimap image
The original image and the parameters of the Trimap image are introduced together into a convolution network for fine segmentation to obtain a preliminary extracted image alpha_p. In this embodiment, an M-net network is used for the fine segmentation: taking the Trimap image output by the T-net and the RGB image of the original image as input, it generates a rough preliminary extracted image alpha_p that describes the detail information of the portrait. The M-net adopts an encoder-decoder network similar to the U-net structure.
S104, fusing the preliminary extracted image with the Trimap image to obtain a final extracted image
The preliminary extracted image alpha_p obtained in the previous step can already represent the true final extracted image fairly well, but some flaws remain and fusion fine-tuning is still needed. Therefore, in this step the parameters of any two or more of the foreground, the background and the uncertain region of the Trimap image are fused in the preliminary extracted image alpha_p, and the preliminary extracted image alpha_p is adjusted to obtain the final extracted image. This step mainly fuses the semantic information generated by the T-net with the structural and textural detail information generated by the M-net to obtain a fine final extracted image. The final extracted image contains an alpha channel, which describes the transparency or semi-transparency of a picture. In this embodiment, because the coarsely segmented Trimap image and the finely segmented preliminary extracted image alpha_p are fused to generate the final extracted image, the result appears finer and more natural. The generated final extracted image can be refined down to individual hairs, so that the user sees no incongruity after the background is replaced.
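For orientation, the overall flow of S101 to S104 can be sketched in code. The following is a minimal illustrative sketch in Python/PyTorch only: the class names TNet and MNet, the channel ordering of the Trimap output and the treatment of alpha_p as a single-channel matte are assumptions made for the sketch, not details fixed by the patent.

```python
import torch

def automatic_matting(image_rgb, t_net, m_net):
    """Sketch of steps S101-S104.

    image_rgb: tensor of shape (1, 3, H, W) with values in [0, 1]  (S101).
    t_net:     semantic segmentation network producing 3-class trimap logits.
    m_net:     encoder-decoder refinement network.
    """
    # S102: semantic segmentation -> Trimap probabilities (assumed channel
    # order: foreground, background, uncertain region).
    trimap = torch.softmax(t_net(image_rgb), dim=1)          # (1, 3, H, W)
    f_s, u_s = trimap[:, 0:1], trimap[:, 2:3]

    # S103: fine segmentation on the concatenated RGB + Trimap input.
    m_in = torch.cat([image_rgb, trimap], dim=1)             # (1, 6, H, W)
    alpha_p = torch.sigmoid(m_net(m_in))                     # preliminary matte

    # S104: fuse the coarse Trimap with the fine preliminary matte.
    alpha_f = f_s + u_s * alpha_p                            # final matte
    return alpha_f
```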
In another embodiment, an automatic matting method according to the present invention is described in detail below with reference to fig. 2 to 3, which includes:
S201, acquiring the data of the original image through cropping
Importing a portrait image: if the portrait image is a photo taken by the user, a face detection algorithm and a face key-point localization algorithm are used to detect the positions of both eyes and the head-top position of the portrait in the portrait image, a cropping region is determined according to the eye positions and the head-top position, and the image within the cropping region is acquired, so that the cropped image contains the head area and upper-shoulder area of the portrait, thereby obtaining the data of the original image. The foreground part of the data of the original image thus meets the requirements of a certificate photo on the portrait.
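As an illustration of this cropping step, the sketch below uses OpenCV's Haar cascade face detector purely as a stand-in; the patent does not prescribe a particular face detection or key-point localization algorithm, and the margin factors that extend the face box to a head-and-shoulders crop are illustrative guesses.

```python
import cv2

def crop_portrait(image_bgr):
    """Crop a head-and-upper-shoulders region around the detected face (S201)."""
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return image_bgr                                  # no face: keep original
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])    # largest detected face

    # Expand the face box so the crop keeps the whole head and the shoulders.
    h_img, w_img = image_bgr.shape[:2]
    top = max(0, int(y - 0.6 * h))                        # room above the head top
    bottom = min(h_img, int(y + 2.2 * h))                 # down to the shoulders
    left = max(0, int(x - 0.6 * w))
    right = min(w_img, int(x + 1.6 * w))
    return image_bgr[top:bottom, left:right]
```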
S202, performing data enhancement on the data of the original image
The data of the cropped original image is taken as the input RGB image data for model training, and data enhancement processing is performed on it. In order to increase the generalization ability of the model, a random erasing enhancement method and image enhancement methods such as random cropping and scaling are adopted. Random cropping and scaling of the image expand the data set and increase the expressive power of the data. The random erasing method randomly selects a rectangular area on the original image and replaces the pixel values of this area with random values. In this process the pictures participating in training are occluded to different degrees, which simulates practical situations in which the portrait part of the image is partially occluded or affected by the patterns and colours of different clothes. This reduces the risk of over-fitting and improves the generalization performance of the model: because the portrait area must still be segmented correctly even when it is partially occluded, the network is forced to extract locally more robust features, which improves the generalization ability of the model to a certain degree.
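The two enhancement operations described above, random erasing and random cropping with scaling, could be sketched as follows; the area and scale ranges are illustrative assumptions, and in real training the same geometric transform would also be applied to the segmentation label.

```python
import random
import cv2
import numpy as np

def random_erase(image, min_area=0.02, max_area=0.2):
    """Replace one randomly placed rectangle with random pixel values."""
    h, w = image.shape[:2]
    area = random.uniform(min_area, max_area) * h * w
    aspect = random.uniform(0.3, 3.3)
    eh, ew = int(round((area * aspect) ** 0.5)), int(round((area / aspect) ** 0.5))
    if 0 < eh < h and 0 < ew < w:
        y, x = random.randint(0, h - eh), random.randint(0, w - ew)
        image = image.copy()
        image[y:y + eh, x:x + ew] = np.random.randint(
            0, 256, size=(eh, ew, image.shape[2]), dtype=np.uint8)
    return image

def random_crop_and_scale(image, out_size=320, min_scale=0.8):
    """Randomly crop a sub-window and rescale it to the training size."""
    h, w = image.shape[:2]
    scale = random.uniform(min_scale, 1.0)
    ch, cw = int(h * scale), int(w * scale)
    y, x = random.randint(0, h - ch), random.randint(0, w - cw)
    return cv2.resize(image[y:y + ch, x:x + cw], (out_size, out_size))
```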
S203, performing semantic segmentation on the original image to obtain a Trimap image
Performing semantic segmentation on the data of the original image yields a Trimap image in which a foreground, a background and an uncertain region are delimited. A Trimap is used in static image matting to roughly divide a given image, that is, to divide it into a foreground, a background and an uncertain region; accordingly, in this implementation the Trimap image contains a foreground, a background and an uncertain region. Specifically, each pixel of the input original image is semantically segmented by a T-net network to generate the Trimap image.
In order to increase the running speed of the model and reduce its size, MobileNetV2 is selected as the backbone network of the T-net in this embodiment. MobileNetV2 is a lightweight network that can be deployed on mobile terminals such as mobile phones; using it as the backbone greatly reduces the size of the model and the complexity of the algorithm. In this method, a cross-entropy loss is computed between the Trimap label data, which is generated by dilating and eroding the ground-truth label, and the Trimap generated by the network, so as to guide model training and parameter tuning. The cross-entropy loss function, formula (1), is as follows:
Loss = -∑_{i=1}^{M} y_i log(p_i)        (1)
where M is the number of classes, y_i is an indicator variable that equals 1 if class i is the true class of the observation sample and 0 otherwise, and p_i is the predicted probability that the observation sample belongs to class i.
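A hedged sketch of how the T-net and the cross-entropy loss of formula (1) might be set up in PyTorch is shown below; torchvision's MobileNetV2 feature extractor is used as the assumed backbone, the 1 × 1 convolution head with bilinear up-sampling is illustrative (the patent gives no T-net decoder details), and the generation of Trimap labels by dilation and erosion of the ground truth is not shown.

```python
import torch
import torch.nn as nn
import torchvision

class TNet(nn.Module):
    """Per-pixel 3-class (foreground / background / uncertain) Trimap predictor."""
    def __init__(self, num_classes=3):
        super().__init__()
        # Lightweight MobileNetV2 backbone, as suggested for mobile deployment.
        self.backbone = torchvision.models.mobilenet_v2().features
        self.head = nn.Conv2d(1280, num_classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        logits = self.head(self.backbone(x))      # (N, 3, h/32, w/32)
        # Up-sample back to the input resolution for per-pixel classification.
        return nn.functional.interpolate(
            logits, size=(h, w), mode="bilinear", align_corners=False)

# Cross-entropy loss of formula (1) against a Trimap label map.
t_net = TNet()
images = torch.randn(2, 3, 320, 320)
trimap_labels = torch.randint(0, 3, (2, 320, 320))   # per-pixel class indices
loss = nn.CrossEntropyLoss()(t_net(images), trimap_labels)
loss.backward()
```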
S204, finely segmenting the original image together with the Trimap image
The original image and the parameters of the Trimap image are introduced together into a convolution network for fine segmentation to obtain a preliminary extracted image alpha_p. In this embodiment, an M-net network is used for the fine segmentation: taking the Trimap image output by the T-net and the RGB image of the original image as input, it generates a rough preliminary extracted image alpha_p that describes the detail information of the portrait. The M-net adopts an encoder-decoder network similar to the U-net structure.
Preferably, the step of introducing the data of the original image and the parameters of the Trimap image together into a convolution network for fine segmentation to obtain the preliminary extracted image alpha_p comprises:
setting the parameters of the convolution network, and introducing the data of the original image and the parameters of the Trimap image together into the convolution network for convolution to obtain the parameters of a first convolution layer S1; performing convolution, activation and pooling operations on the parameters in each convolution layer from top to bottom according to the parameters of the first convolution layer S1 to obtain the parameters of a second convolution layer S2, a third convolution layer S3, a fourth convolution layer S4 and a bottommost layer X; performing deconvolution, activation and unpooling operations on the parameters of the bottommost layer X together with the parameters of the fourth convolution layer S4 to obtain the parameters of a fourth deconvolution layer S4'; performing deconvolution, activation and unpooling operations on the parameters of the fourth deconvolution layer S4' together with the parameters of the third convolution layer S3 to obtain the parameters of a third deconvolution layer S3'; performing deconvolution, activation and unpooling operations on the parameters of the third deconvolution layer S3' together with the parameters of the second convolution layer S2 to obtain the parameters of a second deconvolution layer S2'; performing deconvolution, activation and unpooling operations on the parameters of the second deconvolution layer S2' together with the parameters of the first convolution layer S1 to obtain the parameters of a first deconvolution layer S1'; and adjusting the number of output channels of the parameters of the first deconvolution layer S1' to obtain the preliminary extracted image alpha_p.
In the above steps, normalization is required after each convolution and each deconvolution. Since the encoding stage contains a large number of parameters and is therefore prone to over-fitting, this embodiment adds a Batch Normalization layer after every convolution layer to speed up the convergence of the model. Batch normalization forcibly pulls the distribution of the input values of every neuron in each layer back to a standard normal distribution with mean 0 and variance 1. Specifically, in the encoding stage the method receives 320 × 320 images with 6 input channels: the three channels of the RGB image and the three channels of the Trimap image. Top-down multi-layer feature fusion is carried out through a series of convolution layers with different numbers of filters together with pooling, and in the decoding stage step-by-step up-sampling (unpooling) is carried out through a series of layers with different numbers of filters, so that supervision and loss regression do not rely only on the highest-level features and the feature map of each layer is effectively utilized. For example, when generating the third deconvolution layer S3', the up-sampled fourth deconvolution layer S4' is added to the third convolution layer S3, and the same applies to the other layers.
In this embodiment, the encoding stage uses 3 × 3 convolutions, and the numbers of filters in the successive layers are 16, 24, 32, 96 and 320, so the dimensions of the features output by each layer are: first convolution layer S1: 320 × 320 × 16; second convolution layer S2: 160 × 160 × 24; third convolution layer S3: 80 × 80 × 32; fourth convolution layer S4: 40 × 40 × 96; bottommost layer X: 20 × 20 × 320. The decoding stage also uses 3 × 3 convolutions, the numbers of channels in the successive layers are 96, 32, 24 and 16, and the dimensions of the features output by each layer are: fourth deconvolution layer S4': 40 × 40 × 96; third deconvolution layer S3': 80 × 80 × 32; second deconvolution layer S2': 160 × 160 × 24; first deconvolution layer S1': 320 × 320 × 16. Finally, a convolution layer sets the output to 3 channels, generating the preliminary extracted image alpha_p with a final output size of 320 × 320 × 3.
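A compact PyTorch sketch of an M-net-style encoder-decoder with exactly these channel counts is given below. The block composition (3 × 3 convolution, batch normalization, ReLU activation, max pooling and nearest-neighbour up-sampling) and the additive skip connections are assumptions consistent with the description above, not a verbatim reproduction of the patented network.

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout):
    # 3x3 convolution followed by batch normalization and activation.
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class MNet(nn.Module):
    """U-net-like refinement network: RGB (3) + Trimap (3) in, 3 channels out."""
    def __init__(self):
        super().__init__()
        enc_ch = [6, 16, 24, 32, 96, 320]     # S1..S4 and the bottommost layer X
        dec_ch = [320, 96, 32, 24, 16]        # S4'..S1'
        self.enc = nn.ModuleList(conv_bn_relu(enc_ch[i], enc_ch[i + 1])
                                 for i in range(5))
        self.dec = nn.ModuleList(conv_bn_relu(dec_ch[i], dec_ch[i + 1])
                                 for i in range(4))
        self.pool = nn.MaxPool2d(2)
        self.out = nn.Conv2d(16, 3, 3, padding=1)

    def forward(self, x):                     # x: (N, 6, 320, 320)
        skips, h = [], x
        for i, block in enumerate(self.enc):  # encode: S1 .. S4, then X
            h = block(h)
            if i < 4:
                skips.append(h)               # keep S1..S4 for skip connections
                h = self.pool(h)              # halve the spatial resolution
        for i, block in enumerate(self.dec):  # decode: S4' .. S1'
            h = nn.functional.interpolate(h, scale_factor=2, mode="nearest")
            h = block(h) + skips[3 - i]       # add the matching encoder feature
        return self.out(h)                    # (N, 3, 320, 320) = alpha_p

# Usage example: m_in = torch.cat([rgb, trimap], dim=1); alpha_p = MNet()(m_in)
```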
S205, fusing the preliminary extracted image with the Trimap image to obtain a final extracted image
The preliminary extracted image alpha_p obtained in the previous step can already represent the true final extracted image fairly well, but some flaws remain and fusion fine-tuning is still needed. Therefore, in this step the parameters of any two or more of the foreground, the background and the uncertain region of the Trimap image are fused in the preliminary extracted image alpha_p, and the preliminary extracted image alpha_p is adjusted to obtain the final extracted image. This step mainly fuses the semantic information generated by the T-net with the structural and textural detail information generated by the M-net to obtain a fine final extracted image. The final extracted image contains an alpha channel, which describes the transparency or semi-transparency of a picture. In this embodiment, because the coarsely segmented Trimap image and the finely segmented preliminary extracted image alpha_p are fused to generate the final extracted image, the result appears finer and more natural. The generated final extracted image can be refined down to individual hairs, so that the user sees no incongruity after the background is replaced.
In the fusion process of the embodiment, the result generated by the M-net is mainly used, and the result generated by the T-net is used as an auxiliary. The specific fusion mode is shown in formula 2 below:
alpha_f = F_s · 1 + B_s · 0 + U_s · alpha_p        (2)
wherein alpha_p is the result generated by the M-net, and F_s, B_s and U_s are the results generated by the T-net: F_s represents the foreground, B_s the background and U_s the uncertain region. Since F_s + B_s = 1 - U_s, formula 2 can be written as:
alpha_f = F_s + U_s · alpha_p        (3)
From formula 3 it can be seen that when U_s tends to 1, F_s tends to 0 and the final result approximates the output of the M-net; when U_s tends to 0, F_s tends to 1 and the final result approximates the Trimap foreground F_s. Fusing in this way combines the coarse segmentation with the fine details very naturally.
S206, combining the final extracted image with a new background image to form a new portrait image
The final extracted image is extracted and combined with a new background image to form a new portrait image. The new background image may be, for example, a solid red, white or blue image.
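The background replacement of step S206 amounts to alpha compositing of the original image over a solid-colour background. A small NumPy sketch under that assumption (the colour values and data layout are illustrative):

```python
import numpy as np

def replace_background(image_rgb, alpha, color=(255, 255, 255)):
    """Composite the matted person over a solid background (S206).

    image_rgb: H x W x 3 uint8 original image.
    alpha:     H x W float matte in [0, 1] (the final extracted image).
    color:     new background colour, e.g. solid red, white or blue.
    """
    background = np.zeros_like(image_rgb)
    background[:] = color                                 # broadcast the colour
    a = alpha[..., None].astype(np.float32)
    composed = a * image_rgb.astype(np.float32) + (1.0 - a) * background
    return composed.astype(np.uint8)
```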
The present invention includes a computer readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on the above readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
An embodiment of an automatic matting system according to the invention is described in detail below, with reference to fig. 3 to 4, which includes:
The image acquisition module 101 is used for acquiring the data of an original image. The original image comprises the head area and upper-shoulder area of the portrait. A portrait image that already meets the requirements of a certificate photo on the portrait can be used directly as the original image; for a portrait image that does not meet those requirements, a face detection algorithm and a face key-point localization algorithm are used to detect the positions of both eyes and the head-top position of the portrait in the portrait image, a cropping region is determined according to the eye positions and the head-top position, and the image within the cropping region is acquired so that the cropped image contains the head area and upper-shoulder area of the portrait, thereby obtaining the data of the original image.
The semantic segmentation module 102 is connected to the image acquisition module 101, and is configured to perform semantic segmentation on the data of the original image to obtain a Trimap image and to divide a foreground, a background and an uncertain region in the Trimap image. A Trimap is used in static image matting to roughly divide a given image, that is, to divide it into a foreground, a background and an uncertain region; accordingly, in this implementation the Trimap image contains a foreground, a background and an uncertain region. Specifically, each pixel of the input original image is semantically segmented by a T-net network to generate the Trimap image.
The fine segmentation module 103 is respectively connected with the image acquisition module 101 and the semantic segmentation module 102, and is configured to introduce the data of the original image and the parameters of the Trimap image together into a convolution network for fine segmentation to obtain a preliminary extracted image alpha_p. Specifically, an M-net network is used for the fine segmentation: taking the Trimap image output by the T-net and the RGB image of the original image as input, it generates a rough preliminary extracted image alpha_p that describes the detail information of the portrait.
The image fusion module 104 is respectively connected with the fine segmentation module 103 and the semantic segmentation module 102, and is configured to fuse, in the preliminary extracted image alpha_p, the parameters of any two or more of the foreground, the background and the uncertain region of the Trimap image, and to adjust the preliminary extracted image alpha_p to obtain a final extracted image. This module mainly fuses the semantic information generated by the T-net with the structural and textural detail information generated by the M-net to obtain a fine final extracted image. The final extracted image contains an alpha channel, which describes the transparency or semi-transparency of a picture. Specifically, the parameters of the foreground and the uncertain region of the Trimap image are fused in the preliminary extracted image according to the fusion formula to obtain the final extracted image, the fusion formula being:
alpha_f = F_s + U_s · alpha_p
wherein alpha_f is the final extracted image; F_s is the foreground parameter; U_s is the uncertain region parameter; and alpha_p is the preliminary extracted image.
In the present invention, the fine segmentation module 103 includes:
a convolution network, wherein the convolution network comprises a first convolution layer S1, a second convolution layer S2, a third convolution layer S3, a fourth convolution layer S4 and a bottommost layer X which successively convolve, from top to bottom, the data of the original image together with the parameters of the Trimap image; above the bottommost layer X, a fourth deconvolution layer S4', a third deconvolution layer S3', a second deconvolution layer S2' and a first deconvolution layer S1' perform deconvolution in sequence from bottom to top. The fourth convolution layer S4 is connected with the fourth deconvolution layer S4' and provides the parameters of the fourth convolution layer S4 to the fourth deconvolution layer S4'; the third convolution layer S3 is connected with the third deconvolution layer S3' and provides the parameters of the third convolution layer S3 to the third deconvolution layer S3'; and the second convolution layer S2 is connected with the second deconvolution layer S2' and provides the parameters of the second convolution layer S2 to the second deconvolution layer S2'. For example, when generating the third deconvolution layer S3', the up-sampled fourth deconvolution layer S4' is added to the third convolution layer S3, and the same applies to the other layers. In this embodiment, the encoding stage uses 3 × 3 convolutions with 16, 24, 32, 96 and 320 filters in the successive layers, so the output feature dimensions are: S1: 320 × 320 × 16; S2: 160 × 160 × 24; S3: 80 × 80 × 32; S4: 40 × 40 × 96; X: 20 × 20 × 320. The decoding stage also uses 3 × 3 convolutions with 96, 32, 24 and 16 channels in the successive layers, so the output feature dimensions are: S4': 40 × 40 × 96; S3': 80 × 80 × 32; S2': 160 × 160 × 24; S1': 320 × 320 × 16. Finally, a convolution layer sets the output to 3 channels, generating the preliminary extracted image alpha_p with a final output size of 320 × 320 × 3.
In the description of the present specification, reference to the description of the terms "one embodiment", "some embodiments", "an illustrative embodiment", "an example", "a specific example", or "some examples", etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. An automatic matting method, characterized by comprising:
acquiring data of an original image;
performing semantic segmentation on the data of the original image to obtain a Trimap image, and dividing a foreground, a background and an uncertain region in the Trimap image;
introducing the data of the original image and the parameters of the Trimap image together into a convolution network for fine segmentation to obtain a preliminary extracted image;
and fusing, in the preliminary extracted image, the parameters of any two or more of the foreground, the background and the uncertain region of the Trimap image, and adjusting the preliminary extracted image to obtain a final extracted image.
2. The automatic matting method according to claim 1, wherein the step of introducing the data of the original image and the parameters of the Trimap image into a convolution network for fine segmentation to obtain a preliminary extracted image comprises:
setting parameters of a convolution network, and introducing the data of the original image and the parameters of the Trimap image into the convolution network together for convolution to obtain parameters of a first convolution layer;
performing convolution, activation and pooling operations on the parameters in each convolution layer from top to bottom according to the parameters of the first convolution layer to respectively obtain parameters of a second convolution layer, parameters of a third convolution layer, parameters of a fourth convolution layer and parameters of a bottommost layer;
performing deconvolution, activation and unpooling operations on the parameters of the bottommost layer together with the parameters of the fourth convolution layer to obtain parameters of a fourth deconvolution layer;
performing deconvolution, activation and unpooling operations on the parameters of the fourth deconvolution layer together with the parameters of the third convolution layer to obtain parameters of a third deconvolution layer;
performing deconvolution, activation and unpooling operations on the parameters of the third deconvolution layer together with the parameters of the second convolution layer to obtain parameters of a second deconvolution layer;
performing deconvolution, activation and unpooling operations on the parameters of the second deconvolution layer together with the parameters of the first convolution layer to obtain parameters of a first deconvolution layer;
and adjusting the number of output channels of the parameters of the first deconvolution layer to obtain a preliminary extracted image.
3. The automatic matting method according to claim 2, characterized in that each of said performing a convolution and said performing a deconvolution is followed by: performing normalization processing.
4. The automatic matting method according to claim 1, wherein fusing parameters of any two or more of a foreground, a background and an uncertain region in the Trimap image in the preliminary extracted image, and adjusting the preliminary extracted image to obtain a final extracted image comprises:
fusing the parameters of the foreground and the uncertain region of the Trimap image in the preliminary extracted image according to a fusion formula to obtain the final extracted image, wherein the fusion formula is as follows:
alpha_f = F_s + U_s · alpha_p
wherein alpha_f is the final extracted image; F_s is the foreground parameter; U_s is the uncertain region parameter; and alpha_p is the preliminary extracted image.
5. The automatic matting method according to claim 4, wherein the acquiring data of an original image includes:
importing a portrait image, detecting the positions of both eyes and the head-top position of the portrait in the portrait image, determining a cropping region according to the eye positions and the head-top position, and acquiring the image within the cropping region to obtain the data of the original image.
6. The automatic matting method according to claim 5, characterized in that, after said acquiring of the data of an original image, the method comprises:
performing data enhancement processing on the data of the original image.
7. The automatic matting method according to claim 6, characterized in that, after said adjusting of the preliminary extracted image to obtain the final extracted image, the method comprises:
extracting the final extracted image, and combining the extracted final extracted image with a new background image to form a new portrait image.
8. An automatic matting system, comprising:
the image acquisition module is used for acquiring data of an original image;
the semantic segmentation module is connected with the image acquisition module and used for performing semantic segmentation on the data of the original image to obtain a Trimap image and dividing a foreground, a background and an uncertain region in the Trimap image;
the fine segmentation module is respectively connected with the image acquisition module and the semantic segmentation module, and is used for introducing the data of the original image and the parameters of the Trimap image together into a convolution network for fine segmentation to obtain a preliminary extracted image;
and the image fusion module is respectively connected with the fine segmentation module and the semantic segmentation module, and is used for fusing, in the preliminary extracted image, the parameters of any two or more of the foreground, the background and the uncertain region of the Trimap image and adjusting the preliminary extracted image to obtain a final extracted image.
9. The automatic matting system according to claim 8, wherein the fine segmentation module includes:
the convolution network comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a bottommost layer which successively convolve, from top to bottom, the data of the original image together with the parameters of the Trimap image; above the bottommost layer, a fourth deconvolution layer, a third deconvolution layer, a second deconvolution layer and a first deconvolution layer perform deconvolution in sequence from bottom to top; the fourth convolution layer is connected with the fourth deconvolution layer and is used for providing the parameters of the fourth convolution layer to the fourth deconvolution layer; the third convolution layer is connected with the third deconvolution layer and is used for providing the parameters of the third convolution layer to the third deconvolution layer; and the second convolution layer is connected with the second deconvolution layer and is used for providing the parameters of the second convolution layer to the second deconvolution layer.
10. A computer-readable program storage medium storing computer program instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 7.
CN202011531661.9A 2020-12-22 2020-12-22 Automatic image matting method, system and readable storage medium thereof Pending CN112581480A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011531661.9A CN112581480A (en) 2020-12-22 2020-12-22 Automatic image matting method, system and readable storage medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011531661.9A CN112581480A (en) 2020-12-22 2020-12-22 Automatic image matting method, system and readable storage medium thereof

Publications (1)

Publication Number Publication Date
CN112581480A true CN112581480A (en) 2021-03-30

Family

ID=75139410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011531661.9A Pending CN112581480A (en) 2020-12-22 2020-12-22 Automatic image matting method, system and readable storage medium thereof

Country Status (1)

Country Link
CN (1) CN112581480A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409224A (en) * 2021-07-09 2021-09-17 浙江大学 Image target pertinence enhancing method, device, equipment and storage medium
CN114820666A (en) * 2022-04-29 2022-07-29 深圳万兴软件有限公司 Method and device for increasing matting accuracy, computer equipment and storage medium
WO2023137905A1 (en) * 2022-01-21 2023-07-27 小米科技(武汉)有限公司 Image processing method and apparatus, and electronic device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018107825A1 (en) * 2016-12-13 2018-06-21 华为技术有限公司 Matting method and device
US20190080456A1 (en) * 2017-09-12 2019-03-14 Shenzhen Keya Medical Technology Corporation Method and system for performing segmentation of image having a sparsely distributed object
CN109948562A (en) * 2019-03-25 2019-06-28 浙江啄云智能科技有限公司 A kind of safe examination system deep learning sample generating method based on radioscopic image
CN110008832A (en) * 2019-02-27 2019-07-12 西安电子科技大学 Based on deep learning character image automatic division method, information data processing terminal
CN110610509A (en) * 2019-09-18 2019-12-24 上海大学 Optimized matting method and system capable of assigning categories
CN111223106A (en) * 2019-10-28 2020-06-02 稿定(厦门)科技有限公司 Full-automatic portrait mask matting method and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018107825A1 (en) * 2016-12-13 2018-06-21 华为技术有限公司 Matting method and device
US20190080456A1 (en) * 2017-09-12 2019-03-14 Shenzhen Keya Medical Technology Corporation Method and system for performing segmentation of image having a sparsely distributed object
CN110008832A (en) * 2019-02-27 2019-07-12 西安电子科技大学 Based on deep learning character image automatic division method, information data processing terminal
CN109948562A (en) * 2019-03-25 2019-06-28 浙江啄云智能科技有限公司 A kind of safe examination system deep learning sample generating method based on radioscopic image
CN110610509A (en) * 2019-09-18 2019-12-24 上海大学 Optimized matting method and system capable of assigning categories
CN111223106A (en) * 2019-10-28 2020-06-02 稿定(厦门)科技有限公司 Full-automatic portrait mask matting method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QUAN CHEN ET AL.: "Semantic Human Matting", ACM Multimedia 2018, pages 618-626 *
ZHANG SHENGLIN ET AL.: "Multi-focus image fusion method based on image matting technology", Journal of Computer Applications, vol. 36, page 1949 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409224A (en) * 2021-07-09 2021-09-17 浙江大学 Image target pertinence enhancing method, device, equipment and storage medium
CN113409224B (en) * 2021-07-09 2023-07-04 浙江大学 Image target pertinence enhancement method, device, equipment and storage medium
WO2023137905A1 (en) * 2022-01-21 2023-07-27 小米科技(武汉)有限公司 Image processing method and apparatus, and electronic device and storage medium
CN114820666A (en) * 2022-04-29 2022-07-29 深圳万兴软件有限公司 Method and device for increasing matting accuracy, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112581480A (en) Automatic image matting method, system and readable storage medium thereof
WO2021164534A1 (en) Image processing method and apparatus, device, and storage medium
CN110400323B (en) Automatic cutout system, method and device
CN110889855B (en) Certificate photo matting method and system based on end-to-end convolution neural network
US8655069B2 (en) Updating image segmentation following user input
Xiao et al. Example‐Based Colourization Via Dense Encoding Pyramids
CN114283164B (en) Breast cancer pathological section image segmentation prediction system based on UNet3+
CN110866938B (en) Full-automatic video moving object segmentation method
CN105701489A (en) Novel digital extraction and identification method and system thereof
CN112949754B (en) Text recognition data synthesis method based on image fusion
CN110827371A (en) Certificate photo generation method and device, electronic equipment and storage medium
CN112784849B (en) Glandular segmentation method based on multi-scale attention selection
CN113392791A (en) Skin prediction processing method, device, equipment and storage medium
CN105580050A (en) Providing control points in images
CN113158856B (en) Processing method and device for extracting target area in remote sensing image
CN115471901B (en) Multi-pose face frontization method and system based on generation of confrontation network
CN111325263A (en) Image processing method and device, intelligent microscope, readable storage medium and equipment
CN116485944A (en) Image processing method and device, computer readable storage medium and electronic equipment
CN115641317A (en) Pathological image-oriented dynamic knowledge backtracking multi-example learning and image classification method
CN114882282A (en) Neural network prediction method for colorectal cancer treatment effect based on MRI and CT images
CN111932557B (en) Image semantic segmentation method and device based on ensemble learning and probability map model
CN114494272A (en) Metal part fast segmentation method based on deep learning
CN114373109A (en) Natural image matting method and natural image matting device based on deep learning
CN114708274A (en) Image segmentation method and system of T-CutMix data enhancement and three-dimensional convolution neural network based on real-time selection mechanism
EP2698693B1 (en) Local image translating method and terminal with touch screen

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination