CN112581480A - Automatic image matting method, system and readable storage medium thereof - Google Patents
- Publication number: CN112581480A
- Application number: CN202011531661.9A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/11 — Image analysis; segmentation: region-based segmentation
- G06N3/045 — Neural networks: combinations of networks
- G06N3/08 — Neural networks: learning methods
- G06T7/187 — Segmentation; edge detection involving region growing, region merging or connected component labelling
- G06T2207/10004 — Image acquisition modality: still image; photographic image
- G06T2207/20081 — Special algorithmic details: training; learning
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
- G06T2207/30196 — Subject of image: human being; person
Abstract
The invention relates to the field of image processing, and in particular to an automatic matting method, an automatic matting system and a readable storage medium. The method comprises the following steps: acquiring data of an original image; performing semantic segmentation on the original image data to obtain a Trimap image divided into foreground, background and uncertain regions; introducing the original image data together with the parameters of the Trimap image into a convolution network for fine segmentation to obtain a preliminary extracted image; and fusing the parameters of any two or more of the foreground, background and uncertain regions of the Trimap image into the preliminary extracted image, adjusting it to obtain the final extracted image. Because the parameters of the Trimap image are introduced into the preliminary extracted image, the matting result is more accurate, no manual operation is needed, labor cost is effectively saved, and working efficiency is higher.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to an automatic matting method, an automatic matting system, and a readable storage medium storing the method.
Background
With the development of science and technology, digital images are widely used in social life. Before a digital image is used, some processing is often required, and matting is one of the most common operations in image processing: extracting the foreground part of a picture from its background to form a separate layer, mainly in preparation for later image composition.
In current practice, matting mainly relies on manually designating all or part of the foreground and background regions. Because the composited background of an ID photo is a solid color, tiny matting flaws are easily amplified in the composite image, so ID photos place a high precision requirement on the foreground boundary. Therefore, ID photos are usually cut out manually with a matting tool and the background replaced by hand. The work must be fine down to individual hairs so that hair and background show no visible seam, which typically requires a skilled professional and is cumbersome, time-consuming and laborious. Existing automatic matting methods are generally deep-learning-based image segmentation schemes that produce only relatively coarse segmentation results, making it difficult to generate a fine extracted image directly; they cannot meet the requirements of ID photos.
Disclosure of Invention
In order to overcome the above drawbacks, the present invention provides a fast, efficient and high-precision automatic matting method and system, and a readable storage medium storing the method.
The purpose of the invention is realized by the following technical scheme:
the invention relates to an automatic image matting method, which comprises the following steps:
acquiring data of an original image;
performing semantic segmentation on the data of the original image to obtain a Trimap image, and dividing a foreground, a background and an uncertain region in the Trimap image;
introducing the data of the original image together with the parameters of the Trimap image into a convolution network for fine segmentation to obtain a preliminary extracted image;
and fusing the parameters of any two or more of the foreground, background and uncertain regions of the Trimap image into the preliminary extracted image, and adjusting the preliminary extracted image to obtain the final extracted image.
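As an illustration only, the four claimed steps can be sketched as a simple pipeline; `segment_trimap`, `refine` and `fuse` are hypothetical placeholders for the semantic-segmentation network, the fine-segmentation convolution network and the fusion stage, not functions defined by the patent.

```python
import numpy as np

def auto_matte(image, segment_trimap, refine, fuse):
    """Run the four claimed steps on an HxWx3 RGB array in [0, 1]."""
    trimap = segment_trimap(image)      # semantic segmentation -> F/B/U regions
    alpha_p = refine(image, trimap)     # fine segmentation -> preliminary image
    alpha = fuse(trimap, alpha_p)       # fusion -> final extracted image
    return alpha
```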
In the invention, the step of introducing the data of the original image and the parameters of the Trimap image into a convolution network together for fine segmentation to obtain a preliminary extracted image comprises:
setting the parameters of the convolution network, and introducing the data of the original image together with the parameters of the Trimap image into the network for convolution to obtain the parameters of the first convolution layer;
performing convolution, activation and pooling operations on the parameters in each convolution layer from top to bottom, starting from the parameters of the first convolution layer, to obtain the parameters of the second convolution layer, the third convolution layer, the fourth convolution layer and the bottom layer respectively;
performing deconvolution, activation and unpooling operations on the parameters of the bottom layer together with the parameters of the fourth convolution layer to obtain the parameters of the fourth deconvolution layer;
performing deconvolution, activation and unpooling operations on the parameters of the fourth deconvolution layer together with the parameters of the third convolution layer to obtain the parameters of the third deconvolution layer;
performing deconvolution, activation and unpooling operations on the parameters of the third deconvolution layer together with the parameters of the second convolution layer to obtain the parameters of the second deconvolution layer;
performing deconvolution, activation and unpooling operations on the parameters of the second deconvolution layer together with the parameters of the first convolution layer to obtain the parameters of the first deconvolution layer;
and adjusting the number of output channels of the parameters of the first deconvolution layer to obtain the preliminary extracted image.
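The pooling and unpooling operations in the steps above pair each max-pool with recorded argmax positions, so the decoder can restore values to their original locations. A minimal NumPy sketch of one 2×2 pool/unpool pair (illustrative only, not the patent's implementation):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling that also records argmax indices for later unpooling."""
    h, w = x.shape
    # split into 2x2 blocks: one row of `blocks` per spatial block
    blocks = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(-1, 4)
    idx = blocks.argmax(axis=1)                     # winner position per block
    pooled = blocks.max(axis=1).reshape(h // 2, w // 2)
    return pooled, idx

def max_unpool_2x2(pooled, idx):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    h, w = pooled.shape
    blocks = np.zeros((h * w, 4))
    blocks[np.arange(h * w), idx] = pooled.ravel()
    return blocks.reshape(h, w, 2, 2).transpose(0, 2, 1, 3).reshape(h * 2, w * 2)
```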
In the present invention, both the convolution and the deconvolution operations include a normalization step.
In the invention, fusing the parameters of any two or more of the foreground, background and uncertain regions of the Trimap image into the preliminary extracted image, and adjusting the preliminary extracted image to obtain the final extracted image, comprises:
fusing the parameters of the foreground and the uncertain region of the Trimap image into the preliminary extracted image according to the following fusion formula to obtain the final extracted image:

alpha = F_s + U_s · alpha_p

where alpha is the final extracted image, F_s is the foreground parameter, U_s is the uncertain-region parameter, and alpha_p is the preliminary extracted image.
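The fusion formula can be sketched directly; this assumes F_s, U_s and alpha_p are per-pixel arrays in [0, 1], which the patent does not state explicitly:

```python
import numpy as np

def fuse(alpha_p, F_s, U_s):
    """Final extracted image: alpha = F_s + U_s * alpha_p, clipped to [0, 1]."""
    return np.clip(F_s + U_s * alpha_p, 0.0, 1.0)
```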
In the present invention, the acquiring data of the original image includes:
and importing a portrait image, detecting the binocular position and the head height position of the portrait in the portrait image, determining a cutting area according to the binocular position and the head height position, and acquiring an image in the cutting area to obtain data of an original image.
In the present invention, the acquiring data of the original image comprises:
and performing data enhancement processing on the data of the original image.
In the present invention, after the adjusting the preliminary extracted image to obtain the final extracted image, the method includes:
and extracting the final extracted image, and combining the extracted final extracted image with a new background image to form a new person image.
Based on the same conception, the invention also provides an automatic matting system, which comprises:
the image acquisition module is used for acquiring data of an original image;
the semantic segmentation module is connected with the image acquisition module and used for performing semantic segmentation on the data of the original image to obtain a Trimap image and dividing a foreground, a background and an uncertain region in the Trimap image;
the fine segmentation module is respectively connected with the image acquisition module and the semantic segmentation module and is used for introducing the data of the original image and the parameters of the Trimap image into a convolution network together for fine segmentation to obtain a primary extracted image;
and the image fusion module, connected to the fine segmentation module and the semantic segmentation module respectively, for fusing the parameters of any two or more of the foreground, background and uncertain regions of the Trimap image into the preliminary extracted image and adjusting it to obtain the final extracted image.
In the present invention, the fine segmentation module includes:
the convolution network comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a bottom layer, which sequentially convolve the data of the original image together with the parameters of the Trimap image from top to bottom; from the bottom layer upward, a fourth deconvolution layer, a third deconvolution layer, a second deconvolution layer and a first deconvolution layer perform deconvolution in sequence; the fourth convolution layer is connected to the fourth deconvolution layer to provide it with the fourth convolution layer's parameters; the third convolution layer is connected to the third deconvolution layer to provide it with the third convolution layer's parameters; and the second convolution layer is connected to the second deconvolution layer to provide it with the second convolution layer's parameters.
Based on the same concept, the present invention also provides a computer-readable program storage medium storing computer program instructions which, when executed by a computer, cause the computer to perform the method as described above.
In the automatic matting method of the invention, semantic segmentation is first performed on the original image to obtain a Trimap image; fine segmentation then yields a preliminary extracted image; the parameters of the Trimap image are then imported into the preliminary extracted image, which is adjusted to obtain the final extracted image. Because the parameters of the Trimap image are introduced into the preliminary extracted image, the matting result is more accurate, no manual operation is needed, labor cost is effectively saved, and working efficiency is higher.
Drawings
For the purpose of easy explanation, the present invention will be described in detail with reference to the following preferred embodiments and the accompanying drawings.
FIG. 1 is a schematic view of the workflow of one embodiment of the automatic matting method of the present invention;
FIG. 2 is a schematic view of the work flow of another embodiment of the automatic matting method of the present invention;
FIG. 3 is a schematic diagram of the operation principle of fine segmentation in the automatic matting method according to the present invention;
fig. 4 is a schematic diagram of a logical structure of an embodiment of the automatic matting system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like indicate orientations and positional relationships based on those shown in the drawings, are used only for convenience and simplicity of description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features; thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the description of the present invention, it should be noted that, unless otherwise explicitly stated or limited, the terms "mounted" and "connected" are to be construed broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
An embodiment of the present invention will be described in detail below with reference to Fig. 1, and comprises:
S101, acquiring data of an original image
Importing a portrait image; if the imported image already meets the ID-photo standard, it is used directly as the data of the original image. This case corresponds to the common real-life need to change the background color of an existing ID photo. The portrait image contains the head and upper-shoulder regions and is an RGB image.
S102, performing semantic segmentation on the original image to obtain a Trimap image
Performing semantic segmentation on the data of the original image to obtain a Trimap image divided into foreground, background and an uncertain region. A Trimap is a coarse three-way partition used in still-image matting: it roughly divides a given image into foreground, background and an uncertain region. Specifically, each pixel of the input original image is semantically segmented by a T-net network to generate the Trimap image.
S103, finely dividing the original image and the Trimap image together
Introducing the original image together with the parameters of the Trimap image into a convolution network for fine segmentation to obtain a preliminary extracted image alpha_p. In this embodiment, an M-net network performs the fine segmentation: taking the Trimap image output by the T-net and the RGB original image as input, it generates a preliminary extracted image alpha_p that describes the detail information of the portrait. The M-net adopts an encoder-decoder network similar to the U-net structure.
S104, fusing the preliminary extracted image with the Trimap image to obtain a final extracted image
The preliminary extracted image alpha_p obtained in the previous step can basically represent the true final extracted image, but some flaws remain and fusion fine-tuning is still needed. Therefore, in this step, the parameters of any two or more of the foreground, background and uncertain regions of the Trimap image are fused into the preliminary extracted image alpha_p, and the image is adjusted to obtain the final extracted image. This step mainly fuses the semantic information generated by the T-net with the structure and texture detail generated by the M-net to obtain a fine final extracted image. The final extracted image contains an alpha channel, which describes the transparency of a picture. Because the coarsely segmented Trimap image and the finely segmented preliminary extracted image alpha_p are fused to generate the final extracted image, the result appears finer and more natural; it is refined down to the hair, so the user sees no incongruity after the background is replaced.
In another embodiment, an automatic matting method according to the present invention is described in detail below with reference to fig. 2 to 3, which includes:
S201, acquiring data of the original image through cropping
Importing a portrait image; if it is a photo taken by the user, a face detection algorithm and a facial key-point localization algorithm are used to detect the positions of both eyes and the top of the head of the portrait, a cropping area is determined from these positions, and the image within the cropping area is captured so that it contains the head and shoulder regions, yielding the data of the original image. The foreground part of the original image data then meets the portrait requirements of an ID photo.
S202, data enhancement is carried out on data of the original image
The cropped original image data is used as the input RGB image data for model training, and data enhancement processing is applied to it. To increase the generalization ability of the model, a random-erasing method and image enhancement methods such as random cropping and scaling are adopted. Random cropping and scaling expand the data set and increase its expressive capacity. Random erasing selects a rectangular area of the original image at random and replaces the pixel values of that area with random values. In this way, images participating in training are occluded to different degrees, simulating real cases where part of the portrait is occluded or affected by different clothing patterns and colors. This reduces the risk of over-fitting; and because a partially occluded portrait region must still be segmented correctly, the network is forced to extract more robust local features, improving the generalization ability of the model to a certain degree.
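The random-erasing step can be sketched as follows; the size fractions are illustrative assumptions, not values given in the patent:

```python
import numpy as np

def random_erase(img, min_frac=0.05, max_frac=0.3, seed=None):
    """Replace one random rectangle of a uint8 image with random pixel values."""
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    eh = max(1, int(h * rng.uniform(min_frac, max_frac)))  # erase height
    ew = max(1, int(w * rng.uniform(min_frac, max_frac)))  # erase width
    y = int(rng.integers(0, h - eh + 1))                   # top-left corner
    x = int(rng.integers(0, w - ew + 1))
    out = img.copy()
    out[y:y + eh, x:x + ew] = rng.integers(0, 256, size=(eh, ew) + img.shape[2:],
                                           dtype=img.dtype)
    return out
```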
S203, performing semantic segmentation on the original image to obtain a Trimap image
Performing semantic segmentation on the data of the original image to obtain a Trimap image divided into foreground, background and an uncertain region. A Trimap is a coarse three-way partition used in still-image matting: it roughly divides a given image into foreground, background and an uncertain region. Specifically, each pixel of the input original image is semantically segmented by a T-net network to generate the Trimap image.
To increase the running speed of the model and reduce its size, this embodiment selects MobileNetV2 as the backbone network of the T-net. MobileNetV2 is a lightweight network that can be deployed on mobile terminals such as phones; using it as the backbone greatly reduces model size and algorithmic complexity. The method computes a cross-entropy loss between Trimap label data, generated by applying dilation and erosion to the ground-truth mask, and the Trimap generated by the network, to guide model training and parameter tuning. The cross-entropy loss function, formula (1), is:

L = -Σ_{c=1}^{M} y_{i,c} · log(p_{i,c})    (1)

where M is the number of classes, y_{i,c} is an indicator variable that is 1 if class c is the true class of sample i and 0 otherwise, and p_{i,c} is the predicted probability that sample i belongs to class c.
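Formula (1) in NumPy form, here applied per pixel to the three trimap classes (a sketch, not the patent's training code):

```python
import math
import numpy as np

def cross_entropy(p, y):
    """Mean cross entropy: -sum_c y_c * log(p_c), averaged over samples/pixels.
    p: predicted probabilities with classes on the last axis; y: one-hot labels."""
    eps = 1e-12  # avoid log(0)
    return float(-(y * np.log(p + eps)).sum(axis=-1).mean())
```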
S204, finely dividing the original image and the Trimap image together
Introducing the original image together with the parameters of the Trimap image into a convolution network for fine segmentation to obtain a preliminary extracted image alpha_p. In this embodiment, an M-net network performs the fine segmentation: taking the Trimap image output by the T-net and the RGB original image as input, it generates a preliminary extracted image alpha_p that describes the detail information of the portrait. The M-net adopts an encoder-decoder network similar to the U-net structure.
Preferably, the step of introducing the data of the original image and the parameters of the Trimap image into a convolution network together for fine segmentation to obtain the preliminary extracted image alpha_p includes:
setting the parameters of the convolution network, and introducing the data of the original image together with the parameters of the Trimap image into the network for convolution to obtain the parameters of the first convolution layer S1; performing convolution, activation and pooling operations layer by layer from top to bottom to obtain the parameters of the second convolution layer S2, the third convolution layer S3, the fourth convolution layer S4 and the bottom layer X; performing deconvolution, activation and unpooling operations on the parameters of the bottom layer X together with the parameters of the fourth convolution layer S4 to obtain the parameters of the fourth deconvolution layer S4'; performing the same operations on the parameters of S4' together with the parameters of the third convolution layer S3 to obtain the parameters of the third deconvolution layer S3', on the parameters of S3' together with the parameters of the second convolution layer S2 to obtain the parameters of the second deconvolution layer S2', and on the parameters of S2' together with the parameters of the first convolution layer S1 to obtain the parameters of the first deconvolution layer S1'; and finally adjusting the number of output channels of S1' to obtain the preliminary extracted image alpha_p. In the above steps, normalization is applied after each convolution and deconvolution.
Since the encoding stage has many parameters and is prone to over-fitting, this embodiment adds a Batch Normalization layer after each convolution layer to speed up model convergence. Batch normalization forcibly pulls the distribution of each layer's inputs back to a standard normal distribution with mean 0 and variance 1. Specifically, the encoding stage receives 320 × 320 images with 6 input channels: the three channels of the RGB image and the three channels of the Trimap image. Top-down multi-layer feature fusion is performed through a series of convolution layers with different filter counts and pooling; in the decoding stage, step-by-step up-sampling (unpooling) is performed through a series of layers with different filter counts, so that supervision and loss regression do not rely only on high-level features and the feature map of each layer is used effectively. For example, when generating the third deconvolution layer S3', the up-sampled fourth deconvolution layer S4' is added to the third convolution layer S3; the other layers are handled in the same way.
In this embodiment, the encoding stage uses 3 × 3 convolutions with 16, 24, 32, 96 and 320 filters, and the feature dimensions output by each layer are: first convolution layer S1: 320 × 320 × 16; second convolution layer S2: 160 × 160 × 24; third convolution layer S3: 80 × 80 × 32; fourth convolution layer S4: 40 × 40 × 96; bottom layer X: 20 × 20 × 320. The decoding stage also uses 3 × 3 convolutions, with 96, 32, 24 and 16 channels per layer, and the output feature dimensions are: fourth deconvolution layer S4': 40 × 40 × 96; third deconvolution layer S3': 80 × 80 × 32; second deconvolution layer S2': 160 × 160 × 24; first deconvolution layer S1': 320 × 320 × 16. Finally a convolution layer sets the output to 3 channels, generating the preliminary extracted image alpha_p with final output size 320 × 320 × 3.
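The dimensions listed above can be verified with a small shape-bookkeeping helper (illustrative; it tracks sizes only, not actual weights):

```python
def mnet_shapes(size=320):
    """Return (encoder, decoder, output) feature shapes for the dims above."""
    enc_ch = [16, 24, 32, 96, 320]   # S1..S4 and bottom layer X
    dec_ch = [96, 32, 24, 16]        # S4'..S1'
    encoder, s = [], size
    for c in enc_ch:
        encoder.append((s, s, c))
        s //= 2                      # 2x pooling between encoder levels
    decoder, s = [], size // 16      # bottom resolution (20 when size=320)
    for c in dec_ch:
        s *= 2                       # 2x unpooling between decoder levels
        decoder.append((s, s, c))
    return encoder, decoder, (size, size, 3)
```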
S205, fusing the preliminary extracted image with the Trimap image to obtain a final extracted image
The preliminary extracted image alpha_p obtained in the previous step can basically represent the true final extracted image, but some flaws remain and fusion fine-tuning is still needed. Therefore, in this step, the parameters of any two or more of the foreground, background and uncertain regions of the Trimap image are fused into the preliminary extracted image alpha_p, and the image is adjusted to obtain the final extracted image. This step mainly fuses the semantic information generated by the T-net with the structure and texture detail generated by the M-net to obtain a fine final extracted image. The final extracted image contains an alpha channel, which describes the transparency of a picture. Because the coarsely segmented Trimap image and the finely segmented preliminary extracted image alpha_p are fused to generate the final extracted image, the result appears finer and more natural; it is refined down to the hair, so the user sees no incongruity after the background is replaced.
In the fusion process of this embodiment, the result generated by the M-net is primarily used, and the result generated by the T-net serves as an auxiliary. The specific fusion mode is shown in formula 2 below:

alpha = F_s × 1 + B_s × 0 + U_s × alpha_p    (formula 2)

where alpha_p is the result generated by the M-net, and F_s, B_s and U_s are the results generated by the T-net: F_s represents the foreground, B_s represents the background, and U_s represents the uncertain region. Since F_s + B_s = 1 − U_s,
formula 2 can be written as:

alpha = F_s + U_s × alpha_p    (formula 3)

From formula 3 it can be analyzed that when U_s tends to 1, F_s tends to 0 and the final result approximates the output of the M-net; when U_s tends to 0, the term U_s × alpha_p vanishes and the final result approximates the T-net result F_s. Fusing in this way combines the coarse segmentation with the fine details very naturally.
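A minimal numerical sketch of this fusion, with illustrative per-pixel values (NumPy is used only for elementwise arithmetic):

```python
import numpy as np

def fuse(F_s, U_s, alpha_p):
    """Formula 3: alpha = F_s + U_s * alpha_p.
    Where U_s -> 1 the M-net result alpha_p dominates;
    where U_s -> 0 the T-net foreground F_s dominates."""
    return F_s + U_s * alpha_p

# Three example pixels: certain foreground, certain background,
# and an uncertain pixel (e.g. hair or an edge).
F_s     = np.array([1.0, 0.0, 0.0])
U_s     = np.array([0.0, 0.0, 1.0])
alpha_p = np.array([0.9, 0.1, 0.6])   # M-net prediction

alpha = fuse(F_s, U_s, alpha_p)       # -> [1.0, 0.0, 0.6]
```

The certain pixels take the T-net's hard decision, while the uncertain pixel takes the M-net's soft alpha value, matching the analysis of formula 3 above.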
S206, integrating the final extracted image and a new background image to form a new character image
The final extracted image is extracted and combined with a new background image to form a new portrait image. The new background image may be, for example, a solid red, white or blue image.
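The combination step can be sketched with standard alpha blending; the solid-colour background and the image sizes below are illustrative, not prescribed by the method:

```python
import numpy as np

def composite(foreground, new_background, alpha):
    """Standard alpha compositing: out = alpha * F + (1 - alpha) * B.
    alpha has shape (H, W, 1); images are float RGB in [0, 1]."""
    return alpha * foreground + (1.0 - alpha) * new_background

H, W = 4, 4
person = np.full((H, W, 3), 0.8)               # illustrative portrait pixels
red_bg = np.zeros((H, W, 3)); red_bg[..., 0] = 1.0  # solid red certificate-photo background
alpha = np.ones((H, W, 1))                     # fully foreground in this toy example

out = composite(person, red_bg, alpha)
```

With a fractional alpha (e.g. along hair strands), each output pixel is a proportional blend of the portrait and the new solid background, which is why a refined alpha channel avoids visible seams after replacement.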
The present invention includes a computer readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on the above readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
An embodiment of an automatic matting system according to the invention is described in detail below with reference to fig. 3 to 4. The system includes:
the image acquisition module 101, which is used for acquiring data of an original image. The original image comprises the head region and upper-shoulder region of a portrait. A portrait image that already meets the portrait requirements of a certificate photo can be used directly as the original image. For a portrait image that does not meet those requirements, a face detection algorithm and a face key-point positioning algorithm are used to detect the binocular position and the head height position of the portrait in the image; a cutting region is determined from the binocular position and the head height position, and the image within the cutting region is acquired, so that the cropped image contains the head region and upper-shoulder region and yields the data of the original image.
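One way to derive such a cutting region from the detected landmarks is sketched below; the proportions (`widen`, `head_margin`) are hypothetical illustration values, since the embodiment does not fix them:

```python
def crop_region(eye_left, eye_right, head_top_y, widen=2.5, head_margin=0.3):
    """Hypothetical crop: a square box centred horizontally between the eyes,
    with width proportional to the inter-eye distance and the top anchored
    slightly above the head apex, extending down to the upper shoulders.
    eye_* are (x, y) pixel coordinates; head_top_y is the y of the head apex."""
    cx = (eye_left[0] + eye_right[0]) / 2.0
    eye_dist = float(abs(eye_right[0] - eye_left[0]))
    width = eye_dist * 2 * widen                 # box side length
    y0 = head_top_y - eye_dist * head_margin     # small margin above the head
    return (cx - width / 2, y0, cx + width / 2, y0 + width)

# Example: eyes at (100, 200) and (140, 200), head apex at y = 160.
box = crop_region((100, 200), (140, 200), 160)   # -> (20.0, 148.0, 220.0, 348.0)
```

The returned (x0, y0, x1, y1) box would then be clamped to the image bounds and used to slice out the head-and-shoulders region before matting.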
The semantic segmentation module 102 is connected to the image acquisition module 101 and is configured to perform semantic segmentation on the data of the original image to obtain a Trimap image, partitioning the Trimap image into a foreground, a background and an uncertain region. A Trimap is used in static image matting to coarsely divide a given image, that is, into a foreground, a background and an uncertain region. In this embodiment, the Trimap image accordingly includes a foreground, a background and an uncertain region. Specifically, each pixel of the input original image is semantically segmented by the T-net network to generate the Trimap image.
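If the T-net outputs per-pixel class probabilities over the three regions, the Trimap image can be derived by a per-pixel argmax, sketched below; the class ordering and the pixel values 0/128/255 are illustrative conventions, not fixed by the embodiment:

```python
import numpy as np

TRIMAP_VALUES = np.array([0, 128, 255], dtype=np.uint8)  # background, uncertain, foreground

def trimap_from_probs(probs):
    """probs: (H, W, 3) softmax output over (background, uncertain, foreground),
    as a 3-class segmentation net such as the T-net here might produce.
    Returns a uint8 trimap: 0 background, 128 uncertain band, 255 foreground."""
    labels = probs.argmax(axis=-1)          # per-pixel class index in {0, 1, 2}
    return TRIMAP_VALUES[labels]

# One row of three pixels: background-ish, uncertain-ish, foreground-ish.
probs = np.array([[[0.8, 0.1, 0.1],
                   [0.1, 0.7, 0.2],
                   [0.1, 0.2, 0.7]]])
trimap = trimap_from_probs(probs)           # -> [[0, 128, 255]]
```

The 128-valued band marks the pixels that the M-net will refine in the fine-segmentation stage.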
The fine segmentation module 103 is respectively connected with the image acquisition module 101 and the semantic segmentation module 102, and is configured to introduce the data of the original image and the parameters of the Trimap image into a convolution network together for fine segmentation to obtain a preliminary extracted image alpha_p. Specifically, the M-net network performs the fine segmentation, taking the Trimap image output by the T-net and the RGB image of the original image as input to generate a rough preliminary extracted image alpha_p that describes the detail information of the portrait.
An image fusion module 104, respectively connected to the fine segmentation module 103 and the semantic segmentation module 102, configured to fuse parameters of any two or more of the foreground, background and uncertain regions in the Trimap image into the preliminary extracted image alpha_p, and to adjust the preliminary extracted image alpha_p to obtain a final extracted image. This module mainly fuses the semantic information generated by the T-net with the structure and texture detail information generated by the M-net to obtain a fine final extracted image. The final extracted image contains an alpha channel, which describes the transparency or translucency of a picture. Specifically, parameters of the foreground and the uncertain region in the Trimap image are fused into the preliminary extracted image according to the fusion formula alpha = F_s + U_s × alpha_p, where alpha is the final extracted image, F_s is the foreground parameter, U_s is the uncertain region parameter, and alpha_p is the preliminary extracted image.
In the present invention, the fine segmentation module 103 includes:
a convolution network, wherein the convolution network comprises a first convolution layer S1, a second convolution layer S2, a third convolution layer S3, a fourth convolution layer S4 and a bottom layer X, through which the data of the original image and the parameters of the Trimap image are convolved in sequence from top to bottom; above the bottom layer X, a fourth deconvolution layer S4', a third deconvolution layer S3', a second deconvolution layer S2' and a first deconvolution layer S1' are arranged in sequence from bottom to top. The fourth convolution layer S4 is connected to the fourth deconvolution layer S4' to provide it with the parameters of S4; the third convolution layer S3 is connected to the third deconvolution layer S3' to provide it with the parameters of S3; the second convolution layer S2 is connected to the second deconvolution layer S2' to provide it with the parameters of S2. For example, in generating the third deconvolution layer S3', the upsampled features of the fourth deconvolution layer S4' and the features of the fourth convolution layer S4 are added together; the other layers are handled in the same way.
In this embodiment, the encoding stage uses 3 × 3 convolutions; the number of channels of each layer is 16, 24, 32, 96 and 320, and the dimensions of the features output by each layer are: first convolution layer S1: 320 × 320 × 16; second convolution layer S2: 160 × 160 × 24; third convolution layer S3: 80 × 80 × 32; fourth convolution layer S4: 40 × 40 × 96; bottom layer X: 20 × 20 × 320. The decoding stage also uses 3 × 3 convolution layers; the number of channels of each layer is 96, 32, 24 and 16, and the dimensions of the features output by each layer are: fourth deconvolution layer S4': 40 × 40 × 96; third deconvolution layer S3': 80 × 80 × 32; second deconvolution layer S2': 160 × 160 × 24; first deconvolution layer S1': 320 × 320 × 16. Finally, a convolution layer sets the number of output channels to 3, generating the preliminary extracted image alpha_p with a final output size of 320 × 320 × 3.
In the description of the present specification, reference to the description of the terms "one embodiment", "some embodiments", "an illustrative embodiment", "an example", "a specific example", or "some examples", etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (10)
1. An automatic matting method, characterized by comprising:
acquiring data of an original image;
performing semantic segmentation on the data of the original image to obtain a Trimap image, and dividing a foreground, a background and an uncertain region in the Trimap image;
introducing the data of the original image and the parameters of the Trimap image into a convolution network together for fine segmentation to obtain a preliminary extracted image;
and fusing parameters of any more than two areas of the foreground, the background and the uncertain area in the Trimap image in the preliminarily extracted image, and adjusting the preliminarily extracted image to obtain a finally extracted image.
2. The automatic matting method according to claim 1, wherein the step of introducing the data of the original image and the parameters of the Trimap image into a convolution network for fine segmentation to obtain a preliminary extracted image comprises:
setting parameters of a convolution network, and introducing the data of the original image and the parameters of the Trimap image into the convolution network together for convolution to obtain parameters of a first convolution layer;
performing convolution, activation and pooling operations on the parameters in each convolution layer from top to bottom according to the parameters of the first convolution layer to respectively obtain parameters of a second convolution layer, parameters of a third convolution layer, parameters of a fourth convolution layer and parameters of the bottom layer;
carrying out deconvolution, activation and unpooling operations on the parameters of the bottommost layer and the parameters of the fourth convolution layer together to obtain parameters of the fourth deconvolution layer;
carrying out deconvolution, activation and unpooling operations on the parameters of the fourth deconvolution layer and the parameters of the third convolution layer together to obtain parameters of the third deconvolution layer;
carrying out deconvolution, activation and unpooling operations on the parameters of the third deconvolution layer and the parameters of the second convolution layer together to obtain parameters of the second deconvolution layer;
carrying out deconvolution, activation and unpooling operations on the parameters of the second deconvolution layer and the parameters of the first convolution layer together to obtain parameters of the first deconvolution layer;
and adjusting the number of output channels of the parameters of the first deconvolution layer to obtain a preliminary extracted image.
3. The automatic matting method according to claim 2, characterized in that both said convolution and said deconvolution operations are followed by normalization processing.
4. The automatic matting method according to claim 1, wherein fusing parameters of any two or more of a foreground, a background and an uncertain region in the Trimap image in the preliminary extracted image, and adjusting the preliminary extracted image to obtain a final extracted image comprises:
fusing parameters of the foreground and the uncertain region in the Trimap image in the preliminary extracted image according to a fusion formula to obtain a final extracted image, wherein the fusion formula is: alpha = F_s + U_s × alpha_p, where alpha is the final extracted image, F_s is the foreground parameter, U_s is the uncertain region parameter, and alpha_p is the preliminary extracted image.
5. The automatic matting method according to claim 4, wherein the acquiring data of an original image includes:
and importing a portrait image, detecting the binocular position and the head height position of the portrait in the portrait image, determining a cutting area according to the binocular position and the head height position, and acquiring an image in the cutting area to obtain data of an original image.
6. The automatic matting method according to claim 5, characterized in that after said acquiring data of an original image, the method further comprises:
and performing data enhancement processing on the data of the original image.
7. The automatic matting method according to claim 6, characterized in that after said adjusting the preliminary extracted image to obtain the final extracted image, the method further comprises:
and extracting the final extracted image, and combining the extracted final extracted image with a new background image to form a new person image.
8. An automatic matting system, comprising:
the image acquisition module is used for acquiring data of an original image;
the semantic segmentation module is connected with the image acquisition module and used for performing semantic segmentation on the data of the original image to obtain a Trimap image and dividing a foreground, a background and an uncertain region in the Trimap image;
the fine segmentation module is respectively connected with the image acquisition module and the semantic segmentation module and is used for introducing the data of the original image and the parameters of the Trimap image into a convolution network together for fine segmentation to obtain a preliminary extracted image;
and the image fusion module is respectively connected with the fine segmentation module and the semantic segmentation module and is used for fusing parameters of any two or more of the foreground, the background and the uncertain region in the Trimap image in the preliminary extracted image, and adjusting the preliminary extracted image to obtain a final extracted image.
9. The automatic matting system according to claim 8, wherein the fine segmentation module includes:
the convolution network comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a bottom layer, through which the data of the original image and the parameters of the Trimap image are convolved in sequence from top to bottom; and a fourth deconvolution layer, a third deconvolution layer, a second deconvolution layer and a first deconvolution layer for deconvolution are sequentially arranged above the bottommost layer from bottom to top; the fourth convolution layer is connected with the fourth deconvolution layer and used for providing parameters of the fourth convolution layer for the fourth deconvolution layer; the third convolution layer is connected with the third deconvolution layer and used for providing parameters of the third convolution layer for the third deconvolution layer; the second convolution layer is connected with the second deconvolution layer and used for providing parameters of the second convolution layer for the second deconvolution layer.
10. A computer-readable program storage medium storing computer program instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011531661.9A CN112581480A (en) | 2020-12-22 | 2020-12-22 | Automatic image matting method, system and readable storage medium thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112581480A true CN112581480A (en) | 2021-03-30 |
Family
ID=75139410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011531661.9A Pending CN112581480A (en) | 2020-12-22 | 2020-12-22 | Automatic image matting method, system and readable storage medium thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112581480A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018107825A1 (en) * | 2016-12-13 | 2018-06-21 | 华为技术有限公司 | Matting method and device |
US20190080456A1 (en) * | 2017-09-12 | 2019-03-14 | Shenzhen Keya Medical Technology Corporation | Method and system for performing segmentation of image having a sparsely distributed object |
CN109948562A (en) * | 2019-03-25 | 2019-06-28 | 浙江啄云智能科技有限公司 | A kind of safe examination system deep learning sample generating method based on radioscopic image |
CN110008832A (en) * | 2019-02-27 | 2019-07-12 | 西安电子科技大学 | Based on deep learning character image automatic division method, information data processing terminal |
CN110610509A (en) * | 2019-09-18 | 2019-12-24 | 上海大学 | Optimized matting method and system capable of assigning categories |
CN111223106A (en) * | 2019-10-28 | 2020-06-02 | 稿定(厦门)科技有限公司 | Full-automatic portrait mask matting method and system |
Non-Patent Citations (2)
Title |
---|
QUAN CHEN ET AL.: "Semantic Human Matting", ACM Multimedia 2018, pages 618-626 *
ZHANG Shenglin et al.: "Multi-focus image fusion method based on image matting technology", Journal of Computer Applications (《计算机应用》), vol. 36, page 1949 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113409224A (en) * | 2021-07-09 | 2021-09-17 | 浙江大学 | Image target pertinence enhancing method, device, equipment and storage medium |
CN113409224B (en) * | 2021-07-09 | 2023-07-04 | 浙江大学 | Image target pertinence enhancement method, device, equipment and storage medium |
WO2023137905A1 (en) * | 2022-01-21 | 2023-07-27 | 小米科技(武汉)有限公司 | Image processing method and apparatus, and electronic device and storage medium |
CN114820666A (en) * | 2022-04-29 | 2022-07-29 | 深圳万兴软件有限公司 | Method and device for increasing matting accuracy, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112581480A (en) | Automatic image matting method, system and readable storage medium thereof | |
WO2021164534A1 (en) | Image processing method and apparatus, device, and storage medium | |
CN110400323B (en) | Automatic cutout system, method and device | |
CN110889855B (en) | Certificate photo matting method and system based on end-to-end convolution neural network | |
US8655069B2 (en) | Updating image segmentation following user input | |
Xiao et al. | Example‐Based Colourization Via Dense Encoding Pyramids | |
CN114283164B (en) | Breast cancer pathological section image segmentation prediction system based on UNet3+ | |
CN110866938B (en) | Full-automatic video moving object segmentation method | |
CN105701489A (en) | Novel digital extraction and identification method and system thereof | |
CN112949754B (en) | Text recognition data synthesis method based on image fusion | |
CN110827371A (en) | Certificate photo generation method and device, electronic equipment and storage medium | |
CN112784849B (en) | Glandular segmentation method based on multi-scale attention selection | |
CN113392791A (en) | Skin prediction processing method, device, equipment and storage medium | |
CN105580050A (en) | Providing control points in images | |
CN113158856B (en) | Processing method and device for extracting target area in remote sensing image | |
CN115471901B (en) | Multi-pose face frontization method and system based on generation of confrontation network | |
CN111325263A (en) | Image processing method and device, intelligent microscope, readable storage medium and equipment | |
CN116485944A (en) | Image processing method and device, computer readable storage medium and electronic equipment | |
CN115641317A (en) | Pathological image-oriented dynamic knowledge backtracking multi-example learning and image classification method | |
CN114882282A (en) | Neural network prediction method for colorectal cancer treatment effect based on MRI and CT images | |
CN111932557B (en) | Image semantic segmentation method and device based on ensemble learning and probability map model | |
CN114494272A (en) | Metal part fast segmentation method based on deep learning | |
CN114373109A (en) | Natural image matting method and natural image matting device based on deep learning | |
CN114708274A (en) | Image segmentation method and system of T-CutMix data enhancement and three-dimensional convolution neural network based on real-time selection mechanism | |
EP2698693B1 (en) | Local image translating method and terminal with touch screen |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||