CN117809350A

CN117809350A - Artistic portrait drawing generation method based on UNet

Info

Publication number: CN117809350A
Application number: CN202311871886.2A
Authority: CN
Inventors: 李娟�; 吴斌龙
Original assignee: Fuzhou Institute of Technology
Current assignee: Fuzhou Institute of Technology
Priority date: 2023-12-29
Filing date: 2023-12-29
Publication date: 2024-04-02

Abstract

The invention relates to an artistic portrait drawing generation method based on UNet. Comprising the following steps: s1, acquiring an image containing a human face to perform human face positioning including human face detection and human face key point positioning; s2, preprocessing face images, and obtaining standard portrait images with 512x512 resolution through face alignment; s3, outputting semantic segmentation information and portrait images through a UNet network at the same time, wherein the semantic segmentation information comprises segmented portraits and background masks, and the portrait images are fine portrait images; and S4, fusing the background Mask and the detailed portrait image to obtain the artistic portrait. The invention has high automation degree and high portrait restoring precision, and provides technical support for the generation of the portrait of the person.

Description

Artistic portrait drawing generation method based on UNet

Technical Field

The invention relates to the field of image processing and computer vision, in particular to an artistic portrait drawing generation method based on UNet.

Background

The artistic portrait generation refers to that a computer generates a portrait with artistic style according to a given face image, and is one of research hotspots of an image generation algorithm. The artistic portrait is very good in favor of protecting privacy and safety, and simultaneously, the artistic portrait can display the character characteristics with fun and harmony. On one hand, people increasingly like to release artistic portrait as an avatar on Internet platforms such as tremble, microblog and the like. In this way, they not only show personal identities to acquaintances, but also protect personal privacy. On the other hand, in order to achieve a high quality portrait, a artist needs to learn and train for a long time and professionally, so as to express facial features, mental aspects and mental quality of a person, the artist needs to grasp key features of the face in a short time in addition to having a solid drawing work, and a realistic artistic portrait can be easily obtained based on mechanical arm painting creation of the artistic portrait.

Prior art 1: wu Tao et al propose a compact line portrait creation method based on semantic segmentation. Firstly, carrying out semantic segmentation on a face image, dividing the face into different areas, extracting edge contours and five-sense organ detail lines based on the different areas, and carrying out edge tangential flow optimization so as to strengthen direction information; on the basis, a line drawing is utilized to generate a reconciliation image, and parameters of a line extraction method are adjusted for different segmentation areas by utilizing the optimized edge tangential flow, a face semantic segmentation result and the reconciliation image, so that line filtering of detail irrelevant areas and line reinforcement of detail important areas are realized, and a concise line portrait is generated.

Disadvantages: the method requires a face image with a cleaner background, and face detail features (such as moles on the face) cannot be extracted.

Prior art 2: ran et al propose apdragwinggan based on generating an antagonizing network. Artists often draw artistic portraits using different drawing techniques for different facial parts, such as hair using long, continuous lines, etc. Apdragwinggan employs a plurality of convolutional neural networks based on the above theory, wherein the generator comprises one global network, six local networks, and one converged network. The global network may generate an overall structure of the face image corresponding to the portrait, while the six local networks may learn different rendering techniques for different facial regions to generate high quality facial components. And finally, fusing the characteristics generated by the global network and the local network through a fusion network to generate a high-quality portrait. Meanwhile, a distance transformation loss function is provided for the problem that the portrait and the original picture do not accurately correspond. This subtle error is tolerated by computing the sum of the pixel on each line in the actual portrait to the nearest pixel distance of the same type in the generated portrait and the sum of the distances of the generated portrait to the actual portrait. Subsequently, ran et al have proposed apdragwinggan++ on the basis of apdragwinggan, demonstrating that any generating countermeasure network with a single generator cannot generate a steel drawing style artistic portrait, and providing a theoretical explanation of the necessity of a composite structure of the global network and the local network. Meanwhile, the distance transformation loss and the line continuity loss based on nonlinear mapping are provided to improve the generation quality of the line.

Prior art 3: qin et al propose a simple and powerful deep network architecture U ² Net. The model is embedded with a layer of U-shaped structure on the basis of the original UNet, and provides a residual connection RSU structure, so that the model does not need to use an image pre-training backbone model. The RSU structure obtains more context information by fusing the low-level high-resolution local features with the high-level low-resolution global features. Meanwhile, the RSU structure uses pooling operation, so that high resolution can be obtained under the condition that the network layer number is deepened, and meanwhile, the memory occupancy rate and the calculation cost are not increased. Subsequently, the model is applied to the task of converting the face photo into the portrait and is obtainedGood results are achieved.

Disadvantages: the method requires a face image with a cleaner background.

Prior art 4: 201711324257.2-A Portrait drawing system, method and storage medium, which disclose a Portrait drawing method comprising the steps of: the acquisition step: acquiring a portrait picture of a user; and (3) color clustering: performing color clustering on the acquired portrait pictures to obtain user head features, wherein the user head features comprise facial edge areas, hairs, facial contours and facial images; a first portrait drawing step: and controlling the drawing mechanical arm to draw the portrait of the user according to the head characteristics of the user and the portrait style template. The invention also discloses a portrait drawing system and a computer readable storage medium. According to the portrait drawing system and the portrait drawing method, portrait pictures with different painting styles are obtained by obtaining the portrait images of the user and the constructed portrait style template, and portrait drawing of the user is completed through the painting mechanical arm, so that personalized portrait drawing requirements of different people are met.

The method is characterized in that a head characteristic clustering method based on portrait images is used, portrait drawing is completed by using a mechanical arm, the portrait images are required to be cleaner, and a clustering algorithm is adopted.

Prior art 5: 202011431526.7-Portrait generation method, device, electronic equipment and medium, which discloses a Portrait generation method, device, electronic equipment and medium. In the application, an image to be processed can be obtained, a face area in the image to be processed is determined, and face alignment processing is carried out on the face area image to obtain an initial portrait image; obtaining a facial geometric image in the initial portrait image through facial component information of preset times, space self-adaptive normalization and geometric loss function; and eliminating the deformation degree in the facial geometric image by using the relaxed pixel level reconstruction loss to obtain the target portrait image. By applying the technical scheme, the face geometric image in the synthetic artistic portrait can be accurately captured by the new generator by adopting cyclic utilization of face component information and improved space self-adaptive normalization and geometric loss function, and the deformation between the input image and the corresponding target image is eliminated by using the relaxed pixel level reconstruction loss, so that a robust character artistic portrait generating method is formed.

The method for generating the artistic portrait based on the face region by recycling face component information and improved space self-adaptive normalization and geometric loss function is described, the face region image is required to be cleaner, and a multi-task generator focuses on a single-style image main task and a face semantic tag auxiliary task.

Prior art 6: 202110142946.1-sketch generation method, device, electronic equipment and medium, which discloses a sketch generation method, device, electronic equipment and medium. In the application, an initial sketch image and a portrait image can be acquired, wherein the initial sketch image and the portrait image comprise face images of target users; generating a first simple image of the target user by using the initial sketch image, and generating a second simple image of the target user by using the portrait image; a target prime drawing of the face image is generated based on the first and second simple images. By applying the technical scheme, the reasonable and realistic portrait can be generated by utilizing different face stroke types, so that the problem that the simple drawing is not written practically enough when the data sets are not paired is solved.

The method is used for generating a target sketch image of a face image through a first simple image and a second simple image, and the target sketch image method requires that the face area image is cleaner.

Disclosure of Invention

Aiming at the problem that the existing face portrait generation method requires cleaner face area background, the invention aims to generate a vivid portrait, does not limit the image background to be processed at the same time, and is applicable to more scenes, so the invention provides an UNet-based artistic portrait generation method.

In order to achieve the above purpose, the technical scheme of the invention is as follows: an artistic portrait creation method based on UNet includes the following steps:

s1, acquiring an image containing a human face, and carrying out human face positioning including human face detection and human face key point positioning;

s2, preprocessing face images, and obtaining standard portrait images with 512x512 resolution through face alignment;

s3, outputting semantic segmentation information and portrait images through a UNet network at the same time, wherein the semantic segmentation information comprises segmented portraits and background masks, and the portrait images are fine portrait images;

and S4, fusing the background Mask and the detailed portrait image to obtain the artistic portrait.

In an embodiment of the present invention, the step S1 includes the following steps:

step S11, obtaining a maximum face by carrying out face detection on an image containing the face, wherein the formula for obtaining the maximum face is as follows:

wherein w is _i And h _i Is the width and height of the detection frame of the ith face, w _i-1 And h _i-1 The width and height of the detection frame of the i-1 th face are the same;

step S12, extracting 5-point face key points landmark for the maximum face in step S11 ₅ 。

In an embodiment of the present invention, the step S2 includes the following steps:

step S21, using the 5-point face key point landmark in step S12 ₅ Carrying out affine transformation operation on the maximum face image by coordinates to obtain a face image with 512x512 resolution;

and S22, filling white into pixel points which do not exist in the affine transformation operation process, and obtaining the standard face image.

In an embodiment of the present invention, the step S3 includes the following steps:

s31, preprocessing an input image, including image pairing, clipping and data enhancement, to construct a training data set;

step S32, designing a multi-task network based on the semantic segmentation and the portrait of the UNet, wherein a basic unit of the network is a residual U block, and constructing a semantic segmentation network and a portrait image network based on the UNet network based on the residual U block;

step S33, designing a loss function for training the network designed in the step S32;

step S34, training a portrait drawing image network by using the training data set;

step S35, inputting the face image with any resolution to be detected into a designed network, and generating a semantic segmentation Mask image and a portrait image by using the trained network.

In one embodiment of the present invention, the step S31 is specifically as follows:

step S311, scaling each image in the training data set into images with the same size of H1 XW 1;

step S312, converting the semantic segmentation label Mask image in the training data set from a black background white portrait to a black background white portrait;

step S313, randomly overturning the face image, the semantic segmentation label Mask image and the portrait picture label image in the training data set up and down, and randomly cutting the face image, the semantic segmentation label Mask image and the portrait picture label image into an H multiplied by W image;

step S314, normalizing the images in the training data set to give an image I _train Calculating normalized imageThe formula of (2) is as follows:

wherein I is _train Is an H W-sized image with 8-bit color depth, I _{bit_max} Is H×W in size and has pixel values of all255 images;

step S315, carrying out standardization processing on the face image, and giving an image I _train Calculating normalized imagesThe formula of (2) is as follows:

wherein I is _train Is an H x W size image with 8 bit color depth,r, G, B channel images, respectively, refer to color images.

In one embodiment of the present invention, the step S32 is specifically as follows:

step S321, a basic unit of the design network is a residual U block RSU, which comprises three parts of contents: a convolution layer, a U-shaped symmetrical coding and decoding structure and a residual error structure; convolutionally layer versus input feature map X _train Convolution obtains local feature f in feature map ₁ (X _train ) The method comprises the steps of carrying out a first treatment on the surface of the The multi-scale context information in the learning feature map of the U-shaped symmetrical coding and decoding structure is marked as U (f) ₁ (X _train ) The encoding and decoding of the U-shaped symmetrical network in the U-shaped symmetrical encoding and decoding structure are spliced by the feature map with the same scale; the output of the residual U block RSU is denoted as f ₁ (X _train )+U(f ₁ (X _train ))；X _train Size HxWxC _in ；

Step S322, a multi-task network based on the semantic segmentation and the portrait of UNet is built based on a residual U block RSU, and the multi-task network comprises four parts of contents: a multitasking shared six-stage encoder part, a semantic segmentation and portrait multitasking five-stage decoding part, a semantic segmentation fusion part and a portrait fusion part.

In one embodiment of the present invention, step S322 is specifically as follows:

step S3221, multitasking shared six-phase encoder section, en ₁ ,En ₂ ,En ₃ ,En ₄ Using residual U blocks RSU-7, RSU-6, RSU-5, RSU-4, en, respectively ₅ ,En ₆ The two encoders are not standard RSU blocks, but replace downsampling operation by adopting hole convolution;

step S3222, semantic segmentation five-stage decoding section, decoding section De ₁ ,De ₂ ,De ₃ ,De ₄ And encoder section En ₁ ,En ₂ ,En ₃ ,En ₄ Completely symmetrical; de ₂ ,De ₃ ,De ₄ ,De ₅ Each decoder characteristic diagram needs to splice the characteristic diagrams of the symmetrical encoders, so that the characteristic diagram is unchanged in size and the number of channels is 2 times;

step S3223, a portrait five-stage decoding part, wherein the structure of the decoding part is identical to that of the semantic segmentation five-stage decoding part, and the difference is that the decoding parameters of the two parts are not shared;

step S3224, semantic segmentation fusion part, firstly, en is activated by a 3x3 convolution layer, an up-sampling layer and Sigmoid ₆ ,De ₅ ,De ₄ ,De ₃ ,De ₂ ,De ₁ Part of the feature map is mapped into semantic segmentation Mask images of each stageAnd all the images are spliced and then pass through a 1x1 convolution layer and a Sigmoid activation function to generate a final semantic segmentation Mask image +.>

Step S3225, portrait fusion section, first activated by a 3x3 convolution layer, upsampling layer, sigmoidEn where the function partitions semantics ₆ ,De ₅ ,De ₄ ,De ₃ ,De ₂ ,De ₁ Part of the feature map is mapped into portrait image of each stageAll the images are spliced and then pass through a convolution layer of 1x1 and a Sigmoid activation function to generate a final portrait image +.>

In one embodiment of the present invention, step S33 is specifically as follows:

step S331, segmenting the semantic Mask image obtained in step S3224 and step S325And portrait picture +.> Respectively constructing a loss function with a standard semantic segmentation Mask label image and a portrait label image, adopting standard binary cross entropy loss, and adopting the following formula:

wherein H, W is the high and wide resolution of the image, r, c is a pixel point on the image, P _G(r,c) Is the value of r, c pixel point, P of the label image _S(r,c) Is predictedThe value of the label image r, c pixel points;

step S332, step S331 is repeatedThe formula is as follows:

wherein,representation->Standard binary cross entropy loss with semantically segmented Mask tag image, < >>Representation->Loss of standard binary cross entropy with portrait label images.

In one embodiment of the present invention, the step S34 is specifically as follows:

step S341, selecting a random image X in the training data set constructed in the step S31;

step S342, training the semantic segmentation network and portrait drawing image network for image encoding and decoding, inputting an image X, and obtaining a final semantic segmentation Mask image through the encoding and decoding network of the UNet networkAnd portrait picture +.> Calculating the total loss function loss +.in step S332>

Step S343, calculating the gradient of each parameter in the UNet network by using a back propagation method, and updating the parameters by using an Adam optimization method;

step S344, step S341 to step S343 are one iteration, iterating 1000 times, and randomly sampling a plurality of image pairs as a batch for training in each iteration process.

In one embodiment of the present invention, step S4 is specifically as follows:

the semantic segmentation Mask image and the portrait image can be obtained through step S35, then the semantic segmentation Mask image and the portrait image are fused, specifically, when the pixel value of the semantic segmentation Mask image is white, the pixel value of the semantic segmentation Mask image is reserved, otherwise, the pixel is replaced by the pixel value of the portrait image, and the formula is as follows:

where r, c are a pixel point on the image,is a semantically segmented Mask image, +.>Is a portrait image, and S (r, c) is a fused artistic portrait.

Compared with the prior art, the invention has the following beneficial effects: aiming at the problem that the existing face portrait generation method requires cleaner background of a face area, the invention aims to generate a vivid portrait, does not limit the image background to be processed at the same time, and is applicable to more scenes; the method of the invention can output high-quality artistic face portrait without depending on the background of the face image.

Drawings

Fig. 1 is a schematic diagram of a method for generating artistic portrait based on UNet according to the present invention.

Fig. 2 is a schematic diagram of a multitasking method for designing UNet-based single-style portraits and semantic segmentation according to the present invention.

Fig. 3 is a diagram of a multi-task model for designing UNet-based single-style portraits and semantic segmentation according to the present invention.

Fig. 4 is a residual U block RSU employed in the present invention.

Fig. 5 is a schematic diagram of an artistic portrait drawing method according to the present invention.

Detailed Description

The technical scheme of the invention is specifically described below with reference to the accompanying drawings.

As shown in fig. 1, the invention provides a method for generating artistic portrait based on UNet, which comprises the following steps:

s1, acquiring an image containing a human face to perform human face positioning including human face detection and human face key point positioning;

s2, preprocessing face images, and obtaining standard face images with 512x512 resolution through face alignment;

s4, fusing the background Mask and the detailed portrait image to obtain an artistic portrait;

further, step S1 includes the steps of:

step S11, carrying out face detection on an image acquisition image containing a face to obtain a maximum face, wherein the formula for obtaining the maximum face is as follows:

wherein w is _i And h _i Is the width and height of the detection frame of the ith face, w _i-1 And h _i-1 Is the detection frame width and height of the i-1 th face.

Step S12, step pairMaximum face extraction 5-point face key point landmark in step S11 ₅ 。

Further, step S2 includes the steps of:

step S21, using the 5-point face key point landmark in step S12 ₅ And carrying out affine transformation operation on the maximum face image by the coordinates to obtain a face image with 512x512 resolution.

Further, as shown in fig. 2, step S3 includes the following steps:

and S31, preprocessing the input image, including image pairing, clipping and data enhancement, to construct a training data set.

Further, step S31 includes the steps of:

step S311, scaling each image in the training data set into an image of the same size of h1×w1.

Step S312, converting the semantic segmentation label Mask image in the training data set from a black background white portrait to a black background white portrait.

Step S313, randomly overturning the face image, the semantic segmentation label Mask image and the face portrait image in the training data set up and down, and randomly cutting the face image, the semantic segmentation label Mask image and the face portrait image into an H multiplied by W image.

Step S314, carrying out normalization processing on the training image. Given image I _train Calculating normalized imageThe formula of (2) is as follows:

wherein I is _train Is an H W-sized image with 8-bit color depth, I _{bit_max} Is an image of h×w size and has pixel values of 255.

Step S315, standardizing the face imageAnd (5) managing. Given image i _train Calculating normalized imagesThe formula of (2) is as follows:

Step S32, as shown in FIG. 3, a multi-task network based on the semantic segmentation and the portrait is designed, the basic unit of the network is a residual U block, and the UNet network is built for the semantic segmentation network and the portrait network based on the residual U block.

Further, step S32 includes the steps of:

step S321, as shown in fig. 4, the basic unit residual U block RSU of the design network includes three parts of contents: convolution layer, U-shaped symmetrical coding and decoding structure and residual structure. Convolutional layer alignment feature map X _train Obtaining local feature f in feature map through convolution layer ₁ (X _train ) The method comprises the steps of carrying out a first treatment on the surface of the The multi-scale context information in the learning feature map of the U-shaped symmetrical encoding and decoding structure can be recorded as U (f) ₁ (X _train ) The encoding and decoding of the U-shaped symmetrical network are spliced with the feature map with the same scale; the output of the residual U block can be denoted as f ₁ (X _train )+U(f ₁ (X _train ))。

Wherein X is _train To input feature graphs, e.g. HxWxC _in 。

Step S322, as shown in fig. 3, builds a multi-task network of semantic segmentation and portrait of UNet based on the basic unit residual U block, which includes four parts of contents: a multitasking shared six-stage encoder part, a semantic segmentation and portrait multitasking five-stage decoding part, a semantic segmentation fusion part and a portrait fusion part.

Further, step S322 includes the steps of:

step S3221, a multitasking shared six-phase encoder section. En is provided with ₁ ,En ₂ ,En ₃ ,En ₄ Residual U blocks RSU-7, RSU-6, RSU-5, RSU-4, en used respectively ₅ ,En ₆ The two encoders are not standard RSU blocks, but instead replace the downsampling operation with a hole convolution.

Step S3222, a semantic segmentation five-stage decoding part. Decoding part De ₁ ,De ₂ ,De ₃ ,De ₄ And encoder section En ₁ ,En ₂ ,En ₃ ,En ₄ Is completely symmetrical. In particular, de ₂ ,De ₃ ,De ₄ ,De ₅ Each decoder feature map needs to splice feature maps of the symmetrical encoder, so that the feature map is unchanged in size and 2 times of channels are realized, and part of the decoder feature maps have more semantic information of different layers.

Step S3223, portrait five-stage decoding section. The decoding part structure is identical to the semantic segmentation five-stage decoding part structure, except that the decoding parameters of the two parts are not shared.

Step S3224, a semantic segmentation fusion part. First En is activated by a 3x3 convolution layer, up-sampling layer, sigmoid activation function ₆ ,De ₅ ,De ₄ ,De ₃ ,De ₂ ,De ₁ Part of the feature map is mapped into semantic segmentation Mask images of each stageAll the components are spliced and then are generated through a 1x1 convolution layer and a Sigmoid activation functionFinal semantic segmentation Mask image +.>

Step S3225, a portrait fusion section. First, the En of semantic segmentation is activated by a 3x3 convolution layer, an up-sampling layer and Sigmoid ₆ ,De ₅ ,De ₄ ,De ₃ ,De ₂ ,De ₁ Part of the feature map is mapped into portrait image of each stageAll the images are spliced and then pass through a convolution layer of 1x1 and a Sigmoid activation function to generate a final portrait image +.>

Step S33, designing a loss function for training the network designed in step S32.

Further, step S33 includes the steps of:

step S331, the semantic segmentation Mask image obtained in step S3224 and step S325 is obtainedAnd portrait picture +.> Respectively constructing a loss function with a standard semantic segmentation Mask label image and a portrait label image, adopting standard binary cross entropy loss, and adopting the following formula:

wherein H, W is the high and wide resolution of the image, r, c is a pixel point on the image, P _G(r,c) R, c-image of label imageThe value of the pixel, P _S(r,c) Is predictedThe value of the pixel point of the label image r, c.

Step S332, step S331 is repeatedThe formula is as follows:

Step S34, training the portrait drawing image network by using the training data set.

Step S341, selecting a random training image X in the data set constructed in the step S31.

Step S342, training the semantic segmentation and portrait drawing network of image coding and decoding. The input image X is encoded and decoded through a UNet network to obtain a final semantic segmentation Mask imageAnd portrait drawing imageCalculating the total loss function loss +.in step S332>

Step S343, calculating the gradient of each parameter in the UNet network by using a back propagation method, and updating the parameters by using an Adam optimization method.

In step S344, the above steps are one iteration of the training process, the whole training process needs 1000 iterations, and in each iteration process, a plurality of image pairs are randomly sampled to train as a batch.

Further, the step S4 specifically includes:

as shown in fig. 5, a semantic segmentation Mask image and a portrait image can be obtained through step S35, and then the semantic segmentation Mask is used for the pixels of the face image, the pixels of the portrait image are used for the pixels of the background, and the background pixels of the semantic segmentation Mask are used for the pixels of the background. The formula is as follows:

where r, c are a pixel point on the image,is a semantically segmented Mask image, +.>Is a portrait image. S (r, c) is the artistic portrait after fusion.

The above is a preferred embodiment of the present invention, and all changes made according to the technical solution of the present invention belong to the protection scope of the present invention when the generated functional effects do not exceed the scope of the technical solution of the present invention.

Claims

1. The method for generating the artistic portrait based on UNet is characterized by comprising the following steps:

2. The UNet-based artistic portrait creation method according to claim 1, wherein step S1 includes the steps of:

3. The method for generating artistic portrait based on UNet according to claim 2, wherein step S2 comprises the steps of:

4. The method for generating artistic portrait based on UNet according to claim 1, wherein step S3 comprises the steps of:

5. The method of generating artistic portrait based on UNet according to claim 4, wherein step S31 is specifically as follows:

wherein I is _train Is an H W-sized image with 8-bit color depth, I _{bit_max} An image of h×w size and having 255 pixel values;

6. The method of generating artistic portrait based on UNet according to claim 4, wherein step S32 is specifically as follows:

step S321, a basic unit of the design network is a residual U block RSU, which comprises three parts of contents: convolution layer, U-shaped symmetrical coding and decoding structure,Residual error structure; convolutionally layer versus input feature map X _train Convolution obtains local feature f in feature map ₁ (X _train ) The method comprises the steps of carrying out a first treatment on the surface of the The multi-scale context information in the learning feature map of the U-shaped symmetrical coding and decoding structure is marked as U (f) ₁ (X _train ) The encoding and decoding of the U-shaped symmetrical network in the U-shaped symmetrical encoding and decoding structure are spliced by the feature map with the same scale; the output of the residual U block RSU is denoted as f ₁ (X _train )+U(f ₁ (X _train ))；X _train Size H xWx C _in ；

7. The method of generating an artistic portrait based on UNet according to claim 6, wherein step S322 is specifically as follows:

step S3224, semantic segmentation fusion partFirst, en is activated by a 3x3 convolution layer, an up-sampling layer, a Sigmoid activation function ₆ ,De ₅ ,De ₄ ,De ₃ ,De ₂ ,De ₁ Part of the feature map is mapped into semantic segmentation Mask images of each stageAnd all the images are spliced and then pass through a 1x1 convolution layer and a Sigmoid activation function to generate a final semantic segmentation Mask image +.>

Step S3225, portrait fusion part, firstly, the En of semantic segmentation is performed through a 3x3 convolution layer, an up-sampling layer and a Sigmoid activation function ₆ ,De ₅ ,De ₄ ,De ₃ ,De ₂ ,De ₁ Part of the feature map is mapped into portrait image of each stageAll the images are spliced and then pass through a convolution layer of 1x1 and a Sigmoid activation function to generate a final portrait image +.>

8. The method for generating artistic portrait based on UNet according to claim 7, wherein step S33 is specifically as follows:

step S332, step S331 is repeatedThe formula is as follows:

wherein,representation->Standard binary cross entropy loss with semantically segmented Mask tag image, < >>Representation ofLoss of standard binary cross entropy with portrait label images.

9. The method for generating artistic portrait based on UNet according to claim 8, wherein step S34 is specifically as follows:

10. The method for generating artistic portrait based on UNet according to claim 4, wherein step S4 is specifically as follows:

the semantic segmentation Mask image and the portrait image can be obtained through step S35, and then the semantic segmentation Mask image and the portrait image are fused, specifically, when the pixel value of the semantic segmentation Mask image is white, the pixel value of the semantic segmentation Mask image is reserved, otherwise, the pixel is replaced by the pixel value of the portrait image, and the formula is as follows: