CN117809350A - Artistic portrait drawing generation method based on UNet


Info

Publication number: CN117809350A
Application number: CN202311871886.2A
Authority: CN (China)
Prior art keywords: image, portrait, semantic segmentation, face, network
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 李娟, 吴斌龙
Applicant/Assignee: Fuzhou Institute of Technology
Filed: 2023-12-29 by Fuzhou Institute of Technology; Published: 2024-04-02

Landscapes

  • Image Processing (AREA)

Abstract

The invention relates to a UNet-based artistic portrait drawing generation method comprising the following steps: S1, acquiring an image containing a human face and performing face positioning, including face detection and face key point positioning; S2, preprocessing the face image and obtaining a standard portrait image with 512x512 resolution through face alignment; S3, simultaneously outputting semantic segmentation information and a portrait drawing image through a UNet network, wherein the semantic segmentation information comprises the segmented portrait and background Mask, and the portrait drawing image is a fine portrait drawing; and S4, fusing the background Mask and the fine portrait drawing image to obtain the artistic portrait. The invention has a high degree of automation and high portrait restoration accuracy, and provides technical support for the generation of artistic portraits of people.

Description

Artistic portrait drawing generation method based on UNet
Technical Field
The invention relates to the field of image processing and computer vision, in particular to an artistic portrait drawing generation method based on UNet.
Background
Artistic portrait generation means that a computer generates a portrait drawing with an artistic style from a given face image; it is one of the research hotspots of image generation algorithms. An artistic portrait helps protect privacy and safety while displaying a person's characteristics in a light-hearted, appealing way. On the one hand, people increasingly like to publish artistic portraits as avatars on Internet platforms such as Douyin and Weibo; in this way, they not only show a personal identity to acquaintances but also protect personal privacy. On the other hand, producing a high-quality portrait drawing requires long, professional study and practice: to express the facial features, spirit and temperament of a person, an artist needs not only solid drawing skills but also the ability to grasp the key features of a face in a short time. Automatic artistic portrait generation, for example robotic-arm painting driven by a generated artistic portrait, makes a realistic artistic portrait easy to obtain.
Prior art 1: wu Tao et al propose a compact line portrait creation method based on semantic segmentation. Firstly, carrying out semantic segmentation on a face image, dividing the face into different areas, extracting edge contours and five-sense organ detail lines based on the different areas, and carrying out edge tangential flow optimization so as to strengthen direction information; on the basis, a line drawing is utilized to generate a reconciliation image, and parameters of a line extraction method are adjusted for different segmentation areas by utilizing the optimized edge tangential flow, a face semantic segmentation result and the reconciliation image, so that line filtering of detail irrelevant areas and line reinforcement of detail important areas are realized, and a concise line portrait is generated.
Disadvantages: the method requires a face image with a clean background, and cannot extract fine facial details (such as moles on the face).
Prior art 2: ran et al propose apdragwinggan based on generating an antagonizing network. Artists often draw artistic portraits using different drawing techniques for different facial parts, such as hair using long, continuous lines, etc. Apdragwinggan employs a plurality of convolutional neural networks based on the above theory, wherein the generator comprises one global network, six local networks, and one converged network. The global network may generate an overall structure of the face image corresponding to the portrait, while the six local networks may learn different rendering techniques for different facial regions to generate high quality facial components. And finally, fusing the characteristics generated by the global network and the local network through a fusion network to generate a high-quality portrait. Meanwhile, a distance transformation loss function is provided for the problem that the portrait and the original picture do not accurately correspond. This subtle error is tolerated by computing the sum of the pixel on each line in the actual portrait to the nearest pixel distance of the same type in the generated portrait and the sum of the distances of the generated portrait to the actual portrait. Subsequently, ran et al have proposed apdragwinggan++ on the basis of apdragwinggan, demonstrating that any generating countermeasure network with a single generator cannot generate a steel drawing style artistic portrait, and providing a theoretical explanation of the necessity of a composite structure of the global network and the local network. Meanwhile, the distance transformation loss and the line continuity loss based on nonlinear mapping are provided to improve the generation quality of the line.
Disadvantages: the method requires a face image with a clean background, and cannot extract fine facial details (such as moles on the face).
Prior art 3: qin et al propose a simple and powerful deep network architecture U 2 Net. The model is embedded with a layer of U-shaped structure on the basis of the original UNet, and provides a residual connection RSU structure, so that the model does not need to use an image pre-training backbone model. The RSU structure obtains more context information by fusing the low-level high-resolution local features with the high-level low-resolution global features. Meanwhile, the RSU structure uses pooling operation, so that high resolution can be obtained under the condition that the network layer number is deepened, and meanwhile, the memory occupancy rate and the calculation cost are not increased. Subsequently, the model is applied to the task of converting the face photo into the portrait and is obtainedGood results are achieved.
Disadvantages: the method requires a face image with a clean background.
Prior art 4: 201711324257.2 - Portrait drawing system, method and storage medium, which discloses a portrait drawing method comprising: an acquisition step, acquiring a portrait picture of a user; a color clustering step, performing color clustering on the acquired portrait picture to obtain the user's head features, including the facial edge region, hair, facial contour and facial image; and a first portrait drawing step, controlling a drawing mechanical arm to draw the user's portrait according to the head features and a portrait style template. The invention also discloses a portrait drawing system and a computer-readable storage medium. By combining the user's portrait image with constructed portrait style templates, portrait pictures in different painting styles are obtained and drawn by the mechanical arm, meeting the personalized portrait drawing needs of different people.
This method clusters head features of the portrait image and completes the portrait drawing with a mechanical arm; it requires a clean portrait image and relies on a clustering algorithm.
Prior art 5: 202011431526.7 - Portrait generation method, device, electronic equipment and medium. In this application, an image to be processed is obtained, a face region in the image is determined, and face alignment is applied to the face region image to obtain an initial portrait image; a facial geometric image is obtained from the initial portrait image through repeated use of facial component information, spatially-adaptive normalization and a geometric loss function; and the deformation in the facial geometric image is eliminated with a relaxed pixel-level reconstruction loss to obtain the target portrait image. By recycling facial component information with improved spatially-adaptive normalization and geometric loss, the generator accurately captures the facial geometry of the synthetic artistic portrait, and the relaxed pixel-level reconstruction loss removes the deformation between the input image and the corresponding target image, yielding a robust artistic portrait generation method.
This method generates artistic portraits from the face region by recycling facial component information with improved spatially-adaptive normalization and a geometric loss function; it requires a clean face region image, and its multi-task generator attends to a single-style image main task and a face semantic label auxiliary task.
Prior art 6: 202110142946.1 - Sketch generation method, device, electronic equipment and medium. In this application, an initial sketch image and a portrait image are acquired, both containing the face of a target user; a first simplified image of the target user is generated from the initial sketch image, and a second simplified image from the portrait image; a target sketch of the face image is then generated based on the first and second simplified images. By exploiting different face stroke types, a reasonable and realistic sketch can be generated, alleviating the problem that sketches look unrealistic when the data sets are unpaired.
This method generates a target sketch of the face image from the first and second simplified images; it requires a clean face region image.
Disclosure of Invention
Aiming at the problem that existing face portrait generation methods require a clean face-region background, the invention aims to generate a vivid portrait drawing without restricting the background of the image to be processed, so that it is applicable to more scenes. The invention therefore provides a UNet-based artistic portrait drawing generation method.
In order to achieve the above purpose, the technical scheme of the invention is as follows: a UNet-based artistic portrait drawing generation method comprises the following steps:
S1, acquiring an image containing a human face, and performing face positioning, including face detection and face key point positioning;
S2, preprocessing the face image, and obtaining a standard portrait image with 512x512 resolution through face alignment;
S3, simultaneously outputting semantic segmentation information and a portrait drawing image through a UNet network, wherein the semantic segmentation information comprises the segmented portrait and background Mask, and the portrait drawing image is a fine portrait drawing;
and S4, fusing the background Mask and the fine portrait drawing image to obtain the artistic portrait.
In an embodiment of the present invention, the step S1 includes the following steps:
step S11, obtaining a maximum face by carrying out face detection on an image containing the face, wherein the formula for obtaining the maximum face is as follows:
$$\text{face}_{\max} = \begin{cases} \text{face}_i, & w_i \cdot h_i > w_{i-1} \cdot h_{i-1} \\ \text{face}_{i-1}, & \text{otherwise} \end{cases}$$
where $w_i$ and $h_i$ are the width and height of the detection box of the $i$-th face, and $w_{i-1}$ and $h_{i-1}$ are the width and height of the detection box of the $(i-1)$-th face;
step S12, extracting the 5-point face key points $landmark_5$ for the maximum face obtained in step S11.
In an embodiment of the present invention, the step S2 includes the following steps:
step S21, performing an affine transformation on the maximum face image using the coordinates of the 5-point face key points $landmark_5$ from step S12, to obtain a face image with 512x512 resolution;
and S22, filling the pixels left undefined by the affine transformation with white, to obtain the standard face image.
In an embodiment of the present invention, the step S3 includes the following steps:
S31, preprocessing the input images, including image pairing, cropping and data enhancement, to construct a training data set;
step S32, designing a UNet-based multi-task network for semantic segmentation and portrait drawing, wherein the basic unit of the network is the residual U block, from which the semantic segmentation branch and the portrait drawing branch of the UNet network are built;
step S33, designing a loss function for training the network designed in the step S32;
step S34, training a portrait drawing image network by using the training data set;
step S35, inputting a face image of arbitrary resolution to be processed into the designed network, and generating the semantic segmentation Mask image and the portrait drawing image with the trained network.
In one embodiment of the present invention, the step S31 is specifically as follows:
step S311, scaling each image in the training data set to the same size $H_1 \times W_1$;
step S312, converting the semantic segmentation label Mask images in the training data set from a white portrait on a black background to a black portrait on a white background, so that white marks the background;
step S313, randomly flipping the face images, semantic segmentation label Mask images and portrait drawing label images in the training data set vertically, and randomly cropping them into $H \times W$ images;
step S314, normalizing the images in the training data set: given an image $I_{train}$, the normalized image $\hat{I}_{train}$ is calculated as
$$\hat{I}_{train} = \frac{I_{train}}{I_{bit\_max}}$$
(element-wise), where $I_{train}$ is an $H \times W$ image with 8-bit color depth, and $I_{bit\_max}$ is an $H \times W$ image whose pixel values are all 255;
step S315, standardizing the face image: given an image $I_{train}$, the standardized image $\tilde{I}_{train}$ is calculated per channel as
$$\tilde{I}_c = \frac{I_c - \mu_c}{\sigma_c}, \quad c \in \{R, G, B\}$$
where $I_{train}$ is an $H \times W$ image with 8-bit color depth, $I_R$, $I_G$, $I_B$ are the R, G, B channel images of the color image, and $\mu_c$, $\sigma_c$ are the per-channel mean and standard deviation.
In one embodiment of the present invention, the step S32 is specifically as follows:
step S321, the basic unit of the network is the residual U block RSU, which comprises three parts: a convolution layer, a U-shaped symmetric encoding-decoding structure, and a residual connection; the convolution layer convolves the input feature map $X_{train}$, of size $H \times W \times C_{in}$, to obtain the local feature map $f_1(X_{train})$; the U-shaped symmetric encoding-decoding structure learns multi-scale context information from the feature map, denoted $U(f_1(X_{train}))$, with same-scale feature maps concatenated between the encoder and decoder of the U-shaped symmetric network; the output of the residual U block RSU is denoted $f_1(X_{train}) + U(f_1(X_{train}))$;
step S322, a UNet-based multi-task network for semantic segmentation and portrait drawing is built from the residual U block RSU and comprises four parts: a multi-task shared six-stage encoder part, semantic segmentation and portrait drawing multi-task five-stage decoding parts, a semantic segmentation fusion part and a portrait drawing fusion part.
In one embodiment of the present invention, step S322 is specifically as follows:
step S3221, the multi-task shared six-stage encoder part: $En_1$, $En_2$, $En_3$, $En_4$ use residual U blocks RSU-7, RSU-6, RSU-5 and RSU-4 respectively; $En_5$ and $En_6$ are not standard RSU blocks but replace the downsampling operations with dilated (atrous) convolutions;
step S3222, the semantic segmentation five-stage decoding part: the decoders $De_1$, $De_2$, $De_3$, $De_4$ are completely symmetric with the encoders $En_1$, $En_2$, $En_3$, $En_4$; in $De_2$, $De_3$, $De_4$, $De_5$, each decoder concatenates its feature map with that of the symmetric encoder, so the feature map size is unchanged while the number of channels is doubled;
step S3223, a portrait five-stage decoding part, wherein the structure of the decoding part is identical to that of the semantic segmentation five-stage decoding part, and the difference is that the decoding parameters of the two parts are not shared;
step S3224, the semantic segmentation fusion part: first, a 3x3 convolution layer, an upsampling layer and a Sigmoid activation function map the feature maps of $En_6$, $De_5$, $De_4$, $De_3$, $De_2$, $De_1$ into the stage-wise semantic segmentation Mask images $M^{(1)}, \ldots, M^{(6)}$; all of them are concatenated and passed through a 1x1 convolution layer and a Sigmoid activation function to generate the final semantic segmentation Mask image $M^{fuse}$;
step S3225, the portrait drawing fusion part: in the same way, a 3x3 convolution layer, an upsampling layer and a Sigmoid activation function map the feature maps of the shared encoder $En_6$ and the portrait decoders $De_5$, $De_4$, $De_3$, $De_2$, $De_1$ into the stage-wise portrait drawing images $D^{(1)}, \ldots, D^{(6)}$; all of them are concatenated and passed through a 1x1 convolution layer and a Sigmoid activation function to generate the final portrait drawing image $D^{fuse}$.
In one embodiment of the present invention, step S33 is specifically as follows:
step S331, the semantic segmentation Mask images $M^{(1)}, \ldots, M^{(6)}, M^{fuse}$ obtained in step S3224 and the portrait drawing images $D^{(1)}, \ldots, D^{(6)}, D^{fuse}$ obtained in step S3225 are each compared with the standard semantic segmentation Mask label image and the portrait drawing label image respectively, using the standard binary cross-entropy loss:
$$\ell = -\sum_{(r,c)}^{(H,W)} \left[ P_{G(r,c)} \log P_{S(r,c)} + \left(1 - P_{G(r,c)}\right) \log\left(1 - P_{S(r,c)}\right) \right]$$
where H, W are the height and width of the image, (r, c) is a pixel on the image, $P_{G(r,c)}$ is the value of the label image at pixel (r, c), and $P_{S(r,c)}$ is the value of the predicted image at pixel (r, c);
step S332, the losses of step S331 are summed into the total loss $\mathcal{L}$:
$$\mathcal{L} = \sum_{m=1}^{6} \left( \ell_{seg}^{(m)} + \ell_{draw}^{(m)} \right) + \ell_{seg}^{fuse} + \ell_{draw}^{fuse}$$
where $\ell_{seg}^{(m)}$ denotes the standard binary cross-entropy loss between $M^{(m)}$ and the semantic segmentation Mask label image, and $\ell_{draw}^{(m)}$ denotes the standard binary cross-entropy loss between $D^{(m)}$ and the portrait drawing label image.
In one embodiment of the present invention, the step S34 is specifically as follows:
step S341, selecting a random image X in the training data set constructed in the step S31;
step S342, training the encoding-decoding semantic segmentation network and portrait drawing network: an image X is input and passes through the encoding-decoding UNet network to obtain the final semantic segmentation Mask image $M^{fuse}$ and portrait drawing image $D^{fuse}$, and the total loss $\mathcal{L}$ of step S332 is calculated;
Step S343, calculating the gradient of each parameter in the UNet network by using a back propagation method, and updating the parameters by using an Adam optimization method;
step S344, steps S341 to S343 constitute one iteration; training runs for 1000 iterations, and in each iteration several image pairs are randomly sampled as a batch for training.
In one embodiment of the present invention, step S4 is specifically as follows:
the semantic segmentation Mask image and the portrait drawing image are obtained through step S35 and then fused: when a pixel of the semantic segmentation Mask image is white, its value is kept; otherwise the pixel is replaced by the corresponding pixel of the portrait drawing image, according to
$$S(r,c) = \begin{cases} M(r,c), & M(r,c) = 255 \\ D(r,c), & \text{otherwise} \end{cases}$$
where (r, c) is a pixel on the image, $M$ is the semantic segmentation Mask image, $D$ is the portrait drawing image, and $S(r,c)$ is the fused artistic portrait.
Compared with the prior art, the invention has the following beneficial effects: aiming at the problem that existing face portrait generation methods require a clean face-region background, the invention generates a vivid portrait drawing without restricting the background of the image to be processed and is applicable to more scenes; the method can output a high-quality artistic face portrait regardless of the background of the face image.
Drawings
Fig. 1 is a schematic diagram of a method for generating artistic portrait based on UNet according to the present invention.
Fig. 2 is a schematic diagram of a multitasking method for designing UNet-based single-style portraits and semantic segmentation according to the present invention.
Fig. 3 is a diagram of a multi-task model for designing UNet-based single-style portraits and semantic segmentation according to the present invention.
Fig. 4 is a residual U block RSU employed in the present invention.
Fig. 5 is a schematic diagram of an artistic portrait drawing method according to the present invention.
Detailed Description
The technical scheme of the invention is specifically described below with reference to the accompanying drawings.
As shown in fig. 1, the invention provides a method for generating an artistic portrait drawing based on UNet, which comprises the following steps:
S1, acquiring an image containing a human face, and performing face positioning, including face detection and face key point positioning;
S2, preprocessing the face image, and obtaining a standard face image with 512x512 resolution through face alignment;
S3, simultaneously outputting semantic segmentation information and a portrait drawing image through a UNet network, wherein the semantic segmentation information comprises the segmented portrait and background Mask, and the portrait drawing image is a fine portrait drawing;
S4, fusing the background Mask and the fine portrait drawing image to obtain the artistic portrait;
further, step S1 includes the steps of:
step S11, performing face detection on the acquired image containing the face to obtain the maximum face, wherein the formula for obtaining the maximum face is as follows:
$$\text{face}_{\max} = \begin{cases} \text{face}_i, & w_i \cdot h_i > w_{i-1} \cdot h_{i-1} \\ \text{face}_{i-1}, & \text{otherwise} \end{cases}$$
where $w_i$ and $h_i$ are the width and height of the detection box of the $i$-th face, and $w_{i-1}$ and $h_{i-1}$ are the width and height of the detection box of the $(i-1)$-th face.
Step S12, extracting the 5-point face key points $landmark_5$ for the maximum face obtained in step S11.
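For illustration, the largest-face selection of step S11 reduces to an area comparison over the detection boxes. A minimal sketch, assuming a face detector that returns boxes as [x, y, w, h] rows (the patent does not name a specific detector):

```python
import numpy as np

def select_largest_face(boxes: np.ndarray) -> int:
    """Return the index of the face whose detection box has the largest
    area, i.e. the face maximizing w_i * h_i (step S11)."""
    areas = boxes[:, 2] * boxes[:, 3]  # w_i * h_i for each detected face
    return int(np.argmax(areas))

# Example: three detected faces; the second box (index 1) is largest.
boxes = np.array([[10, 10, 80, 100],
                  [200, 50, 150, 180],
                  [400, 30, 60, 70]])
assert select_largest_face(boxes) == 1
```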
Further, step S2 includes the steps of:
Step S21, performing an affine transformation on the maximum face image using the coordinates of the 5-point face key points $landmark_5$ from step S12, to obtain a face image with 512x512 resolution.
Step S22, filling the pixels left undefined by the affine transformation with white, to obtain the standard face image.
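A possible OpenCV implementation of steps S21 and S22 is sketched below. The 5-point reference template is an assumption (the widely used ArcFace 112x112 layout scaled to 512x512); the patent does not publish its alignment coordinates:

```python
import cv2
import numpy as np

# Assumed canonical positions of left eye, right eye, nose tip and the two
# mouth corners; borrowed from the common ArcFace template, scaled to 512.
TEMPLATE_512 = np.array([[38.2946, 51.6963],
                         [73.5318, 51.5014],
                         [56.0252, 71.7366],
                         [41.5493, 92.3655],
                         [70.7299, 92.2041]], dtype=np.float32) * (512.0 / 112.0)

def align_face(image: np.ndarray, landmark5: np.ndarray) -> np.ndarray:
    """Estimate a similarity transform from the detected 5-point landmarks
    to the template (step S21) and warp to 512x512, filling pixels that do
    not exist in the source image with white (step S22)."""
    M, _ = cv2.estimateAffinePartial2D(landmark5.astype(np.float32),
                                       TEMPLATE_512)
    return cv2.warpAffine(image, M, (512, 512), flags=cv2.INTER_LINEAR,
                          borderValue=(255, 255, 255))  # white fill
```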
Further, as shown in fig. 2, step S3 includes the following steps:
Step S31, preprocessing the input images, including image pairing, cropping and data enhancement, to construct a training data set.
Further, step S31 includes the steps of:
Step S311, scaling each image in the training data set to the same size $H_1 \times W_1$.
Step S312, converting the semantic segmentation label Mask images in the training data set from a white portrait on a black background to a black portrait on a white background, so that white marks the background.
Step S313, randomly flipping the face images, semantic segmentation label Mask images and portrait drawing label images in the training data set vertically, and randomly cropping them into $H \times W$ images.
Step S314, normalizing the training images. Given an image $I_{train}$, the normalized image $\hat{I}_{train}$ is calculated as
$$\hat{I}_{train} = \frac{I_{train}}{I_{bit\_max}}$$
(element-wise), where $I_{train}$ is an $H \times W$ image with 8-bit color depth, and $I_{bit\_max}$ is an $H \times W$ image whose pixel values are all 255.
Step S315, standardizing the face image. Given an image $I_{train}$, the standardized image $\tilde{I}_{train}$ is calculated per channel as
$$\tilde{I}_c = \frac{I_c - \mu_c}{\sigma_c}, \quad c \in \{R, G, B\}$$
where $I_{train}$ is an $H \times W$ image with 8-bit color depth, $I_R$, $I_G$, $I_B$ are the R, G, B channel images of the color image, and $\mu_c$, $\sigma_c$ are the per-channel mean and standard deviation.
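Steps S314 and S315 might be implemented as below. The per-channel mean and standard deviation are not stated in the patent; ImageNet statistics are shown as a placeholder:

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # assumed values
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)   # assumed values

def preprocess(img_u8: np.ndarray) -> np.ndarray:
    """Normalize an 8-bit HxWx3 image by 255 (step S314), then standardize
    each R, G, B channel (step S315)."""
    x = img_u8.astype(np.float32) / 255.0  # I_train / I_bit_max
    return (x - MEAN) / STD                # per-channel standardization
```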
Step S32, as shown in FIG. 3, designing the UNet-based multi-task network for semantic segmentation and portrait drawing; the basic unit of the network is the residual U block, from which the semantic segmentation branch and the portrait drawing branch of the UNet network are built.
Further, step S32 includes the steps of:
step S321, as shown in fig. 4, the basic unit residual U block RSU of the design network includes three parts of contents: convolution layer, U-shaped symmetrical coding and decoding structure and residual structure. Convolutional layer alignment feature map X train Obtaining local feature f in feature map through convolution layer 1 (X train ) The method comprises the steps of carrying out a first treatment on the surface of the The multi-scale context information in the learning feature map of the U-shaped symmetrical encoding and decoding structure can be recorded as U (f) 1 (X train ) The encoding and decoding of the U-shaped symmetrical network are spliced with the feature map with the same scale; the output of the residual U block can be denoted as f 1 (X train )+U(f 1 (X train ))。
Wherein X is train To input feature graphs, e.g. HxWxC in
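A simplified PyTorch sketch of a residual U block in the spirit of step S321 and U²-Net follows; the layer widths, pooling placement and single dilated bottom layer are assumptions rather than the patent's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBNReLU(nn.Module):
    """3x3 convolution + BatchNorm + ReLU, the basic layer inside the RSU."""
    def __init__(self, cin, cout, dilation=1):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, 3, padding=dilation, dilation=dilation)
        self.bn = nn.BatchNorm2d(cout)

    def forward(self, x):
        return F.relu(self.bn(self.conv(x)))

class RSU(nn.Module):
    """Residual U block: output = f1(x) + U(f1(x)).

    `height` is the number of levels in the inner U (RSU-7 -> height=7)."""
    def __init__(self, height, cin, cmid, cout):
        super().__init__()
        self.conv_in = ConvBNReLU(cin, cout)  # f1: local feature map
        self.enc = nn.ModuleList(
            [ConvBNReLU(cout, cmid)] +
            [ConvBNReLU(cmid, cmid) for _ in range(height - 2)])
        self.bottom = ConvBNReLU(cmid, cmid, dilation=2)  # dilated bottom
        # Decoders consume the concatenation of the upsampled feature map
        # and the same-scale encoder feature map (channel count doubles).
        self.dec = nn.ModuleList(
            [ConvBNReLU(2 * cmid, cmid) for _ in range(height - 2)])
        self.dec_out = ConvBNReLU(2 * cmid, cout)

    def forward(self, x):
        fx = self.conv_in(x)  # f1(x)
        skips, h = [], fx
        for i, enc in enumerate(self.enc):
            h = enc(h)
            skips.append(h)
            if i < len(self.enc) - 1:
                h = F.max_pool2d(h, 2, ceil_mode=True)  # downsample
        h = self.bottom(h)
        for dec in self.dec:  # U-shaped decoder with same-scale concatenation
            h = dec(torch.cat([h, skips.pop()], dim=1))
            h = F.interpolate(h, size=skips[-1].shape[2:], mode='bilinear',
                              align_corners=False)
        h = self.dec_out(torch.cat([h, skips.pop()], dim=1))
        return fx + h  # residual connection

block = RSU(height=7, cin=3, cmid=16, cout=64)  # an RSU-7-style block
y = block(torch.randn(1, 3, 256, 256))          # -> (1, 64, 256, 256)
```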
Step S322, as shown in fig. 3, the UNet-based multi-task network for semantic segmentation and portrait drawing is built from the residual U block and comprises four parts: a multi-task shared six-stage encoder part, semantic segmentation and portrait drawing multi-task five-stage decoding parts, a semantic segmentation fusion part and a portrait drawing fusion part.
Further, step S322 includes the steps of:
step S3221, a multitasking shared six-phase encoder section. En is provided with 1 ,En 2 ,En 3 ,En 4 Residual U blocks RSU-7, RSU-6, RSU-5, RSU-4, en used respectively 5 ,En 6 The two encoders are not standard RSU blocks, but instead replace the downsampling operation with a hole convolution.
Step S3222, the semantic segmentation five-stage decoding part. The decoders $De_1$, $De_2$, $De_3$, $De_4$ are completely symmetric with the encoders $En_1$, $En_2$, $En_3$, $En_4$. In particular, in $De_2$, $De_3$, $De_4$, $De_5$, each decoder concatenates its feature map with that of the symmetric encoder, so the feature map size is unchanged while the number of channels is doubled, giving the decoder feature maps semantic information from different levels.
Step S3223, portrait five-stage decoding section. The decoding part structure is identical to the semantic segmentation five-stage decoding part structure, except that the decoding parameters of the two parts are not shared.
Step S3224, the semantic segmentation fusion part. First, a 3x3 convolution layer, an upsampling layer and a Sigmoid activation function map the feature maps of $En_6$, $De_5$, $De_4$, $De_3$, $De_2$, $De_1$ into the stage-wise semantic segmentation Mask images $M^{(1)}, \ldots, M^{(6)}$; all of them are concatenated and passed through a 1x1 convolution layer and a Sigmoid activation function to generate the final semantic segmentation Mask image $M^{fuse}$.
Step S3225, the portrait drawing fusion part. In the same way, a 3x3 convolution layer, an upsampling layer and a Sigmoid activation function map the feature maps of the shared encoder $En_6$ and the portrait decoders $De_5$, $De_4$, $De_3$, $De_2$, $De_1$ into the stage-wise portrait drawing images $D^{(1)}, \ldots, D^{(6)}$; all of them are concatenated and passed through a 1x1 convolution layer and a Sigmoid activation function to generate the final portrait drawing image $D^{fuse}$.
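The two fusion parts could be sketched as one reusable head, instantiated once per task; the stage channel counts are placeholders, since the patent does not list them:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    """Maps the six stage feature maps (En_6, De_5, ..., De_1) to stage-wise
    probability maps and one fused map, as in steps S3224/S3225."""
    def __init__(self, stage_channels=(512, 512, 256, 128, 64, 64)):
        super().__init__()
        self.side = nn.ModuleList(nn.Conv2d(c, 1, 3, padding=1)
                                  for c in stage_channels)  # 3x3 conv per stage
        self.fuse = nn.Conv2d(len(stage_channels), 1, 1)    # 1x1 conv on concat

    def forward(self, feats, out_size):
        # 3x3 conv -> upsample -> Sigmoid for each stage feature map
        sides = [torch.sigmoid(F.interpolate(conv(f), size=out_size,
                                             mode='bilinear',
                                             align_corners=False))
                 for conv, f in zip(self.side, feats)]
        fused = torch.sigmoid(self.fuse(torch.cat(sides, dim=1)))
        return sides, fused  # six stage maps and the final fused map
```

One head with its own parameters would serve the semantic segmentation branch and a second, non-shared head the portrait drawing branch, mirroring the non-shared decoders.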
Step S33, designing a loss function for training the network designed in step S32.
Further, step S33 includes the steps of:
step S331, the semantic segmentation Mask image obtained in step S3224 and step S325 is obtainedAnd portrait picture +.> Respectively constructing a loss function with a standard semantic segmentation Mask label image and a portrait label image, adopting standard binary cross entropy loss, and adopting the following formula:
wherein H, W is the high and wide resolution of the image, r, c is a pixel point on the image, P G(r,c) R, c-image of label imageThe value of the pixel, P S(r,c) Is predictedThe value of the pixel point of the label image r, c.
Step S332, the losses of step S331 are summed into the total loss $\mathcal{L}$:
$$\mathcal{L} = \sum_{m=1}^{6} \left( \ell_{seg}^{(m)} + \ell_{draw}^{(m)} \right) + \ell_{seg}^{fuse} + \ell_{draw}^{fuse}$$
where $\ell_{seg}^{(m)}$ denotes the standard binary cross-entropy loss between $M^{(m)}$ and the semantic segmentation Mask label image, and $\ell_{draw}^{(m)}$ denotes the standard binary cross-entropy loss between $D^{(m)}$ and the portrait drawing label image.
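A sketch of the total loss of steps S331 and S332: binary cross-entropy summed over the six stage outputs and the fused output of both tasks (all maps already passed through Sigmoid):

```python
import torch.nn.functional as F

def total_loss(seg_maps, seg_fused, draw_maps, draw_fused, seg_gt, draw_gt):
    """seg_maps/draw_maps: lists of six stage probability maps; *_fused: the
    fused maps; *_gt: label images in [0, 1]. Returns the scalar loss L."""
    loss = (F.binary_cross_entropy(seg_fused, seg_gt) +
            F.binary_cross_entropy(draw_fused, draw_gt))
    for m in seg_maps:                       # six segmentation stage losses
        loss = loss + F.binary_cross_entropy(m, seg_gt)
    for m in draw_maps:                      # six portrait stage losses
        loss = loss + F.binary_cross_entropy(m, draw_gt)
    return loss
```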
Step S34, training the portrait drawing image network by using the training data set.
Step S341, selecting a random image X from the training data set constructed in step S31.
Step S342, training the encoding-decoding semantic segmentation network and portrait drawing network. The input image X is encoded and decoded by the UNet network to obtain the final semantic segmentation Mask image $M^{fuse}$ and portrait drawing image $D^{fuse}$, and the total loss $\mathcal{L}$ of step S332 is calculated.
Step S343, calculating the gradient of each parameter in the UNet network by using a back propagation method, and updating the parameters by using an Adam optimization method.
In step S344, steps S341 to S343 constitute one iteration of the training process; the whole training process runs for 1000 iterations, and in each iteration several image pairs are randomly sampled as a batch for training.
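A minimal training loop for steps S341 to S344, reusing the total_loss helper sketched above; the dataset interface, learning rate and batch size are assumptions, as the patent fixes only the 1000 iterations and the Adam optimizer:

```python
import torch

def train(model, dataset, iters=1000, batch_size=8, device="cuda"):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed lr
    model.to(device).train()
    for _ in range(iters):  # 1000 iterations (step S344)
        idx = torch.randint(len(dataset), (batch_size,))  # random batch (S341)
        x, seg_gt, draw_gt = dataset.collate(idx)         # assumed helper
        x, seg_gt, draw_gt = (t.to(device) for t in (x, seg_gt, draw_gt))
        seg_maps, seg_fused, draw_maps, draw_fused = model(x)  # forward (S342)
        loss = total_loss(seg_maps, seg_fused, draw_maps, draw_fused,
                          seg_gt, draw_gt)
        opt.zero_grad()
        loss.backward()  # back-propagation of gradients (step S343)
        opt.step()       # Adam parameter update
```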
Step S35, inputting a face image of arbitrary resolution to be processed into the designed network, and generating the semantic segmentation Mask image and the portrait drawing image with the trained network.
Further, the step S4 specifically includes:
as shown in fig. 5, a semantic segmentation Mask image and a portrait image can be obtained through step S35, and then the semantic segmentation Mask is used for the pixels of the face image, the pixels of the portrait image are used for the pixels of the background, and the background pixels of the semantic segmentation Mask are used for the pixels of the background. The formula is as follows:
where r, c are a pixel point on the image,is a semantically segmented Mask image, +.>Is a portrait image. S (r, c) is the artistic portrait after fusion.
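The fusion rule of step S4 reduces to a single masked selection over 8-bit grayscale images, for example:

```python
import numpy as np

def fuse(mask: np.ndarray, drawing: np.ndarray) -> np.ndarray:
    """Keep white Mask pixels (background) and take the portrait drawing
    pixel everywhere else (step S4)."""
    return np.where(mask == 255, mask, drawing).astype(np.uint8)
```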
The above is a preferred embodiment of the present invention, and all changes made according to the technical solution of the present invention belong to the protection scope of the present invention when the generated functional effects do not exceed the scope of the technical solution of the present invention.

Claims (10)

1. A UNet-based artistic portrait drawing generation method, characterized by comprising the following steps:
S1, acquiring an image containing a human face, and performing face positioning, including face detection and face key point positioning;
S2, preprocessing the face image, and obtaining a standard portrait image with 512x512 resolution through face alignment;
S3, simultaneously outputting semantic segmentation information and a portrait drawing image through a UNet network, wherein the semantic segmentation information comprises the segmented portrait and background Mask, and the portrait drawing image is a fine portrait drawing;
and S4, fusing the background Mask and the fine portrait drawing image to obtain the artistic portrait.
2. The UNet-based artistic portrait creation method according to claim 1, wherein step S1 includes the steps of:
step S11, obtaining a maximum face by carrying out face detection on an image containing the face, wherein the formula for obtaining the maximum face is as follows:
$$\text{face}_{\max} = \begin{cases} \text{face}_i, & w_i \cdot h_i > w_{i-1} \cdot h_{i-1} \\ \text{face}_{i-1}, & \text{otherwise} \end{cases}$$
where $w_i$ and $h_i$ are the width and height of the detection box of the $i$-th face, and $w_{i-1}$ and $h_{i-1}$ are the width and height of the detection box of the $(i-1)$-th face;
step S12, extracting the 5-point face key points $landmark_5$ for the maximum face obtained in step S11.
3. The method for generating artistic portrait based on UNet according to claim 2, wherein step S2 comprises the steps of:
step S21, performing an affine transformation on the maximum face image using the coordinates of the 5-point face key points $landmark_5$ from step S12, to obtain a face image with 512x512 resolution;
and S22, filling the pixels left undefined by the affine transformation with white, to obtain the standard face image.
4. The method for generating artistic portrait based on UNet according to claim 1, wherein step S3 comprises the steps of:
S31, preprocessing the input images, including image pairing, cropping and data enhancement, to construct a training data set;
step S32, designing a UNet-based multi-task network for semantic segmentation and portrait drawing, wherein the basic unit of the network is the residual U block, from which the semantic segmentation branch and the portrait drawing branch of the UNet network are built;
step S33, designing a loss function for training the network designed in the step S32;
step S34, training a portrait drawing image network by using the training data set;
step S35, inputting a face image of arbitrary resolution to be processed into the designed network, and generating the semantic segmentation Mask image and the portrait drawing image with the trained network.
5. The method of generating artistic portrait based on UNet according to claim 4, wherein step S31 is specifically as follows:
step S311, scaling each image in the training data set to the same size $H_1 \times W_1$;
step S312, converting the semantic segmentation label Mask images in the training data set from a white portrait on a black background to a black portrait on a white background, so that white marks the background;
step S313, randomly flipping the face images, semantic segmentation label Mask images and portrait drawing label images in the training data set vertically, and randomly cropping them into $H \times W$ images;
step S314, normalizing the images in the training data set: given an image $I_{train}$, the normalized image $\hat{I}_{train}$ is calculated as
$$\hat{I}_{train} = \frac{I_{train}}{I_{bit\_max}}$$
(element-wise), where $I_{train}$ is an $H \times W$ image with 8-bit color depth, and $I_{bit\_max}$ is an $H \times W$ image whose pixel values are all 255;
step S315, standardizing the face image: given an image $I_{train}$, the standardized image $\tilde{I}_{train}$ is calculated per channel as
$$\tilde{I}_c = \frac{I_c - \mu_c}{\sigma_c}, \quad c \in \{R, G, B\}$$
where $I_{train}$ is an $H \times W$ image with 8-bit color depth, $I_R$, $I_G$, $I_B$ are the R, G, B channel images of the color image, and $\mu_c$, $\sigma_c$ are the per-channel mean and standard deviation.
6. The method of generating artistic portrait based on UNet according to claim 4, wherein step S32 is specifically as follows:
step S321, the basic unit of the network is the residual U block RSU, which comprises three parts: a convolution layer, a U-shaped symmetric encoding-decoding structure, and a residual connection; the convolution layer convolves the input feature map $X_{train}$, of size $H \times W \times C_{in}$, to obtain the local feature map $f_1(X_{train})$; the U-shaped symmetric encoding-decoding structure learns multi-scale context information from the feature map, denoted $U(f_1(X_{train}))$, with same-scale feature maps concatenated between the encoder and decoder of the U-shaped symmetric network; the output of the residual U block RSU is denoted $f_1(X_{train}) + U(f_1(X_{train}))$;
step S322, a UNet-based multi-task network for semantic segmentation and portrait drawing is built from the residual U block RSU and comprises four parts: a multi-task shared six-stage encoder part, semantic segmentation and portrait drawing multi-task five-stage decoding parts, a semantic segmentation fusion part and a portrait drawing fusion part.
7. The method of generating an artistic portrait based on UNet according to claim 6, wherein step S322 is specifically as follows:
step S3221, the multi-task shared six-stage encoder part: $En_1$, $En_2$, $En_3$, $En_4$ use residual U blocks RSU-7, RSU-6, RSU-5 and RSU-4 respectively; $En_5$ and $En_6$ are not standard RSU blocks but replace the downsampling operations with dilated (atrous) convolutions;
step S3222, the semantic segmentation five-stage decoding part: the decoders $De_1$, $De_2$, $De_3$, $De_4$ are completely symmetric with the encoders $En_1$, $En_2$, $En_3$, $En_4$; in $De_2$, $De_3$, $De_4$, $De_5$, each decoder concatenates its feature map with that of the symmetric encoder, so the feature map size is unchanged while the number of channels is doubled;
step S3223, a portrait five-stage decoding part, wherein the structure of the decoding part is identical to that of the semantic segmentation five-stage decoding part, and the difference is that the decoding parameters of the two parts are not shared;
step S3224, the semantic segmentation fusion part: first, a 3x3 convolution layer, an upsampling layer and a Sigmoid activation function map the feature maps of $En_6$, $De_5$, $De_4$, $De_3$, $De_2$, $De_1$ into the stage-wise semantic segmentation Mask images $M^{(1)}, \ldots, M^{(6)}$; all of them are concatenated and passed through a 1x1 convolution layer and a Sigmoid activation function to generate the final semantic segmentation Mask image $M^{fuse}$;
step S3225, the portrait drawing fusion part: in the same way, a 3x3 convolution layer, an upsampling layer and a Sigmoid activation function map the feature maps of the shared encoder $En_6$ and the portrait decoders $De_5$, $De_4$, $De_3$, $De_2$, $De_1$ into the stage-wise portrait drawing images $D^{(1)}, \ldots, D^{(6)}$; all of them are concatenated and passed through a 1x1 convolution layer and a Sigmoid activation function to generate the final portrait drawing image $D^{fuse}$.
8. The method for generating artistic portrait based on UNet according to claim 7, wherein step S33 is specifically as follows:
step S331, the semantic segmentation Mask images $M^{(1)}, \ldots, M^{(6)}, M^{fuse}$ obtained in step S3224 and the portrait drawing images $D^{(1)}, \ldots, D^{(6)}, D^{fuse}$ obtained in step S3225 are each compared with the standard semantic segmentation Mask label image and the portrait drawing label image respectively, using the standard binary cross-entropy loss:
$$\ell = -\sum_{(r,c)}^{(H,W)} \left[ P_{G(r,c)} \log P_{S(r,c)} + \left(1 - P_{G(r,c)}\right) \log\left(1 - P_{S(r,c)}\right) \right]$$
where H, W are the height and width of the image, (r, c) is a pixel on the image, $P_{G(r,c)}$ is the value of the label image at pixel (r, c), and $P_{S(r,c)}$ is the value of the predicted image at pixel (r, c);
step S332, the losses of step S331 are summed into the total loss $\mathcal{L}$:
$$\mathcal{L} = \sum_{m=1}^{6} \left( \ell_{seg}^{(m)} + \ell_{draw}^{(m)} \right) + \ell_{seg}^{fuse} + \ell_{draw}^{fuse}$$
where $\ell_{seg}^{(m)}$ denotes the standard binary cross-entropy loss between $M^{(m)}$ and the semantic segmentation Mask label image, and $\ell_{draw}^{(m)}$ denotes the standard binary cross-entropy loss between $D^{(m)}$ and the portrait drawing label image.
9. The method for generating artistic portrait based on UNet according to claim 8, wherein step S34 is specifically as follows:
step S341, selecting a random image X in the training data set constructed in the step S31;
step S342, training the encoding-decoding semantic segmentation network and portrait drawing network: an image X is input and passes through the encoding-decoding UNet network to obtain the final semantic segmentation Mask image $M^{fuse}$ and portrait drawing image $D^{fuse}$, and the total loss $\mathcal{L}$ of step S332 is calculated;
Step S343, calculating the gradient of each parameter in the UNet network by using a back propagation method, and updating the parameters by using an Adam optimization method;
step S344, steps S341 to S343 constitute one iteration; training runs for 1000 iterations, and in each iteration several image pairs are randomly sampled as a batch for training.
10. The method for generating artistic portrait based on UNet according to claim 4, wherein step S4 is specifically as follows:
the semantic segmentation Mask image and the portrait drawing image are obtained through step S35 and then fused: when a pixel of the semantic segmentation Mask image is white, its value is kept; otherwise the pixel is replaced by the corresponding pixel of the portrait drawing image, according to
$$S(r,c) = \begin{cases} M(r,c), & M(r,c) = 255 \\ D(r,c), & \text{otherwise} \end{cases}$$
where (r, c) is a pixel on the image, $M$ is the semantic segmentation Mask image, $D$ is the portrait drawing image, and $S(r,c)$ is the fused artistic portrait.
Priority Applications (1)

CN202311871886.2A, priority date 2023-12-29, filing date 2023-12-29: Artistic portrait drawing generation method based on UNet (pending).

Publications (1)

CN117809350A, published 2024-04-02.

Family ID: 90431901

Country Status (1)

CN: CN117809350A (pending)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination