CN117809350A - Artistic portrait drawing generation method based on UNet - Google Patents
- Publication number
- CN117809350A CN117809350A CN202311871886.2A CN202311871886A CN117809350A CN 117809350 A CN117809350 A CN 117809350A CN 202311871886 A CN202311871886 A CN 202311871886A CN 117809350 A CN117809350 A CN 117809350A
- Authority
- CN
- China
- Prior art keywords
- image
- portrait
- semantic segmentation
- face
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 59
- 230000011218 segmentation Effects 0.000 claims abstract description 79
- 238000001514 detection method Methods 0.000 claims abstract description 13
- 238000007781 pre-processing Methods 0.000 claims abstract description 7
- 238000012549 training Methods 0.000 claims description 35
- 230000006870 function Effects 0.000 claims description 22
- 230000004927 fusion Effects 0.000 claims description 14
- 238000010586 diagram Methods 0.000 claims description 10
- 230000004913 activation Effects 0.000 claims description 9
- 230000009466 transformation Effects 0.000 claims description 8
- 238000005070 sampling Methods 0.000 claims description 7
- 238000012545 processing Methods 0.000 claims description 5
- 238000005457 optimization Methods 0.000 claims description 4
- 238000013461 design Methods 0.000 claims description 3
- 230000001815 facial effect Effects 0.000 description 10
- 238000010606 normalization Methods 0.000 description 4
- 238000010422 painting Methods 0.000 description 3
- 238000000605 extraction Methods 0.000 description 2
- 210000004209 hair Anatomy 0.000 description 2
- 230000003340 mental effect Effects 0.000 description 2
- 230000003042 antagonistic effect Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 239000002131 composite material Substances 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 210000000056 organ Anatomy 0.000 description 1
- 238000005192 partition Methods 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 238000004064 recycling Methods 0.000 description 1
- 230000002787 reinforcement Effects 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
Landscapes
- Image Processing (AREA)
Abstract
The invention relates to a UNet-based artistic portrait drawing generation method, comprising: S1, acquiring an image containing a face and performing face localization, including face detection and facial key-point localization; S2, preprocessing the face image and obtaining a standard portrait image at 512x512 resolution through face alignment; S3, simultaneously outputting semantic segmentation information and a portrait-drawing image through a UNet network, where the semantic segmentation information comprises the segmented portrait and a background Mask, and the portrait-drawing image is a fine portrait drawing; S4, fusing the background Mask with the fine portrait drawing to obtain the artistic portrait. The invention is highly automated and reproduces the portrait with high fidelity, providing technical support for artistic portrait generation.
Description
Technical Field
The invention relates to the fields of image processing and computer vision, and in particular to a UNet-based artistic portrait drawing generation method.
Background
Artistic portrait generation means having a computer generate a portrait with an artistic style from a given face image, and it is one of the research hotspots of image generation algorithms. An artistic portrait helps protect privacy and safety while displaying a person's characteristics in a playful, appealing way. On the one hand, people increasingly like to post artistic portraits as avatars on Internet platforms such as Douyin and Weibo; in this way they show a personal identity to acquaintances while protecting personal privacy. On the other hand, producing a high-quality portrait normally requires long professional study and practice: to express a subject's facial features and mental character, an artist needs not only solid drawing skills but also the ability to grasp the key features of a face in a short time, whereas robotic-arm painting driven by automatic artistic portrait generation can easily produce a realistic artistic portrait.
Prior art 1: wu Tao et al propose a compact line portrait creation method based on semantic segmentation. Firstly, carrying out semantic segmentation on a face image, dividing the face into different areas, extracting edge contours and five-sense organ detail lines based on the different areas, and carrying out edge tangential flow optimization so as to strengthen direction information; on the basis, a line drawing is utilized to generate a reconciliation image, and parameters of a line extraction method are adjusted for different segmentation areas by utilizing the optimized edge tangential flow, a face semantic segmentation result and the reconciliation image, so that line filtering of detail irrelevant areas and line reinforcement of detail important areas are realized, and a concise line portrait is generated.
Disadvantages: the method requires a face image with a clean background, and fine facial details (such as moles on the face) cannot be extracted.
Prior art 2: ran et al propose apdragwinggan based on generating an antagonizing network. Artists often draw artistic portraits using different drawing techniques for different facial parts, such as hair using long, continuous lines, etc. Apdragwinggan employs a plurality of convolutional neural networks based on the above theory, wherein the generator comprises one global network, six local networks, and one converged network. The global network may generate an overall structure of the face image corresponding to the portrait, while the six local networks may learn different rendering techniques for different facial regions to generate high quality facial components. And finally, fusing the characteristics generated by the global network and the local network through a fusion network to generate a high-quality portrait. Meanwhile, a distance transformation loss function is provided for the problem that the portrait and the original picture do not accurately correspond. This subtle error is tolerated by computing the sum of the pixel on each line in the actual portrait to the nearest pixel distance of the same type in the generated portrait and the sum of the distances of the generated portrait to the actual portrait. Subsequently, ran et al have proposed apdragwinggan++ on the basis of apdragwinggan, demonstrating that any generating countermeasure network with a single generator cannot generate a steel drawing style artistic portrait, and providing a theoretical explanation of the necessity of a composite structure of the global network and the local network. Meanwhile, the distance transformation loss and the line continuity loss based on nonlinear mapping are provided to improve the generation quality of the line.
Disadvantages: the method requires a face image with a clean background, and fine facial details (such as moles on the face) cannot be extracted.
Prior art 3: qin et al propose a simple and powerful deep network architecture U 2 Net. The model is embedded with a layer of U-shaped structure on the basis of the original UNet, and provides a residual connection RSU structure, so that the model does not need to use an image pre-training backbone model. The RSU structure obtains more context information by fusing the low-level high-resolution local features with the high-level low-resolution global features. Meanwhile, the RSU structure uses pooling operation, so that high resolution can be obtained under the condition that the network layer number is deepened, and meanwhile, the memory occupancy rate and the calculation cost are not increased. Subsequently, the model is applied to the task of converting the face photo into the portrait and is obtainedGood results are achieved.
Disadvantages: the method requires a face image with a clean background.
Prior art 4: CN201711324257.2, "Portrait drawing system, method and storage medium", discloses a portrait drawing method comprising: an acquisition step, acquiring a portrait picture of the user; a color-clustering step, performing color clustering on the acquired portrait picture to obtain the user's head features, including the facial edge region, hair, facial contour and facial image; a first portrait-drawing step, controlling a drawing robotic arm to draw the user's portrait according to the head features and a portrait-style template. The application also discloses a portrait drawing system and a computer-readable storage medium. By combining the user's portrait picture with constructed portrait-style templates, portrait pictures in different painting styles are obtained and drawn by the robotic arm, meeting the personalized portrait-drawing needs of different people.
Disadvantages: the method clusters head features from the portrait picture and completes the drawing with a robotic arm; it requires a clean portrait picture and relies on a clustering algorithm.
Prior art 5: CN202011431526.7, "Portrait generation method, device, electronic equipment and medium". In this application, an image to be processed is acquired, the face region in it is determined, and face alignment is applied to the face-region image to obtain an initial portrait image; a facial geometry image is obtained from the initial portrait image through repeated use of facial-component information, spatially-adaptive normalization and a geometric loss function; the deformation in the facial geometry image is then removed using a relaxed pixel-level reconstruction loss to obtain the target portrait image. With this scheme, the generator accurately captures the facial geometry of the synthesized artistic portrait by recycling facial-component information together with improved spatially-adaptive normalization and geometric loss, and the deformation between the input image and the corresponding target image is eliminated by the relaxed pixel-level reconstruction loss, yielding a robust artistic portrait generation method.
Disadvantages: the method generates the artistic portrait from the face region by recycling facial-component information with improved spatially-adaptive normalization and a geometric loss function; it requires a clean face-region image, and its multi-task generator focuses on a single-style image main task plus a face-semantic-label auxiliary task.
Prior art 6: CN202110142946.1, "Sketch generation method, device, electronic equipment and medium". In this application, an initial sketch image and a portrait image are acquired, both containing the target user's face; a first simple-stroke image of the target user is generated from the initial sketch image, and a second from the portrait image; the target sketch of the face image is then generated from the first and second simple-stroke images. With this scheme, a reasonable and realistic sketch can be generated from different facial stroke types, alleviating the lack of realism when the data sets are unpaired.
Disadvantages: the method generates the target sketch of the face image from the first and second simple-stroke images, and requires a clean face-region image.
Disclosure of Invention
Aiming at the problem that existing face-portrait generation methods require a clean face-region background, the invention aims to generate a vivid portrait drawing without restricting the background of the image to be processed, so that it is applicable to more scenes. The invention therefore provides a UNet-based artistic portrait drawing generation method.
To achieve the above purpose, the technical scheme of the invention is as follows. A UNet-based artistic portrait drawing generation method comprises the following steps:
S1, acquiring an image containing a face and performing face localization, including face detection and facial key-point localization;
S2, preprocessing the face image and obtaining a standard portrait image at 512x512 resolution through face alignment;
S3, simultaneously outputting semantic segmentation information and a portrait-drawing image through a UNet network, where the semantic segmentation information comprises the segmented portrait and a background Mask, and the portrait-drawing image is a fine portrait drawing;
S4, fusing the background Mask with the fine portrait drawing to obtain the artistic portrait.
In an embodiment of the present invention, the step S1 includes the following steps:
step S11, performing face detection on the image containing a face and keeping the largest face; the detected faces are compared by detection-box area, the i-th face replacing the current maximum when

w_i × h_i > w_(i-1) × h_(i-1)

where w_i and h_i are the width and height of the detection box of the i-th face, and w_(i-1) and h_(i-1) are the width and height of the detection box of the (i-1)-th face;
step S12, extracting the 5-point face key points landmark_5 from the largest face of step S11.
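The largest-face rule of step S11 amounts to keeping the detection box with the greatest area w × h. A minimal sketch (the helper name and the (x, y, w, h) box format are illustrative assumptions, not from the patent):

```python
def largest_face(boxes):
    """Return the detection box (x, y, w, h) with the largest area w * h.

    This implements the step-S11 comparison: face i replaces the current
    maximum whenever w_i * h_i exceeds the running best area.
    """
    if not boxes:
        raise ValueError("no faces detected")
    return max(boxes, key=lambda b: b[2] * b[3])
```

The 5-point key points of step S12 (typically the two eye centers, the nose tip and the two mouth corners) would then be extracted from this box by whatever landmark detector is in use.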
In an embodiment of the present invention, the step S2 includes the following steps:
step S21, performing an affine transformation on the largest-face image using the coordinates of the 5-point face key points landmark_5 from step S12, obtaining a face image at 512x512 resolution;
and S22, filling with white any pixels not covered by the affine transformation, obtaining the standard face image.
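Steps S21 and S22 can be sketched as a least-squares affine fit from the detected 5-point landmarks onto a fixed 512x512 template, followed by a warp whose uncovered pixels are filled white. The template coordinates below are illustrative placeholders, not values given in the patent:

```python
import numpy as np

# Illustrative 5-point template (left eye, right eye, nose tip, left and right
# mouth corners) for a 512x512 aligned face; placeholder coordinates.
TEMPLATE = np.array([[187.0, 239.0], [325.0, 239.0], [256.0, 314.0],
                     [201.0, 389.0], [313.0, 389.0]])

def estimate_affine(landmarks5):
    """Least-squares 2x3 affine matrix mapping 5 detected landmarks onto TEMPLATE."""
    src = np.asarray(landmarks5, dtype=np.float64)
    A = np.hstack([src, np.ones((5, 1))])             # 5x3 design matrix [x, y, 1]
    M, *_ = np.linalg.lstsq(A, TEMPLATE, rcond=None)  # solve A @ M ~= TEMPLATE
    # The warp itself would be e.g. cv2.warpAffine(img, M.T, (512, 512),
    # borderValue=(255, 255, 255)), filling uncovered pixels white (step S22).
    return M.T                                        # 2x3 affine matrix
```

Feeding the template back into the fit recovers the identity transform, which is a quick sanity check on the solver.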
In an embodiment of the present invention, the step S3 includes the following steps:
S31, preprocessing the input images, including image pairing, cropping and data augmentation, to construct a training data set;
step S32, designing a UNet-based multi-task network for semantic segmentation and portrait drawing, whose basic unit is the residual U-block; the semantic segmentation network and the portrait-drawing network of the UNet are built from residual U-blocks;
step S33, designing a loss function for training the network designed in step S32;
step S34, training the portrait-drawing network with the training data set;
step S35, inputting a face image of arbitrary resolution into the designed network and generating the semantic-segmentation Mask image and the portrait-drawing image with the trained network.
In one embodiment of the present invention, the step S31 is specifically as follows:
step S311, scaling each image in the training data set to the same size H1×W1;
step S312, inverting the semantic-segmentation label Mask images in the training data set from a white-background black portrait to a black-background white portrait;
step S313, randomly flipping the face images, semantic-segmentation label Mask images and portrait-drawing label images in the training data set up and down, and randomly cropping them to H×W images;
step S314, normalizing the images in the training data set: given an image I_train, the normalized image is

I_norm = I_train / I_bit_max

where I_train is an H×W image with 8-bit color depth and I_bit_max is an H×W image whose pixel values are all 255;
step S315, standardizing the face image: given an image I_train, the standardized image is obtained by standardizing the R, G and B channel images of the color image separately.
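The normalization of step S314 (division by the all-255 image) and a per-channel standardization in the spirit of step S315 can be sketched as follows; the per-channel mean and std are caller-supplied, since the patent does not state their values:

```python
import numpy as np

def normalize(img):
    """Step S314: divide an 8-bit image by the all-255 image, scaling it into [0, 1]."""
    return img.astype(np.float64) / 255.0

def standardize(img, mean, std):
    """Per-channel (R, G, B) standardization of a normalized color image.

    mean and std are length-3 sequences of assumed per-channel statistics,
    not values taken from the patent.
    """
    return (normalize(img) - np.asarray(mean)) / np.asarray(std)
```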
In one embodiment of the present invention, the step S32 is specifically as follows:
step S321, the basic unit of the network is the residual U-block RSU, which comprises three parts: a convolution layer, a U-shaped symmetric encoder-decoder structure, and a residual connection; the convolution layer convolves the input feature map X_train to obtain the local feature f1(X_train); the U-shaped symmetric encoder-decoder learns multi-scale context information from the feature map, denoted U(f1(X_train)), and its encoder and decoder concatenate feature maps of the same scale; the output of the residual U-block RSU is f1(X_train) + U(f1(X_train)); X_train has size H×W×C_in;
Step S322, a UNet-based multi-task network for semantic segmentation and portrait drawing is built from residual U-blocks RSU; it comprises four parts: a six-stage encoder shared by both tasks, a five-stage decoder for each of the semantic-segmentation and portrait-drawing tasks, a semantic-segmentation fusion part, and a portrait-drawing fusion part.
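A toy residual U-block illustrating the composition output = f1(x) + U(f1(x)), with same-scale concatenation inside a two-level inner U; this is a reduced PyTorch sketch of the idea, not the patent's exact RSU-7 to RSU-4 configurations:

```python
import torch
import torch.nn as nn

class ConvBNReLU(nn.Module):
    """3x3 convolution + BatchNorm + ReLU, the usual RSU building brick."""
    def __init__(self, c_in, c_out, dilation=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.body(x)

class RSU(nn.Module):
    """Residual U-block: output = f1(x) + U(f1(x)), with a tiny 2-level inner U."""
    def __init__(self, c_in, c_mid, c_out):
        super().__init__()
        self.f1 = ConvBNReLU(c_in, c_out)            # local feature f1(x)
        self.enc1 = ConvBNReLU(c_out, c_mid)
        self.pool = nn.MaxPool2d(2, 2)               # down-sample inside the inner U
        self.enc2 = ConvBNReLU(c_mid, c_mid)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec1 = ConvBNReLU(2 * c_mid, c_out)     # concatenates same-scale maps
    def forward(self, x):
        fx = self.f1(x)
        e1 = self.enc1(fx)
        e2 = self.up(self.enc2(self.pool(e1)))
        u = self.dec1(torch.cat([e1, e2], dim=1))    # U(f1(x))
        return fx + u                                # residual connection
```

The real U²-Net-style RSU uses deeper inner U structures (RSU-7, RSU-6, ...), but the residual composition is the same.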
In one embodiment of the present invention, step S322 is specifically as follows:
step S3221, the shared six-stage encoder: En1, En2, En3 and En4 use residual U-blocks RSU-7, RSU-6, RSU-5 and RSU-4 respectively; En5 and En6 are not standard RSU blocks, replacing the down-sampling operation with dilated convolution;
step S3222, the five-stage semantic-segmentation decoder: the decoders De1, De2, De3, De4 are fully symmetric to the encoders En1, En2, En3, En4; in De2, De3, De4 and De5, each decoder concatenates the feature map of its symmetric encoder, so the feature-map size is unchanged while the number of channels doubles;
step S3223, the five-stage portrait-drawing decoder: its structure is identical to the semantic-segmentation decoder, except that the parameters of the two decoders are not shared;
step S3224, the semantic-segmentation fusion part: a 3x3 convolution layer, an up-sampling layer and a Sigmoid activation function first map the feature maps of En6, De5, De4, De3, De2 and De1 into per-stage semantic-segmentation Mask images; these are concatenated and passed through a 1x1 convolution layer and a Sigmoid activation function to generate the final semantic-segmentation Mask image;
Step S3225, the portrait-drawing fusion part: a 3x3 convolution layer, an up-sampling layer and a Sigmoid activation function first map the same feature maps En6, De5, De4, De3, De2 and De1 into per-stage portrait-drawing images; these are concatenated and passed through a 1x1 convolution layer and a Sigmoid activation function to generate the final portrait-drawing image.
In one embodiment of the present invention, step S33 is specifically as follows:
step S331, the semantic-segmentation Mask image obtained in step S3224 and the portrait-drawing image obtained in step S3225 are each compared with the standard semantic-segmentation Mask label image and the standard portrait-drawing label image using the standard binary cross-entropy loss:

L = - Σ_(r,c) [ P_G(r,c) log P_S(r,c) + (1 - P_G(r,c)) log(1 - P_S(r,c)) ]

where H and W are the height and width of the image, (r, c) is a pixel on the image, P_G(r,c) is the value of the label image at pixel (r, c), and P_S(r,c) is the value of the predicted image at pixel (r, c);
step S332, the total loss sums the two losses of step S331:

L_total = L_seg + L_draw

where L_seg denotes the standard binary cross-entropy loss between the predicted Mask and the semantic-segmentation Mask label image, and L_draw denotes the standard binary cross-entropy loss between the predicted portrait drawing and the portrait-drawing label image.
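The per-pixel binary cross-entropy of step S331 and the two-term sum of step S332 translate directly into code; a NumPy sketch (the clipping is a numerical-safety detail added here, not part of the patent):

```python
import numpy as np

def bce(p_gt, p_pred, eps=1e-7):
    """Standard binary cross-entropy summed over every pixel (r, c):
    -sum[ P_G * log(P_S) + (1 - P_G) * log(1 - P_S) ]."""
    p_pred = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0); added for safety
    return -np.sum(p_gt * np.log(p_pred) + (1.0 - p_gt) * np.log(1.0 - p_pred))

def total_loss(mask_gt, mask_pred, draw_gt, draw_pred):
    """Step S332: sum of the Mask loss and the portrait-drawing loss."""
    return bce(mask_gt, mask_pred) + bce(draw_gt, draw_pred)
```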
In one embodiment of the present invention, the step S34 is specifically as follows:
step S341, selecting a random image X from the training data set constructed in step S31;
step S342, training the encoder-decoder networks for semantic segmentation and portrait drawing: the input image X passes through the encoder-decoder of the UNet network to obtain the final semantic-segmentation Mask image and portrait-drawing image, and the total loss of step S332 is computed;
step S343, computing the gradient of each parameter of the UNet network by back-propagation and updating the parameters with the Adam optimization method;
step S344, steps S341 to S343 form one iteration; 1000 iterations are run, and in each iteration several image pairs are randomly sampled as one batch for training.
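One training iteration of steps S341 to S343 (forward pass, summed binary cross-entropy, back-propagation, Adam update) might look like the sketch below; the tiny two-head model merely stands in for the full multi-task UNet and is purely illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHead(nn.Module):
    """Illustrative stand-in for the multi-task UNet: one shared trunk,
    one semantic-segmentation Mask head and one portrait-drawing head."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Conv2d(3, 8, 3, padding=1)
        self.mask_head = nn.Conv2d(8, 1, 1)   # Mask output in (0, 1)
        self.draw_head = nn.Conv2d(8, 1, 1)   # drawing output in (0, 1)
    def forward(self, x):
        h = torch.relu(self.trunk(x))
        return torch.sigmoid(self.mask_head(h)), torch.sigmoid(self.draw_head(h))

def train_step(model, opt, x, mask_gt, draw_gt):
    """One iteration: total BCE loss, back-propagation, Adam parameter update."""
    opt.zero_grad()
    mask_pred, draw_pred = model(x)
    loss = (F.binary_cross_entropy(mask_pred, mask_gt)
            + F.binary_cross_entropy(draw_pred, draw_gt))
    loss.backward()   # gradients via back-propagation (step S343)
    opt.step()        # Adam update (step S343)
    return loss.item()
```

Repeating this step over randomly sampled batches for 1000 iterations mirrors step S344.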
In one embodiment of the present invention, step S4 is specifically as follows:
The semantic-segmentation Mask image and the portrait-drawing image are obtained through step S35 and then fused: when a pixel of the semantic-segmentation Mask image is white, it is kept; otherwise it is replaced by the corresponding pixel of the portrait-drawing image:

S(r, c) = Mask(r, c), if Mask(r, c) is white; otherwise S(r, c) = Draw(r, c)

where (r, c) is a pixel on the image, Mask is the semantic-segmentation Mask image, Draw is the portrait-drawing image, and S(r, c) is the fused artistic portrait.
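The fusion rule of step S4 (keep a Mask pixel when it is white, otherwise take the portrait-drawing pixel) is a single element-wise select; the function and argument names here are illustrative:

```python
import numpy as np

def fuse(mask, drawing, white=255):
    """Step S4: keep white Mask pixels, replace the rest with drawing pixels."""
    mask = np.asarray(mask)
    drawing = np.asarray(drawing)
    return np.where(mask == white, mask, drawing)
```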
Compared with the prior art, the invention has the following beneficial effects: aiming at the problem that existing face-portrait generation methods require a clean face-region background, the invention generates a vivid portrait drawing without restricting the background of the image to be processed, and is therefore applicable to more scenes; the method can output a high-quality artistic face portrait regardless of the face image's background.
Drawings
Fig. 1 is a schematic diagram of a method for generating artistic portrait based on UNet according to the present invention.
Fig. 2 is a schematic diagram of a multitasking method for designing UNet-based single-style portraits and semantic segmentation according to the present invention.
Fig. 3 is a diagram of a multi-task model for designing UNet-based single-style portraits and semantic segmentation according to the present invention.
Fig. 4 is a residual U block RSU employed in the present invention.
Fig. 5 is a schematic diagram of an artistic portrait drawing method according to the present invention.
Detailed Description
The technical scheme of the invention is specifically described below with reference to the accompanying drawings.
As shown in fig. 1, the invention provides a method for generating artistic portrait based on UNet, which comprises the following steps:
S1, acquiring an image containing a face and performing face localization, including face detection and facial key-point localization;
S2, preprocessing the face image and obtaining a standard face image at 512x512 resolution through face alignment;
S3, simultaneously outputting semantic segmentation information and a portrait-drawing image through a UNet network, where the semantic segmentation information comprises the segmented portrait and a background Mask, and the portrait-drawing image is a fine portrait drawing;
S4, fusing the background Mask with the fine portrait drawing to obtain the artistic portrait;
further, step S1 includes the steps of:
step S11, performing face detection on the acquired image containing a face and keeping the largest face; the i-th detected face replaces the current maximum when

w_i × h_i > w_(i-1) × h_(i-1)

where w_i and h_i are the width and height of the detection box of the i-th face, and w_(i-1) and h_(i-1) are the width and height of the detection box of the (i-1)-th face.
Step S12, extracting the 5-point face key points landmark_5 from the largest face of step S11.
Further, step S2 includes the steps of:
step S21, performing an affine transformation on the largest-face image using the coordinates of the 5-point face key points landmark_5 from step S12, obtaining a face image at 512x512 resolution.
And S22, filling with white any pixels not covered by the affine transformation, obtaining the standard face image.
Further, as shown in fig. 2, step S3 includes the following steps:
and S31, preprocessing the input images, including image pairing, cropping and data augmentation, to construct the training data set.
Further, step S31 includes the steps of:
step S311, scaling each image in the training data set to the same size H1×W1.
Step S312, inverting the semantic-segmentation label Mask images in the training data set from a white-background black portrait to a black-background white portrait.
Step S313, randomly flipping the face images, semantic-segmentation label Mask images and portrait-drawing label images in the training data set up and down, and randomly cropping them to H×W images.
Step S314, normalizing the training images. Given an image I_train, the normalized image is

I_norm = I_train / I_bit_max

where I_train is an H×W image with 8-bit color depth and I_bit_max is an H×W image whose pixel values are all 255.
Step S315, standardizing the face image. Given an image I_train, the standardized image is obtained by standardizing the R, G and B channel images of the color image separately.
Step S32, as shown in FIG. 3, a multi-task network for semantic segmentation and portrait drawing is designed; its basic unit is the residual U-block, and the UNet networks for semantic segmentation and portrait drawing are built from residual U-blocks.
Further, step S32 includes the steps of:
step S321, as shown in fig. 4, the basic unit of the network, the residual U-block RSU, comprises three parts: a convolution layer, a U-shaped symmetric encoder-decoder structure, and a residual connection. The convolution layer convolves the input feature map X_train to obtain the local feature f1(X_train); the U-shaped symmetric encoder-decoder learns multi-scale context information from the feature map, denoted U(f1(X_train)), and its encoder and decoder concatenate feature maps of the same scale; the output of the residual U-block is f1(X_train) + U(f1(X_train)).
Here X_train is the input feature map, of size H×W×C_in.
Step S322, as shown in fig. 3, builds a multi-task network of semantic segmentation and portrait of UNet based on the basic unit residual U block, which includes four parts of contents: a multitasking shared six-stage encoder part, a semantic segmentation and portrait multitasking five-stage decoding part, a semantic segmentation fusion part and a portrait fusion part.
Further, step S322 includes the steps of:
step S3221, the shared six-stage encoder. En1, En2, En3 and En4 use residual U-blocks RSU-7, RSU-6, RSU-5 and RSU-4 respectively; En5 and En6 are not standard RSU blocks, replacing the down-sampling operation with dilated convolution.
Step S3222, the five-stage semantic-segmentation decoder. The decoders De1, De2, De3, De4 are fully symmetric to the encoders En1, En2, En3, En4. In particular, in De2, De3, De4 and De5, each decoder concatenates the feature map of its symmetric encoder, so the feature-map size is unchanged while the number of channels doubles, giving the decoder feature maps additional semantic information from different levels.
Step S3223, the five-stage portrait-drawing decoder. Its structure is identical to the semantic-segmentation decoder, except that the parameters of the two decoders are not shared.
Step S3224, the semantic-segmentation fusion part. A 3x3 convolution layer, an up-sampling layer and a Sigmoid activation function first map the feature maps of En6, De5, De4, De3, De2 and De1 into per-stage semantic-segmentation Mask images; these are concatenated and passed through a 1x1 convolution layer and a Sigmoid activation function to generate the final semantic-segmentation Mask image.
Step S3225, the portrait-drawing fusion part. A 3x3 convolution layer, an up-sampling layer and a Sigmoid activation function first map the same feature maps En6, De5, De4, De3, De2 and De1 into per-stage portrait-drawing images; these are concatenated and passed through a 1x1 convolution layer and a Sigmoid activation function to generate the final portrait-drawing image.
Step S33, designing a loss function for training the network designed in step S32.
Further, step S33 includes the steps of:
step S331, the semantic-segmentation Mask image obtained in step S3224 and the portrait-drawing image obtained in step S3225 are each compared with the standard semantic-segmentation Mask label image and the standard portrait-drawing label image using the standard binary cross-entropy loss:

L = - Σ_(r,c) [ P_G(r,c) log P_S(r,c) + (1 - P_G(r,c)) log(1 - P_S(r,c)) ]

where H and W are the height and width of the image, (r, c) is a pixel on the image, P_G(r,c) is the value of the label image at pixel (r, c), and P_S(r,c) is the value of the predicted image at pixel (r, c).
Step S332, the total loss sums the two losses of step S331:

L_total = L_seg + L_draw

where L_seg denotes the standard binary cross-entropy loss between the predicted Mask and the semantic-segmentation Mask label image, and L_draw denotes the standard binary cross-entropy loss between the predicted portrait drawing and the portrait-drawing label image.
Step S34, training the portrait drawing image network by using the training data set.
Step S341, selecting a random training image X in the data set constructed in the step S31.
Step S342, training the semantic segmentation and portrait drawing network of image coding and decoding. The input image X is encoded and decoded through a UNet network to obtain a final semantic segmentation Mask imageAnd portrait drawing imageCalculating the total loss function loss +.in step S332>
Step S343, calculating the gradient of each parameter in the UNet network by using a back propagation method, and updating the parameters by using an Adam optimization method.
Step S344, the above steps constitute one iteration of the training process; the whole training process requires 1000 iterations, and in each iteration a number of image pairs are randomly sampled as a batch for training.
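Steps S341 to S344 can be sketched as the following training loop; the 1000-iteration count and the Adam optimizer come from the text, while the batch size and learning rate are illustrative assumptions (and the toy model in the test stands in for the UNet):

```python
import torch
import torch.nn as nn

def train(model, dataset, iters=1000, batch_size=4, lr=1e-3):
    """Training loop sketch for step S34. `dataset` is a
    (images, mask_labels) pair of tensors; for brevity the sketch
    supervises a single mask output, whereas the full network of step
    S332 would sum the losses of all side outputs for both tasks."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []
    for _ in range(iters):
        # step S341: randomly sample a batch of image pairs
        idx = torch.randint(len(dataset[0]), (batch_size,))
        x, mask_gt = dataset[0][idx], dataset[1][idx]
        # step S342: forward pass and binary cross entropy loss
        mask_pred = model(x)
        loss = nn.functional.binary_cross_entropy(mask_pred, mask_gt)
        # step S343: back-propagate gradients and update parameters with Adam
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses
```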
Step S35, inputting the face image with any resolution to be detected into a designed network, and generating a semantic segmentation Mask image and a portrait image by using the trained network.
Further, the step S4 specifically includes:
as shown in fig. 5, a semantic segmentation Mask image and a portrait image are obtained through step S35 and then fused: where the pixel value of the semantic segmentation Mask image is white (foreground), the Mask pixel value is kept; otherwise the pixel is replaced by the pixel value of the portrait image. The formula is as follows:

S(r, c) = Mask(r, c), if Mask(r, c) is white; otherwise Portrait(r, c)
where (r, c) is a pixel point on the image, Mask(r, c) is the semantic segmentation Mask image, Portrait(r, c) is the portrait image, and S(r, c) is the fused artistic portrait.
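A minimal sketch of the fusion rule of step S4, assuming 8-bit images where white is 255:

```python
import numpy as np

def fuse(mask, portrait, white=255):
    """Step S4 fusion: where the semantic-segmentation Mask pixel is white
    (foreground), keep the Mask value; elsewhere use the portrait pixel."""
    mask = np.asarray(mask)
    portrait = np.asarray(portrait)
    return np.where(mask == white, mask, portrait)
```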
The above is a preferred embodiment of the present invention, and all changes made according to the technical solution of the present invention belong to the protection scope of the present invention when the generated functional effects do not exceed the scope of the technical solution of the present invention.
Claims (10)
1. The method for generating the artistic portrait based on UNet is characterized by comprising the following steps:
s1, acquiring an image containing a human face, and carrying out human face positioning including human face detection and human face key point positioning;
s2, preprocessing face images, and obtaining standard portrait images with 512x512 resolution through face alignment;
s3, outputting semantic segmentation information and portrait images through a UNet network at the same time, wherein the semantic segmentation information comprises segmented portraits and background masks, and the portrait images are fine portrait images;
and S4, fusing the background Mask and the detailed portrait image to obtain the artistic portrait.
2. The UNet-based artistic portrait creation method according to claim 1, wherein step S1 includes the steps of:
step S11, obtaining a maximum face by carrying out face detection on an image containing the face, wherein the formula for obtaining the maximum face is as follows:
wherein w_i and h_i are the width and height of the detection frame of the i-th face, and w_{i−1} and h_{i−1} are the width and height of the detection frame of the (i−1)-th face;
step S12, extracting 5-point face key points landmark for the maximum face in step S11 5 。
3. The method for generating artistic portrait based on UNet according to claim 2, wherein step S2 comprises the steps of:
step S21, using the coordinates of the 5-point face key points landmark5 from step S12, carrying out an affine transformation operation on the maximum face image to obtain a face image with 512x512 resolution;
and S22, filling white into pixel points which do not exist in the affine transformation operation process, and obtaining the standard face image.
4. The method for generating artistic portrait based on UNet according to claim 1, wherein step S3 comprises the steps of:
s31, preprocessing an input image, including image pairing, clipping and data enhancement, to construct a training data set;
step S32, designing a multi-task network based on the semantic segmentation and the portrait of the UNet, wherein a basic unit of the network is a residual U block, and constructing a semantic segmentation network and a portrait image network based on the UNet network based on the residual U block;
step S33, designing a loss function for training the network designed in the step S32;
step S34, training a portrait drawing image network by using the training data set;
step S35, inputting the face image with any resolution to be detected into a designed network, and generating a semantic segmentation Mask image and a portrait image by using the trained network.
5. The method of generating artistic portrait based on UNet according to claim 4, wherein step S31 is specifically as follows:
step S311, scaling each image in the training data set into images with the same size of H1×W1;
step S312, converting the semantic segmentation label Mask images in the training data set from a white-background black portrait to a black-background white portrait;
step S313, randomly flipping the face images, the semantic segmentation label Mask images and the portrait label images in the training data set up and down, and randomly cropping them into H×W images;
step S314, normalizing the images in the training data set: given an image I_train, the normalized image Î_train is calculated. The formula is as follows:

Î_train = I_train / I_bit_max

wherein I_train is an image of H×W size with 8-bit color depth, and I_bit_max is an image of H×W size whose pixel values are all 255;
step S315, carrying out standardization processing on the face images: given a normalized image, each of its R, G and B channels is standardized. The formula is as follows:

Î_c = (I_c / 255 − μ_c) / σ_c, c ∈ {R, G, B}

wherein I_train is an image of H×W size with 8-bit color depth, I_R, I_G and I_B are the R, G and B channel images of the color image, and μ_c and σ_c are the mean and standard deviation of channel c.
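The preprocessing of steps S314 and S315 as a sketch; the per-channel mean/std values are assumptions, since the text does not list them (ImageNet statistics are a common choice):

```python
import numpy as np

def normalize(img):
    """Step S314: divide an 8-bit image by 255 so values lie in [0, 1]."""
    return img.astype(np.float64) / 255.0

def standardize(img, mean, std):
    """Step S315: per-channel standardization (x - mean) / std on the
    R, G, B channels of a normalized image. The mean/std values are not
    fixed by the patent and must be supplied by the implementer."""
    return (img.astype(np.float64) / 255.0 - np.asarray(mean)) / np.asarray(std)
```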
6. The method of generating artistic portrait based on UNet according to claim 4, wherein step S32 is specifically as follows:
step S321, the basic unit of the designed network is the residual U block RSU, which comprises three parts: a convolution layer, a U-shaped symmetrical encoding and decoding structure, and a residual structure. The convolution layer convolves the input feature map X_train to obtain the local feature f1(X_train) in the feature map; the U-shaped symmetrical encoding and decoding structure learns the multi-scale context information in the feature map, denoted U(f1(X_train)), and its encoding and decoding paths concatenate feature maps of the same scale. The output of the residual U block RSU is denoted f1(X_train) + U(f1(X_train)); X_train has size H×W×C_in;
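An illustrative, much shallower sketch of the residual U block of step S321 in PyTorch: an input convolution producing f1(x), a small U-shaped encoder-decoder U(·) that concatenates same-scale feature maps, and the residual sum f1(x) + U(f1(x)). The depth and channel counts here are assumptions, not the RSU-7..RSU-4 configurations of step S3221:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RSU(nn.Module):
    """Minimal residual U block sketch (step S321)."""
    def __init__(self, c_in, c_mid, c_out):
        super().__init__()
        self.conv_in = nn.Conv2d(c_in, c_out, 3, padding=1)   # produces f1(x)
        self.enc1 = nn.Conv2d(c_out, c_mid, 3, padding=1)     # encoder, full scale
        self.enc2 = nn.Conv2d(c_mid, c_mid, 3, padding=1)     # encoder, half scale
        # decoder input: bottom features concatenated with the same-scale enc1 map
        self.dec1 = nn.Conv2d(c_mid * 2, c_out, 3, padding=1)

    def forward(self, x):
        f1 = F.relu(self.conv_in(x))                          # local feature f1(x)
        e1 = F.relu(self.enc1(f1))
        e2 = F.relu(self.enc2(F.max_pool2d(e1, 2)))           # down-sample
        d = F.interpolate(e2, size=e1.shape[2:],
                          mode="bilinear", align_corners=False)  # up-sample
        u = F.relu(self.dec1(torch.cat([d, e1], dim=1)))      # same-scale concat
        return f1 + u                                         # residual connection
```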
Step S322, a multi-task network based on the semantic segmentation and the portrait of UNet is built based on a residual U block RSU, and the multi-task network comprises four parts of contents: a multitasking shared six-stage encoder part, a semantic segmentation and portrait multitasking five-stage decoding part, a semantic segmentation fusion part and a portrait fusion part.
7. The method of generating an artistic portrait based on UNet according to claim 6, wherein step S322 is specifically as follows:
step S3221, the multitask-shared six-stage encoder part: En1, En2, En3 and En4 use residual U blocks RSU-7, RSU-6, RSU-5 and RSU-4 respectively; the two encoders En5 and En6 are not standard RSU blocks but replace the down-sampling operation with dilated (hole) convolution;
step S3222, the semantic segmentation five-stage decoding part: the decoding parts De1, De2, De3 and De4 are completely symmetrical with the encoder parts En1, En2, En3 and En4; in De2, De3, De4 and De5, each decoder feature map is concatenated with the feature map of its symmetrical encoder, so that the feature map size is unchanged while the number of channels is doubled;
step S3223, the portrait five-stage decoding part: its structure is identical to that of the semantic segmentation five-stage decoding part, except that the decoding parameters of the two parts are not shared;
step S3224, the semantic segmentation fusion part: first, a 3x3 convolution layer, an up-sampling layer and a Sigmoid activation function map the feature maps of the En6, De5, De4, De3, De2 and De1 parts into a semantic segmentation Mask image for each stage; all six stage Mask images are then concatenated and passed through a 1x1 convolution layer and a Sigmoid activation function to generate the final semantic segmentation Mask image;
step S3225, the portrait fusion part: first, a 3x3 convolution layer, an up-sampling layer and a Sigmoid activation function map the feature maps of the En6, De5, De4, De3, De2 and De1 parts into a portrait image for each stage; all six stage portrait images are then concatenated and passed through a 1x1 convolution layer and a Sigmoid activation function to generate the final portrait image.
8. The method for generating artistic portrait based on UNet according to claim 7, wherein step S33 is specifically as follows:
step S331, the semantic segmentation Mask images obtained in step S3224 and the portrait images obtained in step S3225 are each used to construct a loss function against the standard semantic segmentation Mask label image and the portrait label image respectively, adopting the standard binary cross entropy loss; the formula is as follows:

l = − Σ_{r=1}^{H} Σ_{c=1}^{W} [ P_G(r,c) · log P_S(r,c) + (1 − P_G(r,c)) · log(1 − P_S(r,c)) ]
wherein H and W are the height and width resolution of the image, (r, c) is a pixel point on the image, P_G(r,c) is the value of the label image at pixel (r, c), and P_S(r,c) is the value of the predicted image at pixel (r, c);
step S332, the loss of step S331 is computed for each stage output and for the fused output of both tasks, and the results are summed into the total loss L; the formula is as follows:

L = Σ_{k=1}^{7} ( l_mask^(k) + l_portrait^(k) )
wherein l_mask^(k) denotes the standard binary cross entropy loss between a predicted semantic segmentation Mask image and the semantic segmentation Mask label image, and l_portrait^(k) denotes the standard binary cross entropy loss between a predicted portrait image and the portrait label image; k runs over the six stage outputs and the final fused output.
9. The method for generating artistic portrait based on UNet according to claim 8, wherein step S34 is specifically as follows:
step S341, selecting a random image X in the training data set constructed in the step S31;
step S342, training the semantic segmentation network and the portrait drawing image network with image encoding and decoding: the image X is input and, through the encoding and decoding of the UNet network, the final semantic segmentation Mask image and the final portrait image are obtained; the total loss function of step S332 is then calculated;
Step S343, calculating the gradient of each parameter in the UNet network by using a back propagation method, and updating the parameters by using an Adam optimization method;
step S344, step S341 to step S343 are one iteration, iterating 1000 times, and randomly sampling a plurality of image pairs as a batch for training in each iteration process.
10. The method for generating artistic portrait based on UNet according to claim 4, wherein step S4 is specifically as follows:
the semantic segmentation Mask image and the portrait image are obtained through step S35 and then fused; specifically, when the pixel value of the semantic segmentation Mask image is white, the Mask pixel value is retained, otherwise the pixel is replaced by the pixel value of the portrait image; the formula is as follows:

S(r, c) = Mask(r, c), if Mask(r, c) is white; otherwise Portrait(r, c)
where (r, c) is a pixel point on the image, Mask(r, c) is the semantic segmentation Mask image, Portrait(r, c) is the portrait image, and S(r, c) is the fused artistic portrait.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311871886.2A CN117809350A (en) | 2023-12-29 | 2023-12-29 | Artistic portrait drawing generation method based on UNet |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117809350A true CN117809350A (en) | 2024-04-02 |
Family
ID=90431901
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||