WO2021139557A1 - Portrait stick figure generation method and system, and drawing robot - Google Patents

Portrait stick figure generation method and system, and drawing robot Download PDF

Info

Publication number
WO2021139557A1
WO2021139557A1 (PCT/CN2020/140335)
Authority
WO
WIPO (PCT)
Prior art keywords
portrait
stick
image
photo
style
Prior art date
Application number
PCT/CN2020/140335
Other languages
French (fr)
Chinese (zh)
Inventor
朱静洁
高飞
李鹏
俞泽远
王韬
Original Assignee
杭州未名信科科技有限公司
浙江省北大信息技术高等研究院
Priority date
Filing date
Publication date
Application filed by 杭州未名信科科技有限公司 and 浙江省北大信息技术高等研究院
Publication of WO2021139557A1 publication Critical patent/WO2021139557A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation

Definitions

  • This application belongs to the field of image processing technology, and specifically relates to a portrait stick figure generation method, a system, and a painting robot.
  • the present invention proposes a portrait stick figure generation method, a system, and a painting robot, aiming to solve the problem that stick figure generation methods in the prior art cannot be well applied by a painting robot to draw vivid portrait stick figures.
  • a method for generating stick figures of portraits including the following steps:
  • the stick figure image is obtained through the convolutional neural network model.
  • the convolutional neural network model is specifically:
  • the high-level semantic features of the pre-processed portrait image and stick figure style photo are obtained through the VGG encoder
  • image preprocessing is performed according to the portrait photo to obtain the preprocessed portrait image, and the image preprocessing specifically includes:
  • the encoder adopts a VGG encoder
  • the adaptive instance normalization module adopts the AdaIN network structure
  • the decoder adopts the decoder structure of the AdaIN network.
  • the loss function used for optimization of the convolutional neural network model includes a content loss function, a style loss function, a local sparse loss function, and a consistency loss function.
  • the method further includes:
  • the post-processing of the stick figure is performed according to the stick figure image to obtain the final stick figure image suitable for the painting robot.
  • the post-processing of the stick figure includes Gaussian blur processing, adaptive binarization processing, and line dilation processing.
  • the post-processing of the stick figure specifically includes:
  • the binary image is obtained by Otsu's histogram-based adaptive binarization method
  • a portrait stick figure generation system, which specifically includes:
  • a portrait photo preprocessing module, used to perform image preprocessing according to a portrait photo to obtain a preprocessed portrait image;
  • a stick figure generation module, used to obtain a stick figure image through a convolutional neural network model according to the preprocessed portrait image and a stick figure style photo.
  • the portrait photo preprocessing module includes:
  • a face key point detection model, used to detect the face bounding box and facial feature key points according to the portrait photo, to obtain face bounding box information and the position coordinates of the facial feature key points;
  • a face alignment unit, used to obtain a face-aligned portrait image according to the face bounding box information and the position coordinates of the facial feature key points;
  • a face parsing model, used to obtain a portrait photo parsing mask from the face-aligned portrait image;
  • an image background removal unit, used to obtain a background-removed portrait image according to the portrait photo parsing mask.
  • a painting robot which specifically includes a processor, a communication module, a camera module, and a portrait execution module, wherein the processor can execute the above portrait stick figure generation method.
  • a preprocessed portrait image is obtained by image preprocessing according to a portrait photo; then a stick figure image is obtained through the convolutional neural network model according to the preprocessed portrait image and a stick figure style photo.
  • the convolutional neural network model is: obtain the high-level semantic features of the preprocessed portrait image and the stick figure style photo through an encoder; input the high-level semantic features into the adaptive instance normalization module to obtain statistical features; input the statistical features into the decoder to obtain an image with a stick figure style.
  • This application realizes that high-quality stick figures can be quickly generated from portrait photos, which is suitable for painting robots and makes it possible to draw a portrait stick figure in a short time. This solves the problem that stick figure generation methods of the prior art cannot be well applied by a painting robot to draw vivid portrait stick figures.
  • Fig. 1 shows a flow chart of the steps of a method for generating stick figures of portraits according to an embodiment of the present application
  • FIG. 2 shows a schematic diagram of a network structure of a deep convolutional neural network model according to an embodiment of the present application
  • FIG. 3 shows a schematic diagram of a specific network structure of an encoder and a decoder in a deep convolutional neural network model according to an embodiment of the present application
  • Figure 4 shows a schematic structural diagram of a system for generating stick figures of portraits according to an embodiment of the present application
  • Fig. 5 shows a schematic diagram of the design process of a system for generating simple strokes of portraits according to another embodiment of the present application.
  • an embodiment of the present application provides a method for generating stick figures for portraits.
  • A preprocessed portrait image is obtained by image preprocessing according to a portrait photo; then the preprocessed portrait image and a stick figure style photo are passed through the convolutional neural network model to obtain the stick figure image. This makes it possible to quickly generate high-quality stick figures from portrait photos and is suitable for painting robots, which can draw portrait stick figures in a short time, solving the problem that stick figure generation methods of the prior art cannot be well applied by a painting robot to draw vivid portrait stick figures.
  • this application discloses a multi-style portrait stick figure generation method for painting robots, which can perform face recognition and face cutting operations through portrait photos, and then perform portrait-stick figure style conversion.
  • the stick figure generation model used in this application generates richer details in each part.
  • the stick figure generation model adopted in this application is suitable for multiple stick figure styles, and has robustness to adapt to multiple stick figure styles and retain the details of character identity information;
  • this application integrates the algorithm into a painting robot to quickly generate portrait stick figure images and meet the needs of family companionship.
  • Fig. 1 shows a flow chart of the steps of a method for generating stick figures of portraits according to an embodiment of the present application.
  • the method for generating stick figures in this embodiment specifically includes the following steps:
  • S102 Obtain the stick figure image through the convolutional neural network model according to the preprocessed portrait image and stick figure style photo.
  • image preprocessing is performed according to the portrait photo to obtain the preprocessed portrait image, and the image preprocessing specifically includes:
  • the face bounding box and the facial feature key points are detected, and the face bounding box information and the position coordinates of the facial feature key points are obtained.
  • the face bounding box and key point detection are performed through the face key point prediction model to obtain the face bounding box information of the portrait photo and the corresponding position coordinates of the facial feature key points.
  • the facial feature key points are the centers of the left and right eyes, the tip of the nose, and the left and right corners of the mouth.
  • 1000 face images are randomly selected from each of the CelebA and CelebA-HQ datasets (with sizes 178×218 and 1024×1024, respectively), and data augmentation (Gaussian blur, horizontal flip, and mirror flip operations) is performed on these 2000 face images to obtain 6000 content images I_C in the training set.
  • character photos I_T taken with mobile phones are used as the test set.
  • MTCNN (Multi-task Convolutional Neural Network) is used for face detection: a cascaded network that first proposes candidate bounding boxes and whose final stage, O-Net, generates the final bounding box and the five facial key points.
  • The five key points finally obtained are the center of the left eye, the center of the right eye, the tip of the nose, the left mouth corner, and the right mouth corner: Landmark = {p_leye, p_reye, p_nose, p_lmouth, p_rmouth}.
  • This step belongs to the face alignment step.
  • the position coordinates of the left and right eye centers among the facial key points are subjected to an affine transformation operation to align the face.
  • the two eye key points are kept horizontal and at a fixed distance from the upper boundary of the image through affine transformation and image cropping operations to perform face alignment.
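The eye-leveling affine transform described above can be sketched as follows. This is a minimal illustration, not the embodiment's actual implementation: the function names are assumptions, and the choice of rotating about the eye midpoint (the subsequent cropping step is omitted) is one common convention.

```python
import math

def alignment_transform(p_leye, p_reye):
    """2x3 affine matrix [R | t] that rotates about the eye midpoint
    so the two eye centres end up on a horizontal line."""
    dx = p_reye[0] - p_leye[0]
    dy = p_reye[1] - p_leye[1]
    angle = math.atan2(dy, dx)              # tilt of the inter-eye line
    cx = (p_leye[0] + p_reye[0]) / 2.0      # rotation centre: eye midpoint
    cy = (p_leye[1] + p_reye[1]) / 2.0
    c, s = math.cos(-angle), math.sin(-angle)
    # t = centre - R * centre, so the midpoint stays fixed
    tx = cx - (c * cx - s * cy)
    ty = cy - (s * cx + c * cy)
    return [[c, -s, tx], [s, c, ty]]

def apply_affine(m, p):
    """Apply the 2x3 affine matrix to a point (x, y)."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

Applying the matrix to both eye centers yields two points with equal y coordinates, which is exactly the "kept in a horizontal position" condition.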
  • the background color of the face image is set to white to achieve the background removal operation, and the processed content image I_TT is obtained.
  • during training, all images {I_TT, I_S} are proportionally scaled to a width of 512 and 256×256 patches are randomly cropped; during testing, all images are proportionally scaled to a width of 512.
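The training-time random cropping can be sketched as below. This is an illustration only: images are modeled as nested lists of pixel values, an assumption made for self-containment, and the proportional scaling step is omitted.

```python
import random

def random_patch(img, size=256):
    """Randomly crop a size x size patch from a 2-D image
    (a list of rows), as for the 256x256 training patches."""
    h, w = len(img), len(img[0])
    y = random.randint(0, h - size)
    x = random.randint(0, w - size)
    return [row[x:x + size] for row in img[y:y + size]]
```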
  • the stick figure image is obtained through the convolutional neural network model according to the preprocessed portrait image and stick figure style photos.
  • Fig. 2 shows a schematic diagram of a network structure of a deep convolutional neural network model according to an embodiment of the present application.
  • the convolutional neural network model is specifically: according to the preprocessed portrait image and the stick figure style photo, the high-level semantic features of the preprocessed portrait image and stick figure style photo are obtained through the VGG encoder;
  • the loss function used in the optimization of the convolutional neural network model includes the content loss function, the style loss function, the local sparse loss function, and the consistency loss function.
  • Convolutional neural network model generation steps: first, inspired by the AdaIN network structure, the deep convolutional neural network model obtains the high-level semantic features of the content image and the style image through the encoder; then, the last feature map of the encoder is used as the input of the adaptive instance normalization (AdaIN) module, which combines the content features of the preprocessed portrait image obtained in S101 with the style features of the stick figure style photo through learned feature statistics; finally, the statistical features are inversely transformed into image space by the decoder to obtain an image with a stick figure style.
  • AdaIN Adaptive Instance Normalization
  • FIG. 3 shows a schematic diagram of the specific network structure of the encoder and the decoder in the deep convolutional neural network model according to an embodiment of the present application.
  • v(·) is the VGG encoder with pre-trained model parameters
  • g_c is the high-level semantic feature obtained by inputting the content image into the VGG encoder
  • g_s is the high-level semantic feature obtained by inputting the style image into the VGG encoder.
  • the output of the first few layers of the VGG model (for example, the result of relu4_1) is used as the output feature of the encoder, and this output feature is input into the AdaIN module to learn feature statistics.
  • the learned feature statistics o are computed as o = AdaIN(g_c, g_s).
  • AdaIN is the adaptive instance normalization module, which learns feature statistics through a combination of mean and standard deviation.
  • the specific formula of AdaIN is: AdaIN(g_c, g_s) = σ(g_s) · ((g_c − μ(g_c)) / σ(g_c)) + μ(g_s), where
  • ⁇ ( ⁇ ) is the mean value of the calculated feature
  • ⁇ ( ⁇ ) is the standard deviation of the calculated feature
  • the statistical features obtained by the adaptive instance normalization (AdaIN) module are decoded and inversely converted into image space.
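The AdaIN computation above can be sketched per channel as follows. This is a toy illustration on flat lists of activations, not the actual tensor implementation; the epsilon term is an assumption added for numerical safety.

```python
import math

def mean_std(xs):
    """Mean and standard deviation of one feature channel."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, math.sqrt(var + 1e-8)        # eps guards against zero std

def adain(content, style):
    """AdaIN(g_c, g_s): normalise the content channel to zero mean and
    unit std, then rescale with the style channel's std and mean."""
    mu_c, sd_c = mean_std(content)
    mu_s, sd_s = mean_std(style)
    return [sd_s * (x - mu_c) / sd_c + mu_s for x in content]
```

After the transfer, the output channel carries the style's first- and second-order statistics while keeping the content's normalized shape.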
  • In the decoder network structure shown in Figure 3, the decoder is divided into 12 modules (blocks): the second, seventh, and tenth modules are upsampling layers; the last module consists of reflection padding and convolution (Convolutional Neural Networks, CNN); the remaining modules each have three operation components: reflection padding, convolution, and rectified linear units (Rectified Linear Units, ReLU).
  • d( ⁇ ) is the decoder
  • cs is the stylized image produced by the decoder.
  • the optimization of the neural network model combines several loss functions, as follows:
  • v(cs) represents the features obtained by inputting the image cs produced by the decoder back into the VGG encoder
  • o is the feature statistics output by the AdaIN module
  • the content loss is the Euclidean distance between the target features o and the output image features v(cs).
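That content loss can be written down directly. A minimal sketch, assuming the features are flattened into plain lists (the real features are multi-channel tensors):

```python
import math

def content_loss(v_cs, o):
    """Euclidean distance between the VGG features v(cs) of the
    decoded image and the AdaIN target statistics o (flattened)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_cs, o)))
```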
  • each φ_i(·) denotes one layer of VGG-19 used to calculate the style loss.
  • the embodiment of the present application uses relu1_1, relu2_1, relu3_1, and relu4_1 layer features with equal weights.
  • ⊙ denotes element-wise multiplication of corresponding elements
  • M′ is the label mask obtained by updating M
  • M has n categories in total.
  • the areas where the contours of the eyebrows, eyes, glasses, nose, mouth, face, and background are extracted are all marked 0, the remaining areas are all marked 1, and M′ (of the same size as M) is obtained.
  • the purpose is to sparsify the areas labeled 1 so that the generated result better fits the drawing trajectory of the painting robot.
  • the consistency loss is the Euclidean distance between the two, which keeps the image generated by the global generator pixel-wise consistent with the stick figure style photo.
  • ⁇ 1 , ⁇ 2 , ⁇ 3 , and ⁇ 4 are custom weights.
  • S103 Perform post-processing of the stick figure according to the stick figure image to obtain a final stick figure image suitable for the painting robot.
  • the post-processing of the stick figure includes Gaussian blur processing, adaptive binarization processing, and line dilation processing.
  • the post-processing of stick figures includes:
  • the binary image is obtained by Otsu's histogram-based adaptive binarization method
  • stick figure post-processing realizes the transition optimization from the stick figure generation result to the drawing result of the painting robot.
  • the Gaussian blur is essentially a low-pass filter: each pixel in the output image is a weighted sum of the corresponding pixel and its surrounding pixels in the original image.
  • the weight function of this low-pass filter is the two-dimensional Gaussian G(x, y) = exp(−(x² + y²) / (2σ²)) / (2πσ²).
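The Gaussian weight matrix can be built directly from that function; a minimal sketch, where the kernel radius and sigma are free parameters chosen by the caller:

```python
import math

def gaussian_kernel(radius, sigma):
    """Normalised 2-D Gaussian weight matrix built from
    G(x, y) = exp(-(x^2 + y^2) / (2 sigma^2)); the normalisation
    makes the weights sum to 1, absorbing the 1/(2 pi sigma^2) factor."""
    rng = range(-radius, radius + 1)
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in rng] for y in rng]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]
```

Convolving this matrix with the image gives the blurred result described in the next step.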
  • the Gaussian distribution weight matrix is convolved with the original image matrix to obtain a Gaussian blurred image. Because binarization with a fixed, specified threshold would cause unnecessary dark spots, this embodiment uses Otsu's histogram-based adaptive binarization
  • method to find the optimal threshold and binarize the image. The specific process is as follows:
  • Otsu's adaptive binarization method yields a binary image with black pixels in the foreground and white pixels in the background.
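Otsu's threshold selection can be sketched on a grey-level histogram as follows; it picks the level that maximises the between-class variance. This is a hedged stand-in for the embodiment's unspecified implementation:

```python
def otsu_threshold(hist):
    """Otsu's method: choose the grey level that maximises the
    between-class variance w_b * w_f * (m_b - m_f)^2 of the histogram."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_b = 0.0       # background pixel count so far
    sum_b = 0.0     # background intensity sum so far
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w_b += h
        if w_b == 0 or w_b == total:
            continue                           # one class is empty
        sum_b += t * h
        m_b = sum_b / w_b                      # background mean
        m_f = (sum_all - sum_b) / (total - w_b)  # foreground mean
        between = w_b * (total - w_b) * (m_b - m_f) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

For a bimodal histogram the returned threshold separates the two modes, so pixels at or below it can be mapped to one class and the rest to the other.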
  • f is the binary image
  • b is the convolution template
  • the value of the template is defined as
  • the dilation of the image by b at any position (x, y) is defined as the maximum value of f over the region of the image overlapped by b.
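That definition (the maximum of f over the region covered by b) can be sketched as follows for a flat structuring element; representing images as nested lists is an assumption made for illustration:

```python
def dilate(img, se):
    """Grayscale/binary dilation with a flat structuring element:
    out(x, y) = max of img over the region covered by se centred there."""
    h, w = len(img), len(img[0])
    sh, sw = len(se), len(se[0])
    oy, ox = sh // 2, sw // 2              # origin at the SE centre
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = img[y][x]
            for j in range(sh):
                for i in range(sw):
                    yy, xx = y + j - oy, x + i - ox
                    if se[j][i] and 0 <= yy < h and 0 <= xx < w:
                        if img[yy][xx] > best:
                            best = img[yy][xx]
            out[y][x] = best
    return out
```

On a binary line drawing this thickens every stroke by roughly the radius of the structuring element, which is the line dilation used to make the strokes drawable by the robot pen.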
  • Fig. 4 shows a schematic structural diagram of a system for generating stick figures of portraits according to an embodiment of the present application.
  • a portrait stick figure generation system based on portrait photos specifically includes:
  • a portrait photo preprocessing module 10, used to perform image preprocessing according to the portrait photo to obtain a preprocessed portrait image;
  • a stick figure generation module 20, used to obtain the stick figure image through the convolutional neural network model according to the preprocessed portrait image and the stick figure style photo.
  • the portrait photo preprocessing module 10 includes:
  • a face key point detection model, used to detect the face bounding box and facial feature key points according to the portrait photo, to obtain face bounding box information and the position coordinates of the facial feature key points;
  • a face alignment unit, used to obtain a face-aligned portrait image according to the face bounding box information and the position coordinates of the facial feature key points;
  • a face parsing model, used to obtain a portrait photo parsing mask from the face-aligned portrait image;
  • an image background removal unit, used to obtain a background-removed portrait image according to the portrait photo parsing mask.
  • Fig. 5 shows a schematic diagram of the design process of a portrait stick figure generation system according to another embodiment of the present application.
  • the portrait stick figure generation system of the embodiment shown in FIG. 5 adds a stick figure post-processing module.
  • the stick figure post-processing module performs stick figure post-processing according to the stick figure image to obtain the final stick figure image.
  • the stick figure post-processing includes Gaussian blur processing, adaptive binarization processing, and line dilation processing.
  • the portrait stick figure generation method, system, and painting robot in the embodiments of this application obtain the preprocessed portrait image by image preprocessing according to the portrait photo, and then use the convolutional neural network model to obtain the stick figure image according to the preprocessed portrait image and the stick figure style photo.
  • This makes it possible to quickly generate high-quality stick figures from portrait photos and is suitable for painting robots, which can draw portrait stick figures in a short time, solving the problem that stick figure generation methods of the prior art cannot be well applied by a painting robot to draw vivid portrait stick figures.
  • the stick figure generation model used in this application generates richer details in each part. Specifically, through feature statistics between the content image and the style image, local sparse constraints, and post-processing, the details of the generated portrait stick figures are more abundant than those of methods based on rule generation or direct global generation.
  • the stick figure generation model adopted in this application is suitable for multiple stick figure styles, and has robustness to adapt to multiple stick figure styles and retain the details of character identity information;
  • this application integrates the algorithm into a painting robot to quickly generate portrait stick figure images and meet the needs of family companionship.
  • This embodiment also provides a painting robot, which specifically includes a processor, a communication module, a camera module, and a portrait execution module, wherein the processor can execute the above portrait stick figure generation method.
  • the embodiments of the present application also provide a computer program product. Since the principle by which the computer program product solves the problem is similar to the method provided in the first embodiment of the present application, the implementation of the computer program product can refer to the implementation of the method, and the repeated parts will not be described again.
  • this application can be provided as methods, systems, or computer program products. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and
  • the instruction device implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment
  • thus provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.


Abstract

A portrait stick figure generation method and system, and a drawing robot. The portrait stick figure generation method comprises: performing image preprocessing according to a portrait photo to obtain a preprocessed portrait image (S101); and obtaining a stick figure image by means of a convolutional neural network model according to the preprocessed portrait image and a stick figure style photo (S102). According to the method, a high-quality stick figure can be quickly generated from a portrait photo, and the method is applicable to the drawing robot; a portrait stick figure can be drawn in a short time; the problem that stick figure generation methods in the prior art cannot be well applied to drawing robots to draw vivid portrait stick figures is solved.

Description

Method and system for generating portrait stick figures, and painting robot

Technical Field

This application belongs to the field of image processing technology, and specifically relates to a portrait stick figure generation method, a system, and a painting robot.

Background Art

With the development of artificial intelligence, more and more people have begun to study the combination of artificial intelligence and art, that is, computational art. On the other hand, artificial intelligence technology is increasingly intertwined with daily life, and the field of family companion robots is booming: family companion robots have not only entered our lives but also brightened our spiritual world. At present, family companion robots can perform artistic creation such as cartoons and sketches. Stick figures, which use simple elements such as dots and lines, are sufficient to express a person's characteristics vividly, and are well suited to the requirement that a robot draw a lively portrait in a short time.

At present, there is relatively little research on producing stick figures or applying them to robots. Most existing algorithms based on convolutional neural networks transform a target photo into a photo bearing the style of an existing style photo while preserving the target photo's content. These algorithms replace content features with style features in a patch-to-patch manner, and each data-driven training run yields a model adapted to only one painting style.

Because the content of real face portraits is complex, the details that different facial parts need to present vary, and current painting robots have certain limitations, face portrait stick figure algorithms for painting robots face great challenges.

Specifically, the main difficulties are as follows. (1) Many algorithms can generate good style paintings, but the results of generating portraits in the same style are unsatisfactory: the contours of the face are messy, with ghosting, and the details of the facial features cannot be accurately expressed, making the output entirely unsuitable for a painting robot to extract trajectories from and draw portraits. Therefore, when generating portrait stick figures for a painting robot, preserving the person's identity information while conforming to the robot's drawing process is a problem that must be solved. (2) Current style photo conversion is basically website-based, and generating a stylized image takes some time. For the elderly and children, operating a website is very difficult, and the result cannot be kept as a souvenir. For entertainment and daily life, a painting robot based on robot interaction, with improved portrait generation and drawing speed, is also an important aspect.

Therefore, there is an urgent need for a generation method that can produce high-quality, vivid portrait stick figures and is suitable for drawing by a painting robot.
Summary of the Invention

The present invention proposes a portrait stick figure generation method, a system, and a painting robot, aiming to solve the problem that stick figure generation methods in the prior art cannot be well applied by a painting robot to draw vivid portrait stick figures.

According to the first aspect of the embodiments of the present application, a portrait stick figure generation method is provided, including the following steps:

performing image preprocessing according to a portrait photo to obtain a preprocessed portrait image;

obtaining a stick figure image through a convolutional neural network model according to the preprocessed portrait image and a stick figure style photo, where the convolutional neural network model is specifically:

obtaining high-level semantic features of the preprocessed portrait image and the stick figure style photo through a VGG encoder;

inputting the high-level semantic features into the adaptive instance normalization (AdaIN) module to obtain statistical features;

inputting the statistical features into the decoder to obtain an image with a stick figure style.
Optionally, image preprocessing is performed according to the portrait photo to obtain the preprocessed portrait image, and the image preprocessing specifically includes:

performing face bounding box and facial feature key point detection according to the portrait photo to obtain face bounding box information and the position coordinates of the facial feature key points;

obtaining a face-aligned portrait image according to the face bounding box information and the position coordinates of the facial feature key points;

obtaining a portrait photo parsing mask according to the face-aligned portrait image;

obtaining a background-removed portrait image according to the portrait photo parsing mask.

Optionally,

the encoder adopts a VGG encoder;

the adaptive instance normalization module adopts the AdaIN network structure;

the decoder adopts the decoder structure of the AdaIN network.
Optionally, the loss functions used for optimizing the convolutional neural network model include a content loss function, a style loss function, a local sparse loss function, and a consistency loss function.

Optionally, after the stick figure image is obtained through the convolutional neural network model according to the preprocessed portrait image, the method further includes:

performing stick figure post-processing according to the stick figure image to obtain a final stick figure image suitable for the painting robot.

Optionally, the stick figure post-processing includes Gaussian blur processing, adaptive binarization processing, and line dilation processing.

Optionally, the stick figure post-processing specifically includes:

inputting the stick figure image into a low-pass filter for Gaussian blur processing to obtain a Gaussian blurred image;

obtaining a binary image from the Gaussian blurred image using Otsu's adaptive binarization method;

performing line dilation processing on the binary image to obtain the final stick figure image.
According to a second aspect of the embodiments of the present application, a portrait stick-figure generation system is provided, which specifically includes:
a portrait-photo preprocessing module, configured to perform image preprocessing on a portrait photo to obtain a preprocessed portrait image;
a stick-figure generation module, configured to obtain a stick-figure image from the preprocessed portrait image and a stick-figure-style photo through a convolutional neural network model.
Optionally, the portrait-photo preprocessing module includes:
a face key-point detection model, configured to perform face bounding-box and facial-feature key-point detection on the portrait photo to obtain face bounding-box information and the position coordinates of the facial-feature key points;
a face alignment unit, configured to obtain a face-aligned portrait image according to the face bounding-box information and the position coordinates of the facial-feature key points;
a face parsing model, configured to obtain a portrait-photo parsing mask according to the face-aligned portrait image;
an image background-removal unit, configured to obtain a background-removed portrait image according to the portrait-photo parsing mask.
According to a third aspect of the embodiments of the present application, a drawing robot is provided, which specifically includes a processor, a communication module, a camera module, and a portrait execution module, wherein the processor can execute the portrait stick-figure generation method described above.
With the portrait stick-figure generation method and system and the drawing robot of the embodiments of the present application, a preprocessed portrait image is obtained by performing image preprocessing on a portrait photo; a stick-figure image is then obtained from the preprocessed portrait image and a stick-figure-style photo through a convolutional neural network model, in which: the high-level semantic features of the preprocessed portrait image and the stick-figure-style photo are obtained through an encoder; the high-level semantic features are fed into an adaptive instance normalization module to obtain statistical features; and the statistical features are fed into a decoder to obtain an image in the stick-figure style. The present application can quickly turn a portrait photo into a high-quality stick figure and is suitable for a drawing robot, which can draw the portrait stick figure in a short time. This solves the problem that prior-art stick-figure generation methods cannot be readily applied to a drawing robot to draw vivid portrait stick figures.
Description of the drawings
The drawings described here are provided for a further understanding of the present application and constitute a part of it; the exemplary embodiments of the present application and their descriptions are used to explain the application and do not constitute an improper limitation of it. In the drawings:
Fig. 1 shows a flowchart of the steps of a portrait stick-figure generation method according to an embodiment of the present application;
Fig. 2 shows a schematic diagram of the network structure of the deep convolutional neural network model according to an embodiment of the present application;
Fig. 3 shows a schematic diagram of the specific network structures of the encoder and the decoder in the deep convolutional neural network model according to an embodiment of the present application;
Fig. 4 shows a schematic structural diagram of a portrait stick-figure generation system according to an embodiment of the present application;
Fig. 5 shows a schematic diagram of the design flow of a portrait stick-figure generation system according to another embodiment of the present application.
Detailed description of the embodiments
In the course of realizing the present application, the inventors found that, with the continuous development of artificial-intelligence technology, drawing robots are used more and more in daily life, and portrait drawing is widely applied in multimedia such as virtual reality, augmented reality, and robot portrait-drawing systems, as well as in personalized entertainment and the Internet. Because real face portraits are complex in content, the details that different facial parts need to present vary, and current drawing robots still have certain limitations, so applying a face-portrait stick-figure algorithm to a drawing robot faces great challenges. Therefore, there is an urgent need for a generation method that can convert a photo into a high-quality, vivid portrait stick figure and is suitable for drawing by a drawing robot.
In view of the above problems, an embodiment of the present application provides a portrait stick-figure generation method: a preprocessed portrait image is obtained by performing image preprocessing on a portrait photo; a stick-figure image is then obtained from the preprocessed portrait image and a stick-figure-style photo through a convolutional neural network model. The method can quickly turn a portrait photo into a high-quality stick figure and is suitable for a drawing robot, which can draw the portrait stick figure in a short time. This solves the problem that prior-art stick-figure generation methods cannot be readily applied to a drawing robot to draw vivid portrait stick figures.
Compared with the prior art, the present application discloses a multi-style portrait stick-figure generation method for drawing robots, which can perform face recognition, face cropping, and other operations on a portrait photo and then perform portrait-to-stick-figure style transfer; each part generated by the stick-figure generation model adopted in this application is richer in detail.
Specifically, during portrait-to-stick-figure style transfer, the stick-figure generation model adopted in this application supports multiple stick-figure styles and is robust in adapting to them while preserving the details of the subject's identity.
After the portrait-to-stick-figure style transfer, for display, this application integrates the algorithm into the drawing robot so that portrait stick-figure images can be generated quickly, meeting the needs of family companionship.
To make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not an exhaustive list. It should be noted that, as long as there is no conflict, the embodiments of the present application and the features therein may be combined with each other.
Embodiment 1
Fig. 1 shows a flowchart of the steps of a portrait stick-figure generation method according to an embodiment of the present application.
As shown in Fig. 1, the portrait stick-figure generation method of this embodiment specifically includes the following steps:
S101: performing image preprocessing on a portrait photo to obtain a preprocessed portrait image;
S102: obtaining a stick-figure image from the preprocessed portrait image and a stick-figure-style photo through a convolutional neural network model.
In S101, the image preprocessing performed on the portrait photo to obtain the preprocessed portrait image specifically includes:
1) performing face bounding-box and facial-feature key-point detection on the portrait photo to obtain face bounding-box information and the position coordinates of the facial-feature key points.
Specifically, for a given portrait photo, face bounding-box and key-point detection is performed by a face key-point prediction model to obtain the face bounding-box information of the portrait photo and the corresponding position coordinates of the facial-feature key points. The facial-feature key points are the left-eye center, the right-eye center, the nose tip, and the mouth corners.
In this embodiment, 1000 face images each are randomly selected from the CelebA and CelebA-HQ datasets, with sizes of 178×218 and 1024×1024 respectively, and these 2000 face images are augmented by Gaussian blurring, horizontal flipping, and mirror flipping, yielding 6000 content images I_C for the training set. n stick-figure-style images are randomly selected as the style images I_S of the training set. Photos of people I_T taken with a mobile phone are used as the test set.
Specifically, in this embodiment, key-point detection is performed on the content images I_T based on MTCNN. MTCNN (multi-task convolutional neural network) is roughly divided into a three-part network structure: P-Net, which quickly generates candidate bounding boxes; R-Net, which performs high-precision filtering of the candidate boxes; and O-Net, which outputs the final bounding box and the five face key points. The five key points finally obtained are the positions of the left-eye center, the right-eye center, the nose tip, the left mouth corner, and the right mouth corner: Landmark = {p_leye, p_reye, p_nose, p_lmouth, p_rmouth}.
2) Obtaining the face-aligned portrait image according to the face bounding-box information and the position coordinates of the facial-feature key points.
This is the face alignment step: the face is aligned by an affine transformation based on the position coordinates of the left- and right-eye centers among the face key points.
First, the horizontal deviation angle of the line through the two eye centers is computed from their vertical coordinates, and the image is rotated so that the two eye centers are level; then the distance between the two eyes is kept fixed by scaling.
Using the left- and right-eye center key points, affine transformation and image cropping keep the two key points horizontal and at a fixed distance from the upper image boundary, thereby aligning the face.
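The alignment step above can be sketched as follows. This is a minimal illustration, not the patent's actual implementation: the canonical left-eye position and the target inter-eye distance are assumed values, and a real pipeline would apply the resulting matrix to the image with a warp routine.

```python
import numpy as np

def eye_alignment_matrix(left_eye, right_eye, target_dist=90.0, target_left=(83.0, 110.0)):
    """Build a 2x3 affine matrix (rotation + uniform scale + translation) that
    maps the detected eye centers to level, fixed-distance canonical positions.
    target_dist and target_left are illustrative values, not from the patent."""
    lx, ly = left_eye
    rx, ry = right_eye
    angle = np.arctan2(ry - ly, rx - lx)              # horizontal deviation angle of the eye line
    scale = target_dist / np.hypot(rx - lx, ry - ly)  # scaling keeps the eye distance fixed
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    # Rotate by -angle (so the eye line becomes horizontal), then translate
    # so the left eye lands on its canonical position.
    M = np.array([[ c,  s, 0.0],
                  [-s,  c, 0.0]])
    tx, ty = M[:, :2] @ np.array([lx, ly])
    M[0, 2] = target_left[0] - tx
    M[1, 2] = target_left[1] - ty
    return M

def apply_affine(M, point):
    """Apply the 2x3 affine matrix to a 2D point."""
    return M[:, :2] @ np.asarray(point, dtype=float) + M[:, 2]
```

After this transform the two eye centers sit on the same horizontal line at the chosen distance, which is exactly the invariant the text describes.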
3) Obtaining the portrait-photo parsing mask according to the face-aligned portrait image.
4) Obtaining the background-removed portrait image according to the portrait-photo parsing mask. In this step, using the region of the mask whose class is background, the color of the portrait photo within that region is set to white, achieving background removal.
Specifically, the face-aligned portrait image is processed by a portrait parsing method to obtain a labeled parsing mask M_{m×n} = {k_{i,j} = 0, 1, ..., n}, where m×n is the same size as the detected face image and k_{i,j} = 0, 1, ..., n is the class of each pixel, including background, face, left and right eyes, and other classes.
According to the detected background region, the color of the face image there is set to white to remove the background, yielding the processed content image I_TT.
During training, this embodiment uniformly rescales all images {I_TT, I_S} to a width of 512 while preserving the aspect ratio and randomly crops 256×256 patches; during testing, all images are uniformly rescaled to a width of 512 while preserving the aspect ratio.
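The mask-based background removal and the rescale/crop steps above can be sketched as follows. This is an assumed, minimal NumPy illustration: the background label id is taken to be 0, and nearest-neighbour indexing stands in for the interpolated resize a real pipeline would use.

```python
import numpy as np

BACKGROUND = 0  # assumed label id of the "background" class in the parsing mask

def remove_background(image, mask):
    """Whiten every pixel whose parsing-mask class is background.
    image: HxWx3 uint8, mask: HxW integer labels."""
    out = image.copy()
    out[mask == BACKGROUND] = 255
    return out

def rescale_width(image, width=512):
    """Nearest-neighbour proportional rescale to a fixed width (toy stand-in
    for a real bilinear resize)."""
    h, w = image.shape[:2]
    new_h = round(h * width / w)
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(width) * w / width).astype(int)
    return image[rows][:, cols]

def random_crop(image, size=256, rng=None):
    """Random size x size patch, as used for the 256*256 training patches."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return image[y:y + size, x:x + size]
```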
In S102, the stick-figure image is obtained from the preprocessed portrait image and the stick-figure-style photo through the convolutional neural network model.
Fig. 2 shows a schematic diagram of the network structure of the deep convolutional neural network model according to an embodiment of the present application.
As shown in Fig. 2, the convolutional neural network model works as follows: the high-level semantic features of the preprocessed portrait image and the stick-figure-style photo are obtained through the VGG encoder;
the high-level semantic features are fed into the adaptive instance normalization (AdaIN) module to obtain statistical features;
the statistical features are fed into the decoder to obtain an image in the stick-figure style.
The loss functions used to optimize the convolutional neural network model include a content loss function, a style loss function, a local sparse loss function, and a consistency loss function.
Specifically:
Generation steps of the convolutional neural network model: first, inspired by the AdaIN network structure, the deep convolutional neural network model obtains the high-level semantic features of the content image and the style image through the encoder; then, the last feature map of the encoder serves as the input of the adaptive instance normalization (AdaIN) module, which combines, via learned feature statistics, the content features of the preprocessed portrait image from S101 with the style features of the stick-figure-style photo; finally, the statistical features pass through the decoder and are inverted back to image space, yielding an image in the stick-figure style.
Fig. 3 shows a schematic diagram of the specific network structures of the encoder and the decoder in the deep convolutional neural network model according to an embodiment of the present application.
As shown in Fig. 3, since training an encoder from scratch consumes a great deal of time and computing power, the existing VGG network with its pre-trained weights is adopted as the encoder. The preprocessed portrait image and the stick-figure-style photo are fed into the VGG encoder separately; the encoding formulas are:
g_c = v(I_TT)                                          Formula (1)
g_s = v(I_S)                                           Formula (2)
where v(·) is the VGG encoder with pre-trained model parameters, g_c is the high-level semantic feature obtained by feeding the content image into the VGG encoder, and g_s is the high-level semantic feature obtained by feeding the style image into the VGG encoder.
The output of the first few layers of the VGG network, e.g., the result of relu4_1, serves as the output feature of the encoder, and this output feature is fed into the AdaIN module for learned feature statistics. The learned feature statistics o are:
o = AdaIN(g_c, g_s)                                       Formula (3)
where AdaIN is the adaptive instance normalization module, which learns feature statistics by combining the mean and the standard deviation. The specific AdaIN formula is:
AdaIN(g_c, g_s) = σ(g_s) · ((g_c − μ(g_c)) / σ(g_c)) + μ(g_s)                  Formula (4)
where μ(·) computes the feature mean and σ(·) computes the feature standard deviation.
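Formula (4) can be sketched in NumPy for (C, H, W) feature maps as follows; this is a minimal illustration of the operation, not the trained model (which applies it to VGG relu4_1 features):

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization, Formula (4): normalize the content
    feature channel-wise, then rescale and shift it with the style feature's
    channel-wise standard deviation and mean. Features have shape (C, H, W)."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True)
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True)
    return s_std * (content_feat - c_mean) / (c_std + eps) + s_mean
```

After this operation each channel of the output carries the style feature's mean and standard deviation while keeping the content feature's spatial structure.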
The statistical features obtained by the adaptive instance normalization (AdaIN) module are decoded and inverted back to image space.
As shown in the decoder network structure of Fig. 3, the decoder is divided into 12 blocks: the 2nd, 7th, and 10th blocks are upsampling layers; the last block consists of reflection padding and convolution (CNN); and each of the remaining blocks consists of three operations: reflection padding, convolution, and a rectified linear unit (ReLU).
The image in the stick-figure style is obtained through the decoder:
cs = d(o)                       Formula (5)
where d(·) is the decoder and cs is the image obtained through the decoder.
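One ordinary decoder block (reflection padding → convolution → ReLU) and the upsampling layer can be sketched as follows. This is a naive NumPy illustration of the block composition under assumed layer shapes, not the trained decoder; a real implementation would use a deep-learning framework's layers.

```python
import numpy as np

def reflection_pad(x, p=1):
    """Mirror-pad the spatial dims of a (C, H, W) feature map."""
    return np.pad(x, ((0, 0), (p, p), (p, p)), mode="reflect")

def conv3x3(x, weight):
    """Naive 3x3 convolution; weight has shape (C_out, C_in, 3, 3)."""
    c_out = weight.shape[0]
    _, h, w = x.shape
    out = np.zeros((c_out, h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = x[:, i:i + 3, j:j + 3]
            out[:, i, j] = (weight * patch).sum(axis=(1, 2, 3))
    return out

def upsample2x(x):
    """Nearest-neighbour 2x upsampling layer."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def decoder_block(x, weight):
    """One ordinary decoder block: reflection padding -> conv -> ReLU."""
    return np.maximum(conv3x3(reflection_pad(x), weight), 0.0)
```

Reflection padding keeps the spatial size unchanged through each 3×3 convolution, so only the three upsampling blocks grow the feature map back toward image resolution.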
Specific computation of the loss functions: multiple loss functions are combined to optimize the neural network model, as follows:
For the content loss, the content loss function L_content is computed as:
L_content = ||v(cs) − o||_2                   Formula (6)
where v(cs) is the feature obtained by feeding the image produced by the decoder back into the VGG encoder, o is the feature statistics from Formula (3), and ||·||_2 denotes the Euclidean distance between the target feature and the output-image feature.
For the style loss, the statistics of the mean and the standard deviation of the transferred style features are optimized. The style loss function L_style is:
L_style = Σ_i ||μ(φ_i(cs)) − μ(φ_i(I_S))||_2 + Σ_i ||σ(φ_i(cs)) − σ(φ_i(I_S))||_2                  Formula (7)
where each φ_i(·) denotes computing the style loss with one layer of VGG-19. This embodiment uses the relu1_1, relu2_1, relu3_1, and relu4_1 layer features with equal weights.
For the local sparse loss, each facial part is optimized separately on the basis of the existing face-structure parsing mask. The local sparse loss function L_lsparse is:
L_lsparse = ||M′ ⊙ (1 − d(o))||_1          Formula (8)
where ⊙ denotes element-wise multiplication and M′ is the label mask obtained by updating M, which has n classes in total.
In this embodiment, the regions of the eyebrows, eyes, glasses, nose, mouth, extracted face contour, and extracted background contour are all labeled 0 and all remaining regions are labeled 1, yielding M′_{m×n} of the same size as M. The aim is to sparsify the regions labeled 1 so that the generated result better fits the drawing trajectory of the drawing robot.
For the consistency loss, the consistency loss function is:
L_consist = ||d(AdaIN(g_s, g_s)) − I_S||_1           Formula (9)
where ||·||_1 denotes the distance between the two terms; it makes the generated image pixel-wise consistent with the stick-figure-style photo.
The total loss function of the neural network is finally:
L = λ_1 L_content + λ_2 L_style + λ_3 L_lsparse + λ_4 L_consist          Formula (10)
where λ_1, λ_2, λ_3, and λ_4 are user-defined weights.
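The four loss terms and their weighted combination in Formulas (6)–(10) can be sketched as follows. This is a minimal illustration: the λ weights in the example are assumed, and the feature arguments are stand-ins for the VGG activations the text describes.

```python
import numpy as np

def mean_std(f):
    """Channel-wise mean and std of a (C, H, W) feature map."""
    return f.mean(axis=(1, 2)), f.std(axis=(1, 2))

def content_loss(v_cs, o):
    """Formula (6): Euclidean distance between the re-encoded output and the AdaIN target."""
    return np.linalg.norm(v_cs - o)

def style_loss(phi_cs, phi_s):
    """Formula (7): match per-layer channel mean/std statistics.
    phi_cs / phi_s are lists of VGG-layer features of the output and the style photo."""
    loss = 0.0
    for fc, fs in zip(phi_cs, phi_s):
        mc, sc = mean_std(fc)
        ms, ss = mean_std(fs)
        loss += np.linalg.norm(mc - ms) + np.linalg.norm(sc - ss)
    return loss

def local_sparse_loss(m_prime, decoded):
    """Formula (8): L1 norm of M' * (1 - d(o)); pushes non-contour regions toward white."""
    return np.abs(m_prime * (1.0 - decoded)).sum()

def consistency_loss(reconstructed_style, style_img):
    """Formula (9): pixel distance between the style->style reconstruction and the style photo."""
    return np.abs(reconstructed_style - style_img).sum()

def total_loss(lc, ls, lsp, lcon, lambdas=(1.0, 10.0, 1.0, 1.0)):
    """Formula (10); the lambda weights here are illustrative, not from the patent."""
    l1, l2, l3, l4 = lambdas
    return l1 * lc + l2 * ls + l3 * lsp + l4 * lcon
```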
Embodiment 2
Embodiment 2 adds the following step after the stick-figure image is obtained from the preprocessed portrait image and the stick-figure-style photo through the convolutional neural network model in S102 of Embodiment 1:
S103: performing stick-figure post-processing on the stick-figure image to obtain a final stick-figure image suitable for the drawing robot.
Specifically, in S103, the stick-figure post-processing includes Gaussian blurring, adaptive binarization, and line dilation.
The stick-figure post-processing specifically includes:
inputting the stick-figure image into a low-pass filter for Gaussian blurring to obtain a Gaussian-blurred image;
obtaining a binary image from the Gaussian-blurred image with the histogram-based (Otsu) adaptive binarization method;
performing line dilation on the binary image to obtain the final stick-figure image.
To reduce redundant stray edges in the stick-figure image, the post-processing optimizes the transition from the generated stick figure to the result drawn by the drawing robot.
Specifically, a Gaussian blur is applied first. A Gaussian blur is essentially a low-pass filter: each pixel of the output image is a weighted sum of the corresponding pixel of the original image and its surrounding pixels. The low-pass filter formula is:
G(x, y) = (1 / (2πσ²)) e^{−(x² + y²) / (2σ²)}                  Formula (11)
The Gaussian-blurred image is obtained by convolving the original image matrix with the Gaussian weight matrix. Because binarizing with a fixed threshold would produce unnecessary dark blotches, this embodiment uses the histogram-based Otsu adaptive binarization method to find the optimal threshold and binarize, as follows:
① compute the normalized histogram of the input image, denoting its components by p_i, i = 0, 1, ..., l−1;
② for k = 0, 1, ..., l−1, compute the cumulative sum P_1(k) and the cumulative mean m(k);
③ compute the global gray-level mean m_G;
④ for k = 0, 1, ..., l−1, compute the between-class variance σ_B²(k) = [m_G P_1(k) − m(k)]² / [P_1(k)(1 − P_1(k))];
⑤ obtain the Otsu threshold k*, i.e., the value of k that maximizes the between-class variance; if the maximum is not unique, k* is taken as the average of the maximizing values of k, from which the separability measure η* is obtained.
The histogram-based Otsu adaptive binarization yields a binary image whose foreground pixels are black and whose background pixels are white.
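Steps ① to ⑤ can be sketched directly in NumPy:

```python
import numpy as np

def otsu_threshold(gray, levels=256):
    """Otsu's adaptive threshold following steps 1-5 above: normalized
    histogram, cumulative sums/means, global mean, between-class variance,
    and the argmax (averaged when the maximum is not unique)."""
    hist = np.bincount(gray.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()                      # step 1: normalized histogram
    p1 = np.cumsum(p)                          # step 2: cumulative sum P1(k)
    m = np.cumsum(np.arange(levels) * p)       # step 2: cumulative mean m(k)
    m_g = m[-1]                                # step 3: global mean mG
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (m_g * p1 - m) ** 2 / (p1 * (1.0 - p1))  # step 4
    sigma_b = np.nan_to_num(sigma_b)
    best = np.flatnonzero(sigma_b == sigma_b.max())
    return int(best.mean())                    # step 5: average if not unique

def binarize(gray, levels=256):
    """Black foreground (strokes) on a white background, as in the text."""
    k = otsu_threshold(gray, levels)
    return np.where(gray <= k, 0, 255).astype(np.uint8)
```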
Finally, line dilation is applied to the binary image. The dilation formula is:
[f ⊕ b](x, y) = max_{(s,t)∈b} f(x − s, y − t)                  Formula (12)
where f is the binary image and b is the convolution template (structuring element); the dilation of the image by b at any position (x, y) is defined as the maximum value of f over the region where b overlaps the image.
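The dilation above can be sketched as a naive NumPy implementation. Here foreground ink is encoded as 1 (i.e., the black-on-white binary image is assumed to be inverted first), which is an implementation choice rather than something the patent specifies:

```python
import numpy as np

def dilate(f, b):
    """Morphological dilation: the output at (x, y) is the maximum of f over
    the neighbourhood selected by the template b. f is a binary image with
    foreground 1; b is a small 0/1 structuring element (e.g. 3x3 of ones)."""
    bh, bw = b.shape
    ph, pw = bh // 2, bw // 2
    padded = np.pad(f, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros_like(f)
    h, w = f.shape
    for y in range(h):
        for x in range(w):
            window = padded[y:y + bh, x:x + bw]
            out[y, x] = (window * b).max()   # max of f where b overlaps
    return out
```

Dilating with an all-ones 3×3 template thickens every stroke by one pixel on each side, which is what makes the lines continuous and solid enough for the robot's pen.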
After this stick-figure post-processing, a stick-figure image is finally obtained from which the drawing robot can draw continuous, smooth, non-hollow lines.
Embodiment 3
Fig. 4 shows a schematic structural diagram of a portrait stick-figure generation system according to an embodiment of the present application.
As shown in Fig. 4, a stick-figure generation system based on portrait photos specifically includes:
a portrait-photo preprocessing module 10, configured to perform image preprocessing on a portrait photo to obtain a preprocessed portrait image;
a stick-figure generation module 20, configured to obtain a stick-figure image from the preprocessed portrait image and a stick-figure-style photo through a convolutional neural network model.
Specifically, the portrait-photo preprocessing module 10 includes:
a face key-point detection model, configured to perform face bounding-box and facial-feature key-point detection on the portrait photo to obtain face bounding-box information and the position coordinates of the facial-feature key points;
a face alignment unit, configured to obtain a face-aligned portrait image according to the face bounding-box information and the position coordinates of the facial-feature key points;
a face parsing model, configured to obtain a portrait-photo parsing mask according to the face-aligned portrait image;
an image background-removal unit, configured to obtain a background-removed portrait image according to the portrait-photo parsing mask.
图5中示出了根据本申请另一实施例的一种肖像简笔画生成系统的设计流程示意图。Fig. 5 shows a schematic diagram of the design process of a system for generating simple strokes of portraits according to another embodiment of the present application.
如图5所示实施例的肖像简笔画生成系统,增加了简笔画后处理模块。The portrait stick figure generation system of the embodiment shown in FIG. 5 adds a stick figure post-processing module.
具体的,简笔画后处理模块根据简笔画图像进行简笔画后处理得到最终简笔画图像,简笔画后处理包括高斯模糊处理、自适应二值化处理以及线条膨胀处理。Specifically, the stick figure post-processing module performs stick figure post-processing according to the stick figure image to obtain the final stick figure image. The stick figure post-processing includes Gaussian blur processing, adaptive binarization processing, and line expansion processing.
本申请实施例中的肖像简笔画生成方法、系统及绘画机器人,通过根据肖像照片进行图像预处理得到预处理肖像图像;然后根据预处理肖像图像以及简笔画风格照片通过卷积神经网络模型得到简笔画图像,实现了能够快速将肖像照片生成高质量的简笔画,并适用于绘画机器人,可在短的时间内绘制出肖像简笔画。解决了现有技术的简笔画生成方法不能很好地应用于绘画机器人绘画出生动形象的肖像简笔画的问题。The portrait stick figure generation method, system and painting robot in the embodiments of this application obtain the pre-processed portrait image by image preprocessing according to the portrait photo; then according to the pre-processed portrait image and stick figure style photo, the convolutional neural network model is used to obtain the simple figure. The stroke image realizes the ability to quickly generate high-quality stick figures from portrait photos, and is suitable for painting robots, which can draw portrait stick figures in a short time. This solves the problem that the stick figure generation method of the prior art cannot be well applied to the portrait stick figure of the animated figure drawn by the painting robot.
可以通过肖像照片进行人脸识别、人脸切割等操作,然后进行肖像-简笔画风格转换,本申请采用的简笔画生成模型生成的各个部分的细节更加丰富。具体地,通过内容图像和风格图像之间的特征统计、局部稀疏的约束以及后期的处理,使得生成的人物肖像简笔画的各个细节相比于基于规则生成或者直接全局生成的方法更加丰富。Operations such as face recognition and face cutting can be performed through portrait photos, and then portrait-to-stick stroke style conversion. The stick figure generation model used in this application has richer details in each part generated by the stick figure generation model. Specifically, through the feature statistics between the content image and the style image, local sparse constraints, and post-processing, the details of the generated character portrait stick figure are more abundant than the method based on rule generation or direct global generation.
Specifically, during portrait-to-stick-figure style transfer, the stick-figure generation model adopted in this application supports multiple stick-figure styles, and is robust both in adapting to those styles and in preserving the identity details of the subject.
After the portrait-to-stick-figure style transfer, for presentation this application integrates the algorithm into a drawing robot that can quickly generate portrait stick-figure images, meeting the needs of family companionship.
This embodiment also provides a drawing robot, which specifically includes a processor, a communication module, a camera module, and a drawing execution module, where the processor can execute the portrait stick figure generation method described above.
Based on the same inventive concept, the embodiments of this application also provide a computer program product. Since the principle by which the computer program product solves the problem is similar to the method provided in the first embodiment of this application, its implementation can refer to the implementation of that method, and repeated details are not described again.
Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Although preferred embodiments of this application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of this application.
Obviously, those skilled in the art can make various changes and variations to this application without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of this application and their technical equivalents, this application is also intended to include them.

Claims (10)

  1. A portrait stick figure generation method, characterized in that it comprises the following steps:
    performing image preprocessing on a portrait photo to obtain a preprocessed portrait image;
    obtaining a stick-figure image from the preprocessed portrait image and a stick-figure style photo through a convolutional neural network model, wherein the convolutional neural network model is configured to:
    obtain high-level semantic features of the preprocessed portrait image and the stick-figure style photo through an encoder;
    input the high-level semantic features into an adaptive instance normalization module to obtain statistical features; and
    input the statistical features into a decoder to obtain an image with a stick-figure style.
  2. The portrait stick figure generation method according to claim 1, wherein image preprocessing is performed on the portrait photo to obtain the preprocessed portrait image, the image preprocessing specifically comprising:
    performing face bounding-box and facial-landmark detection on the portrait photo to obtain face bounding-box information and the position coordinates of the facial landmarks;
    obtaining a face-aligned portrait image according to the face bounding-box information and the position coordinates of the facial landmarks;
    obtaining a portrait photo parsing mask from the face-aligned portrait image; and
    obtaining a background-removed portrait image according to the portrait photo parsing mask.
  3. The portrait stick figure generation method according to claim 1, wherein the encoder is a VGG encoder;
    the adaptive instance normalization module adopts the AdaIN network structure; and
    the decoder adopts the AdaIN network structure.
  4. The portrait stick figure generation method according to claim 1, wherein the loss functions used to optimize the convolutional neural network model include a content loss function, a style loss function, a local sparsity loss function, and a consistency loss function.
  5. The portrait stick figure generation method according to claim 1, characterized in that, after the stick-figure image is obtained from the preprocessed portrait image through the convolutional neural network model, the method further comprises:
    performing stick-figure post-processing on the stick-figure image to obtain a final stick-figure image suitable for a drawing robot.
  6. The portrait stick figure generation method according to claim 5, wherein the stick-figure post-processing includes Gaussian blurring, adaptive binarization, and line dilation.
  7. The portrait stick figure generation method according to claim 5 or 6, wherein the stick-figure post-processing specifically comprises:
    inputting the stick-figure image into a low-pass filter for Gaussian blurring to obtain a Gaussian-blurred image;
    obtaining a binary image from the Gaussian-blurred image using an adaptive binarization method with histogram equalization; and
    performing line dilation on the binary image to obtain the final stick-figure image.
  8. A portrait stick figure generation system, characterized in that it specifically comprises:
    a portrait photo preprocessing module, configured to perform image preprocessing on a portrait photo to obtain a preprocessed portrait image; and
    a stick-figure generation module, configured to obtain a stick-figure image from the preprocessed portrait image and a stick-figure style photo through a convolutional neural network model,
    wherein the convolutional neural network model is configured to:
    obtain high-level semantic features of the preprocessed portrait image and the stick-figure style photo through a VGG encoder;
    input the high-level semantic features into an adaptive instance normalization (AdaIN) module to obtain statistical features; and
    input the statistical features into a decoder to obtain an image with a stick-figure style.
  9. The portrait stick figure generation system according to claim 8, wherein the portrait photo preprocessing module comprises:
    a face landmark detection model, configured to perform face bounding-box and facial-landmark detection on the portrait photo to obtain face bounding-box information and the position coordinates of the facial landmarks;
    a face alignment unit, configured to obtain a face-aligned portrait image according to the face bounding-box information and the position coordinates of the facial landmarks;
    a face parsing model, configured to obtain a portrait photo parsing mask from the face-aligned portrait image; and
    an image background removal unit, configured to obtain a background-removed portrait image according to the portrait photo parsing mask.
  10. A drawing robot, characterized by comprising: a processor, a communication module, a camera module, and a drawing execution module, wherein the processor can execute the portrait stick figure generation method according to any one of claims 1 to 6.
PCT/CN2020/140335 2020-01-08 2020-12-28 Portrait stick figure generation method and system, and drawing robot WO2021139557A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010016519.4 2020-01-08
CN202010016519.4A CN111243050B (en) 2020-01-08 2020-01-08 Portrait simple drawing figure generation method and system and painting robot

Publications (1)

Publication Number Publication Date
WO2021139557A1 true WO2021139557A1 (en) 2021-07-15

Family

ID=70879943

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/140335 WO2021139557A1 (en) 2020-01-08 2020-12-28 Portrait stick figure generation method and system, and drawing robot

Country Status (2)

Country Link
CN (1) CN111243050B (en)
WO (1) WO2021139557A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763498A (en) * 2021-08-11 2021-12-07 杭州妙绘科技有限公司 Portrait simple-stroke region self-adaptive color matching method and system for industrial manufacturing
CN114582002A (en) * 2022-04-18 2022-06-03 华南理工大学 Facial expression recognition method combining attention module and second-order pooling mechanism
CN116503524A (en) * 2023-04-11 2023-07-28 广州赛灵力科技有限公司 Virtual image generation method, system, device and storage medium
CN117745904A (en) * 2023-12-14 2024-03-22 山东浪潮超高清智能科技有限公司 2D playground speaking portrait synthesizing method and device

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
CN111243050B (en) * 2020-01-08 2024-02-27 杭州未名信科科技有限公司 Portrait simple drawing figure generation method and system and painting robot
CN111833420B (en) * 2020-07-07 2023-06-30 北京奇艺世纪科技有限公司 True person-based automatic drawing generation method, device and system and storage medium
CN112991358A (en) * 2020-09-30 2021-06-18 北京字节跳动网络技术有限公司 Method for generating style image, method, device, equipment and medium for training model
CN113223103A (en) * 2021-02-02 2021-08-06 杭州妙绘科技有限公司 Method, device, electronic device and medium for generating sketch
CN113658291A (en) * 2021-08-17 2021-11-16 青岛鱼之乐教育科技有限公司 Automatic rendering method of simplified strokes
CN117333580B (en) * 2023-10-18 2024-08-02 北京阿派朗创造力科技有限公司 Mechanical arm drawing method and device, electronic equipment and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN102096934A (en) * 2011-01-27 2011-06-15 电子科技大学 Human face cartoon generating method based on machine learning
CN106873893A (en) * 2017-02-13 2017-06-20 北京光年无限科技有限公司 For the multi-modal exchange method and device of intelligent robot
CN107945244A (en) * 2017-12-29 2018-04-20 哈尔滨拓思科技有限公司 A kind of simple picture generation method based on human face photo
CN109147010A (en) * 2018-08-22 2019-01-04 广东工业大学 Band attribute Face image synthesis method, apparatus, system and readable storage medium storing program for executing
US20190354791A1 (en) * 2018-05-17 2019-11-21 Idemia Identity & Security France Character recognition method
CN111243050A (en) * 2020-01-08 2020-06-05 浙江省北大信息技术高等研究院 Portrait simple stroke generation method and system and drawing robot

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN1710608A (en) * 2005-07-07 2005-12-21 上海交通大学 Picture processing method for robot drawing human-face cartoon
CN108596024B (en) * 2018-03-13 2021-05-04 杭州电子科技大学 Portrait generation method based on face structure information
CN109741247B (en) * 2018-12-29 2020-04-21 四川大学 Portrait cartoon generating method based on neural network
CN109766895A (en) * 2019-01-03 2019-05-17 京东方科技集团股份有限公司 The training method and image Style Transfer method of convolutional neural networks for image Style Transfer
CN110490791B (en) * 2019-07-10 2022-10-18 西安理工大学 Clothing image artistic generation method based on deep learning style migration
CN110570377A (en) * 2019-09-11 2019-12-13 辽宁工程技术大学 group normalization-based rapid image style migration method

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN102096934A (en) * 2011-01-27 2011-06-15 电子科技大学 Human face cartoon generating method based on machine learning
CN106873893A (en) * 2017-02-13 2017-06-20 北京光年无限科技有限公司 For the multi-modal exchange method and device of intelligent robot
CN107945244A (en) * 2017-12-29 2018-04-20 哈尔滨拓思科技有限公司 A kind of simple picture generation method based on human face photo
US20190354791A1 (en) * 2018-05-17 2019-11-21 Idemia Identity & Security France Character recognition method
CN109147010A (en) * 2018-08-22 2019-01-04 广东工业大学 Band attribute Face image synthesis method, apparatus, system and readable storage medium storing program for executing
CN111243050A (en) * 2020-01-08 2020-06-05 浙江省北大信息技术高等研究院 Portrait simple stroke generation method and system and drawing robot

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN113763498A (en) * 2021-08-11 2021-12-07 杭州妙绘科技有限公司 Portrait simple-stroke region self-adaptive color matching method and system for industrial manufacturing
CN113763498B (en) * 2021-08-11 2024-04-26 杭州妙绘科技有限公司 Industrial manufacturing-oriented portrait simple drawing region self-adaptive color matching method and system
CN114582002A (en) * 2022-04-18 2022-06-03 华南理工大学 Facial expression recognition method combining attention module and second-order pooling mechanism
CN116503524A (en) * 2023-04-11 2023-07-28 广州赛灵力科技有限公司 Virtual image generation method, system, device and storage medium
CN116503524B (en) * 2023-04-11 2024-04-12 广州赛灵力科技有限公司 Virtual image generation method, system, device and storage medium
CN117745904A (en) * 2023-12-14 2024-03-22 山东浪潮超高清智能科技有限公司 2D playground speaking portrait synthesizing method and device

Also Published As

Publication number Publication date
CN111243050B (en) 2024-02-27
CN111243050A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
WO2021139557A1 (en) Portrait stick figure generation method and system, and drawing robot
US12039454B2 (en) Microexpression-based image recognition method and apparatus, and related device
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
US20220261968A1 (en) Image optimization method and apparatus, computer storage medium, and electronic device
CN106960202B (en) Smiling face identification method based on visible light and infrared image fusion
WO2018121777A1 (en) Face detection method and apparatus, and electronic device
CN108875935B (en) Natural image target material visual characteristic mapping method based on generation countermeasure network
US8692830B2 (en) Automatic avatar creation
WO2023050992A1 (en) Network training method and apparatus for facial reconstruction, and device and storage medium
CN110660037A (en) Method, apparatus, system and computer program product for face exchange between images
US20230081982A1 (en) Image processing method and apparatus, computer device, storage medium, and computer program product
Li et al. Learning symmetry consistent deep cnns for face completion
CN111680550B (en) Emotion information identification method and device, storage medium and computer equipment
CN111243051B (en) Portrait photo-based simple drawing generation method, system and storage medium
WO2024001095A1 (en) Facial expression recognition method, terminal device and storage medium
US20230082715A1 (en) Method for training image processing model, image processing method, apparatus, electronic device, and computer program product
Mirani et al. Object recognition in different lighting conditions at various angles by deep learning method
US10803677B2 (en) Method and system of automated facial morphing for eyebrow hair and face color detection
CN113313631B (en) Image rendering method and device
Qu et al. Facial expression recognition based on deep residual network
CN113076916A (en) Dynamic facial expression recognition method and system based on geometric feature weighted fusion
Shukla et al. Deep Learning Model to Identify Hide Images using CNN Algorithm
CN111160327A (en) Expression recognition method based on lightweight convolutional neural network
CN116681579A (en) Real-time video face replacement method, medium and system
CN117152352A (en) Image processing method, deep learning model training method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20912669

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20912669

Country of ref document: EP

Kind code of ref document: A1
