CN112907708B - Face cartoon method, equipment and computer storage medium - Google Patents
- Publication number
- CN112907708B CN112907708B CN202110167873.1A CN202110167873A CN112907708B CN 112907708 B CN112907708 B CN 112907708B CN 202110167873 A CN202110167873 A CN 202110167873A CN 112907708 B CN112907708 B CN 112907708B
- Authority
- CN
- China
- Prior art keywords
- face
- convolution
- feature map
- cartoon
- head
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Processing (AREA)
Abstract
The application discloses a face cartoonization method, a device and a computer storage medium. The method comprises: extracting a preset number of face key points from an input image using a face detection model based on adaptive convolution; extracting a head image based on the face key points; performing head portrait segmentation on the head image based on an improved U-Net network model to extract the head portrait; performing cartoonization processing on the head portrait to generate a cartoon head portrait; and replacing the head portrait in the input image with the cartoon head portrait. The application addresses the problem of patient privacy protection during ultrasound scanning: the patient's face is cartoonized during scanning, ensuring that the patient's privacy is protected during the examination.
Description
Technical Field
The present application relates to the field of image processing, and in particular to a face cartoonization method, a device, and a computer storage medium.
Background
In automatic breast ultrasound scanning, a depth camera is used to locate and photograph the breast. However, the captured image may also contain the patient's face, so the patient's personal privacy cannot be well protected during breast ultrasound scanning and there is a risk that private images are exposed.
Disclosure of Invention
In view of the above, the embodiments of the present application provide a face cartoonization method, a device and a computer storage medium, which address the problem of patient privacy protection during ultrasound scanning: the patient's face is cartoonized during scanning, ensuring that the patient's privacy is protected during the examination.
The embodiment of the application provides a face cartoon method, which comprises the following steps:
extracting a preset number of face key points of an input image by adopting a face detection model based on self-adaptive convolution;
extracting a head image based on the face key points;
performing head portrait segmentation on the head image based on an improved U-Net network model, and extracting the head portrait;
executing cartoon processing based on the head portrait to generate a cartoon head portrait;
and replacing the head portrait in the input image with the cartoon head portrait.
In an embodiment, before the step of extracting the preset number of face key points of the input image by using the face detection model based on the adaptive convolution, the method further includes:
creating a face detection model based on adaptive convolution, specifically comprising:
inputting the images in the training set into the face detection model based on the self-adaptive convolution to generate a face detection result;
comparing the face detection result with an image tag and calculating an error;
back-propagating the error, and updating parameters of the face detection model based on the adaptive convolution;
and generating the face detection model based on the self-adaptive convolution until the error meets a preset threshold value.
In an embodiment, the extracting the preset number of face key points of the input image by using the face detection model based on the adaptive convolution includes:
the input images sequentially pass through a first number of preset structure operations to generate a first intermediate feature map;
generating a second intermediate feature map by performing the self-adaptive convolution operation on the first intermediate feature map;
inputting the second intermediate feature map into a full-connection layer, and extracting the preset number of face key points;
wherein each preset structure operation consists of an adaptive convolution operation followed by a preset pooling operation.
In an embodiment, the adaptive convolution operation includes:
inputting an input feature map of size h×w×c into a 3×3 adaptive convolution layer, and convolving the input feature map with a convolution layer using a first preset activation function to generate a first convolution result;
inputting the first convolution result into a convolution layer using a second preset activation function for convolution, generating a second convolution result;
performing a reconstruction operation on the second convolution result to generate an offset field of size 3h×3w×2;
performing a preset interpolation calculation on the input feature map using the offset field to generate a first feature map of size 3h×3w×c;
passing the input feature map through a convolution and a preset pooling operation in sequence to obtain a weight vector of size 1×c;
multiplying the first feature map by the weight vector to obtain a second feature map of size 3h×3w×c;
inputting the second feature map into a 3×3 convolution layer with d convolution kernels and a stride of 3 to obtain a third feature map of size h×w×d;
taking the third feature map as an output of the adaptive convolution;
where h is the height of the input feature map, w is the width of the input feature map, and c is the number of channels.
In an embodiment, the pixel values in the third feature map are calculated as:

y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n) \cdot w_c

wherein y(p_0) represents the value of pixel p_0 in the third feature map y; x(p_0 + p_n + \Delta p_n) represents the value at position p_0 + p_n + \Delta p_n in the input feature map x; p_n is the conventional convolution displacement parameter; w(p_n) is the corresponding conventional convolution weight; R is the set of displacements p_n, expressed as R = {(-1,-1), (-1,0), (-1,1), (0,-1), (0,0), (0,1), (1,-1), (1,0), (1,1)}; \Delta p_n is the offset field; and w_c is the weight vector.
In an embodiment, the extracting the head image based on the face keypoints includes:
determining a first face frame of a minimum circumscribed rectangle based on the coordinates of the face key points;
and expanding a preset range in the transverse direction and the longitudinal direction respectively based on the width and the height of the first face frame to generate the head image.
In one embodiment, the improved U-Net network model building process comprises:
the conventional convolution of the encoder portion of the original U-Net network model is replaced with the adaptive convolution.
In one embodiment, performing head portrait segmentation on the head image based on the improved U-Net network model and extracting the head portrait comprises:
inputting the head image into the modified U-Net network model;
outputting a probability map corresponding to the head image through calculation of the improved U-Net network model;
converting the probability map into a binary map according to a preset threshold;
and extracting the head portrait according to the binary image.
In order to achieve the above object, there is also provided a computer storage medium having a face cartoon method program stored thereon, which when executed by a processor, implements the steps of any one of the methods described above.
In order to achieve the above object, a face cartoon device is provided, which comprises a memory, a processor and a face cartoon method program stored in the memory and capable of running on the processor, wherein the processor implements any one of the steps of the method when executing the face cartoon method program.
One or more technical solutions provided in the embodiments of the present application at least have the following technical effects or advantages:
Extracting a preset number of face key points from the input image using a face detection model based on adaptive convolution: the adaptive convolution improves on conventional convolution and strengthens the learning capacity of the face detection model, thereby improving its accuracy.
Extracting a head image based on the face key points: the head region is cropped from the picture according to the face key points, providing correct data support for the subsequent head portrait extraction.
Performing head portrait segmentation on the head image based on an improved U-Net network model and extracting the head portrait: introducing adaptive convolution into the U-Net network model strengthens its learning capacity and improves the segmentation effect, ensuring the accuracy of head portrait extraction.
Performing cartoonization processing on the head portrait to generate a cartoon head portrait: cartoonizing the head portrait ensures that the patient's facial privacy is not revealed.
Replacing the head portrait in the input image with the cartoon head portrait: displaying the head portrait in the input image as a cartoon head portrait protects the privacy of the patient's face.
The application thus addresses the problem of patient privacy protection during ultrasound scanning: the patient's face is cartoonized during scanning, ensuring that the patient's privacy is protected during the examination.
Drawings
FIG. 1 is a flowchart of a face cartoonization method according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of a face detection model creation flow in the face cartoonization method of the present application;
fig. 3 is a flowchart illustrating steps for implementing step S110 in the first embodiment of the face cartoonization method of the present application;
FIG. 4 is a schematic diagram of a face detection model according to the face cartoonization method of the present application;
FIG. 5 is a schematic flow chart of an adaptive convolution operation in the face cartoonization method of the present application;
FIG. 6 is a schematic diagram of an adaptive convolution structure in the face cartoonization method of the present application;
fig. 7 is a flowchart illustrating steps for implementing step S120 in the first embodiment of the face cartoonization method of the present application;
FIG. 8 is a schematic diagram of a modified U-Net network model of the face cartoonization method of the present application;
fig. 9 is a flowchart illustrating steps for implementing step S130 in the first embodiment of the face cartoonization method of the present application;
fig. 10 is a schematic diagram of a hardware architecture of a face cartoon method according to an embodiment of the present application.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The main solutions of the embodiments of the present application are: extracting a preset number of face key points from an input image using a face detection model based on adaptive convolution; extracting a head image based on the face key points; performing head portrait segmentation on the head image based on an improved U-Net network model to extract the head portrait; performing cartoonization processing on the head portrait to generate a cartoon head portrait; and replacing the head portrait in the input image with the cartoon head portrait. The application addresses the problem of patient privacy protection during ultrasound scanning: the patient's face is cartoonized during scanning, ensuring that the patient's privacy is protected during the examination.
In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.
Referring to fig. 1, fig. 1 is a first embodiment of a face cartoonization method according to the present application, the method includes:
step S110: and extracting the preset number of face key points of the input image by adopting a face detection model based on self-adaptive convolution.
In particular, the adaptive convolution is an improvement over conventional convolution in that the convolution kernel shape and the weights of the different channels can be adaptively changed.
Specifically, the face detection model may be a network structure at least including an input layer, a convolution layer, a full connection layer and an output layer, wherein the convolution layer operates based on adaptive convolution and pooling, and is obtained through training of a training set.
Specifically, in this embodiment, the training set is collected from ultrasound scanning images, and the images are annotated by a preset method so that the labeled images carry labels meeting a preset standard.
Specifically, in this embodiment, the preset number may be 5, including 2 eye keypoints, 2 mouth corner keypoints, and 1 nose keypoint. The preset number may also be 7, including 2 eye key points, 2 mouth corner key points, 1 nose key point and 2 eyebrow key points, and the preset number may also be other numbers, which are not limited herein and are correspondingly adjusted according to the service requirements.
Step S120: and extracting a head image based on the face key points.
Specifically, according to key points and key point coordinates of an input image extracted by a face detection model, a head image is determined according to a preset method, and the head image at least comprises the extracted key points.
Step S130: and performing head image segmentation on the head image based on the improved U-Net network model, and extracting the head image.
In particular, the U-Net network model was first published at the MICCAI conference in 2015; its two most important features are the U-shaped network structure and the skip connections, where the left half of the U-shaped structure is the encoder and the right half is the decoder.
In this embodiment, the improved U-Net network model introduces adaptive convolution to the encoder portion of the original U-Net network model to improve the segmentation effect.
Step S140: and executing cartoon processing based on the head portrait to generate a cartoon head portrait.
Specifically, the cartoonization processing converts the head portrait into a corresponding cartoon figure so as to protect the privacy of the patient.
Step S150: and replacing the head portrait in the input image with the cartoon head portrait.
Specifically, according to the coordinates of the head image and the binary image of the head image position, the cartoon head image can be embedded into the input image, so that fusion of the cartoon head image and the input image is realized.
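As an illustration of this fusion step, the following sketch pastes a cartoonized head back into the original frame using the binary head-portrait map and the crop coordinates; the function and argument names are illustrative and not part of the patent.

```python
import numpy as np

def replace_head(input_image, cartoon_head, head_mask, top, left):
    """input_image: HxWx3 frame; cartoon_head: hxwx3 cartoonized crop;
    head_mask: hxw binary map (non-zero = head region); (top, left): crop position."""
    out = input_image.copy()
    h, w = head_mask.shape
    region = out[top:top + h, left:left + w]
    mask3 = (head_mask > 0)[..., None]                     # broadcast the mask over colour channels
    out[top:top + h, left:left + w] = np.where(mask3, cartoon_head, region)
    return out
```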
In the above embodiment, the following beneficial effects exist: extracting a preset number of face key points from the input image using a face detection model based on adaptive convolution, where the adaptive convolution improves on conventional convolution and strengthens the learning capacity of the face detection model, thereby improving its accuracy; extracting a head image based on the face key points, which crops the head region from the picture and provides correct data support for the subsequent head portrait extraction; performing head portrait segmentation on the head image based on an improved U-Net network model and extracting the head portrait, where introducing adaptive convolution into the U-Net network model strengthens its learning capacity and improves the segmentation effect, ensuring the accuracy of head portrait extraction; performing cartoonization processing on the head portrait to generate a cartoon head portrait, so that the patient's facial privacy is not revealed; and replacing the head portrait in the input image with the cartoon head portrait, so that the head portrait is displayed in cartoon form and the privacy of the patient's face is protected. The application thus addresses the problem of patient privacy protection during ultrasound scanning: the patient's face is cartoonized during scanning, ensuring that the patient's privacy is protected during the examination.
Referring to fig. 2, fig. 2 is a flow of creating a face detection model in the face cartoonization method according to the present application, before the step of extracting a preset number of face key points of an input image by using the face detection model based on adaptive convolution, the method further includes:
creating a face detection model based on adaptive convolution, specifically comprising:
step S210: and inputting the images in the training set into the face detection model based on the self-adaptive convolution to generate a face detection result.
Specifically, the training set is image data collected according to a preset standard, and is marked according to a preset method, so that the image data in the training set has an image tag.
Step S220: and comparing the face detection result with an image tag and calculating an error.
Step S230: and back-propagating the error and updating parameters of the face detection model.
Specifically, in the back propagation process, the parameters of the face detection model are updated according to the errors of the face detection result and the image labels.
Step S240: and generating the face detection model based on the self-adaptive convolution until the error meets a preset threshold value.
Specifically, the preset threshold is not limited herein, and is dynamically adjusted according to specific requirements.
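A minimal training-loop sketch for steps S210–S240 is given below; the MSE loss, Adam optimizer, learning rate and stopping threshold are assumptions not specified by the application, and the model is assumed to return a single tensor of keypoint predictions.

```python
import torch
import torch.nn as nn

def train_face_detector(model, loader, max_epochs=100, error_threshold=1e-3, lr=1e-3):
    criterion = nn.MSELoss()                              # error between detection result and label
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(max_epochs):
        running = 0.0
        for images, labels in loader:                     # labels: annotated keypoint coordinates
            preds = model(images)                         # step S210: face detection result
            loss = criterion(preds, labels)               # step S220: compare with image tag
            optimizer.zero_grad()
            loss.backward()                               # step S230: back-propagate the error
            optimizer.step()                              # step S230: update model parameters
            running += loss.item() * images.size(0)
        if running / len(loader.dataset) < error_threshold:   # step S240: error meets threshold
            break
    return model
```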
In the above embodiment, the following beneficial effects exist: the learning and feature extraction capability of the face detection model based on the self-adaptive convolution is enhanced, and the accuracy of the face detection model is improved, so that the accuracy of face key point extraction is ensured.
Referring to fig. 3, fig. 3 is a specific implementation step of step S110 in the first embodiment of the face cartoonization method of the present application, wherein the step of extracting a preset number of face key points of an input image by using a face detection model based on adaptive convolution includes:
step S111: sequentially performing a first number of preset structure operations on the input image to generate a first intermediate feature map;
specifically, in this embodiment, as shown in fig. 4, a schematic structure diagram of a face detection model is shown, where the first number of preset structures may be 4 preset structures, or may be other numbers of preset structures, which are not limited herein, and may be specifically selected according to specific settings.
Specifically, the preset structure operation is an adaptive convolution operation and a preset pooling operation. The preset pooling may be maximum pooling, average pooling operation or other pooling operation, which is not limited herein, and may be specifically selected according to specific settings.
Specifically, in this embodiment, as shown in fig. 4, the input image sequentially passes through the first adaptive convolution, the first max pooling, the second adaptive convolution, the second max pooling, the third adaptive convolution, the third max pooling, the fourth adaptive convolution, and the fourth max pooling, to generate the first intermediate feature map.
Step S112: and generating a second intermediate feature map by carrying out the self-adaptive convolution operation on the first intermediate feature map.
Specifically, in this embodiment, as shown in fig. 4, the first intermediate feature map is subjected to a fifth adaptive convolution to generate a second intermediate feature map.
Step S113: and inputting the second intermediate feature map into a full-connection layer, and extracting the preset number of key points of the personal faces.
Specifically, in this embodiment, as shown in fig. 4, the second intermediate feature map is input into the full connection layer, and face classification information, bounding box information, and key point information may be extracted.
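A structural sketch of the detector in FIG. 4 follows. It assumes the AdaptiveConv2d module sketched later in the adaptive-convolution section; the channel widths, the global pooling before the fully connected heads, and the three-head layout (classification, bounding box, keypoints) are illustrative choices.

```python
import torch
import torch.nn as nn

class FaceKeypointNet(nn.Module):
    def __init__(self, num_keypoints=5, channels=(16, 32, 64, 128, 256)):
        super().__init__()
        blocks, in_c = [], 3
        for out_c in channels[:4]:                        # four (adaptive conv + max pool) stages
            blocks += [AdaptiveConv2d(in_c, out_c), nn.MaxPool2d(2)]
            in_c = out_c
        self.backbone = nn.Sequential(*blocks)            # produces the first intermediate feature map
        self.conv5 = AdaptiveConv2d(in_c, channels[4])    # fifth adaptive convolution -> second intermediate map
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc_cls = nn.Linear(channels[4], 2)                    # face classification information
        self.fc_box = nn.Linear(channels[4], 4)                    # bounding box information
        self.fc_kpt = nn.Linear(channels[4], num_keypoints * 2)    # key point coordinates

    def forward(self, x):
        feat = self.conv5(self.backbone(x))
        feat = self.pool(feat).flatten(1)                 # flatten before the fully connected heads
        return self.fc_cls(feat), self.fc_box(feat), self.fc_kpt(feat)
```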
In the above embodiment, the following beneficial effects exist: the face detection model with the self-adaptive convolution has higher capability of learning and extracting features, and the accuracy of the face detection model is greatly improved.
Referring to fig. 5, fig. 5 is a calculation process of the adaptive convolution of the present application, the adaptive convolution operation includes:
step S310: an input feature map with the size of h multiplied by w multiplied by c is input into a 3 multiplied by 3 self-adaptive convolution layer, and the input feature map is convolved by using the convolution layer adopting a first preset activation function, so that a first convolution result is generated.
Specifically, in this embodiment, the first preset activation function may be a pralu, a pralu (Parametric Rectified Linear Unit), which is a modified linear unit with parameters.
It should be noted that the first preset activation function is not limited herein, and the corresponding activation function is selected according to specific requirements.
Step S320: and inputting the first convolution result into a convolution layer adopting a second preset activation function to carry out convolution operation, so as to generate a second convolution result.
Specifically, in this embodiment, the second preset activation function may be ReLU (Rectified Linear Unit), a rectified linear unit.
It should be noted that the second preset activation function is not limited herein, and the corresponding activation function is selected according to specific requirements.
Step S330: and carrying out reconstruction operation on the second convolution result to generate an offset domain of 3h multiplied by 3w multiplied by 2.
Step S340: and carrying out preset interpolation calculation on the input feature map by using the offset domain to generate a first feature map of 3h multiplied by 3w multiplied by c.
Specifically, in this embodiment, the preset interpolation calculation may use a bilinear interpolation method, where bilinear interpolation is a linear interpolation extension of an interpolation function with two variables, and the core idea is to perform linear interpolation in two directions respectively.
It should be noted that the preset interpolation is not limited herein, and the corresponding interpolation method is selected according to specific requirements.
Step S350: and carrying out convolution and preset pooling operation on the input feature map in sequence to obtain a weight vector of 1 multiplied by c.
Specifically, in this embodiment, the preset pooling may be global average pooling (Golbal Average Pooling), and a value may be obtained by adding all pixel values of the feature map and then averaging the pixel values, that is, the value is used to represent the corresponding feature map.
It should be noted that the preset pooling is not limited herein, and may be global maximum pooling, and a corresponding pooling method may be selected according to specific needs.
Step S360: and multiplying the first characteristic diagram by the weight vector to obtain a second characteristic diagram of 3h multiplied by 3w multiplied by c.
Step S370: and inputting the second characteristic diagram into a convolution layer with the number of convolution kernels of d and the step length of 3 multiplied by 3, and obtaining a third characteristic diagram with the number of h multiplied by w multiplied by d.
Step S380: and taking the third characteristic diagram as an output of the adaptive convolution.
Where h is the height of the input feature map, w is the width of the input feature map, and c is the number of channels.
Fig. 6 shows a schematic diagram of an adaptive convolution structure.
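A minimal PyTorch sketch of steps S310–S380 is given below. The 18-channel offset head (reshaped into the 3h×3w×2 offset field), the sigmoid gate on the weight vector, and the intermediate layer widths are assumptions about details the description leaves open.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveConv2d(nn.Module):
    """Adaptive convolution: offset field + per-channel weight vector + stride-3 convolution."""

    def __init__(self, in_c, out_c):
        super().__init__()
        # Offset branch (steps S310-S320): two convolutions producing 18 = 3*3*2 offset channels.
        self.offset1 = nn.Sequential(nn.Conv2d(in_c, in_c, 3, padding=1), nn.PReLU())
        self.offset2 = nn.Conv2d(in_c, 18, 3, padding=1)
        # Weight branch (step S350): convolution followed by global average pooling -> 1xc vector.
        self.weight_branch = nn.Sequential(nn.Conv2d(in_c, in_c, 3, padding=1),
                                           nn.AdaptiveAvgPool2d(1))
        # Final 3x3 convolution with d kernels and stride 3 (step S370).
        self.final = nn.Conv2d(in_c, out_c, kernel_size=3, stride=3)

    def forward(self, x):
        n, c, h, w = x.shape
        # Predict offsets and rearrange them into a 3h x 3w x 2 field (step S330).
        off = F.relu(self.offset2(self.offset1(x)))                       # n x 18 x h x w
        off = off.view(n, 2, 3, 3, h, w).permute(0, 1, 4, 2, 5, 3)        # n x 2 x h x 3 x w x 3
        off = off.reshape(n, 2, 3 * h, 3 * w)
        # Regular sampling grid p0 + pn for the 3x3 neighbourhood of every location.
        pn_y, pn_x = torch.meshgrid(torch.arange(-1, 2), torch.arange(-1, 2), indexing="ij")
        base_y, base_x = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        grid_y = (base_y[:, None, :, None] + pn_y[None, :, None, :]).reshape(3 * h, 3 * w).float().to(x.device)
        grid_x = (base_x[:, None, :, None] + pn_x[None, :, None, :]).reshape(3 * h, 3 * w).float().to(x.device)
        # Bilinear sampling at p0 + pn + delta_pn (step S340) -> first feature map, 3h x 3w x c.
        sample_y = grid_y + off[:, 0]
        sample_x = grid_x + off[:, 1]
        grid = torch.stack((2 * sample_x / max(w - 1, 1) - 1,
                            2 * sample_y / max(h - 1, 1) - 1), dim=-1)     # normalised to [-1, 1]
        first = F.grid_sample(x, grid, mode="bilinear", align_corners=True)
        # Channel-wise weighting (steps S350-S360) -> second feature map, 3h x 3w x c.
        second = first * torch.sigmoid(self.weight_branch(x))
        # Stride-3 convolution brings the map back to h x w with d output channels (steps S370-S380).
        return self.final(second)
```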
In the above embodiment, the following beneficial effects exist: the self-adaptive convolution can adaptively change the shape of the convolution kernel and the weights of different channels, improve the learning and extraction capacity of the features, and ensure the correctness of the face detection model, thereby ensuring the accuracy of head portrait extraction.
In one embodiment, the pixel values in the third feature map are calculated as:

y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n) \cdot w_c

wherein y(p_0) represents the value of pixel p_0 in the third feature map y; x(p_0 + p_n + \Delta p_n) represents the value at position p_0 + p_n + \Delta p_n in the input feature map x; p_n is the conventional convolution displacement parameter; w(p_n) is the corresponding conventional convolution weight; R is the set of displacements p_n, expressed as R = {(-1,-1), (-1,0), (-1,1), (0,-1), (0,0), (0,1), (1,-1), (1,0), (1,1)}; \Delta p_n is the offset field; and w_c is the weight vector.
Specifically, p_0 + p_n corresponds to the grid field in FIG. 6, and p_0 + p_n + \Delta p_n represents the grid field with the offset field added.
In the above embodiment, the following beneficial effects exist: the calculation formula and the process of the self-adaptive convolution are provided, and the calculation accuracy of the self-adaptive convolution is ensured, so that the learning and feature extraction capabilities of a face detection network and an improved U-Net network model are ensured.
Referring to fig. 7, fig. 7 is a specific implementation step of step S120 in the first embodiment of the face cartoonization method of the present application, where the extracting a head image based on the face key point includes:
step S121: determining a first face frame of a minimum circumscribed rectangle based on the coordinates of the face key points;
specifically, according to the coordinates of the key points of the face, the determined minimum circumscribed rectangle is a first face frame; the minimum bounding rectangle at least comprises all key points of the face, and the minimum bounding rectangle refers to the maximum range of a plurality of two-dimensional shapes (such as points, straight lines and polygons) expressed by coordinates of the key points, namely, the rectangle marked with boundaries by the maximum abscissa, the minimum abscissa, the maximum ordinate and the minimum ordinate in each vertex (the key point of the face) of the given two-dimensional shape.
Step S122: and expanding a preset range in the transverse direction and the longitudinal direction respectively based on the width and the height of the first face frame to generate the head image.
Specifically, the first face frame can be expanded to the left and right by 0.3 times its width, and upwards and downwards by 0.3 times its height; the factor is not limited to 0.3 and can be adjusted dynamically according to specific requirements.
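A small sketch of steps S121–S122, assuming the key points are given as (x, y) pixel coordinates and using the 0.3 expansion factor mentioned above:

```python
import numpy as np

def extract_head_image(image, keypoints, expand=0.3):
    """image: HxWx3 array; keypoints: Nx2 array of (x, y) face key points."""
    xs, ys = keypoints[:, 0], keypoints[:, 1]
    x_min, x_max, y_min, y_max = xs.min(), xs.max(), ys.min(), ys.max()  # minimum circumscribed rectangle
    w, h = x_max - x_min, y_max - y_min
    left = int(max(x_min - expand * w, 0))                 # expand laterally by 0.3 x width on each side
    right = int(min(x_max + expand * w, image.shape[1]))
    top = int(max(y_min - expand * h, 0))                  # expand vertically by 0.3 x height up and down
    bottom = int(min(y_max + expand * h, image.shape[0]))
    return image[top:bottom, left:right], (top, left)
```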
In the above embodiment, the following beneficial effects exist: based on the face key points, a first face frame is determined, a head image is generated by expanding a preset range, and the accuracy of the extracted head image is guaranteed, so that the head image segmentation effect is guaranteed.
In one embodiment, the improved U-Net network model building process comprises:
the conventional convolution of the encoder portion of the original U-Net network model is replaced with the adaptive convolution.
Specifically, the conventional convolutions of the encoder section in the original U-Net network model are replaced by adaptive convolutions. As shown in FIG. 8, the left section (from the input layer to C9) is the encoder and the right section (from C10 to C19) is the decoder. C1–C9 correspond to the first through ninth adaptive convolution layers, P1–P4 correspond to the first through fourth max pooling layers, T1–T4 correspond to the first through fourth deconvolution layers, and C10–C19 correspond to the first through tenth regular convolution layers; it should be noted that the number of convolution kernels of the tenth regular convolution layer equals the number of categories.
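A structural sketch of the modified contracting path follows, reusing the AdaptiveConv2d module sketched earlier; grouping C1–C8 into four two-convolution stages with C9 as the bottleneck, and the channel widths, are assumptions about the exact arrangement.

```python
import torch.nn as nn

class AdaptiveUNetEncoder(nn.Module):
    def __init__(self, in_c=3, widths=(64, 128, 256, 512)):
        super().__init__()
        self.stages = nn.ModuleList()
        for out_c in widths:                              # C1-C8: adaptive convolutions in pairs
            self.stages.append(nn.Sequential(AdaptiveConv2d(in_c, out_c),
                                             AdaptiveConv2d(out_c, out_c)))
            in_c = out_c
        self.pool = nn.MaxPool2d(2)                       # P1-P4
        self.bottleneck = AdaptiveConv2d(in_c, in_c * 2)  # C9

    def forward(self, x):
        skips = []                                        # feature maps passed to the decoder
        for stage in self.stages:
            x = stage(x)
            skips.append(x)
            x = self.pool(x)
        return self.bottleneck(x), skips
```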
In the above embodiment, the following beneficial effects exist: by introducing adaptive convolution into the U-Net network model, the feature learning capacity of the improved model is enhanced, thereby improving the head portrait segmentation effect.
Referring to fig. 9, fig. 9 is a specific implementation step of step S130 in the first embodiment of the face cartoonization method of the present application, where the head image is segmented based on the improved U-Net network model, and extracting the head image includes:
step S131: the head image is input to the modified U-Net network model.
Step S132: and outputting a probability map corresponding to the head image through calculation of the improved U-Net network model.
Specifically, the output is subjected to a softmax function to obtain a probability map of the same size as the head image, the value of each pixel in the probability map representing the probability of belonging to the head region.
Step S133: and converting the probability map into a binary map according to a preset threshold value.
Specifically, when the preset threshold is 0.5, a binary image of the head portrait position can be obtained;
specifically, the specific implementation step of converting the probability map into a binary map comprises the following steps:
step S1: acquiring a value (probability of belonging to a head region) of each pixel in the probability map;
step S2: when the pixel value is greater than 0.5, the gray value of the pixel is set to 255 (or 0);
when the pixel value is not greater than 0.5, the gray value of the pixel is set to 0 (or 255, correspondingly);
step S3: and displaying according to the gray value of the pixel to generate a binary image.
It should be noted that the preset threshold is not limited herein, and may be dynamically adjusted according to specific requirements.
Step S134: and extracting the head portrait according to the binary image.
Specifically, each pixel in the binary image has only two possible gray values, so the head portrait contrasts clearly with the background, which improves the accuracy of head portrait extraction.
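The following sketch ties steps S131–S134 together, assuming the improved U-Net produces a two-channel softmax output and using the 0.5 threshold mentioned above; the names are illustrative.

```python
import numpy as np
import torch
import torch.nn.functional as F

def extract_head_portrait(unet, head_image, threshold=0.5):
    """head_image: 1x3xHxW tensor; returns the masked head portrait and the binary map."""
    with torch.no_grad():
        logits = unet(head_image)                          # steps S131-S132: 1x2xHxW output
        prob_map = F.softmax(logits, dim=1)[0, 1]          # probability of belonging to the head region
    binary_map = (prob_map > threshold).cpu().numpy().astype(np.uint8) * 255   # step S133: 255 = head
    crop = head_image[0].permute(1, 2, 0).cpu().numpy()
    head_portrait = crop * (binary_map[..., None] > 0)     # step S134: keep only head pixels
    return head_portrait, binary_map
```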
In the above embodiment, the following beneficial effects exist: the head portrait image is segmented through the improved U-Net network model, so that the segmentation accuracy is improved, and the head portrait extraction accuracy is ensured.
In one embodiment, the step of generating the cartoon head portrait by performing cartoon processing based on the head portrait includes:
and executing the preset cartoon style conversion based on the head portrait to generate a cartoon head portrait.
Specifically, the cartoonization can be realized through a CycleGAN image style transfer network; different cartoon styles can also be realized through other transfer networks and adjusted dynamically according to specific requirements.
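A usage sketch only: applying a pre-trained CycleGAN-style generator to the extracted head portrait. The generator module and the [-1, 1] input scaling are assumptions, not components defined by the application.

```python
import torch

def cartoonize_head(generator, head):
    """generator: a pre-trained photo-to-cartoon network (e.g. a CycleGAN generator);
    head: 1x3xHxW tensor scaled to [-1, 1]; returns the cartoon head portrait."""
    generator.eval()
    with torch.no_grad():
        cartoon = generator(head)              # image style conversion
    return cartoon.clamp(-1.0, 1.0)
```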
In the above embodiment, there are beneficial effects: and the cartoon head portrait is generated by executing cartoon processing on the head portrait, so that the privacy of the face of the patient is ensured to be protected in the checking process.
The application also provides a computer storage medium, wherein the computer storage medium is stored with a human face cartoon method program, and the human face cartoon method program realizes the steps of any one of the methods when being executed by a processor.
The application also provides a face cartoon device which comprises a memory, a processor and a face cartoon method program which is stored in the memory and can run on the processor, wherein the processor realizes any step of the method when executing the face cartoon method program.
The face cartoon device 010 according to the present application includes as shown in fig. 10: at least one processor 012, a memory 011.
The processor 012 may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware or instructions in software form in the processor 012. The processor 012 may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory 011, and the processor 012 reads information in the memory 011 and performs the steps of the above method in combination with its hardware.
It is to be appreciated that memory 011 in embodiments of the present application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 011 of the systems and methods described by embodiments of the present application is intended to comprise, without being limited to, these and any other suitable types of memory.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (8)
1. A method for face cartoonization, the method comprising:
extracting a preset number of face key points of an input image by adopting a face detection model based on self-adaptive convolution;
extracting a head image based on the face key points;
performing head portrait segmentation on the head image based on an improved U-Net network model, and extracting the head portrait;
executing cartoon processing based on the head portrait to generate a cartoon head portrait;
replacing the head portrait in the input image with the cartoon head portrait;
the method for extracting the preset number of face key points of the input image by adopting the face detection model based on the self-adaptive convolution comprises the following steps:
the input images sequentially pass through a first number of preset structure operations to generate a first intermediate feature map;
generating a second intermediate feature map by performing the self-adaptive convolution operation on the first intermediate feature map;
inputting the second intermediate feature map into a full-connection layer, and extracting the preset number of face key points;
wherein the preset structure operation is an adaptive convolution operation and a preset pooling operation;
wherein the adaptive convolution operation comprises:
inputting an input feature map of size h×w×c into a 3×3 adaptive convolution layer, and convolving the input feature map with a convolution layer using a first preset activation function to generate a first convolution result;
inputting the first convolution result into a convolution layer using a second preset activation function for convolution, generating a second convolution result;
performing a reconstruction operation on the second convolution result to generate an offset field of size 3h×3w×2;
performing a preset interpolation calculation on the input feature map using the offset field to generate a first feature map of size 3h×3w×c;
passing the input feature map through a convolution and a preset pooling operation in sequence to obtain a weight vector of size 1×c;
multiplying the first feature map by the weight vector to obtain a second feature map of size 3h×3w×c;
inputting the second feature map into a 3×3 convolution layer with d convolution kernels and a stride of 3 to obtain a third feature map of size h×w×d;
taking the third feature map as an output of the adaptive convolution;
where h is the height of the input feature map, w is the width of the input feature map, and c is the number of channels.
2. The face cartoonization method of claim 1, further comprising, prior to the step of extracting a preset number of face keypoints of the input image using a face detection model based on adaptive convolution:
creating a face detection model based on adaptive convolution, specifically comprising:
inputting the images in the training set into the face detection model based on the self-adaptive convolution to generate a face detection result;
comparing the face detection result with an image tag and calculating an error;
back-propagating the error, and updating parameters of the face detection model based on the adaptive convolution;
and generating the face detection model based on the self-adaptive convolution until the error meets a preset threshold value.
3. The face cartoonization method of claim 1, wherein the pixel values in the third feature map are calculated as:

y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n) \cdot w_c

wherein y(p_0) represents the value of pixel p_0 in the third feature map y; x(p_0 + p_n + \Delta p_n) represents the value at position p_0 + p_n + \Delta p_n in the input feature map x; p_n is the conventional convolution displacement parameter; w(p_n) is the corresponding conventional convolution weight; R is the set of displacements p_n, expressed as R = {(-1,-1), (-1,0), (-1,1), (0,-1), (0,0), (0,1), (1,-1), (1,0), (1,1)}; \Delta p_n is the offset field; and w_c is the weight vector.
4. The face cartoonization method of claim 1, wherein the extracting a head image based on the face keypoints comprises:
determining a first face frame of a minimum circumscribed rectangle based on the coordinates of the face key points;
and expanding a preset range in the transverse direction and the longitudinal direction respectively based on the width and the height of the first face frame to generate the head image.
5. The face cartoonization method of claim 1, wherein said process of constructing an improved U-Net network model comprises:
the conventional convolution of the encoder portion in the U-Net network model is replaced with the adaptive convolution.
6. The face cartoonization method of claim 1, wherein performing head portrait segmentation on the head image based on the improved U-Net network model and extracting the head portrait comprises:
inputting the head image into the modified U-Net network model;
outputting a probability map corresponding to the head image through calculation of the improved U-Net network model;
converting the probability map into a binary map according to a preset threshold;
and extracting the head portrait according to the binary image.
7. A computer storage medium, wherein a face cartoon method program is stored on the computer storage medium, and when the face cartoon method program is executed by a processor, the steps of the face cartoon method according to any one of claims 1-6 are realized.
8. The face cartoon device is characterized by comprising a memory, a processor and a face cartoon method program which is stored in the memory and can run on the processor, wherein the processor realizes the steps of the face cartoon method in any one of claims 1-6 when executing the face cartoon method program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110167873.1A CN112907708B (en) | 2021-02-05 | 2021-02-05 | Face cartoon method, equipment and computer storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110167873.1A CN112907708B (en) | 2021-02-05 | 2021-02-05 | Face cartoon method, equipment and computer storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112907708A CN112907708A (en) | 2021-06-04 |
CN112907708B true CN112907708B (en) | 2023-09-19 |
Family
ID=76123673
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110167873.1A Active CN112907708B (en) | 2021-02-05 | 2021-02-05 | Face cartoon method, equipment and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112907708B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109191476A (en) * | 2018-09-10 | 2019-01-11 | 重庆邮电大学 | The automatic segmentation of Biomedical Image based on U-net network structure |
CN110070483A (en) * | 2019-03-26 | 2019-07-30 | 中山大学 | A kind of portrait cartooning method based on production confrontation network |
CN110111246A (en) * | 2019-05-15 | 2019-08-09 | 北京市商汤科技开发有限公司 | A kind of avatars generation method and device, storage medium |
CN110580726A (en) * | 2019-08-21 | 2019-12-17 | 中山大学 | Dynamic convolution network-based face sketch generation model and method in natural scene |
CN112115860A (en) * | 2020-09-18 | 2020-12-22 | 深圳市威富视界有限公司 | Face key point positioning method and device, computer equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
Zhigeng Pan et al., "Photo Realistic 3D Cartoon Face Modeling Based on Active Shape Model," Transactions on Edutainment II, pp. 299–311 *
Fan Linlong et al., "Cartoon stylization generation algorithm for key face contour regions," Journal of Graphics, Vol. 42, No. 1, pp. 44–51 *
Also Published As
Publication number | Publication date |
---|---|
CN112907708A (en) | 2021-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11861501B2 (en) | Semantic segmentation method and apparatus for three-dimensional image, terminal, and storage medium | |
CN113807334B (en) | Residual error network-based multi-scale feature fusion crowd density estimation method | |
CN110070124A (en) | A kind of image amplification method and system based on production confrontation network | |
CN107644225A (en) | Pulmonary lesionses recognition methods, device and realization device | |
CN114529459B (en) | Method, system and medium for enhancing image edge | |
CN116994140A (en) | Cultivated land extraction method, device, equipment and medium based on remote sensing image | |
CN112308866B (en) | Image processing method, device, electronic equipment and storage medium | |
JP2019117577A (en) | Program, learning processing method, learning model, data structure, learning device and object recognition device | |
CN113538530B (en) | Ear medical image segmentation method and device, electronic equipment and storage medium | |
CN110956632A (en) | Method and device for automatically detecting pectoralis major region in molybdenum target image | |
CN111681165A (en) | Image processing method, image processing device, computer equipment and computer readable storage medium | |
CN110136052A (en) | A kind of image processing method, device and electronic equipment | |
CN111488930A (en) | Training method of classification network, target detection method and device and electronic equipment | |
US20220076119A1 (en) | Device and method of training a generative neural network | |
CN115631112B (en) | Building contour correction method and device based on deep learning | |
CN112651953A (en) | Image similarity calculation method and device, computer equipment and storage medium | |
CN117115184A (en) | Training method and segmentation method of medical image segmentation model and related products | |
CN115100494A (en) | Identification method, device and equipment of focus image and readable storage medium | |
CN113554656B (en) | Optical remote sensing image example segmentation method and device based on graph neural network | |
CN113077477B (en) | Image vectorization method and device and terminal equipment | |
CN111681236B (en) | Target density estimation method with attention mechanism | |
CN112669204B (en) | Image processing method, training method and device of image processing model | |
CN112907708B (en) | Face cartoon method, equipment and computer storage medium | |
CN115860067B (en) | Method, device, computer equipment and storage medium for generating countermeasure network training | |
CN109934132B (en) | Face recognition method, system and storage medium based on random discarded convolution data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |