CN111223164A - Face sketch generating method and device - Google Patents

Face sketch generating method and device

Info

Publication number
CN111223164A
Authority
CN
China
Prior art keywords
face
model
strokes
sketch
attribute
Prior art date
Legal status
Granted
Application number
CN202010016526.4A
Other languages
Chinese (zh)
Other versions
CN111223164B (en)
Inventor
高飞
朱静洁
李鹏
俞泽远
王韬
Current Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Original Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Advanced Institute of Information Technology AIIT of Peking University, Hangzhou Weiming Information Technology Co Ltd filed Critical Advanced Institute of Information Technology AIIT of Peking University
Priority to CN202010016526.4A priority Critical patent/CN111223164B/en
Publication of CN111223164A publication Critical patent/CN111223164A/en
Application granted granted Critical
Publication of CN111223164B publication Critical patent/CN111223164B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06F 18/25: Pattern recognition; analysing; fusion techniques
    • G06V 40/168: Human faces, e.g. facial parts, sketches or expressions; feature extraction; face representation
    • G06V 40/172: Human faces; classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face sketch generation method and device, comprising the following steps: cropping a face image out of a received image and predicting its attribute category; inputting the face image into a general face portrait synthesis model so that it synthesizes a first face sketch; determining the dedicated face portrait synthesis model required by the attribute category and inputting the face image into it so that it synthesizes a second face sketch; and fusing the first face sketch and the second face sketch. While a general face portrait synthesis model synthesizes the first face sketch, different dedicated face portrait synthesis models synthesize the second face sketch according to the face attribute category. This overcomes the influence of face attribute variation on the quality of the synthesized portrait sketch, makes the face sketch obtained by fusing the first and second face sketches more accurate, and meets the personalized requirements of different face attributes.

Description

Face sketch generating method and device
Technical Field
The invention relates to the field of computer technology, and in particular to a face sketch generation method and device.
Background
Converting a face image into a sketch has important application value in the fields of public safety and digital entertainment.
For a traditional image processing method to generate a good sketch, its computational complexity must be high, which makes the real-time requirement hard to meet. With the development of machine learning, learning-based image processing achieves higher speed and accuracy than traditional image processing, and many machine learning models for generating sketches from face images have been derived.
However, the sketches these machine learning models generate from face images are strongly affected by variations in face attributes (such as facial texture characteristics), and the generation quality is poor.
Disclosure of Invention
The invention aims to provide a face sketch generation method and device that overcome the defects of the prior art, realized by the following technical scheme.
The first aspect of the invention provides a face sketch generation method, comprising the following steps:
cropping a face image out of a received image, and predicting the attribute category of the face in the face image;
inputting the face image into a trained general face portrait synthesis model, so that the general face portrait synthesis model synthesizes a first face sketch of the face image;
determining the trained dedicated face portrait synthesis model required by the attribute category, and inputting the face image into the dedicated face portrait synthesis model, so that the dedicated face portrait synthesis model synthesizes a second face sketch of the face image;
and fusing the first face sketch and the second face sketch to obtain a third face sketch.
The second aspect of the invention provides a face sketch generation device, comprising:
an attribute prediction module, configured to crop a face image out of a received image and predict the attribute category of the face in the face image;
a general synthesis module, configured to input the face image into a trained general face portrait synthesis model, so that the general face portrait synthesis model synthesizes a first face sketch of the face image;
a dedicated synthesis module, configured to determine the trained dedicated face portrait synthesis model required by the attribute category and input the face image into the dedicated face portrait synthesis model, so that the dedicated face portrait synthesis model synthesizes a second face sketch of the face image;
and a fusion module, configured to fuse the first face sketch and the second face sketch to obtain a third face sketch.
In the embodiments of the invention, while a general face portrait synthesis model synthesizes the first face sketch, different dedicated face portrait synthesis models synthesize the second face sketch according to the face attribute category. This overcomes the influence of face attribute variation on the quality of the synthesized portrait sketch, so the third face sketch obtained by fusing the first and second face sketches is more accurate and meets the personalized requirements of different face attributes.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart illustrating an embodiment of a method for generating human face skeleton strokes according to an exemplary embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating segmentation of different regions of a human face according to the present invention;
FIG. 3 is a schematic structural diagram of a face sketch generating system shown in the present invention;
FIG. 4 is a diagram illustrating a hardware configuration of an electronic device in accordance with an exemplary embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of a face sketch generation device according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly second information may be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "when", "while", or "in response to determining", depending on the context.
The invention provides a face sketch generation method that overcomes the influence of face attribute variation on sketch synthesis quality, so that clear, attractive, identity-consistent, high-quality portrait sketches can be synthesized for face images of any attribute, meeting the personalized requirements of different face attributes.
The face sketch generation method is described in detail below through a specific embodiment.
Fig. 1 is a flowchart of an embodiment of a face sketch generation method according to an exemplary embodiment of the present invention; the method may be applied to an electronic device (e.g., a PC, a terminal, or a server). As shown in Fig. 1, the face sketch generation method includes the following steps:
Step 101: crop a face image out of the received image, and predict the attribute category of the face in the face image.
In an embodiment, to crop a face image out of a received image, the image may be input into a trained face detection model, so that the face detection model detects the face in the image and predicts the positions of the face key points; the image is then warped by an affine transformation according to the key-point positions so as to rectify the face, and finally a face image of a set size is cropped out of the transformed image.
The face key points may include key positions such as the left eye center, the right eye center, the nose tip, and the two mouth corners. The affine transformation rectifies the face in the image; optionally, it may place the left eye and the right eye of the face on a horizontal line with a set pixel distance between them.
For example, the affine transformation may place the two eyes on a horizontal line and set the distance between them to 120 pixels; when cropping, a face image of 512 x 512 pixels may be cut out such that the line connecting the two eyes lies 250 pixels below the upper edge of the image and the midpoint between the eyes lies on the vertical center line of the face image.
Those skilled in the art will understand that the face detection model may be implemented with the related art, and the invention does not limit its specific implementation; for example, the MTCNN model may be used for face key-point detection.
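As an illustration, the alignment and cropping step can be written with OpenCV and NumPy as below. This is a minimal sketch under the numbers given above (120-pixel eye distance, eye line 250 pixels from the top, 512 x 512 crop); the function name and the assumption that eye coordinates come from a detector such as MTCNN are illustrative, not prescribed by the patent.

```python
import cv2
import numpy as np

def align_and_crop(image, left_eye, right_eye,
                   eye_dist=120, eye_to_top=250, out_size=512):
    """Rotate and scale the image so the eyes are horizontal and eye_dist
    pixels apart, then crop an out_size x out_size patch with the eye line
    eye_to_top pixels below the top edge and the eye midpoint on the
    vertical center line."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))      # tilt of the eye line
    scale = eye_dist / np.hypot(dx, dy)         # scale to the target eye distance
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, scale)
    # Translate the eye midpoint to (out_size/2, eye_to_top) in the output.
    M[0, 2] += out_size / 2.0 - center[0]
    M[1, 2] += eye_to_top - center[1]
    return cv2.warpAffine(image, M, (out_size, out_size))
```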
It should be noted that before the attribute category of the face is predicted, brightness adjustment and skin-smoothing operations may be applied to the face image to improve its visual quality.
In an embodiment, to predict the attribute category of the face in the face image, the face image may be input into a trained prediction model: a feature extraction network in the prediction model extracts a feature map of the face image and outputs it to an attribute prediction network in the prediction model, and the attribute prediction network predicts the attribute category of the face based on the feature map.
Illustratively, the attribute categories may include young male, young female, old male, old female, and the like.
In the invention, the prediction model is a multi-task learning model; that is, the prediction model further includes a weight prediction network. For the training process of the prediction model, refer to the related description under step 104 below.
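A hedged sketch of such a multi-task prediction model in PyTorch is given below: a shared convolutional feature extractor feeding an attribute head and a fusion-weight head, matching the later statement that both heads are fully-connected. All layer sizes are illustrative assumptions; the patent does not specify them.

```python
import torch.nn as nn

class PredictionModel(nn.Module):
    """Multi-task prediction model: shared feature extraction network,
    attribute prediction network (4 classes), weight prediction network."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(                   # conv feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.attr_head = nn.Sequential(                  # attribute prediction
            nn.Flatten(), nn.Linear(128 * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )
        self.weight_head = nn.Sequential(                # fusion-weight prediction
            nn.Flatten(), nn.Linear(128 * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),             # beta in [0, 1]
        )

    def forward(self, x):
        f = self.features(x)
        return self.attr_head(f), self.weight_head(f)
```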
Step 102: input the face image into a trained general face portrait synthesis model, so that the general face portrait synthesis model synthesizes a first face sketch of the face image.
The output of the general face portrait synthesis model is a portrait sketch synthesized directly, without considering the attribute category of the face.
Step 103: determine the trained dedicated face portrait synthesis model required by the attribute category, and input the face image into the dedicated face portrait synthesis model, so that the dedicated face portrait synthesis model synthesizes a second face sketch of the face image.
The output of the dedicated face portrait synthesis model is a portrait sketch synthesized with the face attribute category taken into account; each attribute category uses its own dedicated face portrait synthesis model to synthesize the sketch.
Before step 102 and step 103 are executed, the general face portrait synthesis model and the dedicated face portrait synthesis model corresponding to each attribute category need to be trained separately.
Both the general and the dedicated face portrait synthesis models adopt a generative adversarial network structure during training, and both are implemented with an encoder-decoder structure.
The encoder is implemented with multiple convolutional layers; for example, it may adopt a VGGFace feature extractor. Correspondingly, the decoder is implemented with multiple transposed convolutional layers, normalization layers, and activation layers.
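The following is a minimal PyTorch sketch of such an encoder-decoder generator. Channel counts and depth are illustrative assumptions; the patent only requires a convolutional encoder (e.g., a VGGFace feature extractor, approximated here by plain convolutional layers) and a decoder built from transposed convolutions, normalization, and activations.

```python
import torch.nn as nn

class SketchGenerator(nn.Module):
    """Encoder-decoder generator: convolutional encoder, transposed-conv
    decoder with normalization and activation layers."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),
            nn.InstanceNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.InstanceNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),
            nn.Tanh(),                       # one-channel sketch in [-1, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```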
The training process of the dedicated face portrait synthesis model corresponding to each attribute category may include the following:
the attribute categories include four categories (young male, young female, old male, old female); a face sample set is obtained in which each face sample is labeled with an attribute category, and the real face sketch corresponding to each face sample is obtained; for each attribute category, a corresponding dedicated face portrait synthesis model and a discriminator model are constructed, and the two are optimized in alternating iterations using the face samples labeled with that attribute category and their corresponding real face sketches.
The dedicated face portrait synthesis model takes a face sample as input and outputs a synthesized face sketch. The discriminator model takes a synthesized face sketch as input and outputs a judgment result and the face attribute for it, and likewise takes a real face sketch as input and outputs a judgment result and the face attribute for it.
Based on this description, the loss value of the discriminator model is obtained from the judgment results and attribute classifications of the synthesized and real face sketches, and the loss value of the dedicated face portrait synthesis model is obtained from the content loss between the synthesized face sketch and the face sample, the style loss between the synthesized face sketch and the real face sketch, and the loss value of the discriminator model.
The loss value L_ca-adv of the discriminator model, computed from the judgment result and attribute classification of the synthesized face sketch and those of the real face sketch, may be calculated with a cross-entropy loss function.
The content loss between the synthesized face sketch and the face sample is computed as:

L_identity = (1 / (C_j · H_j · W_j)) · ‖φ_j(x) − φ_j(G(x))‖² (formula 1)

where φ denotes the encoder; φ_j(x) is the feature map of the j-th computing layer after the face sample x is input into the encoder; φ_j(G(x)) is the feature map of the j-th computing layer after the synthesized face sketch G(x) is input into the encoder; and C_j, H_j, and W_j are the number of channels, the height, and the width of the feature map output by the j-th computing layer.
The style loss between the synthesized face sketch and the real face sketch is computed as:

L_style = Σ_k ‖Gram(φ_k(G(x))) − Gram(φ_k(s))‖² (formula 2)

where Gram(·) denotes the Gram matrix; Gram(φ_k(G(x))) is the Gram matrix of the feature map output by the k-th computing layer after the synthesized face sketch G(x) is input into the encoder, and Gram(φ_k(s)) is the Gram matrix of the feature map output by the k-th computing layer after the real face sketch s is input into the encoder.
Based on the discriminator loss L_ca-adv, the content loss L_identity, and the style loss L_style, the loss of the dedicated face portrait synthesis model is calculated as:

L_global = L_identity + λ·L_style + β·L_ca-adv (formula 3)

where λ ≥ 0 and β ≥ 0.
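A hedged PyTorch sketch of these losses follows. The exact forms of formulas 1 and 2 are reconstructed from the surrounding definitions (a perceptual content loss normalized by C_j·H_j·W_j and a Gram-matrix style loss); the function names and the default values of λ and β are illustrative.

```python
import torch
import torch.nn.functional as F

def gram(feat):
    """Gram matrix of a (B, C, H, W) feature map."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def content_loss(phi_x, phi_gx):
    """Formula 1: mean squared feature difference at encoder layer j
    (mse_loss's mean reduction supplies the 1/(C_j*H_j*W_j) factor,
    averaged over the batch as well)."""
    return F.mse_loss(phi_gx, phi_x)

def style_loss(feats_gx, feats_s):
    """Formula 2: Gram-matrix differences summed over encoder layers k."""
    return sum(F.mse_loss(gram(g), gram(s))
               for g, s in zip(feats_gx, feats_s))

def generator_loss(phi_x, phi_gx, feats_gx, feats_s, l_ca_adv,
                   lam=1.0, beta=1.0):
    """Formula 3: L_global = L_identity + lambda*L_style + beta*L_ca_adv.
    The patent only requires lambda >= 0 and beta >= 0."""
    return (content_loss(phi_x, phi_gx)
            + lam * style_loss(feats_gx, feats_s)
            + beta * l_ca_adv)
```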
In the training process, the dedicated face portrait synthesis model and the discriminator model are optimized in alternating iterations, i.e., the optimization objective is:

G*, D* = arg min_G max_D L_global (formula 4)

where optimizing over G optimizes the dedicated face portrait synthesis model and optimizing over D optimizes the discriminator model.
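The alternating optimization of formula 4 can be sketched as a single training iteration as below. adv_loss and gen_loss are placeholder callables: adv_loss stands in for the cross-entropy judgment/attribute loss, and gen_loss wraps the encoder feature extraction together with formula 3 (e.g., built on generator_loss above). The routine is an assumption-laden illustration, not the patent's prescribed procedure.

```python
import torch

def train_step(G, D, opt_G, opt_D, x, s, labels, adv_loss, gen_loss):
    """One alternating iteration: update the discriminator D on real and
    synthesized sketches, then update the generator G."""
    # Discriminator step (the max over D in formula 4).
    with torch.no_grad():
        fake = G(x)                     # synthesized sketch, detached from G
    loss_D = adv_loss(D(s), True, labels) + adv_loss(D(fake), False, labels)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step (the min over G in formula 4).
    fake = G(x)
    loss_G = gen_loss(x, s, fake, adv_loss(D(fake), True, labels))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```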
It should be noted that the training of the general face portrait synthesis model follows the same principle as that of the dedicated face portrait synthesis models; the difference is that the general model may be trained and optimized with all face samples in the face sample set. The details are not repeated.
Optionally, the general face portrait synthesis model may be trained first, each dedicated face portrait synthesis model may then be initialized from it, and the dedicated model may finally be fine-tuned, which improves model training efficiency.
Step 104: fuse the first face sketch and the second face sketch to obtain a third face sketch.
The prediction model described in step 101 further includes a weight prediction network: the feature extraction network in the prediction model extracts the feature map of the face image and outputs it to the weight prediction network, and the weight prediction network predicts the fusion weight based on the feature map.
Further, the first face sketch and the second face sketch are fused using the fusion weight according to:

O_final = β·G_k*(x) + (1 − β)·G_u(x) (formula 5)

where · denotes the pixel-wise product, G_k*(x) denotes the second face sketch, G_u(x) denotes the first face sketch, and β denotes the fusion weight.
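Formula 5 is a direct pixel-wise blend and can be written as below; the function name is illustrative, and beta could equally be a per-pixel weight map if the weight prediction network outputs one.

```python
def fuse_sketches(second_sketch, first_sketch, beta):
    """Formula 5: O_final = beta * G_k*(x) + (1 - beta) * G_u(x)."""
    return beta * second_sketch + (1.0 - beta) * first_sketch

# e.g. third_sketch = fuse_sketches(second_sketch, first_sketch, beta=0.6)
```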
For the training of the prediction model, each face sample in the face sample set may first be used to optimize the feature extraction network and the attribute prediction network of the constructed prediction model until they converge; the feature maps obtained by passing each face sample through the optimized feature extraction network are then used to optimize the weight prediction network of the constructed prediction model until the loss value of the weight prediction network falls below a preset value.
The feature extraction network comprises multiple convolutional layers, and the attribute prediction network and the weight prediction network are both implemented with multiple fully-connected layers.
The loss value of the weight prediction network is the content loss between the face sample and the fused third face sketch.
The fused third face sketch is obtained by passing the face sample through the general face portrait synthesis model and the corresponding dedicated face portrait synthesis model to produce the first and second face sketches, and fusing them with the fusion weight that the weight prediction network produces from the feature map of the face sample.
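One optimization step of the weight prediction network might look as follows, reusing the PredictionModel and loss pieces sketched earlier. Freezing the generators and feature extractor, the single-channel-to-RGB expansion, and the use of a single encoder layer for the content loss are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def weight_net_step(pred_model, G_general, G_dedicated, opt_w, x, encoder):
    """Optimize only the weight prediction network: the loss is the content
    loss between the face sample x and the fused third face sketch."""
    with torch.no_grad():
        first = G_general(x)               # first face sketch
        second = G_dedicated(x)            # second face sketch (per attribute)
        feat = pred_model.features(x)      # frozen feature extraction network
    beta = pred_model.weight_head(feat).view(-1, 1, 1, 1)
    third = beta * second + (1.0 - beta) * first      # formula 5
    third_rgb = third.expand(-1, 3, -1, -1)           # match encoder input channels
    loss = F.mse_loss(encoder(third_rgb), encoder(x)) # content loss (formula 1 form)
    opt_w.zero_grad(); loss.backward(); opt_w.step()
    return loss.item()
```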
In this way, attribute category prediction is realized by the feature extraction network together with the attribute prediction network, and fusion weight prediction is realized by the feature extraction network together with the weight prediction network, so the prediction model is a multi-task learning model.
Before the weight prediction network is optimized, the training of the general face portrait synthesis model and the dedicated face portrait synthesis models must be completed.
It should be noted that after the face image is cropped out of the received image, the face image may also be input into a trained face parsing model, so that the face parsing model segments the regions of the face in the face image, and the positions of the face regions output by the face parsing model are obtained.
The face regions may include 11 areas: left eyebrow, right eyebrow, left eye, right eye, nose, mouth, facial skin, hair, neck, torso, and background. Fig. 2 shows the parsing result with these 11 regions as output by the face parsing model.
It should further be noted that after the third face sketch is obtained, a post-processing operation may be applied to it, the strokes located in the facial skin region of the processed third face sketch may be adjusted, and a vectorization operation may be applied to the adjusted third face sketch to obtain the final face sketch.
The post-processing operation includes blurring, binarization, dilation, and similar operations, which close narrow gaps and long thin grooves, eliminate small holes, and fill breaks in the contour lines, thereby smoothing the contours.
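With OpenCV, the blur / binarize / dilate chain could be sketched as below; the kernel sizes, threshold, and iteration count are illustrative assumptions, not values from the patent.

```python
import cv2
import numpy as np

def postprocess(sketch_u8, blur_ksize=3, thresh=128, dilate_iter=1):
    """Blur, binarize, and dilate a grayscale sketch (dark strokes on white)."""
    blurred = cv2.GaussianBlur(sketch_u8, (blur_ksize, blur_ksize), 0)
    _, binary = cv2.threshold(blurred, thresh, 255, cv2.THRESH_BINARY)
    # Dilating dark strokes = dilating the inverted image, then inverting back.
    inv = cv2.bitwise_not(binary)
    inv = cv2.dilate(inv, np.ones((3, 3), np.uint8), iterations=dilate_iter)
    return cv2.bitwise_not(inv)
```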
Optionally, when the strokes in the facial skin region are adjusted: for a user whose attribute category is young female, the strokes located in the facial skin region may be removed, eliminating dark lines and spots in the skin area; for a user whose attribute category is old male, the left eye, right eye, and nose regions are each expanded by a preset pixel distance to obtain expanded regions, and the strokes that lie in the facial skin region but not in the expanded regions are removed, constraining the lines corresponding to crow's feet and nasolabial folds to within a threshold length.
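A mask-based sketch of this attribute-aware adjustment is shown below; the masks are assumed to come from the face parsing model above, and the category strings and expansion radius are illustrative.

```python
import cv2
import numpy as np

def adjust_face_strokes(binary_sketch, skin_mask, eye_nose_masks,
                        category, expand_px=15):
    """Adjust strokes inside the facial skin region by attribute category.
    binary_sketch: dark strokes (0) on white (255); masks: boolean arrays."""
    out = binary_sketch.copy()
    if category == "young_female":
        out[skin_mask] = 255                       # erase all skin-region strokes
    elif category == "old_male":
        kernel = np.ones((2 * expand_px + 1, 2 * expand_px + 1), np.uint8)
        expanded = np.zeros_like(skin_mask)
        for m in eye_nose_masks:                   # left eye, right eye, nose
            expanded |= cv2.dilate(m.astype(np.uint8), kernel).astype(bool)
        out[skin_mask & ~expanded] = 255           # keep only strokes near eyes/nose
    return out
```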
Finally, a vectorization operation is applied to the adjusted third face sketch so that the generated lines are smoother and the final portrait sketch is simpler and more attractive, meeting the characteristics and requirements of users of different ages and genders.
For the process of steps 101 to 104, refer to the system structure shown in Fig. 3: the face photo is first input into the general generator to obtain the first face sketch; the face photo is also input into the prediction model to obtain the attribute category and the fusion weight; the attribute category drives a gating module to select the corresponding dedicated generator, into which the face photo is input to obtain the second face sketch; the first and second face sketches are then fused with the fusion weight to obtain the third face sketch; finally, an adaptive post-processing module applies blurring, binarization, dilation, adjustment of the strokes in the facial region, vectorization, and similar operations to the third face sketch to produce the final portrait sketch for output.
In this embodiment, while a general face portrait synthesis model synthesizes the first face sketch, different dedicated face portrait synthesis models synthesize the second face sketch according to the face attribute category. This overcomes the influence of face attribute variation on the quality of the synthesized portrait sketch, so the third face sketch obtained by fusing the first and second face sketches is more accurate and meets the personalized requirements of different face attributes.
Fig. 4 is a hardware block diagram of an electronic device according to an exemplary embodiment of the present invention. The electronic device includes a communication interface 401, a processor 402, a machine-readable storage medium 403, and a bus 404; the communication interface 401, the processor 402, and the machine-readable storage medium 403 communicate with each other via the bus 404. The processor 402 can execute the face sketch generation method described above by reading and executing, from the machine-readable storage medium 403, machine-executable instructions corresponding to the control logic of the method; the details of the method are as described in the above embodiments and are not repeated here.
The machine-readable storage medium 403 referred to in this disclosure may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be volatile memory, non-volatile memory, or a similar storage medium. Specifically, the machine-readable storage medium 403 may be a RAM (Random Access Memory), a flash memory, a storage drive (e.g., a hard disk drive), any type of storage disk (e.g., an optical disk or DVD), a similar storage medium, or a combination thereof.
Corresponding to the embodiments of the face sketch generation method, the invention also provides embodiments of a face sketch generation device.
Fig. 5 is a schematic structural diagram of an embodiment of a face sketch generation device according to an exemplary embodiment of the present invention; the device may be applied to an electronic device. As shown in Fig. 5, the face sketch generation device includes:
an attribute prediction module 510, configured to crop a face image out of a received image and predict the attribute category of the face in the face image;
a general synthesis module 520, configured to input the face image into a trained general face portrait synthesis model, so that the general face portrait synthesis model synthesizes a first face sketch of the face image;
a dedicated synthesis module 530, configured to determine the trained dedicated face portrait synthesis model required by the attribute category and input the face image into the dedicated face portrait synthesis model, so that the dedicated face portrait synthesis model synthesizes a second face sketch of the face image;
and a fusion module 540, configured to fuse the first face sketch and the second face sketch to obtain a third face sketch.
In an optional implementation, the attribute prediction module 510 is specifically configured, in predicting the attribute category of the face in the face image, to input the face image into a trained prediction model, so that a feature extraction network in the prediction model extracts a feature map of the face image and outputs it to an attribute prediction network in the prediction model, the attribute prediction network predicting the attribute category of the face based on the feature map.
In an alternative implementation, the apparatus further comprises (not shown in fig. 5):
the training module is used for acquiring a face sample set, wherein each face sample in the face sample set is marked with an attribute type, and the attribute types comprise young males, young females, old males and old females; acquiring real face simplified strokes corresponding to each face sample in the face sample set; aiming at each attribute category, constructing a corresponding special human face portrait synthesis model and a discrimination model, and optimizing the constructed special human face portrait synthesis model and the discrimination model in an alternate iteration mode by using human face samples marked with the attribute category and corresponding real human face simple strokes; the special human face portrait synthesis model is input as a human face sample and output as a synthesized human face simple stroke; the judging model inputs the synthesized face strokes and outputs the judging results and the face attributes of the face strokes, and inputs the real face strokes and outputs the judging results and the face attributes of the real face strokes; the loss value of the discrimination model is obtained from the discrimination result and the classification of the synthesized face sketch strokes and the discrimination result and the attribute classification of the real face sketch strokes, and the loss value of the special face portrait synthesis model is obtained from the content loss value between the synthesized face sketch strokes and the face sample, the style loss value between the synthesized face sketch strokes and the real face sketch strokes and the loss value of the discrimination model.
In an optional implementation, the prediction model further includes a weight prediction network, and the attribute prediction module 510 is further configured to extract a feature map of the face image with the feature extraction network and output it to the weight prediction network, which predicts the fusion weight based on the feature map;
the fusion module 540 is specifically configured to fuse the first face sketch and the second face sketch using the fusion weight.
In an optional implementation, the training module is further configured to optimize the feature extraction network and the attribute prediction network of the constructed prediction model with each face sample in the face sample set until they converge, and to optimize the weight prediction network of the constructed prediction model with the feature maps obtained by passing each face sample through the optimized feature extraction network until the loss value of the weight prediction network falls below a preset value. The loss value of the weight prediction network is the content loss between the face sample and the fused third face sketch; the third face sketch is obtained by passing the face sample through the general face portrait synthesis model and the corresponding dedicated face portrait synthesis model to produce the first and second face sketches, and fusing them with the fusion weight that the weight prediction network produces from the feature map of the face sample.
In an alternative implementation, the apparatus further comprises (not shown in fig. 5):
a face parsing module, configured to, after the attribute prediction module 510 crops the face image out of the received image, input the face image into a trained face parsing model, so that the face parsing model segments the regions of the face in the face image, and to obtain the positions of the face regions output by the face parsing model;
a post-processing module, configured to apply a post-processing operation to the third face sketch after the fusion module 540 fuses the first and second face sketches to obtain it, adjust the strokes located in the facial region of the processed third face sketch, and apply a vectorization operation to the adjusted third face sketch to obtain the final face sketch.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A face sketch generation method, comprising the following steps:
cropping a face image out of a received image, and predicting the attribute category of the face in the face image;
inputting the face image into a trained general face portrait synthesis model, so that the general face portrait synthesis model synthesizes a first face sketch of the face image;
determining the trained dedicated face portrait synthesis model required by the attribute category, and inputting the face image into the dedicated face portrait synthesis model, so that the dedicated face portrait synthesis model synthesizes a second face sketch of the face image;
and fusing the first face sketch and the second face sketch to obtain a third face sketch.
2. The method of claim 1, wherein predicting the attribute category of the face in the face image comprises:
inputting the face image into a trained prediction model, a feature extraction network in the prediction model extracting a feature map of the face image and outputting it to an attribute prediction network in the prediction model, the attribute prediction network predicting the attribute category of the face based on the feature map.
3. The method of claim 2, wherein the training process of the dedicated face portrait synthesis model corresponding to each attribute category comprises:
obtaining a face sample set in which each face sample is labeled with an attribute category, the attribute categories including young male, young female, old male, and old female;
obtaining the real face sketch corresponding to each face sample in the face sample set;
for each attribute category, constructing a corresponding dedicated face portrait synthesis model and discriminator model, and optimizing the constructed dedicated face portrait synthesis model and discriminator model in alternating iterations using the face samples labeled with that attribute category and their corresponding real face sketches;
wherein the dedicated face portrait synthesis model takes a face sample as input and outputs a synthesized face sketch; the discriminator model takes a synthesized face sketch as input and outputs a judgment result and the face attribute for it, and takes a real face sketch as input and outputs a judgment result and the face attribute for it;
and wherein the loss value of the discriminator model is obtained from the judgment results and attribute classifications of the synthesized and real face sketches, and the loss value of the dedicated face portrait synthesis model is obtained from the content loss between the synthesized face sketch and the face sample, the style loss between the synthesized face sketch and the real face sketch, and the loss value of the discriminator model.
4. The method of claim 2, wherein the prediction model further comprises a weight prediction network, the feature extraction network extracting a feature map of the face image and outputting it to the weight prediction network, and the weight prediction network predicting a fusion weight based on the feature map;
wherein fusing the first face sketch and the second face sketch to obtain a third face sketch comprises:
fusing the first face sketch and the second face sketch using the fusion weight.
5. The method of claim 3, wherein the training process of the prediction model comprises:
optimizing a feature extraction network and an attribute prediction network in the constructed prediction model with each face sample in the face sample set until the feature extraction network and the attribute prediction network converge;
optimizing a weight prediction network in the constructed prediction model with the feature maps obtained by passing each face sample in the face sample set through the optimized feature extraction network, until the loss value of the weight prediction network falls below a preset value;
wherein the loss value of the weight prediction network is the content loss between the face sample and the fused third face sketch, the third face sketch being obtained by passing the face sample through the general face portrait synthesis model and the corresponding dedicated face portrait synthesis model to produce the first and second face sketches and fusing them with the fusion weight that the weight prediction network produces from the feature map of the face sample.
6. The method of claim 1, wherein after cropping the face image out of the received image, the method further comprises:
inputting the face image into a trained face parsing model, so that the face parsing model segments the regions of the face in the face image, and obtaining the positions of the face regions output by the face parsing model;
and wherein after fusing the first face sketch and the second face sketch to obtain the third face sketch, the method further comprises:
applying a post-processing operation to the third face sketch;
and adjusting the strokes located in the facial region of the processed third face sketch, and applying a vectorization operation to the adjusted third face sketch to obtain the final face sketch.
7. A face sketch generation device, comprising:
an attribute prediction module, configured to crop a face image out of a received image and predict the attribute category of the face in the face image;
a general synthesis module, configured to input the face image into a trained general face portrait synthesis model, so that the general face portrait synthesis model synthesizes a first face sketch of the face image;
a dedicated synthesis module, configured to determine the trained dedicated face portrait synthesis model required by the attribute category and input the face image into the dedicated face portrait synthesis model, so that the dedicated face portrait synthesis model synthesizes a second face sketch of the face image;
and a fusion module, configured to fuse the first face sketch and the second face sketch to obtain a third face sketch.
8. The device of claim 7, wherein the attribute prediction module is specifically configured, in predicting the attribute category of the face in the face image, to input the face image into a trained prediction model, so that a feature extraction network in the prediction model extracts a feature map of the face image and outputs it to an attribute prediction network in the prediction model, the attribute prediction network predicting the attribute category of the face based on the feature map.
9. The device of claim 8, further comprising:
a training module, configured to obtain a face sample set in which each face sample is labeled with an attribute category, the attribute categories including young male, young female, old male, and old female; obtain the real face sketch corresponding to each face sample in the face sample set; and, for each attribute category, construct a corresponding dedicated face portrait synthesis model and discriminator model and optimize them in alternating iterations using the face samples labeled with that attribute category and their corresponding real face sketches; wherein the dedicated face portrait synthesis model takes a face sample as input and outputs a synthesized face sketch; the discriminator model takes a synthesized face sketch as input and outputs a judgment result and the face attribute for it, and takes a real face sketch as input and outputs a judgment result and the face attribute for it; and wherein the loss value of the discriminator model is obtained from the judgment results and attribute classifications of the synthesized and real face sketches, and the loss value of the dedicated face portrait synthesis model is obtained from the content loss between the synthesized face sketch and the face sample, the style loss between the synthesized face sketch and the real face sketch, and the loss value of the discriminator model.
10. The device of claim 8, wherein the prediction model further comprises a weight prediction network, and the attribute prediction module is further configured to extract a feature map of the face image with the feature extraction network and output it to the weight prediction network, the weight prediction network predicting a fusion weight based on the feature map;
and the fusion module is specifically configured to fuse the first face sketch and the second face sketch using the fusion weight.
CN202010016526.4A 2020-01-08 2020-01-08 Face simple drawing generation method and device Active CN111223164B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010016526.4A CN111223164B (en) 2020-01-08 2020-01-08 Face simple drawing generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010016526.4A CN111223164B (en) 2020-01-08 2020-01-08 Face simple drawing generation method and device

Publications (2)

Publication Number Publication Date
CN111223164A true CN111223164A (en) 2020-06-02
CN111223164B CN111223164B (en) 2023-10-24

Family

ID=70828116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010016526.4A Active CN111223164B (en) 2020-01-08 2020-01-08 Face simple drawing generation method and device

Country Status (1)

Country Link
CN (1) CN111223164B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003099779A (en) * 2001-09-21 2003-04-04 Japan Science & Technology Corp Device, method, and program for evaluating person attribute
JP2004102359A (en) * 2002-09-04 2004-04-02 Advanced Telecommunication Research Institute International Image processing device, method and program
US20100021066A1 (en) * 2008-07-10 2010-01-28 Kohtaro Sabe Information processing apparatus and method, program, and recording medium
US20170180501A1 (en) * 2015-12-21 2017-06-22 Industrial Technology Research Institute Message pushing method and message pushing device
CN110023989A (en) * 2017-03-29 2019-07-16 华为技术有限公司 A kind of generation method and device of sketch image
CN107038429A (en) * 2017-05-03 2017-08-11 四川云图睿视科技有限公司 A kind of multitask cascade face alignment method based on deep learning
CN108596839A (en) * 2018-03-22 2018-09-28 中山大学 A kind of human-face cartoon generation method and its device based on deep learning
CN110222588A (en) * 2019-05-15 2019-09-10 合肥进毅智能技术有限公司 A kind of human face sketch image aging synthetic method, device and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MINGJIN ZHANG et al.: "Dual-Transfer Face Sketch–Photo Synthesis" *
周仁琴; 刘福新: "Cartoon face animation system for mobile digital entertainment" *
王楠楠; 李洁; 高新波: "A survey and comparative analysis of face portrait synthesis research" *
黄菲; 高飞; 朱静洁; 戴玲娜; 俞俊: "Heterogeneous face image synthesis based on generative adversarial networks: progress and challenges" *

Also Published As

Publication number Publication date
CN111223164B (en) 2023-10-24

Similar Documents

Publication Publication Date Title
CN109376582B (en) Interactive face cartoon method based on generation of confrontation network
Jiang et al. Scfont: Structure-guided chinese font generation via deep stacked networks
CN112950661B (en) Attention-based generation method for generating network face cartoon
CN109886881B (en) Face makeup removal method
CN111652049A (en) Face image processing model training method and device, electronic equipment and storage medium
CN110706302A (en) System and method for text synthesis image
CN113780149A (en) Method for efficiently extracting building target of remote sensing image based on attention mechanism
Singh et al. Neural style transfer: A critical review
US11282257B2 (en) Pose selection and animation of characters using video data and training techniques
CN113963409A (en) Training of face attribute editing model and face attribute editing method
CN113034355B (en) Portrait image double-chin removing method based on deep learning
Hsieh et al. Automatic trimap generation for digital image matting
CN114359517A (en) Avatar generation method, avatar generation system, and computing device
CN115546461A (en) Face attribute editing method based on mask denoising and feature selection
CN116012835A (en) Two-stage scene text erasing method based on text segmentation
Bian et al. Conditional adversarial consistent identity autoencoder for cross-age face synthesis
CN114240811A (en) Method for generating new image based on multiple images
CN116310008B (en) Image processing method based on less sample learning and related equipment
Liu et al. A3GAN: An attribute-aware attentive generative adversarial network for face aging
CN116721008A (en) User-defined expression synthesis method and system
CN111223164B (en) Face simple drawing generation method and device
CN111275778B (en) Face simple drawing generation method and device
CN113947520A (en) Method for realizing face makeup conversion based on generation of confrontation network
Rai et al. Improved attribute manipulation in the latent space of stylegan for semantic face editing
Rehman et al. Investigation and Morphing Attack Detection Techniques in Multimedia: A Detail Review

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200826

Address after: Room 101, building 1, block C, Qianjiang Century Park, ningwei street, Xiaoshan District, Hangzhou City, Zhejiang Province

Applicant after: Hangzhou Weiming Information Technology Co.,Ltd.

Applicant after: Institute of Information Technology, Zhejiang Peking University

Address before: Room 288-1, 857 Xinbei Road, Ningwei Town, Xiaoshan District, Hangzhou City, Zhejiang Province

Applicant before: Institute of Information Technology, Zhejiang Peking University

Applicant before: Hangzhou Weiming Information Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20200602

Assignee: Zhejiang Visual Intelligence Innovation Center Co.,Ltd.

Assignor: Institute of Information Technology, Zhejiang Peking University|Hangzhou Weiming Information Technology Co.,Ltd.

Contract record no.: X2023330000927

Denomination of invention: Method and device for generating simple facial strokes

Granted publication date: 20231024

License type: Common License

Record date: 20231219

EE01 Entry into force of recordation of patent licensing contract