CN115601459A - CNN-based face ornament generation method, device and equipment


Info

Publication number
CN115601459A
Authority
CN
China
Prior art keywords
image, jewelry, model, ornament, face
Prior art date
Legal status
Pending
Application number
CN202211137358.XA
Other languages
Chinese (zh)
Inventor
周勉
尚伟艺
刘洛麒
Current Assignee
Xiamen Meitu Technology Co Ltd
Original Assignee
Xiamen Meitu Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Meitu Technology Co Ltd
Priority to CN202211137358.XA
Publication of CN115601459A


Classifications

    • G06T 11/00 2D [Two Dimensional] image generation
    • G06N 3/02 Neural networks; G06N 3/08 Learning methods
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06V 10/24 Aligning, centring, orientation detection or correction of the image
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • G06V 40/161 Human faces: detection; localisation; normalisation
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20221 Image fusion; Image merging
    • G06T 2207/30201 Face

Abstract

The invention discloses a CNN-based face ornament generation method, apparatus, device and storage medium, comprising the following steps: receiving an image to be processed that contains a face area, and preprocessing it to obtain a first face image; determining a target area to be decorated in the first face image, superimposing a jewelry hand drawing on the target area, and inputting the result into a pre-trained jewelry generation model to obtain a jewelry generation image; inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image; and performing portrait pose restoration and fusion processing on the first result image to obtain a second result image. The method makes full use of contextual information in the image, such as illumination conditions and portrait pose, so that the generated jewelry looks more natural and reasonable.

Description

CNN-based face ornament generation method, device and equipment
Technical Field
The invention relates to the technical field of image processing, and in particular to a CNN-based face ornament generation method, apparatus and device.
Background
Face ornaments include items worn on the face such as glasses, false eyelashes, cosmetic contact lenses and earrings; wearing them can noticeably enhance the attractiveness of a portrait. Traditionally, finding suitable ornaments required spending a great deal of time trying them on in stores or through online shopping, a time-consuming and laborious process. To save time, computer simulation has been used to attach jewelry to a target person's face for virtual try-on, so that the wearing effect can be previewed in advance. The conventional simulation approach pastes existing jewelry material onto the target area of the portrait and applies a fusion algorithm to make the final effect look natural. However, adding ornaments to a portrait by such mapping tends to produce incongruous and unnatural results, imposes strict requirements on how the portrait is photographed, and cannot satisfy personalized needs.
Disclosure of Invention
In view of the above, the present invention aims to provide a CNN-based face ornament generation method, apparatus and device that solve the problem that the existing approach of adding ornaments to a portrait by mapping easily produces incongruous and unnatural effects.
In order to achieve the above object, the present invention provides a CNN-based face ornament generation method, including:
inputting an image to be processed, and preprocessing the image to be processed to obtain a first face image, wherein the image to be processed comprises a face area;
determining a target area for modification in the first face image, and inputting the target area to a pre-trained jewelry generation model after superimposing a jewelry hand drawing on the target area to obtain a jewelry generation image;
inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image;
and performing portrait pose restoration and fusion processing on the first result image to obtain a second result image.
Preferably, the jewelry generation model comprises a freehand mapping model, a context coding model and a decoding model, wherein,
mapping the drawing process of the ornament hand drawing into jewelry characteristics through the hand drawing mapping model;
extracting the portrait characteristics around the ornament hand-drawing picture through the context coding model;
and fusing the jewelry characteristics and the portrait characteristics through the decoding model to generate the jewelry generating image.
Preferably, the hand-drawn mapping model comprises a multilayer perceptron and an adaptive mapping layer; mapping the drawing process of the jewelry hand drawing to jewelry features through the hand-drawn mapping model comprises the following steps:
cutting the jewelry hand drawing into a plurality of line segments, dividing the line segments into Q groups, each group comprising K connected line segments, and inputting each group of line segments into the multilayer perceptron to obtain a D-dimensional feature vector;
and mapping the D-dimensional feature vectors into a fixed-dimension jewelry feature map through the adaptive mapping layer.
Preferably, inputting the jewelry generation image and the first face image into the pre-trained jewelry fusion model for fusion to obtain the first result image comprises:
calculating the final output of the jewelry fusion model according to the formula O = I × (1 - O_a) + O_rgb × O_a, where O denotes the first result image, I denotes the input picture, O_rgb denotes the jewelry generation image, and O_a denotes the fusion ratio.
Preferably, the training process of the jewelry generation model comprises the following steps:
supervising the result image output by the jewelry generation model by utilizing a first loss function and a second loss function, wherein:

the first loss function is the pixel-level L1 loss:

L_pixel = (1 / (H × W)) × Σ_{i=1..H} Σ_{j=1..W} |O(i, j) - T(i, j)|

the second loss function is the window-weighted sketch loss:

L_sketch = Σ_{i,j} λ_{i,j} × |O_sketch(i, j) - I'_sketch(i, j)|

where O denotes the result image output by the jewelry generation model, T denotes the target image, H denotes the image height, W denotes the image width, W_win denotes the side length of the square window, O_sketch denotes the sketch of the model output, I'_sketch denotes the sketch image after key-position extraction, and λ_{i,j} denotes the window-dependent weight parameter.
Preferably, λ_{i,j} is calculated according to a window-dependent formula in which ζ is a sign function. [The defining equations for λ_{i,j} and ζ appear as images in the original filing.]
preferably, the training process of the jewelry fusion model includes:
supervising the result image output by the jewelry fusion model by utilizing a third loss function and a fourth loss function, wherein:

the third loss function is the pixel-level L1 fusion loss:

L_fuse = (1 / (H × W)) × Σ_{i,j} |F_o(i, j) - F_t(i, j)|

the fourth loss function is the perceptual loss:

L_percep = Σ_j ||φ_j(F_o) - φ_j(F_t)||_1

where F_o denotes the result image output by the jewelry fusion model, F_t denotes the target effect image, φ_j denotes the feature map output by the j-th block of the VGG16 network, H denotes the image height, and W denotes the image width.
In order to achieve the above object, the present invention further provides a CNN-based face ornament generation apparatus, including:
a preprocessing unit, used for receiving an image to be processed and preprocessing it to obtain a first face image, wherein the image to be processed comprises a face area;
a jewelry generation unit, used for determining a target area to be decorated in the first face image, superimposing a jewelry hand drawing on the target area, and inputting the result into a pre-trained jewelry generation model to obtain a jewelry generation image;
a first fusion unit, used for inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image;
and a second fusion unit, used for performing portrait pose restoration and fusion processing on the first result image to obtain a second result image.
In order to achieve the above object, the present invention also proposes an apparatus comprising a processor, a memory, and a computer program stored in the memory, the computer program being executed by the processor to implement the steps of a CNN-based face ornament generation method as described in the above embodiments.
In order to achieve the above object, the present invention also proposes a computer readable storage medium having a computer program stored thereon, the computer program being executed by a processor to implement the steps of a CNN-based face ornament generation method as described in the above embodiments.
Advantageous effects:
according to the scheme, the target area is selected in the face image, the ornament generation model and the ornament fusion model are adopted to generate the face ornament according to the ornament hand-drawing image, and the ornament generation is directly performed in the target area of the face image, so that the illumination condition, the portrait posture and other context information in the image can be fully utilized to generate the ornament, and the generated ornament is more natural and reasonable.
Because the jewelry is generated from the user's hand drawing, the scheme breaks free of the constraints of a material library while making better use of the background pixels around the jewelry, so the wearing effect is more reasonable and natural; at the same time, the user can customize the result by editing the hand drawing, obtaining a personalized jewelry effect that meets individual needs.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a CNN-based method for generating a facial ornament according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of a method for generating a facial ornament according to an embodiment of the present invention.
Fig. 3 is a schematic overall structure diagram of the jewelry generation model according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of an SMN according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a CEN according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a residual fourier convolution block according to an embodiment of the present invention.
Fig. 7 is a schematic structural diagram of DN according to an embodiment of the present invention.
Fig. 8 is a schematic structural diagram of a UCL module according to an embodiment of the present invention.
Fig. 9 is a schematic structural diagram of a jewelry fusion model according to an embodiment of the present invention.
Fig. 10 is a schematic structural diagram of a CNN-based face ornament generation apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention; the detailed description presented in the figures is not intended to limit the scope of the claimed invention but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the present invention.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
The present invention will be described in detail with reference to examples.
Because the existing face ornament generation approach requires materials to be prepared in advance, the user's choice is heavily constrained by the material library: the content available depends entirely on the materials, which are limited in number, and picking a favorite item out of a large library is itself time-consuming. In other words, the traditional approach either offers too few materials to satisfy users or forces them to spend too long searching, so personalized needs go unmet. In addition, the materials are generated independently of the specific portrait and bear no relation to it, so the illumination and face pose information of the portrait cannot be fully utilized, which in turn restricts the conditions under which the portrait may be photographed.
Based on this, the scheme selects a target area in the face image and uses a jewelry generation model and a jewelry fusion model to generate the face jewelry from a jewelry hand drawing. Contextual information in the image, such as the illumination conditions and portrait pose, is fully exploited, so that the generated jewelry is more natural and reasonable; and because generation is driven by the user's hand drawing, personalized needs can be met.
Fig. 1 is a schematic flow chart of a CNN-based method for generating a facial ornament according to an embodiment of the present invention.
In this embodiment, the method includes:
s11, inputting an image to be processed, and preprocessing the image to be processed to obtain a first face image, wherein the image to be processed comprises a face area.
S12, determining a target area to be decorated in the first face image, superimposing a jewelry hand drawing on the target area, and inputting the result into a pre-trained jewelry generation model to obtain a jewelry generation image.
S13, inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image.
S14, performing portrait pose restoration and fusion processing on the first result image to obtain a second result image.
Refer to the flow chart of the jewelry generation method shown in fig. 2. In this embodiment, preprocessing the image to be processed comprises: obtaining the face point set FP in the image to be processed I using an established CNN-based face detection and face alignment method, computing its bounding rectangle, and expanding it outwards to obtain the face cropping rectangle; deriving the rotation angle of the face from the cropping rectangle, and cropping the straightened face image F out of the image to be processed I. The face point set FP is simultaneously transformed into the coordinate frame of the face image F and recorded as fp.
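For illustration, this preprocessing step can be sketched in Python with OpenCV as below. This is a minimal sketch, not the patented implementation: detect_landmarks is a placeholder for any CNN-based face detection and alignment routine, and the expansion factor and the landmark indices used to estimate the rotation angle are assumptions.

    import cv2
    import numpy as np

    def preprocess(image, detect_landmarks, expand=0.3):
        # Face point set FP from a CNN-based detection/alignment method.
        fp = detect_landmarks(image).astype(np.float32)   # (N, 2)

        # Bounding rectangle of FP, expanded outwards to the crop rectangle.
        x, y, w, h = cv2.boundingRect(fp.astype(np.int32))
        cx, cy = x + w / 2.0, y + h / 2.0
        side = max(w, h) * (1.0 + expand)

        # Rotation angle estimated from two landmarks (indices are assumed;
        # the real choice depends on the landmark scheme in use).
        dx, dy = fp[1] - fp[0]
        angle = np.degrees(np.arctan2(dy, dx))

        # Straighten the face, then crop the face image F.
        M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
        rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
        x0, y0 = max(int(cx - side / 2), 0), max(int(cy - side / 2), 0)
        face = rotated[y0:int(y0 + side), x0:int(x0 + side)]

        # Transform FP into the coordinate frame of F (recorded as fp).
        pts = np.hstack([fp, np.ones((len(fp), 1), np.float32)])
        fp_local = pts @ M.T - np.array([x0, y0], np.float32)
        return face, fp_local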
The user selects the target wearing area of the jewelry in the face image F and inputs a jewelry hand drawing; the jewelry generation model then generates the jewelry from the hand drawing and the surrounding background information. Once the user confirms that the generated jewelry meets the preset condition, the image of the jewelry area is fused back into the face image F, and the final effect is refined by the jewelry fusion model so that the result looks more natural. Finally, the face is rotated back to the pose of the original image according to fp and fused in. Because the face jewelry is generated from a sketch drawn by the user, the user can keep adding new texture information in response to the model's output throughout the process, progressively refining the jewelry effect.
Further, the jewelry-generating model comprises a freehand mapping model, a context coding model and a decoding model, wherein,
mapping the drawing process of the ornament hand drawing into jewelry characteristics through the hand drawing mapping model;
extracting the portrait characteristics around the ornament hand-drawing picture through the context coding model;
and fusing the jewelry characteristics and the portrait characteristics through the decoding model to generate the jewelry generating image.
The overall structure of the jewelry generation model is shown in fig. 3. In this embodiment, the jewelry generation model (the ornament network, abbreviated OrN) is composed of three sub-networks. The first is the hand-drawing mapping model (SketchMappingNet, abbreviated SMN), which maps the drawing process of the user's hand drawing to jewelry features; the second is the context coding model (ContextEncodingNet, abbreviated CEN), which encodes the image information near the hand drawing; the third is the decoding model (DecoderNet, abbreviated DN), which fuses the features extracted by the two preceding sub-models and generates the jewelry generation image.
Further, the hand-drawn mapping model comprises a multilayer perceptron and an adaptive mapping layer; mapping the drawing process of the jewelry hand drawing to jewelry features through the hand-drawn mapping model comprises the following steps:
cutting the ornament hand-drawn picture into a plurality of line segments, dividing the line segments into Q groups, wherein each group comprises K connected line segments, and inputting each group of line segments into the multilayer perceptron to obtain a D-dimensional characteristic vector;
and mapping the D-dimensional feature vector into a fixed-dimension jewelry feature map through the self-adaptive mapping layer.
In this embodiment, the jewelry hand drawing is drawn by the user, and in most cases only a few areas contain pixels, so that used directly as CNN input it would be very sparse. To make the subsequent encoding more targeted and to extract the jewelry hand drawing effectively, the SMN instead takes the edges drawn by the user as input, avoiding the sparse-image-input problem. The role of the SMN is to receive a number of edges as input and map them into a feature map of fixed scale. To do this, the user's hand drawing is first cut into a number of straight line segments, each represented by a quadruple (x1, y1, x2, y2), and K connected edges at a time are fed into the SMN. Since a hand drawing generally yields many edges whose total number is not usually a multiple of K, the edge list is padded with zeros up to a multiple of K; suppose the padded number of edges is Q times K.
See fig. 4 for a schematic diagram of the SMN structure. The SMN consists of a multilayer perceptron (MLP) and an adaptive mapping layer (ADPL), executed separately. The MLP receives K edges at a time as input and produces a D-dimensional feature vector; when the total number of edges is Q × K, the outputs are concatenated into a Q × D tensor. The ADPL then maps this Q × D output to a fixed-dimension output for fusion with the CEN output. The ADPL itself consists of two parts, AdpPool and a fully connected layer: AdpPool divides the Q × D input into P shares and averages each share separately to obtain a P × 1 output, and the fully connected layer (Linear) maps the P × 1 input into an H × W × 1 feature map, matching the output scale of the CEN and the input scale of the DN.
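A minimal PyTorch sketch of this structure follows; the group size K, feature width D, pool count P, MLP hidden width and output scale are illustrative assumptions rather than the patent's actual configuration.

    import torch
    import torch.nn as nn

    class SMN(nn.Module):
        """Hand-drawing mapping model: MLP over groups of K edges + ADPL."""
        def __init__(self, k=16, d=128, p=64, h=32, w=32):
            super().__init__()
            self.k, self.p, self.hw = k, p, (h, w)
            self.mlp = nn.Sequential(            # K edges -> D-dim vector
                nn.Linear(k * 4, 256), nn.ReLU(),
                nn.Linear(256, d))
            self.linear = nn.Linear(p, h * w)    # ADPL: P x 1 -> H x W x 1

        def forward(self, edges):                # edges: (total, 4) quadruples
            total = edges.shape[0]
            q = -(-total // self.k)              # pad total up to Q * K
            edges = torch.cat([edges, edges.new_zeros(q * self.k - total, 4)])
            feats = self.mlp(edges.view(q, self.k * 4))      # (Q, D)
            # AdpPool: split the Q x D values into P shares, average each one.
            pooled = feats.reshape(self.p, -1).mean(dim=1)   # (P,)
            return self.linear(pooled).view(1, 1, *self.hw)  # fixed-scale map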
See fig. 5 for a schematic diagram of the CEN structure. The context coding model receives a 4-channel input, in which the first 3 channels are the RGB target-area map and the fourth channel is the jewelry hand drawing. The sub-model is built from stacked residual Fourier convolutions (Res-FFT-ConvBlock) and convolutional layers, and extracts the portrait features around the hand drawing. The structure of the residual Fourier convolution block is shown in fig. 6. It is designed to extract the context information around the hand drawing by combining convolution branches with different receptive fields: the Fourier convolution branch enlarges the receptive field while keeping the computational cost of the module low, and a small convolution kernel serves as a second branch to extract image features with a small receptive field, alongside a residual connection.
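The following PyTorch sketch illustrates the idea of fig. 6 under stated assumptions: the spectral branch applies 1×1 convolutions to the real and imaginary parts of the image spectrum (a global receptive field at low cost) in parallel with a small 3×3 spatial branch and a residual connection. Channel arrangement and kernel sizes are assumptions, not the patented layout.

    import torch
    import torch.nn as nn

    class ResFFTConvBlock(nn.Module):
        """Residual Fourier convolution block (illustrative reconstruction)."""
        def __init__(self, c):
            super().__init__()
            self.spectral = nn.Sequential(        # 1x1 convs in frequency space
                nn.Conv2d(2 * c, 2 * c, 1), nn.ReLU(),
                nn.Conv2d(2 * c, 2 * c, 1))
            self.local = nn.Sequential(           # small receptive-field branch
                nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
                nn.Conv2d(c, c, 3, padding=1))

        def forward(self, x):
            b, c, h, w = x.shape
            freq = torch.fft.rfft2(x, norm="ortho")
            freq = torch.cat([freq.real, freq.imag], dim=1)  # stack channels
            real, imag = self.spectral(freq).chunk(2, dim=1)
            glob = torch.fft.irfft2(torch.complex(real, imag),
                                    s=(h, w), norm="ortho")
            return x + glob + self.local(x)       # residual fusion of branches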
See fig. 7 for a schematic diagram of the DN structure. The main body of the decoder DN is built from convolutional layers; it receives as inputs the outputs of the context encoder (CEN) and the hand-drawing mapping model (SMN), while injecting noise signals layer by layer to enrich the texture details of the model output. The output of the DN is an RGB image. The UCL module inside the DN is an upscaling module that amplifies the SMN output stage by stage while filtering out the noise in it; its structure is shown in fig. 8. The UCL module consists mainly of a PixelShuffle module and a self-attention module. The PixelShuffle module maps an H × W × 4C input to a 2H × 2W × C output by rearranging the feature maps, where H denotes the feature-map height, W the feature-map width, and C the number of channels. In the figure, Conv-ReLU denotes convolution followed by ReLU activation, Conv-BN-Sigmoid denotes convolution, batch normalization and Sigmoid activation, and Conv-Leaky denotes convolution followed by LeakyReLU activation.
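An illustrative PyTorch sketch of the UCL module follows, assuming the Conv-BN-Sigmoid path of fig. 8 acts as a multiplicative attention gate; exact channel widths, kernel sizes and the LeakyReLU slope are assumptions.

    import torch.nn as nn

    class UCL(nn.Module):
        """Upscaling module: PixelShuffle + convolutional self-attention gate."""
        def __init__(self, c):
            super().__init__()
            self.shuffle = nn.PixelShuffle(2)     # (B,4C,H,W) -> (B,C,2H,2W)
            self.feat = nn.Sequential(            # Conv-ReLU branch
                nn.Conv2d(c, c, 3, padding=1), nn.ReLU())
            self.gate = nn.Sequential(            # Conv-BN-Sigmoid attention
                nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c), nn.Sigmoid())
            self.out = nn.Sequential(             # Conv-Leaky output
                nn.Conv2d(c, c, 3, padding=1), nn.LeakyReLU(0.2))

        def forward(self, x):                     # x: (B, 4C, H, W)
            x = self.shuffle(x)
            return self.out(self.feat(x) * self.gate(x))  # noise filtered out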
Further, inputting the jewelry generation image and the first face image into the pre-trained jewelry fusion model for fusion to obtain the first result image comprises:
calculating the final output of the jewelry fusion model according to the formula O = I × (1 - O_a) + O_rgb × O_a, where O denotes the first result image, I denotes the input picture (comprising the first face image and the jewelry generation image), O_rgb denotes the jewelry generation image, and O_a denotes the fusion ratio.
See fig. 9 for a schematic structural diagram of the jewelry fusion model. Because the role of the OrN is to generate the jewelry locally within the wearing area, the generated picture still has to be blended into the portrait in some proportion. Conventional effect fusion blends the RGB output of a model into the original image at a fixed ratio; although simple, this ignores the influence of the portrait's overall illumination on the wearing effect. The jewelry fusion model designed in this embodiment addresses this problem. Its structure consists of stacked convolution modules (ConvBlock) and residual modules (ResBlock); its inputs are the portrait and the jewelry region map (both of the same dimensions) and its output is the fusion ratio O_a. Denoting the output of the jewelry generation model by O_rgb, the input picture (comprising the first face image and the jewelry generation image) by I, and the final output by O, the result can be expressed as:

O = I × (1 - O_a) + O_rgb × O_a
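A minimal sketch of the fusion step follows: a backbone standing in for the ConvBlock/ResBlock stack of fig. 9 predicts the per-pixel ratio O_a, which then blends the input portrait with the generated jewelry. The feature channel count of the head is an assumption.

    import torch
    import torch.nn as nn

    class JewelryFusion(nn.Module):
        def __init__(self, backbone, feat_ch=64):
            super().__init__()
            self.backbone = backbone              # stacked ConvBlock/ResBlock
            self.head = nn.Sequential(            # predicts O_a in [0, 1]
                nn.Conv2d(feat_ch, 1, 3, padding=1), nn.Sigmoid())

        def forward(self, portrait, o_rgb):
            x = torch.cat([portrait, o_rgb], dim=1)    # same-size inputs
            o_a = self.head(self.backbone(x))          # fusion ratio
            return portrait * (1 - o_a) + o_rgb * o_a  # O = I(1-O_a)+O_rgb*O_a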
further, the training process of the jewelry generation model comprises the following steps:
supervising the result image output by the jewelry generation model by utilizing a first loss function and a second loss function, wherein:

the first loss function is the pixel-level L1 loss:

L_pixel = (1 / (H × W)) × Σ_{i=1..H} Σ_{j=1..W} |O(i, j) - T(i, j)|

the second loss function is the window-weighted sketch loss:

L_sketch = Σ_{i,j} λ_{i,j} × |O_sketch(i, j) - I'_sketch(i, j)|

where O denotes the result image output by the jewelry generation model, T denotes the target image, H denotes the image height, W denotes the image width, W_win denotes the side length of the square window, O_sketch denotes the sketch of the model output, I'_sketch denotes the sketch image after key-position extraction, and λ_{i,j} denotes the window-dependent weight parameter.
In a specific implementation, data must be collected in advance to train the models. The dataset consists of a large number of data pairs <I, I_d>, where I denotes a portrait without jewelry and I_d a portrait wearing jewelry; in this embodiment the data are collected through two channels, photographing models and traditional material pasting. From I_d, a hand-drawn sketch I_sketch of the jewelry area is obtained by methods such as a sketching model (a model whose purpose is to convert a color picture into a hand-drawn line drawing) and image edge detection. These data are then divided into a training set and a test set. During training, lines on the sketch are randomly erased and distorted, and curved lines are straightened, to simulate the conditions of manual hand drawing.
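The line-level augmentation can be sketched as below; the erase probability and jitter amplitude are illustrative assumptions.

    import random

    def augment_sketch(segments, drop_prob=0.1, jitter=2.0):
        """Simulate rough manual drawing: randomly erase and distort segments.

        segments: list of (x1, y1, x2, y2) straight line segments taken from
        I_sketch (curves are already broken into straight pieces).
        """
        out = []
        for seg in segments:
            if random.random() < drop_prob:       # random erasing
                continue
            out.append(tuple(v + random.uniform(-jitter, jitter)  # distortion
                             for v in seg))
        return out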
The output of the jewelry generation model is supervised by constructing loss functions; its output is compared with the target image at the pixel level using the loss:

L_pixel = (1 / (H × W)) × Σ_{i=1..H} Σ_{j=1..W} |O(i, j) - T(i, j)|
where O represents the model output, T represents the target image, H represents the image height, and W represents the image width.
The output result O is fed into a deep-learning-based sketching model to obtain the sketch O_sketch, namely:

O_sketch = Sketch(θ, O)

where Sketch denotes the sketching model and θ its parameters. Constrained by the input device and the user's drawing ability, the hand drawing input by the user is generally rough and differs somewhat from the outline of the real object. If the sketch of the model's generated result were required to match the user's input exactly, a suitable jewelry image would be hard to obtain in practical use; it suffices that the two are similar. To compute this similarity, the jewelry hand drawing input by the user is first filtered and the key positions are extracted for the loss calculation. The key positions are extracted as follows:
using side length W win The square window traverses the input hand-drawn graph and has a value v for the point (x, y) at the center of the window x,y Comprises the following steps:
Figure BDA0003852680140000102
wherein the content of the first and second substances,
Figure BDA0003852680140000103
the image with the key position extracted is recorded as I' skctch
The result generated by the jewelry generation model generally contains more detail than the user's hand drawing, so the sketch O_sketch obtained by feeding the model output into the sketching model should cover the key positions described above, while being allowed to add, within reason, details not described in I_sketch. Likewise, a square window of side length W_win traverses the image area; for convenience of expression, taking the centre of the window as the coordinate origin, the hand-drawing loss can be expressed as:

L_sketch = Σ_{i,j} λ_{i,j} × |O_sketch(i, j) - I'_sketch(i, j)|
where λ_{i,j} is a window-dependent parameter calculated from ζ, a sign function evaluated over the window. [The defining equations for λ_{i,j} and ζ appear as images in the original filing.]
the purpose of the window correlation function is to make the pixels of the key points get sufficient "attention", and at the same time, appropriately relax the requirements on the surrounding area, so that the hand-drawing generating the result is similar to the hand-drawing result of the user.
Further, the training process of the jewelry fusion model comprises the following steps:
supervising the result image output by the jewelry fusion model by utilizing a third loss function and a fourth loss function, wherein:

the third loss function is the pixel-level L1 fusion loss:

L_fuse = (1 / (H × W)) × Σ_{i,j} |F_o(i, j) - F_t(i, j)|

the fourth loss function is the perceptual loss:

L_percep = Σ_j ||φ_j(F_o) - φ_j(F_t)||_1

where F_o denotes the result image output by the jewelry fusion model, F_t denotes the target effect image, φ_j denotes the feature map output by the j-th block of the VGG16 network, H denotes the image height, and W denotes the image width.
In a specific implementation, the final output of the jewelry fusion model also needs to be supervised. For ease of presentation, F_o denotes the final output of the fusion model and F_t the target effect; the following loss functions are used for supervision:

Fusion loss:

L_fuse = (1 / (H × W)) × Σ_{i,j} |F_o(i, j) - F_t(i, j)|

This loss uses L1-Loss and supervises the final output of the jewelry fusion model at the pixel level.

Perceptual loss:

L_percep = Σ_j ||φ_j(F_o) - φ_j(F_t)||_1

where φ_j denotes the feature map output by the last convolutional layer of the j-th block when the image is passed through the VGG16 network.

Discrimination loss:

L_dis = -log(D_f(F_o, F_t))

where D_f is the discriminator, whose output is 1 when the picture is a real picture; this term pushes the generated result closer to a real picture.
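The three fusion-model terms can be sketched together as below. The VGG16 tap points used for φ_j are the usual indices into torchvision's feature stack, assumed rather than specified by the text, and the equal weighting between terms is likewise an assumption.

    import torch
    import torch.nn.functional as F
    from torchvision.models import vgg16

    _vgg = vgg16(weights="IMAGENET1K_V1").features.eval()
    for p in _vgg.parameters():
        p.requires_grad_(False)
    _blocks = {4, 9, 16, 23}          # assumed phi_j tap points

    def fusion_losses(f_o, f_t, discriminator):
        fuse = F.l1_loss(f_o, f_t)    # pixel-level L1 fusion loss

        percep, x, y = 0.0, f_o, f_t  # perceptual loss over VGG16 features
        for i, layer in enumerate(_vgg):
            x, y = layer(x), layer(y)
            if i in _blocks:
                percep = percep + F.l1_loss(x, y)
            if i == 23:               # last assumed tap point
                break

        # Discrimination loss: D_f outputs 1 for real pictures.
        dis = -torch.log(discriminator(f_o, f_t) + 1e-8).mean()
        return fuse + percep + dis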
With the above method, the jewelry is generated on the face image from the hand drawing input by the user. Freed from the constraints of a material library, the method makes better use of the background pixels around the jewelry, so the wearing effect is more reasonable and natural; at the same time, the user can customize the result by editing the hand drawing, obtaining a personalized jewelry effect.
Fig. 10 is a schematic structural diagram of a CNN-based face ornament generation apparatus according to an embodiment of the present invention.
In the present embodiment, the apparatus 10 includes:
a preprocessing unit 101, used for receiving an image to be processed and preprocessing it to obtain a first face image, wherein the image to be processed comprises a face area;
a jewelry generation unit 102, used for determining a target area to be decorated in the first face image, superimposing a jewelry hand drawing on the target area, and inputting the result into a pre-trained jewelry generation model to obtain a jewelry generation image;
a first fusion unit 103, used for inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image;
and a second fusion unit 104, used for performing portrait pose restoration and fusion processing on the first result image to obtain a second result image.
Each unit module of the apparatus 10 can execute the corresponding steps of the above method embodiment, so the individual unit modules are not described again here; refer to the descriptions of the corresponding steps above for details.
An embodiment of the present invention further provides an apparatus, where the apparatus includes the above-mentioned CNN-based face ornament generating device, where the CNN-based face ornament generating device may adopt the structure in the embodiment in fig. 10, and correspondingly, the technical solution in the embodiment of the method shown in fig. 1 may be implemented, and the implementation principle and the technical effect thereof are similar, and details may be referred to relevant descriptions in the above-mentioned embodiments, and are not described herein again.
The apparatus may be a device with a photographing function, such as a mobile phone, a digital camera or a tablet computer, a device with an image processing function, or a device with an image display function. The apparatus may include components such as a memory, a processor, an input unit, a display unit and a power supply.
The memory may be used to store software programs and modules, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required by at least one function (e.g., an image playing function), and the like, while the data storage area may store data created according to the use of the device. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory may further include a memory controller to provide the processor and the input unit with access to the memory.
The input unit may be used to receive input numeric or character or image information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. Specifically, the input unit of the present embodiment may include a touch-sensitive surface (e.g., a touch display screen) and other input devices in addition to the camera.
The display unit may be used to display information input by or provided to a user and various graphical user interfaces of the device, which may be made up of graphics, text, icons, video, and any combination thereof. The display unit may include a display panel, and optionally, the display panel may be configured in the form of an LCD (Liquid crystal display), an OLED (organic light-emitting diode), or the like. Further, the touch-sensitive surface may overlie the display panel, and when a touch operation is detected on or near the touch-sensitive surface, the touch operation is transmitted to the processor to determine the type of touch event, and the processor then provides a corresponding visual output on the display panel in accordance with the type of touch event.
An embodiment of the present invention further provides a computer-readable storage medium, which may be the computer-readable storage medium contained in the memory in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium has stored therein at least one instruction that is loaded and executed by a processor to implement the CNN-based face ornaments generation method shown in fig. 1. The computer readable storage medium may be a read-only memory, a magnetic or optical disk, or the like.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the apparatus embodiment, and the storage medium embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and reference may be made to some descriptions of the method embodiment for relevant points.
Also, in this document, the terms "comprise", "include" or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus that comprises a series of elements includes not only those elements but also other elements not explicitly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
While the foregoing specification illustrates and describes preferred embodiments of the present invention, it is to be understood that the invention is not limited to the forms disclosed herein; it may be used in various other combinations, modifications and environments, and may be altered within the scope of the inventive concept described herein, whether by the above teachings or by the skill or knowledge of the relevant art. Modifications and variations effected by those skilled in the art without departing from the spirit and scope of the invention fall within the protection scope of the appended claims.

Claims (10)

1. A CNN-based face ornament generation method is characterized by comprising the following steps:
inputting an image to be processed, and preprocessing the image to be processed to obtain a first face image, wherein the image to be processed comprises a face area;
determining a target area to be decorated in the first face image, superimposing a jewelry hand drawing on the target area, and inputting the result into a pre-trained jewelry generation model to obtain a jewelry generation image;
inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image;
and performing portrait pose restoration and fusion processing on the first result image to obtain a second result image.
2. The CNN-based face ornament generation method of claim 1, wherein the jewelry generation model comprises a freehand mapping model, a context coding model and a decoding model, wherein:
mapping the drawing process of the ornament hand drawing into jewelry characteristics through the hand drawing mapping model;
extracting the portrait characteristics around the ornament hand-drawing picture through the context coding model;
and fusing the jewelry characteristics and the portrait characteristics through the decoding model to generate the jewelry generating image.
3. The CNN-based face ornament generation method of claim 2, wherein the freehand mapping model comprises a multilayer perceptron and an adaptive mapping layer; mapping the drawing process of the jewelry hand drawing to jewelry features through the freehand mapping model comprises the following steps:
cutting the ornament hand-drawn picture into a plurality of line segments, dividing the line segments into Q groups, wherein each group comprises K connected line segments, and inputting each group of line segments into the multilayer perceptron to obtain a D-dimensional feature vector;
and mapping the D-dimensional feature vector into a fixed-dimension jewelry feature map through the self-adaptive mapping layer.
4. The CNN-based face ornament generation method of claim 1, wherein inputting the jewelry generation image and the first face image into the pre-trained jewelry fusion model for fusion to obtain the first result image comprises:
calculating the final output of the jewelry fusion model according to the formula O = I × (1 - O_a) + O_rgb × O_a, wherein O denotes the first result image, I denotes the input picture, O_rgb denotes the jewelry generation image, and O_a denotes the fusion ratio.
5. The CNN-based face ornament generation method of claim 1, wherein the training process of the ornament generation model comprises:
supervising the result image output by the jewelry generation model by utilizing a first loss function and a second loss function, wherein:

the first loss function is the pixel-level L1 loss:

L_pixel = (1 / (H × W)) × Σ_{i=1..H} Σ_{j=1..W} |O(i, j) - T(i, j)|

the second loss function is the window-weighted sketch loss:

L_sketch = Σ_{i,j} λ_{i,j} × |O_sketch(i, j) - I'_sketch(i, j)|

wherein O denotes the result image output by the jewelry generation model, T denotes the target image, H denotes the image height, W denotes the image width, W_win denotes the side length of the square window, O_sketch denotes the sketch of the model output, I'_sketch denotes the sketch image after key-position extraction, and λ_{i,j} denotes the window-dependent weight parameter.
6. The CNN-based face ornament generation method of claim 5, wherein λ_{i,j} is calculated according to a window-dependent formula in which ζ is a sign function. [The defining equations for λ_{i,j} and ζ appear as images in the original filing.]
7. The CNN-based face ornament generation method of claim 1, wherein the training process of the jewelry fusion model comprises:
supervising the result image output by the jewelry fusion model by utilizing a third loss function and a fourth loss function, wherein:

the third loss function is the pixel-level L1 fusion loss:

L_fuse = (1 / (H × W)) × Σ_{i,j} |F_o(i, j) - F_t(i, j)|

the fourth loss function is the perceptual loss:

L_percep = Σ_j ||φ_j(F_o) - φ_j(F_t)||_1

wherein F_o denotes the result image output by the jewelry fusion model, F_t denotes the target effect image, φ_j denotes the feature map output by the j-th block of the VGG16 network, H denotes the image height, and W denotes the image width.
8. A CNN-based face ornament generation apparatus, comprising:
a preprocessing unit, used for receiving an image to be processed and preprocessing it to obtain a first face image, wherein the image to be processed comprises a face area;
a jewelry generation unit, used for determining a target area to be decorated in the first face image, superimposing a jewelry hand drawing on the target area, and inputting the result into a pre-trained jewelry generation model to obtain a jewelry generation image;
a first fusion unit, used for inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image;
and a second fusion unit, used for performing portrait pose restoration and fusion processing on the first result image to obtain a second result image.
9. An apparatus comprising a processor, a memory, and a computer program stored in the memory, the computer program being executed by the processor to implement the steps of the CNN-based face ornament generation method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program, the computer program being executed by a processor to implement the steps of the CNN-based face ornament generation method of any one of claims 1 to 7.
CN202211137358.XA 2022-09-19 2022-09-19 CNN-based face ornament generation method, device and equipment Pending CN115601459A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211137358.XA CN115601459A (en) 2022-09-19 2022-09-19 CNN-based face ornament generation method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211137358.XA CN115601459A (en) 2022-09-19 2022-09-19 CNN-based face ornament generation method, device and equipment

Publications (1)

Publication Number Publication Date
CN115601459A 2023-01-13

Family

ID=84843937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211137358.XA Pending CN115601459A (en) 2022-09-19 2022-09-19 CNN-based face ornament generation method, device and equipment

Country Status (1)

Country Link
CN (1) CN115601459A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination