CN115601459A - CNN-based face ornament generation method, device and equipment


Info

Publication number
CN115601459A
Authority
CN
China
Prior art keywords
image, jewelry, model, ornament, face
Prior art date
Legal status
Pending
Application number
CN202211137358.XA
Other languages
Chinese (zh)
Inventor
周勉
尚伟艺
刘洛麒
Current Assignee
Xiamen Meitu Technology Co Ltd
Original Assignee
Xiamen Meitu Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Meitu Technology Co Ltd
Priority to CN202211137358.XA
Publication of CN115601459A


Classifications

    • G06T 11/00 2D [Two Dimensional] image generation
    • G06N 3/02 Neural networks; G06N 3/08 Learning methods
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06V 10/24 Aligning, centring, orientation detection or correction of the image
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • G06V 40/161 Human faces: detection; localisation; normalisation
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20221 Image fusion; Image merging
    • G06T 2207/30201 Face

Abstract

The invention discloses a CNN-based face ornament generation method, apparatus, device and storage medium, comprising the following steps: receiving an image to be processed that contains a face area, and preprocessing it to obtain a first face image; determining a target area to be decorated in the first face image, superimposing a jewelry hand drawing on the target area, and inputting the result into a pre-trained jewelry generation model to obtain a jewelry generation image; inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image; and performing portrait pose restoration and fusion processing on the first result image to obtain a second result image. The method makes full use of contextual information in the image, such as illumination conditions and portrait pose, so that the generated jewelry looks more natural and reasonable.

Description

CNN-based face ornament generation method, device and equipment
Technical Field
The invention relates to the technical field of image processing, and in particular to a CNN-based face ornament generation method, apparatus and device.
Background
Face ornaments include items worn on the face such as glasses, false eyelashes, cosmetic contact lenses and earrings; wearing them can noticeably enhance the attractiveness of a portrait. Traditionally, finding suitable ornaments required spending a great deal of time trying them on in stores or through online shopping, a time-consuming and laborious process. To save time, computer simulation has been used to attach jewelry to a target person's face for virtual try-on, so that the wearing effect can be previewed in advance. The conventional simulation approach pastes existing jewelry material onto the target area of the portrait and applies a fusion algorithm to make the final effect look natural. However, adding ornaments to a portrait by such mapping tends to produce incongruous and unnatural results, imposes strict requirements on how the portrait is photographed, and cannot satisfy personalized needs.
Disclosure of Invention
In view of the above, the present invention aims to provide a CNN-based face ornament generation method, apparatus and device that solve the problem that the existing approach of adding ornaments to a portrait by mapping easily produces incongruous and unnatural effects.
In order to achieve the above object, the present invention provides a CNN-based face ornament generation method, including:
inputting an image to be processed, and preprocessing the image to be processed to obtain a first face image, wherein the image to be processed comprises a face area;
determining a target area for modification in the first face image, and inputting the target area to a pre-trained jewelry generation model after superimposing a jewelry hand drawing on the target area to obtain a jewelry generation image;
inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image;
and performing portrait pose restoration and fusion processing on the first result image to obtain a second result image.
Preferably, the jewelry generation model comprises a freehand mapping model, a context coding model and a decoding model, wherein,
mapping the drawing process of the ornament hand drawing into jewelry characteristics through the hand drawing mapping model;
extracting the portrait characteristics around the ornament hand-drawing picture through the context coding model;
and fusing the jewelry characteristics and the portrait characteristics through the decoding model to generate the jewelry generating image.
Preferably, the hand-drawn mapping model comprises a multilayer perceptron and an adaptive mapping layer; mapping the drawing process of the jewelry hand drawing to jewelry features through the hand-drawn mapping model comprises the following steps:
cutting the jewelry hand drawing into a plurality of line segments, dividing the line segments into Q groups, each group comprising K connected line segments, and inputting each group of line segments into the multilayer perceptron to obtain a D-dimensional feature vector;
and mapping the D-dimensional feature vectors into a fixed-dimension jewelry feature map through the adaptive mapping layer.
Preferably, inputting the jewelry generation image and the first face image into the pre-trained jewelry fusion model for fusion to obtain the first result image comprises:
calculating the final output of the jewelry fusion model according to the formula O = I × (1 - O_a) + O_rgb × O_a, where O denotes the first result image, I denotes the input picture, O_rgb denotes the jewelry generation image, and O_a denotes the fusion ratio.
Preferably, the training process of the jewelry generation model comprises the following steps:
supervising the result image output by the jewelry generation model by utilizing a first loss function and a second loss function, wherein:

the first loss function is the pixel-level L1 loss:

L_pixel = (1 / (H × W)) × Σ_{i=1..H} Σ_{j=1..W} |O(i, j) - T(i, j)|

the second loss function is the window-weighted sketch loss:

L_sketch = Σ_{i,j} λ_{i,j} × |O_sketch(i, j) - I'_sketch(i, j)|

where O denotes the result image output by the jewelry generation model, T denotes the target image, H denotes the image height, W denotes the image width, W_win denotes the side length of the square window, O_sketch denotes the sketch of the model output, I'_sketch denotes the sketch image after key-position extraction, and λ_{i,j} denotes the window-dependent weight parameter.
Preferably, λ_{i,j} is calculated according to a window-dependent formula in which ζ is a sign function. [The defining equations for λ_{i,j} and ζ appear as images in the original filing.]
preferably, the training process of the jewelry fusion model includes:
supervising the result image output by the jewelry fusion model by utilizing a third loss function and a fourth loss function, wherein:

the third loss function is the pixel-level L1 fusion loss:

L_fuse = (1 / (H × W)) × Σ_{i,j} |F_o(i, j) - F_t(i, j)|

the fourth loss function is the perceptual loss:

L_percep = Σ_j ||φ_j(F_o) - φ_j(F_t)||_1

where F_o denotes the result image output by the jewelry fusion model, F_t denotes the target effect image, φ_j denotes the feature map output by the j-th block of the VGG16 network, H denotes the image height, and W denotes the image width.
In order to achieve the above object, the present invention further provides a CNN-based face ornament generation apparatus, including:
a preprocessing unit, used for receiving an image to be processed and preprocessing it to obtain a first face image, wherein the image to be processed comprises a face area;
a jewelry generation unit, used for determining a target area to be decorated in the first face image, superimposing a jewelry hand drawing on the target area, and inputting the result into a pre-trained jewelry generation model to obtain a jewelry generation image;
a first fusion unit, used for inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image;
and a second fusion unit, used for performing portrait pose restoration and fusion processing on the first result image to obtain a second result image.
In order to achieve the above object, the present invention also proposes an apparatus comprising a processor, a memory, and a computer program stored in the memory, the computer program being executed by the processor to implement the steps of a CNN-based face ornament generation method as described in the above embodiments.
In order to achieve the above object, the present invention also proposes a computer readable storage medium having a computer program stored thereon, the computer program being executed by a processor to implement the steps of a CNN-based face ornament generation method as described in the above embodiments.
Advantageous effects:
according to the scheme, the target area is selected in the face image, the ornament generation model and the ornament fusion model are adopted to generate the face ornament according to the ornament hand-drawing image, and the ornament generation is directly performed in the target area of the face image, so that the illumination condition, the portrait posture and other context information in the image can be fully utilized to generate the ornament, and the generated ornament is more natural and reasonable.
Because the jewelry is generated from the user's hand drawing, the scheme breaks free of the constraints of a material library while making better use of the background pixels around the jewelry, so the wearing effect is more reasonable and natural; at the same time, the user can customize the result by editing the hand drawing, obtaining a personalized jewelry effect that meets individual needs.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a CNN-based method for generating a facial ornament according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of a method for generating a facial ornament according to an embodiment of the present invention.
Fig. 3 is a schematic overall structure diagram of the jewelry generation model according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of an SMN according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a CEN according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a residual fourier convolution block according to an embodiment of the present invention.
Fig. 7 is a schematic structural diagram of DN according to an embodiment of the present invention.
Fig. 8 is a schematic structural diagram of a UCL module according to an embodiment of the present invention.
Fig. 9 is a schematic structural diagram of a jewelry fusion model according to an embodiment of the present invention.
Fig. 10 is a schematic structural diagram of a CNN-based face ornament generation apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention; the detailed description presented in the figures is not intended to limit the scope of the claimed invention but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the present invention.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
The present invention will be described in detail with reference to examples.
Because the existing face ornament generation approach requires materials to be prepared in advance, the user's choice is heavily constrained by the material library: the content available depends entirely on the materials, which are limited in number, and picking a favorite item out of a large library is itself time-consuming. In other words, the traditional approach either offers too few materials to satisfy users or forces them to spend too long searching, so personalized needs go unmet. In addition, the materials are generated independently of the specific portrait and bear no relation to it, so the illumination and face pose information of the portrait cannot be fully utilized, which in turn restricts the conditions under which the portrait may be photographed.
Based on this, the scheme selects a target area in the face image and uses a jewelry generation model and a jewelry fusion model to generate the face jewelry from a jewelry hand drawing. Contextual information in the image, such as the illumination conditions and portrait pose, is fully exploited, so that the generated jewelry is more natural and reasonable; and because generation is driven by the user's hand drawing, personalized needs can be met.
Fig. 1 is a schematic flow chart of a CNN-based method for generating a facial ornament according to an embodiment of the present invention.
In this embodiment, the method includes:
s11, inputting an image to be processed, and preprocessing the image to be processed to obtain a first face image, wherein the image to be processed comprises a face area.
S12, determining a target area to be decorated in the first face image, superimposing a jewelry hand drawing on the target area, and inputting the result into a pre-trained jewelry generation model to obtain a jewelry generation image.
S13, inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image.
S14, performing portrait pose restoration and fusion processing on the first result image to obtain a second result image.
Refer to the flow chart of the jewelry generation method shown in fig. 2. In this embodiment, preprocessing the image to be processed comprises: obtaining the face point set FP in the image to be processed I using an established CNN-based face detection and face alignment method, computing its bounding rectangle, and expanding it outwards to obtain the face cropping rectangle; deriving the rotation angle of the face from the cropping rectangle, and cropping the straightened face image F out of the image to be processed I. The face point set FP is simultaneously transformed into the coordinate frame of the face image F and recorded as fp.
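For illustration, this preprocessing step can be sketched in Python with OpenCV as below. This is a minimal sketch, not the patented implementation: detect_landmarks is a placeholder for any CNN-based face detection and alignment routine, and the expansion factor and the landmark indices used to estimate the rotation angle are assumptions.

    import cv2
    import numpy as np

    def preprocess(image, detect_landmarks, expand=0.3):
        # Face point set FP from a CNN-based detection/alignment method.
        fp = detect_landmarks(image).astype(np.float32)   # (N, 2)

        # Bounding rectangle of FP, expanded outwards to the crop rectangle.
        x, y, w, h = cv2.boundingRect(fp.astype(np.int32))
        cx, cy = x + w / 2.0, y + h / 2.0
        side = max(w, h) * (1.0 + expand)

        # Rotation angle estimated from two landmarks (indices are assumed;
        # the real choice depends on the landmark scheme in use).
        dx, dy = fp[1] - fp[0]
        angle = np.degrees(np.arctan2(dy, dx))

        # Straighten the face, then crop the face image F.
        M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
        rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
        x0, y0 = max(int(cx - side / 2), 0), max(int(cy - side / 2), 0)
        face = rotated[y0:int(y0 + side), x0:int(x0 + side)]

        # Transform FP into the coordinate frame of F (recorded as fp).
        pts = np.hstack([fp, np.ones((len(fp), 1), np.float32)])
        fp_local = pts @ M.T - np.array([x0, y0], np.float32)
        return face, fp_local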
The user selects the target wearing area of the jewelry in the face image F and inputs a jewelry hand drawing; the jewelry generation model then generates the jewelry from the hand drawing and the surrounding background information. Once the user confirms that the generated jewelry meets the preset condition, the image of the jewelry area is fused back into the face image F, and the final effect is refined by the jewelry fusion model so that the result looks more natural. Finally, the face is rotated back to the pose of the original image according to fp and fused in. Because the face jewelry is generated from a sketch drawn by the user, the user can keep adding new texture information in response to the model's output throughout the process, progressively refining the jewelry effect.
Further, the jewelry-generating model comprises a freehand mapping model, a context coding model and a decoding model, wherein,
mapping the drawing process of the ornament hand drawing into jewelry characteristics through the hand drawing mapping model;
extracting the portrait characteristics around the ornament hand-drawing picture through the context coding model;
and fusing the jewelry characteristics and the portrait characteristics through the decoding model to generate the jewelry generating image.
The overall structure of the jewelry generation model is shown in fig. 3. In this embodiment, the jewelry generation model (the ornament network, abbreviated OrN) is composed of three sub-networks. The first is the hand-drawing mapping model (SketchMappingNet, abbreviated SMN), which maps the drawing process of the user's hand drawing to jewelry features; the second is the context coding model (ContextEncodingNet, abbreviated CEN), which encodes the image information near the hand drawing; the third is the decoding model (DecoderNet, abbreviated DN), which fuses the features extracted by the two preceding sub-models and generates the jewelry generation image.
Further, the hand-drawn mapping model comprises a multilayer perceptron and an adaptive mapping layer; mapping the drawing process of the jewelry hand drawing to jewelry features through the hand-drawn mapping model comprises the following steps:
cutting the ornament hand-drawn picture into a plurality of line segments, dividing the line segments into Q groups, wherein each group comprises K connected line segments, and inputting each group of line segments into the multilayer perceptron to obtain a D-dimensional characteristic vector;
and mapping the D-dimensional feature vector into a fixed-dimension jewelry feature map through the self-adaptive mapping layer.
In this embodiment, the jewelry hand drawing is drawn by the user, and in most cases only a few areas contain pixels, so that used directly as CNN input it would be very sparse. To make the subsequent encoding more targeted and to extract the jewelry hand drawing effectively, the SMN instead takes the edges drawn by the user as input, avoiding the sparse-image-input problem. The role of the SMN is to receive a number of edges as input and map them into a feature map of fixed scale. To do this, the user's hand drawing is first cut into a number of straight line segments, each represented by a quadruple (x1, y1, x2, y2), and K connected edges at a time are fed into the SMN. Since a hand drawing generally yields many edges whose total number is not usually a multiple of K, the edge list is padded with zeros up to a multiple of K; suppose the padded number of edges is Q times K.
See fig. 4 for a schematic diagram of the SMN structure. The SMN consists of a multilayer perceptron (MLP) and an adaptive mapping layer (ADPL), executed separately. The MLP receives K edges at a time as input and produces a D-dimensional feature vector; when the total number of edges is Q × K, the outputs are concatenated into a Q × D tensor. The ADPL then maps this Q × D output to a fixed-dimension output for fusion with the CEN output. The ADPL itself consists of two parts, AdpPool and a fully connected layer: AdpPool divides the Q × D input into P shares and averages each share separately to obtain a P × 1 output, and the fully connected layer (Linear) maps the P × 1 input into an H × W × 1 feature map, matching the output scale of the CEN and the input scale of the DN.
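A minimal PyTorch sketch of this structure follows; the group size K, feature width D, pool count P, MLP hidden width and output scale are illustrative assumptions rather than the patent's actual configuration.

    import torch
    import torch.nn as nn

    class SMN(nn.Module):
        """Hand-drawing mapping model: MLP over groups of K edges + ADPL."""
        def __init__(self, k=16, d=128, p=64, h=32, w=32):
            super().__init__()
            self.k, self.p, self.hw = k, p, (h, w)
            self.mlp = nn.Sequential(            # K edges -> D-dim vector
                nn.Linear(k * 4, 256), nn.ReLU(),
                nn.Linear(256, d))
            self.linear = nn.Linear(p, h * w)    # ADPL: P x 1 -> H x W x 1

        def forward(self, edges):                # edges: (total, 4) quadruples
            total = edges.shape[0]
            q = -(-total // self.k)              # pad total up to Q * K
            edges = torch.cat([edges, edges.new_zeros(q * self.k - total, 4)])
            feats = self.mlp(edges.view(q, self.k * 4))      # (Q, D)
            # AdpPool: split the Q x D values into P shares, average each one.
            pooled = feats.reshape(self.p, -1).mean(dim=1)   # (P,)
            return self.linear(pooled).view(1, 1, *self.hw)  # fixed-scale map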
See fig. 5 for a schematic diagram of the CEN structure. The context coding model receives a 4-channel input, in which the first 3 channels are the RGB target-area map and the fourth channel is the jewelry hand drawing. The sub-model is built from stacked residual Fourier convolutions (Res-FFT-ConvBlock) and convolutional layers, and extracts the portrait features around the hand drawing. The structure of the residual Fourier convolution block is shown in fig. 6. It is designed to extract the context information around the hand drawing by combining convolution branches with different receptive fields: the Fourier convolution branch enlarges the receptive field while keeping the computational cost of the module low, and a small convolution kernel serves as a second branch to extract image features with a small receptive field, alongside a residual connection.
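The following PyTorch sketch illustrates the idea of fig. 6 under stated assumptions: the spectral branch applies 1×1 convolutions to the real and imaginary parts of the image spectrum (a global receptive field at low cost) in parallel with a small 3×3 spatial branch and a residual connection. Channel arrangement and kernel sizes are assumptions, not the patented layout.

    import torch
    import torch.nn as nn

    class ResFFTConvBlock(nn.Module):
        """Residual Fourier convolution block (illustrative reconstruction)."""
        def __init__(self, c):
            super().__init__()
            self.spectral = nn.Sequential(        # 1x1 convs in frequency space
                nn.Conv2d(2 * c, 2 * c, 1), nn.ReLU(),
                nn.Conv2d(2 * c, 2 * c, 1))
            self.local = nn.Sequential(           # small receptive-field branch
                nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
                nn.Conv2d(c, c, 3, padding=1))

        def forward(self, x):
            b, c, h, w = x.shape
            freq = torch.fft.rfft2(x, norm="ortho")
            freq = torch.cat([freq.real, freq.imag], dim=1)  # stack channels
            real, imag = self.spectral(freq).chunk(2, dim=1)
            glob = torch.fft.irfft2(torch.complex(real, imag),
                                    s=(h, w), norm="ortho")
            return x + glob + self.local(x)       # residual fusion of branches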
See fig. 7 for a schematic diagram of the DN structure. The main body of the decoder DN is built from convolutional layers; it receives as inputs the outputs of the context encoder (CEN) and the hand-drawing mapping model (SMN), while injecting noise signals layer by layer to enrich the texture details of the model output. The output of the DN is an RGB image. The UCL module inside the DN is an upscaling module that amplifies the SMN output stage by stage while filtering out the noise in it; its structure is shown in fig. 8. The UCL module consists mainly of a PixelShuffle module and a self-attention module. The PixelShuffle module maps an H × W × 4C input to a 2H × 2W × C output by rearranging the feature maps, where H denotes the feature-map height, W the feature-map width, and C the number of channels. In the figure, Conv-ReLU denotes convolution followed by ReLU activation, Conv-BN-Sigmoid denotes convolution, batch normalization and Sigmoid activation, and Conv-Leaky denotes convolution followed by LeakyReLU activation.
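An illustrative PyTorch sketch of the UCL module follows, assuming the Conv-BN-Sigmoid path of fig. 8 acts as a multiplicative attention gate; exact channel widths, kernel sizes and the LeakyReLU slope are assumptions.

    import torch.nn as nn

    class UCL(nn.Module):
        """Upscaling module: PixelShuffle + convolutional self-attention gate."""
        def __init__(self, c):
            super().__init__()
            self.shuffle = nn.PixelShuffle(2)     # (B,4C,H,W) -> (B,C,2H,2W)
            self.feat = nn.Sequential(            # Conv-ReLU branch
                nn.Conv2d(c, c, 3, padding=1), nn.ReLU())
            self.gate = nn.Sequential(            # Conv-BN-Sigmoid attention
                nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c), nn.Sigmoid())
            self.out = nn.Sequential(             # Conv-Leaky output
                nn.Conv2d(c, c, 3, padding=1), nn.LeakyReLU(0.2))

        def forward(self, x):                     # x: (B, 4C, H, W)
            x = self.shuffle(x)
            return self.out(self.feat(x) * self.gate(x))  # noise filtered out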
Further, inputting the jewelry generation image and the first face image into the pre-trained jewelry fusion model for fusion to obtain the first result image comprises:
calculating the final output of the jewelry fusion model according to the formula O = I × (1 - O_a) + O_rgb × O_a, where O denotes the first result image, I denotes the input picture (comprising the first face image and the jewelry generation image), O_rgb denotes the jewelry generation image, and O_a denotes the fusion ratio.
See fig. 9 for a schematic structural diagram of the jewelry fusion model. Because the role of the OrN is to generate the jewelry locally within the wearing area, the generated picture still has to be blended into the portrait in some proportion. Conventional effect fusion blends the RGB output of a model into the original image at a fixed ratio; although simple, this ignores the influence of the portrait's overall illumination on the wearing effect. The jewelry fusion model designed in this embodiment addresses this problem. Its structure consists of stacked convolution modules (ConvBlock) and residual modules (ResBlock); its inputs are the portrait and the jewelry region map (both of the same dimensions) and its output is the fusion ratio O_a. Denoting the output of the jewelry generation model by O_rgb, the input picture (comprising the first face image and the jewelry generation image) by I, and the final output by O, the result can be expressed as:

O = I × (1 - O_a) + O_rgb × O_a
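A minimal sketch of the fusion step follows: a backbone standing in for the ConvBlock/ResBlock stack of fig. 9 predicts the per-pixel ratio O_a, which then blends the input portrait with the generated jewelry. The feature channel count of the head is an assumption.

    import torch
    import torch.nn as nn

    class JewelryFusion(nn.Module):
        def __init__(self, backbone, feat_ch=64):
            super().__init__()
            self.backbone = backbone              # stacked ConvBlock/ResBlock
            self.head = nn.Sequential(            # predicts O_a in [0, 1]
                nn.Conv2d(feat_ch, 1, 3, padding=1), nn.Sigmoid())

        def forward(self, portrait, o_rgb):
            x = torch.cat([portrait, o_rgb], dim=1)    # same-size inputs
            o_a = self.head(self.backbone(x))          # fusion ratio
            return portrait * (1 - o_a) + o_rgb * o_a  # O = I(1-O_a)+O_rgb*O_a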
further, the training process of the jewelry generation model comprises the following steps:
supervising the result image output by the jewelry generation model by utilizing a first loss function and a second loss function, wherein:

the first loss function is the pixel-level L1 loss:

L_pixel = (1 / (H × W)) × Σ_{i=1..H} Σ_{j=1..W} |O(i, j) - T(i, j)|

the second loss function is the window-weighted sketch loss:

L_sketch = Σ_{i,j} λ_{i,j} × |O_sketch(i, j) - I'_sketch(i, j)|

where O denotes the result image output by the jewelry generation model, T denotes the target image, H denotes the image height, W denotes the image width, W_win denotes the side length of the square window, O_sketch denotes the sketch of the model output, I'_sketch denotes the sketch image after key-position extraction, and λ_{i,j} denotes the window-dependent weight parameter.
In a specific implementation, data must be collected in advance to train the models. The dataset consists of a large number of data pairs <I, I_d>, where I denotes a portrait without jewelry and I_d a portrait wearing jewelry; in this embodiment the data are collected through two channels, photographing models and traditional material pasting. From I_d, a hand-drawn sketch I_sketch of the jewelry area is obtained by methods such as a sketching model (a model whose purpose is to convert a color picture into a hand-drawn line drawing) and image edge detection. These data are then divided into a training set and a test set. During training, lines on the sketch are randomly erased and distorted, and curved lines are straightened, to simulate the conditions of manual hand drawing.
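The line-level augmentation can be sketched as below; the erase probability and jitter amplitude are illustrative assumptions.

    import random

    def augment_sketch(segments, drop_prob=0.1, jitter=2.0):
        """Simulate rough manual drawing: randomly erase and distort segments.

        segments: list of (x1, y1, x2, y2) straight line segments taken from
        I_sketch (curves are already broken into straight pieces).
        """
        out = []
        for seg in segments:
            if random.random() < drop_prob:       # random erasing
                continue
            out.append(tuple(v + random.uniform(-jitter, jitter)  # distortion
                             for v in seg))
        return out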
The output of the jewelry generation model is supervised by constructing loss functions; its output is compared with the target image at the pixel level using the loss:

L_pixel = (1 / (H × W)) × Σ_{i=1..H} Σ_{j=1..W} |O(i, j) - T(i, j)|
where O represents the model output, T represents the target image, H represents the image height, and W represents the image width.
The output result O is fed into a deep-learning-based sketching model to obtain the sketch O_sketch, namely:

O_sketch = Sketch(θ, O)

where Sketch denotes the sketching model and θ its parameters. Constrained by the input device and the user's drawing ability, the hand drawing input by the user is generally rough and differs somewhat from the outline of the real object. If the sketch of the model's generated result were required to match the user's input exactly, a suitable jewelry image would be hard to obtain in practical use; it suffices that the two are similar. To compute this similarity, the jewelry hand drawing input by the user is first filtered and the key positions are extracted for the loss calculation. The key positions are extracted as follows:
using side length W win The square window traverses the input hand-drawn graph and has a value v for the point (x, y) at the center of the window x,y Comprises the following steps:
Figure BDA0003852680140000102
wherein the content of the first and second substances,
Figure BDA0003852680140000103
the image with the key position extracted is recorded as I' skctch
The result generated by the jewelry generation model generally contains more detail than the user's hand drawing, so the sketch O_sketch obtained by feeding the model output into the sketching model should cover the key positions described above, while being allowed to add, within reason, details not described in I_sketch. Likewise, a square window of side length W_win traverses the image area; for convenience of expression, taking the centre of the window as the coordinate origin, the hand-drawing loss can be expressed as:

L_sketch = Σ_{i,j} λ_{i,j} × |O_sketch(i, j) - I'_sketch(i, j)|
where λ_{i,j} is a window-dependent parameter calculated from ζ, a sign function evaluated over the window. [The defining equations for λ_{i,j} and ζ appear as images in the original filing.]
the purpose of the window correlation function is to make the pixels of the key points get sufficient "attention", and at the same time, appropriately relax the requirements on the surrounding area, so that the hand-drawing generating the result is similar to the hand-drawing result of the user.
Further, the training process of the jewelry fusion model comprises the following steps:
supervising the result image output by the jewelry fusion model by utilizing a third loss function and a fourth loss function, wherein:

the third loss function is the pixel-level L1 fusion loss:

L_fuse = (1 / (H × W)) × Σ_{i,j} |F_o(i, j) - F_t(i, j)|

the fourth loss function is the perceptual loss:

L_percep = Σ_j ||φ_j(F_o) - φ_j(F_t)||_1

where F_o denotes the result image output by the jewelry fusion model, F_t denotes the target effect image, φ_j denotes the feature map output by the j-th block of the VGG16 network, H denotes the image height, and W denotes the image width.
In a specific implementation, the final output of the jewelry fusion model also needs to be supervised. For ease of presentation, F_o denotes the final output of the fusion model and F_t the target effect; the following loss functions are used for supervision:

Fusion loss:

L_fuse = (1 / (H × W)) × Σ_{i,j} |F_o(i, j) - F_t(i, j)|

This loss uses L1-Loss and supervises the final output of the jewelry fusion model at the pixel level.

Perceptual loss:

L_percep = Σ_j ||φ_j(F_o) - φ_j(F_t)||_1

where φ_j denotes the feature map output by the last convolutional layer of the j-th block when the image is passed through the VGG16 network.

Discrimination loss:

L_dis = -log(D_f(F_o, F_t))

where D_f is the discriminator, whose output is 1 when the picture is a real picture; this term pushes the generated result closer to a real picture.
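The three fusion-model terms can be sketched together as below. The VGG16 tap points used for φ_j are the usual indices into torchvision's feature stack, assumed rather than specified by the text, and the equal weighting between terms is likewise an assumption.

    import torch
    import torch.nn.functional as F
    from torchvision.models import vgg16

    _vgg = vgg16(weights="IMAGENET1K_V1").features.eval()
    for p in _vgg.parameters():
        p.requires_grad_(False)
    _blocks = {4, 9, 16, 23}          # assumed phi_j tap points

    def fusion_losses(f_o, f_t, discriminator):
        fuse = F.l1_loss(f_o, f_t)    # pixel-level L1 fusion loss

        percep, x, y = 0.0, f_o, f_t  # perceptual loss over VGG16 features
        for i, layer in enumerate(_vgg):
            x, y = layer(x), layer(y)
            if i in _blocks:
                percep = percep + F.l1_loss(x, y)
            if i == 23:               # last assumed tap point
                break

        # Discrimination loss: D_f outputs 1 for real pictures.
        dis = -torch.log(discriminator(f_o, f_t) + 1e-8).mean()
        return fuse + percep + dis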
With the above method, the jewelry is generated on the face image from the hand drawing input by the user. Freed from the constraints of a material library, the method makes better use of the background pixels around the jewelry, so the wearing effect is more reasonable and natural; at the same time, the user can customize the result by editing the hand drawing, obtaining a personalized jewelry effect.
Fig. 10 is a schematic structural diagram of a CNN-based face ornament generation apparatus according to an embodiment of the present invention.
In the present embodiment, the apparatus 10 includes:
a preprocessing unit 101, used for receiving an image to be processed and preprocessing it to obtain a first face image, wherein the image to be processed comprises a face area;
a jewelry generation unit 102, used for determining a target area to be decorated in the first face image, superimposing a jewelry hand drawing on the target area, and inputting the result into a pre-trained jewelry generation model to obtain a jewelry generation image;
a first fusion unit 103, used for inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image;
and a second fusion unit 104, used for performing portrait pose restoration and fusion processing on the first result image to obtain a second result image.
Each unit module of the apparatus 10 can execute the corresponding steps of the above method embodiment, so the individual unit modules are not described again here; refer to the descriptions of the corresponding steps above for details.
An embodiment of the present invention further provides an apparatus, where the apparatus includes the above-mentioned CNN-based face ornament generating device, where the CNN-based face ornament generating device may adopt the structure in the embodiment in fig. 10, and correspondingly, the technical solution in the embodiment of the method shown in fig. 1 may be implemented, and the implementation principle and the technical effect thereof are similar, and details may be referred to relevant descriptions in the above-mentioned embodiments, and are not described herein again.
The apparatus may be a device with a photographing function, such as a mobile phone, a digital camera or a tablet computer, a device with an image processing function, or a device with an image display function. The apparatus may include components such as a memory, a processor, an input unit, a display unit and a power supply.
The memory may be used to store software programs and modules, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required by at least one function (e.g., an image playing function), and the like, while the data storage area may store data created according to the use of the device. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory may further include a memory controller to provide the processor and the input unit with access to the memory.
The input unit may be used to receive input numeric or character or image information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. Specifically, the input unit of the present embodiment may include a touch-sensitive surface (e.g., a touch display screen) and other input devices in addition to the camera.
The display unit may be used to display information input by or provided to a user and various graphical user interfaces of the device, which may be made up of graphics, text, icons, video, and any combination thereof. The display unit may include a display panel, and optionally, the display panel may be configured in the form of an LCD (Liquid crystal display), an OLED (organic light-emitting diode), or the like. Further, the touch-sensitive surface may overlie the display panel, and when a touch operation is detected on or near the touch-sensitive surface, the touch operation is transmitted to the processor to determine the type of touch event, and the processor then provides a corresponding visual output on the display panel in accordance with the type of touch event.
An embodiment of the present invention further provides a computer-readable storage medium, which may be the computer-readable storage medium contained in the memory in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium has stored therein at least one instruction that is loaded and executed by a processor to implement the CNN-based face ornaments generation method shown in fig. 1. The computer readable storage medium may be a read-only memory, a magnetic or optical disk, or the like.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the apparatus embodiment, and the storage medium embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and reference may be made to some descriptions of the method embodiment for relevant points.
Also, in this document, the terms "comprise", "include" or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus that comprises a series of elements includes not only those elements but also other elements not explicitly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
While the foregoing specification illustrates and describes preferred embodiments of the present invention, it is to be understood that the invention is not limited to the forms disclosed herein; it may be used in various other combinations, modifications and environments, and may be altered within the scope of the inventive concept described herein, whether by the above teachings or by the skill or knowledge of the relevant art. Modifications and variations effected by those skilled in the art without departing from the spirit and scope of the invention fall within the protection scope of the appended claims.

Claims (10)

1. A CNN-based face ornament generation method is characterized by comprising the following steps:
inputting an image to be processed, and preprocessing the image to be processed to obtain a first face image, wherein the image to be processed comprises a face area;
determining a target area to be decorated in the first face image, superimposing a jewelry hand drawing on the target area, and inputting the result into a pre-trained jewelry generation model to obtain a jewelry generation image;
inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image;
and performing portrait pose restoration and fusion processing on the first result image to obtain a second result image.
2. The CNN-based face ornament generation method of claim 1, wherein the jewelry generation model comprises a freehand mapping model, a context coding model and a decoding model, wherein:
mapping the drawing process of the ornament hand drawing into jewelry characteristics through the hand drawing mapping model;
extracting the portrait characteristics around the ornament hand-drawing picture through the context coding model;
and fusing the jewelry characteristics and the portrait characteristics through the decoding model to generate the jewelry generating image.
3. The CNN-based face ornament generation method of claim 2, wherein the freehand mapping model comprises a multilayer perceptron and an adaptive mapping layer; mapping the drawing process of the jewelry hand drawing to jewelry features through the freehand mapping model comprises the following steps:
cutting the ornament hand-drawn picture into a plurality of line segments, dividing the line segments into Q groups, wherein each group comprises K connected line segments, and inputting each group of line segments into the multilayer perceptron to obtain a D-dimensional feature vector;
and mapping the D-dimensional feature vector into a fixed-dimension jewelry feature map through the self-adaptive mapping layer.
4. The CNN-based face ornament generation method of claim 1, wherein inputting the jewelry generation image and the first face image into the pre-trained jewelry fusion model for fusion to obtain the first result image comprises:
calculating the final output of the jewelry fusion model according to the formula O = I × (1 - O_a) + O_rgb × O_a, wherein O denotes the first result image, I denotes the input picture, O_rgb denotes the jewelry generation image, and O_a denotes the fusion ratio.
5. The CNN-based face ornament generation method of claim 1, wherein the training process of the ornament generation model comprises:
supervising the result image output by the jewelry generation model by utilizing a first loss function and a second loss function, wherein:

the first loss function is the pixel-level L1 loss:

L_pixel = (1 / (H × W)) × Σ_{i=1..H} Σ_{j=1..W} |O(i, j) - T(i, j)|

the second loss function is the window-weighted sketch loss:

L_sketch = Σ_{i,j} λ_{i,j} × |O_sketch(i, j) - I'_sketch(i, j)|

wherein O denotes the result image output by the jewelry generation model, T denotes the target image, H denotes the image height, W denotes the image width, W_win denotes the side length of the square window, O_sketch denotes the sketch of the model output, I'_sketch denotes the sketch image after key-position extraction, and λ_{i,j} denotes the window-dependent weight parameter.
6. The CNN-based face ornament generation method of claim 5, wherein λ_{i,j} is calculated according to a window-dependent formula in which ζ is a sign function. [The defining equations for λ_{i,j} and ζ appear as images in the original filing.]
7. The CNN-based face ornament generation method of claim 1, wherein the training process of the jewelry fusion model comprises:
supervising the result image output by the jewelry fusion model by utilizing a third loss function and a fourth loss function, wherein:

the third loss function is the pixel-level L1 fusion loss:

L_fuse = (1 / (H × W)) × Σ_{i,j} |F_o(i, j) - F_t(i, j)|

the fourth loss function is the perceptual loss:

L_percep = Σ_j ||φ_j(F_o) - φ_j(F_t)||_1

wherein F_o denotes the result image output by the jewelry fusion model, F_t denotes the target effect image, φ_j denotes the feature map output by the j-th block of the VGG16 network, H denotes the image height, and W denotes the image width.
8. A CNN-based face ornament generation apparatus, comprising:
a preprocessing unit, used for receiving an image to be processed and preprocessing it to obtain a first face image, wherein the image to be processed comprises a face area;
a jewelry generation unit, used for determining a target area to be decorated in the first face image, superimposing a jewelry hand drawing on the target area, and inputting the result into a pre-trained jewelry generation model to obtain a jewelry generation image;
a first fusion unit, used for inputting the jewelry generation image and the first face image into a pre-trained jewelry fusion model for fusion to obtain a first result image;
and a second fusion unit, used for performing portrait pose restoration and fusion processing on the first result image to obtain a second result image.
9. An apparatus comprising a processor, a memory, and a computer program stored in the memory, the computer program being executed by the processor to implement the steps of the CNN-based face ornament generation method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program, the computer program being executed by a processor to implement the steps of the CNN-based face ornament generation method of any one of claims 1 to 7.
CN202211137358.XA 2022-09-19 2022-09-19 CNN-based face ornament generation method, device and equipment Pending CN115601459A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211137358.XA CN115601459A (en) 2022-09-19 2022-09-19 CNN-based face ornament generation method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211137358.XA CN115601459A (en) 2022-09-19 2022-09-19 CNN-based face ornament generation method, device and equipment

Publications (1)

Publication Number Publication Date
CN115601459A 2023-01-13

Family

ID=84843937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211137358.XA Pending CN115601459A (en) 2022-09-19 2022-09-19 CNN-based face ornament generation method, device and equipment

Country Status (1)

Country Link
CN (1) CN115601459A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination