CN114155358B - Portrait relief data set construction method - Google Patents

Portrait relief data set construction method

Info

Publication number
CN114155358B
CN114155358B (application CN202111167113.7A)
Authority
CN
China
Prior art keywords
portrait
reference image
normal
image
relief
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111167113.7A
Other languages
Chinese (zh)
Other versions
CN114155358A (en)
Inventor
张玉伟
刘延庆
罗萍
周浩
陈彦钊
杨洪广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology
Priority to CN202111167113.7A
Publication of CN114155358A
Application granted
Publication of CN114155358B
Legal status: Active (current)
Anticipated expiration

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00: Manipulating 3D models or images for computer graphics
    • G06T19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method for constructing a portrait relief data set, belongs to the technical field of portrait relief modelling, and aims to solve the technical problem of how to construct high-quality portrait relief data samples with complete head features. The method comprises the following steps: acquiring normal maps, mask maps and line drawings based on 3D portrait sculptures, and constructing and training a network model; for a reference image, extracting a line drawing with accurately positioned features, extracting a hair line drawing, extracting a mask map through the MODNet network, merging the hair line drawing with the feature line drawing to obtain a final line drawing, inputting the mask map and the final line drawing into the trained network model, and outputting an overall portrait normal map; for the reference image, obtaining a face normal map with fine geometric details through a ResUNet network, and fusing the overall portrait normal map with the face normal map; migrating texture normals onto the fused overall normal map; and carrying out relief depth reconstruction to obtain a portrait relief model.

Description

Portrait relief data set construction method
Technical Field
The invention relates to the technical field of portrait relief models, in particular to a portrait relief data set construction method.
Background
Portrait relief is a stylized form of sculptural art widely applied to seals, commemorative coins, buildings, artworks and the like. Traditional manual engraving and software modelling of portrait relief require expertise and are time-consuming and laborious. With the development of artificial intelligence technology, end-to-end modelling of portrait relief from a single picture has become possible, but for supervised training of a deep neural network, the construction of a data set is crucial. At present, no portrait relief data set with a sufficient number of samples exists in academia or industry.
Based on the above analysis, how to construct high-quality portrait relief data samples with complete head features is a technical problem to be solved.
Disclosure of Invention
In view of the above defects, the technical task of the invention is to provide a portrait relief data set construction method, so as to solve the problem of how to construct high-quality portrait relief data samples with complete head features.
The invention relates to a portrait relief data set construction method, which comprises the following steps:
acquiring a normal map, a mask map and a line drawing based on a 3D portrait sculpture, taking the mask map and the line drawing as input and the normal map as output, and constructing and training a network model, wherein the network model has an encoder-decoder structure;
acquiring a portrait image as a reference image;
for the reference image, performing filtering to extract a line drawing with accurately positioned features, performing bilateral filtering and edge extraction to extract a hair line drawing, extracting a mask map through the MODNet network, merging the hair line drawing with the feature line drawing to obtain a final line drawing, inputting the mask map and the final line drawing into the trained network model, and outputting an overall portrait normal map;
for the reference image, mapping the portrait image to a face normal map through a ResUNet network to obtain a face normal map with fine geometric details, and fusing the overall portrait normal map with the face normal map to obtain a fused overall normal map;
for the reference image, obtaining the texture normal of each pixel, and migrating the texture normals onto the fused overall normal map through a vector rotation method to obtain a final portrait normal map;
and carrying out relief depth reconstruction on the final portrait normal map to obtain a portrait relief model.
Preferably, obtaining the normal map, mask map and line drawing based on the 3D portrait sculpture comprises the following steps:
acquiring a plurality of 3D portrait sculptures with different identities, hairstyles and expressions;
performing multi-angle sampling on each 3D portrait sculpture;
and generating a normal map, a mask map and a line drawing for each sampling angle, wherein the line drawing is an apparent-ridges line drawing.
Preferably, when the network model is trained with the mask map and line drawing as input and the normal map as output, training samples are constructed from the mask map, line drawing and normal map, and the loss function is defined as the average angle between the training-sample vertex normals and the network-predicted vertex normals:
Loss = (1/M) · Σ_{i=1..M} arccos(N_i · N_i′)
where N_i denotes the training-sample vertex normal, N_i′ the network-predicted vertex normal, and M the number of vertex normals.
Preferably, for the reference image, the line drawing with accurately positioned features is extracted by filtering through the filtering framework of the edge tangent flow (ETF), comprising the following steps:
denoising the single RGB reference image to obtain a denoised reference image;
applying edge-tangent-flow processing to the denoised reference image through the ETF filtering framework to obtain the line drawing with accurately positioned features;
wherein the ETF filtering framework comprises an FDoG filter for line drawing and an FBL filter for region smoothing of the lines.
Preferably, for the reference image, performing bilateral filtering and edge extraction to extract the hair line drawing comprises the following steps:
denoising the single RGB reference image to obtain a denoised reference image;
converting the denoised reference image into the Lab color space to obtain a color-quantized reference image;
for the color-quantized reference image, performing bilateral filtering in the gradient and tangential directions to obtain a bilaterally filtered reference image, extracting edges from it with a separable FDoG filter, and superimposing the extracted edges onto the color-quantized reference image;
for the color-quantized reference image, filtering in the gradient direction with the ETF-flow-based DoG filter and smoothing along the flow field derived from the smoothed structure tensor, creating smooth and coherent straight and curved segments and yielding the hair line drawing.
Preferably, fusing the overall portrait normal map and the face normal map comprises the following steps:
taking the vertex normal vector differences between the overall portrait normal map and the face normal map as boundary conditions, estimating the normal vector differences ΔN of all face vertices by a first equation:
L · ΔN = 0
where L is the Laplace-Beltrami matrix and ΔN ∈ R^(n×3) is the normal vector difference matrix, n being the number of vertices;
and adding the normal vector differences to the overall portrait normal map to obtain the fused overall normal map.
Preferably, for each pixel in the reference image, the texture normal is defined as:
n_d = (-f·g_x, -f·g_y, 1) / ‖(-f·g_x, -f·g_y, 1)‖
where g_x and g_y denote the texture gradient and the parameter f controls the intensity of the texture detail.
Preferably, migrating the texture normals onto the fused overall normal map through the vector rotation method comprises the following steps:
let the texture normal vector be n_d and the base normal vector be n_b; the angle between the texture normal vector and the z-axis n_z = [0, 0, 1] is θ_d, and the angle between the base normal vector and the z-axis is θ_b;
taking the vector n_z × n_b as the rotation axis, rotate n_b by the angle max{θ_d, 90° - θ_b} to obtain the new pixel normal.
Preferably, the relief depth of the final portrait normal map is reconstructed by minimizing an energy equation, expressed as:
E(H) = ‖∇H - G‖² + μ · ‖H - h‖²
where the first term constrains the predicted depth gradient ∇H to be as close as possible to the known gradient G, and the second term constrains the predicted relief depth H to stay near the target depth h.
Preferably, minimizing the energy equation is equivalent to solving the equation:
ΔH + μ·H = div G + μ·h
where ΔH denotes the Laplacian of the depth;
the gradient G = (G_x, G_y) = (-N_x/N_z, -N_y/N_z), with N_x, N_y and N_z the three components of the vertex normal;
the divergence div G = ∂G_x/∂x + ∂G_y/∂y;
and the parameter μ balances the two energy terms.
The portrait relief data set construction method of the invention has the following advantages:
1. A network model is trained on the line drawings, mask maps and normal maps obtained from 3D portrait sculptures; the line drawing and mask map of a portrait image are taken as input, the trained network model predicts an overall portrait normal map, and a ResUNet network predicts a face normal map with fine geometric details; the two are fused into an overall normal map, whose texture details are then enhanced to obtain the final portrait normal map, on which relief depth reconstruction yields the portrait relief model; the resulting model samples are numerous and wide-ranging;
2. The portrait normal map, obtained by normal map fusion and texture detail enhancement, has complete, high-quality head features; the reconstructed portrait relief not only retains the details and texture features of the normal map but also has a reasonable sense of depth layering.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow diagram of the portrait relief data set construction method of Embodiment 1;
FIG. 2 is a schematic diagram of a 3D portrait sculpture model in the portrait relief data set construction method of Embodiment 1;
FIG. 3 is a schematic diagram of the network model in the portrait relief data set construction method of Embodiment 1;
FIG. 4 shows a line drawing, a predicted normal map and a ground-truth normal map of a portrait sculpture in the portrait relief data set construction method of Embodiment 1;
FIG. 5 shows line-drawing-based portrait normal map prediction in the portrait relief data set construction method of Embodiment 1;
wherein (a) is the reference image, (b) the line drawing extracted from the reference image, and (c) the normal map predicted by the network;
FIG. 6 shows normal-enhanced portrait normal map prediction in the portrait relief data set construction method of Embodiment 1;
wherein (a) is the reference image, (b) the normal map generated from the line drawing, (c) the face normal map generated by the method of reference [7], (d) the normal map after face fusion, and (e) the normal map with enhanced texture details;
FIG. 7 shows portrait relief model samples reconstructed by the portrait relief data set construction method of Embodiment 1 and their reference images.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific examples, so that those skilled in the art can better understand the invention and implement it, but the examples are not meant to limit the invention, and the technical features of the embodiments of the invention and the examples can be combined with each other without conflict.
The embodiment of the invention provides a portrait relief data set construction method which is used for solving the technical problem of how to construct a high-quality portrait relief data sample with complete head characteristics.
Examples:
the invention relates to a portrait relief data set construction method, which comprises the following steps:
s100, acquiring a portrait drawing, a mask drawing and a line drawing based on a 3D portrait sculpture, taking the mask drawing and the line drawing as input, taking the portrait drawing as output, constructing and training a network model, wherein the network model is a network model of an encoding-decoding structure;
s200, acquiring a portrait image as a reference image;
s300, carrying out filtering treatment and extracting a line drawing with accurately positioned features for a reference image, carrying out bilateral filtering and filtering to extract edges and extract a hair line drawing, extracting a mask drawing through a MODET network, merging the hair line drawing and the line drawing with accurately positioned features to obtain a final line drawing, inputting the mask drawing and the final line drawing into the trained network model, and outputting an integral portrait drawing;
s400, mapping the portrait image to the face method image through a Resunate network for the reference image to obtain the face method image with fine geometric details, and fusing the whole portrait image and the face method image to obtain a fused whole portrait image;
s500, for a reference image, acquiring a texture normal of each pixel, and migrating the texture normal to the fused integral portrait sketch through a vector rotation method to obtain a final portrait sketch;
and S600, performing relief depth reconstruction on the final portrait image to obtain a portrait relief model.
Step S100 obtains a normal map, a mask map and a line drawing based on the 3D portrait sculptures, and comprises the following steps: acquiring a plurality of 3D portrait sculptures with different identities, hairstyles and expressions; performing multi-angle sampling on each 3D portrait sculpture; and generating a normal map, a mask map and a line drawing for each sampling angle, wherein the line drawing is an apparent-ridges line drawing.
In this embodiment, 77 high-quality 3D portrait sculptures with various identities, hairstyles and expressions were collected to construct sample data for network training, as shown in FIG. 2. Each 3D portrait sculpture is sampled at multiple angles at 5° intervals, over a yaw range of -80° to 80° and a pitch range of -5° to 20°, giving 198 samples per sculpture. For each sampling direction, a normal map and a mask map are generated, along with an apparent-ridges line drawing, as shown in FIG. 3. Across the 77 sculptures this yields 15246 normal maps, mask maps and line drawings at a resolution of 360×360, of which 13000 are training samples and 2246 are test samples.
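As a quick plausibility check of these counts, a minimal Python sketch (variable names illustrative) enumerates the sampling grid described above:

```python
import numpy as np

# Yaw from -80° to 80° and pitch from -5° to 20°, both at 5° intervals.
yaws = np.arange(-80, 81, 5)        # 33 yaw angles
pitches = np.arange(-5, 21, 5)      # 6 pitch angles

views = [(yaw, pitch) for yaw in yaws for pitch in pitches]
assert len(views) == 198            # 198 samples per sculpture, as stated
print(len(views) * 77)              # 15246 sample triplets over 77 sculptures
```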
The normal map and mask map are generated as follows: in a three-dimensional browsing environment, each time the model is rotated to a sampling angle, the normal components (nx, ny, nz) of the model vertices are converted into the three RGB channels to render the normal map. The background of the three-dimensional environment is black and the model vertices are set to white, so a black-and-white mask map can be generated automatically.
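A minimal sketch of this conversion, assuming a rendered buffer of per-pixel unit normals and a visibility mask (both input names hypothetical):

```python
import numpy as np

def normals_to_images(normal_buffer, hit_mask):
    # normal_buffer: (H, W, 3) unit normals rendered at one sampling angle;
    # hit_mask: (H, W) bool, True where a model vertex is visible.
    # Map each normal component (nx, ny, nz) from [-1, 1] into an RGB channel.
    normal_map = ((normal_buffer * 0.5 + 0.5) * 255).astype(np.uint8)
    normal_map[~hit_mask] = 0                         # black background
    # Mask map: white where the model is visible, black elsewhere.
    mask_map = np.where(hit_mask, 255, 0).astype(np.uint8)
    return normal_map, mask_map
```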
The apparent-ridges line drawing is generated as follows: lines that depend on the viewpoint are captured on the three-dimensional object. For a local triangular patch of the object surface, a line is drawn inside it where the normal changes at a locally maximal rate with respect to the viewpoint position; such lines are called apparent ridges.
Step S100 then trains a network model on these samples; the network model has an encoder-decoder structure and realizes the prediction from line drawing to normal map, as shown in FIG. 3. The two-channel mask map and line drawing are taken as input, and the three-channel normal map is output. During training, in order to make the predicted normals keep more geometric details, training samples are constructed from the mask map, line drawing and normal map, and the loss function is defined as the average angle between the training-sample vertex normals and the network-predicted vertex normals:
Loss = (1/M) · Σ_{i=1..M} arccos(N_i · N_i′)
where N_i denotes the training-sample vertex normal, N_i′ the network-predicted vertex normal, and M the number of vertex normals.
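A sketch of this angular loss in PyTorch, assuming the network outputs three-channel normal maps (tensor shapes illustrative):

```python
import torch
import torch.nn.functional as F

def mean_angle_loss(pred, target):
    # pred, target: (B, 3, H, W) normal maps; renormalize per pixel.
    pred = F.normalize(pred, dim=1)
    target = F.normalize(target, dim=1)
    # Cosine of the per-pixel angle, clamped for numerical safety in acos.
    cos = (pred * target).sum(dim=1).clamp(-1 + 1e-7, 1 - 1e-7)
    return torch.acos(cos).mean()
```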
After training, 2246 test samples are used to evaluate prediction quality, taking the mean angular error and the percentages of pixels with angular error below 20°, 25° and 30° as evaluation criteria; the results are 11.92°, 84.86%, 91.11% and 94.67%, respectively. A comparison of network-predicted normal maps against the ground-truth normal maps in the data set is shown in FIG. 4.
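The evaluation reduces to per-pixel angular errors; a short numpy sketch of these statistics (input shapes illustrative):

```python
import numpy as np

def angular_error_stats(pred, target, thresholds=(20, 25, 30)):
    # pred, target: (N, 3) arrays of unit normals over all test pixels.
    cos = np.clip((pred * target).sum(axis=1), -1.0, 1.0)
    err = np.degrees(np.arccos(cos))            # per-pixel angular error
    # Mean error plus the fraction of pixels under each threshold.
    return err.mean(), [float((err < t).mean()) for t in thresholds]
```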
Step S300 predicts the portrait normal map through the trained network model; before the normal map can be predicted, the line drawing of the reference portrait must be extracted.
First, for the reference image, a line drawing with accurately positioned features is extracted by filtering through the filtering framework of the edge tangent flow (ETF). The steps are as follows:
(1) Denoise the single RGB reference image to obtain a denoised reference image;
(2) Apply edge-tangent-flow processing to the denoised reference image through the ETF filtering framework to obtain the line drawing with accurately positioned features.
The ETF filtering framework includes an FDoG filter for line drawing and an FBL filter for region smoothing of the lines.
To generate the line drawing with accurately positioned features, the single RGB picture is first denoised, and a smooth, coherent, feature-preserving edge flow field is then built via edge tangent flow (ETF) processing. ETF is essentially a bilateral filter operating on vector data: it uses nonlinear smoothing to preserve salient edge directions at each filter's center pixel while allowing weak edges to reorient themselves to follow neighboring dominant edges. Important shape boundaries in the scene are then captured with the ETF-based filtering framework and rendered as a smooth set of continuous lines. The framework includes the FDoG filter for line drawing and the FBL filter for region smoothing. The FDoG filter adopts a DoG edge model and guides the DoG filter along the ETF (flow curves) to improve line quality: while moving along an edge flow, a linear DoG filter is applied in the gradient direction and the filter responses are accumulated along the flow. The FBL filter is an ETF-based linear bilateral filter whose goal is region smoothing of the lines, removing unimportant details inside regions while preserving important shapes; the key is to use two weight parameters, one in the spatial domain and one in the color domain, to smooth between similar colors.
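The full FDoG accumulates DoG responses along the ETF flow; as a much-simplified, isotropic stand-in with no flow guidance, the following OpenCV sketch illustrates the underlying DoG edge model (all parameter values illustrative):

```python
import cv2
import numpy as np

def dog_lines(gray, sigma=1.0, k=1.6, tau=0.99, eps=0.0):
    # Difference of two Gaussians approximates a band-pass edge response.
    g1 = cv2.GaussianBlur(gray.astype(np.float32), (0, 0), sigma)
    g2 = cv2.GaussianBlur(gray.astype(np.float32), (0, 0), sigma * k)
    dog = g1 - tau * g2
    # Non-positive responses become line pixels (black on white).
    return np.where(dog > eps, 255, 0).astype(np.uint8)

# Usage sketch:
# lines = dog_lines(cv2.imread("portrait.png", cv2.IMREAD_GRAYSCALE))
```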
Then, for the reference image, bilateral filtering and edge extraction are performed to extract the hair line drawing. The steps are as follows:
(1) Denoise the single RGB reference image to obtain a denoised reference image;
(2) Convert the denoised reference image into the Lab color space to obtain a color-quantized reference image;
(3) For the color-quantized reference image, perform bilateral filtering in the gradient and tangential directions to obtain a bilaterally filtered reference image, extract edges from it with a separable FDoG filter, and superimpose the extracted edges onto the color-quantized reference image;
(4) For the color-quantized reference image, filter in the gradient direction with the ETF-flow-based DoG filter and apply smoothing along the flow field derived from the smoothed structure tensor, creating smooth and coherent straight and curved segments and yielding the hair line drawing.
The method first computes a smoothed structure tensor in the RGB color space of the denoised picture to estimate the local direction of picture lines, and converts the input picture into the Lab color space to avoid noise at abrupt changes of color and brightness. A color-simplified picture is then obtained by separable orientation-aligned bilateral filtering: the bilateral filter is approximated by a separable implementation, filtering first in the gradient direction and then in the tangential direction, smoothing the picture while keeping edges. The bilateral-filtered result is filtered by a separable FDoG filter to extract edges, which are superimposed on the color-quantized output. To extract the salient edges, the ETF-flow-based DoG filter is first applied in the gradient direction, and smoothing is then applied along the flow field derived from the smoothed structure tensor, creating smooth and coherent straight and curved segments.
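The smoothed structure tensor mentioned above can be sketched as follows; this is an illustrative computation of the local tangent (flow) direction, not the patent's exact implementation:

```python
import cv2
import numpy as np

def structure_tensor_orientation(gray, sigma=2.0):
    gx = cv2.Sobel(gray.astype(np.float32), cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray.astype(np.float32), cv2.CV_32F, 0, 1)
    # Smooth the tensor entries J = [[gx*gx, gx*gy], [gx*gy, gy*gy]].
    jxx = cv2.GaussianBlur(gx * gx, (0, 0), sigma)
    jxy = cv2.GaussianBlur(gx * gy, (0, 0), sigma)
    jyy = cv2.GaussianBlur(gy * gy, (0, 0), sigma)
    # The major eigenvector angle is the gradient direction; adding 90°
    # gives the tangent direction that the line flow follows.
    return 0.5 * np.arctan2(2.0 * jxy, jxx - jyy) + np.pi / 2.0
```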
For the reference image, the photo background is separated from the foreground portrait when the portrait mask is generated. The MODNet network decomposes the trimap-free portrait matting objective into three jointly trained subtasks: semantic estimation, detail prediction, and semantic-detail fusion. Low-resolution semantic estimation captures the portrait subject and outputs a coarse foreground mask; high-resolution detail prediction extracts portrait edge details to obtain a fine foreground boundary; finally, semantic-detail fusion merges the features of the first two branches to obtain the final portrait segmentation result.
Because of the sparsity of the feature lines and the lack of illumination information, the geometric details and texture features of the face in the predicted normal map are not yet optimal, so the quality of the portrait normal map is further improved in two steps: (1) face normal enhancement; (2) enhancement of the texture details of the eyes, beard and body parts.
First, step S400 performs normal enhancement, which uses deep-learning techniques to estimate the face normals from a single color image, generating a face normal map with reasonable depth and geometric texture details. The method adopts a learning framework that combines the robustness of cross-modal learning with the detail-transfer capability of skip connections, using a ResUNet network in an end-to-end training framework to learn the mapping from a color image to the face normals. The cross-modal modelling uses two encoder-decoder networks with a shared latent space; adjusting the skip connections from the image encoder to the color-image decoder and from the normal encoder to the normal decoder during training allows picture-to-normal conversion with both paired and unpaired picture and normal data.
The ResUNet network generates a face normal map adapted to different illumination conditions, as shown in FIG. 6(c). To fuse the normal maps seamlessly, the vertex normal vector differences between the overall portrait normal map and the face normal map are taken as boundary conditions, and the normal vector differences ΔN of all face vertices are estimated by the first equation:
L · ΔN = 0
where L is the Laplace-Beltrami matrix and ΔN ∈ R^(n×3) is the normal vector difference matrix, n being the number of vertices. Adding the normal differences to the overall portrait normal map gives the fused overall normal map. As shown in FIG. 6(d), the quality of the fused face region is greatly improved.
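A sketch of this diffusion on a pixel grid, substituting the standard 5-point Laplacian for the mesh Laplace-Beltrami matrix (a simplifying assumption for illustration; scipy-based, input names hypothetical):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def diffuse_normal_offsets(boundary_vals, boundary_mask):
    # Solve L * DeltaN = 0 per component, pinning the known normal
    # differences on the face boundary as Dirichlet conditions.
    # boundary_vals: (H, W, 3); boundary_mask: (H, W) bool.
    h, w = boundary_mask.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    lap = sp.lil_matrix((n, n))
    rhs = np.zeros((n, 3))
    for y in range(h):
        for x in range(w):
            i = idx[y, x]
            if boundary_mask[y, x]:
                lap[i, i] = 1.0                 # pin the known difference
                rhs[i] = boundary_vals[y, x]
            else:
                nbrs = [idx[yy, xx] for yy, xx in
                        ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= yy < h and 0 <= xx < w]
                lap[i, i] = float(len(nbrs))    # interior row: Laplacian = 0
                for j in nbrs:
                    lap[i, j] = -1.0
    solve = spla.factorized(lap.tocsc())
    delta_n = np.stack([solve(rhs[:, c].copy()) for c in range(3)], axis=1)
    return delta_n.reshape(h, w, 3)
```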
Step S500 then adds normal texture details to enhance the sharpness of the eyes, beard and body parts. For each pixel of these parts in the reference image, the texture normal is defined as:
n_d = (-f·g_x, -f·g_y, 1) / ‖(-f·g_x, -f·g_y, 1)‖
where g_x and g_y denote the texture gradient and the parameter f controls the intensity of the texture detail.
On this basis, the texture normals are migrated onto the fused overall normal map by a vector rotation method, as follows: let the texture normal vector be n_d and the base normal vector be n_b; the angle between the texture normal vector and the z-axis n_z = [0, 0, 1] is θ_d, and the angle between the base normal vector and the z-axis is θ_b. Taking the vector n_z × n_b as the rotation axis, n_b is rotated by the angle max{θ_d, 90° - θ_b} to obtain the new pixel normal.
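A single-pixel sketch of the texture-normal definition and the rotation rule (the normalized gradient form of n_d follows the reconstruction above and is an assumption; numpy-based):

```python
import numpy as np

def texture_normal(gx, gy, f=0.5):
    # n_d proportional to (-f*gx, -f*gy, 1), then normalized.
    n = np.array([-f * gx, -f * gy, 1.0])
    return n / np.linalg.norm(n)

def transfer_texture_normal(n_b, n_d):
    n_z = np.array([0.0, 0.0, 1.0])
    theta_d = np.degrees(np.arccos(np.clip(n_d @ n_z, -1.0, 1.0)))
    theta_b = np.degrees(np.arccos(np.clip(n_b @ n_z, -1.0, 1.0)))
    angle = np.radians(max(theta_d, 90.0 - theta_b))
    axis = np.cross(n_z, n_b)
    norm = np.linalg.norm(axis)
    if norm < 1e-8:                 # n_b parallel to z: axis undefined
        return n_b
    axis = axis / norm
    # Rodrigues' rotation of n_b about the axis by the chosen angle.
    return (n_b * np.cos(angle)
            + np.cross(axis, n_b) * np.sin(angle)
            + axis * (axis @ n_b) * (1.0 - np.cos(angle)))
```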
Through this operation, the texture details of the eyes, beard and body parts are further enhanced, as shown in FIG. 6(e).
After the final portrait normal map is obtained, step S600 reconstructs the relief depth from it.
In this embodiment, reconstruction from the portrait normal map to the relief depth is realized by minimizing the following energy equation:
E(H) = ‖∇H - G‖² + μ · ‖H - h‖²
where the first term constrains the predicted depth gradient ∇H to be as close as possible to the known gradient G, and the second term constrains the predicted relief depth H to stay near the target depth h.
Minimizing the energy equation is equivalent to solving the equation:
ΔH + μ·H = div G + μ·h
where ΔH denotes the Laplacian of the depth;
the gradient G = (G_x, G_y) = (-N_x/N_z, -N_y/N_z), with N_x, N_y and N_z the three components of the vertex normal;
the divergence div G = ∂G_x/∂x + ∂G_y/∂y;
and the parameter μ balances the two energy terms, with μ = 0.01 and h = 0.03 by default.
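A sketch of this depth reconstruction as a screened-Poisson solve on a pixel grid, written in the sign convention that keeps the linear system positive definite (boundary handling simplified; an illustration, not the patent's exact solver):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def lap1d(k):
    # 1-D positive second-difference operator (Dirichlet boundaries).
    return sp.diags([2.0 * np.ones(k), -np.ones(k - 1), -np.ones(k - 1)],
                    [0, 1, -1])

def reconstruct_depth(G, h_target, mu=0.01):
    # Minimize ||grad H - G||^2 + mu * ||H - h||^2, i.e. solve
    # (-Lap + mu*I) H = -div G + mu*h on the pixel grid.
    hh, ww = h_target.shape
    gx, gy = G[..., 0], G[..., 1]
    div = np.zeros_like(h_target)       # divergence by backward differences
    div[:, 1:] += gx[:, 1:] - gx[:, :-1]
    div[1:, :] += gy[1:, :] - gy[:-1, :]
    A = (sp.kron(sp.identity(hh), lap1d(ww))
         + sp.kron(lap1d(hh), sp.identity(ww))
         + mu * sp.identity(hh * ww)).tocsr()
    H = spla.spsolve(A, (-div + mu * h_target).ravel())
    return H.reshape(hh, ww)

# Usage with the defaults from the text: mu = 0.01, target depth h = 0.03.
# H = reconstruct_depth(G, np.full(G.shape[:2], 0.03), mu=0.01)
```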
The reconstructed portrait relief model sample and the reference image thereof are shown in fig. 7.
According to the portrait relief data set construction method, a network model is trained on the line drawings, mask maps and normal maps obtained from the 3D sculptures; the line drawing and mask map of the portrait image are obtained as input, the overall portrait normal map is predicted through the trained network model, and a face normal map with fine geometric details is predicted through the ResUNet network; the two are fused into an overall normal map, whose texture details are enhanced to obtain the final portrait normal map; relief depth reconstruction is then performed on the final normal map to obtain the portrait relief model, and the resulting model samples are numerous and wide-ranging.
While the invention has been illustrated and described in detail in the drawings and in the preferred embodiments, the invention is not limited to the disclosed embodiments, and it will be appreciated by those skilled in the art that the technical features of the various embodiments described above may be combined to produce further embodiments of the invention, which are also within the scope of the invention.

Claims (10)

1. A portrait relief data set construction method, characterized by comprising the following steps:
acquiring a normal map, a mask map and a line drawing based on a 3D portrait sculpture, taking the mask map and the line drawing as input and the normal map as output, and constructing and training a network model, wherein the network model has an encoder-decoder structure;
acquiring a portrait image as a reference image;
for the reference image, performing filtering to extract a line drawing with accurately positioned features, performing bilateral filtering and edge extraction to extract a hair line drawing, extracting a mask map through the MODNet network, merging the hair line drawing with the feature line drawing to obtain a final line drawing, inputting the mask map and the final line drawing into the trained network model, and outputting an overall portrait normal map;
for the reference image, mapping the portrait image to a face normal map through a ResUNet network to obtain a face normal map with fine geometric details, and fusing the overall portrait normal map with the face normal map to obtain a fused overall normal map;
for the reference image, obtaining the texture normal of each pixel, and migrating the texture normals onto the fused overall normal map through a vector rotation method to obtain a final portrait normal map;
and carrying out relief depth reconstruction on the final portrait normal map to obtain a portrait relief model.
2. The portrait relief data set construction method according to claim 1, wherein obtaining the normal map, mask map and line drawing based on the 3D portrait sculpture comprises the following steps:
acquiring a plurality of 3D portrait sculptures with different identities, hairstyles and expressions;
performing multi-angle sampling on each 3D portrait sculpture;
and generating a normal map, a mask map and a line drawing for each sampling angle, wherein the line drawing is an apparent-ridges line drawing.
3. The portrait relief data set construction method according to claim 1, wherein when the network model is trained with the mask map and line drawing as input and the normal map as output, training samples are constructed from the mask map, line drawing and normal map, and the loss function is defined as the average angle between the training-sample vertex normals and the network-predicted vertex normals:
Loss = (1/M) · Σ_{i=1..M} arccos(N_i · N_i′)
where N_i denotes the training-sample vertex normal, N_i′ the network-predicted vertex normal, and M the number of vertex normals.
4. The portrait relief data set construction method according to claim 1, wherein for the reference image, the line drawing with accurately positioned features is extracted by filtering through the filtering framework of the edge tangent flow (ETF), comprising the following steps:
denoising the single RGB reference image to obtain a denoised reference image;
applying edge-tangent-flow processing to the denoised reference image through the ETF filtering framework to obtain the line drawing with accurately positioned features;
wherein the ETF filtering framework comprises an FDoG filter for line drawing and an FBL filter for region smoothing of the lines.
5. The portrait relief data set construction method according to claim 1, wherein performing bilateral filtering and edge extraction on the reference image to extract the hair line drawing comprises the following steps:
denoising the single RGB reference image to obtain a denoised reference image;
converting the denoised reference image into the Lab color space to obtain a color-quantized reference image;
for the color-quantized reference image, performing bilateral filtering in the gradient and tangential directions to obtain a bilaterally filtered reference image, extracting edges from it with a separable FDoG filter, and superimposing the extracted edges onto the color-quantized reference image;
for the color-quantized reference image, filtering in the gradient direction with the ETF-flow-based DoG filter and smoothing along the flow field derived from the smoothed structure tensor, creating smooth and coherent straight and curved segments and yielding the hair line drawing.
6. The portrait relief data set construction method according to claim 1, wherein fusing the overall portrait normal map and the face normal map comprises the following steps:
taking the vertex normal vector differences between the overall portrait normal map and the face normal map as boundary conditions, estimating the normal vector differences ΔN of all face vertices by a first equation:
L · ΔN = 0
where L is the Laplace-Beltrami matrix and ΔN ∈ R^(n×3) is the normal vector difference matrix, n being the number of vertices;
and adding the normal vector differences to the overall portrait normal map to obtain the fused overall normal map.
7. The portrait relief data set construction method according to claim 1, wherein for each pixel in the reference image, the texture normal is defined as:
n_d = (-f·g_x, -f·g_y, 1) / ‖(-f·g_x, -f·g_y, 1)‖
where g_x and g_y denote the texture gradient and the parameter f controls the intensity of the texture detail.
8. The portrait relief data set construction method according to claim 7, wherein migrating the texture normals onto the fused overall normal map through the vector rotation method comprises the following steps:
let the texture normal vector be n_d and the base normal vector be n_b; the angle between the texture normal vector and the z-axis n_z = [0, 0, 1] is θ_d, and the angle between the base normal vector and the z-axis is θ_b;
taking the vector n_z × n_b as the rotation axis, rotate n_b by the angle max{θ_d, 90° - θ_b} to obtain the new pixel normal.
9. The portrait relief data set construction method according to claim 1, wherein the relief depth of the final portrait normal map is reconstructed by minimizing an energy equation, expressed as:
E(H) = ‖∇H - G‖² + μ · ‖H - h‖²
where the first term constrains the predicted depth gradient ∇H to be as close as possible to the known gradient G, and the second term constrains the predicted relief depth H to stay near the target depth h.
10. The portrait relief data set construction method according to claim 9, wherein minimizing the energy equation is equivalent to solving the equation:
ΔH + μ·H = div G + μ·h
where ΔH denotes the Laplacian of the depth;
the gradient G = (G_x, G_y) = (-N_x/N_z, -N_y/N_z), with N_x, N_y and N_z the three components of the vertex normal;
the divergence div G = ∂G_x/∂x + ∂G_y/∂y;
and the parameter μ balances the two energy terms.
CN202111167113.7A 2021-10-02 2021-10-02 Portrait relief data set construction method Active CN114155358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111167113.7A CN114155358B (en) 2021-10-02 2021-10-02 Portrait relief data set construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111167113.7A CN114155358B (en) 2021-10-02 2021-10-02 Portrait relief data set construction method

Publications (2)

Publication Number Publication Date
CN114155358A CN114155358A (en) 2022-03-08
CN114155358B 2024-02-20

Family

ID=80462687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111167113.7A Active CN114155358B (en) 2021-10-02 2021-10-02 Portrait relief data set construction method

Country Status (1)

Country Link
CN (1) CN114155358B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116524132A (en) * 2023-05-08 2023-08-01 齐鲁工业大学(山东省科学院) Modeling method and system for generating human body relief from single human body image

Citations (2)

Publication number Priority date Publication date Assignee Title
CN101301161A (en) * 2008-06-13 2008-11-12 贾凤忠 Commemorate article with portrait and manufacture method thereof
CN110751665A (en) * 2019-10-23 2020-02-04 齐鲁工业大学 Method and system for reconstructing 3D portrait model by portrait embossment

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20100274375A1 (en) * 2007-02-21 2010-10-28 Team-At-Work, Inc. Method and system for making reliefs and sculptures

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN101301161A (en) * 2008-06-13 2008-11-12 贾凤忠 Commemorate article with portrait and manufacture method thereof
CN110751665A (en) * 2019-10-23 2020-02-04 齐鲁工业大学 Method and system for reconstructing 3D portrait model by portrait embossment

Non-Patent Citations (1)

Title
Zhang Luosheng; Tong Jing. Real-time interactive construction method for 3D models with relief textures. Journal of Computer Applications, 2017, No. 08, full text. *

Also Published As

Publication number Publication date
CN114155358A (en) 2022-03-08

Similar Documents

Publication Publication Date Title
JP7090113B2 (en) Line drawing generation
CN110378985B (en) Animation drawing auxiliary creation method based on GAN
WO2019019695A1 (en) Underwater image enhancement method based on retinex model
CN107729948A (en) Image processing method and device, computer product and storage medium
Mellado et al. Constrained palette-space exploration
Zeng et al. Region-based bas-relief generation from a single image
CN103955945B (en) Self-adaption color image segmentation method based on binocular parallax and movable outline
CN114155358B (en) Portrait relief data set construction method
CN110796593A (en) Image processing method, device, medium and electronic equipment based on artificial intelligence
Grogan et al. User interaction for image recolouring using ℓ2
Zhang et al. Portrait relief generation from 3D Object
CN110232730A (en) A kind of three-dimensional face model textures fusion method and computer-processing equipment
CN111524204B (en) Portrait hair cartoon texture generation method
CN111563944A (en) Three-dimensional facial expression migration method and system
CN116863053A (en) Point cloud rendering enhancement method based on knowledge distillation
CN111627098B (en) Method and device for identifying water flow area in image and generating dynamic water flow video
DiPaola Painterly rendered portraits from photographs using a knowledge-based approach
CN111462084B (en) Image vectorization printing bleeding point prediction system and method based on random forest
Liang et al. Image-based rendering for ink painting
Ma Morphological Structural Element Optimization for Automatic Line Characterization of Ink Drawings
Xie et al. A colored pencil-drawing generating method based on interactive colorization
Zhao et al. A pencil drawing algorithm based on wavelet transform multiscale
CN111402114B (en) Wax printing multi-dyeing method based on convolutional neural network
Deshpande et al. Fusion of handcrafted edge and residual learning features for image colorization
DiPaola Knowledge based approach to modeling portrait painting methodology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant