CN114155358A - Portrait relief data set construction method - Google Patents
Portrait relief data set construction method
- Publication number
- CN114155358A (application CN202111167113.7A)
- Authority
- CN
- China
- Prior art keywords
- portrait
- image
- relief
- reference image
- normal
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/20—Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a portrait relief data set construction method, belongs to the technical field of portrait relief models, and aims to solve the technical problem of how to construct high-quality portrait relief data samples with complete head features. The method comprises the following steps: acquiring portrait normal maps, mask maps and line maps from 3D portrait sculptures, and constructing and training a network model; for a reference image, extracting a feature-accurate line drawing and a hair line drawing, extracting a mask map through the MODNet network, merging the hair line drawing with the feature-accurate line drawing into a final line drawing, feeding the mask map and the final line drawing into the trained network model, and outputting an overall portrait normal map; for the reference image, obtaining a face normal map with fine geometric details through a ResUnet network, and fusing the overall portrait normal map with the face normal map; migrating texture normals onto the fused overall portrait normal map; and performing relief depth reconstruction to obtain the portrait relief model.
Description
Technical Field
The invention relates to the technical field of portrait relief models, in particular to a portrait relief data set construction method.
Background
Portrait relief is a stylized form of sculpture that is widely used in seals, commemorative coins, architecture, artware and the like. Traditional manual engraving and software-based modeling of portrait reliefs require professional skills and are time-consuming and labor-intensive. With the development of artificial intelligence, end-to-end modeling of a portrait relief from a single picture has become possible, but supervised training of a deep neural network depends critically on the construction of a data set. At present, neither academia nor industry has a portrait relief data set with a sufficient number of samples.
Based on the above analysis, how to construct high-quality portrait relief data samples with complete head features is a technical problem to be solved.
Disclosure of Invention
To address these shortcomings, the invention provides a portrait relief data set construction method that solves the problem of constructing high-quality portrait relief data samples with complete head features.
The portrait relief data set construction method of the invention comprises the following steps:
acquiring portrait normal maps, mask maps and line maps from 3D portrait sculptures, and constructing and training a network model with the mask maps and line maps as input and the portrait normal maps as output, wherein the network model has an encoding-decoding structure;
acquiring a portrait image as a reference image;
for the reference image, applying filtering to extract a feature-accurate line drawing, applying bilateral filtering and edge-extraction filtering to extract a hair line drawing, extracting a mask map through the MODNet network, merging the hair line drawing with the feature-accurate line drawing to obtain a final line drawing, feeding the mask map and the final line drawing into the trained network model, and outputting an overall portrait normal map;
for the reference image, mapping the portrait image to a face normal map through a ResUnet network to obtain a face normal map with fine geometric details, and fusing the overall portrait normal map with the face normal map to obtain a fused overall portrait normal map;
for the reference image, computing a texture normal for each pixel and migrating the texture normals onto the fused overall portrait normal map by vector rotation to obtain a final portrait normal map;
and performing relief depth reconstruction on the final portrait normal map to obtain the portrait relief model.
Preferably, obtaining the portrait normal maps, mask maps and line maps from the 3D portrait sculptures comprises the following steps:
acquiring a number of 3D portrait sculptures with different identities, hairstyles and expressions;
for each 3D portrait sculpture, performing multi-angle sampling;
and generating a portrait normal map, a mask map and a line map for each sampling angle, wherein the line map is an Apparent Ridges line drawing.
Preferably, when training the network model with the mask maps and line maps as input and the portrait normal maps as output, training samples are constructed from the mask maps, line maps and portrait normal maps, and the loss function is defined as the average included angle between the training-sample vertex normals and the network-predicted vertex normals:
L = (1/M) Σ_{i=1..M} arccos(N_i · N_i')
wherein N_i denotes a training-sample vertex normal, N_i' denotes the corresponding network-predicted vertex normal, and M denotes the number of vertex normals.
Preferably, for the reference image, filtering is performed through an ETF-flow filtering framework to extract a feature-accurate line drawing, comprising the following steps:
denoising the single RGB reference image to obtain a denoised reference image;
performing edge tangent flow processing on the denoised reference image through the ETF-flow filtering framework to obtain the feature-accurate line drawing;
the ETF-flow filtering framework comprises an FDoG filter and an FBL filter, wherein the FDoG filter is used for drawing lines and the FBL filter is used for region smoothing of the lines.
Preferably, extracting the hair line drawing from the reference image through bilateral filtering and edge-extraction filtering comprises the following steps:
denoising the single RGB reference image to obtain a denoised reference image;
converting the denoised reference image into the Lab color space to obtain a color-quantized reference image;
applying bilateral filtering to the color-quantized reference image in the gradient and tangential directions to obtain a bilaterally filtered reference image, extracting edges from the bilaterally filtered reference image with a separable FDoG filter, and superimposing the extracted edges onto the color-quantized reference image;
and for the color-quantized reference image, filtering in the gradient direction with the DoG filter of the ETF flow and smoothing along the flow field derived from the smoothed structure tensor, creating smooth, coherent straight and curved line segments and yielding the hair line drawing.
Preferably, fusing the overall portrait normal map with the face normal map comprises the following steps:
taking the vertex normal-vector differences between the overall portrait normal map and the face normal map as boundary conditions, and estimating the normal-vector differences ΔN of all portrait vertices through a first equation:
L · ΔN = 0
wherein L is the Laplace-Beltrami matrix, ΔN ∈ R^(n×3) is the matrix of normal-vector differences, and n is the number of vertices;
and adding the normal-vector differences to the overall portrait normal map to obtain the fused overall portrait normal map.
Preferably, for each pixel in the reference image, the texture normal is defined in terms of the texture gradient components g_x and g_y, with a parameter f used to control the texture detail strength.
Preferably, migrating the texture normals onto the fused overall portrait normal map by vector rotation comprises the following steps:
letting the texture normal vector be n_d and the overall normal vector be n_b, where the angle between the texture normal vector and the z-axis n_z = [0, 0, 1] is θ_d and the angle between the overall normal vector and the z-axis is θ_b;
and taking the vector n_z × n_b as the rotation axis, rotating n_b by the angle max(θ_d, 90° − θ_b) to obtain the new pixel normal.
Preferably, the relief depth of the final portrait normal map is reconstructed by minimizing an energy equation to obtain the final portrait relief model, wherein the first term of the energy constrains the predicted relief depth H toward the target depth h, and the second term drives the predicted depth gradient toward the known gradient G.
Preferably, minimizing the energy equation is equivalent to solving:
ΔH + μ·H = div G + μ·h
where ΔH denotes the Laplacian of the depth;
the gradient G = (G_x, G_y) = (−N_x/N_z, −N_y/N_z), where N_x, N_y and N_z are the three components of the vertex normal;
and the parameter μ balances the two energy terms.
The portrait relief data set construction method has the following advantages:
1. a network model is trained on line maps, mask maps and portrait normal maps obtained from 3D portrait sculptures; the line map and mask map of a portrait image are taken as input, the overall portrait normal map is predicted by the trained network model, a face normal map with fine geometric details is predicted by a ResUnet network, the overall portrait normal map and face normal map are fused, texture detail enhancement is applied to the fused map to obtain the final portrait normal map, and relief depth reconstruction on the final portrait normal map yields the portrait relief model; the resulting data set is large in sample number and wide in coverage;
2. the portrait normal map is obtained through normal-map fusion and texture detail enhancement and has complete, high-quality head features; the reconstructed portrait relief retains the detail and texture features of the normal map while exhibiting reasonable depth layering.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow diagram of the portrait relief dataset construction method of embodiment 1;
FIG. 2 is a schematic structural diagram of a 3D portrait sculpture model in the portrait relief dataset construction method of embodiment 1;
FIG. 3 is a schematic structural diagram of a network model in the portrait relief dataset construction method according to embodiment 1;
FIG. 4 shows a line drawing, a predicted normal map and a ground-truth normal map of a portrait sculpture in the portrait relief dataset construction method of embodiment 1;
FIG. 5 shows normal-map prediction from line drawings in the portrait relief dataset construction method of embodiment 1;
wherein (a) is the reference image, (b) the portrait line drawing extracted from the reference image, and (c) the portrait normal map predicted by the network;
FIG. 6 shows normal-enhanced portrait normal-map prediction in the portrait relief dataset construction method of embodiment 1;
wherein (a) is the reference image, (b) the portrait normal map generated from the line drawing, (c) the face normal map generated by document [7], (d) the portrait normal map after face fusion, and (e) the portrait normal map after texture detail enhancement;
FIG. 7 shows a portrait relief model sample and its reference image reconstructed by the portrait relief dataset construction method in embodiment 1.
Detailed Description
The present invention is further described below with reference to the drawings and specific embodiments so that those skilled in the art can better understand and implement it. The embodiments, however, are not to be construed as limiting the invention, and the embodiments and their technical features may be combined with each other where no conflict arises.
The embodiment of the invention provides a portrait relief data set construction method for solving the technical problem of how to construct high-quality portrait relief data samples with complete head features.
Example:
the invention discloses a portrait relief data set construction method, which comprises the following steps:
s100, acquiring a portrait method picture, a mask picture and a line picture based on the 3D portrait sculpture, constructing and training a network model by taking the mask picture and the line picture as input and the portrait method picture as output, wherein the network model is a network model with a coding-decoding structure;
s200, acquiring a portrait image as a reference image;
s300, filtering a reference image, extracting a line drawing with accurately positioned features, carrying out bilateral filtering, filtering to extract edges, extracting a hair line drawing, extracting a mask drawing through an MODNet network, combining the hair line drawing and the line drawing with accurately positioned features to obtain a final line drawing, inputting the mask drawing and the final line drawing into the trained network model, and outputting an integral portrait drawing;
s400, mapping the portrait image to a face normal image through a ResUnet network for the reference image to obtain the face normal image with fine geometric details, and fusing the whole portrait image and the face normal image to obtain the fused whole portrait image;
s500, for the reference image, obtaining a texture normal of each pixel, and migrating the texture normal to the fused integral portrait image through a vector rotation method to obtain a final portrait image;
s600, carrying out relief depth reconstruction on the final portrait map to obtain a portrait relief model.
In step S100, portrait normal maps, mask maps and line maps are obtained from the 3D portrait sculptures as follows: acquire a number of 3D portrait sculptures with different identities, hairstyles and expressions; for each sculpture, perform multi-angle sampling and generate a portrait normal map, a mask map and a line map for each sampling angle, wherein the line map is an Apparent Ridges line drawing.
To construct sample data for network training, this embodiment collected 77 high-quality 3D portrait sculptures with various identities, hairstyles and expressions, as shown in fig. 1. Each 3D portrait sculpture is sampled at 5° intervals over a left-right swing range of −80° to 80° and an up-down swing range of −5° to 20°, giving 198 views per sculpture. For each sampling direction, a portrait normal map and a mask map are generated, along with an Apparent Ridges line map, as shown in fig. 2. The 77 portrait sculptures yield 15246 portrait normal maps, mask maps and line maps at a resolution of 360×360, of which 13000 samples are used for training and 2246 for testing.
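The sampling counts stated above can be checked with a short script (a sketch; the 5° step and the two angle ranges are exactly as given in this embodiment):

```python
# Enumerate the multi-angle sampling grid described in this embodiment:
# left-right swing from -80 to 80 degrees and up-down swing from -5 to 20
# degrees, both at 5-degree intervals.
yaw_angles = range(-80, 81, 5)    # 33 left-right swing angles
pitch_angles = range(-5, 21, 5)   # 6 up-down swing angles

views_per_sculpture = len(yaw_angles) * len(pitch_angles)
total_samples = 77 * views_per_sculpture  # 77 sculptures as collected above

print(views_per_sculpture)  # 198
print(total_samples)        # 15246
```

The product 33 × 6 = 198 views per sculpture and 77 × 198 = 15246 total triples matches the numbers reported in the text.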
The portrait normal maps are generated as follows: in a three-dimensional browsing environment, each time the model is rotated to a sampling angle, the normal components (nx, ny, nz) of the model vertices are converted into the three RGB channels. The background of the three-dimensional environment is black and the model vertices are set to white, so a black-and-white mask image is generated automatically.
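The normal-to-RGB conversion above can be sketched as follows. The patent only states that the three normal components become the three RGB channels; the linear [−1, 1] to [0, 255] mapping used here is an assumed (but common) convention:

```python
import numpy as np

def normals_to_rgb(normals: np.ndarray) -> np.ndarray:
    """Map unit vertex normals (nx, ny, nz) in [-1, 1] to RGB in [0, 255].

    The linear [-1, 1] -> [0, 255] mapping is an assumed convention; the
    patent only specifies that the three normal components are converted
    into the three RGB channels.
    """
    rgb = (normals + 1.0) * 0.5 * 255.0
    return np.clip(np.round(rgb), 0, 255).astype(np.uint8)

# A normal pointing straight at the viewer maps to the familiar bluish
# (128, 128, 255) color of flat regions in a normal map.
front = normals_to_rgb(np.array([[0.0, 0.0, 1.0]]))
```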
The Apparent Ridges line map is generated by capturing view-dependent lines on the three-dimensional object: for a local triangular patch of the object surface, a line is drawn inside it wherever the normal changes at a locally maximal rate with respect to the viewpoint position; such lines are called "Apparent Ridges".
In step S100 a network model is trained on the data derived from the 3D models. The network model has an encoding-decoding structure and realizes the prediction from line maps to portrait normal maps, as shown in fig. 2. The two-channel mask map and line map are taken as input, and the three-channel portrait normal map is output. During training, in order to make the predicted normals retain more geometric detail, training samples are constructed from the mask maps, line maps and portrait normal maps, and the loss function is defined as the average included angle between the training-sample vertex normals and the network-predicted vertex normals:
L = (1/M) Σ_{i=1..M} arccos(N_i · N_i')
wherein N_i denotes a training-sample vertex normal, N_i' denotes the corresponding network-predicted vertex normal, and M denotes the number of vertex normals.
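The average-included-angle loss described above can be sketched as follows (a sketch; normals are assumed to be unit-length and stored as an M×3 array):

```python
import numpy as np

def mean_angle_loss(n_true: np.ndarray, n_pred: np.ndarray) -> float:
    """Average included angle (in radians) between ground-truth and
    predicted unit vertex normals, as the loss above describes."""
    # Clip the dot products to guard arccos against rounding just outside [-1, 1].
    dots = np.clip(np.sum(n_true * n_pred, axis=1), -1.0, 1.0)
    return float(np.mean(np.arccos(dots)))

# Identical normals give zero loss; one matching and one orthogonal
# pair average to pi/4.
a = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
loss_same = mean_angle_loss(a, a)
loss_mixed = mean_angle_loss(a, b)
```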
After training, the prediction quality is evaluated on the 2246 test samples using the average angular error and the percentages of pixels with angular error below 20°, 25° and 30°; the results are 11.92° and 84.86%, 91.11% and 94.67%, respectively. The comparison between network-predicted normal maps and ground-truth normal maps in the data set is shown in fig. 3.
In step S300, a portrait normal map is predicted through the trained network model; a line drawing of the reference portrait must be extracted first.
First, for the reference image, filtering is performed through the ETF-flow filtering framework to extract a feature-accurate line drawing. The steps are:
(1) denoising the single RGB reference image to obtain a denoised reference image;
(2) performing edge tangent flow processing on the denoised reference image through the ETF-flow filtering framework to obtain the feature-accurate line drawing.
The ETF-flow filtering framework includes an FDoG filter for line drawing and an FBL filter for region smoothing of the lines.
This method generates a line drawing with accurately positioned features: the single RGB picture is denoised, then an edge tangent flow (ETF) is computed to establish a smooth, coherent edge flow field aligned with the features. The ETF is essentially a bilateral filter for vector data: it uses nonlinear smoothing of vectors to preserve salient edges at each filter's center pixel, while allowing weak edges to be redirected to follow neighboring dominant edges. The salient shape boundaries in the scene are then captured with the ETF-flow-based filtering framework and rendered as a set of smooth, coherent lines. This framework includes an FDoG filter for line drawing and an FBL filter for region smoothing. The FDoG filter is based on the DoG edge model and improves line quality by guiding the DoG filter along the ETF flow curves: as the filter moves along an edge flow, a linear DoG filter is applied in the gradient direction, and the filter responses are accumulated along the flow. The FBL filter is an ETF-based linear bilateral filter whose purpose is region smoothing of the lines, removing insignificant detail inside regions while preserving salient shapes. The key is smoothing between similar colors using two weight parameters, one in the spatial domain and one in the color domain.
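As a simplified stand-in for the FDoG step, the following sketch applies an isotropic difference-of-Gaussians. This captures the DoG edge model the paragraph describes, but not the flow guidance along the ETF field, which the patent's filter adds; sigma, k and tau are illustrative parameters, not values from the patent:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_edges(gray: np.ndarray, sigma: float = 1.0, k: float = 1.6,
              tau: float = 0.0) -> np.ndarray:
    """Difference-of-Gaussians edge response on a grayscale image.

    The response is positive on the brighter side of an intensity step
    and is thresholded at tau into a binary line map. An FDoG filter
    would instead apply the 1-D DoG in the gradient direction and
    accumulate responses along the ETF flow.
    """
    dog = gaussian_filter(gray, sigma) - gaussian_filter(gray, k * sigma)
    return (dog > tau).astype(np.uint8)

# A vertical step edge produces a band of responses along the transition.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
edges = dog_edges(img, tau=0.01)
```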
Then, bilateral filtering and edge-extraction filtering are applied to the reference image to extract the hair line drawing. The steps are:
(1) denoising the single RGB reference image to obtain a denoised reference image;
(2) converting the denoised reference image into the Lab color space to obtain a color-quantized reference image;
(3) applying bilateral filtering to the color-quantized reference image in the gradient and tangential directions to obtain a bilaterally filtered reference image, extracting edges from it with a separable FDoG filter, and superimposing the extracted edges onto the color-quantized reference image;
(4) for the color-quantized reference image, filtering in the gradient direction with the DoG filter of the ETF flow and smoothing along the flow field derived from the smoothed structure tensor, creating smooth, coherent straight and curved line segments and yielding the hair line drawing.
Concretely, the smoothed structure tensor of the denoised picture is first computed in RGB color space to estimate the local line directions, and the input picture is converted into the Lab color space to avoid noise at abrupt changes of color and brightness. A color-simplified image is then obtained by separable, orientation-aligned bilateral filtering: the bilateral filter is approximated by separation, filtering first in the gradient direction and then in the tangential direction, smoothing the image while preserving edges. The bilateral-filtering result is filtered with a separable FDoG filter to extract edges, which are superimposed on the color-quantized output. To extract salient edges, an ETF-flow-based DoG filter is first applied in the gradient direction, and smoothing is then applied along the flow field derived from the smoothed structure tensor, creating smooth, coherent straight and curved line segments.
For the reference image, the photo background is separated from the foreground portrait when generating the portrait mask. Using the MODNet network, the trimap-free portrait matting task is divided into three collaboratively trained subtasks: semantic estimation, detail prediction, and semantic-detail fusion. Low-resolution semantic estimation captures the portrait subject and outputs a rough foreground mask; high-resolution detail prediction extracts portrait edge details to obtain a fine foreground boundary; finally, semantic-detail fusion combines the features of the first two branches to produce the final portrait segmentation result.
Owing to the sparsity of the feature lines and the loss of illumination information, the geometric details and texture features of the face in the predicted normal map are not optimal, so the quality of the portrait normal map is further improved in two steps: (1) face normal enhancement; (2) texture detail enhancement for the eyes, beard and body parts.
First, step S400 performs normal enhancement, using a deep-learning technique to estimate the face normals from a single color image and generate a portrait face normal map with plausible depth and geometric texture details. The method adopts a learning framework that combines the robustness of cross-modal learning with the detail-transfer capability of skip connections, using a ResUnet network in an end-to-end training framework to learn the mapping from color images to face normals. The cross-modal module uses two encoder-decoder networks with a shared latent space; by adjusting the skip connections of the image-encoder-to-color-image-decoder and normal-encoder-to-normal-decoder paths during training, picture-to-normal conversion can be trained with both paired and unpaired picture and normal data.
An overall portrait normal map adapted to different illumination conditions is generated through the ResUnet network, as shown in fig. 6c. To achieve seamless fusion of the face normal map, the vertex normal-vector differences between the overall portrait normal map and the face normal map are used as boundary conditions, and the normal-vector differences ΔN of all portrait vertices are estimated through a first equation:
L · ΔN = 0
wherein L is the Laplace-Beltrami matrix, ΔN ∈ R^(n×3) is the matrix of normal-vector differences, and n is the number of vertices. The normal-vector differences are added to the overall portrait normal map to obtain the fused overall portrait normal map. As shown in fig. 6d, the quality of the fused face normal map is greatly improved.
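On a regular pixel grid, the harmonic interpolation above reduces to a discrete Laplace equation. The following sketch solves it by Jacobi iteration, with the known boundary differences as Dirichlet conditions; the grid Laplacian here is a stand-in for the Laplace-Beltrami operator on the mesh that the patent uses:

```python
import numpy as np

def diffuse_difference(delta_boundary: np.ndarray, mask: np.ndarray,
                       iters: int = 2000) -> np.ndarray:
    """Solve the discrete Laplace equation L . dN = 0 by Jacobi iteration.

    delta_boundary holds the known normal-vector differences at boundary
    pixels (mask == True); the interior (mask == False) is filled with a
    harmonic interpolation of those values, which would then be added to
    the overall normal map. One scalar channel is shown for simplicity.
    """
    d = np.where(mask, delta_boundary, 0.0)
    for _ in range(iters):
        avg = 0.25 * (np.roll(d, 1, 0) + np.roll(d, -1, 0) +
                      np.roll(d, 1, 1) + np.roll(d, -1, 1))
        d = np.where(mask, delta_boundary, avg)  # keep boundary values fixed
    return d

# Toy example: a strip whose left edge is fixed at 0 and right edge at 1;
# the harmonic solution between them is a linear ramp.
mask = np.zeros((3, 9), dtype=bool)
mask[:, 0] = True
mask[:, -1] = True
bc = np.zeros((3, 9))
bc[:, -1] = 1.0
d = diffuse_difference(bc, mask)
```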
Step S500 is then performed to add normal texture detail that enhances the sharpness of the eyes, beard and body parts. For each pixel of these parts in the reference image, its texture normal is defined in terms of the texture gradient components g_x and g_y, with a parameter f used to control the texture detail strength.
On this basis, the texture normals are migrated onto the fused overall portrait normal map by vector rotation as follows: let the texture normal vector be n_d and the overall normal vector be n_b; the angle between the texture normal vector and the z-axis n_z = [0, 0, 1] is θ_d, and the angle between the overall normal vector and the z-axis is θ_b. Taking the vector n_z × n_b as the rotation axis, n_b is rotated by the angle max(θ_d, 90° − θ_b) to obtain the new pixel normal.
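The rotation step above can be sketched with Rodrigues' formula. The helper names `rotate_about_axis` and `new_pixel_normal` are illustrative, and n_d, n_b are assumed to be unit vectors; the axis and angle are exactly those in the description:

```python
import numpy as np

def rotate_about_axis(v, axis, angle_deg):
    """Rodrigues' rotation of vector v about a unit axis by angle_deg degrees."""
    axis = axis / np.linalg.norm(axis)
    t = np.radians(angle_deg)
    return (v * np.cos(t) + np.cross(axis, v) * np.sin(t)
            + axis * np.dot(axis, v) * (1.0 - np.cos(t)))

def new_pixel_normal(n_d, n_b):
    """Rotate the overall normal n_b about the axis n_z x n_b by the angle
    max(theta_d, 90 - theta_b), as in the description (angles in degrees)."""
    n_z = np.array([0.0, 0.0, 1.0])
    theta_d = np.degrees(np.arccos(np.clip(np.dot(n_d, n_z), -1.0, 1.0)))
    theta_b = np.degrees(np.arccos(np.clip(np.dot(n_b, n_z), -1.0, 1.0)))
    axis = np.cross(n_z, n_b)
    if np.linalg.norm(axis) < 1e-12:
        return n_b  # n_b parallel to the z-axis: the rotation axis degenerates
    return rotate_about_axis(n_b, axis, max(theta_d, 90.0 - theta_b))

# Example: n_b tilted 45 degrees from z, texture normal n_d at 60 degrees,
# so the rotation angle is max(60, 90 - 45) = 60 degrees.
n_b = np.array([np.sin(np.pi / 4), 0.0, np.cos(np.pi / 4)])
n_d = np.array([np.sin(np.pi / 3), 0.0, np.cos(np.pi / 3)])
n_new = new_pixel_normal(n_d, n_b)
```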
By this operation, the texture details of the eyes, beard and body parts are further enhanced, as shown in fig. 6e.
After the final portrait normal map is obtained, step S600 performs depth reconstruction on it.
This embodiment reconstructs the relief depth from the portrait normal map by minimizing an energy equation whose first term constrains the predicted relief depth H toward the target depth h and whose second term drives the predicted depth gradient toward the known gradient G.
Minimizing this energy is equivalent to solving:
ΔH + μ·H = div G + μ·h
where ΔH denotes the Laplacian of the depth;
the gradient G = (G_x, G_y) = (−N_x/N_z, −N_y/N_z), where N_x, N_y and N_z are the three components of the vertex normal;
and the parameter μ balances the two energy terms; by default, μ = 0.01 and h = 0.03.
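The depth-reconstruction step can be sketched as a Jacobi solver for the discrete Euler-Lagrange equations of the energy above (equivalently, the screened-Poisson equation, up to the sign convention of the Laplacian). Unit grid spacing and periodic borders via `np.roll` are simplifying assumptions; the patent does not fix a discretization:

```python
import numpy as np

def reconstruct_depth(gx, gy, h, mu=0.01, iters=2000):
    """Jacobi solver for the discrete Euler-Lagrange equations of
    E(H) = |grad H - G|^2 + mu * |H - h|^2 with forward-difference
    gradients. Periodic borders and unit spacing are assumptions."""
    # Backward-difference divergence of G, matching forward-difference gradients.
    div_g = (gx - np.roll(gx, 1, 1)) + (gy - np.roll(gy, 1, 0))
    H = np.zeros_like(h)
    for _ in range(iters):
        neigh = (np.roll(H, 1, 0) + np.roll(H, -1, 0) +
                 np.roll(H, 1, 1) + np.roll(H, -1, 1))
        H = (neigh - div_g + mu * h) / (4.0 + mu)
    return H

# Sanity check: forward-difference gradients of a known periodic height
# field, with that field as the target depth h, recover it exactly.
n = 8
H_true = np.sin(2 * np.pi * np.arange(n) / n)[None, :].repeat(n, axis=0)
gx = np.roll(H_true, -1, 1) - H_true
gy = np.roll(H_true, -1, 0) - H_true
H = reconstruct_depth(gx, gy, H_true)
```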
The reconstructed portrait relief model sample and its reference image are shown in fig. 7.
In this portrait relief model construction method, a network model is trained on line maps, mask maps and portrait normal maps obtained from the 3D models; the line map and mask map of the portrait image are taken as input, the overall portrait normal map is predicted by the trained network model, a face normal map with fine geometric details is predicted by the ResUnet network, the overall portrait normal map and face normal map are fused, texture detail enhancement is applied to the fused map to obtain the final portrait normal map, and relief depth reconstruction on the final portrait normal map yields the portrait relief model.
While the invention has been shown and described in detail in the drawings and the preferred embodiments, it is not limited to the disclosed embodiments; those skilled in the art can derive further embodiments by combining features of the various embodiments described above, and these also fall within the scope of the invention.
Claims (10)
1. The portrait relief data set construction method is characterized by comprising the following steps:
acquiring a portrait normal map, a mask map and a line drawing based on a 3D portrait sculpture, and constructing and training a network model taking the mask map and the line drawing as input and the portrait normal map as output, wherein the network model has an encoder-decoder structure;
acquiring a portrait image as a reference image;
for the reference image, performing flow-based filtering to extract a line drawing with accurately positioned features; performing bilateral filtering and DoG filtering to extract edges and obtain a hair line drawing; extracting a mask map through the MODNet network; merging the hair line drawing with the feature line drawing to obtain the final line drawing; and inputting the mask map and the final line drawing into the trained network model to output the whole portrait normal map;
for the reference image, mapping the portrait image to a face normal map through a ResUnet network to obtain a face normal map with fine geometric details, and fusing the whole portrait normal map with the face normal map to obtain a fused whole portrait normal map;
for the reference image, acquiring the texture normal of each pixel, and transferring the texture normals to the fused whole portrait normal map by a vector rotation method to obtain the final portrait normal map;
and performing relief depth reconstruction on the final portrait normal map to obtain the portrait relief model.
2. The portrait relief data set construction method according to claim 1, characterized in that the portrait normal map, the mask map and the line drawing are obtained based on a 3D portrait sculpture, comprising the steps of:
acquiring a plurality of 3D portrait sculptures with different identities, hairstyles and expressions;
for each 3D portrait sculpture, performing multi-angle sampling;
and generating a portrait normal map, a mask map and a line drawing for each sampling angle, wherein the line drawing is an Apparent Ridges line drawing.
3. The portrait relief data set construction method according to claim 1, wherein, when training the network model with the mask map and the line drawing as inputs and the portrait normal map as output, training samples are constructed from the mask map, the line drawing and the normal map, and the loss function is defined as the average included angle between the training-sample vertex normals and the network-predicted vertex normals:

Loss = (1/M) · Σ_{i=1..M} arccos(N_i · N'_i)

where N_i denotes a training-sample vertex normal, N'_i the corresponding network-predicted vertex normal, and M the number of vertex normals.
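An average-included-angle loss of this kind can be sketched as follows (an illustrative helper, not the patent's implementation; it assumes row-wise normals of shape (M, 3)):

```python
import numpy as np

def normal_angle_loss(N, N_pred):
    """Mean included angle (radians) between ground-truth normals N
    and predicted normals N_pred, both arrays of shape (M, 3)."""
    N = N / np.linalg.norm(N, axis=1, keepdims=True)
    N_pred = N_pred / np.linalg.norm(N_pred, axis=1, keepdims=True)
    # clip the dot product to keep arccos numerically safe
    cos = np.clip(np.sum(N * N_pred, axis=1), -1.0, 1.0)
    return float(np.mean(np.arccos(cos)))
```

Identical normals give a loss of 0, and everywhere-orthogonal predictions give π/2.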
4. The portrait relief data set construction method according to claim 1, wherein, for the reference image, filtering is performed through an ETF-flow filtering framework to extract the line drawing with accurately positioned features, comprising the steps of:
carrying out RGB image denoising on a single reference image to obtain a denoised reference image;
performing edge-tangent-flow processing on the denoised reference image through the ETF-flow filtering framework to obtain the line drawing with accurately positioned features;
wherein the ETF-flow filtering framework comprises an FDoG filter and an FBL filter, the FDoG filter being used to draw the lines and the FBL filter performing region smoothing on them.
5. The portrait relief data set construction method according to claim 1, wherein, for the reference image, bilateral filtering and DoG filtering are performed to extract edges and obtain the hair line drawing, comprising the steps of:
carrying out RGB image denoising on a single reference image to obtain a denoised reference image;
converting the denoised reference image into the Lab color space and quantizing its colors to obtain a color-quantized reference image;
performing bilateral filtering on the color-quantized reference image in the gradient and tangential directions to obtain a bilaterally filtered reference image; filtering the bilaterally filtered reference image with a separable FDoG filter to extract edges; and superimposing the extracted edges onto the color-quantized reference image;
for the color-quantized reference image, filtering in the gradient direction with the DoG filter of the ETF flow and smoothing along the flow field derived from the smoothed structure tensor, creating smooth and coherent straight and curved line segments to obtain the hair line drawing.
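The DoG step can be illustrated with an axis-aligned simplification (a hypothetical sketch: the actual FDoG filters along the ETF flow field rather than the image axes, and the thresholding constants here are assumptions):

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of half-width `radius`."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def dog_edges(img, sigma=1.0, k=1.6, tau=0.98, eps=0.0):
    """Difference-of-Gaussians edge response with thresholding:
    a line is drawn where G_sigma*img - tau*G_{k*sigma}*img < eps."""
    def blur(im, s):
        ker = gaussian_kernel(s, int(3 * s + 1))
        tmp = np.apply_along_axis(lambda r: np.convolve(r, ker, 'same'), 1, im)
        return np.apply_along_axis(lambda c: np.convolve(c, ker, 'same'), 0, tmp)
    d = blur(img, sigma) - tau * blur(img, k * sigma)
    return (d < eps).astype(np.uint8)   # 1 where an edge line is drawn
```

A flat image produces no edge response, while intensity steps yield connected line segments along the step.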
6. The portrait relief data set construction method according to claim 1, wherein the whole portrait normal map and the face normal map are fused, comprising the steps of:
taking the vertex-normal differences between the whole portrait normal map and the face normal map as a boundary condition, the normal-vector differences ΔN of all portrait vertices are estimated by the first equation:

L · ΔN = 0

where L is the Laplace-Beltrami matrix and ΔN ∈ R^(n×3) is the normal-vector difference matrix, n being the number of vertices;
and adding the normal-vector differences to the whole portrait normal map to obtain the fused whole portrait normal map.
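Solving L · ΔN = 0 with fixed boundary values is a harmonic interpolation; as a sketch, the mesh Laplacian can be stood in for by a grid Laplacian and relaxed iteratively (a hypothetical helper; `np.roll` wraps at the image border, which is acceptable only as long as unknown pixels stay away from it):

```python
import numpy as np

def diffuse_normal_offsets(delta_known, mask, iters=500):
    """Relax  L * dN = 0  inside `mask` (True = unknown), holding dN
    fixed elsewhere. delta_known: (rows, cols, 3) normal differences."""
    dN = delta_known.copy()
    for _ in range(iters):
        # 4-neighbour average (grid Laplacian stencil)
        avg = (np.roll(dN, 1, 0) + np.roll(dN, -1, 0)
               + np.roll(dN, 1, 1) + np.roll(dN, -1, 1)) / 4.0
        dN[mask] = avg[mask]    # update only the unknown entries
    return dN
```

When the known differences are all equal, the interpolated interior converges to the same value, as a harmonic function must.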
8. The portrait relief data set construction method according to claim 7, characterized in that transferring the texture normals to the fused whole portrait normal map by the vector rotation method comprises the following steps:
let the texture normal vector be n_d and the global normal vector be n_b; the included angle between the texture normal vector and the z-axis n_z = [0, 0, 1] is θ_d, and the included angle between the global normal vector and the z-axis is θ_b;
taking the vector n_z × n_b as the rotation axis, rotate n_b by the angle max(θ_d, 90° − θ_b) to obtain the new pixel normal.
9. The portrait relief data set construction method according to claim 1, characterized in that the relief depth of the final portrait normal map is reconstructed by minimizing an energy equation to obtain the portrait relief model, the energy being expressed as:

min over H of  ||∇H − G||² + μ·||H − h||²

wherein the first term drives the predicted depth gradient ∇H as close as possible to the known gradient G, and the second term constrains the predicted relief depth H to stay close to the target depth h.
10. The portrait relief data set construction method according to claim 9, wherein minimizing the energy equation is equivalent to solving the equation:
ΔH+μ·H=div G+μ·h
where ΔH denotes the Laplacian of the depth H;
the gradient G = (G_x, G_y) = (−N_x/N_z, −N_y/N_z), where N_x, N_y and N_z are the three components of the vertex normal;
the parameter μ is used to balance the two energy terms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111167113.7A CN114155358B (en) | 2021-10-02 | 2021-10-02 | Portrait relief data set construction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114155358A true CN114155358A (en) | 2022-03-08 |
CN114155358B CN114155358B (en) | 2024-02-20 |
Family
ID=80462687
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116524132A (en) * | 2023-05-08 | 2023-08-01 | 齐鲁工业大学(山东省科学院) | Modeling method and system for generating human body relief from single human body image |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101301161A (en) * | 2008-06-13 | 2008-11-12 | 贾凤忠 | Commemorate article with portrait and manufacture method thereof |
US20100274375A1 (en) * | 2007-02-21 | 2010-10-28 | Team-At-Work, Inc. | Method and system for making reliefs and sculptures |
CN110751665A (en) * | 2019-10-23 | 2020-02-04 | 齐鲁工业大学 | Method and system for reconstructing 3D portrait model by portrait embossment |
Non-Patent Citations (1)
Title |
---|
ZHANG Luosheng; TONG Jing: "Real-time interactive construction method for 3D models with relief texture", Journal of Computer Applications, No. 08, 10 August 2017 (2017-08-10) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||