CN114359452A - Three-dimensional model texture synthesis method based on semantic image translation

Three-dimensional model texture synthesis method based on semantic image translation

Info

Publication number
CN114359452A
CN114359452A (application CN202111514168.0A)
Authority
CN
China
Prior art keywords: image, information, semantic, triangle, rendering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111514168.0A
Other languages
Chinese (zh)
Other versions
CN114359452B (en)
Inventor
阮系标
宋海川
马利庄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN202111514168.0A priority Critical patent/CN114359452B/en
Publication of CN114359452A publication Critical patent/CN114359452A/en
Application granted granted Critical
Publication of CN114359452B publication Critical patent/CN114359452B/en
Status: Active

Landscapes

  • Image Generation (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a three-dimensional model texture synthesis method based on semantic image translation. The method first rasterizes a pure geometric model to render its contours from six views. A user then fills semantic labels into the six rendered images through human-computer interaction, with different colors representing different semantic labels; the six label-filled images are fed as input to an adversarial neural network, which synthesizes six target images in a consistent style. Finally, all synthesized target images are mapped onto one texture map, yielding a complete, usable texture image. The method addresses the time-consuming drawing involved in producing three-dimensional model textures: it reduces the drawing of texture details to the filling of semantic labels and uses an adversarial neural network to synthesize texture details in a fixed style, lowering the workload of texture production.

Description

Three-dimensional model texture synthesis method based on semantic image translation
Technical Field
The invention relates to the technical field of image synthesis and texture mapping, and in particular to a three-dimensional model texture synthesis method based on semantic image translation.
Background
Texture mapping of a three-dimensional model is carried out by artists, who must uv-unwrap a pure geometric model to flatten it into a two-dimensional plane and then paint materials and other details on that plane. This drawing workflow is tedious. With the development of the game industry, new games appear endlessly, and characters and scene objects all require different model maps; the speed at which artists can produce game model maps struggles to keep up with such a fast-growing industry.
Previous research on generating model maps uses existing image resources from the internet to transfer textures from 2D images to 3D models. However, these methods either require multi-view images of the same object as source data or require high similarity between the 2D image and the 3D model, and such images are hard to obtain, so these methods are difficult to apply to practical texture generation.
Disclosure of Invention
The invention aims to provide a three-dimensional model texture synthesis method based on semantic image translation, which synthesizes target images from filled semantic label images with an adversarial neural network, controls the style of the synthesized images with a style transfer method, and maps the multiple synthesized images into one texture map using the barycentric coordinates of triangles.
The specific technical scheme for realizing the purpose of the invention is as follows:
a three-dimensional model texture synthesis method based on semantic image translation utilizes rasterization to render geometric model outline, semantic labels are filled in the geometric model outline to be added into a countermeasure neural network to realize texture synthesis, and the method comprises the following specific steps:
Step 1: rendering contours by rasterization and retaining triangle information
1.1) importing a pure geometric model into a renderer and selecting a rotation matrix and a model matrix to adjust the model, rendering the contours of the model's front, top, left, right, bottom and back views; pixels on and inside the rendered contours are set to 255 and all other pixels to 0;
1.2) besides the six rendered maps obtained, saving for each rendered map the information of the triangular patches of the geometric model that it renders: a corresponding file is generated for each rendered map, recording the coordinates of each pixel in the rendered map, the vertex coordinates of the geometric-model triangle to which the pixel belongs, and the pixel's barycentric coordinates within that triangle (a sketch of this computation follows step 1.3);
1.3) if the geometric model carries uv coordinate information, retaining that information and generating from it an initial texture map of size 1024×1024 for subsequent coloring; if the geometric model has no uv coordinate information, computing the triangle areas from the triangle vertex information, assigning uv coordinates to each triangle vertex accordingly, and generating an initial texture map of size 1024×1024 for subsequent coloring;
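The per-pixel barycentric coordinates recorded in step 1.2 can be computed in screen space once a pixel is known to lie inside a projected triangle. A minimal sketch (assuming NumPy and already-projected 2D vertices; the function name is illustrative, not part of the patent):

```python
import numpy as np

def barycentric(p, a, b, c):
    """Barycentric coordinates (u, v, w) of 2D point p in triangle (a, b, c).

    u + v + w == 1, and all three lie in [0, 1] exactly when p is inside
    the triangle; (u, v, w) is what step 1.2 stores per rendered pixel.
    """
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return 1.0 - v - w, v, w

# One record per covered pixel: (pixel x, pixel y, triangle id, u, v, w).
a, b, c = np.array([10.0, 10.0]), np.array([50.0, 12.0]), np.array([30.0, 48.0])
u, v, w = barycentric(np.array([30.0, 20.0]), a, b, c)
```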
Step 2: semantic labeling of the rendered views through user interaction
2.1) filling the six rendered views obtained in step 1: painting only the region inside the model contour in each rendered view with different colors, and assigning semantic information to the region painted in each color;
2.2) after filling, a calibration program checks the painting in the overlapping regions of the six views; if a semantic conflict is found, the user is prompted to modify it, and if there is no semantic conflict, the painted rendered views are output;
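One way the consistency check of step 2.2 could work, given the per-pixel triangle ids saved in step 1.2: a triangle visible in several views must carry the same label color everywhere. A sketch (the data layout and function name are assumptions):

```python
from collections import defaultdict

def find_semantic_conflicts(views):
    """views: list of (label_img, tri_img) pairs, one per rendered view.

    label_img is an (H, W, 3) color-label image, tri_img an (H, W) array
    of triangle ids with -1 for background. Returns the ids of triangles
    that received more than one label across the six views.
    """
    labels = defaultdict(set)
    for label_img, tri_img in views:
        h, w = tri_img.shape
        for y in range(h):
            for x in range(w):
                t = tri_img[y, x]
                if t >= 0:
                    labels[t].add(tuple(label_img[y, x]))
    return [t for t, colors in labels.items() if len(colors) > 1]
```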
Step 3: synthesizing target images from the semantically labeled rendered views through an adversarial neural network
3.1) inputting the rendered views with semantic information into an adversarial neural network, which propagates the semantic information through each of its layers and synthesizes target images matching the semantic distribution;
3.2) the adversarial neural network learns the image style with an encoder: before the semantic distribution image is input, an image taken from the internet is designated at random as the style reference image for style transfer, so that a texture image with a style similar to the reference image is synthesized; the encoder reduces the image to 1024 dimensions with a 3-layer convolutional network, then outputs a 512-dimensional latent code as style feature information through 8 fully connected layers, and this code is concatenated with the semantic features at the input layers of the two generators of the adversarial neural network (a sketch of such an encoder follows step 3.3);
3.3) the adversarial neural network synthesizes a high-resolution image with two generators and four discriminators: the low-resolution generator outputs a coarse synthesized image with a U-Net structure; the high-resolution generator extracts semantic image features with a three-layer convolutional structure, combines the output feature information with the coarse-image feature information output by the low-resolution generator, and feeds the result into six convolutional layers, which serve as the final output layers and synthesize the high-resolution image, with an adaptive instance normalization layer added after every two of the first four of these layers; the four discriminator networks adopt a Patch-GAN structure and judge the image at four scales: the original image and its 1/2, 1/4 and 1/8 downsamplings;
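A sketch of a style encoder matching step 3.2 (three convolutional layers down to a 1024-dimensional feature, then eight fully connected layers yielding the 512-dimensional style code); the channel widths, kernel sizes and pooling step are assumptions, since the patent does not fix them:

```python
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """3 conv layers -> 1024-d feature -> 8 FC layers -> 512-d style code."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(2)    # 256 channels x 2 x 2 = 1024 dims
        layers, dims = [], [1024] + [512] * 8  # 8 fully connected layers
        for i in range(8):
            layers += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU(inplace=True)]
        self.fc = nn.Sequential(*layers[:-1])  # no activation on the final code

    def forward(self, img):
        x = self.pool(self.conv(img)).flatten(1)  # (B, 1024)
        return self.fc(x)                         # (B, 512) style code
```

The resulting code is what step 3.2 concatenates with the semantic features at the generator input layers.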
Step 4: mapping multiple synthesized images into one texture map
4.1) with the target images from step 3.1 as source data, coloring the initial texture map generated in step 1.3, using as reference the triangle vertex information and barycentric coordinate information corresponding to each rendered view from step 1.1: a pixel value is read from the source data, the corresponding pixel in the initial texture map is located from the triangle the pixel belongs to and its barycentric coordinates, and the read pixel value is assigned to that pixel of the initial texture map;
4.2) for pixels of the initial texture map that are colored multiple times with contradictory values, first applying a majority voting scheme: if 3 or more colorings agree, that pixel value is adopted; if the repeatedly written pixel values do not agree, a neighborhood blending method is applied, taking a weighted blend of the 8 pixels surrounding the pixel as the final pixel value.
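A sketch of the conflict resolution in step 4.2, assuming each texel has accumulated the candidate colors written by the (up to six) views; the thresholds follow the text, while the equal neighbor weights and border handling are simplifications:

```python
import numpy as np

def resolve_texel(candidates, texture, x, y):
    """candidates: list of RGB tuples written to texel (x, y) by different views.

    Majority vote first: if 3 or more views agree on a color, keep it;
    otherwise blend the 8 surrounding texels (borders ignored for brevity).
    """
    values, counts = np.unique(np.array(candidates), axis=0, return_counts=True)
    if counts.max() >= 3:
        return values[counts.argmax()]
    neighbors = [texture[y + dy, x + dx]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dx, dy) != (0, 0)]
    return np.mean(neighbors, axis=0)
```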
In step 1.2, besides the six rendered maps obtained, the renderer stores for each rendered map the triangle patch information of the geometric model that it renders, namely the triangle vertex coordinates and triangle index information. Because only the triangles visible in each rendered map are rendered, the number of triangles per rendered map is not necessarily equal across maps and is less than the total number of triangle patches in the geometric model.
In step 1.3, for a geometric model without uv coordinate information, the triangle areas are computed from the triangle vertex information, uv coordinates are assigned to each triangle vertex, and an initial texture map is generated for subsequent coloring. The area is computed from the triangle vertex coordinates by the area formula; the initial texture map is first divided into spaces of equal size according to the total number of triangle patches, and the size of each space is then adjusted according to the triangle's area, with larger areas receiving larger spaces.
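A sketch of this area-proportional allocation (triangle area from vertex coordinates via the cross-product formula; the shelf-style packing is an illustrative assumption, since the patent does not fix a packing scheme):

```python
import numpy as np

def triangle_area(a, b, c):
    """Area of a 3D triangle: half the norm of the edge cross product."""
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a))

def allocate_uv_space(triangles, tex_size=1024):
    """Assign each triangle a square cell whose area is proportional to its
    surface area; returns one (u0, v0, side) cell per triangle."""
    areas = np.array([triangle_area(*t) for t in triangles])
    sides = np.sqrt(areas / areas.sum()) * tex_size  # larger area, larger cell
    cells, u, v, row_h = [], 0.0, 0.0, 0.0
    for s in sides:
        if u + s > tex_size:      # current shelf full: start a new one
            u, v, row_h = 0.0, v + row_h, 0.0
        cells.append((u, v, s))
        u, row_h = u + s, max(row_h, s)
    return cells
```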
In step 3.2, to synthesize a texture image with a style similar to the reference image, the output style code is concatenated with the semantic features at the input layer of the generator and is also fed into the adaptive instance normalization layers of the high-resolution generator, realizing the transfer of the reference image's style details.
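The adaptive instance normalization layers mentioned here modulate feature statistics with the style code. A sketch (the linear projection from the 512-dimensional code to per-channel scale and shift is an assumption):

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: normalize the feature map, then apply
    a per-channel scale and shift predicted from the style code."""

    def __init__(self, channels, style_dim=512):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.affine = nn.Linear(style_dim, channels * 2)

    def forward(self, x, style):
        gamma, beta = self.affine(style).chunk(2, dim=1)  # (B, C) each
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)         # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(x) + beta
```

The four Patch-GAN discriminators of step 3.3 would then see the image at full, 1/2, 1/4 and 1/8 resolution, e.g. torch.nn.functional.avg_pool2d(img, 2 ** k) for k in 0..3.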
The three-dimensional model texture synthesis method based on semantic image translation simplifies the complex texture production process: it renders the geometric model's contours with rasterization, reduces the drawing work to filling semantic labels so as to lessen manual labor, uses an adversarial neural network to generate texture images with high resolution and well-preserved detail, and can perform style transfer according to an input reference image to specify the final style of the synthesized image.
Drawings
FIG. 1 is a flow chart of model multi-view contour rendering;
FIG. 2 is a flow chart of texture generation by the adversarial neural network;
FIG. 3 is a flow chart of an embodiment of the invention.
Detailed Description
For the purpose of facilitating an understanding of the present invention, the following detailed description is given with reference to the accompanying drawings and examples.
Examples
Referring to fig. 1, in step 1 of the present invention a geometric model is imported, multi-view rendered images are obtained, and the barycentric coordinates of the triangles associated with the rendered pixels of each image are retained. If the triangular patches of the geometric model carry original uv coordinate information, the space occupied by each triangle on the initial texture map is allocated according to that information. If the geometric model has no uv coordinate information, the pixel space of the initial texture map is allocated according to the triangles' areas and total count, so that each triangle occupies an appropriate proportion of the initial texture map.
S100: import the model into a rasterization renderer, read the vertex coordinates and triangle patch information of the geometric model, and read the uv coordinates if they exist;
S110-S120: for each geometric model, set a model matrix to rotate, translate and scale the model until rendered images of the front, top, left, right, bottom and back views are obtained;
S130: each rendered pixel in each view is associated with a triangular patch of the geometric model, so the barycentric coordinates of the rendered pixels and the triangle information they belong to are computed and saved as the mapping between the rendered views and the initial texture map;
S140: if the geometric model has uv information, texture space is allocated to each triangle patch according to the uv coordinates when the 1024×1024 initial texture map is generated; if the geometric model has no uv information, the initial texture map space is allocated according to the areas of the triangular patches.
Referring to fig. 2, step 3 of the present invention inputs the rendered views carrying the semantic information from the user interaction into the adversarial neural network, which synthesizes target images conforming to the semantic distribution.
S200: the user fills different colors on a rendered view, each color representing a different semantic label, and the rendered view is fed as input data into the adversarial neural network;
S210-S220: the user inputs a sample image, and the trained encoder outputs a style code so that the final synthesized image resembles the style of the sample image;
S230-S240: image features are extracted from the user's semantic distribution image through convolutional layers and fused with the style code output by the encoder as the data for the next convolution operation (see the sketch after S250-S260);
S250-S260: the fused data is processed through further convolution kernels, and the final synthesized image is output at the resolution specified by the user.
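The fusion in S230-S240 can be as simple as broadcasting the style code over the spatial grid and concatenating it with the semantic feature maps; a sketch (channel layout assumed):

```python
import torch

def fuse_style(sem_feat, style_code):
    """sem_feat: (B, C, H, W) semantic features; style_code: (B, 512).

    Broadcast the code over the spatial grid and concatenate along the
    channel axis, producing the input of the next convolution block.
    """
    b, _, h, w = sem_feat.shape
    style_map = style_code.view(b, -1, 1, 1).expand(-1, -1, h, w)
    return torch.cat([sem_feat, style_map], dim=1)  # (B, C + 512, H, W)
```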
Referring to fig. 3, step 1 of the present invention renders the multiple views: the model is input to the renderer, the model contours under multiple viewing angles are obtained, and the related information is retained in preparation for the subsequent texture mapping.
S100-S120: apply matrix operations to the model, performing affine transformations of the model vertices, and render to obtain the model contours under multiple viewing angles, i.e., the multiple views;
S130: save the triangle information to which each pixel of each rendered map belongs, and retain the pixel's barycentric coordinate values relative to that triangle;
S300: provide brushes for the semantic labels represented by different colors, provide a drawing interface, pass the rendered contour maps to the interface for display, and perform the filling operation;
S310: the filled multi-view rendered contour maps serve as input data; after the user's operations, each rendered map carries a semantic distribution similar to a semantic segmentation map.
Referring to fig. 3, in step 3 of the present invention the adversarial neural network synthesizes the texture; when selecting the input data, a sample image must also be input as the exemplar for style transfer.
S210: select a sample image from the internet as the input to the style encoder;
S220: the trained encoder outputs a style code, which is fused with the features of the semantic distribution image;
S230-S260: extract feature information from the semantic distribution image, fuse it with the style code extracted from the sample image, and feed the result into the adversarial neural network, which synthesizes the target image while preserving the semantic features;
S320: since each view yields a synthesized target image, the images obtained from the six views must be mapped onto the initial texture map. A single triangle may appear in several images, causing coloring conflicts when mapped back, so a conflict resolution strategy is needed: a majority voting scheme combined with a neighborhood blending scheme. Majority voting is applied first, and if 3 or more colorings agree, that pixel value is adopted; otherwise, the neighborhood blending method averages the pixel values of the eight pixels surrounding the conflicting pixel;
S330: after the coloring conflicts are handled, the pixel values of all target images are mapped into the initial texture map, which thereby becomes a texture map usable in any game engine.
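The mapping in S330 sends each colored pixel of a synthesized view back to a texel through its triangle's uv coordinates and the stored barycentrics. A sketch (array layouts are assumptions; accumulating colors plus a hit count leaves room for the conflict handling of S320):

```python
import numpy as np

def splat_view_to_texture(target_img, tri_ids, barys, tri_uvs, texture, counts):
    """target_img: (H, W, 3) synthesized view; tri_ids: (H, W) triangle id per
    pixel (-1 = background); barys: (H, W, 3) barycentric coordinates;
    tri_uvs: (T, 3, 2) uv coordinates of each triangle's vertices in [0, 1];
    texture: (S, S, 3) float accumulator; counts: (S, S) hit counter.
    """
    h, w = tri_ids.shape
    size = texture.shape[0]
    for y in range(h):
        for x in range(w):
            t = tri_ids[y, x]
            if t < 0:
                continue
            uv = barys[y, x] @ tri_uvs[t]                   # interpolated (u, v)
            tu, tv = (np.clip(uv, 0.0, 1.0) * (size - 1)).astype(int)
            texture[tv, tu] += target_img[y, x]
            counts[tv, tu] += 1
```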

Claims (4)

1. A three-dimensional model texture synthesis method based on semantic image translation, characterized by comprising the following specific steps:
Step 1: rendering contours by rasterization and retaining triangle information
1.1) importing a pure geometric model into a renderer and selecting a rotation matrix and a model matrix to adjust the model, rendering the contours of the model's front, top, left, right, bottom and back views; pixels on and inside the rendered contours are set to 255 and all other pixels to 0;
1.2) besides the six rendered maps obtained, saving for each rendered map the information of the triangular patches of the geometric model that it renders: a corresponding file is generated for each rendered map, recording the coordinates of each pixel in the rendered map, the vertex coordinates of the geometric-model triangle to which the pixel belongs, and the pixel's barycentric coordinates within that triangle;
1.3) if the geometric model carries uv coordinate information, retaining that information and generating from it an initial texture map of size 1024×1024 for subsequent coloring; if the geometric model has no uv coordinate information, computing the triangle areas from the triangle vertex information, assigning uv coordinates to each triangle vertex accordingly, and generating an initial texture map of size 1024×1024 for subsequent coloring;
Step 2: semantic labeling of the rendered views through user interaction
2.1) filling the six rendered views obtained in step 1: painting only the region inside the model contour in each rendered view with different colors, and assigning semantic information to the region painted in each color;
2.2) after filling, running a calibration program that checks the painting in the overlapping regions of the six views; if a semantic conflict is found, prompting for modification, and if there is no semantic conflict, outputting the painted rendered views;
Step 3: synthesizing target images from the semantically labeled rendered views through an adversarial neural network
3.1) inputting the rendered views with semantic information into an adversarial neural network, which propagates the semantic information through each of its layers and synthesizes target images matching the semantic distribution;
3.2) the adversarial neural network learns the image style with an encoder: before the semantic distribution image is input, an image taken from the internet is designated at random as the style reference image for style transfer, so that a texture image with a style similar to the reference image is synthesized; the encoder reduces the image to 1024 dimensions with a 3-layer convolutional network, then outputs a 512-dimensional latent code as style feature information through 8 fully connected layers, and this code is concatenated with the semantic features at the input layers of the two generators of the adversarial neural network;
3.3) the adversarial neural network synthesizes a high-resolution image with two generators and four discriminators: the low-resolution generator outputs a coarse synthesized image with a U-Net structure; the high-resolution generator extracts semantic image features with a three-layer convolutional structure, combines the output feature information with the coarse-image feature information output by the low-resolution generator, and feeds the result into six convolutional layers, which serve as the final output layers and synthesize the high-resolution image, with an adaptive instance normalization layer added after every two of the first four of these layers; the four discriminator networks adopt a Patch-GAN structure and judge the image at four scales: the original image and its 1/2, 1/4 and 1/8 downsamplings;
Step 4: mapping multiple synthesized images into one texture map
4.1) with the target images from step 3.1 as source data, coloring the initial texture map generated in step 1.3, using as reference the triangle vertex information and barycentric coordinate information corresponding to each rendered view from step 1.1: reading a pixel value from the source data, locating the corresponding pixel in the initial texture map from the triangle the pixel belongs to and its barycentric coordinates, and assigning the read pixel value to that pixel of the initial texture map;
4.2) for pixels of the initial texture map colored multiple times with contradictory values, first applying a majority voting scheme: if 3 or more colorings agree, adopting that pixel value; if the repeatedly written pixel values do not agree, applying a neighborhood blending method that takes a weighted blend of the 8 pixels surrounding the pixel as the final pixel value.
2. The method according to claim 1, characterized in that in step 1.2, besides the six rendered maps obtained, the renderer stores for each rendered map the triangle patch information of the geometric model that it renders, namely the triangle vertex coordinates and triangle index information; because only the triangles visible in each rendered map are rendered, the number of triangles per rendered map is not necessarily equal across maps and is less than the total number of triangle patches in the geometric model.
3. The three-dimensional model texture synthesis method based on semantic image translation according to claim 1, characterized in that in step 1.3, for a geometric model without uv coordinate information, the triangle areas are computed from the triangle vertex information, uv coordinates are assigned to each triangle vertex, and an initial texture map is generated for subsequent coloring; the approximate area is computed from the triangle vertex coordinate information by the area formula, the initial texture map is first divided into spaces of equal size according to the total number of triangle patches, and the size of each space is then adjusted according to the triangle's area, with larger areas receiving larger spaces.
4. The method according to claim 1, characterized in that in step 3.2, to synthesize a texture image with a style similar to the reference image, the output style code is concatenated with the semantic features at the input layer of the generator and is input into the adaptive instance normalization layers of the high-resolution generator, realizing the transfer of the reference image's style details.
CN202111514168.0A, filed 2021-12-13 (priority 2021-12-13): Three-dimensional model texture synthesis method based on semantic image translation; granted as CN114359452B (Active).

Priority Applications (1)

Application Number: CN202111514168.0A; Priority Date: 2021-12-13; Filing Date: 2021-12-13; Title: Three-dimensional model texture synthesis method based on semantic image translation


Publications (2)

Publication Number Publication Date
CN114359452A (en) 2022-04-15
CN114359452B CN114359452B (en) 2024-08-16

Family

ID=81100114

Family Applications (1)

Application Number: CN202111514168.0A (Active, granted as CN114359452B); Priority Date: 2021-12-13; Filing Date: 2021-12-13; Title: Three-dimensional model texture synthesis method based on semantic image translation

Country Status (1)

Country Link
CN (1) CN114359452B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109712223A * 2017-10-26 2019-05-03 Peking University: Three-dimensional model automatic coloring method based on texture synthesis
CN111192201A * 2020-04-08 2020-05-22 Tencent Technology (Shenzhen) Co., Ltd.: Method and device for generating a face image and training its generation model, and electronic equipment
US20210150197A1 * 2019-11-15 2021-05-20 Ariel AI Ltd: Image generation using surface-based neural synthesis
US11024060B1 * 2020-03-09 2021-06-01 Adobe Inc.: Generating neutral-pose transformations of self-portrait images


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Phillip Isola et al.: "Image-to-Image Translation with Conditional Adversarial Networks", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pages 1125-1134 *
徐晓刚, 鲍虎军, 马利庄: "Multi-exemplar texture synthesis method based on the correlation principle" (基于相关性原理的多样图纹理合成方法), Progress in Natural Science (自然科学进展), no. 06, 25 June 2002, pages 107-110 *

Also Published As

Publication number Publication date
CN114359452B (en) 2024-08-16

Similar Documents

Publication Publication Date Title
Kato et al. Neural 3d mesh renderer
KR100415474B1 (en) Computer graphics system for creating and enhancing texture maps
CN100492412C (en) Voxel data generation method in volumetric three-dimensional display
US8134556B2 (en) Method and apparatus for real-time 3D viewer with ray trace on demand
Paulin et al. Review and analysis of synthetic dataset generation methods and techniques for application in computer vision
US7528831B2 (en) Generation of texture maps for use in 3D computer graphics
JPH06231275A (en) Picture simulation method
Ganovelli et al. Introduction to computer graphics: A practical learning approach
JP2023553507A (en) System and method for obtaining high quality rendered display of synthetic data display of custom specification products
WO2017123163A1 (en) Improvements in or relating to the generation of three dimensional geometries of an object
CN107784622A (en) Graphic system and graphics processor
CN104517313B (en) The method of ambient light masking based on screen space
EP1922700B1 (en) 2d/3d combined rendering
US5793372A (en) Methods and apparatus for rapidly rendering photo-realistic surfaces on 3-dimensional wire frames automatically using user defined points
CN108230430A (en) The processing method and processing device of cloud layer shade figure
KR100942026B1 (en) Makeup system and method for virtual 3D face based on multiple sensation interface
CN113144613A (en) Model-based volume cloud generation method
CN113223146A (en) Data labeling method and device based on three-dimensional simulation scene and storage medium
US20180005432A1 (en) Shading Using Multiple Texture Maps
CN114359452B (en) Three-dimensional model texture synthesis method based on semantic image translation
Eisemann et al. Stylized vector art from 3d models with region support
Baer et al. Hardware-accelerated Stippling of Surfaces derived from Medical Volume Data.
Buerger et al. Sample-based surface coloring
WO2022133569A1 (en) Methods and system for reconstructing textured meshes from point cloud data
US11321899B1 (en) 3D animation of 2D images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant