CN114677292B - High-resolution material recovery method based on two image inverse rendering neural network - Google Patents

High-resolution material recovery method based on two image inverse rendering neural network

Info

Publication number
CN114677292B
CN114677292B
Authority
CN
China
Prior art keywords
master
guide
map
flash
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210217527.4A
Other languages
Chinese (zh)
Other versions
CN114677292A (en)
Inventor
沈旭昆
李志强
胡勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202210217527.4A
Publication of CN114677292A
Application granted
Publication of CN114677292B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G06T 5/00 (Physics; computing; image data processing or generation, in general; image enhancement or restoration)
    • G06N 3/02, 3/04, 3/045 (computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks)
    • G06N 3/08, 3/084 (neural networks; learning methods; backpropagation, e.g. using gradient descent)
    • G06N 5/04 (computing arrangements using knowledge-based models; inference or reasoning models)
    • G06T 3/40, 3/4046 (geometric image transformations in the plane of the image; scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks)
    • G06T 3/4053 (scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution)
    • G06T 7/00, 7/10, 7/11 (image analysis; segmentation; edge detection; region-based segmentation)
    • G06T 2207/20081 (indexing scheme for image analysis or image enhancement; special algorithmic details; training; learning)
    • G06T 2207/20084 (indexing scheme for image analysis or image enhancement; special algorithmic details; artificial neural networks [ANN])

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of the present disclosure provide a high-resolution material recovery method based on a neural network that inversely renders two images. One embodiment of the method comprises: capturing an initial flash map and an initial guide map; segmenting the initial flash map and the initial guide map to obtain a flash map and a guide map; selecting image regions from the flash map and the guide map as a flash master tile and a guide master tile; splitting the flash map and the guide map into a flash tile set and a guide tile set; screening the flash tile set and the guide tile set to obtain a flash secondary tile set and a guide secondary tile set; rearranging the flash secondary tile set to obtain a relighting master tile set; generating a master material map group; obtaining a secondary material map group set based on each guide secondary tile, the master material map group, and the guide master tile; and stitching the secondary material map group set into a high-resolution material map group. This embodiment reduces video memory consumption.

Description

High-resolution material recovery method based on two image inverse rendering neural network
Technical Field
The embodiment of the disclosure relates to the technical field of high-resolution material recovery, in particular to a high-resolution material recovery method based on two image inverse rendering neural networks.
Background
Realistic graphics is widely used in fields such as video games and film. Its goal is to use the computing power of a computer to algorithmically simulate the interaction of illumination with a scene in the real world and ultimately generate a highly realistic image; this interaction is expressed by the material. Material appearance modeling represents the complex lighting effects produced by the interaction of light, material, and geometry, which physically must be described by a high-dimensional mathematical function. For opaque objects, this complex process is defined by a six-dimensional spatially varying bidirectional reflectance distribution function (SVBRDF). The high dimensionality of this function makes reconstructing a realistic appearance with a handheld device a pressing technical challenge.
Existing material recovery methods reconstruct materials poorly, which introduces noticeable artifacts into the surface appearance of the rendered planar material; when a high-resolution material is reconstructed, the rendered result degrades further and cannot reach a lighting effect acceptable to human vision. In addition, existing material recovery methods require capturing more than two images, which increases capture complexity and places high demands on video memory.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure provide a high resolution material recovery method based on a neural network for inverse rendering of two images, to solve one or more of the technical problems mentioned in the above background.
The method includes: controlling a mobile terminal to capture images to obtain an initial flash map and an initial guide map, wherein the initial flash map is an image captured by the mobile terminal with the flash turned on, and the initial guide map is an image captured by the mobile terminal under ambient light with the flash turned off; segmenting the initial flash map and the initial guide map to obtain a flash map and a guide map; selecting image regions from the flash map and the guide map as a flash master tile and a guide master tile, respectively, wherein the positions of the image regions of the flash master tile and the guide master tile in the flash map and the guide map match; splitting the flash map and the guide map into a flash tile set and a guide tile set; randomly screening flash tiles and guide tiles from the flash tile set and the guide tile set as flash secondary tiles and guide secondary tiles to obtain a flash secondary tile set and a guide secondary tile set; rearranging the flash secondary tile set based on the guide secondary tile set to obtain a relighting master tile set; generating a master material map group based on the flash map, the relighting master tile set, and the flash master tile; generating a secondary material map group based on each guide secondary tile in the guide secondary tile set, the master material map group, and the guide master tile to obtain a secondary material map group set; and stitching each secondary material map group in the secondary material map group set to obtain a high-resolution material map group.
Most real-world objects are composed of stationary, spatially distributed materials, i.e., materials that exhibit the same or similar material and texture properties at different locations (the stationarity property), such as wood, some metals, textiles, leather, and man-made materials. This disclosure focuses on sufficiently self-similar stationary material samples, which are characterized by small, highly similar neighborhoods appearing at different positions. Because the same material is observed under different incident-light and viewpoint directions at different spatial positions, these different reflections provide clues for resolving missing features and ambiguity, and combining them can therefore improve the fidelity of surface appearance reconstruction.
While some earlier methods can restore realistic surface appearance by densely sampling controllable light sources and viewpoints, this acquisition process requires extreme capture conditions, long acquisition times, and expensive equipment. Other methods determine the material as accurately as possible using consumer-grade cameras, but they still require taking a large number of pictures, which increases the complexity of capture and reconstruction. In recent years, deep learning has advanced material recovery from a single picture. However, for stationary materials these methods do not exploit the fact that different positions share the same material attributes, so the materials they recover are poor and cause more artifacts in the rendered object. They also mainly target low-resolution materials; when reconstructing high-resolution materials, their small receptive fields make the trained models produce even worse results, so the rendering quality is extremely poor and cannot reach a surface appearance acceptable to human vision. Increasing the number of input images can provide more clues for resolving missing or ambiguous features during reconstruction, but these methods do not exploit stationarity, so they cannot recover reasonable materials from fewer images, and they increase the complexity of acquisition. Moreover, when these methods are applied directly to high-resolution material reconstruction, the increase in the number and resolution of pictures greatly increases video memory consumption, so that an ordinary user cannot complete the task on a commonly configured computer.
The present disclosure provides a method for determining a high-resolution material from two images of a stationary material by exploiting the material's self-similarity. Capturing only two images keeps acquisition simple while still providing enough information to improve the fidelity of surface appearance reconstruction and to let an ordinary user recover high-resolution materials. Because only two images need to be captured, the complexity of image acquisition is reduced; a high-resolution material can still be determined; and because the network processes low-resolution inputs, high video memory is not required. The method is therefore suitable for users with ordinary equipment and can determine more reasonable materials from two images to reconstruct a vivid surface appearance.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flow diagram of some embodiments of a high resolution material recovery method based on a two-image inverse rendering neural network according to the present disclosure;
FIG. 2 is a schematic diagram of an application scenario of a high resolution material recovery method based on a two-image inverse rendering neural network according to some embodiments of the present disclosure;
FIG. 3 is a schematic diagram of another application scenario of a high resolution material recovery method based on two image inverse rendering neural networks according to some embodiments of the present disclosure;
FIG. 4 is a schematic diagram of another application scenario of a high-resolution material recovery method based on a two-image inverse rendering neural network according to some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and the embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 is a flow diagram of some embodiments of a high resolution material recovery method based on a two-image inverse rendering neural network according to the present disclosure. The high-resolution material recovery method based on the two image inverse rendering neural network comprises the following steps:
step 101, controlling the mobile terminal to shoot images to obtain an initial flash map and an initial guide map.
In some embodiments, the execution subject of the high-resolution material restoration method based on the two image reverse rendering neural networks may control the mobile terminal to perform image shooting in a wired connection manner or a wireless connection manner, so as to obtain an initial flash map and an initial guide map. The initial flash map is an image captured by the mobile terminal when a flash is turned on, and the initial guide map is an image captured by the mobile terminal under ambient light when the flash is not turned on.
Step 102, performing segmentation processing on the initial flash map and the initial guide map to obtain a flash map and a guide map.
In some embodiments, the execution subject may perform segmentation processing on the initial flash map and the initial guide map to obtain a flash map and a guide map.
Step 103, selecting image regions from the flash map and the guide map as a flash master tile and a guide master tile, respectively.
In some embodiments, the execution body may select image regions from the flash map and the guide map as a flash master tile and a guide master tile, respectively. The positions of the image regions of the flash master tile and the guide master tile in the flash map and the guide map match; for example, their position coordinates in the flash map and the guide map are the same.
Step 104, splitting the flash map and the guide map to obtain a flash tile set and a guide tile set.
In some embodiments, the execution body may perform a segmentation process on the flash map and the guide map to obtain a flash tile set and a guide tile set.
Step 105, randomly screening flash tiles and guide tiles from the flash tile set and the guide tile set as flash secondary tiles and guide secondary tiles to obtain a flash secondary tile set and a guide secondary tile set.
In some embodiments, the execution body may randomly screen flash tiles and guide tiles from the flash tile set and the guide tile set as flash secondary tiles and guide secondary tiles, obtaining a flash secondary tile set and a guide secondary tile set.
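As a concrete illustration of steps 102 to 105, the following Python/NumPy sketch crops matched master tiles and randomly screens paired secondary tiles. The 256-pixel tile size and the choice of 32 secondary tiles follow the example values given later in the application scenario; the function name and the random-selection details are assumptions made here for illustration, not part of the claimed method.

```python
import numpy as np

def crop_master_and_tiles(flash_img, guide_img, tile=256, n_sub=32, seed=0):
    """Crop matched master tiles (step 103) and randomly screen paired
    secondary tiles (steps 104-105) from a flash map and a guide map
    of identical size (e.g. 2048x2048x3 arrays)."""
    rng = np.random.default_rng(seed)
    h, w = flash_img.shape[:2]

    # Step 103: one master tile per image, cut at the *same* position.
    y0 = int(rng.integers(0, h - tile + 1))
    x0 = int(rng.integers(0, w - tile + 1))
    flash_master = flash_img[y0:y0 + tile, x0:x0 + tile]
    guide_master = guide_img[y0:y0 + tile, x0:x0 + tile]

    # Step 104: split both maps into a regular grid of tiles.
    grid = [(y, x) for y in range(0, h - tile + 1, tile)
                   for x in range(0, w - tile + 1, tile)]

    # Step 105: keep a random subset, using the same indices in both maps
    # so that flash and guide secondary tiles stay paired.
    picks = rng.choice(len(grid), size=min(n_sub, len(grid)), replace=False)
    flash_sub = [flash_img[y:y + tile, x:x + tile] for y, x in (grid[i] for i in picks)]
    guide_sub = [guide_img[y:y + tile, x:x + tile] for y, x in (grid[i] for i in picks)]
    return flash_master, guide_master, flash_sub, guide_sub
```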
Step 106, rearranging the flash secondary tile set based on the guide secondary tile set to obtain a relighting master tile set.
In some embodiments, the execution body may rearrange the flash sub-tile set based on the guide sub-tile set to obtain a relighting main tile set.
Step 107, generating a master material map group based on the flash map, the relighting master tile set, and the flash master tile.
In some embodiments, the execution subject may generate a master material map group based on the flash map, the relighting master tile set, and the flash master tile. A material refers to the lighting effect produced by the interaction of light and an object; here a material is a group of material maps with repeated texture, and the repetition may be random or regular. A material can also be expressed as a six-dimensional spatially varying bidirectional reflectance distribution function (SVBRDF), whose parametric form can be expressed as several maps (a diffuse map, a normal map, a roughness map, and a specular map). An image can be rendered by feeding these maps, a viewpoint, and a light source into a rendering equation.
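To make the notion of a "material map group" concrete, the sketch below bundles the four SVBRDF parameter maps named above into a single container. The field names and channel counts are assumptions made for illustration; the disclosure does not prescribe a particular data layout.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MaterialMapGroup:
    """Parametric SVBRDF representation used throughout the method."""
    diffuse:   np.ndarray  # H x W x 3 diffuse albedo
    normal:    np.ndarray  # H x W x 3 surface normals
    roughness: np.ndarray  # H x W (or H x W x 1) roughness
    specular:  np.ndarray  # H x W x 3 specular albedo

    def as_stack(self) -> np.ndarray:
        """Stack the maps channel-wise, e.g. as a 10-channel network tensor."""
        r = self.roughness if self.roughness.ndim == 3 else self.roughness[..., None]
        return np.concatenate([self.diffuse, self.normal, r, self.specular], axis=-1)
```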
In some optional implementations of some embodiments, the generating a master material map group based on the flash map, the relighting master tile set, and the flash master tile may include the following steps:
firstly, inputting the flash lamp diagram into a pre-trained material generation model to obtain an initial high-resolution material diagram group.
And secondly, segmenting the initial high-resolution material map group to obtain an initial main material map group. The segmentation process may be an image segmentation process.
And thirdly, obtaining a main material image group based on the initial main material image group, the relighting main image block set, the flash lamp main image block and a pre-trained automatic encoder.
Optionally, the pre-trained auto-encoder comprises an encoder and a decoder module; and the obtaining a master material map group based on the initial master material map group, the relighting master tile set, the flash master tile, and a pre-trained auto-encoder may include:
firstly, inputting the initial main material picture group into the encoder and the decoder module to obtain an initial decoding main material picture group. The decoder module comprises a parameter optimization module and a decoder.
Second, performing the following inverse rendering step based on the initial decoded master material map group and the decoder module:
generating a joint loss value based on the initial decoded master texture map set, the relighting master texture map set, and the flash master texture map. In response to determining that the joint loss value converges to a predetermined threshold, treating the initially decoded master material map set as a master material map set. And adjusting the potential vector parameters in the decoder module in response to determining that the joint loss value does not converge on the predetermined threshold value, taking the adjusted decoder module as the decoder module, inputting the adjusted potential vector parameters into the decoder module to obtain an initial decoding main material map group, and performing the inverse rendering step again. Wherein, in order to obtain the potential space and decode the output texture map from the potential space, the technique uses a full convolution automatic encoder to solve the potential spaceThe space is modeled, the automatic encoder is composed of an encoder E () and a decoder D (), the encoder E () converts the material into the corresponding potential space, and the decoder D () converts the potential space into the material: z = E(s), s = D (z). Wherein s represents the material. z characterizes the corresponding potential space. The loss function used by the auto-encoder consists of two parts:
Figure BDA0003535629730000071
wherein L istrainRepresenting the loss function used by the auto-encoder. L is a radical of an alcoholmapThe loss value of the texture map is shown. L is a radical of an alcoholrenderThe 9 graphs representing the determined texture renderings are logarithmically valued for loss.
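The following sketch shows, under stated assumptions, what the fully convolutional auto-encoder z = E(s), s = D(z) and its two-part training loss might look like in PyTorch. The channel counts, layer depths, and the differentiable render_fn are illustrative placeholders; only the structure (encoder, decoder, and L_train = L_map + L_render with a log-space rendering term) follows the description above.

```python
import torch
import torch.nn as nn

class MaterialAutoEncoder(nn.Module):
    """Fully convolutional auto-encoder: z = E(s) maps the stacked material
    maps s to a latent code, and D(z) decodes the code back into maps."""
    def __init__(self, in_ch=10, latent_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_ch, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, in_ch, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, s):
        z = self.encoder(s)           # z = E(s)
        return self.decoder(z), z     # s' = D(z)

def auto_encoder_loss(s_pred, s_true, render_fn, lights):
    """L_train = L_map + L_render: map reconstruction loss plus a log-space
    loss over renderings of the predicted maps (one per light in `lights`)."""
    l_map = (s_pred - s_true).abs().mean()
    l_render = sum((torch.log1p(render_fn(s_pred, L)) -
                    torch.log1p(render_fn(s_true, L))).abs().mean() for L in lights)
    return l_map + l_render
```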
Optionally, the generating a joint loss value based on the initial decoded master material map group, the relighting master tile set, and the flash master tile may include:
and rendering the initial decoding main material graph group under the same illumination condition with each relighting main graph block in the relighting main graph block set to obtain a first rendering graph set.
And secondly, generating a first rendering loss value based on the first rendering graph set and the relighting main graph set.
And thirdly, rendering the initial decoding main material graph group under the same illumination condition with the flash lamp main graph block to obtain a second rendering graph.
And fourthly, generating a second rendering loss value based on the second rendering graph and the flash master graph block.
And fifthly, obtaining a joint loss value based on the first rendering loss value, the second rendering loss value, the first preset parameter and the second preset parameter. The process of generating the joint loss value can be seen in fig. 3.
Optionally, the joint loss value may be obtained based on the first rendering loss value, the second rendering loss value, the first preset parameter, and the second preset parameter by:
based on the first rendering loss value, the second rendering loss value, the first preset parameter and the second preset parameter, obtaining a joint loss value by using the following formula:
L = α × Σ_i ||R_1^i - I_1^i|| + β × ||R_2 - I_2||

wherein L represents the joint loss value; α represents the first preset parameter; R_1^i represents the i-th first rendering in the first rendering set (i is a serial number); I_1^i represents the i-th relighting master tile in the relighting master tile set; β represents the second preset parameter; R_2 represents the second rendering; I_2 represents the flash master tile; and || · || represents the norm. The value of ||R_1^i - I_1^i|| is obtained by subtracting, at each pixel, the value at the corresponding position of the i-th relighting master tile from the value of the i-th first rendering, taking the absolute values, summing them, and dividing by the number of pixels. The value of ||R_2 - I_2|| is obtained in the same way from the second rendering and the flash master tile.
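Written out directly, the joint loss is a weighted sum of the two rendering losses described above. The short sketch below assumes the renderings and tiles are float arrays of identical shape and uses the mean absolute per-pixel difference as the norm, matching the preceding explanation; accumulating the first term as a plain sum over the relighting master tiles is an assumption of this sketch.

```python
import numpy as np

def l1(a, b):
    """Mean absolute per-pixel difference, i.e. the norm described above."""
    return float(np.abs(a - b).mean())

def joint_loss(first_renderings, relit_master_tiles,
               second_rendering, flash_master_tile, alpha=6.0, beta=1.0):
    """L = alpha * sum_i ||R1_i - I1_i|| + beta * ||R2 - I2||.
    alpha=6, beta=1 are the coarse-to-fine values reported later in the
    description; during refinement both weights are set to 1."""
    l_relit = sum(l1(r, t) for r, t in zip(first_renderings, relit_master_tiles))
    l_orig = l1(second_rendering, flash_master_tile)
    return alpha * l_relit + beta * l_orig
```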
Step 108, generating a secondary material map group based on each guide secondary tile in the guide secondary tile set, the master material map group, and the guide master tile, to obtain a secondary material map group set.
In some embodiments, the execution body may generate a secondary material map group based on each guide secondary tile in the guide secondary tile set, the master material map group, and the guide master tile, resulting in a secondary material map group set.
In some optional implementations of some embodiments, the generating a sub-material map group based on each guide sub-tile in the guide sub-tile set, the main material map group, and the guide main tile may include:
firstly, generating descriptors based on each pixel point in the guide sub-image block to obtain descriptor sets.
And secondly, generating a reference descriptor based on each reference pixel point in the guide main image block to obtain a reference descriptor set.
And thirdly, determining a reference descriptor corresponding to the descriptor in the reference descriptor set as a matching descriptor for each descriptor in the descriptor set to obtain a matching descriptor set.
And fourthly, generating a rearrangement mapping table based on the position information of each pixel point in the guide auxiliary image block and the position information of the reference pixel point corresponding to the matching descriptor set.
And fifthly, generating a secondary material map group according to the rearrangement mapping table and the master material map group.
Optionally, the generating the descriptor based on each pixel point in the guidance sub-tile may include the following steps:
firstly, respectively using Gaussian kernels with the window size of 16 and the standard deviation of 4 and the window size of 32 and the standard deviation of 8 to conduct Gaussian smoothing on the guide sub-image blocks to obtain a first matching sub-image block and a second matching sub-image block.
Second, 33×33, 65×65, and 5×5 neighborhoods of red, green, and blue pixels are selected from the first matching sub-tile, the second matching sub-tile, and the guide secondary tile, respectively, and 128, 96, and 32 point pairs are selected from the 33×33, 65×65, and 5×5 neighborhoods, respectively, using a binary robust independent elementary features descriptor method. The point pairs may be selected with BRIEF G II, i.e., the Binary Robust Independent Elementary Features G II pattern.
Third, 128-bit, 96-bit, and 32-bit descriptors are determined for each color channel based on the selected 128, 96, and 32 point pairs.
Fourth, the 128-bit, 96-bit, and 32-bit descriptors determined for the three color channels are concatenated to obtain a 768-bit descriptor.
Optionally, the generating a reference descriptor based on each reference pixel point in the guidance master tile may include the following steps:
firstly, gaussian kernels with the window size of 16 and the standard deviation of 4 and with the window size of 32 and the standard deviation of 8 are respectively used for carrying out Gaussian smoothing on the guide master block to obtain a first matching master block and a second matching master block.
Second, 33×33, 65×65, and 5×5 neighborhoods of red, green, and blue pixels are selected from the first matching master tile, the second matching master tile, and the guide master tile, respectively, and 128, 96, and 32 point pairs are selected from the 33×33, 65×65, and 5×5 neighborhoods, respectively, using a binary robust independent elementary features descriptor method.
Third, 128-bit, 96-bit, and 32-bit descriptors are determined for each color channel based on the selected 128, 96, and 32 point pairs.
Fourth, the 128-bit, 96-bit, and 32-bit descriptors determined for the three color channels are concatenated to obtain a 768-bit descriptor.
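A minimal sketch of the multi-scale descriptor built in the two procedures above is given below. It assumes a SciPy Gaussian filter whose default truncation yields roughly the stated 16- and 32-pixel windows for sigma 4 and 8, and it substitutes a fixed, seeded random point-pair pattern for the BRIEF G II pattern; the bit ordering is likewise an illustrative choice.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def point_pairs(radius, n_pairs, rng):
    # Random test pairs inside a (2*radius+1)^2 neighborhood; a fixed seed
    # stands in for the BRIEF G II sampling pattern described in the text.
    return rng.integers(-radius, radius + 1, size=(n_pairs, 2, 2))

def descriptor_768(tile, y, x, seed=0):
    """768-bit descriptor of pixel (y, x): 128/96/32 binary tests per color
    channel on the sigma-4 and sigma-8 smoothed tiles and the original tile."""
    rng = np.random.default_rng(seed)              # same pattern for every pixel
    scales = [
        (gaussian_filter(tile.astype(float), sigma=(4, 4, 0)), 16, 128),  # 33x33
        (gaussian_filter(tile.astype(float), sigma=(8, 8, 0)), 32, 96),   # 65x65
        (tile.astype(float),                                     2, 32),  # 5x5
    ]
    h, w = tile.shape[:2]
    bits = []
    for img, radius, n_pairs in scales:
        pairs = point_pairs(radius, n_pairs, rng)
        for c in range(3):                         # red, green, blue channels
            for (dy1, dx1), (dy2, dx2) in pairs:
                p1 = img[np.clip(y + dy1, 0, h - 1), np.clip(x + dx1, 0, w - 1), c]
                p2 = img[np.clip(y + dy2, 0, h - 1), np.clip(x + dx2, 0, w - 1), c]
                bits.append(1 if p1 < p2 else 0)
    return np.array(bits, dtype=np.uint8)          # 3 * (128 + 96 + 32) = 768 bits
```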
Optionally, for each descriptor in the descriptor set, determining a reference descriptor corresponding to the descriptor in the reference descriptor set as a matching descriptor, which may include the following steps:
First, determining the Hamming distance between the descriptor and each reference descriptor in the reference descriptor set to obtain a Hamming distance set.
Second, sorting the Hamming distances in the Hamming distance set from smallest to largest to obtain a Hamming distance sequence.
Third, determining the reference descriptor corresponding to the smallest Hamming distance in the Hamming distance sequence as the matching descriptor.
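The matching and rearrangement-table steps above amount to a nearest-neighbour search under the Hamming distance. The brute-force sketch below is for illustration only; the dictionary layout and function names are assumptions, and a real implementation would use a faster search over packed 768-bit descriptors.

```python
import numpy as np

def hamming(d1, d2):
    """Hamming distance between two equal-length binary descriptors."""
    return int(np.count_nonzero(d1 != d2))

def build_rearrangement_table(sub_desc, master_desc):
    """sub_desc / master_desc: {(y, x): 768-bit descriptor} for the guide
    secondary tile and the guide master tile. Returns the rearrangement
    mapping table {(y_sub, x_sub): (y_master, x_master)} of best matches."""
    master_items = list(master_desc.items())
    table = {}
    for pos_sub, d_sub in sub_desc.items():
        best_pos, _ = min(master_items, key=lambda kv: hamming(d_sub, kv[1]))
        table[pos_sub] = best_pos
    return table

def rearrange_master_material(master_maps, table, tile_shape):
    """Copy master-material pixels into a secondary-material tile (fifth step)."""
    out = np.zeros(tile_shape, dtype=master_maps.dtype)
    for (ys, xs), (ym, xm) in table.items():
        out[ys, xs] = master_maps[ym, xm]
    return out
```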
Step 109, stitching each secondary material map group in the secondary material map group set to obtain a high-resolution material map group.
In some embodiments, the execution body may stitch each sub-material map group in the sub-material map group set to obtain a high-resolution material map group.
Optionally, the execution main body may further send the high-resolution material diagram group to a three-dimensional modeling device, so that the three-dimensional modeling device constructs a three-dimensional model.
Fig. 2 is a schematic diagram of an application scenario of a high-resolution material restoration method based on a neural network reversely rendered by two images according to some embodiments of the present disclosure.
To support material estimation for high-resolution images while reducing video memory consumption and the complexity of image capture, the technique reconstructs high-quality materials from two images by exploiting material characteristics and the strong learning ability of deep learning. The technique provides a pipeline, based on a depth inverse rendering framework, for determining a high-resolution material from two images (one captured by a mobile phone with the flash on, i.e., the initial flash map, and the other captured under ambient light, i.e., the initial guide map). First, to take advantage of structured or randomly repeated texture, a reflection sample transmission method is introduced to combine multiple observations into one master tile. Specifically, two images of 2048×2048 resolution are cropped, and a 256×256 region at the same position is randomly cropped from both images as the master tiles of the flash map and the guide map. Meanwhile, the flash map and the guide map are each divided into 64 tiles of 256×256 pixels, and 32 tiles are selected as secondary tiles. The materials of the master tile and the secondary tiles are called the master material and the secondary materials. The reflection sample transmission method converts a secondary tile into a rough master tile, which is similar to re-rendering the master tile under the incident and outgoing light directions of the secondary tile, yielding a relighting master tile. Second, depth inverse rendering is adopted to recover the material of the master tile. To initialize the latent vector of the auto-encoder, a convolutional neural network taking a single image as input produces an initial high-resolution material map group from the flash map, and an initial master material map group is cropped from it as the input of the auto-encoder. In addition, a more reasonable master material is reconstructed by using the relighting master tile set and the flash master tile as inputs of the loss function. A refinement post-processing step is then used to re-introduce details into the master material. Finally, all secondary materials are generated from the master material with a reverse sample transmission method and stitched into the high-resolution material; to remove random artifacts and edge artifacts in the master tile region of the high-resolution material, a jump rearrangement method is used to map the master material to the secondary materials during the reverse transmission. The high-resolution material is then refined once more.
The details of the whole process, which reconstructs a reasonable high-resolution material from two images while reducing video memory consumption, are explained here under three topics: reconstruction of the high-resolution material, the loss function, and jump rearrangement.
1. Reconstruction of high resolution material
The complete framework determines the material of a high-resolution image while reducing video memory consumption and the complexity of image capture. It does so by exploiting material self-similarity and the strong learning capacity of deep learning to reconstruct high-quality materials from the two images. The reasons why the framework can effectively solve these problems are explained here.
(1) A fully convolutional network trained on a low-resolution image dataset can also process high-resolution images, but its relatively small receptive field results in poor quality of the determined high-resolution material. For stationary materials, the master material can be mapped to the secondary materials using a reverse transmission method; to determine a high-resolution material from the two images, the determined master material is likewise transferred to the secondary materials, and a high-resolution material is then generated by stitching these secondary materials. Finally, refinement is used to generate a more reasonable high-resolution material. Determining a reasonable master material is therefore critical to recovering the high-resolution material.
(2) A single inference network seeks to determine a reasonable material from a single image or a fixed number of images, but supporting more input images greatly increases the video memory needed to train the network. Furthermore, because the relighting master tiles contain noise, artifacts, and irregularities, no relighting master tile dataset is available for training, and large amounts of data with stationary-texture characteristics are lacking; these problems prevent training an inference network to determine the master material. In addition, slight movements of the flash can change the local lighting conditions and thus markedly change the brightness of a tile, which can introduce artifacts into the inference result. To determine a reasonable master material while reducing video memory consumption, a depth inverse rendering method is therefore used: a single inference network is applied only once to the single flash map to determine the initial high-resolution material, and the material of the master tile is determined using the small-resolution relighting master tiles and the original master tile, so the video memory consumption does not grow with image resolution. Concretely, the video memory required by the network may be about 7 GB for two images of 2560×2560 resolution.
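As a sketch of the depth inverse rendering stage just described, the following PyTorch loop keeps a pre-trained decoder fixed and optimizes only its latent vector so that renderings of the decoded master material match the relighting master tiles and the flash master tile. The decoder, the differentiable render_fn, and the step count and learning rate are assumptions of this sketch, not values fixed by the disclosure.

```python
import torch

def optimize_master_material(decoder, z_init, relit_tiles, relit_lights,
                             flash_master, flash_light, render_fn,
                             alpha=6.0, beta=1.0, steps=500, lr=1e-2):
    """Depth inverse rendering of the master tile: gradient descent on the
    latent vector z under the joint loss L = alpha*L_relit + beta*L_orig."""
    z = z_init.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        maps = decoder(z)                                   # s = D(z)
        l_relit = sum((render_fn(maps, L) - I).abs().mean()
                      for L, I in zip(relit_lights, relit_tiles))
        l_orig = (render_fn(maps, flash_light) - flash_master).abs().mean()
        loss = alpha * l_relit + beta * l_orig
        loss.backward()
        opt.step()
    return decoder(z).detach()
```

Because only the low-resolution master tile and its relit views pass through the decoder and renderer, the memory footprint of this loop stays flat as the full image resolution grows, which is the point made in item (2).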
2. Loss function
It is desirable to determine as reasonable a master material as possible, because the master material affects the quality of the determined high-resolution material. The intuitive approach is to directly implement a differentiable rendering loss L_relit, which renders the predicted material maps s under the same lighting conditions L_i as the relighting master tiles and computes the L1 (first-norm) difference between these renderings R(s, L_i) and the relighting master tiles I_i:

L_relit = ||R(s, L_i) - I_i||.

However, imperfect matching between the pixels of the master tile and those of the secondary tiles introduces noise and artifacts into the relighting master tiles, and these rough relighting master tiles in turn contribute noise to the determined master material. Although the latent space regularizes the optimization, it is not sufficient to suppress this noise and these artifacts. To determine a more reasonable master material, another rendering loss L_orig is added, which computes the L1 difference between the flash master tile I_orig and the tile R(s, L_orig) rendered under the same illumination conditions L_orig:

L_orig = ||R(s, L_orig) - I_orig||.

Although L_relit produces a master material with noise and artifacts, it recovers the coarse-scale structural information. On top of this coarse result, L_orig models a fine-detail texture constraint that reduces the interference of noise and artifacts, and the best reconstruction is obtained with the joint loss function:

L = α × L_relit + β × L_orig

where L is the joint loss value, α is the first preset parameter, β is the second preset parameter, L_relit is the L1 difference between the renderings R(s, L_i) and the relighting master tile set, and L_orig is the L1 difference between the flash master tile I_orig and the tile R(s, L_orig) rendered under the same illumination conditions L_orig.
The overall loss function is illustrated in FIG. 3. Recovering the coarse-scale structure is a prerequisite for reducing noise and artifacts and determining reasonable fine-scale information, so α is set larger than β to perform a coarse-to-fine optimization. Empirically, α is set to 6 and β to 1 to balance the magnitude of each loss. To refine the material of the master tile, a loss function similar to L is used for refinement, but with α and β both set to 1.
3. Jump rearrangement
The reverse transmission method introduces significant edge artifacts when the master material is transferred into the high-resolution material. It was observed that the edge pixel values of the restored master tile material maps are either lower or higher than the average pixel value of the whole tile; this difference is very pronounced in the diffuse and specular maps and occasionally appears in the roughness map. Specifically, when the average pixel value at the edge of the specular map is slightly lower, the average pixel value at the edge of the network-generated diffuse map becomes higher so that the rendered image still matches the input image. Such lower or higher pixels are defined as high-low artifacts. The rearrangement-based reverse transmission method maps the master material to the secondary materials and stitches the secondary materials together into the high-resolution material, but plain rearrangement can introduce artifacts into the secondary materials. Suppose a secondary tile contains part of the master tile; in this region the rearranged secondary tile is simply a copy of that part of the master tile, so the edge pixel values of the master tile region in the generated high-resolution material are either all higher or all lower than the surrounding region, producing obvious high-low artifacts at the edge of the master tile region.
To remove these high-low artifacts (i.e., square artifacts and random artifacts), the technique proposes a jump rearrangement method to generate a rearrangement map from the master tile to each secondary tile. The process of jump rearrangement is shown in FIG. 4. First, the center area of the guide map is cropped as the guide master tile (the cropped area may be 216×216 pixels). For each pixel A of a guide secondary tile, the pixel B with the most similar material is found in the guide master tile, and the positions of the two points are recorded; the similarity of the two points (A and B) is judged using a binary robust independent elementary features (BRIEF) descriptor, i.e., a descriptor of a feature point. The process is implemented as follows. To account for similarity at multiple scales, Gaussian kernels with window sizes of 16 and 32 and standard deviations of 4 and 8 are used to smooth the guide master tile and reduce noise; together with the original tile, this gives three processed guide master tiles P1, P2, and P3. For P1, P2, and P3, neighborhoods of 33×33, 65×65, and 5×5 pixels are selected around a pixel, 128, 96, and 32 point pairs are respectively selected with BRIEF G II, and 128-bit, 96-bit, and 32-bit descriptors are computed; these operations are repeated for the three color channels, and the descriptors are finally concatenated into a 768-bit descriptor. The same operation is performed for each pixel of the guide secondary tile to obtain its descriptor. For one pixel of the guide secondary tile, all pixels of the guide master tile are traversed and their descriptors computed; the Hamming distance between the two descriptors is then calculated, the two pixels with the smallest Hamming distance are the best match, and the positions of these two points are stored. In this way, all pixels of the guide secondary tile are traversed, their best matches are found in the guide master tile, and the corresponding positions are recorded to generate a rearrangement mapping table. The master material is then rearranged into the secondary material using the rearrangement mapping table. FIG. 4 also shows the rearranged relighting master tile, comparing the effect of plain rearrangement with that of jump rearrangement.
For all secondary tiles, the same method is used to remap the master material to the secondary materials. Finally, after all secondary materials are stitched, a windowed median filter is applied to clean up the boundaries between the stitched regions. Because the jump rearrangement method does not select matching points from the edges of the master tile, it reduces the square artifacts and random artifacts of the generated high-resolution material.
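A minimal sketch of this final stitching step, assuming the secondary material tiles arrive in row-major order on a regular grid: the tiles are placed back on the grid and a small median filter is applied in a thin band around every internal seam. The grid layout, seam width, and 5x5 filter window are illustrative assumptions rather than values given by the disclosure.

```python
import numpy as np
from scipy.ndimage import median_filter

def stitch_secondary_materials(tiles, grid_shape, seam=4):
    """Stitch secondary material tiles (row-major) and median-filter the seams."""
    rows, cols = grid_shape
    th, tw = tiles[0].shape[:2]
    out = np.zeros((rows * th, cols * tw) + tiles[0].shape[2:], dtype=tiles[0].dtype)
    for idx, t in enumerate(tiles):
        r, c = divmod(idx, cols)
        out[r * th:(r + 1) * th, c * tw:(c + 1) * tw] = t

    # Windowed median filter, applied only in a thin band around each seam.
    smoothed = median_filter(out, size=(5, 5) + (1,) * (out.ndim - 2))
    for r in range(1, rows):
        y = r * th
        out[y - seam:y + seam] = smoothed[y - seam:y + seam]
    for c in range(1, cols):
        x = c * tw
        out[:, x - seam:x + seam] = smoothed[:, x - seam:x + seam]
    return out
```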
The invention reduces the complexity of image acquisition and automatically reconstructs a reasonable or accurate material map group. When the maps are highly accurate they can replace material maps authored by professionals; when their quality is slightly lower they still provide a reference and target cues for creating material maps, thereby assisting professionals and ultimately reducing the labor cost of material authoring. In addition, only two images need to be captured, which greatly reduces the complexity of image acquisition, while a high-resolution material can still be determined with low video memory consumption. The method is therefore suitable for users with ordinary equipment: the material determined from the two images is of higher quality, the reconstructed surface appearance has higher fidelity, and professionals are better assisted in authoring materials.
The foregoing description covers only preferred embodiments of the present disclosure and an illustration of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combinations of the above technical features; it also encompasses other technical solutions formed by arbitrarily combining the above features or their equivalents without departing from the inventive concept, for example solutions in which the above features are replaced with (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure.

Claims (8)

1. A high-resolution material recovery method based on two image inverse rendering neural networks comprises the following steps:
controlling a mobile terminal to shoot an image to obtain an initial flash image and an initial guide image, wherein the initial flash image is an image shot by the mobile terminal when a flash is turned on, and the initial guide image is an image shot by the mobile terminal under ambient light when the flash is not turned on;
performing segmentation processing on the initial flash map and the initial guide map to obtain a flash map and a guide map;
selecting image regions from the flash map and the guide map as a flash master tile and a guide master tile, respectively, wherein the positions of the image regions in the flash map and the guide map of the flash master tile and the guide master tile are matched;
the flash lamp graph and the guide graph are subjected to segmentation processing to obtain a flash lamp tile set and a guide tile set;
randomly screening the flash lamp pattern blocks and the guide pattern blocks from the flash lamp pattern block set and the guide pattern block set to serve as flash lamp sub pattern blocks and guide sub pattern blocks to obtain a flash lamp sub pattern block set and a guide sub pattern block set;
rearranging the flash lamp secondary image block set based on the guide secondary image block set to obtain a relighting main image block set;
generating a master material map group based on the flash map, the relighting master tile set, and the flash master tile;
generating an auxiliary material graph group based on each guide auxiliary graph block in the guide auxiliary graph block set, the main material graph group and the guide main graph block to obtain an auxiliary material graph group set;
splicing each auxiliary material graph group in the auxiliary material graph group set to obtain a high-resolution material graph group;
wherein the generating a master material map group based on the flash map, the relighting master tile set, and the flash master tile comprises:
inputting the flash lamp graph into a pre-trained material generation model to obtain an initial high-resolution material graph group;
segmenting the initial high-resolution material map group to obtain an initial main material map group;
obtaining a master material graph group based on the initial master material graph group, the relighting master block set, the flash lamp master block and a pre-trained automatic encoder;
wherein the generating a secondary material map group based on each boot secondary tile in the set of boot secondary tiles, the primary material map group, and the boot primary tile comprises:
generating descriptors based on each pixel point in the guide auxiliary image block to obtain descriptor sets;
generating a reference descriptor based on each reference pixel point in the guide main image block to obtain a reference descriptor set;
for each descriptor in the descriptor set, determining a reference descriptor corresponding to the descriptor in the reference descriptor set as a matching descriptor to obtain a matching descriptor set;
generating a rearrangement mapping table based on the position information of each pixel point in the guide auxiliary image block and the position information of the reference pixel point corresponding to the matching descriptor set;
and generating an auxiliary material graph group according to the rearrangement mapping table and the main material graph group.
2. The method of claim 1, wherein the method further comprises:
and sending the high-resolution material map group to three-dimensional modeling equipment so as to construct a three-dimensional model for the three-dimensional modeling equipment.
3. The method of claim 2, wherein the pre-trained auto-encoder comprises an encoder and a decoder module; and
obtaining a master material map group based on the initial master material map group, the relighting master pattern set, the flash master pattern block, and a pre-trained auto-encoder, comprising:
inputting the initial main material picture group into the encoder and the decoder module to obtain an initial decoding main material picture group, wherein the decoder module comprises a parameter optimization module and a decoder;
based on the initial decoding main material graph group and the decoder module, performing the following inverse rendering steps:
generating a joint penalty value based on the initial decoding master texture map group, the relighting master tile set, and the flash master tile;
in response to determining that the joint loss value converges to a predetermined threshold, treating the initial decoded master material map set as a master material map set;
in response to determining that the joint loss value does not converge to a predetermined threshold, adjusting a potential vector parameter in a decoder module, taking the adjusted decoder module as the decoder module, inputting the adjusted potential vector parameter to the decoder module to obtain an initial decoded main material map group, and performing the inverse rendering step again.
4. The method of claim 3, wherein generating a joint loss value based on the initial decoded master texture map group, the relighted master tile set, and the flash master tile comprises:
rendering the initial decoding main material graph group under the same illumination condition with each relighting main graph block in the relighting main graph block set to obtain a first rendering graph set;
generating a first rendering penalty value based on the first rendering graph set and the relighting master graph set;
rendering the initial decoding main material image group under the same illumination condition with the flash lamp main image block to obtain a second rendering image;
generating a second rendering penalty value based on the second rendering map and the flash master tile;
and obtaining a joint loss value based on the first rendering loss value, the second rendering loss value, the first preset parameter and the second preset parameter.
5. The method of claim 4, wherein the deriving a joint penalty value based on the first rendering penalty value, the second rendering penalty value, a first preset parameter, and a second preset parameter comprises:
based on the first rendering loss value, the second rendering loss value, the first preset parameter and the second preset parameter, obtaining a joint loss value by using the following formula:
L = α × Σ_i ||R_1^i - I_1^i|| + β × ||R_2 - I_2||

wherein L represents the joint loss value, α represents the first preset parameter, R_1^i represents the i-th first rendering graph in the first rendering graph set, i represents a serial number, I_1^i represents the i-th relighting master tile in the relighting master tile set, β represents the second preset parameter, R_2 represents the second rendering graph, I_2 represents the flash master tile, and || · || represents the norm; the value of ||R_1^i - I_1^i|| is the value obtained by subtracting the pixel value at the corresponding position of the i-th relighting master tile from the pixel value of the i-th first rendering graph, taking absolute values, adding them, and finally dividing by the number of pixels; and the value of ||R_2 - I_2|| is the value obtained by subtracting the pixel value at the corresponding position of the flash master tile from the pixel value of the second rendering graph, taking absolute values, adding them, and finally dividing by the number of pixels.
6. The method of claim 5, wherein the generating a descriptor based on each pixel point in the boot sub-tile comprises:
respectively using Gaussian kernels with the window size of 16, the standard deviation of 4, the window size of 32 and the standard deviation of 8 to conduct Gaussian smoothing on the guide sub-image block to obtain a first matching sub-image block and a second matching sub-image block;
selecting three pixel neighborhoods 33 x 33,65 x 65 and 5 x 5 red, green and blue from the first matching sub-tile, the second matching sub-tile and the guiding sub-tile, respectively, and selecting 128, 96 and 32 point pairs from the three pixel neighborhoods 33 x 33,65 x 65 and 5 x 5 red, green and blue, respectively, using a binary robust independent primitive feature descriptor method;
determining 128-bit, 96-bit, and 32-bit descriptors based on the selected 128, 96, and 32 point pairs;
the determined 128-bit, 96-bit and 32-bit descriptors are concatenated, resulting in a 768-bit descriptor.
7. The method of claim 6, wherein the generating a reference descriptor based on each reference pixel point in the boot master tile comprises:
respectively using Gaussian kernels with the window size of 16, the standard deviation of 4, the window size of 32 and the standard deviation of 8 to conduct Gaussian smoothing on the guide master block to obtain a first matching master block and a second matching master block;
selecting three pixel neighborhoods 33 x 33,65 x 65 and 5 x 5 red, green and blue from the first matching master tile, the second matching master tile and the guide master tile, respectively, and selecting 128, 96 and 32 point pairs from the three pixel neighborhoods 33 x 33,65 x 65 and 5 x 5 red, green and blue using a binary robust independent primitive feature descriptor method;
determining 128-bit, 96-bit, and 32-bit descriptors based on the selected 128, 96, and 32 point pairs;
concatenating the determined 128-bit, 96-bit and 32-bit descriptors to obtain a 768-bit reference descriptor.
8. The method of claim 7, wherein the determining, for each descriptor in the set of descriptors, a reference descriptor in the set of reference descriptors that corresponds to the descriptor as a matching descriptor comprises:
determining a Hamming distance between the descriptor and each reference descriptor in the reference descriptor set to obtain a Hamming distance set;
sorting all the Hamming distances in the Hamming distance set in ascending order to obtain a Hamming distance sequence;
and determining the reference descriptor corresponding to the minimum Hamming distance in the Hamming distance sequence as a matching descriptor.
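A minimal Python sketch of the matching step in claim 8: for each 768-bit descriptor, the Hamming distances to all reference descriptors are computed and the reference descriptor with the smallest distance is selected (taking the argmin of each row is equivalent to sorting the distances in ascending order and keeping the first entry); names are illustrative:

import numpy as np

def match_descriptors(descriptors, reference_descriptors):
    # descriptors: (N, 768) array of 0/1 values; reference_descriptors: (M, 768).
    desc = np.asarray(descriptors, dtype=np.uint8)
    ref = np.asarray(reference_descriptors, dtype=np.uint8)
    # Hamming distance = number of differing bits: XOR, then count the ones.
    distances = np.count_nonzero(desc[:, None, :] ^ ref[None, :, :], axis=2)  # (N, M)
    # Index of the matching reference descriptor for each descriptor.
    return np.argmin(distances, axis=1)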
CN202210217527.4A 2022-03-07 2022-03-07 High-resolution material recovery method based on two image inverse rendering neural network Active CN114677292B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210217527.4A CN114677292B (en) 2022-03-07 2022-03-07 High-resolution material recovery method based on two image inverse rendering neural network

Publications (2)

Publication Number Publication Date
CN114677292A (en) 2022-06-28
CN114677292B (en) 2022-11-01

Family

ID=82072008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210217527.4A Active CN114677292B (en) 2022-03-07 2022-03-07 High-resolution material recovery method based on two image inverse rendering neural network

Country Status (1)

Country Link
CN (1) CN114677292B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101973985B1 (en) * 2018-10-10 2019-04-30 주식회사 누리콘 System and method of image rendering through distributed parallel processing for high resolution display
CN112183637A (en) * 2020-09-29 2021-01-05 中科方寸知微(南京)科技有限公司 Single-light-source scene illumination re-rendering method and system based on neural network
CN112634156A (en) * 2020-12-22 2021-04-09 浙江大学 Method for estimating material reflection parameter based on portable equipment collected image
WO2021160980A1 (en) * 2020-02-14 2021-08-19 Total E&P Uk Ltd An improved micro-ct image acquisition method
CN113298936A (en) * 2021-06-01 2021-08-24 浙江大学 Multi-RGB-D full-face material recovery method based on deep learning
CN113313828A (en) * 2021-05-19 2021-08-27 华南理工大学 Three-dimensional reconstruction method and system based on single-picture intrinsic image decomposition

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101284986B1 (en) * 2011-06-20 2013-07-10 경희대학교 산학협력단 Method and apparatus for reconstructing high-resolution tomosynthesis
GB2580498B (en) * 2017-01-12 2021-05-12 Imagination Tech Ltd Graphics processing units and methods for controlling rendering complexity using cost indications for sets of tiles of a rendering space
WO2021223134A1 (en) * 2020-05-07 2021-11-11 浙江大学 Micro-renderer-based method for acquiring reflection material of human face from single image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on 3D Reconstruction and Rendering Methods Based on Deep Learning; Wang Bing; China Master's Theses Full-text Database (Information Science and Technology); 2020-09-15; full text *

Similar Documents

Publication Publication Date Title
Jam et al. A comprehensive review of past and present image inpainting methods
JP7446457B2 (en) Image optimization method and device, computer storage medium, computer program, and electronic equipment
Yin et al. Color transferred convolutional neural networks for image dehazing
Rematas et al. Image-based synthesis and re-synthesis of viewpoints guided by 3d models
CA3137297C (en) Adaptive convolutions in neural networks
KR102353556B1 (en) Apparatus for Generating Facial expressions and Poses Reappearance Avatar based in User Face
Lv et al. Low-light image enhancement via deep Retinex decomposition and bilateral learning
CN115100337A (en) Whole body portrait video relighting method and device based on convolutional neural network
Rodriguez-Pardo et al. Seamlessgan: Self-supervised synthesis of tileable texture maps
Vecchio et al. Controlmat: a controlled generative approach to material capture
Sai Hareesh et al. Exemplar-based color image inpainting: a fractional gradient function approach
Ma et al. Neural compositing for real-time augmented reality rendering in low-frequency lighting environments
CN117557721A (en) Method, system, equipment and medium for reconstructing detail three-dimensional face of single image
Polasek et al. Vision UFormer: Long-range monocular absolute depth estimation
CN114677292B (en) High-resolution material recovery method based on two image inverse rendering neural network
CN115713585B (en) Texture image reconstruction method, apparatus, computer device and storage medium
CN116228986A (en) Indoor scene illumination estimation method based on local-global completion strategy
EP4150560B1 (en) Single image 3d photography with soft-layering and depth-aware inpainting
Barua et al. ArtHDR-Net: Perceptually Realistic and Accurate HDR Content Creation
CN115496843A (en) Local realistic-writing cartoon style migration system and method based on GAN
Wang et al. Near-infrared fusion for deep lightness enhancement
Fan et al. Multi-scale dynamic fusion for correcting uneven illumination images
Wu et al. Semantic image inpainting based on generative adversarial networks
Bai et al. Local-to-Global Panorama Inpainting for Locale-Aware Indoor Lighting Prediction
CN116721194B (en) Face rendering method and device based on generation model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant