CN117876562A - Optimization method for texture mapping of non-lambertian surface based on consumer-level RGB-D sensor - Google Patents

Optimization method for texture mapping of non-lambertian surface based on consumer-level RGB-D sensor

Info

Publication number
CN117876562A
Authority
CN
China
Prior art keywords
texture
color
image
vertex
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410086085.3A
Other languages
Chinese (zh)
Inventor
栾峰
赵静
常思瑶
魏阳杰
李文豪
于航翊
Original Assignee
东北大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东北大学 filed Critical 东北大学
Priority to CN202410086085.3A priority Critical patent/CN117876562A/en
Publication of CN117876562A publication Critical patent/CN117876562A/en
Pending legal-status Critical Current

Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

Recent advances in sensor technology have introduced low-cost RGB-plus-depth sensors that can acquire both color and depth images at video rate. Many researchers have proposed methods for texture mapping three-dimensional reconstructed models using consumer-grade RGB-D cameras such as Kinect, but existing texture mapping algorithms are mostly directed at the lambertian model. The invention is a texture optimization framework for non-lambertian surfaces that fully utilizes depth information and reconstructs the surface texture of a model using synthesized images. The specific contributions are: 1) the existing camera pose optimization strategy is extended with depth information, improving accuracy in texture-free regions; 2) color inconsistency between key frames is harmonized using a joint color histogram, avoiding the negative influence of highlight pixels on the reconstructed texture; 3) a BDS-G texture synthesis method based on a bi-directional similarity function is proposed, which aligns texture edges with the help of depth data. The method accepts input from arbitrary viewing angles and re-synthesizes realistic texture for the highlight regions in the input RGB images. We performed extensive experiments on multiple multi-view stereo datasets, and the results demonstrate the effectiveness of the method of the invention.

Description

Optimization method for texture mapping of non-lambertian surface based on consumer-level RGB-D sensor
Technical Field
The invention belongs to the technical field of computer graphics of non-lambertian object 3D reconstruction, and relates to a consumer-level RGB-D camera-based mapping method for non-lambertian surface textures.
Background
With the advent of RGB-D sensors, reconstruction of real-world scenes and object models has become more convenient. Moreover, thanks to the vigorous development of 3D modeling technology in recent years, three-dimensional reconstruction with consumer-grade RGB-D cameras has made a qualitative leap. For indoor scenes we can reconstruct very detailed 3D models and texture details. However, the texture quality of the reconstructed model is still far from meeting the requirements of applications such as AR/VR, medical imaging, cultural relic restoration, and 3D printing.
Reconstructing a three-dimensional model of a scene or object from an RGB-D sequence is mainly affected by several factors: 1) because the accuracy of the depth camera is low, the captured depth data contains a large amount of noise, so the reconstructed initial geometric model often has geometric errors and distortions; 2) the camera pose is also estimated from noisy data and is therefore inaccurate as well; 3) the shutters of the RGB camera and the depth camera in a consumer-grade device are not fully synchronized; 4) due to factors such as illumination variation, camera exposure, and white balance, there is color inconsistency among the acquired RGB frames. These factors can significantly reduce the fidelity of the reconstructed surface.
In order to solve the above problems and make the reconstructed texture as realistic and natural as possible, Li et al. (WACV, 1413-1422, 2019) devised a capture system consisting of a set of RGB-D cameras arranged sequentially around the target area to acquire information from different viewing angles; this setup avoids accumulated camera pose error. Oliveira et al. (Sensors, vol. 21, no. 9, art. no. 3248, 2021) consider the camera pose to always be inaccurate; they use a propagation algorithm that minimizes the transformations when selecting a camera for each triangle face in the mesh, making the texture mapping process more robust. However, these methods all require multiple RGB-D sensors, which are costly and difficult to integrate. Wang et al. (Remote Sensing, vol. 14, no. 9, art. no. 2160, 2022) propose a parallel approach to texture reconstruction that constructs a graph-cut algorithm to select the best view for each mesh face and smooths between adjacent views to mitigate texture seams.
For non-lambertian materials, the highlight regions in the input RGB images cause texture loss in the final reconstructed model, which is clearly undesirable for practical applications. To address these problems, Whelan et al. (TOG, vol. 37, no. 4, art. no. 102, 2018) introduced a fully automated pipeline that reconstructs the geometry and extent of planar glass and mirrors, automatically segmenting the two based on the estimated planes and positions of multiple reflective surfaces in a scene. Zhou et al. (PAMI, vol. 42, no. 7, 1594-1605, 2020) designed a new concentric multispectral light field (CMSLF) and built a physics-based reflectance model on it to estimate depth and a multispectral reflectance map without applying any surface priors. Cheng et al. (CVPR, 16226-16235, 2021) formulate the task of surface reconstruction as an energy minimization problem involving a single unified objective, which can robustly reconstruct complex geometries and surface reflectances even when the initial geometry is imprecise.
Cheng et al. (CVPR, 16226-16235, 2021) formulate the task of surface reconstruction as an energy minimization problem involving a single unified objective, which can robustly reconstruct complex geometries and surface reflectances even when the initial geometry is imprecise. However, their experimental setup is based on a camera-light scanner arrangement, which makes it difficult for an average user to reconstruct a desired object or scene appearance with a consumer-grade RGB-D camera. In addition, the method does not consider illumination and shadow variation during scanning, which can seriously affect the optimization of the energy function.
Wang et al. (Remote Sensing, vol. 14, no. 9, art. no. 2160, 2022) propose a parallel approach to texture reconstruction that constructs a graph-cut algorithm to select the best view for each mesh face and smooths between adjacent views to mitigate texture seams. This view-selection approach, however, evaluates only a single projected point location. Even with a small pose difference, the projected points no longer correspond to the same vertices on the model; moreover, if the resolutions of the two images differ, the sampled pixels cover regions of different sizes in three-dimensional space, which can mislead the global adjustment in true multi-view texture reconstruction.
Li et al. (WACV, 1413-1422, 2019) devised a capture system consisting of a set of RGB-D cameras arranged sequentially around a target area to capture information from different perspectives and avoid cumulative camera pose error. However, the method requires multiple RGB-D sensors, is costly, and is difficult to integrate; in addition, it performs no additional processing on the key frames, so the generated reconstructed texture is affected by highlight pixels and shows an obvious unnatural illumination effect.
Disclosure of Invention
In view of the limitations of the above methods, a main object of the present invention is to propose, using a consumer-grade RGB-D camera, a texture mapping pipeline for non-lambertian objects that can efficiently restore non-lambertian objects in a scene to their real and natural appearance.
An optimization method for texture mapping of a non-lambertian surface based on a consumer-grade RGB-D sensor, comprising:
step 1: acquiring an RGB image and a depth image of a non-lambertian object by using a consumer-grade RGB-D camera;
step 2: reconstructing a grid model from the input depth sequence as an initial model of texture mapping;
step 3: selecting the image subsequence with the clearest color from the RGB sequence as a candidate texture image, and extracting a key frame;
Step 4: the camera pose of each key frame is optimized by combining the optimization strategies of color and depth;
step 5: estimating the reflection coefficient of the model surface by using the optimized camera pose;
step 6: performing color correction on the key frame by using the optimized camera pose and adopting a method of combining color histograms;
step 7: proposing a patch-based texture synthesis method for high-reflectivity regions on the model, and re-synthesizing textures for the highlight regions in the key frames;
step 8: the synthesized image is used for texture mapping of the high reflectivity region and vertex positions are optimized to sufficiently align the synthesized texture boundaries with the boundaries of the outer region, eliminating texture seams.
The invention discloses an optimization method for texture mapping of a non-lambertian surface based on a consumer-level RGB-D sensor, which is characterized in that the step 1 is specifically as follows:
step 1.1: collecting a scene RGB image and a scene depth image of a non-lambertian object by using an Azure Kinect DK depth camera, and collecting scene data of different placing postures and illumination conditions;
step 1.2: an RGB image stream and an aligned depth image stream are formed into an RGB-D sequence. An initial geometric model is generated using the sequence of depth images, the color images are treated as texture images, which are mapped onto the reconstructed geometric model to generate textures.
The optimization method for texture mapping of the non-lambertian surface based on the consumer-level RGB-D sensor is characterized in that the step 2 is specifically as follows:
step 2.1: generating an initial model M_0 from the input depth sequence and a set of camera poses {T_i} corresponding to the selected color image sub-sequence {C_i} and depth image sub-sequence {D_i}.
Step 2.2: reconstructing the geometric model using the Elastic Reconstruction method and recording the estimated camera pose T_i for each frame. T_i is a 4×4 transformation matrix that transforms M_0 from world coordinates to local camera coordinates, defined as:

T_i = [ R  t ; 0  1 ]   (1)

where R is a 3×3 rotation matrix and t is a 3×1 translation vector. Perspective projection is denoted by P, which projects a three-dimensional vertex v = [x, y, z]^T onto the two-dimensional image plane, including a de-homogenization operation. The pixel u = (u, v) corresponding to vertex v on the image is calculated as:

u = f_x · x/z + c_x,   v = f_y · y/z + c_y   (2)

where f_x and f_y are the focal lengths and c_x and c_y are the optical center coordinates of the pinhole camera model.
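For clarity, the projection of equations (1)-(2) can be sketched as follows. This is a minimal illustration assuming numpy; the intrinsic values in the usage line are placeholders, not calibration data of the Azure Kinect DK.

```python
import numpy as np

def project_vertex(v_world, T, fx, fy, cx, cy):
    """Project a 3D vertex (world coordinates) into one key frame.

    T is the 4x4 world-to-camera transform [R t; 0 1] of equation (1);
    fx, fy, cx, cy follow the pinhole model of equation (2).
    """
    v_h = np.append(v_world, 1.0)          # homogeneous coordinates
    x, y, z = (T @ v_h)[:3]                # local camera coordinates
    u = fx * x / z + cx                    # de-homogenization + intrinsics
    v = fy * y / z + cy
    return np.array([u, v]), z             # pixel position and depth

# hypothetical usage with placeholder intrinsics
T = np.eye(4)
pixel, depth = project_vertex(np.array([0.1, -0.05, 0.8]), T, 525.0, 525.0, 319.5, 239.5)
```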
The optimization method for texture mapping of the non-lambertian surface based on the consumer-level RGB-D sensor is characterized in that the step 3 is specifically as follows:
step 3.1: the blur estimation method is used to quantify the blur of each image; the images are ranked from low to high blur, and the sharpest color images are selected.
Step 3.2: redundant color images are eliminated while ensuring coverage of the reconstructed model. Taking the hardware specifications of the Azure Kinect DK into account, a new key frame is selected according to the following formula:

I_i = { I_i ∈ Φ_KF : ∠(R_k, R_i) > 30° || Dist(t_k, t_i) > 0.2 && S_i^spec / S_i < 0.1 }   (3)

where Φ_KF is the key frame set selected by the blur measurement, R is the rotation matrix of the camera pose corresponding to the key frame, t is the translation vector, S_i is the number of pixels contained in the current frame, S_i^spec is the number of highlight pixels in key frame i, and k is the index of the frame preceding the current frame i. We call these selected best color images, together with the depth images aligned with them, key frames.
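The selection rule of formula (3) can be sketched as follows; this is a minimal numpy illustration, and the way the rotation angle ∠(R_k, R_i) is evaluated here is an assumption.

```python
import numpy as np

def rotation_angle_deg(R_a, R_b):
    """Angle (degrees) between two rotation matrices."""
    cos_theta = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def is_new_keyframe(R_i, t_i, R_k, t_k, n_specular, n_pixels,
                    angle_thresh=30.0, dist_thresh=0.2, spec_ratio=0.1):
    """Rule of formula (3): enough camera motion and few highlight pixels."""
    moved = (rotation_angle_deg(R_k, R_i) > angle_thresh or
             np.linalg.norm(t_k - t_i) > dist_thresh)
    return moved and (n_specular / n_pixels < spec_ratio)
```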
The optimization method for texture mapping of the non-lambertian surface based on the consumer-level RGB-D sensor is characterized in that the step 4 is specifically as follows:
step 4.1: computing the initial texture of the model to obtain an initial texture model M_C0. To texture the mesh model M_0, we project each vertex v_i on M_0 onto all visible key frames, store all color values of the vertex on those key frames, and then compute the color of vertex v_i by weighted averaging. Specifically, we use k_ij = cos(θ)²/d² as the weight of the color value of vertex v_i projected onto the j-th key frame, where θ is the angle between the normal of vertex v_i and the viewing direction of the j-th key frame, and d is the distance from vertex v_i to the camera center of key frame j.
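The weighted averaging of step 4.1 can be sketched as follows, assuming the per-key-frame color samples, angles, and distances have already been gathered; this is a numpy sketch, not the invention's exact implementation.

```python
import numpy as np

def vertex_color(colors, angles, distances):
    """Weighted average of a vertex's projected colors over visible key frames.

    colors:    (N, 3) sampled RGB values, one per visible key frame
    angles:    (N,)   angle theta between vertex normal and viewing direction
    distances: (N,)   distance d from the vertex to each camera center
    """
    w = np.cos(angles) ** 2 / distances ** 2      # k_ij = cos(theta)^2 / d^2
    return (w[:, None] * colors).sum(axis=0) / w.sum()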
Step 4.2 optimizes the camera pose of each keyframe using a concept similar to the JointTG method to ensure that the texture of the model is as consistent as possible with the texture projected onto all visible keyframes. The difference is that, for the highlight region brought by the non-lambertian object to the key frame, we do not directly combine the color consistency and the geometric consistency, but use the color difference between the vertex and the corresponding point in the key frame to guide the weight ratio of the color consistency and the geometric consistency. The objective function is defined as:
E_tex = δ_1 E_c + δ_2 E_d   (4)

where E_c is a photometric consistency term and E_d is a depth consistency term. δ_1 and δ_2 are weights related to the color difference between the vertex and its corresponding point in the key frame:

δ_2 = ln(|C(v_i) − Γ_j(v_i, T_j)| + 1)   (6)

where C(v_i) is the color intensity of vertex v_i on the texture model M_c, and the function Γ_j(v_i, T_j) obtains the color intensity of the point to which vertex v_i is projected in key frame j under camera pose T_j.

E_c is defined as in the Color Map Optimization method and ensures that the photometric error between each vertex on M_c and its corresponding projected texture on each key frame is minimal, where j and i are the indices of the key frames and vertices, respectively.

E_d ensures that the depth error between each vertex on M_c and its corresponding point in each depth key frame is as small as possible, where D(v_i) obtains the third element of the vector v_i, and the function Π_j(v_i, T_j) obtains the depth value of the corresponding point to which vertex v_i is projected in the j-th depth key frame under camera pose T_j.
Minimizing the objective function E_tex corrects inaccurate camera pose estimation and reduces blurring and ghosting in the reconstructed texture.
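A per-vertex sketch of the objective in equation (4) is given below. Since equation (5) for δ_1 is not reproduced in this text, the δ_1 used here is only an assumed placeholder, and squared differences stand in for the unspecified E_c and E_d residuals.

```python
import numpy as np

def pose_residual(C_v, D_v, color_proj, depth_proj):
    """Per-vertex contribution to E_tex = delta1 * E_c + delta2 * E_d (equation 4).

    C_v        : vertex color intensity on the texture model
    D_v        : vertex depth (third element of the transformed vertex)
    color_proj : color intensity of the projected point in the key frame
    depth_proj : depth value of the projected point in the depth key frame
    delta2 follows equation (6); delta1 below is an assumption (equation (5)
    is not reproduced here), chosen so geometry dominates where color differs.
    """
    color_diff = abs(C_v - color_proj)
    delta2 = np.log(color_diff + 1.0)
    delta1 = 1.0 / (1.0 + delta2)          # placeholder for equation (5)
    e_c = (C_v - color_proj) ** 2          # assumed photometric residual
    e_d = (D_v - depth_proj) ** 2          # assumed depth residual
    return delta1 * e_c + delta2 * e_d
```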
The optimization method for texture mapping of the non-lambertian surface based on the consumer-level RGB-D sensor is characterized in that the step 5 is specifically as follows:
step 5.1: all key frames are first projected to one view using the Capture Render method, and the resulting images are vectorized into an image matrix A. Since the diffuse reflection map D is isotropic, i.e., the observed lambertian surface intensity is the same regardless of the observer's viewing angle, D is a low-rank matrix, whereas the specular matrix S is sparse because it contains only highlights. By minimizing the nuclear norm of D and the ℓ1 norm of S, the diffuse and specular maps of each key frame are finally obtained:
A = D + S   (9)

min_{D,S} ||D||_* + λ||S||_1   (10)
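The decomposition of equations (9)-(10) is a robust-PCA problem. The following is a minimal numpy sketch using a generic inexact augmented Lagrangian iteration, which may differ from the solver actually used by the invention.

```python
import numpy as np

def decompose_diffuse_specular(A, lam=None, mu=None, n_iter=100):
    """Split image matrix A into low-rank D (diffuse) and sparse S (specular)
    by approximately minimizing ||D||_* + lam * ||S||_1, as in equations (9)-(10)."""
    m, n = A.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or 0.25 * m * n / np.abs(A).sum()
    S = np.zeros_like(A)
    Y = np.zeros_like(A)
    for _ in range(n_iter):
        # singular value thresholding for the low-rank (diffuse) part
        U, sig, Vt = np.linalg.svd(A - S + Y / mu, full_matrices=False)
        D = U @ np.diag(np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # soft thresholding for the sparse (specular) part
        R = A - D + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        Y = Y + mu * (A - D - S)
    return D, S
```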
Step 5.2: according to the Phong illumination model, specular reflection occurs only when the viewing direction and the reflection direction are the same. Therefore, the incidence direction of the light can be calculated by using the highlight information, the normal and the camera pose:
l = 2(o·n)n − r   (11)
where o is the viewing direction, r is the reflecting direction, and n is the surface normal. The surface reflectance of the model can then be obtained by minimizing the disparity between the reflected intensities of the model vertices and the intensities of the corresponding points in the highlight map, as shown in fig. 3. We assume in the experiment that the illumination intensity is uniform, and construct the objective function as follows:
where the function ρ(S, v) represents the intensity value of the point corresponding to vertex v in the highlight map S, k_v represents the reflection coefficient of vertex v, α_v represents the shininess parameter, and the resulting specular term represents the specular reflection intensity at vertex v on the model.
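Equation (11) and the Phong specular term used in equation (12) can be sketched as follows. This is a numpy illustration assuming unit vectors and the standard Phong model; the exact objective of equation (12) is not reproduced in this text.

```python
import numpy as np

def incident_light_direction(o, n):
    """Incident light direction from the mirror condition of equation (11).
    At a highlight the viewing direction o coincides with the reflection
    direction, so l = 2*(o.n)*n - o (all vectors unit length)."""
    return 2.0 * np.dot(o, n) * n - o

def specular_intensity(k_v, alpha_v, l, o, n):
    """Phong specular term for vertex v: k_v * (r.o)^alpha_v, standing in for
    the specular intensity compared against rho(S, v) when fitting k_v, alpha_v."""
    r = 2.0 * np.dot(l, n) * n - l          # reflection of the light direction
    return k_v * max(np.dot(r, o), 0.0) ** alpha_v
```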
The optimization method for texture mapping of the non-lambertian surface based on the consumer-level RGB-D sensor is characterized in that the step 6 is specifically as follows:
step 6.1: each vertex of the geometric model is projected onto a depth image aligned with the RGB image to perform a visibility test. If a vertex is visible in both camera views, we re-project the vertex onto the RGB image to obtain a pair of color values and add them to the color correspondence list. There should be enough overlap between the images involved in color correction to ensure that the estimation of the color mapping function is based on a large number of color mapping samples. A color correspondence is then established between each pair of RGB images, and a joint color histogram (JIH) of the two RGB images can be drawn from all pairs of color values. JIH can also be formulated:
where δ(u, v) is a Kronecker function whose value is 1 when u = v and 0 otherwise, x and y represent all possible color values in the source image S and the target image T respectively, p_s and p_t are the image coordinates in S and T to which the same vertex of the geometric model is mapped, and C_p is the set of corresponding point pairs in S and T that correspond to the same vertex.
Step 6.2: the color mapping function (CMF) is estimated on the basis of the JIH. The CMF maps the colors of the source image S to the target image T, yielding a color-corrected image T'. The CMF may be expressed as p' = f_m(p), where f_m is the CMF estimated for each pair of images from the created correspondences, and p' is the resulting color in the corrected image T' for a given color p of the target image T. Ideally the color intensities of corresponding points in the reference image and the target image should be equal, so a linear regression method can be used to fit the observations from the JIH to estimate the CMF. However, because non-lambertian objects show white highlight areas under natural light, those samples are not accurate and can undermine the robustness of the estimated CMF. We therefore exclude points close to the extremes and far from the linear curve when estimating the CMF. In addition, a regression analysis called Support Vector Regression (SVR) can be used to estimate the single CMF of the current channel, using a linear kernel and a radial basis function kernel. Since each pair of images has three color channels and one JIH per channel, there are also three CMFs, and each channel is color corrected independently.
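A sketch of the JIH of equation (13) and of the SVR-based CMF estimation is given below, assuming scikit-learn for the regression; the outlier-filtering thresholds are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

def joint_color_histogram(src_vals, tgt_vals, bins=256):
    """Joint color histogram (JIH) for one channel (equation 13):
    src_vals[k] and tgt_vals[k] are the colors seen by the source and
    target key frames at the same model vertex."""
    jih = np.zeros((bins, bins))
    np.add.at(jih, (src_vals.astype(int), tgt_vals.astype(int)), 1)
    return jih

def fit_cmf(src_vals, tgt_vals):
    """Fit one color mapping function p' = f_m(p) per channel with support
    vector regression; samples near the intensity extremes (assumed thresholds
    10 and 245) are dropped first, as described above for highlight pixels."""
    keep = (tgt_vals > 10) & (tgt_vals < 245)
    svr = SVR(kernel="rbf").fit(tgt_vals[keep, None], src_vals[keep])
    return lambda p: svr.predict(np.asarray(p, dtype=float).reshape(-1, 1))
```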
The optimization method for texture mapping of the non-lambertian surface based on the consumer-level RGB-D sensor is characterized in that the step 7 specifically comprises the following steps:
step 7.1: model vertices are projected onto the keyframes to detect vertex visibility, and color values for the corresponding points are stored to identify highlight regions.
Step 7.2: a texture is synthesized for the non-lambertian region in the model, a bi-directional similarity (BDS) function is introduced based on the Image sum method, and a BDS-G texture synthesis method is constructed on the basis thereof. The addition of an additional transformation at the boundary of the synthesized texture effectively avoids blurring and ghosting and fully aligns the synthesized texture with the external texture, ensuring that a realistic RGB-D reconstructed texture result is generated. The bi-directional similar image generation is defined as follows:
s is a source image, and contains real texture information corresponding to a highlight patch in a target image T, wherein T synthesizes textures for the highlight patch by using S. An integrity term (completion) refers to the fact that for each patch T in the target image T, a similar patch S can be found in the source image S. The coherence term (coherence) is an inversion operation on the integrity term, so that the generated target image T retains the original information of the source image S as much as possible. Dist (s, t) is a similarity measure between patch s and patch t, which uses SSD (sum of squared difference) in RGB space as the similarity measure between patches. L is the total number of pixels each patch contains.
Step 7.3: and re-synthesizing texture images corresponding to the highlight areas, so that the reconstructed textures are more real and natural:
E_BDS-G = E_BDS + λ_G E_G   (15)

where λ_G is a coefficient balancing the terms. We set λ_G = 7 in the experiments.
Using the E_BDS-G objective, the texture image of the highlight region is re-synthesized and aligned with the outer texture boundary. E_BDS is the bi-directional similarity term, which, according to the objective function, re-synthesizes the texture of the highlight region in the target image T_A from the corresponding source image S_A.
Step 7.4: the boundary of the synthesized texture must be sufficiently aligned with the boundary of the external texture, so an additional transformation is applied to the boundary vertices of the synthesized texture, moving them during BDS synthesis so that they align with the adjacent external texture. For better texture alignment, depth information is taken into account when synthesizing textures. E_G is a geometric consistency term that constrains the depth values of vertices at the boundary of the synthesized texture to remain consistent with the depth values of the corresponding points in the source image, so that the synthesized texture can be aligned at its edges with the external texture:
where i, j, k are the indices of the key frame, the highlight region, and the synthesized-texture boundary pixels, respectively; T is the additional transform that moves the boundary vertex u_k of the synthesized texture toward the common vertex of the outer texture boundary, and here denotes the transformation from the last iteration. u_k' represents the pixel corresponding to u_k in image B. m represents the constraint weight of the re-synthesis of image A. D(·) takes the depth value of a pixel, and Π(·) maps the depth value D_B into view A.
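A sketch of the geometric consistency term E_G of equation (16) is given below; the warp of D_B into view A and the boundary transform are passed in as callables, since their exact forms are not reproduced in this text.

```python
def geometric_consistency(boundary_px, depth_A, depth_B, warp_B_to_A, transform):
    """Sketch of E_G (equation 16): after the extra boundary transform is
    applied, the depth of each synthesized-texture boundary pixel in view A
    should match the depth of its correspondence warped from view B.
    `warp_B_to_A` stands for the Pi(.) mapping of D_B into view A (assumption)."""
    depth_B_in_A = warp_B_to_A(depth_B)
    err = 0.0
    for u in boundary_px:
        u_t = transform(u)                               # moved boundary pixel T(u_k)
        x, y = int(round(u_t[0])), int(round(u_t[1]))    # assume integer pixel grid
        err += (depth_A[y, x] - depth_B_in_A[y, x]) ** 2
    return err
```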
The optimization method for texture mapping of the non-lambertian surface based on the consumer-level RGB-D sensor is characterized in that the step 8 specifically comprises the following steps:
step 8.1: to generate a globally aligned texture reconstruction model, we re-synthesize all source texture images and align them sufficiently with the external texture at the boundaries. We express the final energy function as:
where n is the index of the texture patch that needs to be re-synthesized. And (3) re-synthesizing all texture images by using the above formula, so that all texture images are aligned on a reconstruction model, and a seamless texture result is generated.
The optimization method for texture mapping of the non-lambertian surface based on the consumer-level RGB-D sensor has at least the following beneficial effects:
1. we propose a novel texture mapping pipeline for non-lambertian objects that can restore true texture for non-lambertian objects under various illumination.
2. Instead of discarding the data for the highlight region as outliers, we combine the RGB data with depth to optimize the camera pose.
3. The color mapping function is estimated based on the image joint histogram and the linear fitting method, and consistency correction is carried out on the color of the key frame.
4. We use depth information to optimize the texture synthesis process to align the synthesized texture with the external texture.
5. Extensive experiments were performed on multiple data sets. Experimental results show that the method has better qualitative and quantitative results.
6. The non-lambertian object can be restored to the real texture by using a consumer-grade RGB-D depth camera without an industrial-grade depth camera.
Drawings
FIG. 1 is a general flow chart of a texture mapping optimization method for non-Lambert surfaces based on consumer-level RGB-D sensors according to the invention
FIG. 2 (a) RGB image of a non-lambertian object (b) specular map separated from the diffuse reflection map (c) grayscale image corresponding to the RGB image (d) grayscale image separated from it
FIG. 3 (a) visualization of non-lambertian object specular map sequence (b) model surface normals (c) estimated reflectivity
FIG. 4 (a) (b) RGB map of non-lambertian object (c) (d) initial JIH map and color corrected JIH map using JointTG method (e) (f) color corrected JIH (g) (h) color consistency corrected RGB map using CMF method
FIG. 5 texture synthesis pipeline (a) texture model after camera pose optimization (b) input data (c) our texture synthesis pipeline
FIG. 6 (a) RGB image (b) depth image
FIG. 7 reconstructed grid model
FIG. 8 (a) before camera pose optimization (b) after camera pose optimization
Fig. 9 (a) before key frame color correction (b) after key frame color correction
FIG. 10 (a) before texture synthesis (b) after texture synthesis
FIG. 11 is applied to a 3D model
Fig. 12 texture mapping results using different methods using outdoor data sets acquired by Azure Kinect DK
Fig. 13 texture mapping results using different methods using outdoor data sets acquired by Azure Kinect DK
Fig. 14 texture mapping results using data sets acquired by Azure Kinect DK under low light level using different methods
Fig. 15 texture mapping results using Azure Kinect DK under red/green/blue light using different methods
FIG. 16 qualitative comparison of different methods on the public Intrinsic3D and Fountain datasets
FIG. 17 texture mapping results for various steps of the inventive method
FIG. 18 texture mapping results of the inventive method with and without BDS-G texture synthesis strategies
FIG. 19 runtime comparison in 5 different 3D models
Detailed Description
The invention discloses an optimization method for texture mapping of a non-lambertian surface based on a consumer-level RGB-D sensor. As shown in fig. 1, our goal is to generate high fidelity textures for models with non-lambertian surfaces using commercial RGB-D sensors. To achieve this goal, we propose a new BDS-G texture synthesis approach to re-synthesize textures for highlight regions in key frames and use the synthesized images for texture mapping to 3D models. The main idea of the method is divided into the following steps:
step one: the RGB-D sequence acquired using a commercial RGB-D camera Azure Kinect DK (other devices such as Microsoft Kinect and Realsense D435/435i could be used instead) is used as input. An RGB image stream and an aligned depth image stream form an RGB-D sequence. Typically, an initial geometric model is generated using a sequence of depth images, the color images are treated as texture images, which are mapped onto the reconstructed geometric model to generate textures.
Step two: a mesh model is reconstructed from the input depth sequence as the initial model of the texture map, and a subset is extracted from the original color sequence as texture candidates. This step generates an initial model M_0 and a set of camera poses {T_i} corresponding to the selected color image sub-sequence {C_i} and depth image sub-sequence {D_i}. To improve quality and reduce computational complexity, we use the Elastic Reconstruction method to reconstruct the geometric model instead of the Kinect Fusion method, and record the estimated camera pose T_i for each frame. T_i is a 4×4 transformation matrix that transforms M_0 from world coordinates to local camera coordinates, defined as:
T_i = [ R  t ; 0  1 ]   (1)

where R is a 3×3 rotation matrix and t is a 3×1 translation vector. Perspective projection is denoted by P, which projects a three-dimensional vertex v = [x, y, z]^T onto the two-dimensional image plane, including a de-homogenization operation. The pixel u = (u, v) corresponding to vertex v on the image is calculated as:

u = f_x · x/z + c_x,   v = f_y · y/z + c_y   (2)

where f_x and f_y are the focal lengths and c_x and c_y are the optical center coordinates of the pinhole camera model.
Step three: to obtain a sharp texture, we select the image sub-sequence with the sharpest colors from the RGB sequence as the candidate texture images. Specifically, we use the blur estimation method to quantify the blur of each image, rank the images from low to high blur, and select the sharpest color images. In addition, to eliminate redundant color images while guaranteeing coverage of the reconstructed model, we select new key frames, taking the hardware specifications of the Azure Kinect DK into account, as follows:
I_i = { I_i ∈ Φ_KF : ∠(R_k, R_i) > 30° || Dist(t_k, t_i) > 0.2 && S_i^spec / S_i < 0.1 }   (3)

where Φ_KF is the key frame set selected by the blur measurement, R is the rotation matrix of the camera pose corresponding to the key frame, t is the translation vector, S_i is the number of pixels contained in the current frame, S_i^spec is the number of highlight pixels in key frame i, and k is the index of the frame preceding the current frame i. We call these selected best color images, together with the depth images aligned with them, key frames.
Step four: in order to reduce the influence of camera pose drift on the final texture model as much as possible, and considering that highlight regions in the RGB images cause large errors in the photometric consistency objective, we propose a joint color and depth optimization strategy to optimize the camera pose of each key frame. The strategy fully utilizes the collected RGB and depth data rather than simply discarding the highlight-region data as outliers, so that finally all texture maps are aligned as well as possible. The specific steps are as follows:
(1) First, we compute the initial texture of the model to obtain an initial texture model M_C0. To texture the mesh model M_0, we project each vertex v_i on M_0 onto all visible key frames, store all color values of the vertex on those key frames, and then compute the color of vertex v_i by weighted averaging. Specifically, we use k_ij = cos(θ)²/d² as the weight of the color value of vertex v_i projected onto the j-th key frame, where θ is the angle between the normal of vertex v_i and the viewing direction of the j-th key frame, and d is the distance from vertex v_i to the camera center of key frame j.
(2) Because of camera drift and geometric errors, the texture color obtained by projection is inaccurate and easily leads to texture blurring, so we use an idea similar to the JointTG method to optimize the camera pose of each key frame and ensure that the texture of the model is as consistent as possible with the texture obtained by projection onto all visible key frames. The difference is that, for the highlight regions that non-lambertian objects introduce into the key frames, we do not directly combine color consistency and geometric consistency, but use the color difference between the vertex and the corresponding point in the key frame to guide the weight ratio of color consistency and geometric consistency. The objective function is defined as:
E_tex = δ_1 E_c + δ_2 E_d   (4)

where E_c is a photometric consistency term and E_d is a depth consistency term. δ_1 and δ_2 are weights related to the color difference between the vertex and its corresponding point in the key frame:

δ_2 = ln(|C(v_i) − Γ_j(v_i, T_j)| + 1)   (6)

where C(v_i) is the color intensity of vertex v_i on the texture model M_c, and the function Γ_j(v_i, T_j) obtains the color intensity of the point to which vertex v_i is projected in key frame j under camera pose T_j.

E_c is defined as in the Color Map Optimization method and ensures that the photometric error between each vertex on M_c and its corresponding projected texture on each key frame is minimal, where j and i are the indices of the key frames and vertices, respectively.

E_d ensures that the depth error between each vertex on M_c and its corresponding point in each depth key frame is as small as possible, where D(v_i) obtains the third element of the vector v_i, and the function Π_j(v_i, T_j) obtains the depth value of the corresponding point to which vertex v_i is projected in the j-th depth key frame under camera pose T_j.
(3) Minimizing the objective function E_tex corrects inaccurate camera pose estimation and reduces blurring and ghosting in the reconstructed texture. After the camera pose is optimized, the surface reflection coefficients of the model are estimated with the optimized pose, and then the colors of the key frames are corrected, avoiding the unnatural texture effects caused by factors such as illumination changes and camera optical distortion.
Step five: the optimized camera pose is used to estimate the reflection coefficients of the model surface in order to distinguish the non-lambertian material from the scene for subsequent regional texture mapping. The key frame is decomposed into a diffuse reflection map and a highlight map, and then reflection coefficients and brightness parameters of the model surface are obtained by using highlight information, normal and camera pose and used as guidance for subsequent zoning texture mapping.
Step six: the reflection coefficients of the model are estimated using the optimized camera pose; color consistency correction is then applied to the key frames participating in texture mapping to avoid the negative effects of factors such as illumination change and white balance on the reconstructed texture; and for high-reflectivity regions on the model, a patch-based texture synthesis method is proposed to re-synthesize textures for the highlight regions in the key frames, fully aligning the synthesized textures with the external textures so that unnatural texture seams are avoided. The specific steps are as follows:
(1) As shown in fig. 2, we first project all key frames to one view using the Capture Render method, and the resulting images are vectorized into an image matrix A. Since the diffuse reflection map D is isotropic, i.e., the observed lambertian surface intensity is the same regardless of the observer's viewing angle, D is a low-rank matrix, whereas the specular matrix S is sparse because it contains only highlights. By minimizing the nuclear norm of D and the ℓ1 norm of S, the diffuse and specular maps of each key frame are finally obtained:
A = D + S   (9)

min_{D,S} ||D||_* + λ||S||_1   (10)
according to the Phong illumination model, specular reflection occurs only when the viewing direction and the reflection direction are the same. Therefore, the incidence direction of the light can be calculated by using the highlight information, the normal and the camera pose:
l = 2(o·n)n − r   (11)
Where o is the viewing direction, r is the reflecting direction, and n is the surface normal. The surface reflectance of the model can then be obtained by minimizing the disparity between the reflected intensities of the model vertices and the intensities of the corresponding points in the highlight map, as shown in fig. 3. We assume in the experiment that the illumination intensity is uniform, and construct the objective function as follows:
where the function ρ(S, v) represents the intensity value of the point corresponding to vertex v in the highlight map S, k_v represents the reflection coefficient of vertex v, α_v represents the shininess parameter, and the resulting specular term represents the specular reflection intensity at vertex v on the model.
(2) Color images captured by RGB-D sensors are susceptible to automatic white balance, automatic exposure, illumination variation, and so on. These factors lead to color inconsistencies between color images captured from different perspectives. To reduce the negative impact of color inconsistency on the reconstructed texture, consistency correction of the key frame colors is needed prior to texture mapping. Both the JointTG method and the Patch-Based Color Transfer method treat color correction as a color transfer task from a reference image to a target image, but such methods require the two images to be sufficiently similar, whereas in our experimental scenario, as the RGB-D sensor moves, non-lambertian objects show highlight areas at different locations, so those methods are not applicable. We therefore use a joint color histogram approach to correct color inconsistencies in the key frames. Each vertex of the geometric model is first projected onto the depth image aligned with the RGB image to perform a visibility test. If a vertex is visible in both camera views, we re-project the vertex onto the RGB images to obtain a pair of color values and add them to the color correspondence list. There should be enough overlap between the images involved in color correction to ensure that the estimation of the color mapping function is based on a large number of color mapping samples. A color correspondence is then established between each pair of RGB images, and a joint color histogram (JIH) of the two RGB images can be drawn from all pairs of color values, as shown in fig. 4 (c). The JIH can also be formulated as:
where δ(u, v) is a Kronecker function whose value is 1 when u = v and 0 otherwise, x and y represent all possible color values in the source image S and the target image T respectively, p_s and p_t are the image coordinates in S and T to which the same vertex of the geometric model is mapped, and C_p is the set of corresponding point pairs in S and T that correspond to the same vertex.
We estimate the color mapping function (CMF) on the basis of this JIH. The CMF maps the colors of the source image S to the target image T, yielding a color-corrected image T'. The CMF may be expressed as p' = f_m(p), where f_m is the CMF estimated for each pair of images from the created correspondences, and p' is the resulting color in the corrected image T' for a given color p of the target image T. Ideally the color intensities of corresponding points in the reference image and the target image should be equal, so a linear regression method can be used to fit the observations from the JIH to estimate the CMF. However, because non-lambertian objects show white highlight areas under natural light, those samples are not accurate and can undermine the robustness of the estimated CMF. We therefore exclude points close to the extremes and far from the linear curve when estimating the CMF; the JIH after color correction is shown in fig. 4 (e). In addition, a regression analysis called Support Vector Regression (SVR) can be used to estimate the single CMF of the current channel, using a linear kernel and a radial basis function kernel. Since each pair of images has three color channels and one JIH per channel, there are also three CMFs, and each channel is color corrected independently, as in fig. 4 (f). Although the CMF is estimated from a JIH created using only the filtered correspondences, the color mapping function generalizes and can color correct all pixels of the target image T. This generalization increases the similarity between all images, not only within the correspondences but across the entire image, so this color correction method can handle multiple images.
(3) After the camera pose is optimized, the textures of model M_0 are substantially aligned. However, due to the material properties of non-lambertian objects, the highlight regions in the input RGB images still cause unnatural blurring of the resulting texture. A deep-learning-based approach could be used to remove highlight regions in the images, but such an approach usually requires a large dataset for training, which is very time-consuming and computationally expensive.
Therefore, we synthesize the texture images acquired from multiple different perspectives before texture mapping and use the synthesized RGB images for texture mapping, which greatly reduces the negative influence of the non-textured regions on the reconstructed texture. As shown in fig. 5, we first project the model vertices onto the key frames to detect vertex visibility and store the color values of the corresponding points to identify highlight regions. We then synthesize texture for the non-lambertian region in the model, as shown in fig. 5 (c). We introduce a bi-directional similarity (BDS) function based on the Image Summary method and construct a BDS-G texture synthesis method on that basis. Adding an additional transformation at the boundary of the synthesized texture effectively avoids blurring and ghosting and fully aligns the synthesized texture with the external texture, ensuring that a realistic RGB-D reconstructed texture result is generated. The bi-directional similar image generation is defined as follows:
S is a source image that contains the real texture information corresponding to a highlight patch in the target image T, and T synthesizes texture for the highlight patch using S. The completeness term means that for each patch t in the target image T, a similar patch s can be found in the source image S. The coherence term is the inverse of the completeness term, so the generated target image T retains as much of the original information of the source image S as possible. dist(s, t) is a similarity measure between patch s and patch t, for which the SSD (sum of squared differences) in RGB space is used. L is the total number of pixels each patch contains.
Step seven: in order to restore the real texture of the non-lambertian material, we re-synthesize the texture image corresponding to the highlight region, making the reconstructed texture more real and natural:
E_BDS-G = E_BDS + λ_G E_G   (15)
where λ_G is a coefficient balancing the terms. We set λ_G = 7 in the experiments. Using the E_BDS-G objective, the texture image of the highlight region is re-synthesized and aligned with the outer texture boundary. E_BDS is the bi-directional similarity term, which, according to the objective function, re-synthesizes the texture of the highlight region in the target image T_A from the corresponding source image S_A. As shown in fig. 5 (c), image T_A and image T_B represent the two images corresponding to the texture regions that need to be re-synthesized.
To align the boundary of the synthesized texture sufficiently with the boundary of the external texture, we apply an additional transformation to the boundary vertices of the synthesized texture, moving them during BDS synthesis into alignment with the neighboring external texture. For better texture alignment, depth information is taken into account when synthesizing textures. E_G is a geometric consistency term that constrains the depth values of vertices at the boundary of the synthesized texture to remain consistent with the depth values of the corresponding points in the source image, so that the synthesized texture can be aligned at its edges with the external texture:
where i, j, k are the indices of the key frame, the highlight region, and the synthesized-texture boundary pixels, respectively; T is the additional transform that moves the boundary vertex u_k of the synthesized texture toward the common vertex of the outer texture boundary, and here denotes the transformation from the last iteration. u_k' represents the pixel corresponding to u_k in image B. m represents the constraint weight of the re-synthesis of image A. D(·) takes the depth value of a pixel, and Π(·) maps the depth value D_B into view A.
Step eight: to generate a globally aligned texture reconstruction model, we re-synthesize all source texture images and align them sufficiently with the external texture at the boundary. We express the final energy function as:
where n is the index of the texture patch that needs to be re-synthesized. All texture images are re-synthesized using the above formula so that they are aligned on the reconstructed model, producing a seamless texture result.
The invention is further illustrated by the following examples.
Example 1:
Our experiments were all performed on a computer equipped with an Intel Core i9 4.7 GHz CPU, 16 GB RAM, and an RTX 3060 graphics card. First, RGB images and depth images of non-lambertian objects are acquired using a consumer-level depth camera, as shown in fig. 6. An initial geometric model is generated using the depth image sequence, and the color images are treated as texture images, which are mapped onto the reconstructed geometric model to generate textures.
A mesh model is reconstructed from the input depth sequence as the initial model of the texture map, as shown in fig. 7, and a subset is extracted from the original color sequence as texture candidates. To obtain a sharp texture, we select the image sub-sequence with the sharpest colors from the RGB sequence as the candidate texture images. To eliminate redundant color images and ensure coverage of the reconstructed model, new key frames are selected according to the key frame selection formula (3), taking the hardware specifications of the Azure Kinect DK into account. For the camera pose of each key frame, in order to reduce the influence of camera pose drift on the final texture model as much as possible, and considering that the highlight regions in the RGB images cause large errors in the photometric consistency objective, we propose a novel joint color and depth optimization strategy; after camera pose optimization, all texture maps are aligned as well as possible, as shown in FIG. 8.
Estimating the reflection coefficient of the model by using the optimized camera pose; color consistency correction is then performed on the key frames participating in texture mapping, as shown in fig. 9, so as to avoid negative effects of factors such as illumination change, white balance and the like on the reconstructed texture; for the high reflectivity region on the model, we propose a patch-based texture synthesis method, re-synthesize texture for the highlight region in the key frame, and make the synthesized texture sufficiently aligned with the external texture, avoiding the generation of unnatural texture seams, as shown in fig. 10.
Finally, we use our pipeline to get new texture mapping results for non-lambertian objects, which can be applied to their 3D models. By contrast we can clearly perceive that we have restored their true and natural appearance, as shown in figure 11.
Example 2:
we first experimented with datasets captured by Azure Kinect DK cameras, which are challenging scenes containing illumination variation and low resolution. We have performed experimental analysis on these data sets to demonstrate the effectiveness of the proposed method.
Figure 12 shows a qualitative texture mapping comparison of the proposed method with four methods, KinectFusion, Color Map Optimization, JointTG, and G2LTex, on the dataset captured by the Azure Kinect DK.
Due to the low resolution of the Azure Kinect DK camera, image blur and the illumination variation across views introduced by the handheld camera pose significant challenges to texture mapping, resulting in texture mapping results with significant blurring and artifacts, as shown in fig. 12 (a). The G2LTex method selects the optimal RGB image for each face and employs non-rigid optimization in both global and local areas to mitigate seams between textures. However, it does not make any correction for the illumination inconsistency between different views, and thus cannot completely eliminate unnatural seams between textures, as shown in fig. 12 (d).
In addition, the weighted average strategy adopted by the Color Map Optimization method has a blurring effect, which becomes more serious when there is a highlight region or a change in illumination in a color image, as shown in fig. 12 (b). The JointTG method uses a cubic spline curve to correct the tone of the keyframe, guided by the color of the vertices of the model. This method is easily affected by the highlight areas on the non-lambertian object, resulting in poor optimization, the resulting reconstructed texture is still not clear enough and even a texture miss occurs, as shown in fig. 12 (c).
We have introduced a joint optimization strategy to ensure that the camera pose can be optimized also in high light areas. In the global color coordination step, unlike the JointTG method, we estimate the color mapping function for the joint color histogram using a data fitting method, which greatly reduces the effect of the highlight pixels on color correction. In the texture synthesis step, we re-synthesize pixels in the highlight region using the BDS-G strategy, aligning the boundaries of the synthesized texture more closely with the external texture. In addition, the method does not need to re-synthesize the whole target image, and is more efficient than the EAGLE-TextureMapping method.
Thus, our method can produce more realistic texture results than other methods, and the texture is very close to the original content of the input color image, containing little blurring and seam artifacts, as shown in fig. 12 (e).
To verify the effectiveness of our method under different illumination, we performed experiments under outdoor and indoor lights, respectively.
The inputs as in fig. 13 are a sequence of depth images and a corresponding sequence of color images acquired by the Azure Kinect DK camera under direct sunlight outdoors. The input as in fig. 14 is an RGB and depth image pair acquired using a DK camera under weak light sources at night. The inputs as in fig. 15 are a sequence of depth images and a sequence of RGB images acquired under red/green/blue lamp light irradiation using a DK camera, respectively.
It can be seen that the proposed method always yields a clear and natural texture result, regardless of the color and intensity of the incident light. On the other hand, images acquired by a handheld RGB-D sensor inevitably contain blur, and even if clear images around the key frames are selected, the introduction of blurred images cannot always be avoided, for example when all the images around a key frame are blurred. This blur of the texture candidate images and the illumination variation between different views have significant adverse effects on existing texture mapping methods (the KinectFusion method and the Color Map Optimization method), and easily degrade the texture results, as in (a) and (b) of figs. 13, 14 and 15. The key frame color correction of the JointTG method is easily affected by the highlight areas on the non-lambertian object, resulting in poor optimization and a final reconstructed texture that is still not clear and natural.
In addition, the JointTG method cannot get rid of the situation that the geometry of the reconstructed model is significantly missing, some tiny objects may not be reconstructed in the initial stage due to the resolution of the depth image and the limitation of noise, and the missing geometry cannot be recovered, as shown in (c) in fig. 13, 14 and 15. The G2LTex method cannot completely eliminate seams between textures as shown in (d) of fig. 13, 14 and 15.
The method herein introduces a strategy for highlight recognition to reduce the negative impact of highlight regions on camera pose optimization. Then, we re-synthesize pixels in the highlight region using the BDS-G strategy, aligning the boundaries of the synthesized texture more closely with the external texture. A clearer and more realistic texture result can be generated as shown in (e) of fig. 13, 14 and 15.
Example 3:
to demonstrate the effectiveness of this method, we have also performed experiments on various common data sets and compared texture results to the most advanced method.
We apply the method herein to both the Fountain dataset and the Intrinsic3D dataset and compare with the Color Map Optimization method and the Intrinsic3D method provided with them. The results show that our method is superior to the other methods, as shown in fig. 16. The texture mapping result of the KinectFusion method does not perform any texture optimization, so the result inevitably introduces blurring and ghosting, as shown in fig. 16 (a) and (d). The weighted average strategy employed by the Color Map Optimization method has a blurring effect that becomes more severe when there are highlight regions or illumination changes in the color images, as shown in fig. 16 (e). In addition, the Intrinsic3D method easily introduces texture duplication because of ambiguities in handling texture and geometric details with its SFS-based approach, as shown in fig. 16 (b). In contrast, the method herein produces competitive results using the same input color images with highlight regions, as shown in fig. 16 (c) and (f).
Example 4:
to examine the values of the components of the proposed method, we compare the intermediate texture results of the steps of the proposed method, as shown in fig. 17. If we do not do any texture optimization, the texture mapping result will appear as noticeable blurring and artifacts, as shown in FIG. 17 a. After the posture of the camera is optimized, the blurring effect of the reconstructed texture is obviously relieved. But the results are still less than ideal due to the inconsistent color and highlight regions of the key frames, as shown in fig. 17 b. After global color correction we get texture mapping results for global color consistency as shown in fig. 17 c. Finally, by the BDS-G texture synthesis strategy, the negative influence of the highlight region on the texture reconstruction result is further eliminated, and the high-fidelity texture is recovered, as shown in FIG. 17 d.
We also demonstrate the results of the proposed method with and without the BDS-G texture synthesis strategy, covering configurations without camera pose optimization, without global color consistency correction, and with color correction. The results indicate that camera pose optimization can substantially align the textures, as shown in the first row of fig. 18 b. In addition, the proposed key frame color consistency constraint slightly improves texture blur on top of camera pose optimization, as shown in the first row of fig. 18 c. The proposed BDS-G texture synthesis method can also re-synthesize textures of highlight regions in key frames without camera pose optimization and improves the reconstructed texture, as shown in the second row of fig. 18 a. After the camera pose is optimized, more realistic texture results can be generated, as shown in the second row of figs. 18 b and 18 c.
Example 5:
the performance of our described method was compared to methods JonitTG, G2Ltex, including the total number of vertices and faces of the model, the number of keyframes, and the runtime of each method. The JonitTG method optimizes the camera gesture, the geometry and texture of the reconstructed model and the color consistency between key frames simultaneously, and takes luminosity and geometric consistency as guidance. The G2Ltex method first uses a non-rigid optimization algorithm to correct the projection matrix of each graph. And then correcting the texture coordinates of all the vertexes on the boundary of the graph to align the textures, which takes a long time.
The experiments follow these rules: five different 3D models are used, the hardware is identical, the models reconstructed by the same method are used, and all parameters are set according to the recommendations of the authors of each method. The experimental results are shown in fig. 19.
Here T_cp, T_cc, T_ic, and T_t respectively denote the computation time required for camera pose optimization, global color correction, and BDS-G texture synthesis, and the total processing time of the proposed method.

Claims (9)

1. An optimization method for texture mapping of a non-lambertian surface based on a consumer-grade RGB-D sensor, comprising:
Step 1: acquiring an RGB image and a depth image of a non-lambertian object by using a consumer-grade RGB-D camera;
step 2: reconstructing a mesh model from the input depth sequence as the initial model for texture mapping;
step 3: selecting the sharpest color image subsequence from the RGB sequence as candidate texture images, and extracting key frames;
step 4: optimizing the camera pose of each key frame by combining color and depth optimization strategies;
step 5: estimating the reflection coefficient of the model surface by using the optimized camera pose;
step 6: performing color correction on the key frame by using the optimized camera pose and adopting a method of combining color histograms;
step 7: proposing a patch-based texture synthesis method for the high-reflectivity regions on the model, and re-synthesizing textures for the highlight regions in the key frames;
step 8: using the synthesized images for texture mapping of the high-reflectivity regions and optimizing vertex positions so that the synthesized texture boundaries are sufficiently aligned with the boundaries of the outer regions, eliminating texture seams.
2. The method for optimizing texture mapping of non-lambertian surfaces based on consumer-grade RGB-D sensors of claim 1, wherein step 1 is specifically:
Step 1.1: collecting scene RGB images and depth images of a non-lambertian object with an Azure Kinect DK depth camera, acquiring scene data under different placement poses and illumination conditions;
step 1.2: forming an RGB-D sequence from the RGB image stream and the aligned depth image stream; generating an initial geometric model from the depth image sequence, and treating the color images as texture images that are mapped onto the reconstructed geometric model to generate textures.
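The following is a minimal sketch of how step 1.2 can look in practice, assuming Open3D is used for fusion: paired color/depth frames are integrated into an initial colored mesh. The actual pipeline uses the Elastic Reconstruction method of claim 3; TSDF fusion is shown here only as a generic stand-in, and the voxel size, depth scale, and intrinsic values are illustrative placeholders, not the Azure Kinect DK calibration.

```python
import numpy as np
import open3d as o3d

def build_initial_model(color_paths, depth_paths, poses, intrinsic):
    """Fuse an aligned RGB-D sequence into a colored mesh.
    poses: list of 4x4 camera-to-world matrices; intrinsic: o3d.camera.PinholeCameraIntrinsic."""
    volume = o3d.pipelines.integration.ScalableTSDFVolume(
        voxel_length=0.004,          # 4 mm voxels (placeholder resolution)
        sdf_trunc=0.02,
        color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)
    for c_path, d_path, T in zip(color_paths, depth_paths, poses):
        color = o3d.io.read_image(c_path)
        depth = o3d.io.read_image(d_path)
        rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
            color, depth, depth_scale=1000.0, depth_trunc=3.0,
            convert_rgb_to_intensity=False)
        # integrate() expects the world-to-camera extrinsic, hence the inverse
        volume.integrate(rgbd, intrinsic, np.linalg.inv(T))
    return volume.extract_triangle_mesh()

# Example intrinsics (placeholder values):
intrinsic = o3d.camera.PinholeCameraIntrinsic(640, 576, 600.0, 600.0, 320.0, 288.0)
```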
3. The method for optimizing texture mapping of non-lambertian surfaces based on consumer-grade RGB-D sensors of claim 1, wherein step 2 is specifically:
step 2.1: generating an initial model M_0 and a set of camera poses {T_i} from the input depth sequence, corresponding to the selected color image sub-sequence {C_i} and depth image sub-sequence {D_i};
Step 2.2: reconstructing the geometric model using the Elastic Reconstruction method and recording the estimated camera pose T_i of each frame. T_i is a 4×4 transformation matrix that transforms M_0 from world coordinates to local camera coordinates, defined as:
T_i = [ R  t ; 0 0 0 1 ]   (1)
where R is a 3×3 rotation matrix and t is a 3×1 translation vector. Perspective projection is denoted by P, which projects a three-dimensional vertex v = [x, y, z]^T onto the two-dimensional image plane, including a de-homogenization operation. The pixel u = (u, v) on the image corresponding to vertex v is calculated as follows:
u = f_x·x_c/z_c + c_x,   v = f_y·y_c/z_c + c_y,   with [x_c, y_c, z_c, 1]^T = T_i·[x, y, z, 1]^T   (2)
where f_x and f_y are the focal lengths and c_x and c_y are the optical center coordinates of the pinhole camera model.
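A minimal numerical sketch of equations (1)-(2), transforming a world-space vertex into camera coordinates and projecting it with the pinhole model; the intrinsics and pose values below are placeholders rather than an actual calibration.

```python
import numpy as np

def project_vertex(v_world, T, fx, fy, cx, cy):
    """v_world: (3,) vertex in world coordinates; T: 4x4 world-to-camera transform."""
    v_h = np.append(v_world, 1.0)          # homogeneous coordinates
    x, y, z, _ = T @ v_h                   # local camera coordinates (eq. (1))
    u = fx * x / z + cx                    # de-homogenization + intrinsics (eq. (2))
    v = fy * y / z + cy
    return np.array([u, v]), z             # pixel coordinates and camera-space depth

# toy example with an identity rotation and a 0.5 m translation along z
T = np.eye(4)
T[:3, 3] = [0.0, 0.0, 0.5]
uv, depth = project_vertex(np.array([0.1, -0.05, 1.0]), T, 600.0, 600.0, 320.0, 288.0)
```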
4. The method for optimizing texture mapping of non-lambertian surfaces based on consumer-grade RGB-D sensors of claim 1, wherein the step 3 is specifically:
step 3.1: quantifying the blur of each image with a blur-estimation method, and selecting the sharpest color images in order of increasing blur;
Step 3.2: eliminating redundant color images while ensuring coverage of the reconstructed model; considering the hardware specifications of the Azure Kinect DK, a new key frame is selected according to the following formula:
I_i = { I_i ∈ Φ_KF : ∠(R_k, R_i) > 30° || Dist(t_k, t_i) > 0.2 && S_i^spec / S_i < 0.1 }   (3)
where Φ_KF is the key frame set selected by the blur measure, R is the rotation matrix of the camera pose corresponding to a key frame, t is the translation vector, S_i is the number of pixels contained in the current frame, S_i^spec is the number of highlight pixels in key frame i, and k is the index of the frame preceding the current frame i. We call these selected best color images, together with the depth images aligned with them, key frames.
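A hedged sketch of the key frame selection rule of equation (3). The claim's blur-estimation method is not spelled out above, so the variance of the Laplacian is used here as a stand-in sharpness score; the 30°, 0.2, and 0.1 thresholds follow the formula.

```python
import cv2
import numpy as np

def sharpness(gray):
    """Variance of the Laplacian: a common proxy for image sharpness (higher = sharper)."""
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def rotation_angle_deg(R_a, R_b):
    """Angle between two rotation matrices, in degrees."""
    cos_t = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

def is_new_keyframe(R_k, t_k, R_i, t_i, highlight_mask):
    """Equation (3): enough camera motion AND few highlight pixels."""
    moved_enough = (rotation_angle_deg(R_k, R_i) > 30.0 or
                    np.linalg.norm(t_k - t_i) > 0.2)
    few_highlights = highlight_mask.mean() < 0.1       # S_i^spec / S_i
    return moved_enough and few_highlights
```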
5. The method for optimizing texture mapping of non-lambertian surfaces based on consumer-grade RGB-D sensors of claim 1, wherein step 4 is specifically:
Step 4.1: computing the initial texture of the model to obtain an initial textured model M_C0. For the mesh model M_0, each vertex v_i of M_0 is projected onto all key frames in which it is visible, all color values of the vertex on these key frames are stored, and the color of vertex v_i is then computed by weighted averaging. Specifically, we use k_ij = cos(θ)^2 / d^2 as the weight of the color value of vertex v_i projected onto the j-th key frame, where θ is the angle between the normal at vertex v_i and its viewing direction in the j-th key frame, and d is the distance from vertex v_i to the camera center of key frame j.
Step 4.2: the camera pose of each key frame is optimized using a concept similar to the JointTG method, ensuring that the texture of the model is as consistent as possible with the textures projected onto all visible key frames. The difference is that, for the highlight regions that non-lambertian objects introduce into the key frames, we do not directly combine color consistency and geometric consistency; instead, the color difference between a vertex and its corresponding point in the key frame is used to guide the weight ratio between the color and geometric consistency terms. The objective function is defined as:
E_tex = δ_1·E_c + δ_2·E_d   (4)
where E_c is the photometric consistency term and E_d is the depth consistency term. δ_1 and δ_2 are weights related to the color difference between a vertex and its corresponding point in the key frame:
δ_2 = ln( |C(v_i) − Γ_j(v_i, T_j)| + 1 )   (6)
where C(v_i) is the color intensity of vertex v_i on the textured model M_c, and the function Γ_j(v_i, T_j) obtains the color intensity of the corresponding point in key frame j to which vertex v_i is projected by the camera pose T_j.
E_c is defined as in the Color Map Optimization method and ensures that the photometric error between each vertex of M_c and its corresponding projected texture on each key frame is minimal:
where j and i are the indices of the keyframes and vertices, respectively.
E_d ensures that the depth error between each vertex of M_c and its corresponding point in each depth key frame is as small as possible:
where D(v_i) takes the third element of the vector v_i, and the function Π_j(v_i, T_j) obtains the depth value of the corresponding point in the j-th depth key frame to which vertex v_i is projected through the camera pose T_j.
Minimizing the objective function E_tex corrects inaccurate camera pose estimation and reduces the blurring and ghosting of the reconstructed texture.
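An illustrative evaluation of the residuals behind E_tex in equations (4)-(8), assuming grayscale intensities for brevity. Equation (5) for δ_1 is not reproduced above, so δ_1 is left as a caller-supplied weight; applying the weights per correspondence is one possible reading of equation (4). The project() callable is assumed to behave like project_vertex in the earlier sketch.

```python
import numpy as np

def pose_energy(vertices, vertex_colors, keyframes, poses, project, delta1=1.0):
    """Evaluate E_tex for fixed poses; an optimizer would perturb `poses` to minimize it.
    keyframes: list of (color_img, depth_img) pairs; vertex_colors: scalar intensities C(v_i)."""
    E = 0.0
    for j, (color_img, depth_img) in enumerate(keyframes):
        for i, v in enumerate(vertices):
            uv, z = project(v, poses[j])
            u, w = int(round(uv[0])), int(round(uv[1]))
            if not (0 <= w < color_img.shape[0] and 0 <= u < color_img.shape[1]):
                continue                                  # vertex not visible in frame j
            c_proj = float(color_img[w, u])               # Γ_j(v_i, T_j)
            d_proj = float(depth_img[w, u])               # Π_j(v_i, T_j)
            e_c = (vertex_colors[i] - c_proj) ** 2        # photometric residual
            e_d = (z - d_proj) ** 2                       # depth residual
            d2 = np.log(abs(vertex_colors[i] - c_proj) + 1.0)   # equation (6)
            E += delta1 * e_c + d2 * e_d                  # per-correspondence weighting
    return E
```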
6. The method for optimizing texture mapping of non-lambertian surfaces based on consumer-grade RGB-D sensors of claim 1, wherein the step 5 is specifically:
step 5.1: all key frames are first projected to one view using the CaptureRender method, and the resulting images can be vectorized into an image matrix A. Since the diffuse reflection map D is isotropic, i.e., the observed lambertian surface intensity is the same regardless of the observer's viewing angle, D is a low-rank matrix, whereas the specular matrix S is sparse because it contains only highlights. By minimizing the nuclear norm of D and the 1-norm of S, the diffuse and specular maps of each key frame can finally be obtained:
A = D + S   (9)
min_{D,S} ‖D‖_* + λ‖S‖_1   (10)
Step 5.2: according to the Phong illumination model, specular reflection occurs only when the viewing direction and the reflection direction are the same. Therefore, the incidence direction of the light can be calculated by using the highlight information, the normal and the camera pose:
l = 2·(o·n)·n − r   (11)
where o is the viewing direction, r is the reflecting direction, and n is the surface normal. The surface reflectance of the model can then be obtained by minimizing the disparity between the reflected intensities of the model vertices and the intensities of the corresponding points in the highlight map, as shown in fig. 3. We assume in the experiment that the illumination intensity is uniform, and construct the objective function as follows:
where the function ρ(S, v) represents the intensity value of the point in the highlight map S corresponding to vertex v, k_v represents the reflection coefficient of vertex v, α_v represents the luminance parameter, and the resulting specular term represents the specular reflection intensity at vertex v on the model.
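A sketch of the decomposition of equations (9)-(10) using a generic principal component pursuit (inexact ALM) solver; the claim does not specify the solver, so the update scheme and parameters below are standard choices rather than the authors' implementation.

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding operator (proximal operator of the 1-norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(A, n_iter=100, tol=1e-7):
    """Split A into a low-rank diffuse part D and a sparse specular part S (eqs. (9)-(10))."""
    lam = 1.0 / np.sqrt(max(A.shape))
    mu = A.size / (4.0 * np.abs(A).sum() + 1e-12)
    Y = np.zeros_like(A)
    S = np.zeros_like(A)
    norm_A = np.linalg.norm(A) + 1e-12
    for _ in range(n_iter):
        D = svt(A - S + Y / mu, 1.0 / mu)
        S = shrink(A - D + Y / mu, lam / mu)
        residual = A - D - S
        Y += mu * residual
        if np.linalg.norm(residual) / norm_A < tol:
            break
    return D, S
```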
7. The method for optimizing texture mapping of non-lambertian surfaces based on consumer-grade RGB-D sensors of claim 1, wherein step 6 is specifically:
step 6.1: each vertex of the geometric model is projected onto a depth image aligned with the RGB image to perform a visibility test. If a vertex is visible in both camera views, we re-project the vertex onto the RGB image to obtain a pair of color values and add them to the color correspondence list. There should be enough overlap between the images involved in color correction to ensure that the estimation of the color mapping function is based on a large number of color mapping samples. A color correspondence is then established between each pair of RGB images, and a joint color histogram (JIH) of the two RGB images can be drawn from all pairs of color values. JIH can also be formulated:
where δ(u, v) is the Kronecker function, whose value is 1 when u = v and 0 otherwise; x and y respectively represent all possible color values in the source image S and the target image T; p_s and p_t are the coordinates obtained by mapping the same vertex of the geometric model onto the images S and T; and C_p is the set of corresponding point pairs in S and T that correspond to the same vertices.
Step 6.2: the color mapping function (CMF) is estimated on the basis of the JIH. The CMF maps the colors of the source image S to the target image T, resulting in a color-corrected image T'. In this case, the CMF may be expressed as p' = f_m(p), where f_m is the CMF estimated for each pair of images from the created correspondences, and p' is the resulting color in the corrected image T' for a given color p of the target image T. Ideally, the color intensities of corresponding points in the reference image and the target image should be equal, so a linear regression method can be used to fit the observations from the JIH and estimate the CMF. However, since non-lambertian objects exhibit white highlight regions under natural light, these points are inaccurate and may undermine the robustness of the estimated CMF. We therefore exclude points close to the extremes and far from the linear curve when estimating the CMF. In addition, a regression method called Support Vector Regression (SVR) can be used to estimate the single CMF of the current channel with a linear kernel and a radial basis function kernel. Since each pair of images has three color channels and one JIH per channel, there are also three CMFs, and each channel is color-corrected independently.
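A hedged sketch of steps 6.1-6.2: corresponding intensities from co-visible vertices are filtered near the intensity extremes (likely highlights), and a per-channel CMF is fitted with Support Vector Regression. scikit-learn's SVR is used here as one possible regressor, and the hyper-parameters and thresholds are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

def fit_cmf(source_vals, target_vals, low=10, high=245):
    """Fit p' = f_m(p) for one channel from corresponding 8-bit intensities,
    dropping samples near the extremes (likely highlight pixels)."""
    keep = ((target_vals > low) & (target_vals < high) &
            (source_vals > low) & (source_vals < high))
    model = SVR(kernel='rbf', C=10.0, epsilon=2.0)     # illustrative hyper-parameters
    model.fit(target_vals[keep].reshape(-1, 1), source_vals[keep])
    return model

def correct_image(target_img, cmfs):
    """Apply one fitted CMF per channel via a 256-entry lookup table; returns T'."""
    out = np.empty_like(target_img, dtype=np.float32)
    grid = np.arange(256, dtype=np.float32).reshape(-1, 1)
    for c in range(3):
        lut = np.clip(cmfs[c].predict(grid), 0, 255)
        out[..., c] = lut[target_img[..., c]]
    return out.astype(np.uint8)
```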
8. The method for optimizing texture mapping of non-lambertian surfaces based on consumer-grade RGB-D sensors of claim 1, wherein the step 7 is specifically:
step 7.1: model vertices are projected onto the keyframes to detect vertex visibility, and color values for the corresponding points are stored to identify highlight regions.
Step 7.2: a texture is synthesized for the non-lambertian region in the model, a bi-directional similarity (BDS) function is introduced based on the Image sum method, and a BDS-G texture synthesis method is constructed on the basis thereof. The addition of an additional transformation at the boundary of the synthesized texture effectively avoids blurring and ghosting and fully aligns the synthesized texture with the external texture, ensuring that a realistic RGB-D reconstructed texture result is generated. The bi-directional similar image generation is defined as follows:
S is the source image and contains the real texture information corresponding to the highlight patches in the target image T, where T synthesizes textures for the highlight patches using S. The completeness (integrity) term means that for each patch t in the target image T, a similar patch s can be found in the source image S. The coherence term is the inverse of the completeness term, so that the generated target image T retains as much of the original information of the source image S as possible. Dist(s, t) is a similarity measure between patch s and patch t, which uses the SSD (sum of squared differences) in RGB space. L is the total number of pixels each patch contains.
Step 7.3: and re-synthesizing texture images corresponding to the highlight areas, so that the reconstructed textures are more real and natural:
E_BDS-G = E_BDS + λ_G·E_G   (15)
where λ_G represents the coefficient between the two terms; we set λ_G = 7 in the experiments.
In this formula, the E_BDS-G function is used to re-synthesize the texture image of the highlight region and align it with the external texture boundary. E_BDS is the bi-directional similarity term, which, according to the objective function formula, re-synthesizes the texture of the highlight region in the target image T_A from the corresponding source image S_A.
Step 7.4: the boundary of the synthesized texture is sufficiently aligned with the boundary of the external texture; an additional transformation is added to the boundary vertices of the synthesized texture, and these boundary vertices are moved during BDS synthesis to align with the adjacent external texture. To better align the textures, depth information is taken into account when synthesizing the texture. E_G is a geometric consistency term that constrains the depth values of the vertices on the synthesized texture boundary to be consistent with the depth values of the corresponding points in the source image, so that the synthesized texture can be aligned with the external texture at its edges:
where i, j, k are the indices of the key frame, the highlight region, and the synthesized texture boundary pixels, respectively; T is the additional transformation that moves the boundary vertices u_k of the synthesized texture toward the common vertices of the external texture boundary, and the formula also involves the transformation from the last iteration. u_k' denotes the pixel in image B corresponding to u_k, m denotes the constraint weight for the re-synthesis of image A, D(·) takes the depth value of the corresponding pixel, and Π(·) maps the depth value D_B into view A.
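A brute-force sketch of the bidirectional similarity measure underlying equation (14): each patch of T is matched against S (completeness) and vice versa (coherence), scored with the SSD over RGB patches. This only evaluates the similarity; the BDS-G synthesis of the claim additionally moves boundary vertices using the geometric term E_G, which is not shown, and the patch size, stride, and normalization are illustrative choices.

```python
import numpy as np

def patches(img, size=7, stride=4):
    """Collect overlapping RGB patches as flat vectors."""
    h, w = img.shape[:2]
    return np.array([img[y:y + size, x:x + size].ravel()
                     for y in range(0, h - size + 1, stride)
                     for x in range(0, w - size + 1, stride)], dtype=np.float32)

def one_sided(P_from, P_to):
    """Mean over patches in P_from of the minimum SSD to any patch in P_to,
    normalized by the patch vector length (a stand-in for the L of equation (14))."""
    total = 0.0
    for p in P_from:
        total += np.min(np.sum((P_to - p) ** 2, axis=1))
    return total / (len(P_from) * P_from.shape[1])

def bds(S_img, T_img, size=7, stride=4):
    P_s, P_t = patches(S_img, size, stride), patches(T_img, size, stride)
    completeness = one_sided(P_t, P_s)   # each patch of T has a similar patch in S
    coherence = one_sided(P_s, P_t)      # inverse direction: T keeps the information of S
    return completeness + coherence
```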
9. The method for optimizing texture mapping of non-lambertian surfaces based on consumer-grade RGB-D sensors of claim 1, wherein step 8 is specifically:
step 8.1: a globally aligned texture reconstruction model is generated; we re-synthesize all source texture images and align them sufficiently with the external texture at the boundaries. The final energy function is expressed as:
where n is the index of the texture patches that need to be re-synthesized. All texture images are re-synthesized using the above formula so that they are aligned on the reconstructed model, generating a seamless texture result.
