CN117788344A - Building texture image restoration method based on diffusion model - Google Patents

Building texture image restoration method based on diffusion model

Info

Publication number
CN117788344A
Authority
CN
China
Prior art keywords
gaussian noise
data
texture image
neural network
diffusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410205207.6A
Other languages
Chinese (zh)
Other versions
CN117788344B (en)
Inventor
朱旭平
宋彬
何文武
张宇
王聪玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Feidu Technology Co ltd
Original Assignee
Beijing Feidu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Feidu Technology Co ltd filed Critical Beijing Feidu Technology Co ltd
Priority to CN202410205207.6A priority Critical patent/CN117788344B/en
Publication of CN117788344A publication Critical patent/CN117788344A/en
Application granted granted Critical
Publication of CN117788344B publication Critical patent/CN117788344B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a building texture image restoration method based on a diffusion model. The method comprises: taking an original building texture image as the input data of the diffusion model in the forward process to obtain standard Gaussian noise data; taking the building texture image with Gaussian noise added and the standard Gaussian noise data as the input data of the diffusion model in the reverse process, adding a coding neural network and a prediction neural network to the reverse process of the diffusion model, and training the diffusion model; and taking the building texture image to be repaired and standard Gaussian noise data as the input data of the diffusion model in the reverse process to which the coding neural network and the prediction neural network have been added, to obtain the repaired building texture image. The invention can refine the texture colors so that the textures are closer to the real appearance of the building model.

Description

Building texture image restoration method based on diffusion model
Technical Field
The invention relates to the technical field of building texture image restoration, in particular to a building texture image restoration method based on a diffusion model.
Background
By acquiring data from multiple angles, oblique photography reflects the true appearance of the surveyed object more faithfully and overcomes the limitation of traditional vertical photography, which can only capture the top of ground objects. In addition, oblique photography data acquisition usually uses low-altitude unmanned aerial vehicle operation, requires little manual intervention, and is low-cost and efficient. For these reasons, using oblique photography data for large-scale three-dimensional reconstruction of cities has become a new trend in the industry.
However, while oblique photography provides multi-angle, omnidirectional images of a scene, it also presents new challenges for three-dimensional reconstruction. Because of object occlusion, tilt angles and other problems in oblique photography data, the reconstructed three-dimensional model often contains considerable noise, typically manifested as distortion, holes and blurred textures in some scenes or objects. Two-dimensional pictures rendered from such noisy three-dimensional models (for example, a front-view facade of a landmark building in a three-dimensional scene) therefore often suffer from low texture quality. For this problem, existing methods mostly correct textures by occlusion removal, optimal texture cropping and filling, and similar techniques, but these methods often yield repaired textures with poor realism and unreasonable texture patterns.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a building texture image restoration method based on a diffusion model, so as to solve the problem of poor building texture quality in two-dimensional pictures generated from three-dimensional reconstruction models built from oblique photography.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
a building texture image restoration method based on a diffusion model comprises the following steps:
s1, taking an original building texture image as input data of a diffusion model in a forward process to obtain standard Gaussian noise data;
s2, taking the building texture image added with Gaussian noise and standard Gaussian noise data as input data of a diffusion model in a reverse process, adding a coding neural network and a prediction neural network in the reverse process of the diffusion model, coding the building texture image added with Gaussian noise by using the coding neural network, and predicting new Gaussian noise data and data distribution by using the prediction neural network according to the coding image and the data distribution to carry out diffusion model training;
s3, taking the building texture image to be repaired and standard Gaussian noise data as input data of a diffusion model in a reverse process of adding the coding neural network and the prediction neural network, and obtaining the repaired building texture image.
Further, the step S1 specifically includes the following steps:
S11, acquiring an original building texture image as the input data of the diffusion model in the forward process;
S12, in each diffusion step of the forward process, adding Gaussian noise to the building texture image obtained in the previous diffusion step, to obtain the data distribution and the standard Gaussian noise data of each diffusion step.
Further, the training of the diffusion model in step S2 specifically includes the following steps:
S21, inputting the building texture image with Gaussian noise added into the coding neural network for encoding, to obtain an encoded image of the building texture image with Gaussian noise added;
S22, taking the current diffusion step, the encoded image output by the coding neural network and the data distribution of the current diffusion step as training data, taking the Gaussian noise added at the current diffusion step in the forward process as label data, and inputting the training data and the label data into the prediction neural network to obtain the Gaussian noise data of the current diffusion step predicted by the reverse process;
S23, sampling a Gaussian noise from the standard Gaussian noise data, and determining the data distribution of the current diffusion step in combination with the Gaussian noise data predicted by the reverse process at the current diffusion step;
S24, after the reverse process reaches the initial diffusion step, taking the minimization of the error between the Gaussian noise data predicted by the reverse process and the Gaussian noise added in the forward process as the optimization objective, and performing iterative optimization with a gradient descent algorithm.
Further, the method for determining the data distribution of the current diffusion step in step S23 is as follows:
xt(t, N(y)) = z(t) × p(t, N(y))
where xt(t, N(y)) represents the data distribution of the current diffusion step t, z(t) represents the sampled Gaussian noise, and p(t, N(y)) represents the Gaussian noise data predicted by the reverse process at the current diffusion step t.
Further, in step S24, the minimization of the error between the Gaussian noise data predicted by the reverse process and the Gaussian noise added in the forward process is specifically taken as the optimization objective:
L(θ) = E[ ||zt − zθ(t, xt, N(y))||² ]
wherein L(θ) represents the optimization objective, θ represents the model parameters of the prediction neural network, E represents the expected value taken over the data distribution xt of the current diffusion step t and the added Gaussian noise zt, zt represents the Gaussian noise added by the forward process at the current diffusion step t, zθ represents the prediction neural network, xt represents the data distribution of the current diffusion step t, N(y) represents the encoded image, and ||·||² represents the squared norm.
The invention has the following beneficial effects:
1. a common diffusion model can only randomly generate images, and the generated results are diverse; by conditioning the reverse process on the picture to be repaired, the invention makes the generated result correspond to the input building texture rather than an arbitrary image.
2. The invention can refine the texture colors, making the textures closer to the real appearance of the building model, and can even generate models with different texture styles to suit different model requirements.
3. The invention can apply emphasis marking to part of the textures, thereby repairing the texture information of a specific building.
Drawings
FIG. 1 is a schematic flow chart of a building texture image restoration method based on a diffusion model in the invention;
FIG. 2 is a schematic view of a diffusion model in the present invention;
FIG. 3 is a schematic diagram of the overall architecture of the FDDM model according to the invention;
FIG. 4 is a schematic diagram of a Unet model structure according to the present invention;
FIG. 5 is a schematic diagram of FDDM model repair in accordance with the present invention;
FIG. 6 is a diagram showing the effect of repairing the overall torsional deformation in the present invention;
FIG. 7 is a schematic view of the window correction effect according to the present invention;
FIG. 8 is a schematic diagram of the effect of correcting the vertical surface in the present invention;
FIG. 9 is a schematic view of the overall roof, facade and window correction effect according to the invention.
Detailed Description
The following description of the embodiments of the present invention is provided to help those skilled in the art understand the invention, but it should be understood that the invention is not limited to the scope of these embodiments; for those of ordinary skill in the art, any invention that makes use of the inventive concept falls within the protection scope of the invention as defined by the appended claims.
As shown in fig. 1, an embodiment of the present invention provides a building texture image restoration method based on a diffusion model, which includes steps S1 to S3 as follows:
s1, taking an original building texture image as input data of a diffusion model in a forward process to obtain standard Gaussian noise data;
in an alternative embodiment of the present invention, step S1 specifically includes the steps of:
s11, acquiring an original building texture image as input data of a diffusion model in a forward process;
S12, in each diffusion step of the forward process, adding Gaussian noise to the building texture image obtained in the previous diffusion step, to obtain the data distribution and the standard Gaussian noise data of each diffusion step.
Specifically, this embodiment optimizes the quality of building texture images to obtain building texture images containing high-quality textures.
As shown in fig. 2, the forward process of the diffusion model starts from the observed data x0 and generates a series of hidden variables x1:T by continuously adding Gaussian noise, finally obtaining xT, which obeys a standard Gaussian distribution. The Gaussian noise added at each step is a known quantity. This embodiment uses a picture with high-quality building texture as the input data x0; after a number of diffusion steps, each of which adds one Gaussian noise, the standard Gaussian noise xT is finally generated.
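For illustration only, the forward process described above can be sketched in PyTorch (the framework used later in this description). The number of steps T, the linear beta schedule and the tensor sizes are assumptions of the sketch rather than values given by the patent, and the closed-form expression used here is the standard equivalent of adding one Gaussian noise per step:

import torch

T = 1000                                      # assumed number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)         # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)     # cumulative products over the steps

def forward_diffuse(x0, t):
    # Add Gaussian noise to the clean texture image x0 up to step t using the
    # closed form x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * z.
    z = torch.randn_like(x0)                  # the known Gaussian noise of this step
    a_bar = alpha_bars[t]
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * z
    return xt, z

x0 = torch.rand(4, 3, 64, 64)                 # a batch of building texture images
xT, _ = forward_diffuse(x0, T - 1)            # approximately standard Gaussian noise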
S2, taking the building texture image with Gaussian noise added and the standard Gaussian noise data as the input data of the diffusion model in the reverse process, adding a coding neural network and a prediction neural network to the reverse process of the diffusion model, encoding the building texture image with Gaussian noise added by using the coding neural network, predicting new Gaussian noise data and a new data distribution with the prediction neural network from the encoded image and the data distribution, and training the diffusion model;
in an alternative embodiment of the invention, as shown in FIG. 2, the inverse of the diffusion model is from x T Sampling a noise as initial data, continuously subtracting the Gaussian noise, and recovering x 0 . Where the gaussian noise subtracted at each step is the quantity to be solved.
The diffusion model is solved by optimizing a variational lower bound. Specifically, a neural network zθ with θ as its parameters is first defined; this network takes the current step t and the current data distribution xt as input and outputs the parameters of the predicted noise distribution. Then, a variational lower bound is derived from the KL divergence between the hidden variables x1:T obtained in the forward process and the standard Gaussian distribution. Finally, by optimizing this variational lower bound, the neural network zθ is obtained, which gives the Gaussian noise to be subtracted at each step of the reverse process; the corresponding training objective is:
L(θ) = E[ ||zt − zθ(t, xt)||² ]
wherein t represents the t-th step of the reverse process, xt is the data distribution at step t, and zθ represents the prediction neural network with θ as its parameters, whose function is to predict, from the current step t and the current data distribution xt, the parameters of the noise distribution to be removed at step t. ||·||² represents the squared norm: the noise distribution predicted by the neural network zθ should be as close as possible to the noise zt, where zt can be calculated according to the forward process.
The invention provides the FDDM (FreeDo Diffusion Model), which improves the reverse process on the basis of the diffusion model by adding the picture to be repaired, so that the model learns how to restore the picture to be repaired into a picture with high-quality textures. This addresses the problems of the original diffusion model, whose generated pictures are random and diverse and differ greatly from the picture to be repaired. The overall architecture is shown in fig. 3.
As shown in fig. 3, the forward process takes a picture with high-quality building texture as the input data (x0); after a number of diffusion steps, each of which adds one Gaussian noise, the standard Gaussian noise (xT) is finally generated. Two neural networks are introduced in the reverse process: the Z network calculates the noise to be removed at each step, and the N network encodes the picture to be repaired. The reverse process takes the standard Gaussian noise and the picture to be repaired (y) as input and removes the noise step by step, thereby restoring a picture with high-quality building texture.
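As an illustration of the conditioning described above, one FDDM-style training step might be sketched as follows. The module definitions are placeholders only: the patent uses a VAE encoder for the N network and a Unet for the Z network, and forward_diffuse refers to the forward-process sketch given earlier.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NEncoder(nn.Module):
    # placeholder encoder standing in for the VAE-based N network
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, y):
        return self.net(y)                     # N(y): encoding of the picture to repair

class ZNetwork(nn.Module):
    # placeholder noise predictor standing in for the Unet-based Z network
    def __init__(self):
        super().__init__()
        self.cond = nn.Conv2d(32, 3, 1)        # project N(y) features back to 3 channels
        self.net = nn.Sequential(
            nn.Conv2d(7, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, t, xt, ny):
        t_map = torch.full_like(xt[:, :1], float(t))           # broadcast the step index
        ny_up = F.interpolate(self.cond(ny), size=xt.shape[-2:])
        return self.net(torch.cat([xt, t_map, ny_up], dim=1))  # predicted noise

n_net, z_net = NEncoder(), ZNetwork()
opt = torch.optim.Adam(list(n_net.parameters()) + list(z_net.parameters()), lr=1e-4)

def training_step(x0, y, t):
    xt, z_true = forward_diffuse(x0, t)        # forward process (earlier sketch)
    z_pred = z_net(t, xt, n_net(y))            # reverse-process prediction conditioned on N(y)
    loss = F.mse_loss(z_pred, z_true)          # || zt - z_theta(t, xt, N(y)) ||^2
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()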
The training of the diffusion model in the step S2 specifically includes the following steps:
S21, inputting the building texture image with Gaussian noise added into the coding neural network for encoding, to obtain an encoded image of the building texture image with Gaussian noise added;
S22, taking the current diffusion step, the encoded image output by the coding neural network and the data distribution of the current diffusion step as training data, taking the Gaussian noise added at the current diffusion step in the forward process as label data, and inputting the training data and the label data into the prediction neural network to obtain the Gaussian noise data of the current diffusion step predicted by the reverse process;
S23, sampling a Gaussian noise from the standard Gaussian noise data, and determining the data distribution of the current diffusion step in combination with the Gaussian noise data predicted by the reverse process at the current diffusion step;
S24, after the reverse process reaches the initial diffusion step, taking the minimization of the error between the Gaussian noise data predicted by the reverse process and the Gaussian noise added in the forward process as the optimization objective, and performing iterative optimization with a gradient descent algorithm.
Specifically, to improve the quality of building textures in pictures, 6500 pictures of various buildings containing high-quality textures are collected, and each picture is degraded manually, simulating the characteristics of oblique photography, to obtain a corresponding picture to be repaired. Each high-quality picture and its corresponding picture to be repaired form one training sample.
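The noise-adding in this embodiment is done manually. Purely as an illustration of how such degraded counterparts could be produced programmatically, a stand-in degradation (blurring plus a masked hole) might look like the sketch below; its operations and parameters are assumptions, not the patent's procedure:

import torch
import torch.nn.functional as F

def degrade_texture(x0):
    # Illustrative stand-in for the manual noise-adding: blur the texture and
    # blank out a rectangle to mimic blurred facades and occlusion holes.
    y = F.avg_pool2d(x0, kernel_size=4)
    y = F.interpolate(y, size=x0.shape[-2:], mode="bilinear", align_corners=False)
    _, _, h, w = x0.shape
    top = torch.randint(0, h // 2, (1,)).item()
    left = torch.randint(0, w // 2, (1,)).item()
    y[:, :, top:top + h // 4, left:left + w // 4] = 0.0
    return y

# each (x0, degrade_texture(x0)) pair would then serve as one training sample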
In the forward process of the diffusion model, the high-quality picture in a sample is taken as input to obtain the standard Gaussian noise xT.
During training, the reverse process of the diffusion model takes the picture to be repaired in the sample and the Gaussian noise xT obtained by the forward process as input, and restores the picture with high-quality building texture by gradually removing noise.
Taking the t-th step (from xt to xt-1) as an example, the picture to be repaired is first encoded with the N network to obtain N(y). Encoding converts the image data into a more effective representation, which makes it easier to infer noise, remove noise and restore the image in the reverse process; the image is represented as a more characteristic and compressible feature vector, which helps to remove noise more accurately and recover the high-quality features of the image. Then xt, t and N(y) are input into the Z network to obtain the noise zθ removed at this step and a new data distribution xt-1. The parameters of the Z network and the N network are updated iteratively until the original high-quality picture is restored. The specific process is as follows:
1. The picture y to be repaired is input into the neural network N to obtain its encoded representation N(y).
2. For the t-th step of the reverse process, the current step t, the current data distribution xt and the encoded data N(y) are input into the neural network zθ to obtain the predicted noise distribution p(t, N(y)). Namely:
p(t, N(y)) = zθ(t, xt, N(y))
where zθ denotes the neural network with θ as its parameters, which takes the current step t, the current data distribution xt and the encoded data N(y) as input and outputs the predicted noise distribution.
3. A noise z(t) is sampled from the standard normal distribution and multiplied by the predicted noise distribution p(t, N(y)) to obtain the data distribution xt(t, N(y)) of the current step t. Namely: xt(t, N(y)) = z(t) × p(t, N(y)).
4. Steps 2-3 are repeated until the initial step of the reverse process is reached.
5. The optimization objective is computed:
L(θ) = E[ ||zt − zθ(t, xt, N(y))||² ]
wherein L(θ) represents the optimization objective, θ represents the model parameters of the prediction neural network, E represents the expected value taken over the data distribution xt of the current diffusion step t and the added Gaussian noise zt, zt represents the Gaussian noise added by the forward process at the current diffusion step t, zθ represents the prediction neural network, xt represents the data distribution of the current diffusion step t, N(y) represents the encoded image, and ||·||² represents the squared norm.
The optimization objective L(θ) is minimized using gradient descent or another optimization algorithm, thereby obtaining the reconstruction result of the picture y to be repaired.
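Taking the trained N and Z networks as given (for example, the placeholder modules sketched earlier), steps 1 to 4 above correspond to a reverse restoration loop roughly like the following sketch. The loop applies the relation xt(t, N(y)) = z(t) × p(t, N(y)) exactly as stated; the comment marks where a practical sampler would also fold in the current xt and the noise schedule:

import torch

@torch.no_grad()
def restore(y, z_net, n_net, T=1000):
    ny = n_net(y)                              # step 1: encode the picture to be repaired
    xt = torch.randn_like(y)                   # initial data sampled from standard Gaussian noise
    for t in range(T - 1, -1, -1):             # step 4: repeat until the initial step
        p = z_net(t, xt, ny)                   # step 2: predicted noise distribution p(t, N(y))
        z = torch.randn_like(y) if t > 0 else torch.zeros_like(y)
        xt = z * p                             # step 3: xt(t, N(y)) = z(t) * p(t, N(y))
        # a practical DDPM-style sampler would also combine the previous xt and
        # the noise schedule here; the patent states only the relation above
    return xt                                  # restored picture with high-quality texture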
The N network in the invention adopts a VAE encoder. The VAE encoder is part of a neural network model based on the variational autoencoder (Variational Autoencoder, VAE). It is mainly used to encode the input data into a hidden variable vector in a latent space, thereby compressing and representing the input data.
The VAE is a generative model that generates and reconstructs data by learning the probability distribution of the data. It compresses and represents the input data by mapping it to hidden variable vectors in a latent space, and generates and reconstructs data by sampling these hidden variable vectors.
The VAE encoder consists of two main parts: an encoder and a sampler. The encoder maps the input data to a hidden variable vector in the latent space and calculates the mean and standard deviation of the vector; the sampler samples a hidden variable vector from this distribution.
The main purpose of the VAE encoder is to learn the distribution that maps the input data into the latent space and the way to sample hidden variable vectors from this distribution. This is an unsupervised learning process, and the loss function used when training the VAE encoder is designed to minimize the reconstruction error and the distribution deviation in the latent space.
The VAE model contains two main parts: an encoder and a decoder. The encoder maps the input data to a distribution in the latent space, and the decoder maps hidden variable vectors in the latent space back to the original data space.
The following is a specific model structure of the VAE:
(1) Encoder: the encoder usually adopts a multi-layer neural network structure, where each layer is a fully connected or convolutional layer, and maps the input data to a distribution in the latent space. The output of the encoder consists of two parts: the mean and the standard deviation in the latent space.
(2) Latent-space sampler: a hidden variable vector is obtained by sampling from the mean and standard deviation output by the encoder, typically by adding noise to the mean and standard deviation and then sampling.
(3) Decoder: the decoder usually adopts a neural network structure that mirrors the encoder and maps hidden variable vectors in the latent space back to the original data space. Like the encoder, each layer of the decoder is a fully connected or convolutional layer.
(4) Loss function: the loss function of a VAE model usually consists of two parts: the reconstruction loss and the KL divergence loss. The reconstruction loss measures the difference between the decoder output and the original data, and the KL divergence loss measures the difference between the latent-space distribution output by the encoder and the standard normal distribution.
The training of the VAE model is usually completed by minimizing this loss function with a back-propagation algorithm. By training the VAE model, the latent distribution of the data can be learned, and new data samples can be generated.
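A minimal VAE along the lines described above (an encoder producing a mean and standard deviation, reparameterized sampling, a decoder, and a loss combining reconstruction and KL terms) might be sketched as follows; the layer sizes, the 64 × 64 RGB input and the latent dimension are assumptions for illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten())
        self.fc_mu = nn.Linear(64 * 16 * 16, latent_dim)           # mean in latent space
        self.fc_logvar = nn.Linear(64 * 16 * 16, latent_dim)       # log-variance in latent space
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)    # sampler (reparameterization)
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    recon_loss = F.mse_loss(recon, x, reduction="sum")             # reconstruction loss
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL divergence loss
    return recon_loss + kl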
As shown in fig. 4, a typical Unet model structure is shown only as a schematic diagram; the input and output of the Unet have the same shape, with 3 channels (typically the three RGB channels) and identical width and height.
Essentially, the most important task of DDPM is to train a Unet model that takes Xt and t as input and outputs the Gaussian noise at Xt-1, i.e. Xt and t are used to predict the Gaussian noise of the previous moment, so that the noise can be stepped back to the real image step by step.
Assuming we need to generate an image of shape [64, 64, 3], at time t we have a noise image Xt whose shape is also [64, 64, 3]; we input it into the Unet together with t, and the output of the Unet is the [64, 64, 3] noise at Xt-1.
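As a toy illustration of this shape-preserving behaviour (in PyTorch the [64, 64, 3] image is laid out as [batch, 3, 64, 64]), a minimal Unet-like block with one downsampling path, one upsampling path and a skip connection might look as follows; the architecture details are assumptions, not the Unet actually used:

import torch
import torch.nn as nn

class TinyUnet(nn.Module):
    # toy Unet-style block: input and output shapes are identical, e.g. [B, 3, 64, 64]
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.Sequential(nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU())
        self.out = nn.Conv2d(32 + 3, 3, 3, padding=1)   # skip connection concatenated here

    def forward(self, xt, t):
        # a real Unet embeds the step t (e.g. a sinusoidal embedding); this toy
        # version simply adds it as a constant bias to keep the sketch short
        h = self.down(xt + float(t) / 1000.0)
        h = self.up(h)
        return self.out(torch.cat([h, xt], dim=1))      # predicted noise at Xt-1

xt = torch.randn(1, 3, 64, 64)
noise_pred = TinyUnet()(xt, t=10)
assert noise_pred.shape == xt.shape                     # same shape as the input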
S3, taking the building texture image to be repaired and standard Gaussian noise data as the input data of the diffusion model in the reverse process to which the coding neural network and the prediction neural network have been added, to obtain the repaired building texture image.
In an alternative embodiment of the present invention, the reverse process of the FDDM model is encapsulated into an FDDM building texture quality improvement device, which comprises the trained Z network and N network, takes the standard Gaussian distribution and the picture to be repaired as input, and outputs a picture with high-quality building texture, as shown in fig. 5.
The following is an example analysis of a building texture image restoration method based on a diffusion model.
1. Environment preparation: the FDDM building texture quality improvement device generated by pre-training is used. The running environment is: an Intel i9-10900X processor, an NVIDIA GeForce RTX 3090 GPU and 128 GB of memory, with Python 3.6 and PyTorch 1.4.0 deployed.
2. Tuning of the device: to better adapt to the characteristics of the data in this urban building texture quality improvement project, 1000 building facade texture pictures are sampled from the urban building group to construct an enhanced data set, on which the device is fine-tuned; the tuning training runs for 500 epochs in total.
3. Data conditions: the urban building texture quality improvement project contains 400 pictures to be processed; the main problems are window texture distortion, blurred building facade textures, and the like. Each building facade texture picture is resized to 512 × 512.
4. Data processing: during testing, one building facade texture map to be optimized is input at a time and the optimized picture is output; processing one such picture takes about 30 s, and processing all 400 pictures takes about 2 hours.
5. Effect analysis: by comparison, the repaired building textures show a certain improvement, mainly in that deformed textures are regularized, so that the repaired building textures are closer to the real situation. The comparison before and after repair is shown in fig. 6 to 9.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided only to help understand the method and core idea of the present invention; meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope according to the idea of the present invention. In view of the above, the contents of this description should not be construed as limiting the present invention.
Those of ordinary skill in the art will recognize that the embodiments described herein are for the purpose of aiding the reader in understanding the principles of the present invention and should be understood that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations from the teachings of the present disclosure without departing from the spirit thereof, and such modifications and combinations remain within the scope of the present disclosure.

Claims (5)

1. The building texture image restoration method based on the diffusion model is characterized by comprising the following steps of:
S1, taking an original building texture image as the input data of the diffusion model in the forward process to obtain standard Gaussian noise data;
S2, taking the building texture image with Gaussian noise added and the standard Gaussian noise data as the input data of the diffusion model in the reverse process, adding a coding neural network and a prediction neural network to the reverse process of the diffusion model, encoding the building texture image with Gaussian noise added by using the coding neural network, predicting new Gaussian noise data and a new data distribution with the prediction neural network from the encoded image and the data distribution, and training the diffusion model;
S3, taking the building texture image to be repaired and standard Gaussian noise data as the input data of the diffusion model in the reverse process to which the coding neural network and the prediction neural network have been added, to obtain the repaired building texture image.
2. The method for repairing a building texture image based on a diffusion model according to claim 1, wherein the step S1 specifically comprises the following steps:
S11, acquiring an original building texture image as the input data of the diffusion model in the forward process;
S12, in each diffusion step of the forward process, adding Gaussian noise to the building texture image obtained in the previous diffusion step, to obtain the data distribution and the standard Gaussian noise data of each diffusion step.
3. The method for repairing a building texture image based on a diffusion model according to claim 2, wherein the training of the diffusion model in step S2 specifically comprises the following steps:
S21, inputting the building texture image with Gaussian noise added into the coding neural network for encoding, to obtain an encoded image of the building texture image with Gaussian noise added;
S22, taking the current diffusion step, the encoded image output by the coding neural network and the data distribution of the current diffusion step as training data, taking the Gaussian noise added at the current diffusion step in the forward process as label data, and inputting the training data and the label data into the prediction neural network to obtain the Gaussian noise data of the current diffusion step predicted by the reverse process;
S23, sampling a Gaussian noise from the standard Gaussian noise data, and determining the data distribution of the current diffusion step in combination with the Gaussian noise data predicted by the reverse process at the current diffusion step;
S24, after the reverse process reaches the initial diffusion step, taking the minimization of the error between the Gaussian noise data predicted by the reverse process and the Gaussian noise added in the forward process as the optimization objective, and performing iterative optimization with a gradient descent algorithm.
4. The method for repairing a building texture image based on a diffusion model according to claim 3, wherein the method for determining the data distribution of the current diffusion step in step S23 is as follows:
xt(t, N(y)) = z(t) × p(t, N(y))
where xt(t, N(y)) represents the data distribution of the current diffusion step t, z(t) represents the sampled Gaussian noise, and p(t, N(y)) represents the Gaussian noise data predicted by the reverse process at the current diffusion step t.
5. The method for repairing a building texture image based on a diffusion model according to claim 3, wherein in step S24 the minimization of the error between the Gaussian noise data predicted by the reverse process and the Gaussian noise added in the forward process is specifically taken as the optimization objective:
L(θ) = E[ ||zt − zθ(t, xt, N(y))||² ]
wherein L(θ) represents the optimization objective, θ represents the model parameters of the prediction neural network, E represents the expected value taken over the data distribution xt of the current diffusion step t and the added Gaussian noise zt, zt represents the Gaussian noise added by the forward process at the current diffusion step t, zθ represents the prediction neural network, xt represents the data distribution of the current diffusion step t, N(y) represents the encoded image, and ||·||² represents the squared norm.
CN202410205207.6A 2024-02-26 2024-02-26 Building texture image restoration method based on diffusion model Active CN117788344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410205207.6A CN117788344B (en) 2024-02-26 2024-02-26 Building texture image restoration method based on diffusion model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410205207.6A CN117788344B (en) 2024-02-26 2024-02-26 Building texture image restoration method based on diffusion model

Publications (2)

Publication Number Publication Date
CN117788344A (en) 2024-03-29
CN117788344B (en) 2024-05-07

Family

ID=90380065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410205207.6A Active CN117788344B (en) 2024-02-26 2024-02-26 Building texture image restoration method based on diffusion model

Country Status (1)

Country Link
CN (1) CN117788344B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732578A (en) * 2015-03-10 2015-06-24 山东科技大学 Building texture optimization method based on oblique photograph technology
US20230067841A1 (en) * 2021-08-02 2023-03-02 Google Llc Image Enhancement via Iterative Refinement based on Machine Learning Models
CN113658082A (en) * 2021-08-24 2021-11-16 李蕊男 Method for repairing TDOM (time difference of arrival) shielded area of five-lens oblique camera
WO2023038910A1 (en) * 2021-09-07 2023-03-16 Hyperfine, Inc. Dual-domain self-supervised learning for accelerated non-cartesian magnetic resonance imaging reconstruction
CN115908187A (en) * 2022-12-07 2023-04-04 北京航空航天大学 Image characteristic analysis and generation method based on rapid denoising diffusion probability model
CN116775622A (en) * 2023-08-24 2023-09-19 中建五局第三建设有限公司 Method, device, equipment and storage medium for generating structural data
CN117219104A (en) * 2023-09-18 2023-12-12 广东博华超高清创新中心有限公司 Video restoration method based on diffusion model
CN117541735A (en) * 2023-10-24 2024-02-09 珠海高凌信息科技股份有限公司 Method for constructing dynamic three-dimensional noise map based on oblique photography model
CN117593398A (en) * 2023-11-13 2024-02-23 中国电子科技集团公司第五十四研究所 Remote sensing image generation method based on diffusion model
CN117495735A (en) * 2024-01-03 2024-02-02 武汉峰岭科技有限公司 Automatic building elevation texture repairing method and system based on structure guidance

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
彭进业 et al., "A review of image inpainting methods based on deep learning", Journal of Northwest University (Natural Science Edition), vol. 53, no. 6, 31 December 2023 (2023-12-31), pages 943-963 *

Also Published As

Publication number Publication date
CN117788344B (en) 2024-05-07

Similar Documents

Publication Publication Date Title
CN113658051B (en) Image defogging method and system based on cyclic generation countermeasure network
CN108986050B (en) Image and video enhancement method based on multi-branch convolutional neural network
CN111062880B (en) Underwater image real-time enhancement method based on condition generation countermeasure network
CN111539887B (en) Channel attention mechanism and layered learning neural network image defogging method based on mixed convolution
CN110349087B (en) RGB-D image high-quality grid generation method based on adaptive convolution
CN114926553A (en) Three-dimensional scene consistency stylization method and system based on nerve radiation field
US20220414838A1 (en) Image dehazing method and system based on cyclegan
CN116777764A (en) Diffusion model-based cloud and mist removing method and system for optical remote sensing image
CN117036281A (en) Intelligent generation method and system for defect image
CN116957931A (en) Method for improving image quality of camera image based on nerve radiation field
CN111768326A (en) High-capacity data protection method based on GAN amplification image foreground object
CN116863053A (en) Point cloud rendering enhancement method based on knowledge distillation
CN117788344B (en) Building texture image restoration method based on diffusion model
CN111539885A (en) Image enhancement defogging method based on multi-scale network
CN117315336A (en) Pollen particle identification method, device, electronic equipment and storage medium
CN116993975A (en) Panoramic camera semantic segmentation method based on deep learning unsupervised field adaptation
CN116883565A (en) Digital twin scene implicit and explicit model fusion rendering method and application
CN116109510A (en) Face image restoration method based on structure and texture dual generation
CN116402702A (en) Old photo restoration method and system based on deep neural network
CN116051407A (en) Image restoration method
CN115482265A (en) Outdoor scene depth completion method based on continuous video stream
CN115018726A (en) U-Net-based image non-uniform blur kernel estimation method
CN113936022A (en) Image defogging method based on multi-modal characteristics and polarization attention
Wang et al. Automatic model-based dataset generation for high-level vision tasks of autonomous driving in haze weather
CN117078564B (en) Intelligent generation method and system for video conference picture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant