CN116805290A - Image restoration method and device
- Publication number
- CN116805290A (application number CN202310763126.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- noise
- data
- sample
- repaired
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Learning methods
- G06V10/80 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
Abstract
An embodiment of the present application provides an image restoration method, comprising the following steps: acquiring data to be input according to an image to be repaired; inputting the data to be input into a diffusion model, and fusing the data to be input with a randomly generated Gaussian noise map through the diffusion model to obtain fused data, wherein the diffusion model comprises a noise prediction model; predicting noise parameters of the fused data through the noise prediction model; reconstructing output data corresponding to the repaired image according to the data to be input and the noise parameters; and outputting the output data through the diffusion model, the output data being the repaired image corresponding to the image to be repaired. With the technical solution provided by the embodiments of the present application, a low-quality image can be repaired through the diffusion model to generate a corresponding high-quality image, thereby improving the image processing effect.
Description
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to an image restoration method, an image restoration device, computer equipment and a computer readable storage medium.
Background
With the development of computer technology, people place ever higher quality requirements on images and videos, giving rise to demands for image restoration, resolution enhancement, and the like. In the related art, if the noise in an image is excessive or too much detail has been lost, the restoration effect may be poor.
It should be noted that the foregoing is not necessarily prior art, and is not intended to limit the scope of the present application.
Disclosure of Invention
Embodiments of the present application provide an image restoration method, apparatus, computer device, and computer-readable storage medium to solve or alleviate one or more of the technical problems set forth above.
An aspect of an embodiment of the present application provides an image restoration method, including:
acquiring data to be input according to an image to be repaired;
inputting the data to be input into a diffusion model, and fusing the data to be input with a randomly generated Gaussian noise map through the diffusion model to obtain fused data; wherein the diffusion model comprises a noise prediction model;
estimating noise parameters of the fusion data through the noise prediction model;
reconstructing and generating output data corresponding to the repaired image according to the data to be input and the noise parameters;
and outputting the output data through the diffusion model, wherein the output data is a repaired image corresponding to the image to be repaired.
Optionally, the acquiring the data to be input according to the image to be repaired includes:
inputting the image to be repaired into a variational codec, wherein the variational codec is used for compression, dimension reduction, and data reconstruction;
and obtaining reconstruction data from the variational codec, wherein the reconstruction data is the data to be input.
Optionally, the variational codec includes an encoder, a decoder, and a sampler;
correspondingly, the variational codec is obtained by taking different sample images as inputs and performing multiple rounds of the following training operations:
inputting a sample image into the encoder to obtain a coding vector corresponding to the sample image;
inputting the coding vector into the decoder to obtain a first sample reconstruction object;
randomly generating a sampling vector by the sampler;
inputting the sampling vector to the decoder to obtain a second sample reconstruction object;
acquiring a first loss value according to the first sample reconstruction object and the second sample reconstruction object;
and adjusting parameters of the variational codec according to the first loss value.
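For ease of understanding, one such training round can be sketched as follows. This is a minimal illustrative sketch with toy linear encoder/decoder matrices; the variable names, the use of the two reconstruction objects in a single mean-squared loss, and the crude gradient step are assumptions for illustration only (a practical variational codec would also use a reconstruction term and a KL-divergence term), not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_round(w_enc, w_dec, sample, lr=1e-2):
    # One illustrative round: encode, decode, sample, decode, compare.
    z = w_enc @ sample                       # coding vector of the sample image
    recon1 = w_dec @ z                       # first sample reconstruction object
    z_rand = rng.standard_normal(z.shape)    # sampler: random sampling vector
    recon2 = w_dec @ z_rand                  # second sample reconstruction object
    diff = recon1 - recon2
    loss = float(np.mean(diff ** 2))         # first loss value
    # Crude parameter adjustment (decoder only, for brevity).
    w_dec -= lr * np.outer(diff, z) / diff.size
    return loss

d_in, d_lat = 16, 4
w_enc = rng.standard_normal((d_lat, d_in)) * 0.1
w_dec = rng.standard_normal((d_in, d_lat)) * 0.1
sample = rng.standard_normal(d_in)           # a flattened sample image
loss = train_round(w_enc, w_dec, sample)
```

In repeated rounds with different sample images, the loop above would keep adjusting the codec parameters according to the first loss value, as described.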
Optionally, the diffusion model is obtained by the following training operations:
obtaining sample reconstruction data of a sample image through a trained variational codec;
and using the sample reconstruction data as sample input data of the diffusion model to perform model training on the diffusion model.
Optionally, the diffusion model is obtained by training on a plurality of sample image pairs, each sample image pair comprising a first sample image and a second sample image which correspond to the same picture but differ in image quality. Each sample image pair corresponds to one round of training of the diffusion model, wherein each round of training is as follows:
randomly generating Gaussian noise;
adding the Gaussian noise to the first sample image to obtain a first noise figure;
adding the second sample image into the first noise image through a conditional likelihood function to obtain a second noise image;
obtaining a predicted noise parameter of a second noise figure through the noise prediction model;
acquiring a second loss value according to the predicted noise parameter and the real noise parameter; the real noise parameters are noise parameters corresponding to the Gaussian noise;
and adjusting parameters of the noise prediction model according to the second loss value.
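One such training round can be sketched as follows. The noise schedule coefficient `alpha`, the stacking of the second sample image as a condition, and the zero-predicting stand-in for the noise prediction model are all illustrative assumptions, not the patented conditional likelihood formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(conditioned):
    # Stand-in for the trained noise prediction model (e.g. a U-Net);
    # here it simply predicts zero noise.
    return np.zeros_like(conditioned[0])

def training_round(hq_img, lq_img, alpha=0.9):
    eps = rng.standard_normal(hq_img.shape)                     # random Gaussian noise
    noisy = np.sqrt(alpha) * hq_img + np.sqrt(1 - alpha) * eps  # first noise figure
    conditioned = np.stack([noisy, lq_img])                     # add the second (LQ) image
    eps_hat = predict_noise(conditioned)                        # predicted noise parameter
    loss = float(np.mean((eps_hat - eps) ** 2))                 # second loss value
    return loss

hq = rng.standard_normal((8, 8))              # first sample image (higher quality)
lq = hq + 0.5 * rng.standard_normal((8, 8))   # second sample image (degraded)
loss = training_round(hq, lq)
```

The second loss value compares the predicted noise parameter against the real noise `eps`; a parameter update on the noise prediction model would follow it, as described.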
Optionally, the picture quality of the first sample image is higher than the picture quality of the second sample image.
Optionally, the sharpness of the first sample image is higher than the sharpness of the second sample image; and/or
The resolution of the first sample image is higher than the resolution of the second sample image.
Another aspect of an embodiment of the present application provides an image restoration apparatus, including:
the acquisition module is used for acquiring data to be input according to the image to be repaired;
the input module is used for inputting the data to be input into a diffusion model, so as to fuse the data to be input with a randomly generated Gaussian noise map through the diffusion model to obtain fused data; wherein the diffusion model comprises a noise prediction model;
the estimating module is used for estimating the noise parameters of the fusion data through the noise prediction model;
the reconstruction module is used for reconstructing and generating output data corresponding to the restored image according to the data to be input and the noise parameters;
and the output module is used for outputting the output data through the diffusion model, wherein the output data is a repaired image corresponding to the image to be repaired.
Another aspect of an embodiment of the present application provides a computer apparatus, including:
at least one processor; and
A memory communicatively coupled to the at least one processor;
wherein: the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
Another aspect of embodiments of the present application provides a computer-readable storage medium having stored therein computer instructions which, when executed by a processor, implement a method as described above.
The embodiment of the application adopts the technical scheme and can have the following advantages:
based on the image generation capability of the diffusion model, the low-quality image can be restored through the diffusion model, so that a high-quality image is generated, and the image processing effect is improved. In addition, the diffusion model can realize efficient calculation through a diffusion process, so that the calculation efficiency in the image restoration process can be improved, and the image processing efficiency is improved.
Drawings
The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.
FIG. 1 schematically illustrates an operational environment diagram of an image restoration method according to a first embodiment of the present application;
FIG. 2 schematically illustrates a flow chart of an image restoration method according to a first embodiment of the application;
FIG. 3 schematically illustrates a flowchart of the sub-steps of step S200 in FIG. 2;
FIG. 4 schematically illustrates another flowchart of an image restoration method according to the first embodiment of the application;
FIG. 5 schematically illustrates a further flowchart of an image restoration method according to the first embodiment of the application;
FIG. 6 schematically illustrates a further flowchart of an image restoration method according to the first embodiment of the application;
FIG. 7 schematically illustrates a flowchart of an exemplary application of training a variational codec;
FIG. 8 schematically illustrates a flow chart of one exemplary application of training a noise prediction model;
FIG. 9 schematically illustrates a restoration effect diagram of a set of images to be repaired and repaired images;
FIG. 10 schematically illustrates a restoration effect diagram of another set of images to be repaired and repaired images;
fig. 11 schematically shows a block diagram of an image restoration apparatus according to a second embodiment of the present application; and
Fig. 12 schematically shows a hardware architecture diagram of a computer device according to a third embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that the descriptions of "first", "second", etc. in the embodiments of the present application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when technical solutions are contradictory or cannot be realized, the combination should be considered absent and outside the scope of protection claimed in the present application.
In the description of the present application, it should be understood that the numerical references before the steps do not identify the order in which the steps are performed, but are merely used to facilitate description of the present application and to distinguish between each step, and thus should not be construed as limiting the present application.
First, a term explanation is provided in relation to the present application:
diffusion model (Diffusion Models): a type of latent variable model. The goal of a diffusion model is to learn the latent structure of a dataset by modeling how data points diffuse through the latent space. Its working principle is to learn the information attenuation caused by noise, and then use the learned pattern to generate an image. A diffusion model can generate a realistic image by iteratively diffusing and blending pixel values: starting from an initial set of pixel values, it gradually spreads and propagates information by computing and updating the neighboring pixels around each pixel. In each iteration, the model progressively forms the image by updating the value of each pixel in view of the weights of, and interactions with, the surrounding pixels. Because the diffusion model performs local calculations on each pixel and can exploit interactions between pixels, it can capture both microscopic details and global structure in the image. Through stepwise iteration and information propagation, the diffusion model can fill in blank areas of the image, recover lost details, and generate an image that looks natural, coherent, and realistic. In addition, the diffusion model can generate diverse images according to the input initial conditions and parameter settings; different generation results can be explored by changing parameters such as the initial pixel values, the number of iterations, the diffusion rule, and the weights. This flexibility gives the diffusion model broad application potential in fields such as artistic image generation, image restoration, and image enhancement.
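The neighborhood-based diffusion described above can be illustrated with a minimal sketch that fills a missing pixel by repeatedly averaging its 4-neighborhood while keeping known pixels fixed. The averaging update rule is an assumed, simplest-possible diffusion function chosen for illustration, not the model of the present application.

```python
import numpy as np

def diffuse_step(img, known):
    # Average each pixel's 4-neighbourhood; pixels marked as known stay fixed.
    p = np.pad(img, 1, mode="edge")
    neigh = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
    return np.where(known, img, neigh)

def fill_holes(img, known, iters=500):
    # Iterate until (practically) converged, propagating neighbour information.
    out = img.copy()
    for _ in range(iters):
        out = diffuse_step(out, known)
    return out

# A horizontal intensity gradient with one missing pixel at (2, 2).
img = np.tile(np.arange(5, dtype=float), (5, 1))
known = np.ones_like(img, dtype=bool)
img[2, 2] = 0.0
known[2, 2] = False
filled = fill_holes(img, known)
```

Here the missing pixel is recovered from its neighbours (values 1, 3, 2, 2), illustrating how local diffusion fills blank areas with values consistent with the surrounding structure.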
Likelihood function (Likelihood function): a function of the parameters of a statistical model, representing the likelihood of those parameters. Likelihood describes the plausible values of an unknown parameter given the observed output of a known random variable.
Super-resolution (SR): an image processing technique whose purpose is to reconstruct a high-resolution image from a low-resolution image. It can improve the definition and detail of the image, making it clearer and more textured, reduce noise interference, and enhance the contrast and color of the image.
Variational codec (Variational Autoencoder, VAE): an artificial neural network structure belonging to the class of probabilistic generative models (Probabilistic Generative Model). A VAE can be divided by function into an encoder and a decoder. The encoder maps the input variables to a latent space (Space) corresponding to the parameters of the variational distribution; the function of the decoder is essentially the reverse: it transforms vectors of the latent space back into reconstructed data.
Image noise: in digital images, unnecessary interference and noise occur in the images due to problems occurring in links such as sensor sampling, digital signal processing, transmission and the like. Image noise may manifest as problems of image blurring, line disappearance, color distortion, etc., which may reduce the quality and definition of an image. Sources of image noise include noise of the sensor, noise of the digital signal, noise in transmission, and the like.
Blur repair: a digital image processing technique used to remove noise and recover details in a low-quality image so as to generate a high-quality image. Blur repair improves the definition and precision of an image, reduces noise interference, and enhances the contrast and color of the image. Common blur repair techniques include image enhancement, image restoration, and the like; image enhancement techniques include histogram equalization, color enhancement, and the like, and image restoration techniques include frame removal, background removal, and the like.
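As a concrete instance of the image enhancement techniques mentioned above, the following is a minimal sketch of classic histogram equalization for an 8-bit grayscale image (the example image is synthetic):

```python
import numpy as np

def hist_equalize(img_u8):
    # Spread the cumulative distribution of intensities over the full
    # 0-255 range to increase global contrast.
    hist = np.bincount(img_u8.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    lut = np.clip(
        np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255.0), 0, 255
    ).astype(np.uint8)
    return lut[img_u8]

# A low-contrast image using only intensities 100-131 is stretched to 0-255.
img = (np.arange(64, dtype=np.uint8).reshape(8, 8) // 2) + 100
out = hist_equalize(img)
```

After equalization the intensity range spans the full 0 to 255 interval, which is the contrast-enhancement effect described above.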
U-Net model (Convolutional Networks for Biomedical Image Segmentation): an improved FCN (Fully Convolutional Network) architecture. The U-Net network structure is symmetrical and is called U-Net because its diagram resembles the letter U. The U-Net model consists of a compression (contracting) path on the left and an expansion (expansive) path on the right.
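The U-shaped data flow can be illustrated with a weight-free sketch: a compression path (average pooling), an expansion path (upsampling), and a skip connection merging the two. This only demonstrates the shape of the computation, not a trained U-Net.

```python
import numpy as np

def unet_like(x):
    # Minimal U-shape on a 2-D array, with no learned weights.
    h, w = x.shape
    down = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # compression path
    up = np.kron(down, np.ones((2, 2)))                       # expansion path
    return (up + x) / 2.0                                     # skip connection

x = np.arange(16, dtype=float).reshape(4, 4)
y = unet_like(x)
```

The skip connection merges coarse (downsampled) information with the full-resolution input, which is the key idea that lets U-Net preserve fine detail.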
Training: a function is learned from existing data.
Inference (reasoning): applying a learned function to unknown data quickly and efficiently to obtain the desired result.
Next, in order to facilitate understanding of the technical solutions provided by the embodiments of the present application by those skilled in the art, the following description is made on related technologies:
The image restoration and super-resolution techniques known to the inventors are all implemented by algorithms based on surrounding pixels. However, when the picture of an image or video is very blurred, or noise and detail loss are excessive, restoration and super-resolution based on surrounding-pixel algorithms may yield a poor restoration effect, and the super-resolution result may fall short of expectations.
Therefore, an embodiment of the present application provides an image restoration technical solution. In this solution: (1) low-quality photos and old videos are repaired and super-resolved based on the generation capability of the diffusion model, making the solution more accurate in handling details and textures. (2) A variational codec is adopted to compress and dimension-reduce the image to be repaired, reducing the consumption of computing resources during the training and inference of the diffusion model and improving processing efficiency. (3) Because the image to be repaired is processed through the diffusion model, the solution can be used not only for image super-resolution but also for image restoration tasks such as deblurring and noise reduction, and can effectively handle images to be repaired of different types, sizes, and resolutions. (4) By adopting a conditional likelihood function and super-resolution technology, the solution can generate high-quality images with higher definition and greater realism, thereby improving the visual effect of the image. (5) Because a conditional likelihood function is introduced during the training of the diffusion model, the trained diffusion model can better capture the fine details and textures of the image to be repaired and perform super-resolution more accurately. (6) When the diffusion model processes the image to be repaired, it can compute efficiently through the diffusion process, which improves the calculation speed during image restoration and thus the image processing efficiency; repairing and super-resolving images through a diffusion model is also more efficient than traditional optimization-based or machine-learning-based methods. See below for details.
Finally, for ease of understanding, an exemplary operating environment is provided below.
As shown in FIG. 1, the operating environment includes a service platform 2, a network 4, and a client 6, where:
service platform 2 may be comprised of a single or multiple computing devices. The plurality of computing devices may include virtualized computing instances. Virtualized computing instances may include virtual machines such as emulation of computer systems, operating systems, servers, and the like. The computing device may load the virtual machine based on a virtual image and/or other data defining particular software (e.g., operating system, dedicated application, server) for emulation. As the demand for different types of processing services changes, different virtual machines may be loaded and/or terminated on one or more computing devices. A hypervisor may be implemented to manage the use of different virtual machines on the same computing device.
The service platform 2 may be configured to communicate with clients 6 and the like over a network 4. The network 4 includes various network devices such as routers, switches, multiplexers, hubs, modems, bridges, repeaters, firewalls, proxy devices, and/or the like. The network 4 may include physical links such as a cable link, a twisted pair cable link, a fiber optic link, combinations thereof, or the like, or wireless links such as a cellular link, a satellite link, a Wi-Fi link, or the like.
The service platform 2 may provide services such as image/video restoration, image/video super-resolution, variational codec training, and diffusion model training, for example: repairing an image uploaded by the client, or improving the resolution of an image, and the like.
The client 6 may be an electronic device running an operating system such as Windows, Android, or iOS, for example a smart phone, tablet device, laptop, virtual reality device, game device, set-top box, vehicle terminal, or smart TV. Based on such an operating system, various applications can be run, for example to upload an image to be repaired to the service platform 2. Of course, the client 6 may also provide a local image restoration function.
The client 6 may provide/configure a user access page, may be used for uploading images to be repaired, etc.
It is noted that the above devices are exemplary and that the number and variety of devices may be adjustable in different scenarios or according to different needs.
The following describes the technical solution of the present application through a plurality of embodiments by taking the service platform 2 as an execution body. It should be understood that these embodiments may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.
Example 1
Fig. 2 schematically shows a flowchart of an image restoration method according to a first embodiment of the present application.
As shown in fig. 2, the image restoration method may include steps S200 to S208, in which:
step S200, obtaining data to be input according to the image to be repaired.
The image to be repaired may be a poor quality picture or video, such as an old photo, a blurred video, etc.
Poor image quality may be understood as poor definition and/or resolution, or more severe edge distortion, color distortion, etc.
The image to be repaired can be a locally stored image or an image uploaded by a user.
The image to be repaired may be various types of pictures or videos, such as pictures in JPEG (Joint Photographic Experts Group) format or PNG (Portable Network Graphics) format.
The data to be input is data obtained based on the image to be repaired and is used as input of a follow-up diffusion model.
The data to be input is in the form of tensors, i.e. each number corresponds to one or more pixels in the image to be repaired.
The data to be input are corresponding data obtained after the image to be repaired is compressed or subjected to dimension reduction.
In the case of acquiring an image to be repaired, the data to be input can be obtained in various ways, for example through a variational codec, PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), LLE (Locally Linear Embedding), and the like.
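As an illustration of one of the listed options, the following sketch performs PCA-based compression and dimension reduction on flattened image patches; the patch shapes and the number of retained components are arbitrary example values.

```python
import numpy as np

def pca_reduce(patches, k):
    # Project flattened image patches onto their top-k principal components,
    # one simple way to obtain compressed, dimension-reduced input data.
    mean = patches.mean(axis=0)
    centered = patches - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                       # (k, n_features) principal directions
    codes = centered @ basis.T           # (n_samples, k) compressed codes
    recon = codes @ basis + mean         # approximate reconstruction
    return codes, recon

rng = np.random.default_rng(0)
patches = rng.standard_normal((32, 16))  # 32 flattened 4x4 patches
codes, recon = pca_reduce(patches, k=4)
```

The compressed codes (here 4 numbers per patch instead of 16) would play the role of the data to be input, reducing the computation the downstream model must perform.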
Step S202, inputting the data to be input into a diffusion model, and fusing the data to be input with a randomly generated Gaussian noise map through the diffusion model to obtain fused data; wherein the diffusion model comprises a noise prediction model.
Diffusion model: (1) the change in a pixel value may be calculated based on the surrounding neighborhood of each pixel. (2) The update rule of each pixel value is defined by a diffusion function, which specifies, for example, how a pixel updates its own value based on the pixel values of its surrounding neighborhood; the diffusion function may be a linear function, a nonlinear function, or a probabilistic model (e.g. a noise estimation model). (3) The pixel values may be updated iteratively until a stop condition or convergence is reached. The specific repair process is as follows: the image is restored progressively through a number of iterations; at each iteration, the diffusion model computes a new value for each pixel to repair and recover the information of the image. Features such as pixel similarity and gradients in the neighborhood can also be introduced via the diffusion function to reconstruct the missing pixel information. Iteration stops once the quality of the repaired image reaches a preset requirement.
The fusion mode may include data/feature stitching, data/feature combination, and the like, and may be selected according to actual requirements, which is not limited herein.
And S204, estimating noise parameters of the fusion data through the noise prediction model.
The noise prediction model may include a U-Net model, a GRU (Gated Recurrent Unit) neural network model, and the like.
The noise parameter is obtained according to the noise distribution of the data to be input.
Step S206, reconstructing and generating output data corresponding to the repaired image according to the data to be input and the noise parameter.
And continuously reconstructing and generating corresponding output data according to the noise parameters in an iterative mode until a stopping condition or convergence is reached.
When the image to be repaired is repaired, the noise parameters are predicted through the noise prediction model. Since the noise parameters of each prediction are sampled from one noise distribution, there is a degree of randomness and diversity, which makes the reconstructed output data (i.e., the restored image) more detailed and diverse. Therefore, the generated repaired image can restore more high-definition and finer detail information and high-frequency information on the detail part, and further the image processing effect is improved.
Step S208, outputting the output data through the diffusion model, wherein the output data is a repaired image corresponding to the image to be repaired.
The output data is reconstructed data obtained after multiple iterations, namely corresponding to the repaired image.
The repaired image may include restored pixel values to fill the damaged area and restore the integrity of the image, achieving an image repair effect.
The diffusion model may be computed locally on each pixel and may take advantage of interactions between pixels to capture microscopic details and global structures in the image. Through stepwise iteration and information propagation, the diffusion model is able to fill in blank areas of the image, recover lost details, and generate an image with natural, coherent and realistic feel.
According to the embodiment, based on the image generation capability of the diffusion model, the diffusion model is used for repairing the low-quality photo (the image to be repaired) and the old video, so that a corresponding high-quality image is generated, and the image processing effect is improved. In addition, when the diffusion model is used for processing the image to be repaired, the diffusion model can realize efficient calculation through a diffusion process, so that the calculation speed in the image repairing process can be improved, and the image processing efficiency is improved.
The diffusion model consumes considerable computing resources during image restoration; the following alternative embodiments are provided to reduce this resource consumption.
In an alternative embodiment, as shown in fig. 3, step S202 may include:
step S300, inputting the image to be repaired into a variational codec, wherein the variational codec is used for compression, dimension reduction, and data reconstruction.
Step S302, obtaining reconstruction data of the variational codec, where the reconstruction data is the data to be input.
The variational codec is a generative model used for data reconstruction. In the variational codec, the image to be repaired is input into an encoder, which converts the image into two vectors, a mean and a variance, and these two vectors together define a Gaussian distribution.
Compression: on the premise of not losing useful information, the data volume is reduced to reduce the storage space, and the transmission, storage and processing efficiency is further improved.
Dimension reduction: by retaining some important features, some redundant features are removed to achieve the effect of reducing the dimensionality of the data features.
The reconstructed data are data of the image to be repaired after compression, dimension reduction and reconstruction in sequence.
The specific process by which the variational codec processes the image to be repaired is as follows: the encoder maps the image to be repaired into a latent space to obtain an encoded vector, and the decoder converts the encoded vector in the latent space back into reconstructed data (the data to be input into the diffusion model). The latent space is a continuous space subject to a Gaussian distribution.
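The encode, sample, decode flow described above can be sketched as follows; the encoder and decoder here are stand-in linear maps rather than the embodiment's actual networks, and all dimensions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(image):
    # Stand-in encoder: map the image to a latent mean and log-variance.
    # A real variational codec would use a trained neural network here.
    flat = image.reshape(-1)
    mu = flat[:8] * 0.1
    log_var = np.full(8, -1.0)
    return mu, log_var

def reparameterize(mu, log_var):
    # Draw a latent code from N(mu, sigma^2): the Gaussian latent space.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z):
    # Stand-in decoder: expand the latent code back into reconstructed data,
    # which becomes the data to be input into the diffusion model.
    return np.repeat(z, 4)

image = rng.standard_normal((8, 8))
mu, log_var = encode(image)
z = reparameterize(mu, log_var)   # encoded vector sampled in the latent space
reconstructed = decode(z)         # compressed, dimension-reduced reconstruction
print(reconstructed.shape)        # (32,)
```

The point of the sketch is the shape of the pipeline, not the maps themselves: the 64-value image is squeezed through an 8-dimensional Gaussian latent before reconstruction.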
In this embodiment, the image to be repaired is processed by the variational codec, that is, it is compressed and reduced in dimension. On the one hand, processing through the variational codec effectively removes redundant information and reduces the data volume. Inputting the reconstruction data processed by the variational codec into the diffusion model further reduces the computing resources the diffusion model consumes during image restoration, and improves the efficiency of the diffusion model in image processing. On the other hand, because the variational codec removes redundant information, the subsequent diffusion model can effectively capture key features when processing the image to be repaired, interference from invalid image information is reduced, details and textures are processed more accurately, and the image processing effect is improved.
The variational codec may be pre-trained in a number of ways; one of them is provided below for reference:
in an alternative embodiment, as shown in fig. 4 and fig. 7, the variational codec includes an encoder, a decoder, and a sampler; correspondingly, the variational codec is obtained by taking different sample images as inputs and performing multiple rounds of the following training operations:
step S400, inputting a sample image into the encoder to obtain a coding vector corresponding to the sample image.
Step S402, inputting the encoded vector to the decoder to obtain a first sample reconstruction object.
Step S404, randomly generating a sampling vector by the sampler.
Step S406, inputting the sampling vector to the decoder to obtain a second sample reconstruction object.
Step S408, obtaining a first loss value according to the first sample reconstruction object and the second sample reconstruction object.
The first loss value (loss) measures the error between the reconstruction obtained from the sampling vector and the reconstruction obtained from the encoding vector.
Step S410, adjusting parameters of the variational codec according to the first loss value.
Adjusting the parameters of the variational codec through the first loss value increases the distributional similarity between the sampling vector and the encoding vector, and reduces the chance that the images corresponding to the input data and the output data of the variational codec are not the same picture.
In this embodiment, a sampling vector is randomly generated by the sampler and compared, via the resulting reconstructions, with the encoding vector to obtain a first loss value; the parameters of the variational codec are then gradually adjusted through this loss value to train the variational codec. Training the variational codec improves the reconstruction accuracy of the encoding vector, so that the trained variational codec can compress and reduce the dimension of the image to be repaired while losing as little key image information as possible.
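Under stated assumptions (mean squared error as the error measure, and toy arrays standing in for real reconstructions), computing the first loss value from the two sample reconstruction objects can be sketched as:

```python
import numpy as np

def first_loss(first_recon, second_recon):
    # First loss value: error between the reconstruction obtained from the
    # encoding vector and the one obtained from a randomly sampled vector.
    # Mean squared error is an assumed choice of error measure.
    return float(np.mean((first_recon - second_recon) ** 2))

rng = np.random.default_rng(0)
first_recon = rng.standard_normal((16, 16))   # from the encoder's vector
second_recon = rng.standard_normal((16, 16))  # from the sampler's random vector
loss = first_loss(first_recon, second_recon)
print(loss >= 0.0)  # True; this value then drives the parameter adjustment
```

When the two reconstructions agree exactly, the loss is zero, which matches the training goal of making decoded samples distributionally close to decoded encodings.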
The diffusion model also consumes considerable computing resources during training; the following alternative embodiment is provided to reduce this resource consumption.
In an alternative embodiment, as shown in fig. 5, the method may further include:
step S500, obtaining sample reconstruction data of the sample image through the trained variational codec.
Step S502, taking the sample reconstruction data as sample input data of the diffusion model, and performing model training on the diffusion model.
The sample reconstruction data is the compressed, dimension-reduced data obtained from a sample image by the trained variational codec, and serves as the input in the subsequent training of the diffusion model.
In this embodiment, the variational codec is trained before the diffusion model, so that the sample input data of the diffusion model has already been processed by the variational codec. Processing the sample input data through the variational codec effectively removes redundant information. In addition, compressing and dimension-reducing the sample image effectively reduces the data volume, which improves computational efficiency and saves computing resources during training of the diffusion model.
An exemplary training scheme for the diffusion model is provided below.
In an alternative embodiment, as shown in fig. 6 and fig. 8, the diffusion model is obtained by training on a plurality of sample image pairs, each sample image pair including a first sample image and a second sample image which correspond to the same picture but have different image quality; each sample image pair corresponds to one round of training of the diffusion model, where each round of training is as follows:
step S600, randomly generating gaussian noise.
The gaussian noise is a noise with a normal distribution (also called gaussian distribution) probability density function.
In step S602, the gaussian noise is added to the first sample image, so as to obtain a first noise map.
Gaussian noise may be added step by step, producing an intermediate noise map at each step, until a pure noise map is obtained.
Gaussian noise may also be added in one step, directly yielding the pure noise map.
The first sample image x_0 is the sample image with higher image quality.
The first noise map is the picture x_t.
In step S604, the second sample image is added to the first noise figure by the conditional likelihood function to obtain a second noise figure.
The second sample image may be a sample image having the same picture as the first sample image but a lower picture quality.
Step S606, obtaining a predicted noise parameter of the second noise map through the noise prediction model.
Step S608, obtaining a second loss value according to the predicted noise parameter and the real noise parameter; the real noise parameter is a noise parameter corresponding to the Gaussian noise.
The second loss value is a difference between the predicted noise parameter and the real noise parameter.
The real noise parameter is the noise recorded during the noise addition to the first sample image.
Step S610, adjusting parameters of the noise prediction model according to the second loss value.
The following describes a specific training operation of the diffusion model:
Training principle of diffusion model: the first sample image is corrupted by continuously adding gaussian noise, and then by reversing this noise process, it is learned how to recover, i.e. adding gaussian noise first and then removing gaussian noise. The diffusion model can be trained by a forward process and a reverse process. The forward and reverse processes will be specifically described below.
(1) The forward process, also called the diffusion process, gradually adds Gaussian noise so that the first sample image x_0 becomes a first noise map x_t of pure Gaussian noise, thereby achieving the purpose of destroying the first sample image. Specifically:
the first sample image with higher image quality is input into the diffusion model, and randomly generated Gaussian noise ε is added to the first sample image to destroy it, i.e., the first sample image undergoes the noise-adding process.
The noise-adding process can be represented by the following formula:

x_t = √(1 - β_t)·x_{t-1} + √(β_t)·ε_{t-1}    (1)

where β_t is a preset hyper-parameter called the noise schedule (a small list of values), ε_{t-1} ~ N(0, 1) is Gaussian noise, and t is the number of noise-adding steps.
From the iterative derivation of equation (1), the formula going from x_0 directly to x_t can be derived as follows:

x_t = √(ᾱ_t)·x_0 + √(1 - ᾱ_t)·ε    (2)

where ᾱ_t = ∏_{s=1}^{t} (1 - β_s) is a hyper-parameter determined by the noise schedule, ε ~ N(0, 1) is also Gaussian noise, and t is the number of noise-adding steps.
Either equation (1) or (2) can be used to describe the forward process. The formula (1) is used for the process of gradually destroying a picture, and the formula (2) is used for the process of destroying a picture in one step.
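The one-step destruction of equation (2) can be sketched numerically; the linear noise schedule and tensor shapes below are assumptions for illustration, not values from the embodiment:

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    # One-step noising per equation (2): x_t = sqrt(a_bar_t)*x_0 + sqrt(1-a_bar_t)*eps,
    # where a_bar_t is the cumulative product of (1 - beta_s) over the schedule.
    a_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)   # the "real" Gaussian noise, recorded for training
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return xt, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)          # assumed linear noise schedule
x0 = rng.standard_normal((3, 8, 8))            # toy tensor for the first sample image
xt, eps = forward_noise(x0, 999, betas, rng)   # x_t, a near-pure noise map
print(xt.shape)  # (3, 8, 8)
```

At t = 999 the cumulative ᾱ_t is close to zero, so x_t is dominated by the Gaussian noise term, matching the "destroy a picture in one step" reading of equation (2).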
After the first noise figure is obtained, training of the reverse process may be performed.
(2) The reverse process: by estimating the real Gaussian noise, multiple iterations gradually restore the destroyed first noise map x_t to the first sample image x_0.
The reverse process is formulated as follows:

x_{t-1} = (1 / √(1 - β_t)) · ( x_t - (β_t / √(1 - ᾱ_t)) · ε_θ(x_t, t) ) + σ_t·z,  z ~ N(0, 1)

In the training of the reverse process, in order to accurately estimate the real Gaussian noise, a model that estimates the real noise from x_t and t, namely the noise prediction model ε_θ(x_t, t), is trained. Through training, the noise predicted by the noise prediction model ε_θ(x_t, t) becomes similar to the real Gaussian noise ε used to destroy the first sample image. σ_t is a fixed constant, the particular value and choice of which depend on the specific application and requirements.
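The iterative denoising of the reverse process can be sketched as follows; since the embodiment's trained U-Net is not available here, a zero predictor stands in for ε_θ(x_t, t), and the noise schedule and the choice of σ_t are assumptions:

```python
import numpy as np

def reverse_step(xt, t, betas, a_bars, eps_pred, rng):
    # One reverse step: remove the predicted noise from x_t, then (except at
    # t = 0) add sigma_t * z with z ~ N(0, 1).
    mean = (xt - betas[t] / np.sqrt(1.0 - a_bars[t]) * eps_pred) / np.sqrt(1.0 - betas[t])
    if t == 0:
        return mean
    sigma_t = np.sqrt(betas[t])   # one common (assumed) choice for sigma_t
    return mean + sigma_t * rng.standard_normal(xt.shape)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)    # assumed short noise schedule
a_bars = np.cumprod(1.0 - betas)
x = rng.standard_normal((8, 8))        # start from pure Gaussian noise x_T
for t in range(49, -1, -1):
    eps_pred = np.zeros_like(x)        # stand-in for the noise prediction model
    x = reverse_step(x, t, betas, a_bars, eps_pred, rng)
print(x.shape)  # (8, 8)
```

With a real noise prediction model in place of the zero stand-in, each step subtracts an estimate of the noise that equation (2) injected, which is exactly how the destroyed map is gradually restored.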
The training process of the noise prediction model is as follows: the second sample image is added to the first noise map x_t through a conditional likelihood function to obtain a second noise map. Because the second sample image, which has lower image quality, is added into the first noise map via the conditional likelihood function, the generated second noise map and the first and second sample images depict the same picture. For example, if the first and second sample images are photographs of the same cat, the generated second noise map is also a photograph of that cat.
The second noise map is input into the noise prediction model to estimate its noise parameter, and the estimated predicted noise parameter is compared with the real noise parameter of the real noise ε to obtain the second Loss value, which can be expressed by the following formula:

Loss = ‖ε - ε_θ(x_t, t)‖²
and adjusting parameters of the noise prediction model through the second loss value.
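Computing the second loss value as the mean squared error between the real and predicted noise can be sketched as follows; the noise prediction model here is a hypothetical stand-in for the U-Net, not the embodiment's trained model:

```python
import numpy as np

def second_loss(eps_true, eps_pred):
    # Second loss value: mean squared error between the real Gaussian noise
    # used to destroy the image and the noise predicted by the model.
    return float(np.mean((eps_true - eps_pred) ** 2))

def noise_prediction_model(x2, t):
    # Hypothetical stand-in for the U-Net eps_theta(x_t, t).
    return 0.9 * x2

rng = np.random.default_rng(0)
eps_true = rng.standard_normal((4, 8, 8))   # recorded real noise
x2 = eps_true                                # toy "second noise map"
loss = second_loss(eps_true, noise_prediction_model(x2, t=10))
print(loss >= 0.0)  # True; this value drives the model's parameter adjustment
```

The loss reaches zero exactly when the predicted noise equals the real noise, which is the training target described above.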
In this embodiment, a conditional likelihood function is introduced when training the diffusion model, so that the trained diffusion model can better capture fine details and textures in the image for subsequent image restoration and super-resolution processing. In addition, training the noise prediction model lets it more accurately predict the noise distribution of each detail part of the image to be repaired. The benefit of this is that the subsequent reconstruction of the output data (i.e., the restored image) based on the predicted noise parameters of each detail part is facilitated. And because the predicted noise parameters are closer to the real Gaussian noise parameters, the diffusion model can process details and textures more accurately and generate an image with more realism and higher definition, further improving the image processing effect.
The diffusion model may be used to improve image quality, with the image quality of the first sample image being higher than that of the second sample image.
The diffusion model can be used in image restoration and super-resolution technology.
In an alternative embodiment, the sharpness of the first sample image is higher than the sharpness of the second sample image. In other embodiments, the resolution of the first sample image is higher than the resolution of the second sample image.
Sharpness refers to how clearly each detail, and the boundary of each detail, appears on the image. In this embodiment, a large number of contrasting picture pairs with different sharpness can be used as training samples, so that the trained diffusion model can perform image restoration tasks such as deblurring; the restored image becomes sharper, improving the visual effect of the image.
Image resolution is a set of performance parameters for evaluating the richness of detail information in an image, including temporal resolution, spatial resolution, color level resolution, and the like. High-resolution images generally have greater pixel density, more texture detail, and higher reliability than low-resolution images. In this embodiment, a large number of contrasting image pairs with different resolutions may be used as training samples, so that the trained diffusion model can be used in super-resolution technology and can generate high-resolution images. And because the texture details of a high-resolution image are richer, the repaired image processed by the diffusion model has higher definition and finer detail information.
To make the application easier to understand, one exemplary application is provided below.
In this exemplary application, the service platform 2 is connected to a client through a network, and provides an image restoration service to the client.
First, practical application:
The blurred picture is input into the VAE to obtain reconstruction data. The reconstruction data is input into the diffusion model, and the diffusion model fuses (e.g., by concatenation) the reconstruction data with a randomly generated Gaussian noise map to obtain fused data. Then the fused data is input into the U-Net model, which outputs the estimated noise. The restored version of the blurred picture is reconstructed based on the estimated noise.
The image shown on the left side of fig. 9 is input into the diffusion model, and the image shown on the right side of fig. 9 can be output.
The image shown on the left side of fig. 10 is input into the diffusion model, and the image shown on the right side of fig. 10 can be output.
From the comparison of the pictures, a more detailed and various image can be obtained through the diffusion model, such as the right image (restored image) in fig. 9. The right image (restored image) in fig. 9 has higher definition and finer detail information and high frequency information on the detail portion than the blurred image (image to be restored) of the left image in fig. 9.
Second, model training process:
(1) First, the variational codec is trained; the specific training process is as follows:
s21: and inputting the sample image into an encoder to obtain a corresponding coding vector, and inputting the coding vector into a decoder to obtain a first sample reconstruction object.
S22: and randomly generating a sampling vector through a sampler, and inputting the sampling vector to a decoder to obtain a second sample reconstruction object.
S23: and comparing the first sample reconstruction object with the second sample reconstruction object to obtain a first loss value.
S24: parameters of the variable division codec are adjusted according to the first loss value.
(2) After the variational codec is trained, the diffusion model is trained. A specific training process is as follows:
S31, inputting the high-quality sample image into the variational codec for processing, and inputting the processed high-quality sample image into the diffusion model.
S32, adding Gaussian noise to the processed high-quality sample image to obtain a first noise map.
S33, adding the corresponding low-quality sample image to the first noise map through a conditional likelihood function to obtain a second noise map.
S34, obtaining the predicted noise parameters of the second noise map through the noise prediction model.
S35, obtaining a second loss value according to the predicted noise parameters and the real noise parameters.
S36, adjusting parameters of the noise prediction model according to the second loss value.
Example two
Fig. 11 schematically shows a block diagram of an image restoration apparatus according to a second embodiment of the present application. The image restoration device may be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to accomplish an embodiment of the present application. Program modules in accordance with the embodiments of the present application are directed to a series of computer program instruction segments capable of performing the specified functions, and the following description describes each program module in detail. As shown in fig. 11, the image restoration apparatus 1100 may include: an acquisition module 1110, an input module 1120, an estimation module 1130, a reconstruction module 1140, and an output module 1150, wherein:
an acquisition module 1110, configured to acquire data to be input according to an image to be repaired;
the input module 1120 is configured to input the data to be input into a diffusion model, so as to fuse the data to be input with a gaussian noise diagram generated randomly through the diffusion model, and obtain fused data; wherein the diffusion model comprises a noise prediction model;
The estimating module 1130 is configured to estimate a noise parameter of the fused data through the noise prediction model;
a reconstruction module 1140, configured to reconstruct and generate output data corresponding to the restored image according to the data to be input and the noise parameter;
and an output module 1150, configured to output the output data through the diffusion model, where the output data is a repaired image corresponding to the image to be repaired.
In an alternative embodiment, the acquiring module 1110 is further configured to:
inputting the image to be repaired to a variable division codec, wherein the variable division codec is used for compression, dimension reduction and data reconstruction;
and obtaining reconstruction data of the variable division codec, wherein the reconstruction data is the data to be input.
In an alternative embodiment, the variable division codec includes an encoder, a decoder, and a sampler;
the image restoration apparatus may further include a first training module (not identified) for:
inputting a sample image into the encoder to obtain a coding vector corresponding to the sample image;
inputting the encoded vector to the decoder to obtain a first sample reconstruction object;
Randomly generating a sampling vector by the sampler;
inputting the sampling vector to the decoder to obtain a second sample reconstruction object;
acquiring a first loss value according to the first sample reconstruction object and the second sample reconstruction object;
and adjusting parameters of the variable division codec according to the first loss value.
In an alternative embodiment, the apparatus may further comprise a second training module (not identified) for:
obtaining sample reconstruction data of a sample image through a trained variable division codec;
and taking the sample reconstruction data as sample input data of the diffusion model, and carrying out model training on the diffusion model.
In an alternative embodiment, the diffusion model is obtained through training of a plurality of sample image pairs, and the sample image pairs comprise a first sample image and a second sample image which correspond to the same picture but have different image quality; the second training module (not identified) is further configured to:
randomly generating Gaussian noise;
adding the Gaussian noise to the first sample image to obtain a first noise figure;
adding the second sample image into the first noise image through a conditional likelihood function to obtain a second noise image;
Obtaining a predicted noise parameter of a second noise figure through the noise prediction model;
acquiring a second loss value according to the predicted noise parameter and the real noise parameter; the real noise parameters are noise parameters corresponding to the Gaussian noise;
and adjusting parameters of the noise prediction model according to the second loss value.
In an alternative embodiment, the picture quality of the first sample image is higher than the picture quality of the second sample image.
In an alternative embodiment, the sharpness of the first sample image is higher than the sharpness of the second sample image; and/or the resolution of the first sample image is higher than the resolution of the second sample image.
Example III
Fig. 12 schematically shows a hardware architecture diagram of a computer device 10000 adapted to implement an image restoration method according to a third embodiment of the present application. In some embodiments, the computer device 10000 may be a smart phone, a wearable device, a tablet, a personal computer, a vehicle terminal, a gaming machine, a virtual device, a workstation, a digital assistant, a set top box, a robot, or the like. In other embodiments, the computer device 10000 may be a rack server, a blade server, a tower server, or a cabinet server (including a stand-alone server, or a server cluster composed of multiple servers), or the like. As shown in fig. 12, the computer device 10000 includes, but is not limited to, a memory 10010, a processor 10020, and a network interface 10030, which may be communicatively linked to each other via a system bus. Wherein:
Memory 10010 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 10010 may be an internal storage module of the computer device 10000, such as a hard disk or memory of the computer device 10000. In other embodiments, the memory 10010 may also be an external storage device of the computer device 10000, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory card (Flash Card) provided on the computer device 10000. Of course, the memory 10010 may also include both an internal storage module of the computer device 10000 and an external storage device thereof. In this embodiment, the memory 10010 is typically used for storing the operating system installed on the computer device 10000 and various application software, such as the program code of the image restoration method. In addition, the memory 10010 may be used to temporarily store various types of data that have been output or are to be output.
The processor 10020 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other chip in some embodiments. The processor 10020 is typically configured to control overall operation of the computer device 10000, such as performing control and processing related to data interaction or communication with the computer device 10000. In this embodiment, the processor 10020 is configured to execute program codes or process data stored in the memory 10010.
The network interface 10030 may comprise a wireless network interface or a wired network interface, and the network interface 10030 is typically used to establish a communication link between the computer device 10000 and other computer devices. For example, the network interface 10030 is used to connect the computer device 10000 to an external terminal through a network, and to establish a data transmission channel and a communication link between the computer device 10000 and the external terminal. The network may be a wireless or wired network such as an Intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, etc.
It should be noted that fig. 12 only shows a computer device having components 10010-10030, but it should be understood that not all of the illustrated components are required to be implemented, and that more or fewer components may be implemented instead.
In this embodiment, the image restoration method stored in the memory 10010 may be further divided into one or more program modules and executed by one or more processors (such as the processor 10020) to complete the embodiment of the present application.
Example IV
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements the steps of the image restoration method in the embodiment.
In this embodiment, the computer-readable storage medium includes a flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the computer-readable storage medium may be an internal storage unit of a computer device, such as a hard disk or a memory of the computer device. In other embodiments, the computer-readable storage medium may also be an external storage device of a computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory card (Flash Card) provided on the computer device. Of course, the computer-readable storage medium may also include both internal storage units of a computer device and external storage devices. In this embodiment, the computer-readable storage medium is typically used to store the operating system and various application software installed on a computer device, such as the program code of the image restoration method in the embodiment. Furthermore, the computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the application described above may be implemented in a general purpose computer device, they may be concentrated on a single computer device, or distributed over a network of multiple computer devices, they may alternatively be implemented in program code executable by a computer device, so that they may be stored in a storage device for execution by the computer device, and in some cases, the steps shown or described may be performed in a different order than what is shown or described, or they may be separately made into individual integrated circuit modules, or a plurality of modules or steps in them may be made into a single integrated circuit module. Thus, embodiments of the application are not limited to any specific combination of hardware and software.
It should be noted that the foregoing is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application, and all equivalent structures or equivalent processes using the descriptions of the present application and the accompanying drawings, or direct or indirect application in other related technical fields, are included in the scope of the present application.
Claims (10)
1. A method of image restoration, the method comprising:
acquiring data to be input according to an image to be repaired;
inputting the data to be input into a diffusion model, and fusing the data to be input and a Gaussian noise diagram generated randomly through the diffusion model to obtain fused data; wherein the diffusion model comprises a noise prediction model;
estimating noise parameters of the fusion data through the noise prediction model;
reconstructing and generating output data corresponding to the repaired image according to the data to be input and the noise parameters;
and outputting the output data through the diffusion model, wherein the output data is a repaired image corresponding to the image to be repaired.
2. The method of claim 1, wherein the acquiring data to be input from the image to be repaired comprises:
inputting the image to be repaired to a variational codec, wherein the variational codec is used for compression, reduction, and data reconstruction;
and obtaining reconstruction data of the variational codec, wherein the reconstruction data is the data to be input.
3. The method of claim 2, wherein the variational codec comprises an encoder, a decoder, and a sampler;
correspondingly, the variational codec is obtained by taking different sample images as inputs and performing multiple passes of the following training operation:
inputting a sample image into the encoder to obtain a coding vector corresponding to the sample image;
inputting the encoded vector to the decoder to obtain a first sample reconstruction object;
randomly generating a sampling vector by the sampler;
inputting the sampling vector to the decoder to obtain a second sample reconstruction object;
acquiring a first loss value according to the first sample reconstruction object and the second sample reconstruction object;
and adjusting parameters of the variational codec according to the first loss value.
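One pass of the codec training operation in claim 3 can be sketched as follows. The linear encoder/decoder weights, the latent and data dimensions, and the crude decoder-only gradient step are illustrative assumptions; the claim does not specify the network form or the optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, data_dim = 4, 16
W_enc = rng.standard_normal((data_dim, latent_dim)) * 0.1   # toy encoder weights
W_dec = rng.standard_normal((latent_dim, data_dim)) * 0.1   # toy decoder weights

def training_pass(sample, lr=0.05):
    global W_dec
    z = sample @ W_enc                        # coding vector of the sample image
    recon1 = z @ W_dec                        # first sample reconstruction object
    z_rand = rng.standard_normal(latent_dim)  # sampler: random sampling vector
    recon2 = z_rand @ W_dec                   # second sample reconstruction object
    # first loss value, computed from both reconstruction objects
    loss = np.mean((recon1 - sample) ** 2) + np.mean(recon2 ** 2)
    # crude parameter adjustment: gradient step on the decoder only
    grad = (np.outer(z, recon1 - sample) + np.outer(z_rand, recon2)) * (2 / data_dim)
    W_dec -= lr * grad
    return loss

sample = rng.standard_normal(data_dim)        # stand-in for a flattened sample image
loss = training_pass(sample)
```

The second reconstruction (from a random sampling vector) is what pushes the decoder toward producing plausible outputs for arbitrary latent codes, which is the property the diffusion model of claim 4 later relies on.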
4. The method according to claim 2 or 3, wherein the diffusion model is obtained by the following training operations:
obtaining sample reconstruction data of a sample image through the trained variational codec;
and taking the sample reconstruction data as sample input data of the diffusion model, and carrying out model training on the diffusion model.
5. The method according to claim 1, wherein the diffusion model is trained on a plurality of sample image pairs, each pair comprising a first sample image and a second sample image that correspond to the same picture but differ in image quality; each sample image pair corresponds to one round of training of the diffusion model, wherein each training round is as follows:
randomly generating Gaussian noise;
adding the Gaussian noise to the first sample image to obtain a first noise map;
adding the second sample image into the first noise map through a conditional likelihood function to obtain a second noise map;
obtaining predicted noise parameters of the second noise map through the noise prediction model;
acquiring a second loss value according to the predicted noise parameters and the real noise parameters; wherein the real noise parameters are the noise parameters corresponding to the Gaussian noise;
and adjusting parameters of the noise prediction model according to the second loss value.
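One training round of claim 5 can be sketched as below. The stand-in `predict_noise`, the noise level `alpha`, and the additive conditioning weight `lam` are assumptions for illustration; in particular, the claim's "conditional likelihood function" is approximated here by a simple weighted addition of the second (degraded) sample image.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(noisy, cond):
    # Hypothetical conditioned noise-prediction model, not the claimed network.
    return 0.5 * (noisy - cond)

def training_round(hq, lq, alpha=0.9, lam=0.5):
    eps = rng.standard_normal(hq.shape)                       # random Gaussian noise
    noise1 = np.sqrt(alpha) * hq + np.sqrt(1 - alpha) * eps   # first noise map
    noise2 = noise1 + lam * lq    # fold in the second sample image as conditioning
    eps_hat = predict_noise(noise2, lq)                       # predicted noise params
    # second loss value: predicted vs. real noise parameters
    return np.mean((eps_hat - eps) ** 2)

hq = rng.standard_normal((8, 8))              # first (high-quality) sample image
lq = hq + 0.3 * rng.standard_normal((8, 8))   # second (degraded) sample image
loss = training_round(hq, lq)
```

In an actual training loop this loss would be backpropagated through the noise prediction model to adjust its parameters, as the final step of the claim states.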
6. The method of claim 5, wherein the picture quality of the first sample image is higher than the picture quality of the second sample image.
7. The method of claim 6, wherein:
the sharpness of the first sample image is higher than the sharpness of the second sample image; and/or
the resolution of the first sample image is higher than the resolution of the second sample image.
8. An image restoration device, the device comprising:
the acquisition module is used for acquiring data to be input according to the image to be repaired;
the input module is used for inputting the data to be input into a diffusion model, so as to fuse the data to be input with a randomly generated Gaussian noise map through the diffusion model to obtain fused data; wherein the diffusion model comprises a noise prediction model;
the estimating module is used for estimating the noise parameters of the fused data through the noise prediction model;
the reconstruction module is used for reconstructing and generating output data corresponding to the restored image according to the data to be input and the noise parameters;
and the output module is used for outputting the output data through the diffusion model, wherein the output data is a repaired image corresponding to the image to be repaired.
9. A computer device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein:
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 7.
10. A computer readable storage medium having stored therein computer instructions which, when executed by a processor, implement the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310763126.3A CN116805290A (en) | 2023-06-26 | 2023-06-26 | Image restoration method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310763126.3A CN116805290A (en) | 2023-06-26 | 2023-06-26 | Image restoration method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116805290A true CN116805290A (en) | 2023-09-26 |
Family
ID=88080492
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310763126.3A Pending CN116805290A (en) | 2023-06-26 | 2023-06-26 | Image restoration method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116805290A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117726700A (en) * | 2023-09-27 | 2024-03-19 | 书行科技(北京)有限公司 | Image generation method, device, electronic equipment and storage medium |
CN117333400A (en) * | 2023-11-06 | 2024-01-02 | 华中农业大学 | Root box cultivated crop root system image broken root restoration and phenotype extraction method |
CN117333400B (en) * | 2023-11-06 | 2024-04-30 | 华中农业大学 | Root box cultivated crop root system image broken root restoration and phenotype extraction method |
CN117745593A (en) * | 2023-12-29 | 2024-03-22 | 湖北大学 | Diffusion model-based old photo scratch repairing method and system |
CN118096762A (en) * | 2024-04-28 | 2024-05-28 | 齐鲁工业大学(山东省科学院) | Image recovery method based on dynamic diffusion depth neural network |
CN118644415A (en) * | 2024-08-16 | 2024-09-13 | 烟台大学 | Diffusion model and low-rank matrix decomposition radar clutter suppression method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116805290A (en) | Image restoration method and device | |
CN111709895A (en) | Image blind deblurring method and system based on attention mechanism | |
CN112001914A (en) | Depth image completion method and device | |
CN108596841B (en) | Method for realizing image super-resolution and deblurring in parallel | |
CN110807757B (en) | Image quality evaluation method and device based on artificial intelligence and computer equipment | |
CN116664450A (en) | Diffusion model-based image enhancement method, device, equipment and storage medium | |
Zhang et al. | Exemplar-based image inpainting using angle-aware patch matching | |
Yang et al. | Ensemble learning priors driven deep unfolding for scalable video snapshot compressive imaging | |
CN113066034B (en) | Face image restoration method and device, restoration model, medium and equipment | |
CN113724136B (en) | Video restoration method, device and medium | |
CN116681584A (en) | Multistage diffusion image super-resolution algorithm | |
CN114339409B (en) | Video processing method, device, computer equipment and storage medium | |
CN113689372A (en) | Image processing method, apparatus, storage medium, and program product | |
US11741579B2 (en) | Methods and systems for deblurring blurry images | |
CN115439367A (en) | Image enhancement method and device, electronic equipment and storage medium | |
CN112802076A (en) | Reflection image generation model and training method of reflection removal model | |
CN117336527A (en) | Video editing method and device | |
Li et al. | Unidirectional video denoising by mimicking backward recurrent modules with look-ahead forward ones | |
Wang et al. | Raw image reconstruction with learned compact metadata | |
CN118283297A (en) | Video data processing method and device, electronic equipment and readable storage medium | |
Jiang et al. | Erdn: Equivalent receptive field deformable network for video deblurring | |
CN113947538A (en) | Multi-scale efficient convolution self-attention single image rain removing method | |
Ye et al. | Learning Diffusion Texture Priors for Image Restoration | |
Zhang et al. | Iterative multi‐scale residual network for deblurring | |
Uchida et al. | Pixelwise jpeg compression detection and quality factor estimation based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||