WO2022194344A1 - Learnable augmentation space for dense generative adversarial networks - Google Patents

Learnable augmentation space for dense generative adversarial networks

Info

Publication number
WO2022194344A1
WO2022194344A1 (PCT/EP2021/056583)
Authority
WO
WIPO (PCT)
Prior art keywords
image
processed
real
samples
restoration model
Prior art date
Application number
PCT/EP2021/056583
Other languages
French (fr)
Inventor
Ioannis ALEXIOU
Ioannis MARRAS
Stefanos ZAFEIRIOU
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/EP2021/056583 priority Critical patent/WO2022194344A1/en
Publication of WO2022194344A1 publication Critical patent/WO2022194344A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/10Image enhancement or restoration using non-spatial domain filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • This invention relates to image restoration, for example using dense Generative Adversarial Networks (GANs).
  • Image restoration is a field of research which studies methods to recover a degraded image to its original form.
  • the techniques may also be used to improve an image of inherently lower quality that has not been degraded.
  • There are many areas of application such as super resolution, demosaicking, denoising, deblurring and others.
  • Recently, convolutional networks have shown a tremendous ability to recover missing signals compared to what traditional methods could achieve in the past.
  • This metric is typical of supervised methods that aim to minimise a distance value between the estimated image and the ground real image.
  • the other is perception metrics that are trained based on human scores to generate visually pleasing images accompanied by an adversarial training scheme.
  • ground real based metrics aim to output the best possible real structure with blur, whereas perceptual metrics output sharper images with unrealistic structure, or hallucinated structure.
  • perceptual metrics are generally favoured for use in GANs as they produce a sharper image to the human eye.
  • perceptual metrics have not yet outperformed the ground real based metrics when measured using peak signal-to-noise ratio (PSNR), as the hallucinated content affects the structure.
  • an apparatus for training an image restoration model comprising one or more processors configured to: receive a training image; generate a processed image from the training image by means of a candidate image restoration model; generate one or more samples of the processed image, wherein the samples of the processed image are indicative of a processed image random distribution of frequencies, the samples of the processed image collectively contributing to a processed image sample dataset; and adapt the image restoration model in dependence on the processed image dataset.
  • this may enable samples to be stored as a group so that they can easily be used in the next steps.
  • the apparatus may be configured to adapt the image restoration model depending on: receiving a real image corresponding to the processed image; and generating one or more samples of the real image, the samples of the real image are indicative of a real image random distribution of frequencies, the samples of the real image collectively contributing to a real image sample dataset.
  • this may introduce more variability and diversity into the training of the image restoration model.
  • the increased variability and diversity in the extracted image bands may better train the model at identifying and restoring missing signals.
  • this may enable samples to be stored as a group so that they can easily be used in the next steps.
  • the apparatus may be configured to adapt the image restoration model in dependence on the processed sample dataset by comparing the processed sample dataset with the real sample dataset and adapting the candidate image restoration model in dependence on that comparison.
  • the comparison may take account of the differences between the samples when adapting and training the image restoration model.
  • the apparatus may be configured to adapt the image restoration model in dependence on the processed sample dataset by comparing each of the processed sample data with the real sample dataset and adapting the candidate image restoration model in dependence on that comparison.
  • this may provide a more detailed dataset for the image restoration model to be trained with. This may result in an improved learning apparatus.
  • the apparatus may be configured to compare the processed sample dataset with the real sample dataset by comparing the mean squared error difference between the processed sample dataset and the real sample dataset and adapting the candidate image restoration model in dependence on that comparison.
  • this provides a metric for measuring the accuracy of the image restoration training apparatus and a quantifiable output to use as a gradient for the input of the next iteration of the training apparatus.
  • the apparatus may be configured to provide the processed image random distribution of frequencies at different frequency levels to the real image random distribution of frequencies.
  • this may further increase the variation and diversity of the training inputs as described above.
  • the apparatus may be configured, after generating samples of the processed image and before adapting the image restoration model, to transform the processed sample dataset into the frequency domain. In some implementations, the apparatus may be configured, after generating samples of the real image and before adapting the image restoration model, to transform the real sample dataset into the frequency domain.
  • the missing structure may be easier to separate. This may improve the ability of the image restoration training apparatus to target the missing signals and use them for training.
  • the apparatus may be configured to transform the sample dataset into the frequency domain by a linear transformation.
  • a linear network may exhibit non-linear learning dynamics similar to the equivalent non-linear counterparts. This may enable the network to train more easily and more quickly.
  • the apparatus may be configured to carry out the steps above for one or more subsequent processed image(s).
  • the apparatus may be configured to carry out the steps above for one or more subsequent real image(s), each subsequent real image corresponding to a respective subsequent processed image.
  • the image restoration training model may develop and improve over each iteration. Over the iterations the model may tend to an optimum.
  • the apparatus may be configured to provide each subsequent processed image random distribution of frequencies at different frequency levels to the preceding processed image distribution of frequencies.
  • the apparatus may be configured to provide each subsequent real image random distribution of frequencies at different frequency levels to the preceding real image distribution of frequencies.
  • an image restoration apparatus comprising one or more processors and a memory storing in non-transient form data defining program code executable by the processor(s) to implement an image restoration model formed by the apparatus above, the apparatus being configured to: receive a degraded image; and restore the degraded image by means of the image restoration model.
  • the image restoration apparatus may receive the benefits provided by the training apparatus described above.
  • a method for training an image restoration model comprising: receiving a training image; generating a processed image from the training image by means of a candidate image restoration model; generating one or more samples of the processed image, wherein the samples of the processed image are indicative of a processed image random distribution of frequencies, the samples of the processed image collectively contributing to a processed sample dataset; and adapting the candidate image restoration model in dependence on the processed sample dataset.
  • this may introduce more variability and diversity into the training of the image restoration model.
  • the increased variability and diversity in the extracted image bands may train the model better at identifying and restoring missing signals.
  • this may enable the samples to be stored as a group so that they can easily be used in the next steps.
  • a method for image restoration, the method implementing the image restoration model formed by the method above and comprising: receiving a degraded image; and restoring the degraded image by means of the image restoration model.
  • the image restoration method may receive the benefits provided by the training apparatus described above.
  • Figures 1A-1C illustrate examples of images that have undergone image restoration.
  • Figure 1A illustrates image demosaicking
  • Figure 1B illustrates image denoising
  • Figure 1C illustrates image super resolution.
  • Figure 2 schematically illustrates the stages the images may undergo in the image restoration training apparatus.
  • Figure 3 schematically illustrates an exemplary network architecture used in the image restoration training apparatus.
  • Figure 4 schematically illustrates an exemplary augmentation block.
  • Figure 5 schematically illustrates an exemplary discriminator.
  • Figure 6 shows an example of a method for training an image restoration model.
  • Figure 7 shows an example of an apparatus configured to perform the methods described herein.
  • the apparatuses and methods described herein concern training an image restoration model and using said model to restore raw images.
  • Embodiments of the present invention tackle one or more of the problems previously mentioned by generating samples of the processed image that are indicative of a processed image random distribution of frequencies. In this way, it is possible to enable the model to learn from a range of frequency distributions and consequently enable the model to reduce the hallucination in the restored image.
  • Figures 1A-1C show examples of image restoration operations.
  • Figure 1A shows an example of image demosaicking.
  • the image 101 is captured from the sensor as a mosaic 102 which generates a mosaicked image 103.
  • the mosaicked image 103 has missing signals.
  • An algorithm is used to perform demosaicking on the mosaicked image 103 to recover the missing signals to generate an output image 104.
  • the output image 104 is clearer than the input image 103.
  • Figure 1B shows an example of image denoising.
  • An input image 105 comprises a level of noise which makes the photo unclear.
  • An algorithm is used to perform denoising on the input image 105 to remove the noise by recovering the missing signals to generate an output image 106.
  • the output image 106 has reduced noise and is clearer than the input image 105.
  • Figure 1C shows an example of image super-resolution.
  • An original image 107 is zoomed in to generate an input image 108 for the super-resolution.
  • the zooming in makes the input image 108 unclear.
  • An algorithm is used to perform super resolution on the input image 108 to recover the missing signals and generate an output image 109.
  • the output image 109 is clearer than the input image 108.
  • Embodiments of the herein described apparatus and method may be used to carry out a range of image restoration tasks including image demosaicking, denoising, super-resolution, deblurring, amongst others.
  • the input data may also take a range of forms, including images and videos in relation to computer vision, corpora of text in relation to natural language processing, or gene expressions in relation to bioinformatics.
  • Figure 2 shows, in a preferred implementation, the stages that the input images 202, 206 may undergo in the image restoration training apparatus and training method described herein.
  • the image 202 is a real image and the image 206 is a training image.
  • the real image may be a photographically captured image or an artificially generated image.
  • the real image may also be referred to as a truth or ground truth image.
  • the training image may be formed by degrading the real image. Then the real image may represent a target against which a restored version of the training image can be gauged.
  • the training image 206 may have a range of imperfections including mosaicking, noise, blur and so on. In any case, the training image 206 is likely to have missing signals.
  • the real image 202 corresponds to the training image 206. In other words, the real image 202 has the same content as the training image 206. For example, as shown in Figure 1A, both images 202, 206 would be of the same bird’s head.
  • the real image 202 preferably does not have the imperfections that are found in the training image 206. In other words, the real image preferably does not have the missing signals.
  • a processed image 207 is generated from the training image 206.
  • the processed image 207 is preferably generated by applying a candidate image restoration model to the training image 206.
  • the processing applied by the candidate image restoration model will depend on the state of the training image 206. If the training image 206 is at the required resolution, then the processed image 207 will be the same size. If the training image 206 is at a lower resolution, then the candidate image restoration model will upsample the training image 206 to the required resolution for the next steps.
  • Samples 208 of the processed image 207 are generated.
  • the apparatus 301 may generate one or more samples 208.
  • the samples 208 are indicative of a processed image 207 random distribution of frequencies.
  • the sample 208 may be selected within a certain band or level of frequencies.
  • the distribution of frequencies is preferably random or stochastic. In other words, the distribution of frequencies may be different for each sample 208 and each processed image 207 and each iteration of processed image 207, as explained below.
  • the samples 208 of the processed image 207 may collectively contribute to a processed image dataset 209.
  • the processed image dataset 209 may enable the samples 208 to be stored as a group so that they can easily be used in the next steps.
  • Samples 203 of the real image 202 are generated.
  • the apparatus may generate one or more samples 203.
  • the samples 203 are indicative of a real image 202 random distribution of frequencies.
  • the sample 203 may be selected within a certain band or level of frequencies.
  • the distribution of frequencies is preferably random or stochastic.
  • the distribution of frequencies may be different for each sample 203 and each real image 202 and each iteration of real image 202, as explained below.
  • the frequency level of the distribution may also be different for the real image 202 and for the processed image 207.
  • the samples 203 of the real image 202 may collectively contribute to a real image dataset 204.
  • the real image dataset 204 may enable the samples 203 to be stored as a group so that they can easily be used in the next steps.
  • the processed sample dataset 209 is transformed into the frequency domain to generate a frequency domain processed sample dataset 210. This may be performed using a Fourier transform to convert the processed sample dataset 209 into the complex space. Preferably the transformation is linear.
  • the real sample dataset 204 is transformed into the frequency domain to generate a frequency domain real sample dataset 205.
  • This may be performed using a Fourier transform to convert the real sample dataset 204 into the complex space.
  • the transformation is linear.
  • the processed sample data set 209 is compared with the real sample dataset 204. It may also be suitable to compare each of the processed sample data 208 with the real sample dataset 204. This may enable the weight of each of the processed sample data 208 to be varied in the comparison step. It may also be suitable to compare the frequency domain processed sample data set 209 with the frequency domain real sample dataset 204 if the processed sample data set 209 and the real sample dataset 204 have been transformed into the frequency domain. The comparison of the processed sample data set 209 with the real sample dataset 204 may be carried out using the mean squared error difference between the two datasets. It may also be suitable to use other forms of error difference.
  • the image restoration model is adapted in dependence on the processed image dataset 209. It also may be suitable to adapt the image restoration model in dependence of the frequency domain processed image dataset 210 if the processed image dataset 209 has been transformed into the frequency domain. It may also be suitable to adapt the image restoration model in dependence on the comparison between the processed sample data set 209 with the real sample dataset 204. This may involve using the error difference between the datasets, as described above.
  • the apparatus may carry out the above steps for one or more subsequent processed images 207.
  • the apparatus may carry out the above steps for one or more subsequent real images 202.
  • the one or more subsequent iterations that the apparatus 301 carries out may further adapt and improve the image restoration model.
  • the real image 202 may correspond with the processed image 207 in any given iteration of the apparatus 301 steps, i.e. the real image 202 and the processed image 207 are both the same image of a bird.
  • Each iteration may use a different image, i.e. one iteration may be an image of a bird and one iteration may be an image of a zebra. Or, each iteration may use the same image, i.e. the image is always the same image of a bird.
  • the training images 206 used to generate the processed image 207 may be provided from a bank of training images 206. For example, there may be a bank of images of animals to be used as training images 206.
  • the processed random distribution of frequencies in the subsequent iteration may preferably be at a different frequency level or band from the previous iteration.
  • the processed random distribution of frequencies may not be the same in any two consecutive iterations.
  • the processed random distribution of frequencies may even be different in every iteration such that the processed random distribution of frequencies is never the same.
  • the real random distribution of frequencies in the subsequent iteration may preferably be at a different frequency level or band from the previous iteration. In other words, the real random distribution of frequencies may not be the same in any two consecutive iterations. The real random distribution of frequencies may even be different in every iteration such that the real random distribution of frequencies is never the same.
  • Figure 3 shows an exemplary embodiment of the network architecture 301 used in the image restoration training apparatus.
  • Figure 3 shows a real image path and a training image path.
  • the real image 302 is input into a real transformer 303.
  • the real transformer 303 may carry out the above-described transforming step to generate frequency domain real sample dataset 205.
  • the real transformer 303 may also carry out the above-described generating step to generate samples 203 of the real image 202 which collectively contribute to a real image data set 204. Alternatively, the generation of samples 203 may be carried out by a separate component.
  • the output of the real transformer 303, the real image data set 204, is input into the real transform augmentation space 304.
  • the output of the real transform augmentation space 304 is input into the dense discriminator 309.
  • the training image 305 is input into the restoration generator 306.
  • the restoration generator 306 may carry out the above-described step of generating the processed image 207 from the training image 206.
  • the output of the restoration generator 306 is input into the training transformer 307.
  • the processed image 207, generated from the training image 305, is input into a training transformer 307.
  • the training transformer 307 may carry out the above-described transforming step to generate frequency domain processed sample dataset 210.
  • the training transformer 307 may also carry out the above-described generating step to generate samples 208 of the processed image 207 which collectively contribute to a processed image data set 209. Alternatively, the generation of samples 209 may be carried out by a separate component.
  • the output of the training transformer 307, the processed image data set 209, is input into the training transform augmentation space 308.
  • the output of the training transform augmentation space 308 is input into the dense discriminator 309.
  • the real transformer 303 and the training transformer 307 may be formed by a single transformer.
  • the real transform augmentation space 304 and the training transform augmentation space 308 may be formed by a single transform augmentation space.
  • the output of the dense discriminator 309 may be input into the evaluation space 310.
  • the transform augmentation space 304, 308, the dense discriminator 309 and the evaluation space 310 may carry out the above-described steps of comparing the processed sample data set 209 with the real sample dataset 204 and adapting the image restoration training model.
  • An important component in the adversarial network used in the image restoration training apparatus is the augmentation layer 303, 304, 307, 308 that transforms the data, i.e. the processed image 207, passing from the generator 306 to the discriminator 309. This part of the adversarial network is important for avoiding saturation and mode collapse, where the image restoration model stops learning from training iterations because the errors converge to zero incorrectly.
  • a learnable augmentation layer that introduces the variability is preferable to avoid overfitting.
  • the approach described herein introduces the variability by sampling the processed image and the real image in a manner indicative of, or dependent on, a random distribution of frequencies. Mapping a random distribution of frequencies may add a variability component to the desired end feature space.
  • the end feature space is preferably in the frequency domain because the frequency domain may provide an easier method of separating the missing structure. Without transforming into the frequency domain, the missing information is scattered over the image, whereas in the frequency domain the missing structure can be segmented by hard linear borders. The ease of separating missing structure in the frequency domain may therefore be beneficial to the learning of the training apparatus.
  • mapping a random distribution and transforming the output into the frequency domain has been found to improve the PSNR metric of the image restoration apparatus.
  • a primary objective is to learn a distribution, or image restoration model, that matches the ground real image 302.
  • Restoration tasks seek to recover the missing information in the degraded image.
  • the missing information may often lie within the high frequency spectrum.
  • the image restoration task may include super-resolution, denoising, a joint operation or any other image restoration task.
  • the network 301 comprises a generator G 306, a discriminator D 309 and an augmentation layer A 304, 308.
  • the choice of the generator 306 can be any supervised network that is suitable for the chosen task.
  • the generator 306 is a Residual Channel Attention Network (RCAN) due to the diverse features added to the architecture.
  • RCAN may combine residual-in-residual blocks along with channel attention, which may enable it to translate any possible performance gains to other, simpler architectures.
  • the discriminator 309 is preferably a dense estimation internal kernel Generative Adversarial Network (GAN). This may enable a richer flow of information in the apparatus 301.
  • the dense discriminator 309 preferably estimates over the spatial dimensions and the channel dimension to form a tensor.
  • the image samples 203, 208 are indicative of a random distribution of frequencies.
  • the evaluation space 310 then performs the evaluation, or comparison, of the images on preferably all of the points in the feature space.
  • the feature space is defined by the random distribution of frequencies. This applies to both the real image 302 and the training image 305.
  • the evaluation, or comparison, preferably uses a mean squared error-based loss function. This loss function may enable the evaluation space 310 to densely evaluate the discriminator 309. Equation 1 and Equation 2 illustrate how the loss function is applied.
  • $\min_D V_{\mathrm{LSGAN}}(D)$ and $\min_G V_{\mathrm{LSGAN}}(G)$ may represent the minimization objectives for the discriminator 309 and the generator 306 respectively.
  • $\mathbb{E}_{x,z \sim p_{x,z}(x,z)}$, $\mathbb{E}_{x_d,z \sim p_{x_d,z}(x_d,z')}$ and $\mathbb{E}_{x_d,z \sim p_{x_d,z}(x_d,z)}$ may represent the expected values over the probability distributions of the real, fake and inverted fake data, with tensors in $\mathbb{R}^{B \times H \times W \times C}$ respectively.
  • BHWC may represent the indices of a 4D tensor corresponding to batch size, height, width and channels.
  • F may represent the fake data generated by the generator 306, which the discriminator 309 distinguishes from the ground real image 302.
  • Equation 1 minimizes the feature point-wise mean squared error difference of the discriminator 309 output estimates, alternating between the real image 302 output and the training image 305 output.
  • the labels for training the discriminator 309 are a tensor map of all ones for crops extracted from the real image 302, and a map of all zeros for the training image 305.
  • Equation 2 minimizes the feature point-wise mean squared error difference of the generator 306 output estimates, alternating between the real image 302 output and the training image 305 output.
  • the transform layer 303, 307 and transform augmentation block 304, 308 shown in Figure 3 are designed to improve the performance of the network.
  • the transform 303, 307 and the transform augmentation block 304, 308 provide a stochastic augmentation layer.
  • the inventors have found that adding the transform 303, 307 and the transform augmentation block 304, 308 between the generator 306 and the discriminator 309 may help to avoid over-fitting of the discriminator 309 and improve training stability of the network 301, especially for small datasets.
  • the transform 303, 307 and the transform augmentation block 304, 308 may provide two benefits: primarily, to aid the restoration task and add to the final image quality; secondarily, to prevent the discriminator 309 from saturating and to avoid mode collapse.
  • These benefits may be provided by the transform 303, 307 and the transform augmentation block 304, 308 learning a mapping from a latent variable, which draws samples from a random distribution, to kernels that correspond to a specific area within the Fourier spectrum.
  • Figure 4 illustrates the augmentation block 401, provided by the transform layer 303, 307 and transform augmentation block 304, 308, in more detail.
  • the augmentation block 401 provides the flow of data between the generator 306 and the discriminator 309.
  • the aim of the augmentation block 401 is to improve the performance of the generator 306 and the discriminator 309. This may be achieved by adding an intermediary layer, the augmentation block 401, that randomly generates frequency distributions under specific constraints in the Fourier space.
  • the augmentation block 401 may provide variability to the input into the discriminator 309 and reduce missing information into the generator 306.
  • the augmentation block 401 may achieve this by a latent variable that may draw samples in dependence on a uniform, or normal, distribution.
  • the dimensions of the samples may depend on what restoration task is to be carried out.
  • the samples are illustrated by Z 402 in Figure 4.
  • the input Z 402 is input into a fully connected convolutional layer 404, which reshapes it to a tensor 403 and then transforms it to the desired kernel 406 spatial dimensions.
  • the augmentation block 401 may comprise a plurality of convolutional layers 404, 405. A larger number of convolutional layers 404, 405 may improve the performance of the augmentation block 401. However, if too many convolutional layers 404, 405 are added, this may cause the optimisation of the augmentation block 401 to collapse. In the example embodiment shown in Figure 4, the augmentation block 401 comprises eight convolutional layers 404, 405.
  • all the convolutional kernels 406 in the residual net have a size of 3x3 with the final output of 64 kernels 406.
  • the kernels 406 are applied to the training and real images 302, 305 by using Equation 3.
  • $\sum_{s}$ may represent the cumulative sum over the spatial dimensions of height and width.
  • $M(s)$ may represent a binary mask equal to 1 if $s > f_c$ (the cut-off frequency defined per task) and 0 if $s \le f_c$.
  • $\odot$ may represent the Hadamard product, an element-wise multiplication between matrices.
  • $\mathcal{F}\{\cdot\}$ may represent a Fourier transform that generates a complex space of frequencies.
  • A(x,z) may represent the network that generates the output of the transform block.
  • A(x,z) may represent a generative model with a random z conditioned by data x.
  • Equation 3 shows the formulation of the loss function that is used to train the augmentation block 401 (an illustrative sketch of such a masked-spectrum loss is given at the end of this list).
  • M(s) is a binary mask that is task specific and relates to the band of the frequency spectrum used for the frequency distribution.
  • the S() function converts the complex Fourier space to the real space by concatenating the real and imaginary data into a real tensor.
  • a linear variant of the residual net is used. Before the first iteration, it is unknown if the kernels will generate a positive, negative or mixed output. The linear variant of the residual net may not penalise a negative response. Additionally, a linear network may exhibit non-linear learning dynamics similar to the equivalent non-linear counterparts. This may enable the network to train more easily and more quickly.
  • Figure 5 illustrates an example of the discriminator 309, 501 in more detail.
  • the discriminator 501 may comprise an architecture for outputting dense probability estimates.
  • Standard classifier architectures learn a many-to-one mapping. This has a disadvantage in that the gradient information flow during back propagation is poor: the classification labels yield a small tensor of back propagated error, compared to a larger tensor holding probability estimates for every location in the feature space.
  • a dense probability outputting discriminator 501 may enable a higher magnitude of tensor error gradients for the generator 306 to use which may improve the comparison.
  • the discriminator 501 illustrated in Figure 5 comprises six residual blocks 502.
  • the blocks each comprise two convolutions 503, 505.
  • a kernel size of three may increase the discriminative power of the discriminator 501.
  • Figure 6 summarises an example of a method for training an image restoration model.
  • the method comprises receiving a training image.
  • the method comprises generating a processed image from the training image by means of a candidate image restoration model.
  • the method comprises generating one or more samples of the processed image, wherein the samples of the processed image are indicative of a processed image random distribution of frequencies, the samples of the processed image collectively contributing to a processed sample dataset.
  • the method comprises adapting the candidate image restoration model in dependence on the processed sample dataset.
  • An example of an apparatus 700 configured to implement the method is schematically illustrated in Figure 7.
  • the apparatus 700 may be implemented on an electronic device, such as a laptop, tablet, smart phone or TV.
  • the apparatus 700 comprises a processor 701 configured to process the datasets in the manner described herein.
  • the processor 701 may be implemented as a computer program running on a programmable device such as a Central Processing Unit (CPU).
  • the apparatus 700 comprises a memory 702 which is arranged to communicate with the processor 701.
  • Memory 702 may be a non-volatile memory.
  • the processor 701 may also comprise a cache (not shown in Figure 7), which may be used to temporarily store data from memory 702.
  • the apparatus may comprise more than one processor and more than one memory.
  • the memory may store data that is executable by the processor.
  • the processor may be configured to operate in accordance with a computer program stored in non-transitory form on a machine readable storage medium.
  • the computer program may store instructions for causing the processor to perform its methods in the manner described herein.
  • the apparatus 700 can also be used to restore degraded images using the trained image restoration model described above.
  • the image restoration apparatus may comprise one or more processors, such as processor 701, and a memory 702 storing in non-transient form data defining program code executable by the processor(s) to implement the image restoration model formed by the image restoration training apparatus.
  • the image restoration apparatus may receive a degraded image.
  • the degraded image may be restored by means of the image restoration model formed by the image restoration training apparatus.
  • the method of restoring the degraded image may follow the same steps carried out by the processors of the image restoration apparatus.
  • the apparatus and method described herein may be practically applied to other data inputs in other fields such as images and videos in relation to computer vision, to corpora of text in relation to natural language processing or to gene expressions in relation to bioinformatics.
  • the applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims.
  • the applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features.
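By way of illustration only, the following numpy sketch shows one plausible reading of the Equation 3 style objective and the S() mapping described in the bullets above: a binary Fourier mask M(s) keeps frequencies above a per-task cut-off f_c, the masked complex spectrum is made real by concatenating real and imaginary parts, and the squared difference against the augmentation block output A(x, z) is summed over the spatial dimensions. The radial definition of the frequency coordinate s, the function names and the exact loss form are assumptions for illustration, not the patent's definitive formulation.

    import numpy as np

    def fourier_band_mask(h, w, f_c):
        # M(s): 1 where the radial frequency s exceeds the per-task
        # cut-off f_c, 0 otherwise (radial definition is an assumption)
        fy = np.fft.fftfreq(h)[:, None]
        fx = np.fft.fftfreq(w)[None, :]
        s = np.sqrt(fx ** 2 + fy ** 2)
        return (s > f_c).astype(np.float64)

    def s_to_real(t):
        # S(): concatenate real and imaginary data into one real tensor
        return np.concatenate([t.real, t.imag], axis=-1)

    def augmentation_loss(x, a_out, f_c):
        # Compare the masked Fourier spectrum of image x with the output
        # a_out of the augmentation block A(x, z), summing the squared
        # difference over the spatial dimensions of height and width
        spectrum = np.fft.fft2(x)                 # F{x}: complex frequency space
        m = fourier_band_mask(*x.shape, f_c)      # M(s)
        target = s_to_real(m * spectrum)          # S(M(s) ⊙ F{x})
        return np.sum((target - a_out) ** 2)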

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Described is an apparatus (700) for training an image restoration model. The apparatus (700) comprises one or more processors (701). The processors (701) are configured to receive (601) a training image (206, 305); generate (602) a processed image (207) from the training image (206, 305) by means of a candidate image restoration model; generate (603) one or more samples (208a, 208b, 208c) of the processed image (207), wherein the samples (208a, 208b, 208c) of the processed image (207) are indicative of a processed image random distribution of frequencies, the samples (208a, 208b, 208c) of the processed image (207) collectively contributing to a processed image sample dataset (209); and adapt (604) the candidate image restoration model in dependence on the processed image dataset (209). The processed image random distribution of frequencies can enable the apparatus to introduce more variability and diversity into the training of the image restoration model. The increased variability and diversity in the extracted image bands may better train the model at identifying and restoring missing signals.

Description

LEARNABLE AUGMENTATION SPACE FOR DENSE GENERATIVE ADVERSARIAL
NETWORKS
FIELD OF THE INVENTION
This invention relates to image restoration, for example using dense Generative Adversarial Networks (GANs).
BACKGROUND
Image restoration is a field of research which studies methods to recover a degraded image to its original form. The techniques may also be used to improve an image of inherently lower quality that has not been degraded. There are many areas of application such as super resolution, demosaicking, denoising, deblurring and others. Recently, convolutional networks have shown a tremendous ability to recover missing signals compared to what traditional methods could achieve in the past.
Within this class of deep learning models there are two dominant learning methods that optimize their models' parameters on different objectives. The classification of these sets of methods is based on the objective of the loss function rather than the architecture itself.
One is distance metrics between the estimated image and the ground real image. This metric is typical of supervised methods that aim to minimise a distance value between the estimated image and the ground real image.
The other is perception metrics that are trained based on human scores to generate visually pleasing images accompanied by an adversarial training scheme.
There are significant differences between the outputs or results of the two methods. The ground real based metrics aim to output the best possible real structure with blur, whereas perceptual metrics output sharper images with unrealistic structure, or hallucinated structure.
The perceptual metrics are generally favoured for use in GANs as they produce a sharper image to the human eye. However, perceptual metrics have not yet outperformed the ground real based metrics when measured using peak signal-to-noise ratio (PSNR), as the hallucinated content affects the structure.
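For context, PSNR is conventionally defined as $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, where MAX is the maximum possible pixel value (for example 255 for 8-bit images) and MSE is the mean squared error between the restored image and the ground real image; a higher PSNR indicates a closer match to the ground real structure.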
It is desirable to develop a method that overcomes the above problems.

SUMMARY
According to a first aspect there is provided an apparatus for training an image restoration model, the apparatus comprising one or more processors configured to: receive a training image; generate a processed image from the training image by means of a candidate image restoration model; generate one or more samples of the processed image, wherein the samples of the processed image are indicative of a processed image random distribution of frequencies, the samples of the processed image collectively contributing to a processed image sample dataset; and adapt the image restoration model in dependence on the processed image dataset.
By generating the one or more samples of the processed image indicative of the processed image random distribution of frequencies, this may introduce more variability and diversity into the training of the image restoration model. The increased variability and diversity in the extracted image bands may better train the model at identifying and restoring missing signals.
By collectively contributing the samples of the processed image to a processed image dataset, this may enable samples to be stored as a group so that they can easily be used in the next steps.
In some implementations, the apparatus may be configured to adapt the image restoration model depending on: receiving a real image corresponding to the processed image; and generating one or more samples of the real image, the samples of the real image are indicative of a real image random distribution of frequencies, the samples of the real image collectively contributing to a real image sample dataset.
By generating the one or more samples of the real image to be indicative of the real image random distribution of frequencies, this may introduce more variability and diversity into the training of the image restoration model. The increased variability and diversity in the extracted image bands may better train the model at identifying and restoring missing signals.
By collectively contributing the samples of the real image to a real image dataset, this may enable samples to be stored as a group so that they can easily be used in the next steps.
In some implementations, the apparatus may be configured to adapt the image restoration model in dependence on the processed sample dataset by comparing the processed sample dataset with the real sample dataset and adapting the candidate image restoration model in dependence on that comparison.
By using both the real and processed sample datasets, the comparison may take account of the differences between the samples when adapting and training the image restoration model.
In some implementations, the apparatus may be configured to adapt the image restoration model in dependence on the processed sample dataset by comparing each of the processed sample data with the real sample dataset and adapting the candidate image restoration model in dependence on that comparison.
By comparing the data of each of the processed samples, instead of the sample data as a whole, this may provide a more detailed dataset for the image restoration model to be trained with. This may result in an improved learning apparatus.
In some implementations, the apparatus may be configured to compare the processed sample dataset with the real sample dataset by comparing the mean squared error difference between the processed sample dataset and the real sample dataset and adapting the candidate image restoration model in dependence on that comparison.
By comparing the data sets using a mean squared error difference, this provides a metric for measuring the accuracy of the image restoration training apparatus and a quantifiable output to use as a gradient for the input of the next iteration of the training apparatus.
In some implementations, the apparatus may be configured to provide the processed image random distribution of frequencies at different frequency levels to the real image random distribution of frequencies.
By providing the distribution of frequencies at different levels for the real and processed image, this may further increase the variation and diversity of the training inputs as described above.
In some implementations, the apparatus may be configured, after generating samples of the processed image and before adapting the image restoration model, to transform the processed sample dataset into the frequency domain. In some implementations, the apparatus may be configured, after generating samples of the real image and before adapting the image restoration model, to transform the real sample dataset into the frequency domain.
By transforming the image data sets into the frequency domain, the missing structure may be easier to separate. This may improve the ability of the image restoration training apparatus to target the missing signals and use them for training.
In some implementations, the apparatus may be configured to transform the sample dataset into the frequency domain by a linear transformation.
Before the first iteration, it is unknown if the kernels will generate a positive or negative or mixed output. The linear variant of the residual net does not penalise a negative response. Additionally, a linear network may exhibit non-linear learning dynamics similar to the equivalent non-linear counterparts. This may enable the network to train more easily and more quickly.
In some implementations, the apparatus may be configured to carry out the steps above for one or more subsequent processed image(s).
In some implementations, the apparatus may be configured to carry out the steps above for one or more subsequent real image(s), each subsequent real image corresponding to a respective subsequent processed image.
By carrying out the steps for subsequent processed and real images, the image restoration training model may develop and improve over each iteration. Over the iterations the model may tend to an optimum.
In some implementations, the apparatus may be configured to provide each subsequent processed image random distribution of frequencies at different frequency levels to the preceding processed image distribution of frequencies.
In some implementations, the apparatus may be configured to provide each subsequent real image random distribution of frequencies at different frequency levels to the preceding real image distribution of frequencies.
By providing the distribution of frequencies at a different level to the preceding iteration, this may further increase the variation and diversity of the training inputs as described above.

According to a second aspect there is provided an image restoration apparatus, the apparatus comprising one or more processors and a memory storing in non-transient form data defining program code executable by the processor(s) to implement an image restoration model formed by the apparatus above, the apparatus being configured to: receive a degraded image; and restore the degraded image by means of the image restoration model.
By using the image restoration model formed by the apparatus above, the image restoration apparatus may receive the benefits provided by the training apparatus described above.
According to a third aspect there is provided a method for training an image restoration model, the method comprising: receiving a training image; generating a processed image from the training image by means of a candidate image restoration model; generating one or more samples of the processed image, wherein the samples of the processed image are indicative of a processed image random distribution of frequencies, the samples of the processed image collectively contributing to a processed sample dataset; and adapting the candidate image restoration model in dependence on the processed sample dataset.
By generating the one or more samples of the processed image indicative of the processed image random distribution of frequencies, this may introduce more variability and diversity into the training of the image restoration model. The increased variability and diversity in the extracted image bands may better train the model at identifying and restoring missing signals.
By collectively contributing the samples of the processed image to a processed image dataset, this may enable the samples to be stored as a group so that they can easily be used in the next steps.
According to a fourth aspect there is provided a method for image restoration, the method implementing the image restoration model formed by the method above and comprising: receiving a degraded image; and restoring the degraded image by means of the image restoration model.
By using the image restoration model formed by the method above, the image restoration method may receive the benefits provided by the training apparatus described above.

BRIEF DESCRIPTION OF THE FIGURES
The present invention will now be described by way of example with reference to the accompanying drawings. In the drawings:
Figures 1A-1C illustrate examples of images that have undergone image restoration. Figure 1A illustrates image demosaicking; Figure 1B illustrates image denoising; Figure 1C illustrates image super resolution.
Figure 2 schematically illustrates the stages the images may undergo in the image restoration training apparatus.
Figure 3 schematically illustrates an exemplary network architecture used in the image restoration training apparatus.
Figure 4 schematically illustrates an exemplary augmentation block.
Figure 5 schematically illustrates an exemplary discriminator.
Figure 6 shows an example of a method for training an image restoration model.
Figure 7 shows an example of an apparatus configured to perform the methods described herein.
DETAILED DESCRIPTION
The apparatuses and methods described herein concern training an image restoration model and using said model to restore raw images.
Embodiments of the present invention tackle one or more of the problems previously mentioned by generating samples of the processed image that are indicative of a processed image random distribution of frequencies. In this way, it is possible to enable the model to learn from a range of frequency distributions and consequently enable the model to reduce the hallucination in the restored image.
Figures 1A-1C show examples of image restoration operations.
Figure 1A shows an example of image demosaicking. The image 101 is captured from the sensor as a mosaic 102 which generates a mosaicked image 103. The mosaicked image 103 has missing signals. An algorithm is used to perform demosaicking on the mosaicked image 103 to recover the missing signals to generate an output image 104. The output image 104 is clearer than the input image 103.
Figure 1B shows an example of image denoising. An input image 105 comprises a level of noise which makes the photo unclear. An algorithm is used to perform denoising on the input image 105 to remove the noise by recovering the missing signals to generate an output image 106. The output image 106 has reduced noise and is clearer than the input image 105.
Figure 1C shows an example of image super-resolution. An original image 107 is zoomed in to generate an input image 108 for the super-resolution. The zooming in makes the input image 108 unclear. An algorithm is used to perform super resolution on the input image 108 to recover the missing signals and generate an output image 109. The output image 109 is clearer than the input image 108.
Embodiments of the herein described apparatus and method may be used to carry out a range of image restoration tasks including image demosaicking, denoising, super-resolution and deblurring, amongst others. The input data may also take a range of forms, including images and videos in relation to computer vision, corpora of text in relation to natural language processing, or gene expressions in relation to bioinformatics.
Figure 2 shows, in a preferred implementation, the stages that the input images 202, 206 may undergo in the image restoration training apparatus and training method described herein.
The image 202 is a real image and the image 206 is a training image. The real image may be a photographically captured image or an artificially generated image. The real image may also be referred to as a truth or ground truth image. The training image may be formed by degrading the real image. Then the real image may represent a target against which a restored version of the training image can be gauged.
The training image 206 may have a range of imperfections including mosaicking, noise, blur and so on. In any case, the training image 206 is likely to have missing signals. The real image 202 corresponds to the training image 206. In other words, the real image 202 has the same content as the training image 206. For example, as shown in Figure 1A, both images 202, 206 would be of the same bird’s head. The real image 202 preferably does not have the imperfections that are found in the training image 206. In other words, the real image preferably does not have the missing signals.
A processed image 207 is generated from the training image 206. The processed image 207 is preferably generated by applying a candidate image restoration model to the training image 206. The processing applied by the candidate image restoration model will depend on the state of the training image 206. If the training image 206 is at the required resolution, then the processed image 207 will be the same size. If the training image 206 is at a lower resolution, then the candidate image restoration model will upsample the training image 206 to the required resolution for the next steps.
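By way of illustration, a minimal numpy stand-in for this conditional behaviour is sketched below; the nearest-neighbour upsampling is only a placeholder for the learned candidate restoration model, and the function name is hypothetical.

    import numpy as np

    def candidate_restoration_model(training_image, target_hw):
        # Placeholder for the candidate image restoration model: pass the
        # image through unchanged at the required resolution, or upsample
        # a lower-resolution input to the required resolution
        h, w = training_image.shape
        th, tw = target_hw
        if (h, w) == (th, tw):
            return training_image.copy()
        return np.repeat(np.repeat(training_image, th // h, axis=0),
                         tw // w, axis=1)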
Samples 208 of the processed image 207 are generated. The apparatus 301 may generate one or more samples 208. In Figure 2, there are three samples 208a, 208b, 208c illustrated. The samples 208 are indicative of a processed image 207 random distribution of frequencies. In other words, the sample 208 may be selected within a certain band or level of frequencies. The distribution of frequencies is preferably random or stochastic. In other words, the distribution of frequencies may be different for each sample 208 and each processed image 207 and each iteration of processed image 207, as explained below.
The samples 208 of the processed image 207 may collectively contribute to a processed image dataset 209. The processed image dataset 209 may enable to samples 208 to be stored as a group so that they can easily be used in the next steps.
Samples 203 of the real image 202 are generated. The apparatus may generate one or more samples 203. In Figure 2, there are three samples 203a, 203b, 203c illustrated. The samples 203 are indicative of a real image 202 random distribution of frequencies. In other words, the sample 203 may be selected within a certain band or level of frequencies. The distribution of frequencies is preferably random or stochastic. In other words, the distribution of frequencies may be different for each sample 203 and each real image 202 and each iteration of real image 202, as explained below. The frequency level of the distribution may also be different for the real image 202 and for the processed image 207.
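One plausible way to realise such samples, sketched below in numpy under the assumption that the random distribution of frequencies can be modelled as a randomly drawn radial frequency band per sample, is to mask the image spectrum with random band edges and transform back:

    import numpy as np

    rng = np.random.default_rng()

    def sample_random_bands(image, n_samples=3):
        # Generate samples indicative of a random distribution of
        # frequencies: each sample keeps a randomly drawn frequency band
        h, w = image.shape
        fy = np.fft.fftfreq(h)[:, None]
        fx = np.fft.fftfreq(w)[None, :]
        s = np.sqrt(fx ** 2 + fy ** 2)
        spectrum = np.fft.fft2(image)
        samples = []
        for _ in range(n_samples):
            lo, hi = np.sort(rng.uniform(0.0, 0.5, size=2))  # random band edges
            band = (s >= lo) & (s < hi)
            samples.append(np.real(np.fft.ifft2(spectrum * band)))
        return samples  # collectively contributing to the sample dataset

Because the band edges are redrawn on every call, no two iterations need share the same frequency level, consistent with the per-iteration variation described below.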
The samples 203 of the real image 202 may collectively contribute to a real image dataset 204. The real image dataset 204 may enable the samples 203 to be stored as a group so that they can easily be used in the next steps. The processed sample dataset 209 is transformed into the frequency domain to generate a frequency domain processed sample dataset 210. This may be performed using a Fourier transform to convert the processed sample dataset 209 into the complex space. Preferably the transformation is linear.
The real sample dataset 204 is transformed into the frequency domain to generate a frequency domain real sample dataset 205. This may be performed using a Fourier transform to convert the real sample dataset 204 into the complex space. Preferably the transformation is linear.
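A minimal sketch of this transforming step, assuming the complex Fourier output is made real by stacking its real and imaginary components (the Fourier transform itself being linear):

    import numpy as np

    def to_frequency_domain(sample_dataset):
        # Transform each sample into the frequency domain; the complex
        # result is converted to a real tensor by stacking real and
        # imaginary components
        return [np.stack([f.real, f.imag], axis=0)
                for f in (np.fft.fft2(s) for s in sample_dataset)]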
The processed sample data set 209 is compared with the real sample dataset 204. It may also be suitable to compare each of the processed sample data 208 with the real sample dataset 204. This may enable the weight of each of the processed sample data 208 to be varied in the comparison step. It may also be suitable to compare the frequency domain processed sample data set 209 with the frequency domain real sample dataset 204 if the processed sample data set 209 and the real sample dataset 204 have been transformed into the frequency domain. The comparison of the processed sample data set 209 with the real sample dataset 204 may be carried out using the mean squared error difference between the two datasets. It may also be suitable to use other forms of error difference.
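A minimal numpy sketch of the mean squared error comparison between the two sample datasets, assuming corresponding samples are compared pairwise:

    import numpy as np

    def dataset_mse(processed_dataset, real_dataset):
        # Mean squared error difference between corresponding samples of
        # the processed sample dataset and the real sample dataset
        return float(np.mean([np.mean((p - r) ** 2)
                              for p, r in zip(processed_dataset, real_dataset)]))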
The image restoration model is adapted in dependence on the processed sample dataset 209. If the processed sample dataset 209 has been transformed into the frequency domain, the image restoration model may instead be adapted in dependence on the frequency domain processed sample dataset 210. It may also be suitable to adapt the image restoration model in dependence on the comparison between the processed sample dataset 209 and the real sample dataset 204, which may involve using the error difference between the datasets, as described above.
The apparatus may carry out the above steps for one or more subsequent processed images 207 and one or more subsequent real images 202. Each subsequent iteration carried out by the apparatus 301 may further adapt and improve the image restoration model. The real image 202 may correspond to the processed image 207 in any given iteration, i.e. the real image 202 and the processed image 207 are both the same image of a bird. Each iteration may use a different image, i.e. one iteration may use an image of a bird and another an image of a zebra; alternatively, each iteration may use the same image, i.e. always the same image of a bird. The training images 206 used to generate the processed images 207 may be provided from a bank of training images 206, for example a bank of images of animals. A minimal sketch of such an iteration loop is given below.
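This sketch reuses the illustrative helpers sample_frequency_band, to_frequency_domain and compare_datasets from the earlier sketches; the optimiser-based update is an assumption about how the adaptation step could be realised.

```python
import torch

def train(restoration_model, optimiser, image_bank, steps=1000):
    """Illustrative outer loop tying the above steps together: each
    iteration draws a (training, real) pair from the image bank, draws
    fresh stochastic frequency samples, compares the datasets and
    adapts the candidate image restoration model."""
    for step in range(steps):
        training_img, real_img = image_bank[step % len(image_bank)]

        processed = restoration_model(training_img.unsqueeze(0))
        proc_ds = to_frequency_domain(
            sample_frequency_band(processed[0]).unsqueeze(0))
        real_ds = to_frequency_domain(
            sample_frequency_band(real_img).unsqueeze(0))

        loss = compare_datasets(proc_ds, real_ds)  # error difference
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()                           # adapt the model
```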
The processed random distribution of frequencies in the subsequent iteration may preferably be at a different frequency level or band from the previous iteration. In other words, the processed random distribution of frequencies may not be the same in any two consecutive iterations. The processed random distribution of frequencies may even be different in every iteration such that the processed random distribution of frequencies is never the same.
The real random distribution of frequencies in the subsequent iteration may preferably be at a different frequency level or band from the previous iteration. In other words, the real random distribution of frequencies may not be the same in any two consecutive iterations. The real random distribution of frequencies may even be different in every iteration such that the real random distribution of frequencies is never the same.
An exemplary embodiment of the image restoration training apparatus and image restoration apparatus will now be described in more detail.
Figure 3 shows an exemplary embodiment of the network architecture 301 used in the image restoration training apparatus.
Figure 3 shows a real image path and a training image path. The real image 302 is input into a real transformer 303. The real transformer 303 may carry out the above-described transforming step to generate the frequency domain real sample dataset 205. The real transformer 303 may also carry out the above-described generating step to generate the samples 203 of the real image 202 which collectively contribute to the real sample dataset 204. Alternatively, the generation of samples 203 may be carried out by a separate component.
The output of the real transformer 303, the real sample dataset 204, is input into the real transform augmentation space 304. The output of the real transform augmentation space 304 is input into the dense discriminator 309.
The training image 305 is input into the restoration generator 306. The restoration generator 306 may carry out the above-described step of generating the processed image 207 from the training image 206. The output of the restoration generator 306, the processed image 207, is input into the training transformer 307. The training transformer 307 may carry out the above-described transforming step to generate the frequency domain processed sample dataset 210. The training transformer 307 may also carry out the above-described generating step to generate the samples 208 of the processed image 207 which collectively contribute to the processed sample dataset 209. Alternatively, the generation of samples 208 may be carried out by a separate component.
The output of the training transformer 307, the processed sample dataset 209, is input into the training transform augmentation space 308. The output of the training transform augmentation space 308 is input into the dense discriminator 309.
In another exemplary embodiment, the real transformer 303 and the training transformer 307 may be formed by a single transformer. Similarly, the real transform augmentation space 304 and the training transform augmentation space 308 may be formed by a single transform augmentation space.
The output of the dense discriminator 309 may be input into the evaluation space 310.
The transform augmentation spaces 304, 308, the dense discriminator 309 and the evaluation space 310 may carry out the above-described steps of comparing the processed sample dataset 209 with the real sample dataset 204 and adapting the candidate image restoration model.
An important component in the adversarial network used in the image restoration training apparatus is the augmentation layer 303, 304, 307, 308 that transforms the data, the processed image 207, on its way from the generator 306 to the discriminator 309. This part of the adversarial network is important for avoiding saturation and mode collapse, which is where the image restoration model stops learning from training iterations because the errors may incorrectly converge to zero.
It is common to employ augmentation methods which are explicitly designed and hand-tuned. The inventors have found that a learnable augmentation layer that introduces variability is preferable for avoiding overfitting. The approach described herein introduces the variability by sampling the processed image and the real image in a manner indicative of, or dependent on, a random distribution of frequencies. Mapping a random distribution of frequencies may add a variability component to the desired end feature space. Additionally, the end feature space is preferably in the frequency domain because the frequency domain may provide an easier way of separating the missing structure. Without transforming into the frequency domain, the missing information is scattered across the image, whereas in the frequency domain the missing structure can be segmented by hard linear borders. This ease of separating missing structure in the frequency domain may therefore benefit the learning of the training apparatus.
The combination of mapping a random distribution and transforming the output into the frequency domain has been found to improve the PSNR metric of the image restoration apparatus.
Referring to Figure 3, a primary objective is to learn a distribution, or image restoration model, that matches the ground real image 302. Restoration tasks seek to recover the missing information in the degraded image. The missing information may often lie within the high frequency spectrum. The image restoration task may include super-resolution, denoising, a joint operation or any other image restoration task.
As described above, and as shown in Figure 3, the network 301 comprises a generator G 306, a discriminator D 309 and an augmentation layer A 304, 308.
The choice of the generator 306 can be any supervised network that is suitable for the chosen task. Preferably the generator 306 is a Residual Channel Attention Network (RCAN), due to the diverse features added to the architecture. The RCAN may combine residual-in-residual blocks with channel attention, which may enable any performance gains to translate to other, simpler architectures. A minimal sketch of an RCAN-style block is given below.
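This sketch assumes PyTorch; the channel count and reduction ratio are illustrative defaults, not values taken from the disclosure.

```python
import torch.nn as nn

class ChannelAttentionBlock(nn.Module):
    """RCAN-style residual block: two 3x3 convolutions followed by a
    channel attention gate, with an additive skip connection."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # global pooling
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())                                  # per-channel weights

    def forward(self, x):
        features = self.body(x)
        return x + features * self.attention(features)     # residual path
```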
The discriminator 309 is preferably that of a dense estimation internal kernel Generative Adversarial Network (GAN). This may enable a richer flow of information in the apparatus 301. The dense discriminator 309 preferably estimates over the spatial dimensions and the channel dimension to form a tensor. Specifically, the training image 305 passes through the generator 306 and the augmentation space 308, and is output as tensor processed image samples 208 (= D(A(G(x), z))) and tensor real image samples (= D(A(x, z))).
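The data flow just described might be expressed as follows; generator, augmentation and discriminator are stand-ins for the modules 306, 304/308 and 309, and their concrete signatures are assumptions.

```python
from torch import nn, Tensor

def dense_scores(generator: nn.Module, augmentation, discriminator: nn.Module,
                 degraded: Tensor, real: Tensor, z: Tensor):
    """Pass the degraded batch through the generator and the
    augmentation space; the discriminator returns dense tensor
    estimates that keep spatial and channel dimensions rather than a
    single scalar score per image."""
    fake = generator(degraded)                            # G(x)
    fake_scores = discriminator(augmentation(fake, z))    # D(A(G(x), z))
    real_scores = discriminator(augmentation(real, z))    # D(A(x, z))
    return fake_scores, real_scores
```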
As explained above, the image samples 203, 208 are indicative of a random distribution of frequencies. The evaluation space 310 then performs the evaluation, or comparison, of the images on preferably all of the points in the feature space, the feature space being defined by the random distribution of frequencies. This applies to both the real image 302 and the training image 305. The evaluation, or comparison, preferably uses a mean squared error-based loss function. This loss function may enable the evaluation space 310 to densely evaluate the discriminator 309. Equation 1 and Equation 2 illustrate how the loss function is applied.
$$\min_{D} V_{\mathrm{LSGAN}}(D) = \tfrac{1}{2}\,\mathbb{E}_{x,z \sim p_{x,z}(x,z)}\Big[\textstyle\sum_{b,h,w,c}\big(D(A(x,z))_{bhwc} - 1\big)^{2}\Big] + \tfrac{1}{2}\,\mathbb{E}_{x_{d},z \sim p_{x_{d},z}(x_{d},z)}\Big[\textstyle\sum_{b,h,w,c}\big(D(A(G(x_{d}),z))_{bhwc}\big)^{2}\Big] \quad (1)$$

$$\min_{G} V_{\mathrm{LSGAN}}(G) = \tfrac{1}{2}\,\mathbb{E}_{x_{d},z \sim p_{x_{d},z}(x_{d},z)}\Big[\textstyle\sum_{b,h,w,c}\big(D(A(G(x_{d}),z))_{bhwc} - 1\big)^{2}\Big] \quad (2)$$

Symbols: min_D V_LSGAN(D) and min_G V_LSGAN(G) may represent the minimisation objectives for the discriminator D 309 and the generator G 306. E_{x,z~p_{x,z}(x,z)} and E_{x_d,z~p_{x_d,z}(x_d,z)} may represent the expected values over the probability distributions of the real data x and the degraded (fake) data x_d respectively, each evaluated on tensors indexed bhwc. b, h, w and c may represent the indices of a 4D tensor corresponding to batch size, height, width and channels (BHWC). G(x_d) may represent the data generated by the generator 306 from the degraded training image, and D(·) the dense estimate output by the discriminator 309, with the real image 202 providing the ground real reference.
Equation 1 minimizes the feature point-wise mean squared error difference of the discriminator 309 output estimates, alternating between the real image 302 output and the training image 305 output. The labels for training the discriminator 309 are a tensor map of all ones for crops extracted from the real image 302 and a map of all zeros for the training image 305. Equation 2 minimizes the feature point-wise mean squared error difference of the generator 306 output estimates, alternating between the real image 302 output and the training image 305 output.
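A minimal sketch of Equations 1 and 2 as reconstructed above, assuming PyTorch; real_scores and fake_scores stand for the dense tensors D(A(x, z)) and D(A(G(x), z)).

```python
import torch
import torch.nn.functional as F

def discriminator_loss(real_scores, fake_scores):
    """Equation 1: feature point-wise least-squares loss for D. Crops
    from the real image are labelled with a tensor map of all ones,
    generated crops with all zeros, averaged over all BHWC positions."""
    real_term = F.mse_loss(real_scores, torch.ones_like(real_scores))
    fake_term = F.mse_loss(fake_scores, torch.zeros_like(fake_scores))
    return 0.5 * (real_term + fake_term)

def generator_loss(fake_scores):
    """Equation 2: the generator is updated so that the dense estimates
    for its output approach the 'real' label of one."""
    return 0.5 * F.mse_loss(fake_scores, torch.ones_like(fake_scores))
```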
The transform layers 303, 307 and transform augmentation blocks 304, 308 shown in Figure 3 are designed to improve the performance of the network. Preferably, the transform layers 303, 307 and transform augmentation blocks 304, 308 provide a stochastic augmentation layer. The inventors have found that adding the transforms 303, 307 and the transform augmentation blocks 304, 308 between the generator 306 and the discriminator 309 may help to avoid over-fitting of the discriminator 309 and improve the training stability of the network 301, especially for small datasets. In other words, the transforms 303, 307 and the transform augmentation blocks 304, 308 may provide two benefits: primarily, to aid the restoration tasks and add to the final image quality; secondarily, to prevent the discriminator 309 from saturating and to avoid mode collapse. These benefits may be provided by the transforms 303, 307 and the transform augmentation blocks 304, 308 learning a mapping from a latent variable, which draws samples from a random distribution, to kernels that correspond to a specific area within the Fourier spectrum.
Figure 4 illustrates the augmentation block 401, provided by the transform layers 303, 307 and transform augmentation blocks 304, 308, in more detail. The augmentation block 401 provides the flow of data between the generator 306 and the discriminator 309. The aim of the augmentation block 401 is to improve the performance of the generator 306 and the discriminator 309. This may be achieved by adding an intermediary layer, the augmentation block 401, that randomly generates frequency distributions under specific constraints in the Fourier space.
The augmentation block 401 may provide variability to the input into the discriminator 309 and reduce the information missing from the generator 306 output. The augmentation block 401 may achieve this by means of a latent variable that draws samples in dependence on a uniform, or normal, distribution. The dimensions of the samples may depend on which restoration task is to be carried out. The samples are illustrated by Z 402 in Figure 4.
The input Z 402 is input into a fully connected layer 404, which reshapes it to a tensor 403 and then transforms it to the desired kernel 406 spatial dimensions. The augmentation block 401 may comprise a plurality of convolutional layers 404, 405. A larger number of convolutional layers 404, 405 may improve the performance of the augmentation block 401; however, if too many convolutional layers 404, 405 are added, the optimisation of the augmentation block 401 may collapse. In the example embodiment shown, the augmentation block 401 comprises eight convolutional layers 404, 405, as in the sketch below.
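A minimal sketch of the augmentation block, assuming PyTorch; the latent dimension and, in particular, the depthwise manner in which the generated kernels are applied to the image are assumptions made for the sake of a runnable example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AugmentationBlock(nn.Module):
    """Sketch of the latent-to-kernel mapping: a fully connected layer
    reshapes the latent Z to a small tensor, eight activation-free 3x3
    convolutions refine it into 64 kernels, and the kernels are applied
    to the input image."""
    def __init__(self, latent_dim=128, n_kernels=64):
        super().__init__()
        self.n_kernels = n_kernels
        self.fc = nn.Linear(latent_dim, n_kernels * 3 * 3)
        # Eight convolutional layers, kept linear so that negative
        # kernel responses are not penalised (discussed below).
        self.convs = nn.ModuleList(
            nn.Conv2d(n_kernels, n_kernels, 3, padding=1) for _ in range(8))

    def forward(self, image, z):
        k = self.fc(z).view(1, self.n_kernels, 3, 3)   # reshape to a tensor
        for conv in self.convs:
            k = k + conv(k)                            # residual refinement
        k = k.view(self.n_kernels, 1, 3, 3)
        # Apply every generated kernel to every image channel.
        _, c, _, _ = image.shape
        weight = k.repeat(c, 1, 1, 1)                  # (c*n_kernels, 1, 3, 3)
        return F.conv2d(image, weight, padding=1, groups=c)
```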
Preferably all the convolutional kernels 406 in the residual net have a size of 3x3, with a final output of 64 kernels 406. The kernels 406 are applied to the real and training images 302, 305 using Equation 3.
$$\mathcal{L}_{A} = \sum_{s=1}^{H \times W}\Big|\, M(s) \odot S\Big(\mathcal{F}\Big\{\textstyle\sum_{k=1}^{K} A(x,z)_{k}\Big\}\Big)\Big| \quad (3)$$

Σ_{s=1} |·| may represent the cumulative sum, over the spatial dimensions of height and width, of the enclosed magnitudes. M(s) may represent a binary mask that is 1 if s > f_c (the cut-off frequency defined per task) and 0 otherwise. ⊙ may represent the Hadamard product, an element-wise multiplication between matrices. S(·) = [real(·), imag(·)] may represent a function that splits the complex space into real and imaginary parts; these may be concatenated together to form a new tensor. F{·} may represent a Fourier transform that generates a complex space of frequencies. Σ_{k=1}(·) may represent a cumulative sum over the channels output by A(x,z). A(x,z) may represent the network that generates the output of the transform block, i.e. a generative model with a random z conditioned by the data x.
Equation 3 shows the formulation of the loss function that is used to train the augmentation block 401. M(s) is a binary mask that is task-specific and relates to the band of the frequency spectrum used for the frequency distribution. The S(·) function converts the complex Fourier space to the real space by concatenating the real and imaginary data into a real tensor. A sketch of this loss is given below.
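This sketch of the Equation 3 loss, as reconstructed above, assumes PyTorch; the radial form of the mask M(s) and the normalised cut-off value are illustrative assumptions.

```python
import torch

def augmentation_loss(aug_output, cutoff=0.25):
    """Sum the augmentation output over its channels, Fourier-transform
    it, split real and imaginary parts, mask frequencies below the
    cut-off and accumulate the magnitudes over the spatial dims."""
    summed = aug_output.sum(dim=1)                         # sum over channels k
    spectrum = torch.fft.fftshift(torch.fft.fft2(summed))
    s = torch.cat([spectrum.real, spectrum.imag], dim=0)   # S(.)

    h, w = summed.shape[-2:]
    yy, xx = torch.meshgrid(torch.linspace(-0.5, 0.5, h),
                            torch.linspace(-0.5, 0.5, w), indexing="ij")
    mask = (torch.sqrt(yy ** 2 + xx ** 2) > cutoff).float()  # M(s)

    return (mask * s).abs().sum()                          # sum over H and W
```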
Preferably a linear variant of the residual net is used. Before the first iteration, it is unknown whether the kernels will generate a positive, negative or mixed output, and the linear variant of the residual net may not penalise a negative response. Additionally, a linear network may exhibit non-linear learning dynamics similar to those of its non-linear counterparts. This may enable the network to train more easily and more quickly.
Figure 5 illustrates an example of the discriminator 309, 501 in more detail. Preferably the discriminator 501 comprises an architecture for outputting dense probability estimates. A standard classifier architecture learns a many-to-one mapping, which has the disadvantage that the gradient information flow during back propagation is poor: the classification labels yield a back-propagated error tensor that is small compared to a tensor carrying probability estimates for every location in the feature space. A dense probability outputting discriminator 501 may provide a higher magnitude of tensor error gradients for the generator 306 to use, which may improve the comparison.
The discriminator 501 illustrated in Figure 5 comprises six residual blocks 502. Each block comprises two convolutions 503, 505; around the convolutions 503, 505 there are a rectified linear unit (ReLU) 504 and an addition unit 506. A kernel size of three may increase the discriminative power of the discriminator 501. A minimal sketch of this residual architecture is given after the method summary below.

Figure 6 summarises an example of a method for training an image restoration model. At step 601, the method comprises receiving a training image. At step 602, the method comprises generating a processed image from the training image by means of a candidate image restoration model. At step 603, the method comprises generating one or more samples of the processed image, wherein the samples of the processed image are indicative of a processed image random distribution of frequencies, the samples of the processed image collectively contributing to a processed sample dataset. At step 604, the method comprises adapting the candidate image restoration model in dependence on the processed sample dataset.
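Returning to the discriminator of Figure 5, a minimal sketch of the six-block residual architecture, assuming PyTorch; the channel width and the 1x1 dense output head are illustrative assumptions.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One of the six discriminator blocks 502: two 3x3 convolutions
    503, 505 with a ReLU 504 between them and an addition unit 506."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))

class DenseDiscriminator(nn.Module):
    """Dense discriminator: six residual blocks followed by a 1x1
    convolution that keeps an estimate for every spatial location
    instead of collapsing to a single score."""
    def __init__(self, in_ch=64, channels=64):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, channels, 3, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(channels)
                                      for _ in range(6)])
        self.head = nn.Conv2d(channels, 1, 1)  # dense per-location output

    def forward(self, x):
        return self.head(self.blocks(self.stem(x)))
```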
An example of an apparatus 700 configured to implement the method is schematically illustrated in Figure 7. The apparatus 700 may be implemented on an electronic device, such as a laptop, tablet, smart phone or TV.
The apparatus 700 comprises a processor 701 configured to process the datasets in the manner described herein. For example, the processor 701 may be implemented as a computer program running on a programmable device such as a Central Processing Unit (CPU). The apparatus 700 comprises a memory 702 which is arranged to communicate with the processor 701. Memory 702 may be a non-volatile memory. The processor 701 may also comprise a cache (not shown in Figure 7), which may be used to temporarily store data from memory 702. The apparatus may comprise more than one processor and more than one memory. The memory may store data that is executable by the processor. The processor may be configured to operate in accordance with a computer program stored in non-transitory form on a machine readable storage medium. The computer program may store instructions for causing the processor to perform its methods in the manner described herein.
The apparatus 700 can also be used to restore degraded images using the trained image restoration model described above. The image restoration apparatus may comprise one or more processors, such as processor 701 , and a memory 702 storing in non-transient form data defining program code executable by the processor(s) to implement the image restoration model formed by the image restoration training apparatus. The image restoration apparatus may receive a degraded image. The degraded image may be restored by means of the image restoration model formed by the image restoration training apparatus.
The method of restoring the degraded image may follow the same steps as those carried out by the processors of the image restoration apparatus. The apparatus and method described herein may be practically applied to other data inputs in other fields, such as to images and videos in relation to computer vision, to corpora of text in relation to natural language processing, or to gene expressions in relation to bioinformatics.

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.

Claims

1. An apparatus (700) for training an image restoration model, the apparatus comprising one or more processors (701) configured to: receive (601) a training image (206, 305); generate (602) a processed image (207) from the training image (206) by means of a candidate image restoration model; generate (603) one or more samples (208a, 208b, 208c) of the processed image (207), wherein the samples (208a, 208b, 208c) of the processed image (207) are indicative of a processed image random distribution of frequencies, the samples (208a, 208b, 208c) of the processed image (207) collectively contributing to a processed sample dataset (209); and adapt (604) the candidate image restoration model in dependence on the processed sample dataset (209).
2. The apparatus (700) according to claim 1, wherein the one or more processors (701) are configured to: adapt (604) the image restoration model depending on: receiving a real image (202) corresponding to the processed image (207); and generating one or more samples (203a, 203b, 203c) of the real image (202), the samples (203a, 203b, 203c) of the real image (202) being indicative of a real image random distribution of frequencies, the samples (203a, 203b, 203c) of the real image (202) collectively contributing to a real image sample dataset (204).
3. The apparatus (700) according to claim 2, wherein the one or more processors (701) are configured to adapt the image restoration model in dependence on the processed sample dataset (209) by comparing the processed sample dataset (209) with the real sample dataset (204) and adapting (604) the candidate image restoration model in dependence on that comparison.
4. The apparatus (700) according to claims 2 or 3, wherein the one or more processors (701) are configured to adapt the image restoration model in dependence on the processed sample dataset (209) by comparing each of the processed sample data (208a, 208b, 208c) with the real sample dataset (204) and adapting (604) the candidate image restoration model in dependence on that comparison.
5. The apparatus (700) according to claim 3 or 4, wherein the one or more processors (701) are configured to compare the processed sample dataset (209) with the real sample dataset (204) by comparing the mean squared error difference between the processed sample dataset (209) and the real sample dataset (204) and adapting (604) the candidate image restoration model in dependence on that comparison.
6. The apparatus (700) according to claims 2 to 5, wherein the one or more processors (701) are configured to provide the processed image random distribution of frequencies at different frequency levels to the real image random distribution of frequencies.
7. The apparatus (700) according to any preceding claim, wherein the one or more processors (701) are configured to: after generating (603) samples of the processed image and before adapting (604) the image restoration model, transform the processed sample dataset (209) into the frequency domain (210).
8. The apparatus (700) according to any of claims 2 to 7, wherein the one or more processors (701) are configured to: after generating samples of the real image and before adapting the image restoration model, transform the real sample dataset (204) into the frequency domain (205).
9. The apparatus (700) according to any of claims 7 or 8, wherein the one or more processors (701) are configured to transform the sample dataset (204, 209) into the frequency domain (205, 210) by a linear transformation.
10. The apparatus (700) according to any preceding claim, wherein the one or more processors (701) are configured to: carry out the steps of any preceding claim for one or more subsequent processed image(s) (207).
11. The apparatus (700) according to any of claims 2 to 10, wherein the one or more processors (701) are configured to: carry out the steps of any of claims 2 to 10 for one or more subsequent real image(s) (202), each subsequent real image (202) corresponding to a respective subsequent processed image (207).
12. The apparatus (700) according to claims 10 or 11, wherein the one or more processors (701) are configured to provide each subsequent processed image (207) random distribution of frequencies at different frequency levels to the preceding processed image (207) distribution of frequencies.
13. The apparatus (700) according to claims 11 or 12, wherein the one or more processors (701) are configured to provide each subsequent real image (202) random distribution of frequencies at different frequency levels to the preceding real image (202) distribution of frequencies.
14. An image restoration apparatus (700), the apparatus comprising one or more processors (701) and a memory (702) storing in non-transient form data defining program code executable by the processor(s) (701) to implement an image restoration model formed by the apparatus (700) of any of claims 1 to 13, the apparatus (700) being configured to: receive a degraded image; and restore the degraded image by means of the image restoration model.
15. A method (600) for training an image restoration model, the method comprising: receiving (601) a training image (206); generating (602) a processed image (207) from the training image by means of a candidate image restoration model; generating (603) one or more samples (208a, 208b, 208c) of the processed image (207), wherein the samples (208a, 208b, 208c) of the processed image (207) are indicative of a processed image random distribution of frequencies, the samples (208a, 208b, 208c) of the processed image (207) collectively contributing to a processed sample dataset (209); and adapting (604) the candidate image restoration model in dependence on the processed sample dataset (209).
16. A method for image restoration, the method implementing the image restoration model formed by the method of claim 15 and comprising: receiving a degraded image; and restoring the degraded image by means of the image restoration model.