WO2022194345A1 - Modular and learnable image signal processor - Google Patents

Modular and learnable image signal processor

Info

Publication number
WO2022194345A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
module
image processor
perform
modules
Application number
PCT/EP2021/056588
Other languages
French (fr)
Inventor
Eduardo PEREZ PELLITERO
Marcos CONDE
Ales LEONARDIS
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/EP2021/056588 priority Critical patent/WO2022194345A1/en
Publication of WO2022194345A1 publication Critical patent/WO2022194345A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/46Colour picture communication systems
    • H04N1/56Processing of colour picture signals
    • H04N1/60Colour correction or control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/46Colour picture communication systems
    • H04N1/56Processing of colour picture signals
    • H04N1/60Colour correction or control
    • H04N1/6058Reduction of colour to a range of reproducible colours, e.g. to ink- reproducible colour gamut
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/46Colour picture communication systems
    • H04N1/56Processing of colour picture signals
    • H04N1/60Colour correction or control
    • H04N1/6075Corrections to the hue
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/46Colour picture communication systems
    • H04N1/56Processing of colour picture signals
    • H04N1/60Colour correction or control
    • H04N1/6077Colour balance, e.g. colour cast correction

Definitions

  • Figure 5 schematically illustrates an example of a lens shading learnable module of the ISP described herein for the RGB to RAW (inverse) transformation.
  • the intermediate RGB image after undergoing the inverse WB operation is shown at 501.
  • the image 501 is the input to the module.
  • a simple CNN is trained.
  • the CNN comprises three layers of 16 filters with 3x3 kernels. The last layer is followed by a sigmoid activation that provides the attention mask 503.
  • the attention mask is combined with the previously estimated Gaussian mask 504.
  • the previously estimated probabilistic mask (Gaussian mask) 504 is generated as follows, as indicated at 505.
  • a generated RAW image is shown at 506.
  • noisy lens shading masks 507 are extracted from the generated RAW image 506 as an element-wise gain from selected samples using the real RAW image 508.
  • a Gaussian kernel is fitted to give the estimated probabilistic mask 504.
  • the combination of the attention mask 503 and the estimated probabilistic mask 504 is carried out by an element-wise multiplication.
  • the result is a pseudo lens shading mask 510 applied as a pixel-wise gain that emulates the real lens shading effect and thus improves the RAW image reconstruction to give the RGB image after LSE, shown at 511.
  • the complete process is learnable and allows one to understand the lens shading effect and improve the quality of the reconstructed images using it.
  • This module is especially useful for modelling the reverse ISP path (RGB to RAW). However, the module can be omitted from the pipeline if required, for example while modelling Linear HDR to Linear RAW.
  • the inverse operation is lens shading correction, which can be performed in a similar manner for the transformation of a RAW image (or an intermediate image derived from the RAW image) to an RGB image.
  • an additional loss function computed in a different color domain may also be used. For example, a loss function determined in the chroma domain may further improve the appearance and quality of the images generated by the ISP. This may be performed by first applying an RGB to YUV color domain conversion (a linear function) and then computing the loss on the chroma channels; a minimal sketch is given after this list.
  • a variation of the color correction module may also be implemented in which strictly the chroma component of the images is used, to ensure that the transformation only changes the color of the images.
  • the ISP described herein may conveniently be trained end-to-end (i.e. such that it is differentiable) with paired and/or unpaired examples, thus allowing an unsupervised domain transfer solution for unpaired data.
  • This approach, together with the modularity and adaptability of the model, allows the model to learn more complex domain adaptations, such as Linear HDR to Linear RAW, from little unpaired data.
  • the ISP model may be trained using a training dataset 601.
  • the training image 602 is unpaired.
  • the image 602 is input to the ISP 603 to produce a RAW generated image 604.
  • An unpaired real RAW image is shown at 605.
  • adversarial training of the ISP model on unpaired data can be performed using a Generative Adversarial Network (GAN) discriminator 606, which can learn to discriminate between the real image 605 and the generated image 604 and provide an adversarial loss.
  • the model is preferably treated as a GAN, where the ISP model acts as the generator. During training, several fine-tuning rounds may be applied while tracking the minimax GAN loss; a minimal training-step sketch is given after this list. The use of unpaired training examples may result in more efficient training of the ISP model.
  • the model may alternatively be trained using paired training examples.
  • the approach provides a computer-implemented method for transforming an input image in an image processor comprising a plurality of processing modules configured to operate in series.
  • the method comprises, in one or more of the modules, independently implementing one or more trained artificial intelligence models according to first and second operation modes that perform and invert a respective image transformation respectively on the input image or an intermediate image derived from the input image (for example, an intermediate stage of the total RAW to RGB or RGB to RAW transformation).
  • Figure 7 shows an example of apparatus 700 comprising an imaging device 701, for example a camera, configured to use the method described herein to process image data captured by at least one image sensor in the device.
  • the device 701 comprises image sensors 702, 703.
  • Such a device 701 typically includes some onboard processing capability. This could be provided by the processor 704.
  • the processor 704 could also be used for the essential functions of the device.
  • One or more of the image sensors 702, 703 may be used to generate a RAW image that may be converted to an RGB image using the image signal processing pipeline described above.
  • the transceiver 705 is capable of communicating over a network with other entities 710, 711. Those entities may be physically remote from the device 701.
  • the network may be a publicly accessible network such as the internet.
  • the entities 710, 711 may be based in the cloud.
  • Entity 710 is a computing entity.
  • Entity 711 is a command and control entity.
  • These entities are logical entities. In practice they may each be provided by one or more physical devices such as servers and data stores, and the functions of two or more of the entities may be provided by a single physical device.
  • Each physical device implementing an entity comprises a processor and a memory.
  • the devices may also comprise a transceiver for transmitting and receiving data to and from the transceiver 705 of device 701.
  • the memory stores in a non-transient way code that is executable by the processor to implement the respective entity in the manner described herein.
  • the command and control entity 711 may train the model(s) used in the device. This is typically a computationally intensive task, even though the resulting model(s) may be efficiently described, so it may be efficient for the development of the algorithm to be performed in the cloud, where it can be anticipated that significant energy and computing resource is available. It can be anticipated that this is more efficient than forming such a model at a typical imaging device.
  • the command and control entity can automatically form a corresponding model and cause it to be transmitted to the relevant imaging device.
  • the model is implemented at the device 701 by processor 704.
  • an image may be captured by one or both of the sensors 702, 703 and the image data may be sent by the transceiver 705 to the cloud for processing. The resulting image could then be sent back to the device 701, as shown at 712 in Figure 7.
  • the method may be deployed in multiple ways, for example in the cloud, on the device, or alternatively in dedicated hardware.
  • the cloud facility could perform training to develop new algorithms or refine existing ones.
  • the training could either be undertaken close to the source data, or could be undertaken in the cloud, e.g. using an inference engine.
  • the method may also be implemented at the device, in a dedicated piece of hardware, or in the cloud.
  • Figure 8 shows an RGB image 801 taken on the Huawei P20 Pro smartphone and the RAW image 802 generated from the RGB image 801 using the approach described herein.
  • Figures 9(a) and 9(b) show a comparison between generated RAW images using the approach described herein (Figure 9(a)) and a real smartphone RAW image (Figure 9(b)).
  • Figures 10(a) and 10(b) show a further comparison between generated RAW images using the approach described herein (Figure 10(a)) and a real smartphone RAW image (Figure 10(b)).
  • Figures 11(a) to 11(d) show examples of qualitative results on a denoising task.
  • Figure 11(a) shows results from a model trained with the method of Brooks et al.
  • Figure 11(b) shows results from a model trained using the approach described herein.
  • Figure 11(c) shows the original noisy image.
  • Figure 11(d) shows a ground truth image. It can be seen that use of the described ISP model can generate higher-quality synthetic data and therefore the model may achieve better performance at low-level computer vision tasks, such as RAW image denoising.
  • Figures 12(a) and 12(b) show examples of qualitative results on a linear HDR to linear RAW transformation using the approach described herein with the unsupervised method.
  • Figure 12(a) shows an HDR-RGB image from a Huawei P40 smartphone.
  • Figure 12(b) shows a generated linear RAW image using the approach described herein.
  • the approach described herein retains the benefits of classical ISP approaches in terms of efficiency, interpretability and modularity. Moreover, the approach may result in improved image processing by using CNNs for tone mapping, a lens shading learnable module and dictionary learning for core image processing operations such as the CCM and WB.
  • the pipeline can conveniently be trained end-to-end and there is no need to manually fine-tune parameters.
  • the pipeline can be trained using paired data or unpaired data (i.e. adversarial training).
  • the pipeline advantageously exhibits modularity and interpretability. One can inspect, add, or modify any desired module.
  • Table 1: Summary comparison between the present ISP and prior art approaches. Embodiments of the present invention do not require prior information about the camera to be modelled; obtaining this information is usually an expensive and time-consuming process.
  • the approach described herein can learn and model image processing operations within an ISP in a modular fashion (e.g. CCM, WB, LSC and tone mapping, among others), the camera’s lens shading effect, piece-wise linear function approximations for tonemapping (via CNNs) and different representations of the ISP intermediate results.
  • Previous approaches cannot generally learn nor model this important information.
  • Previous methods model the ISP from a specific camera.
  • the present approach can model multiple cameras at the same time as a result of the dictionary-based decomposition of WB and CCM.
  • the present approach can be adapted to perform multiple domain transformations, such as RGB to RAW, Linear HDR to Linear RAW, etc.
  • the described model is reversible with little effort (RGB to RAW and RAW to RGB).
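
As referenced above, here is a minimal sketch of the chroma-domain loss. The BT.601 conversion coefficients and the L1 distance are illustrative assumptions: the text states only that a linear RGB to YUV conversion is applied and that the loss is computed on the chroma component.

```python
import torch

# BT.601 RGB -> YUV matrix, assumed for illustration; the text does not
# reproduce the exact conversion coefficients it uses.
RGB2YUV = torch.tensor([[ 0.299,    0.587,    0.114  ],
                        [-0.14713, -0.28886,  0.436  ],
                        [ 0.615,   -0.51499, -0.10001]])

def chroma_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L1 loss on the chroma (U, V) channels only, making the penalty
    insensitive to luminance so that it focuses on color errors."""
    def to_yuv(x):  # (B, 3, H, W) RGB -> (B, 3, H, W) YUV
        return torch.einsum("bchw,dc->bdhw", x, RGB2YUV.to(x.device))
    return (to_yuv(pred)[:, 1:] - to_yuv(target)[:, 1:]).abs().mean()
```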
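
And a sketch of one adversarial training step for the unpaired setting, with the reverse ISP acting as the generator. The non-saturating loss used here is a common stand-in for the minimax GAN loss mentioned above; the module and optimizer arguments are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def adversarial_step(isp, disc, opt_g, opt_d, rgb, real_raw):
    """One unpaired training step: `isp` (the RGB-to-RAW generator) and
    `disc` are assumed torch.nn.Modules; `opt_g`/`opt_d` their optimizers."""
    fake_raw = isp(rgb)

    # Discriminator update: real RAW images labelled 1, generated ones 0.
    opt_d.zero_grad()
    real_logit = disc(real_raw)
    fake_logit = disc(fake_raw.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
              + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))
    d_loss.backward()
    opt_d.step()

    # Generator (ISP) update: try to fool the discriminator.
    opt_g.zero_grad()
    gen_logit = disc(fake_raw)
    g_loss = F.binary_cross_entropy_with_logits(gen_logit, torch.ones_like(gen_logit))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```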

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

Described herein is an image processor (100) comprising a plurality of processing modules (102, 103, 104, 105, 106) configured to operate in series to transform an input image (101), one or more of the plurality of modules (102, 104, 105, 106) being configured to independently implement one or more trained artificial intelligence models, wherein each of the modules of the image processor has first (102a, 106a) and second (102b, 106b) operation modes that perform and invert a respective image transformation respectively on the input image (101) or an intermediate image derived from the input image. Embodiments of the present invention may provide an end-to-end learnable, invertible and interpretable image signal processor.

Description

MODULAR AND LEARNABLE IMAGE SIGNAL PROCESSOR
FIELD OF THE INVENTION
This invention relates to the processing of images in an Image Signal Processor, for example for the transformation of a RAW image to an RGB image.
BACKGROUND
The present invention concerns the processing required to convert a RAW reading from a camera sensor to a high-quality RGB image, which is generally performed by the so-called Image Signal Processor (ISP).
ISPs convert the RAW sensor readings into good-looking RGB images and are a main building block of digital cameras. Traditionally, ISPs perform a set of canonical and other image processing tasks (for example, white balance, color correction, gamma expansion), but there is great variability and complexity in the design of commercial ISPs.
There are few prior solutions aiming at a “bidirectional” ISP that can perform both RAW to RGB and RGB to RAW image transformations. Such solutions include Google's unprocessing approach, as described in Brooks et al., “Unprocessing Images for Learned Raw Denoising”, CVPR 2019, and CycleISP, as described in Zamir et al., “CycleISP: Real Image Restoration via Improved Data Synthesis”, CVPR 2020. Both of these approaches have several disadvantages. In summary, neither of these existing prior art solutions is both learnable and modular.
The work of Brooks et al. presents a canonical ISP that is differentiable, but whose parameters are not learnable and are instead fixed manually in advance.
The CycleISP method uses a convolutional neural network (CNN) to model the forward and inverse ISP. The model is not based on interpretable modules: one cannot easily inspect intermediate steps, or add, delete or modify operations within the ISP if there is a requirement to do so. There is no explicit mechanism for modelling different ISPs, and the approach has high complexity with no model regulation.
It is desirable to develop a method for processing images that overcomes these problems.
SUMMARY
According to one aspect there is provided an image processor comprising a plurality of processing modules configured to operate in series to transform an input image, one or more of the plurality of modules being configured to independently implement one or more trained artificial intelligence models, wherein each of the modules of the image processor has first and second operation modes that perform and invert a respective image transformation respectively on the input image or an intermediate image derived from the input image.
Embodiments of the present invention may advantageously provide an end-to-end learnable, invertible and interpretable ISP.
The image processor may comprise multiple modules each configured to implement a trained artificial intelligence model, wherein the respective trained artificial intelligence models are trainable end-to-end. This may allow the image processing pipeline to be efficiently trained.
The respective trained artificial intelligence models may be end-to-end trainable using paired or unpaired training images. The pipeline may therefore be trained using adversarial training.
One or more of the modules may be configured to implement a model comprising predefined parameters. These parameters can be learned, for example learned end-to-end, or controlled by hand (i.e. the parameters may be the user’s choice). The pipeline may therefore be hybrid parametric and trainable.
One or more of the modules may be configured to implement a different model for each of the first and second operation modes of the respective module. This may allow each module to perform and invert its respective image transformation operation effectively.
The plurality of modules of the image processor may be collectively configured to perform or invert a total image transformation of the input image. For example, the plurality of modules may be collectively configured to transform a RAW image to an RGB image, or vice versa, by performing or inverting multiple image transformation operations.
The image processor may comprise a module configured to perform color correction, said module being configured to jointly learn a plurality of colour correction matrices and a decoder that during inference is configured to estimate and apply a linear decomposition to the corresponding colour correction matrices. The color correction module may therefore be enhanced using dictionary learning. For the first operation mode that performs the color correction, the module may be configured to multiply a color correction matrix by each pixel of the input image or intermediate image. For the second operation mode that inverts the color correction, the module may be configured to multiply a color correction matrix pseudo-inverse by each pixel of the input image or intermediate image.
The image processor may comprise a module configured to perform white balancing, said module being configured to jointly learn a plurality of white balance vectors and a decoder that during inference is configured to estimate and apply a linear decomposition to the corresponding white balance vectors. The white balance module may therefore be enhanced using dictionary learning.
The linear decomposition(s) may be in the form of a vector parameter. This may be a convenient implementation.
The image processor may comprise a module configured to perform lens shading correction, said module being configured to perform the lens shading correction by the composition of an estimated attention mask and a Gaussian mask that is multiplied or divided by each pixel of the input image or the intermediate image to perform or invert the lens shading correction respectively. This may result in improved image quality when performing lens shading correction and better lens shading modelling when inverting the operation.
The image processor may comprise a module configured to perform tone mapping, said module being configured to implement a neural network to perform the tone mapping, and a different neural network to invert the tone mapping. This may result in improved image quality.
The processor may comprise a module (or modules) configured to perform and invert one or more of the following image transformations: lens shading correction, white balance, color correction, gamma decompression and tone mapping. The processor may therefore conveniently perform a number of desired image transformation operations on the input image.
The image processor may be pipelined. Each module except for the first module in the pipeline may be configured to take an input from a preceding module. This may be an efficient way of processing the input image.
The input image may be a RAW image. The input image may be an RGB image. The image processor may conveniently transform a RAW image to an RGB image, or vice versa.
According to another aspect there is provided an image processing apparatus comprising the image processor as described above and an imaging device, wherein the image processor is configured to receive the input image from the imaging device. The image processor described above may therefore conveniently be implemented in an imaging device such as a camera or a smartphone.
According to a further aspect there is provided a computer-implemented method for transforming an input image in an image processor comprising a plurality of processing modules configured to operate in series, the method comprising, in one or more of the modules, independently implementing one or more trained artificial intelligence models according to first and second operation modes that perform and invert a respective image transformation respectively on the input image or an intermediate image derived from the input image.
Embodiments of the present invention may therefore provide an end-to-end learnable, invertible and interpretable ISP.
According to a further aspect there is provided a computer program which, when executed by a computer, causes the computer to perform the method described above. The computer program may be provided on a non-transitory computer readable storage medium.
BRIEF DESCRIPTION OF THE FIGURES
The present invention will now be described by way of example with reference to the accompanying drawings.
In the drawings:
Figure 1 schematically illustrates an example of a learnable ISP architecture.
Figure 2 schematically illustrates an exemplary implementation of a dictionary-based color correction image transformation operation for the inverse path (RGB to RAW).
Figure 3 shows an exemplary implementation of a dictionary-based white balance image transformation operation for the inverse path (RGB to RAW).
Figures 4(a) and 4(b) schematically illustrate the lens shading correction (LSC) effect.
Figure 5 schematically illustrates an example of a lens shading learnable module of the ISP.
Figure 6 schematically illustrates an example of an unsupervised training strategy for adapting the ISP model.
Figure 7 shows an example of a device configured to implement the method described herein.
Figure 8 shows an example of a RAW image generated from an RGB image taken on a smartphone using the approach described herein.
Figures 9(a) and 9(b) show a comparison between a generated RAW image using the approach described herein (Figure 9(a)) and a real smartphone RAW image (Figure 9(b)).
Figures 10(a) and 10(b) show a further comparison between a generated RAW image using the approach described herein (Figure 10(a)) and a real smartphone RAW image (Figure 10(b)).
Figures 11(a) to 11(d) show examples of qualitative results on a denoising task. Figure 11(a) shows results from a model trained with the method of Brooks et al. Figure 11(b) shows results from a model trained using the approach described herein. Figure 11(c) shows the original noisy image. Figure 11(d) shows a ground truth image.
Figures 12(a) and 12(b) show examples of qualitative results on a linear HDR to linear RAW transformation using the approach described herein with the unsupervised method. Figure 12(a) shows an HDR-RGB image from a smartphone. Figure 12(b) shows a generated linear RAW image using the approach described herein.
DETAILED DESCRIPTION
Described herein is a learnable ISP for processing RAW sensor data to produce an RGB image that can additionally perform the inverse of that processing, i.e. to obtain a realistic RAW sensor reading from an RGB processed image. The ISP is advantageously modular and learnable end-to-end. Each module may also be configured to individually reversibly process an intermediate image derived from the input image.
Embodiments of the present invention may also solve the forward and inverse hybrid ISP problem by incorporating a set of modules for performing and inverting canonical (i.e. classical) ISP operations, which may be enhanced via dictionary learning, together with deep learning modules, which can provide an end-to-end learnable, invertible and interpretable ISP.
Figure 1 shows a schematic illustration of an example of the ISP model, showing the independent processing modules that make up the pipeline 100. Each module 102, 103, 104, 105, 106 is independent from the other modules in the pipeline and performs and inverts a given image processing task.
The image signal processor comprises a configurable set of modules. Each module in the pipeline has two operation modes: first and second operation modes that respectively perform and invert a respective image transformation on the input image or an intermediate image derived from the input image.
In a preferred implementation, the ISP comprises any subset of modules configured to perform the following image processing operations: lens shading correction, white balance, color correction, gamma decompression, tone mapping (preferably neural tone mapping) and their inverse operations. The pipeline is flexible, allowing for the addition or removal of modules as desired.
The modules are configured to operate in series. Each module except for the first module in the pipeline (which may be module 102 or 106 in this example, depending on the desired directionality of the image processing operation) is configured to take an input from a preceding module. For the RGB to RAW transformation illustrated from left to right in Figure 1, the first module 102 in the pipeline is configured to receive the input image (the RGB image 101).
Where it is desired to perform one or more intermediate image transformations, a first module (for example module 103), and any subsequent modules in the pipeline (for example modules 104-106), may be configured to receive information derived from the input image 101. This may be in the form of an intermediate image derived from the input image 101. For instance, in the example shown in Figure 1, the module 103 is configured to receive an intermediate image that has already been processed by the tone mapping module 102.
One or more of the modules in the pipeline is configured to implement a trained artificial intelligence model, such as a CNN. Preferably, multiple modules are configured to implement a trained artificial intelligence model. In a preferred embodiment, the respective trained artificial intelligence models implemented by these multiple modules are conveniently trainable end-to-end. Furthermore, one or more of the modules may be configured to implement a model comprising predefined parameters. Thus the pipeline may be a hybrid trainable and parametric ISP.
One or more of the modules may be configured to perform a mathematically invertible operation to perform a respective image transformation. The operation can then be inverted to invert the respective image transformation. Such operations may include gamma compression and gamma expansion.
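For instance, a gamma compression/expansion pair is exactly invertible. The sketch below is illustrative only, with an assumed exponent of 2.2; the text does not fix a value.

```python
import torch

GAMMA = 2.2  # assumed exponent for illustration; not specified in the text

def gamma_compress(x: torch.Tensor) -> torch.Tensor:
    """Forward mode: gamma compression of linear intensities in [0, 1]."""
    return x.clamp(min=1e-8) ** (1.0 / GAMMA)

def gamma_expand(x: torch.Tensor) -> torch.Tensor:
    """Backward mode: the exact mathematical inverse of gamma_compress."""
    return x.clamp(min=1e-8) ** GAMMA
```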
The plurality of modules of the image signal processor are collectively configured to perform or invert a total image transformation of the input image, for example from a RAW input image to an RGB image, or vice versa.
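As a sketch of this structure, the series composition and its inversion might look as follows; the `invert` keyword convention for selecting a module's operation mode is an assumption for illustration.

```python
import torch.nn as nn

class ReversiblePipeline(nn.Module):
    """Chains modules that each expose a forward (invert=False) and a
    backward (invert=True) operation mode; inverting the total image
    transformation traverses the module chain in reverse order."""
    def __init__(self, stages):
        super().__init__()
        self.stages = nn.ModuleList(stages)

    def forward(self, x, invert: bool = False):
        order = reversed(self.stages) if invert else self.stages
        for stage in order:
            x = stage(x, invert=invert)  # each stage performs or inverts
        return x
```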
The example of the learnable ISP shown in Figure 1 primarily demonstrates the transformation of an RGB image to a RAW image (from left to right in Figure 1). However, as mentioned above, the ISP may also perform the reverse operation from RAW to RGB.
In this example, the modules in the pipeline comprise a tonemapping module 102, a gamma decompression module 103, a learned color correction (CCM) module 104, a learned white balance (WB) and digital gain module 105, and a learned lens shading (LS) module 106. Other suitable modules may also be included.
In the tonemapping module, the paths which represent the inversion and performance of the tonemapping operation are explicitly shown at 102a and 102b respectively. Similarly, in the lens shading module, the paths which represent the inversion (lens shading effect) and performance (lens shading correction) of the lens shading operation are explicitly shown at 106a and 106b respectively.
The other modules 103, 104 and 105 are also configured to perform and invert their respective image transformations (not shown explicitly in Figure 1).
Each of the learnable modules may be configured to implement a different model for each of the first and second operation modes of the respective module. For example, in the tone mapping module 102, the forward tone mapping is preferably performed by a neural network, and the inverse tone mapping is preferably performed by a different neural network trained to invert the forward operation. Similarly, any modules implementing a model with fixed (non-learnable) parameters may be configured to implement a different model for each of the first and second operation modes of the respective module. The loss between the generated RAW image 107 and the real RAW image 108 may be determined and used to train the model for the RGB to RAW direction. The model for the RAW to RGB direction may also be trained in dependence on the corresponding loss between the generated RGB image and a real RGB image.
In an exemplary embodiment of the module 102, the tone mapping is performed using a pixel-wise CNN as an approximation of a piece-wise linear function. This approach has in some implementations been shown to be more powerful than a simple look-up table for tone mapping. In a preferred implementation, the network architecture is a convolutional block of N layers of 32-64 filters of size (1x1) and ReLU activations.
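A sketch of such a block is given below. The depth, the width and the final projection back to three channels are assumptions for illustration (the text specifies only N layers of 32-64 filters of size 1x1 with ReLU activations); the inverse tone mapping would be a second, separately trained instance of the same architecture.

```python
import torch.nn as nn

class ToneMapNet(nn.Module):
    """Pixel-wise tone mapping: 1x1 convolutions only, so the network acts
    as a learned per-pixel curve approximating a piece-wise linear function."""
    def __init__(self, n_layers: int = 3, width: int = 32):
        super().__init__()
        layers, c_in = [], 3
        for _ in range(n_layers):
            layers += [nn.Conv2d(c_in, width, kernel_size=1), nn.ReLU()]
            c_in = width
        layers.append(nn.Conv2d(c_in, 3, kernel_size=1))  # project back to RGB
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (B, 3, H, W)
        return self.net(x)
```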
Different WB and CCM approaches may be used depending on the capture mode and scene. In some implementations, the CCM and WB operations may be fixed, but this can be unrealistic. However, estimating or learning a CCM or WB operation can in some circumstances be challenging and computationally costly.
In a preferred implementation of the present invention, for performing and inverting the WB and/or CCM operations, a dictionary and an encoder that finds a linear decomposition are simultaneously learned. A dictionary allows a more expressive representation of the distribution (different modes/cameras). The encoder can provide an optimal decomposition. Alternatively, the linear weights can be used as a controllable parameter if desired.
In one implementation, the ISP comprises a color correction matrix (CCM) module. The CCM module is configured to jointly learn a dictionary of colour correction matrices and a decoder that during inference estimates and applies a linear decomposition (preferably in the form of a vector parameter) with respect to the corresponding dictionary of CCMs. For the forward mode, the resulting CCM is multiplied by each pixel, and for the backward mode the matrix pseudo-inverse is multiplied by each pixel.
Figure 2 shows an exemplary implementation of the dictionary-based color correction for the inverse path (RGB to RAW) of this exemplary solution. The RGB input image is shown at 201.
The color correction dictionary, indicated at 202, can be represented as a convolutional layer composed of filters of dimensions (1, 1, 3, 3*N), where N is the number of CCMs in the dictionary. Each CCM preferably has dimensions of 3x3.
After this convolutional layer is applied, the resultant representation (N CCM-RGB results with dimensions (N, h, w, 3)) is shown at 203. This intermediate representation 203 feeds the encoder 204 (preferably a pixel-wise CNN) that finds the optimal linear decomposition weights, shown at 205. Finally, the linear combination using such weights, illustrated at 206, produces the resultant color corrected image 207 from this block. This intermediate image 207 can then proceed to the next module in the pipeline, such as a WB module.
This representation can allow for a rich learnt representation comprising different color correction modes and to obtain the corresponding output from each one and combine them in an optimal way.
As described above, the model learns simultaneously a dictionary and an encoder that finds a linear decomposition. In a preferred implementation, N is set to 16.
The hybrid model can therefore combine canonical invertible ISP operations in some modules (which can be enhanced via dictionary learning) with learnable invertible operations in other modules.
The dictionary-based approach may also conveniently be used for performing and inverting the white balance (WB) transformation.
In this exemplary implementation, the ISP comprises a WB module. The WB module jointly learns a dictionary representation of white balance vectors and a decoder that during inference estimates and applies a linear decomposition (preferably in the form of a vector parameter) with respect to the corresponding dictionary of WBs. For the forward mode, the resulting WB is multiplied by each pixel, and for the backward mode each pixel is divided by the resulting WB. The module performs a scalar multiplication (or division) between a pixel that is composed of R, G and B and the respective colour gains, i.e. R'=R*R_gain, G'=G*G_gain and B'=B*B_gain respectively (WB = [R_gain, G_gain, B_gain]).
Figure 3 shows an exemplary implementation of the dictionary-based white balance image transformation operation for the inverse path (RGB to RAW). The RGB input image, after color correction/CST, is shown at 301. Thus, the input to the module is an intermediate image derived from the original RGB input image (i.e. after color correction/CST).
The white balance dictionary, indicated at 302, is represented as an N x 3 matrix, where N is the size of the WB dictionary. Each row in the matrix represents a set of gains: red gain, green gain and blue gain. For each set of parameters, the corresponding WB-RGB input is generated, as shown at 303, which feeds the encoder 304. The optimal linear decomposition weights (i.e. the importance for each WB-RGB result) are inferred, as shown at 305, and a linear combination is performed based on them, as shown at 306. The resulting intermediate RGB image 307 then proceeds to the next block in the pipeline (in this example, to the lens shading module for lens shading estimation) to give a further corrected RGB image 308.
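A corresponding sketch for the white balance module follows (same assumptions as the CCM sketch; the `inverse` flag selects between the forward multiplication and the backward division described above):

```python
import torch
import torch.nn as nn

class DictionaryWB(nn.Module):
    """Sketch of the dictionary-based white balance of Figure 3."""
    def __init__(self, n_wb: int = 16):
        super().__init__()
        # Dictionary 302: N learnable sets of per-channel RGB gains.
        self.gains = nn.Parameter(torch.ones(n_wb, 3))
        self.encoder = nn.Sequential(                 # encoder 304
            nn.Conv2d(3 * n_wb, 32, 1), nn.ReLU(),
            nn.Conv2d(32, n_wb, 1),
            nn.AdaptiveAvgPool2d(1),
            nn.Softmax(dim=1),
        )

    def forward(self, rgb, inverse: bool = False):    # rgb 301: (B, 3, H, W)
        n = self.gains.shape[0]
        g = self.gains.view(1, n, 3, 1, 1)
        x = rgb.unsqueeze(1)                          # (B, 1, 3, H, W)
        cands = x / g if inverse else x * g           # 303: N WB-RGB results
        b, _, _, h, w = cands.shape
        weights = self.encoder(cands.view(b, 3 * n, h, w))  # 305: (B, N, 1, 1)
        return (weights.view(b, n, 1, 1, 1) * cands).sum(1)  # 306 -> 307
```

Note that, as stated below, the WB encoder shares the architecture of the CCM encoder but not its parameters, so two separate instances would be trained.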
The model therefore simultaneously learns a dictionary and an encoder that finds a linear decomposition.
In a preferred implementation, the dictionary size N is set to 16. The encoder 304 preferably has the same architecture as the encoder 204 used in the CCM step, but it is not the same encoder.
In one implementation, the ISP comprises a lens shading correction module.
Figures 4(a) and 4(b) show an example of the lens shading correction (LSC) effect. As shown in Figure 4(a), the linear RAW image from the sensor has an undesirable shading effect produced by various physical phenomena related with the camera’s lens. The ISP can perform a lens shading correction (LSC) operation to remove this effect and improve the final RGB image quality. The RGB image after LSC is illustrated in Figure 4(b).
An example of the lens shading modelling will now be described.
The lens shading learnable module of the ISP is preferably based on Gaussian lens shading model estimation. It is also preferably based on an attention mechanism to fine-tune and correct deviations from the Gaussian model.
In a preferred implementation, the lens shading correction is performed by the composition of an estimated attention mask and a Gaussian mask (controlled by parameters mu and sigma) that is then multiplied by each pixel (forward mode) or divided by each pixel (backward mode).
Figure 5 schematically illustrates an example of a lens shading learnable module of the ISP described herein for the RGB to RAW (inverse) transformation. The intermediate RGB image after undergoing the inverse WB operation is shown at 501. The image 501 is the input to the module.
In order to recover further details and to correct deviations from the Gaussian model, a simple CNN is trained. As shown at 502, in a preferred implementation, the CNN comprises three layers of 16 filters with 3x3 kernels. The last layer is followed by a sigmoid activation that provides the attention mask 503. The attention mask is combined with the previously estimated Gaussian mask 504. The previously estimated probabilistic mask (Gaussian mask) 504 is generated as follows, as indicated at 505. A generated RAW image is shown at 506. Noisy lens shading masks 507 are extracted from the generated RAW image 506 as an element-wise gain from selected samples using the real RAW image 508.
As shown at 509, after filtering the signal, a Gaussian kernel is fitted to give the estimated probabilistic mask 504.
The combination of the attention mask 503 and the estimated probabilistic mask 504 is carried out by an element-wise multiplication. The result is a pseudo lens shading mask 510 applied as a pixel-wise gain that emulates the real lens shading effect and thus improves the RAW image reconstruction to give the RGB image after LSE, shown at 511.
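The following sketch illustrates the two components of this module (PyTorch is assumed; the radial parameterisation of the Gaussian mask and the single-channel output head of the CNN are illustrative, as the exact forms fitted at 509 and used at 502 are not specified):

```python
import torch
import torch.nn as nn

def gaussian_mask(h, w, mu, sigma, device=None):
    """Radial Gaussian mask 504 controlled by mu and sigma (a sketch)."""
    ys = torch.linspace(-1.0, 1.0, h, device=device)
    xs = torch.linspace(-1.0, 1.0, w, device=device)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    r2 = (xx - mu[0]) ** 2 + (yy - mu[1]) ** 2
    return torch.exp(-r2 / (2.0 * sigma ** 2))

class AttentionMask(nn.Module):
    """CNN 502: three layers of 16 filters (3x3) and a sigmoid output 503."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

# Pseudo lens shading mask 510 and its application:
#   mask = attention_mask * gaussian_mask      (element-wise multiplication)
#   out  = image * mask   # forward mode: emulate the shading effect
#   out  = image / mask   # backward mode: correct the shading effect
```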
The complete process is learnable and allows one to understand the lens shading effect and improve the quality of the reconstructed images using it.
This module is especially useful for modelling the reverse ISP path (RGB to RAW). However, the module can be omitted from the pipeline if required, for example while modelling Linear HDR to Linear RAW.
The inverse operation is lens shading correction, which can be performed in a similar manner for the transformation of a RAW image (or an intermediate image derived from the RAW image) to an RGB image.
In embodiments of the present invention comprising a color correction module, an additional loss function for a different color domain (i.e. different to the RGB domain) may also be used. For example, a loss function determined in the chroma domain may also be used. This may further improve the appearance and quality of the generated images from the ISP. This may be performed as described below (see the sketch after this list):
1. RGB to YUV color domain conversion (linear function).
2. Separate the luma (Y) and chroma (UV) components of the image in the YUV color domain.
3. Calculate the loss function (L2) between the generated image's chroma component and the real image's chroma component.
4. Add the previous loss as an extra term in the final loss.

A variation of the color correction module may also be implemented in which strictly the chroma component of the images is used, to ensure that the transformation only changes the color of the images.
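A sketch of steps 1 to 4 above (PyTorch is assumed; the BT.601 conversion matrix is an assumption, as the description only states that the RGB to YUV conversion is linear):

```python
import torch

# BT.601 RGB -> YUV matrix (an assumed choice of linear conversion).
_RGB2YUV = torch.tensor([[ 0.299,  0.587,  0.114],
                         [-0.147, -0.289,  0.436],
                         [ 0.615, -0.515, -0.100]])

def chroma_loss(generated, real):
    """Steps 1-3: convert to YUV, drop luma, take L2 on chroma (UV)."""
    m = _RGB2YUV.to(generated.device)
    gen_yuv = torch.einsum("ij,bjhw->bihw", m, generated)
    real_yuv = torch.einsum("ij,bjhw->bihw", m, real)
    return torch.mean((gen_yuv[:, 1:] - real_yuv[:, 1:]) ** 2)

# Step 4 (lambda_chroma is an illustrative weighting term):
#   total_loss = base_loss + lambda_chroma * chroma_loss(generated, real)
```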
In some examples, the ISP described herein may conveniently be trained end-to-end (i.e. such that it is differentiable) with paired and/or unpaired examples, thus allowing an unsupervised domain transfer solution for unpaired data. This approach, together with the modularity and adaptability of the model, allows the model to learn more complex domain adaptations, such as Linear HDR to Linear RAW, from little unpaired data.
As shown in Figure 6, the ISP model may be trained using a training dataset 601. The training image 602 is unpaired. The image 602 is input to the ISP 603 to produce a RAW generated image 604. An unpaired real RAW image is shown at 605.
The ISP model adversarial training for unpaired data can be performed using a Generative Adversarial Network (GAN) discriminator 606, which can learn to discriminate between the real image 605 and generated image 604 and provide an adversarial loss.
The model is preferably treated as a GAN, where the ISP model acts as a generator. During training, several fine-tuning rounds may be applied while tracking minimax GAN loss. The use of unpaired training examples may result in more efficient training of the ISP model.
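A sketch of the corresponding adversarial losses (PyTorch is assumed; the non-saturating generator loss is used here as a common stand-in for the minimax formulation tracked during training, and the `discriminator` callable, mapping a RAW image to a realism logit, is illustrative):

```python
import torch
import torch.nn.functional as F

def adversarial_losses(discriminator, real_raw, generated_raw):
    """Sketch of GAN losses for unpaired training; the ISP is the generator."""
    d_real = discriminator(real_raw)                # real image 605
    d_fake = discriminator(generated_raw.detach())  # generated image 604
    # Discriminator 606 learns to separate real from generated RAW images.
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    # Generator (the ISP) is pushed to make generated images look real.
    g_fake = discriminator(generated_raw)
    g_loss = F.binary_cross_entropy_with_logits(g_fake, torch.ones_like(g_fake))
    return d_loss, g_loss
```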
The model may alternatively be trained using paired training examples.
In summary, as discussed above, the approach provides a computer-implemented method for transforming an input image in an image processor comprising a plurality of processing modules configured to operate in series. The method comprises, in one or more of the modules, independently implementing one or more trained artificial intelligence models according to first and second operation modes that perform and invert a respective image transformation respectively on the input image or an intermediate image derived from the input image (for example, an intermediate stage of the total RAW to RGB or RGB to RAW transformation).
Figure 7 shows an example of apparatus 700 comprising an imaging device 701, for example a camera, configured to use the method described herein to process image data captured by at least one image sensor in the device. The device 701 comprises image sensors 702, 703. Such a device 701 typically includes some onboard processing capability. This could be provided by the processor 704. The processor 704 could also be used for the essential functions of the device.
One or more of the image sensors 702, 703 may be used to generate a RAW image that may be converted to an RGB image using the image signal processing pipeline described above.
The transceiver 705 is capable of communicating over a network with other entities 710, 711. Those entities may be physically remote from the device 701. The network may be a publicly accessible network such as the internet. The entities 710, 711 may be based in the cloud. Entity 710 is a computing entity. Entity 711 is a command and control entity. These entities are logical entities. In practice they may each be provided by one or more physical devices such as servers and data stores, and the functions of two or more of the entities may be provided by a single physical device. Each physical device implementing an entity comprises a processor and a memory. The devices may also comprise a transceiver for transmitting and receiving data to and from the transceiver 705 of device 701. The memory stores, in a non-transient way, code that is executable by the processor to implement the respective entity in the manner described herein.
The command and control entity 711 may train the model(s) used in the device. This is typically a computationally intensive task, even though the resulting model(s) may be efficiently described, so it may be efficient for the development of the algorithm to be performed in the cloud, where it can be anticipated that significant energy and computing resource is available. It can be anticipated that this is more efficient than forming such a model at a typical imaging device.
In one implementation, once the algorithms have been developed in the cloud, the command and control entity can automatically form a corresponding model and cause it to be transmitted to the relevant imaging device. In this example, the model is implemented at the device 701 by processor 704.
In another possible implementation, an image may be captured by one or both of the sensors 702, 703 and the image data may be sent by the transceiver 705 to the cloud for processing. The resulting image could then be sent back to the device 701, as shown at 712 in Figure 7.
Therefore, the method may be deployed in multiple ways, for example in the cloud, on the device, or alternatively in dedicated hardware. As indicated above, the cloud facility could perform training to develop new algorithms or refine existing ones. Depending on the compute capability near to the data corpus, the training could either be undertaken close to the source data, or could be undertaken in the cloud, e.g. using an inference engine. The method may also be implemented at the device, in a dedicated piece of hardware, or in the cloud.
Examples of qualitative results are shown in Figures 8 to 12.
Figure 8 shows an RGB image 801 taken on the Huawei P20 Pro smartphone and the RAW image 802 generated from the RGB image 801 using the approach described herein. Figures 9(a) and 9(b) show a comparison between generated RAW images using the approach described herein (Figure 9(a)) and a real smartphone RAW image (Figure 9(b)). Figures 10(a) and 10(b) show a further comparison between generated RAW images using the approach described herein (Figure 10(a)) and a real smartphone RAW image (Figure 10(b)).
Figures 11(a) to 11(d) show examples of qualitative results on a denoising task. Figure 11(a) shows results from a model trained with the method of Brooks et al.; Figure 11(b) shows results from a model trained using the approach described herein. Figure 11(c) shows the original noisy image. Figure 11(d) shows a ground truth image. It can be seen that use of the described ISP model can generate higher-quality synthetic data and therefore the model may achieve better performance at low-level computer vision tasks, such as RAW image denoising.
Figures 12(a) and 12(b) show examples of qualitative results on a linear HDR to linear RAW transformation using the approach described herein with the unsupervised method. Figure 12(a) shows an HDR-RGB image from a Huawei P40 smartphone. Figure 12(b) shows a generated linear RAW image using the approach described herein.
The approach described herein retains the benefits of classical ISP approaches in terms of efficiency, interpretability and modularity. Moreover, the approach may result in improved image processing by using CNNs for tone mapping, a lens shading learnable module and dictionary learning for core image processing operations such as the CCM and WB.
The pipeline can conveniently be trained end-to-end and there is no need to manually fine-tune parameters. The pipeline can be trained using paired data or unpaired data (i.e. adversarial training).
The pipeline advantageously exhibits modularity and interpretability. One can inspect, add, or modify any desired module.
The advantages of embodiments of the present invention compared to prior art methods are further summarized in Table 1 below.
Table 1: Summary comparison between the present ISP and prior art approaches. Embodiments of the present invention do not require prior information from the camera it is desired to model; obtaining such information is usually an expensive and time-consuming process.
The approach described herein can learn and model image processing operations within an ISP in a modular fashion (e.g. CCM, WB, LSC and tone mapping, among others), the camera's lens shading effect, piece-wise linear function approximations for tone mapping (via CNNs) and different representations of the ISP intermediate results. Previous approaches generally cannot learn or model this important information. Previous methods model the ISP from a specific camera; the present approach can model multiple cameras at the same time as a result of the dictionary-based decomposition of WB and CCM.
The present approach can be adapted to perform multiple domain transformations, such as RGB to RAW, Linear HDR to Linear RAW, etc. By definition, the described model is reversible with little effort (RGB to RAW and RAW to RGB).
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.

Claims

1. An image processor (100) comprising a plurality of processing modules (102, 103, 104, 105, 106) configured to operate in series to transform an input image, one or more of the plurality of modules (102, 104, 105, 106) being configured to independently implement one or more trained artificial intelligence models, wherein each of the modules of the image processor has first (102a, 106a) and second (102b, 106b) operation modes that perform and invert a respective image transformation respectively on the input image or an intermediate image derived from the input image.
2. The image processor as claimed in claim 1, wherein the image processor comprises multiple modules (102, 104, 105, 106) each configured to implement a trained artificial intelligence model, wherein the respective trained artificial intelligence models are trainable end-to-end.
3. The image processor as claimed in claim 2, wherein the respective trained artificial intelligence models are end-to-end trainable using paired or unpaired (602, 605) training images.
4. The image processor of any preceding claim, wherein one or more of the modules (103) is configured to implement a model comprising predefined parameters.
5. The image processor as claimed in any preceding claim, wherein one or more of the modules (102, 106) is configured to implement a different model (102a, 102b, 106a, 106b) for each of the first (102a, 106a) and second (102b, 106b) operation modes of the respective module.
6. The image processor as claimed in any preceding claim, wherein the plurality of modules (102, 103, 104, 105, 106) of the image signal processor are collectively configured to perform or invert a total image transformation of the input image (101).
7. The image processor as claimed in any preceding claim, wherein the image processor comprises a module (104) configured to perform color correction, said module being configured to jointly learn a plurality of colour correction matrices (202) and a decoder that during inference is configured to estimate and apply a linear decomposition to the corresponding colour correction matrices.
8. The image processor of claim 7, wherein for the first operation mode that performs the color correction, the module is configured to multiply a color correction matrix by each pixel of the input image or intermediate image, and for the second operation mode that inverts the color correction, the module is configured to multiply a color correction matrix pseudo-inverse by each pixel of the input image or intermediate image.
9. The image processor as claimed in any preceding claim, wherein the image processor comprises a module (105) configured to perform white balancing, said module being configured to jointly learn a plurality of white balance vectors (302) and a decoder that during inference is configured to estimate and apply a linear decomposition to the corresponding white balance vectors.
10. The image processor of any of claims 7 to 9, wherein the linear decomposition(s) is/are in the form of a vector parameter.
11. The image processor of any preceding claim, wherein the image processor comprises a module (106) configured to perform lens shading correction, said module being configured to perform the lens shading correction by the composition of an estimated attention mask (503) and a Gaussian mask (504) that is multiplied or divided by each pixel of the input image or the intermediate image (501) to perform or invert the lens shading correction respectively.
12. The image processor as claimed in any preceding claim, wherein the image processor comprises a module (102) configured to perform tone mapping, said module being configured to implement a neural network (102b) to perform the tone mapping, and a different neural network (102a) to invert the tone mapping.
13. The image processor as claimed in any preceding claim, wherein the processor comprises a module configured to perform and invert one or more of the following image transformations: lens shading correction (106), white balance (105), color correction (104), gamma decompression (103) and tone mapping (102).
14. The image processor as claimed in any preceding claim, wherein the image processor is pipelined and each module (103, 104, 105, 106) except for the first module (102) in the pipeline is configured to take an input from a preceding module.
15. The image processor as claimed in any preceding claim, wherein the input image is a RAW image or an RGB image (101).
16. An image processing apparatus (700) comprising the image processor (100) as claimed in any preceding claim and an imaging device (701), wherein the image processor is configured to receive the input image from the imaging device.
17. A computer-implemented method for transforming an input image in an image processor comprising a plurality of processing modules (102, 103, 104, 105, 106) configured to operate in series, the method comprising, in one or more of the modules (102, 104, 105, 106), independently implementing one or more trained artificial intelligence models according to first (102a, 106a) and second (102b, 106b) operation modes that perform and invert a respective image transformation respectively on the input image (101) or an intermediate image derived from the input image.
PCT/EP2021/056588 2021-03-16 2021-03-16 Modular and learnable image signal processor WO2022194345A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2021/056588 WO2022194345A1 (en) 2021-03-16 2021-03-16 Modular and learnable image signal processor


Publications (1)

Publication Number Publication Date
WO2022194345A1 (en) 2022-09-22

Family

ID=75111571

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/056588 WO2022194345A1 (en) 2021-03-16 2021-03-16 Modular and learnable image signal processor

Country Status (1)

Country Link
WO (1) WO2022194345A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115314604A (en) * 2022-10-12 2022-11-08 杭州魔点科技有限公司 Method and system for generating color correction matrix, electronic device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020126023A1 (en) * 2018-12-21 2020-06-25 Huawei Technologies Co., Ltd. Image processor
WO2020215180A1 (en) * 2019-04-22 2020-10-29 华为技术有限公司 Image processing method and apparatus, and electronic device


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Brooks et al., "Unprocessing Images for Learned Raw Denoising", CVPR 2019
Patrick Hansen et al., "ISP4ML: Understanding the Role of Image Signal Processing in Efficient Deep Learning Vision Systems", arXiv.org, Cornell University Library, 18 November 2019, XP081535058 *
Zamir et al., "CycleISP: Real Image Restoration via Improved Data Synthesis", CVPR 2020
Zamir, Syed Waqas, et al., "CycleISP: Real Image Restoration via Improved Data Synthesis", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 13 June 2020, pages 2693-2702, XP033805021, DOI: 10.1109/CVPR42600.2020.00277 *



Legal Events

121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21713343; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 21713343; Country of ref document: EP; Kind code of ref document: A1)