US20220156899A1 - Electronic device for estimating camera illuminant and method of the same


Info

Publication number
US20220156899A1
Authority
US
United States
Prior art keywords
image
color
neural network
camera
values
Prior art date
Legal status
Abandoned
Application number
US17/377,656
Inventor
Abdelrahman Abdelhamed
Abhijith Punnappurath
Michael Scott Brown
Current Assignee
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd.
Priority to US17/377,656
Assigned to SAMSUNG ELECTRONICS CO., LTD. Assignors: BROWN, MICHAEL SCOTT; PUNNAPPURATH, Abhijith; ABDELHAMED, Abdelrahman
Priority to PCT/KR2021/016244 (published as WO2022103121A1)
Publication of US20220156899A1
Legal status: Abandoned

Classifications

    • G06T5/92
    • G06T5/007 Dynamic range modification
    • G06T5/009 Global, i.e. based on properties of the image as a whole
    • H04N9/67 Circuits for processing colour signals for matrixing
    • G06T3/40 Scaling the whole image or part thereof
    • G06T7/11 Region-based segmentation
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Image registration using feature-based methods
    • G06T7/90 Determination of colour characteristics
    • G06V10/141 Control of illumination
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/56 Extraction of image or video features relating to colour
    • G06V10/82 Image or video recognition or understanding using neural networks
    • H04N23/10 Cameras or camera modules comprising electronic image sensors, for generating image signals from different wavelengths
    • H04N23/60 Control of cameras or camera modules
    • H04N23/88 Camera processing pipelines for processing colour signals for colour balance, e.g. white-balance circuits or colour temperature control
    • H04N23/90 Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
    • H04N5/247
    • H04N9/73 Colour balance circuits, e.g. white balance circuits or colour temperature control
    • G06T2207/10024 Color image
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20132 Image cropping

Definitions

  • the disclosure relates to a system and method for estimating a scene illumination using a neural network configured to predict the scene illumination based on two or more images of the same scene that are simultaneously captured by two or more cameras having different spectral sensitivities, and performing white balance corrections on the captured images.
  • illuminant estimation is a critical step for computational color constancy.
  • Color constancy refers to the ability of the human visual system to perceive scene colors as being the same even when observed under different illuminations. Cameras do not innately possess this illumination adaptation ability, and a raw-RGB image recorded by a camera sensor has significant color cast due to the scene's illumination. As a result, computational color constancy is applied to the camera's raw-RGB sensor image as one of the first steps in the in-camera imaging pipeline to remove this undesirable color cast.
  • color constancy is achieved using (1) a statistics-based method or (2) a learning-based method.
  • Statistics-based methods operate using statistics from an image's color distribution and spatial layout to estimate the scene illuminant. These statistics-based methods are fast and easy to implement. However, these statistics-based methods make very strong assumptions about scene content and fail in cases where these assumptions do not hold.
  • Learning-based methods use labelled training data where the ground truth illumination corresponding to each input image is known from physical color charts placed in the scene.
  • learning-based approaches are shown to be more accurate than statistics-based methods.
  • learning-based methods in the related art usually include many more parameters than statistics-based ones. The number of parameters may reach tens of millions in some models, which results in a relatively longer training time.
  • One or more example embodiments provide a system and method for estimating a scene illumination using a neural network configured to predict the scene illumination based on two or more images of the same scene that are simultaneously captured by two or more cameras having different spectral sensitivities.
  • the multiple-camera setup may provide a benefit of improving the accuracy of illuminant estimation.
  • an apparatus for processing image data may include: a memory storing instructions; and a processor configured to execute the instructions to: obtain a first image and a second image that capture a same scene in different views, from a first camera and a second camera, respectively; spatially align the first image with the second image; obtain a color transformation matrix that maps the first image to the second image based on color values of the first image and the second image; obtain an estimated illuminant color from an output of a neural network by inputting the color transformation matrix to the neural network, wherein the neural network is trained based on a pair of reference images of a same reference scene and a color rendition chart that are captured by different cameras having different spectral sensitivities; and perform a white balance correction on the first image based on the estimated illuminant color to output a corrected first image.
  • the neural network may be trained to minimize a loss between the estimated illuminant color and a ground-truth illuminant color, and the ground-truth illuminant color may be obtained from a color value of at least one achromatic patch in the color rendition chart.
  • the second image may show a wider view of the same scene than the first image
  • the processor may be further configured to execute the instructions to: crop the second image to have a same view as the first image, to spatially align the first image with the cropped second image.
  • the processor may be further configured to execute the instructions to: down-sample the first image to obtain a down-sampled first image; down-sample the cropped second image to obtain a down-sampled second image; and compute the color transformation matrix that maps the down-sampled first image to the down-sampled second image based on color values of the down-sampled first image and the down-sampled second image.
  • the color transformation matrix may be a three-by-three matrix that maps RGB values of the first image to RGB values of the second image.
  • the output of the neural network may represent a ratio of RGB values of the estimated illuminant color.
  • the neural network may be further trained using augmented images, and the augmented images may be obtained by re-illuminating a first reference image and a second reference image of different scenes under different illuminations that are captured by a same reference camera, based on color transformations between first color chart values of the first reference image and second color chart values of the second reference image.
  • the neural network may be further trained using augmented images, and the augmented images may be obtained by re-illuminating a first reference image and a second reference image of different scenes under different illuminations that are captured by a same reference camera, based on color transformations between all color values of the first reference image and all color values of the second reference image.
  • the color transformation matrix may correspond to a first color transformation matrix.
  • the processor may be further configured to execute the instructions to: obtain, from a third camera, a third image that captures the same scene in a view different from the views of the first image and the second image; spatially align the third image with the first image; spatially align the third image with the second image; obtain a second color transformation matrix that maps the first image to the third image based on the color values of the first image and color values of the third image; obtain a third color transformation matrix that maps the second image to the third image based on the color values of the second image and the color values of the third image; concatenate the first, the second, and the third color transformation matrices to obtain a concatenated matrix; obtain the estimated illuminant color from the output of the neural network by inputting the concatenated matrix to the neural network; and perform the white balance correction on the first image based on the estimated illuminant color to output the corrected first image.
  • the apparatus may be a user device in which the first camera and the second camera are mounted, and the first camera and the second camera may have different fields of view and different spectral sensitivities.
  • the apparatus may be a server including a communication interface configured to communicate with a user device including the first camera and the second camera, to receive the first image and the second image from the user device.
  • a method for processing image data may include: obtaining a first image and a second image that capture a same scene in different views, from a first camera and a second camera, respectively; spatially aligning the first image with the second image; obtaining a color transformation matrix that maps the first image to the second image based on color values of the first image and the second image; obtaining an estimated illuminant color from an output of a neural network by inputting the color transformation matrix to the neural network, wherein the neural network is trained based on a pair of reference images of a same reference scene and a color rendition chart that are captured by different cameras having different spectral sensitivities; and performing a white balance correction on the first image based on the estimated illuminant color to output a corrected first image.
  • the neural network may be trained to minimize a loss between the estimated illuminant color and a ground-truth illuminant color, and wherein the ground-truth illuminant color may be obtained from a color value of at least one achromatic patch in the color rendition chart.
  • the second image may show a wider view of the same scene than the first image, and the method may further include: cropping the second image to have a same view as the first image, to spatially align the first image with the cropped second image.
  • the method may further include: down-sampling the first image to obtain a down-sampled first image; down-sampling the cropped second image to obtain a down-sampled second image; and computing the color transformation matrix that maps the down-sampled first image to the down-sampled second image based on color values of the down-sampled first image and the down-sampled second image.
  • the color transformation matrix may be a three-by-three matrix that maps RGB values of the first image to RGB values of the second image.
  • the output of the neural network may represent a ratio of RGB values of the estimated illuminant color.
  • the neural network may be further trained using augmented images, and the augmented images may be obtained by re-illuminating a first reference image and a second reference image of different scenes under different illuminations that are captured by a same reference camera, based on color transformations between first color chart values of the first reference image and second color chart values of the second reference image.
  • the color transformation matrix may correspond to a first color transformation matrix.
  • the method may further include: obtaining, from a third camera, a third image that captures the same scene in a view different from the views of the first image and the second image; spatially aligning the third image with the first image; spatially aligning the third image with the second image; obtaining a second color transformation matrix that maps the first image to the third image based on the color values of the first image and color values of the third image; obtaining a third color transformation matrix that maps the second image to the third image based on the color values of the second image and the color values of the third image; concatenating the first, the second, and the third color transformation matrices to obtain a concatenated matrix; obtaining the estimated illuminant color from the output of the neural network by inputting the concatenated matrix to the neural network; and performing the white balance correction on the first image based on the estimated illuminant color to output the corrected first image.
  • a non-transitory computer readable storage medium storing a program executable by at least one processor to perform a method for processing image data, including: obtaining a first image and a second image that capture a same scene in different views, from a first camera and a second camera, respectively; spatially aligning the first image with the second image; obtaining a color transformation matrix that maps the first image to the second image based on color values of the first image and the second image; obtaining an estimated illuminant color from an output of a neural network by inputting the color transformation matrix to the neural network, wherein the neural network is trained based on a pair of reference images of a same reference scene and a color rendition chart that are captured by different cameras having different spectral sensitivities; and performing a white balance correction on the first image based on the estimated illuminant color to output a corrected first image.
  • FIG. 1 is a diagram of a system for performing image processing using a pair of cameras according to an embodiment
  • FIG. 2 is a diagram of a user device and spectral sensitivities of a pair of cameras mounted on the user device according to an embodiment
  • FIG. 3 illustrates a warp and crop operation according to an embodiment
  • FIG. 4 is a diagram of a neural network for estimating illumination of a scene captured by a pair of cameras according to an embodiment
  • FIG. 5 is a diagram of devices of the system for performing the image processing according to an embodiment
  • FIG. 6 is a diagram of components of the devices of FIG. 5 according to an embodiment
  • FIG. 7 is a diagram of a system for training a neural network of FIG. 4 according to an embodiment
  • FIG. 8 illustrates a data augmentation process according to an embodiment
  • FIG. 9 illustrates a data augmentation process based on full matrix transformation between color rendition charts captured in images according to an embodiment
  • FIG. 10 illustrates a data augmentation process based on diagonal transformation between illuminants according to an embodiment
  • FIG. 11 illustrates a data augmentation process based on full matrix transformation between images according to an embodiment
  • FIG. 12 is a diagram of a system for performing image processing using more than two cameras according to an embodiment.
  • Example embodiments of the present disclosure are directed to estimating a scene illumination in the RGB color space of camera sensors, and applying a matrix computed from estimated scene illumination parameters to perform a white-balance correction.
  • FIG. 1 is a diagram of a method for estimating illumination of a physical scene using a neural network according to an embodiment.
  • an image signal processing is performed using a pair of images of the same physical scene that are simultaneously captured by two different cameras, a first camera 111 and a second camera 112 .
  • both the illuminant for the first camera 111 and the illuminant for the second camera 112 are predicted, but for simplicity, the method shown in FIG. 1 focuses on estimating the illuminant for the first camera 111 .
  • the two cameras 111 and 112 may have different focal lengths and lens configurations to allow a user device (e.g., a smartphone) 110 to deliver DSLR-like optical capabilities of providing a wide-angle view and a telephoto. Also, the two cameras 111 and 112 may have different spectral sensitivities and therefore may provide different spectral measurements of the physical scene.
  • Graphs (a) and (b) shown in FIG. 2 represent the spectral sensitivities of the first camera 111 and the second camera 112 in RGB channels, respectively.
  • the pitch of photodiodes and the overall resolutions of the two image sensors (e.g., charge-coupled device (CCD) sensors) mounted in the first camera 111 and the second camera 112 may be different from each other to accommodate the different optics associated with each sensor.
  • different color filter arrays (CFA) may be used in the first camera 111 and the second camera 112 according to the different optics, which may result in the different spectral sensitivities to incoming light as shown in graphs (a) and (b) of FIG. 2 .
  • the first camera 111 and the second camera 112 may simultaneously capture a first (unprocessed) raw-RGB image and a second (unprocessed) raw-RGB image of the same scene, respectively, that provide different spectral measurements of the scene.
  • the first raw-RGB image and the second raw-RGB image may have different views while capturing the same scene.
  • the image signal processing according to an embodiment of the present disclosure may use the color values of the scene captured with the different spectral sensitivities to estimate the scene illumination since the color values are correlated with the scene illumination.
  • the image signal processing may include: image alignment operation S 110 for spatially aligning a pair of images, color transformation operation S 120 for computing color transformation between the images, illumination estimation operation S 130 for estimating the scene illumination using a neural network, and white balance operation S 140 for correcting scene colors in the images based on the estimated scene illumination.
  • a global homography may be used to align two different images of the same scene having different fields of view, and then down-sampling is performed on the aligned two images, prior to computing color transformation between the two images.
  • down-sampling S 111 and S 113 and warping and cropping S 112 are performed to register the pair of the first raw-RGB image and the second raw-RGB image, which capture the same scene but have different fields of view.
  • the first raw-RGB image is downscaled by a preset factor (e.g., a factor of six) in operation S 111 .
  • either or both of image warping and image cropping S 112 are performed on the second raw-RGB image to align the second raw-RGB image with the first raw-RGB image.
  • the second raw-RGB image is cropped to have the same size of the field of view as the first raw-RGB image.
  • any one or any combination of transformation, rotation, and translation may be applied to the second raw-RGB image so that the same objects in the first raw-RGB image and the second raw-RGB image are located at the same pixel coordinates.
  • FIG. 3 illustrates a warp and crop operation according to an embodiment of the disclosure.
  • a pre-calibrated perspective transform H is calculated between the first and second cameras 111 and 112 , and the perspective transform H is applied to the second raw-RGB image to align the second raw-RGB image with the first raw-RGB image.
  • the first camera 111 and the second camera 112 may capture a preset pattern to obtain image 1 and image 2 , respectively.
  • At least four points $x'_1$, $x'_2$, $x'_3$, and $x'_4$, with $x'_i = (x'_i, y'_i, 1)^\top$, are selected from image 1 to compute the perspective transform H.
  • The vector $h = [h_1, h_2, h_3, h_4, h_5, h_6, h_7, h_8, h_9]$ holding the entries of H is obtained from these point correspondences.
  • the warp and crop operation for a new scene is performed by applying the perspective transform H to an image captured by the second camera 112 (e.g., the second raw-RGB image).
  • the warp and crop operation may be performed only once for the two cameras 111 and 112 , rather than being performed individually for new images captured by the cameras 111 and 112 .
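  • As an illustration of the pre-calibrated warp-and-crop and down-sampling steps described above (operations S 112 , S 111 , and S 113 ), the following is a minimal Python sketch assuming OpenCV; the factor-of-six default follows the example in the text, while the function names and the calibration-point arrays are illustrative assumptions rather than part of the disclosure.

```python
import cv2
import numpy as np

def calibrate_homography(pts_image2, pts_image1):
    """Pre-calibrate the perspective transform H that maps the second camera's
    view onto the first camera's view, from four corresponding points picked on
    a captured calibration pattern (cv2.findHomography can be used when more
    than four points are available).  pts_image2, pts_image1: (4, 2) arrays."""
    H = cv2.getPerspectiveTransform(np.float32(pts_image2),
                                    np.float32(pts_image1))
    return H  # 3x3 matrix; computed once per device and then reused

def warp_to_first_view(img2, H, out_size):
    """Warp (and implicitly crop) the second raw-RGB image so the same objects
    land on the same pixel coordinates as in the first image.
    out_size is (width, height) of the first camera's image."""
    return cv2.warpPerspective(img2, H, out_size)

def downsample(img, factor=6):
    """Down-sample by a preset factor (a factor of six is the example given in
    the text) to make the later color fit robust to small misalignments."""
    h, w = img.shape[:2]
    return cv2.resize(img, (w // factor, h // factor),
                      interpolation=cv2.INTER_AREA)
```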
  • the down-sampling S 111 and the down-sampling S 113 may use the same down-sampling factor to allow the down-sampled first raw-RGB image and the down-sampled second raw-RGB image to have substantially the same resolution.
  • the present embodiment is not limited thereto, and different down-sampling factors may be used for the down-sampling S 111 and the down-sampling S 113 .
  • the first processing pipeline including operation S 111 and the second processing pipeline including operations S 112 and S 113 may be executed in parallel or in sequence.
  • the down-sampling S 111 and the down-sampling S 113 prior to computing the color transformation may make the illumination estimation robust to any small misalignments and slight parallax in the two views. Since the hardware arrangement of the two cameras 111 and 112 does not change for a given device (e.g., the user device 110 ), the homography can be pre-computed and remains fixed for all image pairs from the same device.
  • a color transformation matrix is computed to map the down-sampled first raw-RGB image from the first camera 111 to the corresponding aligned and down-sampled second raw-RGB image from the second camera 112 .
  • the color transformation between the two different images of the same scene may have a unique signature that is related to the scene illumination. Accordingly, the color transformation itself may be used as the feature for illumination estimation.
  • T is computed using the pseudo-inverse, as described below.
  • the linear color transformation T may be represented as a 3×3 color transformation matrix:
  • $T_{3\times 3} = \begin{pmatrix} t_1 & t_2 & t_3 \\ t_4 & t_5 & t_6 \\ t_7 & t_8 & t_9 \end{pmatrix}$
  • A denotes pixel values in R, G, B color channels for the down-sampled first raw-RGB image
  • B denotes pixel values in R, G, B color channels for the aligned and down-sampled second raw-RGB image
  • the 3×3 color transformation matrix T between A and B is calculated as $T = (A^\top A)^{-1} A^\top B$.
  • the three columns correspond to R, G, B color channels, and the rows correspond to the number of pixels in the down-sampled first raw-RGB image and the aligned and down-sampled second raw-RGB image, respectively.
  • the 3×3 color transformation matrix is used since it is linear, accurate, and computationally efficient.
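  • The least-squares computation of this 3×3 color transformation can be sketched as follows in Python with NumPy; A and B follow the definitions given above, while the function name and usage lines are illustrative assumptions.

```python
import numpy as np

def color_transform(A, B):
    """Least-squares 3x3 color transform T mapping the down-sampled first image
    to the aligned, down-sampled second image.

    A, B: (N, 3) arrays; rows are pixels, columns are the R, G, B channels.
    T minimizes ||A @ T - B||, i.e. T = (A^T A)^{-1} A^T B."""
    T, *_ = np.linalg.lstsq(A, B, rcond=None)
    return T  # (3, 3); its nine entries are flattened and fed to the network

# Illustrative usage: flatten both down-sampled raw-RGB images to (N, 3).
# A = img1_small.reshape(-1, 3).astype(np.float64)
# B = img2_small_aligned.reshape(-1, 3).astype(np.float64)
# T = color_transform(A, B)
```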
  • a neural network trained for estimating the illumination of the scene receives, as input, the color transformation, and outputs a two-dimensional (2D) chromaticity value that corresponds to the illumination estimation of the scene.
  • the 2D chromaticity value may be represented by a ratio of R, G, and B values, such as 2D [R/G B/G].
  • the estimated illumination $\hat{L}$ may be expressed as $\hat{L} = [R/G,\ B/G]$, with the green channel value set to 1.
  • the neural network may include an input layer having nine (9) nodes for receiving the nine (9) parameters of the 3×3 color transformation matrix, an output layer having two nodes for outputting the 2D chromaticity value, and a set of hidden layers placed between the input layer and the output layer.
  • each hidden layer may include nine (9) nodes.
  • the neural network according to an example embodiment may be required to process only the nine parameters in the color transformation matrix, and as a result, the neural network is very lightweight compared with other image processing networks, and therefore is capable of being efficiently run on-device in real time.
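  • A minimal sketch of such a lightweight network, assuming PyTorch; the depth default of five hidden layers and the ReLU activation are assumptions, since the text specifies only nine-node input/hidden layers and a two-node output.

```python
import torch
import torch.nn as nn

class IlluminantNet(nn.Module):
    """9 inputs (flattened 3x3 color transform) -> nine-neuron hidden layers
    -> 2 outputs ([R/G, B/G] chromaticity)."""

    def __init__(self, num_hidden_layers: int = 5):
        super().__init__()
        layers = []
        for _ in range(num_hidden_layers):
            layers += [nn.Linear(9, 9), nn.ReLU()]  # nine neurons per hidden layer
        layers.append(nn.Linear(9, 2))              # two-node output layer
        self.net = nn.Sequential(*layers)

    def forward(self, t_flat: torch.Tensor) -> torch.Tensor:
        # t_flat: (batch, 9) flattened color transformation matrices
        return self.net(t_flat)
```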
  • a method and a system for training the neural network will be described later with reference to FIG. 7 .
  • a white balance gain of the first raw-RGB image is adjusted based on the estimated illumination of the light source at the scene.
  • Parameters such as the R gain and the B gain (i.e., the gain values for the red color channel and the blue color channel) are calculated based upon a preset algorithm.
  • white balance correction factors are selected for the first raw-RGB image based on the estimated illumination, and each color component (e.g., R_WB, G_WB, B_WB) of the first raw-RGB image is multiplied with its respective correction factor (e.g., α, β, γ) to obtain white-balanced color components (e.g., αR_WB, βG_WB, γB_WB).
  • a R/G correction factor and a B/G correction factor may be computed based on the estimated illumination, to adjust the R/G gain and B/G gain of the first raw-RGB image.
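  • A hedged sketch of this white-balance step: given the estimated [R/G, B/G] illuminant chromaticity, each channel is scaled by a correction factor. Dividing by the illuminant values (a von Kries style diagonal correction) and the function name are assumptions; the disclosure states only that each color component is multiplied by its respective correction factor.

```python
import numpy as np

def white_balance(raw_img, est_rg, est_bg):
    """Apply per-channel white-balance correction factors to a raw-RGB image
    given the estimated illuminant chromaticity [R/G, B/G] (G taken as 1)."""
    illuminant = np.array([est_rg, 1.0, est_bg], dtype=np.float64)
    gains = 1.0 / illuminant                        # correction factors per channel
    corrected = raw_img.astype(np.float64) * gains  # broadcast over (H, W, 3)
    return np.clip(corrected, 0.0, None)
```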
  • FIG. 5 is a diagram of devices for performing the illumination estimation according to an embodiment.
  • FIG. 5 includes a user device 110 , a server 120 , and a network 130 .
  • the user device 110 and the server 120 may interconnect via wired connections, wireless connections, or a combination of wired and wireless connections.
  • the user device 110 includes one or more devices configured to generate an output image.
  • the user device 110 may include a computing device (e.g., a desktop computer, a laptop computer, a tablet computer, a handheld computer, a smart speaker, a server, etc.), a mobile phone (e.g., a smart phone, a radiotelephone, etc.), a camera device, a wearable device (e.g., a pair of smart glasses or a smart watch), or a similar device.
  • the server 120 includes one or more devices configured to train a neural network for predicting the scene illumination using camera images to correct scene colors in the camera images.
  • the server 120 may be a server, a computing device, or the like.
  • the server 120 may receive camera images from an external device (e.g., the user device 110 or another external device), train a neural network for predicting illumination parameters using the camera images, and provide the trained neural network to the user device 110 to permit the user device 110 to generate an output image using the neural network.
  • the network 130 includes one or more wired and/or wireless networks.
  • network 130 may include a cellular network (e.g., a fifth generation (5G) network, a long-term evolution (LTE) network, a third generation (3G) network, a code division multiple access (CDMA) network, etc.), a public land mobile network (PLMN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a telephone network (e.g., the Public Switched Telephone Network (PSTN)), a private network, an ad hoc network, an intranet, the Internet, a fiber optic-based network, or the like, and/or a combination of these or other types of networks.
  • the number and arrangement of devices and networks shown in FIG. 5 are provided as an example. In practice, there may be additional devices and/or networks, fewer devices and/or networks, different devices and/or networks, or differently arranged devices and/or networks than those shown in FIG. 5 . Furthermore, two or more devices shown in FIG. 5 may be implemented within a single device, or a single device shown in FIG. 5 may be implemented as multiple, distributed devices. Additionally, or alternatively, a set of devices (e.g., one or more devices) may perform one or more functions described as being performed by another set of devices.
  • FIG. 6 is a diagram of components of one or more devices of FIG. 5 according to an embodiment.
  • Device 200 may correspond to the user device 110 and/or the server 120 .
  • the device 200 may include a bus 210 , a processor 220 , a memory 230 , a storage component 240 , an input component 250 , an output component 260 , and a communication interface 270 .
  • the bus 210 includes a component that permits communication among the components of the device 200 .
  • the processor 220 is implemented in hardware, firmware, or a combination of hardware and software.
  • the processor 220 is a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), a microprocessor, a microcontroller, a digital signal processor (DSP), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or another type of processing component.
  • the processor 220 includes one or more processors capable of being programmed to perform a function.
  • the memory 230 includes a random access memory (RAM), a read only memory (ROM), and/or another type of dynamic or static storage device (e.g., a flash memory, a magnetic memory, and/or an optical memory) that stores information and/or instructions for use by the processor 220 .
  • the storage component 240 stores information and/or software related to the operation and use of the device 200 .
  • the storage component 240 may include a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, and/or a solid state disk), a compact disc (CD), a digital versatile disc (DVD), a floppy disk, a cartridge, a magnetic tape, and/or another type of non-transitory computer-readable medium, along with a corresponding drive.
  • the input component 250 includes a component that permits the device 200 to receive information, such as via user input (e.g., a touch screen display, a keyboard, a keypad, a mouse, a button, a switch, and/or a microphone).
  • the input component 250 may include a sensor for sensing information (e.g., a global positioning system (GPS) component, an accelerometer, a gyroscope, and/or an actuator).
  • the input component 250 may include two or more cameras, including the first camera 111 and the second camera 112 illustrated in FIG. 2 .
  • the first camera 111 and the second camera 112 may be rear-facing cameras that have different spectral sensitivities and have different fields of view from each other.
  • the output component 260 includes a component that provides output information from the device 200 (e.g., a display, a speaker, and/or one or more light-emitting diodes (LEDs)).
  • the communication interface 270 includes a transceiver-like component (e.g., a transceiver and/or a separate receiver and transmitter) that enables the device 200 to communicate with other devices, such as via a wired connection, a wireless connection, or a combination of wired and wireless connections.
  • the communication interface 270 may permit device 200 to receive information from another device and/or provide information to another device.
  • the communication interface 270 may include an Ethernet interface, an optical interface, a coaxial interface, an infrared interface, a radio frequency (RF) interface, a universal serial bus (USB) interface, a Wi-Fi interface, a cellular network interface, or the like.
  • the device 200 may perform one or more processes described herein.
  • the device 200 may perform operations S 110 -S 140 based on the processor 220 executing software instructions stored by a non-transitory computer-readable medium, such as the memory 230 and/or the storage component 240 .
  • a computer-readable medium is defined herein as a non-transitory memory device.
  • a memory device includes memory space within a single physical storage device or memory space spread across multiple physical storage devices.
  • Software instructions may be read into the memory 230 and/or the storage component 240 from another computer-readable medium or from another device via the communication interface 270 .
  • software instructions stored in the memory 230 and/or storage component 240 may cause the processor 220 to perform one or more processes described herein.
  • hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein.
  • embodiments described herein are not limited to any specific combination of hardware circuitry and software.
  • FIG. 7 is a diagram of a system for training a neural network of FIG. 4 according to an embodiment.
  • the training process may be performed by the user device 110 or the server 120 , using the components illustrated in FIG. 6 .
  • the neural network is trained to predict the illuminant for the first camera 111 and the illuminant for the second camera 112 using the same color transforms, but for simplicity, the description of the training process in the present disclosure focuses on estimating the illuminant for the first camera 111 .
  • a network training process is performed using a pair of images of the same physical scene that are simultaneously captured by two different cameras 111 and 112 .
  • the two cameras 111 and 112 may have different spectral sensitivities and therefore may provide different spectral measurements for the same scene having the same light source.
  • the first camera 111 and the second camera 112 may simultaneously capture a first raw-RGB image and a second raw-RGB image of the same scene, respectively, that provide different spectral measurements of the scene.
  • the first raw-RGB image and the second raw-RGB image may have different views while capturing the same scene.
  • the first camera 111 and the second camera 112 may capture a color rendition chart as shown in FIG. 7 .
  • the color rendition chart may allow the first raw-RGB image and the second raw-RGB image to provide a wide distribution of colors under the scene illumination.
  • the neutral patches (also referred to as “achromatic patches” or “gray patches”) of the color rendition chart in the first raw-RGB image may provide a ground truth illumination value (e.g., a ground-truth illuminant color) for the first raw-RGB image.
  • the neutral patches in the second raw-RGB image may provide a ground truth illumination value for the second raw-RGB image.
  • the first raw-RGB image and the second raw-RGB image may be referred to as image 1 and image 2 .
  • image 1 and image 2 are spatially aligned with each other, for example, using a global homography.
  • image 2 is cropped to have the same size of the field of view as image 1 , and any one or any combination of transformation, rotation, and translation is applied to image 2 so that the same objects (e.g., the slide) in image 1 and image 2 are located at the same pixel coordinates.
  • the aligned image 1 and image 2 are down-sampled prior to computing color transformation between image 1 and image 2 .
  • the down-sampling may make the illumination estimation robust to any small misalignments and slight parallax in the two views of images 1 and 2 . Since the hardware arrangement of the two cameras 111 and 112 does not change for a given device, the homography can be pre-computed and remains fixed for all image pairs from the same device.
  • a color transformation matrix is computed to map the down-sampled image 1 from the first camera 111 to the corresponding aligned and down-sampled image from the second camera 112 .
  • the color transformation matrix may be computed based on Equations (1) and (2).
  • a neural network for estimating the illumination of the scene is constructed to have the structure shown in FIG. 4 .
  • the neural network may include an input layer having nine (9) nodes for receiving the nine (9) parameters of a 3×3 color transformation matrix, an output layer having two nodes for outputting the 2D chromaticity value, and a set of hidden layers placed between the input layer and the output layer.
  • the neural network according to an example embodiment may be required to process only the nine parameters in the color transformation matrix, and as a result, the neural network is very lightweight compared with other image processing networks, and therefore is capable of being efficiently run on-device in real time.
  • the neural network receives, as input, the parameters of the color transformation matrix, and outputs a two-dimensional (2D) chromaticity value that corresponds to the illumination estimation of the scene.
  • the 2D chromaticity value may be represented as 2D [R/G B/G], indicating a ratio of a red color value to a green color value, and a ratio of a blue color value to the green color value.
  • A set of color transformations $T = \{T_1, \ldots, T_M\}$ may be computed from M pairs of captured images, where $(I_{11}, I_{21})$ may denote image 1 and image 2 , and $T_1$ may denote the color transformation between image 1 and image 2 .
  • the training process according to the embodiment is described using the pair of images 1 and 2 , but a large number of paired images may be used for training the neural network.
  • Augmented training images may be developed by applying mathematical transformation functions to camera captured images. The description of data augmentation will be provided later with reference to FIGS. 8 to 11 .
  • a set of corresponding target ground truth illuminations L of image I 1i (i.e., as measured by the first camera 111 ) is obtained from each pair of images as follows:
  • the ground truth illumination L 1 may denote a ground truth illumination of image 1 .
  • the ground truth illumination $L_1$ may be obtained by extracting the image area of the neutral patches from image 1 and measuring pixel colors of the neutral patches, since the neutral patches work as a good reflector of the scene illumination. For example, average pixel colors $L_1 = [R_{avg}, G_{avg}, B_{avg}]$ inside the neutral patches may be used as the ground truth illumination $L_1$ for image 1 .
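  • A minimal sketch of extracting the ground-truth illuminant from the neutral patches; the boolean patch mask and the final G-normalization are assumptions (the text only specifies averaging the pixel colors inside the neutral patches).

```python
import numpy as np

def ground_truth_illuminant(image, neutral_patch_mask):
    """Average the pixel colors inside the neutral (achromatic) patches of the
    color rendition chart to obtain L_1 = [R_avg, G_avg, B_avg].

    image: (H, W, 3) raw-RGB image containing the chart.
    neutral_patch_mask: boolean (H, W) mask selecting neutral-patch pixels
    (how the chart is located in the image is outside this sketch)."""
    patch_pixels = image[neutral_patch_mask]   # (N, 3)
    L = patch_pixels.mean(axis=0)              # [R_avg, G_avg, B_avg]
    return L / L[1]                            # normalize so G = 1 (assumption)
```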
  • the neural network $f_\theta: T \rightarrow L$ is trained with parameters $\theta$ to model the mapping between the color transformations T and scene illuminations L.
  • the neural network $f_\theta$ may predict the scene illumination L for the first camera 111 given the color transformation T between image 1 and image 2 , as $\hat{L} = f_\theta(T)$.
  • the neural network $f_\theta$ is trained to minimize the loss between the predicted illuminations $\hat{L}_i$ and the ground truth illuminations $L_i$.
  • the neural network is lightweight, for example, consisting of a small number (e.g., 2, 5, or 16) of dense layers, wherein each layer has nine neurons only.
  • the total number of parameters may range from 200 parameters for the 2-layer neural network up to 1460 parameters for the 16-layer neural network.
  • the input to the neural network is the flattened nine values of the color transformation T and the output is two values corresponding to the illumination estimation in the 2D [R/G B/G] chromaticity color space where the green channel's value may be set to 1.
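  • A sketch of the training loop for $f_\theta$, assuming PyTorch; the MSE loss and the Adam optimizer are assumptions, since the text states only that a loss between predicted and ground-truth illuminations is minimized. The `model` argument may be any network with nine inputs and two outputs, such as the IlluminantNet sketch above.

```python
import torch
import torch.nn as nn

def train_illuminant_net(model, transforms_T, illuminants_L,
                         epochs=500, lr=1e-3):
    """Train f_theta : T -> L on pairs of flattened 3x3 color transforms and
    ground-truth illuminant chromaticities.

    transforms_T: (M, 9) tensor of flattened color transforms.
    illuminants_L: (M, 2) tensor of ground-truth [R/G, B/G] values."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()   # assumed loss; the text does not name one
    for _ in range(epochs):
        optimizer.zero_grad()
        pred = model(transforms_T)              # L_hat_i = f_theta(T_i)
        loss = criterion(pred, illuminants_L)   # loss(L_hat_i, L_i)
        loss.backward()
        optimizer.step()
    return model
```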
  • the user device 110 or the server 120 may use the neural network that has been trained by an external device without performing an additional training process on the user device 110 or the server 120 , or alternatively may continue to train the neural network in real time on the user device 110 or the server 120 .
  • FIG. 8 illustrates a data augmentation process according to an embodiment.
  • a data augmentation process may be performed to increase the number of training samples and the generalizability of the model according to an example embodiment.
  • image I 1 is captured under a source illuminant L 1 [r 1 , g 1 , b 1 ] and includes a color rendition chart.
  • Image I 1 is re-illuminated to obtain image I 1 ′ which appears to be captured under the target illuminant L 2 [r 2 , g 2 , b 2 ].
  • Image I 1 ′ as well as image I 1 may be used to train the neural network.
  • FIG. 9 illustrates a data augmentation process based on a full matrix transformation between color rendition charts captured in images according to an embodiment.
  • a pair of captured images I 1 and I 2 are used to obtain a re-illuminated image I 1 ′ that includes the same image content as the captured image I 1 but has different color values from the captured image I 1 .
  • the captured image I 1 and captured image I 2 are images captured by the same camera (e.g., the first camera 111 ), under different light sources, illuminant L 1 and illuminant L 2 , respectively.
  • the captured image I 1 and captured image I 2 both include a color rendition chart captured therein.
  • the color rendition chart is extracted from each of the captured image I 1 and the captured image I 2 .
  • a color transformation matrix T is computed based on the color chart values of the captured image I 1 and the color chart values of the captured image I 2 .
  • the color transformation matrix T may convert the color chart values of the captured image I 1 to the color chart values of the captured image I 2 .
  • the color transformation matrix T is applied to the captured image I 1 to transform approximately all the colors in the captured image I 1 and thereby to obtain the re-illuminated image I 1 ′ which appears to be captured under illuminant L 2 .
  • FIG. 9 shows augmentation of an image pair from the first camera 111 only
  • the corresponding pair of images from the second camera 112 is augmented in the same way.
  • the captured image I 2 as well as the captured image I 1 is re-illuminated in a similar manner, based on a color transformation matrix that transforms the color chart values of the captured image I 2 to the color chart values of the captured image I 1 .
  • the color values of the color chart patches are extracted from each image.
  • a color transformation $T^{C}_{1i \rightarrow 1j} \in \mathbb{R}^{3 \times 3}$ between each pair of images $(I_{1i}, I_{1j})$ from the first camera 111 is obtained based only on the color chart values from the two images $(I_{1i}, I_{1j})$, as follows:
  • $T^{C}_{1i \rightarrow 1j} = (I_{1i}^{\top} I_{1i})^{-1} I_{1i}^{\top} I_{1j}$
  • $T^{C}_{2i \rightarrow 2j} = (I_{2i}^{\top} I_{2i})^{-1} I_{2i}^{\top} I_{2j}$
  • This bank of color transformations is applied to augment images by re-illuminating any given pair of images from the two cameras $(I_{1i}, I_{2i})$ to match their colors to any target pair of images $(I_{1j}, I_{2j})$, for example as $I_{1i \rightarrow j} = I_{1i}\, T^{C}_{1i \rightarrow 1j}$ and $I_{2i \rightarrow j} = I_{2i}\, T^{C}_{2i \rightarrow 2j}$.
  • $i \rightarrow j$ means re-illuminating image i to match the colors of image j.
  • the number of training image pairs may be increased from M to $M^2$.
  • approximately all colors may be transformed since the color rendition charts included in the images provide a wide distribution of colors.
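  • A sketch of the chart-based augmentation described above; the chart patch colors are assumed to be given as (num_patches, 3) arrays, and the helper names are illustrative.

```python
import numpy as np

def chart_transform(chart_src, chart_tgt):
    """T^C = (C_src^T C_src)^{-1} C_src^T C_tgt, where each chart argument is a
    (num_patches, 3) array of color-chart patch values read from one image."""
    T, *_ = np.linalg.lstsq(chart_src, chart_tgt, rcond=None)
    return T

def reilluminate(image_src, chart_src, chart_tgt):
    """Re-illuminate image_src so it appears to have been captured under the
    illuminant of the target image, by applying the chart-derived 3x3 transform
    to (approximately) all pixels."""
    T = chart_transform(chart_src, chart_tgt)
    h, w, _ = image_src.shape
    return (image_src.reshape(-1, 3).astype(np.float64) @ T).reshape(h, w, 3)
```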
  • the data augmentation process is not limited to the method of using the color rendition charts as shown in FIG. 9 , and different data augmentation methods may be applied as shown in FIGS. 10 and 11 .
  • FIG. 10 illustrates a data augmentation process based on a diagonal transformation between illuminants according to an embodiment.
  • a source illuminant L 1 [r 1 , g 1 , b 1 ] and a target illuminant L 2 [r 2 , g 2 , b 2 ] are identified from images I 1 and I 2 that are captured by the same camera (e.g., the first camera 111 ).
  • a color transformation between the source illuminant L 1 [r 1 , g 1 , b 1 ] and the target illuminant L 2 [r 2 , g 2 , b 2 ] may be obtained as a diagonal matrix, for example $\mathrm{diag}(r_2/r_1,\ g_2/g_1,\ b_2/b_1)$.
  • The color transformation is applied to image I 1 to change the color values of image I 1 and thereby to obtain image I 1 ′, which appears to be captured under the target illuminant L 2 [r 2 , g 2 , b 2 ].
  • Image I 1 ′ as well as image I 1 may be used to train the neural network.
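  • A sketch of the diagonal re-illumination described with reference to FIG. 10; the diag(r2/r1, g2/g1, b2/b1) form and the function name are assumptions consistent with a diagonal transformation between illuminants.

```python
import numpy as np

def diagonal_reilluminate(image, source_illuminant, target_illuminant):
    """Re-illuminate `image` from L1 = [r1, g1, b1] to L2 = [r2, g2, b2] with
    the diagonal transform diag(r2/r1, g2/g1, b2/b1)."""
    src = np.asarray(source_illuminant, dtype=np.float64)
    tgt = np.asarray(target_illuminant, dtype=np.float64)
    gains = tgt / src                       # per-channel scale factors
    return image.astype(np.float64) * gains
```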
  • FIG. 11 illustrates a data augmentation process based on a full matrix transformation between images according to an embodiment.
  • a color transformation matrix T is obtained using all image colors of image I 1 and all image colors of Image I 2 , unlike the embodiment of FIG. 9 in which the color chart values extracted from images I 1 and I 2 are used to calculate the color transformation matrix T.
  • a color rendition chart may be omitted from images I 1 and I 2 , and instead, images I 1 and I 2 may be required to capture a scene having a wide distribution of colors. Also, the color transformation matrix T may be computed individually for each image pair.
  • FIG. 12 is a diagram of a system for performing image processing using more than two cameras according to an embodiment.
  • 3×3 color transformation matrices are constructed independently using the process described with reference to FIG. 1 .
  • color transformation matrices are then concatenated and fed as input to the neural network.
  • the feature vector that is input to the network has a size of nine times the number of camera pairs (e.g., 27 values for three cameras).
  • raw-RGB image 1 , raw-RGB image 2 , and raw-RGB image 3 are captured by camera 1 , camera 2 , and camera 3 , respectively.
  • the raw-RGB image 1 and the raw-RGB image 2 are aligned with each other and down-sampled for calculation of a first color transformation between the down-sampled raw-RGB image 1 and the aligned and down-sampled raw-RGB image 2 .
  • the raw-RGB image 1 and the raw-RGB image 3 are aligned with each other and down-sampled for calculation of a second color transformation between the down-sampled raw-RGB image 1 and the aligned and down-sampled raw-RGB image 3 .
  • the raw-RGB image 2 and the raw-RGB image 3 are aligned with each other and down-sampled for calculation of a third color transformation between the down-sampled raw-RGB image 2 and the aligned and down-sampled raw-RGB image 3 .
  • the first color transformation, the second color transformation, and the third color transformation are concatenated at a concatenation layer, and then are fed as input to a neural network for estimating the scene illumination.
  • Each of the first color transformation, the second color transformation, and the third color transformation may be a 3×3 matrix.
  • the neural network may have an input layer having 27 nodes for receiving 27 parameters of the concatenated matrices, an output layer having 2 nodes for outputting a 2D chromaticity value for correcting color values of the raw-RGB image 1 , and a set of hidden layers located between the input layer and the output layer.
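  • A sketch of building the concatenated feature vector for a multi-camera setup (e.g., 27 values for three cameras); the pair ordering and helper names are illustrative assumptions.

```python
import numpy as np
import torch

def multicamera_feature(images):
    """Build the concatenated network input for N aligned, down-sampled
    raw-RGB images (one per camera): a 3x3 color transform per camera pair,
    flattened and concatenated (nine values per pair)."""
    pixels = [img.reshape(-1, 3).astype(np.float64) for img in images]
    feats = []
    for i in range(len(pixels)):
        for j in range(i + 1, len(pixels)):
            T, *_ = np.linalg.lstsq(pixels[i], pixels[j], rcond=None)
            feats.append(T.reshape(-1))    # nine parameters per camera pair
    return torch.tensor(np.concatenate(feats), dtype=torch.float32)
```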
  • the term "component" is intended to be broadly construed as hardware, firmware, or a combination of hardware and software.
  • the expression, “at least one of a, b, and c,” should be understood as including only a, only b, only c, both a and b, both a and c, both b and c, all of a, b, and c, or any variations of the aforementioned examples.

Abstract

A method for processing image data may include: obtaining a first image and a second image that capture a same scene in different views, from a first camera and a second camera, respectively; spatially aligning the first image with the second image; obtaining a color transformation matrix that maps the first image to the second image based on color values of the first image and the second image; obtaining an estimated illuminant color from an output of a neural network by inputting the color transformation matrix to the neural network, wherein the neural network is trained based on a pair of reference images of a same reference scene and a color rendition chart that are captured by different cameras having different spectral sensitivities; and performing a white balance correction on the first image based on the estimated illuminant color to output a corrected first image.

Description

    CROSS-REFERENCE TO RELATED APPLICATION(S)
  • This application is based on and claims priority under 35 U.S.C. § 119 to U.S. Provisional Patent Application No. 63/114,079 filed on Nov. 16, 2020, and U.S. Provisional Patent Application No. 63/186,346 filed on May 10, 2021, in the U.S. Patent & Trademark Office, the disclosures of which are incorporated herein by reference in their entireties.
  • BACKGROUND 1. Field
  • The disclosure relates to a system and method for estimating a scene illumination using a neural network configured to predict the scene illumination based on two or more images of the same scene that are simultaneously captured by two or more cameras having different spectral sensitivities, and performing white balance corrections on the captured images.
  • 2. Description of Related Art
  • In processing camera captured images, illuminant estimation is a critical step for computational color constancy. Color constancy refers to the ability of the human visual system to perceive scene colors as being the same even when observed under different illuminations. Cameras do not innately possess this illumination adaptation ability, and a raw-RGB image recorded by a camera sensor has significant color cast due to the scene's illumination. As a result, computational color constancy is applied to the camera's raw-RGB sensor image as one of the first steps in the in-camera imaging pipeline to remove this undesirable color cast.
  • In the related art, color constancy is achieved using (1) a statistics-based method or (2) a learning-based method.
  • Statistics-based methods operate using statistics from an image's color distribution and spatial layout to estimate the scene illuminant. These statistics-based methods are fast and easy to implement. However, these statistics-based methods make very strong assumptions about scene content and fail in cases where these assumptions do not hold.
  • Learning-based methods use labelled training data where the ground truth illumination corresponding to each input image is known from physical color charts placed in the scene. In general, learning-based approaches are shown to be more accurate than statistics-based methods. However, learning-based methods in the related art usually include many more parameters than statistics-based ones. The number of parameters may reach tens of millions in some models, which results in a relatively longer training time.
  • SUMMARY
  • One or more example embodiments provide a system and method for estimating a scene illumination using a neural network configured to predict the scene illumination based on two or more images of the same scene that are simultaneously captured by two or more cameras having different spectral sensitivities. The multiple-camera setup may provide a benefit of improving the accuracy of illuminant estimation.
  • According to an aspect of an example embodiment, an apparatus for processing image data, may include: a memory storing instructions; and a processor configured to execute the instructions to: obtain a first image and a second image that capture a same scene in different views, from a first camera and a second camera, respectively; spatially align the first image with the second image; obtain a color transformation matrix that maps the first image to the second image based on color values of the first image and the second image; obtain an estimated illuminant color from an output of a neural network by inputting the color transformation matrix to the neural network, wherein the neural network is trained based on a pair of reference images of a same reference scene and a color rendition chart that are captured by different cameras having different spectral sensitivities; and perform a white balance correction on the first image based on the estimated illuminant color to output a corrected first image.
  • The neural network may be trained to minimize a loss between the estimated illuminant color and a ground-truth illuminant color, and the ground-truth illuminant color may be obtained from a color value of at least one achromatic patch in the color rendition chart.
  • The second image may show a wider view of the same scene than the first image, and the processor may be further configured to execute the instructions to: crop the second image to have a same view as the first image, to spatially align the first image with the cropped second image.
  • The processor may be further configured to execute the instructions to: down-sample the first image to obtain a down-sampled first image; down-sample the cropped second image to obtain a down-sampled second image; and compute the color transformation matrix that maps the down-sampled first image to the down-sampled second image based on color values of the down-sampled first image and the down-sampled second image.
  • The color transformation matrix may be a three-by-three matrix that maps RGB values of the first image to RGB values of the second image.
  • The output of the neural network may represent a ratio of RGB values of the estimated illuminant color.
  • The neural network may be further trained using augmented images, and the augmented images may be obtained by re-illuminating a first reference image and a second reference image of different scenes under different illuminations that are captured by a same reference camera, based on color transformations between first color chart values of the first reference image and second color chart values of the second reference image.
  • The neural network may be further trained using augmented images, and the augmented images may be obtained by re-illuminating a first reference image and a second reference image of different scenes under different illuminations that are captured by a same reference camera, based on color transformations between all color values of the first reference image and all color values of the second reference image.
  • The color transformation matrix may correspond to a first color transformation matrix. The processor may be further configured to execute the instructions to: obtain, from a third camera, a third image that captures the same scene in a view different from the views of the first image and the second image; spatially align the third image with the first image; spatially align the third image with the second image; obtain a second color transformation matrix that maps the first image to the third image based on the color values of the first image and color values of the third image; obtain a third color transformation matrix that maps the second image to the third image based on the color values of the second image and the color values of the third image; concatenate the first, the second, and the third color transformation matrices to obtain a concatenated matrix; obtain the estimated illuminant color from the output of the neural network by inputting the concatenated matrix to the neural network; and perform the white balance correction on the first image based on the estimated illuminant color to output the corrected first image.
  • The apparatus may be a user device in which the first camera and the second camera are mounted, and the first camera and the second camera may have different fields of view and different spectral sensitivities.
  • The apparatus may be a server including a communication interface configured to communicate with a user device including the first camera and the second camera, to receive the first image and the second image from the user device.
  • According to an aspect of an example embodiment, a method for processing image data may include: obtaining a first image and a second image that capture a same scene in different views, from a first camera and a second camera, respectively; spatially aligning the first image with the second image; obtaining a color transformation matrix that maps the first image to the second image based on color values of the first image and the second image; obtaining an estimated illuminant color from an output of a neural network by inputting the color transformation matrix to the neural network, wherein the neural network is trained based on a pair of reference images of a same reference scene and a color rendition chart that are captured by different cameras having different spectral sensitivities; and performing a white balance correction on the first image based on the estimated illuminant color to output a corrected first image.
  • The neural network may be trained to minimize a loss between the estimated illuminant color and a ground-truth illuminant color, and wherein the ground-truth illuminant color may be obtained from a color value of at least one achromatic patch in the color rendition chart.
  • The second image may show a wider view of the same scene than the first image, and the method may further include: cropping the second image to have a same view as the first image, to spatially align the first image with the cropped second image.
  • The method may further include: down-sampling the first image to obtain a down-sampled first image; down-sampling the cropped second image to obtain a down-sampled second image; and computing the color transformation matrix that maps the down-sampled first image to the down-sampled second image based on color values of the down-sampled first image and the down-sampled second image.
  • The color transformation matrix may be a three-by-three matrix that maps RGB values of the first image to RGB values of the second image.
  • The output of the neural network may represent a ratio of RGB values of the estimated illuminant color.
  • The neural network may be further trained using augmented images, and the augmented images may be obtained by re-illuminating a first reference image and a second reference image of different scenes under different illuminations that are captured by a same reference camera, based on color transformations between first color chart values of the first reference image and second color chart values of the second reference image.
  • The color transformation matrix may correspond to a first color transformation matrix. The method may further include: obtaining, from a third camera, a third image that captures the same scene in a view different from the views of the first image and the second image; spatially aligning the third image with the first image; spatially aligning the third image with the second image; obtaining a second color transformation matrix that maps the first image to the third image based on the color values of the first image and color values of the third image; obtaining a third color transformation matrix that maps the second image to the third image based on the color values of the second image and the color values of the third image; concatenating the first, the second, and the third color transformation matrices to obtain a concatenated matrix; obtaining the estimated illuminant color from the output of the neural network by inputting the concatenated matrix to the neural network; and performing the white balance correction on the first image based on the estimated illuminant color to output the corrected first image.
  • According to an aspect of an example embodiment, a non-transitory computer readable storage medium storing a program to be executable by at least one processor to perform a method for processing image data, including: obtaining a first image and a second image that capture a same scene in different views, from a first camera and a second camera, respectively; spatially aligning the first image with the second image; obtaining a color transformation matrix that maps the first image to the second image based on color values of the first image and the second image; obtaining an estimated illuminant color from an output of a neural network by inputting the color transformation matrix to the neural network, wherein the neural network is trained based on a pair of reference images of a same reference scene and a color rendition chart that are captured by different cameras having different spectral sensitivities; and performing a white balance correction on the first image based on the estimated illuminant color to output a corrected first image.
  • Additional aspects will be set forth in part in the description that follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments of the disclosure.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The above and other aspects, features, and aspects of embodiments of the disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
  • FIG. 1 is a diagram of a system for performing image processing using a pair of cameras according to an embodiment;
  • FIG. 2 is a diagram of a user device and spectral sensitivities of a pair of cameras mounted on the user device according to an embodiment;
  • FIG. 3 illustrates a warp and crop operation according to an embodiment;
  • FIG. 4 is a diagram of a neural network for estimating illumination of a scene captured by a pair of cameras according to an embodiment;
  • FIG. 5 is a diagram of devices of the system for performing the image processing according to an embodiment;
  • FIG. 6 is a diagram of components of the devices of FIG. 5 according to an embodiment;
  • FIG. 7 is a diagram of a system for training a neural network of FIG. 4 according to an embodiment;
  • FIG. 8 illustrates a data augmentation process according to an embodiment;
  • FIG. 9 illustrates a data augmentation process based on full matrix transformation between color rendition charts captured in images according to an embodiment;
  • FIG. 10 illustrates a data augmentation process based on diagonal transformation between illuminants according to an embodiment;
  • FIG. 11 illustrates a data augmentation process based on full matrix transformation between images according to an embodiment; and
  • FIG. 12 is a diagram of a system for performing image processing using more than two cameras according to an embodiment.
  • DETAILED DESCRIPTION
  • The following detailed description of example embodiments refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements.
  • Example embodiments of the present disclosure are directed to estimating a scene illumination in the RGB color space of camera sensors, and applying a matrix computed from estimated scene illumination parameters to perform a white-balance correction.
  • FIG. 1 is a diagram of a method for estimating illumination of a physical scene using a neural network according to an embodiment.
  • As shown in FIG. 1, image signal processing is performed using a pair of images of the same physical scene that are simultaneously captured by two different cameras, a first camera 111 and a second camera 112. According to embodiments of the present disclosure, both the illuminant for the first camera 111 and the illuminant for the second camera 112 are predicted, but for simplicity, the method shown in FIG. 1 focuses on estimating the illuminant for the first camera 111.
  • Referring to FIG. 2, the two cameras 111 and 112 may have different focal lengths and lens configurations to allow a user device (e.g., a smartphone) 110 to deliver DSLR-like optical capabilities of providing a wide-angle view and a telephoto view. Also, the two cameras 111 and 112 may have different spectral sensitivities and therefore may provide different spectral measurements of the physical scene.
  • Graphs (a) and (b) shown in FIG. 2 represent the spectral sensitivities of the first camera 111 and the second camera 112 in RGB channels, respectively.
  • For example, the pitch of photodiodes and the overall resolutions of the two image sensors (e.g., charge-coupled device (CCD) sensors) mounted in the first camera 111 and the second camera 112 may be different from each other to accommodate the different optics associated with each sensor. Also, different color filter arrays (CFA) may be used in the first camera 111 and the second camera 112 according to the different optics, which may result in the different spectral sensitivities to incoming light as shown in graphs (a) and (b) of FIG. 2.
  • The first camera 111 and the second camera 112 may simultaneously capture a first (unprocessed) raw-RGB image and a second (unprocessed) raw-RGB image of the same scene, respectively, that provide different spectral measurements of the scene.
  • The first raw-RGB image and the second raw-RGB image may have different views while capturing the same scene. The image signal processing according to an embodiment of the present disclosure may use the color values of the scene captured with the different spectral sensitivities to estimate the scene illumination since the color values are correlated with the scene illumination.
  • Referring back to FIG. 1, the image signal processing may include: image alignment operation S110 for spatially aligning a pair of images, color transformation operation S120 for computing color transformation between the images, illumination estimation operation S130 for estimating the scene illumination using a neural network, and white balance operation S140 for correcting scene colors in the images based on the estimated scene illumination.
  • In image alignment operation S110, a global homography may be used to align two different images of the same scene having different fields of view, and then down-sampling is performed on the aligned two images, prior to computing color transformation between the two images.
  • Specifically, down-sampling S111 and S113 and warping and cropping S112 are performed to register the pair of the first raw-RGB image and the second raw-RGB image, which capture the same scene but have different fields of view.
  • In a first processing pipeline, the first raw-RGB image is downscaled by a preset factor (e.g., a factor of six) in operation S111.
  • In a second processing pipeline, either or both of image warping and image cropping S112 are performed on the second raw-RGB image to align the second raw-RGB image with the first raw-RGB image. For example, in the second processing pipeline, the second raw-RGB image is cropped to have the same size of the field of view as the first raw-RGB image. Additionally, any one or any combination of transformation, rotation, and translation may be applied to the second raw-RGB image so that the same objects in the first raw-RGB image and the second raw-RGB image are located at the same pixel coordinates.
  • FIG. 3 illustrates a warp and crop operation according to an embodiment of the disclosure. A pre-calibrated perspective transform H is calculated between the first and second cameras 111 and 112, and the perspective transform H is applied to the second raw-RGB image to align the second raw-RGB image with the first raw-RGB image.
  • As shown in FIG. 3, the first camera 1 and the second camera 2 may capture a preset pattern to obtain image 1 and image 2, respectively.
  • At least four points x′1, x′2, x′3, and x′4 are selected from image 1 to compute the perspective transform H.

  • $\mathbf{x}'_i = (x'_i,\ y'_i,\ 1)^T,\quad i = 1, 2, 3, 4$
  • The corresponding points x1, x2, x3, and x4 in image 2 are represented as follows:

  • $\mathbf{x}_i = (x_i,\ y_i,\ 1)^T,\quad i = 1, 2, 3, 4$
  • The vector h = [h1, h2, h3, h4, h5, h6, h7, h8, h9]T is obtained by solving the following linear system:
  • $$\begin{bmatrix} \mathbf{0}^T & -\mathbf{x}_1^T & y'_1\,\mathbf{x}_1^T \\ \mathbf{x}_1^T & \mathbf{0}^T & -x'_1\,\mathbf{x}_1^T \\ \mathbf{0}^T & -\mathbf{x}_2^T & y'_2\,\mathbf{x}_2^T \\ \mathbf{x}_2^T & \mathbf{0}^T & -x'_2\,\mathbf{x}_2^T \\ \vdots & \vdots & \vdots \\ \mathbf{0}^T & -\mathbf{x}_4^T & y'_4\,\mathbf{x}_4^T \\ \mathbf{x}_4^T & \mathbf{0}^T & -x'_4\,\mathbf{x}_4^T \end{bmatrix}_{8 \times 9} \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_9 \end{bmatrix}_{9 \times 1} = \mathbf{0}_{8 \times 1}$$
  • Using the vector h = [h1, h2, h3, h4, h5, h6, h7, h8, h9]T, the perspective transform H is obtained as follows:
  • $$H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}_{3 \times 3}$$
  • Once the perspective transform H is computed using the calibration pattern, the warp and crop operation for a new scene is performed by applying the perspective transform H to an image captured by the second camera 112 (e.g., the second raw-RGB image). In an example embodiment, the warp and crop operation may be performed only once for the two cameras 111 and 112, rather than being performed individually for new images captured by the cameras 111 and 112.
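  • For illustration only, the following sketch shows how the pre-calibrated perspective transform H could be estimated from the four point correspondences via the linear system above and then applied to warp the second image; the NumPy/OpenCV calls and function names are assumptions for the sketch, not part of the disclosed embodiment.

```python
import numpy as np
import cv2  # assumed available; used only for the final warp


def estimate_homography(pts_src, pts_dst):
    """Estimate H such that pts_dst ~ H @ pts_src (direct linear transform, as in the 8x9 system above).

    pts_src: Nx2 points in the second camera's image (image 2).
    pts_dst: Nx2 corresponding points in the first camera's image (image 1).
    """
    rows = []
    for (x, y), (xp, yp) in zip(pts_src, pts_dst):
        p = np.array([x, y, 1.0])
        rows.append(np.concatenate([np.zeros(3), -p, yp * p]))
        rows.append(np.concatenate([p, np.zeros(3), -xp * p]))
    A = np.stack(rows)                 # (2N x 9) coefficient matrix
    _, _, vt = np.linalg.svd(A)
    h = vt[-1]                         # null-space vector [h1 ... h9]
    return (h / h[-1]).reshape(3, 3)   # normalize (assumes h9 != 0)


def warp_and_crop(image2, H, out_size):
    # Warp image 2 into image 1's coordinate frame; out_size = (width, height) of image 1.
    return cv2.warpPerspective(image2, H, out_size)
```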
  • Once the second raw-RGB image is aligned with the first raw-RGB image, down-sampling S113 is performed on the aligned second raw-RGB image.
  • The down-sampling S111 and the down-sampling S113 may use the same down-sampling factor to allow the down-sampled first raw-RGB image and the down-sampled second raw-RGB image to have substantially the same resolution.
  • However, the present embodiment is not limited thereto, and different down-sampling factors may be used for the down-sampling S111 and the down-sampling S113. Also, the first processing pipeline including operation S111 and the second processing pipeline including operations S112 and S113 may be executed in parallel or in sequence.
  • The down-sampling S111 and the down-sampling S113 prior to computing the color transformation, may make the illumination estimation robust to any small misalignments and slight parallax in the two views. Since the hardware arrangement of the two cameras 111 and 112 does not change for a given device (e.g., the user device 110), the homography can be pre-computed and remains fixed for all image pairs from the same device.
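  • A minimal sketch of the down-sampling step, assuming raw-RGB images are given as H×W×3 NumPy arrays and the preset factor of six; block averaging is an illustrative choice of resampling method.

```python
import numpy as np


def downsample(image, factor=6):
    """Block-average down-sampling by an integer factor (e.g., the preset factor of six).

    image: HxWx3 raw-RGB array; rows/columns that do not fill a full block are trimmed.
    """
    h, w, c = image.shape
    h, w = h - h % factor, w - w % factor        # trim so the blocks tile exactly
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))              # (h/factor) x (w/factor) x 3
```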
  • In color transformation operation S120, a color transformation matrix is computed to map the down-sampled first raw-RGB image from the first camera 111 to the corresponding aligned and down-sampled second raw-RGB image from the second camera 112. For a particular scene illuminant, the color transformation between the two different images of the same scene may have a unique signature that is related to the scene illumination. Accordingly, the color transformation itself may be used as the feature for illumination estimation.
  • Given the first raw-RGB image $I_1 \in \mathbb{R}^{n \times 3}$ and the second raw-RGB image $I_2 \in \mathbb{R}^{n \times 3}$ with n pixels of the same scene captured by the first camera 111 and the second camera 112, under the same illumination $L \in \mathbb{R}^3$, there exists a linear color transformation $T \in \mathbb{R}^{3 \times 3}$ between the color values of the first raw-RGB image $I_1$ and the second raw-RGB image $I_2$ as:

  • $I_2 \approx I_1 T$  Equation (1)
  • such that T is unique to the scene illumination L.
  • T is computed using the pseudo inverse, as follows:

  • $T = (I_1^T I_1)^{-1} I_1^T I_2$  Equation (2)
  • For example, the linear color transformation T may be represented in a 3×3 color transformation matrix as follows:
  • $$T_{3 \times 3} = \begin{pmatrix} t_1 & t_2 & t_3 \\ t_4 & t_5 & t_6 \\ t_7 & t_8 & t_9 \end{pmatrix}$$
  • More specifically, given that A denotes the pixel values in the R, G, B color channels of the down-sampled first raw-RGB image and B denotes the pixel values in the R, G, B color channels of the aligned and down-sampled second raw-RGB image, the 3×3 color transformation matrix T between A and B is calculated as follows.
  • $$A \times T = B, \quad A = \begin{bmatrix} a_{1R} & a_{1G} & a_{1B} \\ a_{2R} & a_{2G} & a_{2B} \\ \vdots & \vdots & \vdots \\ a_{NR} & a_{NG} & a_{NB} \end{bmatrix}, \quad T = \begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{bmatrix}, \quad B = \begin{bmatrix} b_{1R} & b_{1G} & b_{1B} \\ b_{2R} & b_{2G} & b_{2B} \\ \vdots & \vdots & \vdots \\ b_{NR} & b_{NG} & b_{NB} \end{bmatrix}$$
  • In the matrices of A and B, the three columns correspond to R, G, B color channels, and the rows correspond to the number of pixels in the down-sampled first raw-RGB image and the aligned and down-sampled second raw-RGB image, respectively.
  • Using a pseudo-inverse equation, the 3×3 color transformation matrix T is calculated as follows:
  • $$T = \left( A^T A \right)^{-1} A^T B$$ with A and B as defined above.
  • In the embodiment, the 3×3 color transformation matrix is used since the 3×3 color transformation matrix is linear, accurate, and computationally efficient. However, the size of the color transformation matrix is not limited thereto, and any 3×M color transformation matrix (wherein M≥3) may be used.
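  • As an illustrative sketch (not the claimed implementation), the least-squares fit above could be computed with NumPy as follows, assuming the two aligned, down-sampled images are H×W×3 arrays; `np.linalg.lstsq` is numerically equivalent to the pseudo-inverse expression.

```python
import numpy as np


def color_transform(img1_ds, img2_ds):
    """Fit the 3x3 matrix T that maps image 1 colors to image 2 colors (A @ T ~= B)."""
    A = img1_ds.reshape(-1, 3)   # N x 3 pixel values of the down-sampled first image
    B = img2_ds.reshape(-1, 3)   # N x 3 pixel values of the aligned, down-sampled second image
    T, *_ = np.linalg.lstsq(A, B, rcond=None)
    return T                     # 3 x 3, equivalent to (A^T A)^-1 A^T B
```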
  • In illumination estimation operation S130, a neural network trained for estimating the illumination of the scene (e.g., the illuminant color) receives, as input, the color transformation, and outputs a two-dimensional (2D) chromaticity value that corresponds to the illumination estimation of the scene. The 2D chromaticity value may be represented by a ratio of R, G, and B values, such as 2D [R/G B/G]. For example, the estimated illumination {circumflex over (L)} is expressed as:
  • $\hat{L} = (\hat{r},\ \hat{b})$, corresponding to the full RGB illuminant $(\hat{r},\ 1,\ \hat{b})$ with the green channel set to 1.
  • Referring to FIG. 4, the neural network may include an input layer having nine (9) nodes for receiving the nine (9) parameters of the 3×3 color transformation matrix, an output layer having two nodes for outputting the 2D chromaticity value, and a set of hidden layers placed between the input layer and the output layer. For example, each hidden layer may include nine (9) nodes.
  • The neural network according to an example embodiment may be required to process only the nine parameters in the color transformation matrix, and as a result, the neural network is relatively lightweight compared with other image processing networks, and therefore is capable of being efficiently run on-device in real time.
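  • For illustration, a network of this shape could be expressed in PyTorch as follows; the number of hidden layers and the ReLU activations are assumptions for the sketch.

```python
import torch
import torch.nn as nn


class IlluminantNet(nn.Module):
    """Maps the flattened 3x3 color transform (9 values) to a 2D [R/G, B/G] chromaticity."""

    def __init__(self, num_hidden_layers=4, hidden_size=9):
        super().__init__()
        layers = []
        in_features = 9
        for _ in range(num_hidden_layers):
            layers += [nn.Linear(in_features, hidden_size), nn.ReLU()]
            in_features = hidden_size
        layers.append(nn.Linear(in_features, 2))   # output: estimated [r_hat, b_hat]
        self.net = nn.Sequential(*layers)

    def forward(self, t_flat):                     # t_flat: (batch, 9)
        return self.net(t_flat)
```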
  • A method and a system for training the neural network will be described later with reference to FIG. 7.
  • Referring back to FIG. 1, in white balance operation S140, a white balance gain of the first raw-RGB image is adjusted based on the estimated illumination of the light source at the scene.
  • Parameters such as the R gain and the B gain (i.e., the gain values for the red color channel and the blue color channel) for white balance adjustment are calculated based upon a preset algorithm.
  • In an embodiment, white balance correction factors (e.g., α, β, γ) are selected for the first raw-RGB image based on the estimated illumination, and each color component (e.g., RWB, GWB, BWB) of the first raw-RGB image is multiplied with its respective correction factor (e.g., α, β, γ) to obtain white-balanced color components (e.g., αRWB, βGWB, γBWB).
  • In an embodiment, a R/G correction factor and a B/G correction factor may be computed based on the estimated illumination, to adjust the R/G gain and B/G gain of the first raw-RGB image.
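  • A minimal sketch of white balance operation S140, assuming the estimated chromaticity $(\hat{r}, \hat{b})$ and raw values normalized to [0, 1]; deriving the gains as reciprocals of the illuminant and clipping the result are illustrative choices.

```python
import numpy as np


def white_balance(raw_rgb, r_hat, b_hat):
    """Apply white-balance gains derived from the estimated illuminant (r_hat, 1, b_hat)."""
    illuminant = np.array([r_hat, 1.0, b_hat])
    gains = 1.0 / illuminant             # e.g., alpha = 1/r_hat, beta = 1, gamma = 1/b_hat
    corrected = raw_rgb * gains          # broadcast over the HxWx3 image
    return np.clip(corrected, 0.0, 1.0)  # assumes raw values normalized to [0, 1]
```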
  • FIG. 5 is a diagram of devices for performing the illumination estimation according to an embodiment. FIG. 5 includes a user device 110, a server 120, and a network 130. The user device 110 and the server 120 may interconnect via wired connections, wireless connections, or a combination of wired and wireless connections.
  • The user device 110 includes one or more devices configured to generate an output image. For example, the user device 110 may include a computing device (e.g., a desktop computer, a laptop computer, a tablet computer, a handheld computer, a smart speaker, a server, etc.), a mobile phone (e.g., a smart phone, a radiotelephone, etc.), a camera device, a wearable device (e.g., a pair of smart glasses or a smart watch), or a similar device.
  • The server 120 includes one or more devices configured to train a neural network for predicting the scene illumination using camera images to correct scene colors in the camera images. For example, the server 120 may be a server, a computing device, or the like. The server 120 may receive camera images from an external device (e.g., the user device 110 or another external device), train a neural network for predicting illumination parameters using the camera images, and provide the trained neural network to the user device 110 to permit the user device 110 to generate an output image using the neural network.
  • The network 130 includes one or more wired and/or wireless networks. For example, network 130 may include a cellular network (e.g., a fifth generation (5G) network, a long-term evolution (LTE) network, a third generation (3G) network, a code division multiple access (CDMA) network, etc.), a public land mobile network (PLMN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a telephone network (e.g., the Public Switched Telephone Network (PSTN)), a private network, an ad hoc network, an intranet, the Internet, a fiber optic-based network, or the like, and/or a combination of these or other types of networks.
  • The number and arrangement of devices and networks shown in FIG. 5 are provided as an example. In practice, there may be additional devices and/or networks, fewer devices and/or networks, different devices and/or networks, or differently arranged devices and/or networks than those shown in FIG. 5. Furthermore, two or more devices shown in FIG. 5 may be implemented within a single device, or a single device shown in FIG. 5 may be implemented as multiple, distributed devices. Additionally, or alternatively, a set of devices (e.g., one or more devices) may perform one or more functions described as being performed by another set of devices.
  • FIG. 6 is a diagram of components of one or more devices of FIG. 5 according to an embodiment. Device 200 may correspond to the user device 110 and/or the server 120.
  • As shown in FIG. 6, the device 200 may include a bus 210, a processor 220, a memory 230, a storage component 240, an input component 250, an output component 260, and a communication interface 270.
  • The bus 210 includes a component that permits communication among the components of the device 200. The processor 220 is implemented in hardware, firmware, or a combination of hardware and software. The processor 220 is a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), a microprocessor, a microcontroller, a digital signal processor (DSP), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or another type of processing component. The processor 220 includes one or more processors capable of being programmed to perform a function.
  • The memory 230 includes a random access memory (RAM), a read only memory (ROM), and/or another type of dynamic or static storage device (e.g., a flash memory, a magnetic memory, and/or an optical memory) that stores information and/or instructions for use by the processor 220.
  • The storage component 240 stores information and/or software related to the operation and use of the device 200. For example, the storage component 240 may include a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, and/or a solid state disk), a compact disc (CD), a digital versatile disc (DVD), a floppy disk, a cartridge, a magnetic tape, and/or another type of non-transitory computer-readable medium, along with a corresponding drive.
  • The input component 250 includes a component that permits the device 200 to receive information, such as via user input (e.g., a touch screen display, a keyboard, a keypad, a mouse, a button, a switch, and/or a microphone). The input component 250 may include a sensor for sensing information (e.g., a global positioning system (GPS) component, an accelerometer, a gyroscope, and/or an actuator).
  • In particular, the input component 250 may include two or more cameras, including the first camera 111 and the second camera 112 illustrated in FIG. 2. The first camera 111 and the second camera 112 may be rear-facing cameras that have different spectral sensitivities and have different fields of view from each other.
  • The output component 260 includes a component that provides output information from the device 200 (e.g., a display, a speaker, and/or one or more light-emitting diodes (LEDs)).
  • The communication interface 270 includes a transceiver-like component (e.g., a transceiver and/or a separate receiver and transmitter) that enables the device 200 to communicate with other devices, such as via a wired connection, a wireless connection, or a combination of wired and wireless connections. The communication interface 270 may permit device 200 to receive information from another device and/or provide information to another device. For example, the communication interface 270 may include an Ethernet interface, an optical interface, a coaxial interface, an infrared interface, a radio frequency (RF) interface, a universal serial bus (USB) interface, a Wi-Fi interface, a cellular network interface, or the like.
  • The device 200 may perform one or more processes described herein. The device 200 may perform operations S110-S140 based on the processor 220 executing software instructions stored by a non-transitory computer-readable medium, such as the memory 230 and/or the storage component 240. A computer-readable medium is defined herein as a non-transitory memory device. A memory device includes memory space within a single physical storage device or memory space spread across multiple physical storage devices.
  • Software instructions may be read into the memory 230 and/or the storage component 240 from another computer-readable medium or from another device via the communication interface 270. When executed, software instructions stored in the memory 230 and/or storage component 240 may cause the processor 220 to perform one or more processes described herein.
  • Additionally, or alternatively, hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein. Thus, embodiments described herein are not limited to any specific combination of hardware circuitry and software.
  • FIG. 7 is a diagram of a system for training a neural network of FIG. 4 according to an embodiment. The training process may be performed by the user device 110 or the server 120, using the components illustrated in FIG. 6.
  • The neural network according to an embodiment is trained to predict the illuminant for the first camera 111 and the illuminant for the second camera 112 using the same color transforms, but for simplicity, the description of the training process in the present disclosure focuses on estimating the illuminant for the first camera 111.
  • As shown in FIG. 7, a network training process is performed using a pair of images of the same physical scene that are simultaneously captured by two different cameras 111 and 112. The two cameras 111 and 112 may have different spectral sensitivities and therefore may provide different spectral measurements for the same scene having the same light source.
  • The first camera 111 and the second camera 112 may simultaneously capture a first raw-RGB image and a second raw-RGB image of the same scene, respectively, that provide different spectral measurements of the scene. The first raw-RGB image and the second raw-RGB image may have different views while capturing the same scene.
  • For the purposes of training the neural network, the first camera 111 and the second camera 112 may capture a color rendition chart as shown in FIG. 7. The color rendition chart may allow the first raw-RGB image and the second raw-RGB image to provide a wide distribution of colors under the scene. Also, the neutral patches (also referred to as “achromatic patches” or “gray patches”) of the color rendition chart in the first raw-RGB image may provide a ground truth illumination value (e.g., a ground-truth illuminant color) for the first raw-RGB image. Likewise, the neutral patches in the second raw-RGB image may provide a ground truth illumination value for the second raw-RGB image.
  • Hereinafter, the first raw-RGB image and the second raw-RGB image may be referred to as image 1 and image 2.
  • In operation S210, image 1 and image 2 are spatially aligned with each other, for example, using a global homography. For example, image 2 is cropped to have the same size of the field of view as image 1, and any one or any combination of transformation, rotation, and translation is applied to image 2 so that the same objects (e.g., the slide) in image 1 and image 2 are located at the same pixel coordinates.
  • In turn, the aligned image 1 and image 2 are down-sampled prior to computing color transformation between image 1 and image 2. The down-sampling may make the illumination estimation robust to any small misalignments and slight parallax in the two views of images 1 and 2. Since the hardware arrangement of the two cameras 111 and 112 does not change for a given device, the homography can be pre-computed and remains fixed for all image pairs from the same device.
  • In operation S220, a color transformation matrix is computed to map the down-sampled image 1 from the first camera 111 to the corresponding aligned and down-sampled image from the second camera 112. For example, the color transformation matrix may be computed based on Equations (1) and (2).
  • In operation S230, a neural network for estimating the illumination of the scene is constructed to have the structure shown in FIG. 4. For example, the neural network may include an input layer having nine (9) nodes for receiving the nine (9) parameters of a 3×3 color transformation matrix, an output layer having two nodes for outputting the 2D chromaticity value, and a set of hidden layers placed between the input layer and the output layer. The neural network according to an example embodiment may be required to process only the nine parameters in the color transformation matrix, and as a result, the neural network is relatively lightweight compared with other image processing networks, and therefore is capable of being efficiently run on-device in real time.
  • In the training process, the neural network receives, as input, the parameters of the color transformation matrix, and outputs a two-dimensional (2D) chromaticity value that corresponds to the illumination estimation of the scene. The 2D chromaticity value may be represented as 2D [R/G B/G], indicating a ratio of a red color value to a green color value, and a ratio of a blue color value to the green color value.
  • Given a dataset of M image pairs $\{(I_{1_1}, I_{2_1}), \ldots, (I_{1_M}, I_{2_M})\}$, the corresponding color transformations $T_1, \ldots, T_M$ between each pair of images are computed using Equation (2), as follows:

  • $T = \{T_1, \ldots, T_M\}$
  • $(I_{1_1}, I_{2_1})$ may denote image 1 and image 2, and $T_1$ may denote the color transformation between image 1 and image 2. The training process according to the embodiment is described using the pair of images 1 and 2, but a large number of paired images may be used for training the neural network. Augmented training images may be developed by applying mathematical transformation functions to camera captured images. The description of data augmentation will be provided later with reference to FIGS. 8-11.
  • In operation S240, a set of corresponding target ground truth illuminations L of image I1i (i.e., as measured by the first camera 111) is obtained from each pair of images as follows:

  • $L = \{L_1, \ldots, L_M\}$,
  • L1 may denote a ground truth illumination of image 1. The ground truth illumination L1 may be obtained by extracting the image area of the neutral patches from image 1 and measuring pixel colors of the neutral patches since the neutral patches work as a good reflector of the scene illumination. For example, average pixel colors L1 [Ravg, Gavg, Bavg] inside the neutral patches may be used as the ground truth illumination L1 for image 1.
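  • For example, the ground-truth illuminant could be computed as the mean raw-RGB value inside a mask over the chart's achromatic patches, as sketched below; the patch mask is assumed to come from a chart-detection step not described here, and the normalization so that the green channel equals 1 matches the [R/G, B/G] representation used by the network.

```python
import numpy as np


def ground_truth_illuminant(raw_rgb, neutral_mask):
    """Average raw-RGB color inside the neutral (gray) patches of the color rendition chart.

    raw_rgb: HxWx3 image; neutral_mask: HxW boolean mask over the achromatic patches.
    """
    patch_pixels = raw_rgb[neutral_mask]   # N x 3 pixels inside the neutral patches
    l = patch_pixels.mean(axis=0)          # [R_avg, G_avg, B_avg]
    return l / l[1]                        # normalize so the green channel is 1
```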
  • The neural network fθ: T→L is trained with parameters θ to model the mapping between the color transformations T and scene illuminations L. The neural network f may predict the scene illumination L for the first camera 111 given the color transformation T between image 1 and image 2, as follows:

  • $\hat{L} = f_\theta(T)$  Equation (3)
  • In operation S250, the neural network f is trained to minimize the loss between the predicted illuminations {circumflex over (L)}i and the ground truth illuminations Li as follows:
  • $$\min_{\theta} \frac{1}{M} \sum_{i=1}^{M} \left\lVert \hat{L}_i - L_i \right\rVert \qquad \text{Equation (4)}$$
  • The neural network according to an embodiment is lightweight, for example, consisting of a small number (e.g., 2, 5, or 16) of dense layers, wherein each layer has nine neurons only. The total number of parameters may range from 200 parameters for the 2-layer neural network up to 1460 parameters for the 16-layer neural network. The input to the neural network is the flattened nine values of the color transformation T and the output is two values corresponding to the illumination estimation in the 2D [R/G B/G] chromaticity color space where the green channel's value may be set to 1.
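  • A minimal training loop consistent with Equation (4) might look as follows, assuming the flattened transforms and ground-truth chromaticities have already been assembled into tensors; the Adam optimizer, learning rate, and squared-error loss are illustrative assumptions standing in for the norm in Equation (4).

```python
import torch
import torch.nn as nn


def train(model, transforms, illuminants, epochs=1000, lr=1e-3):
    """transforms: (M, 9) flattened color transforms; illuminants: (M, 2) ground-truth [R/G, B/G]."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                 # stand-in for the norm in Equation (4)
    for _ in range(epochs):
        optimizer.zero_grad()
        pred = model(transforms)           # (M, 2) predicted chromaticities
        loss = loss_fn(pred, illuminants)
        loss.backward()
        optimizer.step()
    return model
```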
  • According to embodiments of the present disclosure, the user device 110 or the server 120 may use the neural network that has been trained by an external device without performing an additional training process on the user device 110 or the server 120, or alternatively may continue to train the neural network in real time on the user device 110 or the server 120.
  • FIG. 8 illustrates a data augmentation process according to an embodiment.
  • Due to the difficulty in obtaining large datasets of image pairs captured with two cameras under the same illumination, a data augmentation process may be performed to increase the number of training samples and the generalizability of the model according to an example embodiment.
  • As shown in FIG. 8, image I1 is captured under a source illuminant L1[r1, g1, b1] and includes a color rendition chart. Image I1 is re-illuminated to obtain image I1′ which appears to be captured under the target illuminant L2[r2, g2, b2]. Image I1′ as well as image I1 may be used to train the neural network.
  • Various methods may be used to re-illuminate an image, which will be described with reference to FIGS. 9-11 hereinafter.
  • FIG. 9 illustrates a data augmentation process based on a full matrix transformation between color rendition charts captured in images according to an embodiment.
  • As shown in FIG. 9, a pair of captured images I1 and I2 are used to obtain a re-illuminated image I1′ that includes the same image content as the captured image I1 but has different color values from the captured image I1. The captured image I1 and captured image I2 are images captured by the same camera (e.g., the first camera 111), under different light sources, illuminant L1 and illuminant L2, respectively. The captured image I1 and captured image I2 both include a color rendition chart captured therein.
  • In order to re-illuminate the captured image I1 based on the color values of the captured image I2, the color rendition chart is extracted from each of the captured image I1 and the captured image I2. A color transformation matrix T is computed based on the color chart values of the captured image I1 and the color chart values of the captured image I2. The color transformation matrix T may convert the color chart values of the captured image I1 to the color chart values of the captured image I2.
  • The color transformation matrix T is applied to the captured image I1 to transform approximately all the colors in the captured image I1 and thereby to obtain the re-illuminated image I1′ which appears to be captured under illuminant L2.
  • While FIG. 9 shows augmentation of an image pair from the first camera 111 only, the corresponding pair of images from the second camera 112 is augmented in the same way. Also, the captured image I2 (as well as the captured image I1) is re-illuminated in a similar manner, based on a color transformation matrix that transforms the color chart values of the captured image I2 to the color chart values of the captured image I1.
  • In an example embodiment of the present disclosure, given a small dataset of raw-RGB image pairs captured with two cameras and including the color rendition charts, the color values of the color chart patches (e.g., the 24 color chart patches shown in FIG. 9), C∈R24×3, are extracted from each image.
  • A color transformation TC 1i→1j∈R3×3 between each pair of images (I1i, I1j) is obtained from the first camera 111 based only on the color chart values from the two images (I1i, I1j) as follows:

  • $T_{C\,1i \to 1j} = (I_{1i}^T I_{1i})^{-1} I_{1i}^T I_{1j}$
  • Similarly, the color transformation TC 2i→2j for image pairs (I2i, I2j) is obtained from the second camera 112 as follows:

  • $T_{C\,2i \to 2j} = (I_{2i}^T I_{2i})^{-1} I_{2i}^T I_{2j}$
  • This bank of color transformations is applied to augment images by re-illuminating any given pair of images from the two cameras (I1i,I2i) to match their colors to any target pair of images I1j, I2j, as follows:

  • $I_{1i \to j} = I_{1i}\, T_{C\,1i \to 1j}$
  • $I_{2i \to j} = I_{2i}\, T_{C\,2i \to 2j}$
  • where i→j means re-illuminating image i to match the colors of image j. Using this illuminant augmentation method, the number of training image pairs may be increased from M to M2.
  • According to the data augmentation process shown in FIG. 9, approximately all colors may be transformed since the color rendition charts included in the images provide a wide distribution of colors.
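  • A sketch of the chart-based re-illumination, assuming the 24×3 chart values have already been extracted from each image; the function names are hypothetical and the chart-extraction step is not shown.

```python
import numpy as np


def chart_transform(chart_i, chart_j):
    """3x3 transform mapping the chart colors of image i to those of image j (24x3 each)."""
    T, *_ = np.linalg.lstsq(chart_i, chart_j, rcond=None)
    return T


def reilluminate(image_i, chart_i, chart_j):
    """Apply T to every pixel of image i so it appears captured under image j's illuminant."""
    T = chart_transform(chart_i, chart_j)
    h, w, _ = image_i.shape
    return (image_i.reshape(-1, 3) @ T).reshape(h, w, 3)
```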
  • However, the data augmentation process is not limited to the method of using the color rendition charts as shown in FIG. 9, and different data augmentation methods may be applied as shown in FIGS. 10 and 11.
  • FIG. 10 illustrates a data augmentation process based on a diagonal transformation between illuminants according to an embodiment.
  • Referring to FIG. 10, a source illuminant L1[r1, g1, b1] and a target illuminant L2[r2, g2, b2] are identified from images I1 and I2 that are captured by the same camera (e.g., the first camera 111). A color transformation between the source illuminant L1[r1, g1, b1] and the target illuminant L2[r2, g2, b2] may be obtained as follows:
  • $$\begin{bmatrix} r_2/r_1 & 0 & 0 \\ 0 & g_2/g_1 & 0 \\ 0 & 0 & b_2/b_1 \end{bmatrix}$$
  • The color transformation is applied to image I1 to change neutral color values of image I1 and thereby to obtain image I1′ which appears to be captured under the target illuminant L2[r2, g2, b2]. Image I1′ as well as image I1 may be used to train the neural network.
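  • A sketch of the diagonal re-illumination, assuming the source and target illuminants are given as RGB triplets; the per-channel scaling corresponds to the diagonal matrix above.

```python
import numpy as np


def diagonal_reilluminate(image, src_illuminant, dst_illuminant):
    """Per-channel scaling that maps the source illuminant to the target illuminant."""
    scale = np.asarray(dst_illuminant) / np.asarray(src_illuminant)  # [r2/r1, g2/g1, b2/b1]
    return image * scale   # broadcast over the HxWx3 image
```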
  • FIG. 11 illustrates a data augmentation process based on a full matrix transformation between images according to an embodiment.
  • In an embodiment shown in FIG. 11, a color transformation matrix T is obtained using all image colors of image I1 and all image colors of Image I2, unlike the embodiment of FIG. 9 in which the color chart values extracted from images I1 and I2 are used to calculate the color transformation matrix T.
  • According to the embodiment shown in FIG. 11, a color rendition chart may be omitted from images I1 and I2, and instead, images I1 and I2 may be required to capture a scene having a wide distribution of colors. Also, the color transformation matrix T may be computed individually for each image pair.
  • FIG. 12 is a diagram of a system for performing image processing using more than two cameras according to an embodiment.
  • When there are N cameras (wherein N>2), $\binom{N}{2}$ 3×3 color transformation matrices are constructed independently using the process described with reference to FIG. 1. The $\binom{N}{2}$ color transformation matrices are then concatenated and fed as input to the neural network. In particular, the feature vector that is input to the network is of size $\binom{N}{2} \times 9$.
  • In detail, referring to FIG. 12, raw-RGB image 1, raw-RGB image 2, and raw-RGB image 3 are captured by camera 1, camera 2, and camera 3, respectively.
  • The raw-RGB image 1 and the raw-RGB image 2 are aligned with each other and down-sampled for calculation of a first color transformation between the down-sampled raw-RGB image 1 and the aligned and down-sampled raw-RGB image 2.
  • The raw-RGB image 1 and the raw-RGB image 3 are aligned with each other and down-sampled for calculation of a second color transformation between the down-sampled raw-RGB image 1 and the aligned and down-sampled raw-RGB image 3.
  • The raw-RGB image 2 and the raw-RGB image 3 are aligned with each other and down-sampled for calculation of a third color transformation between the down-sampled raw-RGB image 2 and the aligned and down-sampled raw-RGB image 3.
  • The first color transformation, the second color transformation, and the third color transformation are concatenated at a concatenation layer, and then are fed as input to a neural network for estimating the scene illumination.
  • Each of the first color transformation, the second color transformation, and the third color transformation may be a 3×3 matrix. The neural network may have an input layer having 27 nodes for receiving 27 parameters of the concatenated matrices, an output layer having 2 nodes for outputting a 2D chromaticity value for correcting color values of the raw-RGB image 1, and a set of hidden layers located between the input layer and the output layer.
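  • For illustration, the pairwise transforms could be computed and flattened into the concatenated feature vector as follows; with N cameras the same sketch yields a feature of length $\binom{N}{2} \times 9$ (27 for N = 3). The helper name and the use of least squares are assumptions for the sketch.

```python
import numpy as np
from itertools import combinations


def build_feature(images):
    """Concatenate pairwise 3x3 color transforms of N aligned, down-sampled images.

    images: list of HxWx3 arrays already aligned to a common view.
    Returns a vector of length C(N, 2) * 9 (27 for N = 3).
    """
    feats = []
    for a, b in combinations(images, 2):
        T, *_ = np.linalg.lstsq(a.reshape(-1, 3), b.reshape(-1, 3), rcond=None)
        feats.append(T.ravel())        # flatten each 3x3 transform to 9 values
    return np.concatenate(feats)
```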
  • The foregoing disclosure provides illustration and description, but is not intended to be exhaustive or to limit the implementations to the precise form disclosed. Modifications and variations are possible in light of the above disclosure or may be acquired from practice of the implementations.
  • As used herein, the term “component” is intended to be broadly construed as hardware, firmware, or a combination of hardware and software.
  • It will be apparent that systems and/or methods, described herein, may be implemented in different forms of hardware, firmware, or a combination of hardware and software. The actual specialized control hardware or software code used to implement these systems and/or methods is not limiting of the implementations. Thus, the operation and behavior of the systems and/or methods were described herein without reference to specific software code—it being understood that software and hardware may be designed to implement the systems and/or methods based on the description herein.
  • Even though particular combinations of features are recited in the claims and/or disclosed in the specification, these combinations are not intended to limit the disclosure of possible implementations. In fact, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly depend on only one claim, the disclosure of possible implementations includes each dependent claim in combination with every other claim in the claim set.
  • No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles “a” and “an” are intended to include one or more items, and may be used interchangeably with “one or more.” Furthermore, as used herein, the term “set” is intended to include one or more items (e.g., related items, unrelated items, a combination of related and unrelated items, etc.), and may be used interchangeably with “one or more.” Where only one item is intended, the term “one” or similar language is used. Also, as used herein, the terms “has,” “have,” “having,” or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise.
  • Expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list. For example, the expression, “at least one of a, b, and c,” should be understood as including only a, only b, only c, both a and b, both a and c, both b and c, all of a, b, and c, or any variations of the aforementioned examples.
  • While such terms as “first,” “second,” etc., may be used to describe various elements, such elements must not be limited to the above terms. The above terms may be used only to distinguish one element from another.

Claims (20)

What is claimed is:
1. An apparatus for processing image data, the apparatus comprising:
a memory storing instructions; and
a processor configured to execute the instructions to:
obtain a first image and a second image that capture a same scene in different views, from a first camera and a second camera, respectively;
spatially align the first image with the second image;
obtain a color transformation matrix that maps the first image to the second image based on color values of the first image and the second image;
obtain an estimated illuminant color from an output of a neural network by inputting the color transformation matrix to the neural network, wherein the neural network is trained based on a pair of reference images of a same reference scene and a color rendition chart that are captured by different cameras having different spectral sensitivities; and
perform a white balance correction on the first image based on the estimated illuminant color to output a corrected first image.
2. The apparatus of claim 1, wherein the neural network is trained to minimize a loss between the estimated illuminant color and a ground-truth illuminant color, and
wherein the ground-truth illuminant color is obtained from a color value of at least one achromatic patch in the color rendition chart.
3. The apparatus of claim 1, wherein the second image shows a wider view of the same scene than the first image, and
wherein the processor is further configured to execute the instructions to:
crop the second image to have a same view as the first image, to spatially align the first image with the cropped second image.
4. The apparatus of claim 1, wherein the processor is further configured to execute the instructions to:
down-sample the first image to obtain a down-sampled first image;
down-sample the cropped second image to obtain a down-sampled second image; and
compute the color transformation matrix that maps the down-sampled first image to the down-sampled second image based on color values of the down-sampled first image and the down-sampled second image.
5. The apparatus of claim 1, wherein the color transformation matrix is a three-by-three matrix that maps RGB values of the first image to RGB values of the second image.
6. The apparatus of claim 1, wherein the output of the neural network represents a ratio of RGB values of the estimated illuminant color.
7. The apparatus of claim 1, wherein the neural network is further trained using augmented images, and
wherein the augmented images are obtained by re-illuminating a first reference image and a second reference image of different scenes under different illuminations that are captured by a same reference camera, based on color transformations between first color chart values of the first reference image and second color chart values of the second reference image.
8. The apparatus of claim 1, wherein the neural network is further trained using augmented images, and
wherein the augmented images are obtained by re-illuminating a first reference image and a second reference image of different scenes under different illuminations that are captured by a same reference camera, based on color transformations between all color values of the first reference image and all color values of the second reference image.
9. The apparatus of claim 1, wherein the color transformation matrix is a first color transformation matrix,
the processor is further configured to execute the instructions to:
obtain, from a third camera, a third image that captures the same scene in a view different from the views of the first image and the second image;
spatially align the third image with the first image;
spatially align the third image with the second image;
obtain a second color transformation matrix that maps the first image to the third image based on the color values of the first image and color values of the third image;
obtain a third color transformation matrix that maps the second image to the third image based on the color values of the second image and the color values of the third image;
concatenate the first, the second, and the third color transformation matrices to obtain a concatenated matrix;
obtain the estimated illuminant color from the output of the neural network by inputting the concatenated matrix to the neural network; and
perform the white balance correction on the first image based on the estimated illuminant color to output the corrected first image.
10. The apparatus of claim 1, wherein the apparatus is a user device in which the first camera and the second camera are mounted, and
wherein the first camera and the second camera have different fields of view and different spectral sensitivities.
11. The apparatus of claim 1, wherein the apparatus is a server comprising a communication interface configured to communicate with a user device comprising the first camera and the second camera, to receive the first image and the second image from the user device.
12. A method for processing image data, the method comprising:
obtaining a first image and a second image that capture a same scene in different views, from a first camera and a second camera, respectively;
spatially aligning the first image with the second image;
obtaining a color transformation matrix that maps the first image to the second image based on color values of the first image and the second image;
obtaining an estimated illuminant color from an output of a neural network by inputting the color transformation matrix to the neural network, wherein the neural network is trained based on a pair of reference images of a same reference scene and a color rendition chart that are captured by different cameras having different spectral sensitivities; and
performing a white balance correction on the first image based on the estimated illuminant color to output a corrected first image.
13. The method of claim 12, wherein the neural network is trained to minimize a loss between the estimated illuminant color and a ground-truth illuminant color, and
wherein the ground-truth illuminant color is obtained from a color value of at least one achromatic patch in the color rendition chart.
14. The method of claim 12, wherein the second image shows a wider view of the same scene than the first image, and
wherein the method further comprises:
cropping the second image to have a same view as the first image, to spatially align the first image with the cropped second image.
15. The method of claim 12, further comprising:
down-sampling the first image to obtain a down-sampled first image;
down-sampling the cropped second image to obtain a down-sampled second image; and
computing the color transformation matrix that maps the down-sampled first image to the down-sampled second image based on color values of the down-sampled first image and the down-sampled second image.
16. The method of claim 12, wherein the color transformation matrix is a three-by-three matrix that maps RGB values of the first image to RGB values of the second image.
17. The method of claim 12, wherein the output of the neural network represents a ratio of RGB values of the estimated illuminant color.
18. The method of claim 12, wherein the neural network is further trained using augmented images, and
wherein the augmented images are obtained by re-illuminating a first reference image and a second reference image of different scenes under different illuminations that are captured by a same reference camera, based on color transformations between first color chart values of the first reference image and second color chart values of the second reference image.
19. The method of claim 12, wherein the color transformation matrix is a first color transformation matrix, and
wherein the method further comprises:
obtaining, from a third camera, a third image that captures the same scene in a view different from the views of the first image and the second image;
spatially aligning the third image with the first image;
spatially aligning the third image with the second image;
obtaining a second color transformation matrix that maps the first image to the third image based on the color values of the first image and color values of the third image;
obtaining a third color transformation matrix that maps the second image to the third image based on the color values of the second image and the color values of the third image;
concatenating the first, the second, and the third color transformation matrices to obtain a concatenated matrix;
obtaining the estimated illuminant color from the output of the neural network by inputting the concatenated matrix to the neural network; and
performing the white balance correction on the first image based on the estimated illuminant color to output the corrected first image.
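The three-camera variant of claim 19 can be pictured as below: three pairwise 3x3 transforms are flattened and concatenated into a single 27-element network input. The least-squares fit and the flattening order are assumptions made only for illustration.

```python
import numpy as np

def pairwise_transform(src, dst):
    """3x3 least-squares mapping between two spatially aligned images."""
    M, *_ = np.linalg.lstsq(src.reshape(-1, 3), dst.reshape(-1, 3), rcond=None)
    return M

def three_camera_network_input(img1, img2, img3):
    """Flatten and concatenate the 1->2, 1->3, and 2->3 transforms into a
    single 27-element vector that can be fed to the network."""
    m12 = pairwise_transform(img1, img2)   # first color transformation matrix
    m13 = pairwise_transform(img1, img3)   # second color transformation matrix
    m23 = pairwise_transform(img2, img3)   # third color transformation matrix
    return np.concatenate([m12.ravel(), m13.ravel(), m23.ravel()])
```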
20. A non-transitory computer-readable storage medium storing a program that is executable by at least one processor to perform a method for processing image data, the method comprising:
obtaining a first image and a second image that capture a same scene in different views, from a first camera and a second camera, respectively;
spatially aligning the first image with the second image;
obtaining a color transformation matrix that maps the first image to the second image based on color values of the first image and the second image;
obtaining an estimated illuminant color from an output of a neural network by inputting the color transformation matrix to the neural network, wherein the neural network is trained based on a pair of reference images of a same reference scene and a color rendition chart that are captured by different cameras having different spectral sensitivities; and
performing a white balance correction on the first image based on the estimated illuminant color to output a corrected first image.
US17/377,656 2020-11-16 2021-07-16 Electronic device for estimating camera illuminant and method of the same Abandoned US20220156899A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/377,656 US20220156899A1 (en) 2020-11-16 2021-07-16 Electronic device for estimating camera illuminant and method of the same
PCT/KR2021/016244 WO2022103121A1 (en) 2020-11-16 2021-11-09 Electronic device for estimating camera illuminant and method of the same

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202063114079P 2020-11-16 2020-11-16
US202163186346P 2021-05-10 2021-05-10
US17/377,656 US20220156899A1 (en) 2020-11-16 2021-07-16 Electronic device for estimating camera illuminant and method of the same

Publications (1)

Publication Number Publication Date
US20220156899A1 (en) 2022-05-19

Family

ID=81586783

Family Applications (2)

Application Number Title Priority Date Filing Date
US17/377,656 Abandoned US20220156899A1 (en) 2020-11-16 2021-07-16 Electronic device for estimating camera illuminant and method of the same
US18/105,660 Pending US20230188690A1 (en) 2020-11-16 2023-02-03 Electronic device for estimating camera illuminant and method of the same

Family Applications After (1)

Application Number Title Priority Date Filing Date
US18/105,660 Pending US20230188690A1 (en) 2020-11-16 2023-02-03 Electronic device for estimating camera illuminant and method of the same

Country Status (2)

Country Link
US (2) US20220156899A1 (en)
WO (1) WO2022103121A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188797A (en) * 2022-12-09 2023-05-30 齐鲁工业大学 Scene light source color estimation method capable of being effectively embedded into image signal processor

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8229215B2 (en) * 2007-12-03 2012-07-24 Omnivision Technologies, Inc. Image sensor apparatus and method for scene illuminant estimation
US20090147098A1 (en) * 2007-12-10 2009-06-11 Omnivision Technologies, Inc. Image sensor apparatus and method for color correction with an illuminant-dependent color correction matrix
US10949958B2 (en) * 2016-11-15 2021-03-16 Google Llc Fast fourier color constancy
WO2020098953A1 (en) * 2018-11-16 2020-05-22 Huawei Technologies Co., Ltd. Meta-learning for camera adaptive color constancy
CN111314683B (en) * 2020-03-17 2022-04-15 Oppo广东移动通信有限公司 White balance adjusting method and related equipment

Also Published As

Publication number Publication date
WO2022103121A1 (en) 2022-05-19
US20230188690A1 (en) 2023-06-15

Similar Documents

Publication Publication Date Title
US10547772B2 (en) Systems and methods for reducing motion blur in images or video in ultra low light with array cameras
US10897609B2 (en) Systems and methods for multiscopic noise reduction and high-dynamic range
US9792684B2 (en) System and method for imaging device modelling and calibration
US20130010075A1 (en) Camera with sensors having different color patterns
US8463068B2 (en) Methods, systems and apparatuses for pixel value correction using multiple vertical and/or horizontal correction curves
US9055178B2 (en) Single-shot high dynamic range imaging
CN101680756A (en) Compound eye imaging device, distance measurement device, parallax calculation method and distance measurement method
US11350070B2 (en) Systems, methods and computer programs for colorimetric mapping
US20090310872A1 (en) Sparse integral image descriptors with application to motion analysis
KR20100104591A (en) Method for fabricating a panorama
CN113170028A (en) Method for generating image data of imaging algorithm based on machine learning
JP2003108999A (en) Image processing method for correcting color of electronic color image
US20230188690A1 (en) Electronic device for estimating camera illuminant and method of the same
US20120106840A1 (en) Combining images captured with different color patterns
US20180176528A1 (en) Light locus generation for automatic white balance
US20100182464A1 (en) Joint Automatic Demosaicking And White Balancing
JP5269954B2 (en) Imaging device
JP6807538B2 (en) Image processing equipment, methods, and programs
US20240029308A1 (en) Apparatus and method for performing color transformation on raw sensor images
US20060092171A1 (en) Method and system for image white point estimation
TWI536765B (en) Imaging systems with clear filter pixels
US20160094834A1 (en) Imaging device with 4-lens time-of-flight pixels & interleaved readout thereof
JP6103767B2 (en) Image processing apparatus, method, and program
JP5733706B2 (en) Image processing apparatus, method, and program
CN116055896A (en) Image generation method and device and electronic device

Legal Events

Date Code Title Description
AS Assignment

Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ABDELHAMED, ABDELRAHMAN;PUNNAPPURATH, ABHIJITH;BROWN, MICHAEL SCOTT;SIGNING DATES FROM 20210707 TO 20210709;REEL/FRAME:056879/0966

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STCB Information on status: application discontinuation

Free format text: EXPRESSLY ABANDONED -- DURING EXAMINATION