CN112819787A - Multi-light source prediction method - Google Patents

Multi-light source prediction method

Info

Publication number
CN112819787A
Authority
CN
China
Prior art keywords: light source, network, light, prediction, value
Prior art date
Legal status
Granted
Application number
CN202110136230.0A
Other languages
Chinese (zh)
Other versions
CN112819787B (en)
Inventor
董宇涵
邢晓岩
李志德
余澄
Current Assignee
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University
Priority to CN202110136230.0A
Publication of CN112819787A
Application granted
Publication of CN112819787B
Legal status: Active
Anticipated expiration


Classifications

    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06N 3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N 3/08: Neural networks; learning methods
    • G06T 7/90: Image analysis; determination of colour characteristics
    • G06T 2207/10024: Image acquisition modality; color image
    • G06T 2207/20081: Special algorithmic details; training, learning
    • G06T 2207/20084: Special algorithmic details; artificial neural networks [ANN]

Abstract

A multi-light-source prediction method comprising the steps of: decoupling the color and semantic features of an image with a feature extraction network to obtain a high-dimensional feature matrix; predicting the illuminant value of each principal light source with a principal light source prediction network from the high-dimensional feature matrix obtained by the feature extraction network; and predicting a pixel-level light source distribution weight map with a light source distribution weight map prediction network from the same high-dimensional feature matrix. The method addresses the multi-light-source color constancy problem in digital camera imaging, performs multi-light-source prediction effectively on the basis of scene semantic information, adapts well to different illumination patterns, and recovers accurate light source distribution and light source color information from multi-light-source pictures under a variety of real conditions.

Description

Multi-light source prediction method
Technical Field
The invention relates to computational photography, in particular to a multi-light-source prediction method.
Background
Color Constancy (CC) is a classic problem faced by cameras during imaging. The human eye can accurately restore the color of an object under light sources of different colors, thanks to the brain's prior knowledge of the environment; a camera, however, cannot accurately restore object color under the interference of an ambient light source.
Common color constancy approaches fall into two categories: traditional statistics-based algorithms and data-driven methods. Traditional methods are fast and computationally light, but when white points are lacking (for example, in a scene dominated by a large solid-color region) they are easily misled by that region, so their adaptability is poor. With the rise of deep learning in computer vision, data-driven algorithms have also been applied to color constancy, chiefly Fast Fourier Color Constancy (FFCC), which learns a frequency-domain spectrum, and the fully convolutional color constancy network FC4, which is based on picture semantic information. Compared with traditional algorithms, data-driven methods adapt better to complex scenes; facing a large solid-color scene, for example, they can often achieve better illuminant prediction.
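For orientation, a minimal Python sketch of the classic Gray-World estimate (a representative traditional statistics-based method; it is background, not part of the invention) is:

```python
import numpy as np

def gray_world(image: np.ndarray) -> np.ndarray:
    """Gray-World illuminant estimate: assume the average reflectance of the
    scene is achromatic, so the per-channel mean of the image approximates
    the color of the (single, global) illuminant.

    image: H x W x 3 linear RGB array.
    Returns a unit-norm RGB illuminant estimate.
    """
    illuminant = image.reshape(-1, 3).mean(axis=0)
    return illuminant / np.linalg.norm(illuminant)
```

A large solid-color region drags the channel means toward the object color rather than the light source, which is exactly the failure mode described above.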
However, most current color constancy methods focus on recovering a single light source, ignoring that the lighting of common scenes is often multi-source. Although both traditional and data-driven methods can predict multiple light sources through clustering or partitioning, such schemes usually ignore the semantic information of the picture and cannot truly restore the information of a multi-light-source scene.
Some researchers have proposed picture-to-picture multi-light-source restoration using a generative adversarial network, but this approach may disturb the structural information of the picture, and adversarial generation may introduce content that does not belong to the real scene, so the restored result cannot be applied in practice on a mobile photographing terminal.
It is to be noted that the information disclosed in the above background section is only for understanding the background of the present application and thus may include information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
The main objective of the present invention is to overcome the above-mentioned drawbacks of the background art, and to provide a multi-light-source prediction method.
In order to achieve the purpose, the invention adopts the following technical scheme:
a multi-light-source prediction method, comprising the steps of:
decoupling the color and semantic features of the image by a feature extraction network to obtain a high-dimensional feature matrix;
predicting the illuminant value of each principal light source with a principal light source prediction network from the high-dimensional feature matrix obtained by the feature extraction network;
and predicting a pixel-level light source distribution weight map with a light source distribution weight map prediction network from the high-dimensional feature matrix obtained by the feature extraction network.
Further:
the feature extraction network comprises a shallow semantic extraction branch, a deep semantic extraction branch and a color preference extraction branch, the shallow semantic extraction branch extracts shallow semantic information in an image through a small receptive field, the deep semantic extraction branch extracts deep semantic information in the image through a large receptive field so as to extract the incidence relation of different composition structures in the image, and the color preference extraction branch extracts color preference in the image so as to decouple color and semantic features.
One or more of the following settings are used:
the shallow semantic extraction branch comprises 5 convolutional layers and 4 pooling layers; all convolutional layers of this branch use 3 × 3 kernels with a stride of 2;
the deep semantic extraction branch adopts the first 5 layers of AlexNet, comprising 5 convolutional layers and 3 pooling layers; the kernels of the first two convolutional layers are 11 × 11 and 5 × 5 respectively, and the kernels of the last three convolutional layers are all 3 × 3;
the color preference extraction branch adopts 5 convolutional layers and 4 pooling layers, with all convolution kernels being 1 × 1.
The principal light source prediction network comprises a light source position selection module and a light source regression module. The light source position selection module consists of as many convolutional networks as there are light sources to be predicted; it uses convolutional layers with 1 × 1 kernels, combined with two pooling down-sampling steps, to determine the light source positions. The light source regression module adopts a fully convolutional network and performs the regression of each light source by extracting the features of each channel after the light source position selection module has determined the light source positions.
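As an illustration of the light source position selection module, the following is a minimal PyTorch sketch under stated assumptions (the channel widths, the sigmoid activation, and the response-map output are illustrative; the text above specifies only one convolutional network per light source, 1 × 1 kernels, and two pooling down-sampling steps):

```python
import torch
import torch.nn as nn

class PositionSelection(nn.Module):
    """One small 1x1-convolution network per candidate light source, combined
    with two pooling down-sampling steps, yielding a position response map
    for each candidate light source."""
    def __init__(self, in_channels: int, num_sources: int = 2):
        super().__init__()
        def branch() -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_channels, in_channels // 2, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),  # first pooling down-sampling
                nn.Conv2d(in_channels // 2, 1, kernel_size=1),
                nn.MaxPool2d(2))  # second pooling down-sampling
        self.branches = nn.ModuleList([branch() for _ in range(num_sources)])

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # one position response map per candidate light source, stacked on dim 1
        return torch.cat([torch.sigmoid(b(features)) for b in self.branches], dim=1)
```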
The principal light source prediction network further comprises a channel attention module, which re-weights the channels of the original high-dimensional feature matrix F to obtain a feature matrix G with different channel weight values.
The feature matrix G is calculated by equation (1):

$G_k^i = \omega_k \otimes F_k^i$  (1)

where $\omega_k$ is the re-weighting weight of channel k, k is the channel index of the corresponding matrix, i is the pixel index of the corresponding feature map, and $\otimes$ denotes channel-wise multiplication.
The light source distribution weight map prediction network comprises four up-sampling layers, which are shortcut-connected to the feature maps of corresponding size and position in the feature extraction network.
The angular error $L_{angular}$ between the predicted illuminant value $\hat{E}$ and the ground-truth calibrated illuminant value E is used as the evaluation index:

$L_{angular} = \arccos\left(\dfrac{\hat{E} \cdot E}{\|\hat{E}\| \, \|E\|}\right)$

where the predicted value $\hat{E}$, like the illuminant value E, is a 3-dimensional matrix of size (1, 1, 3) representing the R, G, B three-channel values in RGB space, and $(\cdot)$ denotes the inner product with the ground-truth calibration value E;
the angular error $L_{angular}$ between the predicted value and the ground-truth value is minimized as the optimization target, and $L_{angular}$ serves as the loss function of the principal light source prediction network to drive the iterative optimization of the network.
Further, the mean square error (MSE) is used as the loss function of the light source distribution weight map network:

$MSE = \dfrac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2$

where N is the number of samples, i is the current sample index, $x_i$ is the current sample, and $\bar{x}$ is the mean of all samples.
The predicted illuminant values are multiplied by the light source weight map to obtain the pixel-level light source distribution, which is likewise supervised with the angular error, taken as the mean over all pixel points of the whole map:

$\bar{L}_{angular} = \dfrac{1}{N}\sum_{i=1}^{N} \arccos\left(\dfrac{\hat{E}_i \cdot E_i}{\|\hat{E}_i\| \, \|E_i\|}\right)$

where $\hat{E}_i$ is the predicted illuminant value of each pixel point i in the light source distribution map, N is the total number of pixels in the full image, and $E_i$ is the ground-truth illuminant value at point i. The final loss function L of the network combines the angular errors of the candidate principal light sources, the weight map MSE, and the pixel-level mean angular error:

$L = \sum_{k} L_{angular}^{(k)} + MSE + \bar{L}_{angular}$

where k indexes the candidate principal light sources, $k \in \{2, \ldots\}$.
A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the method.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a multi-light source prediction method based on deep learning, aiming at the multi-light source estimation problem required by the multi-light source local white balance problem of a camera, the multi-light source prediction method can realize effective separation of main component light sources and distribution thereof from a multi-light source scene picture, and is a multi-light source estimation scheme with good application prospect.
The multi-light-source prediction method based on the light source distribution weight map solves the multi-light-source color constancy problem in digital camera imaging, effectively performs multi-light-source prediction on the basis of scene semantic information, adapts well to different illumination patterns, and recovers accurate light source distribution and light source color information from multi-light-source pictures under a variety of real conditions.
Drawings
Fig. 1 is a diagram of a deep neural network structure of a multi-light-source prediction method according to an embodiment of the present invention.
Fig. 2 is a light source distribution weight diagram and a different light source recovery diagram according to an embodiment of the invention.
Detailed Description
The embodiments of the present invention will be described in detail below. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
The invention provides a multi-light-source prediction method based on a light source distribution probability map, realizing multi-light-source color constancy in the field of camera imaging. The core idea is to establish an end-to-end neural network that predicts both the light source probability distribution map and the corresponding light sources, and combines the two to achieve pixel-level multi-light-source color constancy.
Referring to fig. 1 and fig. 2, the multi-light-source prediction method based on light source probability distribution according to the embodiment of the present invention mainly includes:
decoupling the color and semantic features of the image by a feature extraction network to obtain a high-dimensional feature matrix;
predicting the illuminant value of each principal light source with a principal light source prediction network from the high-dimensional feature matrix obtained by the feature extraction network;
and predicting a pixel-level light source distribution weight map with a light source distribution weight map prediction network from the high-dimensional feature matrix obtained by the feature extraction network.
The structure of the deep neural network used in the embodiment of the present invention is shown in fig. 1.
Feature extraction network
Feature extraction is important for multi-light-source prediction, so a brand-new feature extraction network is designed in this embodiment of the invention. The network comprises three branches: a shallow semantic extraction branch, a deep semantic extraction branch, and a color preference extraction branch.
The shallow semantic extraction branch comprises 5 convolutional layers and 4 pooling layers; all convolutional layers of this branch use 3 × 3 kernels with a stride of 2. The purpose of this branch is to extract shallow semantic information, such as the outlines of the picture content, through a small receptive field.
The deep semantic extraction branch adopts the first 5 layers of AlexNet (Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2017, 60(6): 84-90), comprising 5 convolutional layers and 3 pooling layers; the kernels of the first two layers are 11 × 11 and 5 × 5 respectively, and the kernels of the last three layers are all 3 × 3. The first five layers of AlexNet are introduced as the deep semantic extraction branch because of the larger receptive field they provide, which lets the neural network better extract the relations between the different compositional structures in the picture.
The color preference extraction branch likewise adopts 5 convolutional layers and 4 pooling layers, but all of its convolution kernels are 1 × 1; this avoids interference from the image content during feature extraction and thus implicitly decouples color from semantic features.
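The following is a sketch of this three-branch extractor under stated assumptions: the channel widths, the interpolate-and-concatenate fusion, and the use of torchvision's AlexNet feature stack (which matches the 5-convolution/3-pooling layout cited above) are illustrative choices, not the exact patented configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import alexnet

class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        chans = [3, 32, 64, 128, 256, 256]  # assumed channel widths
        # Shallow semantic branch: five 3x3 convolutions with stride 2,
        # interleaved with four pooling layers.
        self.shallow = self._branch(chans, kernel=3, stride=2)
        # Deep semantic branch: the first five convolutional layers of AlexNet
        # (11x11, 5x5, then three 3x3) with its three max-pooling layers.
        self.deep = alexnet(weights=None).features
        # Color preference branch: five 1x1 convolutions with four pooling
        # layers; 1x1 kernels see single pixels only, which decouples color
        # statistics from spatial semantics.
        self.color = self._branch(chans, kernel=1, stride=1)

    @staticmethod
    def _branch(chans, kernel, stride):
        layers = []
        for i in range(5):
            layers += [nn.Conv2d(chans[i], chans[i + 1], kernel, stride, kernel // 2),
                       nn.ReLU(inplace=True)]
            if i < 4:
                layers.append(nn.MaxPool2d(2))
        return nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_deep = self.deep(x)
        size = f_deep.shape[-2:]
        f_shallow = F.interpolate(self.shallow(x), size=size)
        f_color = F.interpolate(self.color(x), size=size)
        # fuse the three branches into the high-dimensional feature matrix
        return torch.cat([f_shallow, f_deep, f_color], dim=1)
```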
Principal light source prediction network
The purpose of the principal light source prediction network is to predict the values of the principal light sources in the picture; the network predicts the illuminant values from the high-dimensional feature matrix obtained by the feature extraction network. The prediction network consists of two parts: a light source position selection module and a light source regression module. The light source position selection module consists of as many convolutional networks as there are light sources to be predicted; it uses convolutional layers with 1 × 1 kernels, combined with two pooling down-sampling steps, to determine the light source positions. The light source regression module uses a fully convolutional network and performs the regression of each light source by extracting the features of each channel after the selection module has determined the light source positions. In particular, to make the network prediction more accurate, the preferred embodiment of the invention adds a channel attention module to the prediction network; this module re-weights the channels of the original feature matrix F to obtain a feature matrix G with different channel weight values, and can be calculated by formula (1).
$G_k^i = \omega_k \otimes F_k^i$  (1)

where $\omega_k$ is the re-weighting weight of channel k, k is the channel index of the corresponding matrix, i is the pixel index of the corresponding feature map, and $\otimes$ denotes channel-wise multiplication.
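A sketch of such a channel attention module follows. Equation (1) specifies only the channel-wise re-weighting $G = \omega \otimes F$; the squeeze-and-excitation-style computation of $\omega$ used here is an assumption:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # learn one weight per channel from globally pooled channel statistics
        self.weights = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid())

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        w = self.weights(feat)  # (B, C, 1, 1): one weight per channel k
        return feat * w         # equation (1): G_k^i = w_k * F_k^i at every pixel i
```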
Pixel-level light source distribution weight map prediction network
To obtain a more accurate light source distribution, this embodiment of the invention designs a pixel-level light source weight prediction network, which predicts a pixel-level weight map from the high-dimensional feature matrix obtained by the feature extraction network; this prediction process can also be regarded as a map-to-map transformation. The network consists mainly of four up-sampling layers. To preserve as much of the picture's structural information as possible and to reduce the loss caused by the growing number of network layers, the four up-sampling layers are shortcut-connected to the feature maps of corresponding size and position in the feature extraction network.
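A sketch of this decoder follows, assuming a four-level encoder with doubling channel widths; the transposed-convolution upsampling, the additive shortcut fusion, and the softmax normalization over the weight channels are illustrative assumptions:

```python
import torch
import torch.nn as nn

class WeightMapDecoder(nn.Module):
    def __init__(self, enc_chans=(64, 128, 256, 512), num_sources: int = 2):
        super().__init__()
        chans = list(reversed(enc_chans))  # e.g. 512 -> 256 -> 128 -> 64
        self.ups = nn.ModuleList(
            [nn.ConvTranspose2d(chans[i], chans[i + 1], 2, stride=2) for i in range(3)]
            + [nn.ConvTranspose2d(chans[3], chans[3], 2, stride=2)])
        self.head = nn.Conv2d(chans[3], num_sources, kernel_size=1)

    def forward(self, feats):
        # feats: encoder feature maps ordered shallow -> deep
        x = feats[-1]
        for i, up in enumerate(self.ups):
            x = up(x)
            skip = len(feats) - 2 - i
            if skip >= 0:
                x = x + feats[skip]  # shortcut connection at the matching map size
        logits = self.head(x)
        return torch.softmax(logits, dim=1)  # per-pixel weights over the sources
```

Multiplying each weight channel by its predicted principal illuminant and summing then yields the pixel-level illuminant map used in the loss below.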
Optimization and loss function
The preferred embodiment of the present invention uses the angular error $L_{angular}$ between the predicted illuminant value $\hat{E}$ and the ground-truth calibrated illuminant value E as the evaluation index:

$L_{angular} = \arccos\left(\dfrac{\hat{E} \cdot E}{\|\hat{E}\| \, \|E\|}\right)$

where the predicted value $\hat{E}$, like the illuminant value E, is a 3-dimensional matrix of size (1, 1, 3) representing the R, G, B three-channel values in RGB space, and $(\cdot)$ denotes the inner product with the ground-truth calibration value E.
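As a quick numeric check of this metric: for a predicted illuminant $\hat{E} = (1, 0, 0)$ and a ground truth $E = (1, 1, 0)$, the normalized inner product is $1/\sqrt{2}$, so the angular error is $\arccos(1/\sqrt{2}) = 45°$; two illuminants pointing in the same direction give an error of 0 regardless of magnitude, so the metric is insensitive to overall image brightness.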
To obtain the best prediction effect, the optimization target of the multi-light-source prediction scheme of the preferred embodiment is to minimize the angular error $L_{angular}$ between the predicted value and the ground-truth value, which is used as the loss function of the principal light source prediction network to drive its iterative optimization. Meanwhile, the preferred embodiment uses the mean square error (MSE) as the loss function of the light source distribution weight map network:
$MSE = \dfrac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2$

where N is the number of samples, i is the current sample index, $x_i$ is the current sample, and $\bar{x}$ is the mean of all samples.
Further, multiplying the predicted illuminant values by the light source weight map yields the pixel-level light source distribution, which is likewise supervised with the angular error; the actual supervision target is the mean of the angular errors over all pixel points of the whole map:

$\bar{L}_{angular} = \dfrac{1}{N}\sum_{i=1}^{N} \arccos\left(\dfrac{\hat{E}_i \cdot E_i}{\|\hat{E}_i\| \, \|E_i\|}\right)$

where $\hat{E}_i$ is the predicted illuminant value of each pixel point i in the light source distribution map, N is the total number of pixels in the full image, and $E_i$ is the ground-truth illuminant value at point i. The final loss function L of the network combines the angular errors of the candidate principal light sources, the weight map MSE, and the pixel-level mean angular error:

$L = \sum_{k} L_{angular}^{(k)} + MSE + \bar{L}_{angular}$

where k indexes the candidate principal light sources, $k \in \{2, \ldots\}$.
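A sketch of these loss terms follows. The plain sum in `total_loss` mirrors the combination described above but is an assumption as to weighting (no balancing coefficients are given), and the MSE here is taken against a ground-truth weight map, one reasonable reading of the formula above:

```python
import torch
import torch.nn.functional as F

def angular_error(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Angle (radians) between predicted and ground-truth RGB illuminants,
    computed along the last dimension."""
    cos = F.cosine_similarity(pred, gt, dim=-1)
    return torch.acos(cos.clamp(-1 + eps, 1 - eps))

def pixel_angular_loss(pred_map: torch.Tensor, gt_map: torch.Tensor) -> torch.Tensor:
    """Mean angular error over all pixels; maps are (B, 3, H, W)."""
    return angular_error(pred_map.permute(0, 2, 3, 1), gt_map.permute(0, 2, 3, 1)).mean()

def total_loss(pred_illums, gt_illums, pred_weights, gt_weights,
               pred_pixel_map, gt_pixel_map):
    principal = angular_error(pred_illums, gt_illums).sum(dim=1).mean()  # K candidates
    weight_mse = F.mse_loss(pred_weights, gt_weights)                    # weight map MSE
    pixel = pixel_angular_loss(pred_pixel_map, gt_pixel_map)             # full-map mean
    return principal + weight_mse + pixel
```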
Performance analysis
The angular error was used for analysis; in the actual evaluation, the mean (Mean), median (Median), and trimean (Trimean) of the angular errors over all test data were computed for comparison, as shown in Table 1. Compared with the traditional Gray-World algorithm, the method of this embodiment improves on all three indices, and compared with the deep learning methods 3du-awb and side-awb it leads by a clear margin on every index. Meanwhile, unlike existing color constancy algorithms, the method of this embodiment predicts several principal light sources together with a pixel-level light source distribution map, making it a multi-light-source estimation scheme with good application prospects.
TABLE 1. Angular error comparison (Mean, Median, Trimean) of Gray-World, 3du-awb, side-awb, and the proposed method; the numerical values are reproduced only as an image in the original publication.
The background of the present invention may contain background information related to the problem or environment of the present invention and does not necessarily describe the prior art. Accordingly, the inclusion in the background section is not an admission of prior art by the applicant.
The foregoing is a more detailed description of the invention in connection with specific/preferred embodiments, and the practice of the invention is not to be limited to these descriptions. It will be apparent to those skilled in the art that various substitutions and modifications can be made to the described embodiments without departing from the spirit of the invention, and these substitutions and modifications should be considered to fall within the scope of the invention. In the description herein, references to the terms "one embodiment," "some embodiments," "preferred embodiments," "an example," "a specific example," or "some examples" mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and the various embodiments or examples and their features described in this specification can be combined by one skilled in the art without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made herein without departing from the scope of the claims.

Claims (10)

1. A multi-light source prediction method is characterized by comprising the following steps:
decoupling the color and semantic features of the image by a feature extraction network to obtain a high-dimensional feature matrix;
predicting the illuminant value of each principal light source with a principal light source prediction network from the high-dimensional feature matrix obtained by the feature extraction network;
and predicting a pixel-level light source distribution weight map with a light source distribution weight map prediction network from the high-dimensional feature matrix obtained by the feature extraction network.
2. The multi-light-source prediction method of claim 1, wherein the feature extraction network comprises a shallow semantic extraction branch, a deep semantic extraction branch and a color preference extraction branch, the shallow semantic extraction branch extracts shallow semantic information in the image through a smaller receptive field, the deep semantic extraction branch extracts deep semantic information in the image through a larger receptive field so as to extract correlations of different constituent structures in the image, and the color preference extraction branch extracts color preferences in the image so as to achieve color and semantic feature decoupling.
3. The multi-light-source prediction method of claim 2, wherein one or more of the following settings are used:
the shallow semantic extraction branch comprises 5 convolutional layers and 4 pooling layers; all convolutional layers of this branch use 3 × 3 kernels with a stride of 2;
the deep semantic extraction branch adopts the first 5 layers of AlexNet, comprising 5 convolutional layers and 3 pooling layers; the kernels of the first two convolutional layers are 11 × 11 and 5 × 5 respectively, and the kernels of the last three convolutional layers are all 3 × 3;
the color preference extraction branch adopts 5 convolutional layers and 4 pooling layers, with all convolution kernels being 1 × 1.
4. The multi-light-source prediction method according to any one of claims 1 to 3, wherein the principal light source prediction network comprises a light source position selection module and a light source regression module; the light source position selection module consists of as many convolutional networks as there are light sources to be predicted, and determines the light source positions using convolutional layers with 1 × 1 kernels combined with two pooling down-sampling steps; the light source regression module adopts a fully convolutional network and performs the regression of each light source by extracting the features of each channel after the light source position selection module has determined the light source positions.
5. The multi-light-source prediction method of claim 4, wherein the principal light source prediction network further comprises a channel attention module, which re-weights the channels of the original high-dimensional feature matrix F to obtain a feature matrix G with different channel weight values.
6. The multi-light-source prediction method according to claim 5, wherein the feature matrix G is calculated by formula (1):
$G_k^i = \omega_k \otimes F_k^i$  (1)

where $\omega_k$ is the re-weighting weight of channel k, k is the channel index of the corresponding matrix, i is the pixel index of the corresponding feature map, and $\otimes$ denotes channel-wise multiplication.
7. The multi-light-source prediction method of any one of claims 1 to 6, wherein the light source distribution weight map prediction network comprises four up-sampling layers, which are shortcut-connected to the feature maps of corresponding size and position in the feature extraction network.
8. The multi-light-source prediction method according to any one of claims 1 to 7, wherein the angular error $L_{angular}$ between the predicted illuminant value $\hat{E}$ and the ground-truth calibrated illuminant value E is used as the evaluation index:

$L_{angular} = \arccos\left(\dfrac{\hat{E} \cdot E}{\|\hat{E}\| \, \|E\|}\right)$

wherein the predicted value $\hat{E}$, like the illuminant value E, is a 3-dimensional matrix of size (1, 1, 3) representing the R, G, B three-channel values in RGB space, and $(\cdot)$ denotes the inner product with the ground-truth calibration value E;
the angular error $L_{angular}$ between the predicted value and the ground-truth value is minimized as the optimization target, and $L_{angular}$ serves as the loss function of the principal light source prediction network to drive the iterative optimization of the network.
9. The multi-light-source prediction method of claim 8, further using the mean square error (MSE) as the loss function of the light source distribution weight map network:

$MSE = \dfrac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2$

where N is the number of samples, i is the current sample index, $x_i$ is the current sample, and $\bar{x}$ is the mean of all samples;
preferably, the predicted illuminant values are multiplied by the light source weight map to obtain the pixel-level light source distribution, which is likewise supervised with the angular error, taken as the mean over all pixel points of the whole map:

$\bar{L}_{angular} = \dfrac{1}{N}\sum_{i=1}^{N} \arccos\left(\dfrac{\hat{E}_i \cdot E_i}{\|\hat{E}_i\| \, \|E_i\|}\right)$

wherein $\hat{E}_i$ is the predicted illuminant value of each pixel point i in the light source distribution map, N is the total number of pixels in the full image, and $E_i$ is the ground-truth illuminant value at point i; the final loss function L of the network combines the angular errors of the candidate principal light sources, the weight map MSE, and the pixel-level mean angular error:

$L = \sum_{k} L_{angular}^{(k)} + MSE + \bar{L}_{angular}$

where k indexes the candidate principal light sources, $k \in \{2, \ldots\}$.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 9.
CN202110136230.0A 2021-02-01 2021-02-01 Multi-light source prediction method Active CN112819787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110136230.0A CN112819787B (en) 2021-02-01 2021-02-01 Multi-light source prediction method


Publications (2)

Publication Number Publication Date
CN112819787A 2021-05-18
CN112819787B 2023-12-26

Family

ID=75860905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110136230.0A Active CN112819787B (en) 2021-02-01 2021-02-01 Multi-light source prediction method

Country Status (1)

Country Link
CN (1) CN112819787B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6480299B1 (en) * 1997-11-25 2002-11-12 University Technology Corporation Color printer characterization using optimization theory and neural networks
US20060276966A1 (en) * 2002-07-30 2006-12-07 Cotton Symon D Mehtod and apparatus for quantifying tissue histology
US20170169276A1 (en) * 2011-09-27 2017-06-15 The Board Of Regents Of The University Of Texas System Systems and methods for automated screening and prognosis of cancer from whole-slide biopsy images
CN108388905A (en) * 2018-03-21 2018-08-10 合肥工业大学 A kind of Illuminant estimation method based on convolutional neural networks and neighbourhood context
CN111813913A (en) * 2019-11-27 2020-10-23 上海交通大学 Two-stage problem generation system with problem as guide
CN111710049A (en) * 2020-06-18 2020-09-25 三星电子(中国)研发中心 Method and device for determining ambient illumination in AR scene

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUI YANG ET AL.: "Color Constancy of Rare-Earth Solution Image with Adaptive Parameter" *
WANG Fei: "Research on Computational Color Constancy for Images", China Doctoral Dissertations Full-text Database, Information Science and Technology, pages 33-84 *

Also Published As

Publication number Publication date
CN112819787B (en) 2023-12-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant