CN112581373B - Image color correction method based on deep learning - Google Patents

Image color correction method based on deep learning

Info

Publication number
CN112581373B
CN112581373B (application CN202011471881.7A)
Authority
CN
China
Prior art keywords
image
neural network
loss
data
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011471881.7A
Other languages
Chinese (zh)
Other versions
CN112581373A (en)
Inventor
Yan Bo (闫波)
Zhang Sheng (张晟)
Su Hongyi (宿红毅)
Zheng Hong (郑宏)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202011471881.7A priority Critical patent/CN112581373B/en
Publication of CN112581373A publication Critical patent/CN112581373A/en
Application granted granted Critical
Publication of CN112581373B publication Critical patent/CN112581373B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/60Rotation of whole images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image color correction method based on deep learning, belonging to the fields of deep learning and computer vision. First, image data are collected under different illumination environments to construct a color-cast image dataset. The data are then enhanced and preprocessed so that they meet the training requirements of the neural network. Next, a basic convolutional neural network with an Encoder-Decoder structure is constructed, and a cascaded convolutional neural network is built from this basic network. A dedicated loss function is designed around the characteristics of the cascaded network. After training, the cascaded convolutional neural network converts an original picture with color cast into one without color cast. The method requires no prior assumptions, supports fast real-time processing, and can be applied in various automatic color-cast correction systems and devices; it avoids the subjectivity of manual correction, greatly improves correction efficiency, reduces the manual workload, and improves the accuracy of color-cast correction.

Description

Image color correction method based on deep learning
Technical Field
The invention relates to an image color correction method based on deep learning, and belongs to the technical field of deep learning image processing.
Background
Photographs of the same scene look different under different camera settings, illumination environments, and acquisition devices. For example, images captured during underwater surveys are blurred and dark because of light attenuation and the water environment; they fail to reflect real underwater conditions and thus affect research results. When shooting indoor photos or brochures, differences in the color temperature of artificial light sources give the whole image a warm or cold tint, so the publicity picture deviates from reality. Likewise, pictures taken in rainy weather tend to be blurred and dim. A color correction method that adapts to a variety of tasks is therefore needed.
Color correction is an important technical problem in the image field. Conventional color correction mainly relies on static, assumption-based processing. For example, histogram equalization can be used for insufficient illumination, and the gray-world algorithm or the specular-reflection algorithm for a color-shifted light source. However, these algorithms are isolated from one another; they often have to be chained in a particular order, and in combination they constrain each other and degrade the correction result. Moreover, each rests on specific assumptions: the gray-world algorithm assumes that, for an RGB image with rich color variation, the averages of the three channels tend toward the same gray value, which clearly fails for pictures dominated by a single color.
In conclusion, researching new color correction technology is important: existing color-cast correction techniques remain inadequate and still fall short of practical accuracy requirements.
Disclosure of Invention
The invention aims to overcome the defects of existing image color correction technology and provides an image color correction method based on deep learning, which comprises the following steps:
step 1: image data is acquired.
The method comprises the following specific steps:
Step 1.1: Photograph the target scene with an image acquisition device. Each scene should be captured both in a standard lighting environment and in non-standard lighting environments. The standard lighting environment means shooting under sufficient illumination from a D65 light source; non-standard lighting environments include, but are not limited to, lighting that is too dark or too bright, over- or under-exposure, and light sources of other color temperatures.
Step 1.2: Annotate the collected pictures. For images of the same scene, mark which were shot in the standard illumination environment and which in non-standard environments, and build paired datasets stored in a database.
Step 2: data enhancement and data pre-processing are performed on the acquired image dataset.
The method comprises the following specific steps:
Step 2.1: Down-sample the acquired images around the picture center to obtain images of a fixed pixel size.
Step 2.2: Apply mirror flipping, rotation, distortion, and random cropping to the down-sampled images for data enhancement.
Further, the data enhancement in step 2.2 proceeds as follows:
Step 2.2.1: Flip the images horizontally and vertically to augment the data.
Step 2.2.2: Perform rotation: rotate the original image clockwise to augment the data.
Step 2.2.3: Perform cropping: crop patches from the original image with a sliding window that performs a random walk, 5-10 crops per image.
Step 2.3: Split the augmented data into a training set and a test set to obtain the final processed dataset.
Step 3: Construct a full convolutional neural network based on the Encoder-Decoder structure.
The method comprises the following specific steps:
the basic convolutional neural network is as shown in fig. 1, and adopts a downsampling-upsampling structure similar to U-net as a whole, and is composed of two basic unit modules, in the downsampling part, the unit modules are composed of two convolutional layers and a pooling layer, and after the convolutional layers are processed, the image is activated through an activation function. In the up-sampling part, the unit module is composed of an up-sampling layer and two convolution layers, and the image is activated by an activation function after being processed by the convolution layers. Unlike the conventional U-net network, each time the activation function is activated, the normalization is performed through the BatchNorm layer. In addition, in order to ensure the flexibility of the network, the times of down sampling and up sampling can be adjusted.
The whole neural network structure is an Encoder-Decoder structure of down-sampling and up-sampling, the number of down-sampling processing modules is consistent with that of up-sampling processing modules, and the output of each down-sampling layer is used as a part of input to be connected to the up-sampling as input for supplementing shallow information of an image.
Optimally, the upsampling process uses bilinear interpolation to upsample, so that the feature map changes gradually rather than drastically.
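A minimal PyTorch sketch of such a basic network follows. Class names, channel widths, and the default depth of 4 are assumptions of this sketch, not specified by the patent; what it does take from step 3 is the two unit-module types, BatchNorm after every activation, adjustable depth, bilinear up-sampling, and skip connections from each down-sampling stage.

```python
import torch
import torch.nn as nn

def unit(c_in, c_out):
    # Two convolutional layers; each activation is followed by BatchNorm (step 3).
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True), nn.BatchNorm2d(c_out),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True), nn.BatchNorm2d(c_out))

class BasicNet(nn.Module):
    """Encoder-Decoder with an adjustable number of down/up-sampling steps.
    The decoder upsamples bilinearly and concatenates the matching encoder
    output, supplementing the image's shallow information."""
    def __init__(self, depth=4, base=32, in_ch=3):
        super().__init__()
        c = [in_ch] + [base * 2 ** i for i in range(depth)]   # e.g. 3, 32, 64, 128, 256
        self.enc = nn.ModuleList(unit(c[i], c[i + 1]) for i in range(depth))
        self.pool = nn.MaxPool2d(2)
        self.mid = unit(c[-1], c[-1])
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.dec = nn.ModuleList(
            unit(c[min(i + 2, depth)] + c[i + 1], c[i + 1])   # upsampled + skip channels
            for i in reversed(range(depth)))
        self.out = nn.Conv2d(c[1], 3, 1)                      # back to an RGB image

    def forward(self, x):
        skips = []
        for block in self.enc:                                # down-sampling path
            x = block(x)
            skips.append(x)
            x = self.pool(x)
        x = self.mid(x)
        for block, skip in zip(self.dec, reversed(skips)):    # up-sampling path
            x = block(torch.cat([self.up(x), skip], dim=1))
        return self.out(x)
```

With the defaults above, the input resolution must be divisible by 2 to the power of depth (e.g., 256 × 256 with depth 4).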
Step 4: Cascade the single-stage full convolutional networks to obtain the final cascaded neural network, and design a corresponding loss function for it.
The method comprises the following specific steps:
Step 4.1: The cascaded convolutional neural network consists of two or more of the basic convolutional neural networks constructed in step 3, as shown in Fig. 2; the output of each basic network serves as the input of the next and propagates downward.
Step 4.2: Because the cascaded network is comparatively deep, the initial image is repeatedly injected into each subsequent stage as an additional input, which mitigates the vanishing-gradient problem by providing a gradient path.
Step 4.3: Unlike an ordinary convolutional neural network, the cascade of several networks requires a dedicated loss function. Each stage outputs a corrected image, and a LOSS value is computed between that output and the standard image to obtain a per-stage LOSS. The LOSS may be MSE-LOSS, SSIM-LOSS, MSSIM-LOSS, etc.; the most suitable one is selected.
After the per-stage LOSS values are obtained, the overall LOSS of the cascaded neural network is formed according to a fixed rule. A linear combination is adopted:

L = Σ_{k=1}^{K} α_k · L_k,  with  Σ_{k=1}^{K} α_k = 1

where L_k is the LOSS value between the output of the k-th network and the standard image, and α_k is the effective coefficient of the k-th stage; the coefficients sum to 1. Multiplying each stage's LOSS by its effective coefficient and summing gives the LOSS value of the whole network.
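A hedged sketch of the cascade and this combined loss, reusing the BasicNet sketch from step 3: re-injecting the original image by channel concatenation is one plausible reading of step 4.2, and the stage count and helper names are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadeNet(nn.Module):
    """K basic networks in series; every stage after the first also receives
    the initial image (step 4.2), and every stage's output is returned so the
    per-stage LOSS values can be computed (step 4.3)."""
    def __init__(self, stages=4):
        super().__init__()
        self.nets = nn.ModuleList(
            [BasicNet(in_ch=3)] + [BasicNet(in_ch=6) for _ in range(stages - 1)])

    def forward(self, x0):
        outputs, x = [], x0
        for i, net in enumerate(self.nets):
            inp = x if i == 0 else torch.cat([x, x0], dim=1)  # re-inject the original
            x = net(inp)
            outputs.append(x)
        return outputs

def cascade_loss(outputs, standard, alphas, base_loss=F.mse_loss):
    """Linear combination L = sum_k alpha_k * L_k with sum(alphas) == 1."""
    assert abs(sum(alphas) - 1.0) < 1e-6
    return sum(a * base_loss(o, standard) for a, o in zip(alphas, outputs))
```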
Step 5: Use the preprocessed image dataset to train the cascaded neural network, obtaining the trained neural network models.
The method comprises the following specific steps:
Step 5.1: Initialize the parameters of each layer before training the neural network. An initialization method such as random or Xavier initialization may be used to assign the initial values of each layer.
Step 5.2: Before training, divide the training set into training and validation folds. K-fold cross-validation may be used for the division.
Step 5.3: Feed the divided training set into the cascaded convolutional neural network to obtain the corrected image outputs, and compute the corresponding LOSS between those outputs and the standard images.
Step 5.4: Use an optimizer to optimize the network parameters, back-propagating the obtained LOSS to update them.
Step 5.5: After the parameters are updated, fix them and feed the validation data through the network to obtain the validation LOSS; if this LOSS does not decrease for a certain number of consecutive rounds, terminate training.
Step 5.6: Adjust the learning rate according to the LOSS obtained in step 5.5. The strategy: if the LOSS does not decrease for a certain number of consecutive rounds, set the learning rate to k times its original value.
Step 5.7: Return to step 5.2 and iterate until the exit condition is met or the preset upper limit on training rounds is reached, then terminate training.
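A minimal sketch of this training loop under the same assumptions, using the cascade_loss helper sketched under step 4; the patience values, learning rate, and epoch cap are placeholders, not the patent's figures.

```python
import torch

def train(model, train_loader, val_loader, alphas,
          max_epochs=200, patience_stop=15, patience_lr=5, lr_factor=0.1):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)         # step 5.4: optimizer
    best, stall = float('inf'), 0
    for epoch in range(max_epochs):                             # step 5.7: iterate to the cap
        model.train()
        for cast, standard in train_loader:                     # paired color-cast / standard
            opt.zero_grad()
            loss = cascade_loss(model(cast), standard, alphas)  # step 5.3
            loss.backward()                                     # step 5.4: back-propagate
            opt.step()
        model.eval()
        with torch.no_grad():                                   # step 5.5: parameters fixed
            val = sum(cascade_loss(model(c), s, alphas).item()
                      for c, s in val_loader) / len(val_loader)
        if val < best - 1e-6:
            best, stall = val, 0
        else:
            stall += 1
            if stall % patience_lr == 0:                        # step 5.6: decay LR by factor k
                for g in opt.param_groups:
                    g['lr'] *= lr_factor
            if stall >= patience_stop:                          # step 5.5: exit condition
                return model
    return model
```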
Step 6: Feed the collected color-cast images into the trained model for real-time color-cast correction, obtaining corrected images free of color cast.
The method comprises the following specific steps:
Step 6.1: Crop pictures acquired in real time, or images uploaded by clients over the network, to the fixed size used when cropping the dataset during training.
Step 6.2: Feed the cropped picture into the trained cascaded convolutional neural network to obtain the corrected image, and return it to the image correction system to complete the correction.
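A short inference sketch under the same assumptions; the fixed crop size mirrors the 256 × 256 value used in the embodiment below, and the function name is illustrative.

```python
import torch

@torch.no_grad()
def correct_image(model, image, size=256):
    # image: a 3xHxW tensor in [0, 1]; crop to the training size (step 6.1).
    model.eval()
    h, w = image.shape[-2:]
    top, left = (h - size) // 2, (w - size) // 2
    patch = image[..., top:top + size, left:left + size]
    outputs = model(patch.unsqueeze(0))        # one output per cascade stage
    return outputs[-1].squeeze(0).clamp(0, 1)  # the last stage's corrected image
```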
Advantageous effects
Compared with the prior art, the method requires no prior assumptions, can remove the influence of various factors on the color of the original picture, and applies to color correction tasks in a variety of scenes. On this basis, an automatic color correction system can be built, further improving correction efficiency.
Drawings
FIG. 1 is a diagram of a basic fully convolutional neural network architecture;
FIG. 2 is a diagram of a cascaded neural network architecture;
fig. 3 is a flow chart of image color cast correction.
Detailed description of the invention
The method of the present invention is further illustrated by the following examples in conjunction with the accompanying drawings.
Examples
Step 1: Select an indoor scene; photograph it with an image acquisition device to collect normal and color-cast image data, and store the data in a database after preliminary processing.
Step 1.1: Collect images with an optical acquisition device. During collection, photograph the indoor target scene under a standard D65 light source, then vary the hue of the background light source (e.g., switching to incandescent or warm-light lamps), its brightness, and the exposure value to obtain the photographed images.
Step 1.2: Annotate the images obtained in step 1.1: mark indoor images shot under the standard light source as True and those shot under any non-standard light source as False. Create the corresponding fields in a database, store the images in BLOB format with the flag bits stored as Booleans (True for standard images, False otherwise), and thereby construct the indoor image database.
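One possible table layout for this pairing, sketched with Python's sqlite3; the database engine, table, and column names are illustrative assumptions, since the patent only specifies BLOB image storage and Boolean flags.

```python
import sqlite3

# Hypothetical schema for the paired indoor-image database of step 1.2.
conn = sqlite3.connect("indoor_images.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS images (
        id       INTEGER PRIMARY KEY,
        scene_id INTEGER NOT NULL,   -- groups shots of the same scene into pairs
        data     BLOB    NOT NULL,   -- the picture itself, stored as a BLOB
        standard BOOLEAN NOT NULL    -- True: standard D65 shot; False: color cast
    )""")
conn.commit()
```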
Step 2: Perform data enhancement and preprocessing on the acquired dataset.
Step 2.1: Read the indoor image data from the database, converting the BLOB field to PNG format. Then down-sample each image around the picture center to a 256 × 256 image.
Step 2.2: Apply mirror flipping, rotation, distortion, and random cropping to the images from step 2.1 for data enhancement.
Step 2.2.1: Flip the images horizontally and vertically to augment the data.
Step 2.2.2: Rotate the original image clockwise by 45°, 90°, 135°, 180°, 225°, 270°, and 315° to augment the data.
Step 2.2.3: Crop patches with a sliding window performing a random walk over the original image, 10 crops per image.
Step 2.3: Split the augmented data into training and test sets at a ratio of 8:2 to obtain the final processed image dataset.
Step 3: Construct the basic full convolutional neural network based on the Encoder-Decoder structure.
The basic convolutional neural network, shown in Fig. 1, adopts a down-sampling/up-sampling Encoder-Decoder structure built from two basic unit modules. In the down-sampling path, the processing unit consists of two convolutional layers and a pooling layer, with a ReLU activation after each convolutional layer. In the up-sampling path, the processing unit consists of an up-sampling layer and two convolutional layers, again with ReLU activations after the convolutional layers.
The overall network is a symmetric Encoder-Decoder with matching numbers of down-sampling and up-sampling modules: 6 of each. In addition, the output of every down-sampling unit is concatenated into the corresponding up-sampling unit as part of its input, supplementing the image's shallow information for finer processing; there are 6 such connections in total.
Up-sampling uses bilinear interpolation so that the feature map changes gradually rather than abruptly.
Step 4: Cascade the single-stage full convolutional networks to obtain the final cascaded neural network, and design a corresponding loss function for it.
Step 4.1: The cascaded convolutional neural network, shown in Fig. 2, consists of 4 of the basic convolutional networks described above; the output of each basic network serves as the input of the next and propagates downward.
Step 4.2: Since the cascaded network is relatively deep, the initial image is repeatedly injected into each subsequent stage as input to mitigate the vanishing-gradient problem.
Step 4.3: Unlike a conventional convolutional neural network, the cascade of several networks requires a dedicated loss function. Each stage outputs its corrected image, and MSE-LOSS is computed between it and the standard image to obtain each stage's LOSS value. The MSE-LOSS formula is:

MSE = (1 / (m · n)) · Σ_{i=1}^{m} Σ_{j=1}^{n} (A_{i,j} − B_{i,j})²

where m and n are the length and width of the image, A and B are the corrected image and the standard image respectively, and i, j are the row and column coordinates in the image.
After each stage's MSE-LOSS value is obtained, the overall LOSS value of the cascaded network is formed according to a fixed rule. A linear combination is adopted:

L = Σ_{k=1}^{4} α_k · L_k,  with  Σ_{k=1}^{4} α_k = 1

where L_k is the MSE-LOSS value between the output of the k-th stage and the standard image, and α_k is the effective coefficient of the k-th stage; the coefficients sum to 1. Since there are only 4 stages, and each stage should produce a more accurate image than the last, the weights increase stage by stage: α_1 = 0.1, α_2 = 0.2, α_3 = 0.3, α_4 = 0.4. Multiplying each stage's MSE-LOSS by its effective coefficient and summing yields the LOSS value of the whole network.
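In terms of the cascade_loss sketch given under step 4 above, the embodiment's weighting would read as follows; this is a hedged usage example, and cast_batch and standard_batch are placeholder tensors, not names from the patent.

```python
alphas = [0.1, 0.2, 0.3, 0.4]                  # alpha_1..alpha_4, summing to 1
model = CascadeNet(stages=4)
outputs = model(cast_batch)                    # cast_batch: color-cast input batch
total = cascade_loss(outputs, standard_batch, alphas)  # weighted per-stage MSE-LOSS
```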
Step 5: Use the preprocessed dataset to train the cascaded neural network and obtain the trained neural network models.
Step 5.1: Before training, initialize the parameters of each layer; here, Xavier initialization is used to assign values to every layer of the network.
Step 5.2: Before training, divide the training set into training and validation folds using ten-fold cross-validation.
Step 5.3: Feed the divided training set into the cascaded convolutional network to obtain the corrected images, and compute the corresponding LOSS against the standard images.
Step 5.4: Optimize the network parameters with an Adam optimizer, back-propagating the obtained LOSS to update them.
Step 5.5: After the parameters are updated, fix them and feed the validation data through the network to obtain the validation LOSS; if this LOSS does not decrease for 15 consecutive rounds, terminate training.
Step 5.6: Adjust the learning rate according to the LOSS from step 5.5. The strategy: if the LOSS does not decrease for 5 consecutive rounds, set the learning rate to 0.1 times its original value.
Step 5.7: Return to step 5.2 and iterate until the exit condition is met or the preset upper limit on training rounds is reached, then terminate training.
Step 6: Feed the collected indoor image data into the trained model for real-time color-cast correction, obtaining corrected indoor images free of color cast.
Step 6.1: Crop pictures acquired in real time, or indoor images uploaded by clients over the network, to 256 × 256.
Step 6.2: Feed the cropped picture into the trained cascaded convolutional network to obtain the corrected indoor image, and return it to the web page or client to complete the correction.
While the foregoing describes preferred embodiments of the invention, the invention is not limited to the embodiments and drawings disclosed herein. Equivalents and modifications that do not depart from the spirit of the disclosure are intended to fall within the scope of the invention.

Claims (10)

1. An image color correction method based on deep learning is characterized by comprising the following steps:
step 1: acquiring image data;
step 2: performing data enhancement and data preprocessing on the acquired image data set;
step 3: constructing a full convolutional neural network based on an Encoder-Decoder structure, which comprises the following specific steps:
the basic convolutional neural network integrally adopts a down-sampling-up-sampling structure and consists of two basic unit modules, wherein in a down-sampling part, the unit modules consist of two convolutional layers and a pooling layer, and an image is activated through an activation function after being processed by the convolutional layers; in the up-sampling part, the unit module consists of an up-sampling layer and two convolution layers, and an image is activated through an activation function after being processed by the convolution layers; after each activation by the activation function, normalization is carried out by a BatchNorm layer;
in order to ensure the flexibility of the network, the times of down sampling and up sampling can be adjusted;
the overall neural network structure is an Encoder-Decoder structure of down-sampling-up-sampling, the number of down-sampling processing modules is consistent with that of up-sampling processing modules, and the output of each down-sampling layer is used as a part of input to be connected to the up-sampling as input for supplementing the shallow information of the image;
step 4: cascading the single-stage full convolutional neural networks to obtain a final cascaded neural network, and designing a corresponding loss function for the cascaded neural network, specifically as follows:
step 4.1: the cascade convolution neural network is composed of more than 2 basic convolution neural networks constructed in the step 3, and the output of each basic convolution neural network is used as the input of the next basic convolution neural network and is propagated downwards;
step 4.2: repeatedly adding the initial image into each subsequent stage of the cascade as input;
step 4.3: designing a specific loss function;
each level of convolutional neural network outputs the corrected image, and LOSS value calculation is carried out on the corrected image and the standard image to obtain each level of LOSS value;
after the LOSS value of each stage is obtained, the overall LOSS value of the cascaded neural network is formed as a linear combination:

L = Σ_{k=1}^{K} α_k · L_k,  with  Σ_{k=1}^{K} α_k = 1

wherein L_k is the LOSS value between the output of the k-th network and the standard image, and α_k is the effective coefficient of the k-th stage, the coefficients summing to 1; multiplying each stage's LOSS by its effective coefficient and summing gives the LOSS value of the whole network;
step 5: using the preprocessed image data set for cascaded neural network training to obtain a trained cascaded neural network model;
step 6: and putting the collected color cast image into a trained model for real-time color cast correction to obtain a corrected image without color cast.
2. The image color correction method based on deep learning as claimed in claim 1, wherein the specific implementation method of step 1 is as follows:
step 1.1: photographing a target scene with an image acquisition device; for the same scene, pictures should be taken both in a standard lighting environment and in non-standard lighting environments, wherein the standard lighting environment means shooting under sufficient illumination from a D65 light source, and non-standard lighting environments include, but are not limited to, lighting that is too dark or too bright, over- or under-exposure, and light sources of other color temperatures;
step 1.2: marking the collected pictures; for images shot in the same scene, which images are shot in a standard illumination environment and which images are shot in a non-standard illumination environment are marked respectively, and paired data sets are constructed and stored in a database.
3. The image color correction method based on deep learning of claim 1, wherein the step 2 is implemented as follows:
step 2.1: down-sampling the obtained image data around the picture center to obtain an image of a fixed pixel size;
step 2.2: and carrying out mirror image inversion, rotation, distortion and random cutting on the down-sampled image data to enhance the data.
4. The image color correction method based on deep learning of claim 3, characterized in that the specific implementation method of step 2.2 is as follows:
step 2.2.1: carrying out horizontal and vertical mirror image inversion on the image data, and carrying out data amplification;
step 2.2.2: performing a rotation operation: rotating the original image clockwise and performing data amplification;
step 2.2.3: cutting operation is carried out; clipping is carried out by random walk on the original image in a sliding window mode;
step 2.3: and carrying out division of a training set and a test set on the amplified data to obtain a finally processed data set.
5. The method as claimed in claim 1, wherein in step 3, the upsampling process performs upsampling by bilinear interpolation.
6. The method as claimed in claim 1, wherein in step 4.3, the LOSS is selected from one of MSE-LOSS, SSIM-LOSS and MSSIM-LOSS.
7. The image color correction method based on deep learning of claim 1, wherein the step 5 is implemented as follows:
step 5.1: initializing parameters of each layer in the network before training the neural network;
step 5.2: before training, dividing a training set into a training case and a verification case; here, a K-fold cross validation mode can be adopted to divide the training set;
step 5.3: inputting the divided training set into the cascaded convolutional neural network to obtain the corrected image outputs, and computing the corresponding LOSS between these outputs and the standard images;
step 5.4: optimizing each parameter of the neural network by adopting an optimizer, and performing back propagation on the obtained LOSS to update each parameter of the network;
step 5.5: after the parameters are updated, fixing the network parameters, inputting the verification set data into the neural network for verification to obtain the LOSS value on the verification set, if the LOSS value does not decrease for a certain number of continuous rounds, meeting the exit condition, and terminating the training of the neural network;
step 5.6: adjusting the learning rate of the neural network according to the LOSS value obtained in the step 5.5; the adjustment strategy is as follows: if the LOSS value is not reduced for a certain number of continuous rounds, the learning rate is adjusted to be k times of the original rate;
step 5.7: and returning to the step 5.2 for loop iteration until the exit condition is met or the preset upper limit of the training times is reached, and then terminating the training.
8. The image color correction method based on deep learning of claim 7, wherein in step 5.1, the method for performing initialization and assignment on the network layer parameters includes a random method and a Xavier initialization method.
9. The image color correction method based on deep learning of claim 7, wherein the step 6 is implemented as follows:
step 6.1: clipping the image to a fixed size obtained by clipping the data set during training;
step 6.2: and inputting the cut picture into a trained cascade convolution neural network to obtain a corrected image, and finishing correction.
10. The image color correction method based on deep learning of claim 1, wherein in step 4.3, MSE-LOSS is calculated between the corrected image and the standard image to obtain the LOSS value of each stage, the MSE-LOSS formula being:

MSE = (1 / (m · n)) · Σ_{i=1}^{m} Σ_{j=1}^{n} (A_{i,j} − B_{i,j})²

where m and n respectively represent the length and width of the image, A and B respectively represent the corrected image and the standard image, and i, j respectively represent the row and column coordinates in the image.
CN202011471881.7A 2020-12-14 2020-12-14 Image color correction method based on deep learning Active CN112581373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011471881.7A CN112581373B (en) 2020-12-14 2020-12-14 Image color correction method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011471881.7A CN112581373B (en) 2020-12-14 2020-12-14 Image color correction method based on deep learning

Publications (2)

Publication Number Publication Date
CN112581373A CN112581373A (en) 2021-03-30
CN112581373B (en) 2022-06-10

Family

ID=75136217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011471881.7A Active CN112581373B (en) 2020-12-14 2020-12-14 Image color correction method based on deep learning

Country Status (1)

Country Link
CN (1) CN112581373B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658072B (en) * 2021-08-16 2023-08-08 福州大学 Underwater image enhancement method based on progressive feedback network
CN114463196B (en) * 2021-12-28 2023-07-25 浙江大学嘉兴研究院 Image correction method based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110400275A (en) * 2019-07-22 2019-11-01 中电健康云科技有限公司 One kind being based on full convolutional neural networks and the pyramidal color calibration method of feature
CN111275721A (en) * 2020-02-14 2020-06-12 北京推想科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN111738948A (en) * 2020-06-19 2020-10-02 大连理工大学 Underwater image enhancement method based on double U-nets

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10402943B2 (en) * 2016-10-20 2019-09-03 Htc Corporation Image enhancement device and method for convolutional network apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110400275A (en) * 2019-07-22 2019-11-01 中电健康云科技有限公司 One kind being based on full convolutional neural networks and the pyramidal color calibration method of feature
CN111275721A (en) * 2020-02-14 2020-06-12 北京推想科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN111738948A (en) * 2020-06-19 2020-10-02 大连理工大学 Underwater image enhancement method based on double U-nets

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Kwang-Hyun Uhm et al. W-Net: Two-stage U-Net with misaligned data for raw-to-RGB mapping. arXiv, 2019. *

Also Published As

Publication number Publication date
CN112581373A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN112288658B (en) Underwater image enhancement method based on multi-residual joint learning
CN111968044B (en) Low-illumination image enhancement method based on Retinex and deep learning
CN111311629B (en) Image processing method, image processing device and equipment
CN111915531B (en) Neural network image defogging method based on multi-level feature fusion and attention guidance
CN110619593B (en) Double-exposure video imaging system based on dynamic scene
CN112581373B (en) Image color correction method based on deep learning
CN109712083A (en) A kind of single image to the fog method based on convolutional neural networks
CN110544213B (en) Image defogging method based on global and local feature fusion
CN110378848B (en) Image defogging method based on derivative map fusion strategy
CN115223004A (en) Method for generating confrontation network image enhancement based on improved multi-scale fusion
CN110097106A (en) The low-light-level imaging algorithm and device of U-net network based on deep learning
CN110288550A (en) The single image defogging method of confrontation network is generated based on priori knowledge guiding conditions
CN107292830A (en) Low-light (level) image enhaucament and evaluation method
CN110807742A (en) Low-light-level image enhancement method based on integrated network
CN111553856B (en) Image defogging method based on depth estimation assistance
CN115187472A (en) Dark channel prior defogging method based on tolerance
CN107295261A (en) Image defogging processing method, device, storage medium and mobile terminal
CN116137023A (en) Low-illumination image enhancement method based on background modeling and detail enhancement
CN116229081A (en) Unmanned aerial vehicle panoramic image denoising method based on attention mechanism
CN116385293A (en) Foggy-day self-adaptive target detection method based on convolutional neural network
CN116229404A (en) Image defogging optimization method based on distance sensor
US20230186446A1 (en) Image processing methods and systems for low-light image enhancement using machine learning models
CN113191970B (en) Orthogonal color transfer network and method
CN113012067B (en) Retinex theory and end-to-end depth network-based underwater image restoration method
CN113658068A (en) Deep learning-based denoising enhancement system and method for CMOS camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant