CN107507250A - A kind of complexion tongue color image color correction method based on convolutional neural networks - Google Patents
- Publication number
- CN107507250A (application CN201710406983.2A)
- Authority
- CN
- China
- Prior art keywords
- layer
- color
- image
- feature map
- color correction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000012937 correction Methods 0.000 title claims abstract description 82
- 238000000034 method Methods 0.000 title claims abstract description 39
- 238000013527 convolutional neural network Methods 0.000 title claims abstract description 36
- 238000012549 training Methods 0.000 claims abstract description 59
- 230000000694 effects Effects 0.000 claims abstract description 10
- 238000013480 data collection Methods 0.000 claims abstract description 5
- 238000010276 construction Methods 0.000 claims abstract description 4
- 230000006870 function Effects 0.000 claims description 23
- 230000009466 transformation Effects 0.000 claims description 23
- 238000010606 normalization Methods 0.000 claims description 16
- 230000004913 activation Effects 0.000 claims description 13
- 230000008569 process Effects 0.000 claims description 13
- 239000011159 matrix material Substances 0.000 claims description 9
- 238000013528 artificial neural network Methods 0.000 claims description 8
- 238000013507 mapping Methods 0.000 claims description 5
- 238000004364 calculation method Methods 0.000 claims description 4
- 238000012545 processing Methods 0.000 claims description 4
- 238000004880 explosion Methods 0.000 claims description 3
- 238000000605 extraction Methods 0.000 claims description 3
- 230000007613 environmental effect Effects 0.000 claims description 2
- 239000000284 extract Substances 0.000 claims description 2
- 238000001914 filtration Methods 0.000 claims description 2
- 238000011156 evaluation Methods 0.000 abstract description 22
- 230000003287 optical effect Effects 0.000 abstract description 6
- 230000019771 cognition Effects 0.000 abstract description 2
- 230000001815 facial effect Effects 0.000 abstract description 2
- 238000003672 processing method Methods 0.000 abstract description 2
- 239000003086 colorant Substances 0.000 description 4
- 238000012360 testing method Methods 0.000 description 4
- 238000004737 colorimetric analysis Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 238000010238 partial least squares regression Methods 0.000 description 2
- 238000012795 verification Methods 0.000 description 2
- 230000000007 visual effect Effects 0.000 description 2
- 230000004075 alteration Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 239000006185 dispersion Substances 0.000 description 1
- 239000003814 drug Substances 0.000 description 1
- 230000005284 excitation Effects 0.000 description 1
- 238000013213 extrapolation Methods 0.000 description 1
- 238000010191 image analysis Methods 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 238000013178 mathematical model Methods 0.000 description 1
- 230000035479 physiological effects, processes and functions Effects 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 238000007639 printing Methods 0.000 description 1
- 238000013441 quality evaluation Methods 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 230000006641 stabilisation Effects 0.000 description 1
- 238000011105 stabilization Methods 0.000 description 1
- 238000000844 transformation Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Abstract
A color correction method for facial complexion and tongue color images based on a convolutional neural network, relating to digital image processing methods. The algorithm comprises an offline part and an online part. The offline part consists of training data collection and the construction and training of a color correction convolutional neural network; the online part performs image color correction and evaluates the correction effect. A CNN imitates the human cognitive process, i.e., layer-by-layer abstraction from local features to global features. Applying a convolutional neural network to color correction yields a near-ideal color reproduction effect. By means of a deep convolutional neural network, the present invention performs color correction on facial complexion and tongue color images collected in a stable optical environment, reproducing the color information actually presented by the target object in the same optical environment.
Description
Technical Field

The invention relates to digital image processing methods, and in particular to a color correction method for facial complexion and tongue color images based on a convolutional neural network.

Background Art

The capture and reproduction of true colors is of great value in fields such as medicine and art. The color information of an image is an important basis for certain kinds of professional image analysis. The color presented by the surface of an object is closely related to light source characteristics, illumination conditions, and the acquisition, display, and printing equipment involved. Color correction is the key technology for color reproduction and consistent color presentation. At present, color correction has been applied in many image processing fields, such as medical images, mural images, and identification photographs. Research into color correction techniques that can truly reflect the colors of the observed object itself is therefore of great significance.

Images obtained by a camera exhibit color distortion compared with the actual scene. Many image color correction methods have therefore been proposed, such as polynomial regression, color correction based on partial least squares regression, and neural networks. Polynomial regression color correction requires few training samples and has low computational complexity, but its regression accuracy depends strongly on the choice of training samples and polynomial, and its extrapolation ability is poor. Color correction based on partial least squares regression handles problems such as multicollinearity among independent variables and relatively small sample sizes fairly well, but its accuracy still falls short of practical medical requirements. The training of traditional neural-network-based color correction methods is limited by the number of network layers, the choice of initialization parameters, and the parallelization scheme, so the generalization performance of such networks commonly suffers from overfitting.

In recent years deep learning has been widely applied; the convolutional neural network (CNN) is a typical deep feedforward network. A CNN imitates the human cognitive process, i.e., layer-by-layer abstraction from local features to global features. Applying convolutional neural networks to color correction can achieve a near-ideal color reproduction effect.
Summary of the Invention

The purpose of the present invention is to perform color correction, by means of a deep convolutional neural network, on facial complexion and tongue color images collected in a stable optical environment, and to reproduce the color information actually presented by the target object in the same optical environment.

The present invention is realized by the following technical means:

A color correction method for facial complexion and tongue color images based on a convolutional neural network; the overall flow is shown in Fig. 1. The algorithm comprises an offline part and an online part. The offline part consists of training data collection and the construction and training of the color correction convolutional neural network; the online part performs image color correction and evaluates the correction effect.

The offline part comprises the following:
(1) Training data collection

Image acquisition in the present invention uses a closed environment, an artificial dark box, to avoid the influence of external stray light, with artificial illumination to guarantee the quality and stability of tongue image acquisition. The relative positions of the light source and the imaging equipment are fixed, so that the acquisition environment is consistent and standardized. Inside the dark box an artificial D65 light source simulates natural light, effectively ensuring stable lighting conditions.
Unlike conventional methods, this method uses the ColorChecker Digital SG as the color chart for color correction. The ColorChecker Digital SG has 140 color patches and a wider color gamut than the traditional ColorChecker Classic. In addition, patches whose colors are close to skin and tongue colors are added to the training samples, which helps improve the accuracy of color correction. The ColorChecker Digital SG standard chart is photographed under the closed environmental conditions. Chart images are captured while varying the shooting angle of the chart and adjusting its distance to the light source and to the camera; these images are used to generate the training data for the convolutional neural network color correction model. The captured images are processed by cropping out each color patch; every patch is set to a fixed-size format as a training sample, and RGB images generated from the standard value of each chart patch serve as the labels of the training data, with training samples and labels in one-to-one correspondence.
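The sample/label pairing described above can be sketched as follows (NumPy; the 180×180 patch size follows the preprocessing step later in this document, and the example standard RGB value is hypothetical, not taken from the chart data):

```python
import numpy as np

def make_label_patch(rgb, size=180):
    """Generate a constant-color RGB label image from a patch's standard value."""
    return np.full((size, size, 3), rgb, dtype=np.uint8)

# Hypothetical standard value for one chart patch (illustration only).
label = make_label_patch((168, 92, 72))
```

Each cropped photograph patch would be stored alongside a label image like this one, under matching file names.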
(2) Construction and training of the color correction convolutional neural network

In the present invention a neural network fits the relationship between true colors and the colors of photographed images. Since what must be learned is relatively simple, the network is designed as a shallow deep neural network with 5 layers. Three different layer structures are used, as shown in Fig. 2: the input layer, the nonlinear transformation layers, and the output layer. The input layer consists of a convolutional layer and a rectified linear unit (ReLU); the nonlinear transformation stage consists of a 3-layer network, each layer composed of a convolutional layer and a ReLU activation function with batch normalization between them; the output layer consists of a single convolutional layer.
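A minimal sketch of the five-layer forward pass just described (plain NumPy with a naive convolution; the filter count is parameterized and kept small here for speed, whereas the invention uses 64 filters per hidden layer, and learnable batch-normalization parameters are omitted):

```python
import numpy as np

def conv2d(x, w, b, pad=1):
    """Naive 'same' convolution. x: (C, H, W); w: (D, C, k, k); b: (D,)."""
    D, C, k, _ = w.shape
    H, W = x.shape[1], x.shape[2]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((D, H, W))
    for d in range(D):
        for i in range(H):
            for j in range(W):
                out[d, i, j] = np.sum(xp[:, i:i + k, j:j + k] * w[d]) + b[d]
    return out

def relu(x):
    return np.maximum(0.0, x)

def batch_norm(x, eps=1e-5):
    """Per-channel normalization (scale/shift parameters omitted)."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def forward(x, params):
    """Input layer -> 3 nonlinear transformation layers -> output layer."""
    w, b = params[0]
    x = relu(conv2d(x, w, b))                  # input layer: conv + ReLU
    for w, b in params[1:4]:
        x = relu(batch_norm(conv2d(x, w, b)))  # conv + BN + ReLU
    w, b = params[4]
    return conv2d(x, w, b)                     # output layer: conv only

rng = np.random.default_rng(0)
D = 8  # hidden filter count (64 in the invention)
shapes = [(D, 3, 3, 3)] + [(D, D, 3, 3)] * 3 + [(3, D, 3, 3)]
params = [(0.1 * rng.standard_normal(s), np.zeros(s[0])) for s in shapes]
y = forward(rng.standard_normal((3, 8, 8)), params)  # tiny 3x8x8 test input
```

With kernel 3, stride 1, and padding 1, the spatial size is preserved end to end, so a 3-channel input comes back out as a 3-channel map of the same size.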
During training, the present invention uses stochastic gradient descent with mini-batches to iterate and update the convolution kernel states W and biases B; each step operates on a mini-batch data set, and the stochastic gradient descent algorithm searches for the global optimum.
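The mini-batch stochastic gradient descent update of W and B can be illustrated on a toy least-squares problem (a sketch of the update rule only, not the invention's actual training code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (256, 1))
Y = 2.0 * X + 0.3                     # ground-truth mapping to recover

W = np.zeros((1, 1))
B = np.zeros(1)
lr, batch = 0.1, 32                   # learning rate and mini-batch size

for epoch in range(100):
    order = rng.permutation(len(X))   # reshuffle the data each epoch
    for s in range(0, len(X), batch):
        idx = order[s:s + batch]
        err = X[idx] @ W + B - Y[idx]            # mini-batch prediction error
        W -= lr * X[idx].T @ err / len(idx)      # gradient of 0.5*mean(err^2)
        B -= lr * err.mean(axis=0)
```

The same pattern, applied to the convolution kernels and biases of every layer via backpropagated gradients, is what the network training performs.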
In CNN image processing, convolutional layers are connected through convolution filters. A convolution filter is defined as W×H×C×D, where C is the number of channels of the filtered image, W and H are the width and height of the filter window, and D is the number of filter types.
The input layer of the network contains one convolutional layer and a ReLU activation function. Feature extraction in the input layer is expressed as:

F1(X1) = max(0, W1 * X1 + B1) (1)
where X1 is the feature map entering the input layer, and W1 and B1 are the convolution filters and bias of the input layer. W1 has size 3×3×3×64, representing 64 different convolution filters, each with kernel size 3×3×3; F1(X1) is the feature map produced by the input layer. The input image is a 3×40×40 feature map, i.e., a 3-channel color image whose width w and height h are both 40. The width w1 and height h1 of the feature map output by a convolutional layer are given by formulas (2) and (3):

w1 = (w − kernel + 2·pad)/stride + 1 (2)

h1 = (h − kernel + 2·pad)/stride + 1 (3)

where kernel is the convolution kernel size; stride is the step of the kernel (a value of 1 extracts overlapping image patches and gives better results); and pad is the number of zero-padded border pixels. In the present invention kernel is set to 3, stride to 1, and pad to 1. The input image therefore yields a 64×40×40 feature map after the 64 3×3 convolution kernels of the input layer; the feature map then passes through the rectified linear unit. ReLU, expressed as max(0, X), extracts useful feature maps. The final output is again a 64×40×40 feature map.
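Formulas (2) and (3) reduce to a one-line helper; with the settings above (kernel 3, stride 1, pad 1) the 40×40 spatial size is preserved:

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Output width/height of a convolutional layer, per formulas (2)/(3)."""
    return (size - kernel + 2 * pad) // stride + 1

same = conv_out(40)                     # 40: size preserved
shrunk = conv_out(40, kernel=5, pad=0)  # 36: no padding shrinks the map
```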
In the nonlinear mapping process of the nonlinear transformation stage, the convolutional layers, batch normalization, and ReLU functions form the second, third, and fourth layers. Each stage of the nonlinear transformation is expressed as:

Fi(Xi) = max(0, Wi * Fi−1(Xi−1) + Bi), i = 2, 3, 4 (4)
In formula (4), i denotes the i-th layer, and Xi is the output of layer i−1, i.e., Fi−1(Xi−1). Wi and Bi are the convolution filters and biases of the nonlinear transformation stage; the convolution filter W1 has size 3×3×3×64, while the filters Wi of the 2nd, 3rd, and 4th convolutional layers have size 64×3×3×64, each convolution kernel being of size 64×3×3. The 64×40×40 feature map output by the input layer enters the second convolutional layer and, after 64 3×3 convolution kernels, again yields a 64×40×40 feature map. This feature map then enters batch normalization. Batch normalization, placed between the convolutional layer and the ReLU activation function, resolves untrainable situations such as slow convergence and gradient explosion during neural network training; it also speeds up network training and improves model accuracy. Finally, the feature map passes through the rectified linear unit, which increases the nonlinearity of the features. The 64×40×40 feature map output by the second layer then passes through the third and fourth layers, which share the second layer's structure, finally yielding a 64×40×40 feature map.
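The normalization step can be sketched as follows (a minimal per-channel version; the learnable scale and shift parameters of full batch normalization are omitted):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each channel of a (C, H, W) feature map to zero mean, unit variance."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(1)
fmap = 5.0 + 3.0 * rng.standard_normal((64, 40, 40))  # a 64x40x40 feature map
norm = batch_norm(fmap)
```

Whatever the offset and spread of the incoming activations, the normalized map entering the ReLU is centered and unit-scaled, which is what keeps the gradients well-behaved.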
In the output reconstruction process of the output layer, the feature map is fed into an output layer containing only one convolutional layer. Output reconstruction is expressed as:

F5(X5) = W5 * F4(X4) + B5 (5)
where X5 is the output of the fourth layer. W5 and B5 are the convolution filter and bias of the feature reconstruction layer. W5 has size 3×3×64×3: the feature reconstruction layer has 3 convolution filters, acting like mean filters, each with kernel size 3×3×64, which averages the feature maps. F4(X4) is the feature map produced by the nonlinear transformation stage, i.e., X5; after the 3 3×3 convolution kernels, a 3×40×40 feature map is produced.

The collected data set is trained on this network; after more than 50 iterations a model is obtained for each training round, and the model is finally saved to a file.
The online part comprises the following:

(1) Image color correction

The trained model is used to correct the colors of a distorted image, yielding the corrected image. Color chart, face, and tongue images photographed in the darkroom of the present invention are distorted compared with the actual colors, and the convolutional-neural-network-based color correction method is applied to these distorted images. The pixels of the image to be corrected are first read and stored as an image matrix, and the color correction model is then loaded from the trained MAT-format file. The image matrix is fed into the network model, color correction is performed on the R, G, and B channels respectively, and the corrected image is output.

(2) Evaluation of the color correction effect

To verify the validity of the color correction model, the correction effect must be evaluated. Evaluating color correction is a complex problem involving colorimetry, physiology, psychology, and other disciplines. Common evaluation methods are objective evaluation and subjective evaluation.
According to colorimetric theory, the evaluation criteria for color reproduction include reflection spectrum matching, color appearance matching, and tristimulus matching, all of which are objective criteria. Tristimulus matching requires the tristimulus values of the object displayed by the computer to equal those of the corresponding real object's color. If the tristimulus values of the colors in the captured image match those of the corresponding real object, or the color difference is within an allowable range, the quality of color reproduction is good. Tristimulus matching is the most common and most practical color reproduction criterion. The present invention adopts CIE1976 L*a*b* as the evaluation metric.
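Under the CIE1976 L*a*b* metric, the correction error for a color is the Euclidean distance ΔE*ab between the corrected and reference values, which is a one-liner:

```python
import math

def delta_e_1976(lab1, lab2):
    """CIE1976 color difference: Euclidean distance in L*a*b* space."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))

# A 3-4-5 example pair: same lightness, differing only in a* and b*.
d = delta_e_1976((50.0, 10.0, 10.0), (50.0, 13.0, 14.0))
```

Averaging ΔE*ab over all patches of the chart gives a single objective score for a correction model.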
In subjective evaluation, a person directly judges the visual impression produced by a given stimulus. For the subjective evaluation of color correction in the present invention, observers compare the real object with the corrected image and then judge whether the method restores the true colors.
Brief Description of the Drawings

Fig. 1: flow chart of the color correction method based on a convolutional neural network;

Fig. 2: architecture of the color correction convolutional neural network model;

Fig. 3: face and tongue images before and after correction.

Detailed Description
Based on the description above, the specific implementation process of the invention is introduced below.

The offline part is divided into two steps:

Step 1: Training data collection

Image acquisition is the basis of color correction. When acquisition equipment or lighting conditions change, guaranteeing constant color characteristics in the acquired images is the key problem of image acquisition. It involves the design of the image acquisition device, the choice of illumination source, the choice of color space, the establishment of a mathematical model of the system's color characteristics, and many other issues. Standardization of the color correction acquisition environment and procedure is therefore an important foundation of color correction. In general, a darkroom or dark box is the ideal shooting environment; the self-developed dark box used in the present invention avoids interference from external stray light and keeps the lighting environment stable.
Step 1.1: Standardized sample collection

After extensive experiments, and in order to guarantee the quality of image acquisition and avoid external influences, the present invention imposes the following conditions on the color correction acquisition environment:

1) use a closed acquisition environment to prevent stray light from entering the shooting environment and strong light from entering the lens;

2) use a D65 light source as the experimental source, simulating natural light;

3) the D65 source takes 10 minutes to stabilize; turn on the source and collect sample images only after it has stabilized;

4) fix the positions of the light source and the camera, and place the color chart 30-35 cm from the camera;

5) set the camera parameters correctly: for a Canon EOS 1200D, automatic white balance, aperture F10, ISO 3200, and a shutter time of 1/160 s.
Step 1.2: Sample collection

Sample collection follows the conditions set above. In the dark box, a Canon EOS 1200D configured with the above parameters photographs the ColorChecker Digital SG chart; by varying the chart's position and the camera's shooting angle, a large number of chart photographs are obtained. To increase the robustness of color correction, faces and tongues are also photographed under the same lighting, and the resulting face and tongue images are used to verify the correction effect of the convolutional neural network color correction model. Since the ColorChecker Digital SG has published optical standard values, the chart photographs, once processed and placed in one-to-one correspondence with the corresponding standard images, can serve as training samples for the color correction network.
Step 1.3: Sample preprocessing
The ColorChecker Digital SG photographs taken by the camera cannot be used directly as training samples for the convolutional neural network; each color patch must be segmented out. The present invention segments every patch of the chart, each patch being 180×180 pixels. The L*a*b* value of each patch is obtained from the official ColorChecker Digital SG optical data. Each patch's L*a*b* value is converted to its RGB value under a D65 illuminant, and the RGB value is used to generate an RGB image. Likewise, the standard patch RGB images are set to 180×180 pixels, the same size as the patches cropped from the photographs, which simplifies subsequent computation. The standard chart RGB images serve as training labels for the cropped patch images, with a one-to-one correspondence between cropped patches and label names to avoid the large errors that mismatches would cause.
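The L*a*b*-to-RGB conversion of the label values can be sketched with the standard CIE Lab → XYZ → sRGB formulas under the D65 white point (a generic textbook conversion, not the invention's exact code):

```python
def lab_to_srgb(L, a, b):
    """Convert a CIE L*a*b* value (D65 reference white) to an 8-bit sRGB triple."""
    Xn, Yn, Zn = 95.047, 100.0, 108.883      # D65 reference white
    fy = (L + 16) / 116
    fx = fy + a / 500
    fz = fy - b / 200

    def finv(t):                             # inverse of the Lab f() function
        d = 6 / 29
        return t ** 3 if t > d else 3 * d * d * (t - 4 / 29)

    X, Y, Z = Xn * finv(fx) / 100, Yn * finv(fy) / 100, Zn * finv(fz) / 100

    # XYZ -> linear sRGB
    r = 3.2406 * X - 1.5372 * Y - 0.4986 * Z
    g = -0.9689 * X + 1.8758 * Y + 0.0415 * Z
    b_lin = 0.0557 * X - 0.2040 * Y + 1.0570 * Z

    def gamma(c):                            # clamp, then sRGB transfer curve
        c = min(max(c, 0.0), 1.0)
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

    return tuple(int(round(gamma(c) * 255)) for c in (r, g, b_lin))
```

Running every published patch value through such a conversion yields the constant-color 180×180 label images described above.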
To improve the generalization ability and robustness of the convolutional neural network color correction model, the 140 patches cropped from one chart image are stitched together in the patch order of the ColorChecker Digital SG chart, and the stitched image is added to the training data alongside the single-patch images; the standard patch images are stitched in the same order and added to the label data set. The resulting diversity of training sample types helps improve the model's generalization ability.
To further increase sample diversity and enlarge the sample library, the training data set is augmented with seven transformations: up-down flip; rotate left 90°; rotate right 90°; rotate 180°; rotate left 90° then flip; rotate right 90° then flip; and rotate 180° then flip. This expands the training data to eight times its original size, after which a sliding window splits the training samples into 40×40 sub-images. After this series of expansions the training data set reaches 96,000 images. Training the color correction model is essentially learning an input-to-output mapping: the network learns a large number of mappings between inputs and outputs and finally produces the correction model. To verify the model's accuracy, a test set of 1,280 images is prepared in a similar way. The training and test sets are then fed into the convolutional neural network for training.
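The eight-fold augmentation (the original plus the seven flips/rotations) and the 40×40 sliding-window split can be sketched as follows (NumPy; shown here with non-overlapping windows):

```python
import numpy as np

def augment(img):
    """Original plus the seven flip/rotation variants described above."""
    r90l = np.rot90(img, 1)    # rotate left 90
    r90r = np.rot90(img, -1)   # rotate right 90
    r180 = np.rot90(img, 2)    # rotate 180
    return [img, np.flipud(img), r90l, r90r, r180,
            np.flipud(r90l), np.flipud(r90r), np.flipud(r180)]

def sliding_patches(img, size=40):
    """Split an image into non-overlapping size x size patches."""
    h, w = img.shape[:2]
    return [img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]

sample = np.zeros((180, 180, 3), dtype=np.uint8)  # one 180x180 patch image
variants = augment(sample)
patches = sliding_patches(sample)
```

Applying the same transformations to the label images keeps samples and labels aligned.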
Step 2: Building and training the color correction convolutional neural network framework
The present invention uses no pooling layers or fully connected layers; the network consists of an input layer, a nonlinear transformation stage, and an output layer. The input layer is a convolutional layer followed by a ReLU activation function; the nonlinear transformation stage consists of three layers, each a convolutional layer and a ReLU activation function with batch normalization between them; the output layer is a single convolutional layer.
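The five-layer structure described above can be written out as a compact specification and its parameter count checked (a sketch for clarity only; the layer names are illustrative, and the channel counts and 3×3 spatial kernel size are taken from the filter dimensions given below):

```python
# (name, in_channels, out_channels, has_batch_norm, has_relu) per 3x3 conv layer
LAYERS = [
    ("input",      3,  64, False, True),   # conv + ReLU
    ("nonlinear1", 64, 64, True,  True),   # conv + BN + ReLU
    ("nonlinear2", 64, 64, True,  True),
    ("nonlinear3", 64, 64, True,  True),
    ("output",     64, 3,  False, False),  # conv only (feature reconstruction)
]

def conv_params(cin, cout, k=3):
    """Number of weights plus biases in a k x k convolution."""
    return k * k * cin * cout + cout

def total_params(layers):
    """Total learnable parameters, counting BN scale and shift per channel."""
    total = 0
    for _, cin, cout, bn, _ in layers:
        total += conv_params(cin, cout)
        if bn:
            total += 2 * cout  # batch-norm gamma and beta
    return total
```

This tallies 1,792 parameters for the input layer, 36,928 per nonlinear convolution (plus 128 per batch-norm), and 1,731 for the output layer.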
(1) The input layer of the network contains a convolutional layer and a ReLU activation function. Feature extraction in the input layer is expressed as:
F1(X1) = max(0, W1 * X1 + B1)  (6)
where X1 is the feature map entering the input layer, and W1 and B1 denote the convolution filters and bias of the input layer, respectively. W1 has size 3×3×3×64, representing 64 different convolution filters with kernels of size 3×3×3 each; F1(X1) is the feature map produced by the input layer.
In the nonlinear mapping performed by the nonlinear transformation stage, a convolutional layer, batch normalization, and a ReLU function make up each of the second, third, and fourth layers. Each stage of the nonlinear transformation is expressed as:
Fi(Xi) = max(0, Wi * Fi-1(Xi-1) + Bi), i = 2, 3, 4  (7)
where i denotes the i-th layer and Xi is the output of layer i−1. Wi and Bi denote the convolution filters and biases of the nonlinear transformation stage, respectively. The filters Wi of the second, third, and fourth convolutional layers have size 3×3×64×64, with kernels of size 3×3×64 each (the input-layer filter W1, as above, has size 3×3×3×64).
In the output reconstruction performed by the output layer, the feature map is fed into an output layer that contains only one convolutional layer. Output reconstruction is expressed as:
F5(X5) = W5 * F4(X4) + B5  (8)
where X5 is the output of the fourth layer. W5 and B5 denote the convolution filters and bias of the feature reconstruction layer, respectively. W5 has size 3×3×64×3: the feature reconstruction layer has 3 convolution filters with kernels of size 3×3×64 each, which act like mean filters and average the feature maps; F4(X4) is the feature map produced by the nonlinear transformation stage.
During model training the input is a 40×40 image. In the first convolutional layer, 64 kernels of size 3×3 produce a 64×40×40 feature map; in each of the second, third, and fourth convolutional layers, a 64×40×40 input feature map passes through 64 kernels of size 3×3 and again yields a 64×40×40 feature map; finally, in the convolutional layer of the output layer, 3 kernels of size 3×3 produce a 3×40×40 feature map.
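The shape bookkeeping in the paragraph above can be verified with a small helper (assuming, as the equal input and output sizes imply, stride 1 and zero-padding of 1 for the 3×3 kernels; this is a sketch, not the patented implementation):

```python
def conv_out_shape(shape, out_channels, k=3, pad=1, stride=1):
    """Output (C, H, W) of a 2-D convolution applied to an input (C, H, W)."""
    _, h, w = shape
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (w + 2 * pad - k) // stride + 1
    return (out_channels, h_out, w_out)

def forward_shapes(shape=(3, 40, 40)):
    """Trace the feature-map shapes through the five convolutional layers."""
    shapes = []
    for out_ch in (64, 64, 64, 64, 3):  # four 64-filter layers, then 3 filters
        shape = conv_out_shape(shape, out_ch)
        shapes.append(shape)
    return shapes
```

With these settings the 40×40 spatial size is preserved at every layer, matching the sequence 64×40×40 (four times) followed by 3×40×40.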
(2) In the nonlinear transformation stage, batch normalization is inserted between the convolutional layer and the activation layer. The main idea of batch normalization is to whiten the input data, accelerating network convergence and reducing data redundancy and feature correlation; in practice, a linear transformation gives the data zero mean and unit variance. Batch normalization resolves failure modes such as slow convergence and gradient explosion that can make neural network training impossible, and at the same time it speeds up training and improves model accuracy.
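The zero-mean, unit-variance normalization described above can be illustrated with NumPy (a simplified sketch: the learned scale γ and shift β are fixed at 1 and 0, and statistics are taken over the batch axis only):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of activations to zero mean and unit variance
    per feature (axis 0 is the batch axis), then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # whitened activations
    return gamma * x_hat + beta
```

Feeding in a batch drawn from, say, a distribution with mean 5 and standard deviation 3 returns activations with mean approximately 0 and standard deviation approximately 1.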
(3) For the activation function of each layer, the present invention uses the rectified linear unit, given by formula (9). Here x denotes the result of the feature map after the convolution kernel: when x < 0, f(x) = 0; when x > 0, f(x) = x. In forward propagation this speeds up computation; in backpropagation the gradient is 1 for x > 0, which alleviates the vanishing gradient problem. The resulting stochastic gradient descent therefore converges much faster than with sigmoid or tanh.
f(x) = max(0, x)  (9)
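Formula (9) and the gradient behavior described above can be checked directly:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Subgradient of ReLU: 1 where x > 0, else 0."""
    return (x > 0).astype(float)
```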
(4) For network training, the present invention uses stochastic gradient descent with mini-batches. When the sample size is large or the number of iterations is high, traditional gradient descent is slow, and mini-batch stochastic gradient descent overcomes this shortcoming: traditional training brings in all samples at each step, whereas mini-batch stochastic gradient descent operates on a small batch (mini-batch) at a time, using stochastic descent to seek the global optimum. The learning rate is an essential parameter of stochastic gradient descent and determines the speed of weight updates: a value set too large makes the cost function oscillate and overshoot the optimum, while a value set too small makes convergence too slow, so a small learning rate (for example, on the order of 0.001 to 0.01) is generally preferred to keep the system stable. The momentum parameter and the weight decay factor improve training adaptability; the momentum parameter usually lies in [0.9, 1.0] and the weight decay factor is usually 0.0005±0.0002. Based on experimental observation, the present invention sets the learning rate to 10⁻⁴, the momentum parameter to 0.9, and the weight decay factor to 0.0005. After many experiments, the batch size is set to 128.
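The update rule with the stated hyperparameters (learning rate 10⁻⁴, momentum 0.9, weight decay 0.0005) can be sketched as follows; the quadratic toy loss is illustrative only and stands in for the network's cost function:

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=1e-4, momentum=0.9, weight_decay=5e-4):
    """One SGD update with momentum and L2 weight decay."""
    g = grad + weight_decay * w       # weight decay adds an L2 penalty gradient
    v = momentum * v - lr * g         # momentum accumulates past updates
    return w + v, v

# Toy example: minimize the quadratic loss f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
loss0 = float(w @ w)
for _ in range(2000):
    w, v = sgd_momentum_step(w, v, 2.0 * w)
loss1 = float(w @ w)
```

With these small steps the loss decreases monotonically toward the optimum rather than oscillating, which is the trade-off the paragraph above describes.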
After the design and tuning of the four main steps above, the convolutional neural network color correction model is complete. The training sample data collected in Step 1 are fed into the network; after training finishes, the model obtained after 50 training epochs is saved in Matlab's MAT format.
The online part is divided into two steps:
Step 1: Image color correction
The color chart, a human face, and a tongue are photographed in the dark box, yielding photographs of each under the given light source. Comparison shows that the photographs exhibit varying degrees of color distortion relative to the actual objects. The present invention uses the convolutional neural network color correction model to correct these color-distorted images. First, the RGB values of the image to be corrected are read pixel by pixel and saved as an image matrix. The color correction network is then read from Matlab's MAT format, yielding the convolutional neural network color correction model. The image matrix is fed into the network, where the 3 convolution kernels of the input layer read the matrices of the image's R, G, and B channels; the model then corrects the R, G, and B components of the image. After the three-channel image matrix passes through the network, a corrected three-channel matrix is output, and the three matrices are finally merged into an RGB image, giving the color-corrected result.
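The online flow can be sketched as below; since loading the actual MAT-format weights depends on the trained model file, the network is represented here by a stand-in `model` callable (an assumption for illustration):

```python
import numpy as np

def correct_image(img_uint8, model):
    """Apply a color-correction model to an H x W x 3 RGB image.

    `model` maps a float32 array of shape (3, H, W) in [0, 1] to a
    corrected array of the same shape (stand-in for the trained CNN).
    """
    x = img_uint8.astype(np.float32) / 255.0   # read RGB values into a matrix
    x = np.transpose(x, (2, 0, 1))             # H x W x 3 -> 3 x H x W channels
    y = model(x)                               # correction by the network
    y = np.transpose(y, (1, 2, 0))             # back to H x W x 3
    return np.clip(np.rint(y * 255.0), 0, 255).astype(np.uint8)  # merge to RGB

identity_model = lambda x: x                   # placeholder for the trained network
```

Swapping `identity_model` for the real network loaded from the MAT file yields the corrected image with the same shape and value range as the input.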
Step 2: Evaluation of the color correction effect
The model is used to correct the colors of the color chart, face, and tongue images, yielding the corrected images. To verify the effectiveness of the color correction model, the corrected images must be evaluated.
The evaluation of color correction involves colorimetry, psychology, and related fields, and is a complex problem. It is usually divided into two categories: subjective evaluation and objective evaluation. In subjective evaluation, observers compare the corrected image with the photographed object, judging the quality of color reproduction by comparing the colors of the corrected image with the true colors. This method is intuitive and effective and is the main approach to assessing color correction quality.
Objective evaluation is carried out under fixed conditions using a set of color targets. If the corrected colors are close to the original target values, the correction is considered good. In general, when the reproduced tristimulus values and the color differences of the targets lie within an allowed range, the corrected image is considered to produce the same visual impression as the real object. In CIELAB space, ΔE < 3 is generally taken as the criterion for faithful color reproduction, and ΔE < 6 as the acceptable limit for color correction accuracy.
To verify the effectiveness of the convolutional neural network color correction method, the present invention corrects the colors of the color chart, face, and tongue photographs; to assess the performance of the color correction model, the present invention gives both subjective and objective color evaluation results.
Step 1.1: Subjective evaluation
As shown in Figure 3, this experiment compares the face and tongue images before and after color correction. The photographs of subjects originally taken with the Canon camera are brownish and dark; after correction, the convolutional neural network color correction method restores the true colors of the face and tongue well.
Step 1.2: Objective evaluation
For the objective color evaluation, the standard color chart image is cropped into patch images using software developed for the present invention; the RGB values in each patch are read and converted to LAB values, and the CIE1976 L*a*b* color difference is used as the evaluation index. The error is computed as in formula (10):

ΔE = √((ΔL*)² + (Δa*)² + (Δb*)²)  (10)
In the formula above, ΔL*, Δa*, and Δb* are the differences between the LAB values of the patch image and those of the standard patch, respectively. Using this method, the color difference of the test samples is measured before and after correction: the mean color error ΔE is 14.21 before correction and 3.70 after correction. The color correction method proposed by the present invention therefore achieves good correction results.
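The CIE1976 color difference used above is the Euclidean distance in LAB space; a direct implementation of formula (10) and of the mean error over a set of patches:

```python
import math

def delta_e_1976(lab1, lab2):
    """CIE1976 color difference between two (L*, a*, b*) triples."""
    dL, da, db = (c1 - c2 for c1, c2 in zip(lab1, lab2))
    return math.sqrt(dL ** 2 + da ** 2 + db ** 2)

def mean_delta_e(pairs):
    """Average Delta-E over a list of (measured, reference) LAB pairs."""
    return sum(delta_e_1976(m, r) for m, r in pairs) / len(pairs)
```

By the thresholds given earlier, a mean ΔE of 3.70 after correction falls within the acceptable limit (ΔE < 6).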
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710406983.2A CN107507250B (en) | 2017-06-02 | 2017-06-02 | Surface color and tongue color image color correction method based on convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107507250A true CN107507250A (en) | 2017-12-22 |
CN107507250B CN107507250B (en) | 2020-08-21 |
Family
ID=60679349
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant | Granted publication date: 20200821
 | CF01 | Termination of patent right due to non-payment of annual fee |