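WO2023236565A1 - Low-light image enhancement method based on reinforcement learning and aesthetic evaluation - Google Patents

Low-light image enhancement method based on reinforcement learning and aesthetic evaluation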

Info

Publication number
WO2023236565A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
image
network
loss
reinforcement learning
Prior art date
Application number
PCT/CN2023/074843
Other languages
English (en)
French (fr)
Inventor
梁栋
李铃
黄圣君
陈松灿
Original Assignee
南京航空航天大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京航空航天大学 filed Critical 南京航空航天大学
Publication of WO2023236565A1 publication Critical patent/WO2023236565A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • G06T5/94Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Definitions

  • the invention relates to the technical field of image enhancement, and mainly relates to a low-light image enhancement method based on reinforcement learning and aesthetic evaluation.
  • Low-light image enhancement plays a very important role in computer vision. Pictures taken in low light often suffer from many adverse effects. For example, blurred images lead to uncertainty about the subject of the image, blurred faces lead to inaccurate recognition, and blurred details lead to incorrect image expression. This not only affects people's experience of using camera equipment and reduces the quality of photos; sometimes it also conveys wrong information. Low-light image enhancement makes the captured images brighter, with higher contrast and more obvious structural information, which is beneficial to subsequent high-level work, such as target detection, face recognition, image classification, etc., and has strong practical significance.
  • LL-Net (Kin Gwn Lore, Adedotun Akintayo, and Soumik Sarkar. 2017. LLNet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognition 61 (2017), 650–662.) proposes a stacked autoencoder that uses synthesized low-light/normal-light image pairs for simultaneous denoising and enhancement.
  • the distribution of synthetic data inevitably deviates from real-world images, thus leading to severe performance degradation when transferred to real situations. Subsequently, Wei et al.
  • the most important evaluation criterion is the user's subjective evaluation.
  • objective evaluation indicators (loss functions) with or without reference are often used to guide the training of the model in the training stage of the model.
  • the loss functions with reference mainly include L1 loss, L2 loss and SSIM loss.
  • the loss functions without reference mainly use Spatial Consistency Loss, Exposure Control Loss, Color Constancy Loss and Illumination Smoothness Loss.
  • the above-mentioned reference/no-reference loss function focuses more on the gap between low-light images and normal brightness images and the characteristics of the image itself, ignoring the user's subjective evaluation.
  • the present invention provides a low-light image enhancement method based on reinforcement learning and aesthetic evaluation.
  • the action space range is defined more broadly and includes not only operations that increase the brightness of image pixels, but also operations that reduce it.
  • the enhancement operation can be performed multiple times, and learning a stochastic enhancement strategy gives higher flexibility for real-life scenarios.
  • a new aesthetic quality score, which can be approximately regarded as a user subjective evaluation metric, is introduced as part of the loss function.
  • a low-light image enhancement method based on reinforcement learning and aesthetic evaluation including the following steps:
  • Step S1 Generate abnormal brightness images under different lighting scenarios, and build a training data set for the reinforcement learning system based on the images;
  • Step S2 Initialize the training data set, policy network and value network in the reinforcement learning system
  • Step S3 Update the policy network and value network based on the no-reference reward value and the aesthetic evaluation reward value
  • Step S4 When all samples are trained and all training iterations are completed, the model training is completed
  • Step S5 Output the image result after enhancing the low-light image.
  • step S2 includes:
  • s (t) represents the state at time step t;
  • the output of the policy network is the policy π(a (t) | s (t) ) for taking action a (t) ;
  • the value network output value is V(s (t) ), representing the expected total reward from the current state s (t) .
  • the specific steps of updating the policy network and value network in step S3 include:
  • γ i is the i-th power of the discount factor γ
  • r (t) represents the environmental reward value at time t
  • Step S3.2 Train the training data set based on historical stage images to obtain the value network output value
  • Step S3.3 Update the value network based on the environmental reward value and the value network output value:
  • θ v represents the value network parameters
  • Step S3.4 Update the policy network based on the environmental reward value and predicted value:
  • the output of the policy network adopts the policy π(a (t) | s (t) ) for action a (t) ∈ A
  • θ p represents the policy network parameters
  • I t-1 (x) represents the pixel value at the image pixel point x at the t-1 step iteration
  • A t (x) represents the action selected at the image pixel point x at the t-step iteration
  • I t (x) represents the enhanced pixel value at the image pixel x at the t-step iteration.
  • step S3.1 considers the following influencing factors:
  • K represents the size of the local area
  • Ω(i) represents the four adjacent areas centered on area i
  • Y represents the average pixel gray value of the local area in the enhanced image
  • I represents the average pixel gray value of the local area in the input image
  • E represents the gray level of the image pixels in the RGB color space
  • M represents several non-overlapping local areas
  • Y represents the average gray level of the pixels in a local area in the enhanced image, and the size of the local area is {k; k ∈ [1, M]};
  • J p represents the average pixel gray value of channel p in the enhanced image
  • (p, q) represents any pair of channels in (R, G), (R, B), (G, B);
  • ε represents the set of (R, G), (R, B), (G, B);
  • N represents the number of iterations of image enhancement in reinforcement learning, and the horizontal and vertical gradient operations are applied in turn
  • ξ represents the set of the three channels R, G, and B in the image
  • an image aesthetic scoring deep learning network model is additionally introduced to score the image aesthetically, and the aesthetic quality loss is then calculated; two independent aesthetic scoring models, recorded as Model 1 and Model 2, are trained using the color and brightness attributes and the quality attributes of the image respectively; scoring is performed through this additionally introduced aesthetic evaluation model;
  • f 1 represents the score of the color and brightness attributes in the enhanced image, that is, the score output after the enhanced image is input to Model 1
  • f 2 represents the score of the enhanced image quality attribute, that is, the score output after the enhanced image is input to Model 2; the higher the score, the better the quality of the image
  • α and β are both weight coefficients
  • the goal of image enhancement is to make the reward value r as large as possible; the smaller the spatial consistency loss, exposure control loss, color constancy loss and brightness smoothing loss, the better the image quality, and the larger the aesthetic quality loss, the better the image quality; therefore,
  • the local area size K is set to 4×4 in the spatial consistency loss.
  • E is set to 0.6 in the exposure control loss
  • M represents a non-overlapping local area of size 16×16.
  • the enhancement operation applied to the input low-light image has a larger dynamic range and higher flexibility for real-life scenes. Taking into account the uneven lighting and backlight in low-light scenes, the action space includes not only actions that brighten the image but also actions that darken it. This definition better meets the needs of low-light image enhancement in real scenes.
  • the present invention can make the enhanced image have better visual effects and user subjective evaluation scores.
  • most methods rely on paired training data sets and use reference loss functions. Some methods use the information of the image itself to design a reference-free loss function to guide the training of the enhancement network. However, the above losses are all objective evaluation indicators.
  • the present invention introduces the aesthetic evaluation score as an indicator to simulate the user's subjective evaluation, which can better guide the low-light image enhancement network to generate high-quality images that are satisfactory to the user.
  • Figure 1 is a flow chart of a low-light image enhancement method based on reinforcement learning and aesthetic evaluation provided by the present invention
  • Figure 2 is an algorithm framework diagram of the low-light image enhancement method based on reinforcement learning and aesthetic evaluation provided by the present invention
  • the present invention provides a low-light image enhancement method based on reinforcement learning and aesthetic evaluation.
  • the specific principle is shown in Figure 1, which includes the following steps:
  • Step S1 Generate abnormal brightness images under different lighting scenarios, and build a training data set for the reinforcement learning system based on the images.
  • Step S2 Initialize the training data set, policy network and value network in the reinforcement learning system.
  • the current state s (t) is used as the input of the policy network and the value network, and s (t) represents the state at time step t.
  • the output of the policy network is the policy π(a (t) | s (t) ) for taking action a (t) .
  • the value network output value is V(s (t) ), which represents the expected total reward from the current state s (t) , which shows how good the current network state is.
  • Step S3 Update the policy network and value network based on the no-reference reward value and the aesthetic evaluation reward value. Specifically,
  • γ i is the i-th power of the discount factor γ.
  • r (t) represents the environmental reward value at time t. The following influencing factors are considered when calculating the environmental reward value:
  • K represents the local area size.
  • Ω(i) represents the four adjacent areas centered on area i.
  • Y represents the average pixel gray value of the local area in the enhanced image.
  • I represents the average gray value of pixels in the local area of the input image.
  • the local area size K is set to 4×4 based on experience.
  • E represents the gray level of the image pixel in the RGB color space.
  • M represents several non-overlapping local areas.
  • Y represents the average pixel gray value of a local area in the enhanced image, and the size of the local area is {k; k ∈ [1, M]}; in this embodiment, E is set to 0.6, and M represents non-overlapping local areas of size 16×16.
  • J p represents the average pixel gray value of channel p in the enhanced image
  • (p, q) represents any pair of channels in (R, G), (R, B), (G, B)
  • ε represents the set of (R, G), (R, B), (G, B);
  • N represents the number of iterations of image enhancement in reinforcement learning, and the horizontal and vertical gradient operations are applied in turn
  • ξ represents the set of the three channels R, G, and B in the image;
  • Aesthetic image analysis currently attracts more and more attention in the field of computer vision. It is related to an advanced perception of visual aesthetics.
  • Machine learning models for image aesthetic quality assessment have broad application prospects, such as image retrieval, photo management, image editing, and photography.
  • aesthetic quality evaluation is always related to the color and brightness of the image, the quality, composition and depth of the image, and the semantic content. It is difficult to think of aesthetic quality assessment as an isolated task.
  • the present invention additionally introduces an image aesthetic scoring deep learning network model to perform aesthetic scoring on the image, and then calculate the aesthetic quality loss.
  • Two independent aesthetic scoring models are trained using the color and brightness attributes and quality attributes of the image respectively, marked as Model 1 and Model 2 .
  • f 1 represents the score of the color and brightness attributes in the enhanced image, that is, the score output after the enhanced image is input to Model 1 .
  • f 2 represents the score of the enhanced image quality attribute, that is, the score output after the enhanced image is input to Model 2. The higher the score, the better the quality of the image.
  • α and β are weight coefficients.
  • Step S3.2 Train the training data set based on historical stage images to obtain the value network output value.
  • Step S3.3 Update the value network based on the environmental reward value and the value network output value:
  • Step S3.4 Update the policy network based on the environmental reward value and the value network output value:
  • the output of the policy network adopts the policy π(a (t) | s (t) ) for action a (t) ∈ A
  • A represents action space.
  • the output dimension of the policy network is |A|
  • θ p represents the policy network parameters
  • the low-light image in that state is first fed into the policy network.
  • the policy network formulates an enhancement strategy for each pixel in the image based on the current input image and outputs it.
  • the input image is enhanced according to the policy formulated by the policy network.
  • the enhancement operation needs to be iterated multiple times under the pre-established scheme.
  • the action space A is crucial to the performance of the network, because too small a range will result in limited enhancement of low-light images, while too large a range will result in a very large search space, and network training will become very difficult.
  • I t-1 (x) represents the pixel value at the image pixel point x at the t-1 step iteration
  • a t (x) represents the action selected at the image pixel point x at the t-step iteration
  • I t (x) represents the enhanced pixel value at image pixel point x during t-step iteration.
  • I output (x) = I input (x) + δ I input (x)(1 - I input (x))
  • I input (x) represents the pixel value at the x coordinate of the input image.
  • the pixel value is normalized within the range of [0,1].
  • I output (x) represents the pixel value output after adjusting the adjustment parameters.
  • Each pixel is within the normalized range of [0,1].
  • Step S4 When all samples are trained and all training iterations are completed, model training is completed.
  • Step S5 Output the image result after enhancing the low-light image.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

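The invention discloses a low-light image enhancement method based on reinforcement learning and aesthetic evaluation. Abnormal-brightness images under different lighting scenes are first generated and used to build the training data set of a reinforcement learning system. The training data set, policy network, and value network of the reinforcement learning system are then initialized, and the policy network and value network are updated based on a no-reference reward value and an aesthetic evaluation reward value. After training is complete, the enhanced image result is output. By widening the action space defined in reinforcement learning, the enhancement operations applied to an input low-light image gain a larger dynamic range and greater flexibility for real-world scenes, and better meet the needs of low-light image enhancement in real scenes. In addition, by introducing the aesthetic quality assessment score as part of the loss function, the enhanced image can achieve better visual quality and better user subjective evaluation scores.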

Description

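Low-light image enhancement method based on reinforcement learning and aesthetic evaluation
Technical Field
The invention relates to the technical field of image enhancement, and in particular to a low-light image enhancement method based on reinforcement learning and aesthetic evaluation.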
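Background
Pictures taken under poor lighting conditions have a low dynamic range and are heavily corrupted by noise because too little light reaches the digital camera sensor, so high-quality images are hard to obtain; low-light image enhancement therefore plays a very important role in computer vision. Pictures taken in low light often suffer from many adverse effects: a blurred image makes its subject uncertain, a blurred face makes recognition inaccurate, and blurred details cause the image to convey the wrong meaning. This not only degrades the experience of using camera equipment and lowers photo quality, but can sometimes convey wrong information. Low-light image enhancement makes captured images brighter, higher in contrast, and clearer in structural information, which benefits subsequent high-level tasks such as object detection, face recognition, and image classification, and is of strong practical significance.
In recent years, deep-learning-based methods have typically used high-quality normal-light images as guidance for learning how to improve and enhance low-light images. LL-Net (Kin Gwn Lore, Adedotun Akintayo, and Soumik Sarkar. 2017. LLNet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognition 61 (2017), 650–662.) proposed a stacked autoencoder that uses synthesized low-light/normal-light image pairs to denoise and enhance simultaneously. However, the distribution of synthetic data inevitably deviates from that of real-world images, which causes severe performance degradation when the model is transferred to real conditions. Subsequently, Wei et al. (Chen Wei, Wenjing Wang, Wenhan Yang, and Jiaying Liu. 2018. Deep retinex decomposition for low-light enhancement. arXiv preprint arXiv:1808.04560 (2018).) collected a real data set of low-light/normal-light image pairs and, on that basis, proposed a Retinex network that decomposes an image into illumination and reflectance in a data-driven way. Many other supervised low-light image enhancement networks have since been proposed (Wenhan Yang, Shiqi Wang, Yuming Fang, Yue Wang, and Jiaying Liu. 2020. From Fidelity to Perceptual Quality: A Semi-Supervised Approach for Low-Light Image Enhancement. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).; Yonghua Zhang, Jiawan Zhang, and Xiaojie Guo. 2019. Kindling the darkness: A practical low-light image enhancer. In Proceedings of the 27th ACM International Conference on Multimedia. 1632–1640.). More recent methods focus on unsupervised low-light image enhancement, which trains the model directly on low-light images without any paired training data. The recent Zero-DCE (Chunle Guo, Chongyi Li, Jichang Guo, Chen Change Loy, Junhui Hou, Sam Kwong, and Runmin Cong. 2020. Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1780–1789.) trains a deep low-light image enhancement model with non-reference losses. However, existing deep learning methods tend to focus only on enhancing under-exposed low-light images, whereas low-light images captured under backlighting or uneven illumination can also contain normally exposed or over-exposed regions.
On the other hand, for image enhancement tasks the most important evaluation criterion is the user's subjective evaluation. Existing methods, however, usually guide model training with reference-based or no-reference objective evaluation metrics (loss functions). Reference-based loss functions mainly include the L1 loss, L2 loss, and SSIM loss; no-reference loss functions mainly include the spatial consistency loss, exposure control loss, color constancy loss, and illumination smoothness loss. These reference/no-reference loss functions focus on the gap between low-light and normal-brightness images and on the characteristics of the image itself, and ignore the user's subjective evaluation.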
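Summary of the Invention
Purpose of the invention: To address the problems described in the background above, the invention provides a low-light image enhancement method based on reinforcement learning and aesthetic evaluation. First, considering the complexity of low-light imaging and of low-light scenes, the action space is defined more broadly and includes operations that lower pixel brightness as well as operations that raise it. The enhancement operation can be applied multiple times, and learning a stochastic enhancement policy gives greater flexibility for real-world scenes. Second, when computing the loss function, a new aesthetic quality score, which can be regarded approximately as a user subjective evaluation metric, is introduced as part of the loss function alongside the more flexible no-reference losses.
Technical solution: To achieve the above purpose, the invention adopts the following technical solution:
A low-light image enhancement method based on reinforcement learning and aesthetic evaluation, comprising the following steps:
Step S1: Generate abnormal-brightness images under different lighting scenes, and build the training data set of the reinforcement learning system from these images;
Step S2: Initialize the training data set, policy network, and value network of the reinforcement learning system;
Step S3: Update the policy network and the value network based on the no-reference reward value and the aesthetic evaluation reward value;
Step S4: When all samples have been trained on and all training iterations are complete, model training is finished;
Step S5: Output the image result obtained by enhancing the low-light image.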
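Further, initializing the policy network and the value network in step S2 specifically comprises:
The current state $s^{(t)}$ is used as the input of the policy network and the value network, where $s^{(t)}$ denotes the state at time step t; the output of the policy network is the policy $\pi(a^{(t)} \mid s^{(t)})$ for taking action $a^{(t)}$; the output of the value network is $V(s^{(t)})$, the expected total reward from the current state $s^{(t)}$.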
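Further, updating the policy network and the value network in step S3 specifically comprises:
Step S3.1: Train on the training data set based on images from previous stages, and obtain the environmental reward value as follows:
$$R^{(t)} = r^{(t)} + \gamma r^{(t+1)} + \gamma^{2} r^{(t+2)} + \cdots + \gamma^{n-1} r^{(t+n-1)} + \gamma^{n} V(s^{(t+n)})$$
where $\gamma^{i}$ is the i-th power of the discount factor $\gamma$, and $r^{(t)}$ denotes the environmental reward value at time t;
Step S3.2: Train on the training data set based on images from previous stages to obtain the value network output;
Step S3.3: Update the value network based on the environmental reward value and the value network output:
[Formula not reproduced in the source text.]
where $\theta_v$ denotes the value network parameters;
Step S3.4: Update the policy network based on the environmental reward value and the predicted value:
The policy network outputs the policy $\pi(a^{(t)} \mid s^{(t)})$ for taking action $a^{(t)} \in A$, where $\pi(a^{(t)} \mid s^{(t)})$ is a probability computed by softmax; A denotes the action space; the output dimension of the policy network is $|A|$. The specific update is as follows:
$$A(a^{(t)}, s^{(t)}) = R^{(t)} - V(s^{(t)})$$
[Formula not reproduced in the source text.]
where $\theta_p$ denotes the policy network parameters.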
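The n-step return of step S3.1 and the advantage of step S3.4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicants' implementation: the function names `n_step_return` and `advantage` are hypothetical, and rewards are plain scalars here rather than per-pixel maps.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """R(t) = r(t) + g*r(t+1) + ... + g^(n-1)*r(t+n-1) + g^n * V(s(t+n)).

    `rewards` holds r(t) ... r(t+n-1); `bootstrap_value` is V(s(t+n)).
    """
    ret = bootstrap_value
    # Fold from the last reward backwards: R <- r + gamma * R.
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


def advantage(n_step_ret: float, value_estimate: float) -> float:
    """A(a(t), s(t)) = R(t) - V(s(t))."""
    return n_step_ret - value_estimate
```

For example, with rewards `[1.0, 2.0, 3.0]`, a bootstrap value of `4.0`, and `gamma = 0.5`, the return is 1 + 0.5·2 + 0.25·3 + 0.125·4 = 3.25. The backward fold avoids computing powers of the discount factor explicitly.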
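Further, in step S3.4 the range of the action space A is set to A ∈ [-0.5, 0.5] with a step of 0.05, and it is applied to the predefined output representation as follows:
$$I_{t}(x) = I_{t-1}(x) + A_{t}(x)\, I_{t-1}(x)\,\bigl(1 - I_{t-1}(x)\bigr)$$
where $I_{t-1}(x)$ denotes the pixel value at pixel x at iteration t-1, $A_{t}(x)$ denotes the action selected at pixel x at iteration t, and $I_{t}(x)$ denotes the enhanced pixel value at pixel x at iteration t.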
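The action space and the per-pixel update above can be sketched with NumPy. This is an illustrative sketch only: the names `ACTIONS` and `apply_actions` are hypothetical, and it assumes the action index map has the same shape as the image and that pixel values are already normalized to [0, 1].

```python
import numpy as np

# Discrete action set: 21 values from -0.5 to 0.5 in steps of 0.05.
ACTIONS = np.round(np.linspace(-0.5, 0.5, 21), 2)


def apply_actions(image: np.ndarray, action_idx: np.ndarray) -> np.ndarray:
    """One enhancement step: I_t(x) = I_{t-1}(x) + A_t(x) * I_{t-1}(x) * (1 - I_{t-1}(x)).

    `image` holds values in [0, 1]; `action_idx` holds, for every pixel,
    an integer index into ACTIONS (same shape as `image`).
    """
    a = ACTIONS[action_idx]  # per-pixel adjustment value A_t(x)
    return image + a * image * (1.0 - image)
```

A positive action brightens a pixel and a negative one darkens it, which is how the same policy can handle both under-exposed and over-exposed regions of one image.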
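Further, the environmental reward value in step S3.1 takes the following factors into account:
(1) Spatial consistency loss
[Formula not reproduced in the source text.]
where K denotes the local region size; Ω(i) denotes the four neighboring regions centered on region i; Y denotes the average pixel gray value of a local region in the enhanced image; I denotes the average pixel gray value of a local region in the input image;
(2) Exposure control loss
[Formula not reproduced in the source text.]
where E denotes the gray level of image pixels in the RGB color space; M denotes the non-overlapping local regions; Y denotes the average pixel gray value of one local region in the enhanced image, the size of the local region being {k; k ∈ [1, M]};
(3) Color constancy loss
[Formula not reproduced in the source text.]
where $J^{p}$ denotes the average pixel gray value of channel p in the enhanced image, (p, q) denotes any pair of channels among (R, G), (R, B), (G, B); ε denotes the set of (R, G), (R, B), (G, B);
(4) Illumination smoothness loss
[Formula not reproduced in the source text.]
where the formula involves the parameter curve map of each state; N denotes the number of image enhancement iterations in reinforcement learning; the horizontal and vertical gradient operations are applied in turn; ξ denotes the set of the three channels R, G, and B of the image;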
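(5) Aesthetic quality loss
To score the aesthetic quality of the enhanced image, an image aesthetic scoring deep learning network model is additionally introduced to score the image, from which the aesthetic quality loss is computed. Two independent aesthetic scoring models, denoted $Model_1$ and $Model_2$, are trained on the color-and-brightness attributes and on the quality attributes of images, respectively; scoring is performed by this additionally introduced aesthetic evaluation model;
The aesthetic quality loss is expressed as follows:
$$L_{eva} = \alpha f_{1} + \beta f_{2}$$
where $f_1$ denotes the score for the color and brightness attributes of the enhanced image, that is, the score output by $Model_1$ for the enhanced image; $f_2$ denotes the score for the quality attributes of the enhanced image, that is, the score output by $Model_2$ for the enhanced image; a higher score indicates better image quality; α and β are weight coefficients;
The goal of image enhancement is to make the reward value r as large as possible. Smaller spatial consistency, exposure control, color constancy, and illumination smoothness losses indicate better image quality, while a larger aesthetic quality loss indicates better image quality; therefore, the reward value $r^{(t)}$ at time t is expressed as follows:
$$r^{(t)} = -L_{spa} - L_{exp} - L_{col} - L_{tv} + L_{eva}$$
With these factors introduced, the environmental reward value at time t is expressed as follows:
$$R^{(t)} = r^{(t)} + \gamma r^{(t+1)} + \gamma^{2} r^{(t+2)} + \cdots + \gamma^{n-1} r^{(t+n-1)} + \gamma^{n} V(s^{(t+n)})$$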
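The color constancy and illumination smoothness terms are named in this text without their formulas. The sketch below therefore assumes the forms published for Zero-DCE, which the background cites as the source of these no-reference losses: squared gaps between channel means, and the squared sum of mean absolute horizontal and vertical gradients of the parameter maps, averaged over iterations. The function names and array layouts are hypothetical, and the patent's exact normalization may differ.

```python
import numpy as np


def color_constancy_loss(enhanced: np.ndarray) -> float:
    """enhanced: (H, W, 3) RGB image in [0, 1].

    Assumed form: sum over (R,G), (R,B), (G,B) of squared gaps between channel means.
    """
    j_r, j_g, j_b = enhanced.reshape(-1, 3).mean(axis=0)
    return float((j_r - j_g) ** 2 + (j_r - j_b) ** 2 + (j_g - j_b) ** 2)


def illumination_smoothness_loss(action_maps: np.ndarray) -> float:
    """action_maps: (N, H, W, C), one parameter map per enhancement iteration.

    Assumed form: mean over N iterations of the per-channel sum of
    (|horizontal gradient| + |vertical gradient|)^2, with gradients averaged over pixels.
    """
    grad_x = np.abs(np.diff(action_maps, axis=2)).mean(axis=(1, 2))  # (N, C), horizontal
    grad_y = np.abs(np.diff(action_maps, axis=1)).mean(axis=(1, 2))  # (N, C), vertical
    return float(((grad_x + grad_y) ** 2).sum(axis=1).mean())
```

A neutral gray image gives a color constancy loss of zero, and spatially constant parameter maps give a smoothness loss of zero, which matches the intent that smaller values indicate better image quality.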
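Further, the local region size K in the spatial consistency loss is set to 4×4.
Further, in the exposure control loss E is set to 0.6, and M denotes non-overlapping local regions of size 16×16.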
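With K = 4×4, E = 0.6, and 16×16 regions fixed as above, the exposure control and spatial consistency terms can be sketched as follows. The formulas themselves are not reproduced in this text, so the sketch assumes the Zero-DCE definitions (mean absolute distance of region averages from E, and squared differences between neighbor contrasts of the enhanced and input images). All function names are hypothetical, and regions at the image border simply have fewer neighbors here, which is a simplification.

```python
import numpy as np


def region_means(gray: np.ndarray, size: int) -> np.ndarray:
    """Average a (H, W) map over non-overlapping size x size regions."""
    h = gray.shape[0] // size * size
    w = gray.shape[1] // size * size
    blocks = gray[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.mean(axis=(1, 3))


def exposure_control_loss(enhanced: np.ndarray, e: float = 0.6, size: int = 16) -> float:
    """Mean |Y_k - E| over 16x16 regions of the channel-averaged enhanced image."""
    y = region_means(enhanced.mean(axis=2), size)
    return float(np.abs(y - e).mean())


def spatial_consistency_loss(enhanced: np.ndarray, original: np.ndarray, size: int = 4) -> float:
    """Compare neighbor contrasts of 4x4 region means before and after enhancement."""
    y = region_means(enhanced.mean(axis=2), size)
    i = region_means(original.mean(axis=2), size)
    total = 0.0
    for axis in (0, 1):  # vertical neighbors, then horizontal neighbors
        dy = np.abs(np.diff(y, axis=axis))
        di = np.abs(np.diff(i, axis=axis))
        # Each adjacent pair is counted once from each of its two regions.
        total += 2.0 * ((dy - di) ** 2).sum()
    return float(total / y.size)
```

An image whose regions all average 0.6 has zero exposure loss, and an enhanced image that preserves the input's region-to-region contrasts has zero spatial consistency loss.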
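Beneficial effects:
(1) By widening the action space defined in reinforcement learning, the invention gives the enhancement operations applied to an input low-light image a larger dynamic range and greater flexibility for real-world scenes. Considering uneven illumination and backlighting in low-light scenes, the action space includes not only actions that brighten the image but also actions that darken it; this definition better meets the needs of low-light image enhancement in real scenes.
(2) By introducing the aesthetic quality assessment score as part of the loss function, the invention enables the enhanced image to achieve better visual quality and better user subjective evaluation scores. Among existing deep-learning-based low-light enhancement methods, most rely on paired training data sets and reference-based loss functions, and some use information from the image itself to design no-reference loss functions that guide the training of the enhancement network. All of these losses are objective evaluation metrics, however. The invention introduces the aesthetic evaluation score as a metric that simulates the user's subjective evaluation, which better guides the low-light image enhancement network toward high-quality images that satisfy users.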
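Brief Description of the Drawings
Figure 1 is a flow chart of the low-light image enhancement method based on reinforcement learning and aesthetic evaluation provided by the invention;
Figure 2 is an algorithm framework diagram of the low-light image enhancement method based on reinforcement learning and aesthetic evaluation provided by the invention;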
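Detailed Description
The invention is further described below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the protection scope of the invention.
The invention provides a low-light image enhancement method based on reinforcement learning and aesthetic evaluation, the principle of which is shown in Figure 1, comprising the following steps:
Step S1: Generate abnormal-brightness images under different lighting scenes, and build the training data set of the reinforcement learning system from these images.
Step S2: Initialize the training data set, policy network, and value network of the reinforcement learning system.
Specifically, referring to Figure 2, the current state $s^{(t)}$ is used as the input of the policy network and the value network, where $s^{(t)}$ denotes the state at time step t. The output of the policy network is the policy $\pi(a^{(t)} \mid s^{(t)})$ for taking action $a^{(t)}$. The output of the value network is $V(s^{(t)})$, the expected total reward from the current state $s^{(t)}$, which indicates how good the current network state is.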
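Step S3: Update the policy network and the value network based on the no-reference reward value and the aesthetic evaluation reward value. Specifically,
Step S3.1: Train on the training data set based on images from previous stages, and obtain the environmental reward value as follows:
$$R^{(t)} = r^{(t)} + \gamma r^{(t+1)} + \gamma^{2} r^{(t+2)} + \cdots + \gamma^{n-1} r^{(t+n-1)} + \gamma^{n} V(s^{(t+n)})$$
where $\gamma^{i}$ is the i-th power of the discount factor $\gamma$, and $r^{(t)}$ denotes the environmental reward value at time t. The following factors are taken into account when computing the environmental reward value: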
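(1) Spatial consistency loss
[Formula not reproduced in the source text.]
where K denotes the local region size. Ω(i) denotes the four neighboring regions centered on region i. Y denotes the average pixel gray value of a local region in the enhanced image. I denotes the average pixel gray value of a local region in the input image. In this embodiment the local region size K is set empirically to 4×4.
(2) Exposure control loss
[Formula not reproduced in the source text.]
where E denotes the gray level of image pixels in the RGB color space. M denotes the non-overlapping local regions. Y denotes the average pixel gray value of one local region in the enhanced image, the size of the local region being {k; k ∈ [1, M]}. In this embodiment E is set to 0.6 and M denotes non-overlapping local regions of size 16×16.
(3) Color constancy loss
[Formula not reproduced in the source text.]
where $J^{p}$ denotes the average pixel gray value of channel p in the enhanced image, (p, q) denotes any pair of channels among (R, G), (R, B), (G, B), and ε denotes the set of (R, G), (R, B), (G, B);
(4) Illumination smoothness loss
[Formula not reproduced in the source text.]
where the formula involves the parameter curve map of each state, N denotes the number of image enhancement iterations in reinforcement learning, the horizontal and vertical gradient operations are applied in turn, and ξ denotes the set of the three channels R, G, and B of the image.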
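(5) Aesthetic quality loss
Aesthetic image analysis is attracting growing attention in computer vision. It concerns the high-level perception of visual aesthetics. Machine learning models for image aesthetic quality assessment have broad application prospects, such as image retrieval, photo management, image editing, and photography. For humans, aesthetic quality judgments are always tied to an image's color and brightness, its quality, composition, and depth, and its semantic content, so aesthetic quality assessment is hard to treat as an isolated task. To score the aesthetic quality of the enhanced image, the invention additionally introduces an image aesthetic scoring deep learning network model to score the image, from which the aesthetic quality loss is computed. Two independent aesthetic scoring models, denoted $Model_1$ and $Model_2$, are trained on the color-and-brightness attributes and on the quality attributes of images, respectively. The aesthetic quality loss is expressed as follows:
$$L_{eva} = \alpha f_{1} + \beta f_{2}$$
where $f_1$ denotes the score for the color and brightness attributes of the enhanced image, that is, the score output by $Model_1$ for the enhanced image. $f_2$ denotes the score for the quality attributes of the enhanced image, that is, the score output by $Model_2$ for the enhanced image; a higher score indicates better image quality. α and β are weight coefficients.
The goal of image enhancement is to make the reward value r as large as possible. Smaller spatial consistency, exposure control, color constancy, and illumination smoothness losses indicate better image quality, while a larger aesthetic quality loss indicates better image quality. Therefore, the reward value $r^{(t)}$ at time t is expressed as follows:
$$r^{(t)} = -L_{spa} - L_{exp} - L_{col} - L_{tv} + L_{eva}$$
With these factors introduced, the environmental reward value at time t is expressed as follows:
$$R^{(t)} = r^{(t)} + \gamma r^{(t+1)} + \gamma^{2} r^{(t+2)} + \cdots + \gamma^{n-1} r^{(t+n-1)} + \gamma^{n} V(s^{(t+n)})$$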
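The way the five terms combine into a single per-step reward can be written down directly. In this minimal sketch the function names are hypothetical, the loss values are assumed to be precomputed scalars, and the scores f1 and f2 are assumed to come from the two aesthetic models; the weights α and β are free parameters whose values the text does not state.

```python
def aesthetic_quality(f1: float, f2: float, alpha: float, beta: float) -> float:
    """L_eva = alpha * f1 + beta * f2 (higher means a better-looking image)."""
    return alpha * f1 + beta * f2


def step_reward(l_spa: float, l_exp: float, l_col: float, l_tv: float, l_eva: float) -> float:
    """r(t) = -L_spa - L_exp - L_col - L_tv + L_eva.

    The four no-reference losses are penalties; the aesthetic term is a bonus.
    """
    return -l_spa - l_exp - l_col - l_tv + l_eva
```

For instance, scores of 0.8 and 0.6 with equal weights of 0.5 give an aesthetic term of 0.7; with losses of 0.1, 0.2, 0.05, and 0.05 the step reward is 0.3. The sign convention is the important design point: only the aesthetic term enters with a positive sign.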
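Step S3.2: Train on the training data set based on images from previous stages to obtain the value network output.
Step S3.3: Update the value network based on the environmental reward value and the value network output:
[Formula not reproduced in the source text.]
where $\theta_v$ denotes the value network parameters.
Step S3.4: Update the policy network based on the environmental reward value and the value network output:
The policy network outputs the policy $\pi(a^{(t)} \mid s^{(t)})$ for taking action $a^{(t)} \in A$, where $\pi(a^{(t)} \mid s^{(t)})$ is a probability computed by softmax. A denotes the action space. The output dimension of the policy network is $|A|$. The specific update is as follows:
$$A(a^{(t)}, s^{(t)}) = R^{(t)} - V(s^{(t)})$$
[Formula not reproduced in the source text.]
where $\theta_p$ denotes the policy network parameters.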
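The update formulas for the two networks are not reproduced in this text. As a hedged illustration, the sketch below uses the standard advantage actor-critic objectives: a squared error between return and value for the value network, and the negative log-probability of the chosen action weighted by the advantage for the policy network. Whether the patent uses exactly these forms is an assumption, as are the function names and the per-pixel array shapes.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis (the |A| action scores)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)


def actor_critic_losses(logits, actions, returns, values):
    """logits: (H, W, |A|); actions: (H, W) int indices; returns, values: (H, W).

    Returns (policy_loss, value_loss) averaged over pixels.
    """
    probs = softmax(logits)
    # Probability the policy assigned to the action actually taken at each pixel.
    chosen = np.take_along_axis(probs, actions[..., None], axis=-1)[..., 0]
    adv = returns - values  # A = R - V, treated as a constant in the policy term
    policy_loss = float((-np.log(chosen) * adv).mean())
    value_loss = float(((returns - values) ** 2).mean())
    return policy_loss, value_loss
```

In a real training loop these two scalars would be minimized with respect to the policy parameters and the value parameters respectively, using an automatic differentiation framework; NumPy is used here only to make the arithmetic explicit.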
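At each step of the enhancement operation, the low-light image in the current state is first fed into the policy network. Based on the current input image, the policy network formulates and outputs an enhancement policy for every pixel in the image. The input image is then enhanced according to the policy formulated by the policy network. Under the predefined scheme, this enhancement operation is iterated multiple times.
The action space A is critical to network performance: too small a range limits how much a low-light image can be enhanced, while too large a range produces a very large search space and makes network training very difficult. In this embodiment the range is set empirically to A ∈ [-0.5, 0.5] with a step of 0.05. This setting is applied to the predefined output representation as follows:
$$I_{t}(x) = I_{t-1}(x) + A_{t}(x)\, I_{t-1}(x)\,\bigl(1 - I_{t-1}(x)\bigr)$$
where $I_{t-1}(x)$ denotes the pixel value at pixel x at iteration t-1, $A_{t}(x)$ denotes the action selected at pixel x at iteration t, and $I_{t}(x)$ denotes the enhanced pixel value at pixel x at iteration t.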
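Similar to the brightness curve adjustment used in photo editing software, the predefined output representation here is a quadratic curve, expressed as:
$$I_{output}(x) = I_{input}(x) + \delta\, I_{input}(x)\,\bigl(1 - I_{input}(x)\bigr)$$
where x denotes the pixel coordinate and δ is the adjustment parameter. $I_{input}(x)$ denotes the pixel value of the input image at coordinate x, normalized to the range [0, 1], and $I_{output}(x)$ denotes the pixel value output after adjustment by the adjustment parameter.
The above settings ensure that:
a. Every pixel stays within the normalized range [0, 1].
b. The cost of finding a suitable enhancement policy is reduced. For different choices of the number of enhancement iterations, our enhancement curve can effectively cover the pixel value space under this action space setting.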
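Property (a) follows from a short monotonicity argument, written out here as a worked derivation. The symbol $f_\delta$ is introduced only for this illustration; the bound $|\delta| \le 0.5$ is the action range stated in this embodiment.

```latex
\begin{aligned}
f_\delta(I) &= I + \delta\, I\,(1 - I), \qquad I \in [0,1],\quad |\delta| \le 0.5,\\
f_\delta(0) &= 0, \qquad f_\delta(1) = 1,\\
f_\delta'(I) &= 1 + \delta\,(1 - 2I) \;\ge\; 1 - |\delta| \;\ge\; 0.5 \;>\; 0
  \quad\text{since } |1 - 2I| \le 1,\\
&\Rightarrow\; 0 = f_\delta(0) \le f_\delta(I) \le f_\delta(1) = 1 .
\end{aligned}
```

Because each step maps [0, 1] into itself, any number of repeated steps with actions in the stated range also stays in [0, 1], so no clipping is needed between iterations.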
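Step S4: When all samples have been trained on and all training iterations are complete, model training is finished.
Step S5: Output the image result obtained by enhancing the low-light image.
The above are only preferred embodiments of the invention. It should be noted that a person of ordinary skill in the art may make further improvements and refinements without departing from the principle of the invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the invention.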

Claims (7)

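  1. A low-light image enhancement method based on reinforcement learning and aesthetic evaluation, characterized by comprising the following steps:
    Step S1: Generate abnormal-brightness images under different lighting scenes, and build the training data set of the reinforcement learning system from these images;
    Step S2: Initialize the training data set, policy network, and value network of the reinforcement learning system;
    Step S3: Update the policy network and the value network based on the no-reference reward value and the aesthetic evaluation reward value;
    Step S4: When all samples have been trained on and all training iterations are complete, model training is finished;
    Step S5: Output the image result obtained by enhancing the low-light image.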
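  2. The low-light image enhancement method based on reinforcement learning and aesthetic evaluation according to claim 1, characterized in that initializing the policy network and the value network in step S2 specifically comprises:
    The current state $s^{(t)}$ is used as the input of the policy network and the value network, where $s^{(t)}$ denotes the state at time step t; the output of the policy network is the policy $\pi(a^{(t)} \mid s^{(t)})$ for taking action $a^{(t)}$; the output of the value network is $V(s^{(t)})$, the expected total reward from the current state $s^{(t)}$.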
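  3. The low-light image enhancement method based on reinforcement learning and aesthetic evaluation according to claim 2, characterized in that updating the policy network and the value network in step S3 specifically comprises:
    Step S3.1: Train on the training data set based on images from previous stages, and obtain the environmental reward value as follows:
    $$R^{(t)} = r^{(t)} + \gamma r^{(t+1)} + \gamma^{2} r^{(t+2)} + \cdots + \gamma^{n-1} r^{(t+n-1)} + \gamma^{n} V(s^{(t+n)})$$
    where $\gamma^{i}$ is the i-th power of the discount factor $\gamma$, and $r^{(t)}$ denotes the environmental reward value at time t;
    Step S3.2: Train on the training data set based on images from previous stages to obtain the value network output;
    Step S3.3: Update the value network based on the environmental reward value and the value network output:
    [Formula not reproduced in the source text.]
    where $\theta_v$ denotes the value network parameters;
    Step S3.4: Update the policy network based on the environmental reward value and the predicted value:
    The policy network outputs the policy $\pi(a^{(t)} \mid s^{(t)})$ for taking action $a^{(t)} \in A$, where $\pi(a^{(t)} \mid s^{(t)})$ is a probability computed by softmax; A denotes the action space; the output dimension of the policy network is $|A|$; the specific update is as follows:
    $$A(a^{(t)}, s^{(t)}) = R^{(t)} - V(s^{(t)})$$
    [Formula not reproduced in the source text.]
    where $\theta_p$ denotes the policy network parameters.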
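  4. The low-light image enhancement method based on reinforcement learning and aesthetic evaluation according to claim 3, characterized in that in step S3.4 the range of the action space A is set to A ∈ [-0.5, 0.5] with a step of 0.05, and it is applied to the predefined output representation as follows:
    $$I_{t}(x) = I_{t-1}(x) + A_{t}(x)\, I_{t-1}(x)\,\bigl(1 - I_{t-1}(x)\bigr)$$
    where $I_{t-1}(x)$ denotes the pixel value at pixel x at iteration t-1, $A_{t}(x)$ denotes the action selected at pixel x at iteration t, and $I_{t}(x)$ denotes the enhanced pixel value at pixel x at iteration t.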
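  5. The low-light image enhancement method based on reinforcement learning and aesthetic evaluation according to claim 3, characterized in that the environmental reward value in step S3.1 takes the following factors into account:
    (1) Spatial consistency loss
    [Formula not reproduced in the source text.]
    where K denotes the local region size; Ω(i) denotes the four neighboring regions centered on region i; Y denotes the average pixel gray value of a local region in the enhanced image; I denotes the average pixel gray value of a local region in the input image;
    (2) Exposure control loss
    [Formula not reproduced in the source text.]
    where E denotes the gray level of image pixels in the RGB color space; M denotes the non-overlapping local regions; Y denotes the average pixel gray value of one local region in the enhanced image, the size of the local region being {k; k ∈ [1, M]};
    (3) Color constancy loss
    [Formula not reproduced in the source text.]
    where $J^{p}$ denotes the average pixel gray value of channel p in the enhanced image and $J^{q}$ denotes the average pixel gray value of channel q in the enhanced image; (p, q) denotes any pair of channels among (R, G), (R, B), (G, B), and ε denotes the set of (R, G), (R, B), (G, B);
    (4) Illumination smoothness loss
    [Formula not reproduced in the source text.]
    where the formula involves the parameter curve map of each state; N denotes the number of image enhancement iterations in reinforcement learning; the horizontal and vertical gradient operations are applied in turn; ξ denotes the set of the three channels R, G, and B of the image;
    (5) Aesthetic quality loss
    To score the aesthetic quality of the enhanced image, an image aesthetic scoring deep learning network model is additionally introduced to score the image, from which the aesthetic quality loss is computed; two independent aesthetic scoring models, denoted $Model_1$ and $Model_2$, are trained on the color-and-brightness attributes and on the quality attributes of images, respectively; scoring is performed by this additionally introduced aesthetic evaluation model;
    The aesthetic quality loss is expressed as follows:
    $$L_{eva} = \alpha f_{1} + \beta f_{2}$$
    where $f_1$ denotes the score for the color and brightness attributes of the enhanced image, that is, the score output by $Model_1$ for the enhanced image; $f_2$ denotes the score for the quality attributes of the enhanced image, that is, the score output by $Model_2$ for the enhanced image; a higher score indicates better image quality; α and β are weight coefficients;
    The goal of image enhancement is to make the reward value r as large as possible; smaller spatial consistency, exposure control, color constancy, and illumination smoothness losses indicate better image quality, while a larger aesthetic quality loss indicates better image quality; therefore, the reward value $r^{(t)}$ at time t is expressed as follows:
    $$r^{(t)} = -L_{spa} - L_{exp} - L_{col} - L_{tv} + L_{eva}$$
    With these factors introduced, the environmental reward value at time t is expressed as follows:
    $$R^{(t)} = r^{(t)} + \gamma r^{(t+1)} + \gamma^{2} r^{(t+2)} + \cdots + \gamma^{n-1} r^{(t+n-1)} + \gamma^{n} V(s^{(t+n)})$$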
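  6. The low-light image enhancement method based on reinforcement learning and aesthetic evaluation according to claim 5, characterized in that the local region size K in the spatial consistency loss is set to 4×4.
  7. The low-light image enhancement method based on reinforcement learning and aesthetic evaluation according to claim 5, characterized in that in the exposure control loss E is set to 0.6 and M denotes non-overlapping local regions of size 16×16.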
PCT/CN2023/074843 2022-06-10 2023-02-07 一种基于强化学习和美学评估的低光图像增强方法 WO2023236565A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210650946.7A CN114723643B (zh) 2022-06-10 2022-06-10 一种基于强化学习和美学评估的低光图像增强方法
CN202210650946.7 2022-06-10

Publications (1)

Publication Number Publication Date
WO2023236565A1 true WO2023236565A1 (zh) 2023-12-14

Family

ID=82232404

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/074843 WO2023236565A1 (zh) 2022-06-10 2023-02-07 一种基于强化学习和美学评估的低光图像增强方法

Country Status (2)

Country Link
CN (1) CN114723643B (zh)
WO (1) WO2023236565A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117690062A (zh) * 2024-02-02 2024-03-12 武汉工程大学 一种矿内矿工异常行为检测方法

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114723643B (zh) * 2022-06-10 2022-10-25 南京航空航天大学 一种基于强化学习和美学评估的低光图像增强方法
CN115511754B (zh) * 2022-11-22 2023-09-12 北京理工大学 基于改进的Zero-DCE网络的低照度图像增强方法
CN117893449A (zh) * 2024-03-15 2024-04-16 荣耀终端有限公司 图像处理方法及电子设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033693A (zh) * 2021-04-09 2021-06-25 中国矿业大学 一种融合用户主观属性的个性化图像美学评价方法及装置
CN114037622A (zh) * 2021-10-25 2022-02-11 浙江工业大学 一种基于成像模型和强化学习的水下图像增强方法
CN114219182A (zh) * 2022-01-20 2022-03-22 天津大学 一种基于强化学习的异常天气场景风电预测方法
CN114723643A (zh) * 2022-06-10 2022-07-08 南京航空航天大学 一种基于强化学习和美学评估的低光图像增强方法

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11069030B2 (en) * 2018-03-22 2021-07-20 Adobe, Inc. Aesthetics-guided image enhancement
CN112258420B (zh) * 2020-11-02 2022-05-20 北京航空航天大学杭州创新研究院 基于dqn的图像增强处理方法及装置
CN114283083B (zh) * 2021-12-22 2024-05-10 杭州电子科技大学 一种基于解耦表示的场景生成模型的美学增强方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117690062A (zh) * 2024-02-02 2024-03-12 武汉工程大学 一种矿内矿工异常行为检测方法
CN117690062B (zh) * 2024-02-02 2024-04-19 武汉工程大学 一种矿内矿工异常行为检测方法

Also Published As

Publication number Publication date
CN114723643A (zh) 2022-07-08
CN114723643B (zh) 2022-10-25

Similar Documents

Publication Publication Date Title
WO2023236565A1 (zh) 一种基于强化学习和美学评估的低光图像增强方法
Zhuang et al. Underwater image enhancement with hyper-laplacian reflectance priors
Ma et al. Deep guided learning for fast multi-exposure image fusion
Zhou et al. Underwater image restoration via backscatter pixel prior and color compensation
US8280165B2 (en) System and method for segmenting foreground and background in a video
Liu et al. HoLoCo: Holistic and local contrastive learning network for multi-exposure image fusion
Niu et al. 2D and 3D image quality assessment: A survey of metrics and challenges
US8692830B2 (en) Automatic avatar creation
CN111047543A (zh) 图像增强方法、装置和存储介质
Hou et al. Underwater image dehazing and denoising via curvature variation regularization
JP2006301779A (ja) 画像処理システム、画像処理方法及び画像処理プログラム
WO2022199710A1 (zh) 图像融合方法、装置、计算机设备和存储介质
Liu et al. Progressive complex illumination image appearance transfer based on CNN
CN110321452B (zh) 一种基于方向选择机制的图像检索方法
Xiong et al. An efficient underwater image enhancement model with extensive Beer-Lambert law
Xie et al. Color transfer using adaptive second-order total generalized variation regularizer
Li et al. All-e: aesthetics-guided low-light image enhancement
Zhang et al. Dehazing with improved heterogeneous atmosphere light estimation and a nonlinear color attenuation prior model
Huang et al. An end-to-end dehazing network with transitional convolution layer
Chen et al. Low‐light image enhancement based on exponential Retinex variational model
CN115018729A (zh) 一种面向内容的白盒图像增强方法
JP2012510201A (ja) デジタル画像における記憶色の修正
Shuang et al. Algorithms for improving the quality of underwater optical images: A comprehensive review
Parihar et al. UndarkGAN: Low-light Image Enhancement with Cycle-consistent Adversarial Networks
Wen et al. Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23818733

Country of ref document: EP

Kind code of ref document: A1