CN111489305B

CN111489305B - Image Enhancement Method Based on Reinforcement Learning

Info

Publication number: CN111489305B
Application number: CN202010244525.5A
Authority: CN
Inventors: 华中华; 侯春萍; 杨阳; 及浩然; 王霄聪
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2020-03-31
Filing date: 2020-03-31
Publication date: 2023-05-30
Anticipated expiration: 2040-03-31
Also published as: CN111489305A

Abstract

The invention relates to an image enhancement method based on reinforcement learning, comprising the following steps: creating a distorted image dataset, in which a training set and a test set are preprocessed in MATLAB using three types of distortion at different degrees to generate distorted images; designing image enhancement processing tools, in which enhancement algorithm parameters are trained separately for distorted images of different types or degrees and corresponding meta files are generated, yielding multiple processing tools that each handle distortion of a specific type and degree; training an optimal processing tool selection network; and testing the performance of the model.

Description

Image enhancement method based on reinforcement learning

Technical Field

The present invention belongs to the technical field of image processing and relates to a machine-learning-based method that uses reinforcement learning to improve image quality.

Background Art

With the advent of the big data era, image information has become an indispensable source of information in daily life because it is more intuitive and easier to understand. However, understanding information through images requires a high-quality image. Degraded images contain considerable noise, which creates obstacles for subsequent analysis. As artificial intelligence develops, requirements for image quality keep rising, and image enhancement has long been an active research topic in computer vision.

In many digital image processing applications, high-quality images or videos are needed for processing and analysis. Owing to the limitations of existing technology, imaging devices often produce low-quality images, with defects including but not limited to Gaussian white noise, low resolution, and blur. These defects complicate subsequent image processing and analysis, so improving the quality of such images has become a key concern.

In recent years, deep learning, and convolutional neural networks in particular, has proven to be an efficient data-driven framework and has shown good results on low-level image processing problems. Dong et al. designed a convolutional neural network architecture [1] for single-image super-resolution reconstruction, in which a three-layer convolutional neural network mimics the stages of sparse-coding-based super-resolution reconstruction. Wang et al. integrated a neural network into the sparse coding framework to solve the super-resolution reconstruction problem [2], applying a neural network designed for fast solving of sparse coding to image super-resolution. Schuler et al. developed a multi-layer perceptron method built on deconvolution [3] to remove noise and artifacts. Xu et al. used a deep convolutional neural network [4] to restore noisy images, using singular value decomposition to reduce the number of network parameters. Kim et al. [5] found that learning the residual between low-resolution and high-resolution images allows more layers to be added to the network and helps it converge.

[1] Dong C, Loy C C, He K, et al. Image Super-Resolution Using Deep Convolutional Networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2016, 38(2): 295-307.

[2] Wang Z, Liu D, Yang J, et al. Deep Networks for Image Super-Resolution with Sparse Prior[C]. In Proceedings of ICCV, Santiago, Chile, 2015: 370-378.

[3] Schuler C J, Burger H C, Harmeling S, et al. A Machine Learning Approach for Non-blind Image Deconvolution[C]. In Proceedings of CVPR, Portland, OR, USA, 2013: 1067-1074.

[4] Xu L, Ren J S, Liu C, et al. Deep Convolutional Neural Network for Image Deconvolution[C]. In Proceedings of NIPS, Montreal, Quebec, Canada, 2014: 1790-1798.

[5] Kim J, Lee J K, Lee K M. Accurate Image Super-Resolution Using Very Deep Convolutional Networks[C]. In Proceedings of CVPR, Las Vegas, NV, USA, 2016: 1646-1654.

Summary of the Invention

The purpose of the present invention is to provide an image enhancement method based on reinforcement learning. The method selects the most appropriate image enhancement tool for an image and applies it, enriching image detail and improving image quality, which facilitates subsequent processing and improves efficiency and accuracy. The technical solution is as follows:

An image enhancement method based on reinforcement learning comprises the following steps:

Step 1: Create a distorted image dataset

A public image dataset is divided into a training set and a test set. Both sets are preprocessed in MATLAB using three types of processing at different degrees to generate distorted images: JPEG compression at different degrees, Gaussian noise at different degrees, and Gaussian blur at different degrees.

Step 2: Design image enhancement processing tools

For distorted images of different types or degrees, the enhancement algorithm parameters are trained separately and corresponding meta files are generated, yielding multiple processing tools. Each tool handles distortion of a specific degree and a specific type. Images with different degrees of JPEG compression are restored using a generative adversarial network restoration and reconstruction algorithm; images with different degrees of Gaussian noise are restored using a convolutional neural network denoising algorithm; and images with different degrees of Gaussian blur are restored using a convolutional neural network deblurring algorithm.

Step 3: Train the optimal processing tool selection network

When reconstructing distorted images, different processing tools produce different restoration results, as do different processing orders, so a network that autonomously selects the optimal processing tool must be designed. Using the DQN reinforcement learning algorithm, the selection problem is treated as a Markov process, and a reward function is used to evaluate each action. For each current state, the most appropriate action is taken to transition the state so that the reward function is maximized. The selection of a processing tool is treated as a discrete action, and the most appropriate processing tools and processing order are selected for images with different degrees of distortion.

The hyperparameter scheme for training the optimal processing tool selection network is determined as follows: the batch size is set to 32, the learning rate to 0.0001, the initial exploration rate to 0.1, and the number of iterations to 100,000. During testing, the batch size is set to 1, so only one image is processed at a time. The parameters of the optimal processing tool selection network are trained to maximize the objective, namely the cumulative reward function. When the training iterations finish or the cumulative reward function converges, the optimal processing tool selection network for distorted images is obtained: given a distorted image as input, it outputs the labels of the optimal processing tools and the order in which to apply them.

Step 4: Model performance testing

The distorted images in the test set are input into the optimal processing tool selection network to obtain the labels of the processing tools and the processing order for each image, and the corresponding operations are performed on the distorted images to obtain enhanced images. The performance of the model is evaluated by calculating the peak signal-to-noise ratio (PSNR) between the undistorted original image and the enhanced image. The higher the PSNR value, the better the restoration result.

The beneficial effects of the present invention are shown in Table 1.

Table 1. Result statistics

Brief Description of the Drawings

Figure 1: Structure of the image enhancement model

Figure 2: JPEG-compressed image

Figure 3: Enhancement result for the compressed image

Figure 4: Image with Gaussian noise

Figure 5: Enhancement result for the noisy image

Figure 6: Blurred image

Figure 7: Enhancement result for the blurred image

Detailed Description

To make the technical solution of the present invention clearer, specific embodiments of the present invention are further described below with reference to the accompanying drawings.

Step 1: Create a distorted image dataset.

The DIV2K dataset is divided into a training set and a test set at a ratio of 15:1. Both sets are preprocessed in MATLAB using 12 processing methods to generate distorted images, yielding 750 training images and 50 test images, each of which undergoes multiple rounds of processing. The 12 methods used in the present invention are shown in Table 2.

Table 2. Distortion processing

(1) Gaussian blur processing

$$
\begin{bmatrix}
1/N & \cdots & 1/N \\
\vdots & 1/N & \vdots \\
1/N & \cdots & 1/N
\end{bmatrix}
$$

Figure 1. [N, N] Gaussian blur convolution kernel

The image is convolved with the kernel shown in Figure 1, sliding it over every pixel position, to obtain the blurred image.
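By way of non-limiting illustration, the sliding convolution described above can be sketched in Python as follows. The kernel listed in Figure 1 has identical entries, so the sketch uses a uniform averaging kernel. The function name `box_blur`, the normalization of the weights to sum to one (1/N² per entry rather than the 1/N shown), the clamped border handling, and the restriction to odd N on a single-channel image are all assumptions of this sketch and are not stated in the source.

```python
def box_blur(image, n):
    """Convolve a 2-D grayscale image (a list of rows) with an n x n uniform kernel.

    Assumes n is odd so the kernel has a well-defined centre.
    """
    height, width = len(image), len(image[0])
    half = n // 2
    weight = 1.0 / (n * n)  # assumption: weights normalized to sum to 1
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    # Assumption: border pixels are replicated by clamping indices.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += image[yy][xx] * weight
            blurred[y][x] = total
    return blurred
```

A larger `n` spreads each pixel over a wider neighbourhood, which is one way the different degrees of blur could be produced.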

(2) Adding Gaussian noise

An input image $f(x,y)$ is processed to produce a degraded image $g(x,y)$. Given the degradation function $h(x,y)$ and the additive noise term $\eta(x,y)$, the degraded image in the spatial domain is given by:

$$g(x,y) = h(x,y) * f(x,y) + \eta(x,y)$$

where $*$ denotes convolution. In the frequency domain:

$$G(u,v) = H(u,v)F(u,v) + N(u,v)$$
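As a hedged illustration of the noise-only case of this degradation model, the following Python sketch treats h as the identity and draws η independently per pixel from a zero-mean Gaussian. The function name `add_gaussian_noise`, the `sigma` and `seed` parameters, and the clipping to the range [0, 255] are assumptions of the sketch; the source does not give the noise levels used.

```python
import random


def add_gaussian_noise(image, sigma, seed=None):
    """Return g = f + eta for a 2-D grayscale image (a list of rows).

    eta is drawn per pixel from N(0, sigma^2); results are clipped to [0, 255].
    """
    rng = random.Random(seed)  # a seed makes the distortion reproducible
    return [
        [min(255.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]
```

Different values of `sigma` would correspond to different degrees of the noise distortion.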

Step 2: Design image enhancement processing tools.

For distortions of different types or degrees, the enhancement algorithm parameters are trained separately and corresponding meta files are generated, yielding 12 processing tools, each handling distortion of a specific degree and a specific type. In the present invention, images with 4 different degrees of JPEG compression are restored using a generative adversarial network restoration and reconstruction algorithm; images with 4 different degrees of Gaussian noise are restored using a convolutional neural network denoising algorithm; and images with 4 different degrees of blur are restored using a convolutional neural network deblurring algorithm.
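The resulting toolbox of 3 distortion types by 4 degrees can be pictured as a small lookup table keyed by tool label. The Python sketch below is only one possible arrangement: the label numbering, the identifier strings, and the meta file names are hypothetical and do not come from the source.

```python
from itertools import product

# Restoration algorithm family used for each distortion type (per the description).
RESTORERS = {
    "jpeg": "gan_restoration",
    "gaussian_noise": "cnn_denoising",
    "gaussian_blur": "cnn_deblurring",
}
LEVELS = (1, 2, 3, 4)  # four degrees per distortion type


def build_toolbox():
    """Map each tool label (0..11) to the distortion it was trained to undo."""
    toolbox = {}
    for label, (kind, level) in enumerate(product(RESTORERS, LEVELS)):
        toolbox[label] = {
            "distortion": kind,
            "level": level,
            "algorithm": RESTORERS[kind],
            "meta_file": f"{kind}_level{level}.meta",  # hypothetical file name
        }
    return toolbox
```

The labels in such a table are what the selection network described in Step 3 would output.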

Step 3: Train the optimal processing tool selection network.

When reconstructing distorted images, different processing tools produce different restoration results, as do different processing orders, so a network that autonomously selects the optimal processing tool should be designed. A reinforcement learning algorithm treats the selection problem as a Markov process and uses a reward function to evaluate each action. For each current state, the most appropriate action is taken to transition the state so that the reward function is maximized. The present invention adopts the DQN reinforcement learning algorithm, treats the selection of a processing tool as a discrete action, and selects the most appropriate processing tools and processing order for images with different degrees of distortion, as shown in Figure 2.

In the DQN algorithm of the present invention, the environment state is $S_t = \{I_t, v_t\}$, where $I_t$ is the input distorted image vector and $v_t$ is the historical action vector; at the first step, $v_t$ is the zero vector. The action taken by the agent at time $t$ is $A_t \in \{\text{12 tools}\}$: taking an action means selecting one tool to process the distorted image, which yields the reconstructed image, moves the state to $S_{t+1}$, and returns the environment reward $R_t$, calculated as follows:

$$R_t = \|I_{target} - I_{t-1}\|^2 - \|I_{target} - I_t\|^2$$

where $I_{target}$ is the undistorted original image. The cumulative reward function is $Q(t) = E(R_{t+1} + \lambda R_{t+2} + \lambda^2 R_{t+3} + \dots \mid S_t)$, where $E$ denotes expectation and $\lambda$ is the discount factor. Maximizing the cumulative reward function $Q(t)$ is equivalent to the problem of selecting the optimal processing tool.
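The reward and the discounted sum inside the expectation can be illustrated with a short Python sketch. It assumes images are 2-D lists of pixel values and evaluates the return for one observed reward sequence rather than the expectation itself; the function names are illustrative only.

```python
def squared_error(a, b):
    """Squared L2 distance between two equally sized 2-D images."""
    return sum(
        (p - q) ** 2
        for row_a, row_b in zip(a, b)
        for p, q in zip(row_a, row_b)
    )


def step_reward(target, previous, current):
    """R_t: how much the applied tool reduced the squared error to the target."""
    return squared_error(target, previous) - squared_error(target, current)


def discounted_return(rewards, discount):
    """R_{t+1} + lambda * R_{t+2} + lambda^2 * R_{t+3} + ... for one episode."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + discount * total
    return total
```

The reward is positive when a tool moves the image closer to the undistorted original and negative when it moves it further away. With a discount of 1 the rewards telescope, so the return equals the total reduction in squared error over the episode.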

Because the training images used in the present invention are large, the following hyperparameter scheme for training the optimal processing tool selection network was determined after repeated experiments: the batch size is set to 32, the learning rate to 0.0001, the initial exploration rate to 0.1, and the number of iterations to 100,000. During testing, the batch size is set to 1, so only one image is processed at a time. The parameters of the optimal processing tool selection network are trained to maximize the objective, the cumulative reward function Q(t). The experimental environment is the Ubuntu 16.04 operating system; training runs on an NVIDIA RTX 2060 GPU with 6 GB of video memory and is accelerated with CUDA.
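For illustration, the stated settings can be collected in a configuration mapping, and the exploration rate can be read as the epsilon of an epsilon-greedy policy, which is the usual exploration scheme in DQN. The source does not say how the exploration rate is used or whether it decays, so the `epsilon_greedy` function and the key names below are assumptions of this sketch.

```python
import random

# Values taken from the description; the key names are illustrative.
HYPERPARAMS = {
    "batch_size": 32,
    "learning_rate": 1e-4,
    "initial_exploration_rate": 0.1,
    "iterations": 100_000,
    "test_batch_size": 1,
}


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random tool with probability epsilon, else the highest-valued tool."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit
```

With `epsilon` set to the initial exploration rate of 0.1, roughly one action in ten would be chosen at random during training.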

When the training iterations finish or the cumulative reward function Q(t) converges, the optimal processing tool selection network for distorted images is obtained: given a distorted image as input, it outputs the labels of the optimal processing tools and the order in which to apply them.
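One way to picture how the trained network yields both the tool labels and their order is a greedy rollout, sketched below in Python. Here `q_function` stands in for the trained network and `tools` for the restoration tools; both are hypothetical callables. The fixed `max_steps` limit and the encoding of the history vector as per-tool usage counts are also assumptions, since the source does not specify a stopping rule or the exact form of the history vector.

```python
def select_tool_sequence(image, q_function, tools, max_steps=3):
    """Greedily apply the highest-valued tool at each step.

    q_function(image, history) must return one value per tool.
    Returns the ordered list of tool labels and the final image.
    """
    history = [0] * len(tools)  # v_t: zero vector at the first step
    order = []
    for _ in range(max_steps):
        q_values = q_function(image, history)
        action = max(range(len(tools)), key=q_values.__getitem__)
        image = tools[action](image)  # apply the chosen restoration tool
        history[action] += 1          # assumption: history counts tool usage
        order.append(action)
    return order, image
```

The returned `order` corresponds to the labels and processing order that the description says the network outputs.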

Step 4: Model performance testing.

The distorted images in the test set are input into the optimal processing tool selection network to obtain the labels of the processing tools and the processing order for each image, and the corresponding operations are performed on the distorted images to obtain enhanced images. The performance of the model is evaluated by calculating the peak signal-to-noise ratio (PSNR) between the undistorted original image and the enhanced image. The higher the PSNR value, the better the restoration result.

PSNR is defined as:

where m, n, and c denote the dimensions of the image, which are 256, 256, and 8 in the present invention; x is the undistorted original image, y is the reconstructed image, and $MAX_I$ is the maximum pixel value, namely 255.
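The formula itself is not reproduced in this text. A minimal Python sketch is given below under the assumption that the conventional definition applies, namely PSNR = 10·log10(MAX_I² / MSE), where MSE is the mean squared difference between x and y over all image elements. The function name `psnr` and the restriction to 2-D lists of pixel values are choices made for the sketch.

```python
import math


def psnr(original, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    flat_x = [p for row in original for p in row]
    flat_y = [p for row in restored for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_x, flat_y)) / len(flat_x)
    if mse == 0:
        return math.inf  # identical images: no error to measure
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the mean squared error sits in the denominator, a smaller reconstruction error gives a higher PSNR, which matches the statement that a higher value indicates better restoration.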

The experimental data were analyzed and processed to evaluate the image quality enhancement performance of the present invention. The test results are shown in Table 1; the comparison shows that the present invention enhances image quality effectively.

Claims

1. An image enhancement method based on reinforcement learning, comprising the following steps:

Step 1: Create a distorted image dataset

Divide a public image dataset into a training set and a test set, and use MATLAB to preprocess both sets with three types of processing at different degrees to generate distorted images, the processing comprising JPEG compression at different degrees, Gaussian noise at different degrees, and Gaussian blur at different degrees;

Step 2: Design image enhancement processing tools

For distorted images of different types or degrees, train the enhancement algorithm parameters separately and generate corresponding meta files to obtain multiple processing tools, each tool handling distortion of a specific degree and a specific type, wherein images with different degrees of JPEG compression are restored using a generative adversarial network restoration and reconstruction algorithm; images with different degrees of Gaussian noise are restored using a convolutional neural network denoising algorithm; and images with different degrees of Gaussian blur are restored using a convolutional neural network deblurring algorithm;

Step 3: Train the optimal processing tool selection network

When reconstructing distorted images, different processing tools produce different restoration results, as do different processing orders, so a network that autonomously selects the optimal processing tool must be designed; using the DQN reinforcement learning algorithm, the selection problem is treated as a Markov process, a reward function is used to evaluate each action, and for each current state the most appropriate action is taken to transition the state so that the reward function is maximized; the selection of a processing tool is treated as a discrete action, and the most appropriate processing tools and processing order are selected for images with different degrees of distortion;

Determine the hyperparameter scheme for training the optimal processing tool selection network: the batch size is set to 32, the learning rate to 0.0001, the initial exploration rate to 0.1, and the number of iterations to 100,000; during testing the batch size is set to 1, that is, only one image is processed at a time; train the parameters of the optimal processing tool selection network to maximize the objective, namely the cumulative reward function; when the training iterations finish or the cumulative reward function converges, the optimal processing tool selection network for distorted images is obtained, which takes a distorted image as input and outputs the labels of the optimal processing tools and the processing order;

Step 4: Model performance testing

Input the distorted images in the test set into the optimal processing tool selection network to obtain the labels of the processing tools and the processing order for each image, and perform the corresponding operations on the distorted images to obtain enhanced images; evaluate the performance of the model by calculating the peak signal-to-noise ratio (PSNR) between the undistorted original image and the enhanced image, where a higher PSNR value indicates a better restoration result.