CN113421237B - No-reference image quality evaluation method based on depth feature transfer learning - Google Patents


Info

Publication number
CN113421237B
CN113421237B · Application CN202110678186.6A
Authority
CN
China
Prior art keywords
network
quality
image
image quality
reference image
Prior art date
Legal status
Active
Application number
CN202110678186.6A
Other languages
Chinese (zh)
Other versions
CN113421237A (en)
Inventor
何立火
任伟
李嘉秀
邓夏迪
甘海林
唐杰浩
柯俊杰
张超仑
路文
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202110678186.6A priority Critical patent/CN113421237B/en
Publication of CN113421237A publication Critical patent/CN113421237A/en
Application granted
Publication of CN113421237B publication Critical patent/CN113421237B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30168Image quality inspection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing


Abstract

A no-reference image quality evaluation method based on depth feature transfer learning comprises the following steps: constructing a distortion feature extraction network; constructing a multi-branch feature attention module; constructing a quality regression network; generating a no-reference image quality regression network; generating a training set; training the no-reference image quality regression network; and evaluating the quality of the image to be evaluated. The multi-branch feature attention module contained in the distortion feature extraction network adaptively captures the distortion features of natural images, and the quality score of the input image is obtained automatically at the output of the quality regression network. Extensive experimental results on several public international databases show that the method improves the prediction accuracy of distorted image quality, agrees more closely with human visual perception, and generalizes better when evaluating no-reference image quality.

Description

No-reference image quality evaluation method based on depth feature transfer learning
Technical Field
The invention belongs to the technical field of image processing, and further relates to an image quality evaluation method based on depth feature transfer learning within the field of image quality evaluation. The invention can be used to automatically calculate the quality score of a naturally distorted image when no original reference image is available.
Background
With the advent of the Internet of Everything and the rapid development of digital multimedia technology, images have become a major source of the visual information humans perceive from the outside world. However, uncontrollable factors such as noise are inevitably introduced as a signal travels from the transmitting end to the receiving end, degrading image quality and causing loss of visual quality and semantic information. Evaluating image quality is therefore very important: an efficient and accurate image quality evaluation method can be used to optimize image acquisition and processing systems so that higher-quality images are obtained. Because the original reference image (the corresponding distortion-free version) is difficult to acquire in most practical application scenarios, no-reference image quality evaluation methods are the most widely applicable. A no-reference quality evaluation method is a technique that computes quality automatically without any information about the original image, obtaining a quality representation of the target image by establishing a mapping from subjective opinion scores to objective evaluation scores.
Wuhan University, in its patent "Color image quality evaluation method based on a multi-path deep convolutional neural network" (application number CN201910414080.8, grant publication number CN110163855B), discloses a color image quality evaluation method based on a multi-path deep convolutional neural network. The patent mainly addresses the low accuracy of traditional methods in predicting the quality of color images. Its implementation steps are: (1) perform multi-scale transformation and color-space transformation on the color image and output several different component images; (2) design and improve a single-path deep convolutional network structure; (3) train and optimize the single-path deep convolutional network; (4) perform feature extraction and multi-dimensional collaborative feature fusion on the component images with the single-path deep convolutional network model; (5) reduce the dimensionality of the multi-dimensional output feature vector; (6) establish a color image quality prediction model by mapping the reduced features to subjective opinion scores with a nonlinear regression method, and evaluate the quality of the color image. By extracting quality-aware features from the color components, the patent improves no-reference quality evaluation for color images.
However, the method still has shortcomings: the multi-scale and color-space transformations output several component images whose features, extracted by multiple convolutional neural networks, contain many redundant and irrelevant features, and the functional mapping between subjective opinion scores and the reduced features is completed by a separate nonlinear regression method, so the method is not an end-to-end learnable process.
Ren et al., in the paper "RAN4IQA: Restorative adversarial nets for no-reference image quality assessment" (Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), 2018), disclose a no-reference image quality assessment method based on generative adversarial networks. The method first cuts the image under test into several image blocks; it then obtains a pre-trained restorative adversarial network by training on the large-scale Waterloo database, where the inputs in the pre-training stage are four types of distorted images (JPEG compression, JPEG2000 compression, Gaussian blur, and Gaussian white noise); pseudo-reference versions of the image blocks are generated by the pre-trained model, and the distorted blocks and restored blocks are fed together into a regression network to compute the quality score of the image. The method has two disadvantages. First, it relies on generated images of the corresponding distortion types obtained after pre-training on the large-scale Waterloo database, so the training period is long and the method is hard to apply in real production and everyday life. Second, the distorted-image restoration model considers only four distortion types at different levels, whereas images in real scenes are often combinations of multiple distortion types, and the final subjective-objective consistency result depends to a great extent on the accuracy of the restoration model.
Disclosure of Invention
The invention aims to provide a no-reference image quality evaluation method based on depth feature transfer learning, in order to solve the problems that, in the image quality evaluation task, a quality evaluation network built with traditional transfer learning is difficult to train and its prediction accuracy and generalization are unsatisfactory, because the distortion features of artificially synthesized distorted images differ greatly from those of real-scene images.
The idea for realizing the purpose of the invention is as follows: important features are modeled by constructing a multi-branch feature attention module, and attention modeling automatically highlights the features sensitive to local image distortion, yielding attention features that can express image quality. The multi-branch feature attention module splits the feature maps of the input image into two groups and uses an inter-channel attention mechanism to adaptively recalibrate the channel feature responses within each group. This ensures that the no-reference image quality regression network can learn the diverse features of no-reference images, effectively learns distortion patterns that differ from those of real-scene images, and quickly adapts between real-scene images and artificially synthesized distorted images, markedly improving the prediction accuracy of image quality and overcoming the difficulties of hard-to-train quality evaluation networks, low prediction accuracy, and unsatisfactory generalization.
The specific steps for realizing the purpose of the invention are as follows:
(1) Constructing a distortion feature extraction sub-network:
(1a) Build a five-layer image distortion feature extraction sub-network whose structure is, in order: a convolutional layer, a 1st convolution computing unit, a 2nd convolution computing unit, a 3rd convolution computing unit, and a 4th convolution computing unit; the 1st to 4th convolution computing units adopt bottleneck structures, each formed by cascading three convolutional layers;
(1b) Set the number of input channels of the convolutional layer to 64, the number of output channels to 128, the convolution kernel size to 7×7, and the stride to 2; the numbers of bottleneck structures in the 1st to 4th convolution computing units are 3, 4, 6, and 3 respectively, and the convolution kernel sizes of the three convolutional layers in each bottleneck structure are set to 1×1, 3×3, and 1×1 respectively;
(2) Constructing a multi-branch feature attention module:
build a multi-branch feature attention module formed by cascading three convolutional layers; set the feature-map grouping number (groups) of the 1st convolutional layer to 2, set the numbers of input channels of the 1st to 3rd convolutional layers to 64, 128, and 128 respectively, set the convolution kernel sizes to 3×3, 1×1, and 1×1 respectively, and set all strides to 1;
(3) Constructing a quality regression sub-network:
construct a quality regression sub-network formed by connecting in parallel two down-sampling layer groups with identical structure and parameter settings; each parallel down-sampling layer group consists of five cascaded linear layers, whose numbers of nodes are set to 2048, 1024, 512, 256, and 64 respectively and whose random node deactivation (dropout) rates are set to 0.5, 0.25, and 0 respectively;
(4) Generating a no-reference image quality regression network:
cascade the distortion feature extraction sub-network, the multi-branch feature attention module, the quality regression sub-network, and a prediction layer in sequence into a no-reference image quality regression network; the number of input nodes of the prediction layer is set to 128 and the number of output nodes to 1;
(5) Generating a training set:
(5a) Select at least 1020 and at most 6000 no-reference natural images from a natural image quality evaluation data set to form a sample set, and apply normalization and preprocessing to each image in the sample set in turn;
(5b) All preprocessed images and their corresponding labels form the training set;
(6) Training the no-reference image quality regression network:
set the training parameters, input the training set into the no-reference image quality regression network, and iteratively update the network parameters with stochastic gradient descent until the loss function converges, obtaining the trained no-reference image quality regression network;
(7) Evaluating the quality of the no-reference image to be evaluated:
normalize and preprocess the no-reference image to be evaluated with the same methods as steps (5a) and (5b), input the preprocessed image into the trained no-reference image quality regression network, and output the predicted quality score of the image.
Compared with the prior art, the invention has the following advantages:
First, because the multi-branch feature attention module is constructed, distortion-aware quality features that strongly affect human visual perception are extracted adaptively from the input no-reference image. This overcomes the prior-art need for extensive pre-training on large image databases, which arises because the artificially synthesized distortion target domain differs greatly from the real-scene source domain; the module can adaptively extract the important features of the input image, so the invention predicts no-reference image quality more accurately.
Second, because the invention constructs a quality regression sub-network composed of two parallel branches, the two-branch design enhances the ability of the distortion feature extraction sub-network to aggregate quality-aware features and provides a feature synergy effect. The constructed no-reference image quality regression network is learnable end to end, avoiding the cumbersome processing flow and weak feature expression of prior-art two-stage image quality prediction models. With the synergy of the quality regression sub-network, the network learns more efficiently, and when evaluating no-reference image quality it agrees more closely with human visual perception, predicts more accurately, and generalizes better.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention is further described below with reference to FIG. 1 and simulation experiments.
Step 1, constructing a distortion feature extraction sub-network.
Build a five-layer image distortion feature extraction sub-network whose structure is, in order: a convolutional layer, a 1st convolution computing unit, a 2nd convolution computing unit, a 3rd convolution computing unit, and a 4th convolution computing unit. The 1st to 4th convolution computing units adopt bottleneck structures, each formed by cascading three convolutional layers.
Set the number of input channels of the convolutional layer to 64, the number of output channels to 128, the convolution kernel size to 7×7, and the stride to 2. The numbers of bottleneck structures in the 1st to 4th convolution computing units are 3, 4, 6, and 3 respectively, and the convolution kernel sizes of the three convolutional layers in each bottleneck structure are set to 1×1, 3×3, and 1×1 respectively.
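As a concrete illustration, a backbone of this shape can be sketched in PyTorch (the framework named in the simulation section). The 3/4/6/3 bottleneck counts and the 7×7 stride-2 stem match the standard ResNet-50 layout, so this sketch follows the usual ResNet-50 channel plan (a 3-channel RGB stem and unit widths 256/512/1024/2048); those exact channel counts are an assumption, since the patent text lists them only partially.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck structure: three cascaded conv layers (1x1, 3x3, 1x1)."""
    def __init__(self, in_ch, mid_ch, stride=1):
        super().__init__()
        out_ch = mid_ch * 4
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # projection shortcut when the shape changes, identity otherwise
        self.skip = (nn.Identity() if in_ch == out_ch and stride == 1 else
                     nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                                   nn.BatchNorm2d(out_ch)))

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

def make_unit(in_ch, mid_ch, n_blocks, stride):
    """One 'convolution computing unit': a stack of n_blocks bottlenecks."""
    blocks = [Bottleneck(in_ch, mid_ch, stride)]
    blocks += [Bottleneck(mid_ch * 4, mid_ch) for _ in range(n_blocks - 1)]
    return nn.Sequential(*blocks)

class DistortionFeatureExtractor(nn.Module):
    """7x7 stride-2 stem followed by units of 3, 4, 6, 3 bottlenecks."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1))
        self.unit1 = make_unit(64, 64, 3, 1)
        self.unit2 = make_unit(256, 128, 4, 2)
        self.unit3 = make_unit(512, 256, 6, 2)
        self.unit4 = make_unit(1024, 512, 3, 2)

    def forward(self, x):
        return self.unit4(self.unit3(self.unit2(self.unit1(self.stem(x)))))
```

Under these assumptions the network reduces spatial resolution by a factor of 32 and emits a 2048-channel feature map.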
Step 2, constructing a multi-branch feature attention module.
Build a multi-branch feature attention module formed by cascading three convolutional layers. Set the feature-map grouping number (groups) of the 1st convolutional layer to 2, set the numbers of input channels of the 1st to 3rd convolutional layers to 64, 128, and 128 respectively, set the convolution kernel sizes to 3×3, 1×1, and 1×1 respectively, and set all strides to 1.
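A minimal PyTorch sketch of such a module follows. The three conv shapes (a 3×3 layer with groups=2, then two 1×1 layers; channels 64→128→128→128; stride 1) are taken from the text; the sigmoid channel gate that recalibrates the grouped responses is an assumption, since the patent specifies the layer shapes but not the exact attention arithmetic.

```python
import torch
import torch.nn as nn

class MultiBranchFeatureAttention(nn.Module):
    """Three cascaded conv layers; the 1st splits its input feature maps
    into two groups (groups=2).  A sigmoid gate computed from globally
    pooled responses recalibrates each channel, highlighting
    distortion-sensitive features."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1, groups=2)
        self.conv2 = nn.Conv2d(128, 128, kernel_size=1, stride=1)
        self.conv3 = nn.Conv2d(128, 128, kernel_size=1, stride=1)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        a = torch.relu(self.conv1(x))                    # grouped 3x3 conv, 64 -> 128
        a = torch.relu(self.conv2(a))                    # 1x1 conv mixes the two groups
        gate = torch.sigmoid(self.conv3(self.pool(a)))   # per-channel weights in (0, 1)
        return a * gate                                  # recalibrated channel responses
```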
Step 3, constructing a quality regression sub-network.
Construct a quality regression sub-network formed by connecting in parallel two down-sampling layer groups with identical structure and parameter settings. Each parallel down-sampling layer group consists of five cascaded linear layers, whose numbers of nodes are set to 2048, 1024, 512, 256, and 64 respectively and whose random node deactivation (dropout) rates are set to 0.5, 0.25, and 0 respectively.
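The two-branch regression head can be sketched as follows, assuming a 2048-d pooled feature vector as input. How the three listed dropout rates (0.5, 0.25, 0) map onto the layers is not fully specified, so the assignment below is an assumption; the two 64-d branch outputs are concatenated into the 128-d vector the prediction layer expects.

```python
import torch
import torch.nn as nn

def down_sampling_group(drop_rates=(0.5, 0.25, 0.0, 0.0)):
    """Five cascaded linear layers with 2048/1024/512/256/64 nodes; the
    dropout-rate assignment is an assumption about the listed values."""
    dims = [2048, 1024, 512, 256, 64]
    layers = []
    for i, p in enumerate(drop_rates):
        layers += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU(inplace=True), nn.Dropout(p)]
    return nn.Sequential(*layers)

class QualityRegression(nn.Module):
    """Two parallel down-sampling layer groups with identical structure;
    their 64-d outputs are concatenated into a 128-d feature."""
    def __init__(self):
        super().__init__()
        self.branch1 = down_sampling_group()
        self.branch2 = down_sampling_group()

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch2(x)], dim=1)
```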
Step 4, generating a no-reference image quality regression network.
Cascade the distortion feature extraction sub-network, the multi-branch feature attention module, the quality regression sub-network, and a prediction layer in sequence into a no-reference image quality regression network. The number of input nodes of the prediction layer is set to 128 and the number of output nodes to 1.
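The overall wiring might look like the sketch below; the backbone, attention, and regressor arguments stand in for the modules of steps 1-3, and the global average pooling between the attention module and the regression head is an assumption made so that the tensor shapes line up.

```python
import torch
import torch.nn as nn

class NRIQANetwork(nn.Module):
    """Cascade: distortion feature extraction sub-network -> multi-branch
    feature attention module -> global average pooling (an assumption, to
    flatten the feature map) -> quality regression sub-network ->
    prediction layer with 128 input nodes and 1 output node."""
    def __init__(self, backbone, attention, regressor):
        super().__init__()
        self.backbone = backbone
        self.attention = attention
        self.regressor = regressor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.predict = nn.Linear(128, 1)

    def forward(self, x):
        f = self.attention(self.backbone(x))    # distortion-aware feature map
        f = self.pool(f).flatten(1)             # B x C feature vector
        return self.predict(self.regressor(f))  # B x 1 quality score
```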
Step 5, generating a training set.
Select at least 1020 and at most 6000 no-reference natural images from a natural image quality evaluation data set to form a sample set, and apply normalization and preprocessing to each image in the sample set in turn.
The normalization is as follows: each image is scaled to the range [0, 1], and the three channels R, G, B are then normalized with means mean = [0.485, 0.456, 0.406] and standard deviations std = [0.229, 0.224, 0.225] respectively.
The preprocessing divides each normalized image into non-overlapping 32×32 image blocks with a sampling stride of 32.
All preprocessed images and their corresponding labels form the training set.
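The normalization and patching of step 5 can be written, for example, with NumPy as follows (RGB images as H×W×3 uint8 arrays; border rows and columns that do not fill a complete 32×32 block are assumed to be dropped):

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize(img):
    """Scale a uint8 RGB image to [0, 1], then normalize each of the
    R, G, B channels with the stated per-channel mean and std."""
    x = img.astype(np.float32) / 255.0
    return (x - MEAN) / STD

def to_patches(img, size=32):
    """Divide an image into non-overlapping size x size blocks with a
    sampling stride equal to size; incomplete border blocks are dropped."""
    h, w = img.shape[:2]
    return [img[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]
```

A 64×96 image, for instance, yields six 32×32 blocks.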
Step 6, training the no-reference image quality regression network.
Set the training parameters, input the training set into the no-reference image quality regression network, and iteratively update the network parameters with stochastic gradient descent until the loss function converges, obtaining the trained no-reference image quality regression network.
The training parameters are set as follows: the small constant eps = 1e-8; the exponential decay rates of the first- and second-order moment estimates β1 = 0.9 and β2 = 0.999; the first- and second-order moment estimates initialized to s = 0 and r = 0 respectively; the initial learning rate lr_ratio = 1e-3; the batch size batch_size = 128; and the weight decay weight_decay = 0.
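Note that the listed hyper-parameters (eps = 1e-8, exponential decay rates 0.9 and 0.999 for the first- and second-order moment estimates s and r) are those of the Adam optimizer, even though the text names stochastic gradient descent; a sketch of the corresponding optimizer setup in PyTorch, with a stand-in model, would be:

```python
import torch
import torch.nn as nn

# Stand-in for the no-reference image quality regression network; the real
# model would be the cascaded network of steps 1-4.
model = nn.Linear(2048, 1)

# eps = 1e-8, betas = (0.9, 0.999); Adam initializes both moment estimates
# to zero (s = r = 0); lr_ratio = 1e-3; weight_decay = 0
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0)
batch_size = 128
```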
The loss function is as follows:

$$ L = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{Q}_i - Q_i\right| $$

where $L(\cdot)$ represents the loss function of the no-reference image quality regression network, $\hat{Q}_i$ represents the label of the i-th image in the training set, $Q_i$ represents the predicted value of the i-th image in the training set output by the no-reference image quality regression network, $N$ represents the total number of images in the training set, $\sum$ represents summation, $i$ represents the index of an image in the training set, and $|\cdot|$ represents the absolute-value operation.
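This is the mean absolute error (L1 loss) between labels and predictions, which in PyTorch could be computed as:

```python
import torch

def l1_quality_loss(pred, target):
    """Mean absolute error between predicted scores Q_i and labels Q_i-hat,
    averaged over the N images (equivalent to torch.nn.L1Loss)."""
    return torch.mean(torch.abs(pred - target))
```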
Step 7, evaluating the quality of the no-reference image to be evaluated.
Normalize and preprocess the no-reference image to be evaluated with the same methods as step 5, input the preprocessed image into the trained no-reference image quality regression network, and output the predicted quality score of the image.
The effect of the present invention is further explained by combining the simulation experiment as follows:
1. simulation experiment conditions are as follows:
the hardware platform of the simulation experiment of the invention is as follows: the processor is Intel (R) Core (TM) i9-7900X @3.30GHz, the main frequency is 3.30GHz, the memory is 32GB, and the display card is NVIDIA GeForce GTX 1080Ti.
The software platform of the simulation experiment of the invention is as follows: ubuntu 16.04.12 operating system, pyTorch-gpu 1.6 open source deep learning framework, python 3.7.
The input images used by the simulation experiment of the invention are natural images and come from image quality evaluation known databases TID2008, TID2013 and KADID-10k.
The TID2008 database includes 25 reference images and 1700 distorted images, in bmp format.
The TID2013 database includes 25 reference images and 3000 distorted images, in bmp format.
The KADID-10k database includes 81 reference images and 10125 distorted images, in png format.
2. Simulation content and analysis of results:
The simulation experiments use the invention and two prior-art methods (CNN, a deep no-reference image quality evaluation method based on a convolutional neural network, and HOSA, a blind image quality evaluation method based on high-order statistics aggregation) to predict no-reference quality for the distorted images in the three well-known image quality evaluation databases TID2008, TID2013, and KADID-10k.
Evaluation indices are obtained by calculating the consistency between the predicted no-reference quality values and the image labels, and these indices are used to measure the quality evaluation effect of the invention and of the two prior-art methods on the no-reference images in the three databases.
The two prior-art methods adopted in the simulation experiments are:
the no-reference image quality evaluation method based on the Convolutional neural network refers to a no-reference image quality evaluation method provided by L.Kang et al in' Convolutional neural networks for no-reference image quality assessment [ C ]// Proceedings ofhe IEEE con-ference on computer vision and pattern registration.2014: 1733-1740 ], which is called a depth N no-reference image quality evaluation method based on the Convolutional neural network for short.
The blind image quality evaluation method based on high-order statistics aggregation refers to the no-reference image quality evaluation method proposed by J. Xu et al. in "Blind image quality assessment based on high order statistics aggregation" (IEEE Transactions on Image Processing, 2016, 25(9): 4444-4457), abbreviated as the HOSA no-reference image quality evaluation method.
The three well-known image quality evaluation databases used in the simulation experiments are:
the TID2008 well-known database refers to the database of n.pontomarenko et al in "TID2008-a database for evaluation of future visual quality assessment metrics [ J ]. Advances of model radio electronics,2009, 10:30-45 ", referred to as TID2013 known database.
The well-known TID2013 database refers to the image quality evaluation database proposed by N. Ponomarenko et al. in "Color image database TID2013: Peculiarities and preliminary results" (European Workshop on Visual Information Processing (EUVIP), 106-111, 2013), abbreviated as the TID2013 database.
The well-known KADID-10k database refers to the image quality evaluation database proposed by Lin H. et al. in "KADID-10k: A large-scale artificially distorted IQA database" (Eleventh International Conference on Quality of Multimedia Experience (QoMEX), 2019), abbreviated as the KADID-10k database.
To judge the no-reference image quality evaluation effect of the invention and of the two prior-art methods, the simulation experiments use two indices, the Spearman rank-order correlation coefficient (SROCC) and the Pearson linear correlation coefficient (PLCC), for an objective comparison.
(1) Spearman Rank Order Correlation Coefficient (SROCC)
The Spearman correlation determines the strength and direction of a monotonic relation between two variables and measures the monotonicity of the algorithm's predictions; its expression is:

$$ SROCC = 1 - \frac{6\sum_{i=1}^{n}(r_{x_i} - r_{y_i})^2}{n(n^2 - 1)} $$

where $r_{x_i}$ represents the rank of the subjective quality evaluation result of the i-th test image, $r_{y_i}$ represents the rank of the objective quality evaluation result, $(r_{x_i} - r_{y_i})^2$ represents the squared difference between the two, computed from the sorted set of rank differences, and $n$ is the number of test images.
(2) Pearson Linear Correlation Coefficient (PLCC)
$x_i$ and $y_i$ represent the subjective quality evaluation score and the objective score of the i-th test image, respectively. The expression is:

$$ PLCC = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} $$

where $n$ is the total number of images, and $\bar{x}$ and $\bar{y}$ are the means of the subjective human evaluation scores for the database and of the scores computed automatically by the objective evaluation algorithm, respectively. The linear correlation coefficient describes the correlation between the algorithm's evaluation values and the human subjective scores, and thus measures the accuracy of the algorithm's predictions.
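Both indices can be computed with a few lines of NumPy; the SROCC form below assumes no tied scores (with ties, the Pearson correlation of the ranks is used instead):

```python
import numpy as np

def srocc(x, y):
    """Spearman rank-order correlation, 1 - 6*sum(d_i^2) / (n*(n^2 - 1)),
    where d_i is the rank difference of the i-th pair (no-ties form)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    rx = np.argsort(np.argsort(x)).astype(float)   # ranks of x
    ry = np.argsort(np.argsort(y)).astype(float)   # ranks of y
    d = rx - ry
    n = len(x)
    return 1.0 - 6.0 * np.sum(d * d) / (n * (n * n - 1))

def plcc(x, y):
    """Pearson linear correlation between objective and subjective scores."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()
    return np.sum(xm * ym) / np.sqrt(np.sum(xm * xm) * np.sum(ym * ym))
```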
The simulation experiments evaluate the images in the three different well-known databases with the invention and the two prior-art methods and calculate the two consistency indices for each method's results; the results are shown in Table 1.
Table 1. Comparison of the evaluation results of the three methods
[Table 1 appears as an image in the original document; it lists the SROCC and PLCC values of the CNN method, the HOSA method, and the invention on the TID2008, TID2013, and KADID-10k databases.]
As can be seen from Table 1, the Spearman rank-order correlation coefficient SROCC and the Pearson linear correlation coefficient PLCC of the invention's results on the three well-known image quality evaluation databases are both higher than those of the two prior-art methods, showing that the invention achieves a better no-reference image quality evaluation effect.

Claims (4)

1. A no-reference image quality evaluation method based on depth feature transfer learning, characterized in that a multi-branch feature attention module is embedded in a distortion feature extraction sub-network, two parallel branches are connected at the tail of the distortion feature extraction sub-network as a quality regression sub-network, and a prediction layer predicts the quality score of the distorted image; the method comprises the following specific steps:
(1) Constructing a distortion feature extraction sub-network:
(1a) Build a five-layer image distortion feature extraction sub-network whose structure is, in order: a convolution layer followed by the 1st, 2nd, 3rd, and 4th convolution computing units; the 1st to 4th convolution computing units adopt bottleneck structures, each formed by cascading three convolution layers;
(1b) Set the number of input channels of the convolution layer to 64, the number of output channels to 128, the convolution kernel size to 7×7, and the stride to 2; the numbers of bottleneck structures in the 1st to 4th convolution computing units are 3, 4, 6, and 3 respectively, and the convolution kernel sizes of the three convolution layers in each bottleneck structure are set to 1×1, 3×3, and 1×1 respectively;
(2) Constructing a multi-branch feature attention module:
Build a multi-branch feature attention module formed by cascading three convolution layers; set the feature-map grouping number (groups) of the 1st convolution layer to 2, set the numbers of input channels of the 1st to 3rd convolution layers to 64, 128, and 128 respectively, set the convolution kernel sizes to 3×3, 1×1, and 1×1 respectively, and set all strides to 1;
(3) Constructing a quality regression subnetwork:
Construct a quality regression sub-network formed by connecting in parallel two down-sampling layer groups with identical structure and parameter settings; each parallel down-sampling layer group consists of five cascaded linear layers, whose numbers of nodes are set to 2048, 1024, 512, 256, and 64 respectively, and whose random node deactivation (dropout) rates are set to 0.5, 0.25, and 0 respectively;
(4) Generating a reference-free image quality regression network:
Cascade the distortion feature extraction sub-network, the multi-branch feature attention module, the quality regression sub-network, and the prediction layer in sequence into a no-reference image quality regression network; the number of input nodes of the prediction layer is set to 128, and the number of output nodes is set to 1;
(5) Generating a training set:
(5a) Select at least 1020 and at most 6000 no-reference natural images from a natural image quality evaluation data set to form a sample set, and perform normalization and preprocessing in sequence on each image in the sample set;
(5b) Form a training set from all the preprocessed images and their corresponding labels;
(6) Training a non-reference image quality regression network:
Set the training parameters, input the training set into the no-reference image quality regression network, and iteratively update the network parameters by stochastic gradient descent until the loss function converges, obtaining a trained no-reference image quality regression network;
(7) And (3) performing quality evaluation on the non-reference image to be evaluated:
Normalize and preprocess the no-reference image to be evaluated by the same method as steps (5a) and (5b), input the preprocessed image into the trained no-reference image quality regression network, and output the predicted quality score of the image.
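As an illustration (not part of the claims), the quality regression sub-network of step (3) and the prediction layer of step (4) could be sketched in PyTorch as below. The input dimensionality of the first linear layer (2048), the ReLU activations, and the exact placement of the 0.5 and 0.25 dropout rates across the five layers are assumptions made for this sketch:

```python
import torch
import torch.nn as nn

def branch():
    # One down-sampling layer group: cascaded linear layers with node counts
    # 2048 -> 1024 -> 512 -> 256 -> 64, dropout rates 0.5, 0.25, then 0 (assumed placement).
    return nn.Sequential(
        nn.Linear(2048, 1024), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(1024, 512), nn.ReLU(), nn.Dropout(0.25),
        nn.Linear(512, 256), nn.ReLU(),
        nn.Linear(256, 64),
    )

class QualityHead(nn.Module):
    """Two parallel identical branches whose 64-d outputs are concatenated
    and fed to a 128 -> 1 prediction layer, matching steps (3) and (4)."""
    def __init__(self):
        super().__init__()
        self.branch_a = branch()
        self.branch_b = branch()
        self.predict = nn.Linear(128, 1)  # step (4): 128 input nodes, 1 output node

    def forward(self, feats):  # feats: (batch, 2048) distortion features
        fused = torch.cat([self.branch_a(feats), self.branch_b(feats)], dim=1)
        return self.predict(fused)  # (batch, 1) predicted quality score
```

The concatenation of the two 64-node branch outputs is what yields the 128 input nodes that the prediction layer expects.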
2. The no-reference image quality evaluation method based on depth feature transfer learning according to claim 1, wherein the normalization in step (5a) is: for the R, G, B channels of each image in the sample set, whose values lie in the range [0, 1], normalize using the means mean = [0.485, 0.456, 0.406] and the standard deviations std = [0.229, 0.224, 0.225] respectively; the preprocessing in step (5a) refers to dividing each normalized image into non-overlapping image blocks of size 32×32 with a sampling stride of 32.
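A minimal sketch (an illustration, not the patented implementation) of the normalization and non-overlapping 32×32 patching of claim 2, assuming the input is a (3, H, W) float tensor with values in [0, 1] and H, W divisible by 32:

```python
import torch

MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def preprocess(img):
    """Normalize a (3, H, W) image per channel, then split it into
    non-overlapping 32x32 blocks with sampling stride 32."""
    x = (img - MEAN) / STD
    # unfold height then width: (3, H//32, W//32, 32, 32)
    patches = x.unfold(1, 32, 32).unfold(2, 32, 32)
    # -> (num_patches, 3, 32, 32)
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, 3, 32, 32)
```

A 96×64 image yields (96 // 32) × (64 // 32) = 6 patches; because the stride equals the block size, no pixels are shared between blocks.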
3. The no-reference image quality evaluation method based on depth feature transfer learning according to claim 1, wherein the loss function in step (6) is as follows:

L = (1/N) Σᵢ₌₁ᴺ |Q̂ᵢ − Qᵢ|

wherein L denotes the loss function of the no-reference image quality regression network, Q̂ᵢ denotes the label of the i-th image in the training set, Qᵢ denotes the predicted value for the i-th image in the training set output by the no-reference image quality regression network, N denotes the total number of images in the training set, Σ denotes the summation operation, i denotes the sequence number of an image in the training set, and |·| denotes the absolute value operation.
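A plain-Python sketch of the absolute-error loss described in claim 3; taking the mean (division by N) rather than the bare sum is an assumption consistent with N being defined as the total number of training images:

```python
def l1_loss(labels, preds):
    """Mean absolute error between labels Q_hat_i and network predictions Q_i."""
    assert len(labels) == len(preds)
    n = len(labels)
    return sum(abs(q_hat - q) for q_hat, q in zip(labels, preds)) / n
```

In a PyTorch training loop this corresponds to `torch.nn.L1Loss()`, whose default reduction is also the mean.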
4. The no-reference image quality evaluation method based on depth feature transfer learning according to claim 1, wherein the training parameters set in step (6) are as follows: the small constant is set to eps = 1e-8; the exponential decay rates of the first- and second-order moment estimates are set to β₁ = 0.9 and β₂ = 0.999; the first- and second-order moment estimates are initialized to s = 0 and r = 0 respectively; the initial learning rate is set to lr_ratio = 1e-3; the batch size is set to batch_size = 128; and the weight decay is set to weight_decay = 0.
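Although step (6) speaks of stochastic gradient descent, the parameters listed in claim 4 (eps, β₁, β₂, and the moment estimates s and r) are the hyperparameters of the Adam optimizer. Under that assumption, they could be configured in PyTorch as follows (the `Linear` model is a stand-in for the full quality regression network):

```python
import torch

model = torch.nn.Linear(128, 1)  # stand-in for the quality regression network
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,             # initial learning rate lr_ratio
    betas=(0.9, 0.999),  # beta_1, beta_2: moment-estimate exponential decay rates
    eps=1e-8,            # small constant eps
    weight_decay=0,      # weight decay
)
```

Adam initializes its first- and second-order moment buffers to zero internally, matching s = 0 and r = 0 in the claim; the batch size of 128 would be set on the data loader, not the optimizer.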
CN202110678186.6A 2021-06-18 2021-06-18 No-reference image quality evaluation method based on depth feature transfer learning Active CN113421237B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110678186.6A CN113421237B (en) 2021-06-18 2021-06-18 No-reference image quality evaluation method based on depth feature transfer learning


Publications (2)

Publication Number Publication Date
CN113421237A CN113421237A (en) 2021-09-21
CN113421237B true CN113421237B (en) 2023-04-18

Family

ID=77789167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110678186.6A Active CN113421237B (en) 2021-06-18 2021-06-18 No-reference image quality evaluation method based on depth feature transfer learning

Country Status (1)

Country Link
CN (1) CN113421237B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113888501B (en) * 2021-09-29 2024-02-06 西安理工大学 Attention positioning network-based reference-free image quality evaluation method
CN114066812B (en) * 2021-10-13 2024-02-06 西安理工大学 No-reference image quality evaluation method based on spatial attention mechanism
CN117456339B (en) * 2023-11-17 2024-05-17 武汉大学 Image quality evaluation method and system based on multi-level feature multiplexing

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516716A (en) * 2019-08-05 2019-11-29 西安电子科技大学 Non-reference picture quality appraisement method based on multiple-limb similarity network
CN112085102A (en) * 2020-09-10 2020-12-15 西安电子科技大学 No-reference video quality evaluation method based on three-dimensional space-time characteristic decomposition

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644415B (en) * 2017-09-08 2019-02-22 众安信息技术服务有限公司 A kind of text image method for evaluating quality and equipment
CN108428227B (en) * 2018-02-27 2020-06-26 浙江科技学院 No-reference image quality evaluation method based on full convolution neural network
CN109308696B (en) * 2018-09-14 2021-09-28 西安电子科技大学 No-reference image quality evaluation method based on hierarchical feature fusion network
CN109242864B (en) * 2018-09-18 2021-09-24 电子科技大学 Image segmentation result quality evaluation method based on multi-branch network
CN110415170B (en) * 2019-06-24 2022-12-16 武汉大学 Image super-resolution method based on multi-scale attention convolution neural network
CN111353533B (en) * 2020-02-26 2022-09-13 南京理工大学 No-reference image quality evaluation method and system based on multi-task learning
CN112419242B (en) * 2020-11-10 2023-09-15 西北大学 No-reference image quality evaluation method based on self-attention mechanism GAN network
CN112634238B (en) * 2020-12-25 2024-03-08 武汉大学 Attention module-based image quality evaluation method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516716A (en) * 2019-08-05 2019-11-29 西安电子科技大学 Non-reference picture quality appraisement method based on multiple-limb similarity network
CN112085102A (en) * 2020-09-10 2020-12-15 西安电子科技大学 No-reference video quality evaluation method based on three-dimensional space-time characteristic decomposition


Similar Documents

Publication Publication Date Title
CN113421237B (en) No-reference image quality evaluation method based on depth feature transfer learning
CN108830157B (en) Human behavior identification method based on attention mechanism and 3D convolutional neural network
CN108765319B (en) Image denoising method based on generation countermeasure network
CN110992275B (en) Refined single image rain removing method based on generation of countermeasure network
CN111784602B (en) Method for generating countermeasure network for image restoration
CN108665460B (en) Image quality evaluation method based on combined neural network and classified neural network
CN110516716B (en) No-reference image quality evaluation method based on multi-branch similarity network
CN111340814A (en) Multi-mode adaptive convolution-based RGB-D image semantic segmentation method
CN111080567A (en) Remote sensing image fusion method and system based on multi-scale dynamic convolution neural network
CN112348191B (en) Knowledge base completion method based on multi-mode representation learning
CN112489164B (en) Image coloring method based on improved depth separable convolutional neural network
CN109859166B (en) Multi-column convolutional neural network-based parameter-free 3D image quality evaluation method
CN112101363A (en) Full convolution semantic segmentation system and method based on cavity residual error and attention mechanism
CN110009700B (en) Convolutional neural network visual depth estimation method based on RGB (red, green and blue) graph and gradient graph
CN115205196A (en) No-reference image quality evaluation method based on twin network and feature fusion
CN115205147A (en) Multi-scale optimization low-illumination image enhancement method based on Transformer
CN115526891B (en) Training method and related device for defect data set generation model
CN115331104A (en) Crop planting information extraction method based on convolutional neural network
Cai et al. Multiscale attentive image de-raining networks via neural architecture search
CN117351542A (en) Facial expression recognition method and system
CN114170657A (en) Facial emotion recognition method integrating attention mechanism and high-order feature representation
CN111639751A (en) Non-zero padding training method for binary convolutional neural network
CN110942106A (en) Pooling convolutional neural network image classification method based on square average
CN115761888A (en) Tower crane operator abnormal behavior detection method based on NL-C3D model
CN113486929B (en) Rock slice image identification method based on residual shrinkage module and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant