CN115880234A - No-reference color image quality evaluation method based on color and structure distortion - Google Patents


Info

Publication number
CN115880234A
CN115880234A (application CN202211510202.1A)
Authority
CN
China
Prior art keywords
image
color
quality evaluation
image block
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211510202.1A
Other languages
Chinese (zh)
Inventor
贾惠珍
胡其层
王同罕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Institute of Technology
Original Assignee
East China Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by East China Institute of Technology filed Critical East China Institute of Technology
Priority to CN202211510202.1A priority Critical patent/CN115880234A/en
Publication of CN115880234A publication Critical patent/CN115880234A/en
Pending legal-status Critical Current

Abstract

The invention is suitable for the field of image processing and provides a color and structure distortion-based quality evaluation method for no-reference color images, which comprises the following steps: performing a blocking operation on the color image to form a color-distorted image block data set, which is divided into a training set and a test set; constructing a group of two-dimensional Scharr detection operators and performing convolution processing on the input image blocks with them to obtain gradient feature information of the distorted image blocks; converting the obtained image block data set into the HSV color space; constructing a color and structure distortion-based no-reference image quality evaluation convolutional neural network model from the existing image block data set; and inputting the divided training set into the constructed convolutional neural network model to obtain a trained no-reference image quality evaluation model. The invention is reasonably designed; the predicted quality score correlates strongly with the true quality score, with small error, and the method is worth popularizing.

Description

No-reference color image quality evaluation method based on color and structure distortion
Technical Field
The invention belongs to the field of image processing, and particularly relates to a quality evaluation method of a non-reference color image based on color and structural distortion.
Background
Currently, commonly used objective image quality evaluation methods fall into three categories: full-reference quality evaluation (FR-IQA), reduced-reference quality evaluation (RR-IQA) and no-reference quality evaluation (NR-IQA). The first two must use the original undistorted image as a reference, but in many practical applications the original reference image is difficult or impossible to obtain, so it is very important to establish a stable and general no-reference image quality evaluation (NR-IQA) method that conforms to the human visual system.
With the rapid development of information technology, high-resolution color images have become the mainstream of transmission. Most users beautify their photos before publishing or sharing them. However, inexperienced users may introduce unknown color and structural distortions while processing images. In recent years, many scholars have proposed a large number of no-reference quality evaluation methods for color images, but current image quality evaluation methods mainly target blurred images, JPEG-compressed images, noisy images and other distortion types, while the quality evaluation of color-distorted images is less studied. Most existing objective natural image quality evaluation methods target grayscale images, such as SSIM (Structural Similarity) and GMSD (Gradient Magnitude Similarity Deviation), and work on color image quality evaluation is relatively scarce.
In the color image quality evaluation process, two approaches are generally employed. The first converts the color image into a grayscale image and then evaluates the grayscale image with conventional methods; this is the most commonly used approach in practical image processing, but such methods only consider structural distortion of the image and do not fully account for the influence of color distortion on image quality. The second approach performs quality evaluation using the color information in the RGB color space; for example, "A feature-enriched completely blind image quality evaluator" separates and quantifies the three RGB channels of a color image to perceive color differences, which works well but still leaves room for improvement.
These algorithms still suffer from the following drawbacks: 1) most algorithms are based on errors between pixel points and do not consider the visual characteristics of the human eye, so in many cases the experimental results are inconsistent with subjective human perception, and such methods are applicable only to grayscale images rather than color images; 2) most quality assessment algorithms for color images work on the RGB channels, but the RGB color space does not conform to human visual perception of color, which leads to deviations in the assessed visual quality of the image.
At present, most image quality evaluation methods do not specifically study color images and mainly target common distortion types such as noise, compression and blurring. A large amount of color information is lost during grayscale conversion, so when these methods evaluate the quality of color-distorted images, the results differ greatly from the subjective quality scores.
Therefore, in view of the above situation, there is an urgent need to develop a quality evaluation method for color and structure distortion-based color images without reference, so as to overcome the shortcomings in the current practical application.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a method for evaluating quality of a color image without reference based on color and structural distortion, which aims to solve the above-mentioned problems in the background.
The embodiment of the invention is realized in such a way that a no-reference color image quality evaluation method based on color and structure distortion comprises the following steps:
step 1: partitioning the color image, cutting the color image into a group of image blocks with the same size to form a color distorted image block data set, and forming a training set and a test set;
step 2: constructing a group of two-dimensional Scharr detection operators, performing convolution processing on input image blocks by using the two-dimensional Scharr detection operators to obtain gradient characteristic information of distorted image blocks and generating gradient image blocks;
step 3: converting the image block data set obtained in step 1 into an HSV color space, and separating an H channel and an S channel in the HSV color space;
step 4: constructing a non-reference image quality evaluation convolutional neural network model based on color and structural distortion according to an existing image block data set;
step 5: inputting the training sets divided in step 1, step 2 and step 3 into the non-reference image quality evaluation convolutional neural network model constructed in step 4 to obtain a trained non-reference image quality evaluation model; and then calculating the quality prediction values of all sub-blocks segmented from the images in the test set by the non-reference image quality evaluation model.
In step 1, the color image is divided into image blocks of 32 × 32 size in a non-overlapping manner; 80% of the color-distorted image block data set is randomly selected as the training set, and the remaining 20% serves as the test set.
In a further technical scheme, the step 2 specifically comprises the following steps:
step 2.1: inputting a distorted image block I (x);
step 2.2: establishing a group of two-dimensional Scharr detection operators:
S_P = [ -3   0   3 ]      S_Q = [ -3  -10  -3 ]
      [ -10  0  10 ]            [  0    0   0 ]
      [ -3   0   3 ]            [  3   10   3 ]
step 2.3: performing convolution calculation on the distorted image block I (x) input in the step 2.1 by using the two-dimensional Scharr detection operator established in the step 2.2, so as to obtain gradient information of the distorted image block I (x) in a two-dimensional space; for image block I (x), its formula for computation by Scharr convolution is:
G_P(x) = S_P ⊗ I(x),  G_Q(x) = S_Q ⊗ I(x);
wherein G_P(x) and G_Q(x) represent the gradients of the image along the horizontal and vertical directions respectively, and ⊗ represents the convolution operation; the overall gradient calculation formula for image block I(x) is:
G(x) = sqrt( G_P(x)^2 + G_Q(x)^2 ).
in a further technical scheme, the step 3 specifically comprises the following steps:
step 3.1: converting image block I (x) from RGB color space to HSV color space, the formula is as follows:
H = 0,                                    if max = min;
H = 60° × (G − B)/(max − min) mod 360°,   if max = R;
H = 60° × (B − R)/(max − min) + 120°,     if max = G;
H = 60° × (R − G)/(max − min) + 240°,     if max = B;
S = 0 if max = 0, otherwise S = (max − min)/max;
V = max;
wherein R, G and B are respectively the red, green and blue component values of the image point, and max and min are the maximum and minimum of the three values R, G and B.
According to a further technical scheme, the step 4 specifically comprises the following steps:
step 4.1: the network model consists of a sub-network M and a sub-network N; the sub-network M consists of three convolutional layers and three maximum pooling layers; the sub-network N consists of five convolutional layers and five maximum pooling layers;
the maximum pooling layer can retain more texture information, and the calculation formula of the maximum pooling layer is as follows:
y_kij = max over (p, q) ∈ R_ij of x_kpq;
wherein y_kij denotes the max-pooled output value of the k-th feature map over the rectangular region R_ij, and x_kpq represents the element at position (p, q) within R_ij;
step 4.2: the no-reference image quality evaluation network model adopts an L1 loss function, and the calculation formula of the L1 loss function of the network model is as follows:
L = (1/N) Σ_{i=1}^{N} |p_i − q_i|;
where N denotes the number of images in a batch, p_i represents the prediction score of the i-th image block, and q_i represents the true label score of the i-th image block;
step 4.3: the no-reference image quality evaluation network model is optimized with the Adam optimizer, which adaptively adjusts parameters such as the learning rate and accelerates the convergence of the model.
In a further technical scheme, the step 5 specifically comprises the following steps:
step 5.1: taking the original distortion image blocks, the gradient image blocks, the H channel image blocks and the S channel image blocks in the training set obtained in the step 1, the step 2 and the step 3 as the input of a subnetwork M;
step 5.2: obtaining four feature vectors after the four image block data pass through a first convolution layer and a pooling layer of a sub-network M, and fusing the four feature vectors to enhance the extraction of global features through a sub-network N; the formula for fusing the four feature vectors is as follows:
X = concat(X_i, X_g, X_h, X_s);
wherein X_i, X_g, X_h and X_s respectively represent the feature vectors obtained after the original image block, the gradient image block, the H-channel image block and the S-channel image block pass through the first convolution and pooling layer; X represents the fused feature, and concat is the feature fusion (concatenation) operation;
step 5.3: inputting the X obtained in step 5.2 into the sub-network M, and reducing the dimensionality of the extracted global features through a fully-connected neural network to obtain a feature vector Y. In this process, the fully-connected layer computes via matrix multiplication, converting the high-dimensional input features into low-dimensional sample labels, retaining the useful information and discarding the spatial relationships among the features; the output of the l-th layer is calculated as:
a^l = σ(W^l · a^{l−1} + b^l);
wherein a^{l−1} represents the output of layer l−1, W^l represents the weight parameter of the l-th layer, and b^l represents the bias of the l-th layer;
step 5.4: fusing the feature vector Y subjected to dimensionality reduction with the structure distortion high-dimensional features and the color distortion high-dimensional features, wherein the fused features pass through a linear regression layer;
step 5.5: in the linear regression layer, the network is subjected to multiple iterations, a small batch of image block training samples are read in each iteration, and a group of predictions are obtained through a network model; after calculating the loss, the network starts to perform backward propagation and stores the gradient of each parameter; meanwhile, the network calls an optimization algorithm Adam to update the model parameters;
step 5.6: after the steps, obtaining a trained reference-free image quality evaluation network model;
step 5.7: inputting the test set obtained in step 1, step 2 and step 3 into the color and structure distortion-based no-reference image quality evaluation model; the model then calculates the quality prediction values of all sub-blocks segmented from each image in the test set, and all the scores are averaged with equal weight to obtain the predicted quality score Q of the image, with the calculation formula:
Q = (1/N) Σ_{i=1}^{N} p_i;
wherein N represents the number of sub-blocks into which the test image is divided, and p_i represents the quality prediction value of the i-th image block.
The non-reference color image quality evaluation method based on color and structure distortion provided by the embodiment of the invention has the following beneficial effects:
1) The method is an end-to-end optimized network model algorithm oriented to color images. Compared with a divide-and-conquer strategy, the end-to-end network needs no human intervention during the whole learning process, overcoming the inaccuracy and incompleteness caused by manual feature extraction in traditional machine learning algorithms. Meanwhile, end-to-end learning has the advantage of synergy and is more likely to reach a globally optimal solution.
2) The method is a no-reference image quality evaluation algorithm, and can evaluate the quality of a distorted image under the condition of no reference image (original image) by using a trained convolutional neural network framework.
3) The Scharr operator introduced in the method amplifies the weight coefficients in the filter to increase the differences between pixel values, and is therefore effective for extracting weak edges in the image.
4) The method fully considers the influence of non-structural distortion, such as color distortion, on the image quality, and introduces HSV color space consistent with the human visual system; wherein the luminance component (V) is independent of the image color information, and the hue (H) and saturation (S) approach the way the human eye perceives colors. Therefore, the invention separates the hue (H) and saturation (S) components in the HSV color space in the image for quality evaluation, and obtains the effect consistent with the visual system of human eyes.
5) The method fully considers the influence of high- and low-level features on the image quality evaluation effect: the designed network model extracts low-dimensional and high-dimensional features of structural distortion and color distortion, fuses them, and inputs them into a fully-connected neural network for quality regression prediction. The results show that the predicted quality score correlates strongly with the true quality score, with small error.
Drawings
Fig. 1 is a flowchart of a method for evaluating quality of a color image without reference based on color and structural distortion according to an embodiment of the present invention;
fig. 2 is a schematic network structure diagram of a color and structure distortion-based non-reference color image quality evaluation method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Specific implementations of the present invention are described in detail below with reference to specific embodiments.
As shown in fig. 1-2, a method for evaluating quality of a color image without reference based on color and structural distortion according to an embodiment of the present invention includes the following steps:
step 1: partitioning the color image, cutting the color image into a group of image blocks with the same size to form a color distorted image block data set, and forming a training set and a test set;
step 2: constructing a group of two-dimensional Scharr (edge extraction) detection operators, and performing convolution processing on the input image blocks by using the two-dimensional Scharr detection operators to obtain gradient characteristic information of distorted image blocks and generate gradient image blocks;
step 3: converting the (RGB) image block data set obtained in step 1 into the HSV (Hue, Saturation, Value) color space, and separating the H (Hue) channel and the S (Saturation) channel in the HSV color space;
step 4: constructing a color and structure distortion-based non-reference image quality evaluation convolutional neural network model according to an existing image block data set;
step 5: inputting the training sets divided in step 1, step 2 and step 3 into the non-reference image quality evaluation convolutional neural network model constructed in step 4 to obtain a trained non-reference image quality evaluation model; and then calculating the quality prediction values of all sub-blocks segmented from the images in the test set by the non-reference image quality evaluation model.
As a preferred embodiment of the present invention, in step 1, the color image is divided into image blocks of 32 × 32 size in a non-overlapping manner; 80% of the color-distorted image block data set is randomly selected as the training set, and the remaining 20% serves as the test set.
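The blocking and splitting of step 1 can be sketched in Python as follows; the function names and the toy image size are illustrative assumptions, not part of the patent.

```python
import numpy as np

def extract_patches(image, size=32):
    """Cut an H x W x 3 color image into non-overlapping size x size blocks."""
    rows, cols = image.shape[0] // size, image.shape[1] // size
    patches = [image[i * size:(i + 1) * size, j * size:(j + 1) * size]
               for i in range(rows) for j in range(cols)]
    return np.stack(patches)

def split_dataset(patches, train_frac=0.8, seed=0):
    """Randomly assign train_frac of the patches to training, the rest to testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(patches))
    cut = int(train_frac * len(patches))
    return patches[idx[:cut]], patches[idx[cut:]]

image = np.random.rand(96, 128, 3)   # toy color image; 96 and 128 divide by 32
patches = extract_patches(image)     # 3 x 4 = 12 blocks of 32 x 32 x 3
train, test = split_dataset(patches)
```

With 12 patches, the 80/20 split yields 9 training and 3 test blocks.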
As a preferred embodiment of the present invention, in step 2, since gradient information is more sensitive to image blur and compression, multi-feature fusion improves the performance of IQA (image quality assessment). Under normal circumstances, the gradient image represents the detail of the input. However, given a distorted image, the distortion of the high-frequency components typically contains the most important features for identifying image degradation. Thus, using both the image and its gradient as inputs is more likely to describe the image quality well.
Wherein, the step 2 specifically comprises the following steps:
step 2.1: inputting a distorted image block I (x);
step 2.2: establishing a group of two-dimensional Scharr detection operators:
S_P = [ -3   0   3 ]      S_Q = [ -3  -10  -3 ]
      [ -10  0  10 ]            [  0    0   0 ]
      [ -3   0   3 ]            [  3   10   3 ]
step 2.3: performing convolution calculation on the distorted image block I (x) input in the step 2.1 by using the two-dimensional Scharr detection operator established in the step 2.2, so as to obtain gradient information of the distorted image block I (x) in a two-dimensional space; for image block I (x), its calculation formula by Scharr convolution is:
G_P(x) = S_P ⊗ I(x),  G_Q(x) = S_Q ⊗ I(x);
wherein G_P(x) and G_Q(x) represent the gradients of the image along the horizontal and vertical directions respectively, and ⊗ represents the convolution operation; the overall gradient calculation formula for image block I(x) is:
G(x) = sqrt( G_P(x)^2 + G_Q(x)^2 ).
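Step 2 can be sketched in numpy with the standard 3 × 3 Scharr kernels; the hand-rolled `conv2d_valid` helper is an illustrative stand-in for the patent's convolution step and only computes the "valid" interior region.

```python
import numpy as np

SCHARR_P = np.array([[-3, 0, 3],
                     [-10, 0, 10],
                     [-3, 0, 3]], dtype=float)  # horizontal-gradient kernel
SCHARR_Q = SCHARR_P.T                           # vertical-gradient kernel

def conv2d_valid(img, kernel):
    """Plain 2-D convolution (kernel flipped), 'valid' output region."""
    kf = kernel[::-1, ::-1]
    h, w = img.shape[0] - 2, img.shape[1] - 2
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * kf)
    return out

def scharr_gradient(block):
    """Overall gradient magnitude G(x) = sqrt(G_P(x)^2 + G_Q(x)^2)."""
    g_p = conv2d_valid(block, SCHARR_P)
    g_q = conv2d_valid(block, SCHARR_Q)
    return np.sqrt(g_p ** 2 + g_q ** 2)

ramp = np.tile(np.arange(32, dtype=float), (32, 1))  # horizontal intensity ramp
grad = scharr_gradient(ramp)
```

On a unit-slope horizontal ramp, the Scharr response is the constant kernel weight sum 2 × (3 + 10 + 3) = 32 everywhere in the valid region.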
as a preferred embodiment of the invention, in step 3, the color and intensity of the pixel are the original and primary information reflecting the image quality. Most of the methods achieve good performance by using them. Therefore, it is a natural idea to solve the IQA problem by using color images, and the present invention extracts H channels and S channels of HSV channels most relevant to the human visual system to obtain more color features, and extracts a variety of features from the H channel image and S channel image through a convolutional neural network.
Wherein, the step 3 specifically comprises the following steps:
step 3.1: converting image block I (x) from RGB color space to HSV color space, the formula is as follows:
H = 0,                                    if max = min;
H = 60° × (G − B)/(max − min) mod 360°,   if max = R;
H = 60° × (B − R)/(max − min) + 120°,     if max = G;
H = 60° × (R − G)/(max − min) + 240°,     if max = B;
S = 0 if max = 0, otherwise S = (max − min)/max;
V = max;
wherein R, G and B are respectively the red, green and blue component values of the image point, and max and min are the maximum and minimum of the three values R, G and B.
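The step 3.1 conversion can be sketched per pixel as below; the piecewise H formula follows the common textbook RGB-to-HSV definition, which the patent's (image-only) equations are assumed to match.

```python
def rgb_to_hsv(r, g, b):
    """Convert one RGB pixel with components in [0, 1] to (H in degrees, S, V)."""
    mx, mn = max(r, g, b), min(r, g, b)
    v = mx                                   # V = max
    s = 0.0 if mx == 0 else (mx - mn) / mx   # S = (max - min) / max
    if mx == mn:                             # achromatic: hue undefined, use 0
        h = 0.0
    elif mx == r:
        h = (60.0 * (g - b) / (mx - mn)) % 360.0
    elif mx == g:
        h = 60.0 * (b - r) / (mx - mn) + 120.0
    else:
        h = 60.0 * (r - g) / (mx - mn) + 240.0
    return h, s, v
```

Pure red, green and blue map to hues 0°, 120° and 240°, and any gray pixel has zero saturation.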
As a preferred embodiment of the present invention, in step 4, a network model based on HSV convolutional neural network is designed, and the model is composed of a plurality of single-channel CNN (convolutional neural network) models. The training results for each CNN may be considered a depth feature extractor. The multi-CNN model can extract quality characteristic representation of the multi-color component map, and more comprehensively describe image quality distortion characteristics, particularly image color information. The characteristics extracted from a plurality of network models are fused in a collaborative mode, and the performance of the image quality predictor is further improved through the idea of ensemble learning.
Wherein, step 4 specifically comprises the following steps:
step 4.1: the network model consists of a sub-network M and a sub-network N; the sub-network M consists of three convolutional layers and three maximum pooling layers; the sub-network N consists of five convolutional layers and five maximum pooling layers;
the maximum pooling layer can retain more texture information, and the calculation formula of the maximum pooling layer is as follows:
y_kij = max over (p, q) ∈ R_ij of x_kpq;
wherein y_kij denotes the max-pooled output value of the k-th feature map over the rectangular region R_ij, and x_kpq represents the element at position (p, q) within R_ij.
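The max-pooling operation of step 4.1 can be sketched in numpy as follows; the window size 2 and the toy feature map are illustrative assumptions.

```python
import numpy as np

def max_pool(fmap, k=2):
    """Max pooling with k x k windows and stride k over one 2-D feature map."""
    h, w = fmap.shape[0] // k, fmap.shape[1] // k
    return fmap[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

fmap = np.array([[1, 2, 5, 6],
                 [3, 4, 7, 8],
                 [9, 1, 2, 3],
                 [1, 1, 4, 0]], dtype=float)
pooled = max_pool(fmap)  # each output value is the max of its 2 x 2 region
```

The reshape groups each k × k window along two axes, so a single `max` reduction implements the formula without explicit loops.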
Step 4.2: the no-reference image quality evaluation network model adopts an L1 loss function, and the calculation formula of the L1 loss function of the network model is as follows:
L = (1/N) Σ_{i=1}^{N} |p_i − q_i|;
where N denotes the number of images in a batch, p_i represents the prediction score of the i-th image block, and q_i represents the true label score of the i-th image block.
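The batch L1 loss of step 4.2 reduces to a mean absolute error over the patch scores; a minimal sketch:

```python
import numpy as np

def l1_loss(pred, target):
    """L = (1/N) * sum_i |p_i - q_i| over a batch of N patch scores."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.abs(pred - target).mean())
```

For example, predictions [1, 2, 3] against labels [1, 3, 5] give absolute errors (0, 1, 2) and a loss of 1.0.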
Step 4.3: the model is optimized by adopting an Adam optimizer in the non-reference image quality evaluation network model, parameters such as learning rate and the like are adjusted in a self-adaptive mode, and the convergence speed of the model is accelerated.
As a preferred embodiment of the present invention, step 5 specifically includes the following steps:
step 5.1: and (3) taking the original distortion image blocks, the gradient image blocks, the H-channel image blocks and the S-channel image blocks in the training set obtained in the steps 1, 2 and 3 as the input of the subnetwork M.
Step 5.2: obtaining four feature vectors after the four image block data pass through a first convolution layer and a pooling layer of a sub-network M, and fusing the four feature vectors to enhance the extraction of global features through a sub-network N; wherein, the formula of the four feature vector fusion is as follows:
X = concat(X_i, X_g, X_h, X_s);
wherein X_i, X_g, X_h and X_s respectively represent the feature vectors obtained after the original image block, the gradient image block, the H-channel image block and the S-channel image block pass through the first convolution and pooling layer; X represents the fused feature, and concat is the feature fusion (concatenation) operation.
Step 5.3: inputting the X obtained in the step 5.2 into a sub-network M, performing dimensionality reduction on the extracted global features through a fully-connected neural network again to obtain feature vectors Y, in the process, calculating by using matrix multiplication in a fully-connected layer, converting input high-dimensional feature data into low-dimensional sample marks, retaining useful information in the sample marks, and eliminating the spatial relationship among the features, wherein a calculation formula output by a layer I is as follows:
a l =σ(W l a l-1 +b l );
wherein, a l-1 Represents output data of layer l-1, W l Weight parameter representing the l-th layer, b l Indicating the offset of the l-th layer.
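The fully-connected layer of step 5.3 can be sketched as follows; ReLU is an assumed choice for the activation σ (the patent does not name one), and the 2 × 2 weights are illustrative.

```python
import numpy as np

def dense_layer(a_prev, W, b, sigma=lambda z: np.maximum(z, 0.0)):
    """One fully connected layer: a_l = sigma(W_l @ a_{l-1} + b_l)."""
    return sigma(W @ a_prev + b)

W = np.array([[1.0, -1.0],
              [0.5, 0.5]])       # illustrative weight matrix W_l
b = np.array([0.0, -1.0])        # illustrative bias b_l
a = dense_layer(np.array([2.0, 1.0]), W, b)
```

Here W @ [2, 1] = [1.0, 1.5]; adding the bias gives [1.0, 0.5], which ReLU leaves unchanged.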
Step 5.4: and fusing the feature vector Y subjected to dimensionality reduction with the structure distortion high-dimensional features and the color distortion high-dimensional features, wherein the fused features pass through a linear regression layer.
Step 5.5: in the linear regression layer, the network is subjected to multiple iterations, a small batch of image block training samples are read in each iteration, and a group of predictions are obtained through a network model; after calculating the loss, the network starts to perform backward propagation and stores the gradient of each parameter; at the same time, the network invokes the optimization algorithm Adam to update the model parameters.
Step 5.6: after the steps, the trained reference-free image quality evaluation network model is obtained.
Step 5.7: inputting the test set obtained in the steps 1, 2 and 3 into a no-reference image quality evaluation model based on color and structural distortion, then calculating the quality prediction values of all sub-blocks segmented by the image in the test set by the no-reference image quality evaluation model, and performing equivalent averaging on all the scores to obtain the predicted quality score Q of the image, wherein the calculation formula is as follows:
Figure BDA0003970484020000111
wherein, N represents the number of sub-blocks after the test image is divided, p i Indicating the quality prediction value of the ith image block.
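The image-level score of step 5.7 is the equal-weight mean of the per-block predictions; a minimal sketch with illustrative scores:

```python
import numpy as np

def image_quality_score(block_scores):
    """Q = (1/N) * sum_i p_i over the N sub-block predictions of one image."""
    return float(np.mean(block_scores))

q = image_quality_score([0.2, 0.4, 0.6])  # predicted quality of one test image
```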
In addition, fig. 1 shows a workflow for color image quality evaluation using the multi-depth CNN. First, a color image is subjected to a blocking operation, and divided into a set of image block data. Second, a generic CNN structure is employed and improved to learn useful feature representations from the image block dataset. Each image block is an input to a single CNN, thus forming a multiple CNN model. Thirdly, extracting hue (H) and saturation (S) of the HSV color model. The multiple output feature vectors are then fused into depth features. And finally, constructing a nonlinear regression model, and mapping the extracted depth features to the visual quality scores.
The embodiment of the invention provides a quality evaluation method of a non-reference color image based on color and structural distortion, which has the following beneficial effects:
1) The method is an end-to-end optimized network model algorithm oriented to color images. Compared with a divide-and-conquer strategy, the end-to-end network needs no human intervention during the whole learning process, overcoming the inaccuracy and incompleteness caused by manual feature extraction in traditional machine learning algorithms. Meanwhile, end-to-end learning has the advantage of synergy and is more likely to reach a globally optimal solution.
2) The method is a no-reference image quality evaluation algorithm, and can evaluate the quality of a distorted image under the condition of no reference image (original image) by using a trained convolutional neural network framework.
3) In terms of structural distortion, effectively extracting weak edges requires increasing the differences between pixel values, so a Scharr operator is introduced; it amplifies the weight coefficients in the filter to increase the differences between pixel values and is therefore effective for extracting weak edges in the image.
4) Under the condition that most non-reference quality evaluation methods only consider structural distortion, the method fully considers the influence of non-structural distortion, such as color distortion, on the image quality, and introduces HSV color space consistent with the human visual system; wherein the luminance component (V) is independent of the image color information, and the hue (H) and saturation (S) are close to the way colors are perceived by the human eye. Therefore, the invention separates the hue (H) and saturation (S) components in the HSV color space in the image for quality evaluation, and obtains the effect consistent with the visual system of human eyes.
5) The method fully considers the influence of high- and low-level features on the image quality evaluation effect: the designed network model extracts low-dimensional and high-dimensional features of structural distortion and color distortion, fuses them, and inputs them into a fully-connected neural network for quality regression prediction. The results show that the predicted quality score correlates strongly with the true quality score, with small error.
All possible combinations of the technical features of the above embodiments may not be described for the sake of brevity, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is specific and detailed, but not to be understood as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (6)

1. A no-reference color image quality evaluation method based on color and structure distortion is characterized by comprising the following steps:
step 1: partitioning the color image, cutting the color image into a group of image blocks with the same size to form a color distorted image block data set, and forming a training set and a test set;
step 2: constructing a group of two-dimensional Scharr detection operators, and performing convolution processing on the input image blocks with these operators to obtain the gradient characteristic information of the distorted image blocks and generate gradient image blocks;
step 3: converting the image block data set obtained in step 1 into the HSV color space, and separating the H channel and the S channel of the HSV color space;
step 4: constructing a no-reference image quality evaluation convolutional neural network model based on color and structural distortion from the existing image block data set;
step 5: inputting the training sets divided in steps 1, 2 and 3 into the no-reference image quality evaluation convolutional neural network model constructed in step 4 to obtain a trained no-reference image quality evaluation model; the model then calculates the quality prediction values of all sub-blocks into which each image in the test set is divided.
2. The color and structure distortion-based non-reference color image quality evaluation method according to claim 1, wherein in step 1, the color image is divided into 32 x 32 sized image blocks in a non-overlapping partitioning manner;
80% of the color distorted image block data set is randomly selected as the training set, and the remaining 20% serves as the test set.
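As an illustrative sketch only (not the patented implementation), the non-overlapping 32 x 32 blocking and random 80/20 split of claim 2 could be expressed as follows; the function names, the discarding of edge pixels that do not fill a full block, and the fixed random seed are all assumptions:

```python
import numpy as np

def split_into_patches(image, patch_size=32):
    """Cut an H x W x 3 color image into non-overlapping
    patch_size x patch_size blocks (incomplete edge blocks are dropped)."""
    h, w = image.shape[:2]
    patches = [
        image[r:r + patch_size, c:c + patch_size]
        for r in range(0, h - patch_size + 1, patch_size)
        for c in range(0, w - patch_size + 1, patch_size)
    ]
    return np.stack(patches)

def train_test_split(patches, train_ratio=0.8, seed=0):
    """Randomly assign 80% of the patches to training, 20% to test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(patches))
    cut = int(len(patches) * train_ratio)
    return patches[idx[:cut]], patches[idx[cut:]]
```

A 64 x 96 image, for instance, yields six 32 x 32 patches, of which four would go to the training set and two to the test set under this split.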
3. The color and structure distortion-based non-reference color image quality evaluation method according to claim 2, wherein the step 2 specifically comprises the following steps:
step 2.1: inputting a distorted image block I (x);
step 2.2: establishing a group of two-dimensional Scharr detection operators:
$$P=\begin{bmatrix}-3 & 0 & 3\\ -10 & 0 & 10\\ -3 & 0 & 3\end{bmatrix},\qquad Q=\begin{bmatrix}-3 & -10 & -3\\ 0 & 0 & 0\\ 3 & 10 & 3\end{bmatrix}$$
step 2.3: performing convolution calculation on the distorted image block I (x) input in the step 2.1 by using the two-dimensional Scharr detection operator established in the step 2.2, so as to obtain gradient information of the distorted image block I (x) in a two-dimensional space; for image block I (x), its calculation formula by Scharr convolution is:
$$G_P(x)=P\otimes I(x),\qquad G_Q(x)=Q\otimes I(x)$$

wherein $G_P(x)$ and $G_Q(x)$ represent the gradients of the image along the horizontal and vertical directions respectively, and $\otimes$ represents the convolution operation; the overall gradient calculation formula for image block I(x) is:

$$G(x)=\sqrt{G_P(x)^2+G_Q(x)^2}$$
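The Scharr gradient extraction of step 2 can be sketched as below. This is an illustrative re-implementation, not the patented code: the standard 3 x 3 Scharr kernels, the pure-Python `conv2d_valid` loop, and the "valid" boundary handling are all assumptions.

```python
import numpy as np

# Standard 3x3 Scharr kernels: P responds to horizontal gradients,
# Q (its transpose) to vertical gradients.
SCHARR_P = np.array([[-3, 0, 3],
                     [-10, 0, 10],
                     [-3, 0, 3]], dtype=float)
SCHARR_Q = SCHARR_P.T

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D convolution (kernel flipped, as the formula implies)."""
    k = np.flipud(np.fliplr(kernel))
    kh, kw = k.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

def scharr_gradient(block):
    """Overall gradient magnitude G(x) = sqrt(G_P(x)^2 + G_Q(x)^2)."""
    g_p = conv2d_valid(block, SCHARR_P)
    g_q = conv2d_valid(block, SCHARR_Q)
    return np.sqrt(g_p ** 2 + g_q ** 2)
```

On a horizontal intensity ramp the vertical response vanishes and the magnitude is constant, while a flat block yields zero gradient everywhere, which matches the expected behavior of a first-derivative operator.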
4. The color and structure distortion-based non-reference color image quality evaluation method according to claim 3, wherein the step 3 specifically comprises the following steps:
step 3.1: converting image block I(x) from the RGB color space to the HSV color space; the formulas are as follows:
$$H=\begin{cases}0^\circ, & \max=\min\\ \left(60^\circ\times\dfrac{G-B}{\max-\min}\right)\bmod 360^\circ, & \max=R\\ 60^\circ\times\dfrac{B-R}{\max-\min}+120^\circ, & \max=G\\ 60^\circ\times\dfrac{R-G}{\max-\min}+240^\circ, & \max=B\end{cases}$$

$$S=\begin{cases}0, & \max=0\\ \dfrac{\max-\min}{\max}, & \text{otherwise}\end{cases}$$

$$V=\max$$

wherein R, G and B are the RGB component values of the image point, and max and min are the maximum and minimum of the three values R, G, B.
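For a single pixel, the conversion of step 3.1 might look like the following sketch, assuming the standard hexcone formulas with H in degrees and RGB components normalized to [0, 1]; the function name is ours, not from the patent:

```python
def rgb_to_hsv(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to HSV.
    H is in degrees [0, 360); S and V are in [0, 1]."""
    mx, mn = max(r, g, b), min(r, g, b)
    v = mx
    s = 0.0 if mx == 0 else (mx - mn) / mx
    if mx == mn:          # achromatic: hue undefined, set to 0
        h = 0.0
    elif mx == r:
        h = (60.0 * (g - b) / (mx - mn)) % 360.0
    elif mx == g:
        h = 60.0 * (b - r) / (mx - mn) + 120.0
    else:                 # mx == b
        h = 60.0 * (r - g) / (mx - mn) + 240.0
    return h, s, v
```

Pure red maps to H = 0, pure green to H = 120, and the result agrees (up to the degree scaling) with Python's standard-library `colorsys.rgb_to_hsv`.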
5. The color and structure distortion-based non-reference color image quality evaluation method according to claim 4, wherein the step 4 specifically comprises the following steps:
step 4.1: the network model consists of a sub-network M and a sub-network N; the sub-network M consists of three convolutional layers and three maximum pooling layers; the sub-network N consists of five convolutional layers and five maximum pooling layers;
the maximum pooling layer can retain more texture information, and the calculation formula of the maximum pooling layer is as follows:
$$y_{kij}=\max_{(p,q)\in R_{ij}} x_{kpq}$$

wherein $y_{kij}$ denotes the max-pooled output value of the k-th feature map over the rectangular region $R_{ij}$, and $x_{kpq}$ denotes the element at position (p, q) within $R_{ij}$;
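A minimal numpy sketch of this max-pooling formula is given below; the 2 x 2 window with non-overlapping stride is an assumption (the patent does not state the pooling size), and incomplete border regions are simply dropped:

```python
import numpy as np

def max_pool2d(fmap, pool=2):
    """Max pooling on a single feature map:
    y_ij = max over the pool x pool rectangular region R_ij."""
    h, w = fmap.shape
    h2, w2 = h // pool, w // pool
    return (fmap[:h2 * pool, :w2 * pool]
            .reshape(h2, pool, w2, pool)
            .max(axis=(1, 3)))
```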
step 4.2: the no-reference image quality evaluation network model adopts an L1 loss function, and the calculation formula of the L1 loss function of the network model is as follows:
$$L=\frac{1}{N}\sum_{i=1}^{N}\left|p_i-q_i\right|$$

where N denotes the number of images in a batch, $p_i$ the prediction score of the i-th image block, and $q_i$ the true label score of the i-th image block;
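The L1 loss of step 4.2 reduces to a mean absolute error over the batch; a one-function sketch (illustrative, with a name of our choosing):

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error over a batch of N image blocks:
    L = (1/N) * sum_i |p_i - q_i|."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.abs(pred - target)))
```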
step 4.3: the no-reference image quality evaluation network model is optimized with the Adam optimizer, which adaptively adjusts the learning rate and accelerates the convergence of the model.
6. The color and structure distortion-based non-reference color image quality evaluation method according to claim 5, wherein the step 5 specifically comprises the following steps:
step 5.1: taking the original distortion image blocks, the gradient image blocks, the H channel image blocks and the S channel image blocks in the training set obtained in the step 1, the step 2 and the step 3 as the input of a subnetwork M;
step 5.2: the four image block inputs from step 5.1 are processed by the first convolution layer and pooling layer of the sub-network M to obtain four feature vectors; these four feature vectors are fused and then passed through the sub-network N to enhance the extraction of global features; the formula for fusing the four feature vectors is as follows:
$$X=\mathrm{concat}(X_i, X_g, X_h, X_s)$$

wherein $X_i$, $X_g$, $X_h$ and $X_s$ respectively represent the feature vectors obtained after the original image block, the gradient image block, the H-channel image block and the S-channel image block pass through the first convolution-pooling layer; X represents the fused feature, and concat is the feature fusion operation;
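The fusion step can be sketched in one line, assuming (as is typical for convolutional feature maps, though the patent does not say) that concatenation is along the channel axis:

```python
import numpy as np

def fuse_features(x_i, x_g, x_h, x_s):
    """X = concat(X_i, X_g, X_h, X_s): stack the four first-layer
    feature maps along the channel (last) axis."""
    return np.concatenate([x_i, x_g, x_h, x_s], axis=-1)
```

Four 16 x 16 x 8 feature maps would thus fuse into a single 16 x 16 x 32 tensor.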
step 5.3: inputting the fused feature X obtained in step 5.2 into the sub-network N; the extracted global features are then dimension-reduced through a fully connected neural network to obtain a feature vector Y; in this process, the fully connected layer computes by matrix multiplication, converting the high-dimensional input features into low-dimensional sample representations, retaining the useful information while discarding the spatial relationships among the features; the output of the l-th layer is calculated as:
$$a^{l}=\sigma\left(W^{l}a^{l-1}+b^{l}\right)$$

wherein $a^{l-1}$ represents the output of layer l-1, $W^{l}$ the weight matrix of the l-th layer, $b^{l}$ the bias of the l-th layer, and $\sigma$ the activation function;
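One fully connected layer is a single matrix multiplication plus bias and activation; a sketch follows, with tanh as a placeholder activation since the patent leaves σ unspecified:

```python
import numpy as np

def dense_forward(a_prev, W, b, sigma=np.tanh):
    """One fully connected layer: a_l = sigma(W_l @ a_{l-1} + b_l)."""
    return sigma(W @ a_prev + b)
```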
step 5.4: fusing the dimension-reduced feature vector Y with the high-dimensional structural distortion features and the high-dimensional color distortion features, and passing the fused features through a linear regression layer;
step 5.5: in the linear regression layer, the network is trained over multiple iterations; each iteration reads in a small batch of image block training samples and obtains a group of predictions from the network model; after the loss is calculated, the network performs backpropagation and stores the gradient of each parameter; meanwhile, the network invokes the Adam optimization algorithm to update the model parameters;
step 5.6: after the above steps, a trained no-reference image quality evaluation network model is obtained;
step 5.7: inputting the test set obtained in steps 1, 2 and 3 into the trained no-reference image quality evaluation model based on color and structural distortion; the model then calculates the quality prediction values of all sub-blocks into which each test image is divided, and these prediction values are averaged with equal weight to obtain the predicted quality score Q of the image, calculated as:
$$Q=\frac{1}{N}\sum_{i=1}^{N}p_i$$

wherein N represents the number of sub-blocks into which the test image is divided, and $p_i$ represents the quality prediction value of the i-th image block.
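The final image-level score of step 5.7 is an equal-weight average of the per-patch predictions; a trivial sketch (the function name is ours):

```python
import numpy as np

def image_quality_score(patch_scores):
    """Equal-weight average of per-patch predictions: Q = (1/N) * sum p_i."""
    return float(np.mean(patch_scores))
```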
CN202211510202.1A 2022-11-29 2022-11-29 No-reference color image quality evaluation method based on color and structure distortion Pending CN115880234A (en)


Publications (1)

Publication Number Publication Date
CN115880234A true CN115880234A (en) 2023-03-31



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination