CN108615231B - All-reference image quality objective evaluation method based on neural network learning fusion - Google Patents


Publication number
CN108615231B
Authority
CN
China
Prior art keywords
image
neural network
quality
objective evaluation
visual
Prior art date
Legal status
Active
Application number
CN201810240606.0A
Other languages
Chinese (zh)
Other versions
CN108615231A (en)
Inventor
丰明坤
吴茗蔚
王中鹏
施祥
林志洁
向桂山
Current Assignee
Zhejiang Lover Health Science and Technology Development Co Ltd
Original Assignee
Zhejiang Lover Health Science and Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Lover Health Science and Technology Development Co Ltd
Priority to CN201810240606.0A
Publication of CN108615231A
Application granted
Publication of CN108615231B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a full-reference image quality objective evaluation method. A BP neural network is applied to image quality evaluation: a BP neural network image quality prediction model is designed that adaptively fuses visual multichannel, multi-algorithm evaluations. The visual multichannel evaluation results of distorted images under several objective evaluation algorithms are fed into the BP neural network, which is trained by supervised learning with subjective human-eye test scores as the training target. The network then predicts the objective evaluation result of each algorithm, and these per-algorithm results are adaptively fused to obtain the final objective evaluation of distorted image quality. The method comprehensively improves the index levels of the PSNR, SSIM, and SVD evaluation methods, outperforms recent fusion evaluation methods based on visual feature perception processing and visual psychology derivation, and shows better evaluation stability.

Description

All-reference image quality objective evaluation method based on neural network learning fusion
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a full-reference image quality objective evaluation method based on neural network learning fusion.
Background
As one of the most widely used signals, images play a significant role in fields such as information acquisition, transmission, and processing. With the growth of cloud computing capability and the rise of artificial intelligence research, application services built on image terminal processing platforms have developed at an unprecedented pace; image signals, however, are easily corrupted, which makes image quality evaluation an important research topic. Within this field, objective methods are a research hotspot because they work automatically, continuously, and efficiently, and among them full-reference image quality evaluation is of particular significance.
The classical peak signal-to-noise ratio (PSNR) method and the engineering evaluation methods of recent years together with their improved variants, such as the structural similarity (SSIM) method, the information fidelity criterion (IFC) method, and the singular value decomposition (SVD) method, use different evaluation scales, so their result data are not directly comparable: SSIM takes values in [0, 1], with larger values indicating higher image quality; IFC is unbounded, with larger values indicating higher quality; for SVD, larger values indicate lower quality. Experiments further show that several important performance indexes of these methods, such as the Pearson linear correlation coefficient and the Spearman rank-order correlation coefficient with respect to subjective scores, still need improvement. A key reason is that these engineering methods do not adequately model characteristics of the human visual system, such as its multichannel perception of image information and its differing sensitivity to different visual features, so their consistency with subjective visual quality evaluation remains limited.
In recent years, with intensive research on neural networks, remarkable results have been achieved in many artificial intelligence fields such as signal processing and pattern recognition. Back-propagation (BP) neural network technology in particular has been applied to image processing. In theory, a BP neural network with three or more layers can approximate any nonlinear function to arbitrary precision and can adaptively learn from external stimuli, which gives it very strong classification and recognition capability; indeed, the emergence of BP solved the nonlinear classification problem in the history of neural network development. A BP neural network and the human visual system behave similarly in image processing: the network's ability to classify and recognize image features mirrors the multichannel way human vision extracts image features, and its strong ability to approximate arbitrary nonlinear functions can simulate the different subjective-objective mapping relations that the human visual channels exhibit for different objective image quality evaluation algorithms. A BP neural network can therefore fuse the visual multichannel evaluation results of different objective image quality algorithms and thereby improve the performance of existing objective evaluation algorithms.
Disclosure of Invention
To address the defects of the prior art, the invention provides a full-reference image quality objective evaluation method based on neural network learning fusion. A BP neural network is used to simulate the uncertain mapping of the multiple channels of the human visual system onto the various objective image quality evaluation algorithms; through learning and training, the network predicts the fusion of each algorithm's visual multichannel evaluations of distorted image quality. The prediction output of the BP neural network also makes the result data of the different evaluation algorithms mutually consistent, which allows an adaptive fusion algorithm over the evaluation methods to be designed: the BP neural network predictions for the visual multichannel evaluations of the different objective algorithms are adaptively fused by a function, yielding the final objective evaluation of distorted image quality.
In order to achieve the purpose, the invention provides the following technical scheme:
a full-reference image quality objective evaluation method based on neural network learning fusion comprises the following steps:
(1) randomly dividing the distorted image into a training set and a test set;
(2) preprocessing all reference images and all distorted images respectively to obtain corresponding image gray level matrixes;
(3) processing image gray matrixes of the distorted image and the reference image by adopting a wavelet transform method to obtain visual multichannel information of the distorted image and the reference image;
(4) carrying out perception sparsification processing on each visual channel information of the distorted image and the reference image;
(5) based on the information of each visual channel of the reference image after sparsification, carrying out full-reference quality evaluation on the information of the distorted image corresponding to the visual channel after sparsification by using an image quality objective evaluation algorithm to obtain a visual multichannel objective evaluation result of the quality of the distorted image;
(6) establishing a BP neural network training model, performing learning training on the model by using a visual multi-channel objective evaluation result of the quality of the distorted image in a training set database, and storing the weight and threshold parameters of the training result;
(7) constructing a BP neural network prediction model correspondingly from the weights and threshold parameters of the BP neural network training result, taking the visual multichannel objective evaluation result of each distorted image in the test set database as the test input of the prediction model, taking the prediction output as the fusion evaluation result of the selected objective algorithm for the corresponding distorted image quality, and applying bias processing to the result;
(8) changing different objective evaluation algorithms of image quality, repeating the steps (5) to (7) to obtain a fusion evaluation result of all distorted image quality in the test set data based on the different objective evaluation algorithms;
(9) adaptively fusing, for each distorted image in the test set database in turn, the fusion evaluation results based on the different objective evaluation algorithms, to obtain the final objective evaluation of each distorted image's quality.
The BP neural network's outstanding classification and recognition capability in image processing and its strong ability to approximate arbitrary nonlinear functions are used to simulate an uncertain characteristic of the human visual system, namely that the human visual channels have different subjective-objective mapping relations for the various objective evaluation algorithms. The visual multichannel evaluation results of each objective algorithm are fed into the BP neural network, the subjective human-eye test score of the distorted image serves as the ground-truth output, and the network is trained by supervised learning, yielding a fusion evaluation result of distorted image quality for each objective algorithm. By exploiting the network's strong analytic and nonlinear numerical approximation capabilities, the method simulates the different subjective-objective mappings of the visual channels onto the different objective evaluation algorithms, so the visual multichannel evaluation results of each algorithm are fused well and each algorithm's performance improves. Because the prediction output also makes the result data of the various evaluation algorithms mutually consistent, an adaptive fusion algorithm over the evaluation methods can then be designed. The invention comprehensively improves the index levels of the PSNR, SSIM, and SVD objective evaluation methods, outperforms recent fusion evaluation methods based on visual feature perception processing and visual psychology derivation, and has better stability.
In step (1), based on empirical values, the ratio of the number of distorted images in the training set to that in the test set is between 1/3 and 1.
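As an illustration of step (1), the random split can be sketched as follows. The function name and use of Python's random module are illustrative assumptions, not from the patent.

```python
import random

def split_train_test(image_ids, ratio=1.0, seed=0):
    """Randomly split distorted-image ids into training and test sets.

    ratio = |train| / |test|; the patent's empirical range is 1/3 to 1
    (the embodiment uses 1, i.e. an equal split).
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)            # random selection of images
    n_train = round(len(ids) * ratio / (1.0 + ratio))
    return ids[:n_train], ids[n_train:]

train, test = split_train_test(range(100), ratio=1.0)
```

With ratio=1.0 the two sets are equal in size; ratio=1/3 puts a quarter of the images in the training set.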
Preferably, in the step (2), the graying processing and the gaussian low-pass filtering are sequentially performed on all the reference images and all the distorted images respectively to obtain corresponding image grayscale matrixes.
Specifically, the window used in Gaussian low-pass filtering has size k × k, where k ranges from 0.01t to 0.1t, t = min{P, Q} is the smaller of the row and column counts of the image matrix, and P × Q is the size of the distorted image; the standard deviation used in Gaussian low-pass filtering is 1.0 to 2.0.
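The window sizing rule and kernel construction can be made concrete with a short sketch; the helper names and the choice frac = 0.05 (inside the stated 0.01t to 0.1t range) are assumptions.

```python
import numpy as np

def gaussian_kernel(k, sigma):
    """k x k Gaussian low-pass kernel, normalized to unit sum."""
    ax = np.arange(k) - (k - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return g / g.sum()

def window_size(P, Q, frac=0.05):
    """Pick k within the empirical range 0.01t..0.1t, t = min(P, Q)."""
    t = min(P, Q)
    return max(3, int(round(frac * t)))

k = window_size(512, 768)          # t = 512, so k = 26 here
kern = gaussian_kernel(k, sigma=1.5)   # sigma in the stated 1.0..2.0 range
```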
Many wavelet families exist. Preferably, in step (3), Log-Gabor wavelets are used to process the image grayscale matrices of the distorted and reference images to obtain their visual multichannel information; Log-Gabor wavelets exhibit the nonlinearity of visual brightness perception and good visual orientation filtering characteristics. For convenience of description, a Log-Gabor visual channel is denoted (s, o), where s is the visual scale factor and o the visual orientation factor; further preferably, s = 5 and o = 4.
Preferably, in the step (4), a threshold-based filtering algorithm is adopted to perform perceptual thinning processing on each visual channel information of the distorted image and the reference image.
Further, the threshold is taken as

ξ′_(s,o)(T) = ξ_(s,o)(T)·K(T)

where K(T) is a function modelling the foveal character of the human visual system and ξ_(s,o)(T) is the threshold of visual channel (s, o), with value

ξ_(s,o)(T) = ξ(T)·CSF[f_o(s)]

where CSF[f_o(s)] is the contrast sensitivity function of the visual channels and ξ(T) is the decision threshold function, taking the following values:
[The piecewise definition of ξ(T) is given as an equation image in the original; it compares the local effective contrast against a hard visual-distortion detection threshold.]

In this piecewise definition, a hard threshold governs the detection of the degree of visual distortion, and C_vr(T) is the local effective contrast function of the T-th sub-block of the distorted image relative to the reference image after the image is partitioned into blocks; the related functions take the following values:
C_vr(T) = σ_vr(T)/μ_vr(T)

[The definition of σ_vr(T) is given as an equation image in the original.]

where μ_vr(T) is the mean gray level of reference image sub-block T, σ_err(T) is the standard deviation of the distorted image relative to the reference image within sub-block T, and σ_vr(T) is the minimum, over the sub-block set containing T, of the standard deviation of the distorted image relative to the reference image.
Further, this hard threshold is taken as -5.
The image blocks have size k × k, where k ranges from 0.01t to 0.05t, t = min{P, Q} is the smaller of the row and column counts of the image matrix, and P × Q is the size of the visual channel image; preferably, k ranges from 8 to 16.
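A minimal blockwise sketch of the contrast computation C_vr(T) = σ_vr(T)/μ_vr(T) follows. Since the original gives σ_vr(T) only as an equation image, the per-block standard deviation of the distorted-minus-reference error stands in for it here, and the minimization over a sub-block set is omitted; treat this as an assumption-laden illustration, not the patent's exact formula.

```python
import numpy as np

def local_contrast(ref, dist, k=8):
    """Blockwise effective-contrast sketch for C_vr(T) = sigma_vr(T) / mu_vr(T).

    mu_vr(T): mean gray level of reference sub-block T.
    sigma stand-in: per-block standard deviation of (distorted - reference);
    the patent additionally minimizes over a sub-block set (not reproduced).
    """
    P, Q = ref.shape
    C = np.zeros((P // k, Q // k))
    for bi in range(P // k):
        for bj in range(Q // k):
            r = ref[bi*k:(bi+1)*k, bj*k:(bj+1)*k]
            d = dist[bi*k:(bi+1)*k, bj*k:(bj+1)*k]
            mu = r.mean()
            sigma = (d - r).std()
            C[bi, bj] = sigma / mu if mu > 0 else 0.0
    return C

ref = np.full((16, 16), 100.0)      # flat reference field
C = local_contrast(ref, ref, k=8)   # identical images: zero contrast everywhere
```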
Preferably, the objective image quality evaluation algorithm is PSNR, SSIM, or SVD, which are among the most widely used and mature image quality assessment (IQA) algorithms.
Preferably, in the constructed BP neural network training model, the number of input-layer neurons equals the number of wavelet decomposition channels and the output layer has a single neuron, whose output represents the quality evaluation result of the selected objective evaluation algorithm. During training, the multichannel objective evaluation results of distorted image quality serve as the model input and the subjective human-eye test score DMOS of distorted image quality serves as the training target; training terminates when the error e between the model output and the ground-truth output is smaller than 0.00001 or when the number of training iterations reaches 500.
Specifically, the BP neural network training model has a single hidden layer with 10 to 30 neurons. The length of the training input data equals the number of distorted images in the training set database.
The BP neural network prediction model is essentially the same as the training model, with the following differences: the prediction model removes the training model's ground-truth output, which supervises learning of the input data (i.e., it is fed to the training model as the training target during training); it adds a parameter input, namely the network weights and thresholds learned by the training model; and it adds feedback control from the prediction model back to the training model. The length of the model's test input data equals the number of distorted images in the test set database, and the prediction output of the output-layer neuron is the quality evaluation result of the selected objective algorithm. Whether the training process is repeated is decided by testing the evaluation index levels of the quality evaluation results output by the network, until the indexes of the prediction output reach an ideal level; the evaluation indexes are RMSE, PLCC, and SROCC.
In step (8), biasing the quality evaluation results eliminates negative numbers in the BP neural network predictions; the bias is applied by adding a single positive constant to all results.
In step (9), adaptive fusion proceeds by arbitrarily selecting the prediction results of two objective evaluation algorithms and fusing them, then fusing that result with the prediction result of a third algorithm, and so on until the prediction results of all algorithms have been fused, yielding the final objective evaluation of distorted image quality. The following formula is used for adaptive fusion:
[The adaptive fusion formula is given as an equation image in the original; it combines x_1 and x_2 using the adaptive parameter λ_1.]
where x_1 and x_2 denote the BP neural network prediction results of two different objective evaluation algorithms, or one of x_1, x_2 is an intermediate value from a preceding fusion step; λ_1 is an adaptive parameter taking values according to the following formula:
[The definition of λ_1 is given as an equation image in the original; it depends on the adjustment parameters γ_11 and γ_12.]
where γ_11 and γ_12 are adjustment parameters.
Compared with the prior art, the invention has the following technical effects:
(1) The invention performs consistency conversion on the result data of the various objective image quality evaluation algorithms, making the data directly comparable.
(2) The three evaluation indexes RMSE, PLCC, and SROCC reach their highest levels with the present method, far exceeding existing classical and engineering methods; the method holds a clear advantage over engineering-style information processing evaluation methods, is clearly superior to visual-feature and fusion-processing evaluation methods, and has an overall advantage over recent fusion evaluation methods based on visual feature perception processing and visual psychology derivation.
(3) For image quality evaluation across different distortion types, the variation ranges of the three evaluation index levels RMSE, PLCC, and SROCC are the smallest relative to all other methods, so the method has better stability.
Drawings
FIG. 1 is a flowchart of an objective evaluation method for quality of a full reference image according to an embodiment;
FIG. 2 is a result image of the embodiment after processing an original image;
FIG. 3 is a visual multi-channel information view extracted from FIG. 2 provided by an embodiment;
FIG. 4 is a view of the sparsification processing result of the visual multi-channel information view of FIG. 3 provided by the embodiment;
FIG. 5 is a schematic structural diagram of a BP neural network training model provided by an embodiment;
fig. 6 is a schematic structural diagram of a BP neural network prediction model provided by the embodiment.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention more apparent, the invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
The implementation is described using the LIVE Release2 image standard database, provided by the image and video engineering laboratory of the University of Texas at Austin, as an example. This database stores many paired standard cases (reference image and distorted image pairs), and the distorted image in each case has a known corresponding MOS value (subjective evaluation score), the MOS value being the result of subjective testing by human observers.
For full-reference image quality objective evaluation, a reference image and a corresponding distorted image are first selected from the LIVE Release2 image standard database, and the selected distorted image is then evaluated. The flow, shown in FIG. 1, comprises the following steps:
step 1, randomly dividing the distorted images into a training set database and a test set database, wherein the dividing method is to randomly select the distorted images, and the quantity ratio of the distorted images in the two databases is set as 1.
Step 2: preprocess all reference images and distorted images respectively to obtain the corresponding image grayscale matrices.
When the pretreatment is carried out:
First, the reference image and the distorted image are converted to grayscale images Gray according to the following formula:
Gray=0.29900·R+0.58700·G+0.11400·B
wherein R, G, B are the intensity values of the source image (distorted image or reference image) on R, G, B three channels, respectively.
Then, the grayscale images obtained by the conversion are each Gaussian low-pass filtered. In this embodiment, given the size of the images in the LIVE Release2 image standard database, the window size used for Gaussian low-pass filtering of all images is 16 × 16 and the standard deviation is 1.0.
The grayscale image is low-pass filtered with this filter; the zero-padded border region is excluded from the result during the two-dimensional cross-correlation.
The result of the pre-processing of the original image in this embodiment is shown in fig. 2.
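The grayscale conversion and Gaussian low-pass filtering of step 2 can be sketched as follows, assuming NumPy/SciPy are available; scipy.ndimage.gaussian_filter with mode="constant" (zero padding) stands in for the 16 × 16 windowed filter described above, and the function name is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess(rgb):
    """Grayscale conversion (Gray = 0.299R + 0.587G + 0.114B) followed by
    Gaussian low-pass filtering (sigma = 1.0, as in the embodiment).

    mode="constant" pads the border with zeros, mirroring the text's
    zero-complemented edge handling.
    """
    gray = (0.29900 * rgb[..., 0] +
            0.58700 * rgb[..., 1] +
            0.11400 * rgb[..., 2])
    return gaussian_filter(gray, sigma=1.0, mode="constant")

img = np.zeros((32, 32, 3))
img[..., 0] = 255.0          # pure red test image
out = preprocess(img)        # interior settles at 0.299 * 255 = 76.245
```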
Step 3: extract the visual multichannel information of the images by wavelet transform. Taking the Log-Gabor wavelet as an example, the extraction formula is:
v_(s,o)(i,j) = F⁻¹[G(ω, θ_j) × F(f(i,j))]

where f(i,j) is the original image, v_(s,o)(i,j) is the visual channel (s, o) information view extracted from f(i,j), s and o are the Log-Gabor scale and orientation factors respectively (here s = 5, o = 4), F denotes the forward frequency-domain transform, F⁻¹ the inverse frequency-domain transform, and G(ω, θ_j) is the Log-Gabor frequency response.
In this embodiment, the result of the Log-Gabor multichannel decomposition of fig. 2 is shown in fig. 3.
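A sketch of the Log-Gabor decomposition formula above, assuming NumPy; the filter-bank constants (minimum wavelength, scale multiplier, bandwidth parameters) are illustrative defaults, not values from the patent.

```python
import numpy as np

def log_gabor_bank(img, scales=5, orients=4, min_wl=3.0, mult=2.1,
                   sigma_on_f=0.55, sigma_theta=np.pi / 8):
    """Decompose an image into scales x orients channel views via
    v_(s,o)(i,j) = F^-1[ G(w, theta_j) * F(f(i,j)) ].
    """
    rows, cols = img.shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                      # avoid log(0) at DC
    theta = np.arctan2(-fy, fx)
    F = np.fft.fft2(img)
    out = {}
    for s in range(scales):
        f0 = 1.0 / (min_wl * mult ** s)     # centre frequency of scale s
        radial = np.exp(-(np.log(radius / f0) ** 2) /
                        (2 * np.log(sigma_on_f) ** 2))
        radial[0, 0] = 0.0                  # zero DC response
        for o in range(orients):
            th0 = o * np.pi / orients
            dt = np.arctan2(np.sin(theta - th0), np.cos(theta - th0))
            angular = np.exp(-dt ** 2 / (2 * sigma_theta ** 2))
            out[(s, o)] = np.real(np.fft.ifft2(F * radial * angular))
    return out

views = log_gabor_bank(np.random.default_rng(0).random((64, 64)))
```

With s = 5 and o = 4 this yields the 20 channel views used as BP network inputs later.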
Step 4: perform perceptual sparsification on the information of each visual channel; the processing formula is as follows:
[The sparsification formula and the symbol denoting its result are given as equation images in the original.] In the calculation, the hard threshold is taken as -5 and the image block size as 11 × 11.
In this embodiment, the results of perceptual sparsification of each visual channel of fig. 3 are shown in fig. 4.
Step 5: select an existing objective image quality evaluation algorithm to perform full-reference quality evaluation on the visual multichannel information of all distorted images; here the PSNR algorithm is selected.
Step 6: construct a BP neural network training model, train it with the multichannel objective evaluation results of distorted image quality from the training set database, and save the weights and threshold parameters of the training result; the constructed training model is shown in fig. 5.
The model characteristics are described as follows:
the number of neurons of an input layer of the BP neural network is equal to the number of channels of wavelet decomposition (5 × 4 is equal to 20), the hidden layer is one layer, the number of the neurons of the hidden layer is 20, the number of neurons of an output layer of the BP neural network is only one, and the output represents a quality evaluation result of a selected objective algorithm PSNR (Peak to noise ratio). the training target of the BP neural network is a human eye subjective test result score value DMOS of distorted image quality.the length of input data trained by the BP neural network is equal to the number of distorted images in a training set database, the BP neural network is subjected to supervised learning training until one of training termination conditions of the BP neural network is reached, and all parameters such as network weight and threshold of the training result of the BP neural network are stored.
(1) The error e between the BP prediction output and DMOS falls below 0.00001.
(2) The number of iterations reaches 500.
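A minimal NumPy sketch of the 20-20-1 BP training model with the two termination conditions above; the learning rate, weight initialization, and plain batch gradient descent are assumptions, since the text does not specify the training algorithm's internals.

```python
import numpy as np

class BPNet:
    """20 inputs (channel scores), one hidden layer of 20 sigmoid units,
    one linear output trained toward DMOS; training stops when the mean
    squared error falls below 1e-5 or after 500 epochs."""

    def __init__(self, n_in=20, n_hidden=20, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.3, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.3, (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    @staticmethod
    def _sig(x):
        return 1.0 / (1.0 + np.exp(-x))

    def predict(self, X):
        return self._sig(X @ self.W1 + self.b1) @ self.W2 + self.b2

    def fit(self, X, y, tol=1e-5, max_epochs=500):
        y = y.reshape(-1, 1)
        for _ in range(max_epochs):
            H = self._sig(X @ self.W1 + self.b1)
            err = H @ self.W2 + self.b2 - y
            if np.mean(err ** 2) < tol:        # termination condition (1)
                break
            # back-propagate the error and descend the gradient
            gH = err @ self.W2.T * H * (1 - H)
            self.W2 -= self.lr * (H.T @ err) / len(X)
            self.b2 -= self.lr * err.mean(0)
            self.W1 -= self.lr * (X.T @ gH) / len(X)
            self.b1 -= self.lr * gH.mean(0)
        return self

rng = np.random.default_rng(1)
X = rng.random((50, 20))            # stand-in multichannel evaluation scores
y = X.mean(axis=1)                  # stand-in DMOS-like targets
net = BPNet()
mse_before = float(np.mean((net.predict(X).ravel() - y) ** 2))
net.fit(X, y)
mse_after = float(np.mean((net.predict(X).ravel() - y) ** 2))
```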
Step 7: the BP neural network prediction model tests the visual multichannel evaluation result data of the selected objective algorithm, and the prediction output serves as that algorithm's quality evaluation; the prediction model is shown in fig. 6.
The model is essentially the same as the BP neural network training model, with three differences: first, the prediction model removes the training-target input of the training model; second, it adds a parameter input, namely the network weights and thresholds learned by the training model; and third, it adds feedback control from the prediction model to the training model.
In this embodiment, for the bias processing of the quality evaluation results of the PSNR, SSIM, and SVD objective evaluation algorithms, the processing method is to add 1 to all the results uniformly.
Step 8: substitute different objective image quality evaluation algorithms, here SSIM and SVD, and repeat steps 5 to 7. This yields, for the three objective algorithms PSNR, SSIM, and SVD, the BP-neural-network-predicted visual multichannel quality evaluation results for all distorted images in the test set database.
Step 9: adaptively fuse the quality evaluation results of the PSNR, SSIM, and SVD objective algorithms to obtain the final objective evaluation of distorted image quality.
First, the quality evaluation results of the SSIM and PSNR objective algorithms are selected and fused; the fusion formula is:
[The fusion formula is given as an equation image in the original.]
where y_1 denotes the fusion result, SSIM and PSNR denote the quality evaluation results of the SSIM and PSNR algorithms respectively, and the parameter λ_1 is computed as follows:
[The formula for λ_1 is given as an equation image in the original.]
y_1 is then adaptively fused with the quality evaluation result of the third objective algorithm, SVD; the fusion formula is:
[The fusion formula is given as an equation image in the original.]
where y denotes the final objective evaluation of distorted image quality and the parameter λ_1 is computed as follows:
[The formula for this λ_1 is given as an equation image in the original.]
based on the y-value of the objective evaluation result and the MOS value of the subjective evaluation score of each distorted image recorded in the L IVE Release2 image standard database, the SROCC index, the RMSE index and the P L CC index of the objective evaluation method are obtained by calculation according to the specification of the international Video Quality Experts Group (VQEG).
Table 1 compares the SROCC, RMSE and PLCC indices obtained when the objective evaluation method of this embodiment (the BP model) is used for full-reference image quality objective evaluation on the standard cases in the LIVE Release2 image standard database against those of existing evaluation methods.
The distortion types of the images in the Release2 image standard database include JP2K, JPEG, WN, Gblur and FF. To illustrate the applicable range of the method of this embodiment, the index values (i.e., evaluation indices) under the different methods given in Table 1 are the evaluation levels computed over all distorted images of each type in the Release2 image standard database.
TABLE 1
(Table 1 is rendered as images in the original: Figure GDA0002502870530000131 and Figure GDA0002502870530000141.)
As can be seen from Table 1, for the evaluation of all types of distorted images, the three indices RMSE, PLCC and SROCC of the method of this embodiment remain at the highest levels compared with existing evaluation methods: it clearly outperforms the existing classical and engineering methods, holds a marked advantage over information-processing-based evaluation methods, is significantly better than visual-feature and fusion-processing evaluation methods, and also has an overall advantage over visual-psychology-derived fusion methods and recent visual-feature perception methods.
The above embodiments are intended to illustrate the technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and are not intended to limit it; any modifications, additions, equivalents, etc. made within the scope of the principles of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. A full-reference image quality objective evaluation method based on neural network learning fusion is characterized by comprising the following steps:
(1) randomly dividing the distorted images into a training set and a test set;
(2) preprocessing all reference images and all distorted images respectively to obtain corresponding image gray level matrixes;
(3) processing image gray matrixes of the distorted image and the reference image by adopting a wavelet transform method to obtain visual multichannel information of the distorted image and the reference image;
(4) carrying out perception sparsification processing on each visual channel information of the distorted image and the reference image;
(5) based on the information of each visual channel of the reference image after sparsification, carrying out full-reference quality evaluation on the information of the distorted image corresponding to the visual channel after sparsification by using an image quality objective evaluation algorithm to obtain a visual multichannel objective evaluation result of the quality of the distorted image;
(6) establishing a BP neural network training model, performing learning training on the model by using a visual multi-channel objective evaluation result of the quality of the distorted image in a training set database, and storing the weight and threshold parameters of the training result;
(7) constructing a BP neural network prediction model according to the weight and threshold parameters of the BP neural network training result; taking the visual multichannel objective evaluation result of each frame of distorted image quality in the test set database as the test input of the BP neural network prediction model; taking the prediction output of the BP neural network prediction model as the fusion evaluation result of the selected image quality objective algorithm for the corresponding distorted image quality; and performing bias processing on the result;
(8) replacing the image quality objective evaluation algorithm with a different one and repeating the steps (5) to (7) to obtain fusion evaluation results of the quality of all distorted images in the test set based on the different objective evaluation algorithms;
(9) adaptively fusing, for each frame of distorted image in the test set database, the fusion evaluation results based on the different objective evaluation algorithms by using the following formula to obtain the final objective evaluation of the quality of each frame of distorted image;
(Formula rendered as image in original: Figure FDA0002456116070000021)
wherein x1 and x2 denote the prediction outputs of the BP neural network prediction models of two different objective evaluation algorithms, that is, the fusion evaluation results of each frame of distorted image quality based on the different objective evaluation algorithms, or one of x1 and x2 is an intermediate value in a fusion process; λ1 is an adaptive parameter whose value is taken according to the following formula:
(Formula rendered as image in original: Figure FDA0002456116070000022)
wherein the parameters γ11 and γ12 are adjustment parameters.
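Steps (6) and (7) of claim 1 — training a BP network on visual multi-channel scores and reusing the learned weights and thresholds to predict a fused score — can be sketched as follows. This is a hedged illustration only: scikit-learn's `MLPRegressor` is used as a stand-in for the BP network (the patent names no library), and the channel count, training target, and all data are synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data: each row holds one distorted image's
# objective scores over a hypothetical 24 visual channels; the
# target is a single fused quality score (here simply the mean,
# purely for illustration).
n_train, n_channels = 200, 24
X_train = rng.uniform(0.0, 1.0, size=(n_train, n_channels))
y_train = X_train.mean(axis=1)

# Step (6): learn the network's weights and thresholds on the
# training set.
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                   random_state=0)
net.fit(X_train, y_train)

# Step (7): the trained parameters now act as the prediction model;
# predict fused scores for test-set images, then apply the uniform
# +1 bias shift described in the embodiment.
X_test = rng.uniform(0.0, 1.0, size=(10, n_channels))
fused = net.predict(X_test) + 1.0
```

Persisting `net`'s learned coefficients and intercepts between steps mirrors the claim's "storing the weight and threshold parameters of the training result".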
2. The method for objectively evaluating the quality of a full-reference image based on neural network learning fusion as claimed in claim 1, wherein in the step (1), the ratio of the number of distorted images in the training set to that in the test set is 1/3 to 1.
3. The objective evaluation method for quality of full-reference images based on neural network learning fusion as claimed in claim 1, wherein in step (2), all reference images and all distorted images are respectively and sequentially subjected to graying processing and gaussian low-pass filtering to obtain corresponding image grayscale matrices.
4. The objective evaluation method for quality of full-reference images based on neural network learning fusion as claimed in claim 1, wherein in step (3), Log-Gabor wavelets are selected to process the image gray level matrixes of the distorted image and the reference image, so as to obtain visual multichannel information of the distorted image and the reference image.
5. The objective evaluation method for quality of full-reference images based on neural network learning fusion as claimed in claim 1, wherein in step (4), a threshold-based filtering algorithm is adopted to perform perceptual sparsification processing on each visual channel information of the distorted image and the reference image.
6. The full-reference image quality objective evaluation method based on neural network learning fusion as claimed in claim 1, wherein the image quality objective evaluation algorithm is PSNR, SSIM or SVD.
CN201810240606.0A 2018-03-22 2018-03-22 All-reference image quality objective evaluation method based on neural network learning fusion Active CN108615231B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810240606.0A CN108615231B (en) 2018-03-22 2018-03-22 All-reference image quality objective evaluation method based on neural network learning fusion


Publications (2)

Publication Number Publication Date
CN108615231A CN108615231A (en) 2018-10-02
CN108615231B true CN108615231B (en) 2020-08-04

Family

ID=63658777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810240606.0A Active CN108615231B (en) 2018-03-22 2018-03-22 All-reference image quality objective evaluation method based on neural network learning fusion

Country Status (1)

Country Link
CN (1) CN108615231B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109598704B (en) * 2018-11-19 2023-05-02 电子科技大学 Fecal microscopic image definition evaluation method based on BP neural network
CN109615620B (en) * 2018-11-30 2021-01-08 腾讯科技(深圳)有限公司 Image compression degree identification method, device, equipment and computer readable storage medium
CN110781633A (en) * 2019-10-30 2020-02-11 广东博智林机器人有限公司 Image-text design quality detection method, device and system based on deep learning model
CN111123884B (en) * 2019-11-08 2021-11-12 中国船舶重工集团公司第七0九研究所 Testability evaluation method and system based on fuzzy neural network
CN111507426B (en) * 2020-04-30 2023-06-02 中国电子科技集团公司第三十八研究所 Non-reference image quality grading evaluation method and device based on visual fusion characteristics
CN114066857A (en) * 2021-11-18 2022-02-18 烟台艾睿光电科技有限公司 Infrared image quality evaluation method and device, electronic equipment and readable storage medium
CN115251822B (en) * 2022-07-14 2023-08-18 中山大学中山眼科中心 Neural network-based contrast sensitivity rapid measurement method
CN117058132B (en) * 2023-10-11 2024-01-23 天津大学 Cultural relic illumination visual comfort quantitative evaluation method and system based on neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101282481A (en) * 2008-05-09 2008-10-08 中国传媒大学 Method for evaluating video quality based on artificial neural net
CN105160678A (en) * 2015-09-02 2015-12-16 山东大学 Convolutional-neural-network-based reference-free three-dimensional image quality evaluation method
US20170177979A1 (en) * 2015-12-22 2017-06-22 The Nielsen Company (Us), Llc Image quality assessment using adaptive non-overlapping mean estimation
CN107770517A (en) * 2017-10-24 2018-03-06 天津大学 Full reference image quality appraisement method based on image fault type


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
No Reference Image Quality Assessment based on Multi-Expert Convolutional Neural Networks; CHUNLING FAN et al.; IEEE Access; Feb. 5, 2018; pp. 8934-8943 *
Image evaluation with visual sparsification and multi-channel, multi-feature adaptation; FENG Mingkun et al.; Chinese Journal of Scientific Instrument; Mar. 2016; Vol. 37, No. 3; pp. 667-674 *

Also Published As

Publication number Publication date
CN108615231A (en) 2018-10-02

Similar Documents

Publication Publication Date Title
CN108615231B (en) All-reference image quality objective evaluation method based on neural network learning fusion
CN108665460B (en) Image quality evaluation method based on combined neural network and classified neural network
CN108932697B (en) Distortion removing method and device for distorted image and electronic equipment
CN109360178B (en) Fusion image-based non-reference stereo image quality evaluation method
CN110148104B (en) Infrared and visible light image fusion method based on significance analysis and low-rank representation
CN108305240B (en) Image quality detection method and device
CN108596890B (en) Full-reference image quality objective evaluation method based on vision measurement rate adaptive fusion
Chang et al. Perceptual image quality assessment by independent feature detector
Liu et al. No-reference quality assessment for contrast-distorted images
Hu et al. Image quality assessment using a SVD-based structural projection
CN106651829B (en) A kind of non-reference picture method for evaluating objective quality based on energy and texture analysis
Hou et al. Saliency-guided deep framework for image quality assessment
CN109859166B (en) Multi-column convolutional neural network-based parameter-free 3D image quality evaluation method
CN111709914B (en) Non-reference image quality evaluation method based on HVS characteristics
CN105243385B (en) A kind of image quality evaluating method based on unsupervised learning
CN112767385B (en) No-reference image quality evaluation method based on significance strategy and feature fusion
Yan et al. Blind stereoscopic image quality assessment by deep neural network of multi-level feature fusion
CN111832431A (en) Emotional electroencephalogram classification method based on CNN
CN114066812B (en) No-reference image quality evaluation method based on spatial attention mechanism
CN111429402A (en) Image quality evaluation method for fusing advanced visual perception features and depth features
CN108550152B (en) Full-reference image quality objective evaluation method based on depth feature perception inference
CN108648180B (en) Full-reference image quality objective evaluation method based on visual multi-feature depth fusion processing
CN112070720A (en) Transformer substation equipment defect identification method based on deep learning model
Liu et al. A multiscale approach to deep blind image quality assessment
Chang et al. Blind image quality assessment by visual neuron matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant