CN111325733A — Image quality evaluation method combining low-level vision and high-level vision statistical characteristics
- Publication number: CN111325733A
- Application number: CN202010112724.0A
- Authority: CN (China)
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
- G06N 3/02 — Neural networks; G06N 3/08 — Learning methods
- G06T 2207/20081 — Training; learning
- G06T 2207/20084 — Artificial neural networks [ANN]
- G06T 2207/30168 — Image quality inspection
Abstract
The invention provides an image quality evaluation method combining low-level vision and high-level vision statistical characteristics, comprising the following steps: S1, locally normalizing the image and extracting low-level visual statistical features; S2, sparsely representing the image, calculating the representation residual, and extracting high-level visual statistical features; S3, training an artificial neural network to learn a mapping model from the image features extracted in steps S1 and S2 to image quality, and using the model to predict image quality. By drawing on the characteristics of low-level human vision and high-level brain activity, the invention extracts low-level and high-level features and obtains a mapping model from image features to image quality, which can effectively measure the loss of perceptual image quality and accurately evaluate image quality.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to an image quality evaluation method combining low-level vision and high-level vision statistical characteristics.
Background
Since the beginning of the twenty-first century, with the rapid development of internet, digital media and communication technologies, digital images have become an important medium for exchanging information. In recent years, the large-scale popularization of digital devices such as digital cameras, smart phones and tablet computers has made image and video acquisition highly convenient. However, digital images are susceptible to various kinds of distortion during acquisition, compression, storage, transmission and other processing, and their quality is inevitably affected to some degree. For example, mechanical shake and defocus during shooting can blur the acquired image, and noise may be introduced during transmission. Accurately evaluating the quality of an image is therefore of great significance for industrial applications of digital images.
In terms of image quality evaluation indexes, the structural similarity method (SSIM) proposed by Wang, Z. et al. in "Image quality assessment: from error visibility to structural similarity" (IEEE Trans. Image Process., vol. 13, no. 4, pp. 600-612) judges image quality by measuring the structural similarity between images. Chandler et al. designed a wavelet-based Visual Signal-to-Noise Ratio (VSNR) in "VSNR: A Wavelet-Based Visual Signal-to-Noise Ratio for Natural Images" (IEEE Trans. Image Process., vol. 16, no. 9, pp. 2284-2298). The algorithm proceeds in two steps: first, visual masking is used to judge whether the distortion is visible to the human eye; if not, the image is considered to have the best visual quality, and if so, the quality is estimated by computing the low-level contrast distortion of the visual system and the mid-level image edge distortion. The Visual Saliency-Induced Index (VSI) proposed by Zhang, L. et al. in "VSI: A Visual Saliency-Induced Index for Perceptual Image Quality Assessment" (IEEE Trans. Image Process., vol. 23, no. 10, pp. 4270-4281) first studies the change in visual saliency caused by image distortion, then uses visual saliency as an image quality feature reflecting the degree of distortion, and finally combines the visual saliency feature with the gradient magnitude feature to predict image distortion.
Liu, Y. et al., in "Reduced-Reference Image Quality Assessment in Free-Energy Principle and Sparse Representation" (IEEE Trans. Multimedia, vol. 20, no. 2, pp. 379-391), calculate the information entropy of the sparse representation residual to evaluate image quality. Liu, A. et al., in "Image Quality Assessment Based on Gradient Similarity" (IEEE Trans. Image Process., vol. 21, no. 4, pp. 1500-1512), design a Gradient Similarity Index (GSI), which first computes the gradient magnitude similarity between the original and distorted images, then refines that computation based on the masking property of the human visual system, and finally estimates image quality by adaptively pooling luminance, contrast and structure. Moorthy et al. established the Distortion Identification-based Image Verity and INtegrity Evaluation index (DIIVINE) in "Blind Image Quality Assessment: From Natural Scene Statistics to Perceptual Quality" (IEEE Trans. Image Process., vol. 20, no. 12, pp. 3350-3364). Wu, J. et al., in "Perceptual Quality Metric with Internal Generative Mechanism" (IEEE Trans. Image Process., vol. 22, no. 1, pp. 43-54), propose the Internal Generative Mechanism (IGM) algorithm, which assumes a generative model inside the brain responsible for understanding the image and generating a corresponding predicted image; an autoregressive (AR) model is used to simulate this internal generative model and decompose the image into a predictable part and an unpredictable part, whose qualities are predicted with SSIM and PSNR respectively and then combined to predict the overall image quality.
Disclosure of Invention
The invention mainly aims to provide an image quality evaluation method and device combining low-level vision and high-level vision statistical characteristics so as to effectively measure the loss of image perception quality and accurately evaluate the quality of an image.
To achieve the above object, the present invention provides an image quality evaluation method combining low-level vision and high-level vision statistical characteristics, the method comprising the steps of:
s1, carrying out local normalization on the image, and extracting low-level visual statistical characteristics;
s2, sparsely representing the image, calculating a representation residual error, and extracting high-level visual statistical characteristics;
s3, training the artificial neural network, learning a mapping model of the image features extracted from steps S1 and S2 to image quality to predict the image quality.
Preferably, in step S1, the image is locally normalized by using the local mean and variance of the image to obtain a normalized coefficient image, and then a segment of distribution in the normalized coefficient distribution is intercepted as the low-level visual feature vector describing the image quality variation.
Preferably, in step S2, the input image is sparsely represented, then a representation residual is calculated, and a segment of distribution in the distribution of the residual is truncated as a high-level visual feature vector describing the change of image quality.
Preferably, in step S3, an artificial neural network with a four-layer structure is designed, which includes three hidden layers and a linear regression layer, and then the network is trained to obtain a model for mapping image features to image quality, and the model is used to predict the image quality.
An image quality evaluation device combining the low-level vision and the high-level vision statistical characteristics comprises a computer readable storage medium and a processor, wherein the computer readable storage medium stores an executable program, and the executable program is executed by the processor to realize the image quality evaluation method combining the low-level vision and the high-level vision statistical characteristics.
A computer-readable storage medium storing an executable program which, when executed by a processor, implements the method for image quality assessment that combines low-level vision with high-level vision statistical features.
The invention has the beneficial effects that:
the invention provides an image quality evaluation method combining low-level vision and high-level vision statistical characteristics. In the method, statistical characteristics of low-level vision and high-level vision of an image are respectively extracted, then a mapping model of the extracted image characteristics to image quality is learned by utilizing a neural network, and the model is utilized to predict the image quality. The invention extracts the low-level and high-level characteristics by means of the low-level human vision characteristics and the activity of the high-level brain, learns the mapping from the vision characteristics to the image quality by utilizing a neural network, obtains a mapping model from the image characteristics to the image quality, and estimates the image quality.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a schematic diagram of an embodiment of an image quality evaluation method combining the statistical characteristics of the low-level vision and the high-level vision according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments. The following embodiments will assist those skilled in the art in further understanding the invention, but do not limit it in any way. It should be noted that persons skilled in the art can make variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The invention provides an image quality evaluation method combining low-level vision and high-level vision statistical characteristics. In the method, statistical characteristics of low-level vision and high-level vision of an image are respectively extracted, then a mapping model of the extracted image characteristics to image quality is learned by utilizing a neural network, and the model is utilized to predict the image quality.
Fig. 1 is a schematic diagram of an embodiment of an image quality evaluation method combining the statistical characteristics of the low-level vision and the high-level vision according to the present invention. As shown in fig. 1, an embodiment of the present invention provides an image quality evaluation method combining low-level vision and high-level vision statistical features, the method including the following steps: s1, carrying out local normalization on the image, and extracting low-level visual statistical characteristics; s2, sparsely representing the image, calculating a representation residual error, and extracting high-level visual statistical characteristics; s3, training the artificial neural network, learning a mapping model of the image features extracted from steps S1 and S2 to the image quality for predicting the image quality. The invention extracts the low-level and high-level characteristics by means of the low-level human vision characteristics and the activity of the high-level brain, learns the mapping from the vision characteristics to the image quality by utilizing a neural network, obtains a mapping model from the image characteristics to the image quality, and estimates the image quality.
In some embodiments, the above image quality evaluation method combining the low-level vision and the high-level vision statistical features is implemented as follows:
In the embodiment of the present invention, first, the image is locally normalized to obtain normalization coefficients, where the normalization coefficient image may be calculated as:

Î(x, y) = (I(x, y) − μ(x, y)) / (σ(x, y) + C)

where I is the input image, (x, y) denotes the pixel position, Î denotes the normalized-coefficient image, C is a small constant that prevents division by zero, and μ(x, y) and σ(x, y) are the mean and deviation of the local region centered at (x, y); that is, the normalization coefficients are obtained by subtracting the local mean from the original image and normalizing with the local deviation. μ(x, y) and σ(x, y) are calculated as:

μ(x, y) = ∑_{s=−S}^{S} ∑_{t=−T}^{T} ω_{s,t} I(x + s, y + t)

σ(x, y) = √( ∑_{s=−S}^{S} ∑_{t=−T}^{T} ω_{s,t} [I(x + s, y + t) − μ(x, y)]² )

where ω = {ω_{s,t} | s = −S, …, S; t = −T, …, T} is a symmetric Gaussian filter, the local image block has width 2S and height 2T, and S and T both take the value 16. Then the interval [−2, 2] is divided evenly into 20 subintervals with step 0.2, and the number of normalization coefficients falling into each subinterval forms a 20-dimensional feature vector.
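As a concrete illustration of step S1, the local normalization and histogram feature can be sketched in Python as follows. This is a hedged sketch, not the patent's reference implementation: the Gaussian width `sigma`, the reflect boundary padding, and the stabilising constant `C` are assumptions; only S = T = 16, the interval [−2, 2] and the 20 subintervals come from the description.

```python
import numpy as np

def gaussian_kernel(S=16, sigma=16 / 3):
    # symmetric 2-D Gaussian window omega_{s,t}, s, t in [-S, S]
    # (the value of sigma is an assumption; the patent only fixes S = T = 16)
    r = np.arange(-S, S + 1)
    g = np.exp(-r ** 2 / (2.0 * sigma ** 2))
    K = np.outer(g, g)
    return K / K.sum()

def low_level_features(img, bins=20, lo=-2.0, hi=2.0, C=1.0):
    """Step S1 sketch: local normalization, then a histogram over [-2, 2]."""
    img = img.astype(np.float64)
    K = gaussian_kernel()
    S = K.shape[0] // 2
    pad = np.pad(img, S, mode='reflect')
    H, W = img.shape
    mu = np.zeros((H, W))
    sq = np.zeros((H, W))
    for s in range(K.shape[0]):          # accumulate weighted local moments
        for t in range(K.shape[1]):
            win = pad[s:s + H, t:t + W]
            mu += K[s, t] * win
            sq += K[s, t] * win * win
    sigma = np.sqrt(np.maximum(sq - mu * mu, 0.0))  # local deviation
    norm = (img - mu) / (sigma + C)      # normalization coefficients
    hist, _ = np.histogram(norm, bins=bins, range=(lo, hi))
    return hist.astype(np.float64)       # 20-dimensional feature vector
```

The loop computes σ via the weighted-moment identity ∑ω(I − μ)² = ∑ωI² − μ², which holds because the Gaussian weights sum to one.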
Then, the image is sparsely represented (physiological research suggests that the human visual system perceives external visual signals in a sparse manner). For the input image I, image blocks are first extracted one by one and each block is sparsely represented. Let x_k ∈ Rⁿ denote the k-th image block, of size √n × √n; the extraction process can be expressed as:

x_k = R_k(I)

where R_k(·) is the image block extraction operator that extracts the image block at position k, with k = 1, 2, 3, …, N and N the total number of blocks.
For an image block x_k, its sparse representation over the dictionary D ∈ R^{n×m} is the sparse vector α_k ∈ R^m (most elements of α_k are 0 or close to 0) satisfying:

α̂_k = argmin_{α_k} ‖x_k − D α_k‖₂² + λ‖α_k‖_p

where the first term is a fidelity term, the second term is a sparsity constraint term, and λ is a constant balancing the proportion of the two terms; p is 0 or 1. With p = 0 the sparsity term counts the number of non-zero coefficients, which matches the sparsity we require; however, the 0-norm optimization problem is non-convex and hard to solve, and the usual alternative is to set p = 1, which turns the above expression into a convex optimization problem. Thus p is set to 1. Solving the above expression with the Orthogonal Matching Pursuit (OMP) algorithm yields the sparse representation coefficients α̂_k of the image block x_k, so x_k can be sparsely represented as x̂_k = D α̂_k, and the sparse representation of the entire image I can be written as:

I′ = ∑_k R_kᵀ(D α̂_k)

where I′ denotes the sparse representation of image I and R_kᵀ(·) places a reconstructed block back at position k. Then the representation residual E = I − I′ is calculated; the residual interval (−50, 50) is divided into 100 equal subintervals, each of width 1, and the number of residual values falling into each subinterval is taken as a feature, giving a 100-dimensional feature vector.
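The block-wise sparse coding and residual histogram of step S2 can be sketched as follows. The greedy OMP solver below is a minimal textbook implementation, not necessarily the exact solver used by the inventors; the dictionary `D` and the sparsity level `k` are assumptions supplied by the caller.

```python
import numpy as np

def omp(D, x, k):
    # Greedy Orthogonal Matching Pursuit: select at most k atoms of D.
    # D: (n, m) dictionary with roughly unit-norm columns; x: (n,) signal.
    residual = x.astype(np.float64).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))      # most correlated atom
        if j in support:
            break
        support.append(j)
        sub = D[:, support]
        coef, *_ = np.linalg.lstsq(sub, x, rcond=None)  # least-squares refit
        residual = x - sub @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def high_level_features(E, bins=100, lo=-50.0, hi=50.0):
    # Histogram of the representation residual E = I - I' over (-50, 50),
    # 100 subintervals of width 1, as described for step S2.
    hist, _ = np.histogram(E, bins=bins, range=(lo, hi))
    return hist.astype(np.float64)
```

In use, `omp` would be applied per extracted block and the reconstructed blocks reassembled into I′ before computing the residual histogram.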
An artificial neural network with four layers is designed: three hidden layers and a linear regression layer. The bottom layer of the network takes the extracted image features f₁, f₂, …, f_n as input, and the network outputs the image quality. The sizes of the hidden layers are 200, 40 and 6, respectively.
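A forward pass through the described network (hidden sizes 200, 40, 6, then a linear regression layer) might look like the sketch below. The 120-dimensional input (20 low-level + 100 high-level features) and the sigmoid hidden activations follow the description; the random initialisation is purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quality_network(features, weights, biases):
    # Forward pass: sigmoid hidden layers of size 200, 40, 6,
    # followed by a linear regression layer giving one quality score.
    a = features
    for W, b in zip(weights[:-1], biases[:-1]):
        a = sigmoid(a @ W + b)
    return (a @ weights[-1] + biases[-1]).item()

# hypothetical random initialisation; the input is the 120-dim feature
# vector (20 low-level + 100 high-level features)
rng = np.random.default_rng(0)
sizes = [120, 200, 40, 6, 1]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
```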
The designed network is trained in three steps. In the first step, each hidden layer is pre-trained by an unsupervised method: each hidden layer is trained as a sparse autoencoder using the L-BFGS algorithm, with the number of iterations set to 1000 and the sigmoid function as the activation function. The loss function used to train each layer is:

J(W, b) = (1/m) ∑_{i=1}^{m} ‖h_{W,b}(x_i) − x_i‖² + β ∑_j KL(ρ ‖ ρ̂_j) + γ‖W‖²

where

KL(ρ ‖ ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j))

and

ρ̂_j = (1/m) ∑_{i=1}^{m} a_j(x_i)

Here W denotes the network weights, b denotes the hidden-layer biases, h_{W,b}(·) denotes the output of the autoencoder, a_j(x_i) denotes the activation of hidden neuron j on input x_i, ρ̂_j denotes the average activation of neuron j, and ρ denotes the expected average activation. ρ is set to 0.1, β is set to 3, and γ is a weight-decay parameter set to 0.0001. The training loss function of the linear regression layer is:

L = (1/m) ∑_{i=1}^{m} (Y_i − Label_i)²

where Y denotes the output of the linear regression layer and Label denotes the subjective score of the image. After training is completed, for a new image, features are extracted and input into the network, and the network outputs its quality score.
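The pre-training loss described above (reconstruction error plus a KL sparsity penalty plus weight decay) can be written out as a numpy sketch. The per-term normalisation by the batch size m is an assumption; only the three terms and the constants ρ = 0.1, β = 3, γ = 0.0001 come from the description.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_ae_loss(W1, b1, W2, b2, X, rho=0.1, beta=3.0, gamma=1e-4):
    # Reconstruction error + beta * KL sparsity penalty + gamma * weight decay,
    # matching the three terms of the pre-training loss J(W, b).
    m = X.shape[0]
    A = sigmoid(X @ W1 + b1)        # hidden activations a_j(x_i)
    Xhat = sigmoid(A @ W2 + b2)     # reconstruction h_{W,b}(x_i)
    fidelity = np.sum((Xhat - X) ** 2) / m
    rho_hat = np.clip(A.mean(axis=0), 1e-8, 1 - 1e-8)  # average activations
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    decay = gamma * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return fidelity + beta * kl + decay
```

An optimizer such as L-BFGS would minimize this scalar with respect to (W1, b1, W2, b2) during pre-training of each layer.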
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.
Claims (9)
1. An image quality evaluation method combining low-level vision and high-level vision statistical characteristics is characterized by comprising the following steps:
s1, carrying out local normalization on the image, and extracting low-level visual statistical characteristics;
s2, sparsely representing the image, calculating a representation residual error, and extracting high-level visual statistical characteristics;
s3, training the artificial neural network, learning a mapping model of the image features extracted from steps S1 and S2 to image quality to predict the image quality.
2. The method for evaluating the image quality by combining the statistical characteristics of the low-level vision and the high-level vision as claimed in claim 1, wherein in step S1, the image is locally normalized by using the local mean and variance of the image to obtain a normalized coefficient image, and then a segment of the normalized coefficient distribution is intercepted as the low-level vision characteristic vector describing the image quality variation.
3. The method for evaluating image quality by combining low-level vision and high-level vision statistical features according to claim 2, wherein step S1 specifically comprises: locally normalizing the image to obtain local normalization coefficients, the normalization coefficient image being calculated as:

Î(x, y) = (I(x, y) − μ(x, y)) / (σ(x, y) + C)

wherein I is the input image, (x, y) denotes the pixel position, Î denotes the normalized-coefficient image, C is a small constant preventing division by zero, and μ(x, y) and σ(x, y) are the mean and deviation of the local region centered at (x, y), calculated as:

μ(x, y) = ∑_{s=−S}^{S} ∑_{t=−T}^{T} ω_{s,t} I(x + s, y + t)

σ(x, y) = √( ∑_{s=−S}^{S} ∑_{t=−T}^{T} ω_{s,t} [I(x + s, y + t) − μ(x, y)]² )

wherein ω = {ω_{s,t} | s = −S, …, S; t = −T, …, T} is a symmetric Gaussian filter, the local image block has width 2S and height 2T, and S and T both take the value 16; the interval [−2, 2] is then divided evenly into 20 subintervals with step 0.2, and the number of normalization coefficients falling into each subinterval forms the 20-dimensional low-level visual feature vector.
4. The method for evaluating image quality according to any of claims 1 to 3, wherein in step S2, the input image is sparsely represented, then a representation residual is calculated, and a segment of distribution in the distribution of the residual is intercepted as a high-level visual feature vector describing the image quality variation.
5. The method for evaluating image quality by combining low-level vision and high-level vision statistical features according to claim 4, wherein step S2 specifically comprises: first sparsely representing the image; for an image I, image blocks x_k ∈ Rⁿ of size √n × √n are extracted, the extraction process being expressed as:

x_k = R_k(I)

wherein R_k(·) is the image block extraction operator that extracts the image block at position k, with k = 1, 2, 3, …, N and N the total number of blocks;

for an image block x_k, its sparse representation over the dictionary D ∈ R^{n×m} is the sparse vector α_k ∈ R^m, most elements of which are 0 or close to 0, satisfying:

α̂_k = argmin_{α_k} ‖x_k − D α_k‖₂² + λ‖α_k‖_p

wherein the first term is a fidelity term, the second term is a sparsity constraint term, λ is a constant balancing the proportion of the two terms, and p is 0 or 1; p is preferably set to 1, whereby the above expression becomes a convex optimization problem, which is solved by the Orthogonal Matching Pursuit (OMP) algorithm to obtain the sparse representation coefficients α̂_k of the image block x_k; x_k is then sparsely represented as x̂_k = D α̂_k, and the sparse representation of the entire image I is:

I′ = ∑_k R_kᵀ(D α̂_k)

wherein I′ denotes the sparse representation of image I and R_kᵀ(·) places a reconstructed block back at position k; the sparse representation residual E = I − I′ is then calculated, the interval (−50, 50) is divided into 100 equal subintervals, each of width 1, and the number of residual values falling into each subinterval is taken as a feature, giving a 100-dimensional high-level visual feature vector.
6. The method for evaluating image quality according to any of claims 1 to 5, wherein in step S3, an artificial neural network with a four-layer structure is designed, comprising three hidden layers and a linear regression layer, and then the network is trained to obtain a model for mapping image characteristics to image quality, so as to use the model to predict the image quality.
7. The method for evaluating image quality by combining low-level vision and high-level vision statistical features according to claim 6, wherein step S3 specifically comprises: designing a neural network comprising four layers, namely three hidden layers and a linear regression layer, the bottom layer of the network taking the extracted image features f₁, f₂, …, f_n as input and the network outputting the image quality, the sizes of the hidden layers being 200, 40 and 6, respectively;

training the designed network in three steps: in the first step, each hidden layer is pre-trained by an unsupervised method, each hidden layer being trained as a sparse autoencoder using the L-BFGS algorithm, with the number of iterations set to 1000 and the sigmoid function as the activation function, the loss function used to train each layer being:

J(W, b) = (1/m) ∑_{i=1}^{m} ‖h_{W,b}(x_i) − x_i‖² + β ∑_j KL(ρ ‖ ρ̂_j) + γ‖W‖²

wherein

KL(ρ ‖ ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j))

and

ρ̂_j = (1/m) ∑_{i=1}^{m} a_j(x_i)

wherein W denotes the network weights, b denotes the hidden-layer biases, h_{W,b}(·) denotes the output of the autoencoder, a_j(x_i) denotes the activation of hidden neuron j on input x_i, ρ̂_j denotes the average activation of neuron j, and ρ denotes the expected average activation; ρ is set to 0.1, β is set to 3, and γ is a weight-decay parameter set to 0.0001; the training loss function of the linear regression layer is:

L = (1/m) ∑_{i=1}^{m} (Y_i − Label_i)²

wherein Y denotes the output of the linear regression layer and Label denotes the subjective score of the image; after training is completed, for a new image, features are extracted and input into the network, and the network outputs its quality score.
8. An image quality evaluation apparatus combining low-level vision and high-level vision statistical features, comprising a computer-readable storage medium and a processor, wherein the computer-readable storage medium stores an executable program, and wherein the executable program, when executed by the processor, implements the image quality evaluation method combining low-level vision and high-level vision statistical features according to any one of claims 1 to 7.
9. A computer-readable storage medium storing an executable program, wherein the executable program, when executed by a processor, implements the method for evaluating image quality by combining low-level vision and high-level vision statistical features according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010112724.0A CN111325733A (en) | 2020-02-24 | 2020-02-24 | Image quality evaluation method combining low-level vision and high-level vision statistical characteristics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111325733A true CN111325733A (en) | 2020-06-23 |
Family
ID=71165226
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112669270A (en) * | 2020-12-21 | 2021-04-16 | 北京金山云网络技术有限公司 | Video quality prediction method and device and server |
CN117611516A (en) * | 2023-09-04 | 2024-02-27 | 北京智芯微电子科技有限公司 | Image quality evaluation, face recognition, label generation and determination methods and devices |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | WW01 | Invention patent application withdrawn after publication | Application publication date: 20200623 |