CN115829971A - Fine-grained image blind quality evaluation method based on bilinear convolutional neural network - Google Patents

Fine-grained image blind quality evaluation method based on bilinear convolutional neural network

Info

Publication number
CN115829971A
CN115829971A
Authority
CN
China
Prior art keywords
image, fine-grained, representing, feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211547291.7A
Other languages
Chinese (zh)
Inventor
方玉明
刘丽霞
鄢杰斌
姜文晖
王耀南
吴成中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Communication Terminal Industry Technology Research Institute Co ltd
Jiangxi University of Finance and Economics
Original Assignee
Jiangxi Communication Terminal Industry Technology Research Institute Co ltd
Jiangxi University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Communication Terminal Industry Technology Research Institute Co ltd and Jiangxi University of Finance and Economics
Priority to CN202211547291.7A
Publication of CN115829971A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, which comprises the following steps: preprocessing an original image; acquiring the average subjective score of all data in a fine-grained database; extracting information sensitive to fine-grained features to obtain a feature map; compressing the feature map to obtain channel information descriptors, learning the descriptors to obtain a feature vector, and performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature; performing bilinear pooling on the channel-multiplication output feature to obtain a fine-grained quality difference image; obtaining the evaluation score corresponding to the fine-grained quality difference image; and comparing that evaluation score with the average subjective score to obtain the test indices. The method can effectively predict the quality of images with fine-grained distortion differences, which is of great significance for narrowing the gap between objective image quality evaluation and practical application.

Description

Fine-grained image blind quality evaluation method based on bilinear convolutional neural network
Technical Field
The invention relates to the technical field of computer vision and multimedia digital image processing, in particular to a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network.
Background
Image quality evaluation plays an irreplaceable role in multimedia processing tasks such as image acquisition, image compression, image restoration, and image enhancement, and is generally divided into subjective quality evaluation and objective quality evaluation. Subjective quality evaluation models are difficult to embed into practical applications, whereas objective quality evaluation models can easily be deployed in practice, for example as parameter tuners and system optimizers. Although the number of image quality evaluation databases keeps growing, the images in most existing databases are coarse-grained distorted images: because two adjacent distortion levels in a database are set to be distinguishable, the quality differences are easily recognized by humans. When the quality difference between distorted images is subtle (also referred to as fine-grained), humans also find it difficult to distinguish. In addition, existing quality evaluation models are designed around coarse-grained databases, so they cannot capture fine-grained distortion characteristics well, which limits the development of many applications. Therefore, developing a blind quality evaluation model for fine-grained images is essential for efficiently and accurately distinguishing fine-grained image differences, and has both practical value and academic research value.
To meet the high requirements that practical applications place on image quality identification, fast and accurate prediction of image quality is the key to assisting the upgrade and optimization of multimedia processing technology. Existing image quality evaluation models are trained on coarse-grained databases and perform well according to the test indices.
However, the statistics of existing coarse-grained databases mask fine-grained differences, so these models cannot be used directly to estimate the visual quality of fine-grained distorted images. It is therefore necessary to provide an efficient and accurate image quality evaluation method for evaluating the quality of fine-grained images.
Disclosure of Invention
In view of the above situation, the main objective of the present invention is to provide a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, so as to solve the above technical problems.
The embodiment of the invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, wherein the method is realized through a fine-grained image blind quality evaluation model comprising a feature extraction module, a compression excitation module, a bilinear pooling module, and a fully connected layer, and the method comprises the following steps:
step one, obtaining an original image with fine-grained quality difference, and performing image preprocessing on the original image;
step two, acquiring the average subjective score of all data in the fine-grained database by using a Bradley-Terry model;
step three, constructing a feature extraction module based on a sequence of convolutional layers, and extracting information sensitive to fine-grained features through the feature extraction module to obtain a feature map;
step four, constructing a compression excitation module, inputting the feature map into the compression excitation module, performing a global average pooling operation that compresses the feature map into channel information descriptors, learning the channel information descriptors with two fully connected layers to obtain a feature vector, and performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature;
step five, performing bilinear pooling on the channel-multiplication output feature with the bilinear pooling module to obtain a fine-grained quality difference image, and inputting the fine-grained quality difference image to the fully connected layer;
step six, obtaining the evaluation score corresponding to the fine-grained quality difference image through the fully connected layer;
step seven, comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score to obtain the test indices.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, which is characterized in that a constructed fine-grained image blind quality evaluation model can effectively predict the image quality of fine-grained distortion difference by designing a characteristic extraction module for extracting quality perception characteristics, a compression excitation module for improving characteristic representation capability and a bilinear pooling module for improving characteristic identification capability, and has important significance for reducing the difference between objective image quality evaluation and actual application.
In the fine-grained image blind quality evaluation method based on the bilinear convolutional neural network, the image preprocessing in step one comprises the following steps:
scaling the original image to a uniform size to facilitate model input;
cropping the scaled image to remove the edge region and retain the central square region;
and augmenting the cropped images in the training set with random-angle center rotation, random vertical flipping, and random horizontal flipping to prevent overfitting.
In the fine-grained image blind quality evaluation method based on the bilinear convolutional neural network, in the second step, the Bradley-Terry model is expressed as:

\gamma(i) = P(w_{i,j} = 1 \mid S) = \frac{e^{S_i}}{e^{S_i} + e^{S_j}}

where \gamma(i) denotes the preference probability of the i-th image, S_i the raw score of the i-th image, S_j the raw score of the j-th image, S the set of raw scores, and w_{i,j} the viewer's quality preference between the i-th and j-th images: w_{i,j} = 1 means the viewer considers the i-th image of better quality than the j-th image, and w_{i,j} = 0 means the viewer considers the j-th image of better quality than the i-th image, with

w_{i,j} = \frac{N(i,j)}{N(i,j) + N(j,i)}

where N(i,j) denotes how many times the i-th image was judged of better quality than the j-th image in the experiment, and N(j,i) how many times the j-th image was judged of better quality than the i-th image.
In the third step, the input feature of the feature extraction module is represented as:

X = [x_1, x_2, \ldots, x_C]

where X denotes the input feature of the feature extraction module, x_c its c-th channel, and X \in R^{H \times W \times C}, with R the set of real numbers, H the height of the feature, W the width of the feature, and C the number of channels;

the feature map finally obtained by the feature extraction module is expressed as:

U = [u_1, u_2, \ldots, u_C]

where U denotes the feature map, and

u_c = v_c * X = \sum_{k=1}^{C} v_c^k * x_k

where u_c denotes the feature map of the c-th channel, v_c the c-th 3-dimensional spatial kernel with C channels, v_c^k its k-th 2-dimensional spatial slice acting on the k-th channel x_k of the input feature, * the convolution operation, and c the channel index, c \in [1, C].
In the fourth step, the feature map is input into the compression excitation module, and the global average pooling (squeeze) operation that compresses the feature map into channel information descriptors corresponds to the following formula:

z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{m=1}^{H} \sum_{n=1}^{W} u_c(m, n)

where z_c denotes the c-th channel information descriptor, F_{sq}(\cdot) the squeeze operation, and u_c(m, n) the feature value at the m-th row and n-th column of the c-th channel of the feature map U.
In the fourth step, the step of learning the channel information descriptors with two fully connected layers to obtain the feature vector corresponds to the following formula:

S = F_{ex}(z) = \sigma(\delta(W_1 z) W_2)

where S denotes the feature vector, F_{ex}(\cdot) the excitation operation, z the channel information descriptor, \sigma the Sigmoid activation function, \delta the ReLU function, W_1 the weight matrix of the first fully connected layer, and W_2 the weight matrix of the second fully connected layer.
In the fourth step, the step of performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature corresponds to the following formula:

\tilde{x}_c = F_{scale}(u_c, S_c) = S_c \cdot u_c

where \tilde{x}_c denotes the channel-multiplication output feature of the c-th channel, F_{scale}(\cdot) the channel-wise multiplication between the feature vector and the feature map, and S_c the component of the feature vector for the c-th channel.
In the fifth step, the step of performing bilinear pooling on the channel-multiplication output feature with the bilinear pooling module to obtain the fine-grained quality difference image corresponds to the following formula:

bilinear(N, f) = \sum_{\varepsilon \in N} f_{\varepsilon}^{T} f_{\varepsilon}

where bilinear(\cdot) denotes the bilinear pooling operation, f_\varepsilon the input obtained by reshaping the channel-multiplication output feature, N the set of spatial positions in the fine-grained quality difference image, \varepsilon a position index with \varepsilon \in N, and T the transpose.
In the fine-grained image blind quality evaluation method based on the bilinear convolutional neural network, in the seventh step, the method for comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score comprises the following steps:
cropping the fine-grained quality difference image to 224 × 224, using stochastic gradient descent as the optimizer, setting the learning rate to 0.1, and setting the decay rate to 1e-5 according to a weight decay strategy;
inputting an original image and the corresponding fine-grained quality difference image, using the Margin Ranking Loss as the loss function, and comparing losses pairwise to obtain the loss value, according to the formula:

L(x_1, x_2, y) = \max(0, -y \cdot (x_1 - x_2) + margin)

where L(x_1, x_2, y) denotes the Margin Ranking Loss value, margin the required difference between the ranked image quality scores, x_1 the predicted score of the first input image, x_2 the predicted score of the second input image, and y the comparison label: y = 1 if the first image should rank above the second, otherwise y = -1.
In the fine-grained image blind quality evaluation method based on the bilinear convolutional neural network, in the seventh step, the test indices of the fine-grained image blind quality evaluation model comprise a prediction monotonicity index, a prediction accuracy index, and a pairwise preference consistency index;
the prediction monotonicity index comprises the Kendall correlation coefficient and the Spearman correlation coefficient, where the Kendall rank correlation coefficient KRCC is expressed as:

KRCC = \frac{N_c - N_d}{\frac{1}{2} N_{all} (N_{all} - 1)}

where N_{all} denotes the number of images to be ranked, N_c the number of image pairs on which the predicted and subjective results agree, and N_d the number of image pairs on which the predicted and subjective results disagree;
the Spearman rank correlation coefficient SRCC is expressed as:

SRCC = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N (N^2 - 1)}

where N denotes the number of distorted images in the data and d_i the rank difference between the subjective score and the objective prediction score of the i-th image;
the prediction accuracy index comprises the Pearson correlation coefficient, where the Pearson linear correlation coefficient PLCC is expressed as:

PLCC = \frac{\sum_{i} (S_i - \bar{S})(p_i - \bar{p})}{\sqrt{\sum_{i} (S_i - \bar{S})^2} \sqrt{\sum_{i} (p_i - \bar{p})^2}}

where S_i denotes the subjective score of the i-th image, p_i the objective prediction score of the i-th image, \bar{S} the mean subjective score, and \bar{p} the mean objective prediction score;
the pairwise preference consistency index comprises the pairwise preference consistency check coefficient P_{test}, expressed as:

P_{test} = \frac{M_c}{M}

where M_c denotes the number of image pairs with identical preference and M denotes the number of all image pairs.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flow chart of a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network according to the present invention;
fig. 2 is a schematic diagram of a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
These and other aspects of embodiments of the invention will be apparent with reference to the following description and attached drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the ways in which the principles of the embodiments of the invention may be practiced, but it is understood that the scope of the embodiments of the invention is not limited correspondingly. On the contrary, the embodiments of the invention include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
Referring to fig. 1 and fig. 2, the present invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, wherein the method is implemented by a fine-grained image blind quality evaluation model comprising a feature extraction module, a compression excitation module, a bilinear pooling module, and a fully connected layer, and the method includes the following steps:
the method comprises the steps of firstly, obtaining an original image with fine granularity quality difference, and carrying out image preprocessing on the original image.
The original images of the invention come from the FG-IQA2018 database, which comprises 100 source images from the Waterloo exploration database with resolutions ranging from 400 × 400 to 723 × 480. Each image is compressed to 3 distortion levels using four JPEG compression methods, where the four methods correspond to low, medium, and high bit-rate scenarios.
Before training the fine-grained image blind quality evaluation model, the original images need to be preprocessed; the corresponding method comprises the following steps, sketched in code after the list:
scaling the original image to a uniform size to facilitate model input;
cropping the scaled image to remove the edge region and retain the central square region;
and augmenting the cropped images in the training set with random-angle center rotation, random vertical flipping, and random horizontal flipping to prevent overfitting.
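A minimal preprocessing sketch with torchvision is given below; the uniform size of 256, the 224 center crop, and the 30° rotation range are assumptions, since the patent specifies only scaling to a uniform size, central square cropping, and the three augmentations.

```python
# A minimal preprocessing sketch using torchvision; the concrete sizes and
# rotation range are hypothetical choices, not values stated in the patent.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),           # scale to a uniform size
    transforms.CenterCrop(224),              # keep the central square region
    transforms.RandomRotation(degrees=30),   # random-angle center rotation
    transforms.RandomVerticalFlip(p=0.5),    # random vertical flip
    transforms.RandomHorizontalFlip(p=0.5),  # random horizontal flip
    transforms.ToTensor(),
])
```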
Step two, acquiring the average subjective score of all data in the fine-grained database by using a Bradley-Terry model.
In this step, the Bradley-Terry model is expressed as:

\gamma(i) = P(w_{i,j} = 1 \mid S) = \frac{e^{S_i}}{e^{S_i} + e^{S_j}}

where \gamma(i) denotes the preference probability of the i-th image, S_i the raw score of the i-th image, S_j the raw score of the j-th image, S the set of raw scores, and w_{i,j} the viewer's quality preference between the i-th and j-th images: w_{i,j} = 1 means the viewer considers the i-th image of better quality than the j-th image, and w_{i,j} = 0 means the viewer considers the j-th image of better quality than the i-th image, with

w_{i,j} = \frac{N(i,j)}{N(i,j) + N(j,i)}

where N(i,j) denotes how many times the i-th image was judged of better quality than the j-th image in the experiment, and N(j,i) how many times the j-th image was judged of better quality than the i-th image.
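To make the fitting step concrete, the following is a sketch of estimating Bradley-Terry raw scores from a matrix of pairwise comparison counts using the classical minorization-maximization iteration (Hunter, 2004); the count matrix and iteration count are hypothetical, and the patent does not prescribe a particular fitting algorithm.

```python
# A sketch of Bradley-Terry raw-score estimation from pairwise comparisons
# via the minorization-maximization (MM) iteration. `counts` is a
# hypothetical example: counts[i, j] = number of times image i was judged
# of better quality than image j. Assumes every image wins at least once.
import numpy as np

def bradley_terry_scores(counts: np.ndarray, iters: int = 100) -> np.ndarray:
    n = counts.shape[0]
    p = np.ones(n)                     # p_i = exp(S_i), uniform start
    wins = counts.sum(axis=1)          # total wins of each image
    for _ in range(iters):
        denom = np.zeros(n)
        for i in range(n):
            for j in range(n):
                if i != j:
                    m = counts[i, j] + counts[j, i]  # comparisons of (i, j)
                    denom[i] += m / (p[i] + p[j])
        p = wins / denom
        p /= p.sum()                   # fix the arbitrary scale
    return np.log(p)                   # raw scores S_i

counts = np.array([[0, 3, 5],
                   [2, 0, 4],
                   [1, 2, 0]])
scores = bradley_terry_scores(counts)
# Preference probability of image i over image j under the fitted model:
# P(i beats j) = exp(S_i) / (exp(S_i) + exp(S_j)).
```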
Step three, constructing a feature extraction module based on a sequence of convolutional layers, and extracting information sensitive to fine-grained features through the feature extraction module to obtain a feature map.
The feature extraction module comprises three convolution groups, where each convolution group comprises 2 or 3 convolutional layers with 3 × 3 filters; ReLU is used as the activation function after every convolution operation, which significantly reduces the computational complexity. Since the receptive field of the human visual system is the main functional and structural unit of signal processing, a max-pooling layer is placed only after each of the first two convolution groups, to preserve more perceptual information.
In this step, the input feature of the feature extraction module is represented as:

X = [x_1, x_2, \ldots, x_C]

where X denotes the input feature of the feature extraction module, x_c its c-th channel, and X \in R^{H \times W \times C}, with R the set of real numbers, H the height of the feature, W the width of the feature, and C the number of channels;

the feature map finally obtained by the feature extraction module is expressed as:

U = [u_1, u_2, \ldots, u_C]

where U denotes the feature map, and

u_c = v_c * X = \sum_{k=1}^{C} v_c^k * x_k

where u_c denotes the feature map of the c-th channel, v_c the c-th 3-dimensional spatial kernel with C channels, v_c^k its k-th 2-dimensional spatial slice acting on the k-th channel x_k of the input feature, * the convolution operation, and c the channel index, c \in [1, C].
Step four, constructing a compression excitation module, inputting the feature map into the compression excitation module, performing a global average pooling operation that compresses the feature map into channel information descriptors, learning the channel information descriptors with two fully connected layers to obtain a feature vector, and performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature.
In this step, the feature map is input into the compression excitation module, and the global average pooling (squeeze) operation that compresses it into channel information descriptors corresponds to the following formula:

z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{m=1}^{H} \sum_{n=1}^{W} u_c(m, n)

where z_c denotes the c-th channel information descriptor, F_{sq}(\cdot) the squeeze operation, and u_c(m, n) the feature value at the m-th row and n-th column of the c-th channel of the feature map U.
Further, the step of learning the channel information descriptors with two fully connected layers to obtain the feature vector corresponds to the following formula:

S = F_{ex}(z) = \sigma(\delta(W_1 z) W_2)

where S denotes the feature vector, F_{ex}(\cdot) the excitation operation, z the channel information descriptor, \sigma the Sigmoid activation function, \delta the ReLU function, W_1 the weight matrix of the first fully connected layer, and W_2 the weight matrix of the second fully connected layer.
Finally, the step of performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature corresponds to the following formula:

\tilde{x}_c = F_{scale}(u_c, S_c) = S_c \cdot u_c

where \tilde{x}_c denotes the channel-multiplication output feature of the c-th channel, F_{scale}(\cdot) the channel-wise multiplication between the feature vector and the feature map, and S_c the component of the feature vector for the c-th channel.
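The compression excitation module is, in effect, a squeeze-and-excitation block; a minimal PyTorch sketch follows, where the reduction ratio r = 16 between the two fully connected layers is an assumption (the patent does not state the hidden width).

```python
# A sketch of the compression excitation (squeeze-and-excitation) module:
# global average pooling (squeeze), two fully connected layers (excitation),
# then channel-wise rescaling of the feature map.
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    def __init__(self, channels: int, r: int = 16):  # r is hypothetical
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // r)  # W1
        self.fc2 = nn.Linear(channels // r, channels)  # W2
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        b, c, h, w = u.shape
        z = u.mean(dim=(2, 3))                               # squeeze: F_sq
        s = self.sigmoid(self.fc2(self.relu(self.fc1(z))))   # excitation: F_ex
        return u * s.view(b, c, 1, 1)                        # F_scale
```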
Step five, performing bilinear pooling on the channel-multiplication output feature with the bilinear pooling module to obtain a fine-grained quality difference image, and inputting the fine-grained quality difference image to the fully connected layer.
In the fifth step, the step of performing bilinear pooling on the channel-multiplication output feature with the bilinear pooling module to obtain the fine-grained quality difference image corresponds to the following formula:

bilinear(N, f) = \sum_{\varepsilon \in N} f_{\varepsilon}^{T} f_{\varepsilon}

where bilinear(\cdot) denotes the bilinear pooling operation, f_\varepsilon the input obtained by reshaping the channel-multiplication output feature, N the set of spatial positions in the fine-grained quality difference image, \varepsilon a position index with \varepsilon \in N, and T the transpose.
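Bilinear pooling of the channel-multiplication output can be sketched as below; the normalization by the number of positions and the signed square root followed by L2 normalization are common post-processing steps for bilinear features, and are assumptions rather than details stated in the patent.

```python
# A sketch of bilinear pooling: the HxW spatial positions are flattened,
# and the CxC outer products of the channel vectors f_eps are summed over
# all positions eps in N.
import torch
import torch.nn.functional as F

def bilinear_pool(x: torch.Tensor) -> torch.Tensor:
    b, c, h, w = x.shape
    f = x.view(b, c, h * w)                  # f_eps for every position
    bi = torch.bmm(f, f.transpose(1, 2))     # sum of outer products, BxCxC
    bi = bi.view(b, -1) / (h * w)            # flatten and normalize
    # Signed square root + L2 normalization (an assumed post-processing
    # step, commonly used with bilinear features):
    bi = torch.sign(bi) * torch.sqrt(torch.abs(bi) + 1e-10)
    return F.normalize(bi)
```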
Step six, obtaining the evaluation score corresponding to the fine-grained quality difference image through the fully connected layer.
Step seven, comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score to obtain the test indices.
In this step, the method for comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score includes the following steps:
clipping the fine-grained quality difference image to 224 multiplied by 224, using random gradient descent as an optimizer, setting the learning rate to 0.1, and setting the attenuation rate to 1e according to a weight attenuation strategy -5
inputting an original image and the corresponding fine-grained quality difference image, using the Margin Ranking Loss as the loss function, and comparing losses pairwise to obtain the loss value, according to the formula:

L(x_1, x_2, y) = \max(0, -y \cdot (x_1 - x_2) + margin)

where L(x_1, x_2, y) denotes the Margin Ranking Loss value, margin the required difference between the ranked image quality scores, x_1 the predicted score of the first input image, x_2 the predicted score of the second input image, and y the comparison label: y = 1 if the first image should rank above the second, otherwise y = -1.
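PyTorch's nn.MarginRankingLoss implements exactly max(0, -y·(x1 - x2) + margin) with labels y ∈ {1, -1}; a minimal sketch of one pairwise training step follows, where the margin value and the placeholder score tensors are illustrative only (in the full model the scores would come from the fully connected layer).

```python
# A sketch of one pairwise comparison step with margin ranking loss; the
# margin value and score tensors are hypothetical placeholders. The SGD
# learning rate and weight decay follow the values stated in the patent.
import torch
import torch.nn as nn

criterion = nn.MarginRankingLoss(margin=0.1)   # hypothetical margin

s1 = torch.tensor([0.8], requires_grad=True)   # predicted score, image 1
s2 = torch.tensor([0.5], requires_grad=True)   # predicted score, image 2
y = torch.tensor([1.0])                        # label: image 1 ranks higher

optimizer = torch.optim.SGD([s1, s2], lr=0.1, weight_decay=1e-5)

loss = criterion(s1, s2, y)                    # max(0, -y*(s1-s2) + margin)
loss.backward()
optimizer.step()
```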
Further, the test indices of the fine-grained image blind quality evaluation model comprise a prediction monotonicity index, a prediction accuracy index, and a pairwise preference consistency index;
the prediction monotonicity index comprises the Kendall correlation coefficient and the Spearman correlation coefficient, where the Kendall rank correlation coefficient KRCC is expressed as:

KRCC = \frac{N_c - N_d}{\frac{1}{2} N_{all} (N_{all} - 1)}

where N_{all} denotes the number of images to be ranked, N_c the number of image pairs on which the predicted and subjective results agree, and N_d the number of image pairs on which the predicted and subjective results disagree;
the Spearman rank correlation coefficient SRCC is expressed as:

SRCC = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N (N^2 - 1)}

where N denotes the number of distorted images in the data and d_i the rank difference between the subjective score and the objective prediction score of the i-th image;
the prediction accuracy index comprises the Pearson correlation coefficient, where the Pearson linear correlation coefficient PLCC is expressed as:

PLCC = \frac{\sum_{i} (S_i - \bar{S})(p_i - \bar{p})}{\sqrt{\sum_{i} (S_i - \bar{S})^2} \sqrt{\sum_{i} (p_i - \bar{p})^2}}

where S_i denotes the subjective score of the i-th image, p_i the objective prediction score of the i-th image, \bar{S} the mean subjective score, and \bar{p} the mean objective prediction score;
the pairwise preference consistency index comprises the pairwise preference consistency check coefficient P_{test}, expressed as:

P_{test} = \frac{M_c}{M}

where M_c denotes the number of image pairs with identical preference and M denotes the number of all image pairs.
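The four indices can be computed directly with scipy; the sketch below uses hypothetical score arrays, and the pairwise preference consistency coefficient is computed by brute-force enumeration of all image pairs.

```python
# A sketch of computing the test indices; `subj` (mean subjective scores)
# and `pred` (model predictions) are hypothetical example arrays.
import numpy as np
from scipy import stats

subj = np.array([1.2, 3.4, 2.2, 4.1, 0.7])
pred = np.array([1.0, 3.1, 2.5, 4.0, 0.9])

krcc, _ = stats.kendalltau(subj, pred)    # prediction monotonicity (KRCC)
srcc, _ = stats.spearmanr(subj, pred)     # prediction monotonicity (SRCC)
plcc, _ = stats.pearsonr(subj, pred)      # prediction accuracy (PLCC)

# Pairwise preference consistency P_test = M_c / M: the fraction of image
# pairs ordered identically by the subjective and predicted scores.
pairs = [(i, j) for i in range(len(subj)) for j in range(i + 1, len(subj))]
m_c = sum(np.sign(subj[i] - subj[j]) == np.sign(pred[i] - pred[j])
          for i, j in pairs)
p_test = m_c / len(pairs)

print(f"KRCC={krcc:.3f} SRCC={srcc:.3f} PLCC={plcc:.3f} P_test={p_test:.3f}")
```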
The invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, which aims to:
1. designing a dedicated feature extraction module to capture quality-aware features, acquiring information sensitive to fine-grained features, guiding the model to distinguish subtle differences among fine-grained images, enhancing the discriminability of the features, and providing a meaningful feature extraction method for the field of image processing;
2. modeling the interdependency between feature map channels to improve the sensitivity of the features to differences between images, which benefits the development of deep learning models for image processing and promotes the continuous optimization and upgrading of related practical applications.
Therefore, a blind image quality evaluation method that efficiently and accurately distinguishes fine-grained image differences can greatly promote the development of both the image quality evaluation field and the broader field of computer vision.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, which is characterized in that a constructed fine-grained image blind quality evaluation model can effectively predict the image quality of fine-grained distortion difference by designing a characteristic extraction module for extracting quality perception characteristics, a compression excitation module for improving characteristic representation capability and a bilinear pooling module for improving characteristic identification capability, and has important significance for reducing the difference between objective image quality evaluation and actual application.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, characterized in that the method is realized through a fine-grained image blind quality evaluation model comprising a feature extraction module, a compression excitation module, a bilinear pooling module, and a fully connected layer, and the method comprises the following steps:
step one, obtaining an original image with fine-grained quality difference, and performing image preprocessing on the original image;
step two, acquiring the average subjective score of all data in the fine-grained database by using a Bradley-Terry model;
step three, constructing a feature extraction module based on a sequence of convolutional layers, and extracting information sensitive to fine-grained features through the feature extraction module to obtain a feature map;
step four, constructing a compression excitation module, inputting the feature map into the compression excitation module, performing a global average pooling operation that compresses the feature map into channel information descriptors, learning the channel information descriptors with two fully connected layers to obtain a feature vector, and performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature;
step five, performing bilinear pooling on the channel-multiplication output feature with the bilinear pooling module to obtain a fine-grained quality difference image, and inputting the fine-grained quality difference image to the fully connected layer;
step six, obtaining the evaluation score corresponding to the fine-grained quality difference image through the fully connected layer;
step seven, comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score to obtain the test indices.
2. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 1, wherein in step one, the method for preprocessing the original image comprises the following steps:
scaling the original image to a uniform size to facilitate model input;
cropping the scaled image to remove the edge region and retain the central square region;
and augmenting the cropped images in the training set with random-angle center rotation, random vertical flipping, and random horizontal flipping to prevent overfitting.
3. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 2, wherein in the second step, the Bradley-Terry model is expressed as:

\gamma(i) = P(w_{i,j} = 1 \mid S) = \frac{e^{S_i}}{e^{S_i} + e^{S_j}}

where \gamma(i) denotes the preference probability of the i-th image, S_i the raw score of the i-th image, S_j the raw score of the j-th image, S the set of raw scores, and w_{i,j} the viewer's quality preference between the i-th and j-th images: w_{i,j} = 1 means the viewer considers the i-th image of better quality than the j-th image, and w_{i,j} = 0 means the viewer considers the j-th image of better quality than the i-th image, with

w_{i,j} = \frac{N(i,j)}{N(i,j) + N(j,i)}

where N(i,j) denotes how many times the i-th image was judged of better quality than the j-th image in the experiment, and N(j,i) how many times the j-th image was judged of better quality than the i-th image.
4. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 3, wherein in the third step, the input feature of the feature extraction module is represented as:

X = [x_1, x_2, \ldots, x_C]

where X denotes the input feature of the feature extraction module, x_c its c-th channel, and X \in R^{H \times W \times C}, with R the set of real numbers, H the height of the feature, W the width of the feature, and C the number of channels;

the feature map finally obtained by the feature extraction module is expressed as:

U = [u_1, u_2, \ldots, u_C]

where U denotes the feature map, and

u_c = v_c * X = \sum_{k=1}^{C} v_c^k * x_k

where u_c denotes the feature map of the c-th channel, v_c the c-th 3-dimensional spatial kernel with C channels, v_c^k its k-th 2-dimensional spatial slice acting on the k-th channel x_k of the input feature, * the convolution operation, and c the channel index, c \in [1, C].
5. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 4, wherein in the fourth step, the feature map is input into the compression excitation module, and the global average pooling (squeeze) operation that compresses the feature map into channel information descriptors corresponds to the following formula:

z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{m=1}^{H} \sum_{n=1}^{W} u_c(m, n)

where z_c denotes the c-th channel information descriptor, F_{sq}(\cdot) the squeeze operation, and u_c(m, n) the feature value at the m-th row and n-th column of the c-th channel of the feature map U.
6. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 5, wherein in the fourth step, the step of learning the channel information descriptors with two fully connected layers to obtain the feature vector corresponds to the following formula:

S = F_{ex}(z) = \sigma(\delta(W_1 z) W_2)

where S denotes the feature vector, F_{ex}(\cdot) the excitation operation, z the channel information descriptor, \sigma the Sigmoid activation function, \delta the ReLU function, W_1 the weight matrix of the first fully connected layer, and W_2 the weight matrix of the second fully connected layer.
7. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 6, wherein in the fourth step, the step of performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature corresponds to the following formula:

\tilde{x}_c = F_{scale}(u_c, S_c) = S_c \cdot u_c

where \tilde{x}_c denotes the channel-multiplication output feature of the c-th channel, F_{scale}(\cdot) the channel-wise multiplication between the feature vector and the feature map, and S_c the component of the feature vector for the c-th channel.
8. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 7, wherein in the fifth step, the step of performing bilinear pooling on the channel-multiplication output feature with the bilinear pooling module to obtain the fine-grained quality difference image corresponds to the following formula:

bilinear(N, f) = \sum_{\varepsilon \in N} f_{\varepsilon}^{T} f_{\varepsilon}

where bilinear(\cdot) denotes the bilinear pooling operation, f_\varepsilon the input obtained by reshaping the channel-multiplication output feature, N the set of spatial positions in the fine-grained quality difference image, \varepsilon a position index with \varepsilon \in N, and T the transpose.
9. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 8, wherein in the seventh step, the method for comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score comprises the following steps:
cropping the fine-grained quality difference image to 224 × 224, using stochastic gradient descent as the optimizer, setting the learning rate to 0.1, and setting the decay rate to 1e-5 according to a weight decay strategy;
inputting an original image and the corresponding fine-grained quality difference image, using the Margin Ranking Loss as the loss function, and comparing losses pairwise to obtain the loss value, according to the formula:

L(x_1, x_2, y) = \max(0, -y \cdot (x_1 - x_2) + margin)

where L(x_1, x_2, y) denotes the Margin Ranking Loss value, margin the required difference between the ranked image quality scores, x_1 the predicted score of the first input image, x_2 the predicted score of the second input image, and y the comparison label: y = 1 if the first image should rank above the second, otherwise y = -1.
10. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 9, wherein in the seventh step, the test indices of the fine-grained image blind quality evaluation model comprise a prediction monotonicity index, a prediction accuracy index, and a pairwise preference consistency index;
the prediction monotonicity index comprises the Kendall correlation coefficient and the Spearman correlation coefficient, where the Kendall rank correlation coefficient KRCC is expressed as:

KRCC = \frac{N_c - N_d}{\frac{1}{2} N_{all} (N_{all} - 1)}

where N_{all} denotes the number of images to be ranked, N_c the number of image pairs on which the predicted and subjective results agree, and N_d the number of image pairs on which the predicted and subjective results disagree;
the Spearman rank correlation coefficient SRCC is expressed as:

SRCC = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N (N^2 - 1)}

where N denotes the number of distorted images in the data and d_i the rank difference between the subjective score and the objective prediction score of the i-th image;
the prediction accuracy index comprises the Pearson correlation coefficient, where the Pearson linear correlation coefficient PLCC is expressed as:

PLCC = \frac{\sum_{i} (S_i - \bar{S})(p_i - \bar{p})}{\sqrt{\sum_{i} (S_i - \bar{S})^2} \sqrt{\sum_{i} (p_i - \bar{p})^2}}

where S_i denotes the subjective score of the i-th image, p_i the objective prediction score of the i-th image, \bar{S} the mean subjective score, and \bar{p} the mean objective prediction score;
the pairwise preference consistency index comprises the pairwise preference consistency check coefficient, denoted P_{test} and expressed as:

P_{test} = \frac{M_c}{M}

where M_c denotes the number of image pairs with identical preference and M denotes the number of all image pairs.
CN202211547291.7A 2022-12-02 2022-12-02 Fine-grained image blind quality evaluation method based on bilinear convolutional neural network Pending CN115829971A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211547291.7A CN115829971A (en) 2022-12-02 2022-12-02 Fine-grained image blind quality evaluation method based on bilinear convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211547291.7A CN115829971A (en) 2022-12-02 2022-12-02 Fine-grained image blind quality evaluation method based on bilinear convolutional neural network

Publications (1)

Publication Number Publication Date
CN115829971A 2023-03-21

Family

ID=85543956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211547291.7A Pending CN115829971A (en) 2022-12-02 2022-12-02 Fine-grained image blind quality evaluation method based on bilinear convolutional neural network

Country Status (1)

Country Link
CN (1) CN115829971A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180286032A1 (en) * 2017-04-04 2018-10-04 Board Of Regents, The University Of Texas System Assessing quality of images or videos using a two-stage quality assessment
CN113111940A (en) * 2021-04-13 2021-07-13 东南大学 Expression recognition method based on feature fusion
CN114549492A (en) * 2022-02-27 2022-05-27 北京工业大学 Quality evaluation method based on multi-granularity image information content

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180286032A1 (en) * 2017-04-04 2018-10-04 Board Of Regents, The University Of Texas System Assessing quality of images or videos using a two-stage quality assessment
CN113111940A (en) * 2021-04-13 2021-07-13 东南大学 Expression recognition method based on feature fusion
CN114549492A (en) * 2022-02-27 2022-05-27 北京工业大学 Quality evaluation method based on multi-granularity image information content

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIXIA LIU et al.: "Bilinear CNNs for Blind Quality Assessment of Fine-Grained Images", 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), pages 1-6 *
张维夏: "Blind image quality assessment based on feature aggregation and data-driven learning", China Doctoral Dissertations Full-text Database, pages 138-45 *

Similar Documents

Publication Publication Date Title
CN110189334B (en) Medical image segmentation method of residual error type full convolution neural network based on attention mechanism
CN110021425B (en) Comparison detector, construction method thereof and cervical cancer cell detection method
CN111461232A (en) Nuclear magnetic resonance image classification method based on multi-strategy batch type active learning
CN111931931B (en) Deep neural network training method and device for pathology full-field image
CN112116605A (en) Pancreas CT image segmentation method based on integrated depth convolution neural network
CN114897779B (en) Cervical cytology image abnormal region positioning method and device based on fusion attention
CN108875076B (en) Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network
CN113378796B (en) Cervical cell full-section classification method based on context modeling
CN115018824A (en) Colonoscope polyp image segmentation method based on CNN and Transformer fusion
CN113610144A (en) Vehicle classification method based on multi-branch local attention network
CN110751644B (en) Road surface crack detection method
CN113112446A (en) Tunnel surrounding rock level intelligent judgment method based on residual convolutional neural network
CN110738663A (en) Double-domain adaptive module pyramid network and unsupervised domain adaptive image segmentation method
CN113706544B (en) Medical image segmentation method based on complete attention convolutional neural network
CN110930378A (en) Emphysema image processing method and system based on low data demand
CN114266757A (en) Diabetic retinopathy classification method based on multi-scale fusion attention mechanism
CN116525075A (en) Thyroid nodule computer-aided diagnosis method and system based on few sample learning
CN114495210A (en) Posture change face recognition method based on attention mechanism
CN114140437A (en) Fundus hard exudate segmentation method based on deep learning
CN113469961A (en) Neural network-based carpal tunnel image segmentation method and system
CN110992309B (en) Fundus image segmentation method based on deep information transfer network
CN115829971A (en) Fine-grained image blind quality evaluation method based on bilinear convolutional neural network
CN113971764B (en) Remote sensing image small target detection method based on improvement YOLOv3
CN113192076B (en) MRI brain tumor image segmentation method combining classification prediction and multi-scale feature extraction
CN115063602A (en) Crop pest and disease identification method based on improved YOLOX-S network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination