CN115829971A - Fine-grained image blind quality evaluation method based on bilinear convolutional neural network - Google Patents

Fine-grained image blind quality evaluation method based on bilinear convolutional neural network

Info

Publication number
CN115829971A
CN115829971A
Authority
CN
China
Prior art keywords
image, fine-grained, representing, feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211547291.7A
Other languages
Chinese (zh)
Inventor
方玉明
刘丽霞
鄢杰斌
姜文晖
王耀南
吴成中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Communication Terminal Industry Technology Research Institute Co ltd
Jiangxi University of Finance and Economics
Original Assignee
Jiangxi Communication Terminal Industry Technology Research Institute Co ltd
Jiangxi University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Communication Terminal Industry Technology Research Institute Co ltd and Jiangxi University of Finance and Economics
Priority to CN202211547291.7A
Publication of CN115829971A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, which comprises the following steps: preprocessing an original image; acquiring the average subjective score of all data in a fine-grained database; extracting information sensitive to fine-grained features to obtain a feature map; compressing the feature map to obtain channel information descriptors, learning the descriptors to obtain a feature vector, and performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature; performing bilinear pooling on the channel-multiplication output feature to obtain a fine-grained quality difference image; obtaining the evaluation score corresponding to the fine-grained quality difference image; and comparing that evaluation score with the average subjective score to obtain the test indices. The method can effectively predict the quality of images with fine-grained distortion differences, which is of great significance for narrowing the gap between objective image quality evaluation and practical application.

Description

Fine-grained image blind quality evaluation method based on bilinear convolutional neural network
Technical Field
The invention relates to the technical field of computer vision and multimedia digital image processing, in particular to a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network.
Background
Image quality evaluation plays an irreplaceable role in multimedia processing tasks such as image acquisition, image compression, image restoration, and image enhancement, and is generally divided into subjective quality evaluation and objective quality evaluation. Subjective quality evaluation models are difficult to embed into practical applications, whereas objective quality evaluation models can easily be deployed in practice, for example as parameter tuners and system optimizers. Although the number of image quality evaluation databases keeps growing, the images in most existing databases are coarse-grained distorted images: because two adjacent distortion levels in a database are set to be distinguishable, the quality differences are easily recognized by humans. When the quality difference between distorted images is subtle (also referred to as fine-grained), humans also find it difficult to distinguish. In addition, existing quality evaluation models are designed around coarse-grained databases, so they cannot capture fine-grained distortion characteristics well, which limits the development of many applications. Therefore, developing a blind quality evaluation model for fine-grained images is essential for efficiently and accurately distinguishing fine-grained image differences, and has both practical value and academic research value.
To meet the high requirements that practical applications place on image quality identification, fast and accurate prediction of image quality is the key to assisting the upgrade and optimization of multimedia processing technology. Existing image quality evaluation models are trained on coarse-grained databases and perform well according to the test indices.
However, the statistics of existing coarse-grained databases mask fine-grained differences, so these models cannot be used directly to estimate the visual quality of fine-grained distorted images. It is therefore necessary to provide an efficient and accurate image quality evaluation method for evaluating the quality of fine-grained images.
Disclosure of Invention
In view of the above situation, the main objective of the present invention is to provide a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, so as to solve the above technical problems.
The embodiment of the invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, wherein the method is realized through a fine-grained image blind quality evaluation model comprising a feature extraction module, a compression excitation module, a bilinear pooling module, and a fully connected layer, and the method comprises the following steps:
step one, obtaining an original image with fine-grained quality difference, and performing image preprocessing on the original image;
step two, acquiring the average subjective score of all data in the fine-grained database by using a Bradley-Terry model;
step three, constructing a feature extraction module based on a sequence of convolutional layers, and extracting information sensitive to fine-grained features through the feature extraction module to obtain a feature map;
step four, constructing a compression excitation module, inputting the feature map into the compression excitation module, performing a global average pooling operation that compresses the feature map into channel information descriptors, learning the channel information descriptors with two fully connected layers to obtain a feature vector, and performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature;
step five, performing bilinear pooling on the channel-multiplication output feature with the bilinear pooling module to obtain a fine-grained quality difference image, and inputting the fine-grained quality difference image to the fully connected layer;
step six, obtaining the evaluation score corresponding to the fine-grained quality difference image through the fully connected layer;
step seven, comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score to obtain the test indices.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, which is characterized in that a constructed fine-grained image blind quality evaluation model can effectively predict the image quality of fine-grained distortion difference by designing a characteristic extraction module for extracting quality perception characteristics, a compression excitation module for improving characteristic representation capability and a bilinear pooling module for improving characteristic identification capability, and has important significance for reducing the difference between objective image quality evaluation and actual application.
In the fine-grained image blind quality evaluation method based on the bilinear convolutional neural network, the image preprocessing in step one comprises the following steps:
scaling the original image to a uniform size to facilitate model input;
cropping the scaled image to remove the edge region and retain the central square region;
and augmenting the cropped images in the training set with random-angle center rotation, random vertical flipping, and random horizontal flipping to prevent overfitting.
In the fine-grained image blind quality evaluation method based on the bilinear convolutional neural network, in the second step, the Bradley-Terry model is expressed as:

\gamma(i) = P(w_{i,j} = 1 \mid S) = \frac{e^{S_i}}{e^{S_i} + e^{S_j}}

where \gamma(i) denotes the preference probability of the i-th image, S_i the raw score of the i-th image, S_j the raw score of the j-th image, S the set of raw scores, and w_{i,j} the viewer's quality preference between the i-th and j-th images: w_{i,j} = 1 means the viewer considers the i-th image of better quality than the j-th image, and w_{i,j} = 0 means the viewer considers the j-th image of better quality than the i-th image, with

w_{i,j} = \frac{N(i,j)}{N(i,j) + N(j,i)}

where N(i,j) denotes how many times the i-th image was judged of better quality than the j-th image in the experiment, and N(j,i) how many times the j-th image was judged of better quality than the i-th image.
In the third step, the input feature of the feature extraction module is represented as:

X = [x_1, x_2, \ldots, x_C]

where X denotes the input feature of the feature extraction module, x_c its c-th channel, and X \in R^{H \times W \times C}, with R the set of real numbers, H the height of the feature, W the width of the feature, and C the number of channels;

the feature map finally obtained by the feature extraction module is expressed as:

U = [u_1, u_2, \ldots, u_C]

where U denotes the feature map, and

u_c = v_c * X = \sum_{k=1}^{C} v_c^k * x_k

where u_c denotes the feature map of the c-th channel, v_c the c-th 3-dimensional spatial kernel with C channels, v_c^k its k-th 2-dimensional spatial slice acting on the k-th channel x_k of the input feature, * the convolution operation, and c the channel index, c \in [1, C].
In the fourth step, the feature map is input into the compression excitation module, and the global average pooling (squeeze) operation that compresses the feature map into channel information descriptors corresponds to the following formula:

z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{m=1}^{H} \sum_{n=1}^{W} u_c(m, n)

where z_c denotes the c-th channel information descriptor, F_{sq}(\cdot) the squeeze operation, and u_c(m, n) the feature value at the m-th row and n-th column of the c-th channel of the feature map U.
In the fourth step, the step of learning the channel information descriptors with two fully connected layers to obtain the feature vector corresponds to the following formula:

S = F_{ex}(z) = \sigma(\delta(W_1 z) W_2)

where S denotes the feature vector, F_{ex}(\cdot) the excitation operation, z the channel information descriptor, \sigma the Sigmoid activation function, \delta the ReLU function, W_1 the weight matrix of the first fully connected layer, and W_2 the weight matrix of the second fully connected layer.
In the fourth step, the step of performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature corresponds to the following formula:

\tilde{x}_c = F_{scale}(u_c, S_c) = S_c \cdot u_c

where \tilde{x}_c denotes the channel-multiplication output feature of the c-th channel, F_{scale}(\cdot) the channel-wise multiplication between the feature vector and the feature map, and S_c the component of the feature vector for the c-th channel.
In the fifth step, the step of performing bilinear pooling on the channel-multiplication output feature with the bilinear pooling module to obtain the fine-grained quality difference image corresponds to the following formula:

bilinear(N, f) = \sum_{\varepsilon \in N} f_{\varepsilon}^{T} f_{\varepsilon}

where bilinear(\cdot) denotes the bilinear pooling operation, f_\varepsilon the input obtained by reshaping the channel-multiplication output feature, N the set of spatial positions in the fine-grained quality difference image, \varepsilon a position index with \varepsilon \in N, and T the transpose.
In the fine-grained image blind quality evaluation method based on the bilinear convolutional neural network, in the seventh step, the method for comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score comprises the following steps:
cropping the fine-grained quality difference image to 224 × 224, using stochastic gradient descent as the optimizer, setting the learning rate to 0.1, and setting the decay rate to 1e-5 according to a weight decay strategy;
inputting an original image and the corresponding fine-grained quality difference image, using the Margin Ranking Loss as the loss function, and comparing losses pairwise to obtain the loss value, according to the formula:

L(x_1, x_2, y) = \max(0, -y \cdot (x_1 - x_2) + margin)

where L(x_1, x_2, y) denotes the Margin Ranking Loss value, margin the required difference between the ranked image quality scores, x_1 the predicted score of the first input image, x_2 the predicted score of the second input image, and y the comparison label: y = 1 if the first image should rank above the second, otherwise y = -1.
In the fine-grained image blind quality evaluation method based on the bilinear convolutional neural network, in the seventh step, the test indices of the fine-grained image blind quality evaluation model comprise a prediction monotonicity index, a prediction accuracy index, and a pairwise preference consistency index;
the prediction monotonicity index comprises the Kendall correlation coefficient and the Spearman correlation coefficient, where the Kendall rank correlation coefficient KRCC is expressed as:

KRCC = \frac{N_c - N_d}{\frac{1}{2} N_{all} (N_{all} - 1)}

where N_{all} denotes the number of images to be ranked, N_c the number of image pairs on which the predicted and subjective results agree, and N_d the number of image pairs on which the predicted and subjective results disagree;
the Spearman rank correlation coefficient SRCC is expressed as:

SRCC = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N (N^2 - 1)}

where N denotes the number of distorted images in the data and d_i the rank difference between the subjective score and the objective prediction score of the i-th image;
the prediction accuracy index comprises the Pearson correlation coefficient, where the Pearson linear correlation coefficient PLCC is expressed as:

PLCC = \frac{\sum_{i} (S_i - \bar{S})(p_i - \bar{p})}{\sqrt{\sum_{i} (S_i - \bar{S})^2} \sqrt{\sum_{i} (p_i - \bar{p})^2}}

where S_i denotes the subjective score of the i-th image, p_i the objective prediction score of the i-th image, \bar{S} the mean subjective score, and \bar{p} the mean objective prediction score;
the pairwise preference consistency index comprises the pairwise preference consistency check coefficient P_{test}, expressed as:

P_{test} = \frac{M_c}{M}

where M_c denotes the number of image pairs with identical preference and M denotes the number of all image pairs.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flow chart of a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network according to the present invention;
fig. 2 is a schematic diagram of a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
These and other aspects of embodiments of the invention will be apparent with reference to the following description and attached drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the ways in which the principles of the embodiments of the invention may be practiced, but it is understood that the scope of the embodiments of the invention is not limited correspondingly. On the contrary, the embodiments of the invention include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
Referring to fig. 1 and fig. 2, the present invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, wherein the method is implemented by a fine-grained image blind quality evaluation model comprising a feature extraction module, a compression excitation module, a bilinear pooling module, and a fully connected layer, and the method includes the following steps:
the method comprises the steps of firstly, obtaining an original image with fine granularity quality difference, and carrying out image preprocessing on the original image.
The original images of the invention come from the FG-IQA2018 database, which comprises 100 source images from the Waterloo exploration database with resolutions ranging from 400 × 400 to 723 × 480. Each image is compressed to 3 distortion levels using four JPEG compression methods, where the four methods correspond to low, medium, and high bit-rate scenarios.
Before training the fine-grained image blind quality evaluation model, the original images need to be preprocessed; the corresponding method comprises the following steps, sketched in code after the list:
scaling the original image to a uniform size to facilitate model input;
cropping the scaled image to remove the edge region and retain the central square region;
and augmenting the cropped images in the training set with random-angle center rotation, random vertical flipping, and random horizontal flipping to prevent overfitting.
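A minimal preprocessing sketch with torchvision is given below; the uniform size of 256, the 224 center crop, and the 30° rotation range are assumptions, since the patent specifies only scaling to a uniform size, central square cropping, and the three augmentations.

```python
# A minimal preprocessing sketch using torchvision; the concrete sizes and
# rotation range are hypothetical choices, not values stated in the patent.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),           # scale to a uniform size
    transforms.CenterCrop(224),              # keep the central square region
    transforms.RandomRotation(degrees=30),   # random-angle center rotation
    transforms.RandomVerticalFlip(p=0.5),    # random vertical flip
    transforms.RandomHorizontalFlip(p=0.5),  # random horizontal flip
    transforms.ToTensor(),
])
```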
Step two, acquiring the average subjective score of all data in the fine-grained database by using a Bradley-Terry model.
In this step, the Bradley-Terry model is expressed as:

\gamma(i) = P(w_{i,j} = 1 \mid S) = \frac{e^{S_i}}{e^{S_i} + e^{S_j}}

where \gamma(i) denotes the preference probability of the i-th image, S_i the raw score of the i-th image, S_j the raw score of the j-th image, S the set of raw scores, and w_{i,j} the viewer's quality preference between the i-th and j-th images: w_{i,j} = 1 means the viewer considers the i-th image of better quality than the j-th image, and w_{i,j} = 0 means the viewer considers the j-th image of better quality than the i-th image, with

w_{i,j} = \frac{N(i,j)}{N(i,j) + N(j,i)}

where N(i,j) denotes how many times the i-th image was judged of better quality than the j-th image in the experiment, and N(j,i) how many times the j-th image was judged of better quality than the i-th image.
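To make the fitting step concrete, the following is a sketch of estimating Bradley-Terry raw scores from a matrix of pairwise comparison counts using the classical minorization-maximization iteration (Hunter, 2004); the count matrix and iteration count are hypothetical, and the patent does not prescribe a particular fitting algorithm.

```python
# A sketch of Bradley-Terry raw-score estimation from pairwise comparisons
# via the minorization-maximization (MM) iteration. `counts` is a
# hypothetical example: counts[i, j] = number of times image i was judged
# of better quality than image j. Assumes every image wins at least once.
import numpy as np

def bradley_terry_scores(counts: np.ndarray, iters: int = 100) -> np.ndarray:
    n = counts.shape[0]
    p = np.ones(n)                     # p_i = exp(S_i), uniform start
    wins = counts.sum(axis=1)          # total wins of each image
    for _ in range(iters):
        denom = np.zeros(n)
        for i in range(n):
            for j in range(n):
                if i != j:
                    m = counts[i, j] + counts[j, i]  # comparisons of (i, j)
                    denom[i] += m / (p[i] + p[j])
        p = wins / denom
        p /= p.sum()                   # fix the arbitrary scale
    return np.log(p)                   # raw scores S_i

counts = np.array([[0, 3, 5],
                   [2, 0, 4],
                   [1, 2, 0]])
scores = bradley_terry_scores(counts)
# Preference probability of image i over image j under the fitted model:
# P(i beats j) = exp(S_i) / (exp(S_i) + exp(S_j)).
```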
Step three, constructing a feature extraction module based on a sequence of convolutional layers, and extracting information sensitive to fine-grained features through the feature extraction module to obtain a feature map.
The feature extraction module comprises three convolution groups, where each convolution group comprises 2 or 3 convolutional layers with 3 × 3 filters; ReLU is used as the activation function after every convolution operation, which significantly reduces the computational complexity. Since the receptive field of the human visual system is the main functional and structural unit of signal processing, a max-pooling layer is placed only after each of the first two convolution groups, to preserve more perceptual information.
In this step, the input feature of the feature extraction module is represented as:

X = [x_1, x_2, \ldots, x_C]

where X denotes the input feature of the feature extraction module, x_c its c-th channel, and X \in R^{H \times W \times C}, with R the set of real numbers, H the height of the feature, W the width of the feature, and C the number of channels;

the feature map finally obtained by the feature extraction module is expressed as:

U = [u_1, u_2, \ldots, u_C]

where U denotes the feature map, and

u_c = v_c * X = \sum_{k=1}^{C} v_c^k * x_k

where u_c denotes the feature map of the c-th channel, v_c the c-th 3-dimensional spatial kernel with C channels, v_c^k its k-th 2-dimensional spatial slice acting on the k-th channel x_k of the input feature, * the convolution operation, and c the channel index, c \in [1, C].
Step four, constructing a compression excitation module, inputting the feature map into the compression excitation module, performing a global average pooling operation that compresses the feature map into channel information descriptors, learning the channel information descriptors with two fully connected layers to obtain a feature vector, and performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature.
In this step, the feature map is input into the compression excitation module, and the global average pooling (squeeze) operation that compresses it into channel information descriptors corresponds to the following formula:

z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{m=1}^{H} \sum_{n=1}^{W} u_c(m, n)

where z_c denotes the c-th channel information descriptor, F_{sq}(\cdot) the squeeze operation, and u_c(m, n) the feature value at the m-th row and n-th column of the c-th channel of the feature map U.
Further, the step of learning the channel information descriptors with two fully connected layers to obtain the feature vector corresponds to the following formula:

S = F_{ex}(z) = \sigma(\delta(W_1 z) W_2)

where S denotes the feature vector, F_{ex}(\cdot) the excitation operation, z the channel information descriptor, \sigma the Sigmoid activation function, \delta the ReLU function, W_1 the weight matrix of the first fully connected layer, and W_2 the weight matrix of the second fully connected layer.
Finally, the step of performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature corresponds to the following formula:

\tilde{x}_c = F_{scale}(u_c, S_c) = S_c \cdot u_c

where \tilde{x}_c denotes the channel-multiplication output feature of the c-th channel, F_{scale}(\cdot) the channel-wise multiplication between the feature vector and the feature map, and S_c the component of the feature vector for the c-th channel.
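The compression excitation module is, in effect, a squeeze-and-excitation block; a minimal PyTorch sketch follows, where the reduction ratio r = 16 between the two fully connected layers is an assumption (the patent does not state the hidden width).

```python
# A sketch of the compression excitation (squeeze-and-excitation) module:
# global average pooling (squeeze), two fully connected layers (excitation),
# then channel-wise rescaling of the feature map.
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    def __init__(self, channels: int, r: int = 16):  # r is hypothetical
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // r)  # W1
        self.fc2 = nn.Linear(channels // r, channels)  # W2
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        b, c, h, w = u.shape
        z = u.mean(dim=(2, 3))                               # squeeze: F_sq
        s = self.sigmoid(self.fc2(self.relu(self.fc1(z))))   # excitation: F_ex
        return u * s.view(b, c, 1, 1)                        # F_scale
```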
Step five, performing bilinear pooling on the channel-multiplication output feature with the bilinear pooling module to obtain a fine-grained quality difference image, and inputting the fine-grained quality difference image to the fully connected layer.
In the fifth step, the step of performing bilinear pooling on the channel-multiplication output feature with the bilinear pooling module to obtain the fine-grained quality difference image corresponds to the following formula:

bilinear(N, f) = \sum_{\varepsilon \in N} f_{\varepsilon}^{T} f_{\varepsilon}

where bilinear(\cdot) denotes the bilinear pooling operation, f_\varepsilon the input obtained by reshaping the channel-multiplication output feature, N the set of spatial positions in the fine-grained quality difference image, \varepsilon a position index with \varepsilon \in N, and T the transpose.
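Bilinear pooling of the channel-multiplication output can be sketched as below; the normalization by the number of positions and the signed square root followed by L2 normalization are common post-processing steps for bilinear features, and are assumptions rather than details stated in the patent.

```python
# A sketch of bilinear pooling: the HxW spatial positions are flattened,
# and the CxC outer products of the channel vectors f_eps are summed over
# all positions eps in N.
import torch
import torch.nn.functional as F

def bilinear_pool(x: torch.Tensor) -> torch.Tensor:
    b, c, h, w = x.shape
    f = x.view(b, c, h * w)                  # f_eps for every position
    bi = torch.bmm(f, f.transpose(1, 2))     # sum of outer products, BxCxC
    bi = bi.view(b, -1) / (h * w)            # flatten and normalize
    # Signed square root + L2 normalization (an assumed post-processing
    # step, commonly used with bilinear features):
    bi = torch.sign(bi) * torch.sqrt(torch.abs(bi) + 1e-10)
    return F.normalize(bi)
```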
Step six, obtaining the evaluation score corresponding to the fine-grained quality difference image through the fully connected layer.
Step seven, comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score to obtain the test indices.
In this step, the method for comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score includes the following steps:
clipping the fine-grained quality difference image to 224 multiplied by 224, using random gradient descent as an optimizer, setting the learning rate to 0.1, and setting the attenuation rate to 1e according to a weight attenuation strategy -5
inputting an original image and the corresponding fine-grained quality difference image, using the Margin Ranking Loss as the loss function, and comparing losses pairwise to obtain the loss value, according to the formula:

L(x_1, x_2, y) = \max(0, -y \cdot (x_1 - x_2) + margin)

where L(x_1, x_2, y) denotes the Margin Ranking Loss value, margin the required difference between the ranked image quality scores, x_1 the predicted score of the first input image, x_2 the predicted score of the second input image, and y the comparison label: y = 1 if the first image should rank above the second, otherwise y = -1.
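PyTorch's nn.MarginRankingLoss implements exactly max(0, -y·(x1 - x2) + margin) with labels y ∈ {1, -1}; a minimal sketch of one pairwise training step follows, where the margin value and the placeholder score tensors are illustrative only (in the full model the scores would come from the fully connected layer).

```python
# A sketch of one pairwise comparison step with margin ranking loss; the
# margin value and score tensors are hypothetical placeholders. The SGD
# learning rate and weight decay follow the values stated in the patent.
import torch
import torch.nn as nn

criterion = nn.MarginRankingLoss(margin=0.1)   # hypothetical margin

s1 = torch.tensor([0.8], requires_grad=True)   # predicted score, image 1
s2 = torch.tensor([0.5], requires_grad=True)   # predicted score, image 2
y = torch.tensor([1.0])                        # label: image 1 ranks higher

optimizer = torch.optim.SGD([s1, s2], lr=0.1, weight_decay=1e-5)

loss = criterion(s1, s2, y)                    # max(0, -y*(s1-s2) + margin)
loss.backward()
optimizer.step()
```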
Further, the test indices of the fine-grained image blind quality evaluation model comprise a prediction monotonicity index, a prediction accuracy index, and a pairwise preference consistency index;
the prediction monotonicity index comprises the Kendall correlation coefficient and the Spearman correlation coefficient, where the Kendall rank correlation coefficient KRCC is expressed as:

KRCC = \frac{N_c - N_d}{\frac{1}{2} N_{all} (N_{all} - 1)}

where N_{all} denotes the number of images to be ranked, N_c the number of image pairs on which the predicted and subjective results agree, and N_d the number of image pairs on which the predicted and subjective results disagree;
the Spearman rank correlation coefficient SRCC is expressed as:

SRCC = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N (N^2 - 1)}

where N denotes the number of distorted images in the data and d_i the rank difference between the subjective score and the objective prediction score of the i-th image;
the prediction accuracy index comprises the Pearson correlation coefficient, where the Pearson linear correlation coefficient PLCC is expressed as:

PLCC = \frac{\sum_{i} (S_i - \bar{S})(p_i - \bar{p})}{\sqrt{\sum_{i} (S_i - \bar{S})^2} \sqrt{\sum_{i} (p_i - \bar{p})^2}}

where S_i denotes the subjective score of the i-th image, p_i the objective prediction score of the i-th image, \bar{S} the mean subjective score, and \bar{p} the mean objective prediction score;
the pairwise preference consistency index comprises the pairwise preference consistency check coefficient P_{test}, expressed as:

P_{test} = \frac{M_c}{M}

where M_c denotes the number of image pairs with identical preference and M denotes the number of all image pairs.
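The four indices can be computed directly with scipy; the sketch below uses hypothetical score arrays, and the pairwise preference consistency coefficient is computed by brute-force enumeration of all image pairs.

```python
# A sketch of computing the test indices; `subj` (mean subjective scores)
# and `pred` (model predictions) are hypothetical example arrays.
import numpy as np
from scipy import stats

subj = np.array([1.2, 3.4, 2.2, 4.1, 0.7])
pred = np.array([1.0, 3.1, 2.5, 4.0, 0.9])

krcc, _ = stats.kendalltau(subj, pred)    # prediction monotonicity (KRCC)
srcc, _ = stats.spearmanr(subj, pred)     # prediction monotonicity (SRCC)
plcc, _ = stats.pearsonr(subj, pred)      # prediction accuracy (PLCC)

# Pairwise preference consistency P_test = M_c / M: the fraction of image
# pairs ordered identically by the subjective and predicted scores.
pairs = [(i, j) for i in range(len(subj)) for j in range(i + 1, len(subj))]
m_c = sum(np.sign(subj[i] - subj[j]) == np.sign(pred[i] - pred[j])
          for i, j in pairs)
p_test = m_c / len(pairs)

print(f"KRCC={krcc:.3f} SRCC={srcc:.3f} PLCC={plcc:.3f} P_test={p_test:.3f}")
```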
The invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, which aims to:
1. designing a dedicated feature extraction module to capture quality-aware features, acquiring information sensitive to fine-grained features, guiding the model to distinguish subtle differences among fine-grained images, enhancing the discriminability of the features, and providing a meaningful feature extraction method for the field of image processing;
2. modeling the interdependency between feature map channels to improve the sensitivity of the features to differences between images, which benefits the development of deep learning models for image processing and promotes the continuous optimization and upgrading of related practical applications.
Therefore, a blind image quality evaluation method that efficiently and accurately distinguishes fine-grained image differences can greatly promote the development of both the image quality evaluation field and the broader field of computer vision.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, which is characterized in that a constructed fine-grained image blind quality evaluation model can effectively predict the image quality of fine-grained distortion difference by designing a characteristic extraction module for extracting quality perception characteristics, a compression excitation module for improving characteristic representation capability and a bilinear pooling module for improving characteristic identification capability, and has important significance for reducing the difference between objective image quality evaluation and actual application.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, characterized in that the method is realized through a fine-grained image blind quality evaluation model comprising a feature extraction module, a compression excitation module, a bilinear pooling module, and a fully connected layer, and the method comprises the following steps:
step one, obtaining an original image with fine-grained quality difference, and performing image preprocessing on the original image;
step two, acquiring the average subjective score of all data in the fine-grained database by using a Bradley-Terry model;
step three, constructing a feature extraction module based on a sequence of convolutional layers, and extracting information sensitive to fine-grained features through the feature extraction module to obtain a feature map;
step four, constructing a compression excitation module, inputting the feature map into the compression excitation module, performing a global average pooling operation that compresses the feature map into channel information descriptors, learning the channel information descriptors with two fully connected layers to obtain a feature vector, and performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature;
step five, performing bilinear pooling on the channel-multiplication output feature with the bilinear pooling module to obtain a fine-grained quality difference image, and inputting the fine-grained quality difference image to the fully connected layer;
step six, obtaining the evaluation score corresponding to the fine-grained quality difference image through the fully connected layer;
step seven, comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score to obtain the test indices.
2. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 1, wherein in step one, the method for preprocessing the original image comprises the following steps:
scaling the original image to a uniform size to facilitate model input;
cropping the scaled image to remove the edge region and retain the central square region;
and augmenting the cropped images in the training set with random-angle center rotation, random vertical flipping, and random horizontal flipping to prevent overfitting.
3. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 2, wherein in the second step, the Bradley-Terry model is expressed as:

\gamma(i) = P(w_{i,j} = 1 \mid S) = \frac{e^{S_i}}{e^{S_i} + e^{S_j}}

where \gamma(i) denotes the preference probability of the i-th image, S_i the raw score of the i-th image, S_j the raw score of the j-th image, S the set of raw scores, and w_{i,j} the viewer's quality preference between the i-th and j-th images: w_{i,j} = 1 means the viewer considers the i-th image of better quality than the j-th image, and w_{i,j} = 0 means the viewer considers the j-th image of better quality than the i-th image, with

w_{i,j} = \frac{N(i,j)}{N(i,j) + N(j,i)}

where N(i,j) denotes how many times the i-th image was judged of better quality than the j-th image in the experiment, and N(j,i) how many times the j-th image was judged of better quality than the i-th image.
4. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 3, wherein in the third step, the input feature of the feature extraction module is represented as:

X = [x_1, x_2, \ldots, x_C]

where X denotes the input feature of the feature extraction module, x_c its c-th channel, and X \in R^{H \times W \times C}, with R the set of real numbers, H the height of the feature, W the width of the feature, and C the number of channels;

the feature map finally obtained by the feature extraction module is expressed as:

U = [u_1, u_2, \ldots, u_C]

where U denotes the feature map, and

u_c = v_c * X = \sum_{k=1}^{C} v_c^k * x_k

where u_c denotes the feature map of the c-th channel, v_c the c-th 3-dimensional spatial kernel with C channels, v_c^k its k-th 2-dimensional spatial slice acting on the k-th channel x_k of the input feature, * the convolution operation, and c the channel index, c \in [1, C].
5. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 4, wherein in the fourth step, the feature map is input into the compression excitation module, and the global average pooling (squeeze) operation that compresses the feature map into channel information descriptors corresponds to the following formula:

z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{m=1}^{H} \sum_{n=1}^{W} u_c(m, n)

where z_c denotes the c-th channel information descriptor, F_{sq}(\cdot) the squeeze operation, and u_c(m, n) the feature value at the m-th row and n-th column of the c-th channel of the feature map U.
6. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 5, wherein in the fourth step, the step of learning the channel information descriptors with two fully connected layers to obtain the feature vector corresponds to the following formula:

S = F_{ex}(z) = \sigma(\delta(W_1 z) W_2)

where S denotes the feature vector, F_{ex}(\cdot) the excitation operation, z the channel information descriptor, \sigma the Sigmoid activation function, \delta the ReLU function, W_1 the weight matrix of the first fully connected layer, and W_2 the weight matrix of the second fully connected layer.
7. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 6, wherein in the fourth step, the step of performing channel-wise multiplication between the feature map and the feature vector to obtain the channel-multiplication output feature corresponds to the following formula:

\tilde{x}_c = F_{scale}(u_c, S_c) = S_c \cdot u_c

where \tilde{x}_c denotes the channel-multiplication output feature of the c-th channel, F_{scale}(\cdot) the channel-wise multiplication between the feature vector and the feature map, and S_c the component of the feature vector for the c-th channel.
8. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 7, wherein in the fifth step, the step of performing bilinear pooling on the channel-multiplication output feature with the bilinear pooling module to obtain the fine-grained quality difference image corresponds to the following formula:

bilinear(N, f) = \sum_{\varepsilon \in N} f_{\varepsilon}^{T} f_{\varepsilon}

where bilinear(\cdot) denotes the bilinear pooling operation, f_\varepsilon the input obtained by reshaping the channel-multiplication output feature, N the set of spatial positions in the fine-grained quality difference image, \varepsilon a position index with \varepsilon \in N, and T the transpose.
9. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 8, wherein in the seventh step, the method for comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score comprises the following steps:
cropping the fine-grained quality difference image to 224 × 224, using stochastic gradient descent as the optimizer, setting the learning rate to 0.1, and setting the decay rate to 1e-5 according to a weight decay strategy;
inputting an original image and the corresponding fine-grained quality difference image, using the Margin Ranking Loss as the loss function, and comparing losses pairwise to obtain the loss value, according to the formula:

L(x_1, x_2, y) = \max(0, -y \cdot (x_1 - x_2) + margin)

where L(x_1, x_2, y) denotes the Margin Ranking Loss value, margin the required difference between the ranked image quality scores, x_1 the predicted score of the first input image, x_2 the predicted score of the second input image, and y the comparison label: y = 1 if the first image should rank above the second, otherwise y = -1.
10. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 9, wherein in the seventh step, the test indices of the fine-grained image blind quality evaluation model comprise a prediction monotonicity index, a prediction accuracy index, and a pairwise preference consistency index;
the prediction monotonicity index comprises the Kendall correlation coefficient and the Spearman correlation coefficient, where the Kendall rank correlation coefficient KRCC is expressed as:

KRCC = \frac{N_c - N_d}{\frac{1}{2} N_{all} (N_{all} - 1)}

where N_{all} denotes the number of images to be ranked, N_c the number of image pairs on which the predicted and subjective results agree, and N_d the number of image pairs on which the predicted and subjective results disagree;
the Spearman rank correlation coefficient SRCC is expressed as:

SRCC = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N (N^2 - 1)}

where N denotes the number of distorted images in the data and d_i the rank difference between the subjective score and the objective prediction score of the i-th image;
the prediction accuracy index comprises the Pearson correlation coefficient, where the Pearson linear correlation coefficient PLCC is expressed as:

PLCC = \frac{\sum_{i} (S_i - \bar{S})(p_i - \bar{p})}{\sqrt{\sum_{i} (S_i - \bar{S})^2} \sqrt{\sum_{i} (p_i - \bar{p})^2}}

where S_i denotes the subjective score of the i-th image, p_i the objective prediction score of the i-th image, \bar{S} the mean subjective score, and \bar{p} the mean objective prediction score;
the pairwise preference consistency index comprises the pairwise preference consistency check coefficient, denoted P_{test} and expressed as:

P_{test} = \frac{M_c}{M}

where M_c denotes the number of image pairs with identical preference and M denotes the number of all image pairs.
CN202211547291.7A 2022-12-02 2022-12-02 Fine-grained image blind quality evaluation method based on bilinear convolutional neural network Pending CN115829971A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211547291.7A CN115829971A (en) 2022-12-02 2022-12-02 Fine-grained image blind quality evaluation method based on bilinear convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211547291.7A CN115829971A (en) 2022-12-02 2022-12-02 Fine-grained image blind quality evaluation method based on bilinear convolutional neural network

Publications (1)

Publication Number Publication Date
CN115829971A 2023-03-21

Family

ID=85543956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211547291.7A Pending CN115829971A (en) 2022-12-02 2022-12-02 Fine-grained image blind quality evaluation method based on bilinear convolutional neural network

Country Status (1)

Country Link
CN (1) CN115829971A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180286032A1 (en) * 2017-04-04 2018-10-04 Board Of Regents, The University Of Texas System Assessing quality of images or videos using a two-stage quality assessment
CN113111940A (en) * 2021-04-13 2021-07-13 东南大学 Expression recognition method based on feature fusion
CN114549492A (en) * 2022-02-27 2022-05-27 北京工业大学 Quality evaluation method based on multi-granularity image information content

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180286032A1 (en) * 2017-04-04 2018-10-04 Board Of Regents, The University Of Texas System Assessing quality of images or videos using a two-stage quality assessment
CN113111940A (en) * 2021-04-13 2021-07-13 东南大学 Expression recognition method based on feature fusion
CN114549492A (en) * 2022-02-27 2022-05-27 北京工业大学 Quality evaluation method based on multi-granularity image information content

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIXIA LIU et al.: "Bilinear CNNs for Blind Quality Assessment of Fine-Grained Images", 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), pages 1-6 *
张维夏: "Blind image quality assessment based on feature aggregation and data-driven learning", China Doctoral Dissertations Full-text Database, pages 138-45 *

Similar Documents

Publication Publication Date Title
CN110189334B (en) Medical image segmentation method of residual error type full convolution neural network based on attention mechanism
CN110021425B (en) Comparison detector, construction method thereof and cervical cancer cell detection method
CN111461232A (en) Nuclear magnetic resonance image classification method based on multi-strategy batch type active learning
CN111931931B (en) Deep neural network training method and device for pathology full-field image
CN112116605A (en) Pancreas CT image segmentation method based on integrated depth convolution neural network
CN114897779B (en) Cervical cytology image abnormal region positioning method and device based on fusion attention
CN108875076B (en) Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network
CN113378796B (en) Cervical cell full-section classification method based on context modeling
CN115018824A (en) Colonoscope polyp image segmentation method based on CNN and Transformer fusion
CN113610144A (en) Vehicle classification method based on multi-branch local attention network
CN110751644B (en) Road surface crack detection method
CN113112446A (en) Tunnel surrounding rock level intelligent judgment method based on residual convolutional neural network
CN110738663A (en) Double-domain adaptive module pyramid network and unsupervised domain adaptive image segmentation method
CN113706544B (en) Medical image segmentation method based on complete attention convolutional neural network
CN110930378A (en) Emphysema image processing method and system based on low data demand
CN114266757A (en) Diabetic retinopathy classification method based on multi-scale fusion attention mechanism
CN116525075A (en) Thyroid nodule computer-aided diagnosis method and system based on few sample learning
CN114495210A (en) Posture change face recognition method based on attention mechanism
CN114140437A (en) Fundus hard exudate segmentation method based on deep learning
CN113469961A (en) Neural network-based carpal tunnel image segmentation method and system
CN110992309B (en) Fundus image segmentation method based on deep information transfer network
CN115829971A (en) Fine-grained image blind quality evaluation method based on bilinear convolutional neural network
CN113971764B (en) Remote sensing image small target detection method based on improvement YOLOv3
CN113192076B (en) MRI brain tumor image segmentation method combining classification prediction and multi-scale feature extraction
CN115063602A (en) Crop pest and disease identification method based on improved YOLOX-S network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination