CN115829971A - Fine-grained image blind quality evaluation method based on bilinear convolutional neural network - Google Patents
- Publication number
- CN115829971A (application CN202211547291.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, which comprises the following steps: performing image preprocessing on an original image; acquiring the average subjective score of all data in a fine-grained database; extracting information content sensitive to fine-grained features to obtain a feature map; compressing the feature map to obtain channel information descriptors, learning the channel information descriptors to obtain a feature vector, and performing channel-wise multiplication of the feature map and the feature vector to obtain channel multiplication output features; performing bilinear pooling on the channel multiplication output features to obtain a fine-grained quality difference image; obtaining an evaluation score corresponding to the fine-grained quality difference image; and comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score to obtain each test index. The method effectively predicts the quality of images with fine-grained distortion differences, which is of great significance for narrowing the gap between objective image quality evaluation and practical application.
Description
Technical Field
The invention relates to the technical field of computer vision and multimedia digital image processing, in particular to a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network.
Background
Image quality evaluation plays an irreplaceable role in multimedia processing tasks such as image acquisition, image compression, image restoration, and image enhancement, and is generally divided into subjective quality evaluation and objective quality evaluation. A subjective quality evaluation model is difficult to embed into a practical application, whereas an objective quality evaluation model can easily be deployed in practice, for example as a parameter tuner or a system optimizer. Although the number of image quality evaluation databases keeps increasing, the images in most existing databases are coarse-grained distorted images: adjacent distortion levels in these databases are set far enough apart that the differences are easily recognized by humans. When the quality difference between distorted images is small (also referred to as fine-grained), the difference is difficult even for humans to distinguish. In addition, existing quality evaluation models are designed around coarse-grained databases, so they cannot capture fine-grained distortion characteristics well, which limits the development of many applications. Therefore, developing a blind quality evaluation model for fine-grained images is very important for efficiently and accurately distinguishing fine-grained image differences, and has both practical value and academic research value.
To meet the high requirements that practical applications place on image quality identification, fast and accurate prediction of image quality is key to upgrading and optimizing multimedia processing technology. Existing image quality evaluation models are trained on coarse-grained databases and perform well according to their test index results.
However, the statistics of existing coarse-grained databases mask fine-grained differences, so these models cannot be used directly to estimate the visual quality of fine-grained distorted images. It is therefore necessary to provide an efficient and accurate image quality evaluation method for evaluating the quality of fine-grained images.
Disclosure of Invention
In view of the above situation, the main objective of the present invention is to provide a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, so as to solve the above technical problems.
The embodiment of the invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, wherein the fine-grained image blind quality evaluation method is realized through a fine-grained image blind quality evaluation model, the fine-grained image blind quality evaluation model comprises a feature extraction module, a compression excitation module, a bilinear pooling module and a full connection layer, and the method comprises the following steps:
step one, obtaining an original image with fine granularity quality difference, and carrying out image preprocessing on the original image;
step two, acquiring the average subjective score of all data in the fine-grained database by using a Bradley-Terry model;
step three, constructing a feature extraction module based on the convolutional layer sequence, and extracting information content sensitive to fine-grained features through the feature extraction module to obtain feature mapping;
step four, constructing a compression excitation module, inputting the feature mapping into the compression excitation module, performing global average pooling operation on the feature mapping to obtain a channel information descriptor through compression, learning the channel information descriptor by utilizing two full-connection layers to obtain a feature vector, and performing channel multiplication calculation on the feature mapping and the feature vector to obtain a channel multiplication output feature;
step five, performing bilinear pooling on the channel multiplication output characteristics by using a bilinear pooling module to obtain a fine-grained quality difference image, and inputting the fine-grained quality difference image to a full connection layer;
step six, obtaining an evaluation score corresponding to the fine-grained quality difference image through a full connection layer;
and step seven, comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score to obtain each test index.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, which is characterized in that a constructed fine-grained image blind quality evaluation model can effectively predict the image quality of fine-grained distortion difference by designing a characteristic extraction module for extracting quality perception characteristics, a compression excitation module for improving characteristic representation capability and a bilinear pooling module for improving characteristic identification capability, and has important significance for reducing the difference between objective image quality evaluation and actual application.
In the fine-grained image blind quality evaluation method based on the bilinear convolutional neural network, the image preprocessing in step one comprises the following steps:
scaling the image size of the original image to a uniform size to facilitate model input;
cropping the scaled image to remove the image edge region and retain the central square region of the image;
and performing data augmentation on the cropped images in the training set by random-angle center rotation, random vertical flipping, and random horizontal flipping to prevent overfitting.
The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network is characterized in that in the second step, the expression of the Bradley-Terry model is as follows:
γ(i) = e^{S_i} / (e^{S_i} + e^{S_j})

where γ(i) represents the preference probability of the i-th image, S_i represents the raw score of the i-th image, S_j represents the raw score of the j-th image, and S represents the set of raw scores; w_{i,j} represents the viewer's quality preference between the i-th and j-th images, with w_{i,j} = 1 meaning that the viewer considers the i-th image to be of better quality than the j-th image and w_{i,j} = 0 meaning that the viewer considers the j-th image to be of better quality than the i-th image; w_{i,j} = N(i,j) / (N(i,j) + N(j,i)), where N(i,j) represents the number of times the i-th image was judged to be of better quality than the j-th image in the experiment and N(j,i) represents the number of times the j-th image was judged to be of better quality than the i-th image.
In the third step, the input features of the feature extraction module are represented as:

X = [x_1, x_2, ..., x_C]

where X represents the input feature of the feature extraction module, x_c represents the c-th channel of the input features, X ∈ R^{H×W×C}, R represents the set of real numbers, H represents the height of the feature, W represents the width of the feature, and C represents the number of channels;

the feature map finally obtained by the feature extraction module is expressed as:

U = [u_1, u_2, ..., u_C]

where U represents the feature map, with each channel given by

u_c = v_c * X = Σ_{k=1}^{C} v_c^k * x_k

where u_c represents the feature map of the c-th channel, v_c represents a 3-dimensional spatial kernel with C channels, v_c^k represents the k-th 2-dimensional spatial kernel of v_c acting on the k-th channel of the input features, x_k represents the k-th channel of the input features, * represents the convolution operation, and c represents the channel index, c ∈ [1, C].
In the fourth step, the feature map is input into the compression excitation module, and the global average pooling that compresses the feature map into channel information descriptors corresponds to the following formula:

z_c = F_sq(u_c) = (1 / (H × W)) Σ_{m=1}^{H} Σ_{n=1}^{W} u_c(m, n)

where z_c denotes the c-th channel information descriptor, F_sq(·) represents the compression (squeeze) operation, and u_c(m, n) represents the feature value at the m-th row and n-th column of the c-th channel of the feature map U.
In the fourth step, the step of learning the channel information descriptors with two fully-connected layers to obtain the feature vector corresponds to the following formula:

S = F_ex(z) = σ(δ(W_1 z) W_2)

where S represents the feature vector, F_ex(·) represents the excitation operation, z represents the channel information descriptor, σ represents the Sigmoid activation function, δ represents the ReLU function, W_1 represents the weight matrix of the first fully-connected layer, and W_2 represents the weight matrix of the second fully-connected layer.
In the fourth step, the step of performing channel multiplication between the feature map and the feature vector to obtain the channel multiplication output feature corresponds to the following formula:

x̃_c = F_scale(u_c, S_c) = S_c · u_c

where x̃_c represents the channel multiplication output feature of the c-th channel, F_scale(·) denotes the channel-wise multiplication between the feature vector and the feature map, and S_c represents the component of the feature vector for the c-th channel.
In the fifth step, the bilinear pooling performed on the channel multiplication output features by the bilinear pooling module to obtain the fine-grained quality difference image corresponds to the following formula:

bilinear(N, f) = Σ_{ε∈N} f_ε f_ε^T

where bilinear(·) denotes the bilinear pooling operation, f_ε represents the input obtained by resizing the channel multiplication output features, N represents the set of spatial positions in the fine-grained quality difference image, ε represents a position index, {ε | ε ∈ N}, and T denotes transposition.
In the fine-grained image blind quality evaluation method based on the bilinear convolutional neural network, the comparison of the evaluation score corresponding to the fine-grained quality difference image with the average subjective score comprises the following steps:

cropping the fine-grained quality difference image to 224 × 224, using stochastic gradient descent as the optimizer, setting the learning rate to 0.1, and setting the decay rate to 1e-5 according to a weight decay strategy;

inputting an original image and the corresponding fine-grained quality difference image, using the Margin Ranking Loss as the loss function, and comparing losses pairwise to obtain the loss value, with the corresponding formula expressed as:

L(x_1, x_2, y) = max(0, -y · (x_1 - x_2) + margin)

where L(x_1, x_2, y) represents the value of the Margin Ranking Loss, margin represents the required difference between the ranked image quality scores, x_1 represents the first image information to be input, x_2 represents the second image information to be input, and y represents the label of the comparison result: if x_1 > x_2 then y = 1, otherwise y = 0.
In the fine-grained image blind quality evaluation method based on the bilinear convolutional neural network, in the seventh step, the test indexes of the fine-grained image blind quality evaluation model comprise a prediction monotonicity index, a prediction accuracy index, and a pairwise preference consistency index;

the prediction monotonicity index comprises the Kendall correlation coefficient and the Spearman correlation coefficient, where the Kendall correlation coefficient KRCC is expressed as:

KRCC = (N_c - N_d) / ((1/2) N_all (N_all - 1))

where N_all represents the number of images to be ranked, N_c represents the number of pairs for which the predicted results and subjective results agree, and N_d represents the number of pairs for which the predicted results and subjective results disagree;

the Spearman correlation coefficient SRCC is expressed as:

SRCC = 1 - (6 Σ_{i=1}^{N} d_i^2) / (N (N^2 - 1))

where N represents the number of distorted images in the data and d_i represents the difference between the subjective rank and the objective predicted rank of the i-th image;

the prediction accuracy index comprises the Pearson correlation coefficient, where the Pearson correlation coefficient PLCC is expressed as:

PLCC = Σ_i (S_i - S̄)(p_i - p̄) / sqrt(Σ_i (S_i - S̄)^2 · Σ_i (p_i - p̄)^2)

where S_i denotes the subjective score of the i-th image, p_i denotes the objective predicted score of the i-th image, S̄ represents the mean of the subjective scores, and p̄ represents the mean of the objective predicted scores;

the pairwise preference consistency index comprises the pairwise preference consistency check coefficient, expressed as:

C = M_c / M

where M_c denotes the number of image pairs whose preference is identical and M denotes the number of all image pairs.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flow chart of a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network according to the present invention;
fig. 2 is a schematic diagram of a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
These and other aspects of embodiments of the invention will be apparent with reference to the following description and attached drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the ways in which the principles of the embodiments of the invention may be practiced, but it is understood that the scope of the embodiments of the invention is not limited correspondingly. On the contrary, the embodiments of the invention include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
Referring to fig. 1 and fig. 2, the present invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, wherein the method is implemented by a fine-grained image blind quality evaluation model, the fine-grained image blind quality evaluation model includes a feature extraction module, a compressed excitation module, a bilinear pooling module, and a full connection layer, and the method includes the following steps:
the method comprises the steps of firstly, obtaining an original image with fine granularity quality difference, and carrying out image preprocessing on the original image.
The original images of the invention come from the FG-IQA2018 database, which comprises 100 source images from the Watero application database, with resolutions ranging from 400 × 400 to 723 × 480. Each image is compressed to 3 distortion levels using four JPEG compression methods, where the four methods correspond to low, medium, and high bit-rate scenarios.
Before training the fine-grained image blind quality evaluation model, image preprocessing needs to be carried out on the original image, and the corresponding method comprises the following steps:
scaling the image size of the original image to a uniform size to facilitate model input;
cropping the scaled image to remove the image edge region and retain the central square region of the image;
and performing data augmentation on the cropped images in the training set by random-angle center rotation, random vertical flipping, and random horizontal flipping to prevent overfitting.
And step two, acquiring the average subjective score of all data in the fine-grained database by using a Bradley-Terry model.
In this step, the expression of the Bradley-Terry model is:
γ(i) = e^{S_i} / (e^{S_i} + e^{S_j})

where γ(i) represents the preference probability of the i-th image, S_i represents the raw score of the i-th image, S_j represents the raw score of the j-th image, and S represents the set of raw scores; w_{i,j} represents the viewer's quality preference between the i-th and j-th images, with w_{i,j} = 1 meaning that the viewer considers the i-th image to be of better quality than the j-th image and w_{i,j} = 0 meaning that the viewer considers the j-th image to be of better quality than the i-th image; w_{i,j} = N(i,j) / (N(i,j) + N(j,i)), where N(i,j) represents the number of times the i-th image was judged to be of better quality than the j-th image in the experiment and N(j,i) represents the number of times the j-th image was judged to be of better quality than the i-th image.
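The two quantities described above can be sketched as small helper functions (the names `empirical_preference` and `bt_probability` are illustrative, not from the patent):

```python
import math

def empirical_preference(n_ij, n_ji):
    # w_{i,j}: fraction of trials in which image i was judged better than j
    return n_ij / (n_ij + n_ji)

def bt_probability(s_i, s_j):
    # Bradley-Terry preference probability gamma(i) from two raw scores
    return math.exp(s_i) / (math.exp(s_i) + math.exp(s_j))
```

Fitting the raw scores S to the observed preferences (e.g. by maximum likelihood) then yields the average subjective score for each image.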
And step three, constructing a feature extraction module based on the convolutional layer sequence, and extracting information content sensitive to fine-grained features through the feature extraction module to obtain feature mapping.
The feature extraction module comprises three convolution groups, each containing two or three convolutional layers with 3 × 3 filters. ReLU is used as the activation function after every convolution operation, which significantly reduces computational complexity. Since the receptive field of the human visual system is its main functional and structural unit for signal processing, a max-pooling layer is placed only after the first two convolution groups in order to preserve more perceptual information.
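The structure of one convolution group can be sketched in NumPy as a single-channel toy version (not the patent's trained network; the kernels and the `conv_group` name are assumptions):

```python
import numpy as np

def conv3x3(x, k):
    """Valid 3x3 convolution of a 2-D map x with a single 3x3 kernel k."""
    h, w = x.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

def conv_group(x, kernels):
    """One group: stacked 3x3 convs with ReLU, then 2x2 max pooling."""
    for k in kernels:
        x = np.maximum(conv3x3(x, k), 0.0)   # convolution + ReLU activation
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]            # trim to even size for pooling
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))  # 2x2 max pool
```

A full module would stack three such groups (with the pooling omitted after the last one, per the description above) and run one learned kernel per output channel.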
In this step, the input features of the feature extraction module are represented as:

X = [x_1, x_2, ..., x_C]

where X represents the input feature of the feature extraction module, x_c represents the c-th channel of the input features, X ∈ R^{H×W×C}, R represents the set of real numbers, H represents the height of the feature, W represents the width of the feature, and C represents the number of channels;

the feature map finally obtained by the feature extraction module is expressed as:

U = [u_1, u_2, ..., u_C]

where U represents the feature map, with each channel given by

u_c = v_c * X = Σ_{k=1}^{C} v_c^k * x_k

where u_c represents the feature map of the c-th channel, v_c represents a 3-dimensional spatial kernel with C channels, v_c^k represents the k-th 2-dimensional spatial kernel of v_c acting on the k-th channel of the input features, x_k represents the k-th channel of the input features, * represents the convolution operation, and c represents the channel index, c ∈ [1, C].
And step four, constructing a compression excitation module, inputting the feature mapping into the compression excitation module, performing global average pooling operation on the feature mapping to obtain a channel information descriptor through compression, learning the channel information descriptor by utilizing two full-connection layers to obtain a feature vector, and performing channel multiplication calculation on the feature mapping and the feature vector to obtain a channel multiplication output feature.
In this step, the feature map is input into the compression excitation module, and the global average pooling that compresses the feature map into channel information descriptors corresponds to the following formula:

z_c = F_sq(u_c) = (1 / (H × W)) Σ_{m=1}^{H} Σ_{n=1}^{W} u_c(m, n)

where z_c denotes the c-th channel information descriptor, F_sq(·) represents the compression (squeeze) operation, and u_c(m, n) represents the feature value at the m-th row and n-th column of the c-th channel of the feature map U.
Further, the step of learning the channel information descriptors with two fully-connected layers to obtain the feature vector corresponds to the following formula:

S = F_ex(z) = σ(δ(W_1 z) W_2)

where S represents the feature vector, F_ex(·) represents the excitation operation, z represents the channel information descriptor, σ represents the Sigmoid activation function, δ represents the ReLU function, W_1 represents the weight matrix of the first fully-connected layer, and W_2 represents the weight matrix of the second fully-connected layer.
Finally, the step of performing channel multiplication between the feature map and the feature vector to obtain the channel multiplication output feature corresponds to the following formula:

x̃_c = F_scale(u_c, S_c) = S_c · u_c

where x̃_c represents the channel multiplication output feature of the c-th channel, F_scale(·) denotes the channel-wise multiplication between the feature vector and the feature map, and S_c represents the component of the feature vector for the c-th channel.
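Putting the compression, excitation, and channel-multiplication formulas together, a minimal NumPy sketch of the module looks as follows (the `squeeze_excite` name and the weight shapes are assumptions; a real module would learn W_1 and W_2 during training):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excite(U, W1, W2):
    """Compression-excitation on a feature map U of shape (H, W, C).

    z_c: global average of channel c (squeeze);
    S = sigmoid(relu(z W1) W2) (excitation with two FC layers);
    output channel c = S_c * u_c (channel-wise rescaling).
    """
    z = U.mean(axis=(0, 1))                       # squeeze: C-dim descriptor
    s = sigmoid(np.maximum(z @ W1, 0.0) @ W2)     # excitation: FC -> ReLU -> FC -> Sigmoid
    return U * s                                  # channel multiplication
```

In the literature W_1 usually reduces the channel dimension by a ratio r and W_2 restores it; identity-shaped weights are used here only to keep the example small.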
And step five, performing bilinear pooling on the channel multiplication output characteristics by using a bilinear pooling module to obtain a fine-grained quality difference image, and inputting the fine-grained quality difference image to a full connection layer.
In the fifth step, the bilinear pooling performed on the channel multiplication output features by the bilinear pooling module to obtain the fine-grained quality difference image corresponds to the following formula:

bilinear(N, f) = Σ_{ε∈N} f_ε f_ε^T

where bilinear(·) denotes the bilinear pooling operation, f_ε represents the input obtained by resizing the channel multiplication output features, N represents the set of spatial positions in the fine-grained quality difference image, ε represents a position index, {ε | ε ∈ N}, and T denotes transposition.
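The bilinear pooling formula above is simply a sum of outer products over spatial positions, which can be sketched directly (the `bilinear_pool` name is an assumption):

```python
import numpy as np

def bilinear_pool(F):
    """Bilinear pooling: sum over spatial positions of f_eps f_eps^T.

    F: (N, C) array holding one C-dim feature f_eps per spatial position.
    Returns a C x C matrix capturing pairwise channel interactions.
    """
    return sum(np.outer(f, f) for f in F)  # Σ_ε f_ε f_ε^T
```

Note that the sum of outer products equals the matrix product F^T F, which is how the operation is usually implemented in practice.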
And step six, obtaining the evaluation score corresponding to the fine-grained quality difference image through a full connection layer.
And step seven, comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score to obtain each test index.
In this step, the method for comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score comprises the following steps:

cropping the fine-grained quality difference image to 224 × 224, using stochastic gradient descent as the optimizer, setting the learning rate to 0.1, and setting the decay rate to 1e-5 according to a weight decay strategy;

inputting an original image and the corresponding fine-grained quality difference image, using the Margin Ranking Loss as the loss function, and comparing losses pairwise to obtain the loss value, with the corresponding formula expressed as:

L(x_1, x_2, y) = max(0, -y · (x_1 - x_2) + margin)

where L(x_1, x_2, y) represents the value of the Margin Ranking Loss, margin represents the required difference between the ranked image quality scores, x_1 represents the first image information to be input, x_2 represents the second image information to be input, and y represents the label of the comparison result: if x_1 > x_2 then y = 1, otherwise y = 0.
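The Margin Ranking Loss formula above can be sketched directly; note that the standard convention (e.g. in common deep-learning frameworks) uses labels y ∈ {+1, −1} rather than {1, 0}, and the sketch below follows that convention:

```python
def margin_ranking_loss(x1, x2, y, margin=0.0):
    """Pairwise margin ranking loss: max(0, -y * (x1 - x2) + margin).

    y = +1 if x1 should rank above x2, -1 otherwise.
    """
    return max(0.0, -y * (x1 - x2) + margin)
```

With y = +1, the loss is zero whenever x_1 exceeds x_2 by at least the margin, and grows linearly otherwise, which is what drives the model to preserve the subjective ranking of each image pair.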
Further, the test indexes of the fine-grained image blind quality evaluation model comprise a prediction monotonicity index, a prediction accuracy index, and a pairwise preference consistency index;

the prediction monotonicity index comprises the Kendall correlation coefficient and the Spearman correlation coefficient, where the Kendall correlation coefficient KRCC is expressed as:

KRCC = (N_c - N_d) / ((1/2) N_all (N_all - 1))

where N_all represents the number of images to be ranked, N_c represents the number of pairs for which the predicted results and subjective results agree, and N_d represents the number of pairs for which the predicted results and subjective results disagree;

the Spearman correlation coefficient SRCC is expressed as:

SRCC = 1 - (6 Σ_{i=1}^{N} d_i^2) / (N (N^2 - 1))

where N represents the number of distorted images in the data and d_i represents the difference between the subjective rank and the objective predicted rank of the i-th image;

the prediction accuracy index comprises the Pearson correlation coefficient, where the Pearson correlation coefficient PLCC is expressed as:

PLCC = Σ_i (S_i - S̄)(p_i - p̄) / sqrt(Σ_i (S_i - S̄)^2 · Σ_i (p_i - p̄)^2)

where S_i denotes the subjective score of the i-th image, p_i denotes the objective predicted score of the i-th image, S̄ represents the mean of the subjective scores, and p̄ represents the mean of the objective predicted scores;

the pairwise preference consistency index comprises the pairwise preference consistency check coefficient, expressed as:

C = M_c / M

where M_c denotes the number of image pairs whose preference is identical and M denotes the number of all image pairs.
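The three correlation indexes can be computed with a short NumPy sketch (function names are assumptions, and the Spearman formula below assumes no tied scores; in practice scipy.stats provides equivalent routines):

```python
import numpy as np

def srcc(subj, pred):
    """Spearman coefficient via the rank-difference formula above."""
    n = len(subj)
    rank = lambda a: np.argsort(np.argsort(a))
    d = rank(np.asarray(subj)) - rank(np.asarray(pred))
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

def plcc(subj, pred):
    """Pearson linear correlation coefficient."""
    s = np.asarray(subj, float) - np.mean(subj)
    p = np.asarray(pred, float) - np.mean(pred)
    return np.sum(s * p) / np.sqrt(np.sum(s ** 2) * np.sum(p ** 2))

def krcc(subj, pred):
    """Kendall coefficient: (concordant - discordant) / total pairs."""
    s, p = np.asarray(subj), np.asarray(pred)
    n = len(s)
    nc = nd = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (s[i] - s[j]) * (p[i] - p[j])
            nc += sign > 0   # pair ordered the same way in both rankings
            nd += sign < 0   # pair ordered oppositely
    return (nc - nd) / (n * (n - 1) / 2)
```

All three coefficients reach 1 for a perfectly monotone, linear prediction and −1 for a fully reversed one.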
The invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, which aims to:
1. designing a specific feature extraction module to capture quality perception features, acquiring information content sensitive to fine-grained features, guiding a model to distinguish subtle differences among fine-grained images, enhancing the identifiability of the features and providing a meaningful feature extraction method for the field of image processing;
2. modeling the interdependency among feature map channels to improve the sensitivity of the features to differences between images, which benefits the development of deep-learning image processing models and promotes the continuous optimization and upgrading of related practical applications.
Therefore, the blind image quality evaluation method for efficiently and accurately distinguishing fine-grained image differences can greatly promote the development of the image quality evaluation field and the development of the computer vision field.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a fine-grained image blind quality evaluation method based on a bilinear convolutional neural network, which is characterized in that a constructed fine-grained image blind quality evaluation model can effectively predict the image quality of fine-grained distortion difference by designing a characteristic extraction module for extracting quality perception characteristics, a compression excitation module for improving characteristic representation capability and a bilinear pooling module for improving characteristic identification capability, and has important significance for reducing the difference between objective image quality evaluation and actual application.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. A fine-grained image blind quality evaluation method based on a bilinear convolutional neural network is characterized by being achieved through a fine-grained image blind quality evaluation model, the fine-grained image blind quality evaluation model comprises a feature extraction module, a compression excitation module, a bilinear pooling module and a full connection layer, and the method comprises the following steps:
step one, obtaining an original image with fine-grained quality difference, and carrying out image preprocessing on the original image;
step two, acquiring the average subjective score of all data in the fine-grained database by using a Bradley-Terry model;
step three, constructing a feature extraction module based on the convolutional layer sequence, and extracting information content sensitive to fine-grained features through the feature extraction module to obtain feature mapping;
step four, constructing a compression excitation module, inputting the feature mapping into the compression excitation module, performing global average pooling operation on the feature mapping to obtain a channel information descriptor through compression, learning the channel information descriptor by utilizing two full-connection layers to obtain a feature vector, and performing channel multiplication calculation on the feature mapping and the feature vector to obtain a channel multiplication output feature;
step five, performing bilinear pooling on the channel multiplication output characteristics by using a bilinear pooling module to obtain a fine-grained quality difference image, and inputting the fine-grained quality difference image to a full connection layer;
step six, obtaining an evaluation score corresponding to the fine-grained quality difference image through a full connection layer;
step seven, comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score to obtain each test index.
2. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 1, wherein in step one, the method for preprocessing the original image comprises the following steps:
scaling the original image to a uniform size to facilitate input to the model;
cropping the scaled image to remove the image edge regions and retain the central square region of the image;
and performing data augmentation on the cropped images in the training set using random-angle center rotation, random vertical flipping, and random horizontal flipping, so as to prevent overfitting.
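As an illustrative sketch (not part of the claims), the center-cropping and augmentation steps can be written in NumPy as follows; the array shapes, the crop size, and the use of flips as a stand-in for random-angle rotation are assumptions:

```python
import numpy as np

def center_crop(img, size):
    """Crop the central size x size square region from an (H, W, C) array."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

def augment(img, rng):
    """Random flips, standing in for the claim's random rotation /
    vertical flip / horizontal flip augmentation."""
    if rng.random() < 0.5:
        img = img[:, ::-1]   # random horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]   # random vertical flip
    return img

img = np.arange(8 * 8 * 3).reshape(8, 8, 3)  # toy 8x8 RGB image
crop = center_crop(img, 4)                   # central 4x4 square
aug = augment(crop, np.random.default_rng(0))
```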
3. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 2, wherein in step two, the expression of the Bradley-Terry model is as follows:
γ(i) = e^(S_i) / (e^(S_i) + e^(S_j))

w_{i,j} = N(i,j) / (N(i,j) + N(j,i))

where γ(i) represents the preference probability of the i-th image, S_i represents the raw score of the i-th image, S_j represents the raw score of the j-th image, S represents the set of raw scores, w_{i,j} represents the viewer's quality preference between the i-th and the j-th images, w_{i,j} = 1 indicates that the viewer considers the i-th image to be of better quality than the j-th image, w_{i,j} = 0 indicates that the viewer considers the j-th image to be of better quality than the i-th image, N(i,j) represents the number of times the i-th image was judged to be of better quality than the j-th image in the experiment, and N(j,i) represents the number of times the j-th image was judged to be of better quality than the i-th image.
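For illustration only, the preference probability and the empirical preference defined above can be computed as follows (a minimal NumPy sketch; the function names are mine, not the patent's):

```python
import numpy as np

def bt_preference(S_i, S_j):
    """Bradley-Terry preference probability: gamma(i) = e^S_i / (e^S_i + e^S_j)."""
    return float(np.exp(S_i) / (np.exp(S_i) + np.exp(S_j)))

def empirical_preference(N_ij, N_ji):
    """Observed preference w_ij estimated from pairwise comparison counts."""
    return N_ij / (N_ij + N_ji)

# Two images with equal raw scores are preferred with probability 0.5.
p_equal = bt_preference(1.0, 1.0)
# Image i judged better 3 times out of 4 comparisons -> empirical preference 0.75.
w = empirical_preference(3, 1)
```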
4. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 3, wherein in step three, the input feature of the feature extraction module is represented as:
X = [x_1, x_2, ..., x_C]
where X represents the input feature of the feature extraction module, x_c represents the c-th channel of the input feature, X ∈ R^(H×W×C), R represents the set of real numbers, H represents the height of the feature, W represents the width of the feature, and C represents the number of channels;
the final feature map obtained by the feature extraction module is represented as:
U = [u_1, u_2, ..., u_C]

u_c = v_c * X = Σ_{k=1}^{C} (v_c^k * x_k)

where U represents the feature map, u_c represents the feature map of the c-th channel, v_c represents a 3-dimensional spatial kernel with C channels, v_c^k represents the k-th channel of v_c, a 2-dimensional spatial kernel acting on the corresponding channel of the input feature, x_k represents the k-th channel of the input feature, * represents the convolution operation, and c represents the channel index, c ∈ [1, C].
5. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 4, wherein in step four, the feature map is input into the compression excitation module, and the step of compressing the feature map through a global average pooling operation to obtain the channel information descriptors corresponds to the following formula:

z_c = F_sq(u_c) = (1 / (H × W)) Σ_{m=1}^{H} Σ_{n=1}^{W} u_c(m, n)

where z_c represents the c-th channel information descriptor, F_sq(·) represents the compression operation, and u_c(m, n) represents the feature value at the m-th row and n-th column of the c-th channel of the feature map U.
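A minimal NumPy sketch of this squeeze (global average pooling) step; the (H, W, C) memory layout of the feature map is an assumption:

```python
import numpy as np

def squeeze(U):
    """F_sq: average each (H, W) channel plane of U into one descriptor z_c."""
    return U.mean(axis=(0, 1))   # shape (C,)

U = np.ones((4, 4, 2))   # toy feature map: channel 0 all 1s, channel 1 all 3s
U[:, :, 1] = 3.0
z = squeeze(U)
```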
6. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 5, wherein in step four, the step of learning the channel information descriptors with two full connection layers to obtain the feature vector corresponds to the following formula:

S = F_ex(z) = σ(δ(W_1 z) W_2)

where S represents the feature vector, F_ex(·) represents the excitation operation, z represents the channel information descriptor, σ represents the Sigmoid activation function, δ represents the ReLU function, W_1 represents the weight matrix of the first full connection layer, and W_2 represents the weight matrix of the second full connection layer.
7. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 6, wherein in step four, the step of performing channel-wise multiplication between the feature map and the feature vector to obtain the channel multiplication output feature corresponds to the following formula:

x̃_c = F_scale(u_c, S_c) = S_c · u_c

where x̃_c represents the c-th channel of the channel multiplication output feature, F_scale(·, ·) represents the channel-wise multiplication, u_c represents the feature map of the c-th channel, and S_c represents the c-th element of the feature vector S.
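The excitation and channel-scaling steps of step four can be sketched together in NumPy as follows; the weight shapes, the reduction ratio, and the random initialization are assumptions, not the patent's trained parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def excite_and_scale(U, z, W1, W2):
    """Excitation S = sigma(delta(W1 z) W2), then per-channel scaling of U.
    W1, W2 stand for the two full connection layers; delta is ReLU, sigma is Sigmoid."""
    s = sigmoid(np.maximum(W1 @ z, 0.0) @ W2)   # feature vector S, one weight per channel
    return U * s                                 # broadcast channel-wise multiplication

rng = np.random.default_rng(0)
C, r = 4, 2                              # channels and an assumed reduction ratio
U = np.ones((3, 3, C))                   # toy feature map
z = U.mean(axis=(0, 1))                  # squeeze: channel descriptors
W1 = rng.standard_normal((C // r, C))    # reduction layer weights (assumed shape)
W2 = rng.standard_normal((C // r, C))    # expansion layer weights (assumed shape)
out = excite_and_scale(U, z, W1, W2)
```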
8. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 7, wherein in step five, the step of performing bilinear pooling on the channel multiplication output feature with the bilinear pooling module to obtain the fine-grained quality difference image corresponds to the following formula:

bilinear(N) = Σ_{ε∈N} f_ε f_ε^T

where bilinear(·) represents the bilinear pooling operation, f_ε represents the input obtained by resizing the channel multiplication output feature, N represents the set of spatial positions in the fine-grained quality difference image, ε represents a position index, {ε | ε ∈ N}, and T represents transposition.
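A minimal NumPy sketch of the bilinear pooling formula above, with each f_ε taken as a row of a small matrix (the toy values are assumptions):

```python
import numpy as np

def bilinear_pool(F):
    """Sum of outer products f_eps f_eps^T over all spatial positions eps in N.
    F has shape (num_positions, D): one D-dimensional feature per position."""
    return sum(np.outer(f, f) for f in F)   # D x D bilinear feature

F = np.array([[1.0, 2.0],    # f at position 1
              [3.0, 4.0]])   # f at position 2
B = bilinear_pool(F)         # [[10, 14], [14, 20]]
```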
9. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 8, wherein in step seven, the method for comparing the evaluation score corresponding to the fine-grained quality difference image with the average subjective score comprises the following steps:
cropping the fine-grained quality difference image to 224 × 224, using stochastic gradient descent as the optimizer, setting the learning rate to 0.1, and setting the decay rate to 1e-5 according to a weight decay strategy;
inputting an original image and the corresponding fine-grained quality difference image, using the Margin Ranking Loss function as the loss function, and comparing losses in pairs to obtain a loss value, where the corresponding formula is expressed as:

L(x_1, x_2, y) = max(0, -y · (x_1 - x_2) + margin)

where L(x_1, x_2, y) represents the loss value of the Margin Ranking Loss function, margin represents the difference value of the ranking image quality scores, x_1 represents the first image information to be input, x_2 represents the second image information to be input, and y represents the label of the comparison result: if x_1 > x_2, then y = 1, otherwise y = 0.
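The loss above can be sketched as a plain Python function. Note that the claim's label convention (y ∈ {1, 0}) differs from the common library convention (y ∈ {1, -1}, e.g. torch.nn.MarginRankingLoss); the sketch follows the formula as written:

```python
def margin_ranking_loss(x1, x2, y, margin=0.0):
    """L(x1, x2, y) = max(0, -y * (x1 - x2) + margin), with y = 1 when x1 > x2.

    With the claim's convention y = 0 for a mis-ordered pair, the loss reduces
    to max(0, margin); common libraries instead use y = -1 there."""
    return max(0.0, -y * (x1 - x2) + margin)

# Correctly ordered pair (x1 > x2, y = 1): zero loss.
l_ok = margin_ranking_loss(0.8, 0.3, 1)
# Pair predicted in the wrong order, with a margin: positive loss.
l_bad = margin_ranking_loss(0.3, 0.8, 1, margin=0.1)
```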
10. The fine-grained image blind quality evaluation method based on the bilinear convolutional neural network as claimed in claim 9, wherein in step seven, the test indexes of the fine-grained image blind quality evaluation model include a prediction monotonicity index, a prediction accuracy index and a pairwise preference consistency index;
the prediction monotonicity index comprises the Kendall rank correlation coefficient and the Spearman rank correlation coefficient, where the Kendall rank correlation coefficient KRCC is expressed as:

KRCC = (N_c - N_d) / ((1/2) N_all (N_all - 1))

where N_all represents the number of images to be ranked, N_c represents the number of concordant pairs between the predicted and subjective results, and N_d represents the number of discordant pairs between the predicted and subjective results;
the Spearman rank correlation coefficient SRCC is expressed as:

SRCC = 1 - (6 Σ_{i=1}^{N} d_i^2) / (N (N^2 - 1))

where N represents the number of distorted images in the data, and d_i represents the difference between the subjective rank and the objective predicted rank of the i-th image;
the prediction accuracy index comprises the Pearson linear correlation coefficient, where the Pearson linear correlation coefficient PLCC is expressed as:

PLCC = Σ_{i} (S_i - S̄)(p_i - p̄) / sqrt(Σ_{i} (S_i - S̄)^2 · Σ_{i} (p_i - p̄)^2)

where S_i represents the subjective score of the i-th image, p_i represents the objective predicted score of the i-th image, S̄ represents the mean of the subjective scores, and p̄ represents the mean of the objective predicted scores;
the pairwise preference consistency index comprises the pairwise preference consistency check coefficient P_test, expressed as:

P_test = M_c / M

where M_c represents the number of image pairs with identical preference, and M represents the number of all image pairs.
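The four test indexes can be sketched directly from their formulas; the score arrays below are hypothetical toy values, not results from the patent:

```python
import numpy as np

def srcc(subj, pred):
    """Spearman: 1 - 6 * sum(d_i^2) / (N (N^2 - 1)), d_i = rank difference."""
    N = len(subj)
    d = np.argsort(np.argsort(subj)) - np.argsort(np.argsort(pred))
    return 1.0 - 6.0 * np.sum(d.astype(float) ** 2) / (N * (N ** 2 - 1))

def krcc(subj, pred):
    """Kendall: (N_c - N_d) / ((1/2) N_all (N_all - 1)), counting pair orderings."""
    N = len(subj)
    nc = nd = 0
    for i in range(N):
        for j in range(i + 1, N):
            s = (subj[i] - subj[j]) * (pred[i] - pred[j])
            if s > 0:
                nc += 1          # concordant pair
            elif s < 0:
                nd += 1          # discordant pair
    return (nc - nd) / (N * (N - 1) / 2)

def plcc(subj, pred):
    """Pearson linear correlation coefficient of subjective vs. predicted scores."""
    s = np.asarray(subj, float) - np.mean(subj)
    p = np.asarray(pred, float) - np.mean(pred)
    return float(np.sum(s * p) / np.sqrt(np.sum(s ** 2) * np.sum(p ** 2)))

def p_test(M_c, M):
    """Pairwise preference consistency: fraction of pairs with identical preference."""
    return M_c / M

subj = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical mean subjective scores
pred = [1.1, 1.9, 3.2, 3.8, 5.1]   # hypothetical model predictions (same ordering)
```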
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211547291.7A CN115829971A (en) | 2022-12-02 | 2022-12-02 | Fine-grained image blind quality evaluation method based on bilinear convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211547291.7A CN115829971A (en) | 2022-12-02 | 2022-12-02 | Fine-grained image blind quality evaluation method based on bilinear convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115829971A true CN115829971A (en) | 2023-03-21 |
Family
ID=85543956
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211547291.7A Pending CN115829971A (en) | 2022-12-02 | 2022-12-02 | Fine-grained image blind quality evaluation method based on bilinear convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115829971A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180286032A1 (en) * | 2017-04-04 | 2018-10-04 | Board Of Regents, The University Of Texas System | Assessing quality of images or videos using a two-stage quality assessment |
CN113111940A (en) * | 2021-04-13 | 2021-07-13 | 东南大学 | Expression recognition method based on feature fusion |
CN114549492A (en) * | 2022-02-27 | 2022-05-27 | 北京工业大学 | Quality evaluation method based on multi-granularity image information content |
- 2022-12-02: Application CN202211547291.7A filed in CN; published as CN115829971A; status: Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180286032A1 (en) * | 2017-04-04 | 2018-10-04 | Board Of Regents, The University Of Texas System | Assessing quality of images or videos using a two-stage quality assessment |
CN113111940A (en) * | 2021-04-13 | 2021-07-13 | 东南大学 | Expression recognition method based on feature fusion |
CN114549492A (en) * | 2022-02-27 | 2022-05-27 | 北京工业大学 | Quality evaluation method based on multi-granularity image information content |
Non-Patent Citations (2)
Title |
---|
LIXIA LIU et al.: "Bilinear CNNs for Blind Quality Assessment of Fine-Grained Images", 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), pages 1-6 *
ZHANG Weixia: "Blind Image Quality Assessment Based on Feature Aggregation and Data-Driven Approaches", China Doctoral Dissertations Full-text Database, pages 138-45 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110189334B (en) | Medical image segmentation method of residual error type full convolution neural network based on attention mechanism | |
CN110021425B (en) | Comparison detector, construction method thereof and cervical cancer cell detection method | |
CN111461232A (en) | Nuclear magnetic resonance image classification method based on multi-strategy batch type active learning | |
CN111931931B (en) | Deep neural network training method and device for pathology full-field image | |
CN112116605A (en) | Pancreas CT image segmentation method based on integrated depth convolution neural network | |
CN114897779B (en) | Cervical cytology image abnormal region positioning method and device based on fusion attention | |
CN108875076B (en) | Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network | |
CN113378796B (en) | Cervical cell full-section classification method based on context modeling | |
CN115018824A (en) | Colonoscope polyp image segmentation method based on CNN and Transformer fusion | |
CN113610144A (en) | Vehicle classification method based on multi-branch local attention network | |
CN110751644B (en) | Road surface crack detection method | |
CN113112446A (en) | Tunnel surrounding rock level intelligent judgment method based on residual convolutional neural network | |
CN110738663A (en) | Double-domain adaptive module pyramid network and unsupervised domain adaptive image segmentation method | |
CN113706544B (en) | Medical image segmentation method based on complete attention convolutional neural network | |
CN110930378A (en) | Emphysema image processing method and system based on low data demand | |
CN114266757A (en) | Diabetic retinopathy classification method based on multi-scale fusion attention mechanism | |
CN116525075A (en) | Thyroid nodule computer-aided diagnosis method and system based on few sample learning | |
CN114495210A (en) | Posture change face recognition method based on attention mechanism | |
CN114140437A (en) | Fundus hard exudate segmentation method based on deep learning | |
CN113469961A (en) | Neural network-based carpal tunnel image segmentation method and system | |
CN110992309B (en) | Fundus image segmentation method based on deep information transfer network | |
CN115829971A (en) | Fine-grained image blind quality evaluation method based on bilinear convolutional neural network | |
CN113971764B (en) | Remote sensing image small target detection method based on improvement YOLOv3 | |
CN113192076B (en) | MRI brain tumor image segmentation method combining classification prediction and multi-scale feature extraction | |
CN115063602A (en) | Crop pest and disease identification method based on improved YOLOX-S network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||