CN117274173A - Semantic and structural distillation reference-free image quality evaluation method


Info

Publication number: CN117274173A
Application number: CN202311135174.4A
Authority: CN (China)
Inventors: 邓杰航, 陈浩民, 顾国生
Applicant and current assignee: Guangdong University of Technology
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications

    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/0495: Quantised networks; sparse networks; compressed networks
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30168: Image quality inspection


Abstract

The invention provides a semantic and structural distillation reference-free image quality evaluation method, which comprises the following steps: constructing an image training set and dividing it into a reference image set and a degraded image set; training teacher network 1 for image semantic information extraction with the reference image set; training teacher network 2 for image difference information extraction with the reference image set; training a student network with teacher networks 1 and 2 and a difference information distillation loss; training the student network with teacher networks 1 and 2 and a semantic information distillation loss; and inputting the image to be evaluated into the trained student network for image quality evaluation. The invention remedies the lack of reference information and improves image quality evaluation performance.

Description

Semantic and structural distillation reference-free image quality evaluation method
Technical Field
The invention relates to the technical field of image detection, and in particular to a semantic and structural distillation reference-free image quality evaluation method.
Background
In recent years, with the rapid development of digital technology and the wide adoption of intelligent terminals such as smartphones and tablet computers, image information has become ever easier to acquire, and the quality demanded of that information keeps rising. However, before reaching the end user, an image passes through a series of processing steps such as transmission, compression and storage, each of which may introduce distortions that degrade its quality. A quality evaluation tool is therefore needed to assess image quality accurately and in real time, so as to guarantee high-quality image information.
Image quality evaluation is mainly divided into subjective and objective methods. In the former, human observers score image quality according to their own perception under a prescribed standard and procedure; in the latter, a mathematical model is constructed to simulate the human visual system and predict quality. Since humans are the ultimate recipients of visual information, subjective assessment is the most accurate and reliable approach; in practice, however, it requires substantial manpower and money and cannot run in real time. Objective methods, by contrast, can evaluate large volumes of image data in real time, which gives objective image quality evaluation considerable research value.
Objective methods fall into full-reference (FR-IQA), reduced-reference (RR-IQA) and no-reference (NR-IQA) quality evaluation. The full-reference approach assumes the original high-quality counterpart of the distorted image is fully available, measures how far the distorted image deviates from it, and scores the distorted image accordingly. The reduced-reference approach evaluates the distorted image using only partial information from the reference image. In most practical cases, however, no reference image can be obtained, which motivated methods that evaluate a distorted image without any reference, i.e. no-reference quality evaluation. An NR-IQA method extracts features from the distorted image that reflect its distortion and computes a score from those features. Because it can score images without a reference, NR-IQA is the most widely used family in practice.
Although deep-learning-based NR-IQA methods are widely used, they lack reference information, so their evaluation performance trails that of full-reference and reduced-reference methods. To address this, many works introduce an image restoration network into NR-IQA: a restoration algorithm such as a GAN first restores the distorted image toward the original reference image or a visually sensitive image, and the restored image is then used to score the distorted one. This injects pseudo-reference information into NR-IQA and has substantially improved NR-IQA performance. The approach has two limitations, however. First, the restored reference image typically contains considerable false information, such as artifacts and content absent from the original image, which hurts the model's scoring of the distorted image. Second, restoration-based NR-IQA methods tend to lose the ability to repair distorted images once the distortion is severe.
In summary, deep-learning-based NR-IQA has important research and practical significance, but common deep learning methods lack reference information; image restoration networks such as GANs alleviate this to some extent yet introduce information on the restored image that disturbs the evaluation. It is therefore important to design a method that introduces reference information for NR-IQA without harming performance.
Disclosure of Invention
The invention aims to provide a semantic and structural distillation no-reference image quality evaluation method that remedies the lack of reference information and improves image quality evaluation performance.
A method for reference-free image quality assessment for semantic and structural distillation, comprising:
constructing an image training set, and dividing the image training set into a reference image set and a degraded image set;
training teacher network 1 for image semantic information extraction with the reference image set;
training teacher network 2 for image difference information extraction with the reference image set;
training a student network with teacher networks 1 and 2 and a difference information distillation loss;
training the student network with teacher networks 1 and 2 and a semantic information distillation loss;
and inputting the image to be evaluated into a trained student network for image quality evaluation.
Training teacher network 1 for image semantic information extraction with the reference image set includes:
teacher network 1 uses the first convolution layer and the first two residual blocks of ResNet-50 as a feature extraction module for extracting features of the input image;
inputting the reference image I_ref into teacher network 1;
teacher network 1 outputs f_i^{T1}, the semantic features of the reference image extracted by the i-th residual block.
Training teacher network 2 for image difference information extraction with the reference image set includes:
inputting the pixel difference map I_diff into teacher network 2;
the pixel difference map is obtained by subtracting the degraded image from the reference image:

I_diff = |I_ref - I_dis|

where I_ref is the reference image, I_dis the degraded image and I_diff the difference map;
teacher network 2 outputs f_i^{T2}, the features extracted from the difference map by the i-th residual block;
the semantic-level difference information features are the difference between the reference-image semantic features extracted by the reference-image semantic information extraction branch and the distortion-map semantic features extracted by the degraded-image semantic information extraction branch.
Training the student network using teacher networks 1 and 2 and the difference information distillation loss function includes:
inputting the reference images one by one into the teacher network and computing its feature attention mappings of the pictures: key feature K = φ(f^T) and value feature V = λ(f^T);
inputting the degraded images one by one into the student network and computing its feature attention mapping: query feature Q = θ(f^S);
θ, φ and λ are three 1×1 convolutions that compress the original number of feature map channels to C' and screen out redundant channel features;
calculating the semantic difference distillation loss:

$$L_{sd} = \frac{1}{N}\sum_{i=1}^{N}\left\| F_{cnl\_sd}\left(f_i^{T_{sd}},\ f_i^{S_{diff}}\right) - f_i^{S_{diff}} \right\|_2^2$$

calculating the pixel difference distillation loss:

$$L_{diff} = \frac{1}{N}\sum_{i=1}^{N}\left\| F_{cnl\_diff}\left(f_i^{T_{diff}},\ f_i^{S_{diff}}\right) - f_i^{S_{diff}} \right\|_2^2$$

where N is the number of training samples, f^{T_sd} and f^{T_diff} are the semantic difference feature map and the pixel difference feature map, f^{S_diff} is the distortion feature map extracted by the degraded-image difference information extraction branch, and F_cnl_sd and F_cnl_diff denote the channel query module applied to the semantic difference features and to the pixel difference features respectively;
updating the parameters of the student network according to the loss functions until they converge.
Training the student network with teacher networks 1 and 2 and the semantic information distillation loss comprises:
concatenating the reference image features and the degraded image features on the channel axis and filtering distortion at several scales through three max pooling layers of different sizes, so that only the semantic contour information shared by the reference and degraded images is retained;
activating with a softmax module to indicate where semantic information exists, yielding a semantic profile information indication map; multiplying this map into the reference feature map gives the reference semantic information the student network must learn, which it learns by feature matching;
calculating the semantic information distillation loss:

$$L_{si} = \frac{1}{N}\sum_{i=1}^{N}\left\| F_{SSIIM}\left(f_i^{T1},\ f_i^{S}\right) \otimes f_i^{T1} - f_i^{S} \right\|_2^2$$

where f^{T1} and f^{S} are the reference semantic feature map and the semantic-branch feature map of the no-reference network, ⊗ denotes matrix multiplication, and F_SSIIM is the semantic profile information indicating module;
updating the parameters of the student network according to the loss function until they converge.
Before inputting the image to be evaluated into the trained student network, the method further fuses the semantic features and the difference features of the degraded image, specifically:
using channel-wise max pooling to obtain the maximum response over all feature maps; these responses indicate the locations where semantic information exists;
using the channel selection module to adjust the semantic information with the semantic difference features, selecting the most relevant semantic information for each channel of the semantic difference feature map;
multiplying the channel maximum response map into the semantic difference feature map to select distortion at the locations where semantic information exists, and adding it to the adjusted semantic information features to obtain the multi-information fusion feature map.
Before inputting the image to be evaluated into the trained student network, the method further extracts gradient features of the degraded image as auxiliary information for quality evaluation, specifically:
extracting the image gradient with the Scharr operator:

I_g = scharr(I_dis)

where I_g is the gradient map of the distorted image.
Inputting the image to be evaluated into the trained student network for quality evaluation comprises:
applying global average pooling to the gradient features and the distortion features extracted by the third residual blocks of the two ResNet-50 backbones in the student network, concatenating them on the channel axis, and feeding them to the regression layers;
the three fully connected layers have 1024, 2048 and 2048 nodes respectively, and finally the image quality score is output.
After training the student network with teacher networks 1 and 2 and the semantic information distillation loss, the method further computes a quality score loss for the degraded image to train the student network, specifically:
an L2 loss measures the difference between the image quality score predicted by the student network and the ground-truth score:

$$L_q = \frac{1}{N}\sum_{i=1}^{N}\left( q_i - \hat{q}_i \right)^2$$

where q_i and \hat{q}_i are the label of the degraded image and the quality score predicted by the student network respectively; the overall loss of the model is defined as:

$$L = L_q + \lambda L_{diff} + \lambda L_{sd} + \lambda L_{si}$$

where λ is the distillation weight.
A system for reference-free image quality assessment for semantic and structural distillation, comprising:
the first data processing module is used for constructing an image training set and dividing the image training set into a reference image set and a degraded image set;
the first teacher network training module is used for training the teacher network 1 for extracting the image semantic information by using the reference image set;
A second teacher network training module for training the teacher network 2 for image difference information extraction with the reference image set;
the first student network training module is used for training the student network with teacher networks 1 and 2 and the difference information distillation loss;
the second student network training module is used for training the student network with teacher networks 1 and 2 and the semantic information distillation loss;
and the second data processing module is used for inputting the image to be evaluated into the trained student network to perform image quality evaluation.
The method constructs an image training set and divides it into a reference image set and a degraded image set; trains teacher network 1 for image semantic information extraction with the reference image set; trains teacher network 2 for image difference information extraction with the reference image set; trains a student network with teacher networks 1 and 2 and a difference information distillation loss; trains the student network with teacher networks 1 and 2 and a semantic information distillation loss; and inputs the image to be evaluated into the trained student network for quality evaluation. Transferring reference information to the no-reference network by knowledge distillation remedies the no-reference network's lack of reference information; across several databases the method outperforms typical no-reference network models and even exceeds image quality evaluation models that rely on image restoration techniques.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of the overall architecture of the quality assessment model of the present invention;
FIG. 3 is a block diagram of a channel selection module according to the present invention;
FIG. 4 is a schematic diagram of a semantic profile information indicating module according to the present invention;
fig. 5 is a block diagram of a feature fusion module of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that all directional indicators (such as up, down, left, right, front and rear) in the embodiments of the present invention merely explain the relative positional relationship, movement, etc. between components in a particular posture (as shown in the drawings); if the particular posture changes, the directional indicator changes accordingly.
Furthermore, the description of "first," "second," etc. in this disclosure is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be absent and not within the scope of protection claimed in the present invention.
Deep-learning-based NR-IQA methods are widely applied, but their lack of reference information makes their evaluation performance poorer than full-reference and reduced-reference methods. To address this, image restoration networks were introduced into NR-IQA: a restoration algorithm such as a GAN first restores the distorted image toward the original reference image or a visually sensitive image, and the restored image is then used for scoring. This approach has two limitations: the restored reference image typically contains considerable false information, such as artifacts and content absent from the original image, which hurts scoring of the distorted image; and restoration-based NR-IQA methods tend to lose the ability to repair distorted images when distortion is severe.
Therefore, in the model training stage the invention extracts relevant feature information from the reference image and the degraded image by a distillation technique and transfers it to the no-reference network, thereby introducing real reference information into the NR-IQA method. Unlike models that rely on a generative network, the method introduces as little performance-degrading false information as possible. For the mismatch between the feature channels of the student network and the teacher network, a channel selection module is proposed and applied effectively to IQA. For the mismatch in the positions of the contour information shared by the distortion map and the reference map during semantic learning, the invention proposes a semantic profile information indicating module, which reduces the influence of distortion in the distortion map, locates the contour information shared by the two maps, and introduces it into the reference feature map so that the degraded map and the reference map finally match semantically. With this module the student network learns semantic features better. The invention tests the model on open-source databases and greatly improves the performance of the NR-IQA model.
Example 1
A method for reference-free image quality assessment for semantic and structural distillation, comprising:
S100, constructing an image training set, and dividing the image training set into a reference image set and a degraded image set;
Image data are taken from public IQA databases: LIVE, CSIQ and TID2013. Each database is split by reference image into training and test sets at a ratio of 8:2. The reference images and their corresponding degraded images in the training and test sets are mutually exclusive: a reference image (and its degraded versions) appearing in the training set does not appear in the test set, and vice versa. To eliminate database sampling bias, each database is randomly partitioned and tested 10 times, and the final result is the average over the 10 runs.
Because IQA databases are small, they are insufficient for training a neural network directly. Moreover, conventional data augmentation (rotation, added noise, etc.) is itself a form of image degradation: applying it would change the image's quality score, so it cannot be used to augment IQA data. The invention therefore adopts random cropping, the most widely used augmentation in the IQA field: at each training iteration the distorted image and the reference image are randomly cropped, and each cropped patch keeps the quality score of the original image.
During training, each image is cropped at 3 random positions into one 196×196 patch each, and each patch carries the quality score of the original distorted image. At test time the model is also evaluated with random cropping, a widespread practice, but more patches must be cropped to cover the whole image: 30 patches of 196×196 are randomly sampled from each test image, each patch is scored by the network, and the scores of all patches are averaged to obtain the final quality score.
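A minimal PyTorch-style sketch of this cropping protocol; the function names and the use of CHW tensors are illustrative assumptions, while the patch size (196×196), 3 training crops, and 30 test crops follow the text above:

```python
import torch

def random_crops(img: torch.Tensor, n: int = 3, size: int = 196) -> torch.Tensor:
    """Crop n random size x size patches from a CHW image tensor.

    Each patch inherits the quality score of the full image.
    """
    _, h, w = img.shape
    patches = []
    for _ in range(n):
        top = torch.randint(0, h - size + 1, (1,)).item()
        left = torch.randint(0, w - size + 1, (1,)).item()
        patches.append(img[:, top:top + size, left:left + size])
    return torch.stack(patches)

def test_score(model, img: torch.Tensor, n_patches: int = 30) -> float:
    """Average the model's scores over 30 random patches, as described above."""
    with torch.no_grad():
        scores = model(random_crops(img, n=n_patches))
    return scores.mean().item()
```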
S200, training a teacher network 1 for extracting image semantic information by using a reference image set;
s300, training a teacher network 2 for extracting image difference information by using a reference image set;
s400, training a student network by using teacher networks 1 and 2 and the difference information distillation loss;
s500, training a student network by using teacher networks 1 and 2 and semantic information distillation loss;
s600, inputting the image to be evaluated into a trained student network for image quality evaluation.
By designing a semantic information distillation module and a difference information distillation module, the invention transfers reference information from the teacher networks to the student network through knowledge distillation. To reduce the influence of the masking effect, the model then fuses, via a feature fusion module, the semantic features and difference features extracted by the degraded-image semantic branch and difference branch. In addition, a gradient feature extraction network extracts features from the gradient map of the degraded image, and the gradient features together with the fused distortion features are fed into a regression network to predict the final quality score. Finally, to constrain and optimize the model, distillation losses are added during distillation to improve its effectiveness, and the labels of the training images supervise the model's predictions.
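A high-level sketch of one training step under this flow. Every module interface here (the teacher/student callables, the fusion and gradient modules, the plain MSE stand-ins for the distillation losses) is an illustrative assumption; the channel selection and indication modules are sketched separately further below:

```python
import torch
import torch.nn.functional as F

def train_step(teachers, student, fusion, grad_net, regressor,
               i_ref, i_dis, i_grad, q_label, lam: float = 1.0):
    """One training step following the pipeline described above."""
    with torch.no_grad():                            # teachers provide targets only
        f_t1 = teachers["semantic"](i_ref)           # reference semantic features
        f_t2 = teachers["difference"]((i_ref - i_dis).abs())  # pixel-difference features
    f_s_sem, f_s_diff = student(i_dis)               # student branches on the degraded image
    fused = fusion(f_s_sem, f_s_diff)                # feature fusion against the masking effect
    score = regressor(fused, grad_net(i_grad))       # predicted quality score

    # plain feature-matching losses stand in for the full distillation losses
    l_si = F.mse_loss(f_s_sem, f_t1)
    l_diff = F.mse_loss(f_s_diff, f_t2)
    l_q = F.mse_loss(score, q_label)                 # label supervision
    return l_q + lam * (l_si + l_diff)
```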
S200, training teacher network 1 for image semantic information extraction with the reference image set includes:
teacher network 1 uses the first convolution layer and the first two residual blocks of ResNet-50 as a feature extraction module for extracting features of the input image;
inputting the reference image I_ref into teacher network 1;
teacher network 1 outputs f_i^{T1}, the semantic features of the reference image extracted by the i-th residual block.
S300, training teacher network 2 for image difference information extraction with the reference image set includes:
inputting the pixel difference map I_diff into teacher network 2;
the pixel difference map is obtained by subtracting the degraded image from the reference image:

I_diff = |I_ref - I_dis|

where I_ref is the reference image, I_dis the degraded image and I_diff the difference map;
teacher network 2 outputs f_i^{T2}, the features extracted from the difference map by the i-th residual block;
the semantic-level difference information features are the difference between the reference-image semantic features extracted by the reference-image semantic information extraction branch and the distortion-map semantic features extracted by the degraded-image semantic information extraction branch.
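A sketch of the two teacher backbones and the difference-map input, assuming torchvision's ResNet-50 and interpreting "first two residual blocks" as the first two residual stages (layer1, layer2); the use of pretrained weights is also an assumption:

```python
import torch
import torchvision

def make_teacher_backbone() -> torch.nn.Sequential:
    """First convolution layer plus the first two residual stages of ResNet-50,
    truncated to serve as a teacher feature extractor."""
    r = torchvision.models.resnet50(weights="IMAGENET1K_V1")
    return torch.nn.Sequential(
        r.conv1, r.bn1, r.relu, r.maxpool,  # stem (first convolution layer)
        r.layer1, r.layer2,                 # first two residual stages
    )

teacher1 = make_teacher_backbone()          # semantic branch, fed I_ref
teacher2 = make_teacher_backbone()          # difference branch, fed I_diff

i_ref = torch.rand(1, 3, 196, 196)          # reference patch
i_dis = torch.rand(1, 3, 196, 196)          # degraded patch
i_diff = (i_ref - i_dis).abs()              # pixel difference map I_diff = |I_ref - I_dis|
f_t1, f_t2 = teacher1(i_ref), teacher2(i_diff)
```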
To convey reference information to the NR-IQA method, the model transfers it by knowledge distillation, which requires constructing a reference (teacher) network and a student network: the reference network extracts the reference information and the student network learns it. The most effective way to evaluate image quality is to compare the reference and degraded images, and their differences arise at both the pixel level and the semantic level; both kinds of difference information are widely used in FR-IQA methods for quality prediction, so the model takes them as reference information.
Although difference information explains well where distortion occurs, using it alone incurs a serious masking effect, so semantic information must be added to the model. Since the reference image is distortion-free, a large amount of undamaged semantic information can be extracted from it to serve as a benchmark for quality evaluation; the model therefore also takes the semantic information of the reference image as reference information.
The invention uses two ResNet-50 backbones as branches to construct the reference-image semantic information extraction branch (teacher network 1) and the pixel difference information extraction branch (teacher network 2), as shown in fig. 2.
S400, training the student network with teacher networks 1 and 2 and the difference information distillation loss function includes:
inputting the reference images one by one into the teacher network and computing its feature attention mappings of the pictures: key feature K = φ(f^T) and value feature V = λ(f^T);
inputting the degraded images one by one into the student network and computing its feature attention mapping: query feature Q = θ(f^S);
θ, φ and λ are three 1×1 convolutions that compress the original number of feature map channels to C' and screen out redundant channel features;
calculating the semantic difference distillation loss:

$$L_{sd} = \frac{1}{N}\sum_{i=1}^{N}\left\| F_{cnl\_sd}\left(f_i^{T_{sd}},\ f_i^{S_{diff}}\right) - f_i^{S_{diff}} \right\|_2^2$$

calculating the pixel difference distillation loss:

$$L_{diff} = \frac{1}{N}\sum_{i=1}^{N}\left\| F_{cnl\_diff}\left(f_i^{T_{diff}},\ f_i^{S_{diff}}\right) - f_i^{S_{diff}} \right\|_2^2$$

where N is the number of training samples, f^{T_sd} and f^{T_diff} are the semantic difference feature map and the pixel difference feature map, f^{S_diff} is the distortion feature map extracted by the degraded-image difference information extraction branch, and F_cnl_sd and F_cnl_diff denote the channel query module applied to the semantic difference features and to the pixel difference features respectively;
updating the parameters of the student network according to the loss functions until they converge.
To let the student network learn to extract difference information, the application develops a difference information distillation module that teaches the student network pixel difference information and semantic difference information by feature distillation. As shown in fig. 1, the pixel-level distortion features are obtained by the pixel difference information extraction branch from the difference map. The semantic-level features are the difference between the reference-image semantic features extracted by the reference semantic branch and the distortion-map semantic features extracted by the degraded-image semantic branch. Semantic-level distortion mostly lies in the details of image objects; as distortion strength increases the whole object is destroyed, yet even then its outline and position may still be recognizable. The model therefore focuses on semantic-level difference information in detail distortion: asymmetric convolution layers with 1×3, 3×1 and 3×3 kernels extract the detail information of the reference and distortion semantic features, which are then subtracted to obtain the semantic-level distortion features:

$$f^{T_{sd}} = F_{AC}\left(f^{ref}\right) - F_{AC}\left(f^{dis}\right)$$

where f^{T_sd} is the semantic difference feature and F_AC the asymmetric convolution layer.
After extracting these two reference features, the model transfers them from the teacher networks to the student network by knowledge distillation. In most existing distillation models the teacher and student receive the same input, so feature matching can be done by directly subtracting the teacher and student feature maps. Here, however, the student feature map comes from the degraded image while the teacher feature maps are the pixel difference and semantic difference feature maps, so the inputs all differ, and direct subtraction raises two problems. First, on the same channel the teacher and student maps may perceive or activate on different information, so the teacher feature most relevant to a given student channel may live on another teacher channel; direct matching then reduces the efficiency of knowledge transfer. Second, the information on one student channel may relate to several teacher channels, yet under direct matching each student channel learns from only one teacher channel and cannot fully absorb the teacher's information.
Inspired by the non-local attention module, the model therefore proposes a channel selection module, whose structure is shown in fig. 3. First, the student and teacher feature maps are projected to obtain the query feature Q = θ(f^S), key feature K = φ(f^T) and value feature V = λ(f^T): the query comes from the student map, the key and value from the teacher map, and θ, φ and λ are three 1×1 convolutions that compress the channel count to C' and screen out redundant channels. In this model the 1×1 convolutions compress the channels of both maps to 1/8 of the original. Q, K and V are then reshaped to HW×C', C'×HW and HW×C' respectively. Multiplying Q and K and applying softmax yields a channel correlation matrix whose entries give the correlation between each student channel and all teacher channels. Multiplying this matrix with the value feature V re-weights the teacher channels: for each student channel, the correlations act as weights over the teacher channels, which are then summed, fusing the features of several channels.
Finally a 1×1 convolution restores the adjusted teacher feature map to its original size:

$$f'_T = \omega\left( V \cdot \mathrm{Softmax}(K \cdot Q) \right)$$

where f'_T is the adjusted teacher feature map and ω a 1×1 convolution. The model first queries the semantic difference information and pixel difference information with the distortion features extracted by the degraded-image difference branch, i.e. adjusts the semantic and pixel difference features through this improved non-local attention module, and then matches them. Matching uses an L2 loss, and the semantic and pixel difference information losses are given in formulas (5) and (6):

$$L_{sd} = \frac{1}{N}\sum_{i=1}^{N}\left\| F_{cnl\_sd}\left(f_i^{T_{sd}},\ f_i^{S_{diff}}\right) - f_i^{S_{diff}} \right\|_2^2 \quad (5)$$

$$L_{diff} = \frac{1}{N}\sum_{i=1}^{N}\left\| F_{cnl\_diff}\left(f_i^{T_{diff}},\ f_i^{S_{diff}}\right) - f_i^{S_{diff}} \right\|_2^2 \quad (6)$$

where N is the number of training samples, f^{T_sd} and f^{T_diff} are the semantic difference feature map and the pixel difference feature map, f^{S_diff} is the distortion feature map extracted by the degraded-image difference branch, and F_cnl_sd and F_cnl_diff denote the channel query module applied to the semantic and pixel difference features respectively.
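A PyTorch sketch of such a channel selection module. The reshape convention (batched matrix products over flattened spatial dimensions) is an assumption chosen to be consistent with f'_T = ω(V·Softmax(K·Q)) and the 1/8 channel compression stated above:

```python
import torch
import torch.nn as nn

class ChannelSelection(nn.Module):
    """Re-mixes teacher channels according to their correlation with student
    channels, then restores the original channel count."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        c = channels // reduction                # C' = C / 8
        self.theta = nn.Conv2d(channels, c, 1)   # query projection (student)
        self.phi = nn.Conv2d(channels, c, 1)     # key projection (teacher)
        self.lam = nn.Conv2d(channels, c, 1)     # value projection (teacher)
        self.omega = nn.Conv2d(c, channels, 1)   # restore original channel count

    def forward(self, f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        b, _, h, w = f_t.shape
        q = self.theta(f_s).flatten(2)           # B x C' x HW (student)
        k = self.phi(f_t).flatten(2)             # B x C' x HW (teacher)
        v = self.lam(f_t).flatten(2)             # B x C' x HW (teacher)
        # correlation between each student channel and all teacher channels
        corr = torch.softmax(q @ k.transpose(1, 2), dim=-1)   # B x C' x C'
        mixed = corr @ v                         # weighted sum over teacher channels
        return self.omega(mixed.view(b, -1, h, w))
```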
S500, training the student network with teacher networks 1 and 2 and the semantic information distillation loss comprises:
concatenating the reference image features and the degraded image features on the channel axis and filtering distortion at several scales through three max pooling layers of different sizes, so that only the semantic contour information shared by the reference and degraded images is retained;
activating with a softmax module to indicate where semantic information exists, yielding a semantic profile information indication map; multiplying this map into the reference feature map gives the reference semantic information the student network must learn, which it learns by feature matching;
calculating the semantic information distillation loss:

$$L_{si} = \frac{1}{N}\sum_{i=1}^{N}\left\| F_{SSIIM}\left(f_i^{T1},\ f_i^{S}\right) \otimes f_i^{T1} - f_i^{S} \right\|_2^2$$

where f^{T1} and f^{S} are the reference semantic feature map and the semantic-branch feature map of the no-reference network, ⊗ denotes matrix multiplication, and F_SSIIM is the semantic profile information indicating module;
updating the parameters of the student network according to the loss function until they converge.
To let the NR-IQA model extract difference information and semantic information from the degraded image, a student network must also be constructed, and the teacher networks' ability to extract this information transferred to it.
For semantic information, the reference-image semantic branch extracts the semantics of the reference image, and this extraction ability should be transferred to the degraded-image semantic branch. But distortion in the distorted image damages its semantic features, so the student network struggles to learn them from the teacher by plain feature matching, and the semantic features of distorted regions may be lost. For most distortions, however, the outline of an object remains visible even when the distortion is present, i.e. a viewer can still recognize that an object is there. The model therefore treats contour information as the semantic information to be learned and transfers it from the teacher feature map to the student feature map. To extract the semantic contour information shared by the reference and distortion maps, the model, inspired by the SPP module, proposes a semantic profile information indicating module, whose structure is shown in fig. 4. The module concatenates the reference and distortion feature maps on the channel axis, filters distortion at several scales through three max pooling layers of different sizes so that only the shared semantic contour information survives, and finally applies softmax activation to indicate where the semantic information lies, yielding a semantic profile information indication map. Multiplying this weight matrix into the reference feature map gives the reference semantic information the student network must learn.
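A sketch of such an indicating module. The pooling kernel sizes (3, 5, 7) and the 1×1 convolution that fuses the pooled maps into a single-channel indication map are assumptions; the concatenate, multi-scale max pooling, softmax, and re-weighting steps follow the description above:

```python
import torch
import torch.nn as nn

class SemanticProfileIndicator(nn.Module):
    """Locates contour information shared by reference and distorted feature
    maps and uses it to re-weight the reference features."""

    def __init__(self, channels: int):
        super().__init__()
        self.pools = nn.ModuleList([
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (3, 5, 7)
        ])
        # fuse the three pooled maps (each 2*channels wide) into one channel
        self.squeeze = nn.Conv2d(2 * channels * 3, 1, 1)

    def forward(self, f_ref: torch.Tensor, f_dis: torch.Tensor) -> torch.Tensor:
        x = torch.cat([f_ref, f_dis], dim=1)              # concat on the channel axis
        pooled = torch.cat([p(x) for p in self.pools], dim=1)
        b, _, h, w = pooled.shape
        # softmax over spatial positions indicates where semantic contours exist
        ind = torch.softmax(self.squeeze(pooled).view(b, 1, -1), dim=-1).view(b, 1, h, w)
        return ind * f_ref                                # reference semantics to learn
```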
The invention constructs the degraded-image semantic information extraction branch and difference information extraction branch from two network modules with the same structure as the teacher branches; together they form the student network. Both branches take the degraded image as input and learn to extract its semantic information features and difference information features, denoted f^{S_sem} and f^{S_diff} respectively. To transfer the reference information from the teacher networks to the student network, the invention uses feature distillation: as shown in fig. 1, the feature map output by each residual block of the degraded-image semantic branch and difference branch is matched against the corresponding feature maps extracted by teacher network 1 (reference semantic branch) and teacher network 2 (pixel difference branch), so that the student network imitates the teachers' outputs and the reference information migrates to the student network.
S600, before inputting the image to be evaluated into the trained student network for quality evaluation, further includes S510, fusing the semantic features and the difference features of the degraded image, specifically:
using channel-wise max pooling to obtain the maximum response over all feature maps; these responses indicate the locations where semantic information exists;
using the channel selection module to adjust the semantic information with the semantic difference features, selecting the most relevant semantic information for each channel of the semantic difference feature map;
multiplying the channel maximum response map into the semantic difference feature map to select distortion at the locations where semantic information exists, and adding it to the adjusted semantic information features to obtain the multi-information fusion feature map.
The two student branches extract features from semantic information and difference information respectively, but not every distortion type causes semantic distortion; for such distortions only pixel difference information can be extracted and evaluated. In that case the masking effect severely harms evaluation: in some high-frequency regions, complex texture hides the distortion from the human eye, so even a distorted region does not affect human perception of the image, yet a neural network still extracts distortion features there and uses them for scoring, driving the model's score below the subjective score of the distorted image (lower meaning worse quality).
To solve this, semantic information must be introduced into the difference features: distortion on an image object affects the human score more than distortion in the background, so fusing the two alleviates the masking problem to some extent. We therefore blend the semantic information and difference information features to better indicate the distortion locations that most affect human scores. The structure is shown in fig. 5: channel-wise max pooling obtains the maximum response over all feature maps, marking where semantic information exists; the channel selection module then adjusts the semantic information with the semantic difference features, selecting the most relevant semantic information for each channel of the semantic difference feature map; the channel maximum response map is multiplied into the semantic difference feature map to select distortion where semantic information exists, and added to the adjusted semantic features to give the multi-information fusion feature map. The model then further extracts the fused distortion features and the gradient features with the third residual blocks of two ResNet-50 backbones.
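A minimal sketch of this fusion step, assuming the ChannelSelection module sketched earlier and matching channel widths between the two branches:

```python
import torch

def fuse(f_sem: torch.Tensor, f_sd: torch.Tensor, channel_select) -> torch.Tensor:
    """Combine semantic features with semantic difference features.

    channel_select is assumed to be the channel selection module, queried with
    the semantic difference features to adjust the semantic information.
    """
    max_resp = f_sem.max(dim=1, keepdim=True).values   # B x 1 x H x W: where semantics exist
    adjusted = channel_select(f_sd, f_sem)             # most relevant semantics per channel
    return max_resp * f_sd + adjusted                  # distortion where semantics exist
```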
S600 is likewise preceded by S520, extracting the gradient features of the degraded image as auxiliary information for quality evaluation, specifically:
extracting the image gradient with the Scharr operator:

I_g = scharr(I_dis)

where I_g is the gradient map of the distorted image.
Gradient information also matters for image quality assessment: it indicates distortion of the image structure and reduces the influence of the masking effect. The invention therefore extracts features from the gradient map of the degraded image with a gradient feature extraction network and uses them as an index for predicting final image quality. The gradient map is computed with a gradient operator; experiments with the Prewitt, Sobel, Robert and Scharr operators showed the Scharr gradient map works best for this model, so the Scharr operator is adopted.
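A sketch of the gradient-map computation with OpenCV's Scharr operator; combining the x and y responses into a magnitude, and grayscale conversion, are assumptions not spelled out in the text:

```python
import cv2
import numpy as np

def scharr_gradient(i_dis: np.ndarray) -> np.ndarray:
    """Gradient map I_g of the distorted image via the Scharr operator."""
    gray = cv2.cvtColor(i_dis, cv2.COLOR_BGR2GRAY)
    gx = cv2.Scharr(gray, cv2.CV_32F, 1, 0)   # horizontal gradient
    gy = cv2.Scharr(gray, cv2.CV_32F, 0, 1)   # vertical gradient
    return cv2.magnitude(gx, gy)              # I_g = sqrt(gx^2 + gy^2)
```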
S600, inputting the image to be evaluated into the trained student network for quality evaluation comprises:
applying global average pooling to the gradient features and the distortion features extracted by the third residual blocks of the two ResNet-50 backbones in the student network, concatenating them on the channel axis, and feeding them to the regression layers;
the three fully connected layers have 1024, 2048 and 2048 nodes respectively, and finally the image quality score is output.
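A sketch of this regression head. The fully connected widths (1024, 2048, 2048) follow the text; the input width (two global-average-pooled 1024-channel ResNet-50 stage-3 maps, hence 2048) and the ReLU activations are assumptions:

```python
import torch
import torch.nn as nn

class QualityRegressor(nn.Module):
    """Pools gradient and distortion features, concatenates them on the
    channel axis, and regresses a single quality score."""

    def __init__(self, in_dim: int = 2048):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)       # global average pooling
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, 2048), nn.ReLU(),
            nn.Linear(2048, 1),                  # final image quality score
        )

    def forward(self, f_dist: torch.Tensor, f_grad: torch.Tensor) -> torch.Tensor:
        x = torch.cat([self.gap(f_dist), self.gap(f_grad)], dim=1).flatten(1)
        return self.mlp(x)
```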
S500 is followed by S530, computing the quality score loss of the degraded image to train the student network, specifically:
an L2 loss measures the difference between the image quality score predicted by the student network and the ground-truth score:

$$L_q = \frac{1}{N}\sum_{i=1}^{N}\left( q_i - \hat{q}_i \right)^2$$

where q_i and \hat{q}_i are the label of the degraded image and the quality score predicted by the student network respectively; the overall loss of the model is defined as:

$$L = L_q + \lambda L_{diff} + \lambda L_{sd} + \lambda L_{si}$$

where λ is the distillation weight.
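A minimal sketch of this objective; the value of the distillation weight λ is not specified in the text and the function signature is illustrative:

```python
import torch

def total_loss(q_pred, q_label, l_diff, l_sd, l_si, lam: float = 1.0):
    """Overall objective L = L_q + lambda * (L_diff + L_sd + L_si)."""
    l_q = torch.mean((q_label - q_pred) ** 2)   # L2 quality score loss L_q
    return l_q + lam * (l_diff + l_sd + l_si)
```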
For performance testing, the trained model is evaluated on the test set with several metrics. At test time the teacher networks are removed outright, leaving a no-reference model: only the distorted image and its gradient map are fed into the student network and the gradient feature extraction network respectively for feature extraction and quality scoring.
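A sketch of the resulting test-time flow, with the same assumed module interfaces as in the training-step sketch earlier:

```python
import torch

@torch.no_grad()
def evaluate(student, fusion, grad_net, regressor, i_dis, i_grad) -> torch.Tensor:
    """No-reference scoring: teachers dropped, only student branches,
    the fusion module and the gradient branch remain."""
    f_sem, f_diff = student(i_dis)
    return regressor(fusion(f_sem, f_diff), grad_net(i_grad))
```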
The model is trained and tested on the LIVE, CSIQ and TID2013 databases, evaluated with two classical metrics, SROCC and PLCC, and compared against other deep learning models. The application and 5 advanced IQA methods were tested on the 3 databases, with the first-, second- and third-ranked models marked red, green and blue respectively in the original tables. The model ranks within the top three on all three databases and first on TID2013, showing that introducing reference information into the NR-IQA method greatly improves its performance. On TID2013, the database with the most distortion types, the model improves SROCC by 1.7% over the second-ranked model, indicating that it is more general and adapts to various distortion types. To further show that the model handles multiple distortions, the application tested each distortion of TID2013 separately; the results are shown in table 2, with the best model marked red. The model achieves optimal performance on 15 distortion types, indicating strong generalization and effectiveness against diverse distortions.
Table 2: SROCC results of 6 IQA methods on each distortion type of TID2013
Example 2
A system for reference-free image quality assessment for semantic and structural distillation, comprising:
the first data processing module is used for constructing an image training set and dividing the image training set into a reference image set and a degraded image set;
the first teacher network training module is used for training the teacher network 1 for extracting the image semantic information by using the reference image set;
a second teacher network training module for training the teacher network 2 for image difference information extraction with the reference image set;
the first student network training module is used for training the student network with teacher networks 1 and 2 and the difference information distillation loss;
the second student network training module is used for training the student network with teacher networks 1 and 2 and the semantic information distillation loss;
and the second data processing module is used for inputting the image to be evaluated into the trained student network to perform image quality evaluation.
The system constructs an image training set and divides it into a reference image set and a degraded image set; trains teacher network 1 for image semantic information extraction with the reference image set; trains teacher network 2 for image difference information extraction with the reference image set; trains a student network with teacher networks 1 and 2 and a difference information distillation loss; trains the student network with teacher networks 1 and 2 and a semantic information distillation loss; and inputs the image to be evaluated into the trained student network for quality evaluation. Transferring reference information to the no-reference network by knowledge distillation remedies the no-reference network's lack of reference information; across several databases the method outperforms typical no-reference network models and even exceeds image quality evaluation models that rely on image restoration techniques.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for reference-free image quality assessment for semantic and structural distillation, comprising:
constructing an image training set, and dividing the image training set into a reference image set and a degraded image set;
training a teacher network 1 for extracting image semantic information by using a reference image set;
training a teacher network 2 for image difference information extraction by using a reference image set;
training a student network with teacher networks 1 and 2 and a difference information distillation loss;
training the student network with teacher networks 1 and 2 and a semantic information distillation loss;
and inputting the image to be evaluated into a trained student network for image quality evaluation.
2. The semantic and structural distillation no-reference image quality evaluation method of claim 1, wherein training teacher network 1 for image semantic information extraction with the reference image set includes:
teacher network 1 uses the first convolution layer and the first two residual blocks of ResNet-50 as a feature extraction module for extracting features of the input image;
inputting the reference image I_ref into teacher network 1;
teacher network 1 outputs f_i^{T1}, the semantic features of the reference image extracted by the i-th residual block.
3. The semantic and structural distillation reference-free image quality evaluation method as claimed in claim 1, wherein said training a teacher network 2 for image difference information extraction by using the reference image set comprises:
inputting the pixel difference map I_diff into the teacher network 2;
the pixel difference map is obtained by subtracting the degraded image from the reference image and taking the absolute value, calculated as:
I_diff = |I_ref - I_dis|
wherein I_ref is the reference image, I_dis is the degraded image, and I_diff is the difference map;
the teacher network 2 outputs f_i^T2, where f_i^T2 represents the features extracted from the difference map by the i-th residual block;
the semantic-level difference information features are obtained by subtracting the reference image semantic features extracted by the reference image semantic information extraction branch from the distortion map semantic features extracted by the degraded image semantic information extraction branch.
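A minimal sketch of these two difference computations, assuming torch tensors in a shared value range; pixel_difference_map and semantic_difference are hypothetical helper names.

import torch

def pixel_difference_map(i_ref: torch.Tensor, i_dis: torch.Tensor) -> torch.Tensor:
    """I_diff = |I_ref - I_dis|: the pixel-level difference map fed to teacher network 2."""
    return (i_ref - i_dis).abs()

def semantic_difference(f_ref: torch.Tensor, f_dis: torch.Tensor) -> torch.Tensor:
    """Semantic-level difference features: distortion-map semantic features minus
    reference-map semantic features, per the subtraction described above."""
    return f_dis - f_ref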
4. The semantic and structural distillation reference-free image quality evaluation method according to claim 1, wherein said training the student network by using the teacher networks 1, 2 and the difference information distillation loss comprises:
inputting the reference images one by one into the teacher networks and computing their feature attention mappings for each picture: the key features k = φ(f^T) and the value features v = λ(f^T);
inputting the degraded images one by one into the student network and computing its feature attention mapping for each picture: the query features q = θ(f^S);
wherein θ, φ and λ are three 1×1 convolutions that compress the number of original feature map channels to C' and screen out redundant channel features;
calculating the semantic difference distillation loss:
L_sd = (1/N) Σ_{i=1..N} ||F_cnl_sd(f_i^sd) - f_i^S||²
calculating the pixel difference distillation loss:
L_diff = (1/N) Σ_{i=1..N} ||F_cnl_diff(f_i^diff) - f_i^S||²
wherein N represents the number of training samples, f_i^sd and f_i^diff are the semantic difference feature map and the pixel difference feature map respectively, f_i^S is the distortion map feature extracted by the degraded image difference information extraction branch, and F_cnl_sd and F_cnl_diff are the channel query modules for the semantic difference features and the pixel difference features respectively;
updating parameters in the student network according to the loss function until the parameters in the student network converge.
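The following sketch, assuming PyTorch, shows one plausible reading of the q/k/v channel query module: a non-local attention in which θ, φ and λ are the three 1×1 channel-compressing convolutions, followed by an assumed L2 feature-matching loss. The class ChannelQuery and the exact attention layout are illustrative, not taken verbatim from the patent.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelQuery(nn.Module):
    """Assumed channel query module: q from student features, k/v from teacher features."""
    def __init__(self, c: int, c_prime: int):
        super().__init__()
        # theta, phi, lambda: 1x1 convolutions compressing C channels to C'
        self.theta = nn.Conv2d(c, c_prime, 1)  # query
        self.phi = nn.Conv2d(c, c_prime, 1)    # key
        self.lam = nn.Conv2d(c, c_prime, 1)    # value

    def forward(self, f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        b, _, h, w = f_s.shape
        q = self.theta(f_s).flatten(2)                       # B x C' x HW
        k = self.phi(f_t).flatten(2)
        v = self.lam(f_t).flatten(2)
        attn = torch.softmax(q @ k.transpose(1, 2), dim=-1)  # B x C' x C' channel affinity
        return (attn @ v).reshape(b, -1, h, w)               # teacher values re-queried per channel

def distillation_loss(f_queried: torch.Tensor, f_student: torch.Tensor) -> torch.Tensor:
    """Assumed L2 form of L_sd / L_diff: match queried teacher features to student features."""
    return F.mse_loss(f_queried, f_student)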
5. The semantic and structural distillation reference-free image quality evaluation method according to claim 1, wherein said training the student network by using the teacher networks 1, 2 and the semantic information distillation loss comprises:
splicing the reference image features and the degraded image features along the channel dimension, and performing distortion filtering at different scales through three max-pooling layers of different sizes, so that only the semantic contour information shared by the reference image and the degraded image is retained;
activating with a softmax module to indicate the positions where semantic information exists, obtaining a semantic contour information indication map; multiplying the indication map into the reference feature map to obtain the reference semantic information to be learned by the student network, which the student network learns through feature matching;
calculating the semantic information distillation loss:
L_si = (1/N) Σ_{i=1..N} ||F_SSIIM(f_i^ref, f_i^dis) ⊗ f_i^ref - f_i^si||²
wherein f_i^ref and f_i^si are the reference image semantic feature map and the no-reference network semantic information branch feature map respectively, ⊗ represents matrix multiplication, and F_SSIIM denotes the semantic contour information indication module;
updating parameters in the student network according to the loss function until the parameters in the student network converge.
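A hedged PyTorch sketch of the semantic contour indication step follows; the pooling kernel sizes (3, 5, 7) and the 1×1 fusion convolution are illustrative assumptions layered on the three max-pooling layers and softmax activation named in the claim.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticIndication(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        # Three max-pooling layers of different sizes filter distortion at multiple scales.
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (3, 5, 7)]
        )
        self.fuse = nn.Conv2d(2 * c * 3, c, 1)  # assumed fusion back to C channels

    def forward(self, f_ref: torch.Tensor, f_dis: torch.Tensor) -> torch.Tensor:
        x = torch.cat([f_ref, f_dis], dim=1)                  # splice on channel
        x = torch.cat([p(x) for p in self.pools], dim=1)      # multi-scale filtering
        ind = torch.softmax(self.fuse(x).flatten(2), dim=-1)  # softmax over spatial positions
        return ind.view_as(f_ref) * f_ref                     # indicated reference semantics

def semantic_info_loss(f_student: torch.Tensor, indicated_ref: torch.Tensor) -> torch.Tensor:
    """Assumed feature-matching form of the semantic information distillation loss."""
    return F.mse_loss(f_student, indicated_ref)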
6. The semantic and structural distillation reference-free image quality evaluation method according to claim 1, wherein before inputting the image to be evaluated into the trained student network for image quality evaluation, the method further comprises performing feature fusion on the semantic features and the difference features of the degraded image, specifically comprising:
obtaining the maximum response across all feature map channels by channel maximization, the response representing the locations where semantic information exists;
utilizing a channel selection module to adjust the semantic information with the semantic difference features, selecting the most relevant semantic information for each channel of the semantic difference feature map;
multiplying the channel maximum response map into the semantic difference feature map to select the distortion at the locations where semantic information exists, and adding it to the adjusted semantic information features to obtain a multi-information fusion feature map.
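A minimal sketch of this fusion step, assuming PyTorch; the channel selection module is reduced here to a 1×1 convolution as an illustrative stand-in, not the patent's exact design.

import torch
import torch.nn as nn

class MultiInfoFusion(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.select = nn.Conv2d(c, c, 1)  # assumed channel selection module

    def forward(self, f_semantic: torch.Tensor, f_sem_diff: torch.Tensor) -> torch.Tensor:
        # Channel maximization: the strongest response over channels marks semantic locations.
        max_resp, _ = f_semantic.max(dim=1, keepdim=True)  # B x 1 x H x W
        adjusted = self.select(f_sem_diff) * f_semantic    # semantics adjusted by differences
        selected = max_resp * f_sem_diff                   # distortion where semantics exist
        return selected + adjusted                         # multi-information fusion map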
7. The semantic and structural distillation reference-free image quality evaluation method according to claim 1, wherein before inputting the image to be evaluated into the trained student network for image quality evaluation, the method further comprises extracting degraded image gradient features as auxiliary information for image quality evaluation, specifically comprising:
extracting the image gradient with the Scharr operator, calculated as:
I_g = Scharr(I_dis)
wherein I_g is the gradient map of the distorted image.
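A minimal sketch of the Scharr gradient extraction, assuming OpenCV; taking the gradient magnitude over the x/y responses is an assumption, as the claim only names the operator.

import cv2
import numpy as np

def scharr_gradient(i_dis: np.ndarray) -> np.ndarray:
    """I_g = Scharr(I_dis): gradient map of the distorted image."""
    gray = cv2.cvtColor(i_dis, cv2.COLOR_BGR2GRAY) if i_dis.ndim == 3 else i_dis
    gx = cv2.Scharr(gray, cv2.CV_32F, 1, 0)  # horizontal derivative
    gy = cv2.Scharr(gray, cv2.CV_32F, 0, 1)  # vertical derivative
    return cv2.magnitude(gx, gy)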
8. The semantic and structural distillation reference-free image quality evaluation method according to claim 1, wherein said inputting the image to be evaluated into the trained student network for image quality evaluation comprises:
performing global average pooling on the gradient features and the distortion features extracted by the third residual blocks of the two ResNet-50 branches in the student network, splicing them along the channel dimension, and feeding them into the fully connected layers of the student network;
wherein the three fully connected layers have 1024, 2048 and 2048 nodes respectively, and finally the image quality score is output.
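A hedged sketch of this regression head, assuming PyTorch; the input width (two 1024-channel third-stage feature maps) and the final one-node output layer are assumptions layered on the stated 1024/2048/2048 widths.

import torch
import torch.nn as nn

class QualityHead(nn.Module):
    def __init__(self, in_ch: int = 2048):  # assumed: 1024 gradient + 1024 distortion channels
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.fc = nn.Sequential(
            nn.Linear(in_ch, 1024), nn.ReLU(),
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, 2048), nn.ReLU(),
            nn.Linear(2048, 1),             # assumed scalar quality-score output
        )

    def forward(self, f_grad: torch.Tensor, f_dist: torch.Tensor) -> torch.Tensor:
        x = torch.cat([self.gap(f_grad), self.gap(f_dist)], dim=1).flatten(1)  # splice on channels
        return self.fc(x)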
9. The semantic and structural distillation reference-free image quality evaluation method according to claim 1, wherein after training the student network by using the teacher networks 1, 2 and the semantic information distillation loss, the method further comprises training the student network by calculating a quality score loss for the degraded images, specifically comprising:
using the L2 loss to measure the difference between the image quality score evaluated by the student network and the labeled score, defined as:
L_q = (1/N) Σ_{i=1..N} (q_i - q̂_i)²
wherein q_i and q̂_i are the labeled quality score of the degraded image and the quality score evaluated by the student network respectively; the overall loss of the model is defined as:
L = L_q + λL_diff + λL_sd + λL_si
where λ is the distillation weight.
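A minimal sketch of the overall loss of claim 9, assuming the three distillation losses are already computed as scalar tensors; lam stands for the shared distillation weight λ.

import torch
import torch.nn.functional as F

def total_loss(q_pred: torch.Tensor, q_label: torch.Tensor,
               l_diff: torch.Tensor, l_sd: torch.Tensor, l_si: torch.Tensor,
               lam: float = 1.0) -> torch.Tensor:
    l_q = F.mse_loss(q_pred, q_label)          # L2 quality-score loss L_q
    return l_q + lam * (l_diff + l_sd + l_si)  # L = L_q + λL_diff + λL_sd + λL_si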
10. A semantic and structural distillation reference-free image quality evaluation system, comprising:
the first data processing module is used for constructing an image training set and dividing the image training set into a reference image set and a degraded image set;
the first teacher network training module is used for training the teacher network 1 for extracting the image semantic information by using the reference image set;
A second teacher network training module for training the teacher network 2 for image difference information extraction with the reference image set;
the first student network training module is used for training the student network by utilizing the teacher networks 1, 2 and the difference information distillation loss;
the second student network training module is used for training the student network by utilizing the teacher networks 1, 2 and the semantic information distillation loss;
and the second data processing module is used for inputting the image to be evaluated into the trained student network to perform image quality evaluation.
CN202311135174.4A 2023-09-04 2023-09-04 Semantic and structural distillation reference-free image quality evaluation method Pending CN117274173A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311135174.4A CN117274173A (en) 2023-09-04 2023-09-04 Semantic and structural distillation reference-free image quality evaluation method

Publications (1)

Publication Number Publication Date
CN117274173A true CN117274173A (en) 2023-12-22

Family

ID=89209601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311135174.4A Pending CN117274173A (en) 2023-09-04 2023-09-04 Semantic and structural distillation reference-free image quality evaluation method

Country Status (1)

Country Link
CN (1) CN117274173A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593296A (en) * 2024-01-18 2024-02-23 厦门大学 No-reference image quality evaluation method based on diffusion model
CN117593296B (en) * 2024-01-18 2024-05-31 厦门大学 No-reference image quality evaluation method based on diffusion model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination