CN115661114A - Full-reference image quality evaluation method based on Conformer and meta learning - Google Patents


Info

Publication number: CN115661114A
Application number: CN202211400068.XA
Authority: CN (China)
Prior art keywords: image, reference image, similarity, distorted, model
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 周明亮, 郎书君, 蒲华燕, 罗均, 张太平, 尚赵伟, 向涛, 房斌
Current Assignee: Chongqing University
Original Assignee: Chongqing University
Application filed by Chongqing University; priority to CN202211400068.XA

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a full-reference image quality evaluation method based on Conformer and meta-learning, which comprises the following steps: acquiring a distorted image to be predicted and a corresponding reference image; inputting the distorted image and the corresponding reference image into a pre-trained image quality evaluation model, the model comprising a feature extraction module, a similarity calculation module, a weight calculation module, an evaluation score module, and a meta-learning-based training and testing module; and outputting a prediction score corresponding to the distorted image. Built on these five modules, the method accurately predicts the quality score of the distorted image to be predicted while preserving the generalization capability of the model when facing small-sample data sets with various distortion types.

Description

Full-reference image quality evaluation method based on Conformer and meta-learning
Technical Field
The invention relates to the technical field of image and video processing, in particular to a full-reference image quality evaluation method based on Conformer and meta-learning.
Background
In fields such as image denoising, image super-resolution, and image compression, the degree of image distortion is a key evaluation index. Perceived quality scores obtained from human observers are accurate, but collecting them requires substantial manpower and material resources.
Traditional full-reference image quality evaluation methods generally rely on hand-crafted feature descriptors to extract, from the distorted image, distortion-related features that represent its difference from the reference image; however, hand-crafted descriptors can hardly cover all distortion types.
In deep-learning-based full-reference image quality evaluation, if the conventional single gradient descent method is adopted during training, then when the training set and the test set are small-sample data sets of two different distortion types, the distortion discrimination ability the model learns from one data set is difficult to transfer to the other, which reduces the generalization capability of the model.
Therefore, building on existing full-reference image quality evaluation methods, how to perceive image quality accurately while preserving model generalization on small-sample data sets with various distortion types has become a problem to be solved in this field.
Disclosure of Invention
The invention aims to provide a full-reference image quality evaluation method based on Conformer and meta-learning, which can accurately predict the quality score of an image while preserving the generalization capability of the model when facing small-sample data sets of different distortion types.
In order to achieve this purpose, the invention adopts the following technical scheme:
The invention provides a full-reference image quality evaluation method based on Conformer and meta-learning, which comprises the following steps:
s1, obtaining a distorted image to be predicted and a corresponding reference image;
s2, inputting the distorted image and the corresponding reference image into an image quality evaluation model which is trained in advance; the image quality evaluation model includes: the device comprises a feature extraction module, a similarity calculation module, a weight calculation module, an evaluation score module and a training and testing module based on meta-learning;
and S3, outputting the prediction score corresponding to the distorted image.
Further, the training process of the image quality evaluation model in step S2 includes:
S21, inputting distorted images of various types and their corresponding reference images, as the training data set, into a twin network with Conformer as the feature extraction module, and extracting the feature vectors corresponding to the distorted images and the reference images respectively;
S22, in the similarity calculation module, computing the similarity between the feature vectors of the distorted image and of the reference image in two preset ways;
S23, in the weight calculation module, fusing the feature vector of the distorted image with the feature vector of the reference image, then mapping the fused features through several fully connected layers and activation functions into the weight of the feature-vector similarity;
S24, combining the similarities from step S22 with the weight from step S23 to calculate the final feature-vector similarity of the distorted and reference images, thereby obtaining the final quality evaluation score of the distorted image;
S25, adding meta-learning to the training and testing process of the model: in the training stage, the model learns to adjust its current initialization parameters for a subset of distortion types, using a dual gradient descent method from support set to query set; in the testing stage, the model, which after training can discriminate distortion types, is fine-tuned with a small number of images of other distortion types, yielding the final image quality evaluation model.
Further, in step S21, the images are input into a twin network with Conformer as the feature extraction module and the feature vectors corresponding to the distorted image and the reference image are extracted, as follows:
respectively inputting the distorted image and the corresponding reference image into a feature extraction network with Conformer as the backbone; extracting the corresponding global features with a self-attention mechanism, and the corresponding local features with convolution operations;
after the global and local features are embedded into each other, changing their dimensionality and extracting the feature vectors corresponding to the distorted image and the reference image respectively.
Further, in step S22, calculating the similarity between the feature vectors of the distorted image and of the reference image in two preset ways includes:
calculating the similarity between the two feature vectors with the ProjectedDotProductSimilarity method:

similarity_1 = V_1^T w_1 (V_2^T w_2)^T (1)

In formula (1), V_1 represents the feature vector obtained after the reference image passes through the twin network; V_2 represents the feature vector obtained after the distorted image passes through the twin network; w_1 and w_2 are randomly initialized vectors that adjust the dimension of the feature vectors;
calculating the similarity between the two feature vectors with the TriLinearSimilarity method:

similarity_2 = w^T [V_1, V_2, V_1*V_2] + b (2)

In formula (2), V_1*V_2 represents the vector obtained by element-wise multiplication of the two feature vectors; [V_1, V_2, V_1*V_2] represents the result of concatenating the three vectors along the last dimension; w represents a randomly initialized vector that adjusts the dimension of the concatenated vector, and b represents a bias term that fine-tunes the similarity result.
Further, the step S23 includes:
S231, fusing the feature vector V_1 of the reference image and the feature vector V_2 of the distorted image with a concat operation:

V_fuse = cat(V_1, V_2, 1) (3)

S232, after obtaining the fused feature V_fuse, mapping V_fuse through several fully connected layers and activation functions, the result being used as the weight parameter of the similarity:

ŵ = f(FC_2(FC_1(V_fuse))) (4)

where FC_1 and FC_2 are fully connected layers, and f represents the sigmoid activation function.
Further, the step S24 includes:
combining the weight parameter ŵ to calculate the final similarity between the distorted-image feature vector and the reference-image feature vector:

similarity = ŵ · similarity_1 + (1 − ŵ) · similarity_2 (5)

where similarity represents the similarity between the distorted image and the reference image and serves as the final quality prediction score of the distorted picture.
Further, the step S25 of adding meta-learning to the training and testing process of the model specifically includes:
S251, during training, selecting a subset of distortion types and dividing the small-sample data set into subtasks, each subtask being divided into a support set and a query set;
S252, setting the loss functions of the support set and the query set in each subtask;
S253, during training, calculating the first-order gradient of the loss function corresponding to the support set of the current subtask, and updating the current model parameters through an Adam optimizer, thereby completing the gradient optimization corresponding to the support set;
S254, during training, starting from the model parameters produced by step S253, inputting the query set of the current subtask and updating the current model parameters again through the Adam optimizer, thereby completing the gradient optimization corresponding to the query set; then integrating the gradient update results of all subtasks to obtain prior model parameters that guide the testing stage;
S255, during testing, inputting a small number of images of other distortion types and updating the model carrying the prior parameters with the Adam optimizer, obtaining the final image quality evaluation model.
Further, the step S252 includes:
setting the loss functions of the support set and the query set in the subtask as

L(e, s) = (e − s)^2

where e represents the quality label score corresponding to the distorted image, and s represents the quality prediction score corresponding to the distorted image.
Compared with the prior art, the invention has the following beneficial effects:
the embodiment of the invention provides a full-reference image quality evaluation method based on Conformer and meta learning, which comprises the following steps: acquiring a distortion image to be predicted and a corresponding reference image; inputting the distorted image and the corresponding reference image into an image quality evaluation model which is trained in advance; the image quality evaluation model includes: the device comprises a feature extraction module, a similarity calculation module, a weight calculation module, an evaluation score module and a training and testing module based on meta-learning; and outputting a prediction score corresponding to the distorted image. The method is based on the image quality evaluation model, and can realize accurate prediction of the quality score of the distorted image to be predicted while ensuring the generalization capability of the model when facing small sample data sets of various distortion types through the constructed feature extraction module, the similarity calculation module, the weight calculation module, the evaluation score module and the training and testing module based on meta-learning.
Drawings
FIG. 1 is a flowchart of a full-reference image quality evaluation method based on Conformer and meta learning according to an embodiment of the present invention;
FIG. 2 is a flowchart of an image quality evaluation model training process according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the full-reference image quality evaluation method based on Conformer and meta-learning according to an embodiment of the present invention.
Detailed Description
In order to make the technical means, creative features, objectives, and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
In the description of the present invention, it should be noted that the terms "upper", "lower", "inner", "outer", "front", "rear", "both ends", "one end", "the other end", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "disposed," "connected," and the like are to be construed broadly, such as "connected," which may be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Referring to fig. 1, the full-reference image quality evaluation method based on Conformer and meta-learning according to the present invention includes the following steps:
s1, obtaining a distorted image to be predicted and a corresponding reference image;
s2, inputting the distorted image and the corresponding reference image into an image quality evaluation model which is trained in advance; the image quality evaluation model includes: the system comprises a feature extraction module, a similarity calculation module, a weight calculation module, an evaluation score module and a training and testing module based on meta-learning;
and S3, outputting a prediction score corresponding to the distorted image.
In step S1, taking the image compression field as an example, the distorted image refers to a compressed image and the reference image refers to the image before compression; the picture content of the two images is consistent, but compared with the reference image the distorted image carries noise and/or lower resolution, so it appears somewhat blurred to human visual perception. The image quality evaluation model in step S2 includes: a feature extraction module, a similarity calculation module, a weight calculation module, an evaluation score module, and a meta-learning-based training and testing module.
The feature extraction module extracts the feature vector of an image, formed by fusing the global and local features of the image, and shares the network weight parameters when extracting the features of the distorted image and of the reference image. The similarity calculation module calculates the similarity of the image feature vectors. The weight calculation module calculates the similarity weight after the feature vectors of the distorted image and the reference image are fused. The evaluation score module calculates the final quality evaluation score of the distorted image from the outputs of the similarity and weight calculation modules. The meta-learning-based training and testing module adds meta-learning to the training and testing process, so that the model can adjust its current initialization parameters for different distortion types and thereby improve its generalization capability.
In step S3, the prediction score corresponding to the distorted image is output by the pre-trained image quality evaluation model. When facing small-sample data sets with various distortion types, the method accurately predicts the quality score of the distorted image to be predicted while guaranteeing the generalization capability of the model.
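As a concrete illustration of steps S1 to S3, the sketch below shows how a pre-trained model might be invoked at inference time. It is a minimal sketch only: the model object iqa_model, the 224×224 input size, and the function names are illustrative assumptions, not part of the patent.

```python
# Minimal inference sketch for steps S1-S3. `iqa_model` is assumed to be a
# pre-trained module whose forward pass takes (distorted, reference) tensors
# and returns a scalar quality score; all names here are illustrative.
import torch
from torchvision import transforms
from PIL import Image

to_tensor = transforms.Compose([
    transforms.Resize((224, 224)),   # assumed input size
    transforms.ToTensor(),
])

def predict_quality(iqa_model: torch.nn.Module,
                    distorted_path: str, reference_path: str) -> float:
    """S1: load the distorted image and its reference; S2/S3: run the model."""
    dist = to_tensor(Image.open(distorted_path).convert("RGB")).unsqueeze(0)
    ref = to_tensor(Image.open(reference_path).convert("RGB")).unsqueeze(0)
    iqa_model.eval()
    with torch.no_grad():
        score = iqa_model(dist, ref)   # S3: prediction score of the distortion
    return float(score)
```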
In one embodiment, referring to fig. 2, the training process of the image quality evaluation model in step S2 includes the following steps:
S21, inputting distorted images of various types and their corresponding reference images, as the training data set, into a twin network with Conformer as the feature extraction module, and extracting the feature vectors corresponding to the distorted images and the reference images respectively;
S22, in the similarity calculation module, computing the similarity between the feature vectors of the distorted image and of the reference image in two preset ways;
S23, in the weight calculation module, fusing the feature vector of the distorted image with the feature vector of the reference image, then mapping the fused features through several fully connected layers and activation functions into the weight of the feature-vector similarity;
S24, combining the similarities from step S22 with the weight from step S23 to calculate the final feature-vector similarity of the distorted and reference images, thereby obtaining the final quality evaluation score of the distorted image;
S25, adding meta-learning to the training and testing process of the model: in the training stage, the model learns to adjust its current initialization parameters for a subset of distortion types, using a dual gradient descent method from support set to query set; in the testing stage, the model, which after training can discriminate distortion types, is fine-tuned with a small number of images of other distortion types, yielding the final image quality evaluation model.
In step S21, distorted images of multiple types and their corresponding reference images are used as the training data set; the distortion types include, for example, white noise, Gaussian blur, and image compression.
The distorted image and the corresponding reference image are input into a twin network with Conformer as the feature extraction module. The feature vectors corresponding to the reference image and the distorted image are V_1 and V_2; both are formed by combining the global and local features of the image. The feature extraction modules of the distorted image and the reference image share the network weight parameters. Specifically, the reference image is input into a feature extraction network with Conformer as the backbone; its global features are extracted with a self-attention mechanism and its local features with convolution operations; after the global and local features are embedded into each other, their dimensionality is changed, yielding the feature vector V_1 of the reference image. A twin network sharing the internal Conformer parameters is constructed, the distorted image is sent through the same feature extraction network, and its feature vector V_2 is obtained in the same way as for the reference image.
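The sketch below illustrates the twin (weight-sharing) structure and the dual local/global branches described above. It is a simplified stand-in, not the actual Conformer: the real network interleaves convolution and transformer blocks with feature coupling units, whereas here one convolution branch and one self-attention layer with a crude mutual embedding stand in for it, and all dimensions and module names are assumptions.

```python
# Simplified stand-in for the Conformer-based twin feature extractor: a
# convolutional branch for local features, a self-attention branch for
# global features, a crude mutual embedding, and weight sharing across
# the two input images.
import torch
import torch.nn as nn

class TwinFeatureExtractor(nn.Module):
    def __init__(self, dim: int = 256, patch: int = 16):
        super().__init__()
        # local branch: plain convolutions
        self.local = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=patch, stride=patch),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
        )
        # global branch: patch embedding + one self-attention layer
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def extract(self, x):
        loc = self.local(x)                                # (B, dim, H', W')
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        glb, _ = self.attn(tokens, tokens, tokens)         # global self-attention
        # mutual embedding (simplified): exchange context between branches
        loc = loc + glb.mean(dim=1)[:, :, None, None]
        glb = glb + loc.flatten(2).transpose(1, 2)
        # flatten both branches into one feature vector per image
        return torch.cat([loc.flatten(1), glb.flatten(1)], dim=1)

    def forward(self, distorted, reference):
        # twin network: the same weights process both images
        return self.extract(reference), self.extract(distorted)
```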
In step S22, after the feature extraction network, the similarity between the distorted-image feature vector and the reference-image feature vector is calculated with the ProjectedDotProductSimilarity method:

similarity_1 = V_1^T w_1 (V_2^T w_2)^T (1)

In the above formula, V_1 represents the feature vector obtained after the reference image passes through the twin network, V_2 the feature vector obtained after the distorted image passes through the twin network, and w_1 and w_2 are randomly initialized vectors that adjust the dimension of the feature vectors. The similarity between the two feature vectors is also calculated with the TriLinearSimilarity method:

similarity_2 = w^T [V_1, V_2, V_1*V_2] + b (2)

In the above formula, V_1*V_2 represents the vector obtained by element-wise multiplication of the two feature vectors, [V_1, V_2, V_1*V_2] represents the result of concatenating the three vectors along the last dimension, w represents a randomly initialized vector that adjusts the dimension of the concatenated vector, and b represents a bias term that fine-tunes the similarity result.
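The two similarity measures of formulas (1) and (2) could be implemented as below; this is a sketch under one natural reading of the formulas, and the projection size and initialization are illustrative assumptions.

```python
# Sketches of the two similarity measures; V1/V2 are (batch, d) vectors.
import torch
import torch.nn as nn

class ProjectedDotProductSimilarity(nn.Module):
    """One reading of eq. (1): dot product of the two projected vectors."""
    def __init__(self, dim: int, proj: int = 64):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(dim, proj))  # random init, adjusts dims
        self.w2 = nn.Parameter(torch.randn(dim, proj))

    def forward(self, v1, v2):
        return ((v1 @ self.w1) * (v2 @ self.w2)).sum(-1)   # (batch,)

class TriLinearSimilarity(nn.Module):
    """Eq. (2): similarity_2 = w^T [V1, V2, V1*V2] + b."""
    def __init__(self, dim: int):
        super().__init__()
        self.w = nn.Parameter(torch.randn(3 * dim))  # adjusts the concat dim
        self.b = nn.Parameter(torch.zeros(1))        # bias fine-tunes the result

    def forward(self, v1, v2):
        cat = torch.cat([v1, v2, v1 * v2], dim=-1)   # concat on last dimension
        return cat @ self.w + self.b                 # (batch,)
```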
In step S23, to obtain the weight parameter of the similarity, the feature vector V_1 of the reference image and the feature vector V_2 of the distorted image are concatenated along the first dimension of the vectors with the concat operation:

V_fuse = cat(V_1, V_2, 1) (3)

After obtaining the fused feature V_fuse, V_fuse is mapped through several fully connected layers and activation functions, and the result is used as the weight parameter of the similarity:

ŵ = f(FC_2(FC_1(V_fuse))) (4)

where FC_1 and FC_2 are fully connected layers with input and output dimensions (4000, 512) and (512, 1) respectively, and f denotes the sigmoid activation function.
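A sketch of the weight calculation module of formulas (3) and (4) follows, using the stated fully-connected dimensions (4000, 512) and (512, 1); note that a 4000-dimensional fused input implies 2000-dimensional per-image feature vectors in this embodiment.

```python
# Sketch of the weight calculation module of eqs. (3)-(4).
import torch
import torch.nn as nn

class WeightModule(nn.Module):
    def __init__(self, fused_dim: int = 4000, hidden: int = 512):
        super().__init__()
        self.fc1 = nn.Linear(fused_dim, hidden)   # FC_1: (4000, 512)
        self.fc2 = nn.Linear(hidden, 1)           # FC_2: (512, 1)

    def forward(self, v1, v2):
        v_fuse = torch.cat([v1, v2], dim=1)       # eq. (3): concat fusion
        return torch.sigmoid(self.fc2(self.fc1(v_fuse)))  # eq. (4), in (0, 1)
```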
In step S24, after obtaining the weight parameter ŵ, the final similarity between the distorted-image feature vector and the reference-image feature vector is calculated:

similarity = ŵ · similarity_1 + (1 − ŵ) · similarity_2 (5)

similarity represents the similarity between the distorted image and the reference image; the closer the distorted image is to the reference image, the larger the similarity value. The invention therefore takes similarity as the final picture quality prediction score.
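Tying the pieces together, the final score of formula (5) can be computed from the modules sketched above; the convex-combination form is the reconstruction used in formula (5), and the 2000-dimensional feature size is an assumption consistent with the 4000-dimensional fused vector.

```python
# Combining the sketched modules into the final score of eq. (5).
sim1 = ProjectedDotProductSimilarity(dim=2000)
sim2 = TriLinearSimilarity(dim=2000)
weight = WeightModule(fused_dim=4000)

def quality_score(v1, v2):
    w = weight(v1, v2).squeeze(-1)                     # scalar weight per pair
    return w * sim1(v1, v2) + (1 - w) * sim2(v1, v2)   # eq. (5)
```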
In step S25, meta-learning is added to the training and testing process of the model; the model learns to optimally adjust its parameters with the dual gradient descent method of meta-learning, from support set to query set.
In the training process, the training data set is composed of N subtask sets, each consisting of a support set and a query set. The training data set can be expressed as

D_train = {(S_1, Q_1), (S_2, Q_2), ..., (S_N, Q_N)}

where S_i and Q_i denote the support set and the query set of the i-th subtask, and N is the total number of subtasks. The loss functions corresponding to the support set and the query set are set as:
L(e, s) = (e − s)^2

where e represents the quality label score corresponding to the distorted image, and s represents the quality prediction score corresponding to the distorted image. The support set of the i-th subtask is denoted S_i, and its corresponding loss function can be expressed as L_Si(f_θ).
During training, the model learns to adjust its current initialization parameters for a subset of distortion types, such as white noise, Gaussian blur, and image compression, using the dual gradient descent method from support set to query set.
First, gradient optimization is performed on the support set. All parameters of the model are denoted by θ, and the first-order gradient of the support-set loss with respect to θ is

g_θ = ∇_θ L_Si(f_θ)

where f_θ represents the model with parameters θ. The support set S_i of the i-th subtask is optimized through the Adam optimizer; the updating process is divided into T steps, and after the update the parameters of the model are

θ_i′ = θ − α · m_θ(T) / (√v_θ(T) + ε)

where α is the internal learning rate and ε is a correction term added to prevent the denominator from becoming 0. m_θ(t) and v_θ(t) denote the first and second moments of the gradient, respectively:

m_θ(t) = y_1 · m_θ(t−1) + (1 − y_1) · g_θ(t)
v_θ(t) = y_2 · v_θ(t−1) + (1 − y_2) · g_θ(t)^2

where y_1 and y_2 are the exponential decay rates of m_θ(t) and v_θ(t), and g_θ(t) is the gradient at update step t (t ∈ {1, 2, ..., T}). Next, the corresponding gradient optimization is performed on the query set. Starting from the optimization result of the support set, the query set Q_i of the i-th subtask is optimized through the Adam optimizer, and after the update the parameters of the model are

θ_i″ = θ_i′ − α · m_θ(T) / (√v_θ(T) + ε)

where θ_i′ is the optimized model parameter on the i-th subtask's support set and θ_i″ is the optimized model parameter on the i-th subtask's query set. For the N tasks in the training process, the gradient update results of all tasks are integrated to obtain prior model parameters that guide the testing stage.
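For concreteness, one Adam step matching the moment updates above can be written as follows; in practice torch.optim.Adam implements the same rule, so this manual version only makes the roles of α, y_1, y_2, and ε explicit.

```python
# One manual Adam step matching the moment updates above (with the standard
# bias correction); torch.optim.Adam implements the same rule in practice.
import torch

def adam_step(theta, grad, m, v, t, alpha=1e-4, y1=0.9, y2=0.999, eps=1e-8):
    m = y1 * m + (1 - y1) * grad               # first moment m_theta(t)
    v = y2 * v + (1 - y2) * grad ** 2          # second moment v_theta(t)
    m_hat = m / (1 - y1 ** t)                  # bias correction, step t >= 1
    v_hat = v / (1 - y2 ** t)
    theta = theta - alpha * m_hat / (v_hat.sqrt() + eps)  # parameter update
    return theta, m, v
```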
θ_f = θ + β · (1/N) · Σ_{i=1..N} (θ_i″ − θ)

where β is the external learning rate and θ_f represents the prior model parameters obtained after gradient optimization is finished. During testing, a small number of images of other distortion types (for example, ringing-effect images) are input, and the model carrying the prior parameters is updated with the Adam optimizer; the update is completed in P steps, after which the parameters of the model are

θ_te = θ_f − α_f · m_θ(P) / (√v_θ(P) + ε)

where α_f is the learning rate in the above fine-tuning of the model parameters and θ_te are the model parameters after complete meta-learning. Finally, a distorted image to be predicted is input, and the optimal score prediction for the distorted image can be expressed as

s = f_θte(x)

where x represents a single distorted picture input to the model.
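The overall dual-gradient training loop can be sketched as below. The support-then-query inner updates and the aggregation into the prior parameters θ_f follow the description above; the Reptile-style aggregation formula is the reconstruction used earlier, and the helper names, the (distorted, reference, label) loader interface, and the hyperparameter values are assumptions.

```python
# Condensed sketch of the dual-gradient meta-training loop: T Adam steps on
# each support set, then T steps on the query set starting from that result,
# followed by a Reptile-style aggregation into the prior parameters theta_f.
import copy
import torch

def meta_train(model, tasks, inner_lr=1e-4, outer_lr=0.5, T=5,
               loss_fn=torch.nn.functional.mse_loss):
    theta = copy.deepcopy(model.state_dict())     # current prior parameters
    deltas = []
    for support_loader, query_loader in tasks:    # the N subtasks
        model.load_state_dict(theta)              # restart from the prior
        opt = torch.optim.Adam(model.parameters(), lr=inner_lr)
        for loader in (support_loader, query_loader):  # support, then query
            for _, (dist, ref, label) in zip(range(T), loader):  # T steps
                opt.zero_grad()
                loss = loss_fn(model(dist, ref), label)
                loss.backward()
                opt.step()
        # record theta_i'' - theta for this subtask
        state = model.state_dict()
        deltas.append({k: state[k] - theta[k] for k in theta})
    # aggregate: theta_f = theta + beta * mean_i(theta_i'' - theta)
    for k in theta:
        if theta[k].is_floating_point():          # skip integer buffers
            theta[k] = theta[k] + outer_lr * sum(d[k] for d in deltas) / len(deltas)
    model.load_state_dict(theta)
    return model
```

Test-time fine-tuning (step S255) then amounts to a few more Adam steps on the new distortion type, starting from the returned prior model.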
The full-reference image quality evaluation method based on Conformer and meta-learning of the present invention is described concretely below with reference to fig. 3. The public data set is divided according to distortion type; white-noise images, Gaussian-blurred images, compressed images, and the like, together with their reference images, are sent in batches to the feature extraction module to obtain the feature vectors V_1 and V_2, and the prediction score of the current distorted image is obtained through the similarity calculation module and the weight calculation module. When the model is trained with a distorted image and its reference image, the model parameters undergo two gradient updates. A small number of images of another distortion type, such as ringing-effect images with their reference images, are then selected, and the trained model parameters undergo one further gradient update to complete the fine-tuning and obtain the optimal model. Inputting the remaining ringing-effect images and their reference images yields the best prediction scores.
When facing small-sample data sets with various distortion types, the full-reference image quality evaluation method based on Conformer and meta-learning provided by this embodiment accurately predicts the quality score of a distorted image while guaranteeing the generalization capability of the model.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A full-reference image quality evaluation method based on Conformer and meta-learning, characterized by comprising the following steps:
s1, obtaining a distorted image to be predicted and a corresponding reference image;
s2, inputting the distorted image and the corresponding reference image into an image quality evaluation model which is trained in advance; the image quality evaluation model includes: the system comprises a feature extraction module, a similarity calculation module, a weight calculation module, an evaluation score module and a training and testing module based on meta-learning;
and S3, outputting the prediction score corresponding to the distorted image.
2. The Conformer- and meta-learning-based full-reference image quality evaluation method according to claim 1, wherein the training process of the image quality evaluation model in step S2 includes:
S21, inputting distorted images of various types and their corresponding reference images, as the training data set, into a twin network with Conformer as the feature extraction module, and extracting the feature vectors corresponding to the distorted images and the reference images respectively;
S22, in the similarity calculation module, computing the similarity between the feature vectors of the distorted image and of the reference image in two preset ways;
S23, in the weight calculation module, fusing the feature vector of the distorted image with the feature vector of the reference image, then mapping the fused features through several fully connected layers and activation functions into the weight of the feature-vector similarity;
S24, combining the similarities from step S22 with the weight from step S23 to calculate the final feature-vector similarity of the distorted and reference images, thereby obtaining the final quality evaluation score of the distorted image;
S25, adding meta-learning to the training and testing process of the model: in the training stage, the model learns to adjust its current initialization parameters for a subset of distortion types, using a dual gradient descent method from support set to query set; in the testing stage, the model, which after training can discriminate distortion types, is fine-tuned with a small number of images of other distortion types, yielding the final image quality evaluation model.
3. The Conformer- and meta-learning-based full-reference image quality evaluation method according to claim 2, wherein in step S21 the images are input into a twin network with Conformer as the feature extraction module and the feature vectors corresponding to the distorted image and the reference image are extracted, as follows:
inputting the distorted image and the corresponding reference image into a feature extraction network with Conformer as the backbone; extracting the corresponding global features with a self-attention mechanism, and the corresponding local features with convolution operations;
after the global and local features are embedded into each other, changing their dimensionality and extracting the feature vectors corresponding to the distorted image and the reference image respectively.
4. The Conformer- and meta-learning-based full-reference image quality evaluation method according to claim 2, wherein the step S22 of calculating the similarity between the feature vectors of the distorted image and of the reference image in two preset ways includes:
calculating the similarity between the two feature vectors with the ProjectedDotProductSimilarity method:

similarity_1 = V_1^T w_1 (V_2^T w_2)^T (1)

In formula (1), V_1 represents the feature vector obtained after the reference image passes through the twin network; V_2 represents the feature vector obtained after the distorted image passes through the twin network; w_1 and w_2 are randomly initialized vectors that adjust the dimension of the feature vectors;
calculating the similarity between the two feature vectors with the TriLinearSimilarity method:

similarity_2 = w^T [V_1, V_2, V_1*V_2] + b (2)

In formula (2), V_1*V_2 represents the vector obtained by element-wise multiplication of the two feature vectors; [V_1, V_2, V_1*V_2] represents the result of concatenating the three vectors along the last dimension; w represents a randomly initialized vector that adjusts the dimension of the concatenated vector, and b represents a bias term that fine-tunes the similarity result.
5. The Conformer- and meta-learning-based full-reference image quality evaluation method according to claim 4, wherein the step S23 includes:
S231, fusing the feature vector V_1 of the reference image and the feature vector V_2 of the distorted image with a concat operation:

V_fuse = cat(V_1, V_2, 1) (3)

S232, after obtaining the fused feature V_fuse, mapping V_fuse through several fully connected layers and activation functions, the result being used as the weight parameter of the similarity:

ŵ = f(FC_2(FC_1(V_fuse))) (4)

where FC_1 and FC_2 are fully connected layers, and f represents the sigmoid activation function.
6. The Conformer- and meta-learning-based full-reference image quality evaluation method according to claim 5, wherein the step S24 includes:
combining the weight parameter ŵ to calculate the similarity between the distorted-image feature vector and the reference-image feature vector:

similarity = ŵ · similarity_1 + (1 − ŵ) · similarity_2 (5)

where similarity represents the similarity between the distorted image and the reference image and serves as the final quality prediction score of the distorted picture.
7. The Conformer- and meta-learning-based full-reference image quality evaluation method according to claim 6, wherein the step S25 of adding meta-learning to the training and testing process of the model specifically comprises:
S251, during training, selecting a subset of distortion types and dividing the small-sample data set into subtasks, each subtask being divided into a support set and a query set;
S252, setting the loss functions of the support set and the query set in each subtask;
S253, during training, calculating the first-order gradient of the loss function corresponding to the support set of the current subtask, and updating the current model parameters through an Adam optimizer, thereby completing the gradient optimization corresponding to the support set;
S254, during training, starting from the model parameters produced by step S253, inputting the query set of the current subtask and updating the current model parameters again through the Adam optimizer, thereby completing the gradient optimization corresponding to the query set; then integrating the gradient update results of all subtasks to obtain prior model parameters that guide the testing stage;
S255, during testing, inputting a small number of images of other distortion types and updating the model carrying the prior parameters with the Adam optimizer, obtaining the final image quality evaluation model.
8. The Conformer- and meta-learning-based full-reference image quality evaluation method according to claim 7, wherein the step S252 includes:
setting the loss functions of the support set and the query set in the subtask as

L(e, s) = (e − s)^2

where e represents the quality label score corresponding to the distorted image, and s represents the quality prediction score corresponding to the distorted image.
CN202211400068.XA 2022-11-09 2022-11-09 Full-reference image quality evaluation method based on Conformer and meta learning Pending CN115661114A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211400068.XA CN115661114A (en) 2022-11-09 2022-11-09 Full-reference image quality evaluation method based on Conformer and meta learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211400068.XA CN115661114A (en) 2022-11-09 2022-11-09 Full-reference image quality evaluation method based on Conformer and meta learning

Publications (1)

Publication Number Publication Date
CN115661114A true CN115661114A (en) 2023-01-31

Family

ID=85016831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211400068.XA Pending CN115661114A (en) 2022-11-09 2022-11-09 Full-reference image quality evaluation method based on Conformer and meta learning

Country Status (1)

Country Link
CN (1) CN115661114A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117788461A (en) * 2024-02-23 2024-03-29 华中科技大学同济医学院附属同济医院 Magnetic resonance image quality evaluation system based on image analysis
CN117788461B (en) * 2024-02-23 2024-05-07 华中科技大学同济医学院附属同济医院 Magnetic resonance image quality evaluation system based on image analysis

Similar Documents

Publication Publication Date Title
Ying et al. From patches to pictures (PaQ-2-PiQ): Mapping the perceptual space of picture quality
Hosu et al. KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment
US10565684B2 (en) Super-resolution method and system, server, user device and method therefor
EP3961486A1 (en) Data processing method and device for facial image generation, and medium
CN110197714A (en) The generation method of deep learning algorithm after method for analyzing image, device, study
CN109831664B (en) Rapid compressed stereo video quality evaluation method based on deep learning
CN115661114A (en) Full-reference image quality evaluation method based on Conformer and meta learning
CN113487564B (en) Double-flow time sequence self-adaptive selection video quality evaluation method for original video of user
CN106412571B (en) A kind of method for evaluating video quality based on gradient similarity standard difference
Chira et al. Image super-resolution with deep variational autoencoders
Shao et al. Toward domain transfer for no-reference quality prediction of asymmetrically distorted stereoscopic images
CN110493638A (en) Video frame alignment schemes, device, electronic equipment and readable storage medium storing program for executing
Li et al. Recent advances and challenges in video quality assessment
CN109478316A (en) The enhancing of real-time adaptive shadow and highlight
CN117237279A (en) Blind quality evaluation method and system for non-uniform distortion panoramic image
CN113609944A (en) Silent in-vivo detection method
JP7443030B2 (en) Learning method, program, learning device, and method for manufacturing learned weights
CN117274173A (en) Semantic and structural distillation reference-free image quality evaluation method
CN107578406A (en) Based on grid with Wei pool statistical property without with reference to stereo image quality evaluation method
CN114677670B (en) Method for automatically identifying and positioning identity card tampering
CN116109538A (en) Image fusion method based on simple gate unit feature extraction
CN106023120B (en) Human face portrait synthetic method based on coupling neighbour's index
CN113221952B (en) Multi-center brain diffusion tensor imaging image classification method and system
CN113255876A (en) Deep learning neural network optimization method and device, and application method and device
CN113222887A (en) Deep learning-based nano-iron labeled neural stem cell tracing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination