CN113205495B - Image quality evaluation and model training method, device, equipment and storage medium - Google Patents

Image quality evaluation and model training method, device, equipment and storage medium

Info

Publication number
CN113205495B
CN113205495B (application CN202110471056.5A)
Authority
CN
China
Prior art keywords
image
feature
fusion
score
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110471056.5A
Other languages
Chinese (zh)
Other versions
CN113205495A (en)
Inventor
朱若琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110471056.5A
Publication of CN113205495A
Application granted
Publication of CN113205495B
Status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30168Image quality inspection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Abstract

The invention discloses an image quality evaluation and model training method, apparatus, device and storage medium, and relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, deep learning and the like. The image quality evaluation method comprises the following steps: extracting image features of each image in an image pair by adopting a feature extraction network in an image quality evaluation model, wherein the image features comprise features to be fused and an output feature, and the image pair comprises a reference image and an image to be evaluated; performing spatial alignment processing on the features to be fused to obtain a fusion feature of each image; and determining, by adopting a score determination network in the image quality evaluation model, a quality score of the image to be evaluated based on the fusion feature and the output feature. The present disclosure can improve the image quality evaluation effect.

Description

Image quality evaluation and model training method, device, equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, deep learning and the like, and particularly relates to an image quality evaluation and model training method, device, equipment and storage medium.
Background
Image quality assessment (IQA) may process an image using an image quality evaluation model to obtain a quality score. Specifically, an image quality evaluation model may be used to extract image features of an image, fuse the image features to obtain a fusion feature, and perform image quality evaluation based on the fusion feature.
In the related art, when image features are fused, they are directly concatenated as vectors.
Disclosure of Invention
The present disclosure provides an image quality evaluation and model training method, apparatus, device, and storage medium.
According to an aspect of the present disclosure, there is provided an image quality evaluation method, comprising: extracting image features of each image in an image pair by adopting a feature extraction network in an image quality evaluation model, wherein the image features comprise features to be fused and an output feature, the image pair comprises a reference image and an image to be evaluated, and spatial alignment processing is performed on the features to be fused to obtain a fusion feature of each image; and determining, by adopting a score determination network in the image quality evaluation model, a quality score of the image to be evaluated based on the fusion feature and the output feature.
According to another aspect of the present disclosure, there is provided a training method of an image quality evaluation model including: a feature extraction network and a score determination network, the method comprising: extracting image features of each image in a sample image pair by adopting the feature extraction network, wherein the image features comprise features to be fused and output features, the sample image pair comprises a reference image and a distorted image, and the features to be fused are subjected to space alignment processing to obtain the fused features of each image; determining a prediction score of the distorted image based on the fusion feature and the output feature using the score determination network; a loss function is determined based on the predictive score, and the feature extraction network and the score determination network are trained based on the loss function.
According to another aspect of the present disclosure, there is provided an image quality evaluation apparatus, comprising: an extraction module, configured to extract image features of each image in an image pair by adopting a feature extraction network in an image quality evaluation model, wherein the image features comprise features to be fused and an output feature, the image pair comprises a reference image and an image to be evaluated, and spatial alignment processing is performed on the features to be fused to obtain a fusion feature of each image; and a determination module, configured to determine, by adopting the score determination network in the image quality evaluation model, the quality score of the image to be evaluated based on the fusion feature and the output feature.
According to another aspect of the present disclosure, there is provided a training apparatus of an image quality evaluation model including: a feature extraction network and a score determination network, the apparatus comprising: the extraction module is used for extracting image characteristics of each image in a sample image pair by adopting the characteristic extraction network, wherein the image characteristics comprise characteristics to be fused and output characteristics, the sample image pair comprises a reference image and a distortion image, and the characteristics to be fused are subjected to space alignment processing to obtain fusion characteristics of each image; a determining module for determining a prediction score of the distorted image based on the fusion feature and the output feature using the score determining network; a training module for determining a loss function based on the predictive score and training the feature extraction network and the score determination network based on the loss function.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to any one of the above aspects.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the above aspects.
According to the technical scheme, the image quality evaluation effect can be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a seventh embodiment of the present disclosure;
FIG. 8 is a schematic diagram according to an eighth embodiment of the present disclosure;
FIG. 9 is a schematic diagram according to a ninth embodiment of the present disclosure;
FIG. 10 is a schematic diagram of an electronic device for implementing any of the image quality evaluation and model training methods of the embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The purpose of IQA is to make the evaluation result obtained with the image quality evaluation model consistent with subjective quality evaluation, i.e., an image with better subjective quality should receive a higher quality score. When a model is adopted for quality evaluation, a plurality of image features can be fused, and quality evaluation can be performed based on the fused features. In the related art, during fusion, each image feature can be converted into a vector, and the vectors corresponding to the image features are then directly concatenated. However, this approach ignores the spatial positional relationship between the individual image features, which affects the image quality evaluation effect.
In order to improve the image quality evaluation effect, the present disclosure shows the following embodiments.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure. The present embodiment provides an image quality evaluation method, which includes:
101. extracting image features of each image in an image pair by adopting a feature extraction network in an image quality evaluation model, wherein the image features comprise features to be fused and an output feature, the image pair comprises a reference image and an image to be evaluated, and spatial alignment processing is performed on the features to be fused to obtain a fusion feature of each image.
102. determining the quality score of the image to be evaluated based on the fusion feature and the output feature by adopting a score determination network in the image quality evaluation model.
An image quality evaluation model may be used for image quality evaluation. As shown in fig. 2, the image quality evaluation model may include a feature extraction network 201 and a score determination network 202. The image quality evaluation may be evaluation based on a reference image; in that case, the input of the image quality evaluation model includes the reference image and the image to be evaluated, and the output is the quality score of the image to be evaluated. The quality score evaluates the quality of the image to be evaluated relative to the reference image. The reference image may be determined according to the specific application scenario. For example, image quality evaluation may be used in image compression, image encoding and decoding, and the like. Taking image compression as an example, the reference image may be the image before compression and the image to be evaluated the compressed image; the compression effect can then be evaluated through the quality score of the compressed image, a higher quality score indicating a better compression effect.
The feature extraction network may be a deep convolutional neural network (Deep Convolutional Neural Network, DCNN) with a backbone network (backbone) such as Resnet50.
As shown in fig. 3, the feature extraction network may include a backbone network 301 and a fusion module 302. The backbone network 301 is used to extract image features of an input image; it includes a plurality of convolution layers, each of which outputs the image feature of the corresponding layer, so that image features of different layers can be extracted. Image features of some of the layers can be selected, according to configuration, as the features to be fused, and the image feature output by the last layer of the backbone network can be used as the output feature. The fusion module 302 is configured to fuse the features to be fused into a fusion feature; the specific fusion processing is a spatial alignment processing of the features to be fused.
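A minimal PyTorch sketch of such a backbone. The use of torchvision's ResNet50 and the choice of which intermediate layers to return are illustrative assumptions, not details taken from the patent:

```python
import torch.nn as nn
from torchvision.models import resnet50


class MultiLevelBackbone(nn.Module):
    """Returns intermediate feature maps (candidate features to be fused)
    and the last-layer feature map (the output feature)."""

    def __init__(self):
        super().__init__()
        net = resnet50(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.layer1, self.layer2 = net.layer1, net.layer2
        self.layer3, self.layer4 = net.layer3, net.layer4

    def forward(self, x):
        x = self.stem(x)
        c1 = self.layer1(x)   # shallow (bottom) layer, largest spatial size
        c2 = self.layer2(c1)
        c3 = self.layer3(c2)
        c4 = self.layer4(c3)  # deepest layer, smallest spatial size
        return [c1, c2, c3], c4   # features to be fused, output feature
```

With ResNet50, the returned maps c1–c3 have 256, 512 and 1024 channels, which is what the fusion sketch later in this description assumes.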
The feature extraction network processes the image to be evaluated and the reference image separately. For ease of understanding, two feature extraction networks are shown in fig. 2, one for the image to be evaluated and one for the reference image, and the two networks share parameters. That is, in implementation, the image to be evaluated and the reference image may be input into the same feature extraction network at different times, so as to obtain the fusion feature and the output feature corresponding to each of them.
As shown in fig. 3, taking the processing of the image to be evaluated by the feature extraction network as an example, and taking block-based image processing as an example, the image to be evaluated can be divided into a plurality of image blocks. The input of the feature extraction network is an image block, denoted A_m, where m is the index of the block, and the outputs are the corresponding fusion feature and output feature of that block.
Similarly, for the reference image, the corresponding fusion feature and output feature of the reference image can be obtained through the processing of the feature extraction network.
Further, the fusion module 302 may use a feature pyramid network (Feature Pyramid Network, FPN) to spatially align the features to be fused to obtain the fused features.
As shown in fig. 4, the features to be fused include a plurality of feature maps of different sizes; the deeper the layer, i.e., the farther from the input image, the smaller the size of the corresponding feature map. In order from shallow to deep, the corresponding layers may be called the bottom layer to the top layer. When FPN processing is adopted, a 1×1 convolution may be performed on the top-layer feature map for smoothing; the top-layer feature map is then upsampled to the same size as the feature map of the layer below it, and the two maps are combined channel by channel, i.e., corresponding elements in the maps are added. Repeating this processing layer by layer realizes spatial alignment of the feature maps of different layers. For example, in fig. 4, taking the case where the features to be fused include 3 feature maps, the features to be fused are denoted C1 to C3 from the bottom layer to the top layer, and the corresponding smoothed features are denoted P1 to P3; the fusion feature can then be obtained by upsampling and combining layer by layer.
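A minimal PyTorch sketch of this FPN-style alignment, assuming the features to be fused are the three maps C1–C3; the channel counts and module names are illustrative:

```python
import torch.nn as nn
import torch.nn.functional as F


class FPNFusion(nn.Module):
    """Top-down spatial alignment and fusion of multi-level feature maps."""

    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        # one 1x1 smoothing convolution per level (C1..C3 -> P1..P3)
        self.smooth = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )

    def forward(self, feats):
        # feats: [C1, C2, C3], ordered from bottom (largest) to top (smallest)
        p = [conv(f) for conv, f in zip(self.smooth, feats)]
        fused = p[-1]                               # start from the top layer
        for lower in reversed(p[:-1]):              # move towards the bottom layer
            fused = F.interpolate(fused, size=lower.shape[-2:], mode="nearest")
            fused = fused + lower                   # element-wise addition
        return fused                                # spatially aligned fusion feature
```

The result has the spatial size of the bottom-layer map, so texture detail from shallow layers and semantics from deep layers end up aligned in one feature map.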
Through FPN processing, the obtained fusion features retain a large amount of texture information of the bottom layer and semantic information of the top layer, and spatial position alignment is realized to the greatest extent, so that the image quality evaluation effect is improved. In addition, by fusing the image features, the parameter amount of operation can be reduced, and the image quality evaluation efficiency can be improved.
Assuming that the fusion feature and the output feature corresponding to the reference image are called a first fusion feature and a first output feature, and the fusion feature and the output feature corresponding to the image to be evaluated are called a second fusion feature and a second output feature, the first fusion feature, the second fusion feature, the first output feature and the second output feature can be obtained through the processing of the feature extraction network. These features may then be input into a score determination network to obtain a quality score of the image to be evaluated.
Specifically, the determination may include the following steps: determining a difference feature between the first fusion feature and the second fusion feature to obtain a first difference feature; determining a difference feature between the first output feature and the second output feature to obtain a second difference feature; converting the first difference feature into a score feature by adopting a first conversion network; converting the second difference feature into a weight feature by adopting a second conversion network; and determining the quality score of the image to be evaluated based on the score feature and the weight feature.
As shown in fig. 5, the score determination network may include: a difference determination module 501, a feature transformation network 502, and a score determination module 503. The difference determination module 501 is configured to determine the first difference feature and the second difference feature, where the first difference feature is the difference feature between the first fusion feature and the second fusion feature, the second difference feature is the difference feature between the first output feature and the second output feature, and a difference feature is the difference between the two corresponding features. The feature transformation network 502 is configured to transform the first difference feature into a score feature and the second difference feature into a weight feature; it may specifically include two fully connected networks (FC1 and FC2 in fig. 5), which transform the first difference feature into the score feature and the second difference feature into the weight feature, respectively. The score determination module 503 is configured to determine the quality score of the image to be evaluated based on the score feature and the weight feature, for example, by computing the quality score s_A with a corresponding calculation formula.
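A sketch of the score determination network under these definitions, assuming the difference features have been flattened into per-block vectors. The final pooling of per-block score and weight features into s_A is shown here as a weighted average, which is an assumption for illustration rather than the patent's exact formula:

```python
import torch
import torch.nn as nn


class ScoreDetermination(nn.Module):
    """Difference features -> score/weight features -> quality score s_A."""

    def __init__(self, fuse_dim, out_dim):
        super().__init__()
        self.fc1 = nn.Linear(fuse_dim, 1)   # first conversion network (FC1)
        self.fc2 = nn.Linear(out_dim, 1)    # second conversion network (FC2)

    def forward(self, fuse_ref, fuse_eval, out_ref, out_eval):
        # inputs: per-block feature vectors, shape (num_blocks, dim)
        d_fuse = fuse_ref - fuse_eval                      # first difference feature
        d_out = out_ref - out_eval                         # second difference feature
        score_feat = self.fc1(d_fuse).squeeze(-1)          # score features
        weight_feat = torch.relu(self.fc2(d_out)).squeeze(-1) + 1e-6  # weight features
        # assumed pooling: weighted average of per-block scores
        return (weight_feat * score_feat).sum() / weight_feat.sum()
```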
Through the above-described processing of the score determination network, the quality score of the image to be evaluated can be determined.
In this embodiment, by performing spatial alignment processing on the features to be fused, spatial position relationships among different features to be fused can be considered, so that an image quality evaluation effect is improved.
Fig. 6 is a schematic diagram of a sixth embodiment of the present disclosure, which provides a training method of an image quality evaluation model, the image quality evaluation model including: a feature extraction network and a score determination network, the method comprising:
601. extracting image features of each image in a sample image pair by adopting the feature extraction network, wherein the image features comprise features to be fused and an output feature, the sample image pair comprises a reference image and a distorted image, and spatial alignment processing is performed on the features to be fused to obtain a fusion feature of each image.
602. determining a prediction score of the distorted image based on the fusion feature and the output feature by adopting the score determination network.
603. determining a loss function based on the prediction score, and training the feature extraction network and the score determination network based on the loss function.
The spatial alignment processing of the features to be fused and the determination of the prediction score of the distorted image may follow a processing flow similar to that of the image quality evaluation process described above.
Specifically, the performing spatial alignment processing on the features to be fused may include: and adopting a feature pyramid network to perform space alignment processing on the features to be fused.
Through FPN processing, the obtained fusion features retain a large amount of texture information of the bottom layer and semantic information of the top layer, and spatial position alignment is realized to the greatest extent, so that the image quality evaluation effect is improved. In addition, by fusing the image features, the parameter quantity of operation can be reduced, and the training efficiency of the image quality evaluation model can be improved.
The fusion features include a first fusion feature and a second fusion feature, where the first fusion feature is the fusion feature corresponding to the reference image and the second fusion feature is the fusion feature corresponding to the distorted image; the output features include a first output feature and a second output feature, where the first output feature is the output feature corresponding to the reference image and the second output feature is the output feature corresponding to the distorted image. Determining a prediction score of the distorted image based on the fusion feature and the output feature may include: determining a difference feature between the first fusion feature and the second fusion feature to obtain a first difference feature; determining a difference feature between the first output feature and the second output feature to obtain a second difference feature; converting the first difference feature into a score feature using a first conversion network of the score determination network; converting the second difference feature into a weight feature using a second conversion network of the score determination network; and determining the prediction score of the distorted image based on the score feature and the weight feature.
By the above-described processing of the score determination network, the prediction score of the distorted image can be determined.
Unlike in image quality evaluation, two sample image pairs are used during model training, and the training system further includes a probability determination module.
As shown in fig. 7, during model training the training system includes: an image quality evaluation model 701 and a probability determination module 702. The three feature extraction networks in the image quality evaluation model 701 share parameters, and the two score determination networks share parameters. The image quality evaluation model is organized into two branches, each corresponding to one sample image pair. Fig. 7 only shows the branch corresponding to the first distorted image and the reference image; the branch corresponding to the second distorted image and the reference image is similar.
Specifically, sample groups may be obtained first. There are multiple sample groups, each of which is a triplet comprising a reference image, a first distorted image and a second distorted image, where the first distorted image and the second distorted image are obtained by applying two different distortion processing modes to the reference image. After the triplet sample groups are obtained, two sample image pairs may be formed; assuming the two pairs are called a first sample image pair and a second sample image pair, the first sample image pair includes the reference image and the first distorted image, and the second sample image pair includes the reference image and the second distorted image.
As shown in fig. 7, after passing through the feature extraction network and the score determination network of the image quality evaluation model, the prediction score of the distorted image in each sample image pair can be obtained. Assuming the first distorted image is denoted A and the second distorted image is denoted B, the corresponding prediction scores may be called the first prediction score and the second prediction score, denoted s_A and s_B respectively.
After the first prediction score and the second prediction score are obtained, the two prediction scores can be input into the probability determination module, which processes them and outputs a predicted preference probability. The predicted preference probability is denoted h(s_A, s_B), and the specific form of the function h may be selected as needed.
the preference probability is used to indicate the preference of the first distorted image a over the second distorted image B, the smaller the preference probability, the closer the first distorted image a is to the reference image.
For distinction, the preference probability may be divided into a predicted preference probability and a true preference probability. The predicted preference probability refers to the preference probability obtained during training based on the two prediction scores. The true preference probability refers to the preference probability calculated based on the true scores of the two distorted images; since the distorted images are sample images, their true scores can be determined by processing such as manual labeling. After the true scores are obtained, the true preference probability can be calculated based on h(·) as described above.
After obtaining the predicted preference probability and the true preference probability, a loss function may be determined based on the two preference probabilities, and an image quality evaluation model may be trained based on the loss function.
The loss function may be determined based on the predicted preference probability and the true preference probability of each sample group.
When training the model based on the loss function, the training objective may be to minimize the loss function over all sample groups, i.e., the final parameters of the image quality evaluation model can be expressed as
θ* = argmin_θ Σ_{i=1}^{T} L( h( f(A_i, R_i; θ), f(B_i, R_i; θ) ), p_{AB,i} ),
where f is the function corresponding to the image quality evaluation model, θ denotes its parameters, θ* denotes the final parameters, i is the sample index, T is the total number of samples, A_i and B_i are the first distorted image and the second distorted image respectively, R_i is the reference image, p_{AB,i} is the true preference probability, h(·) gives the predicted preference probability, and L measures the difference between the predicted preference probability and the true preference probability.
By determining the loss function based on the predicted preference probability and the true preference probability, the robustness of model training may be improved.
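A sketch of this pairwise objective. The logistic form of h and the cross-entropy loss are common choices assumed here for illustration (and the sign convention of h should follow the patent's definition); they are not the patent's exact formulas:

```python
import torch


def preference_probability(s_a, s_b):
    """Assumed logistic form of h(s_A, s_B)."""
    return torch.sigmoid(s_a - s_b)


def pairwise_loss(s_a, s_b, p_true):
    """Cross-entropy between the predicted and the true preference probability."""
    p_pred = preference_probability(s_a, s_b)
    return -(p_true * torch.log(p_pred + 1e-8)
             + (1.0 - p_true) * torch.log(1.0 - p_pred + 1e-8))
```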
In some embodiments, the idea of curriculum learning may be adopted in the training process. That is, the sample groups may be divided into sample groups of multiple categories, where sample groups of different categories have different degrees of difficulty, the degree of difficulty being determined based on the absolute value of a score difference, and the score difference being the difference between the true score of the first distorted image and the true score of the second distorted image. Correspondingly, during training, the feature extraction network and the score determination network are trained in order of difficulty from easy to difficult, based on the loss function corresponding to the sample groups of the corresponding category.
The sample groups may be classified into three kinds in order of difficulty from easy to difficult, which may be called easy sample groups, general sample groups, and difficult sample groups. The difficulty level of the three kinds is determined based on the absolute value of the score difference. For example, a first threshold and a second threshold may be preset, where the first threshold is greater than the second threshold; a sample group whose absolute score difference is greater than the first threshold is taken as an easy sample group, a sample group whose absolute score difference is less than the second threshold is taken as a difficult sample group, and the rest are general sample groups. For example, the first threshold is 0.5 and the second threshold is 0.1.
For example, suppose one sample group is denoted <A1, B1, R1>, another is denoted <A2, B2, R2>, and a third is denoted <A3, B3, R3>. If the absolute value of the difference between the true score of A1 and the true score of B1 is greater than 0.5, and the absolute value of the difference between the true score of A2 and the true score of B2 is less than 0.1, then the sample group <A1, B1, R1> is an easy sample group, the sample group <A2, B2, R2> is a difficult sample group, and the sample group <A3, B3, R3> is a general sample group.
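A small sketch of this grouping rule, using the thresholds 0.5 and 0.1 mentioned above as illustrative values:

```python
def difficulty_category(score_a, score_b, easy_thresh=0.5, hard_thresh=0.1):
    """Classify a <A, B, R> sample group by the absolute true-score difference."""
    gap = abs(score_a - score_b)
    if gap > easy_thresh:
        return "easy"       # scores far apart: preference is obvious
    if gap < hard_thresh:
        return "difficult"  # scores very close: preference is subtle
    return "general"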
In the training process, the model parameters may first be adjusted using the easy sample groups; after, for example, 32 rounds, training continues with the general sample groups; after another 32 rounds, training continues with the difficult sample groups until a training end condition is reached. The training end condition is preset, for example, reaching a preset number of training iterations, or convergence of the loss function. The model parameters at the time the training end condition is reached are taken as the final parameters.
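The staged schedule itself can then be a simple loop over the categories, sketched here with an abstract train_step callable (an assumption about how updates are driven):

```python
def train_with_curriculum(train_step, groups_by_category, rounds_per_stage=32):
    """Train on easy, then general, then difficult sample groups."""
    for category in ("easy", "general", "difficult"):
        for _ in range(rounds_per_stage):
            for sample_group in groups_by_category[category]:
                train_step(sample_group)   # one parameter update on this triplet
```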
In the image quality evaluation stage, image quality evaluation can be performed on the image to be evaluated by adopting the model with the final parameters.
By introducing the idea of curriculum learning into the model training process, model convergence can be accelerated, and the model is helped to gradually converge to an optimal solution space.
In some embodiments, because the number of existing image pairs is limited, the existing image pairs may be augmented to obtain extended image pairs, after which the existing image pairs and/or the extended image pairs may be used as the sample image pairs.
That is, the method may further include: acquiring an existing image pair, the existing image pair comprising: an existing reference image and an existing distorted image; performing random erasure processing on the same positions of the existing reference image and the existing distorted image to obtain an extended image pair; the sample image pair is constructed based on the existing image pair and the extended image pair.
For example, if the existing image pair includes a distorted image a and a reference image R, denoted as < a, R >, the distorted image a and the reference image R may be subjected to the same random position erasure processing, for example, an element in a certain region D of the distorted image a is replaced with 0, and an element in a region corresponding to the position of the region D in the reference image R is also replaced with 0, and assuming that the processed images are denoted as a 'and R', respectively, a 'and R' form an extended image pair, denoted as < a ', R'. Thereafter, the sample image pair may be composed of multiple sets of < A, R >, < A ', R' >.
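A sketch of this paired random-erasure augmentation, assuming images stored as NumPy arrays of shape (H, W, C) and an illustrative erased-region size:

```python
import numpy as np


def paired_random_erase(distorted, reference, erase_h=32, erase_w=32, rng=None):
    """Erase the same randomly chosen region (set to 0) in both images."""
    rng = rng or np.random.default_rng()
    h, w = distorted.shape[:2]
    top = int(rng.integers(0, h - erase_h + 1))
    left = int(rng.integers(0, w - erase_w + 1))
    a_ext, r_ext = distorted.copy(), reference.copy()
    a_ext[top:top + erase_h, left:left + erase_w] = 0
    r_ext[top:top + erase_h, left:left + erase_w] = 0
    return a_ext, r_ext   # the extended image pair <A', R'>
```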
The number of sample image pairs can be expanded by carrying out the same random position erasing processing on the existing image pairs to obtain the expanded image pairs, and the training effect is improved.
In this embodiment, by performing spatial alignment processing on the features to be fused, spatial position relationships among different features to be fused can be considered, so that the training effect of the image quality evaluation model is improved.
Fig. 8 is a schematic diagram of an eighth embodiment of the present disclosure, which provides an image quality evaluation apparatus 800 including: an extraction module 801 and a determination module 802.
The extracting module 801 is configured to extract image features of each image in an image pair by adopting a feature extraction network in an image quality evaluation model, wherein the image features comprise features to be fused and an output feature, the image pair comprises a reference image and an image to be evaluated, and spatial alignment processing is performed on the features to be fused to obtain a fusion feature of each image. The determining module 802 is configured to determine the quality score of the image to be evaluated based on the fusion feature and the output feature by adopting the score determination network in the image quality evaluation model.
In some embodiments, the extracting module 801 is specifically configured to: and adopting a feature pyramid network to perform space alignment processing on the features to be fused.
In some embodiments, the fusion features include a first fusion feature and a second fusion feature, where the first fusion feature is the fusion feature corresponding to the reference image and the second fusion feature is the fusion feature corresponding to the image to be evaluated; the output features include a first output feature and a second output feature, where the first output feature is the output feature corresponding to the reference image and the second output feature is the output feature corresponding to the image to be evaluated. The determining module 802 is specifically configured to: determine a difference feature between the first fusion feature and the second fusion feature to obtain a first difference feature; determine a difference feature between the first output feature and the second output feature to obtain a second difference feature; convert the first difference feature into a score feature by adopting a first conversion network; convert the second difference feature into a weight feature by adopting a second conversion network; and determine the quality score of the image to be evaluated based on the score feature and the weight feature.
In the embodiment of the disclosure, by performing spatial alignment processing on the features to be fused, spatial position relations among different features to be fused can be considered, so that the image quality evaluation effect is improved.
Fig. 9 is a schematic diagram of a ninth embodiment of the present disclosure, which provides a training apparatus of an image quality evaluation model including: a feature extraction network and a score determination network, the apparatus 900 comprising: an extraction module 901, a determination module 902, and a training module 903.
The extracting module 901 is configured to extract an image feature of each image in a sample image pair by using the feature extraction network, where the image feature includes a feature to be fused and an output feature, and the sample image pair includes a reference image and a distorted image, and perform spatial alignment processing on the feature to be fused to obtain a fused feature of each image; a determining module 902 is configured to determine a prediction score of the distorted image based on the fusion feature and the output feature using the score determination network;
the training module 903 is configured to determine a loss function based on the predictive score and train the feature extraction network and the score determination network based on the loss function.
In some embodiments, the extracting module 901 is specifically configured to: and adopting a feature pyramid network to perform space alignment processing on the features to be fused.
In some embodiments, the fusion features include a first fusion feature and a second fusion feature, where the first fusion feature is the fusion feature corresponding to the reference image and the second fusion feature is the fusion feature corresponding to the distorted image; the output features include a first output feature and a second output feature, where the first output feature is the output feature corresponding to the reference image and the second output feature is the output feature corresponding to the distorted image. The determining module 902 is specifically configured to: determine a difference feature between the first fusion feature and the second fusion feature to obtain a first difference feature; determine a difference feature between the first output feature and the second output feature to obtain a second difference feature; convert the first difference feature into a score feature using a first conversion network of the score determination network; convert the second difference feature into a weight feature using a second conversion network of the score determination network; and determine the prediction score of the distorted image based on the score feature and the weight feature.
In some embodiments, the sample image pair includes two sample image pairs, the two sample image pairs include the same reference image and two different distorted images, the prediction score includes a first prediction score and a second prediction score corresponding to the two distorted images, respectively, and the training module 903 is specifically configured to: determining a predictive preference probability based on the first predictive score and the second predictive score; a loss function is determined based on the predicted preference probability and the true preference probability.
In some embodiments, the sample image pairs are constructed from groups of samples, the groups of samples comprising: a reference image, a first distorted image, and a second distorted image, the apparatus further comprising: the grouping module is used for grouping the sample groups into sample groups of various types, wherein the difficulty degrees corresponding to the sample groups of different types are different, the difficulty degrees are determined based on absolute values of score difference values, and the score difference values are differences between the true scores of the first distorted image and the true scores of the second distorted image; the training module 903 is specifically configured to: and training the characteristic extraction network and the score determination network according to the order of the difficulty level from easy to difficult and according to the loss function corresponding to the sample group of the corresponding category.
In some embodiments, the apparatus further comprises: the device comprises an acquisition module, an expansion module and a construction module. The acquisition module is used for acquiring an existing image pair, and the existing image pair comprises: an existing reference image and an existing distorted image; the expansion module is used for carrying out the same random position erasing processing on the existing reference image and the existing distorted image so as to obtain an expansion image pair; a construction module is for constructing the sample image pair based on the existing image pair and the extended image pair.
In the embodiment of the disclosure, by performing spatial alignment processing on the features to be fused, spatial position relations among different features to be fused can be considered, so that the training effect of the image quality evaluation model is improved.
It is to be understood that in the embodiments of the disclosure, the same or similar content in different embodiments may be referred to each other.
It can be understood that "first", "second", etc. in the embodiments of the present disclosure are only used for distinguishing, and do not indicate the importance level, the time sequence, etc.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 10 shows a schematic block diagram of an example electronic device 1000 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the electronic device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from the storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for the operation of the electronic device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Various components in the electronic device 1000 are connected to the I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and communication unit 1009 such as a network card, modem, wireless communication transceiver, etc. Communication unit 1009 allows electronic device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1001 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1001 performs the respective methods and processes described above, for example, an image quality evaluation method or a training method of an image quality evaluation model. For example, in some embodiments, the image quality evaluation method or the training method of the image quality evaluation model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto electronic device 1000 via ROM 1002 and/or communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the image quality evaluation method or the training method of the image quality evaluation model described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the image quality evaluation method or the training method of the image quality evaluation model in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service ("Virtual Private Server" or simply "VPS") are overcome. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (16)

1. An image quality evaluation method, comprising:
extracting image features of each image in an image pair by adopting a feature extraction network in an image quality evaluation model, wherein the image features comprise features to be fused and an output feature, the image pair comprises a reference image and an image to be evaluated, and spatial alignment processing is performed on the features to be fused to obtain a fusion feature of each image;
determining a quality score of the image to be evaluated based on the fusion feature and the output feature by adopting a score determination network in the image quality evaluation model;
wherein the fusion features comprise a first fusion feature and a second fusion feature, the first fusion feature is the fusion feature corresponding to the reference image, the second fusion feature is the fusion feature corresponding to the image to be evaluated, the output features comprise a first output feature and a second output feature, the first output feature is the output feature corresponding to the reference image, the second output feature is the output feature corresponding to the image to be evaluated, and the determining the quality score of the image to be evaluated based on the fusion feature and the output feature comprises:
determining a difference characteristic of the first fusion characteristic and the second fusion characteristic to obtain a first difference characteristic;
determining a difference characteristic of the first output characteristic and the second output characteristic to obtain a second difference characteristic;
converting the first differential feature into a fractional feature by adopting a first conversion network;
converting the second differential feature into a weight feature by adopting a second conversion network;
And determining the quality score of the image to be evaluated based on the score feature and the weight feature.
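For readers wanting a concrete picture of the score determination in claim 1, the following is a minimal PyTorch sketch. It assumes (the claim does not specify this) that the two conversion networks are small convolutional heads and that the quality score is a weight-feature-weighted average of the score feature; the names ScoreDeterminationNet, score_net and weight_net are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScoreDeterminationNet(nn.Module):
    """Sketch of the score determination step (layer choices are assumptions)."""

    def __init__(self, fusion_channels: int, output_channels: int):
        super().__init__()
        # First conversion network: first differential feature -> score feature.
        self.score_net = nn.Sequential(
            nn.Conv2d(fusion_channels, fusion_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fusion_channels, 1, kernel_size=1),
        )
        # Second conversion network: second differential feature -> weight feature.
        self.weight_net = nn.Sequential(
            nn.Conv2d(output_channels, output_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(output_channels, 1, kernel_size=1),
            nn.Softplus(),  # keep weights positive (an assumption)
        )

    def forward(self, fusion_ref, fusion_eval, output_ref, output_eval):
        # First differential feature: between the two fusion features.
        first_diff = fusion_ref - fusion_eval
        # Second differential feature: between the two output features.
        second_diff = output_ref - output_eval

        score_map = self.score_net(first_diff)     # score feature
        weight_map = self.weight_net(second_diff)  # weight feature

        # Bring the weight feature to the score feature's resolution if they differ.
        if weight_map.shape[-2:] != score_map.shape[-2:]:
            weight_map = F.interpolate(
                weight_map, size=score_map.shape[-2:],
                mode="bilinear", align_corners=False)

        # Quality score: weighted average of the score feature by the weight feature.
        eps = 1e-8
        quality = (score_map * weight_map).sum(dim=(2, 3)) / (weight_map.sum(dim=(2, 3)) + eps)
        return quality.squeeze(1)  # one score per image in the batch
```

Under these assumptions, a location with a large weight lets the local difference between the reference image and the image to be evaluated contribute more to the final quality score.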
2. The method of claim 1, wherein the performing spatial alignment processing on the features to be fused comprises:
performing spatial alignment processing on the features to be fused using a feature pyramid network.
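The spatial alignment of claim 2 can be pictured as a feature-pyramid-style top-down pathway. The sketch below is only one plausible reading under stated assumptions: 1x1 lateral convolutions to a common channel width, bilinear upsampling to the highest resolution, and channel concatenation as the fusion. The claim itself requires only that a feature pyramid network be used to align the features to be fused.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidAlignment(nn.Module):
    """Feature-pyramid-style spatial alignment of the features to be fused
    (illustrative; width, upsampling mode and concatenation are assumptions)."""

    def __init__(self, in_channels, width=256):
        super().__init__()
        # 1x1 lateral convolutions bring every pyramid level to a common channel width.
        self.laterals = nn.ModuleList(
            [nn.Conv2d(c, width, kernel_size=1) for c in in_channels]
        )

    def forward(self, feats):
        # feats: backbone features to be fused, ordered from high to low resolution.
        laterals = [conv(f) for conv, f in zip(self.laterals, feats)]
        target_size = laterals[0].shape[-2:]
        # Upsample every level to the highest resolution so all levels are spatially aligned.
        aligned = [
            F.interpolate(x, size=target_size, mode="bilinear", align_corners=False)
            for x in laterals
        ]
        # Concatenate along channels to form the fusion feature of the image.
        return torch.cat(aligned, dim=1)
```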
3. A training method of an image quality evaluation model, the image quality evaluation model comprising: a feature extraction network and a score determination network, the method comprising:
extracting image features of each image in a sample image pair using the feature extraction network, wherein the image features comprise features to be fused and output features, the sample image pair comprises a reference image and a distorted image, and the features to be fused are subjected to spatial alignment processing to obtain a fusion feature of each image;
determining a prediction score of the distorted image based on the fusion feature and the output feature using the score determination network;
determining a loss function based on the prediction score, and training the feature extraction network and the score determination network based on the loss function;
the fusion features include a first fusion feature and a second fusion feature, the first fusion feature is a fusion feature corresponding to the reference image, the second fusion feature is a fusion feature corresponding to the distorted image, the output features include a first output feature and a second output feature, the first output feature is an output feature corresponding to the reference image, the second output feature is an output feature corresponding to the distorted image, and the determining a prediction score of the distorted image based on the fusion feature and the output feature includes:
determining a differential feature of the first fusion feature and the second fusion feature to obtain a first differential feature;
determining a differential feature of the first output feature and the second output feature to obtain a second differential feature;
converting the first differential feature into a score feature using a first conversion network of the score determination network;
converting the second differential feature into a weight feature using a second conversion network of the score determination network;
and determining a prediction score of the distorted image based on the score feature and the weight feature.
4. The method of claim 3, wherein the performing spatial alignment processing on the features to be fused comprises:
performing spatial alignment processing on the features to be fused using a feature pyramid network.
5. The method of claim 3, wherein the sample image pair comprises two sample image pairs, the two sample image pairs comprising the same reference image and two different distorted images, the prediction score comprising a first prediction score and a second prediction score corresponding to the two distorted images, respectively, and the determining a loss function based on the prediction score comprising:
determining a predicted preference probability based on the first prediction score and the second prediction score;
and determining a loss function based on the predicted preference probability and the true preference probability.
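One common way to realize the loss of claims 3 and 5 is a pairwise-ranking loss: model the predicted preference probability as a sigmoid of the score difference and compare it with the true preference probability via binary cross-entropy. The sigmoid model is an assumption here, not something the claim fixes; the function below is a minimal sketch.

```python
import torch
import torch.nn.functional as F

def preference_loss(first_score, second_score, true_pref_prob):
    """Pairwise preference loss sketch (sigmoid model is an assumption).

    first_score, second_score: prediction scores of the two distorted images
    that share the same reference image; true_pref_prob: probability in [0, 1]
    that the first distorted image is truly preferred.
    """
    # Predicted preference probability derived from the two prediction scores.
    pred_pref_prob = torch.sigmoid(first_score - second_score)
    # Loss: binary cross-entropy between predicted and true preference probabilities.
    return F.binary_cross_entropy(pred_pref_prob, true_pref_prob)
```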
6. The method of any of claims 3-5, wherein the sample image pairs are constructed from sample groups, there being a plurality of sample groups, each sample group comprising: a reference image, a first distorted image, and a second distorted image, the method further comprising:
dividing the sample groups into sample groups of a plurality of categories, wherein the degrees of difficulty corresponding to sample groups of different categories are different, the degree of difficulty is determined based on an absolute value of a score difference, and the score difference is the difference between the true score of the first distorted image and the true score of the second distorted image;
wherein the training the feature extraction network and the score determination network based on the loss function comprises:
training the feature extraction network and the score determination network in order of the degree of difficulty from easy to hard, according to the loss function corresponding to the sample group of the corresponding category.
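Claim 6 describes a curriculum: sample groups whose two distorted images have very different true scores are "easy", those with nearly equal true scores are "hard", and training proceeds from easy to hard. A plain-Python sketch follows, with hypothetical threshold values and helper names (split_by_difficulty, curriculum_train, train_one_epoch); none of these come from the patent.

```python
def split_by_difficulty(sample_groups, easy_gap=20.0, hard_gap=10.0):
    """Bucket sample groups by difficulty; the thresholds are illustrative.

    Each sample group is assumed to be a dict holding the true scores of its
    two distorted images under the keys "score_1" and "score_2".
    """
    easy, medium, hard = [], [], []
    for group in sample_groups:
        gap = abs(group["score_1"] - group["score_2"])
        if gap >= easy_gap:
            easy.append(group)    # large score gap: easy to rank
        elif gap >= hard_gap:
            medium.append(group)
        else:
            hard.append(group)    # nearly equal scores: hard to rank
    return easy, medium, hard


def curriculum_train(model, optimizer, sample_groups, train_one_epoch, epochs_per_stage=5):
    """Train from easy to hard; `train_one_epoch` is an assumed callback that
    builds sample image pairs from a stage and applies the pairwise loss."""
    for stage in split_by_difficulty(sample_groups):
        for _ in range(epochs_per_stage):
            train_one_epoch(model, optimizer, stage)
```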
7. The method of any of claims 3-5, further comprising:
acquiring an existing image pair, the existing image pair comprising: an existing reference image and an existing distorted image;
performing the same random position erasure processing on the existing reference image and the existing distorted image to obtain an extended image pair;
and constructing the sample image pair based on the existing image pair and the extended image pair.
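The augmentation of claim 7 erases the same randomly chosen region from both images of an existing pair, so the extended pair stays spatially aligned with its reference. A small sketch is given below; the square patch size, the fill value, and the helper name paired_random_erase are assumptions for illustration.

```python
import random

def paired_random_erase(reference, distorted, patch=64, fill=0):
    """Erase the same randomly positioned square region from both images of an
    existing pair, yielding an extended pair that remains spatially aligned.

    `reference` and `distorted` are assumed to be H x W x C arrays (e.g. numpy)
    of identical shape; patch size and fill value are illustrative.
    """
    height, width = reference.shape[:2]
    top = random.randint(0, max(height - patch, 0))
    left = random.randint(0, max(width - patch, 0))

    ref_aug = reference.copy()
    dist_aug = distorted.copy()
    # Same region, same fill, in both images.
    ref_aug[top:top + patch, left:left + patch] = fill
    dist_aug[top:top + patch, left:left + patch] = fill
    return ref_aug, dist_aug
```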
8. An image quality evaluation device comprising:
an extraction module, configured to extract image features of each image in an image pair using a feature extraction network in the image quality evaluation model, wherein the image features comprise features to be fused and output features, and the image pair comprises a reference image and an image to be evaluated, and to perform spatial alignment processing on the features to be fused to obtain a fusion feature of each image;
a determining module, configured to determine a quality score of the image to be evaluated based on the fusion feature and the output feature, using a score determination network in the image quality evaluation model;
the fusion features comprise a first fusion feature and a second fusion feature, the first fusion feature is a fusion feature corresponding to the reference image, the second fusion feature is a fusion feature corresponding to the image to be evaluated, the output features comprise a first output feature and a second output feature, the first output feature is an output feature corresponding to the reference image, the second output feature is an output feature corresponding to the image to be evaluated, and the determining module is specifically configured to:
determine a differential feature of the first fusion feature and the second fusion feature to obtain a first differential feature;
determine a differential feature of the first output feature and the second output feature to obtain a second differential feature;
convert the first differential feature into a score feature using a first conversion network;
convert the second differential feature into a weight feature using a second conversion network;
and determine the quality score of the image to be evaluated based on the score feature and the weight feature.
9. The apparatus of claim 8, wherein the extraction module is specifically configured to:
perform spatial alignment processing on the features to be fused using a feature pyramid network.
10. A training apparatus of an image quality evaluation model, the image quality evaluation model comprising: a feature extraction network and a score determination network, the apparatus comprising:
an extraction module, configured to extract image features of each image in a sample image pair using the feature extraction network, wherein the image features comprise features to be fused and output features, the sample image pair comprises a reference image and a distorted image, and the features to be fused are subjected to spatial alignment processing to obtain a fusion feature of each image;
a determining module, configured to determine a prediction score of the distorted image based on the fusion feature and the output feature using the score determination network;
a training module, configured to determine a loss function based on the prediction score and to train the feature extraction network and the score determination network based on the loss function;
the fusion features comprise a first fusion feature and a second fusion feature, the first fusion feature is the fusion feature corresponding to the reference image, the second fusion feature is the fusion feature corresponding to the distorted image, the output features comprise a first output feature and a second output feature, the first output feature is the output feature corresponding to the reference image, the second output feature is the output feature corresponding to the distorted image, and the determining module is specifically configured to:
determine a differential feature of the first fusion feature and the second fusion feature to obtain a first differential feature;
determine a differential feature of the first output feature and the second output feature to obtain a second differential feature;
convert the first differential feature into a score feature using a first conversion network of the score determination network;
convert the second differential feature into a weight feature using a second conversion network of the score determination network;
and determine a prediction score of the distorted image based on the score feature and the weight feature.
11. The apparatus of claim 10, wherein the extraction module is specifically configured to:
perform spatial alignment processing on the features to be fused using a feature pyramid network.
12. The apparatus of claim 10, wherein the sample image pair comprises two sample image pairs comprising the same reference image and two different distorted images, the prediction score comprising a first prediction score and a second prediction score corresponding to the two distorted images, respectively, the training module being specifically configured to:
determine a predicted preference probability based on the first prediction score and the second prediction score;
and determine a loss function based on the predicted preference probability and the true preference probability.
13. The apparatus of any of claims 10-12, wherein the sample image pairs are constructed from sample groups, there being a plurality of sample groups, each sample group comprising: a reference image, a first distorted image, and a second distorted image, the apparatus further comprising:
a grouping module, configured to divide the sample groups into sample groups of a plurality of categories, wherein the degrees of difficulty corresponding to sample groups of different categories are different, the degree of difficulty is determined based on an absolute value of a score difference, and the score difference is the difference between the true score of the first distorted image and the true score of the second distorted image;
wherein the training module is specifically configured to:
train the feature extraction network and the score determination network in order of the degree of difficulty from easy to hard, according to the loss function corresponding to the sample group of the corresponding category.
14. The apparatus of any of claims 10-12, further comprising:
an acquisition module, configured to acquire an existing image pair, where the existing image pair includes: an existing reference image and an existing distorted image;
an expansion module, configured to perform the same random position erasure processing on the existing reference image and the existing distorted image to obtain an extended image pair;
a construction module for constructing the sample image pair based on the existing image pair and the extended image pair.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-7.
CN202110471056.5A 2021-04-28 2021-04-28 Image quality evaluation and model training method, device, equipment and storage medium Active CN113205495B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110471056.5A CN113205495B (en) 2021-04-28 2021-04-28 Image quality evaluation and model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110471056.5A CN113205495B (en) 2021-04-28 2021-04-28 Image quality evaluation and model training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113205495A CN113205495A (en) 2021-08-03
CN113205495B true CN113205495B (en) 2023-08-22

Family

ID=77027775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110471056.5A Active CN113205495B (en) 2021-04-28 2021-04-28 Image quality evaluation and model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113205495B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113674276B (en) * 2021-10-21 2022-03-08 北京金山云网络技术有限公司 Image quality difference scoring method and device, storage medium and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101334893A (en) * 2008-08-01 2008-12-31 天津大学 Fused image quality integrated evaluating method based on fuzzy neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11494937B2 (en) * 2018-11-16 2022-11-08 Uatc, Llc Multi-task multi-sensor fusion for three-dimensional object detection

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101334893A (en) * 2008-08-01 2008-12-31 天津大学 Fused image quality integrated evaluating method based on fuzzy neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
胡晋滨; 柴雄力; 邵枫. Blind image quality assessment based on deep feature similarity of pseudo-reference images. 光电子·激光 (Optoelectronics · Laser), 2019, (11), full text. *

Also Published As

Publication number Publication date
CN113205495A (en) 2021-08-03

Similar Documents

Publication Publication Date Title
CN112801164B (en) Training method, device, equipment and storage medium of target detection model
CN113343803B (en) Model training method, device, equipment and storage medium
CN112580733B (en) Classification model training method, device, equipment and storage medium
CN113205041B (en) Structured information extraction method, device, equipment and storage medium
CN114445831A (en) Image-text pre-training method, device, equipment and storage medium
CN113627536B (en) Model training, video classification method, device, equipment and storage medium
CN112785493B (en) Model training method, style migration method, device, equipment and storage medium
CN112488060B (en) Target detection method, device, equipment and medium
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN113205495B (en) Image quality evaluation and model training method, device, equipment and storage medium
CN113033408B (en) Data queue dynamic updating method and device, electronic equipment and storage medium
CN114449343A (en) Video processing method, device, equipment and storage medium
CN115359308B (en) Model training method, device, equipment, storage medium and program for identifying difficult cases
CN114549904B (en) Visual processing and model training method, device, storage medium and program product
CN115994243A (en) Cross-modal retrieval model processing method, device, equipment, product and medium
CN114758130B (en) Image processing and model training method, device, equipment and storage medium
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device
CN113033415B (en) Data queue dynamic updating method and device, electronic equipment and storage medium
CN113362304B (en) Training method of definition prediction model and method for determining definition level
CN114550236B (en) Training method, device, equipment and storage medium for image recognition and model thereof
CN116416500B (en) Image recognition model training method, image recognition device and electronic equipment
CN114461923B (en) Community discovery method, device, electronic equipment and storage medium
CN114547448B (en) Data processing method, model training method, device, equipment, storage medium and program
CN115131453B (en) Color filling model training, color filling method and device and electronic equipment
CN116894917A (en) Method, device, equipment and medium for generating three-dimensional hairline model of virtual image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant