WO2022142445A1

WO2022142445A1 - Model training method, and image quality evaluation method and apparatus

Info

Publication number: WO2022142445A1
Application number: PCT/CN2021/116766
Authority: WO
Inventors: 于文海; 郭伟
Original assignee: 中国银联股份有限公司
Priority date: 2020-12-28
Filing date: 2021-09-06
Publication date: 2022-07-07
Also published as: CN112700408A; CN112700408B

Abstract

A model training method, and an image quality evaluation method and apparatus. The training method comprises: acquiring a real image sample set, wherein the real image sample set comprises a plurality of real image samples (101); performing iterative training on a pre-built generative adversarial network by using the real image sample set, and collecting a plurality of pseudo image sample sets that are respectively generated within a plurality of rounds of iteration by a generative network in the generative adversarial network (102); generating a first training sample library composed of the real image sample set and the plurality of pseudo image sample sets, and automatically categorizing and labeling each first training sample of the first training sample library according to a plurality of preset image quality levels to obtain the first training sample library (103); and training a preset multi-classification network by using the first training sample library so as to obtain an image quality evaluation model (104). By using the foregoing method, only a small number of clear real image samples need to be collected to generate a large number of pseudo image samples of different quality levels, and automatic labeling reduces manual costs while improving the quality of data labeling, so that the image quality evaluation model is trained at a lower cost.

Description

Model training method, image quality assessment method and device

This application claims the priority of the Chinese patent application with the application number of 202011578791.8 and titled "Model Training Method, Image Quality Evaluation Method and Device", filed on December 28, 2020, the disclosure content of the Chinese patent application is cited by method is incorporated herein.

technical field

The invention belongs to the field of computers, and in particular relates to a model training method, an image quality evaluation method and a device.

Background technique

This section is intended to provide a background or context for the embodiments of the invention that are recited in the claims. The descriptions herein are not admitted to be prior art by inclusion in this section.

With the popularization of object recognition technologies such as face recognition, people have higher and higher requirements for object recognition accuracy. However, the quality of the collected object images directly affects the object recognition accuracy. Poor quality object images will lead to object recognition. Misrecognition or omission, so it is very important to conduct quality assessment before object recognition.

Image quality assessment is mainly divided into full reference quality assessment, semi-reference quality assessment, and no reference quality assessment. For example, the face image quality assessment is subject to individual differences in facial features, including but not limited to hairstyles, wearing glasses, makeup, etc., which will lead to large changes in content, which is a non-reference quality assessment. Among the no-reference quality assessment methods, most of the current methods still need to utilize subjective quality scores to train quality assessment models.

The training process of the existing image quality assessment model mainly includes image data collection, manual data cleaning and labeling of the collected data, and then detecting the region of interest through the detection model, and expanding the boundary margin to retain the object area with complete content, Object regions and human-annotated quality labels are input to the deep learning network for training learning.

Image quality assessment model training needs to collect a large amount of image data and label the quality score labels corresponding to the image data, which is a huge workload. At the same time, due to the individual subjectivity of those who perform the labeling work and the differences in the richness of the content contained in the images themselves, it is difficult to formulate a unified set of standards to perform the labeling work. Different people are observing the same image. Due to differences in cognition, there will be differences in the quality level annotation of the same image. Therefore, data collection and annotation for quality assessment has always been a difficult problem in face image quality assessment.

SUMMARY OF THE INVENTION

Aiming at the problems existing in the above-mentioned prior art, a model training method, an image quality assessment method and an apparatus are proposed, and the above-mentioned problems can be solved by using the method and apparatus.

The present invention provides the following solutions.

In a first aspect, a method for training an image quality assessment model is provided, including: acquiring a real image sample set, wherein the real image sample set includes multiple real image samples; and using the real image sample set to iteratively train a pre-built generative adversarial network , collect multiple fake image sample sets generated by the generative network in the generative adversarial network in multiple iteration rounds; generate a first training sample library consisting of real image sample sets and multiple fake image sample sets, according to multiple The preset image quality level is automatically graded and labeled for each first training sample in the first training sample library to obtain a labeled first training sample library; the preset multi-classification network is trained by using the first training sample library to obtain Image quality assessment model.

In some embodiments, automatically grading and labeling each first training sample in the first training sample library includes: labeling the real image samples included in the real image sample set as the highest image quality level; The number of iteration rounds corresponding to the set marks the multiple pseudo-image samples included in each pseudo-image sample set as the corresponding image quality level, wherein a higher iteration number corresponds to a higher image quality level.

In some embodiments, automatically grading and labeling each first training sample in the first training sample library includes: labeling a plurality of real image samples included in the real image sample set as the highest image quality level; The Frescher distance between the image sample set and the real image sample set, according to the calculation result, the multiple fake image samples included in each fake image sample set are marked as the corresponding image quality level, wherein, the smaller Frescher distance corresponds to Higher image quality levels.

In some embodiments, automatically grading and labeling each first training sample in the first training sample library includes: labeling a plurality of real image samples included in the real image sample set as the highest image quality level; The mean square error (MSE) value of the image sample and the real image sample, according to the mean square error (MSE) value, each fake image sample is marked as the corresponding image quality level, where the lower mean square error (MSE) value corresponds to at a higher image quality level.

In some embodiments, acquiring a sample set of real images further includes: acquiring a plurality of real images, and performing the following preprocessing operations on the plurality of real images: determining a region of interest (ROI) in each real image by using an object detection algorithm, and Each real image is cropped according to the determined region of interest (ROI); and the size of the multiple real images is normalized to obtain a real image sample set.

In some embodiments, the real image sample is a face image, and the object detection algorithm is a face detection algorithm.

In some embodiments, after acquiring the real image sample set, the method further includes: removing non-frontal face pictures in the real image sample set by using a key point detection algorithm and/or a pose estimation algorithm.

In some embodiments, using the first training sample library to perform classification training on a preset multi-classification network to obtain an image quality assessment model includes: acquiring each labeled first training sample in the first training sample library, and the label is used to indicate the image quality level of the first training sample; perform row direction filtering processing on each first training sample to obtain a first filtered image; perform column direction filtering processing on each first training sample to obtain a second filtered image; Each first training sample and the corresponding first filtered image and the second filtered image are spliced and merged to generate a labeled second training sample respectively; a plurality of second training samples corresponding to a plurality of first training samples are obtained respectively, and A plurality of second training samples are input into a preset multi-classification network for iterative training to obtain an image quality evaluation model.

In some embodiments, the preset multi-classification network is a ResNet network, and the preset multi-classification network uses a binary cross-entropy function as a loss function and uses a softmax function for binary classification.

In some embodiments, the method further includes: pre-constructing a generative adversarial network, the generative adversarial network includes a generative network and a discriminant network; wherein the generative network includes a linear mapping layer, a plurality of convolutional layers, and each of the plurality of convolutional layers. Batch normalization function and ReLU activation function after the convolutional layer, the generative network is used to receive random noise and generate fake image samples; the discriminative network includes multiple convolutional layers and LeakyRelu after each convolutional layer of the multiple convolutional layers The activation function layer and pooling layer, as well as the fully connected layer, the LeakyRelu activation function layer and the sigmoid activation function layer after multiple convolutional layers, the discriminant network is used to determine the authenticity of real image samples and fake image samples.

In some embodiments, the method further includes: generating a loss function of the network using a cross-entropy function.

In a second aspect, an image quality evaluation method is provided, including: receiving an image to be evaluated; using an image quality evaluation model trained by the method of the first aspect to perform image quality evaluation on the image to be evaluated, so as to confirm that the image to be evaluated is a plurality of pre-evaluated images. one of the set image quality levels.

In some embodiments, the image to be evaluated is a face image to be evaluated, the image quality evaluation model is used to perform quality evaluation on the face image, and the method further includes: after receiving the image to be evaluated, using a face detection algorithm to determine the face to be evaluated The region of interest (ROI) in the image, and the face image to be evaluated is cropped according to the determined region of interest (ROI); the size of the cropped face image to be evaluated is normalized according to the size of the first training sample use key point detection algorithm and/or pose estimation algorithm to determine whether the face image to be evaluated after size normalization is a frontal image; wherein, if the face image to be evaluated is not a frontal image, the evaluation is stopped. If the face image is a frontal image, the image quality evaluation model is used to evaluate the image quality of the image to be evaluated after the size is normalized.

In some embodiments, using an image quality evaluation model to perform image quality assessment on the image to be evaluated includes: performing row-direction filtering on the image to be evaluated to obtain a first filtered image to be evaluated; performing column-direction filtering on the image to be evaluated to obtain a second Filter the image to be evaluated; input the combined image of the image to be evaluated, the first filtered image to be evaluated and the second filtered image to be evaluated into an image quality evaluation model for evaluation, to determine that the image to be evaluated is one of a plurality of preset image quality levels .

In a third aspect, a model training device is provided, comprising: an acquisition module for acquiring a real image sample set, wherein the real image sample set includes a plurality of real image samples; a generative adversarial network module for using the real image sample set to pair The pre-built generative adversarial network is iteratively trained, and multiple fake image sample sets generated by the generative network in the generative adversarial network in multiple iteration rounds are collected; the automatic labeling module is used to generate real image sample sets and multiple The first training sample library composed of pseudo-image sample sets is automatically graded and labeled for each first training sample of the first training sample library according to a plurality of preset image quality levels, so as to obtain a labeled first training sample library; the model The training module is used to train the preset multi-classification network by using the first training sample library to obtain an image quality evaluation model.

In some embodiments, the automatic labeling module is further used to: label the real image samples included in the real image sample set as the highest image quality level; The multiple pseudo-image samples included are annotated with corresponding image quality levels, where a higher number of iterations corresponds to a higher image quality level.

In some embodiments, the automatic labeling module is further configured to: label multiple real image samples included in the real image sample set as the highest image quality level; calculate the Frecher distance between each fake image sample set and the real image sample set , according to the calculation results, the multiple pseudo-image samples included in each pseudo-image sample set are marked as corresponding image quality levels, wherein a smaller Frecher distance corresponds to a higher image quality level.

In some embodiments, the automatic labeling module is further configured to: label multiple real image samples included in the real image sample set as the highest image quality level; calculate the mean square error (MSE) between each fake image sample and the real image sample value, and label each pseudo-image sample as a corresponding image quality level according to the mean square error (MSE) value, where lower mean square error (MSE) values correspond to higher image quality levels.

In some embodiments, the acquisition module is further configured to: collect multiple real images, and perform the following preprocessing operations on the multiple real images: determine a region of interest (ROI) in each real image by using an object detection algorithm, and determine The region of interest (ROI) of each real image is cropped; and the size of multiple real images is normalized to obtain a real image sample set.

In some embodiments, after acquiring the real image sample set, the acquiring module is further configured to: remove non-frontal pictures in the real image sample set by using a key point detection algorithm and/or a pose estimation algorithm.

In some embodiments, the model training module is further configured to: obtain each labeled first training sample in the first training sample library, where the label is used to indicate the image quality level of the first training sample; for each first training sample The sample is subjected to row direction filtering processing to obtain a first filtered image; each first training sample is subjected to column direction filtering processing to obtain a second filtered image; each first training sample and the corresponding first filtered image and the second filtered image are The images are spliced and merged to respectively generate labeled second training samples; a plurality of second training samples corresponding to a plurality of first training samples are obtained respectively, and the plurality of second training samples are input into a preset multi-classification network for iterative training, to obtain an image quality assessment model.

In some embodiments, the generative adversarial network module is further configured to: construct a generative adversarial network in advance, and the generative adversarial network includes a generative network and a discriminant network; wherein, the generative network includes a linear mapping layer, a plurality of convolutional layers, and a plurality of convolutional layers located in a plurality of convolutional layers. The batch normalization function and ReLU activation function after each convolutional layer of the layer, the generative network is used to receive random noise and generate fake image samples; the discriminative network includes multiple convolutional layers and each convolutional layer located in multiple convolutional layers The LeakyRelu activation function layer and the pooling layer after the layer, as well as the fully connected layer, the LeakyRelu activation function layer and the sigmoid activation function layer after multiple convolutional layers, the discriminant network is used for real image samples and fake image samples. determination.

In some embodiments, the loss function of the generative network employs a cross-entropy function.

In a fourth aspect, an image quality evaluation device is provided, comprising: a receiving module for receiving the image to be evaluated; Evaluation to confirm that the image to be evaluated is one of several preset image quality levels.

In some embodiments, the image to be evaluated is a face image to be evaluated, the image quality evaluation model is used to perform quality evaluation on the face image, and the evaluation module is further configured to: after receiving the image to be evaluated, use a face detection algorithm to determine the image to be evaluated The region of interest (ROI) in the face image, and the face image to be evaluated is cropped according to the determined region of interest (ROI); according to the size of the first training sample, the cropped face image to be evaluated is sized. Normalization; use the key point detection algorithm and/or the pose estimation algorithm to determine whether the face image to be evaluated after size normalization is a frontal image; wherein, if the face image to be evaluated is not a frontal image, the evaluation is stopped, if If the face image to be evaluated is a frontal face image, the image quality evaluation model is used to evaluate the image quality of the to-be-evaluated image after size normalization.

In some embodiments, the evaluation module is further configured to: perform row-direction filtering on the image to be evaluated to obtain a first filtered image to be evaluated; perform column-direction filtering on the image to be evaluated to obtain a second filtered image to be evaluated; , the combined image of the first filtered image to be evaluated and the second filtered image to be evaluated is input to the image quality evaluation model for evaluation, to determine that the image to be evaluated is one of a plurality of preset image quality levels.

In a fifth aspect, a model training apparatus is provided, comprising: at least one processor; and a memory connected in communication with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are processed by the at least one processor The processor executes to enable at least one processor to execute: the method of the first aspect.

In a sixth aspect, an image quality assessment method is provided, comprising: at least one processor; and a memory connected in communication with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor. The processor executes to enable at least one processor to execute: the method of the second aspect.

In a seventh aspect, a computer-readable storage medium is provided, the computer-readable storage medium stores a program, and when the program is executed by a multi-core processor, the multi-core processor executes the method of the first aspect and/or the second aspect.

The above-mentioned at least one technical solution adopted in the embodiment of the present application can achieve the following beneficial effects: In this embodiment, .

It should be understood that the above description is only an overview of the technical solutions of the present invention, so that the technical means of the present invention can be more clearly understood, and thus can be implemented in accordance with the contents of the description. In order to make the above-mentioned and other objects, features and advantages of the present invention more apparent and comprehensible, specific embodiments of the present invention are illustrated below.

Description of drawings

The advantages and benefits described herein, as well as other advantages and benefits, will become apparent to those of ordinary skill in the art upon reading the following detailed description of the exemplary embodiments. The drawings are for purposes of illustrating exemplary embodiments only and are not to be considered limiting of the invention. Also, the same components are denoted by the same reference numerals throughout the drawings. In the attached image:

1 is a schematic flowchart of a model training method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a generative adversarial network according to an embodiment of the present invention;

3 is a schematic diagram of a generation network according to an embodiment of the present invention;

4 is a schematic diagram of a discrimination network according to an embodiment of the present invention;

5 is a schematic diagram of splicing a first training sample and a corresponding first filtered image and a second filtered image according to an embodiment of the present invention;

6 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention;

7 is a schematic structural diagram of a model training apparatus according to another embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an image quality assessment apparatus according to an embodiment of the present invention.

In the drawings, the same or corresponding reference numerals denote the same or corresponding parts.

Detailed ways

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be more thoroughly understood, and will fully convey the scope of the present disclosure to those skilled in the art.

In the present invention, it should be understood that terms such as "comprising" or "having" are intended to indicate the presence of features, numbers, steps, acts, components, parts or combinations thereof disclosed in this specification, and are not intended to exclude a or multiple other features, numbers, steps, acts, components, parts, or combinations thereof may exist.

In addition, it should be noted that the embodiments of the present invention and the features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

Embodiments of the present invention provide a model training method, an image quality assessment method, and an apparatus. In the following, the inventive concept of the model training method is first introduced.

Embodiments of the present invention provide a model training method for training an image quality assessment model. Specifically, firstly, a real image sample set including multiple real image samples is obtained, a pre-built generative adversarial network is used, and the real image sample is used. Iteratively trains the generative adversarial network, and collects multiple fake image sample sets generated by the generation network in multiple iteration rounds, and generates a first training sample consisting of a real image sample set and multiple fake image sample sets. Since the generation network gradually generates pseudo-image samples with higher quality in multiple iteration rounds, each first training sample in the first training sample library can be automatically graded and labeled according to multiple preset image quality levels , obtain the first training sample library with labels, and further use the first training sample library to train the preset multi-class network to obtain an image quality evaluation model, and finally use the trained image quality evaluation model to evaluate the image quality of the image to be evaluated. Evaluate, and determine that the image to be evaluated is one of a plurality of preset image quality levels. This embodiment only needs to collect a small number of clear real image samples to generate a large number of fake image samples of different quality levels, complete the labeling during the generation process, avoid manual intervention, reduce labor costs, and improve the quality of data labeling. The training of the image quality assessment model is completed at a lower cost.

Those skilled in the art can understand that the described application scenario is only an example in which the embodiments of the present invention can be implemented. The scope of application of the embodiments of the present invention is not limited in any way. Having introduced the basic principles of the present invention, various non-limiting embodiments of the present invention are described in detail below.

1 is a schematic flowchart of a model training method 100 according to an embodiment of the present application, which is used to evaluate the quality of an image. In this process, from a device perspective, the execution subject may be one or more electronic devices; from a program perspective In other words, the execution body may be a program mounted on these electronic devices accordingly.

As shown in FIG. 1, the method 100 may include:

Step 101: Obtain a real image sample set, wherein the real image sample set includes a plurality of real image samples.

In one embodiment, in order to obtain a real image sample set that is convenient for subsequent training, step 101 may further include: collecting multiple real images, and performing the following preprocessing operations on the multiple real images: using an object detection algorithm to determine the Each real image is cropped according to the determined region of interest (ROI); and the size of multiple real images is normalized to obtain a real image sample set.

The real image may be image data for a specific object, such as a face image, an animal image, a vehicle image, and the like. Object detection algorithms are used to detect target objects from real images to obtain regions of interest (ROIs).

In one embodiment, the real image sample is a face image, and the object detection algorithm is a face detection algorithm.

In one embodiment, after acquiring the real image sample set, the method may further include: removing non-frontal pictures in the real image sample set by using a key point detection algorithm and/or a pose estimation algorithm. In this way, it can be avoided that the non-frontal face pictures will adversely affect the subsequent training.

For example, a visible light camera can be used to collect a clear face image database A, and each image in the face image database A can be detected using an open-source face detection algorithm to obtain the region of interest (ROI) of the corresponding image. ), crop the original image to obtain the corresponding clear face picture database B, normalize the size of each picture in the clear face picture database B, and obtain a set of face pictures with a size of H*W, such as where H=120, W=160, and finally, non-frontal face pictures such as profile and pitch can be removed by using key point detection and pose estimation algorithms. Finally, the remaining face image data is stored, and a face image database with a quality of 1 (highest) is obtained as a real image sample set D1.

Step 102 , iteratively train the pre-built generative adversarial network using the real image sample set, and collect multiple fake image sample sets generated by the generative network in the generative adversarial network in multiple iteration rounds.

Referring to Figure 2, the process of iterative training of a pre-built generative adversarial network is shown. During training, the generative and discriminative networks have opposite goals: the discriminative network tries to distinguish fake images from real images, while the generative network tries to produce images that look real enough to fool the discriminative network. Since the generative adversarial network consists of two networks with different objectives, each training iteration can be divided into two stages: In the first stage, the discriminative network is trained, a batch of real images is sampled from the real image sample set D1, and the generative network receives Random noise R and generate fake image samples R', real image sample set D1 and fake image samples R' to form a training batch, where the labels of fake image samples are set to 0 (pseudo) and the labels of real image samples are set to 1 ( true), and train the discriminative network on this labeled batch using a binary cross-entropy loss. Backpropagation at this stage can only optimize the weights of the discriminative network. In the second stage, train the generative network, first use the generative network to generate another batch of fake image samples, and then use the discriminative network again to judge whether the image is a fake image sample or a real image sample, in this stage all labels are set to 1 (true image sample) ). In other words, it is hoped that the discriminative network will falsely judge the fake image samples produced by the generative network to be true. Crucially, in this step, the weights of the discriminative network are fixed, so backpropagation only affects the weights of the generative network.

It can be understood that through the above-mentioned iterative training process of the generative adversarial network, the generative network never actually generates any real images, but as the training iteration progresses, the quality gap between the fake image samples generated by the generative network and the real image samples gradually increases. smaller.

In one embodiment, step 102 further includes: pre-constructing a generative adversarial network, where the generative adversarial network includes a generative network and a discriminant network; wherein the generative network includes a linear mapping layer, a plurality of convolutional layers, and each of the plurality of convolutional layers. A batch normalization function and a ReLU activation function after a convolutional layer, the generative network is used to generate fake image samples based on random noise. The discriminative network includes multiple convolutional layers and a LeakyRelu activation function layer and a pooling layer located after each convolutional layer of the multiple convolutional layers, and a fully connected layer, a LeakyRelu activation function layer and a fully connected layer located after the multiple convolutional layers. The sigmoid activation function layer, the discriminant network is used to determine the authenticity of real image samples and fake image samples.

For example, referring to Figure 3, the input of the generator network is 20-dimensional random noise of length 3*H*2*W*2, the first layer is a linear mapping, and the input is mapped as 1*3*(H*2)*( W*2) four-dimensional data; the second layer is convolution operation, the output result of the first layer is convolved with the Kernel of 50*3*3, where the step size is 1, the padding is 1; the third layer is convolution Operation, convolve the output result of the second layer with the 25*3*3 Kernel, where the step size is 1 and the padding is 1; the fourth layer is convolution operation, and the output result of the third layer is convolved with 16*3*3 The Kernel is convolved, where the step size is 2, and the padding is 1; the fifth layer is convolution operation, and the output result of the fourth layer is convolved with the 16*3*3 Kernel, where the step size is 1, and the padding is 1; The sixth layer is a convolution operation, convolving the output result of the fifth layer with a 16*3*3 Kernel, where the step size is 1 and the padding is 1; the seventh layer is a convolution operation, and the sixth layer is convolved. The output result is convolved with the 8*3*3 Kernel, where the step size is 1 and the padding is 1; the seventh layer is the convolution operation, and the sixth layer output result is convolved with the 3*3*3 Kernel, where the step size is 1 and the padding is 1. A BatchNormlization layer and a ReLU activation function layer are added to the output of each of the above layers. In one embodiment, the loss function of the generating network adopts a cross-entropy function. Specifically, the calculation of the loss function uses the cross-entropy function between the prediction results of the adversarial network on the fake image samples and the real labels.

For example, referring to Figure 4, the input of the discriminant network is the real image sample set D1 and the fake image sample set R', the label of the real image sample set D1 is set to 1 (true), and the label of the fake image sample set R' is set to 0 (pseudo) uses the single-target binary cross-entropy function as the loss function, in which the first layer in the discriminant network is the convolution operation, and the input 1*3*H*W image data and the 32*7*7 Kernel Perform convolution with a step size of 1 and padding of 3. The convolution result is processed by the LeakyRelu activation function, followed by a 2*2 average pooling process with a step size of 2; the second layer is a convolution operation, Convolve the output result of the first layer with a 32*3*3 Kernel, where the step size is 1, the padding is 1, and the convolution result is processed by the LeakyRelu activation function, followed by 2* with a step size of 2 The average pooling processing of 2; the third layer is convolution operation, the output result of the second layer is convolved with the 16*3*3 Kernel, where the step size is 1, the padding is 1, and the volume is adjusted by the LeakyRelu activation function. The product results are processed, followed by a 2*2 average pooling process with a step size of 2; the fourth layer is 2 fully connected layers, the output of the third layer is mapped to 1*1024 dimensions, and the LeakyRelu activation function is used to pair the The convolution result is processed, the 1*1024 dimension is mapped to 1*1 dimension, and finally the sigmoid activation function is connected to obtain a probability before 0-1, so as to perform two-classification.

Step 103: Generate a first training sample library consisting of a real image sample set and a plurality of pseudo-image sample sets, and automatically grade and label each first training sample in the first training sample library according to a plurality of preset image quality levels. , to get the first training sample library with labels.

In one embodiment, in step 103, the automatic grading and labeling of each first training sample in the first training sample library includes: according to the number of iterations corresponding to each pseudo image sample set, Multiple fake image samples are marked as corresponding image quality levels, wherein higher iteration times correspond to higher image quality levels; the real image samples included in the real image sample set are marked as the highest image quality level.

For example, the preset image quality levels can be divided into 6 types from high to low, including "level I", "level II", ..., "level VI". The pseudo-image samples and real image samples can be stored in stages according to their quality by saving the pseudo-image samples in the intermediate training process. For example, when iterating 500 times, the image quality level of the pseudo-image sample set generated by the generation network is "level VI". When iteratively 1000 times, the image quality level of the fake image sample set generated by the network is "V level". As the number of iterations increases, the distinction between the fake image samples generated by the network and the real image samples collected is lower. , that is, the pseudo-image sample set generated by the network has a higher image quality level and better quality. Here, multiple levels of pseudo-image sample sets with different qualities can be generated. In other words, the multiple pseudo-image samples included in each pseudo-image sample set can be marked as corresponding image quality levels according to the number of iteration rounds corresponding to each pseudo-image sample set (such as the above-mentioned 500 times, 1000 times, etc.), wherein, A higher number of iterations corresponds to a higher image quality level; the real image samples contained in the real image sample set are marked as the highest image quality level "Class I".

In another embodiment, the automatic grading and labeling of each first training sample in the first training sample library based on the image quality in step 103 includes: calculating the Fraser difference between each fake image sample set and the real image sample set Distance (Frech Inception Distance); according to the calculation results, multiple pseudo-image samples included in each pseudo-image sample set are marked as corresponding image quality levels, where a smaller Frech distance corresponds to a higher image quality level; And, multiple real image samples included in the real image sample set are marked as the highest image quality level.

For example, the preset image quality levels can be divided into 6 types from high to low, including "level I", "level II", ..., "level VI". During the training process, the corresponding pseudo-image samples for different iterations can be saved in folders. For example, folder F1 stores the pseudo-image samples corresponding to the 10th round of training, and folder F2 stores the corresponding pseudo-image samples when the training reaches the 20th round. The fake image samples of ... and so on; the Frecher distance between the data in multiple folders and the real image sample set D1 can be calculated to measure the quality difference between the generated fake image samples and the clear real image samples, and according to Calculation results, according to the distribution of Frecher distance, the folders generated in the second step are grouped into 5 categories, and they are arranged according to the distance from small to large to obtain a real image sample set D1 with a quality of "level I" and a quality of "level II". The fake image sample set, ..., the fake image sample set of quality "VI".

Optionally, parameters for evaluating the degree of similarity, such as cosine similarity, KL divergence (Kullback-Leibler divergence), etc., may also be used instead of the above-mentioned Frechet distance. For example, the cosine similarity between the data in multiple folders and the real image sample set D1 can be calculated to measure the quality difference between the generated fake image samples and the clear real image sample pictures, and the quality can be sorted in descending order of similarity. It is a real image sample set D1 of "level I", a fake image sample set of "level II" quality, ..., a fake image sample set of "level VI" quality.

Optionally, the evaluation of the degree of similarity may be performed based on the data in multiple folders and some image information of the real image sample set D1, or may also be based on the data in multiple folders and all of the real image sample set D1. The image information is evaluated for the above degree of similarity, which is not specifically limited in this application.

After completing the above steps, the automatically labeled first training sample library D can be obtained, wherein the first training sample library D contains 6 subfolders, which can correspond to real image samples D1 with a quality of "Class I" and a quality of "II". The fake image sample set D2 of "level", the fake image sample set D3 of "level III" quality, the database D4 of fake image sample set of "level IV", and the face image database D5 of "level V" quality; It is the face image database D6 of "level VI";

In yet another embodiment, the automatic grading and labeling of each first training sample in the first training sample library based on the image quality in step 103 includes: calculating the mean square error (MSE) between each fake image sample and the real image sample ) value; label each fake image sample as a corresponding image quality level according to a mean squared error (MSE) value, where lower mean squared error (MSE) values correspond to higher image quality levels; The image sample set contains multiple real image samples annotated with the highest image quality level.

Step 104: Use the first training sample library to train a preset multi-classification network to obtain an image quality evaluation model.

The first training sample library consists of a real image sample set and a plurality of fake image sample sets, and each of the first training samples carries a label used to indicate the image quality. For example, it is assumed that the image quality is divided into 6 from high to low. The first training sample, which is a real image sample, is labeled as "Class I", and the first training sample, which is a fake image sample, is based on its image The order of quality from good to bad is "Class II", ..., "Class VI". Therefore, the preset multi-classification network can be trained by using the first training sample library until it converges, and the obtained image quality evaluation model can determine the image quality according to the input image as "level I", "level II", ..., " One of the "Class VI".

In one embodiment, step 104 may specifically include: acquiring each labeled first training sample in the first training sample library, where the label is used to indicate the image quality level of the first training sample; Perform row direction filtering processing to obtain a first filtered image; perform column direction filtering processing on each first training sample to obtain a second filtered image; combine each first training sample with the corresponding first filtered image and the second filtered image Perform splicing and merging to generate labeled second training samples respectively; respectively obtain multiple second training samples corresponding to multiple first training samples, and input multiple second training samples into a preset multi-classification network for iterative training to Obtain an image quality assessment model.

Referring to Fig. 5, for a plurality of labeled first training samples Img of the first training sample library, row-direction filtering and column-direction filtering are respectively performed on them, that is, the first training samples Img and the convolution kernel of 1*N are performed. The filtered image Img1 in the row direction (the first filtered image) is obtained by convolution, and the first training sample Img is convolved with another convolution kernel of N*1 to obtain the filtered image Img2 in the column direction (the second filtered image). image). The above-mentioned Img, Img1, and Img2 are merged into a picture of H*(3*W) (that is, the second training sample), as shown in Figure 5, in the second training sample, the first training sample Img is on the left, The first filtered image Img1 is in the middle and the second filtered image Img2 is on the right. A plurality of second training samples corresponding to the plurality of first training samples constitute a second training sample library. The plurality of second training samples are input into a preset multi-classification network for iterative training, and an image quality evaluation model can be obtained.

It can be understood that, using the image quality evaluation model trained in this embodiment, when performing deep learning to train the image quality evaluation model after completing the data labeling, the traditional digital image preprocessing is performed for the model input of the training quality evaluation, and the input of the model is increased. Feature information to improve the stability and generalization ability of the model

In one embodiment, the above-mentioned preset multi-classification network is a ResNet network, and the preset multi-classification network uses a binary cross-entropy function as a loss function and uses a softmax function for binary classification. Optionally, the preset multi-classification network may also use a network other than ResNet.

Based on the same technical concept, an embodiment of the present invention also provides an image quality assessment method, which uses the model training method of the above embodiment to perform the image quality assessment method, which specifically includes: receiving an image to be assessed; using the method described in the above embodiment The image quality evaluation model trained by the training method performs image quality evaluation on the image to be evaluated, so as to confirm that the image to be evaluated is one of a plurality of preset image quality levels.

Based on the same technical concept, an embodiment of the present invention further provides a model training device for executing the image quality assessment model training method provided in FIG. 1 above. FIG. 6 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention.

As shown in Figure 6, the model training device includes:

The obtaining module 601 is configured to obtain a real image sample set, wherein the real image sample set includes a plurality of real image samples.

The generative adversarial network module 602 is used to iteratively train the pre-built generative adversarial network by using the real image sample set, and collect multiple pseudo-image sample sets respectively generated by the generative network in the generative adversarial network in multiple iteration rounds;

The automatic labeling module 603 is used to generate a first training sample library consisting of a real image sample set and a plurality of pseudo-image sample sets, and according to a plurality of preset image quality levels, each first training sample of the first training sample library is Perform automatic grading and labeling to obtain the first training sample library with labels;

The model training module 604 is configured to use the first training sample library to train a preset multi-classification network to obtain an image quality evaluation model.

It should be noted that, the model training apparatus in the embodiment of the present application can implement each process of the foregoing embodiment of the model training method, and achieve the same effects and functions, which will not be repeated here.

Based on the same technical concept, the embodiments of the present invention further provide an image quality evaluation apparatus, which is used to execute the image quality evaluation methods provided by the above embodiments. Specifically, it includes: a receiving module for receiving an image to be evaluated; an evaluation module for performing image quality evaluation on the image to be evaluated by using the image quality evaluation model trained by the method in the first aspect, to confirm that the image to be evaluated is a plurality of pre-evaluated images. one of the set image quality levels.

FIG. 7 is a model training apparatus according to an embodiment of the present application, for executing the model training method shown in FIG. 1 , the apparatus includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein, The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the model training method described in the above embodiments.

FIG. 8 is an image quality evaluation apparatus according to an embodiment of the present application, which is used for executing the image quality evaluation method shown in the above-mentioned embodiment, the apparatus includes: at least one processor; and a memory connected in communication with the at least one processor ; wherein, the memory stores instructions executable by at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the image quality assessment method described in the above embodiments.

According to some embodiments of the present application, there is provided a non-volatile computer storage medium of a model training method and an image quality assessment method having computer-executable instructions stored thereon, the computer-executable instructions being configured to be executed when executed by a processor : the method described in the above embodiment.

Each embodiment in this application is described in a progressive manner, and the same and similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the apparatus, device, and computer-readable storage medium embodiments, since they are basically similar to the method embodiments, the description thereof is simplified, and reference may be made to the partial descriptions of the method embodiments for related parts.

The apparatuses, devices, and computer-readable storage media and methods provided in the embodiments of the present application are in one-to-one correspondence. Therefore, the apparatuses, devices, and computer-readable storage media also have beneficial technical effects similar to those of the corresponding methods. The beneficial technical effects of the method have been described in detail, therefore, the beneficial technical effects of the apparatus, equipment and computer-readable storage medium will not be repeated here.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block in the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing device produce Means for implementing the functions specified in a flow or flow of a flowchart and/or a block or blocks of a block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instructions The apparatus implements the functions specified in the flow or flow of the flowcharts and/or the block or blocks of the block diagrams.

These computer program instructions can also be loaded on a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process such that The instructions provide steps for implementing the functions specified in the flow or blocks of the flowcharts and/or the block or blocks of the block diagrams.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

Memory may include non-persistent memory in computer readable media, random access memory (RAM) and/or non-volatile memory in the form of, for example, read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media includes both persistent and non-permanent, removable and non-removable media, and storage of information may be implemented by any method or technology. Information may be computer readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device. Furthermore, although the operations of the methods of the present invention are depicted in the figures in a particular order, this does not require or imply that the operations must be performed in the particular order, or that all illustrated operations must be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined to be performed as one step, and/or one step may be decomposed into multiple steps to be performed.

While the spirit and principles of the present invention have been described with reference to a number of specific embodiments, it should be understood that the invention is not limited to the specific embodiments disclosed, nor does the division of aspects imply that features of these aspects cannot be combined to perform Benefit, this division is only for convenience of presentation. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

A model training method, comprising:

obtaining a real image sample set, wherein the real image sample set includes a plurality of real image samples;

Iteratively trains the pre-built generative adversarial network by using the real image sample set, and collects multiple pseudo-image sample sets that are respectively generated by the generative network in the generative adversarial network in multiple iteration rounds;

Generate a first training sample library consisting of the real image sample set and the plurality of pseudo image sample sets, and perform each first training sample in the first training sample library according to a plurality of preset image quality levels. Automatic grading and labeling to obtain the labeled first training sample library;

The preset multi-classification network is trained by using the first training sample library to obtain an image quality evaluation model.
The method according to claim 1, wherein the automatic grading and labeling of each first training sample in the first training sample library comprises:

Marking the real image samples included in the real image sample set as the highest image quality level;

The plurality of pseudo-image samples included in each of the pseudo-image sample sets are marked as corresponding image quality levels according to the number of iteration rounds corresponding to each of the pseudo-image sample sets, wherein a higher iteration number corresponds to a higher Image quality level.
The method according to claim 1 or 2, wherein the automatic grading and labeling of each first training sample in the first training sample library comprises:

Marking a plurality of the real image samples included in the real image sample set as the highest image quality level;

Calculate the Frecher distance between each of the fake image sample sets and the real image sample set, and mark the multiple fake image samples included in each of the fake image sample sets as corresponding image quality levels according to the calculation results, wherein , a smaller Frechet distance corresponds to a higher level of image quality.
The method according to any one of claims 1-3, wherein the automatic grading and labeling of each first training sample in the first training sample library comprises:

Marking a plurality of the real image samples included in the real image sample set as the highest image quality level;

Calculate the mean square error (MSE) value of each fake image sample and the real image sample, and mark each fake image sample as a corresponding image quality level according to the mean square error (MSE) value, wherein the more Lower mean squared error (MSE) values correspond to higher image quality levels.
The method according to any one of claims 1-4, wherein the acquiring a real image sample set further comprises:

Collect multiple real images, and perform the following preprocessing operations on the multiple real images:

Determine a region of interest (ROI) in each of the real images by using an object detection algorithm, and perform cropping processing on each of the real images according to the determined region of interest (ROI); The real images are size-normalized to obtain the real image sample set.
The method according to claim 5, wherein the real image sample is a face image, and the object detection algorithm is a face detection algorithm.
The method according to claim 6, wherein after acquiring the real image sample set, the method further comprises:

Use a key point detection algorithm and/or a pose estimation algorithm to remove non-frontal face pictures in the real image sample set.
The method according to any one of claims 1-7, characterized in that, using the first training sample library to perform classification training on a preset multi-classification network to obtain an image quality evaluation model, comprising:

acquiring each labeled first training sample in the first training sample library, where the label is used to indicate an image quality level of the first training sample;

Perform row direction filtering processing on each of the first training samples to obtain a first filtered image;

Perform column-direction filtering processing on each of the first training samples to obtain a second filtered image;

splicing and merging each of the first training samples and the corresponding first filtered images and the second filtered images to generate a labeled second training sample respectively;

Respectively obtain multiple second training samples corresponding to multiple first training samples, and input multiple second training samples into the preset multi-classification network for iterative training to obtain the image quality assessment Model.
The method according to any one of claims 1-8, wherein,

The preset multi-classification network is a ResNet network, and the preset multi-classification network uses a binary cross-entropy function as a loss function and uses a softmax function for binary classification.
The method according to any one of claims 1-9, wherein the method further comprises:

The generative adversarial network is pre-built, and the generative adversarial network includes the generative network and the discriminant network; wherein,

The generation network includes a linear mapping layer, a plurality of convolutional layers, and a batch normalization function and a ReLU activation function after each of the plurality of convolutional layers, the generation network is used to receive random noise and generate the pseudo-image sample;

The discriminative network includes a plurality of convolutional layers and a LeakyRelu activation function layer and a pooling layer after each of the plurality of convolutional layers, and a fully connected layer after the plurality of convolutional layers , LeakyRelu activation function layer and sigmoid activation function layer, and the discriminant network is used to perform authenticity determination on the real image samples and the fake image samples.
The method of claim 10, wherein the method further comprises:

The loss function of the generating network adopts a cross entropy function.
An image quality assessment method, comprising:

receive images to be evaluated;

Perform image quality evaluation on the image to be evaluated by using the image quality evaluation model trained by the method according to any one of claims 1-11 to confirm that the image to be evaluated is of a plurality of preset image qualities one of the levels.
The method according to claim 12, wherein the image to be evaluated is a face image to be evaluated, the image quality evaluation model is used to perform quality evaluation on the face image, and the method further comprises:

After receiving the to-be-evaluated image, a face detection algorithm is used to determine a region of interest (ROI) in the to-be-evaluated face image, and the to-be-evaluated face image is analyzed according to the determined region of interest (ROI). cut out;

Perform size normalization on the cropped face image to be evaluated according to the size of the first training sample;

Utilize key point detection algorithm and/or pose estimation algorithm to determine whether the face image to be evaluated after size normalization is a frontal face image;

Wherein, if the to-be-evaluated face image is not a frontal image, the evaluation is stopped; if the to-be-evaluated face image is a frontal image, the image quality evaluation model is used to normalize the size of the to-be-evaluated image for image quality assessment.
The method according to claim 12 or 13, characterized in that, using the image quality evaluation model to perform image quality evaluation on the image to be evaluated, comprising:

Perform row direction filtering processing on the image to be evaluated to obtain a first filtered image to be evaluated;

Performing column-direction filtering processing on the image to be evaluated to obtain a second filtered image to be evaluated;

The combined image of the image to be evaluated, the first filtered image to be evaluated and the second filtered image to be evaluated is input into the image quality evaluation model for evaluation, so as to determine that the image to be evaluated is a preset multiple One of the image quality levels.
A model training device, comprising:

an acquisition module, configured to acquire a real image sample set, wherein the real image sample set includes a plurality of real image samples;

A generative adversarial network module is used to iteratively train a pre-built generative adversarial network using the real image sample set, and collect multiple pseudo-image samples generated by the generative network in the generative adversarial network in multiple iteration rounds. set;

An automatic labeling module, configured to generate a first training sample library consisting of the real image sample set and the plurality of pseudo-image sample sets, and assign each of the first training sample library according to a plurality of preset image quality levels. The first training samples are automatically graded and labeled to obtain the labeled first training sample library;

The model training module is used for training a preset multi-classification network by using the first training sample library to obtain an image quality evaluation model.
The device according to claim 15, wherein the automatic labeling module is further configured to:

Marking the real image samples included in the real image sample set as the highest image quality level;

The plurality of pseudo-image samples included in each of the pseudo-image sample sets are marked as corresponding image quality levels according to the number of iteration rounds corresponding to each of the pseudo-image sample sets, wherein a higher iteration number corresponds to a higher Image quality level.
The device according to claim 15 or 16, wherein the automatic labeling module is further used for:

Marking a plurality of the real image samples included in the real image sample set as the highest image quality level;

Calculate the Frecher distance between each of the fake image sample sets and the real image sample set, and mark the multiple fake image samples included in each of the fake image sample sets as corresponding image quality levels according to the calculation results, wherein , a smaller Frechet distance corresponds to a higher level of image quality.
The device according to any one of claims 15-17, wherein the automatic labeling module is further configured to:

Marking a plurality of the real image samples included in the real image sample set as the highest image quality level;

Calculate the mean square error (MSE) value of each fake image sample and the real image sample, and mark each fake image sample as a corresponding image quality level according to the mean square error (MSE) value, wherein the more Lower mean squared error (MSE) values correspond to higher image quality levels.
The device according to any one of claims 15-18, wherein the acquiring module is further configured to:

Collect multiple real images, and perform the following preprocessing operations on the multiple real images:

Determine a region of interest (ROI) in each of the real images by using an object detection algorithm, and perform cropping processing on each of the real images according to the determined region of interest (ROI); The real images are size-normalized to obtain the real image sample set.
The device according to claim 19, wherein the real image sample is a face image, and the object detection algorithm is a face detection algorithm.
The device according to claim 20, wherein after the acquisition of the real image sample set, the acquisition module is further configured to:

Use a key point detection algorithm and/or a pose estimation algorithm to remove non-frontal face pictures in the real image sample set.
The device according to any one of claims 15-21, wherein the model training module is further configured to:

acquiring each labeled first training sample in the first training sample library, where the label is used to indicate an image quality level of the first training sample;

Perform row direction filtering processing on each of the first training samples to obtain a first filtered image;

Perform column-direction filtering processing on each of the first training samples to obtain a second filtered image;

splicing and merging each of the first training samples and the corresponding first filtered images and the second filtered images to generate a labeled second training sample respectively;

Respectively obtain multiple second training samples corresponding to multiple first training samples, and input multiple second training samples into the preset multi-classification network for iterative training to obtain the image quality assessment Model.
The device according to any one of claims 15-22, characterized in that,

The preset multi-classification network is a ResNet network, and the preset multi-classification network uses a binary cross-entropy function as a loss function and uses a softmax function for binary classification.
The apparatus according to any one of claims 15-23, wherein the generative adversarial network module is further configured to:

The generative adversarial network is pre-built, and the generative adversarial network includes the generative network and the discriminant network; wherein,

The generation network includes a linear mapping layer, a plurality of convolutional layers, and a batch normalization function and a ReLU activation function after each of the plurality of convolutional layers, the generation network is used to receive random noise and generate the pseudo-image sample;

The discriminative network includes a plurality of convolutional layers and a LeakyRelu activation function layer and a pooling layer after each of the plurality of convolutional layers, and a fully connected layer after the plurality of convolutional layers , LeakyRelu activation function layer and sigmoid activation function layer, and the discriminant network is used to perform authenticity determination on the real image samples and the fake image samples.
The apparatus according to claim 24, wherein the loss function of the generation network adopts a cross-entropy function.
A device for evaluating image quality, comprising:

a receiving module for receiving the image to be evaluated;

The evaluation module is used to perform image quality evaluation on the image to be evaluated by using the image quality evaluation model trained by the method according to any one of claims 1-11, to confirm that the image to be evaluated is a plurality of pre-images. one of the set image quality levels.
The device according to claim 26, wherein the image to be evaluated is a face image to be evaluated, the image quality evaluation model is used to perform quality evaluation on the face image, and the evaluation module is further used for:

After receiving the to-be-evaluated image, a face detection algorithm is used to determine a region of interest (ROI) in the to-be-evaluated face image, and the to-be-evaluated face image is analyzed according to the determined region of interest (ROI). cut out;

Perform size normalization on the cropped face image to be evaluated according to the size of the first training sample;

Utilize key point detection algorithm and/or pose estimation algorithm to determine whether the face image to be evaluated after size normalization is a frontal face image;

Wherein, if the to-be-evaluated face image is not a frontal image, the evaluation is stopped; if the to-be-evaluated face image is a frontal image, the image quality evaluation model is used to normalize the size of the to-be-evaluated image for image quality assessment.
The device according to claim 26 or 27, wherein the evaluation module is further used for:

Perform row direction filtering processing on the image to be evaluated to obtain a first filtered image to be evaluated;

Performing column-direction filtering processing on the image to be evaluated to obtain a second filtered image to be evaluated;

The combined image of the image to be evaluated, the first filtered image to be evaluated and the second filtered image to be evaluated is input into the image quality evaluation model for evaluation, so as to determine that the image to be evaluated is a preset multiple One of the image quality levels.
A model training device, comprising:

at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to execute: such as The method of any one of claims 1-11.
An image quality assessment method, comprising:

at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to execute: such as The method of any of claims 12-14.
A computer-readable storage medium, the computer-readable storage medium stores a program, when the program is executed by a multi-core processor, the multi-core processor is made to execute the method according to any one of claims 1-11 method, or performing a method as claimed in any one of claims 12-14.