CN117078656A - Novel unsupervised image quality assessment method based on multi-mode prompt learning - Google Patents

Novel unsupervised image quality assessment method based on multi-mode prompt learning Download PDF

Info

Publication number
CN117078656A
Authority
CN
China
Prior art keywords
image
model
text
image quality
prompt
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311131117.9A
Other languages
Chinese (zh)
Inventor
纪荣嵘
高体民
潘文胜
郑侠武
张岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202311131117.9A priority Critical patent/CN117078656A/en
Publication of CN117078656A publication Critical patent/CN117078656A/en
Pending legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/088 - Non-supervised learning, e.g. competitive learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30168 - Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A novel unsupervised image quality assessment method based on multi-modal prompt learning, belonging to the technical field of computer vision. No-reference image quality assessment aims to simulate the human assessment of image quality without a reference image (the original image). The invention fully exploits the potential of the pre-trained CLIP model in the challenging task of image perceptual evaluation. First, multi-modal prompt learning is introduced so that the representation space of the CLIP model can be flexibly adjusted for BIQA, unlocking its potential in challenging image perceptual evaluation tasks. Second, the previous text prompt learning approach is improved: the antonym-based text prompts used in earlier methods are replaced with fine-grained text prompts, so that fine-grained characteristics of the image can be captured and a more accurate quality assessment obtained.

Description

Novel unsupervised image quality assessment method based on multi-mode prompt learning
Technical Field
The invention belongs to the technical field of computer vision, and in particular relates to a novel no-reference image quality assessment method based on multi-modal prompt learning.
Background
Image quality assessment (IQA) is an important research direction in the field of computer vision; its main objective is to quantitatively predict and evaluate the visual quality of an image in accordance with human visual perception. Image quality assessment has important practical significance in many applications, such as image processing, image transmission, and video coding.
Over the years, a number of IQA methods have been developed and evaluated. They generally fall into three categories according to how much information the original reference image provides: full-reference (FR-IQA), reduced-reference (RR-IQA), and no-reference (NR-IQA), the last also called blind IQA (BIQA). Full-reference image quality assessment refers to the original undistorted image and obtains the quality score of the distorted image from the difference between the two. Reduced-reference image quality assessment predicts image quality using partial information of the original image as a reference. In practical applications, however, the reference image is often difficult to obtain, making the first two approaches inapplicable, so work in recent years has gradually focused on the BIQA field.
Conventional no-reference image quality assessment methods generally rely on manually designed features and rules, and struggle to cope with complex and variable image distortions and quality variations. With the rapid development of deep learning, image quality assessment methods based on deep neural networks have made remarkable progress in accuracy and generalization. These methods exploit the strong representational power of deep neural networks to automatically learn and extract complex perceptual features from images, thereby achieving accurate assessment of image quality. Deep learning overcomes the limitations of hand-crafted features in traditional methods and allows end-to-end training on large-scale datasets, improving both the performance and the generalization ability of image quality assessment.
OpenAI proposed the CLIP model (Radford et al., Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (PMLR), pages 8748-8763, 2021), a self-supervised pre-training framework that encodes images and text into a shared vector space without manually labeled data. CLIP shows powerful zero-shot transfer learning capability across various tasks. Wang et al. (Wang et al., Exploring CLIP for assessing the look and feel of images. arXiv preprint arXiv:2207.12396, 2022) were the first to explore the potential of CLIP in the challenging task of image perceptual evaluation. They proposed a prompt pairing strategy that uses antonym prompts (e.g., "good photo" and "bad photo") to reduce prompt ambiguity. Building on this, Zhang et al. (Zhang et al., Blind image quality assessment via vision-language correspondence: A multitask learning perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14071-14081, 2023) proposed a multitask approach that fine-tunes CLIP by combining image quality assessment with distortion-type classification and scene classification. Their approach automatically determines parameter sharing and loss weights, exploiting auxiliary knowledge from the other tasks. They used a five-level Likert quality scale, with the model outputting the score as a probability-weighted sum over the quality levels.
However, the effectiveness of these methods is often limited by the choice of the input text prompt; for image quality assessment, selecting an appropriate text prompt is crucial, and a poor choice may lead to instability and fluctuations in performance. In addition, some existing models still predict a high quality score when the main content of an image is clear but other regions are distorted, which does not match human perception. Moreover, existing methods rarely consider the distortion of different regions of the image and focus only on the perceptual evaluation of the whole image, which limits them when processing images with local distortion.
Disclosure of Invention
The invention aims to provide a novel no-reference image quality assessment method based on multi-modal prompt learning, which fully exploits the potential of the CLIP model in the challenging task of image perceptual evaluation. The method introduces fine-grained text prompts, classifying the image quality assessment problem more finely so that the model can capture subtle differences in image quality more accurately. Meanwhile, a learnable image prompt is added at each layer of the visual branch, so that the model can better adjust its representation space for the image quality assessment task and understand quality-related information in the image more comprehensively. In addition, a two-stage evaluation paradigm is proposed, which allows the model to adapt gradually from coarse perception to detailed evaluation, thereby achieving a comprehensive understanding and accurate prediction of image quality.
The invention comprises the following steps:
1) Grade each picture in the dataset according to its quality score and assign it a level or category label; for example, images with scores in the range [49, 50) are classified as level 50, i.e., category 50.
2) Introduce a learnable text prompt into the text branch of the model to address the prompt sensitivity of the CLIP model on downstream tasks; meanwhile, to enable the model to better adjust its representation space for the image quality assessment task and understand quality-related information in the image more comprehensively, introduce a learnable image prompt into the image branch.
3) Train the model in two stages: in the first stage, only the text prompt and the image prompt are trained and all other parameters are frozen, so that the transferred CLIP model acquires the ability to perceive and evaluate image quality with very few trainable parameters. In the second stage, the prompts of the two branches are frozen and only the image encoder is trained, so that the model can gradually adapt from coarse perception to detailed evaluation, achieving a comprehensive understanding and accurate prediction of image quality.
The invention has the following characteristics and effects:
The novel no-reference image quality assessment method based on multi-modal prompt learning provided by the invention removes the performance bottleneck of prior work and greatly improves performance. The invention introduces multi-modal prompt learning into the image quality assessment task: it not only introduces a learnable text prompt in the text branch, but also adds a learnable image prompt at each layer of the image branch. This enables the model to fully exploit the information interaction between text and image when evaluating image quality and to better understand the quality-related characteristics of the image, thereby improving performance on the image quality assessment task.
Drawings
FIG. 1 is a comparative diagram of the present invention and previous methods.
Fig. 2 is a frame diagram of the present invention.
Fig. 3 is a visualization heatmap of the present invention.
Detailed Description
The invention will be further illustrated by the following examples in conjunction with the accompanying drawings.
The flow of the method of the present invention is shown in Fig. 2 and comprises two stages. In the first training stage, only the text prompt and the image prompt are trained and all other parameters are frozen, so that the transferred CLIP model acquires the ability to perceive and evaluate image quality with very few trainable parameters. In the second training stage, the prompts of the two branches are frozen and only the image encoder is trained, so that the model can gradually adapt from coarse perception to detailed evaluation, achieving a comprehensive understanding and accurate prediction of image quality. The upper right corner of Fig. 2 shows where learnable prompts are introduced in the image branch: a learnable image prompt is added at each Transformer layer of the image encoder.
1. Training instructions
The embodiment of the invention comprises the following steps:
1) Grade each picture in the dataset according to its quality score and assign it a level or category label; for example, images with scores in the range [49, 50) are classified as level 50, i.e., category 50. The model computes the similarity between the output features of the text encoder and the output features of the image encoder, and obtains the final quality score q(x) as a weighted sum over the quality levels, i.e., q(x) = Σ_c s_c · P(c|x),
where P(c|x) is the class probability after applying softmax, s_c is the quality level (score) represented by class c, and C is the total number of classes.
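For illustration only, a minimal PyTorch-style sketch of this weighted-sum score computation (the function name, temperature, and feature dimensions are assumptions, not taken from the patent):

```python
import torch

def quality_score(image_feat, text_feats, levels, temperature=100.0):
    """Weighted-sum quality score q(x) from CLIP-style similarities.

    image_feat: (D,) image-encoder output feature for image x
    text_feats: (C, D) one text-encoder feature per quality level
    levels:     (C,) the score value s_c associated with each level
    """
    # cosine similarity = dot product of L2-normalized features
    image_feat = image_feat / image_feat.norm()
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    logits = temperature * (text_feats @ image_feat)  # (C,)
    probs = logits.softmax(dim=-1)                    # P(c|x)
    return (probs * levels).sum()                     # q(x) = sum_c s_c * P(c|x)

# toy usage: 100 quality levels (1..100) and random 512-d features
img = torch.randn(512)
txt = torch.randn(100, 512)
lvl = torch.arange(1, 101, dtype=torch.float32)
print(quality_score(img, txt, lvl))
```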
2) A set of learnable text prompts is introduced in the text branch of the model to fully exploit the representational capability of the text encoder. The text input takes the form "[X]_1 [X]_2 [X]_3 ... [X]_M [class]", where "[X]_1 [X]_2 [X]_3 ... [X]_M" denotes the learnable prefix of the text input and M is the number of learnable tokens. In addition, in the image branch, "deep visual prompts" are introduced, which means adding prompts at each layer of the image encoder. The main purpose is to enhance the alignment between the perceptual features of the image and the quality-level text features. By introducing prompts at multiple layers, the learning process of image features is controlled more finely, so that the model adapts better to the image quality assessment task.
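For illustration only, a sketch of such a learnable prefix realized as continuous context vectors prepended to the class-name token embeddings (the class, dimensions, and names below are assumptions; the deep visual prompts are detailed next):

```python
import torch
import torch.nn as nn

class LearnableTextPrompt(nn.Module):
    """Learnable prefix [X]_1 ... [X]_M prepended to each quality-level
    class-name embedding (a sketch; 512 matches CLIP's text width)."""
    def __init__(self, m_tokens=16, embed_dim=512):
        super().__init__()
        # shared learnable context tokens [X]_1 .. [X]_M
        self.context = nn.Parameter(0.02 * torch.randn(m_tokens, embed_dim))

    def forward(self, class_embeddings):
        # class_embeddings: (C, L, D) token embeddings of each class name
        c = class_embeddings.shape[0]
        ctx = self.context.unsqueeze(0).expand(c, -1, -1)  # (C, M, D)
        # "[X]_1 ... [X]_M [class]" -> fed to the frozen CLIP text encoder
        return torch.cat([ctx, class_embeddings], dim=1)

# toy usage: 100 quality-level classes, 4 name tokens each
prompts = LearnableTextPrompt()(torch.randn(100, 4, 512))
print(prompts.shape)  # torch.Size([100, 20, 512])
```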
A set of learnable tokens P is inserted between the image's class token and its patch embeddings. At each layer of the image encoder, the input is represented as [Cls_{i-1}, P_{i-1}, E_{i-1}], where Cls denotes the class token and E denotes the patch embeddings. After the i-th Transformer layer L_i, a new set of learnable tokens P_i is introduced and concatenated with the outputs Cls_i and E_i:
[Cls_i, ○, E_i] = L_i([Cls_{i-1}, P_{i-1}, E_{i-1}])   (2)
where ○ denotes the prompt outputs, which are not passed as input to the next Transformer layer. This design introduces learnable tokens at every layer of the image encoder, enriching the model's ability to capture and represent important image features and quality characteristics, and ultimately contributing to more effective and accurate image quality assessment.
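For illustration only, a sketch of this deep visual prompting, assuming a ViT-style encoder; the wrapper class, parameter names, and prompt length are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PromptedViTEncoder(nn.Module):
    """Deep visual prompts per Eq. (2): fresh learnable tokens P_i are inserted
    at every Transformer layer; the prompt outputs ('○') are discarded."""
    def __init__(self, layers, num_prompts=8, dim=768):
        super().__init__()
        self.layers = nn.ModuleList(layers)  # the ViT Transformer blocks
        self.prompts = nn.ParameterList(
            [nn.Parameter(0.02 * torch.randn(num_prompts, dim)) for _ in layers]
        )

    def forward(self, cls_tok, patch_emb):
        # cls_tok: (B, 1, dim) class token, patch_emb: (B, N, dim) patch embeddings
        for layer, prompt in zip(self.layers, self.prompts):
            p = prompt.unsqueeze(0).expand(cls_tok.size(0), -1, -1)
            x = layer(torch.cat([cls_tok, p, patch_emb], dim=1))  # [Cls, P_i, E]
            cls_tok = x[:, :1]                # Cls_i
            patch_emb = x[:, 1 + p.size(1):]  # E_i (prompt outputs dropped)
        return cls_tok, patch_emb

# toy usage with stand-in blocks (plain linear layers instead of real ViT blocks)
enc = PromptedViTEncoder([nn.Linear(768, 768) for _ in range(12)])
cls_out, patch_out = enc(torch.randn(2, 1, 768), torch.randn(2, 196, 768))
print(cls_out.shape, patch_out.shape)
```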
3) Training of the model is divided into two stages. In the first training stage, only the text prompt and the image prompt are trained and all other parameters are frozen, so that the transferred CLIP model acquires the ability to perceive and evaluate image quality with very few trainable parameters. The loss function used is:
where i denotes the i-th picture, T and V denote the output features of the text encoder and the image encoder respectively, sim(·,·) denotes the cosine similarity of the two features, B denotes a mini-batch, and A denotes the set of all pictures in the mini-batch that belong to the same category as picture i.
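The formula itself is not reproduced above; for illustration only, the following sketch shows one plausible image-text alignment loss consistent with that description (cross-entropy over cosine similarities, averaged over the same-category set A(i)). The exact formula in the patent may differ, and the temperature value is an assumption:

```python
import torch
import torch.nn.functional as F

def stage1_alignment_loss(img_feats, txt_feats, labels, tau=0.07):
    """One plausible stage-1 loss (an assumption, not the patent's exact formula):
    pull each image feature V_i toward the text features T_a of all samples
    a in A(i), the same-category set within the mini-batch B.

    img_feats: (B, D) image-encoder outputs V
    txt_feats: (B, D) text-encoder outputs T (one per sample's quality category)
    labels:    (B,)   quality-category label of each picture
    """
    v = F.normalize(img_feats, dim=-1)
    t = F.normalize(txt_feats, dim=-1)
    sim = v @ t.T / tau                                  # (B, B) cosine similarities
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)  # log softmax over the batch
    same_cat = (labels[:, None] == labels[None, :]).float()
    # average log-probability over A(i), then over the mini-batch
    return -(same_cat * log_prob).sum(1).div(same_cat.sum(1)).mean()

# toy usage
loss = stage1_alignment_loss(torch.randn(8, 512), torch.randn(8, 512),
                             torch.randint(0, 4, (8,)))
print(loss)
```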
In the second training stage, the prompts of the two branches are frozen and only the image encoder is trained, so that the model can gradually adapt from coarse perception to detailed evaluation, achieving a comprehensive understanding and accurate prediction of image quality. The loss function used at this stage is:
introducing loss of fidelityTo consider pair-wise learning to rank the model estimates. Furthermore, use is made of a smooth L1LossAnd cross entropy loss with tag smoothing +.>To optimize. Wherein α and β are balance ∈ ->And->Is a coefficient of (a).
p and p' represent the true probability distribution and the predicted probability distribution, respectively.
I'_k = (1 - ε)·I_k + ε/C denotes the smoothed value of quality level k in the target distribution, and P_k denotes the predicted logit of category k.
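For illustration only, a sketch of how these three losses could be combined for stage 2. The fidelity-loss form, the pairwise-probability construction, and the way the terms are weighted are standard choices assumed here, not the patent's exact formulas; only the coefficients α = 0.001 and β = 0.1 come from the implementation details below:

```python
import torch
import torch.nn.functional as F

def fidelity_loss(p, p_hat, eps=1e-8):
    """Standard fidelity loss between a true distribution p and a prediction p'."""
    return (1.0 - torch.sqrt(p * p_hat + eps).sum(dim=-1)).mean()

def pairwise_rank_probs(scores, mos):
    """For each pair (i, j): true preference p from the MOS ordering, predicted
    preference p' from the difference of predicted scores (one common choice)."""
    p_true = (mos[:, None] > mos[None, :]).float()
    p_pred = torch.sigmoid(scores[:, None] - scores[None, :])
    # stack into two-outcome distributions [p, 1 - p] for the fidelity loss
    return (torch.stack([p_true, 1 - p_true], dim=-1),
            torch.stack([p_pred, 1 - p_pred], dim=-1))

def stage2_loss(scores, logits, mos, level_targets, alpha=0.001, beta=0.1):
    """Illustrative combination: fidelity + alpha * smooth L1 + beta * label-smoothed CE."""
    p, p_hat = pairwise_rank_probs(scores, mos)
    l_fid = fidelity_loss(p, p_hat)
    l_l1 = F.smooth_l1_loss(scores, mos)
    l_ce = F.cross_entropy(logits, level_targets, label_smoothing=0.1)
    return l_fid + alpha * l_l1 + beta * l_ce

# toy usage: batch of 8, 100 quality-level classes
scores, mos = torch.randn(8) * 10 + 50, torch.randn(8) * 10 + 50
print(stage2_loss(scores, torch.randn(8, 100), mos, torch.randint(0, 100, (8,))))
```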
2. Implementation details
1) Model details
The invention is implemented with the PyTorch framework. Both the image encoder and the text encoder are taken from the CLIP framework. Specifically, a ViT-B/16 architecture with a single-layer Transformer decoder is employed as the image encoder. This configuration contains 12 Transformer layers, each with a hidden size of 768. To match the output of the text encoder, a linear layer is used to reduce the dimension of the image feature vector from 768 to 512.
2) Training details
The training process is divided into two stages, each containing 60 epochs. In the first stage, only the learnable text prompts and image prompts are trained, with all other parameters frozen. The Adam optimizer is used with an initial learning rate of 3×10^-5, which is then decayed with a cosine learning-rate schedule. To augment the training data, each original image is randomly cropped into 8 sub-images, each of size 3×224×224. In the second stage, the image encoder is optimized. The Adam optimizer is still used, but this stage includes a warm-up of 10 epochs during which the learning rate increases linearly from 9.5×10^-7 to 5×10^-6; the learning rate is then multiplied by 0.1 at the 30th and 50th epochs. Data augmentation at this stage includes random horizontal flipping and random cropping. The coefficient α is set to 0.001 and β to 0.1.
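For illustration only, the two-stage optimizer and learning-rate schedule described above could be set up roughly as follows (the stand-in model, the parameter-name filter, and the scheduler choices are assumptions):

```python
import torch
import torch.nn as nn

# stand-in model: in practice this would be the prompted CLIP model of the invention
class DummyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.text_prompt = nn.Parameter(torch.zeros(16, 512))
        self.image_prompt = nn.Parameter(torch.zeros(12, 8, 768))
        self.image_encoder = nn.Linear(768, 512)

model = DummyModel()

# stage 1: train only the prompts (lr 3e-5, cosine decay over 60 epochs)
prompt_params = [p for n, p in model.named_parameters() if "prompt" in n]
opt1 = torch.optim.Adam(prompt_params, lr=3e-5)
sched1 = torch.optim.lr_scheduler.CosineAnnealingLR(opt1, T_max=60)

# stage 2: train the image encoder; 10-epoch linear warm-up from 9.5e-7 to
# 5e-6, then multiply the learning rate by 0.1 at epochs 30 and 50
opt2 = torch.optim.Adam(model.image_encoder.parameters(), lr=5e-6)

def lr_factor(epoch):
    if epoch < 10:
        start = 9.5e-7 / 5e-6
        return start + (1.0 - start) * epoch / 10
    return 1.0 if epoch < 30 else (0.1 if epoch < 50 else 0.01)

sched2 = torch.optim.lr_scheduler.LambdaLR(opt2, lr_lambda=lr_factor)
```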
Each image is assigned a new category label based on the score label for each image. The batch sizes of LIVE and CSIQ data sets are set to 32, while the batch sizes of other data sets are set to 64. Data were divided into 80% for training and 20% for testing. To reduce the performance bias, each experiment was repeated 10 times and the average PLCC and SRCC were calculated.
3. Application field
The invention can be applied in the field of no-reference image quality assessment and can be used to judge the quality of a distorted picture when no original image is available.
Table 1 compares the performance of the model of the present invention with previous models on 6 common image quality assessment datasets. These datasets include four synthetic-distortion datasets (LIVE, CSIQ, TID2013 and KADID) and two authentic-distortion datasets (LIVEC and KonIQ). To evaluate model performance, the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SRCC) are used as evaluation metrics. PLCC measures the accuracy of the model's predictions, while SRCC evaluates the monotonicity of the BIQA algorithm's predictions. Both metrics range from 0 to 1, and higher values indicate better prediction accuracy and monotonicity.
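For illustration only, these two metrics can be computed as follows (a minimal sketch using SciPy; in practice PLCC is often computed after a monotonic logistic remapping of the predictions, which is omitted here, and the numbers shown are made up):

```python
import numpy as np
from scipy import stats

def plcc_srcc(pred, mos):
    """PLCC measures prediction accuracy; SRCC measures prediction monotonicity."""
    plcc, _ = stats.pearsonr(pred, mos)
    srcc, _ = stats.spearmanr(pred, mos)
    return plcc, srcc

# toy usage with made-up predicted scores and ground-truth MOS values
pred = np.array([62.1, 55.3, 71.8, 40.2, 48.9])
mos = np.array([60.0, 58.0, 75.0, 42.0, 47.5])
print(plcc_srcc(pred, mos))
```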
As can be seen from Table 1, the model of the present invention exhibits competitive performance against state-of-the-art methods across all datasets. Notably, it achieves state-of-the-art performance on the CSIQ, TID2013 and KADID datasets, improving PLCC by 1.0%, 1.7% and 5.0%, and SRCC by 1.9%, 2.2% and 4.2%, respectively, over existing methods. These results highlight the effectiveness and leading performance of the model in image quality assessment.
TABLE 1
Table 1: Performance comparison measured by averages of SRCC and PLCC. Best results are highlighted in bold, second-best are underlined.
Table 2 compares the generalization performance of the model of the present invention with previous models in cross-dataset evaluation. Specifically, a BIQA model is trained on one dataset and evaluated directly on another without fine-tuning or parameter adjustment. Four datasets are used and the median experimental results are reported. The model of the present invention exhibits superior performance on the KonIQ and CSIQ datasets and remains competitive on the other datasets. These experimental results highlight the robust generalization performance of the model.
TABLE 2
Figure 1 compares the present invention with previous methods. It can be seen that CLIP-IQA+ introduces a learnable text prompt on top of CLIP-IQA but does not change the antonym-based classification scheme; in contrast to both, the present invention introduces fine-grained classification by quality-score interval. In addition, the invention also introduces learnable prompts in the image branch.
Fig. 3 shows visualization heatmaps, using Grad-CAM to visualize the feature attention maps of an input image in DEIQT (Qin G, Hu R, Liu Y, et al. Data-Efficient Image Quality Assessment with Attention-Panel Decoder. arXiv preprint arXiv:2304.04952, 2023) and in the model of the present invention. Focusing on low-scoring images, the aim is to reveal why the model of the invention outperforms DEIQT on such images. As shown in Fig. 3, DEIQT is observed to concentrate excessively on the main content of the image. As a result, DEIQT predicts a high quality score even when the main content of the image remains clear while other regions are severely distorted, which clearly does not match reality. In contrast, the model of the present invention takes the distortion of different regions of the image into account and therefore assesses image quality more accurately. This advantage results from the multi-modal prompt design and the two-stage training mode of the present invention. First, the multi-modal prompt design allows the model to combine textual and visual information, enabling a more comprehensive understanding of the intrinsic features of the image; in particular, for low-quality images the model can effectively capture fine distortions that might otherwise be ignored. Second, the two-stage training strategy enables the model to gradually adapt to the BIQA task, moving from coarse perception to detailed evaluation and achieving a full understanding of image quality.
The above-described embodiments are merely preferred embodiments of the present invention and should not be construed as limiting the scope of the present invention. All equivalent changes and modifications within the scope of the present invention are intended to be covered by the present invention.

Claims (4)

1. A novel unsupervised image quality assessment method based on multi-modal prompt learning, characterized by comprising the following steps:
1) grading each picture in the dataset according to its score and assigning it a level or category label;
2) Introducing a learnable text prompt into a text branch of a model to solve the problem of prompt sensitivity of the CLIP model to downstream tasks; meanwhile, in order to enable the model to better adjust the representation space in the image quality assessment task, the information related to quality in the image is more comprehensively understood, and a learnable image prompt is introduced into the image branch;
3) training the model in two stages: in the first training stage, only the text prompt and the image prompt are trained and all other parameters are frozen, so that the transferred CLIP model acquires the ability to perceive and evaluate image quality with very few trainable parameters; in the second training stage, the prompts of the two branches are frozen and only the image encoder is trained, so that the model can gradually adapt from coarse perception to detailed evaluation, achieving a comprehensive understanding and accurate prediction of image quality.
2. The unsupervised image quality assessment method based on multi-modal prompt learning as claimed in claim 1, wherein in step 1), the pictures are graded according to the score of each picture in the dataset and assigned a level or category label; for example, images with scores in the range [49, 50) are classified as level 50, i.e., category 50; the model computes the similarity between the output features of the text encoder and the output features of the image encoder, and obtains the final quality score q(x) by weighted summation:
where P(c|x) represents the class probability after applying softmax and C represents the total number of classes.
3. The novel multi-modal prompt learning-based unsupervised image quality assessment method as claimed in claim 1, wherein in step 2), in the text branch of the model, a learnable text prompt is introduced to solve the problem of prompt sensitivity of the CLIP model to downstream tasks, in particular:
a set of learnable text prompts is introduced to fully exploit the representational capability of the text encoder, and the text input is designed in the form "[X]_1 [X]_2 [X]_3 ... [X]_M [class]", where "[X]_1 [X]_2 [X]_3 ... [X]_M" denotes the learnable prefix of the text input and M denotes the number of learnable tokens; in addition, in the image branch, "deep visual prompts" are introduced, which means adding prompts at each layer of the image encoder to enhance the alignment between the perceptual features of the image and the quality-level text features; by introducing prompts at multiple layers, the learning process of the image features is controlled more finely, so that the model adapts better to the image quality assessment task;
a set of learnable tokens P is inserted between the image's class token and its patch embeddings; at each layer of the image encoder, the input is represented as [Cls_{i-1}, P_{i-1}, E_{i-1}], where Cls denotes the class token and E denotes the patch embeddings; after the i-th Transformer layer L_i, a new set of learnable tokens P_i is introduced and concatenated with the outputs Cls_i and E_i:
[Cls_i, ○, E_i] = L_i([Cls_{i-1}, P_{i-1}, E_{i-1}])   (2)
where ○ denotes the prompt outputs, which are not passed as input to the next Transformer layer; this design introduces learnable tokens at every layer of the image encoder, enriching the model's ability to capture and represent important image features and quality characteristics, and ultimately contributing to more effective and accurate image quality assessment.
4. The novel multi-modal prompt learning-based unsupervised image quality assessment method according to claim 1, wherein in step 3), the training of the model is divided into two stages: in the first training stage, only the text prompt and the image prompt are trained and all other parameters are frozen, so that the transferred CLIP model acquires the ability to perceive and evaluate image quality with very few trainable parameters; the loss function used is:
wherein i represents the i-th picture, T and V represent the output features of the text encoder and the image encoder respectively, sim(·,·) represents the cosine similarity of the two features, B represents a mini-batch, and A represents the set of all pictures in the mini-batch that belong to the same category as picture i;
in the second model training stage, the prompts of the two branches are frozen, and only the image encoder is trained, so that the model can be gradually adapted to detail evaluation from coarse perception, and the comprehensive understanding and accurate prediction of the image quality are realized; the loss function used at this stage is:
a fidelity loss is introduced so that pairwise learning-to-rank is taken into account in the model's estimates; in addition, a smooth L1 loss and a cross-entropy loss with label smoothing are used for optimization; wherein α and β are the coefficients that balance the smooth L1 loss and the label-smoothed cross-entropy loss, respectively;
wherein p and p' represent the true probability distribution and the predicted probability distribution, respectively;
wherein I'_k = (1 - ε)·I_k + ε/C represents the smoothed value of quality level k in the target distribution, and P_k represents the predicted logit of category k.
CN202311131117.9A 2023-09-04 2023-09-04 Novel unsupervised image quality assessment method based on multi-mode prompt learning Pending CN117078656A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311131117.9A CN117078656A (en) 2023-09-04 2023-09-04 Novel unsupervised image quality assessment method based on multi-mode prompt learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311131117.9A CN117078656A (en) 2023-09-04 2023-09-04 Novel unsupervised image quality assessment method based on multi-mode prompt learning

Publications (1)

Publication Number Publication Date
CN117078656A true CN117078656A (en) 2023-11-17

Family

ID=88715252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311131117.9A Pending CN117078656A (en) 2023-09-04 2023-09-04 Novel unsupervised image quality assessment method based on multi-mode prompt learning

Country Status (1)

Country Link
CN (1) CN117078656A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117437211A (en) * 2023-11-20 2024-01-23 电子科技大学 Low-cost image quality evaluation method based on double-bias calibration learning

Similar Documents

Publication Publication Date Title
CN109949317B (en) Semi-supervised image example segmentation method based on gradual confrontation learning
CN109299274B (en) Natural scene text detection method based on full convolution neural network
US7187811B2 (en) Method for image resolution enhancement
JP2006172437A (en) Method for determining position of segment boundary in data stream, method for determining segment boundary by comparing data subset with vicinal data subset, program of instruction executable by computer, and system or device for identifying boundary and non-boundary in data stream
CN110598018B (en) Sketch image retrieval method based on cooperative attention
CN114282047A (en) Small sample action recognition model training method and device, electronic equipment and storage medium
CN111696136A (en) Target tracking method based on coding and decoding structure
Al-Amaren et al. RHN: A residual holistic neural network for edge detection
CN114926826A (en) Scene text detection system
CN112967292B (en) Automatic cutout and scoring method and system for E-commerce products
CN111428730A (en) Weak supervision fine-grained object classification method
CN112348809A (en) No-reference screen content image quality evaluation method based on multitask deep learning
CN117058386A (en) Asphalt road crack detection method based on improved deep Labv3+ network
CN117011515A (en) Interactive image segmentation model based on attention mechanism and segmentation method thereof
CN117078656A (en) Novel unsupervised image quality assessment method based on multi-mode prompt learning
CN114821174B (en) Content perception-based transmission line aerial image data cleaning method
CN115294424A (en) Sample data enhancement method based on generation countermeasure network
CN114596433A (en) Insulator identification method
CN114445662A (en) Robust image classification method and system based on label embedding
Zhao et al. Single image dehazing based on enhanced generative adversarial network
CN113077525A (en) Image classification method based on frequency domain contrast learning
CN117115123A (en) Novel reference-free image quality assessment method based on text-image pair
CN111626409B (en) Data generation method for image quality detection
Montajabi et al. Using ML to Find the Semantic Region of Interest
Xu et al. Blind Quality Assessment of Tone-Mapped Images with Multi-scale Visual Feature Extraction Neural Network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination