CN116310375A - Blind image quality assessment method based on visual attention mechanism - Google Patents

Blind image quality assessment method based on visual attention mechanism

Info

Publication number
CN116310375A
Authority
CN
China
Prior art keywords
image
attention
feature
level features
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310003353.6A
Other languages
Chinese (zh)
Inventor
于天河
孙岩
程士成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202310003353.6A priority Critical patent/CN116310375A/en
Publication of CN116310375A publication Critical patent/CN116310375A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a blind image quality assessment method based on a visual attention mechanism. The method comprises the following steps: after size limitation, the original image is input into feature extraction network A to extract high-level and low-level features; the original image is preprocessed to generate a just-noticeable distortion (JND) map and a saliency map, which are input into feature extraction network B to extract high-level features; the low-level features extracted from the original image are passed through separate dimension-reduction pooling modules to obtain feature vectors; the high-level features extracted from the original image and the JND map are fused with the features obtained by dimension-reduction pooling; and the quality score is obtained from the fused feature vector by a quality regression network. The method incorporates visual attention mechanisms and uses different attention mechanisms in the different feature extraction networks, so the extracted features better match the attention characteristics of the human eye; it also takes into account the influence of the low-level features extracted from the image on image quality, making the image quality evaluation more accurate.

Description

Blind image quality assessment method based on visual attention mechanism
Technical Field
The invention relates to a blind image quality assessment method based on a visual attention mechanism, and belongs to the field of image processing.
Background
With the continuous development of information technology, digital information resources have grown explosively, and people can acquire complex and varied information through electronic devices. Image information records objective things intuitively and is more efficient than text or voice information; it appears in many fields of daily life and is one of the most widely used and most efficient information media. However, noise and other interference are inevitably introduced during image acquisition, transmission, and other processes, degrading image quality. Low-quality images severely affect the human visual experience and hinder the development of computer vision in many areas, so how to accurately evaluate image quality is a fundamental and important problem.
Among objective image quality assessment approaches, blind image quality assessment methods have received considerable attention from researchers because they do not require a reference image. Kang et al. first applied a convolutional neural network (CNN) to image quality assessment; their model was a shallow CNN into which image patches were fed to obtain the quality score. Ma et al. designed a CNN-based multi-task learning model comprising two sub-networks, a quality prediction sub-network and a distortion type identification sub-network, in which part of the feature extraction parameters are shared; the distortion type identification sub-network is pre-trained first, and then the whole network is trained.
Currently, most deep-learning-based image quality assessment methods extract features directly from the distorted image, whereas the human visual system does not attend to every region of the image equally; instead, it screens out and focuses on the regions of interest relevant to the task. These methods ignore the effect of the human visual attention mechanism on image quality assessment, which leads to inaccurate predictions. Moreover, using only the features extracted by a deep network overlooks, to a certain extent, the influence of low-level features such as texture and gradient on image quality.
Disclosure of Invention
The invention aims to provide a blind image quality assessment method based on a visual attention mechanism. The method uses attention mechanisms to make the network focus on the parts of the image that have a larger influence on image quality, and it also takes the low-level features extracted from the image into account, thereby improving the prediction accuracy of quality scores for distorted images.
A blind image quality assessment method based on visual attention mechanisms, comprising:
step 1, after size limitation, inputting an original image into a feature extraction network A to extract high-level features and low-level features;
step 2, preprocessing the original image to generate a just-noticeable distortion (JND) map and a saliency map, and inputting the JND map and the saliency map into a feature extraction network B to extract high-level features;
step 3, passing the low-level features extracted in step 1 through respective dimension-reduction pooling modules to obtain feature vectors;
step 4, performing feature fusion on the high-level features extracted in step 1 and step 2 and the features obtained in step 3;
and step 5, obtaining the quality score from the fused feature vector through a quality regression network.
The invention is also characterized in that:
the size in step 1 is defined as the input picture size being limited to a range of n x n, and when the image is wider or taller than n, scaling the wider or taller than n to n, where n takes 512 pixels.
In step 1, the size-limited original distorted image is taken as input. Feature extraction network A is based on the MobileNetV2 network, with a mixed attention module added to the last inverted residual of each bottleneck structure. The low-level features are the feature maps output by the second and fourth bottleneck structures, and the high-level features are the feature map output by the last bottleneck structure of the network.
The added mixed attention module applies the channel attention first and then the spatial attention. The channel attention operates as follows:
C = Mul(σ(a(K₁(GAP(m)), K₂(GMP(m)))), m)
wherein m is the feature map to be passed through the channel attention module; GAP and GMP are the global average pooling and global max pooling operations on m, respectively; K₁ and K₂ are 1×1 adaptive convolution operations applied to the pooled GAP and GMP features; a adds the features produced by K₁ and K₂ element-wise; σ is the sigmoid operation; and Mul multiplies m channel-wise by the channel weights produced by σ. The output C of the channel attention module is then fed into the spatial attention, which is the spatial attention part of the CBAM attention module and constitutes top-down, task-driven attention.
In step 2, the just-noticeable distortion (JND) map and the saliency map are taken as inputs; the saliency map and the JND map are concatenated into a two-channel image, with the saliency map serving as the spatial attention part for the JND map. The specific process is as follows: the saliency map and the JND map of the image are extracted using a saliency extraction model and a JND model; the saliency map and the JND map are concatenated to obtain a two-channel image; and the concatenated image is input into the designed feature extraction network B, which is feature extraction network A of step 1 with the spatial attention module removed.
The specific process of step 3 is as follows: the low-level features extracted in step 1 are each passed through a dimension-reduction pooling module, which comprises average pooling, 1×1 convolution, and SPP pooling; a 2×2 average pooling with stride 2 halves the width and height of the feature map, a 1×1 convolution reduces the number of channels to 10, and finally SPP pooling converts the feature map into a one-dimensional feature vector.
Step 4 is specifically as follows: feature fusion is performed twice. First, the high-level features extracted in step 1 and step 2 are concatenated along the channel dimension, and the concatenated features are passed through adaptive average pooling to obtain a feature vector; this feature vector is then concatenated with the feature vectors obtained in step 3 to form the features finally input into the quality regression network.
The beneficial effects of the invention are as follows:
the invention provides a blind image quality assessment method based on a visual attention mechanism, which constructs two feature extraction networks, wherein the two paths of networks use the same channel attention, different spatial attention is used for different input images, the extracted features are more in line with the attention characteristics of human eyes, and the accuracy of image quality assessment is improved. The image quality evaluation requires low-level information such as gradient and texture of an image and high-level semantic information, and the low-level features extracted from the original image make up the defect that most of the current image quality evaluation methods for deep learning only use the high-level features, so that the accuracy of image quality evaluation is improved.
Drawings
Fig. 1 is a flow chart of a blind image quality assessment method based on visual attention mechanisms according to the present invention.
Fig. 2 is a diagram of the dimension-reduction pooling module according to the present invention.
Detailed Description
The invention provides a blind image quality assessment method based on a visual attention mechanism. In order to better understand the technical solution in the embodiments of the present invention and to make the above objects, features and advantages of the present invention clearer, the technical solution is described in further detail below with reference to the accompanying drawings:
The invention first provides a blind image quality assessment method based on a visual attention mechanism, as shown in Fig. 1; the specific method is as follows:
Step 1, after size limitation, inputting an original image into feature extraction network A to extract high-level features and low-level features;
Step 1 specifically comprises the following:
the input picture size is limited to the n x n range, and when the image width or height is greater than n, the image width or height greater than n is scaled to n, where n takes 512 pixels. The image with limited size is input into a feature extraction network, the feature extraction network A is based on a Mobilene V2 network, and the Mobilene V2 network is a lightweight convolutional neural network and has the advantages of small parameter quantity, small calculation amount, high accuracy and the like, and can extract rich image features while keeping small calculation cost. And removing the last output layer and the pooling layer of the Mobilene V2, adding a mixed attention module in the last pouring residual error of each bottleneck structure of the Mobilene V2, taking the characteristics output by the second bottleneck structure and the fourth bottleneck structure as low-level characteristics, and taking the characteristic diagram output by the last bottleneck structure as high-level characteristics.
The added mixed attention module applies the channel attention first and then the spatial attention. The channel attention operates as follows:
C = Mul(σ(a(K₁(GAP(m)), K₂(GMP(m)))), m)
wherein m is the feature map to be passed through the channel attention module; GAP and GMP are the global average pooling and global max pooling operations on m, respectively; K₁ and K₂ are 1×1 adaptive convolution operations applied to the pooled GAP and GMP features; a adds the features produced by K₁ and K₂ element-wise; σ is the sigmoid operation; and Mul multiplies m channel-wise by the channel weights produced by σ. The output C of the channel attention module is then fed into the spatial attention, which is the spatial attention part of the CBAM attention module. The mixed attention module essentially assigns weights in a task-driven manner, which is a form of top-down attention.
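By way of illustration, a minimal PyTorch sketch of this channel attention follows. It assumes that K₁ and K₂ are 1×1 convolutions that keep the channel count unchanged; the exact channel dimensions of the adaptive convolutions are not specified above, so they are placeholders here.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention C = Mul(sigma(a(K1(GAP(m)), K2(GMP(m)))), m)."""

    def __init__(self, channels: int):
        super().__init__()
        # K1 / K2: 1x1 convolutions applied to the pooled descriptors (channel count assumed unchanged).
        self.k1 = nn.Conv2d(channels, channels, kernel_size=1)
        self.k2 = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, m: torch.Tensor) -> torch.Tensor:
        gap = torch.mean(m, dim=(2, 3), keepdim=True)        # GAP: global average pooling
        gmp = torch.amax(m, dim=(2, 3), keepdim=True)         # GMP: global max pooling
        weights = torch.sigmoid(self.k1(gap) + self.k2(gmp))  # a(.) is element-wise addition, sigma is sigmoid
        return m * weights                                    # Mul: channel-wise reweighting of m
```

In the full mixed attention module, the output of this block would then pass through the spatial attention part of CBAM, as described above.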
Step 2, preprocessing the original image to generate a just-noticeable distortion (JND) map and a saliency map, and inputting the JND map and the saliency map into feature extraction network B to extract high-level features;
the step 2 is specifically as follows:
The just-noticeable distortion (JND) map and the saliency map are taken as inputs. The JND map is produced by an existing JND model and reflects the sensitivity of the human eye to different distortions and the perceptible distortion threshold. The saliency map serves as the spatial attention part for the JND map; it expresses saliency-based attention, which is typically bottom-up and driven by external stimuli, and is produced by an existing saliency extraction model.
The saliency map and the JND map of the image are extracted using the saliency extraction model and the JND model; the saliency map and the JND map are concatenated to obtain a two-channel image; the concatenated image is input into the designed feature extraction network B, which is feature extraction network A of step 1 with the spatial attention module removed.
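As a sketch only, the two-channel input for network B could be assembled as follows; saliency_model and jnd_model are placeholders for the existing saliency extraction and just-noticeable distortion models referred to above, assumed here to return single-channel maps.

```python
import torch

def build_network_b_input(image: torch.Tensor, saliency_model, jnd_model) -> torch.Tensor:
    """Concatenate the JND map and the saliency map into a two-channel image.

    image: (N, 3, H, W) batch; saliency_model / jnd_model are assumed to return
    maps of shape (N, 1, H, W) for that batch.
    """
    jnd_map = jnd_model(image)            # just-noticeable distortion map
    saliency_map = saliency_model(image)  # saliency map (acts as spatial attention)
    return torch.cat([jnd_map, saliency_map], dim=1)  # (N, 2, H, W), fed to network B
```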
step 3, obtaining feature vectors from the low-level features extracted in the step 1 through respective dimension reduction pooling modules;
the step 3 is specifically as follows:
The feature maps output by the second and fourth bottleneck structures of the network are taken as the low-level features and are each passed through the dimension-reduction pooling module shown in Fig. 2: a 2×2 average pooling with stride 2 halves the width and height of the feature map, a 1×1 convolution reduces the number of channels to 10, and finally SPP pooling converts the feature map into a one-dimensional feature vector.
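A minimal PyTorch sketch of this dimension-reduction pooling module is given below. The SPP pyramid levels (1, 2, and 4) and the use of max pooling inside SPP are assumptions made for the sketch; the text above only fixes the three stages (2×2 average pooling with stride 2, 1×1 convolution to 10 channels, and SPP pooling to a one-dimensional vector).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DimReductionPooling(nn.Module):
    """Average pooling -> 1x1 convolution -> SPP pooling, yielding a 1-D feature vector."""

    def __init__(self, in_channels: int, levels=(1, 2, 4)):
        super().__init__()
        self.avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)    # halve width and height
        self.reduce = nn.Conv2d(in_channels, 10, kernel_size=1)  # reduce channels to 10
        self.levels = levels                                     # assumed SPP pyramid levels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.reduce(self.avg_pool(x))
        # Spatial pyramid pooling: pool to fixed grids and concatenate into one vector.
        pooled = [F.adaptive_max_pool2d(x, level).flatten(1) for level in self.levels]
        return torch.cat(pooled, dim=1)  # shape (N, 10 * (1 + 4 + 16)) = (N, 210)
```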
Step 4, performing feature fusion on the high-level features extracted in step 1 and step 2 and the features obtained in step 3.
The step 4 is specifically as follows:
Feature fusion is performed twice. First, the high-level features extracted in step 1 and step 2 are concatenated along the channel dimension, and the concatenated features are passed through adaptive average pooling to obtain a feature vector; this feature vector is then concatenated with the feature vectors obtained in step 3 to form the features finally input into the quality regression network.
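For illustration, a sketch of this two-stage fusion is shown below; it assumes the two high-level feature maps have the same spatial size (which holds when networks A and B share the same backbone strides), and the tensor names are placeholders.

```python
import torch
import torch.nn.functional as F

def fuse_features(high_a: torch.Tensor, high_b: torch.Tensor,
                  low_vec_2: torch.Tensor, low_vec_4: torch.Tensor) -> torch.Tensor:
    """First fusion: channel-concatenate the high-level maps and pool them to a vector.
    Second fusion: concatenate that vector with the low-level feature vectors."""
    high = torch.cat([high_a, high_b], dim=1)                  # channel splicing of high-level maps
    high_vec = F.adaptive_avg_pool2d(high, 1).flatten(1)       # adaptive average pooling -> vector
    return torch.cat([high_vec, low_vec_2, low_vec_4], dim=1)  # final input to the quality regression network
```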
Step 5, obtaining the quality score from the fused feature vector through a quality regression network;
the step 5 is specifically as follows:
The quality regression network no longer extracts image features; instead, it maps the previously extracted features to the image quality. The quality regression network consists of 4 fully connected layers and uses a sigmoid activation function. The loss function combines the Smooth L1 loss with a ranking (ordering) loss as the final loss, and the network parameters are updated through back-propagation. Compared with the L1 and L2 loss functions, the Smooth L1 loss converges faster, is insensitive to outliers, and is easier to train. The ranking loss improves the model's ability to predict image quality scores whose ordering matches that of the actual quality scores, so combining the Smooth L1 loss with the ranking loss is more beneficial.
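A minimal sketch of such a quality regression head follows; the hidden layer widths and the placement of the sigmoid activations are assumptions, since the text above only fixes the number of fully connected layers (four) and the activation function.

```python
import torch
import torch.nn as nn

class QualityRegression(nn.Module):
    """Four fully connected layers with sigmoid activations, mapping the fused
    feature vector to a single quality score."""

    def __init__(self, in_features: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_features, 512), nn.Sigmoid(),
            nn.Linear(512, 128), nn.Sigmoid(),
            nn.Linear(128, 32), nn.Sigmoid(),
            nn.Linear(32, 1),  # final layer outputs the predicted quality score
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.layers(features).squeeze(-1)
```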
The Smooth L1 loss function formula is as follows:
Smooth L1 = 0.5 × (xᵢ − yᵢ)², if |xᵢ − yᵢ| < 1; |xᵢ − yᵢ| − 0.5, otherwise
the ordering penalty is as follows:
[Pairwise ranking loss L_rank^(i,j) and aggregate ranking loss L_rank: formulas provided as images in the original document.]
the total loss function formula is as follows:
L = α × Smooth L1 + β × L_rank
wherein xᵢ is the true score of the i-th picture, yᵢ is the predicted score of the i-th picture, L_rank^(i,j) is the ranking loss between picture i and picture j, and α and β are the weights of the Smooth L1 loss and the ranking loss, respectively.
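By way of illustration, a sketch of the combined loss is given below. The Smooth L1 term uses the standard PyTorch implementation; the pairwise ranking term is one common hinge-style formulation chosen here as an assumption, because the exact ranking loss formula appears only as an image in the original document, and the default weights α and β are placeholders.

```python
import torch
import torch.nn.functional as F

def total_loss(pred: torch.Tensor, target: torch.Tensor,
               alpha: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    """L = alpha * SmoothL1 + beta * L_rank over a batch of predicted/true scores (shape (N,))."""
    smooth_l1 = F.smooth_l1_loss(pred, target)
    # Pairwise ranking term (assumed form): penalise pairs whose predicted order
    # disagrees with the order of the true scores.
    diff_pred = pred.unsqueeze(0) - pred.unsqueeze(1)      # element [i, j] = pred[j] - pred[i]
    diff_true = target.unsqueeze(0) - target.unsqueeze(1)  # element [i, j] = target[j] - target[i]
    rank = torch.relu(-diff_pred * torch.sign(diff_true)).mean()
    return alpha * smooth_l1 + beta * rank
```

The network parameters would then be updated by back-propagating this combined loss, as described above.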
The invention is a blind image quality assessment method based on a visual attention mechanism in which features are extracted from the original image and from a just-noticeable distortion (JND) map, respectively. The JND map reflects the distortion-perception characteristics of the human visual system. Top-down, task-driven attention is used when extracting features from the original image; because the JND map loses part of the image content while reflecting distortion perception, bottom-up attention driven by external stimuli, namely the saliency map, is used to express its salient regions. The two feature extraction branches use the same channel attention, so the extracted features better match the attention characteristics of the human eye and the accuracy of image quality assessment is improved. The method also combines the high-level and low-level features extracted from the image, further improving the accuracy of image quality assessment.

Claims (6)

1. A blind image quality assessment method based on a visual attention mechanism, characterized in that the method is realized by the following steps:
step 1, after size limitation, inputting an original image into a feature extraction network A to extract high-level features and low-level features;
step 2, preprocessing the original image to generate a just-noticeable distortion map and a saliency map, and inputting the just-noticeable distortion map and the saliency map into a feature extraction network B to extract high-level features;
step 3, passing the low-level features extracted in step 1 through respective dimension-reduction pooling modules to obtain feature vectors;
step 4, performing feature fusion on the high-level features extracted in step 1 and step 2 and the features obtained in step 3;
and step 5, obtaining the quality score from the fused feature vector through a quality regression network.
2. The visual attention mechanism based blind image quality assessment method of claim 1, wherein: the size limitation in step 1 restricts the input picture to the range n×n; when the width or height of the image is greater than n, that dimension is scaled down to n, and n is 512 pixels.
3. The visual attention mechanism based blind image quality assessment method of claim 1, wherein: in step 1, the feature extraction network A is based on a MobileNetV2 network, a mixed attention module is added to the last inverted residual of each bottleneck structure, the low-level features are the feature maps output by the second and fourth bottleneck structures, the high-level features are the feature map output by the last bottleneck structure of the network, and the channel attention in the added mixed attention module operates as follows:
C = Mul(σ(a(K₁(GAP(m)), K₂(GMP(m)))), m)
wherein m is the feature map to be passed through the channel attention module; GAP and GMP are the global average pooling and global max pooling operations on m, respectively; K₁ and K₂ are 1×1 adaptive convolution operations applied to the pooled GAP and GMP features; a adds the features produced by K₁ and K₂ element-wise; σ is the sigmoid operation; Mul multiplies m channel-wise by the channel weights produced by σ; the output C of the channel attention module is input into the spatial attention, and the spatial attention is the spatial attention part of the CBAM attention module.
4. The visual attention mechanism based blind image quality assessment method of claim 1, wherein: in step 2, the saliency map and the just-noticeable distortion map are concatenated into a two-channel image, the saliency map is used as the spatial attention part for the just-noticeable distortion map, and the feature extraction network B is the network of step 1 with the spatial attention module removed.
5. The visual attention mechanism based blind image quality assessment method of claim 1, wherein: the dimension-reduction pooling module in step 3 comprises average pooling, 1×1 convolution, and SPP pooling; the low-level features extracted in step 1 are down-sampled by the average pooling, the number of channels is reduced by the 1×1 convolution, and finally the feature map is converted into a one-dimensional feature vector by the SPP pooling.
6. The visual attention mechanism based blind image quality assessment method of claim 1, wherein: step 4 comprises two feature fusions, in which the high-level features extracted in step 1 and step 2 are first concatenated along the channel dimension, the concatenated features are passed through adaptive average pooling to obtain a feature vector, and this feature vector is then concatenated with the feature vectors obtained in step 3 to form the features finally input into the quality regression network.
CN202310003353.6A 2023-01-03 2023-01-03 Blind image quality assessment method based on visual attention mechanism Pending CN116310375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310003353.6A CN116310375A (en) 2023-01-03 2023-01-03 Blind image quality assessment method based on visual attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310003353.6A CN116310375A (en) 2023-01-03 2023-01-03 Blind image quality assessment method based on visual attention mechanism

Publications (1)

Publication Number Publication Date
CN116310375A true CN116310375A (en) 2023-06-23

Family

ID=86815744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310003353.6A Pending CN116310375A (en) 2023-01-03 2023-01-03 Blind image quality assessment method based on visual attention mechanism

Country Status (1)

Country Link
CN (1) CN116310375A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611516A (en) * 2023-09-04 2024-02-27 北京智芯微电子科技有限公司 Image quality evaluation, face recognition, label generation and determination methods and devices


Similar Documents

Publication Publication Date Title
CN110363716B (en) High-quality reconstruction method for generating confrontation network composite degraded image based on conditions
CN113688723B (en) Infrared image pedestrian target detection method based on improved YOLOv5
US20230186056A1 (en) Grabbing detection method based on rp-resnet
CN112634276A (en) Lightweight semantic segmentation method based on multi-scale visual feature extraction
CN110569851B (en) Real-time semantic segmentation method for gated multi-layer fusion
CN109874053A (en) The short video recommendation method with user's dynamic interest is understood based on video content
CN110751649B (en) Video quality evaluation method and device, electronic equipment and storage medium
CN111476133B (en) Unmanned driving-oriented foreground and background codec network target extraction method
CN111832453B (en) Unmanned scene real-time semantic segmentation method based on two-way deep neural network
CN112487949A (en) Learner behavior identification method based on multi-modal data fusion
CN113554032A (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
CN114120272A (en) Multi-supervision intelligent lane line semantic segmentation method fusing edge detection
CN111079864A (en) Short video classification method and system based on optimized video key frame extraction
CN114913493A (en) Lane line detection method based on deep learning
CN114781499B (en) Method for constructing ViT model-based intensive prediction task adapter
CN116310375A (en) Blind image quality assessment method based on visual attention mechanism
CN115035171A (en) Self-supervision monocular depth estimation method based on self-attention-guidance feature fusion
CN115565043A (en) Method for detecting target by combining multiple characteristic features and target prediction method
CN112418032A (en) Human behavior recognition method and device, electronic equipment and storage medium
CN116310305A (en) Coding and decoding structure semantic segmentation model based on tensor and second-order covariance attention mechanism
US20220301106A1 (en) Training method and apparatus for image processing model, and image processing method and apparatus
CN117314787A (en) Underwater image enhancement method based on self-adaptive multi-scale fusion and attention mechanism
CN112149496A (en) Real-time road scene segmentation method based on convolutional neural network
CN116977200A (en) Processing method and device of video denoising model, computer equipment and storage medium
CN117151987A (en) Image enhancement method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination