CN116563615B - Bad picture classification method based on improved multi-scale attention mechanism - Google Patents

Bad picture classification method based on improved multi-scale attention mechanism

Info

Publication number
CN116563615B
CN116563615B
Authority
CN
China
Prior art keywords
feature
attention
feature map
picture
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310434595.0A
Other languages
Chinese (zh)
Other versions
CN116563615A (en)
Inventor
吴馨
石晓涛
王哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Xunsiya Information Technology Co ltd
Original Assignee
Nanjing Xunsiya Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xunsiya Information Technology Co ltd filed Critical Nanjing Xunsiya Information Technology Co ltd
Priority to CN202310434595.0A priority Critical patent/CN116563615B/en
Publication of CN116563615A publication Critical patent/CN116563615A/en
Application granted granted Critical
Publication of CN116563615B publication Critical patent/CN116563615B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

A bad picture classification method based on an improved multi-scale attention mechanism relates to the technical fields of artificial intelligence, attention and multi-scale feature fusion, and picture classification. The picture to be classified is obtained and its quality is judged; picture enhancement is applied to low-quality pictures (e.g., blurred or noisy ones) to improve their definition, followed by preprocessing operations such as resizing and normalization. The improved multi-scale attention module is embedded in the ResUnit of the ResNet network model. The picture features are processed by the ResUnit module to obtain a calibration feature X'' that fuses the channel weights and the position weights; the calibration feature pays more attention to important positions and to the classification categories the user cares about. The improved ResNet network model outputs the classification category and score of the picture, and the results are post-processed according to the user-set threshold, categories of interest, etc., to obtain the final output.

Description

Bad picture classification method based on improved multi-scale attention mechanism
Technical Field
The invention relates to the technical fields of artificial intelligence, attention and multi-scale feature fusion, and picture classification, and in particular to a bad picture classification method based on an improved multi-scale attention mechanism.
Background
With the wide application and rapid development of the internet, harmful information in cyberspace, especially pornographic and violent pictures, has become an increasingly serious problem. It is therefore ever more important to strengthen network monitoring with bad picture classification techniques. Many bad picture classification methods based on deep learning models have been developed to improve the safety and health of the network environment.
In recent years, researchers have found that the human visual system automatically focuses on important regions when processing images. This way of focusing on important information, namely the attention mechanism, has been widely applied to image classification based on convolutional neural networks, improving model performance and accuracy. However, when extracting attention feature vectors, existing attention-based picture classification methods usually compress the channel-domain and spatial-domain information with global maximum pooling or global average pooling in order to reduce computation, and this compression loses a large amount of fusion information. The attention feature map is then obtained through a single convolution layer, whose limited receptive field cannot handle attention targets at multiple scales. These methods therefore show certain limitations when processing multi-scale or blurred targets.
Disclosure of Invention
Technical purpose: aiming at the defects of the prior art, the invention discloses a bad picture classification method based on an improved multi-scale attention mechanism.
A bad picture classification method based on an improved multi-scale attention mechanism comprises the following steps:
Step S1: a feature map X ∈ R^(C*H*W) is obtained by processing the picture to be classified through the network model; this feature map is the input feature map of the scheme, with dimensions C, H and W;
Step S2: a 3*3 convolution and a 3*3 dilated convolution (dilation rate = 2) are applied to the input feature map respectively, and two feature maps F1, F2 with different receptive fields are obtained through the two transformations:
F1 = f(X) ∈ R^(H*W*C), F2 = f'(X) ∈ R^(H*W*C)
Step S3: global average pooling is applied to the two feature maps respectively to obtain two 1*1*C feature vectors corresponding to the two different receptive fields, which are then fused into a feature vector s:
s = AvgPool(F1) + AvgPool(F2)
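The pooling-and-fusion step above can be sketched in a few lines of NumPy. Note that F1 and F2 here are random stand-ins for the outputs of the two convolutions, and the H*W*C layout follows the description; this is an illustrative sketch, not the patented implementation.

```python
import numpy as np

# F1, F2 stand in for the outputs of the 3*3 convolution and the
# 3*3 dilated convolution (dilation rate 2); layout is H*W*C.
H, W, C = 8, 8, 16
rng = np.random.default_rng(0)
F1 = rng.standard_normal((H, W, C))
F2 = rng.standard_normal((H, W, C))

def global_avg_pool(F):
    """Collapse the spatial dimensions, leaving one value per channel (1*1*C)."""
    return F.mean(axis=(0, 1))

# Step S3: pool each branch, then fuse the two pooled vectors by addition.
s = global_avg_pool(F1) + global_avg_pool(F2)
print(s.shape)  # (16,)
```

Because each branch sees a different receptive field, the fused vector s carries multi-scale channel statistics rather than a single-scale summary.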
Step S4: a hyperparameter K is introduced, and a (K+2)*1 dilated convolution and a K*1 convolution are applied to the fused feature vector to obtain feature 1 and feature 2 respectively; feature 1 and feature 2 are then fused to obtain the channel-domain attention feature map, i.e., the channel weight Ac:
Ac = σ(C_(K+2)*1(s) + C_K*1(s))
where r and b are hyperparameters with r = 2 and b = 1, C is the number of channels of the fused feature vector, and σ is the sigmoid function;
Step S5: the channel weight Ac obtained in step S4 is applied to the original input feature map to obtain the calibration feature map X' fusing the channel attention:
X' = F_scale(X, Ac) = X * Ac, X' ∈ R^(H*W*C)
that is, the H*W values of each channel of the original input feature map X are multiplied by the weight of the corresponding channel;
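Steps S4 and S5 can be sketched as follows. The 1-D kernels w1 and w2 are random illustrative weights rather than learned parameters, and conv1d_same is a minimal hand-rolled convolution, so this shows the data flow under those assumptions, not the patented module itself.

```python
import numpy as np

def conv1d_same(x, kernel, dilation=1):
    """Minimal 1-D convolution over a channel vector with 'same' padding."""
    k = len(kernel)
    span = dilation * (k - 1)
    pad = span // 2
    xp = np.pad(x, (pad, span - pad))
    return np.array([
        sum(kernel[j] * xp[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

C = 16
K = 3  # hyperparameter K from step S4 (illustrative choice)
rng = np.random.default_rng(1)
s = rng.standard_normal(C)           # fused 1*1*C vector from step S3
w1 = rng.standard_normal(K + 2) / K  # (K+2)*1 dilated-conv weights (illustrative)
w2 = rng.standard_normal(K) / K      # K*1 conv weights (illustrative)

# Step S4: fuse the two branch outputs and squash to (0, 1) channel weights.
Ac = sigmoid(conv1d_same(s, w1, dilation=2) + conv1d_same(s, w2))

# Step S5: rescale the input feature map channel-wise, X' = X * Ac.
X = rng.standard_normal((8, 8, C))
X_prime = X * Ac  # broadcasts the per-channel weight over H and W
```

With K = 3, the two branches cover channel neighborhoods of different sizes, which is the multi-scale idea behind step S4.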
Step S6: the calibration feature map X' fusing the channel attention is taken as the input feature map for obtaining the spatial attention feature map;
Step S7: global maximum pooling and global average pooling are applied to the calibration feature map X' along the channel domain to obtain S1 and S2 respectively, embedding the channel-domain information into the spatial domain; S1 and S2 are then fused to obtain a feature map S:
S = concat(AvgPool(X'), MaxPool(X'))
where S ∈ R^(H*W*2);
Step S8: two 3*3 dilated convolutions (dilation rate = 2) are applied to the fused feature map S to obtain two features s1 and s2 with different receptive fields; s1 and s2 are then concatenated to obtain a feature map F fusing importance at different scales:
F = concat(s1, C_3*3(s1)) ∈ R^(1*H*W), where s2 = C_3*3(s1)
Step S9: a spatial attention feature map, i.e., the spatial position weight As, is obtained from the feature map F (which fuses importance at different scales) by a sigmoid function; the value of each pixel in As represents the importance of that pixel in the feature map;
Step S10: the spatial position weight As obtained in step S9 is applied to the calibration feature map X' fusing the channel attention:
X'' = F_scale(X', As) = X' * As, X'' ∈ R^(H*W*C)
that is, the values on all channels at each position of the calibration feature map X' are multiplied by the weight of the corresponding spatial position.
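A minimal NumPy sketch of steps S7 through S10. A fixed averaging kernel stands in for the two learned 3*3 dilated convolutions of step S8, so this only illustrates the data flow, not the learned behavior:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H, W, C = 8, 8, 16
rng = np.random.default_rng(2)
X_prime = rng.standard_normal((H, W, C))  # channel-calibrated map X' from step S5

# Step S7: pool along the channel axis and stack, giving S in R^(H*W*2).
S = np.stack([X_prime.mean(axis=2), X_prime.max(axis=2)], axis=2)

# Steps S8-S9: the patent applies two 3*3 dilated convolutions and concatenation
# here; a plain channel average stands in for that learned fusion, yielding one
# importance value per pixel after the sigmoid.
F = S.mean(axis=2)  # placeholder for the fused multi-scale map, shape (H, W)
As = sigmoid(F)     # spatial position weights, one per pixel, in (0, 1)

# Step S10: rescale every channel at each position, X'' = X' * As.
X_double_prime = X_prime * As[:, :, None]
```

The broadcast in the last line multiplies all C channel values at each spatial position by that position's weight, exactly mirroring the per-channel broadcast of step S5 transposed into the spatial domain.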
Compared with the prior art, the invention has the following advantages:
1. Features at different scales are extracted from the input feature map by a 3*3 convolution and a 3*3 dilated convolution and then fused, which is more reasonable than directly applying global average pooling. Global average pooling compresses the feature map of each channel into a single value, and losing so much position information reduces detection accuracy. Extracting and fusing features with the 3*3 convolution and the 3*3 dilated convolution yields a multi-scale feature representation and retains more position information.
2. A (K+2)*1 dilated convolution and a K*1 convolution are applied to the fused feature vector to obtain feature 1 and feature 2, which are then fused to obtain the channel-domain attention feature map. Enlarging the receptive field widens the scope of feature extraction so that more channel information is fused, features of different scales and directions are better captured, and the richness and discriminability of the feature representation are improved, making the obtained channel weights more reasonable and accurate.
3. When acquiring the spatial weights, position information from different receptive fields is fused through two 3*3 dilated convolutions and a skip connection, capturing wider picture features and richer position information; this makes the obtained position weights more reasonable and accurate and improves classification accuracy.
Drawings
FIG. 1 is a flow chart of acquiring the channel-domain attention feature map according to the present invention;
FIG. 2 is a flow chart of acquiring the spatial-domain attention feature map according to the present invention;
FIG. 3 is a flow chart of acquiring the attention-fused feature map according to the present invention;
FIG. 4 is a flow chart of bad picture classification based on the improved multi-scale attention mechanism of the present invention.
Detailed Description
The technical scheme of the invention is described in detail below with reference to the accompanying drawings:
As shown in FIG. 3, a bad picture classification method based on an improved multi-scale attention mechanism is characterized by comprising the following steps:
Step S1: as shown in FIG. 1, a feature map X ∈ R^(C*H*W) is obtained by processing the picture to be classified through the network model; this feature map is the input feature map of the scheme, with dimensions C, H and W;
Step S2: as shown in FIG. 1, a 3*3 convolution and a 3*3 dilated convolution (dilation rate = 2) are applied to the input feature map of step S1 respectively, and two feature maps F1, F2 with different receptive fields are obtained through the two transformations:
F1 = f(X) ∈ R^(H*W*C), F2 = f'(X) ∈ R^(H*W*C)
Step S3: as shown in FIG. 1, global average pooling is applied to the two feature maps of step S2 respectively to obtain two 1*1*C feature vectors corresponding to the two different receptive fields, which are then fused into a feature vector s:
s = AvgPool(F1) + AvgPool(F2)
Step S4: as shown in FIG. 1, a hyperparameter K is introduced, and a (K+2)*1 dilated convolution and a K*1 convolution are applied to the fused feature vector of step S3 to obtain feature 1 and feature 2 respectively; feature 1 and feature 2 are then fused to obtain the channel-domain attention feature map, i.e., the channel weight Ac:
Ac = σ(C_(K+2)*1(s) + C_K*1(s))
where r and b are hyperparameters with r = 2 and b = 1, C is the number of channels of the fused feature vector, and σ is the sigmoid function;
Step S5: as shown in FIG. 2, the channel weight Ac obtained in step S4 is applied to the original input feature map to obtain the calibration feature map X' fusing the channel attention:
X' = F_scale(X, Ac) = X * Ac, X' ∈ R^(H*W*C)
that is, the H*W values of each channel of the original input feature map X are multiplied by the weight of the corresponding channel;
Step S6: the calibration feature map X' fusing the channel attention is taken as the input feature map for obtaining the spatial attention feature map;
Step S7: global maximum pooling and global average pooling are applied to the calibration feature map X' along the channel domain to obtain S1 and S2 respectively, embedding the channel-domain information into the spatial domain; S1 and S2 are then fused to obtain a feature map S:
S = concat(AvgPool(X'), MaxPool(X'))
where S ∈ R^(H*W*2);
Step S8: two 3*3 dilated convolutions (dilation rate = 2) are applied to the fused feature map S to obtain two features s1 and s2 with different receptive fields; s1 and s2 are then concatenated to obtain a feature map F fusing importance at different scales:
F = concat(s1, C_3*3(s1)) ∈ R^(1*H*W), where s2 = C_3*3(s1)
Step S9: a spatial attention feature map, i.e., the spatial position weight As, is obtained from the feature map F by a sigmoid function; the value of each pixel in As represents the importance of that pixel in the feature map;
Step S10: the spatial position weight As obtained in step S9 is applied to the calibration feature map X' fusing the channel attention:
X'' = F_scale(X', As) = X' * As, X'' ∈ R^(H*W*C)
that is, the values on all channels at each position of the calibration feature map X' are multiplied by the weight of the corresponding spatial position.
As shown in FIG. 4, the present invention classifies bad pictures as follows:
1. Obtain the picture to be classified and judge its quality; apply picture enhancement to low-quality pictures (e.g., blurred or noisy ones) to improve their definition, and then perform preprocessing operations. Picture enhancement operations include, but are not limited to, super-resolution reconstruction and image deblurring; preprocessing operations include, but are not limited to, resizing (Resize) and normalization.
2. Embed the improved multi-scale attention module into the ResUnit of the ResNet network model, and feed the preprocessed picture into the improved ResNet network model.
3. The picture features are processed by the ResUnit module to obtain the calibration feature X'' that fuses the channel weights and the position weights; the calibration feature pays more attention to important positions and to the classification categories the user cares about:
X' = F_scale(X, Ac) = X * Ac, X' ∈ R^(H*W*C)
X'' = F_scale(X', As) = X' * As, X'' ∈ R^(H*W*C)
4. The improved ResNet network model outputs the classification category and score of the picture. With ResNet as the backbone, a custom classification model can be trained: the ResNet model is improved by introducing the fused weights into each ResUnit, and the classification model for predicting pictures is trained on the improved ResNet model.
5. Post-process the results according to the user-set threshold, categories of interest, etc., to obtain the final output.
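The post-processing of step 5 can be sketched as a simple filter. The category names and threshold below are illustrative assumptions, not values given in the patent:

```python
# The model outputs (category, score) pairs; filter_results keeps only the
# categories the user cares about whose score clears the user-set threshold.
# Category names and the default threshold are hypothetical examples.
def filter_results(predictions, threshold=0.8, watch_categories=("porn", "violence")):
    """Post-process raw classification results into the final output."""
    return [
        (category, score)
        for category, score in predictions
        if category in watch_categories and score >= threshold
    ]

raw = [("porn", 0.93), ("normal", 0.05), ("violence", 0.62)]
print(filter_results(raw))  # [('porn', 0.93)]
```

Lowering the threshold or widening the watch list changes what survives post-processing, which is exactly the user-controlled behavior step 5 describes.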

Claims (1)

1. The bad picture classification method based on the improved multi-scale attention mechanism is characterized by comprising the following steps:
(1) Obtain the picture to be classified and judge its quality; apply picture enhancement to low-quality pictures to improve their definition, and then perform resizing and normalization preprocessing operations;
(2) Embed the improved multi-scale attention module into a ResUnit of a ResNet network model, and feed the preprocessed picture into the improved ResNet network model;
(3) The picture features are processed by the ResUnit module to obtain the calibration feature X'' that fuses the channel weights and the position weights; the calibration feature pays more attention to important positions and to the classification categories the user cares about;
(4) The improved ResNet network model outputs the classification category and score of the picture; with ResNet as the backbone, the classification model is trained: the ResNet model is improved by introducing the fused weights into each ResUnit, and the classification model for predicting pictures is trained on the improved ResNet model;
(5) Post-process the results according to the user-set threshold and the categories of interest to obtain the final output;
Specifically:
Step S1: a feature map X ∈ R^(C*H*W) is obtained by processing the picture to be classified through the network model; this feature map is the input feature map, with dimensions C, H and W;
Step S2: a 3*3 convolution and a 3*3 dilated convolution (dilation rate = 2) are applied to the input feature map respectively, and two feature maps F1, F2 with different receptive fields are obtained through the two transformations:
F1 = f(X) ∈ R^(H*W*C), F2 = f'(X) ∈ R^(H*W*C)
Step S3: global average pooling is applied to the two feature maps respectively to obtain two 1*1*C feature vectors corresponding to the two different receptive fields, which are then fused to obtain a feature vector s:
s = AvgPool(F1) + AvgPool(F2)
Step S4: a hyperparameter K is introduced, and a (K+2)*1 dilated convolution and a K*1 convolution are applied to the fused feature vector to obtain feature 1 and feature 2 respectively; feature 1 and feature 2 are then fused to obtain the channel-domain attention feature map, i.e., the channel weight Ac:
Ac = σ(C_(K+2)*1(s) + C_K*1(s))
where r and b are hyperparameters with r = 2 and b = 1, C is the number of channels of the fused feature vector, and σ is the sigmoid function;
Step S5: the channel weight Ac obtained in step S4 is applied to the original input feature map to obtain the calibration feature map X' fusing the channel attention:
X' = F_scale(X, Ac) = X * Ac, X' ∈ R^(H*W*C)
that is, the H*W values of each channel of the original input feature map X are multiplied by the weight of the corresponding channel;
Step S6: the calibration feature map X' fusing the channel attention is taken as the input feature map for obtaining the spatial attention feature map;
Step S7: global maximum pooling and global average pooling are applied to the calibration feature map X' along the channel domain to obtain S1 and S2 respectively, embedding the channel-domain information into the spatial domain; S1 and S2 are then fused to obtain a feature map S:
S = concat(AvgPool(X'), MaxPool(X'))
where S ∈ R^(H*W*2);
Step S8: two 3*3 dilated convolutions (dilation rate = 2) are applied to the fused feature map S to obtain two features s1 and s2 with different receptive fields; s1 and s2 are then concatenated to obtain a feature map F fusing importance at different scales:
F = concat(s1, C_3*3(s1)) ∈ R^(1*H*W), where s2 = C_3*3(s1)
Step S9: a spatial attention feature map, i.e., the spatial position weight As, is obtained from the feature map F by a sigmoid function; the value of each pixel in As represents the importance of that pixel in the feature map;
Step S10: the spatial position weight As obtained in step S9 is applied to the calibration feature map X' fusing the channel attention:
X'' = F_scale(X', As) = X' * As, X'' ∈ R^(H*W*C)
that is, the values on all channels at each position of the calibration feature map X' are multiplied by the weight of the corresponding spatial position.
CN202310434595.0A 2023-04-21 2023-04-21 Bad picture classification method based on improved multi-scale attention mechanism Active CN116563615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310434595.0A CN116563615B (en) 2023-04-21 2023-04-21 Bad picture classification method based on improved multi-scale attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310434595.0A CN116563615B (en) 2023-04-21 2023-04-21 Bad picture classification method based on improved multi-scale attention mechanism

Publications (2)

Publication Number Publication Date
CN116563615A (en) 2023-08-08
CN116563615B (en) 2023-11-07

Family

ID=87490813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310434595.0A Active CN116563615B (en) 2023-04-21 2023-04-21 Bad picture classification method based on improved multi-scale attention mechanism

Country Status (1)

Country Link
CN (1) CN116563615B (en)

Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2455088A1 (en) * 2003-02-28 2004-08-28 Eastman Kodak Company Method and system for enhancing portrait images that are processed in a batch mode
WO2019037654A1 (en) * 2017-08-23 2019-02-28 京东方科技集团股份有限公司 3d image detection method and apparatus, electronic device, and computer readable medium
CN110414377A (en) * 2019-07-09 2019-11-05 武汉科技大学 A kind of remote sensing images scene classification method based on scale attention network
CN111199233A (en) * 2019-12-30 2020-05-26 四川大学 Improved deep learning pornographic image identification method
WO2020108366A1 (en) * 2018-11-27 2020-06-04 腾讯科技(深圳)有限公司 Image segmentation method and apparatus, computer device, and storage medium
WO2020253663A1 (en) * 2019-06-20 2020-12-24 腾讯科技(深圳)有限公司 Artificial intelligence-based image region recognition method and apparatus, and model training method and apparatus
JP6830707B1 (en) * 2020-01-23 2021-02-17 同▲済▼大学 Person re-identification method that combines random batch mask and multi-scale expression learning
WO2021115159A1 (en) * 2019-12-09 2021-06-17 中兴通讯股份有限公司 Character recognition network model training method, character recognition method, apparatuses, terminal, and computer storage medium therefor
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
CN113610144A (en) * 2021-08-02 2021-11-05 合肥市正茂科技有限公司 Vehicle classification method based on multi-branch local attention network
WO2021249255A1 (en) * 2020-06-12 2021-12-16 青岛理工大学 Grabbing detection method based on rp-resnet
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
CN114067107A (en) * 2022-01-13 2022-02-18 中国海洋大学 Multi-scale fine-grained image recognition method and system based on multi-grained attention
CN114091551A (en) * 2021-10-22 2022-02-25 北京奇艺世纪科技有限公司 Pornographic image identification method and device, electronic equipment and storage medium
CN114202502A (en) * 2021-08-30 2022-03-18 浙大宁波理工学院 Thread turning classification method based on convolutional neural network
WO2022127227A1 (en) * 2020-12-15 2022-06-23 西安交通大学 Multi-view semi-supervised lymph node classification method and system, and device
CN114708511A (en) * 2022-06-01 2022-07-05 成都信息工程大学 Remote sensing image target detection method based on multi-scale feature fusion and feature enhancement
WO2022160771A1 (en) * 2021-01-26 2022-08-04 武汉大学 Method for classifying hyperspectral images on basis of adaptive multi-scale feature extraction model
CN115331109A (en) * 2022-08-27 2022-11-11 南京理工大学 Remote sensing image target detection method based on rotation equal-variation convolution channel attention enhancement and multi-scale feature fusion
CN115761258A (en) * 2022-11-10 2023-03-07 山西大学 Image direction prediction method based on multi-scale fusion and attention mechanism

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295678B (en) * 2016-07-27 2020-03-06 北京旷视科技有限公司 Neural network training and constructing method and device and target detection method and device
CN108875752B (en) * 2018-03-21 2022-06-07 北京迈格威科技有限公司 Image processing method and apparatus, computer readable storage medium
US20220415027A1 (en) * 2021-06-29 2022-12-29 Shandong Jianzhu University Method for re-recognizing object image based on multi-feature information capture and correlation analysis


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-Scale Feature Fusion: Learning Better Semantic Segmentation for Road Pothole Detection; Jiahe Fan et al.; arXiv; full text *
Retinal OCT image classification based on multi-scale residual networks; Li Bing et al.; Proceedings of the 2022 China Automation Congress; full text *

Also Published As

Publication number Publication date
CN116563615A (en) 2023-08-08

Similar Documents

Publication Publication Date Title
CN111080629B (en) Method for detecting image splicing tampering
CN110135366B (en) Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN111612008B (en) Image segmentation method based on convolution network
CN110648334A (en) Multi-feature cyclic convolution saliency target detection method based on attention mechanism
CN110879982B (en) Crowd counting system and method
CN112541864A (en) Image restoration method based on multi-scale generation type confrontation network model
CN112052877B (en) Picture fine granularity classification method based on cascade enhancement network
CN114782298B (en) Infrared and visible light image fusion method with regional attention
CN111161224A (en) Casting internal defect grading evaluation system and method based on deep learning
CN110020658B (en) Salient object detection method based on multitask deep learning
CN113222124B (en) SAUNet + + network for image semantic segmentation and image semantic segmentation method
CN114048822A (en) Attention mechanism feature fusion segmentation method for image
CN111008570B (en) Video understanding method based on compression-excitation pseudo-three-dimensional network
CN113807356B (en) End-to-end low-visibility image semantic segmentation method
Babu et al. An efficient image dehazing using GoogLeNet based convolutional neural networks
CN116452469B (en) Image defogging processing method and device based on deep learning
CN111401209B (en) Action recognition method based on deep learning
CN111696090A (en) Method for evaluating quality of face image in unconstrained environment
CN116630245A (en) Polyp segmentation method based on saliency map guidance and uncertainty semantic enhancement
CN116563615B (en) Bad picture classification method based on improved multi-scale attention mechanism
CN109165551B (en) Expression recognition method for adaptively weighting and fusing significance structure tensor and LBP characteristics
CN113450313B (en) Image significance visualization method based on regional contrast learning
CN115019367A (en) Genetic disease face recognition device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant