CN116580289A - Fine granularity image recognition method based on attention - Google Patents

Fine granularity image recognition method based on attention

Info

Publication number
CN116580289A
CN116580289A (Application CN202310678774.9A)
Authority
CN
China
Prior art keywords: attention, scale, module, feature, image recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310678774.9A
Other languages
Chinese (zh)
Inventor
李兰英
林成承
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202310678774.9A priority Critical patent/CN116580289A/en
Publication of CN116580289A publication Critical patent/CN116580289A/en
Pending legal-status Critical Current

Classifications

    • G06V 20/00: Scenes; scene-specific elements
    • G06N 3/0464: Computing arrangements based on biological models; neural networks; convolutional networks [CNN, ConvNet]
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/40: Extraction of image or video features
    • G06V 10/764: Recognition using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion, i.e. combining data from various sources, of extracted features
    • G06V 10/82: Recognition using pattern recognition or machine learning; using neural networks
    • Y02T 10/40: Engine management systems (climate-change mitigation in road transport)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

A fine-grained image recognition method based on attention belongs to the technical field of image classification. A network model is built from a spatial depth module, a multi-scale feature extraction module, a context attention perception module and a multi-head attention module. The spatial depth module strengthens the feature extraction capability of the model and reduces the loss of discriminative regions caused by downsampling; the multi-scale feature extraction module extracts multi-scale features around the salient regions to improve the recognition accuracy of the model; the context attention perception module learns the local relations among the scale features; the multi-head attention module learns the global, long-range links across the multi-scale features. Finally, a cross-entropy loss function and a center loss function are jointly adopted as the loss function of the network, expanding the inter-class distance between samples while reducing the intra-class distance, so as to lessen the influence of confusable regions on recognition accuracy. The method addresses the loss of low-level information caused by deepening network layers and the low recognition accuracy caused by neglecting the relations among multi-scale features in fine-grained image recognition.

Description

Fine granularity image recognition method based on attention
Technical Field
The invention belongs to the technical field of fine-grained image processing, and particularly relates to an attention-based fine-grained image recognition method.
Background
As an important research direction in the field of computer vision, image recognition is the most basic task and the basis for various other visual tasks. Fine-grained image recognition, an important branch of image recognition, differs from conventional image recognition: it divides a single meta-category into its many subcategories, e.g., distinguishing individual breeds within the category of cats. Fine-grained image recognition can be divided into strongly supervised and weakly supervised approaches; the former uses annotation points and bounding boxes to assist learning during model training, while the latter learns from image-level labels only. Weakly supervised fine-grained recognition mainly comprises three families of methods: region-localization sub-networks, high-order feature encoding, and recognition assisted by additional information.
Current fine-grained image recognition methods are mainly based on region-localization sub-networks, which locate regions with discriminative features through an attention mechanism and learn features from those regions. Although this approach achieves good results, it has the following disadvantages: existing methods ignore the role of low-level information, so that low-level information in small discriminative regions is lost as the number of network layers grows; furthermore, these methods find key regions only through spatial and channel attention, ignoring the links between the regions.
Disclosure of Invention
Aiming at the defects existing in the prior art, the invention provides a fine granularity image recognition method based on attention, which comprises the following steps:
s1, constructing a fine-grained image recognition network model, which specifically comprises a feature extraction network, a spatial depth convolution module, a multi-scale feature extraction module, a context attention perception module, a multi-head self-attention module and a classifier;
s2, optimizing an initial network by using the pre-training parameters;
s3, dividing a data set and preprocessing a sample image;
s4, inputting the sample image into a feature extraction network to obtain a feature map and an attention thermodynamic diagram;
s5, inputting the extracted feature map and thermodynamic diagram together into the multi-scale feature extraction module to obtain a multi-scale feature map;
s6, inputting the multi-scale feature map into the context attention perception module, so that the model learns multi-scale context information of the salient regions;
s7, inputting the multi-scale context information into the multi-head self-attention module, so that the model learns the long-term dependency relationships among the scale features;
and S8, training the network model according to the loss function, and repeating the steps S4 to S7 until the loss function converges.
And finally, inputting the fine-grained images to be identified into a trained model for classification identification.
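For orientation, the following is a minimal PyTorch-style sketch of how steps S4 to S8 compose into a training loop. All module attributes (backbone, multi_scale, context_attention, mhsa, classifier) and the combined loss function loss_fn are hypothetical names introduced for illustration; they do not come from the patent.

```python
# Hypothetical sketch of the S4-S8 training loop; names are assumptions.
import torch

def train(model, loader, optimizer, loss_fn, epochs=50):
    model.train()
    for _ in range(epochs):
        for images, labels in loader:                     # S3: preprocessed samples
            feats, heatmap = model.backbone(images)       # S4: feature map + heat map
            ms_feats = model.multi_scale(feats, heatmap)  # S5: multi-scale features
            ctx = model.context_attention(ms_feats)       # S6: local context relations
            glob = model.mhsa(ctx)                        # S7: global, long-term links
            logits = model.classifier(torch.cat([ctx.mean(1), glob.mean(1)], dim=-1))
            loss = loss_fn(logits, labels)                # S8: CE + center loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```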
The feature extraction network adopts a ConvNeXt convolutional neural network as the backbone network.
Further, in the backbone network:
A spatial depth convolution module is added in each Stage to replace the original downsampling part, so as to enhance the model's ability to identify subtle discriminative key regions. A feature map X of size S×S×C1 is first sliced into sub-maps:
f_{x,y} = X[x:S:s, y:S:s],  x, y = 0, 1, …, s−1
where f_{x,y} is a sub-feature map and s is the scale factor. The sub-feature maps are concatenated along the channel dimension, converting X into a new intermediate feature map X' of size (S/s)×(S/s)×(s²·C1).
A non-strided convolution is then used for feature transformation: a convolution layer with C2 output channels, where C2 < s²·C1, is appended after X', converting X' into X'' of size (S/s)×(S/s)×C2, so as to retain as much discriminative information of the subtle regions as possible.
Further, for a given feature map X ∈ R^(C×H×W), where C, H and W denote the number of channels, the height and the width respectively, the multi-scale feature extraction module captures regions of different scales on X through rectangular regions of different sizes. For a response region r(i, j, Δx, Δy), i and j give the centre position of the region and Δx, Δy its width and height. By varying the width and height, a set of regions r = r(i, j, mΔx, nΔy) is obtained, where m, n = 1, 2, 3, …, subject to i < i + mΔx ≤ W and j < j + nΔy ≤ H; rich context information on the subtle changes of the response region is captured step by step, yielding a region set R = {r}.
Further, for the regions r = r(i, j, mΔx, nΔy) of different sizes, feature vectors of fixed size are generated by bilinear pooling and bilinear interpolation to represent the regions. The transformed image X̃ at a target coordinate y is given by
X̃(y) = Σ_{y'} k(y', L_ψ(y)) · R(y')
where R(L_ψ(y)) denotes the feature vector at region coordinate y obtained from the original image; L_ψ(y) denotes a transformation of the coordinate y, with ψ a learnable parameter; and k is a kernel function satisfying k(y', L_ψ(y)) = 0 whenever y' and L_ψ(y) are not directly adjacent.
Further, the context attention perception module is used to capture the relations among the multi-scale features, enabling the model to selectively attend to more relevant regions and generate overall context information. The relation among the multi-scale features is
v_r = Σ_{r'} α_{r,r'} · f_{r'}
where v_r is the context attention feature vector, f_{r'} denotes the feature maps of the other scales related to the current scale, and α_{r,r'} denotes the correlation between the current scale feature and the other neighbouring scale features:
α_{r,r'} = softmax(M_α · tanh(q_r + k_{r'}) + b_α)
where M_α is the weight matrix of the nonlinear combination and b_α, b_β denote biases; q_r denotes the query vector and k_{r'} the key vector, computed as
q_r = M_β · f_r + b_β,  k_{r'} = M_β' · f_{r'} + b_β'
where M_β and M_β' denote weight matrices and f_r denotes the feature map of the current scale;
further, for context vector v= { V r Global average pooling of r=1..|r| } and the resulting contextual features f are pooled r As the input of the multi-head self-attention module, the spatial arrangement information of the learning region and the long-term dependency relationship are studied, and the calculation formula of the multi-head self-attention is as follows:
A=Concat(A 1 ,A 2 ,...,A |R| )W 0
q, K, V is query vector, key vector and value vector, W 0 Is a weight matrix.
Further, the network model is trained using a combination of a cross-entropy loss function and a center loss function:
L = L_CE + λ·L_cent,  with  L_CE = −Σ_{i=1}^{N} y_i · log(p_i)  and  L_cent = (1/2) · Σ_{i=1}^{W} ||x_i − c_{y_i}||²₂
where λ is a weight coefficient measuring the influence of the center loss on the total loss; N is the number of categories, y_i is the ground-truth label and p_i is the label predicted by the model; W is the number of samples, x_i is a training sample, c_{y_i} denotes the center vector, and ||·||₂ denotes the Euclidean distance.
The network model is optimized and trained according to the total loss L, yielding the optimally trained network model.
The attention-based fine-grained image recognition method provided by the invention has the following advantages:
(1) The designed spatial depth convolution module lets the model retain low-level information that would otherwise be lost as the convolutional network deepens, enriching the diversity of the features the model learns and improving recognition accuracy.
(2) The method considers not only the key region: through the designed multi-scale feature module it also obtains the multi-scale features adjacent to the key region, strengthening the robustness and recognition capability of the model.
(3) The method obtains the local and global relations among the scale features through the designed context attention and multi-head attention, and fuses them into a rich feature representation, further improving the recognition performance of the model.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the invention, and that other drawings can be obtained according to these drawings without inventive faculty for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the modified ConvNeXt network structure of the present invention;
FIG. 3 is a system configuration diagram of the present invention.
Detailed Description
The objects, technical solutions and advantages of the present invention will become more apparent by the following detailed description of the present invention with reference to the accompanying drawings. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
As shown in fig. 1, the present invention provides an attention-based fine-grained image recognition method, which comprises the following steps:
step 1, inputting an image to be classified into a feature extraction network to obtain a feature map:
as shown in fig. 2, the feature extraction network is formed by using a ConvNeXt convolution network as a basic network and adding a spatial depth convolution module thereon, wherein the network is mainly divided into four stages, namely four stages, each Stage comprises a downsampling layer and a plurality of convolution layers except for the first Stage, and the downsampling layer in the Stage is replaced by the spatial depth convolution module to enhance the identification capability of the model on a micro-discrimination key region. For a size of sxsxc 1 The feature map X of (2) is divided into sub-maps, and the formula is as follows:
f s-1,s-1 =X[s-1:S:s,s-1:S:s]
where f is the sub-feature map and s is the scale factor. Connecting sub-feature maps in the channel dimension to convert feature map X into a new intermediate feature map
Then adopting non-stride convolution to make feature conversion, adding a C after feature mapping X 2 Convolutional layer, where C 2 <s 2 C 1 Will beConversion to->The space size of the feature map is reduced to half of the original space size after the input image passes through one Stage, and the channel data is doubled, so that the discrimination information of the micro area is kept as much as possible. Here, the feature map after Stage 4 is acquired, and an attention thermodynamic diagram is obtained through CAM (Class Activation Mapping).
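As a concrete illustration, here is a minimal PyTorch sketch of the slicing-and-concatenation step followed by a non-strided convolution. The 1×1 kernel is an assumption (the text only requires stride 1); with scale s = 2 and C2 = 2·C1 it reproduces the Stage behaviour described above (spatial size halved, channels doubled).

```python
import torch
import torch.nn as nn

class SpaceToDepthConv(nn.Module):
    """Sketch of the spatial depth convolution module (kernel size assumed)."""
    def __init__(self, c1, c2, scale=2):
        super().__init__()
        self.s = scale
        # non-strided convolution; C2 < s^2 * C1 per the constraint above
        self.conv = nn.Conv2d(c1 * scale * scale, c2, kernel_size=1, stride=1)

    def forward(self, x):                         # x: (B, C1, S, S)
        s = self.s
        # sub-maps f_{a,b} = X[a::s, b::s], concatenated on the channel axis
        subs = [x[:, :, a::s, b::s] for a in range(s) for b in range(s)]
        x = torch.cat(subs, dim=1)                # (B, s^2*C1, S/s, S/s)
        return self.conv(x)                       # (B, C2, S/s, S/s)

# Example: with s = 2 and C2 = 2*C1, one Stage halves H, W and doubles C.
y = SpaceToDepthConv(c1=96, c2=192)(torch.randn(1, 96, 56, 56))  # (1, 192, 28, 28)
```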
Step 2, acquiring multi-scale features through a multi-scale feature module:
as shown in FIG. 3, the model's multiscale feature module, X ε R, for a given feature map C×H×W Wherein C, H, W respectively represents the number, height and width of channels, the multi-scale feature module captures regions with different scales on a feature map X through rectangular regions with different sizes, and for a key region r (i, j, deltax, deltay), i and j are central positions of response regions, deltax and Deltay are widths and heights. By varying the width and height of the regions, a set of regions, r=r (i, j, mΔx, nΔy), where m, n=1, 2,3, …, is obtained; and i<i+m△x≤W,j<j+m delta y is less than or equal to H, and rich context information of subtle changes of response areas is captured step by step, so that a group of area sets R= { R } are obtained.
The regions in the set r = r(i, j, mΔx, nΔy) are then represented by feature vectors of fixed size generated through bilinear pooling and bilinear interpolation. The transformed image X̃ at a target coordinate y is
X̃(y) = Σ_{y'} k(y', L_ψ(y)) · R(y')
where R(L_ψ(y)) denotes the feature vector at region coordinate y obtained from the original image; L_ψ(y) denotes a transformation of the coordinate y, with ψ a learnable parameter; and k is a kernel function satisfying k(y', L_ψ(y)) = 0 whenever y' and L_ψ(y) are not directly adjacent. Through this module, multi-scale features are obtained from the feature map and the features of different scales are converted into feature vectors of the same size, which facilitates the model's subsequent computation.
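A minimal sketch of the region growing and fixed-size resampling is given below. For simplicity it treats (i, j) as the region's top-left corner and uses torch's built-in bilinear interpolation as a stand-in for the learned transformation L_ψ; the output size is an assumed value.

```python
import torch
import torch.nn.functional as F

def multi_scale_regions(x, i, j, dx, dy, steps=3, out_size=(7, 7)):
    """Grow rectangles r(i, j, m*dx, n*dy) and resample each crop to a
    fixed size with bilinear interpolation (stand-in for learned L_psi)."""
    _, _, H, W = x.shape
    regions = []
    for m in range(1, steps + 1):
        for n in range(1, steps + 1):
            rows = slice(i, min(i + n * dy, H))   # height grows with n
            cols = slice(j, min(j + m * dx, W))   # width grows with m
            crop = x[:, :, rows, cols]
            regions.append(F.interpolate(crop, size=out_size,
                                         mode='bilinear', align_corners=False))
    return torch.stack(regions, dim=1)            # (B, |R|, C, 7, 7)
```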
Step 3, obtaining local connection through the context attention:
as shown in fig. 3, the model's contextual attention module is used to capture local relationships between multi-scale features, enabling the model to selectively focus on more relevant regions to generate overall contextual information. After receiving the multi-scale features, a specific formula for obtaining the relation between the multi-scale features is as follows:
v in r Note for the context that the feature vector,feature map, alpha, representing other scales associated with the current scale r,r' Representing the correlation between the current scale feature and other neighboring scale features, the formula is as follows:
m in the formula α B is a nonlinear combination of weight matrix α 、b β Representing the deviation;representing a query vector->The formula for representing the key vector is shown below:
m in the formula β And M β' The weight matrix is represented by a matrix of weights,a feature map representing the current scale.
Step 4, acquiring global connection through a multi-head attention module:
as shown in fig. 3, the multi-head attention module of the model first pairs the context vector v= { V r R=1..|r| } is globally averaged pooled and the resulting contextual feature f is used r As the input of the multi-head self-attention module, the spatial arrangement information of the learning region and the long-term dependency relationship are studied, and the calculation formula of the multi-head self-attention is as follows:
A=Concat(A 1 ,A 2 ,...,A |R| )W 0
q, K, V is query vector, key vector and value vector, W 0 Is a weight matrix.
Step 5, combining the local features and the global features to obtain a final classification result:
as shown in fig. 3, the contextual attention derived features and the multi-head attention derived features are stitched together through the FC layer as the basis for the final classification. In the training stage, a model network is trained by adopting a cross entropy loss function and a center loss function in a combined mode, and the loss function formula of the model is as follows:
L=L CE +λL cent
wherein lambda is a weight coefficient, the influence of a center loss function on the total loss is measured, N is the number of categories, y i Is true toValue tag, p i Predicting labels for the models; w is the number of samples, x i In order to train the sample,the center vector is represented by a vector of the center, I.I 2 Representing the Euclidean distance;
and carrying out optimization training on the network model according to the total loss L, and continuously repeating the steps until the loss function converges, so as to finally obtain the network model with optimized training. After training is completed, fine-grained images are input, and the model can realize high-accuracy recognition.
Briefly, this embodiment provides an attention-based fine-grained image recognition method for classifying fine-grained images. It designs a fine-grained recognition network model consisting mainly of a spatial depth convolution module, a feature extraction network, a multi-scale feature module, a context attention module, a multi-head attention module and a classifier. The design addresses, on the one hand, the loss of low-level information in subtle discriminative regions and, on the other hand, the links between the discriminative region and other regions.
Finally, it is to be understood that the above-described embodiments merely illustrate the principles of the present invention and in no way limit it. Accordingly, any modification, equivalent replacement, improvement, etc. made without departing from the spirit and scope of the present invention shall be included in the scope of the present invention.

Claims (8)

1. A fine-grained image recognition method based on an attention mechanism, the method comprising the steps of:
s1, constructing a fine-grained image recognition network model: the system specifically comprises a feature extraction network, a spatial depth convolution module, a multi-scale feature extraction module, a context attention sensing module, a multi-head attention module and a classifier;
s2, optimizing an initial network by using the pre-training parameters;
s3, dividing a data set and preprocessing a sample image;
s4, inputting the sample image into a feature extraction network to obtain a feature map and an attention thermodynamic diagram;
s5, inputting the extracted feature map and thermodynamic diagram into a multi-scale feature extraction module to obtain a multi-scale feature map;
s6, inputting the multi-scale feature map into a context attention sensing module, so that the model learns multi-scale context information of the salient region;
s7, inputting multi-scale context information into a multi-head attention module, so that the model learns long-term dependency of each scale characteristic;
and S8, training the network model according to the loss function, and repeating the steps S4 to S7 until the loss function converges.
And finally, inputting the fine-grained images to be identified into a trained model for classification identification.
2. The attention-based fine-granularity image recognition method according to claim 1, wherein the feature extraction network adopts a ConvNeXt convolutional neural network as a backbone network.
3. The attention-based fine-granularity image recognition method according to claim 2, wherein a spatial depth convolution module is added in each Stage to replace the original downsampling part, enhancing the model's ability to identify subtle discriminative key regions; a feature map X of size S×S×C1 is sliced into sub-maps:
f_{x,y} = X[x:S:s, y:S:s],  x, y = 0, 1, …, s−1
where f_{x,y} is a sub-feature map and s is the scale factor; the sub-feature maps are concatenated along the channel dimension, converting X into a new intermediate feature map X' of size (S/s)×(S/s)×(s²·C1);
a non-strided convolution is then used for feature transformation: a convolution layer with C2 output channels, where C2 < s²·C1, is appended after X', converting X' into X'' of size (S/s)×(S/s)×C2, so as to retain as much discriminative information of the subtle regions as possible.
4. The attention-based fine-grained image recognition method according to claim 1, wherein for a given feature map X ∈ R^(C×H×W), where C, H and W denote the number of channels, the height and the width respectively, the multi-scale feature extraction module captures regions of different scales on X through rectangular regions of different sizes; for a response region r(i, j, Δx, Δy), i and j give the centre position of the region and Δx, Δy its width and height; by varying the width and height, a set of regions r = r(i, j, mΔx, nΔy) is obtained, where m, n = 1, 2, 3, …, subject to i < i + mΔx ≤ W and j < j + nΔy ≤ H; rich context information on the subtle changes of the response region is captured step by step, yielding a region set R = {r}.
5. The attention-based fine-granularity image recognition method according to claim 4, wherein for the regions r = r(i, j, mΔx, nΔy) of different sizes, feature vectors of fixed size are generated by bilinear pooling and bilinear interpolation to represent the regions; the transformed image X̃ at a target coordinate y is
X̃(y) = Σ_{y'} k(y', L_ψ(y)) · R(y')
where R(L_ψ(y)) denotes the feature vector at region coordinate y obtained from the original image; L_ψ(y) denotes a transformation of the coordinate y, with ψ a learnable parameter; and k is a kernel function satisfying k(y', L_ψ(y)) = 0 whenever y' and L_ψ(y) are not directly adjacent.
6. The attention-based fine-grained image recognition method according to claim 1, wherein the context attention perception module is used to capture the relations among the multi-scale features, enabling the model to selectively attend to more relevant regions and generate overall context information; the relation among the multi-scale features is
v_r = Σ_{r'} α_{r,r'} · f_{r'}
where v_r is the context attention feature vector, f_{r'} denotes the feature maps of the other scales related to the current scale, and α_{r,r'} denotes the correlation between the current scale feature and the other neighbouring scale features:
α_{r,r'} = softmax(M_α · tanh(q_r + k_{r'}) + b_α)
where M_α is the weight matrix of the nonlinear combination and b_α, b_β denote biases; q_r denotes the query vector and k_{r'} the key vector, computed as
q_r = M_β · f_r + b_β,  k_{r'} = M_β' · f_{r'} + b_β'
where M_β and M_β' denote weight matrices and f_r denotes the feature map of the current scale.
7. The attention-based fine granularity image recognition method of claim 1, wherein the context vectors V = {v_r | r = 1, …, |R|} are globally average-pooled and the resulting contextual features f_r serve as the input of the multi-head attention module, which learns the spatial arrangement information and long-term dependencies of the regions; multi-head attention is computed as
A_i = softmax(Q_i · K_i^T / √d_k) · V_i
A = Concat(A_1, A_2, …, A_|R|) · W_0
where Q, K and V are the query, key and value vectors, d_k is the key dimension and W_0 is a weight matrix.
8. The attention-based fine-grained image recognition method according to claim 1, wherein the network model is trained using a combination of a cross-entropy loss function and a center loss function:
L = L_CE + λ·L_cent,  with  L_CE = −Σ_{i=1}^{N} y_i · log(p_i)  and  L_cent = (1/2) · Σ_{i=1}^{W} ||x_i − c_{y_i}||²₂
where λ is a weight coefficient measuring the influence of the center loss on the total loss; N is the number of categories, y_i is the ground-truth label and p_i is the label predicted by the model; W is the number of samples, x_i is a training sample, c_{y_i} denotes the center vector, and ||·||₂ denotes the Euclidean distance;
the network model is optimized and trained according to the total loss L, thereby obtaining the optimally trained network model.
CN202310678774.9A 2023-06-08 2023-06-08 Fine granularity image recognition method based on attention Pending CN116580289A (en)

Priority Applications (1)

Application Number: CN202310678774.9A · Priority date: 2023-06-08 · Filing date: 2023-06-08 · Title: Fine granularity image recognition method based on attention (CN116580289A, en)

Applications Claiming Priority (1)

Application Number: CN202310678774.9A · Priority date: 2023-06-08 · Filing date: 2023-06-08 · Title: Fine granularity image recognition method based on attention (CN116580289A, en)

Publications (1)

Publication Number Publication Date
CN116580289A (en) · Publication date: 2023-08-11

Family

ID=87534131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310678774.9A Pending CN116580289A (en) 2023-06-08 2023-06-08 Fine granularity image recognition method based on attention

Country Status (1)

Country Link
CN (1) CN116580289A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117853875A (en) * 2024-03-04 2024-04-09 华东交通大学 Fine-granularity image recognition method and system
CN117853875B (en) * 2024-03-04 2024-05-14 华东交通大学 Fine-granularity image recognition method and system

Similar Documents

Publication Publication Date Title
CN109949317B (en) Semi-supervised image example segmentation method based on gradual confrontation learning
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
Zheng et al. Unsupervised change detection by cross-resolution difference learning
CN109063565B (en) Low-resolution face recognition method and device
CN112364931B (en) Few-sample target detection method and network system based on meta-feature and weight adjustment
CN107092884B (en) Rapid coarse-fine cascade pedestrian detection method
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN111582044A (en) Face recognition method based on convolutional neural network and attention model
CN112633382A (en) Mutual-neighbor-based few-sample image classification method and system
CN116342894B (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN106022223A (en) High-dimensional local-binary-pattern face identification algorithm and system
US20240161531A1 (en) Transformer-based multi-scale pedestrian re-identification method
CN112488128A (en) Bezier curve-based detection method for any distorted image line segment
CN112580480A (en) Hyperspectral remote sensing image classification method and device
CN113378675A (en) Face recognition method for simultaneous detection and feature extraction
CN116580289A (en) Fine granularity image recognition method based on attention
CN115187786A (en) Rotation-based CenterNet2 target detection method
CN110349176B (en) Target tracking method and system based on triple convolutional network and perceptual interference learning
CN110263731B (en) Single step human face detection system
Xu et al. UCDFormer: Unsupervised change detection using a transformer-driven image translation
CN116597267B (en) Image recognition method, device, computer equipment and storage medium
CN114998702A (en) Entity recognition and knowledge graph generation method and system based on BlendMask
CN114913504A (en) Vehicle target identification method of remote sensing image fused with self-attention mechanism
CN114202659A (en) Fine-grained image classification method based on spatial symmetry irregular local region feature extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination