CN117095222A - Small sample image classification method, system, device and medium based on coordinate attention and BDC measurement


Info

Publication number
CN117095222A
CN117095222A
Authority
CN
China
Prior art keywords
bdc
training
depth model
matrix
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311076911.8A
Other languages
Chinese (zh)
Inventor
刘颖
张恒畅
薛家昊
杨剑宁
张伟东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Posts and Telecommunications
Original Assignee
Xian University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Posts and Telecommunications filed Critical Xian University of Posts and Telecommunications
Priority to CN202311076911.8A priority Critical patent/CN117095222A/en
Publication of CN117095222A publication Critical patent/CN117095222A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a small sample image classification method, system, device and medium based on coordinate attention and BDC measurement, comprising the following steps: collecting images and dividing the collected images to obtain a training set, a validation set and a test set; constructing a depth model framework; training the depth model based on the training set, and stopping training when the number of training iterations reaches the maximum threshold and the performance of the depth model on the validation set meets the preset requirement or no longer improves significantly, so as to obtain an optimized depth model; and performing classification prediction on the test set based on the optimized depth model in the meta-learning N-way K-shot mode, and evaluating the classification accuracy. By introducing coordinate attention, which captures both spatial and positional information, and combining it with a BDC metric that exploits the difference between the marginal distributions and the joint distribution of image features, the invention helps the model better complete small sample image classification tasks and effectively improves classification accuracy.

Description

Small sample image classification method, system, device and medium based on coordinate attention and BDC measurement
Technical Field
The invention belongs to the technical field of image processing, and relates to a small sample image classification method, system, device and medium based on coordinate attention and BDC measurement.
Background
Deep learning has made tremendous progress on image classification tasks by virtue of large-scale data. However, training a deep learning model requires massive labeled data, large-scale training samples are often difficult to obtain in practice, and a model trained on only a few samples easily overfits. In special fields such as medicine and the military in particular, image classification suffers from small sample problems: data are difficult to acquire and difficult to label. Research on image classification techniques for small samples has therefore become an important research direction.
Current small sample image classification techniques mainly use meta-learning strategies to partition the data, use a deep neural network to extract features from the input image, and finally complete classification by metric learning. For example, introducing the SENet attention mechanism establishes associations between feature channels, and small sample image classification is finally achieved by computing image-to-class similarity. However, such methods often ignore the position information that is critical for generating spatially selective attention, which limits the feature extraction capability of the model. How to improve the feature expression capability of the model, and thereby the classification performance, is therefore an urgent technical problem in the art.
Disclosure of Invention
The invention aims to solve the prior-art problem that, in small sample image classification, the position information critical for generating spatially selective attention is not considered, which limits the feature extraction capability of the model, and provides a small sample image classification method, system, device and medium based on coordinate attention and BDC measurement.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
a method of classifying small sample images based on coordinate attention and BDC metrics, comprising:
collecting images and dividing the collected images to obtain a training set, a validation set and a test set;
constructing a depth model framework;
training the depth model based on the training set, and stopping training when the number of training iterations reaches the maximum threshold and the performance of the depth model on the validation set meets the preset requirement or no longer improves significantly, so as to obtain an optimized depth model;
and performing classification prediction on the test set based on the optimized depth model in the meta-learning N-way K-shot mode, and evaluating the classification accuracy.
Further improvements of the invention are as follows:
further, capturing an image and dividing the captured image, including: acquiring a public small sample data set miniImageNet, omniglot, a small sample fine granularity data set CUB-200 and a tire pattern data set CIIP-TPID applied to the public security field; training set, test set and validation set partitioning are performed on the miniImageNet dataset, the Omniglot dataset, the CUB-200 dataset and the CIIP-TPID dataset, respectively.
Further, the depth model framework includes: an embedding module, a coordinate attention module, a BDC metric module and a Softmax layer;
The embedding module performs feature extraction on the input training set. The coordinate attention module performs global average pooling along the horizontal and vertical directions respectively, decomposing the input feature map X into one-dimensional features in the horizontal and vertical directions to obtain the horizontal and vertical position information of the input feature map; the horizontal and vertical position information of the input feature map is concatenated and transformed by a 1×1 convolution (the F_1 transformation) to obtain the corresponding attention feature map; after a nonlinear operation, the output is decomposed along the spatial dimension into a horizontal feature map f^h ∈ R^{C/r×H} and a vertical feature map f^w ∈ R^{C/r×W}; the feature maps f^h and f^w pass through 1×1 convolutions, namely the F_h and F_w transformations, and the attention weights are obtained with the Sigmoid activation function; finally, the horizontal and vertical attention weights are multiplied with the input feature map data to obtain the final output features;
The BDC metric module introduces the deep Brownian distance covariance method: the BDC matrices of the support set are calculated and averaged to obtain a prototype representation of each class; the BDC matrix of the query set is then calculated, and its inner product with each class prototype gives the similarity between the query image and each class, from which the class of the image is predicted;
The Softmax layer outputs the classification result of the BDC metric module.
Further, the input feature map X is decomposed into one-dimensional features in the horizontal and vertical directions, specifically:
z_c^h(h) = (1/W) Σ_{0≤i<W} x_c(h, i) (1)
z_c^w(w) = (1/H) Σ_{0≤j<H} x_c(j, w) (2)
where h denotes the height coordinate, c denotes the c-th channel, W is the width of the input feature map of the current module, x is the decomposed feature vector, w denotes the width coordinate, and H is the height of the input feature map of the current module;
The horizontal and vertical position information of the input feature map is concatenated and transformed by the 1×1 convolution F_1 to obtain the corresponding attention feature map, specifically:
f = δ(F_1([z^h, z^w])), f ∈ R^{C/r×(H+W)} (3)
where f is the encoded intermediate feature map, δ is a nonlinear activation function, [·,·] is the concatenation operation, and r is a hyperparameter controlling the module size;
The horizontal feature map f^h ∈ R^{C/r×H} and the vertical feature map f^w ∈ R^{C/r×W} pass through 1×1 convolutions, namely the F_h and F_w transformations, and the attention weights are obtained using the Sigmoid activation function, specifically:
g^h = σ(F_h(f^h)) (4)
g^w = σ(F_w(f^w)) (5)
where σ denotes the Sigmoid activation function;
The horizontal and vertical attention weights are multiplied with the input feature map data to obtain the final output features, specifically:
y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j) (6)
further, a depth Brownian distance covariance method is introduced into the BDC measurement module, and a BDC matrix of the support set image is calculated, specifically:
calculating the squared Euclidean distance matrix of the input feature map;
taking the element-wise square root of the squared Euclidean distance matrix to obtain the Euclidean distance matrix Â;
subtracting from the Euclidean distance matrix Â the mean of each row and the mean of each column and adding back the mean of all elements to obtain the final BDC matrix.
The squared Euclidean distance matrix of the input feature map is calculated, specifically:
Ã = (ã_kl) with ã_kl = ‖x_k − x_l‖², computed as Ã = 2((I ∘ (XᵀX))1 − XᵀX)_sym (7)
where ã_kl denotes the squared Euclidean distance between the k-th and l-th columns of the matrix X; I denotes the identity matrix; ∘ is the Hadamard product; (U)_sym = (U + Uᵀ)/2; and 1 ∈ R^{c×c} denotes the matrix whose entries are all 1;
Taking the element-wise square root of the squared Euclidean distance matrix gives the Euclidean distance matrix Â, specifically:
Â = (â_kl), â_kl = √(ã_kl) (8)
Subtracting from the Euclidean distance matrix Â the mean of each row and the mean of each column and adding back the mean of all elements gives the final BDC matrix A, specifically:
A = (a_kl), a_kl = â_kl − (1/c)Σ_i â_il − (1/c)Σ_j â_kj + (1/c²)Σ_{i,j} â_ij (9)
further, training the depth model based on the training set is specifically as follows:
inputting training set data into a depth model, accelerating the training of the depth model by adopting an Adam optimization method, and selecting a cross entropy loss function by using a ReLU activation function, wherein the loss function is specifically as follows:
L=cross_entropy(softmax(F),Label) (10)
where cross_entropy is the cross-entropy loss function, F denotes the features extracted from the training set by the feature extractor, and Label denotes the ground-truth distribution corresponding to those features.
Further, classification prediction is performed on the test set based on the optimized depth model in the meta-learning N-way K-shot mode, and the classification accuracy is evaluated, specifically:
The test set is input into the optimized depth model, and classification prediction is performed on the test set data in the meta-learning N-way K-shot mode: N classes of images are randomly selected from the test set, K samples from each class are selected as the support set for training the model, a small number of the remaining samples of the N classes are selected as the query set, and the query set is used for verification; the superiority of each algorithm is verified by comparing the computed classification accuracies.
A small sample image classification system based on coordinate attention and BDC metrics, comprising:
the dividing module is used for collecting images and dividing the collected images to obtain a training set, a validation set and a test set;
the construction module is used for constructing a depth model framework;
the acquisition module is used for training the depth model based on the training set, and stopping training when the number of training iterations reaches the maximum threshold and the performance of the depth model on the validation set meets the preset requirement or no longer improves significantly, so as to obtain the optimized depth model;
and the evaluation module is used for performing classification prediction on the test set based on the optimized depth model in the meta-learning N-way K-shot mode and evaluating the classification accuracy.
A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method as described above when the computer program is executed.
A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of a method as described above.
Compared with the prior art, the invention has the following beneficial effects:
the invention constructs a depth model frame comprising an embedding module, a coordinate attention module, a BDC measuring module and a Softmax layer; deep features of the mined image are realized based on the embedded module; combining the space information and the position information of the input feature map through a coordinate attention module; then calculating a BDC matrix of the support set image through a BDC measurement module; and calculating the BDC matrix of the query set image, respectively carrying out inner product on the BDC matrix and the category prototype to obtain the similarity between the query image and each category, and completing the classification task. Training the depth model based on the training set, and stopping training when the training times reach a maximum training time threshold value and the performance of the depth model on the verification set reaches a preset requirement or is not obviously improved any more, so as to obtain an optimized depth model; and carrying out classification prediction on the test set based on the optimized depth model and the element learning N-way K-shot mode, and evaluating the classification accuracy. The invention introduces the coordinate attention of the space information and the position information, and simultaneously combines the BDC measurement mode utilizing the difference between the image edge distribution and the joint distribution to help the model to better finish the small sample image classification task, thereby effectively improving the classification precision; the invention is not only suitable for classifying small sample images, but also excellent in small sample fine granularity data sets, and has excellent performance when facing different fields of tire pattern image data.
Drawings
For a clearer description of the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a small sample image classification method based on coordinate attention and BDC measurement of the present invention;
FIG. 2 is a schematic diagram of the small sample image classification system based on coordinate attention and BDC measurements of the present invention;
FIG. 3 is another flow chart of the small sample image classification method based on the coordinate attention and BDC measurement of the present invention;
FIG. 4 is a schematic diagram of a depth model framework of the present invention;
FIG. 5 is a schematic diagram of the network architecture of ResNet-12;
FIG. 6 is a schematic diagram of the principle of the coordinate attention mechanism;
FIG. 7 is a schematic diagram of the deep Brownian distance covariance method.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the embodiments of the present invention, it should be noted that, if the terms "upper," "lower," "horizontal," "inner," and the like indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, or the azimuth or the positional relationship in which the inventive product is conventionally put in use, it is merely for convenience of describing the present invention and simplifying the description, and does not indicate or imply that the apparatus or element to be referred to must have a specific azimuth, be configured and operated in a specific azimuth, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like, are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Furthermore, the term "horizontal" if present does not mean that the component is required to be absolutely horizontal, but may be slightly inclined. As "horizontal" merely means that its direction is more horizontal than "vertical", and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.
In the description of the embodiments of the present invention, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "connected," and "connected" should be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
The invention is described in further detail below with reference to the attached drawing figures:
referring to fig. 1, the invention discloses a small sample image classification method based on coordinate attention and BDC measurement, comprising the following steps:
s101, acquiring images and dividing the acquired images to obtain a training set, a verification set and a test set.
The public small sample datasets miniImageNet and Omniglot, the small sample fine-grained dataset CUB-200 and the tire pattern dataset CIIP-TPID used in the public security field are acquired; training set, test set and validation set partitioning is performed on the miniImageNet, Omniglot, CUB-200 and CIIP-TPID datasets, respectively.
S102, constructing the depth model framework.
The depth model framework includes: an embedding module, a coordinate attention module, a BDC metric module and a Softmax layer;
The embedding module performs feature extraction on the input training set. The coordinate attention module performs global average pooling along the horizontal and vertical directions respectively, decomposing the input feature map X into one-dimensional features in the horizontal and vertical directions to obtain the horizontal and vertical position information of the input feature map; the horizontal and vertical position information of the input feature map is concatenated and transformed by a 1×1 convolution (the F_1 transformation) to obtain the corresponding attention feature map; after a nonlinear operation, the output is decomposed along the spatial dimension into a horizontal feature map f^h ∈ R^{C/r×H} and a vertical feature map f^w ∈ R^{C/r×W}; the feature maps f^h and f^w pass through 1×1 convolutions, namely the F_h and F_w transformations, and the attention weights are obtained with the Sigmoid activation function; finally, the horizontal and vertical attention weights are multiplied with the input feature map data to obtain the final output features;
The input feature map X is decomposed into one-dimensional features in the horizontal and vertical directions, specifically:
z_c^h(h) = (1/W) Σ_{0≤i<W} x_c(h, i) (1)
z_c^w(w) = (1/H) Σ_{0≤j<H} x_c(j, w) (2)
where h denotes the height coordinate, c denotes the c-th channel, W is the width of the input feature map of the current module, x is the decomposed feature vector, w denotes the width coordinate, and H is the height of the input feature map of the current module;
The horizontal and vertical position information of the input feature map is concatenated and transformed by the 1×1 convolution F_1 to obtain the corresponding attention feature map, specifically:
f = δ(F_1([z^h, z^w])), f ∈ R^{C/r×(H+W)} (3)
where f is the encoded intermediate feature map, δ is a nonlinear activation function, [·,·] is the concatenation operation, and r is a hyperparameter controlling the module size;
The horizontal feature map f^h ∈ R^{C/r×H} and the vertical feature map f^w ∈ R^{C/r×W} pass through 1×1 convolutions, namely the F_h and F_w transformations, and the attention weights are obtained using the Sigmoid activation function, specifically:
g^h = σ(F_h(f^h)) (4)
g^w = σ(F_w(f^w)) (5)
where σ denotes the Sigmoid activation function;
The horizontal and vertical attention weights are multiplied with the input feature map data to obtain the final output features, specifically:
y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j) (6)
the BDC measurement module is introduced with a depth Brownian distance covariance method, a BDC matrix of a support set is calculated, an average operation is carried out on the obtained BDC matrix to obtain prototype representation of each category, then the BDC matrix of a query set is calculated, and the BDC matrix and the category prototypes are respectively subjected to inner products to obtain similarity between a query image and each category, so that the category of the image is predicted;
calculating the squared Euclidean distance matrix of the input feature map;
taking the element-wise square root of the squared Euclidean distance matrix to obtain the Euclidean distance matrix Â;
subtracting from the Euclidean distance matrix Â the mean of each row and the mean of each column and adding back the mean of all elements to obtain the final BDC matrix.
The squared Euclidean distance matrix of the input feature map is calculated, specifically:
Ã = (ã_kl) with ã_kl = ‖x_k − x_l‖², computed as Ã = 2((I ∘ (XᵀX))1 − XᵀX)_sym (7)
where ã_kl denotes the squared Euclidean distance between the k-th and l-th columns of the matrix X; I denotes the identity matrix; ∘ is the Hadamard product; (U)_sym = (U + Uᵀ)/2; and 1 ∈ R^{c×c} denotes the matrix whose entries are all 1;
Taking the element-wise square root of the squared Euclidean distance matrix gives the Euclidean distance matrix Â, specifically:
Â = (â_kl), â_kl = √(ã_kl) (8)
Subtracting from the Euclidean distance matrix Â the mean of each row and the mean of each column and adding back the mean of all elements gives the final BDC matrix A, specifically:
A = (a_kl), a_kl = â_kl − (1/c)Σ_i â_il − (1/c)Σ_j â_kj + (1/c²)Σ_{i,j} â_ij (9)
the Softmax layer outputs the results of the BDC metric module classification.
S103, training the depth model based on the training set, and stopping training when the number of training iterations reaches the maximum threshold and the performance of the depth model on the validation set meets the preset requirement or no longer improves significantly, so as to obtain the optimized depth model.
Training set data are input into the depth model, the Adam optimization method is adopted to accelerate training of the depth model, the ReLU activation function is used, and the cross-entropy loss function is selected; the loss function is specifically:
L=cross_entropy(softmax(F),Label) (10)
where cross_entropy is the cross-entropy loss function, F denotes the features extracted from the training set by the feature extractor, and Label denotes the ground-truth distribution corresponding to those features.
S104, performing classification prediction on the test set based on the optimized depth model in the meta-learning N-way K-shot mode, and evaluating the classification accuracy.
The test set is input into the optimized depth model, and classification prediction is performed on the test set data in the meta-learning N-way K-shot mode: N classes of images are randomly selected from the test set, K samples from each class are selected as the support set for training the model, a small number of the remaining samples of the N classes are selected as the query set, and the query set is used for verification; the superiority of each algorithm is verified by comparing the computed classification accuracies.
Referring to fig. 2, the invention discloses a small sample image classification system based on coordinate attention and BDC metrics, comprising:
the dividing module is used for collecting images and dividing the collected images to obtain a training set, a validation set and a test set;
the construction module is used for constructing a depth model framework;
the acquisition module is used for training the depth model based on the training set, and stopping training when the number of training iterations reaches the maximum threshold and the performance of the depth model on the validation set meets the preset requirement or no longer improves significantly, so as to obtain the optimized depth model;
and the evaluation module is used for performing classification prediction on the test set based on the optimized depth model in the meta-learning N-way K-shot mode and evaluating the classification accuracy.
Examples: referring to fig. 3, the small sample image classification method based on coordinate attention and BDC measurement disclosed by the invention specifically comprises the following steps:
step S1: a data set is acquired and partitioned.
S11: The public small sample datasets miniImageNet and Omniglot and the small sample fine-grained dataset CUB-200 are acquired, together with the tire pattern image dataset CIIP-TPID, built on a platform in cooperation with public security departments and supported by the Image and Information Processing Institute of Xi'an University of Posts and Telecommunications.
S12: The miniImageNet dataset is divided into 64 training classes, 16 validation classes and 20 test classes, with image size 84×84. In the Omniglot dataset, 1200 classes are used for training and the remaining 423 for testing, with each image of size 28×28. The CUB-200 dataset has 200 classes, of which 130 form the training set, 20 the validation set and 50 the test set, with image size 84×84. This experiment divides the CIIP-TPID dataset into three sub-datasets containing different images: a tire surface pattern dataset, a tire indentation dataset and a hybrid dataset, each with 69 classes of image data, of which 46 are used as the training set, 10 as the validation set and 13 as the test set; each image is resized to 48×48. Each class in the surface dataset and the indentation dataset contains different tire surface pattern images and tire indentation images, respectively, with 80 image samples per class. Each class of the hybrid dataset contains 160 mixed samples of tire surface pattern images and tire indentation images.
Step S2: constructing the depth model framework.
As shown in FIG. 4, the model of this example consists of an embedding module (ResNet-12), a coordinate attention module, a BDC metric module and a Softmax layer.
This example uses ResNet-12 as the embedding feature extraction network; the ResNet-12 network architecture is shown in FIG. 5. The embedding module performs feature extraction on the input training set.
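For orientation, the following is a minimal PyTorch sketch of how the four components of FIG. 4 compose; the class name, constructor arguments and the assumption that the metric module returns per-class similarity scores are illustrative and not fixed by this disclosure:

```python
import torch.nn as nn

class DepthModel(nn.Module):
    """Sketch of the FIG. 4 pipeline: embedding (ResNet-12), coordinate
    attention, BDC metric, Softmax. `backbone`, `attention` and `metric`
    are assumed stand-ins for the modules described in this embodiment."""
    def __init__(self, backbone: nn.Module, attention: nn.Module, metric: nn.Module):
        super().__init__()
        self.backbone = backbone    # embedding module (ResNet-12)
        self.attention = attention  # coordinate attention module
        self.metric = metric        # BDC metric module: similarity to each class prototype

    def forward(self, x):
        feat = self.attention(self.backbone(x))  # attended feature map
        scores = self.metric(feat)               # inner products with class prototypes
        return scores.softmax(dim=-1)            # Softmax layer
```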
As shown in FIG. 6, the coordinate attention module performs global average pooling along the horizontal and vertical directions respectively, decomposing the input feature map X into one-dimensional features in the horizontal and vertical directions to obtain the horizontal and vertical position information of the input feature map; the horizontal and vertical position information of the input feature map is concatenated and transformed by a 1×1 convolution (the F_1 transformation) to obtain the corresponding attention feature map; after a nonlinear operation, the output is decomposed along the spatial dimension into a horizontal feature map f^h ∈ R^{C/r×H} and a vertical feature map f^w ∈ R^{C/r×W}; the feature maps f^h and f^w pass through 1×1 convolutions, namely the F_h and F_w transformations, and the attention weights are obtained with the Sigmoid activation function; finally, the horizontal and vertical attention weights are multiplied with the input feature map data to obtain the final output features.
the coordinate attention module of this example performs global average pooling along its horizontal and vertical directions, respectively, and decomposes the input feature map X into two one-dimensional features, to obtain horizontal and vertical position information of the input feature map. The one-dimensional characteristics of the horizontal and vertical direction outputs are shown in the formula (11) and the formula (12), respectively:
wherein h represents the height of the feature vector, c represents the c-th channel, W is the width of the input feature map of the current module, and x is the decomposed feature vector; w represents the width of the feature vector, and H is the height of the current module input feature map.
The horizontal and vertical position information of the input feature map is concatenated and transformed by the 1×1 convolution F_1 to obtain the corresponding attention feature map, as shown in equation (13):
f = δ(F_1([z^h, z^w])), f ∈ R^{C/r×(H+W)} (13)
where f is the encoded intermediate feature map, δ is a nonlinear activation function, [·,·] is the concatenation operation, and r is a hyperparameter controlling the module size.
After the nonlinear operation, the output is decomposed along the spatial dimension into the horizontal feature map f^h ∈ R^{C/r×H} and the vertical feature map f^w ∈ R^{C/r×W}. The feature maps f^h and f^w pass through 1×1 convolutions, namely the F_h and F_w transformations, and the attention weights are obtained using the Sigmoid activation function, as shown in formula (14) and formula (15):
g^h = σ(F_h(f^h)) (14)
g^w = σ(F_w(f^w)) (15)
where σ denotes the Sigmoid activation function.
Multiplying the horizontal and vertical attention weights with the input feature map data yields the final output features, as shown in equation (16):
y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j) (16)
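For concreteness, a minimal PyTorch sketch of the coordinate attention block of equations (11)-(16) follows; the class name, the default reduction ratio r and the choice of ReLU as the nonlinearity δ are illustrative assumptions rather than values fixed by this disclosure:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Sketch of coordinate attention, equations (11)-(16).
    The reduction ratio `r` and minimum hidden width are assumed defaults."""
    def __init__(self, channels: int, r: int = 32):
        super().__init__()
        mid = max(8, channels // r)
        self.f1 = nn.Conv2d(channels, mid, kernel_size=1)   # F_1 transformation
        self.delta = nn.ReLU(inplace=True)                  # nonlinearity delta (assumed ReLU)
        self.fh = nn.Conv2d(mid, channels, kernel_size=1)   # F_h transformation
        self.fw = nn.Conv2d(mid, channels, kernel_size=1)   # F_w transformation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Eqs. (11)-(12): global average pooling along width and along height.
        z_h = x.mean(dim=3, keepdim=True)                   # N x C x H x 1
        z_w = x.mean(dim=2, keepdim=True).transpose(2, 3)   # N x C x W x 1
        # Eq. (13): concatenate the two directions and apply the 1x1 convolution F_1.
        f = self.delta(self.f1(torch.cat([z_h, z_w], dim=2)))
        f_h, f_w = f.split([h, w], dim=2)
        # Eqs. (14)-(15): direction-wise attention weights via Sigmoid.
        g_h = torch.sigmoid(self.fh(f_h))                   # N x C x H x 1
        g_w = torch.sigmoid(self.fw(f_w)).transpose(2, 3)   # N x C x 1 x W
        # Eq. (16): reweight the input feature map in both directions.
        return x * g_h * g_w
```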
in the BDC measurement module, firstly, calculating a BDC matrix of the support set image, then carrying out average operation on the obtained BDC matrix by means of the thought of a prototype network to obtain prototype representation of each category, then calculating the BDC matrix of the query set image, and respectively carrying out inner product on the BDC matrix and the category prototype to obtain similarity between the query image and each category, so as to predict the category to which the image belongs.
To fully mine the statistical characteristics of the image feature distributions, this example introduces the deep Brownian distance covariance method in the BDC metric module, as shown in FIG. 7. First, the squared Euclidean distance matrix of the input feature map is calculated according to formula (17):
Ã = (ã_kl) with ã_kl = ‖x_k − x_l‖², computed as Ã = 2((I ∘ (XᵀX))1 − XᵀX)_sym (17)
where ã_kl denotes the squared Euclidean distance between the k-th and l-th columns of the matrix X; I denotes the identity matrix; ∘ is the Hadamard product; (U)_sym = (U + Uᵀ)/2; and 1 ∈ R^{c×c} denotes the matrix whose entries are all 1. Then, an element-wise square root is taken over the squared Euclidean distance matrix to obtain the Euclidean distance matrix Â, as shown in equation (18):
Â = (â_kl), â_kl = √(ã_kl) (18)
Finally, from the Euclidean distance matrix Â, the mean of each row and the mean of each column are subtracted and the mean of all elements is added back, yielding the final BDC matrix A, as shown in formula (19):
A = (a_kl), a_kl = â_kl − (1/c)Σ_i â_il − (1/c)Σ_j â_kj + (1/c²)Σ_{i,j} â_ij (19)
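The three steps of equations (17)-(19) admit a compact PyTorch sketch. The function name and the assumption that the feature map has already been reshaped into a matrix whose c columns are spatial feature vectors are illustrative:

```python
import torch

def bdc_matrix(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Sketch of the BDC matrix of one image, equations (17)-(19).
    `x` has shape (d, c): c column vectors x_k of dimension d (assumed layout)."""
    gram = x.t() @ x                              # X^T X, shape (c, c)
    diag = torch.diagonal(gram)
    # Eq. (17): ||x_k - x_l||^2 = <x_k,x_k> + <x_l,x_l> - 2<x_k,x_l>.
    sq_dist = diag.unsqueeze(0) + diag.unsqueeze(1) - 2.0 * gram
    # Eq. (18): element-wise square root gives the Euclidean distance matrix.
    dist = torch.sqrt(sq_dist.clamp_min(0.0) + eps)
    # Eq. (19): subtract row and column means, add back the grand mean.
    return dist - dist.mean(dim=1, keepdim=True) - dist.mean(dim=0, keepdim=True) + dist.mean()
```

A class prototype is then the average of the BDC matrices of its K support images, and a query image is scored by the inner product between its BDC matrix and each prototype.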
step S3: and (5) training a network model.
The miniImageNet dataset is trained on its 64 training classes, with 16 classes used for validation and 20 for testing; the 1200 training classes of the Omniglot dataset are trained and the remaining 423 classes are tested; the CUB-200 dataset is tested with 130 classes as the training set, 20 classes as the validation set and 50 classes as the test set; and the CIIP-TPID dataset is tested with 46 classes as the training set, 10 classes as the validation set and 13 classes as the test set. This example feeds the training set into the network and trains as follows:
inputting training set data into a network, accelerating model training by adopting an Adam optimization method, and selecting cross entropy by using a ReLU activation function and a loss function, wherein the batch size is 16. The experiment was run for a total of 200 epochs, each of which was trained 100 times. Model learning rate of 1×10 -3 Every 20 epochs are reduced to half of the original, and when the designated training wheel times are completed, the performance of the model on the verification set reaches the preset requirement or is not obviously improved any more, the training is ended.
The loss function is specifically:
L=cross_entropy(softmax(F),Label) (20)
where cross_entropy is the cross-entropy loss function, F denotes the features extracted from the training set by the feature extractor, and Label denotes the ground-truth distribution corresponding to those features.
The ReLU activation function is shown in equation (21); its gradient is 1 for x > 0, which alleviates the vanishing-gradient problem, while for x < 0 the weights cannot be updated, leading to the "dying ReLU" problem.
ReLU(x) = max(0, x) (21)
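As a hedged illustration of the schedule above (Adam, cross-entropy, batch size 16, 200 epochs of 100 episodes each, learning rate 1×10⁻³ halved every 20 epochs), a training loop might look roughly as follows; `model`, the episode loaders and the `evaluate` helper are assumed placeholders, not components defined by this disclosure:

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train(model: nn.Module, train_loader, val_loader, device: str = "cuda") -> None:
    """Sketch of the training schedule described above; the loaders are
    assumed to yield (images, labels) batches, 100 episodes per epoch."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()   # realizes L = cross_entropy(softmax(F), Label)
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)  # halve every 20 epochs
    best_acc = 0.0
    for epoch in range(200):
        model.train()
        for images, labels in train_loader:
            loss = criterion(model(images.to(device)), labels.to(device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
        acc = evaluate(model, val_loader, device)   # assumed validation helper
        if acc > best_acc:                          # keep the best model on the validation set
            best_acc = acc
            torch.save(model.state_dict(), "best_model.pt")
```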
Step S4: classification prediction and model performance assessment.
The test set is input into the model with the highest validation accuracy obtained in step S3, and classification prediction is performed on the test set data in the meta-learning N-way K-shot mode: N classes of images are randomly selected from the dataset, and K samples from each class are selected as the support set for training the model, where N is typically in {5, 10} and K in {1, 5}. Next, a small number of the remaining samples of the N classes are selected as the query set, which is finally used for verification; the superiority of each algorithm is verified by comparing the computed classification accuracies. Testing is carried out in both the 5-way 1-shot and 5-way 5-shot modes.
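A sketch of the N-way K-shot episode construction just described (here defaulting to the 5-way 1-shot setting); the dataset format and the number of query samples per class are illustrative assumptions:

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way: int = 5, k_shot: int = 1, n_query: int = 15):
    """Sketch of meta-learning N-way K-shot sampling. `dataset` is assumed
    to be a list of (image, class_label) pairs; n_query is an assumed choice."""
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)
    classes = random.sample(list(by_class), n_way)   # randomly select N classes
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        picks = random.sample(by_class[cls], k_shot + n_query)
        support += [(img, episode_label) for img in picks[:k_shot]]   # K support samples
        query += [(img, episode_label) for img in picks[k_shot:]]     # remaining query samples
    return support, query
```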
This completes the small sample image classification method based on coordinate attention and BDC measurement.
To verify the beneficial effects of the invention, the inventors conducted simulation experiments using the method of the above embodiment; the experimental conditions are as follows:
1. simulation conditions
Hardware platform: NVIDIA TITAN XP GPUs,
software platform: the operating system Ubuntu 16.04,
software environment: pyCharm 2019.2
CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
2. Simulation content and results
Under the above simulation conditions, the classification accuracy of the model was tested in the 5-way 1-shot and 5-way 5-shot modes on the miniImageNet, Omniglot, CUB-200 and CIIP-TPID datasets. Compared with a prototype network using a 4-layer convolutional backbone, the 1-shot classification results of the method on the Omniglot, miniImageNet and CUB-200 datasets improve by 20.15%, 1.03% and 33.82%, respectively, and on the CIIP-TPID dataset used in the public security field the 1-shot result improves by at least 5.36%, showing stronger generalization.
The embodiment of the invention provides terminal equipment. The terminal device of this embodiment includes: a processor, a memory, and a computer program stored in the memory and executable on the processor. The steps of the various method embodiments described above are implemented when the processor executes the computer program. Alternatively, the processor may implement the functions of the modules/units in the above-described device embodiments when executing the computer program.
The computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to accomplish the present invention.
The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory.
The processor may be a Central Processing Unit (CPU), or another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
The memory may be used to store the computer program and/or module, and the processor may implement various functions of the terminal device by running or executing the computer program and/or module stored in the memory and invoking data stored in the memory.
The modules/units integrated in the terminal device may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment by a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium, and when executed by a processor, the computer program may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file or some intermediate form. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium can be appropriately adjusted according to the requirements of legislation and patent practice in each jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of classifying small sample images based on coordinate attention and BDC metrics, comprising:
collecting images and dividing the collected images to obtain a training set, a validation set and a test set;
constructing a depth model framework;
training the depth model based on the training set, and stopping training when the number of training iterations reaches the maximum threshold and the performance of the depth model on the validation set meets the preset requirement or no longer improves significantly, so as to obtain an optimized depth model;
and performing classification prediction on the test set based on the optimized depth model in the meta-learning N-way K-shot mode, and evaluating the classification accuracy.
2. The method for classifying small sample images based on coordinate attention and BDC metrics according to claim 1, wherein collecting images and dividing the collected images comprises: acquiring the public small sample datasets miniImageNet and Omniglot, the small sample fine-grained dataset CUB-200 and the tire pattern dataset CIIP-TPID used in the public security field; and performing training set, test set and validation set partitioning on the miniImageNet, Omniglot, CUB-200 and CIIP-TPID datasets, respectively.
3. The method for classifying small sample images based on coordinate attention and BDC metrics according to claim 2, wherein the depth model framework comprises: an embedding module, a coordinate attention module, a BDC metric module and a Softmax layer;
The embedding module performs feature extraction on the input training set. The coordinate attention module performs global average pooling along the horizontal and vertical directions respectively, decomposing the input feature map X into one-dimensional features in the horizontal and vertical directions to obtain the horizontal and vertical position information of the input feature map; the horizontal and vertical position information of the input feature map is concatenated and transformed by a 1×1 convolution (the F_1 transformation) to obtain the corresponding attention feature map; after a nonlinear operation, the output is decomposed along the spatial dimension into a horizontal feature map f^h ∈ R^{C/r×H} and a vertical feature map f^w ∈ R^{C/r×W}; the feature maps f^h and f^w pass through 1×1 convolutions, namely the F_h and F_w transformations, and the attention weights are obtained with the Sigmoid activation function; finally, the horizontal and vertical attention weights are multiplied with the input feature map data to obtain the final output features;
The BDC metric module introduces the deep Brownian distance covariance method: the BDC matrices of the support set are calculated and averaged to obtain a prototype representation of each class; the BDC matrix of the query set is then calculated, and its inner product with each class prototype gives the similarity between the query image and each class, from which the class of the image is predicted;
The Softmax layer outputs the classification result of the BDC metric module.
4. The method for classifying small sample images based on coordinate attention and BDC metrics according to claim 3, wherein the input feature map X is decomposed into one-dimensional features in the horizontal and vertical directions, specifically:
z_c^h(h) = (1/W) Σ_{0≤i<W} x_c(h, i) (1)
z_c^w(w) = (1/H) Σ_{0≤j<H} x_c(j, w) (2)
where h denotes the height coordinate, c denotes the c-th channel, W is the width of the input feature map of the current module, x is the decomposed feature vector, w denotes the width coordinate, and H is the height of the input feature map of the current module;
The horizontal and vertical position information of the input feature map is concatenated and transformed by the 1×1 convolution F_1 to obtain the corresponding attention feature map, specifically:
f = δ(F_1([z^h, z^w])), f ∈ R^{C/r×(H+W)} (3)
where f is the encoded intermediate feature map, δ is a nonlinear activation function, [·,·] is the concatenation operation, and r is a hyperparameter controlling the module size;
The horizontal feature map f^h ∈ R^{C/r×H} and the vertical feature map f^w ∈ R^{C/r×W} pass through 1×1 convolutions, namely the F_h and F_w transformations, and the attention weights are obtained using the Sigmoid activation function, specifically:
g^h = σ(F_h(f^h)) (4)
g^w = σ(F_w(f^w)) (5)
where σ denotes the Sigmoid activation function;
The horizontal and vertical attention weights are multiplied with the input feature map data to obtain the final output features, specifically:
y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j) (6)
5. The small sample image classification method based on coordinate attention and BDC metrics according to claim 4, wherein the deep Brownian distance covariance method is introduced into the BDC metric module, and the BDC matrix of each support set image is calculated, specifically:
calculating the squared Euclidean distance matrix of the input feature map;
taking the element-wise square root of the squared Euclidean distance matrix to obtain the Euclidean distance matrix Â;
subtracting from the Euclidean distance matrix Â the mean of each row and the mean of each column and adding back the mean of all elements to obtain the final BDC matrix;
the square Euclidean distance matrix of the input feature map is calculated, and the square Euclidean distance matrix specifically comprises:
wherein the method comprises the steps of The squared Euclidean distance represented as the kth and the first column of matrix X; i represents an identity matrix, ">Is the Hadamard product, defined as (U) sym =(U+U T )/2;1∈R c×c Represented as a matrix with all values of 1;
Taking the element-wise square root of the squared Euclidean distance matrix gives the Euclidean distance matrix Â, specifically:
Â = (â_kl), â_kl = √(ã_kl) (8)
Subtracting from the Euclidean distance matrix Â the mean of each row and the mean of each column and adding back the mean of all elements gives the final BDC matrix A, specifically:
A = (a_kl), a_kl = â_kl − (1/c)Σ_i â_il − (1/c)Σ_j â_kj + (1/c²)Σ_{i,j} â_ij (9)
6. The method for classifying small sample images based on coordinate attention and BDC metrics according to claim 5, wherein the training of the depth model based on the training set is specifically:
Training set data are input into the depth model, the Adam optimization method is adopted to accelerate training of the depth model, the ReLU activation function is used, and the cross-entropy loss function is selected; the loss function is specifically:
L=cross_entropy(softmax(F),Label) (10)
where cross_entropy is the cross-entropy loss function, F denotes the features extracted from the training set by the feature extractor, and Label denotes the ground-truth distribution corresponding to those features.
7. The small sample image classification method based on coordinate attention and BDC metrics according to claim 6, wherein classification prediction is performed on the test set based on the optimized depth model in the meta-learning N-way K-shot mode, and the classification accuracy is evaluated, specifically:
The test set is input into the optimized depth model, and classification prediction is performed on the test set data in the meta-learning N-way K-shot mode: N classes of images are randomly selected from the test set, K samples from each class are selected as the support set for training the model, a small number of the remaining samples of the N classes are selected as the query set, and the query set is used for verification; the superiority of each algorithm is verified by comparing the computed classification accuracies.
8. A small sample image classification system based on coordinate attention and BDC metrics, comprising:
the dividing module is used for collecting images and dividing the collected images to obtain a training set, a validation set and a test set;
the construction module is used for constructing a depth model framework;
the acquisition module is used for training the depth model based on the training set, and stopping training when the number of training iterations reaches the maximum threshold and the performance of the depth model on the validation set meets the preset requirement or no longer improves significantly, so as to obtain the optimized depth model;
and the evaluation module is used for performing classification prediction on the test set based on the optimized depth model in the meta-learning N-way K-shot mode and evaluating the classification accuracy.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1-7.
CN202311076911.8A 2023-08-24 2023-08-24 Small sample image classification method, system, device and medium based on coordinate attention and BDC measurement Pending CN117095222A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311076911.8A CN117095222A (en) 2023-08-24 2023-08-24 Small sample image classification method, system, device and medium based on coordinate attention and BDC measurement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311076911.8A CN117095222A (en) 2023-08-24 2023-08-24 Small sample image classification method, system, device and medium based on coordinate attention and BDC measurement

Publications (1)

Publication Number Publication Date
CN117095222A true CN117095222A (en) 2023-11-21

Family

ID=88769575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311076911.8A Pending CN117095222A (en) 2023-08-24 2023-08-24 Small sample image classification method, system, device and medium based on coordinate attention and BDC measurement

Country Status (1)

Country Link
CN (1) CN117095222A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118038190A (en) * 2024-04-09 2024-05-14 深圳精智达技术股份有限公司 Training method, device and storage medium of deep prototype network



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination