CN112784921A - Task attention guided small sample image complementary learning classification algorithm

Publication number
CN112784921A
CN112784921A CN202110150081.3A CN202110150081A CN112784921A CN 112784921 A CN112784921 A CN 112784921A CN 202110150081 A CN202110150081 A CN 202110150081A CN 112784921 A CN112784921 A CN 112784921A
Authority
CN
China
Prior art keywords
network
branch
feature
images
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110150081.3A
Other languages
Chinese (zh)
Inventor
程塨
李瑞敏
郎春博
韩军伟
郭雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202110150081.3A
Publication of CN112784921A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a task-attention-guided complementary learning algorithm for small-sample image classification. First, a dual-branch multi-part complementary feature learning module is designed that fuses the discriminative features of several salient parts, so that the network explores and uses the entire spatial region of the feature map and obtains more discriminative information. Then, a task-related attention guidance module is introduced: by strengthening or suppressing part of the knowledge provided by the meta-learner and finding representative features related to the current task, it enables the neural network to identify the features most important to the current input categories. Combining the multi-part complementary feature learning module with the task-related attention module allows the complementary features most relevant to the current input categories to be mined, which improves the discriminative ability of the network and achieves high classification accuracy and good robustness with only a small number of training samples.

Description

Task attention guided small sample image complementary learning classification algorithm
Technical Field
The invention belongs to the technical field of image processing and specifically relates to a task-attention-guided complementary learning algorithm for small-sample image classification, which can rapidly classify images of new categories when only a few samples are available.
Background
In recent years, deep learning has achieved significant results in many data-intensive applications, such as object detection, image classification, and semantic segmentation. However, the performance of deep learning depends heavily on the amount of labeled data, and its learning and generalization abilities are weak in the low-data regime. In practice, collecting large amounts of labeled data is difficult, which greatly limits the further development of deep learning. On the one hand, in certain fields such as the military, various restrictions make it hard to obtain many samples. On the other hand, labeling data at scale requires substantial manpower and material resources; in some professional fields the labeling must be done by domain experts, which makes labeling large datasets even harder. Small-sample learning exploits prior knowledge to achieve high classification accuracy on new categories that have only a small amount of labeled data.
Existing small-sample image classification methods can be broadly divided into four categories: model-based, metric-based, optimization-based, and data-augmentation-based algorithms. Model-based methods design the model structure so that parameters can be updated quickly from a few samples and a mapping between input and prediction is established directly; however, the traditional gradient descent algorithm involves many parameters and cannot be optimized quickly. Metric-based methods mainly learn a mapping from images to an embedding space and make that space discriminative; however, because the embedding is task-independent, it is difficult to generalize quickly to new categories with limited training data. Optimization-based methods aim to obtain a better model initialization or gradient descent direction so that the model still generalizes well to new categories with few samples; however, they easily become trapped in local optima because of the limited amount of data. Data-augmentation-based methods generate synthetic data from a small number of labeled samples; however, unrealistic generated data easily introduces noise into the network.
In addition, most small-sample image classification methods are based on shallow feature extraction networks, yet the performance gap between small-sample classification algorithms on a given dataset narrows markedly when the backbone network is deep. Specifically, with a shallow feature extraction network, intra-class variation has a large influence on algorithm performance, but with a deeper backbone network this influence is significantly reduced. Using a deep backbone network for small-sample image classification is therefore a likely future trend. However, deep networks are prone to overfitting, so the feature representation ability of the network must be balanced against the overfitting caused by network depth. First, deep networks tend to recognize an object from its most discriminative local parts rather than from the entire object, which results in incomplete feature representations. Furthermore, in small-sample image classification, meta-learning is a learning problem over a set of tasks, and the meta-learner is typically shared among all tasks. To classify new categories correctly under different tasks, one basic learner must be learned for each task. It is therefore a challenge to make the basic learner more specialized so that it responds to different inputs in a task-dependent manner.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a task-attention-guided complementary learning algorithm for small-sample image classification. First, a dual-branch multi-part complementary feature learning module is designed that fuses the discriminative features of several salient parts, so that the network explores and uses the entire spatial region of the feature map and obtains more discriminative information. Then, a task-related attention guidance module is introduced: by strengthening or suppressing part of the knowledge provided by the meta-learner and finding representative features related to the current task, it enables the neural network to identify the features most important to the current input categories. The invention addresses the overfitting caused by a deep backbone network in two ways. On the one hand, a GAP (global average pooling) layer replaces the FC (fully connected) layer of the VGG network, and the task-related attention guidance module captures task-related feature representations, which greatly reduces the number of network parameters and helps avoid overfitting in small-sample scenarios. On the other hand, the "erase" operation in the multi-part complementary feature learning module is a "Dropout" strategy that targets salient regions: it generates a mask according to a given threshold, thereby deactivating some neurons of the feature map extracted by the backbone network and improving the generalization of the network. The invention can quickly learn new categories from a small number of labeled samples, with high classification accuracy and good generalization.
A task attention-guided small sample image complementary learning classification algorithm is characterized by comprising the following steps:
step 1, data preprocessing: the image dataset C is divided into two subsets, the base classes C_base and the new classes C_novel. The images in C_base are training images with class labels; each category in C_novel has only k labeled images, with k in the range [1, 20]. The images of C_base are preprocessed to obtain preprocessed base-class images. Several groups of images are randomly drawn from C_novel to simulate small-sample conditions; each group is a task, each task contains n categories, and each category contains k labeled images and m unlabeled images. The labeled images are called support images and the unlabeled images are called query images; both are preprocessed to obtain preprocessed images. The preprocessing operation is normalization using the mean and standard deviation. n is in the range [1, 5], k is in the range [1, 20], and m is 15;
step 2, constructing the meta-learner network: the meta-learner network consists of a backbone network f_θ and a head network f_φ. The backbone network f_θ is the first w convolutional layers of the VGG network, with w = 5; the head network f_φ comprises convolution operations with several different kernel sizes, where the first p layers use 3 × 3 kernels and the last q layers use 1 × 1 kernels, with p = 2 and q = 1;
the meta-learner network is trained on all preprocessed images of the base classes C_base to obtain a pre-trained meta-learner network; the loss function of the network is the cross-entropy loss;
step 3, constructing the basic learner network: starting from the pre-trained meta-learner network, the head network f_φ is modified to obtain the basic learner network; the modified head network f_φ mainly comprises a multi-part complementary feature learning module and a task-related attention module;
the multi-part complementary feature learning module consists of a branch A and a branch B connected in sequence, as follows: the output feature F_m of the backbone network f_θ is input to branch A and passed through two convolutional layers with 3 × 3 kernels and one convolutional layer with a 1 × 1 kernel to obtain a feature representation F_ha with n channels; the channel of F_ha with the largest response is the activation map of the target class. A thresholding operation on this activation map yields the feature mask mask_A corresponding to the most salient part of the object; the threshold of the thresholding operation is a predefined parameter in the range [0.5, 0.9]. Then the values of F_m at the positions covered by mask_A are set to zero to obtain a feature map F'_m that no longer contains mask_A; F'_m is input to branch B, which outputs the feature F_hb. Branch B comprises two convolutional layers with 3 × 3 kernels and one convolutional layer with a 1 × 1 kernel;
the task-related attention module is implemented as follows: first, global average pooling compresses the output feature F_m of the backbone network f_θ into a global representation s = [s_1, s_2, ..., s_C] over the C channels, where s_i is the average of the i-th channel, i = 1, 2, ..., C, and C is the number of channels of F_m. Then s is transformed by two fully connected layers in series to obtain the per-channel weights u_a = W_2(W_1(s)), where
W_1 \in \mathbb{R}^{(C/r) \times C}
is the parameter of the first fully connected layer,
W_2 \in \mathbb{R}^{n \times (C/r)}
is the parameter of the second fully connected layer, and r is a predefined parameter. A ReLU activation function follows the first fully connected layer, and the number of output channels of the second fully connected layer equals the number of categories n. A sigmoid function is applied to the per-channel weights, u'_a = σ(u_a), to obtain the normalized weights u'_a, where σ(·) denotes the sigmoid function. The feature map F'_m obtained in the multi-part complementary feature learning module is processed in the same way to obtain its normalized per-channel weights u'_b. Finally, the weights u'_a and u'_b are multiplied with the feature maps F_ha and F_hb obtained in the multi-part complementary feature learning module to give the classification feature maps F'_ha and F'_hb of branch A and branch B:
F'_{ha} = u'_a \odot F_{ha}    (1)
F'_{hb} = u'_b \odot F_{hb}    (2)
step 4, training the basic learner network: first, each preprocessed support image is input into the basic learner network to obtain the classification feature map F'_ha of branch A and the classification feature map F'_hb of branch B. Then F'_ha and F'_hb are each passed through a GAP layer and a softmax layer to obtain the predicted probabilities of the n categories for branch A and for branch B. The classification loss of the basic learner network is computed from the predicted probabilities of the two branches, and the network is updated by gradient descent. The overall classification loss function Loss of the network is:
Loss = Loss_A + \lambda Loss_B    (3)
Loss_A = L(f_\alpha(F_m), y_i)    (4)
Loss_B = L(f_\beta(F_m \odot mask_A), y_i)    (5)
where Loss_A is the classification loss of branch A, Loss_B is the classification loss of branch B, and λ is the weight of branch B, with a value in the range [0.1, 1]; L(·) denotes the cross-entropy loss; f_α(·) and f_β(·) denote feature extraction operations: f_α(·) comprises the two 3 × 3 convolutional layers and the one 1 × 1 convolutional layer of branch A together with the task-related attention module of step 3, and f_β(·) comprises the two 3 × 3 convolutional layers and the one 1 × 1 convolutional layer of branch B together with the task-related attention module of step 3; ⊙ denotes channel-by-channel multiplication; y_i is the label of the i-th input image, i = 1, 2, ..., k;
step 5, verifying the classification effect: first, each preprocessed query image is input into the basic learner network trained in step 4 to obtain the classification feature maps F'_ha and F'_hb. Then F'_ha and F'_hb are fused, the fused feature F_h is passed through a GAP layer and a softmax layer to obtain the predicted probabilities of the n categories for the query image, and the category with the largest predicted probability is taken as the classification result. The fusion compares the values of F'_ha and F'_hb at each position and takes the larger one as the value of the fused feature F_h at that position.
The beneficial effects of the invention are as follows. The prior knowledge learned by the meta-learner network on the base classes is retained, so the basic learner can use it to learn quickly. In the basic learner network, the multi-part complementary feature learning module uses a dual-branch network to extract the discriminative features of complementary parts of the target, which improves classification performance. The task-related attention module gives the network the ability to identify the features most important to the current input categories. Combining the two modules allows the complementary features most relevant to the current input categories to be mined, which improves the discriminative ability of the network. The invention achieves high classification accuracy and good robustness with only a small number of training samples.
Drawings
FIG. 1 is a basic flow chart of the task-attention-guided small sample image complementary learning classification algorithm of the present invention;
FIG. 2 is a basic framework diagram of the meta-learner network of the present invention;
FIG. 3 is a basic framework diagram of the basic learner network during the training phase of the present invention;
FIG. 4 is a basic framework diagram of the basic learner network during the verification phase of the present invention;
FIG. 5 is an example of the database images used by an embodiment of the present invention;
FIG. 6 is a visualization of results classified by the method of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and an embodiment; the invention includes, but is not limited to, this embodiment.
The hardware environment of the embodiment is a computer with an Intel(R) Core(TM) i3-8100 CPU and 8.0 GB of memory; the software environment is Ubuntu 16.04.5 LTS and PyCharm 2017. The public databases miniImageNet and CUB-200 are used. miniImageNet consists of 100 classes with 600 samples each, 60000 samples in total, and each image is 84 × 84 pixels; the 100 classes are divided into 64 base classes, 16 validation classes, and 20 new classes. CUB-200 contains 200 bird species and 11788 images in total; 100, 50, and 50 of the 200 species are randomly selected to form the base classes, validation classes, and new classes respectively. The validation classes of both datasets are not used in this embodiment. To make the results reliable, 500 tasks are randomly drawn from the new classes under each of the 5-way 1-shot and 5-way 5-shot settings to verify the model. Under the 5-way 1-shot setting, each task includes 5 categories, with 1 support image and 15 query images drawn per category. Under the 5-way 5-shot setting, each task includes 5 categories, with 5 support images and 15 query images drawn per category.
As shown in fig. 1, the specific implementation process of the present invention is as follows:
1. Data preprocessing
The image dataset C is divided into two subsets, the base classes C_base and the new classes C_novel. The images in C_base are training images with class labels; each category in C_novel has only k labeled images, with k in the range [1, 20]. The images of C_base are preprocessed to obtain preprocessed base-class images. Several groups of images are randomly drawn from C_novel to simulate small-sample conditions; each group is a task, each task contains n categories, and each category contains k labeled images and 15 unlabeled images. The labeled images are called support images and the unlabeled images are called query images; both are preprocessed to obtain preprocessed images. n is in the range [1, 5] and k is in the range [1, 20]. In this embodiment, n is 5, and k is 1 and 5 in the two task settings respectively.
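The task construction described above can be sketched as follows. This is only an illustrative sketch, not the implementation used in the embodiment: the function name `sample_task`, the dictionary layout of the new-class set, and the toy image identifiers are assumptions introduced here.

```python
import random

def sample_task(images_by_class, n_way=5, k_shot=1, m_query=15, rng=None):
    """Draw one n-way k-shot task: k support and m query images per category.

    `images_by_class` maps a category name to a list of image identifiers
    (a hypothetical layout chosen for this sketch).
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Sampling without replacement keeps support and query images disjoint.
        picked = rng.sample(images_by_class[cls], k_shot + m_query)
        support += [(img, label) for img in picked[:k_shot]]
        query += [(img, label) for img in picked[k_shot:]]
    return support, query

# Toy stand-in for C_novel: 20 categories with 600 image ids each.
novel = {f"class_{c}": [f"class_{c}/img_{i}" for i in range(600)] for c in range(20)}
support, query = sample_task(novel, n_way=5, k_shot=5, rng=random.Random(0))
```

With n = 5, k = 5 and 15 query images per category, one task holds 25 support images and 75 query images.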
The preprocessing operation is normalization using the mean and standard deviation, specifically:
the three RGB channels of each image I are normalized according to the following formula:
I'_c = \frac{I_c - Mean_c}{Std_c}    (6)
where I_c denotes the c-th channel of the image, I'_c denotes the normalized c-th channel, Mean_c denotes the mean of the c-th channel, and Std_c denotes the standard deviation of the c-th channel.
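A minimal sketch of the per-channel normalization follows. The patent does not state the mean and standard deviation values it uses, so the values in the example are placeholders, and `normalize_image` is a hypothetical helper name.

```python
def normalize_image(image, mean, std):
    """Apply I'_c = (I_c - Mean_c) / Std_c to every channel c.

    `image` is a nested list indexed as [channel][row][column].
    """
    return [[[(px - mean[c]) / std[c] for px in row] for row in channel]
            for c, channel in enumerate(image)]

# Placeholder statistics (assumed, not taken from the patent).
mean = [0.5, 0.5, 0.5]
std = [0.25, 0.25, 0.25]
tiny_rgb = [[[0.5, 1.0]], [[0.5, 1.0]], [[0.5, 1.0]]]  # a 3 x 1 x 2 image
normalized = normalize_image(tiny_rgb, mean, std)
```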
2. Building the meta-learner network
As shown in FIG. 2, the meta-learner network consists of a backbone network f_θ and a head network f_φ. The backbone network f_θ is the first w convolutional layers of the VGG network, with w = 5. The head network f_φ comprises convolution operations with several different kernel sizes: the first two layers use 3 × 3 kernels and the last layer uses a 1 × 1 kernel.
The meta-learner network is trained on all preprocessed images of the base classes C_base to obtain a pre-trained meta-learner network; the loss function of the network is the cross-entropy loss.
3. Building the basic learner network
Starting from the pre-trained meta-learner network, the parameters that the backbone network f_θ obtained on the C_base data in step 2 are fixed, and the head network f_φ is modified to obtain the basic learner network. The modified head network f_φ mainly comprises a multi-part complementary feature learning module and a task-related attention module.
The multi-part complementary feature learning module consists of a branch A and a branch B connected in sequence, as follows. The output feature F_m ∈ R^(C×W×H) of the backbone network f_θ is input to branch A, where C is the number of channels of F_m, W is the width of the feature map, and H is its height; in this embodiment C = 512. F_m passes through two convolutional layers with 3 × 3 kernels and one convolutional layer with a 1 × 1 kernel to give a feature representation F_ha with 5 channels. The channel of F_ha with the largest response is the activation map of the target class. In FIG. 3 and FIG. 4, the two convolutional layers with 3 × 3 kernels are labeled "new layer". A thresholding operation on the activation map yields the feature mask mask_A corresponding to the most salient part of the object. The threshold τ of the thresholding operation is a predefined parameter in the range [0.5, 0.9]; in this embodiment, τ is 0.4 when the dataset is miniImageNet and 0.5 when the dataset is CUB. Then the values of F_m at the positions covered by mask_A are set to zero to obtain a feature map F'_m that no longer contains mask_A; F'_m is input to branch B, which outputs the feature F_hb. Branch B comprises two convolutional layers with 3 × 3 kernels and one convolutional layer with a 1 × 1 kernel.
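The "erase" step between branch A and branch B can be sketched as below. Two details are assumptions made for this sketch, since the text does not spell them out: the activation map is min-max scaled to [0, 1] before it is compared with τ, and positions whose scaled activation exceeds τ are treated as the salient part. The helper name `erase_salient` is hypothetical. The returned `keep` map is the complement of the salient region (0 where the salient part lies, 1 elsewhere).

```python
def erase_salient(F_m, activation, tau):
    """Zero out the most salient spatial positions of F_m.

    F_m:        backbone feature, nested list [channel][row][column]
    activation: 2-D activation map of the target class, [row][column]
    tau:        threshold on the min-max scaled activation (assumed scaling)
    """
    lo = min(min(row) for row in activation)
    hi = max(max(row) for row in activation)
    scale = (hi - lo) or 1.0  # guard against a constant activation map
    keep = [[0.0 if (v - lo) / scale > tau else 1.0 for v in row]
            for row in activation]
    erased = [[[v * keep[y][x] for x, v in enumerate(row)]
               for y, row in enumerate(channel)]
              for channel in F_m]
    return keep, erased

activation = [[0.0, 1.0], [0.2, 0.8]]
F_m = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
keep, F_m_erased = erase_salient(F_m, activation, tau=0.5)
```

In this toy case the right-hand column is the salient part, so it is zeroed in every channel and branch B would only see the left-hand column.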
The task-related attention module is implemented as follows. First, global average pooling compresses the output feature F_m of the backbone network f_θ into a global representation s = [s_1, s_2, ..., s_512] over the C channels, where s_i is the average of the i-th channel, i = 1, 2, ..., C, and C is the number of channels of F_m; in this embodiment C = 512. Then s is transformed by two fully connected layers in series to obtain the per-channel weights u_a = W_2(W_1(s)), where
W_1 \in \mathbb{R}^{(C/r) \times C}
is the parameter of the first fully connected layer,
W_2 \in \mathbb{R}^{n \times (C/r)}
is the parameter of the second fully connected layer, and r takes the value 32; in this embodiment
W_1 \in \mathbb{R}^{16 \times 512}, \quad W_2 \in \mathbb{R}^{5 \times 16}
A ReLU activation function follows the first fully connected layer, and the number of output channels of the second fully connected layer equals the number of categories n; in this embodiment n is 5. A sigmoid function is applied to the per-channel weights, u'_a = σ(u_a), to obtain the normalized weights u'_a, where σ(·) denotes the sigmoid function. The feature map F'_m obtained in the multi-part complementary feature learning module is processed in the same way to obtain its normalized per-channel weights u'_b. Finally, the weights u'_a and u'_b are multiplied with the feature maps F_ha and F_hb obtained in the multi-part complementary feature learning module to give the classification feature maps F'_ha and F'_hb of branch A and branch B:
F'_{ha} = u'_a \odot F_{ha}    (7)
F'_{hb} = u'_b \odot F_{hb}    (8)
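The attention computation (global average pooling, two fully connected layers with a ReLU in between, sigmoid, then channel-wise reweighting) can be sketched in plain Python. The sketch assumes bias-free fully connected layers and uses tiny hand-picked weight matrices purely for illustration; `gap`, `linear`, and `task_attention` are hypothetical helper names, and the shapes in the example are far smaller than those of the embodiment.

```python
import math

def gap(feature):
    """Global average pooling: [channel][row][column] -> one value per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]

def linear(W, x):
    """Bias-free fully connected layer (the missing bias is an assumption)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def task_attention(F_m, F_h, W1, W2):
    """Reweight the n channels of F_h with weights derived from F_m."""
    s = gap(F_m)                                         # global representation s
    hidden = [max(0.0, h) for h in linear(W1, s)]        # first FC layer + ReLU
    u = linear(W2, hidden)                               # second FC layer, n outputs
    u_norm = [1.0 / (1.0 + math.exp(-v)) for v in u]     # sigmoid normalization
    return [[[u_norm[c] * v for v in row] for row in ch]
            for c, ch in enumerate(F_h)]

# Toy shapes: C = 2 backbone channels, 1 hidden unit, n = 2 categories.
F_m = [[[1.0, 3.0]], [[2.0, 2.0]]]
F_ha = [[[2.0, 4.0]], [[6.0, 8.0]]]
W1 = [[0.5, -0.5]]
W2 = [[1.0], [2.0]]
F_ha_weighted = task_attention(F_m, F_ha, W1, W2)
```

Here both pooled channels equal 2, the hidden unit is 0, and every sigmoid weight is 0.5, so each channel of F_ha is halved.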
4. Training the basic learner network
As shown in FIG. 3, each preprocessed support image is first input into the basic learner network to obtain the classification feature map F'_ha of branch A and the classification feature map F'_hb of branch B. Then F'_ha and F'_hb are each passed through a GAP layer and a softmax layer to obtain the predicted probabilities of the n categories (n = 5) for branch A and for branch B. The classification loss of the basic learner network is computed from the predicted probabilities of the two branches, and the network is updated by gradient descent. The overall classification loss function Loss of the network is:
Loss = Loss_A + \lambda Loss_B    (9)
Loss_A = L(f_\alpha(F_m), y_i)    (10)
Loss_B = L(f_\beta(F_m \odot mask_A), y_i)    (11)
where Loss_A is the classification loss of branch A, Loss_B is the classification loss of branch B, and λ is the weight of branch B, with a value in the range [0.1, 1]; in this embodiment, λ is 0.5 for the miniImageNet dataset and 0.1 for CUB-200. L(·) denotes the cross-entropy loss. f_α(·) and f_β(·) denote feature extraction operations: f_α(·) comprises the two 3 × 3 convolutional layers and the one 1 × 1 convolutional layer of branch A together with the task-related attention module of step 3, and f_β(·) comprises the two 3 × 3 convolutional layers and the one 1 × 1 convolutional layer of branch B together with the task-related attention module of step 3. ⊙ denotes channel-by-channel multiplication, and y_i is the label of the i-th input image, i = 1, 2, ..., k.
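The two-branch loss can be sketched as follows, taking the per-category scores of each branch after the GAP layer as given. `cross_entropy` and `total_loss` are hypothetical helper names, and the scores in the example are made up for illustration.

```python
import math

def cross_entropy(scores, label):
    """Cross-entropy of softmax(scores) against an integer class label."""
    top = max(scores)  # subtract the maximum for numerical stability
    log_sum = top + math.log(sum(math.exp(z - top) for z in scores))
    return log_sum - scores[label]

def total_loss(scores_a, scores_b, label, lam):
    """Loss = Loss_A + lambda * Loss_B for one support image."""
    return cross_entropy(scores_a, label) + lam * cross_entropy(scores_b, label)

# Illustrative pooled scores of branch A and branch B for a 5-way task.
scores_a = [2.0, 0.1, 0.3, 0.0, 0.2]
scores_b = [0.5, 0.4, 0.1, 0.0, 0.2]
loss = total_loss(scores_a, scores_b, label=0, lam=0.5)
```

A smaller λ reduces the influence of branch B, which only sees the feature map with the most salient part erased.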
5. Verifying the classification effect
The verification process of the basic learner network is shown in FIG. 4. First, each preprocessed query image is input into the basic learner network trained in step 4 to obtain the classification feature maps F'_ha and F'_hb. Then F'_ha and F'_hb are fused, the fused feature F_h is passed through a GAP layer and a softmax layer to obtain the predicted probabilities of the n categories (n = 5) for the query image, and the category with the largest predicted probability is taken as the classification result. The fusion compares the values of F'_ha and F'_hb at each position and takes the larger one as the value of the fused feature F_h at that position.
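The fusion and prediction steps amount to an element-wise maximum followed by global average pooling, softmax, and an argmax. A minimal sketch, with hypothetical helper names `fuse_max` and `classify` and toy two-category feature maps:

```python
import math

def fuse_max(F_a, F_b):
    """Element-wise maximum of two feature maps of identical shape."""
    return [[[max(a, b) for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(F_a, F_b)]

def classify(F_a, F_b):
    """Fuse both branches, pool each category channel, apply softmax, pick the argmax."""
    fused = fuse_max(F_a, F_b)
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fused]
    top = max(pooled)
    exps = [math.exp(v - top) for v in pooled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs

# Toy case with n = 2 category channels of size 1 x 2.
F_ha = [[[1.0, 0.0]], [[0.0, 0.0]]]
F_hb = [[[0.0, 0.0]], [[0.0, 3.0]]]
predicted, probs = classify(F_ha, F_hb)
```

In this example branch B contributes the strong response for the second category, which branch A alone would have missed.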
The classification accuracy (Accuracy) is used to evaluate the effectiveness of the method. Accuracy is the percentage of correctly classified samples among all samples; in general, the larger its value, the better the algorithm performs. It is calculated as follows:
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (12)
The relationship among TP, TN, FP and FN is shown in Table 1.
TABLE 1
Actual class | Predicted positive | Predicted negative
Positive | TP (true positive) | FN (false negative)
Negative | FP (false positive) | TN (true negative)
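As a small worked example of the accuracy formula, the sketch below computes it from the four confusion-matrix counts; the counts are made up for illustration and `accuracy` is a hypothetical helper name.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts only: 85 of 100 samples are classified correctly.
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
```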
The classification results of the method of the invention are compared with those of a baseline method on the miniImageNet dataset in Table 2; the classification accuracy shows the effectiveness of the method. Unlike the method of the invention, the baseline model contains neither the multi-part complementary feature learning module nor the task-related attention guidance module. Specifically, the baseline model consists of the first 5 convolution blocks of VGG16 followed by three convolutional layers: the first two have 512 kernels of size 3 × 3 with stride 1, and the last has 5 kernels of size 1 × 1 with stride 1.
TABLE 2
Model 1-shot 5-shot
Baseline 56.75%±0.89% 77.22%±0.66%
Ours 59.31%±0.99% 79.21%±0.64%
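As a worked illustration of the baseline head described before Table 2 (two 3 × 3 convolutional layers with 512 kernels and one 1 × 1 convolutional layer with 5 kernels), the sketch below counts its trainable parameters. It assumes that the head receives 512 input channels, as output by the fifth convolution block of VGG16, and that every layer has a bias term; the bias term is an assumption, since the text does not mention it.

```python
def conv_params(kernel_size, in_channels, out_channels, bias=True):
    """Trainable parameters of one 2-D convolutional layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# (kernel size, input channels, output channels) for the three head layers.
# 512 input channels and the bias terms are assumptions, see the text above.
baseline_head = [(3, 512, 512), (3, 512, 512), (1, 512, 5)]
per_layer = [conv_params(k, c_in, c_out) for k, c_in, c_out in baseline_head]
total = sum(per_layer)
```

Under these assumptions each 3 × 3 layer holds 2,359,808 parameters and the final 1 × 1 layer holds 2,565, so almost all of the head's capacity sits in the two 3 × 3 layers.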
The classification results of the method of the invention are also compared with those of the baseline method on the CUB dataset. The comparison is shown in Table 3, and the classification accuracy again demonstrates the effectiveness of the method of the invention. Fig. 5 shows example images from the CUB dataset, and the visualization results on the CUB dataset in Fig. 6 illustrate the classification performance of the method of the invention.
TABLE 3
Model 1-shot 5-shot
Baseline 74.81%±0.88% 92.61%±0.35%
Ours 77.30%±0.86% 94.20%±0.34%

Claims (1)

1. A task attention guided small sample image complementary learning classification algorithm, characterized by comprising the following steps:
step 1, data preprocessing: dividing an image dataset C into two subsets, a base class set $C_{base}$ and a new class set $C_{novel}$, where the images in $C_{base}$ are training images with class labels and each category in $C_{novel}$ has only k labeled images, k taking values in [1, 20]; performing a preprocessing operation on the images in $C_{base}$ to obtain preprocessed base class images; randomly extracting several groups of images from $C_{novel}$ to simulate small sample conditions, where each group of images is a task, each task contains n categories, and each category contains k labeled images and m unlabeled images, the labeled images being denoted support images and the unlabeled images being denoted query images; preprocessing the support images and the query images respectively to obtain preprocessed images; the preprocessing operation is normalization using a mean and a standard deviation; n takes values in [1, 5], k takes values in [1, 20], and m is 15;
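The task construction of step 1 can be sketched as below: n categories are drawn, and for each category k support images and m query images are drawn without overlap, followed by mean and standard deviation normalization. The function names, the dictionary layout of the dataset, and the relabeling of categories as 0 to n-1 inside a task are assumptions for illustration; query labels are kept here only so that accuracy can be measured afterwards.

```python
import random


def sample_task(images_by_class, n=5, k=1, m=15, rng=None):
    """Draw one n-way k-shot task with m query images per category."""
    rng = rng or random.Random()
    classes = rng.sample(sorted(images_by_class), n)
    support, query = [], []
    for label, cls in enumerate(classes):
        picked = rng.sample(images_by_class[cls], k + m)  # no image is reused
        support += [(img, label) for img in picked[:k]]
        query += [(img, label) for img in picked[k:]]
    return support, query


def normalize(pixels, mean, std):
    """Normalize pixel values with a given mean and standard deviation."""
    return [(p - mean) / std for p in pixels]


# Toy dataset: 6 hypothetical novel categories with 20 image identifiers each.
toy = {c: [f"{c}_{i}" for i in range(20)] for c in "abcdef"}
support, query = sample_task(toy, n=5, k=1, m=15, rng=random.Random(0))
```

With n = 5, k = 1 and m = 15 each task holds 5 support images and 75 query images.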
step 2, constructing a meta-learner network: the meta-learner network consists of a backbone network $f_\theta$ and a head network $f_\phi$; the backbone network $f_\theta$ is the first w convolutional layers of the VGG network, where w is 5; the head network $f_\phi$ comprises several convolution operations with different convolution kernels, where the first p layers have a convolution kernel size of 3 × 3 and the last q layers have a convolution kernel size of 1 × 1, with p = 2 and q = 1;
training the meta-learner network with all preprocessed images of the base class $C_{base}$ to obtain a pre-trained meta-learner network, where the loss function of the network is a cross-entropy loss function;
step 3, constructing a basic learner network: modifying the head network $f_\phi$ of the pre-trained meta-learner network to obtain the basic learner network, where the modified head network $f_\phi$ mainly comprises a multi-part complementary feature learning module and a task-related attention module;
the multi-part complementary feature learning module consists of a branch A and a branch B connected in sequence, specifically: the output feature $F_m$ of the backbone network $f_\theta$ is input into branch A and passed through two convolutional layers with a convolution kernel size of 3 × 3 and one convolutional layer with a convolution kernel size of 1 × 1 to obtain a feature representation $F_{ha}$ with n channels; the feature dimension of $F_{ha}$ with the maximum response is taken as the activation map of the target category, and a thresholding operation is applied to this activation map to obtain the feature mask $mask_A$ corresponding to the most salient part of the object, the threshold of the thresholding operation being a predefined parameter with a value range of [0.5, 0.9]; then the values of $F_m$ at the positions given by $mask_A$ are set to zero to obtain a feature map $F'_m$ that does not contain $mask_A$, and $F'_m$ is input into branch B, which outputs the feature $F_{hb}$; branch B comprises two convolutional layers with a convolution kernel size of 3 × 3 and one convolutional layer with a convolution kernel size of 1 × 1;
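The thresholding and erasing operations of the multi-part complementary feature learning module can be sketched as below. Two details are assumptions, since the text does not fix them: the activation map is rescaled to [0, 1] by min-max normalization before it is compared with the threshold, and the activation map has the same spatial size as $F_m$. The function names are hypothetical, and choosing which channel of $F_{ha}$ serves as the activation map is left to the caller.

```python
def salient_mask(activation, threshold=0.7):
    """Mark positions whose min-max normalized response exceeds the threshold."""
    flat = [v for row in activation for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant map
    return [[1 if (v - lo) / span > threshold else 0 for v in row] for row in activation]


def erase(feature, mask):
    """Zero every channel of a [C][H][W] feature at the masked positions."""
    return [
        [[0.0 if m else v for v, m in zip(row, mask_row)] for row, mask_row in zip(ch, mask)]
        for ch in feature
    ]


# Toy values: a 2 x 2 activation map and a backbone feature with C = 2 channels.
activation = [[0.1, 0.9], [0.2, 0.4]]
f_m = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
mask_a = salient_mask(activation, threshold=0.7)
f_m_erased = erase(f_m, mask_a)
```

Because the strongest response is removed from the input of branch B, branch B has to rely on other parts of the object, which is what makes the two branches complementary.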
the task-related attention module is implemented as follows: first, the output feature $F_m$ of the backbone network $f_\theta$ is compressed by global average pooling to obtain the global representation feature $s = [s_1, s_2, \ldots, s_C]$ of the C channels, where $s_i$ denotes the average of the i-th channel feature, i = 1, 2, …, C, and C denotes the number of channels of $F_m$; then, the feature s is transformed by two fully connected layers connected in series to obtain the weight of each channel, $u_a = W_2(W_1(s))$, where $W_1$ is the parameter of the first fully connected layer and $W_2$ is the parameter of the second fully connected layer [definitions of $W_1$ and $W_2$, which involve a parameter r: equation images FDA0002929875030000021 and FDA0002929875030000022]; a ReLU activation function is added after the first fully connected layer, and the number of output channels of the second fully connected layer equals the number of categories n; a sigmoid function is applied to the weight of each channel, $u'_a = \sigma(u_a)$, to obtain the normalized weight $u'_a$, where $\sigma(\cdot)$ denotes the sigmoid function; meanwhile, the feature map $F'_m$ obtained from the multi-part complementary feature learning module is processed in the same way to obtain the normalized weight $u'_b$ of each channel corresponding to $F'_m$; finally, the weights $u'_a$ and $u'_b$ are multiplied with the feature maps $F_{ha}$ and $F_{hb}$ obtained in the multi-part complementary feature learning module, respectively, to obtain the classification feature maps $F'_{ha}$ and $F'_{hb}$ of branch A and branch B:
$$F'_{ha} = u'_a \odot F_{ha} \quad (1)$$
$$F'_{hb} = u'_b \odot F_{hb} \quad (2)$$
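The task-related attention module described above (global average pooling, two fully connected layers with a ReLU in between, a sigmoid, and a channel-by-channel multiplication) can be sketched as follows. The weight matrices, the hidden size of 1, and the toy values are made up for illustration, and bias terms are omitted because the expression $W_2(W_1(s))$ shows none; this is a sketch, not the trained module.

```python
import math


def gap(feature):
    """Global average pooling of a [C][H][W] feature to C channel means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]


def linear(weights, x):
    """Bias-free fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def task_attention(f_m, w1, w2):
    """Normalized weights sigmoid(W2(ReLU(W1(s)))), one per category."""
    hidden = [max(0.0, v) for v in linear(w1, gap(f_m))]
    return [1.0 / (1.0 + math.exp(-v)) for v in linear(w2, hidden)]


def reweight(f_h, weights):
    """Multiply channel c of the [n][H][W] feature by weights[c]."""
    return [[[w * v for v in row] for row in ch] for ch, w in zip(f_h, weights)]


# Toy values: C = 2 backbone channels, hidden size 1, n = 2 categories.
f_m = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[1.0, 1.0]]    # maps C = 2 values to 1 hidden value
w2 = [[0.0], [1.0]]  # maps 1 hidden value to n = 2 weights
f_ha = [[[2.0, 4.0], [6.0, 8.0]], [[1.0, 1.0], [1.0, 1.0]]]
u_a = task_attention(f_m, w1, w2)
f_ha_weighted = reweight(f_ha, u_a)
```

The same functions would be applied to the erased feature map to obtain the weights for branch B.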
step 4, training the basic learner network: first, each preprocessed support image is input into the basic learner network to obtain the classification feature map $F'_{ha}$ of branch A and the classification feature map $F'_{hb}$ of branch B; then $F'_{ha}$ and $F'_{hb}$ are each input into a GAP layer followed by a softmax layer to obtain the prediction probabilities of branch A and branch B; the classification loss of the basic learner network is calculated from the prediction probabilities of the two branches, and the basic learner network is updated by gradient descent, where the overall classification loss function Loss of the network is:
$$Loss = Loss_A + \lambda\, Loss_B \quad (3)$$
$$Loss_A = L(f_\alpha(F_m), y_i) \quad (4)$$
$$Loss_B = L(f_\beta(F_m \odot mask_A), y_i) \quad (5)$$
where $Loss_A$ denotes the classification loss of branch A, $Loss_B$ denotes the classification loss of branch B, and $\lambda$ denotes the weight of branch B, with a value range of [0.1, 1]; $L(\cdot)$ denotes the cross-entropy loss; $f_\alpha(\cdot)$ and $f_\beta(\cdot)$ denote feature extraction operations, $f_\alpha(\cdot)$ comprising the two convolutional layers with a convolution kernel size of 3 × 3 and the one convolutional layer with a convolution kernel size of 1 × 1 of branch A together with the task-related attention module of step 3, and $f_\beta(\cdot)$ comprising the two convolutional layers with a convolution kernel size of 3 × 3 and the one convolutional layer with a convolution kernel size of 1 × 1 of branch B together with the task-related attention module of step 3; $\odot$ denotes channel-by-channel multiplication; $y_i$ denotes the label of the i-th input image, i = 1, 2, …, k;
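The overall loss of step 4 can be sketched as below, taking the softmax probabilities of the two branches as given. Averaging the cross-entropy over the support images is an assumption, since the text does not say how the per-image losses are aggregated; the function names and the default value of the weight are likewise chosen only for illustration.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy of one prediction: negative log-probability of the true label."""
    return -math.log(probs[label])


def total_loss(probs_a, probs_b, labels, lam=0.5):
    """Loss = Loss_A + lam * Loss_B, each averaged over the support images."""
    count = len(labels)
    loss_a = sum(cross_entropy(p, y) for p, y in zip(probs_a, labels)) / count
    loss_b = sum(cross_entropy(p, y) for p, y in zip(probs_b, labels)) / count
    return loss_a + lam * loss_b


# Toy values: one support image with true label 0 and two categories.
loss = total_loss([[0.5, 0.5]], [[0.25, 0.75]], labels=[0], lam=0.5)
```

In this toy case branch A contributes ln 2 and branch B contributes 0.5 × ln 4, so the total is 2 ln 2.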
step 5, verifying the classification effect: first, each preprocessed query image is input into the basic learner network trained in step 4 to obtain the classification feature maps $F'_{ha}$ and $F'_{hb}$; then the features $F'_{ha}$ and $F'_{hb}$ are fused, the fused feature $F_h$ is input into a GAP layer, the prediction probabilities of the n categories of the query image are obtained through a softmax layer, and the category with the maximum prediction probability is taken as the classification result; the fusion means comparing the values of $F'_{ha}$ and $F'_{hb}$ at each position and taking the larger value as the value of the fused feature $F_h$ at that position.
CN202110150081.3A 2021-02-02 2021-02-02 Task attention guided small sample image complementary learning classification algorithm Pending CN112784921A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110150081.3A CN112784921A (en) 2021-02-02 2021-02-02 Task attention guided small sample image complementary learning classification algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110150081.3A CN112784921A (en) 2021-02-02 2021-02-02 Task attention guided small sample image complementary learning classification algorithm

Publications (1)

Publication Number Publication Date
CN112784921A true CN112784921A (en) 2021-05-11

Family

ID=75760722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110150081.3A Pending CN112784921A (en) 2021-02-02 2021-02-02 Task attention guided small sample image complementary learning classification algorithm

Country Status (1)

Country Link
CN (1) CN112784921A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505861A (en) * 2021-09-07 2021-10-15 广东众聚人工智能科技有限公司 Image classification method and system based on meta-learning and memory network
CN114580571A (en) * 2022-04-01 2022-06-03 南通大学 Small sample power equipment image classification method based on migration mutual learning
CN114926702A (en) * 2022-04-16 2022-08-19 西北工业大学深圳研究院 Small sample image classification method based on depth attention measurement
CN114937199A (en) * 2022-07-22 2022-08-23 山东省凯麟环保设备股份有限公司 Garbage classification method and system based on discriminant feature enhancement

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505861A (en) * 2021-09-07 2021-10-15 广东众聚人工智能科技有限公司 Image classification method and system based on meta-learning and memory network
CN114580571A (en) * 2022-04-01 2022-06-03 南通大学 Small sample power equipment image classification method based on migration mutual learning
CN114580571B (en) * 2022-04-01 2023-05-23 南通大学 Small sample power equipment image classification method based on migration mutual learning
CN114926702A (en) * 2022-04-16 2022-08-19 西北工业大学深圳研究院 Small sample image classification method based on depth attention measurement
CN114926702B (en) * 2022-04-16 2024-03-19 西北工业大学深圳研究院 Small sample image classification method based on depth attention measurement
CN114937199A (en) * 2022-07-22 2022-08-23 山东省凯麟环保设备股份有限公司 Garbage classification method and system based on discriminant feature enhancement

Similar Documents

Publication Publication Date Title
WO2021134871A1 (en) Forensics method for synthesized face image based on local binary pattern and deep learning
CN112308158B (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN110532900B (en) Facial expression recognition method based on U-Net and LS-CNN
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN110413924B (en) Webpage classification method for semi-supervised multi-view learning
CN112784921A (en) Task attention guided small sample image complementary learning classification algorithm
CN109063649B (en) Pedestrian re-identification method based on twin pedestrian alignment residual error network
CN111738143B (en) Pedestrian re-identification method based on expectation maximization
CN113011357B (en) Depth fake face video positioning method based on space-time fusion
Badawi et al. A hybrid memetic algorithm (genetic algorithm and great deluge local search) with back-propagation classifier for fish recognition
CN112308115B (en) Multi-label image deep learning classification method and equipment
CN112819023A (en) Sample set acquisition method and device, computer equipment and storage medium
CN115410026A (en) Image classification method and system based on label propagation contrast semi-supervised learning
CN113688894B (en) Fine granularity image classification method integrating multiple granularity features
CN114038037B (en) Expression label correction and identification method based on separable residual error attention network
CN112364974B (en) YOLOv3 algorithm based on activation function improvement
CN112381248A (en) Power distribution network fault diagnosis method based on deep feature clustering and LSTM
CN112232395B (en) Semi-supervised image classification method for generating countermeasure network based on joint training
CN113920472A (en) Unsupervised target re-identification method and system based on attention mechanism
Vora et al. Iterative spectral clustering for unsupervised object localization
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN114897085A (en) Clustering method based on closed subgraph link prediction and computer equipment
CN114625908A (en) Text expression package emotion analysis method and system based on multi-channel attention mechanism
CN114417975A (en) Data classification method and system based on deep PU learning and class prior estimation
CN117516937A (en) Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination