CN112418351A - Zero sample learning image classification method based on global and local context sensing - Google Patents

Zero sample learning image classification method based on global and local context sensing

Info

Publication number
CN112418351A
CN112418351A
Authority
CN
China
Prior art keywords
global
feature
local
feature map
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011460544.8A
Other languages
Chinese (zh)
Other versions
CN112418351B (en)
Inventor
王国威
陶文源
管乃洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202011460544.8A priority Critical patent/CN112418351B/en
Publication of CN112418351A publication Critical patent/CN112418351A/en
Application granted granted Critical
Publication of CN112418351B publication Critical patent/CN112418351B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a zero-sample learning image classification method based on global and local context sensing, which comprises the following steps: performing feature extraction on an image with a deep neural network to obtain multiple layers of feature maps; computing on a feature map of any layer with global attention to obtain a feature map containing global information; computing on the feature map of the same layer with local attention to obtain a feature vector representing local information; passing the last-layer global feature map through a fully connected layer to obtain a global feature vector; adding the multiple groups of local feature vectors element by element to obtain a complete local feature vector; concatenating the complete local feature vector with the global feature vector, projecting the result into a semantic space and a hidden (implicit) feature space, and optimizing the parameters with a softmax loss and a triplet loss respectively; and repeating the above steps for a number of training epochs to obtain a zero-sample learning model with strong representation capability, and classifying images with the trained zero-sample learning model.

Description

Zero sample learning image classification method based on global and local context sensing
Technical Field
The invention relates to the field of image classification, in particular to a zero sample learning image classification method based on global and local context sensing.
Background
Deep learning techniques have developed rapidly, and related applications have been deployed in many fields (computer vision, natural language processing, etc.), because deep learning can use massive data for model training and thereby obtain powerful recognition capability. However, the training samples may not cover all classes. In particular, existing data inherently follow a long-tail distribution, which means that only a few common classes can provide a large number of samples, while most uncommon classes can only supply a very limited number of samples. Reflected in deep learning, this means that a deep learning model can reach the desired recognition accuracy for common classes, for which training samples are abundant, but for uncommon classes its recognition capability is fundamentally worse; for classes for which no training samples have been collected at all, the recognition capability is zero. In real applications, however, a model not only needs to obtain strong recognition capability from the collected data, but also needs to be able to recognize brand-new categories for which no training samples exist. New categories, such as new species and new models of electronic devices, appear in the world every day; being able to recognize unseen categories is therefore a key turning point in the development of deep learning systems, and this task of recognizing unseen categories can be addressed by zero-sample learning.
Zero-sample learning is a deep learning technique that mimics the recognition ability of the human brain. Lampert notes that humans can recognize roughly 30,000 fundamental classes, as well as fine-grained subclasses of these classes. Besides recognizing seen categories and using this knowledge to recognize fine-grained subcategories, humans can also recognize entirely new categories or concepts; for example, a person who has never seen a zebra can accurately recognize one from the description "looks similar to a horse, with black and white stripes".
In the zero-sample learning image classification task, the model can only be trained with images from known classes, yet it must recognize the classes to which images from unknown classes belong. This is possible because a high-level semantic description of object characteristics, such as attributes, is used, and the unknown classes are linked to the known classes by assuming that both share the same set of attributes. Generally, zero-sample learning proceeds as follows: in the training phase, the model learns a visual-semantic mapping; in the inference phase, an image of an unknown class is first converted into a semantic vector using the learned mapping, the semantic vector is then compared with the ground-truth attribute vectors of the unknown classes, and the closest class is selected as the prediction result.
Existing zero-sample learning algorithms can be divided into two categories, depending on whether new training data are generated during the training phase: generative (model-based) algorithms and compatibility-based algorithms. The first kind generates images of unknown classes from their semantic descriptions and trains them together with the existing images of known classes in the conventional deep-learning manner. However, existing methods of this kind have several shortcomings: the generated unknown-class images cannot reproduce details well, the generated unknown-class features lack interpretability, and such methods ignore the importance of information-rich visual regions in the image. The second kind of methods directly uses semantic knowledge to learn a visual-semantic mapping by aligning the visual space with the semantic space. Most compatibility-based models focus on how to mine the discriminative local information of the object itself and how to better align the two spaces, but they ignore the positive contribution of global information to the zero-sample learning task.
Disclosure of Invention
The invention provides a zero sample learning image classification method based on global and local context sensing, which considers global features and local features at the same time, enhances the learned mapping expression capability, and further improves the performance of a zero sample learning model, as described in detail in the following:
A zero-sample learning image classification method based on global and local context sensing comprises the following steps:
performing feature extraction on an image with a deep neural network to obtain multiple layers of feature maps;
computing on a feature map of any layer with global attention to obtain a feature map containing global information; computing on the feature map of the same layer with local attention to obtain a feature vector representing local information;
passing the last-layer global feature map through a fully connected layer to obtain a global feature vector; adding the multiple groups of local feature vectors element by element to obtain a complete local feature vector;
concatenating the complete local feature vector with the global feature vector, projecting the result into a semantic space and a hidden (implicit) feature space, and optimizing the parameters with a softmax loss and a triplet loss respectively;
and repeating the above steps for a number of training epochs to obtain a zero-sample learning model with strong representation capability, and classifying images with the trained zero-sample learning model.
The computing on a feature map of any layer with global attention to obtain the feature map containing global information specifically comprises:
obtaining a spatial self-attention module weight matrix, and weighting the re-dimensioned value feature V_r with the obtained weight matrix to obtain the weighted feature X_w;
adding the re-dimensioned feature map X_r to the weighted feature X_w through a residual connection to obtain X_g;
re-dimensioning the obtained X_g to the same size as the original feature map, X_g ∈ R^(C×H×W);
and inputting X_g as the new feature map into the next layer of the neural network; the same operation is applied at multiple layers of feature maps, so that the global context information is transferred to the last layer.
Further, the spatial self-attention module weight matrix is specifically:

A = softmax_col(Q_r^T K_r) ∈ R^(L×L)

wherein R denotes the dimension information of a variable, softmax_col computes the softmax score of the matrix column by column, Q_r^T is the transpose of the re-dimensioned query feature, K_r is the re-dimensioned key feature, T denotes transposition, and L = H × W is the product of the length and width of the feature map.

The weighted value X_w is:

X_w = α · V_r · A,  with X_w, V_r ∈ R^(C×L)

wherein α is a balance factor, C is the number of channels of the feature map, and V_r is the re-dimensioned value feature.
Further, the computing on the feature map of the same layer with local attention to obtain the feature vector representing the local information specifically comprises:
computing a sampling transform with a spatial transformer and applying it to the original feature map through matrix multiplication to obtain the corresponding regions R_s, and extracting features IR from each region R_s with an Inception network;
processing the extracted features with global maximum pooling and global average pooling; summing the IR_l obtained from the multiple regions element by element to obtain the feature that finally represents the local regions; and learning the visual-semantic mapping and the visual-hidden mapping respectively, and concatenating them.
The technical scheme provided by the invention has the following beneficial effects:
1. By training directly on the original image samples, the method makes the model better adapted to the zero-sample learning classification task;
2. The method uses a global attention module to extract global context information from the original feature maps and generate feature maps containing global information, so that the global features extracted by the model have strong expressive power, enhancing the model's global understanding of the object;
3. The method uses a local attention module to extract local context information from the original feature maps to obtain local feature vectors; the same steps are applied to several feature maps, and the local feature vectors are finally summed to obtain the complete local feature vector, enhancing the model's local understanding of the object;
4. The method obtains a complete feature representation by feature concatenation, taking both global and local information into account, which greatly improves the representation capability and hence the accuracy of the model;
5. The method projects the image features into the semantic space and the hidden space simultaneously, and uses a softmax loss and a triplet loss respectively to optimize and update the parameters.
Drawings
FIG. 1 is a flow chart of a zero sample learning image classification method based on global and local context sensing;
FIG. 2 is a schematic diagram of a global attention module;
FIG. 3 is a schematic diagram of a spatial transformer;
FIG. 4 is a schematic diagram of an Inception network.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
A zero sample learning image classification method based on global and local context sensing, referring to FIG. 1, the method comprises the following steps:
101: carrying out feature extraction on the image by using a deep neural network to obtain a multilayer feature map;
102: calculating any layer of feature map by using global attention to obtain a feature map containing global information;
103: calculating the characteristic diagram of the same layer by using local attention to obtain a characteristic vector representing local information;
104: repeating the operations of steps 102 and 103 for multiple layers to obtain a plurality of global feature maps and local feature vectors;
105: obtaining a global feature vector from the last layer of global feature map through a full connection layer; performing element-by-element addition on the multiple groups of local feature vectors to obtain complete local feature vectors;
106: splicing the complete local feature vector and the global feature vector, projecting the complete local feature vector and the global feature vector to a semantic (attribute) space and an implicit feature space simultaneously, and performing parameter optimization by respectively adopting softmax loss and triple loss;
107: and repeating the steps, setting a plurality of periods for training, finally obtaining a zero sample learning model with strong representation capability, and classifying the images through the trained zero sample learning model.
In summary, in the embodiment of the present invention, a deep neural network extracts feature maps from the image; global attention is computed on these feature maps to obtain new feature maps containing global information, and local attention is computed on each feature map to obtain local features; the computation is carried out on several groups of feature maps, the features are finally fused, and the fused features are projected into a semantic (attribute) space and a hidden feature space simultaneously. In this way the learned features are enhanced, the expressive power of the learned mapping is improved, and the classification accuracy of the model is increased.
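To make the overall flow of steps 101 to 107 concrete, the following is a minimal training-loop sketch, not the inventors' reference implementation: it assumes the data loader already yields image triplets (anchor, positive, negative) together with the anchor's seen-class label, and that the model internally performs steps 101 to 106 and returns the combined softmax-plus-triplet loss described in step 106.

```python
import torch

def train_zero_shot(model, loader, optimizer, epochs: int = 30):
    """Schematic training loop for steps 101-107 (assumed data format: triplets + labels)."""
    model.train()
    for epoch in range(epochs):                       # step 107: several training epochs
        for anchor, positive, negative, label in loader:
            optimizer.zero_grad()
            # steps 101-106 happen inside the model: feature extraction, global/local
            # attention, feature fusion, projection to semantic and hidden spaces, losses
            loss = model(anchor, positive, negative, label)
            loss.backward()
            optimizer.step()                          # step 106: parameter update
        print(f"epoch {epoch}: last batch loss {loss.item():.4f}")

# usage (illustrative): optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
#                       train_zero_shot(model, loader, optimizer, epochs=30)
```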
Example 2
The scheme of example 1 is further described below with reference to specific calculation formulas and examples, which are described in detail below:
First, the basic setup is introduced:
The training set D_s = {(x_i^s, y_i^s)} contains N_s samples, where x_i^s denotes the i-th image of a known (seen) class and y_i^s ∈ Y_s is its corresponding class label. The test set D_u = {(x_j^u, y_j^u)} contains N_u samples, where x_j^u denotes the j-th image of an unknown (unseen) class and y_j^u ∈ Y_u is its corresponding class label. The semantic features of the known and unknown classes can be represented as {φ_y}_{y∈Y_s} and {φ_y}_{y∈Y_u}, respectively. The known classes and the unknown classes are disjoint: Y_s ∩ Y_u = ∅ and Y_s ∪ Y_u = Y. φ(x) = θ(x)^T W denotes the projection of the visual features into the semantic space, where θ(x) is the visual feature extracted by the deep neural network, W is a transformation matrix and T denotes transposition. σ(x) denotes the projection of the visual features into the hidden space.
In zero-sample learning, the training phase can only use known class images and semantic features (attributes), and the model needs to obtain the capability of predicting unknown classes by learning visual-semantic mapping or visual-implicit feature mapping.
1. Global context information extraction
The convolutional layer is an important component of the deep neural network, but it is limited by the size of the convolution kernel, so the features it extracts inevitably contain only local information. For computer vision tasks such as image classification, image segmentation and object detection, however, extracting more global features is key to improving the representation capability of the model. If global information can be introduced into some layers, the limitation imposed by the convolution kernel size can be relieved and the performance of the deep neural network improved. The key is therefore being able to extract global information from the image.
The global self-attention module was first used in natural language processing tasks and has subsequently been widely used in computer vision tasks. Specifically, global self-attention is computed as follows:

For an input feature map X ∈ R^(C×H×W), a set of convolution operations with kernel size 1×1 is first applied to generate a query feature Q, a key feature K and a value feature V, where Q, K ∈ R^(C'×H×W), V ∈ R^(C×H×W), and C' denotes the reduced number of feature map channels. Here R denotes the dimension information of a variable, C the number of channels of the feature map, H its length, W its width, and L = H × W.

Q, K, V and the original feature map X are then re-dimensioned (reshaped) to obtain Q_r, K_r ∈ R^(C'×L) and V_r, X_r ∈ R^(C×L). The spatial self-attention module weight matrix can then be expressed as:

A = softmax_col(Q_r^T K_r) ∈ R^(L×L)    (1)

where softmax_col computes the softmax score column by column and T denotes transposition. The obtained weight matrix is then used to weight the value feature V_r:

X_w = α · V_r · A    (2)

wherein α is a balance factor.

To prevent the loss of original information, a residual connection is adopted: the re-dimensioned original feature X_r is added to the weighted feature X_w, obtaining:

X_g = X_w + X_r    (3)

Finally, the obtained X_g is re-dimensioned back to the same size as the original feature map, X_g ∈ R^(C×H×W), and fed into the next layer of the neural network as the new feature map. By applying the same operation at multiple layers of feature maps, the global context information is propagated to the last layer.
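As a concrete illustration of equations (1) to (3), the following PyTorch sketch implements the spatial self-attention step described above. It is a minimal reading of the text, not the inventors' reference code; the module name, the channel-reduction ratio for C', and treating the balance factor α as a learnable scalar are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalSpatialAttention(nn.Module):
    """Spatial self-attention following eqs. (1)-(3):
    A = softmax_col(Q_r^T K_r), X_w = alpha * V_r A, X_g = X_w + X_r."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        reduced = max(channels // reduction, 1)          # C' < C (assumed ratio)
        self.query = nn.Conv2d(channels, reduced, kernel_size=1)
        self.key = nn.Conv2d(channels, reduced, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.alpha = nn.Parameter(torch.zeros(1))        # balance factor (assumed learnable)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        l = h * w
        q = self.query(x).view(b, -1, l)                 # (B, C', L)
        k = self.key(x).view(b, -1, l)                   # (B, C', L)
        v = self.value(x).view(b, c, l)                  # (B, C,  L)
        x_r = x.view(b, c, l)                            # re-dimensioned input feature
        # eq. (1): attention weights, softmax over each column of Q_r^T K_r
        attn = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=1)   # (B, L, L)
        # eq. (2): weighted value feature
        x_w = self.alpha * torch.bmm(v, attn)            # (B, C, L)
        # eq. (3): residual connection with the re-dimensioned input
        x_g = x_w + x_r
        return x_g.view(b, c, h, w)                      # back to (B, C, H, W)

# usage: replace an intermediate feature map with its globally attended version
feat = torch.randn(2, 256, 14, 14)
print(GlobalSpatialAttention(256)(feat).shape)           # torch.Size([2, 256, 14, 14])
```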
2. Local context information extraction
The local attention module also takes a layer of feature map X ∈ R^(C×H×W) as input and outputs a local feature vector Z ∈ R^(k×1), where k is consistent with the dimension of the attribute features. The module consists of three sub-modules: a spatial transformer, an Inception network, and global max/average pooling. The spatial transformer can be represented as a function ST(·); its role is to help the network learn spatial invariance and translation invariance and extend this to affine and even non-affine transformations. This means the spatial transformer can learn a transformation that rectifies an object that has undergone an affine transformation:

M_l = [ r_h^l   0      t_x^l
        0       r_w^l  t_y^l ]    (4)

wherein (t_x, t_y) are the two-dimensional spatial (translation) coordinates, (r_h, r_w) are the scale transformation factors, and l corresponds to the feature map of the l-th layer. The corresponding regions are obtained by computing the spatial transformer and multiplying it with the original feature map:

R_s = ST_l(X)    (5)

For each extracted region R_s, features are extracted with an Inception network:

IR = Inception(R_s)    (6)

The extracted features IR are then processed with global average pooling and global maximum pooling:

IR_l = GAP(IR) + GMP(IR)    (7)

The features obtained at this point encode the important information of the local region. The IR_l obtained from the multiple regions are combined by element-wise addition to obtain the feature that finally represents the local regions:

Z = Σ_l IR_l    (8)

The model needs to learn two mappings, the visual-semantic mapping and the visual-hidden mapping, corresponding to the two mapping matrices W_a and W_b respectively. For computational convenience, Z is concatenated with itself so that its dimension becomes 2k.
3. Visual-semantic mapping and visual-hidden mapping
The deep neural network is divided into several levels of feature maps according to their different receptive field sizes. The global attention module extracts global context information from these feature maps to obtain new feature maps, which replace the original ones as input to the next layer of the network, so the feature vector obtained at the last layer contains the global context information. The last-layer feature vector is then projected through fully connected layers into the semantic space and the hidden space, producing two mappings: the visual-semantic mapping and the visual-hidden mapping. The visual-semantic mapping is optimized with a softmax loss function, and the visual-hidden mapping with a triplet loss function. The advantage is that the interpretability of the attributes is preserved while the discriminability of the hidden attributes is also taken into account.

For the visual-semantic mapping, let φ_y be the semantic feature of category y; its compatibility score can be expressed as:

s(x, y) = θ(x)^T W_a φ_y    (9)

wherein θ(x) denotes the visual feature and W_a the visual-semantic mapping matrix to be learned. Treating the compatibility score s as the logits of a softmax, the softmax loss can be expressed as:

L_att = -(1/N) Σ_i log p_{y_i}(x_i)    (10)

wherein

p_y(x) = exp(s(x, y)) / Σ_{y'∈Y_s} exp(s(x, y'))    (11)

For the visual-hidden mapping, a triplet loss is adopted to minimize the intra-class distance and maximize the inter-class distance, so as to obtain discriminative hidden features:

L_lat = Σ max(0, ||σ(x_i) - σ(x_j)||² - ||σ(x_i) - σ(x_k)||² + mrg)    (12)

wherein x_i, x_j, x_k are the anchor, the positive-class sample and the negative-class sample respectively, and mrg denotes the separation margin, set to 1.0. Combining the loss functions of the visual-semantic mapping, the visual-hidden mapping and the cropping (spatial transformer) network, the overall loss function can be expressed as:

L = L_att + αL_lat    (13)

where α is a balance factor and is set to 1.0.
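The two training objectives of equations (9) to (13) can be sketched as below: a softmax cross-entropy over compatibility scores for the visual-semantic mapping and a triplet loss for the visual-hidden mapping. The feature dimensions and the way triplets are passed in are assumptions; the margin mrg and the balance factor α are set to 1.0 as stated in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroShotHeads(nn.Module):
    """Visual-semantic and visual-hidden projections with the losses of eqs. (9)-(13)."""
    def __init__(self, feat_dim: int, attr_dim: int, hidden_dim: int,
                 class_attributes: torch.Tensor, mrg: float = 1.0, alpha: float = 1.0):
        super().__init__()
        self.W_a = nn.Linear(feat_dim, attr_dim, bias=False)     # visual-semantic mapping
        self.W_b = nn.Linear(feat_dim, hidden_dim, bias=False)   # visual-hidden mapping
        self.register_buffer("phi", class_attributes)            # (num_seen_classes, attr_dim)
        self.mrg, self.alpha = mrg, alpha

    def forward(self, feats, labels, pos_feats, neg_feats):
        # eq. (9): compatibility scores s(x, y) = theta(x)^T W_a phi_y for all seen classes
        scores = self.W_a(feats) @ self.phi.t()
        # eqs. (10)-(11): softmax (cross-entropy) loss over the compatibility scores
        l_att = F.cross_entropy(scores, labels)
        # eq. (12): triplet loss in the hidden space (anchor, positive, negative)
        anc, pos, neg = self.W_b(feats), self.W_b(pos_feats), self.W_b(neg_feats)
        l_lat = F.triplet_margin_loss(anc, pos, neg, margin=self.mrg)
        # eq. (13): overall loss
        return l_att + self.alpha * l_lat

# usage: 20 seen classes with 85-dim attributes, 2048-dim fused visual features (assumed sizes)
attrs = torch.randn(20, 85)
head = ZeroShotHeads(2048, 85, 128, attrs)
x, xp, xn = torch.randn(8, 2048), torch.randn(8, 2048), torch.randn(8, 2048)
loss = head(x, torch.randint(0, 20, (8,)), xp, xn)
loss.backward()
```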
4. Zero-sample learning prediction
Since the visual-semantic mapping and the visual-hidden feature mapping are learned simultaneously in the training phase, the testing phase treats them correspondingly. For the visual-semantic mapping, given a test image x whose projection in the semantic space is φ(x), the goal is to assign it a class label:

ŷ = argmax_{y∈Y_u} s(φ(x), φ_y)    (14)

For the visual-hidden feature mapping, given a test image x whose projection in the hidden space is σ(x), the hidden-feature prototype of each known class c is taken as the mean of the hidden features of its samples:

h_c = (1/N_c) Σ_{y_i^s = c} σ(x_i^s)    (15)

For an unseen class u, its relationship to all the seen classes is first computed in the semantic space, as a set of coefficients normalized over the seen classes and derived from the similarity of the class semantic features:

β_{uc} = r(φ_u, φ_c), normalized over all c ∈ Y_s    (16)

The unseen class u is assumed to share in the hidden space the same relationship it has in the semantic space, so its hidden prototype is:

h_u = Σ_{c∈Y_s} β_{uc} h_c    (17)

The final fused prediction can then be expressed as:

ŷ = argmax_{y∈Y_u} [ s(φ(x), φ_y) + s(σ(x), h_y) ]    (18)
where s (·, ·) is a compatibility function.
The parameters and the meanings of the English abbreviations used above are as follows:
X: input feature map; C, H, W: number of channels, length and width of the feature map; L = H × W;
Q, K, V: query, key and value features; C': reduced number of channels; Q_r, K_r, V_r, X_r: re-dimensioned query, key, value features and feature map; A: spatial self-attention weight matrix; α: balance factor;
ST(·): spatial transformer; (t_x, t_y): two-dimensional translation coordinates; (r_h, r_w): scale transformation factors;
Inception: Inception feature-extraction sub-network; GAP / GMP: global average pooling / global maximum pooling; IR, IR_l: region features extracted by the Inception sub-network; Z: complete local feature vector; k: dimension of the attribute (semantic) features;
θ(x): visual feature; φ(x): projection into the semantic space; σ(x): projection into the hidden space; W_a, W_b: visual-semantic and visual-hidden mapping matrices; s(·, ·): compatibility function; mrg: triplet-loss margin; L_att, L_lat: softmax loss and triplet loss.
in the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (5)

1. A zero-sample learning image classification method based on global and local context sensing, characterized by comprising the following steps:
performing feature extraction on an image with a deep neural network to obtain multiple layers of feature maps;
computing on a feature map of any layer with global attention to obtain a feature map containing global information; computing on the feature map of the same layer with local attention to obtain a feature vector representing local information;
passing the last-layer global feature map through a fully connected layer to obtain a global feature vector; adding the multiple groups of local feature vectors element by element to obtain a complete local feature vector;
concatenating the complete local feature vector with the global feature vector, projecting the result into a semantic space and a hidden (implicit) feature space, and optimizing the parameters with a softmax loss and a triplet loss respectively;
and repeating the above steps for a number of training epochs to obtain a zero-sample learning model with strong representation capability, and classifying images with the trained zero-sample learning model.
2. The zero-sample learning image classification method based on global and local context awareness according to claim 1, wherein the computing on a feature map of any layer with global attention to obtain the feature map containing global information specifically comprises:
obtaining a spatial self-attention module weight matrix, and weighting the re-dimensioned value feature V_r with the obtained weight matrix to obtain the weighted feature X_w;
adding the re-dimensioned feature map X_r to the weighted feature X_w through a residual connection to obtain X_g;
re-dimensioning the obtained X_g to the same size as the original feature map, X_g ∈ R^(C×H×W);
and inputting X_g as the new feature map into the next layer of the neural network; the same operation is applied at multiple layers of feature maps, so that the global context information is transferred to the last layer.
3. The method according to claim 2, wherein the spatial self-attention module weight matrix is specifically:

A = softmax_col(Q_r^T K_r) ∈ R^(L×L)

wherein R denotes the dimension information of a variable, softmax_col computes the softmax score of the matrix column by column, Q_r^T is the transpose of the re-dimensioned query feature, K_r is the re-dimensioned key feature, T denotes transposition, and L = H × W is the product of the length and width of the feature map.
4. The zero-sample learning image classification method based on global and local context awareness according to claim 3, wherein the weighted value X_w is:

X_w = α · V_r · A,  with X_w, V_r ∈ R^(C×L)

wherein α is a balance factor, C is the number of channels of the feature map, and V_r is the re-dimensioned value feature.
5. The method according to claim 1, wherein the computing on the feature map of the same layer with local attention to obtain the feature vector representing the local information specifically comprises:
computing a sampling transform with a spatial transformer and applying it to the original feature map through matrix multiplication to obtain the corresponding regions R_s, and extracting features IR from each region R_s with an Inception network;
processing the extracted features with global maximum pooling and global average pooling; summing the IR_l obtained from the multiple regions element by element to obtain the feature that finally represents the local regions; and learning the visual-semantic mapping and the visual-hidden mapping respectively, and concatenating them.
CN202011460544.8A 2020-12-11 2020-12-11 Zero sample learning image classification method based on global and local context sensing Active CN112418351B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011460544.8A CN112418351B (en) 2020-12-11 2020-12-11 Zero sample learning image classification method based on global and local context sensing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011460544.8A CN112418351B (en) 2020-12-11 2020-12-11 Zero sample learning image classification method based on global and local context sensing

Publications (2)

Publication Number Publication Date
CN112418351A true CN112418351A (en) 2021-02-26
CN112418351B CN112418351B (en) 2023-04-07

Family

ID=74775587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011460544.8A Active CN112418351B (en) 2020-12-11 2020-12-11 Zero sample learning image classification method based on global and local context sensing

Country Status (1)

Country Link
CN (1) CN112418351B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298091A (en) * 2021-05-25 2021-08-24 商汤集团有限公司 Image processing method and device, electronic equipment and storage medium
CN113435531A (en) * 2021-07-07 2021-09-24 中国人民解放军国防科技大学 Zero sample image classification method and system, electronic equipment and storage medium
CN113486981A (en) * 2021-07-30 2021-10-08 西安电子科技大学 RGB image classification method based on multi-scale feature attention fusion network
CN113673599A (en) * 2021-08-20 2021-11-19 大连海事大学 Hyperspectral image classification method based on correction prototype learning
CN116842329A (en) * 2023-07-10 2023-10-03 湖北大学 Motor imagery task classification method and system based on electroencephalogram signals and deep learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190073353A1 (en) * 2017-09-07 2019-03-07 Baidu Usa Llc Deep compositional frameworks for human-like language acquisition in virtual environments
CN109447115A (en) * 2018-09-25 2019-03-08 天津大学 Zero sample classification method of fine granularity based on multilayer semanteme supervised attention model
CN109582960A (en) * 2018-11-27 2019-04-05 上海交通大学 The zero learn-by-example method based on structured asso- ciation semantic embedding
CN110443273A (en) * 2019-06-25 2019-11-12 武汉大学 A kind of zero sample learning method of confrontation identified for natural image across class
CN111222471A (en) * 2020-01-09 2020-06-02 中国科学技术大学 Zero sample training and related classification method based on self-supervision domain perception network
CN111598155A (en) * 2020-05-13 2020-08-28 北京工业大学 Fine-grained image weak supervision target positioning method based on deep learning
CN111881262A (en) * 2020-08-06 2020-11-03 重庆邮电大学 Text emotion analysis method based on multi-channel neural network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190073353A1 (en) * 2017-09-07 2019-03-07 Baidu Usa Llc Deep compositional frameworks for human-like language acquisition in virtual environments
CN109447115A (en) * 2018-09-25 2019-03-08 天津大学 Zero sample classification method of fine granularity based on multilayer semanteme supervised attention model
CN109582960A (en) * 2018-11-27 2019-04-05 上海交通大学 The zero learn-by-example method based on structured asso- ciation semantic embedding
CN110443273A (en) * 2019-06-25 2019-11-12 武汉大学 A kind of zero sample learning method of confrontation identified for natural image across class
CN111222471A (en) * 2020-01-09 2020-06-02 中国科学技术大学 Zero sample training and related classification method based on self-supervision domain perception network
CN111598155A (en) * 2020-05-13 2020-08-28 北京工业大学 Fine-grained image weak supervision target positioning method based on deep learning
CN111881262A (en) * 2020-08-06 2020-11-03 重庆邮电大学 Text emotion analysis method based on multi-channel neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YIZHE ZHU: "Semantic-Guided Multi-Attention Localization for Zero-Shot Learning", arXiv *
WEI Jie (魏杰): "Research on Fine-Grained Image Classification in Zero-Shot Learning" (零样本学习中的细粒度图像分类研究), China Master's Theses Full-text Database, Information Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298091A (en) * 2021-05-25 2021-08-24 商汤集团有限公司 Image processing method and device, electronic equipment and storage medium
WO2022247128A1 (en) * 2021-05-25 2022-12-01 上海商汤智能科技有限公司 Image processing method and apparatus, electronic device, and storage medium
CN113435531A (en) * 2021-07-07 2021-09-24 中国人民解放军国防科技大学 Zero sample image classification method and system, electronic equipment and storage medium
CN113435531B (en) * 2021-07-07 2022-06-21 中国人民解放军国防科技大学 Zero sample image classification method and system, electronic equipment and storage medium
CN113486981A (en) * 2021-07-30 2021-10-08 西安电子科技大学 RGB image classification method based on multi-scale feature attention fusion network
CN113673599A (en) * 2021-08-20 2021-11-19 大连海事大学 Hyperspectral image classification method based on correction prototype learning
CN113673599B (en) * 2021-08-20 2024-04-12 大连海事大学 Hyperspectral image classification method based on correction prototype learning
CN116842329A (en) * 2023-07-10 2023-10-03 湖北大学 Motor imagery task classification method and system based on electroencephalogram signals and deep learning

Also Published As

Publication number Publication date
CN112418351B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN112418351B (en) Zero sample learning image classification method based on global and local context sensing
CN110188705B (en) Remote traffic sign detection and identification method suitable for vehicle-mounted system
CN110532920B (en) Face recognition method for small-quantity data set based on FaceNet method
CN111126482B (en) Remote sensing image automatic classification method based on multi-classifier cascade model
CN110569738B (en) Natural scene text detection method, equipment and medium based on densely connected network
CN110633708A (en) Deep network significance detection method based on global model and local optimization
CN105825511A (en) Image background definition detection method based on deep learning
CN109583483A (en) A kind of object detection method and system based on convolutional neural networks
CN111325318B (en) Neural network training method, neural network training device and electronic equipment
CN111325243B (en) Visual relationship detection method based on regional attention learning mechanism
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN112784782B (en) Three-dimensional object identification method based on multi-view double-attention network
Nguyen et al. Satellite image classification using convolutional learning
CN111461213A (en) Training method of target detection model and target rapid detection method
CN109766752B (en) Target matching and positioning method and system based on deep learning and computer
CN107301643A (en) Well-marked target detection method based on robust rarefaction representation Yu Laplce's regular terms
CN112861970A (en) Fine-grained image classification method based on feature fusion
CN115937774A (en) Security inspection contraband detection method based on feature fusion and semantic interaction
CN115147607A (en) Anti-noise zero-sample image classification method based on convex optimization theory
CN110705384A (en) Vehicle re-identification method based on cross-domain migration enhanced representation
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN113642602A (en) Multi-label image classification method based on global and local label relation
CN114627312B (en) Zero sample image classification method, system, equipment and storage medium
CN113688864B (en) Human-object interaction relation classification method based on split attention
CN115049833A (en) Point cloud component segmentation method based on local feature enhancement and similarity measurement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant