CN112418351B - Zero sample learning image classification method based on global and local context sensing - Google Patents
- Publication number: CN112418351B (application CN202011460544.8A)
- Authority
- CN
- China
- Prior art keywords
- global
- feature
- local
- feature map
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/213 — Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/253 — Fusion techniques of extracted features
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/08 — Learning methods
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- Y02T10/40 — Engine management systems
Abstract
The invention discloses a zero-sample learning image classification method based on global and local context sensing, which comprises the following steps: extracting features from the image with a deep neural network to obtain a multilayer feature map; computing global attention over any layer of the feature map to obtain a feature map containing global information; computing local attention over the same layer of the feature map to obtain a feature vector representing local information; passing the last-layer global feature map through a fully connected layer to obtain a global feature vector; adding the multiple groups of local feature vectors element-wise to obtain the complete local feature vector; concatenating the complete local feature vector with the global feature vector, projecting the result into a semantic space and a latent feature space, and optimizing the parameters with softmax loss and triplet loss respectively; and repeating the above steps over several training epochs to obtain a zero-sample learning model with strong representation capability, through which images are classified.
Description
Technical Field
The invention relates to the field of image classification, in particular to a zero sample learning image classification method based on global and local context sensing.
Background
Deep learning techniques have evolved rapidly, and related applications have been put into practice in many fields (computer vision, natural language processing, etc.), since deep learning can exploit massive amounts of data for model training and thereby achieve powerful recognition capability. However, the training samples may not cover all classes. In particular, existing data inherently follows a long-tail distribution: only a few common classes can provide large numbers of samples, while for the many uncommon classes only a very limited number of samples can be collected. Reflected in deep learning, this means a model can reach ideal recognition accuracy on common classes, whose training samples are abundant, but its recognition ability on uncommon classes is fundamentally weaker; for classes for which no training samples have been collected at all, the recognition ability is zero. In real applications, however, a model must not only acquire strong recognition capability from the collected data, but also retain recognition capability when a brand-new category without any training sample appears. New categories, such as new species and new models of electronic equipment, emerge every day; the ability to recognize unseen categories is a key turning point in the development of deep learning systems, and the task of recognizing unseen categories can be addressed through zero-sample learning.
Zero-sample learning is a deep learning technique that mimics the recognition ability of the human brain. Lampert notes that humans can recognize perhaps 30,000 basic classes, as well as fine-grained subclasses of these classes. Beyond recognizing categories they have seen and using that knowledge to identify fine-grained subcategories, humans can also identify entirely new categories or concepts: for example, a person who has never seen a zebra can accurately identify one from the description "looks similar to a horse, with black and white stripes".
In the zero-sample learning image classification task, the model can only use images from known classes during training, yet must identify the classes of images from unknown classes. The task of recognizing unknown classes is made possible because a high-level semantic description of object characteristics, such as attributes, is used, and the unknown classes are linked to the known classes by assuming that both share the same set of attributes. Generally speaking, zero-sample learning proceeds as follows: in the training phase, the model learns a visual-semantic mapping; in the inference phase, an image of an unknown class is first converted into a semantic vector using the learned mapping, the semantic vector is then compared with the real attribute vectors of the unknown classes, and the closest class is selected as the prediction result.
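The inference procedure just described — project an unseen image into attribute space, then pick the nearest unseen-class attribute vector — can be sketched in a few lines of numpy. The mapping W, the attribute vectors, and the feature values below are toy stand-ins for illustration, not the patent's learned quantities.

```python
import numpy as np

def predict_unseen(theta_x, W, unseen_attrs):
    """theta_x: (d,) visual feature; W: (d, k) visual-semantic mapping;
    unseen_attrs: {label: (k,) attribute vector}. Returns nearest class."""
    phi_x = theta_x @ W                      # project into semantic space
    labels = list(unseen_attrs)
    dists = [np.linalg.norm(phi_x - unseen_attrs[c]) for c in labels]
    return labels[int(np.argmin(dists))]     # closest attribute vector wins

W = np.eye(4)                                # identity mapping for the toy example
attrs = {"zebra": np.array([1., 1., 0., 0.]),
         "horse": np.array([1., 0., 0., 0.])}
x = np.array([0.9, 0.8, 0.1, 0.0])           # feature of a "striped horse"
print(predict_unseen(x, W, attrs))           # → zebra
```

Here the "zebra" attribute vector (horse-like, striped) lies closest to the projected feature, so the unseen class is recovered without any zebra training images.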
Existing zero-sample learning algorithms can be divided into two categories, depending on whether new training data is generated during the training phase: generative methods and compatibility-based methods. The first kind generates images of unknown classes from their semantic descriptions and trains on them together with the existing known-class images in the traditional deep learning manner. However, such methods have several shortcomings: the generated unknown-class images do not restore details well, the generated unknown-class features lack interpretability, and the importance of information-rich visual regions in the image is ignored. The second kind directly uses semantic knowledge to learn a visual-semantic mapping by aligning the visual space with the semantic space. Most compatibility-based models focus on mining the discriminative local information of the object itself and on better aligning the two spaces; however, the positive contribution of global information to the zero-sample learning task is ignored.
Disclosure of Invention
The invention provides a zero-sample learning image classification method based on global and local context sensing, which considers global features and local features simultaneously, enhances the expressive power of the learned mapping, and thereby improves the performance of the zero-sample learning model, as described in detail below:
a zero sample learning image classification method based on global and local context sensing comprises the following steps:
performing feature extraction on the image with a deep neural network to obtain a multilayer feature map;
computing global attention over any layer of the feature map to obtain a feature map containing global information; computing local attention over the same layer of the feature map to obtain a feature vector representing local information;
passing the last-layer global feature map through a fully connected layer to obtain a global feature vector; adding the multiple groups of local feature vectors element-wise to obtain the complete local feature vector;
concatenating the complete local feature vector with the global feature vector, projecting the result into a semantic space and a latent feature space, and optimizing the parameters with softmax loss and triplet loss respectively;
and repeating the above steps over several training epochs to obtain a zero-sample learning model with strong representation capability, and classifying images with the trained zero-sample learning model.
The calculating of any layer of feature map with global attention to obtain the feature map containing global information specifically includes:
obtaining the spatial self-attention module weight matrix $A$, weighting the re-dimensioned value feature $\tilde{V}$ with $A$ to obtain the weighted feature $\alpha\tilde{V}A$, and adding the flattened original feature $\tilde{X}$ by residual linking to obtain $X' = \alpha\tilde{V}A + \tilde{X}$;
re-dimensioning the obtained $X'$ to the same size as the original feature map and inputting it as the new feature map into the next layer of the neural network; applying the same operation at multiple layers of feature maps transfers the global context information to the last layer.
Further, the spatial self-attention module weight matrix is specifically:

$$A = \mathrm{softmax}_{col}\big(\tilde{Q}^{T}\tilde{K}\big)$$

wherein $\mathrm{softmax}_{col}$ computes the softmax score column by column, $\tilde{Q}$ is the re-dimensioned query feature, $\tilde{K}$ the re-dimensioned key feature, $T$ denotes transposition, $L = H \times W$ is the product of the length and width of the feature map, and $\mathbb{R}$ represents the dimension information of variables;

$$X' = \alpha\tilde{V}A + \tilde{X}$$

wherein $\alpha$ is a balance factor, $C$ is the number of channels of the feature map, $\tilde{V} \in \mathbb{R}^{C \times L}$ is the re-dimensioned value feature, and $\tilde{X}$ is the original feature map flattened to $C \times L$.
Further, the calculating of the feature map of the same layer with local attention to obtain the feature vector representing the local information is specifically:
computing with a spatial transformer and performing matrix multiplication with the original feature map to obtain a plurality of corresponding regions $R_s$, and extracting features from each region $R_s$ with an Inception block:
processing the extracted features $IR$ with global max pooling and global average pooling; processing the $IR_l$ obtained from the plurality of regions with element-wise addition to obtain the feature that finally represents the local regions; and learning the visual-semantic mapping and the visual-latent mapping respectively, then concatenating.
The technical scheme provided by the invention has the beneficial effects that:
1. by training directly on original image samples, the method makes the model better adapted to the zero-sample learning classification task;
2. a global attention module extracts global context information from the original feature map to generate a feature map containing global information; the global features extracted by the model have strong expressive power, enhancing the model's global understanding of the object;
3. a local attention module extracts local context information from the original feature map to obtain local feature vectors; the same steps are applied to several feature maps, and the local feature vectors are finally summed to obtain the complete local feature vector, enhancing the model's local understanding of the object;
4. feature concatenation yields a complete feature representation that accounts for both global and local information, greatly improving the model's representation capability and accuracy;
5. image features are projected into a semantic space and a latent space simultaneously, and softmax loss and triplet loss respectively are used to optimize and update the parameters.
Drawings
FIG. 1 is a flow chart of a zero sample learning image classification method based on global and local context awareness;
FIG. 2 is a schematic diagram of a global attention module;
FIG. 3 is a schematic diagram of a space transformer;
FIG. 4 is a schematic diagram of an Inception network.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
A zero sample learning image classification method based on global and local context sensing, referring to FIG. 1, comprises the following steps:
101: carrying out feature extraction on the image by using a deep neural network to obtain a multilayer feature map;
102: calculating any layer of feature map by using global attention to obtain a feature map containing global information;
103: calculating the characteristic diagram of the same layer by using local attention to obtain a characteristic vector representing local information;
104: repeating the operations of steps 102 and 103 for multiple layers to obtain a plurality of global feature maps and local feature vectors;
105: obtaining a global feature vector from the last layer of global feature map through a full connection layer; performing element-by-element addition on the multiple groups of local feature vectors to obtain complete local feature vectors;
106: splicing the complete local feature vector and the global feature vector, projecting the complete local feature vector and the global feature vector to a semantic (attribute) space and an implicit feature space simultaneously, and performing parameter optimization by respectively adopting softmax loss and triple loss;
107: and repeating the steps, setting a plurality of periods for training, finally obtaining a zero sample learning model with strong representation capability, and classifying the images through the trained zero sample learning model.
In summary, in the embodiment of the present invention, the feature maps extracted from the image by the deep neural network are processed with global attention to obtain new feature maps containing global information, and local features are obtained by computing local attention for each feature map; several groups of feature maps are computed, the features are finally fused, and the fused features are projected simultaneously into a semantic (attribute) space and a latent feature space. In this way the learned features are enhanced, the expressive power of the learned mapping is improved, and the classification accuracy of the model rises.
Example 2
The scheme of example 1 is further described below with reference to specific calculation formulas and examples, which are described in detail below:
first, the basic setup is introduced:
training setContaining Ns samples, wherein>The ith image, representing a known class s>Is its corresponding class label. Test set>Contains Nu samples, wherein->The jth sample, representing an unknown class u>Is its corresponding class label. The semantic features of the known class and the unknown class can be represented as:
and & ->The known class and the unknown class are disjoint,Y s ∪Y u = Y. Using phi (x) = theta (x) T W represents the projection of the visual features in the semantic space, wherein theta (x) is the visual features extracted by the deep neural network, W represents a conversion matrix, and T represents transposition. σ (x) represents the projection of the visual feature in the hidden space.
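The projection $\phi(x) = \theta(x)^T W$ from the setup above amounts to a single matrix product. The toy dimensions below (visual feature $d=6$, attribute space $k=3$) and the random $W$ are illustrative assumptions, not the learned matrix.

```python
import numpy as np

rng = np.random.default_rng(5)
theta_x = rng.normal(size=6)        # visual feature from the backbone
W = rng.normal(size=(6, 3))         # visual-to-semantic conversion matrix
phi_x = theta_x @ W                 # projection into semantic space
print(phi_x.shape)                  # (3,)
```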
In zero-sample learning, the training phase can only use known class images and semantic features (attributes), and the model needs to obtain the capability of predicting unknown classes by learning visual-semantic mapping or visual-implicit feature mapping.
1. Global context information extraction
Convolutional layers are important components of deep neural networks, but they are limited by the size of their convolution kernels, so the features they extract inevitably contain only local information. For computer vision tasks such as image classification, image segmentation and object detection, however, extracting more global features is key to improving the model's representation capability. Introducing global information into some layers relieves the limitation imposed by the convolution kernel size and improves the performance of the deep neural network; the crux is being able to extract global information from the image.
The global self-attention module was initially used in natural language processing tasks and was subsequently widely applied in computer vision tasks. Specifically, global self-attention can be obtained as follows:
For an input feature map $X \in \mathbb{R}^{C\times H\times W}$, a set of convolutions with kernel size $1\times 1$ first generates the query feature $Q$, the key feature $K$, the value feature $V \in \mathbb{R}^{C\times H\times W}$, and the re-dimensioned value feature $\tilde{V} \in \mathbb{R}^{C\times L}$, where $Q, K \in \mathbb{R}^{C'\times H\times W}$, $C'$ denotes the reduced number of feature map channels, $L = H\times W$, $\mathbb{R}$ represents the dimension information of a variable, $C$ the number of channels of the feature map, $H$ its length, and $W$ its width.

Then $Q$ and $K$ are re-dimensioned to obtain $\tilde{Q}, \tilde{K} \in \mathbb{R}^{C'\times L}$, and the spatial self-attention module weight matrix at this point can be expressed as:

$$A = \mathrm{softmax}_{col}\big(\tilde{Q}^{T}\tilde{K}\big)\qquad(1)$$

The weight matrix is used to weight the value feature, giving the weighted feature:

$$Y = \alpha\,\tilde{V}A\qquad(2)$$

wherein $\alpha$ is a balance factor.

To prevent the loss of the original information, residual linking is adopted: the flattened original feature $\tilde{X}$ is added on top of the weighted feature, obtaining:

$$X' = \alpha\,\tilde{V}A + \tilde{X}\qquad(3)$$

Finally, the obtained $X'$ is re-dimensioned to the same size as the original feature map, $C\times H\times W$, and input to the next layer of the neural network as the new feature map. By taking the same operation at multiple layers of feature maps, the global context information can be transferred to the last layer.
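The global self-attention step described above can be sketched in numpy: a column-wise softmax over $\tilde{Q}^T\tilde{K}$ produces the $L\times L$ weight matrix, which re-weights the value feature before the residual add. The random projection matrices stand in for the $1\times 1$ convolutions and are assumptions for illustration.

```python
import numpy as np

def softmax_col(M):
    """Column-wise softmax, as used for the L x L weight matrix."""
    e = np.exp(M - M.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def global_attention(X, Wq, Wk, Wv, alpha=1.0):
    C, H, Wd = X.shape
    L = H * Wd
    Xf = X.reshape(C, L)                 # flatten spatial dims
    Q, K = Wq @ Xf, Wk @ Xf              # (C', L) query / key projections
    V = Wv @ Xf                          # (C, L) value feature
    A = softmax_col(Q.T @ K)             # (L, L) spatial weight matrix
    Y = alpha * (V @ A)                  # weighted value feature
    return (Y + Xf).reshape(C, H, Wd)    # residual link, back to C x H x W

rng = np.random.default_rng(1)
C, Cp, H, Wd = 8, 4, 5, 5
X = rng.normal(size=(C, H, Wd))
out = global_attention(X, rng.normal(size=(Cp, C)),
                       rng.normal(size=(Cp, C)), rng.normal(size=(C, C)))
print(out.shape)                         # (8, 5, 5)
```

The output keeps the input's shape, so the module can replace any layer's feature map transparently, as the text describes.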
2. Local context information extraction
The local attention module likewise takes a layer of the feature map $X \in \mathbb{R}^{C\times H\times W}$ as input and outputs a local feature vector $Z \in \mathbb{R}^{k\times 1}$, where $k$ matches the dimension of the attribute feature. The module consists of three sub-modules: a spatial transformer, an Inception block, and global max/average pooling. The spatial transformer can be represented as a function $ST(\cdot)$; its role is to help the network learn spatial invariance and translation invariance, extending to affine transformations in general. This means the spatial transformer can learn a transformation that rectifies an object that has undergone an affine transformation:

$$\theta_l = \begin{bmatrix} r_h & 0 & t_x \\ 0 & r_w & t_y \end{bmatrix}\qquad(4)$$

wherein $(t_x, t_y)$ represents the two-dimensional spatial coordinates, $(r_h, r_w)$ the scale transformation factors, and $l$ indexes the feature map of the $l$-th layer. Computing with the spatial transformer and performing matrix multiplication with the original feature map yields the corresponding regions:
Rs=ST l (X) (5)
For each extracted region $R_s$, features are extracted with the Inception block:
IR=Inception(Rs) (6)
Then the extracted features $IR$ are processed with global average pooling and global max pooling respectively:
IR l =GAP(IR)+GMP(IR) (7)
The features obtained at this point encode the important information of the local region. The $IR_l$ obtained from the multiple regions are processed with element-wise addition to obtain the feature that finally represents the local regions:

$$Z = \sum_{l} IR_l\qquad(8)$$
The model needs to learn two mappings, the visual-semantic mapping and the visual-latent mapping, corresponding to the two mapping matrices $W_a$ and $W_b$ respectively. For computational convenience, $Z$ is concatenated with itself so that its dimension becomes $2k$.
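The GAP+GMP reduction and the element-wise summation across regions described above can be sketched as follows. The region tensors are random stand-ins; in the real model they would come from the spatial transformer and the Inception block.

```python
import numpy as np

def gap_gmp(IR):
    """IR: (k, h, w) region feature -> (k,) vector, GAP(IR) + GMP(IR)."""
    return IR.mean(axis=(1, 2)) + IR.max(axis=(1, 2))

rng = np.random.default_rng(2)
k = 6                                                 # matches the attribute dimension
regions = [rng.normal(size=(k, 3, 3)) for _ in range(4)]
Z = np.sum([gap_gmp(IR) for IR in regions], axis=0)   # element-wise sum over regions
Z2 = np.concatenate([Z, Z])                           # self-stitching so dim = 2k
print(Z.shape, Z2.shape)                              # (6,) (12,)
```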
3. Visual-semantic mapping and visual-latent mapping
The deep neural network is divided into several layers of feature maps according to their receptive field sizes; the global attention module extracts global context information from each feature map to obtain a new feature map that replaces the original one as the input of the next layer of the network, so that the feature vector obtained at the last layer contains global context information. The last-layer feature vector is then projected through fully connected layers into the semantic space and the latent space, generating two mappings: the visual-semantic mapping and the visual-latent mapping. Parameters are optimized with a softmax loss function for the visual-semantic mapping and with a triplet loss function for the visual-latent mapping. The advantage is that the interpretability of the attributes is preserved while the discriminability of the latent features is also taken into account.
For the visual-semantic mapping, let $a_y$ be the semantic feature of category $y$; the compatibility score can then be expressed as:

$$s(x, y) = \theta(x)^{T} W_a\, a_y\qquad(9)$$

wherein $\theta(x)$ represents the visual feature and $W_a$ the visual-semantic mapping matrix to be learned. Treating the compatibility scores $s$ as the logits of a softmax, the softmax loss can be expressed as:

$$L_{att} = -\frac{1}{N}\sum_{i=1}^{N}\log p(y_i \mid x_i)\qquad(10)$$

wherein

$$p(y \mid x) = \frac{\exp\big(s(x, y)\big)}{\sum_{y' \in Y_s}\exp\big(s(x, y')\big)}\qquad(11)$$
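The compatibility score and softmax loss described above can be illustrated with toy dimensions; the random weights and attribute vectors here are assumptions for demonstration, not learned parameters.

```python
import numpy as np

def softmax_loss(theta_x, Wa, attrs, y_true):
    """theta_x: (d,) visual feature; Wa: (d, k) mapping;
    attrs: (n_cls, k) seen-class attribute matrix; y_true: label index."""
    scores = attrs @ (Wa.T @ theta_x)        # compatibility with every seen class
    scores -= scores.max()                   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())
    return -log_probs[y_true]                # cross-entropy over compatibilities

rng = np.random.default_rng(3)
d, k, n_cls = 10, 5, 7
loss = softmax_loss(rng.normal(size=d), rng.normal(size=(d, k)),
                    rng.normal(size=(n_cls, k)), y_true=2)
print(loss >= 0.0)                           # → True
```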
For the visual-latent mapping, a triplet loss is adopted to minimize the intra-class distance and maximize the inter-class distance, so as to obtain discriminative latent features:

$$L_{lat} = \sum \max\big(0,\; d(\sigma(x_i), \sigma(x_j)) - d(\sigma(x_i), \sigma(x_k)) + mrg\big)\qquad(12)$$

wherein $x_i, x_j, x_k$ represent the anchor, positive-class and negative-class samples respectively, $d(\cdot,\cdot)$ is a distance function, and $mrg$ represents the separation margin, set to 1.0. Combining the visual-semantic mapping and the visual-latent mapping, the overall loss function can be expressed as:
$$L = L_{att} + \alpha L_{lat}\qquad(13)$$
where α is a balance factor and is set to 1.0.
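A hedged sketch of the triplet loss on latent projections and the combined objective described above: sigma is stood in by a fixed random linear map, and all tensors are toy data.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, mrg=1.0):
    """Hinge on distance gap: pull positives in, push negatives out."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + mrg)

rng = np.random.default_rng(4)
sigma = rng.normal(size=(8, 8))              # stand-in visual-latent mapping
xa, xp = rng.normal(size=8), rng.normal(size=8)
xn = rng.normal(size=8) * 5.0                # far-away negative sample
L_lat = triplet_loss(sigma @ xa, sigma @ xp, sigma @ xn)
L_att = 1.3                                  # pretend softmax-loss value
alpha = 1.0                                  # balance factor, as in the text
L_total = L_att + alpha * L_lat
print(L_total >= L_att)                      # → True
```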
4. Zero sample learning prediction
Since the visual-semantic mapping and the visual-latent mapping are learned simultaneously in the training phase, the testing phase proceeds correspondingly. For the visual-semantic mapping, given a test image $x$ whose projection in the semantic space is $\phi(x)$, the goal is to assign it a class label:

$$\hat{y} = \arg\max_{u \in Y_u} s\big(\phi(x), a_u\big)\qquad(14)$$

For the visual-latent mapping, the projection of the test image $x$ in the latent space is $\sigma(x)$, and the latent-feature prototype of each known class is taken as the mean:

$$\rho_s = \frac{1}{N_s}\sum_{i:\,y_i^s = s}\sigma(x_i^s)\qquad(15)$$

For an unseen class $u$, its relation in the semantic space to all the seen classes is first computed:

$$w_{us} = \frac{\exp\big(-d(a_u, a_s)\big)}{\sum_{s' \in Y_s}\exp\big(-d(a_u, a_{s'})\big)}\qquad(16)$$

It is assumed that the unseen class $u$ shares in the latent space a relation consistent with that in the semantic space:

$$\rho_u = \sum_{s \in Y_s} w_{us}\,\rho_s\qquad(17)$$

The final fused prediction can then be expressed as:

$$\hat{y} = \arg\max_{u \in Y_u}\Big[s\big(\phi(x), a_u\big) + s\big(\sigma(x), \rho_u\big)\Big]\qquad(18)$$

where $s(\cdot,\cdot)$ is a compatibility function.
in the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (3)
1. A zero sample learning image classification method based on global and local context sensing is characterized by comprising the following steps:
1) performing feature extraction on the image with a deep neural network to obtain a multilayer feature map;
2) computing global attention over any layer of the feature map to obtain a feature map containing global information;
3) computing local attention over the same layer of the feature map to obtain a feature vector representing local information;
4) repeating the operations of step 2) and step 3) for multiple layers to obtain several global feature maps and local feature vectors;
5) passing the last-layer global feature map through a fully connected layer to obtain a global feature vector; adding the multiple groups of local feature vectors element-wise to obtain the complete local feature vector;
concatenating the complete local feature vector with the global feature vector, projecting the result into a semantic space and a latent feature space, and optimizing the parameters with softmax loss and triplet loss respectively;
repeating the above steps over several training epochs to obtain a zero-sample learning model with strong representation capability, and classifying the images through the trained zero-sample learning model;
the calculating of any layer of feature map with global attention to obtain the feature map containing global information specifically comprises:
obtaining the spatial self-attention module weight matrix $A$, weighting the re-dimensioned value feature $\tilde{V}$ with $A$ to obtain the weighted feature $\alpha\tilde{V}A$, and adding the flattened original feature $\tilde{X}$ by residual linking to obtain $X' = \alpha\tilde{V}A + \tilde{X}$;
re-dimensioning the obtained $X'$ to the same size as the original feature map, inputting it as the new feature map into the next layer of the neural network, adopting the same operation at multiple layers of feature maps, and transferring the global context information to the last layer;
the calculating of the feature map of the same layer with local attention to obtain the feature vector representing the local information specifically comprises:
computing with a spatial transformer and performing matrix multiplication with the original feature map to obtain a plurality of corresponding regions $R_s$, and extracting features from each region $R_s$ with an Inception block:
processing the extracted features $IR$ with global max pooling and global average pooling; processing the $IR_l$ obtained from the plurality of regions with element-wise addition to obtain the feature that finally represents the local regions; and learning the visual-semantic mapping and the visual-latent mapping respectively, then concatenating.
2. The method according to claim 1, wherein the spatial self-attention module weight matrix is specifically:

$$A = \mathrm{softmax}_{col}\big(\tilde{Q}^{T}\tilde{K}\big)$$

wherein $\mathbb{R}$ represents the dimension information of variables, $\mathrm{softmax}_{col}$ computes the softmax score column by column for the matrix, $\tilde{Q}$ is the re-dimensioned query feature, $\tilde{K}$ the re-dimensioned key feature, $T$ denotes transposition, and $L = H \times W$ is the product of the length and width of the feature map.
3. The zero sample learning image classification method based on global and local context awareness according to claim 2,
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011460544.8A CN112418351B (en) | 2020-12-11 | 2020-12-11 | Zero sample learning image classification method based on global and local context sensing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011460544.8A CN112418351B (en) | 2020-12-11 | 2020-12-11 | Zero sample learning image classification method based on global and local context sensing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112418351A CN112418351A (en) | 2021-02-26 |
CN112418351B (en) | 2023-04-07
Family
ID=74775587
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011460544.8A Active CN112418351B (en) | 2020-12-11 | 2020-12-11 | Zero sample learning image classification method based on global and local context sensing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112418351B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113298091A (en) * | 2021-05-25 | 2021-08-24 | SenseTime Group Ltd. | Image processing method and device, electronic equipment and storage medium |
CN113435531B (en) * | 2021-07-07 | 2022-06-21 | National University of Defense Technology | Zero sample image classification method and system, electronic equipment and storage medium |
CN113486981B (en) * | 2021-07-30 | 2023-02-07 | Xidian University | RGB image classification method based on multi-scale feature attention fusion network |
CN113673599B (en) * | 2021-08-20 | 2024-04-12 | Dalian Maritime University | Hyperspectral image classification method based on correction prototype learning |
CN116842329A (en) * | 2023-07-10 | 2023-10-03 | Hubei University | Motor imagery task classification method and system based on electroencephalogram signals and deep learning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109447115A (en) * | 2018-09-25 | 2019-03-08 | Tianjin University | Fine-grained zero-shot classification method based on a multi-layer semantically supervised attention model |
CN109582960A (en) * | 2018-11-27 | 2019-04-05 | Shanghai Jiao Tong University | Zero-shot learning method based on structured association semantic embedding |
CN110443273A (en) * | 2019-06-25 | 2019-11-12 | Wuhan University | Adversarial zero-shot learning method for cross-class recognition of natural images |
CN111222471A (en) * | 2020-01-09 | 2020-06-02 | University of Science and Technology of China | Zero-shot training and related classification method based on a self-supervised domain-aware network |
CN111598155A (en) * | 2020-05-13 | 2020-08-28 | Beijing University of Technology | Weakly supervised object localization method for fine-grained images based on deep learning |
CN111881262A (en) * | 2020-08-06 | 2020-11-03 | Chongqing University of Posts and Telecommunications | Text sentiment analysis method based on a multi-channel neural network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10366166B2 (en) * | 2017-09-07 | 2019-07-30 | Baidu Usa Llc | Deep compositional frameworks for human-like language acquisition in virtual environments |
2020-12-11: Application CN202011460544.8A filed in China; granted as patent CN112418351B; status Active.
Non-Patent Citations (2)
Title |
---|
"Semantic-Guided Multi-Attention Localization for Zero-Shot Learning"; Yizhe Zhu; arXiv; 2019-12-02; pp. 1-11 *
"Research on Fine-Grained Image Classification in Zero-Shot Learning"; Wei Jie; China Master's Theses Full-text Database, Information Science and Technology; 2020-02-15; pp. 18-42 *
Also Published As
Publication number | Publication date |
---|---|
CN112418351A (en) | 2021-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112418351B (en) | Zero sample learning image classification method based on global and local context sensing | |
CN111476294B (en) | Zero sample image identification method and system based on generation countermeasure network | |
CN108764063B (en) | Remote sensing image time-sensitive target identification system and method based on characteristic pyramid | |
CN110059741B (en) | Image recognition method based on semantic capsule fusion network | |
CN114067107B (en) | Multi-scale fine-grained image recognition method and system based on multi-grained attention | |
CN111126482B (en) | Remote sensing image automatic classification method based on multi-classifier cascade model | |
CN105825511A (en) | Image background definition detection method based on deep learning | |
CN110633708A (en) | Deep network significance detection method based on global model and local optimization | |
CN109583483A (en) | Object detection method and system based on convolutional neural networks | |
CN110991532B (en) | Scene graph generation method based on relational visual attention mechanism | |
CN114332578A (en) | Image anomaly detection model training method, image anomaly detection method and device | |
CN112861970B (en) | Fine-grained image classification method based on feature fusion | |
CN111461213A (en) | Training method of target detection model and target rapid detection method | |
CN115937774A (en) | Security inspection contraband detection method based on feature fusion and semantic interaction | |
CN113159067A (en) | Fine-grained image identification method and device based on multi-grained local feature soft association aggregation | |
CN114926693A (en) | SAR image small sample identification method and device based on weighted distance | |
Lechgar et al. | Detection of cities vehicle fleet using YOLO V2 and aerial images | |
CN114780767A (en) | Large-scale image retrieval method and system based on deep convolutional neural network | |
CN113642602A (en) | Multi-label image classification method based on global and local label relation | |
CN117173147A (en) | Surface treatment equipment and method for steel strip processing | |
CN113688864B (en) | Human-object interaction relation classification method based on split attention | |
CN114283289A (en) | Image classification method based on multi-model fusion | |
CN111753915A (en) | Image processing device, method, equipment and medium | |
CN110689071A (en) | Target detection system and method based on structured high-order features | |
Yang et al. | YOLOX with CBAM for insulator detection in transmission lines |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||