CN110163258B - Zero sample learning method and system based on semantic attribute attention redistribution mechanism - Google Patents

Zero sample learning method and system based on semantic attribute attention redistribution mechanism Download PDF

Info

Publication number
CN110163258B
CN110163258B CN201910335801.6A CN201910335801A CN110163258B CN 110163258 B CN110163258 B CN 110163258B CN 201910335801 A CN201910335801 A CN 201910335801A CN 110163258 B CN110163258 B CN 110163258B
Authority
CN
China
Prior art keywords
semantic
space
image
hidden layer
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910335801.6A
Other languages
Chinese (zh)
Other versions
CN110163258A (en
Inventor
刘洋
蔡登�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201910335801.6A priority Critical patent/CN110163258B/en
Publication of CN110163258A publication Critical patent/CN110163258A/en
Application granted granted Critical
Publication of CN110163258B publication Critical patent/CN110163258B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention discloses a zero sample learning method and a zero sample learning system based on a semantic attribute attention redistribution mechanism, wherein the zero sample learning method comprises the following steps: (1) establishing a neural network model based on a semantic attribute attention redistribution mechanism; (2) re-assigning weights between semantic features using attention of a semantic attribute space; (3) training a neural network model using the labeled image data set; (4) calculating the similarity between the weighted semantic features of the image and semantic prototypes of unknown classes, calculating the similarity between hidden layer features and hidden layer feature prototypes of the unknown classes, and adding the two similarities to obtain the similarity between the test image and each unknown class; (5) and sorting according to the similarity between the images and the classes, and selecting the class with the maximum similarity as the class prediction of the image. The zero sample learning method can enable zero sample learning to be more closely related to the semantic space and the hidden layer space in the training process, so that the result of combined classification of the two spaces is more robust.

Description

Zero sample learning method and system based on semantic attribute attention redistribution mechanism
Technical Field
The invention relates to the field of zero sample learning classification systems, in particular to a zero sample learning method and a zero sample learning system based on a semantic attribute attention redistribution mechanism.
Background
In recent years, object classification has received attention from researchers in the industry and academia as an important branch in the field of computer vision. The supervised object classification task has greatly advanced with the rapid development of deep learning technology, but at the same time, the training method under the supervision has some limitations. In supervised classification, each class requires enough labeled training samples. In addition, the learned classifier can only classify instances belonging to classes covered by the training data, lacking the ability to process previously unseen classes. In practical applications, each class may not have enough training samples, and there may be cases where classes not covered in training appear in the test samples. The zero sample learning aims at classifying examples belonging to classes which are not covered in training, becomes a rapidly developing direction in the field of machine learning, and has wide application in the aspects of computer vision, natural language processing and pervasive computation.
At present, the mainstream zero sample learning method mainly adopts attribute-based two-stage derivation to predict the label of the image. The derivation method comprises the following steps: an image is input, and the model predicts the attributes of the image in the first stage and deduces the class label by searching the class with the most similar attribute set in the second stage. For example, The DAP model proposed in Attribute-based classification for zero-shot visual object classification, published in 2013 in The IEEE schema Analysis and Machine Intelligence journal (The IEEE Transactions on Pattern Analysis and Machine Intelligence), by Christoph et al, estimates The posterior probability of each Attribute of an image by learning a probabilistic Attribute classifier, and then infers a class label of The image by calculating The posterior probability of a class and The maximum posterior estimate. "converting The transmission link" included in The Conference on Computer Vision and Pattern Recognition (The Conference on Computer Vision and Pattern Recognition) 2016: in the classification of class-attribute for unsurpassed zero-shot learning, a probability classifier is firstly learned for each attribute, and then classification is carried out by a random forest method, and the classification method can process some unreliable attributes. This two-stage approach has domain migration problems, for example, while the target task is to predict the labels of classes of images, the intermediate task of DAP is to learn classifiers related to image attributes.
The latest advances in zero-sample learning directly learn the mapping from the image feature space to the attribute semantic space. The ALE model proposed in The article Label-embedding for image classification, which was incorporated in The IEEE standards on Pattern Analysis and Machine Intelligence journal (The IEEE Transactions on Pattern Analysis and Machine Intelligence) in 2016, learns bilinear compatibility functions between images and attribute spaces using a rank-based loss function. The Semantic auto-encoder for zero-shot learning, which was recorded in The international Conference on Computer Vision and Pattern Recognition in 2017, proposed a Semantic auto-encoder that forcibly projects image features into a Semantic space that can be reconstructed from an image. An article entitled "Predicting visual artifacts of unsen classes for zero-shot learning" at The International Conference on Computer Vision in 2017 proposes to classify nearest neighbors by projecting class semantic representations into visual feature space and among these projections.
Zero-sample learning in addition to the commonly used semantic attribute space, some recent work has often involved joint class inference from multiple spaces. For example, an article named "Learning discrete attributes for zero-shot classification" at The International Conference on Computer Vision in 2017 proposed an LAD model using dictionary Learning to obtain a latent feature space that distinguishes but retains semantic information. In The international Computer Vision and Pattern Recognition Conference (The Conference on Computer Vision and Pattern Recognition) in 2018, an article named "cognitive Learning of content Features for Zero-Shot Recognition" proposes a new hidden feature space combining maximization of inter-class distance and minimization of intra-class distance, and performs joint inference on a semantic space and The hidden feature space. In The international Conference on Computer Vision of European Computer Vision in 2018, an article named "Learning Class protocols via Structure Alignment for Zero-Shot registration" proposed CDL model to align Class structures simultaneously in visual and semantic space. However, these methods essentially consider that all attributes are equally important in the classification process, and neglect that they have different distributions, variances, information entropies, and the like between different classes, this processing method is easy to cause misclassification on some challenging images.
Disclosure of Invention
The invention provides a zero sample learning method and a zero sample learning system based on a semantic attribute attention redistribution mechanism, which provide an attention redistribution mode for attribute prediction of each image, and redistribute the importance degree of each attribute in the image classification, thereby achieving a better zero sample learning effect.
The technical scheme of the invention is as follows:
a zero sample learning method based on a semantic attribute attention re-allocation mechanism comprises the following steps:
(1) establishing a neural network model based on a semantic attribute attention redistribution mechanism, wherein the neural network model comprises a visual-semantic space mapping branch, a visual-hidden layer space mapping branch and an attention branch, and when the neural network model is combined with the three branches to carry out image forward derivation, the semantic features of an image in a semantic attribute space, the hidden layer features in the hidden layer space and the attention in the semantic attribute space are respectively obtained;
(2) re-assigning weights between semantic features using attention of a semantic attribute space;
(3) training a neural network model using the labeled image data set;
(4) inputting an image to be tested, calculating the similarity between the weighted semantic features of the image and semantic prototypes of unknown classes, calculating the similarity between the hidden layer features and hidden layer feature prototypes of the unknown classes, and adding the two similarities to obtain the similarity between the test image and each unknown class;
(5) and sorting according to the similarity between the images and the classes, and selecting the class with the maximum similarity as the class prediction of the image.
The zero sample learning method based on semantic attribute attention redistribution is an improved algorithm for joint inference of semantic attribute space and hidden layer space. Compared with the prior algorithm, the combination of the semantic space and the hidden layer space in the method is tighter, and the method comprises the steps that 1) the hidden layer space provides guidance of class information for the semantic attribute space, so that the neural network generates correct and reasonable attention; 2) the semantic attribute space provides an initial method for the prototype construction of the unknown class in the hidden layer space, thereby reducing the defects caused by domain migration. Meanwhile, the model combines the semantic attribute space and the hidden layer space of the redistributed attention to carry out inference, so that the stability of model prediction is greatly improved.
In the step (1), the visual-semantic space mapping branch and the visual-hidden layer space mapping branch use a VGG19 framework network structure as a shared shallow network, and respectively use different full connection layers for feature mapping of different spaces;
the attention branch performs feature extraction on feature maps of different layers in the VGG19 framework network by using a single-layer convolutional neural network with convolution kernel size of 3 and different parameters, and calculates attention of semantic attribute spaces corresponding to the VGG19 feature maps of the different layers by using a feature fusion method.
In the step (1), the specific process of obtaining the semantic features of the image in the semantic attribute space, the hidden layer features in the hidden layer space and the attention in the semantic attribute space is as follows:
extracting image input x using a pre-trained deep convolutional neural networkiDeep visual feature of (theta)iRespectively mapping the deep visual features of the image to a semantic space and a hidden space by using a fully-connected neural network, wherein the calculation formulas of the semantic space and the hidden space are as follows:
Figure BDA0002039097550000041
σi=FC2i)
wherein the content of the first and second substances,
Figure BDA0002039097550000042
representing the vector representation, σ, of an image i in semantic spaceiVector representation, FC, representing image i in hidden layer space1Representing a mapping function, FC, mapped from a visual space to a semantic space2Representing a mapping function that maps from a visual space to a hidden layer space;
selecting an intermediate feature map representation for image i for layer I in a deep convolutional neural network
Figure BDA0002039097550000051
And hidden layer space vector sigmaiThe formula for calculating the semantic attribute attention of the image i at the visual depth of l is as follows:
Figure BDA0002039097550000052
wherein, WlAnd blIs a parameter of a single-layer fully-connected network,
Figure BDA0002039097550000053
is hiddenThe method comprises the following steps of (1) representing a layer vector and representing visual feature fusion when the depth is l, wherein the calculation formula of the feature fusion representation is as follows:
Figure BDA0002039097550000054
wherein, FsqIs a transform function of a matrix, converting a three-dimensional matrix representation of size C x H x W into a two-dimensional matrix representation of C x HW,
Figure BDA0002039097550000055
the representation matrices are summed by channel,
Figure BDA0002039097550000056
is a characteristic diagram
Figure BDA0002039097550000057
K represents the number of channels after feature fusion through a series of convolution results, and the length of the channels is consistent with that of semantic vector representation and hidden layer vector representation; finally, selecting the layer number l belonging to lBThe calculation formula of the attention of the image i in the semantic attribute space is as follows:
Figure BDA0002039097550000058
wherein p isi,lAttention is paid to the semantic attribute of image i at visual depth l.
In the step (2), the attention of the semantic attribute space is used for redistributing the weight among the semantic features, and the calculation formula is as follows:
Figure BDA0002039097550000059
wherein, diag (p)i) Is a diagonal matrix of k x k, with the value on the diagonal being pi
The specific process of the step (3) is as follows:
(3-1) in the data preparation process, the original training data set D is divided into a set consisting of a plurality of triples in advance
Figure BDA00020390975500000510
Wherein for any triplet
Figure BDA00020390975500000511
Figure BDA00020390975500000512
And
Figure BDA00020390975500000513
are from the same class
Figure BDA00020390975500000514
Of the different images of (a) the image,
Figure BDA00020390975500000515
is from and
Figure BDA00020390975500000516
classes of different images
Figure BDA00020390975500000517
The image of (a);
(3-2) in the training process, for each triplet
Figure BDA00020390975500000518
The neural network model trains the neural network by using a mixed loss function L, and the specific calculation formula of the loss function L is as follows:
Figure BDA0002039097550000061
wherein L isFIs a loss function defined in the hidden layer space, LAIs a loss function defined in the semantic attribute space;
The hidden layer space loss function simultaneously maximizes the inter-class distance and minimizes the intra-class distance by using a triple loss function; the specific calculation formula of the hidden layer loss function is as follows:
Figure BDA0002039097550000062
the loss function of the semantic attribute space maximizes the probability of classification in the semantic space using a cross entropy based loss function; the specific calculation formula of the semantic loss function is as follows:
Figure BDA0002039097550000063
where Y is the set of all training classes,
Figure BDA0002039097550000064
is known as class yiThe semantic attribute prototype of (1).
The specific steps of the step (4) are as follows:
(4-1) for input image xiPredicting its semantic vector representation using a trained model
Figure BDA0002039097550000065
Hidden layer vector representation σiAnd its semantic attribute attention pi
(4-2) for any one of the categories yu∈YuWherein Y isuRepresenting classes not covered by training, calculating image x separatelyiAnd type semantic prototype
Figure BDA0002039097550000066
And-like hidden layer prototype
Figure BDA0002039097550000067
Cosine similarity between the semantic attribute space and the hidden layer space; the calculation formula of the cosine similarity of the semantic attribute space is as follows:
Figure BDA0002039097550000068
the calculation formula of the cosine similarity of the hidden layer space is as follows:
Figure BDA0002039097550000069
summing the cosine similarity of the two spaces to obtain an image xiAnd class yuThe similarity between the two is calculated by the formula:
Figure BDA00020390975500000610
the specific steps of the step (5) are as follows:
computing image x using a nearest neighbor search algorithmiIn class set YuClass prediction of
Figure BDA0002039097550000071
The calculation formula is as follows:
Figure BDA0002039097550000072
the zero sample learning algorithm based on semantic attribute attention redistribution has all the advantages of zero sample learning, and can correctly distinguish some difficult samples with semantic ambiguity, such as pigs with speckles and spotted dogs. In practice, the proposed algorithm is found to have much lower variance of the attribute prediction value in the semantic space than the previous zero sample learning algorithm, so that the classification result is influenced by various semantic attributes instead of being dominated by one or a few more prominent attribute predictions when the final retrieval and classification are performed. The algorithm enables the relation between the semantic space and the hidden layer space to be tighter on the basis of the joint inference classification of the semantic space and the hidden layer space, and avoids the problem that the image is correctly classified in one characteristic space and is wrongly classified in the other characteristic space.
The invention also provides a zero sample learning system based on a semantic attribute attention re-allocation mechanism, comprising a computer memory, a computer processor and a computer program stored in the computer memory and executable on the computer processor, wherein the computer memory has stored therein the following modules:
the visual feature module captures the depth visual features of the input image by utilizing a depth convolution neural network;
the vision-semantic mapping module is used for mapping the vision characteristics to a semantic attribute space by utilizing a full-connection neural network;
the vision-hidden layer mapping module is used for mapping the vision characteristics to a hidden layer space by utilizing a full-connection neural network;
the semantic attention module is used for generating attribute attention of a semantic space by utilizing the shallow visual feature of the image and the class information of the hidden space;
the classification retrieval module is used for classifying the images by utilizing the semantic attribute space representation, the hidden layer space representation and the attention of the semantic space of the images;
and the classification generation module is used for outputting a classification result to the outside after the model classification is finished.
Compared with the prior art, the invention has the following beneficial effects:
1. the semantic attribute attention redistribution algorithm provided by the invention has the advantages that through an attention mechanism, the competitive relationship exists among semantic attribute predictions, the classification result is determined by more semantic attributes instead of only depending on part of more prominent semantic attributes, and the error classification on some difficult samples with semantic ambiguity is avoided.
2. The method can avoid the problem of domain migration when the semantic space and the hidden layer space jointly infer classification, so the problem that the classification result tends to train the coverage class when a common zero sample is learned can be avoided.
3. A large number of experiments prove that the model performance superior to other baseline algorithms is demonstrated. The superiority of the model is proved from experiments.
Drawings
FIG. 1 is a schematic overall framework diagram of a zero sample learning method based on a semantic attribute attention redistribution mechanism according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating operation of a semantic attention module of the method according to an embodiment of the present invention;
FIG. 3 is a schematic overall structure diagram of a zero-sample learning system based on a semantic attribute attention re-allocation mechanism according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a predicted distribution of semantic space obtained by using different attention mechanisms according to an embodiment of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
As shown in FIG. 1, the main model of the present invention is divided into three outputs, a visual feature module and three branch modules, which respectively correspond to the image inputs, and the three branches are synchronously placed into the optimization process of the whole model. The method comprises the following specific steps:
(a) learning input image x in zero sample learning training process by visual feature moduleiDepth visual feature of (theta)iThe method comprises the following basic steps:
(a-1) initializing network model parameters using a pre-trained large neural network ResNet 101. Input image xiCenter-clipped image x 'of 224 × 224 size'iAs the actual input to the network.
(a-2) taking the feature vector of the last non-classified layer of the neural network as an image x'iDepth visual feature of (theta)iThe length of the eigenvector is denoted as V.
(b) The visual-semantic mapping module provides mapping from a deep visual space to a semantic space for a zero sample learning process, and the basic steps are as follows:
(b-1) initializing model parametersAnd (4) counting. Initializing a spatial mapping matrix W1∈Rk×VAnd b1∈Rk
(b-2) expressing the visual characteristics θiMapping to a semantic space to obtain a semantic space representation
Figure BDA0002039097550000091
The specific calculation formula is as follows:
Figure BDA0002039097550000092
(c) the visual-hidden layer mapping module provides mapping from a deep visual space to a hidden layer space for a zero sample learning process, and the basic steps are as follows:
(c-1) initializing the model to obtain W, similarly to the step (b-1)2And b2
(c-2), similar to the step (b-2), mapping the visual features, wherein the specific calculation formula is as follows:
σi=W2θi+b2
(d) the semantic-attention module uses partial class information of the hidden layer space and the shallow visual information of the image to redistribute the weight for the output of the semantic space in zero sample learning, as shown in fig. 2, and the basic steps are as follows:
(d-1) selecting a shallow visual feature map of a specific layer l
Figure BDA0002039097550000093
And hidden layer feature representation σi∈RkAs input to the semantic-attention module. Initializing convolutional neural networks
Figure BDA0002039097550000094
Parameter of (2), parameter W of the fully connected network FC3And b3
(d-2) superficial visual feature map
Figure BDA0002039097550000101
Through a series of convolution transformations
Figure BDA0002039097550000102
Obtaining a characteristic diagram
Figure BDA0002039097550000103
Wherein the size of the feature map is H '× W', the number of channels is k, and the hidden layer feature σiThe length of (a) is kept consistent.
(d-3) feature map
Figure BDA0002039097550000104
Hidden layer representation sigma of imagei∈RkChannel-by-channel addition is carried out, and the generated matrix is converted into column vectors channel-by-channel to obtain hidden variables
Figure BDA0002039097550000105
(d-4) obtaining shallow visual characteristics of the set image through the fully connected neural network FC by the hidden variable
Figure BDA0002039097550000106
And hidden layer feature σiThe attention representation p with a visual depth l in semantic spacei,lThe specific calculation formula is as follows:
Figure BDA0002039097550000107
(d-5) selecting four specific network depths l epsilon lBRepeating the steps 1-4, and accumulating the obtained attention to obtain the total semantic attribute attention of the image in the semantic attribute space, wherein the calculation formula is as follows:
Figure BDA0002039097550000108
the training steps of the zero sample learning method based on semantic attribute attention redistribution are as follows:
1. initializing a training data set D { (x)i,yi) In which xiRepresenting an input image, yie.Y represents the class label of the input image, Y represents the set of classes covered by the training set, Y for each classs∈Y,
Figure BDA0002039097550000109
Is the prototype vector of the class in semantic space. Sorting a dataset into sets of triples
Figure BDA00020390975500001010
Wherein
Figure BDA00020390975500001011
And
Figure BDA00020390975500001012
are from the same class
Figure BDA00020390975500001013
Of the different images of (a) the image,
Figure BDA00020390975500001014
is from and
Figure BDA00020390975500001015
classes of different images
Figure BDA00020390975500001016
The image of (2).
2. Selecting a triad pair
Figure BDA00020390975500001017
As the input of the network model, the vector representation of the attention of each image in the semantic space, the hidden layer space and the semantic attribute is obtained.
3. Minimizing the intra-class distance using a triplet-based loss function to simultaneously maximize the inter-class distance. The specific calculation formula of the hidden layer loss function is as follows:
Figure BDA00020390975500001018
a cross entropy based loss function is used to maximize the probability of classification in the semantic space. The specific calculation formula of the semantic loss function is as follows:
Figure BDA0002039097550000111
4. and (5) repeating the steps 2-3 by adopting a gradient descent method, and training the parameters of each module.
5. Using the average hidden layer space representation of the training picture as a class hidden layer prototype of the training coverage class, wherein the calculation formula is as follows:
Figure BDA0002039097550000112
wherein N issRepresents class ysThe number of training samples. S-use of ridge regression to compute hidden vector representations for training uncovered classes
Figure BDA0002039097550000113
The specific calculation formula is as follows:
Figure BDA0002039097550000114
Figure BDA0002039097550000115
the sample classification step of the zero sample learning method based on semantic attribute attention redistribution is as follows:
1. for an input image xiPredicting its semantic vector representation using a trained model
Figure BDA0002039097550000116
Hidden layer vector representation σiAnd its semantic attribute attention pi
2. For any class yu∈YuWherein Y isuRepresenting classes not covered by training, calculating image x separatelyiAnd type semantic prototype
Figure BDA0002039097550000117
And-like hidden layer prototype
Figure BDA0002039097550000118
Cosine similarity in semantic space and hidden layer space. The formula for calculating the cosine similarity of the semantic space is as follows:
Figure BDA0002039097550000119
the calculation formula of the cosine similarity of the hidden layer space is as follows:
Figure BDA00020390975500001110
summing the cosine similarity of the two spaces to obtain an image xiAnd class yuThe similarity between the two is calculated by the formula:
Figure BDA0002039097550000121
3. computing image x using a nearest neighbor search algorithmiIn class set YuClass prediction of
Figure BDA0002039097550000122
The calculation formula is as follows:
Figure BDA0002039097550000123
as shown in fig. 3, a zero sample classification system based on semantic attribute attention redistribution is divided into six modules, which are a visual feature module, a visual-semantic mapping module, a visual-hidden layer mapping module, a semantic-attention module, a classification retrieval module, and a classification generation module.
The method is applied to the following embodiments to achieve the technical effects of the present invention, and detailed steps in the embodiments are not described again.
This embodiment is compared on the three large public data sets AwA2, CUB and SUN with other current leading zero sample learning methods. AwA2 is a coarse-grained, medium-sized dataset from 37322 images of 50 animal categories with 85 user-defined attributes. The CUB is a fine-grained dataset consisting of 11788 images from 200 different birds, with 312 user-defined attributes. The SUN is another fine-grained dataset that includes 14340 images from 717 different scenes, providing 102 user-defined attributes. The data set is divided into two parts: training set and test set, divided differently on different data sets. On the AwA2 data set, 40 animals were used as training set and 10 animals were used as test set; similarly, 150 classes are used as training sets of CUBs, 50 classes are used as test sets of CUBs, 645 classes are used as training sets of SUNs, and 72 classes are used as test sets of SUNs. The evaluation index of this embodiment is the class average identification accuracy, 5 current mainstream zero sample identification algorithms are compared in total, and the overall comparison result is shown in table 1.
TABLE 1
Figure BDA0002039097550000124
Figure BDA0002039097550000131
As can be seen from Table 1, the zero sample learning framework based on semantic attribute attention redistribution provided by the invention obtains the optimal effect under each large evaluation index, and fully shows the superiority of the algorithm of the invention.
To further illustrate that the algorithm proposed by the present invention does suppress some of the more prominent predicted values in semantic space, the present invention compares the experimental comparison of "not using attention mechanism", "using attention mechanism based on sigmoid function" and "using attention mechanism using softmax function" on the CUB data set, and the experimental results are shown in table 2.
TABLE 2
Method Class average accuracy (%) Semantic predictor variance x 10-3
w/o Attention 62.1 2.48
w/Sigmoid Attention 73.5 1.75
w/Softmax Attention 81.1 0.86
As can be seen from table 2, the attention mechanism based on the softmax function achieves the optimal experimental result in the experiment, and it is found that as the variance of the semantic space prediction value decreases, the suppression effect of the prominent prediction value is better, the effect of the model is also better, and the effectiveness of the attention mechanism in zero sample learning is fully explained.
In addition, the kernel function estimation of the semantic predicted value as shown in fig. 4 also reflects that the attention mechanism limits the prediction distribution of the semantic space within a specific range, which fully shows that the algorithm provided by the invention achieves the effect of suppressing the abnormal semantic predicted value by redistributing the attribute weight of the semantic space.
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only specific embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions and equivalents made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (5)

1. A zero sample learning method based on a semantic attribute attention re-allocation mechanism is characterized by comprising the following steps:
(1) establishing a neural network model based on a semantic attribute attention redistribution mechanism, wherein the neural network model comprises a visual-semantic attribute space mapping branch, a visual-hidden layer space mapping branch and an attention branch, so that when the image is subjected to forward derivation of the network, the semantic features of the image in a semantic attribute space, the hidden layer features in a hidden layer space and the attention in the semantic attribute space are respectively obtained;
(2) re-assigning weights between semantic features using attention of a semantic attribute space; the calculation formula is as follows:
Figure FDA0002869838940000011
wherein, diag (p)i) Is a diagonal matrix of k x k, with the value on the diagonal being pi
Figure FDA0002869838940000012
Representing a vector representation of an image i in semantic space;
(3) training a neural network model using the labeled image data set; the specific process is as follows:
(3-1) in the data preparation process, the original training data set D is divided into a set consisting of a plurality of triples in advance
Figure FDA0002869838940000013
Wherein for any triplet
Figure FDA0002869838940000014
Figure FDA0002869838940000015
And
Figure FDA0002869838940000016
are from the same class
Figure FDA0002869838940000017
Of the different images of (a) the image,
Figure FDA0002869838940000018
is from and
Figure FDA0002869838940000019
classes of different images
Figure FDA00028698389400000110
The image of (a);
(3-2) in the training process, for each triplet
Figure FDA00028698389400000111
The neural network model trains the neural network by using a mixed loss function L, and the specific calculation formula of the loss function L is as follows:
Figure FDA00028698389400000112
wherein L isFIs a loss function defined in the hidden layer space, LAIs a loss function defined in the semantic attribute space;
the hidden layer space loss function simultaneously maximizes the inter-class distance and minimizes the intra-class distance by using a triple loss function; the specific calculation formula of the hidden layer loss function is as follows:
Figure FDA0002869838940000021
the loss function of the semantic attribute space maximizes the probability of classification in the semantic space using a cross entropy based loss function; the specific calculation formula of the semantic loss function is as follows:
Figure FDA0002869838940000022
where Y is the set of all training classes,
Figure FDA0002869838940000023
is known as class yiThe semantic attribute prototype of (2);
(4) inputting an image to be tested, calculating the similarity between the weighted semantic features of the image and semantic prototypes of unknown classes, calculating the similarity between the hidden layer features and hidden layer feature prototypes of the unknown classes, and adding the two similarities to obtain the similarity between the test image and each unknown class;
(5) and sorting according to the similarity between the images and the classes, and selecting the class with the maximum similarity as the class prediction of the image.
2. The zero-sample learning method based on semantic attribute attention re-allocation mechanism as claimed in claim 1, wherein in step (1), the visual-semantic space mapping branch and the visual-hidden layer space mapping branch use VGG19 skeleton network structure as shared shallow layer network, and use different fully connected layers for feature mapping of different spaces respectively;
the attention branch performs feature extraction on feature maps of different layers in the VGG19 framework network by using a single-layer convolutional neural network with convolution kernel size of 3 and different parameters, and calculates attention of semantic attribute spaces corresponding to the VGG19 feature maps of the different layers by using a feature fusion method.
3. The zero-sample learning method based on the semantic attribute attention re-allocation mechanism as claimed in claim 1, wherein in the step (1), the specific processes of obtaining the semantic features of the image in the semantic attribute space, the hidden layer features in the hidden layer space and the attention in the semantic attribute space are as follows:
extracting image input x using a pre-trained deep convolutional neural networkiDeep visual feature of (theta)iRespectively mapping the deep visual features of the image to a semantic space and a hidden space by using a fully-connected neural network, wherein the calculation formulas of the semantic space and the hidden space are as follows:
Figure FDA0002869838940000024
σi=FC2i)
wherein the content of the first and second substances,
Figure FDA0002869838940000031
representing the vector representation, σ, of an image i in semantic spaceiVector representation, FC, representing image i in hidden layer space1Representing a mapping function, FC, mapped from a visual space to a semantic space2Representing a mapping function that maps from a visual space to a hidden layer space;
selecting an intermediate feature map representation for image i for layer I in a deep convolutional neural network
Figure FDA0002869838940000032
And hidden layer space vector sigmaiThe formula for calculating the semantic attribute attention of the image i at the visual depth of l is as follows:
Figure FDA0002869838940000033
wherein, WlAnd blIs a parameter of a single-layer fully-connected network,
Figure FDA0002869838940000034
the method is characterized by comprising the following steps of (1) hidden layer vector representation and visual feature fusion representation when the depth is l, wherein the calculation formula of the feature fusion representation is as follows:
Figure FDA0002869838940000035
wherein, FsqIs a transform function of a matrix, converting a three-dimensional matrix representation of size C x H x W into a two-dimensional matrix representation of C x HW,
Figure FDA0002869838940000036
the representation matrices are summed by channel,
Figure FDA0002869838940000037
is a characteristic diagram
Figure FDA0002869838940000038
K represents the number of channels after feature fusion through a series of convolution results, and the length of the channels is consistent with that of semantic vector representation and hidden layer vector representation; finally, selecting the layer number l belonging to lBThe calculation formula of the attention of the image i in the semantic attribute space is as follows:
Figure FDA0002869838940000039
wherein p isi,lAttention is paid to the semantic attribute of image i at visual depth l.
4. The zero-sample learning method based on the semantic attribute attention re-allocation mechanism as claimed in claim 1, wherein the step (4) comprises the following specific steps:
(4-1) for input image xiPredicting its semantic vector representation using a trained model
Figure FDA00028698389400000312
Hidden layer vector representation σiAnd its semantic attribute attention pi
(4-2) for any one of the categories yu∈YuWherein Y isuRepresenting classes not covered by training, calculating image x separatelyiAnd type semantic prototype
Figure FDA00028698389400000310
And-like hidden layer prototype
Figure FDA00028698389400000311
Cosine similarity between the semantic attribute space and the hidden layer space; the calculation formula of the cosine similarity of the semantic attribute space is as follows:
Figure FDA0002869838940000041
in the formula, diag (p)i) Is a diagonal matrix of k x k, with the value on the diagonal being pi
The calculation formula of the cosine similarity of the hidden layer space is as follows:
Figure FDA0002869838940000042
summing the cosine similarity of the two spaces to obtain an image xiAnd class yuThe similarity between the two is calculated by the formula:
Figure FDA0002869838940000043
5. the zero-sample learning method based on the semantic attribute attention re-allocation mechanism as claimed in claim 4, wherein the step (5) comprises the following specific steps:
computing image x using a nearest neighbor search algorithmiIn class set YuClass prediction of
Figure FDA0002869838940000044
The calculation formula is as follows:
Figure FDA0002869838940000045
CN201910335801.6A 2019-04-24 2019-04-24 Zero sample learning method and system based on semantic attribute attention redistribution mechanism Active CN110163258B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910335801.6A CN110163258B (en) 2019-04-24 2019-04-24 Zero sample learning method and system based on semantic attribute attention redistribution mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910335801.6A CN110163258B (en) 2019-04-24 2019-04-24 Zero sample learning method and system based on semantic attribute attention redistribution mechanism

Publications (2)

Publication Number Publication Date
CN110163258A CN110163258A (en) 2019-08-23
CN110163258B true CN110163258B (en) 2021-04-09

Family

ID=67639900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910335801.6A Active CN110163258B (en) 2019-04-24 2019-04-24 Zero sample learning method and system based on semantic attribute attention redistribution mechanism

Country Status (1)

Country Link
CN (1) CN110163258B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866140B (en) * 2019-11-26 2024-02-02 腾讯科技(深圳)有限公司 Image feature extraction model training method, image searching method and computer equipment
CN111222471B (en) * 2020-01-09 2022-07-15 中国科学技术大学 Zero sample training and related classification method based on self-supervision domain perception network
CN111428733B (en) * 2020-03-12 2023-05-23 山东大学 Zero sample target detection method and system based on semantic feature space conversion
CN111461025B (en) * 2020-04-02 2022-07-05 同济大学 Signal identification method for self-evolving zero-sample learning
CN111738313B (en) * 2020-06-08 2022-11-11 大连理工大学 Zero sample learning algorithm based on multi-network cooperation
CN112100380B (en) * 2020-09-16 2022-07-12 浙江大学 Generation type zero sample prediction method based on knowledge graph
CN112257808B (en) * 2020-11-02 2022-11-11 郑州大学 Integrated collaborative training method and device for zero sample classification and terminal equipment
CN112633382B (en) * 2020-12-25 2024-02-13 浙江大学 Method and system for classifying few sample images based on mutual neighbor
CN112686318B (en) * 2020-12-31 2023-08-29 广东石油化工学院 Zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration
CN113077427B (en) * 2021-03-29 2023-04-25 北京深睿博联科技有限责任公司 Method and device for generating class prediction model
CN113326892A (en) * 2021-06-22 2021-08-31 浙江大学 Relation network-based few-sample image classification method
CN113627470B (en) * 2021-07-01 2023-09-05 汕头大学 Zero-order learning-based unknown event classification method for optical fiber early warning system
CN113435531B (en) * 2021-07-07 2022-06-21 中国人民解放军国防科技大学 Zero sample image classification method and system, electronic equipment and storage medium
CN113343941B (en) * 2021-07-20 2023-07-25 中国人民大学 Zero sample action recognition method and system based on mutual information similarity
CN113642621A (en) * 2021-08-03 2021-11-12 南京邮电大学 Zero sample image classification method based on generation countermeasure network
CN114627312B (en) * 2022-05-17 2022-09-06 中国科学技术大学 Zero sample image classification method, system, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679556A (en) * 2017-09-18 2018-02-09 天津大学 The zero sample image sorting technique based on variation autocoder
CN108846413A (en) * 2018-05-21 2018-11-20 复旦大学 A kind of zero sample learning method based on global semantic congruence network
CN109447115A (en) * 2018-09-25 2019-03-08 天津大学 Zero sample classification method of fine granularity based on multilayer semanteme supervised attention model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679556A (en) * 2017-09-18 2018-02-09 天津大学 The zero sample image sorting technique based on variation autocoder
CN108846413A (en) * 2018-05-21 2018-11-20 复旦大学 A kind of zero sample learning method based on global semantic congruence network
CN109447115A (en) * 2018-09-25 2019-03-08 天津大学 Zero sample classification method of fine granularity based on multilayer semanteme supervised attention model

Also Published As

Publication number Publication date
CN110163258A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
CN110163258B (en) Zero sample learning method and system based on semantic attribute attention redistribution mechanism
Ma et al. Facial expression recognition using constructive feedforward neural networks
CN113139591B (en) Generalized zero-sample image classification method based on enhanced multi-mode alignment
WO2014205231A1 (en) Deep learning framework for generic object detection
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
CN111027576B (en) Cooperative significance detection method based on cooperative significance generation type countermeasure network
CN109840518B (en) Visual tracking method combining classification and domain adaptation
CN109242097B (en) Visual representation learning system and method for unsupervised learning
Dev et al. Multi-level semantic labeling of sky/cloud images
Sankaran et al. Representation learning through cross-modality supervision
US8488873B2 (en) Method of computing global-to-local metrics for recognition
CN112507778B (en) Loop detection method of improved bag-of-words model based on line characteristics
CN113705596A (en) Image recognition method and device, computer equipment and storage medium
CN115147632A (en) Image category automatic labeling method and device based on density peak value clustering algorithm
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN114358250A (en) Data processing method, data processing apparatus, computer device, medium, and program product
Sang et al. Image recognition based on multiscale pooling deep convolution neural networks
CN116543250A (en) Model compression method based on class attention transmission
CN116109649A (en) 3D point cloud instance segmentation method based on semantic error correction
CN115661539A (en) Less-sample image identification method embedded with uncertainty information
CN114780767A (en) Large-scale image retrieval method and system based on deep convolutional neural network
CN115049889A (en) Storage medium and inference method
CN114049531A (en) Pedestrian re-identification method based on weak supervision human body collaborative segmentation
Zeng et al. Feature difference for single‐shot object detection
Takahashi et al. Domain Adaptation for Agricultural Image Recognition and Segmentation Using Category Maps

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant