CN115546553A - Zero sample classification method based on dynamic feature extraction and attribute correction - Google Patents

Zero sample classification method based on dynamic feature extraction and attribute correction

Info

Publication number
CN115546553A
CN115546553A (application CN202211268579.0A)
Authority
CN
China
Prior art keywords
attribute
feature extraction
features
attribute correction
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211268579.0A
Other languages
Chinese (zh)
Inventor
贺喆南
徐浚哲
吕建成
汤臣薇
江姗霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202211268579.0A priority Critical patent/CN115546553A/en
Publication of CN115546553A publication Critical patent/CN115546553A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a zero sample classification method based on dynamic feature extraction and attribute correction, which comprises the following steps: acquiring visual samples and semantic features; constructing a zero sample learning network based on dynamic feature extraction and attribute correction; feeding the visual samples and the semantic features into the network, computing a loss value from the visual sample features and the corrected semantic features, back-propagating the loss value, and repeating these steps until training is finished; verifying the trained zero sample learning network based on dynamic feature extraction and attribute correction, and, if its accuracy is higher than a preset value, proceeding to the next step, otherwise returning to the previous step; and classifying the data set with the trained network. The invention applies different feature extraction methods to attributes of different natures, introduces the concept of attribute correction, and thereby enhances the characterization capability of the network.

Description

Zero sample classification method based on dynamic feature extraction and attribute correction
Technical Field
The invention relates to the field of zero sample identification, in particular to a zero sample classification method based on dynamic feature extraction and attribute correction.
Background
In conventional deep-learning classification research, the training set contains samples for every label in the data set. The model can therefore learn the full distribution of the data from the training set, and its learning effect is verified by its prediction accuracy on the test set. In such a setting, the key premise for verifying the model is that the training set and the test set share the same label space. In some special application scenarios, however, training samples of certain categories are difficult to obtain, or the samples are difficult to label. Because the label information of these categories is absent, a model trained in advance cannot make predictions on them, which greatly limits the application range of deep learning models. To address prediction on new classes, the zero sample learning (zero-shot learning) task was proposed: it requires a model to accurately identify samples of classes never seen in the training set while still recognizing the classes that do appear in it. Enabling the model to acquire knowledge of unseen classes without observing any of their samples greatly widens the application range of deep learning and therefore has high research value.
To study zero sample learning, researchers have proposed and designed several data sets, each containing a large number of visual samples X. The classes of all visual samples consist of two parts. The visible classes are the classes the model can see during training; their number is N_s, and the visual samples belonging to them are denoted X_s. The invisible classes are used by the test set to measure the model's zero sample learning performance; their number is N_u, and the visual samples belonging to them are denoted X_u. Notably, the visible and invisible classes have no overlap and together cover all classes in the data set.
To let the model learn without any sample of a class, researchers introduced the concept of semantic features into the data sets, one semantic feature per class. The semantic features of all classes in a data set are denoted A, which splits into the semantic features of the visible classes and those of the invisible classes; K denotes the dimensionality of the semantic feature vector, and each dimension can be regarded as a specific attribute, so every semantic feature can be expressed as a combination of K attributes. When the zero sample learning model is trained, it can see the visual samples X_s of the visible classes and the semantics A of all classes, including the invisible ones. The aim of zero sample learning is to use the semantic features A as a bridge, so that from the relationship between the semantics of the visible and invisible classes the model learns the relationship between the corresponding visual samples and thereby makes accurate predictions on the invisible-class visual samples in the test set.
Currently, zero sample learning follows three main technical routes:
Prior art 1: learning algorithms based on cross-modal mapping. Visual samples, originally distributed in the visual space, and semantic features, distributed in the semantic space, are mapped into the same space; the distribution of the visual samples is aligned using the semantic features as center points; and in the test stage the invisible-class visual samples are mapped into this space for classification.
The drawback of this method is that the quality of the features extracted from the visual samples cannot be guaranteed: only the global features of the visual samples are aligned with the semantic features, while the extraction and understanding of the samples' local features is neglected, so redundant features of the visual samples interfere with the training of the model and ultimately degrade the performance of the algorithm.
Prior art 2: generation-based methods. These directly address the core problem of zero sample learning, namely that samples of the invisible classes are missing. By generating a large number of invisible-class samples with the semantics as a reference, the zero sample learning task is finally converted into a standard supervised learning task.
The main drawback of this technique is similar to that of prior art 1: global features are used as the feature expression of the visual samples for model training, and the importance of local features is ignored. High-quality generation of invisible-class samples requires the model to generate well the specific attributes related to the semantics, whereas the background parts unrelated to the semantics matter much less; a generation method based on global features does not take this into account, so the generation quality cannot be guaranteed.
Prior art 3: methods based on the attention mechanism. The semantics are decomposed into different attributes, features are extracted from the visual picture attribute by attribute, and the extracted attribute features serve as the feature expression of the picture and are aligned with the semantics. Because the semantics are combinations of different attributes and the attributes are shared across classes, invisible-class visual samples can be predicted well from the features extracted for each attribute.
Although this technical route is the first to consider the importance of local features, it still has two major drawbacks. The first is that the types of attributes are not treated in a targeted way. Semantic attributes can generally be divided into two categories: low-level, texture-based attributes, which usually describe the color or shape of specific parts of the subject of the visual sample and can easily be extracted by a model, and high-level abstract attributes that require understanding of the relevant content, such as the "grass" attribute of an animal, which cannot be captured by low-level texture. Existing schemes uniformly apply a feature extraction method designed for low-level texture attributes to all attributes and give no consideration to high-level abstract attributes. The second drawback is that existing techniques tend to regress toward fixed attribute values for prediction, whereas in practice the semantic expression may vary because different visual samples are taken from different angles and under different lighting. Describing all visual samples of one class with fixed attribute values therefore ignores the variation of the attribute features across visual samples, and the feature extraction effect ultimately suffers.
Disclosure of Invention
Aiming at the above defects in the prior art, the zero sample classification method based on dynamic feature extraction and attribute correction provided by the invention solves the problems that the prior art does not consider high-level abstract attributes and ignores the variation of attribute features across different visual samples.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that: a zero sample classification method based on dynamic feature extraction and attribute correction comprises the following steps:
S1, obtaining a visual sample x and semantic features α;
S2, constructing a zero sample learning network based on dynamic feature extraction and attribute correction;
S3, feeding the visual sample and the semantic features into the zero sample learning network based on dynamic feature extraction and attribute correction, obtaining the visual sample features and the corrected semantic features and building a loss function from them, calculating a loss value according to the loss function and back-propagating it as gradients, and repeating this step until training is finished;
S4, verifying the trained zero sample learning network based on dynamic feature extraction and attribute correction: if the accuracy is higher than a preset value, proceeding to step S5; otherwise, returning to step S3;
S5, classifying the data set with the trained zero sample learning network based on dynamic feature extraction and attribute correction.
Furthermore, the zero sample learning network based on dynamic feature extraction and attribute correction comprises a feature extraction backbone network, an attribute positioning network, an attribute correction network, a scale control unit and a loss value calculation module;
the first output end of the feature extraction backbone network is connected with the first input end of the attribute correction network; the second output end of the feature extraction backbone network is connected with the first input end of the attribute positioning network; the third output end of the feature extraction backbone network is connected with the input end of the scale control unit; a first output end of the scale control unit is connected with a second input end of the attribute correction network; a second output end of the scale control unit is connected with a second input end of the attribute positioning network; and the output end of the attribute positioning network and the output end of the attribute correction network are connected with a loss value calculation module.
Further, the specific implementation manner of step S3 is as follows:
s3-1, positioning the feature attributes of the visual sample through an attribute positioning network and extracting local features and global features;
s3-2, extracting local features and global features required by attribute correction through an attribute correction network;
s3-3, fusing local features and global features extracted by the attribute positioning network and the attribute correction network through the scale control unit to obtain an attribute correction value and visual sample features;
s3-4, correcting the semantic features according to the attribute correction values to obtain corrected semantic features;
s3-5, calculating a loss value according to the distance between the visual sample characteristic and the corrected semantic characteristic; returning the loss value, and updating the zero sample learning network parameters based on dynamic feature extraction and attribute correction.
Further, the specific implementation manner of step S3-1 is as follows:
S3-1-1, passing the visual sample x through the feature extraction backbone network to obtain a visual sample feature map whose shape is R^(C×H×W), where C represents the number of channels of the feature map, i.e., the feature dimension of each pixel; H represents the height of the feature map; W represents the width of the feature map; and R^(C×H×W) represents the shape of the data;
S3-1-2, according to the local-feature formula [formula image not reproduced in the text], obtaining the local features u_L of the visual sample, where i represents the height index of the feature map and j its width index; the attribute map represents the distribution of the attributes over the feature map, K represents the number of attributes, w represents the attention weight, and v represents the specific distribution value of the attributes; the softmax function normalizes the pixel values of the feature map on each channel to between 0 and 1; and φ_v and φ_w represent two convolution layers with a convolution kernel size of 1×1;
S3-1-3, according to the global-feature formula [formula image not reproduced in the text], obtaining the global features u_G of the visual sample, where i' represents the height index of the feature map and j' its width index.
Further, the specific implementation manner of step S3-2 is as follows:
according to the attribute-correction formula [formula image not reproduced in the text], obtaining the local feature t_L of each attribute and the global feature t_G of each attribute, where φ_r represents a convolution layer with a convolution kernel size of 1×1 that computes the attribute correction values; max_{c',d'} represents global max pooling; c' represents the height index of the feature map; and d' represents the width index of the feature map.
Further, the specific implementation manner of step S3-3 is as follows:
S3-3-1, according to the gating formula [formula image not reproduced in the text], obtaining the probability g of whether each attribute is a local attribute or a global attribute, where φ_s represents a convolution layer with a convolution kernel of 1×1; c represents the height index of the feature map; and d represents the width index of the feature map;
S3-3-2, according to the fusion formula [formula image not reproduced in the text], obtaining the attribute correction values and the visual sample feature ψ(x).
Further, the specific implementation manner of step S3-4 is as follows:
according to the semantic-correction formula [formula image not reproduced in the text], obtaining the corrected semantic feature π_m(α), where normalize indicates normalizing the vector length to 1; one term of the formula represents the value of the n-th dimension of the semantic feature of the m-th class, n = 1, 2, ..., K; and the other term represents the n-th dimension of the attribute correction values.
Further, the specific implementation manner of step S3-5 is as follows:
S3-5-1, according to the loss formulas [formula images not reproduced in the text], obtaining the classification loss and the distance loss, where N_B represents the batch size of the visual samples drawn in each round of learning; exp denotes the natural exponential; cos denotes cosine similarity; τ denotes the temperature coefficient; α_y denotes the semantic feature of the class to which sample x_p belongs; ||·||_2^2 denotes the square of the L2 norm; the class set appearing in the formula is the set of visible classes; and α_q is the semantic feature of the q-th visible class;
S3-5-2, according to the total-loss formula [formula image not reproduced in the text], obtaining the difference between the predicted value and the true value of the zero sample learning network based on dynamic feature extraction and attribute correction, i.e., the final loss function;
S3-5-3, calculating a loss value according to the loss function, back-propagating the gradients, and updating the parameters of the zero sample learning network based on dynamic feature extraction and attribute correction.
The invention has the beneficial effects that: the invention classifies semantic attributes and designs a comprehensive attribute feature extraction method. For low-level texture-based attributes, local feature extraction based on the attention mechanism is retained; for high-level abstract attributes based on content understanding, the global features of the visual sample are used as their feature expression. The local and global features are fused, with a gate control unit providing the weights that adjust the ratio of the two kinds of features for each attribute, finally realizing feature extraction for the visual sample. The invention further proposes the concept of attribute correction and designs an attribute correction module that modifies the attribute values so that they are closer to the real expression of the visual sample. The features extracted from the visual sample are aligned with the corrected attributes, which enhances the characterization capability of the network.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of a network architecture according to the present invention;
FIG. 3 is a visualization of an attribute localization module attention mechanism.
Detailed Description
The following description of specific embodiments of the invention is provided to help those skilled in the art understand the invention, but it should be clear that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, various changes are possible without departing from the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations made using the inventive concept are under protection.
As shown in fig. 1, a zero sample classification method based on dynamic feature extraction and attribute correction includes the following steps:
S1, obtaining a visual sample x and semantic features α;
S2, constructing a zero sample learning network based on dynamic feature extraction and attribute correction;
S3, feeding the visual sample and the semantic features into the zero sample learning network based on dynamic feature extraction and attribute correction, obtaining the visual sample features and the corrected semantic features and building a loss function from them, calculating a loss value according to the loss function and back-propagating it as gradients, and repeating this step until training is finished;
S4, verifying the trained zero sample learning network based on dynamic feature extraction and attribute correction: if the accuracy is higher than a preset value, proceeding to step S5; otherwise, returning to step S3;
S5, classifying the data set with the trained zero sample learning network based on dynamic feature extraction and attribute correction (a sketch of this training-and-verification procedure is given below).
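To make steps S1 to S5 concrete, the following PyTorch-style sketch shows one possible training-and-verification loop. It is only an illustration under assumptions: the model is expected to return the visual sample features and the corrected semantics, loss_fn stands for a loss such as the one described in step S3-5, and the function name train_zero_shot, the accuracy threshold of 0.6 and the other hyperparameters are hypothetical and not specified by the patent.

```python
import torch
import torch.nn.functional as F

def train_zero_shot(model, loss_fn, train_loader, val_loader, semantics,
                    acc_threshold=0.6, max_rounds=100, lr=1e-4):
    """Hypothetical driver for steps S1-S5: train until the validation
    accuracy exceeds the preset threshold, then return the model."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_rounds):
        model.train()                                  # step S3: forward pass, loss, gradient return
        for x, y in train_loader:                      # visual samples x and visible-class labels y
            psi_x, corrected_sem = model(x, semantics)
            loss = loss_fn(psi_x, corrected_sem, y)    # scalar loss combining the two terms of S3-5
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        model.eval()                                   # step S4: verify the trained network
        correct, total = 0, 0
        with torch.no_grad():
            for x, y in val_loader:
                psi_x, corrected_sem = model(x, semantics)
                sims = F.cosine_similarity(psi_x.unsqueeze(1), corrected_sem, dim=-1)
                correct += (sims.argmax(dim=1) == y).sum().item()
                total += y.numel()
        if total and correct / total > acc_threshold:  # accuracy above the preset value
            return model                               # step S5: ready to classify the data set
    return model
```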
The specific implementation manner of step S3 is as follows:
s3-1, positioning the feature attributes of the visual sample through an attribute positioning network and extracting local features and global features;
s3-2, extracting local features and global features required by attribute correction through an attribute correction network;
s3-3, fusing local features and global features extracted by the attribute positioning network and the attribute correction network through the scale control unit to obtain an attribute correction value and visual sample features;
s3-4, correcting the semantic features according to the attribute correction values to obtain corrected semantic features;
s3-5, calculating a loss value according to the distance between the visual sample characteristic and the corrected semantic characteristic; returning a loss value, and updating zero sample learning network parameters based on dynamic feature extraction and attribute correction.
The specific implementation manner of the step S3-1 is as follows (a code sketch of one possible reading follows this step):
S3-1-1, passing the visual sample x through the feature extraction backbone network to obtain a visual sample feature map whose shape is R^(C×H×W), where C represents the number of channels of the feature map, i.e., the feature dimension of each pixel; H represents the height of the feature map; W represents the width of the feature map; and R^(C×H×W) represents the shape of the data;
S3-1-2, according to the local-feature formula [formula image not reproduced in the text], obtaining the local features u_L of the visual sample, where i represents the height index of the feature map and j its width index; the attribute map represents the distribution of the attributes over the feature map, K represents the number of attributes, w represents the attention weight, and v represents the specific distribution value of the attributes; the softmax function normalizes the pixel values of the feature map on each channel to between 0 and 1; and φ_v and φ_w represent two convolution layers with a convolution kernel size of 1×1;
S3-1-3, according to the global-feature formula [formula image not reproduced in the text], obtaining the global features u_G of the visual sample, where i' represents the height index of the feature map and j' its width index.
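Because the formula images for the local and global features are not reproduced in this text, the following sketch shows only one plausible reading of step S3-1 under the stated definitions: φ_w and φ_v are 1×1 convolutions, the softmax is taken over the spatial positions of each attribute channel, the local feature u_L is the attention-weighted sum of the value map, and the global feature u_G is taken here as the spatial average. The class name and the exact aggregation are assumptions, not the patent's formulas.

```python
import torch.nn as nn
import torch.nn.functional as F

class AttributeLocalization(nn.Module):
    """Illustrative reading of step S3-1: per-attribute local features via
    spatial softmax attention, plus per-attribute global features."""
    def __init__(self, channels: int, num_attributes: int):
        super().__init__()
        self.phi_w = nn.Conv2d(channels, num_attributes, kernel_size=1)  # attention weights w
        self.phi_v = nn.Conv2d(channels, num_attributes, kernel_size=1)  # attribute values v

    def forward(self, feat_map):                  # feat_map: backbone output of shape (B, C, H, W)
        attn = self.phi_w(feat_map).flatten(2)    # (B, K, H*W)
        attn = F.softmax(attn, dim=-1)            # normalize each attribute channel over positions (i, j)
        vals = self.phi_v(feat_map).flatten(2)    # (B, K, H*W)
        u_local = (attn * vals).sum(dim=-1)       # u_L: attention-weighted sum over i, j -> (B, K)
        u_global = vals.mean(dim=-1)              # u_G: average over all positions i', j' -> (B, K)
        return u_local, u_global, attn
```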
The specific implementation manner of the step S3-2 is as follows (a code sketch of one possible reading follows this step):
according to the attribute-correction formula [formula image not reproduced in the text], obtaining the local feature t_L of each attribute and the global feature t_G of each attribute, where φ_r represents a convolution layer with a convolution kernel size of 1×1 that computes the attribute correction values; max_{c',d'} represents global max pooling; c' represents the height index of the feature map; and d' represents the width index of the feature map.
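Similarly, the following sketch is one possible reading of step S3-2: φ_r is a 1×1 convolution producing per-attribute correction maps, t_G is obtained by global max pooling over positions c', d', and t_L is assumed here to reuse the spatial attention from step S3-1; that reuse is an assumption rather than something stated in the text.

```python
import torch.nn as nn

class AttributeCorrection(nn.Module):
    """Illustrative reading of step S3-2: per-attribute correction features."""
    def __init__(self, channels: int, num_attributes: int):
        super().__init__()
        self.phi_r = nn.Conv2d(channels, num_attributes, kernel_size=1)  # correction-value conv phi_r

    def forward(self, feat_map, attn):            # feat_map: (B, C, H, W); attn: (B, K, H*W) from S3-1
        r = self.phi_r(feat_map).flatten(2)       # (B, K, H*W)
        t_local = (attn * r).sum(dim=-1)          # t_L: attention-weighted local correction -> (B, K)
        t_global = r.max(dim=-1).values           # t_G: global max pooling over positions -> (B, K)
        return t_local, t_global
```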
the specific implementation manner of the step S3-3 is as follows:
s3-3-1, according to the formula:
Figure BDA0003894453150000103
obtaining the probability g of whether the attribute is a local attribute or a global attribute; wherein phi is s A convolutional layer representing a convolution kernel of 1 × 1; c represents the height of the feature map; d represents the width of the feature map;
s3-3-2, according to the formula:
Figure BDA0003894453150000104
obtaining attribute correction values
Figure BDA0003894453150000105
And a visual sample characteristic ψ (x).
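The scale control unit of step S3-3 can be read as a per-attribute gate. Consistent with the embodiment text (global average pooling followed by a sigmoid), the sketch below computes g from φ_s and uses it to mix the global and local branches of both the feature and the correction paths; the exact mixing formula is an assumption, since the formula images are not reproduced.

```python
import torch
import torch.nn as nn

class ScaleControlUnit(nn.Module):
    """Illustrative reading of step S3-3: a per-attribute gate g in [0, 1]
    mixes the global and local branches."""
    def __init__(self, channels: int, num_attributes: int):
        super().__init__()
        self.phi_s = nn.Conv2d(channels, num_attributes, kernel_size=1)  # gate conv phi_s

    def forward(self, feat_map, u_local, u_global, t_local, t_global):
        # g: probability that each attribute is a global attribute, shape (B, K);
        # global average pooling followed by a sigmoid, as in the embodiment text
        g = torch.sigmoid(self.phi_s(feat_map).mean(dim=(2, 3)))
        psi_x = g * u_global + (1.0 - g) * u_local       # visual sample feature psi(x)
        correction = g * t_global + (1.0 - g) * t_local  # attribute correction values
        return psi_x, correction, g
```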
The specific implementation manner of step S3-4 is as follows (a code sketch of one possible reading follows this step):
according to the semantic-correction formula [formula image not reproduced in the text], obtaining the corrected semantic feature π_m(α), where normalize indicates normalizing the vector length to 1; one term of the formula represents the value of the n-th dimension of the semantic feature of the m-th class, n = 1, 2, ..., K; and the other term represents the n-th dimension of the attribute correction values.
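For step S3-4 the text only states that the corrected semantics are normalized to unit length; how the correction values enter each dimension is in the missing formula image. The sketch below assumes a simple element-wise combination before normalization and is illustrative only.

```python
import torch.nn.functional as F

def correct_semantics(semantics, correction):
    """Illustrative reading of step S3-4.
    semantics:  (M, K) semantic features of the M classes
    correction: (B, K) attribute correction values from step S3-3
    Returns corrected semantics of shape (B, M, K), normalized to unit length.
    The element-wise addition is an assumption; only the normalization of the
    vector length to 1 is stated explicitly in the text."""
    corrected = semantics.unsqueeze(0) + correction.unsqueeze(1)  # combine per dimension n
    return F.normalize(corrected, p=2, dim=-1)                    # normalize vector length to 1
```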
The specific implementation manner of the step S3-5 is as follows (a code sketch of one possible reading follows this step):
S3-5-1, according to the loss formulas [formula images not reproduced in the text], obtaining the classification loss and the distance loss, where N_B represents the batch size of the visual samples drawn in each round of learning; exp denotes the natural exponential; cos denotes cosine similarity; τ denotes the temperature coefficient; α_y denotes the semantic feature of the class to which sample x_p belongs; ||·||_2^2 denotes the square of the L2 norm; the class set appearing in the formula is the set of visible classes; and α_q is the semantic feature of the q-th visible class;
S3-5-2, according to the total-loss formula [formula image not reproduced in the text], obtaining the difference between the predicted value and the true value of the zero sample learning network based on dynamic feature extraction and attribute correction, i.e., the final loss function;
S3-5-3, calculating a loss value according to the loss function, back-propagating the gradients, and updating the parameters of the zero sample learning network based on dynamic feature extraction and attribute correction.
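From the symbols listed for step S3-5 (batch size N_B, cosine similarity, temperature coefficient τ, squared L2 norm, visible-class semantics α_q), the two losses appear to be a temperature-scaled cosine-similarity classification loss over the visible classes and an L2 distance loss to the corrected semantics of the true class. The sketch below implements that reading; the equal-weight sum of the two terms is an assumption.

```python
import torch
import torch.nn.functional as F

def zero_shot_loss(psi_x, corrected_sem, labels, tau=0.1):
    """Illustrative reading of step S3-5.
    psi_x:         (B, K) visual sample features
    corrected_sem: (B, M, K) corrected semantics of the M visible classes
    labels:        (B,) ground-truth visible-class indices
    Returns a scalar loss combining the classification and distance terms."""
    # classification loss: temperature-scaled cosine similarities over the visible classes
    sims = F.cosine_similarity(psi_x.unsqueeze(1), corrected_sem, dim=-1)   # (B, M)
    cls_loss = F.cross_entropy(sims / tau, labels)
    # distance loss: squared L2 distance to the corrected semantics of the true class
    target = corrected_sem[torch.arange(psi_x.size(0)), labels]             # (B, K)
    dist_loss = ((psi_x - target) ** 2).sum(dim=-1).mean()
    # equal-weight sum of the two terms (the weighting is an assumption)
    return cls_loss + dist_loss
```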
As shown in fig. 2, the zero-sample learning network based on dynamic feature extraction and attribute correction includes a feature extraction backbone network, an attribute positioning network, an attribute correction network, a scale control unit, and a loss value calculation module;
the first output end of the feature extraction backbone network is connected with the first input end of the attribute correction network; the second output end of the feature extraction backbone network is connected with the first input end of the attribute positioning network; the third output end of the feature extraction backbone network is connected with the input end of the scale control unit; a first output end of the scale control unit is connected with a second input end of the attribute correction network; a second output end of the scale control unit is connected with a second input end of the attribute positioning network; and the output end of the attribute positioning network and the output end of the attribute correction network are connected with a loss value calculation module.
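Putting the pieces of fig. 2 together, one possible end-to-end wiring is sketched below. It reuses the illustrative modules from the step-by-step sketches above and assumes a torchvision ResNet-101 backbone; neither the backbone choice nor the class name DFEACNet is mandated by the patent. A model built this way can be passed, together with the loss sketch of step S3-5, to the training loop sketched after steps S1 to S5 above.

```python
import torch.nn as nn
import torchvision

class DFEACNet(nn.Module):
    """Illustrative wiring of the network of fig. 2: backbone -> attribute
    positioning and attribute correction branches -> scale control unit ->
    semantic correction; the outputs are handed to the loss module."""
    def __init__(self, num_attributes: int):
        super().__init__()
        resnet = torchvision.models.resnet101(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # feature-map extractor
        self.localize = AttributeLocalization(2048, num_attributes)   # attribute positioning network
        self.correct = AttributeCorrection(2048, num_attributes)      # attribute correction network
        self.gate = ScaleControlUnit(2048, num_attributes)            # scale control unit

    def forward(self, x, semantics):
        feat_map = self.backbone(x)                                   # (B, 2048, H, W)
        u_local, u_global, attn = self.localize(feat_map)             # step S3-1
        t_local, t_global = self.correct(feat_map, attn)              # step S3-2
        psi_x, correction, _ = self.gate(feat_map, u_local, u_global,
                                         t_local, t_global)           # step S3-3
        corrected_sem = correct_semantics(semantics, correction)      # step S3-4
        return psi_x, corrected_sem                                   # used by the loss (step S3-5)
```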
As shown in fig. 3, SUN denotes a scene understanding dataset and CUB denotes a fine-grained bird classification dataset. It can be seen that the model localizes local features very accurately, for example different parts of a bird's body, or still water and fences in complex scenes. In addition, for complex attributes that require content understanding, such as the open field attribute in the SUN dataset, the model assigns a higher attention weight to the whole picture, which conforms to the definition of an open field.
In one embodiment of the invention, the softmax function normalizes the pixel values of the feature map on each channel to between 0 and 1, thereby representing the attention weight; pixels with high values are more important. Global max pooling can be regarded as a special case of the attention mechanism in which exactly one pixel has weight 1 and all other pixels have weight 0. The zero sample learning network based on dynamic feature extraction and attribute correction integrates the features of each attribute over the image through global average pooling to obtain a score judging whether the attribute is a global or a local attribute of the image, and finally a sigmoid function normalizes this score to between 0 and 1. The classification loss draws together, in cosine similarity, the visual sample features extracted by the attribute positioning module and the semantic features corrected by the attribute correction module; this is alignment at the level of the whole semantics. The distance loss directly requires the sample features and the corrected semantics to be identical in every dimension; this is alignment at the attribute level.
In terms of quantitative analysis, the method of the invention achieves higher prediction accuracy on the test set than the prior art, as shown in Table 1.
TABLE 1
[table image not reproduced in the text: prediction accuracy S on the visible classes, accuracy U on the invisible classes, and harmonic mean H, for the invention and for prior-art methods]
The zero sample learning task is quantified by three indicators: the prediction accuracy S of the zero sample learning network based on dynamic feature extraction and attribute correction on the visible classes, its prediction accuracy U on the invisible classes, and the harmonic mean H of the two accuracies; in general, the higher the harmonic mean, the better the overall performance of the algorithm. As can be seen from Table 1, the invention improves the harmonic mean of the accuracies considerably compared with the prior art, which demonstrates its superiority.
In terms of qualitative analysis, the visualization of the attention mechanism shows that the invention achieves a good effect on the key task of attribute feature extraction.
The invention classifies semantic attributes and designs a comprehensive attribute feature extraction method. For low-level texture-based attributes, local feature extraction based on the attention mechanism is retained; for high-level abstract attributes based on content understanding, the global features of the visual sample are adopted as their feature expression. The local and global features are fused, with a gate control unit providing the weights that adjust the ratio of the two kinds of features for each attribute, finally realizing feature extraction for the visual sample. The invention further proposes the concept of attribute correction and designs an attribute correction module that modifies the attribute values so that they are closer to the real expression of the visual sample. The features extracted from the visual sample are aligned with the corrected attributes, which enhances the characterization capability of the network.

Claims (8)

1. A zero sample classification method based on dynamic feature extraction and attribute correction is characterized by comprising the following steps:
S1, obtaining a visual sample x and semantic features α;
S2, constructing a zero sample learning network based on dynamic feature extraction and attribute correction;
S3, feeding the visual sample and the semantic features into the zero sample learning network based on dynamic feature extraction and attribute correction, obtaining the visual sample features and the corrected semantic features and building a loss function from them, calculating a loss value according to the loss function and back-propagating it as gradients, and repeating this step until training is finished;
S4, verifying the trained zero sample learning network based on dynamic feature extraction and attribute correction: if the accuracy is higher than a preset value, proceeding to step S5; otherwise, returning to step S3;
S5, classifying the data set with the trained zero sample learning network based on dynamic feature extraction and attribute correction.
2. The zero sample classification method based on the dynamic feature extraction and the attribute correction as claimed in claim 1, wherein the zero sample learning network based on the dynamic feature extraction and the attribute correction comprises a feature extraction backbone network, an attribute positioning network, an attribute correction network, a scale control unit and a loss value calculation module;
a first output end of the characteristic extraction backbone network is connected with a first input end of the attribute correction network; the second output end of the feature extraction backbone network is connected with the first input end of the attribute positioning network; the third output end of the feature extraction backbone network is connected with the input end of the scale control unit; a first output end of the scale control unit is connected with a second input end of the attribute correction network; a second output end of the scale control unit is connected with a second input end of the attribute positioning network; and the output end of the attribute positioning network and the output end of the attribute correction network are connected with a loss value calculation module.
3. The zero sample classification method based on dynamic feature extraction and attribute correction as claimed in claim 2, wherein the specific implementation manner of step S3 is as follows:
s3-1, positioning the feature attributes of the visual sample through an attribute positioning network and extracting local features and global features;
s3-2, extracting local features and global features required by attribute correction through an attribute correction network;
s3-3, fusing local features and global features extracted by the attribute positioning network and the attribute correction network through the scale control unit to obtain an attribute correction value and visual sample features;
s3-4, correcting the semantic features according to the attribute correction values to obtain corrected semantic features;
s3-5, calculating a loss value according to the distance between the visual sample characteristic and the corrected semantic characteristic; returning a loss value, and updating zero sample learning network parameters based on dynamic feature extraction and attribute correction.
4. The zero sample classification method based on dynamic feature extraction and attribute correction as claimed in claim 3, wherein the specific implementation manner of step S3-1 is as follows:
S3-1-1, passing the visual sample x through the feature extraction backbone network to obtain a visual sample feature map whose shape is R^(C×H×W), where C represents the number of channels of the feature map, i.e., the feature dimension of each pixel; H represents the height of the feature map; W represents the width of the feature map; and R^(C×H×W) represents the shape of the data;
S3-1-2, according to the local-feature formula [formula image not reproduced in the text], obtaining the local features u_L of the visual sample, where i represents the height index of the feature map and j its width index; the attribute map represents the distribution of the attributes over the feature map, K represents the number of attributes, w represents the attention weight, and v represents the specific distribution value of the attributes; the softmax function normalizes the pixel values of the feature map on each channel to between 0 and 1; and φ_v and φ_w represent two convolution layers with a convolution kernel size of 1×1;
S3-1-3, according to the global-feature formula [formula image not reproduced in the text], obtaining the global features u_G of the visual sample, where i' represents the height index of the feature map and j' its width index.
5. The zero sample classification method based on dynamic feature extraction and attribute correction as claimed in claim 4, wherein the specific implementation manner of step S3-2 is as follows:
according to the attribute-correction formula [formula image not reproduced in the text], obtaining the local feature t_L of each attribute and the global feature t_G of each attribute, where φ_r represents a convolution layer with a convolution kernel size of 1×1 that computes the attribute correction values; max_{c',d'} represents global max pooling; c' represents the height index of the feature map; and d' represents the width index of the feature map.
6. The zero sample classification method based on dynamic feature extraction and attribute correction as claimed in claim 5, wherein the specific implementation manner of step S3-3 is as follows:
S3-3-1, according to the gating formula [formula image not reproduced in the text], obtaining the probability g of whether each attribute is a local attribute or a global attribute, where φ_s represents a convolution layer with a convolution kernel of 1×1; c represents the height index of the feature map; and d represents the width index of the feature map;
S3-3-2, according to the fusion formula [formula image not reproduced in the text], obtaining the attribute correction values and the visual sample feature ψ(x).
7. The zero sample classification method based on dynamic feature extraction and attribute correction as claimed in claim 6, wherein the specific implementation manner of step S3-4 is as follows:
according to the semantic-correction formula [formula image not reproduced in the text], obtaining the corrected semantic feature π_m(α), where normalize indicates normalizing the vector length to 1; one term of the formula represents the value of the n-th dimension of the semantic feature of the m-th class, n = 1, 2, ..., K; and the other term represents the n-th dimension of the attribute correction values.
8. The zero sample classification method based on dynamic feature extraction and attribute correction as claimed in claim 7, wherein the specific implementation manner of step S3-5 is as follows:
S3-5-1, according to the loss formulas [formula images not reproduced in the text], obtaining the classification loss and the distance loss, where N_B represents the batch size of the visual samples drawn in each round of learning; exp denotes the natural exponential; cos denotes cosine similarity; τ denotes the temperature coefficient; α_y denotes the semantic feature of the class to which sample x_p belongs; ||·||_2^2 denotes the square of the L2 norm; the class set appearing in the formula is the set of visible classes; and α_q is the semantic feature of the q-th visible class;
S3-5-2, according to the total-loss formula [formula image not reproduced in the text], obtaining the difference between the predicted value and the true value of the zero sample learning network based on dynamic feature extraction and attribute correction, i.e., the final loss function;
S3-5-3, calculating a loss value according to the loss function, back-propagating the gradients, and updating the parameters of the zero sample learning network based on dynamic feature extraction and attribute correction.
CN202211268579.0A 2022-10-17 2022-10-17 Zero sample classification method based on dynamic feature extraction and attribute correction Pending CN115546553A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211268579.0A CN115546553A (en) 2022-10-17 2022-10-17 Zero sample classification method based on dynamic feature extraction and attribute correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211268579.0A CN115546553A (en) 2022-10-17 2022-10-17 Zero sample classification method based on dynamic feature extraction and attribute correction

Publications (1)

Publication Number Publication Date
CN115546553A (en) 2022-12-30

Family

ID=84736103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211268579.0A Pending CN115546553A (en) 2022-10-17 2022-10-17 Zero sample classification method based on dynamic feature extraction and attribute correction

Country Status (1)

Country Link
CN (1) CN115546553A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274717A (en) * 2023-10-24 2023-12-22 中国人民解放军空军预警学院 Ballistic target identification method based on global and local visual feature mapping network
CN117274717B (en) * 2023-10-24 2024-07-02 中国人民解放军空军预警学院 Ballistic target identification method based on global and local visual feature mapping network
CN117388893A (en) * 2023-12-11 2024-01-12 深圳市移联通信技术有限责任公司 Multi-device positioning system based on GPS
CN117388893B (en) * 2023-12-11 2024-03-12 深圳市移联通信技术有限责任公司 Multi-device positioning system based on GPS

Similar Documents

Publication Publication Date Title
CN109949317B (en) Semi-supervised image example segmentation method based on gradual confrontation learning
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN109993072B (en) Low-resolution pedestrian re-identification system and method based on super-resolution image generation
CN111079847B (en) Remote sensing image automatic labeling method based on deep learning
CN110807434A (en) Pedestrian re-identification system and method based on combination of human body analysis and coarse and fine particle sizes
US20230162522A1 (en) Person re-identification method of integrating global features and ladder-shaped local features and device thereof
CN113177559B (en) Image recognition method, system, equipment and medium combining breadth and dense convolutional neural network
CN111738113A (en) Road extraction method of high-resolution remote sensing image based on double-attention machine system and semantic constraint
CN111738355A (en) Image classification method and device with attention fused with mutual information and storage medium
CN114119585B (en) Method for identifying key feature enhanced gastric cancer image based on Transformer
CN115761757A (en) Multi-mode text page classification method based on decoupling feature guidance
Pei et al. Consistency guided network for degraded image classification
CN112149689B (en) Unsupervised domain adaptation method and system based on target domain self-supervised learning
CN112712052A (en) Method for detecting and identifying weak target in airport panoramic video
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN115391625A (en) Cross-modal retrieval method and system based on multi-granularity feature fusion
CN114820655A (en) Weak supervision building segmentation method taking reliable area as attention mechanism supervision
CN110659601A (en) Depth full convolution network remote sensing image dense vehicle detection method based on central point
CN110991374B (en) Fingerprint singular point detection method based on RCNN
CN112819837A (en) Semantic segmentation method based on multi-source heterogeneous remote sensing image
CN115546553A (en) Zero sample classification method based on dynamic feature extraction and attribute correction
CN113255701B (en) Small sample learning method and system based on absolute-relative learning framework
CN105740879B (en) The zero sample image classification method based on multi-modal discriminant analysis
CN111144466B (en) Image sample self-adaptive depth measurement learning method
CN110533074B (en) Automatic image category labeling method and system based on double-depth neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination