CN112200267A - Zero sample learning classification method based on multi-scale feature fusion - Google Patents

Zero sample learning classification method based on multi-scale feature fusion Download PDF

Info

Publication number
CN112200267A
CN112200267A (application number CN202011190644.3A)
Authority
CN
China
Prior art keywords
scale
features
fine
image
zero sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011190644.3A
Other languages
Chinese (zh)
Inventor
王国威
管乃洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202011190644.3A
Publication of CN112200267A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a zero sample learning classification method based on multi-scale feature fusion. A deep neural network model is used to extract original-scale image features; the feature map extracted by the deep neural network is reprocessed with a recalibration technique; a cropping technique is applied to the processed feature map to generate a fine-scale image; the features of the fine-scale image are then extracted with a parameter-shared deep neural network. The original-scale and fine-scale image features are fused, the fused features are projected into a semantic space and a latent space, and the parameters are optimized and updated with a softmax loss and a triplet loss, respectively. In the prediction stage, images of classes with given semantic information are classified and recognized using the mapping matrix W obtained in the training stage. Because the depth features extracted from the original-scale and fine-scale images participate in training simultaneously, the fused features contain multi-scale information, which improves the robustness of the model and the classification accuracy.

Description

Zero sample learning classification method based on multi-scale feature fusion
Technical Field
The invention relates to the field of deep learning and zero sample learning, and in particular to a zero sample learning classification method based on multi-scale feature fusion.
Background
In recent years, deep learning has grown at an unprecedented pace, driven by the increasing complexity of algorithms and models, the growing computing power of machines, and the availability of large-scale data. In fact, the successful application of deep learning algorithms is inseparable from the support of large amounts of data: for a deep-learning-based classification model to reach high accuracy, a large number of high-quality, manually labeled samples are required for training.
For some common categories, a large number of images can be acquired relatively easily, by collecting them from the web or photographing them in the field. For some rare categories, such as endangered species, not only are the populations extremely small, but the species sometimes live in extreme natural environments, which increases the difficulty of image acquisition and makes it hard to collect large numbers of high-quality samples. For newly emerging classes, the sample size is simply zero. In either case, it is impractical to obtain a data volume sufficient to improve model accuracy, and labeling samples can be prohibitively expensive and time-consuming. Moreover, conventional deep-learning systems can only recognize classes seen in the training phase and cannot recognize a class that never occurred during training.
The most widely studied family of zero sample learning algorithms is the compatibility-based one. It projects the sample image into a visual space and the sample's category into a semantic space, and reduces the discrepancy between the two spaces as much as possible, improving their compatibility; in the end-to-end variant, the original images are used directly for training.
Disclosure of Invention
To address the above deficiencies, the present invention provides a zero sample learning classification method based on multi-scale feature fusion.
The invention is realized by constructing a zero sample learning classification method based on multi-scale feature fusion, comprising the following steps:
s1, extracting the features of the original image by using a deep neural network to obtain a feature map of original size features;
s2, obtaining a fine-scale image by using a recalibration positioning and cutting combination technology for the original image and the feature map obtained in the previous step;
s3, extracting the features of the fine-scale image by using a parameter-shared deep neural network to obtain fine-scale features, and fusing the two features;
s4, projecting the fusion features obtained in the previous step to a semantic space and a hidden space, and respectively adopting softmax loss and triplet loss to carry out parameter optimization;
and S5, repeating the above steps over a number of training epochs to finally obtain a zero sample learning model with strong representation capability.
Further, the cropping is used to perform a cropping operation on the original image, and the cropped area generally contains the whole of the object or a part of the object.
Furthermore, the fine-scale image is an area which reserves the whole original image object and is rich in semantic information.
Further, a target area is automatically cropped based on the recalibrated feature map.
Further, the output computed from the recalibrated feature map is a group of values containing three parameters, which respectively represent the coordinates of the center point of the region to be cropped on the original image and the side length of the region to be cropped.
Further, the target area is designed to be a square.
Furthermore, the image obtained by clipping is converted into the same size as the original image again by bilinear interpolation.
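Purely as an illustration and not part of the original disclosure, the following PyTorch-style sketch shows one way steps S1 to S4 could be wired together. The module interfaces are assumptions: backbone is assumed to return both pooled features and a spatial feature map, and recalibrate and cropper stand for the recalibration and cropping operations detailed in the description below.

```python
import torch
import torch.nn as nn

class MultiScaleZSL(nn.Module):
    """Illustrative sketch of steps S1-S4; module interfaces are assumptions."""
    def __init__(self, backbone, recalibrate, cropper,
                 feat_dim: int, attr_dim: int, latent_dim: int):
        super().__init__()
        self.backbone = backbone        # shared-parameter CNN used at both scales (S1, S3)
        self.recalibrate = recalibrate  # SENet-style channel recalibration (S2)
        self.cropper = cropper          # attention-based crop-and-resize module (S2)
        self.W = nn.Linear(2 * feat_dim, attr_dim, bias=False)  # visual-semantic projection
        self.sigma = nn.Linear(2 * feat_dim, latent_dim)        # visual-latent projection

    def forward(self, x):
        feat_orig, fmap = self.backbone(x)                      # S1: original-scale features
        x_fine, box = self.cropper(x, self.recalibrate(fmap))   # S2: fine-scale image
        feat_fine, _ = self.backbone(x_fine)                    # S3: parameter-shared extraction
        fused = torch.cat([feat_orig, feat_fine], dim=1)        # S3: feature fusion
        return self.W(fused), self.sigma(fused), box            # S4: semantic/latent projections
```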
The invention has the following advantages: through the above improvements, the zero sample learning classification method based on multi-scale feature fusion provides the following beneficial effects compared with methods of the same type:
Advantage 1: by training directly on original-image samples, the model is better adapted to the zero sample learning classification task.
Advantage 2: the attention-based repositioning and cropping techniques are used together: on the basis of the original image, the feature map extracted by the deep neural network is processed to obtain a fine-scale image, fine-scale features are extracted by passing it through the deep neural network again, and the two kinds of features are fused. The fused features greatly improve the representation capability of the model and thus its accuracy.
Advantage 3: the image features are projected into a semantic space and a latent space, and the parameters are optimized and updated with a softmax loss and a triplet loss, respectively. Compared with traditional methods, the learned features are constrained to be discriminative during training, which improves the model's learning capability.
Drawings
Fig. 1 is an explanatory view of the present invention.
Wherein: 1, deep neural network; 2, original image; 3, original-size features; 4, recalibration; 5, cropping; 6, fine-scale image; 7, parameter sharing; 8, semantic space; 9, hidden space; 10, fine-scale features.
Detailed Description
The present invention will be described in detail with reference to fig. 1, and the technical solutions in the embodiments of the present invention will be clearly and completely described, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Basic setting:
Suppose there is a training set composed of N_s samples, D^s = {(x_i^s, y_i^s)}_{i=1}^{N_s}, where x_i^s represents the i-th image of the seen classes and y_i^s ∈ Y^s represents its corresponding class label. The test set consists of N_u samples, D^u = {(x_j^u, y_j^u)}_{j=1}^{N_u}, where x_j^u represents the j-th image of the unseen classes and y_j^u ∈ Y^u represents its class label. The seen and unseen classes are disjoint, Y^s ∩ Y^u = ∅, and Y^s ∪ Y^u = Y. Let ψ(x) = θ(x)^T W denote the mapping of the visual features into the semantic space, regarded as the predicted semantic vector, where θ(x) denotes the visual features and W the mapping matrix. Let σ(x) denote the predicted latent feature vector.
Recalibration operation:
The original image is subjected to a cropping operation, the cropped area typically containing the whole of the object or parts of it. Since the target area should reflect as much attribute information as possible, the goal is to crop out an area that preserves the whole object and is rich in semantic information. To this end, drawing on the SENet approach, a recalibrated feature map is obtained through a set of global scaling operations. Specifically, the first step describes the channel information by computing a set of inter-channel descriptors through global average pooling,

p = [p_1, p_2, ..., p_C] ∈ R^C,   p_c = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} b_c(i, j),

where b_c is the c-th channel of the feature map B. Second, the inter-channel descriptors are passed through two fully connected layers to capture the interrelations between the feature channels,

a = ρ(W_2 f(W_1 p)) ∈ R^C,

where f(·) and ρ(·) denote the ReLU and Sigmoid activation functions, respectively. Because the Sigmoid activation allows a_c to model non-mutually-exclusive relationships among the channels, a is used to re-weight the original feature channels,

M = ρ(F_fc(p)) ⊙ B,

where F_fc(·) denotes the two fully connected layers, ρ(·) the Sigmoid activation function, and ⊙ channel-wise multiplication. The recalibrated feature map strengthens the originally important feature-channel information and weakens the less important channel information, so the feature map carries a more pronounced semantic meaning, ensuring that the subsequent cropping operation starts from a semantically salient target area.
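A minimal PyTorch sketch of this recalibration step, assuming an SENet-style reduction ratio r (not specified in the text):

```python
import torch
import torch.nn as nn

class Recalibration(nn.Module):
    """SENet-style channel recalibration; reduction ratio r is an assumed hyperparameter."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),  # W1
            nn.ReLU(inplace=True),               # f(.)
            nn.Linear(channels // r, channels),  # W2
        )

    def forward(self, b: torch.Tensor) -> torch.Tensor:
        # b: (N, C, H, W) feature map B
        p = b.mean(dim=(2, 3))                    # global average pooling -> p in R^C
        a = torch.sigmoid(self.fc(p))             # a = rho(W2 f(W1 p)) in R^C
        return b * a.unsqueeze(-1).unsqueeze(-1)  # channel-wise re-weighting -> M
```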
Cropping operation:
After the recalibrated feature map is available, the target area is automatically cropped on the feature map. The input of this operation is the recalibrated feature map, and the output is a set of three parameters representing the coordinates of the center point of the area to be cropped on the original image and its side length. To improve computational efficiency, the target area is designed as a square, which can be expressed as

[t_x, t_y, t_l] = F_CM(M),

where t_x and t_y are the x-axis and y-axis coordinates, respectively, t_l is the side length of the square region, and F_CM denotes the cropping function. Once the recalibrated feature map M is obtained, a finer-scale image can be cropped from the original-scale image. Specifically, a two-dimensional continuous mask V(x, y) = V_x · V_y is first generated, where

V_x = f(x - t_x + 0.5 t_l) - f(x - t_x - 0.5 t_l),
V_y = f(y - t_y + 0.5 t_l) - f(y - t_y - 0.5 t_l).

Here the non-linear mapping

f(x) = 1 / (1 + exp(-kx))

is used with k set to 10. The cropped area is then obtained by multiplying the mask with the original image,

x_crop = x ⊙ V.

Afterwards, the cropped image is rescaled to the size of the original image by bilinear interpolation. Learning on the finer-scale image is important for obtaining finer, discriminative visual features, and compensates for the performance shortfall on fine-grained image recognition.
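The cropping step could be sketched in PyTorch as follows; pixel-coordinate crop parameters, the clamping of box bounds, and the per-sample loop are implementation assumptions of this sketch:

```python
import torch
import torch.nn.functional as F

def soft_crop(x: torch.Tensor, tx: torch.Tensor, ty: torch.Tensor,
              tl: torch.Tensor, k: float = 10.0) -> torch.Tensor:
    """x: (N, C, H, W) images; tx, ty, tl: (N,) crop parameters in pixels (assumed)."""
    N, C, H, W = x.shape
    xs = torch.arange(W, device=x.device, dtype=x.dtype).view(1, 1, W)
    ys = torch.arange(H, device=x.device, dtype=x.dtype).view(1, H, 1)

    def f(z):  # f(z) = 1 / (1 + exp(-k z))
        return torch.sigmoid(k * z)

    tx_, ty_, tl_ = (t.view(-1, 1, 1) for t in (tx, ty, tl))
    Vx = f(xs - tx_ + 0.5 * tl_) - f(xs - tx_ - 0.5 * tl_)
    Vy = f(ys - ty_ + 0.5 * tl_) - f(ys - ty_ - 0.5 * tl_)
    V = (Vy * Vx).unsqueeze(1)   # (N, 1, H, W) mask, ~1 inside the square, ~0 outside
    x_masked = x * V             # x_crop = x (elementwise) V

    out = []
    for n in range(N):           # crop each sample's square and upsample it
        x0 = int((tx[n] - 0.5 * tl[n]).clamp(0, W - 2))
        y0 = int((ty[n] - 0.5 * tl[n]).clamp(0, H - 2))
        x1 = max(x0 + 1, int((tx[n] + 0.5 * tl[n]).clamp(1, W)))
        y1 = max(y0 + 1, int((ty[n] + 0.5 * tl[n]).clamp(1, H)))
        patch = x_masked[n:n + 1, :, y0:y1, x0:x1]
        # rescale the cropped region back to the original resolution (bilinear)
        out.append(F.interpolate(patch, size=(H, W), mode='bilinear',
                                 align_corners=False))
    return torch.cat(out, dim=0)
```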
Feature fusion and hybrid projection:
A hybrid projection mode is adopted: the image features are projected into two spaces simultaneously, exploiting both the plausibility of manually defined attributes and the discriminativeness of latent features. There are therefore two mappings, a visual-semantic mapping and a visual-latent mapping.
For the visual-semantic mapping, let φ(y) be the semantic feature (attribute) vector of category y; the compatibility score can then be defined as

s(x, y) = θ(x)^T W φ(y),

where θ(x) represents the visual features and W the visual-semantic mapping matrix to be learned. Treating the compatibility scores as the logits of a softmax, the softmax loss can be expressed as

L_att = -(1/N_s) Σ_{i=1}^{N_s} log [ exp(s(x_i, y_i)) / Σ_{c∈Y^s} exp(s(x_i, c)) ].

For the visual-latent mapping, a triplet loss is adopted to minimize the intra-class distance and maximize the inter-class distance, yielding discriminative latent features,

L_lat = Σ_{(i,j,k)} max(0, ||σ(x_i) - σ(x_j)||_2^2 - ||σ(x_i) - σ(x_k)||_2^2 + mrg),

where x_i, x_j, x_k are the anchor, a positive-class sample, and a negative-class sample, respectively, and mrg represents the separation margin, set to 1.0.
The feature map M of the cropping module is passed through two fully connected layers to obtain the three parameters [t_x, t_y, t_l], where t_x, t_y represent the coordinates of the target area and t_l the side length of the square region. Taking the peak-response coordinate of the image as the center and t_l as the side length defines the ground-truth target area [r_x, r_y, r_l], and the cropping module is trained with an MSE loss,

L_mse = || [t_x, t_y, t_l] - [r_x, r_y, r_l] ||_2^2.

Combining the loss functions of the visual-semantic mapping, the visual-latent mapping, and the cropping network, the overall loss function can be expressed as

L = L_att + α L_lat + β L_mse,

where α and β are balance factors, both set to 1.0.
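A hedged PyTorch sketch of the overall loss L = L_att + α L_lat + β L_mse; the in-batch triplet sampler make_triplets is a hypothetical helper, and cross-entropy over the compatibility logits realizes the softmax loss above:

```python
import torch
import torch.nn.functional as F

def total_loss(sem_pred, latent, boxes, labels, attributes, gt_boxes,
               make_triplets, alpha: float = 1.0, beta: float = 1.0,
               mrg: float = 1.0) -> torch.Tensor:
    """sem_pred: (N, A) predicted semantic vectors psi(x); latent: (N, D) sigma(x);
    boxes, gt_boxes: (N, 3) crop parameters [tx, ty, tl]; attributes: (C, A) phi(y)."""
    # L_att: softmax loss over compatibility scores s(x, y) = psi(x) . phi(y)
    logits = sem_pred @ attributes.t()                 # (N, C) compatibility logits
    l_att = F.cross_entropy(logits, labels)
    # L_lat: triplet loss on the latent projections (sampler is a stand-in)
    anchor, pos, neg = make_triplets(latent, labels)
    l_lat = F.triplet_margin_loss(anchor, pos, neg, margin=mrg)
    # L_mse: regress crop parameters toward the peak-response target region
    l_mse = F.mse_loss(boxes, gt_boxes)
    return l_att + alpha * l_lat + beta * l_mse
```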
Zero sample learning prediction:
Because a hybrid mapping mode is used in the training stage, i.e. the visual-semantic mapping and the visual-latent mapping are learned simultaneously, prediction in the testing stage is correspondingly hybrid. For the visual-semantic mapping, given a test image x whose projection in the semantic space is ψ(x), the goal is to assign it a class label,

ŷ = argmax_{u∈Y^u} s(ψ(x), φ(u)).

For the visual-latent mapping, given a test image x whose projection in the latent space is σ(x), the prototype of a seen class c is the mean of its latent features,

h_c = (1/N_c) Σ_{i: y_i = c} σ(x_i).

For an unseen class u, its relationship to all the seen classes is first computed in the semantic space,

β_{uc} = exp(s(φ(u), φ(c))) / Σ_{c'∈Y^s} exp(s(φ(u), φ(c'))).

Assuming that the unseen class u keeps in the latent space the same relationships it has in the semantic space, its latent prototype is

h_u = Σ_{c∈Y^s} β_{uc} h_c.

The overall hybrid prediction can then be expressed as

ŷ = argmax_{u∈Y^u} [ s(ψ(x), φ(u)) + s(σ(x), h_u) ],

where s(·, ·) is a compatibility function.
The parameters and English abbreviations have the following meanings:
N_s, N_u: numbers of seen-class (training) and unseen-class (test) samples
θ(x): visual features extracted by the deep neural network
W: visual-semantic mapping matrix
ψ(x) = θ(x)^T W: predicted semantic vector
σ(x): predicted latent feature vector
φ(y): semantic feature (attribute) vector of category y
t_x, t_y, t_l: center coordinates and side length of the square region to be cropped
r_x, r_y, r_l: parameters of the ground-truth target area
mrg: separation margin of the triplet loss (set to 1.0)
α, β: balance factors of the overall loss (set to 1.0)
the invention provides a zero sample learning classification method based on multi-scale feature fusion through improvement, and the working principle is as follows;
firstly, extracting features from an original image 2 through a deep neural network 1 to obtain a feature map of original size features 3, wherein the features are called original scale features;
secondly, carrying out recalibration 4 positioning and cutting 5 combined technology on the original scale features to obtain a fine scale image 6;
thirdly, obtaining a fine-scale feature 10 through the fine-scale image 6 by the deep neural network 1 with the parameter sharing 7 and fusing the fine-scale feature with the original-scale feature;
fourthly, projecting the obtained fusion features to a semantic space 8 and a hidden space 9, and respectively adopting softmax loss and triplet loss to carry out parameter optimization.
Fifthly, repeating the steps, setting a plurality of periods for training, and finally obtaining a zero sample learning model with strong characterization capability.
The zero sample learning classification method based on multi-scale feature fusion is provided through improvement, and a model is more adaptive to a zero sample learning classification task by directly training samples of an original image; the repositioning technology and the cutting technology based on the attention mechanism are used in a matched mode, on the basis of an original image, a feature map extracted through a deep neural network is processed to obtain a fine-scale image, fine-scale features are obtained through the deep neural network again, the two features are fused, the characterization capability of the model is greatly improved through the fusion features, and the model precision is improved; compared with the traditional method, the method has the advantages that the learned feature points are restrained to have identifiability in the training stage, and the model learning capacity is improved.

Claims (7)

1. A zero sample learning classification method based on multi-scale feature fusion, characterized by comprising the following steps:
s1, extracting the features of the original image (2) by using the deep neural network (1) to obtain a feature map of the original size features (3);
s2, obtaining a fine-scale image (6) by using a recalibration (4) positioning and cutting (5) combination technology for the original image (2) and the feature map obtained in the previous step;
s3, extracting the features of the fine-scale image (6) by using the parameter-sharing (7) deep neural network (1) to obtain fine-scale features (10), and fusing the two kinds of features;
s4, projecting the fusion features obtained in the previous step to a semantic space (8) and a hidden space (9), and respectively performing parameter optimization by adopting softmax loss and triplet loss;
and S5, repeating the above steps over a number of training epochs to finally obtain a zero sample learning model with strong representation capability.
2. The zero sample learning classification method based on the multi-scale feature fusion as claimed in claim 1, characterized in that: the cropping (5) is used to perform a cropping operation on the original image, the cropped area typically containing the whole of the object or parts of the object.
3. The zero sample learning classification method based on the multi-scale feature fusion as claimed in claim 1, characterized in that: the fine-scale image (6) is an area which reserves the whole object of the original image (2) and is rich in semantic information.
4. The zero sample learning classification method based on the multi-scale feature fusion as claimed in claim 1, characterized in that: a target area is automatically cropped from the feature map after recalibration (4).
5. The zero sample learning classification method based on the multi-scale feature fusion as claimed in claim 1, characterized in that: the output computed from the feature map after recalibration (4) is a group of values containing three parameters, which respectively represent the coordinates of the center point of the area to be cropped on the original image (2) and the side length of the area to be cropped.
6. The zero sample learning classification method based on multi-scale feature fusion as claimed in claim 4, wherein: the target area is designed as a square.
7. The zero sample learning classification method based on multi-scale feature fusion as claimed in claim 4, wherein: the cropped image is re-transformed to the same size as the original image (2) by bilinear interpolation.
CN202011190644.3A 2020-10-30 2020-10-30 Zero sample learning classification method based on multi-scale feature fusion Withdrawn CN112200267A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011190644.3A CN112200267A (en) 2020-10-30 2020-10-30 Zero sample learning classification method based on multi-scale feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011190644.3A CN112200267A (en) 2020-10-30 2020-10-30 Zero sample learning classification method based on multi-scale feature fusion

Publications (1)

Publication Number Publication Date
CN112200267A (en) 2021-01-08

Family

ID=74012167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011190644.3A Withdrawn CN112200267A (en) 2020-10-30 2020-10-30 Zero sample learning classification method based on multi-scale feature fusion

Country Status (1)

Country Link
CN (1) CN112200267A (en)


Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication (application publication date: 20210108)