CN114565045A - Remote sensing target detection knowledge distillation method based on feature separation attention - Google Patents

Remote sensing target detection knowledge distillation method based on feature separation attention

Info

Publication number
CN114565045A
CN114565045A
Authority
CN
China
Prior art keywords
attention
mask
feature
foreground
background
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210194931.4A
Other languages
Chinese (zh)
Inventor
赵丹培
袁智超
苑博
史振威
张浩鹏
姜志国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202210194931.4A priority Critical patent/CN114565045A/en
Publication of CN114565045A publication Critical patent/CN114565045A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing target detection knowledge distillation method based on feature separation attention, which comprises the following steps: extracting feature attention maps from the feature maps output by a teacher network and a student network, respectively; separating the foreground region and the background region of the feature map, calculating a foreground attention mask through the feature attention map of the teacher network, and calculating a background attention mask through the feature attention map of the student network; calculating the L2 distillation loss using the foreground attention mask and the background attention mask; and migrating the knowledge of the teacher network to the student network based on the L2 distillation loss. The method can effectively select the regions to be distilled and improve distillation efficiency, improving the detection precision of the final lightweight target detection network without changing the student network structure or increasing computation cost.

Description

Remote sensing target detection knowledge distillation method based on feature separation attention
Technical Field
The invention relates to the technical field of knowledge distillation, in particular to a remote sensing target detection knowledge distillation method based on feature separation attention.
Background
The emergence of large-scale high-resolution remote sensing image datasets has enabled deep learning to be widely applied to remote sensing image target detection. However, high-precision algorithms have high time complexity and depend on high-performance graphics processors. In practical engineering applications of remote sensing target detection, embedded systems are the more common platform. Some lightweight deep learning target detection algorithms already exist; although they run fast, their detection precision still cannot meet task requirements.
At present, some knowledge distillation methods are used to improve the performance of deep neural networks, for example by performing transfer learning on the outputs, feature maps, and information flows of a network. However, most of these studies focus on image classification. In target detection, the information to be migrated by knowledge distillation is not clearly defined, and since the proportion of background regions in target detection data is much higher than in classification data, directly applying image-classification-style distillation suffers severe interference from the background and cannot achieve the desired effect.
Therefore, how to provide a knowledge distillation method capable of effectively extracting a region to be distilled, without increasing the calculation consumption, and improving the detection accuracy of a lightweight target detection network is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a remote sensing target detection knowledge distillation method based on feature separation attention, which can effectively select a region to be distilled, improve distillation efficiency, and improve detection accuracy of a final lightweight target detection network on the premise of not changing a student network structure and not increasing calculation consumption.
In order to achieve the purpose, the invention adopts the following technical scheme:
a remote sensing target detection knowledge distillation method based on feature separation attention comprises the following steps:
respectively extracting feature attention diagrams of feature diagrams output by a teacher network and a student network;
separating a foreground area and a background area of the feature map, calculating a foreground attention mask through a feature attention map of a teacher network, and calculating a background attention mask through a feature attention map of a student network;
calculating the L2 distillation loss using the foreground attention mask and the background attention mask;
the knowledge of the teacher network was migrated to the student network based on the L2 distillation loss.
Preferably, in the above remote sensing target detection knowledge distillation method based on feature separation attention, the feature attention map includes spatial attention and channel attention; wherein the spatial attention is calculated as:

$$G_s(A)_{i,j} = \frac{1}{C}\sum_{k=1}^{C}\left|A_{k,i,j}\right|$$

and the channel attention is calculated as:

$$G_c(A)_{k} = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\left|A_{k,i,j}\right|$$

wherein $G_s$ represents the spatial attention, characterizing the degree of importance of each pixel position across the channels; $G_c$ represents the channel attention, characterizing the degree of importance of each channel of the feature map; H, W and C represent the height, width and number of channels of the feature map, respectively; and $A_{k,i,j}$ represents the value of feature map A at the (i, j) coordinate position of its k-th channel.
Preferably, in the above remote sensing target detection knowledge distillation method based on feature separation attention, calculating the foreground attention mask through the feature attention map of the teacher network includes:
calculating the spatial attention and channel attention of the feature map $A^t$ output by the teacher network;
performing probability normalization on the spatial attention and the channel attention of $A^t$, respectively, using the Softmax function;
multiplying the probability-normalized spatial attention and channel attention of $A^t$ by the labeled foreground mask $S_f$ to obtain the weighted foreground attention mask $J_f$.
Preferably, in the above remote sensing target detection knowledge distillation method based on feature separation attention, the foreground attention mask is calculated as:

$$\hat{G}_s(A^t) = \mathrm{softmax}\!\left(G_s(A^t)\right)$$

$$\hat{G}_c(A^t) = \mathrm{softmax}\!\left(G_c(A^t)\right)$$

$$J_f = S_f \cdot \hat{G}_s(A^t) \cdot \hat{G}_c(A^t)$$

wherein the probability normalization function

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

normalizes the variable z into probabilities in (0, 1) whose sum over all entries is 1; H, W and C represent the height, width and number of channels of the feature map $A^t$, respectively; $\hat{G}_s(A^t)$ and $\hat{G}_c(A^t)$ represent the probability-normalized spatial attention and channel attention of the teacher network, respectively; and $S_f$ is the foreground mask, which takes the value 1 in the foreground region and 0 in the background region.
Preferably, in the above remote sensing target detection knowledge distillation method based on feature separation attention, the method further includes: normalizing the foreground mask $S_f$ according to the size of the target to obtain a normalized foreground mask $\hat{S}_f$, and using $\hat{S}_f$ in place of $S_f$ when calculating the foreground attention mask; the normalized foreground mask is calculated as:

$$\hat{S}_{f,i,j} = \begin{cases} \dfrac{1}{s_t}, & (i,j) \in t,\ t \in T \\ 0, & \text{otherwise} \end{cases}$$

where T represents the set of all targets, t represents each target, and $s_t$ represents the area of the region of target t.
Preferably, in the above remote sensing target detection knowledge distillation method based on feature separation attention, calculating the background attention mask through the feature attention map of the student network includes:
calculating the spatial attention and channel attention of the feature map $A^s$ output by the student network;
performing probability normalization on the spatial attention and the channel attention of $A^s$, respectively, using the Softmax function;
multiplying the probability-normalized spatial attention and channel attention of $A^s$ by the labeled background mask $S_b$ to obtain the weighted background attention mask $J_b$.
Preferably, in the above remote sensing target detection knowledge distillation method based on feature separation attention, the background attention mask $J_b$ is calculated as:

$$\hat{G}_s(A^s) = \mathrm{softmax}\!\left(G_s(A^s)\right)$$

$$\hat{G}_c(A^s) = \mathrm{softmax}\!\left(G_c(A^s)\right)$$

$$J_b = S_b \cdot \hat{G}_s(A^s) \cdot \hat{G}_c(A^s)$$

wherein H, W and C represent the height, width and number of channels of the feature map $A^s$, respectively; $\hat{G}_s(A^s)$ and $\hat{G}_c(A^s)$ represent the probability-normalized spatial attention and channel attention of the student network, respectively; and $S_b$ is the background mask, which takes the value 0 in the foreground region and 1 in the background region.
Preferably, in the above remote sensing target detection knowledge distillation method based on feature separation attention, the method further includes: normalizing the background mask $S_b$ according to the area of the background region to obtain a normalized background mask $\hat{S}_b$, and using $\hat{S}_b$ in place of $S_b$ when calculating the background attention mask; the normalized background mask is calculated as:

$$\hat{S}_{b,i,j} = \frac{S_{b,i,j}}{HW - \sum_{t \in T} s_t}$$

where T represents the set of all targets, t represents each target, and $s_t$ represents the area of the region of target t, so that the denominator is the area of the background region.
Preferably, in the above remote sensing target detection knowledge distillation method based on feature separation attention, the L2 distillation loss is calculated as:

$$L_d = \delta\, L_2(A^s \cdot J_f,\ A^t \cdot J_f) + \varepsilon\, L_2(A^s \cdot J_b,\ A^t \cdot J_b)$$

where δ and ε are parameters controlling the relative weights of the foreground attention mask loss and the background attention mask loss, $A^s$ and $A^t$ represent the feature maps of the student network and the teacher network respectively, and $J_f$ and $J_b$ represent the foreground attention mask and the background attention mask respectively; the L2 loss function computes the Euclidean distance between two vectors X and Y:

$$L_2(X, Y) = \sqrt{\sum_{i=1}^{n}\left(x_i - y_i\right)^2}$$

where $x_i$ and $y_i$ represent the i-th components of the n-dimensional vectors X and Y, respectively.
According to the above technical solution, compared with the prior art, the invention provides a remote sensing target detection knowledge distillation method based on feature separation attention. First, the foreground and background regions of the feature map are separated and their attention maps are extracted separately. Feature separation uses masks: the coordinates of the region where each target is located are extracted from the annotation information and mapped onto the feature map to be distilled according to its resolution. For the foreground, a foreground mask is used, i.e. only the regions where targets are located are considered; for the background, a background mask is used, i.e. only the parts outside the target regions are considered. Spatial attention and channel attention are then extracted for the foreground and background regions respectively. Because the teacher network is stronger, its output responds more strongly than the student network's in foreground regions containing targets, and distilling the teacher's foreground response improves the student network's ability to judge foreground targets; the foreground attention mask is therefore calculated from the feature attention map of the teacher network. The background, by contrast, may contain erroneous responses: since the feature extraction capability of the student network is relatively weak, erroneous responses in the background region of its output feature map are more prominent than in the teacher's, and distilling the background region of the student network reduces its misjudgment of the background; the background attention mask is therefore calculated from the feature attention map of the student network.
Using the separated foreground and background attention masks to calculate the L2 loss separately, the knowledge of the teacher network can be migrated to the student network. In general, the invention fuses the information of the feature map into attention maps by way of spatial attention and channel attention, avoiding mutual interference among channels, and distills missed detections of targets in the foreground region and false detections in the background region separately through the foreground mask and the background mask, thereby greatly improving the detection effect of the lightweight model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart of a distillation method for remote sensing target detection knowledge based on feature separation attention, provided by the invention;
FIG. 2 is a schematic diagram of the structure of a knowledge distillation model provided by the present invention;
FIG. 3 is a schematic diagram of a foreground attention mask acquisition process provided by the present invention;
FIG. 4 is a schematic diagram of the background attention mask acquisition process provided by the present invention;
FIG. 5 is a schematic diagram of distillation based on feature separation attention provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, the embodiment of the invention discloses a remote sensing target detection knowledge distillation method based on feature separation attention, which comprises the following steps:
S1, extracting feature attention maps from the feature maps output by the teacher network and the student network, respectively;
S2, separating the foreground region and the background region of the feature map, calculating a foreground attention mask through the feature attention map of the teacher network, and calculating a background attention mask through the feature attention map of the student network;
S3, calculating the L2 distillation loss using the foreground attention mask and the background attention mask, and migrating the knowledge of the teacher network to the student network based on the L2 distillation loss.
The overall structure of the remote sensing target detection knowledge distillation method based on feature separation attention is shown in figure 2, and the model is divided into a teacher network and a student network, wherein the teacher network is a high-performance complex neural network, and the student network is a light-weight simple neural network. Firstly, the teacher network is pre-trained to be converged and have high detection performance. And then, in the process of training the student network, the output of the teacher network is used as additional supervision information to train the student network, and the knowledge in the characteristic diagram is migrated, so that the training effect of the student network is improved.
The embodiment of the invention performs knowledge distillation on the feature maps of a convolutional neural network. Because the dimensionality of the feature map is high, directly computing the L2 loss on the raw feature map easily causes mutual interference among channels and harms the distillation effect. On the other hand, the background region ignored by most target detection knowledge distillation methods may contain useful information, and effectively learning the background region can improve the model's ability to distinguish positive and negative samples. Distilling the foreground reduces the student network's missed detection rate, i.e. strengthens the model's ability to recognize positive samples; distilling the background reduces the student network's false detection rate, i.e. strengthens the model's ability to reject negative samples. Therefore, the invention proposes a knowledge distillation approach based on feature separation attention, combining separate distillation of the foreground and background with feature map attention.
The above steps are further described below.
S1, extracting feature attention maps from the feature maps output by the teacher network and the student network, respectively.
The characteristic attention is divided into spatial attention and channel attention. Spatial attention refers to the dimensionality reduction of features along the channel dimension, represented by only one value at each pixel point. This attention is reflected in the degree of importance of the pixels in the feature map, with points with stronger responses indicating a greater probability of the presence of an object. Channel attention refers to the dimensionality reduction of the feature along the length-width dimension, with each value reflecting the response of one channel. Because the information contained in each channel is uneven when the network extracts the features, the attention reflects the importance degree of each channel in the feature map, and the network can be more focused on the channels rich in effective information through the attention of the channels.
Since no label information is available for training an attention module during knowledge distillation, the invention adopts a simple hand-designed form, namely taking the absolute-value average as the attention map. For the feature map $A^t$ of the teacher network and the feature map $A^s$ of the student network, the spatial attention map and the channel attention map are calculated by the following formulas.

The spatial attention is calculated as:

$$G_s(A)_{i,j} = \frac{1}{C}\sum_{k=1}^{C}\left|A_{k,i,j}\right|$$

The channel attention is calculated as:

$$G_c(A)_{k} = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\left|A_{k,i,j}\right|$$

wherein $G_s$ represents the spatial attention, characterizing the degree of importance of each pixel position across the channels; $G_c$ represents the channel attention, characterizing the degree of importance of each channel of the feature map; H, W and C represent the height, width and number of channels of the feature map, respectively; and $A_{k,i,j}$ represents the value of feature map A at the (i, j) coordinate position of its k-th channel.
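As a brief illustration, the absolute-value-average attention described above can be sketched in NumPy as follows; the array shapes and function names are illustrative and not part of the patent:

```python
import numpy as np

def spatial_attention(feature_map):
    # G_s: mean absolute value across the channel axis -> one value per pixel
    # feature_map has shape (C, H, W)
    return np.abs(feature_map).mean(axis=0)  # shape (H, W)

def channel_attention(feature_map):
    # G_c: mean absolute value over the spatial axes -> one value per channel
    return np.abs(feature_map).mean(axis=(1, 2))  # shape (C,)

A = np.random.randn(8, 4, 4)  # toy feature map with C=8, H=W=4
Gs = spatial_attention(A)
Gc = channel_attention(A)
```

A point (i, j) with a larger value of `Gs` indicates a higher probability of an object at that position; a larger `Gc[k]` indicates a more informative channel k.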
S2, separating the foreground region and the background region of the feature map, calculating a foreground attention mask through the feature attention map of the teacher network, and calculating a background attention mask through the feature attention map of the student network.
As shown in fig. 3-4, knowledge distillation based on feature separation attention requires that a spatial and channel attention map is obtained by absolute value averaging, feature separation is performed on the feature attention map through a foreground mask and a background mask, and distillation losses of the foreground attention mask and the background attention mask are calculated respectively. For the foreground attention mask, multiplying the space of the teacher model and the channel attention and taking a foreground part by using the foreground mask; for the background attention mask, the spatial and channel attention of the student model are multiplied and the background portion is taken using the background mask. Subsequently, the L2 distillation loss function was calculated on the foreground and background attention masks, respectively.
Because the teacher network has strong performance, its output obtains stronger responses than the student network's in foreground regions containing targets, and distilling the teacher network's foreground response can improve the student network's ability to judge foreground targets; the foreground attention mask is therefore calculated through the feature map of the teacher network. Specifically:
1. The foreground attention mask is obtained by the following steps:
1) calculating the spatial attention and channel attention of the feature map $A^t$ output by the teacher network;
2) performing probability normalization on the spatial attention and the channel attention of $A^t$, respectively, using the Softmax function;
3) multiplying the probability-normalized spatial attention and channel attention of $A^t$ by the labeled foreground mask $S_f$ to obtain the weighted foreground attention mask $J_f$.
The specific calculation formula is as follows:
$$\hat{G}_s(A^t) = \mathrm{softmax}\!\left(G_s(A^t)\right)$$

$$\hat{G}_c(A^t) = \mathrm{softmax}\!\left(G_c(A^t)\right)$$

$$J_f = S_f \cdot \hat{G}_s(A^t) \cdot \hat{G}_c(A^t)$$

wherein the probability normalization function

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

normalizes the variable z into probabilities in (0, 1) whose sum over all entries is 1; H, W and C represent the height, width and number of channels of the feature map $A^t$, respectively; $\hat{G}_s(A^t)$ and $\hat{G}_c(A^t)$ represent the probability-normalized spatial attention and channel attention of the teacher network, respectively; and $S_f$ is the foreground mask, which takes the value 1 in the foreground region and 0 in the background region.
In order to balance the influence of targets of different sizes on the loss function, the foreground mask needs to be normalized according to the size of the target, and the normalized foreground mask $\hat{S}_f$ is used in place of $S_f$. The normalized foreground mask is calculated as:

$$\hat{S}_{f,i,j} = \begin{cases} \dfrac{1}{s_t}, & (i,j) \in t,\ t \in T \\ 0, & \text{otherwise} \end{cases}$$

where T represents the set of all targets, t represents each target, and $s_t$ represents the area of the region of target t; this ensures that targets of different sizes contribute equally to the loss function.
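The foreground steps above can be sketched as follows. This is a minimal NumPy illustration: the box format `(y0, y1, x0, x1)` is assumed to be given in feature-map coordinates, and the Softmax is applied over all spatial positions for the spatial attention (an assumption, since the patent text does not state the normalization axis):

```python
import numpy as np

def softmax(z):
    # Probability normalization: values in (0, 1) summing to 1
    e = np.exp(z - z.max())
    return e / e.sum()

def foreground_attention_mask(A_t, targets):
    # A_t     : teacher feature map, shape (C, H, W)
    # targets : list of (y0, y1, x0, x1) foreground boxes in feature-map coords
    C, H, W = A_t.shape
    # Probability-normalized spatial and channel attention
    G_s = softmax(np.abs(A_t).mean(axis=0).ravel()).reshape(H, W)
    G_c = softmax(np.abs(A_t).mean(axis=(1, 2)))
    # Normalized foreground mask: 1/s_t inside each target, 0 elsewhere
    S_f = np.zeros((H, W))
    for (y0, y1, x0, x1) in targets:
        S_f[y0:y1, x0:x1] = 1.0 / ((y1 - y0) * (x1 - x0))
    # Broadcast to (C, H, W): mask times spatial term times channel term
    return S_f[None] * G_s[None] * G_c[:, None, None]

A_t = np.random.randn(4, 8, 8)                       # toy teacher feature map
J_f = foreground_attention_mask(A_t, [(1, 3, 2, 5)])  # one target box
```

The resulting `J_f` is zero everywhere outside the target boxes, so only foreground positions contribute when it is multiplied into the distillation loss.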
2. Some wrong responses may exist in the background, the feature extraction capability of the student network is relatively weak, compared with the teacher network, the wrong responses in the background area of the student network output feature map are more prominent, and the misjudgment of the student network on the background area can be reduced by training the background area of the student network through distillation. Similar to the calculation method of the foreground attention mask, the background attention mask is obtained as follows:
1) calculating the spatial attention and channel attention of the feature map $A^s$ output by the student network;
2) performing probability normalization on the spatial attention and the channel attention of $A^s$, respectively, using the Softmax function;
3) multiplying the probability-normalized spatial attention and channel attention of $A^s$ by the labeled background mask $S_b$ to obtain the weighted background attention mask $J_b$.
The specific calculation formula is as follows:
$$\hat{G}_s(A^s) = \mathrm{softmax}\!\left(G_s(A^s)\right)$$

$$\hat{G}_c(A^s) = \mathrm{softmax}\!\left(G_c(A^s)\right)$$

$$J_b = S_b \cdot \hat{G}_s(A^s) \cdot \hat{G}_c(A^s)$$

wherein H, W and C represent the height, width and number of channels of the feature map $A^s$, respectively; $\hat{G}_s(A^s)$ and $\hat{G}_c(A^s)$ represent the probability-normalized spatial attention and channel attention of the student network, respectively; and $S_b$ is the background mask, which takes the value 0 in the foreground region and 1 in the background region.
Similarly, to balance its influence, the background mask is also weighted by the area of the background region, and the normalized background mask $\hat{S}_b$ is used in place of $S_b$. The normalized background mask is calculated as:

$$\hat{S}_{b,i,j} = \frac{S_{b,i,j}}{HW - \sum_{t \in T} s_t}$$

where T represents the set of all targets, t represents each target, and $s_t$ represents the area of the region of target t, so that the denominator is the area of the background region. With the weighted background mask, the contribution of the background region to the loss function is made consistent with that of the foreground region.
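A corresponding sketch for the background side, mirroring the foreground computation but built from the student feature map. Dividing by the number of background pixels is an assumed concrete form of the area weighting described above:

```python
import numpy as np

def background_attention_mask(A_s, targets):
    # A_s     : student feature map, shape (C, H, W)
    # targets : list of (y0, y1, x0, x1) foreground boxes in feature-map coords
    C, H, W = A_s.shape
    # Probability-normalized spatial attention (softmax over all positions)
    G_s = np.abs(A_s).mean(axis=0).ravel()
    G_s = np.exp(G_s - G_s.max()); G_s /= G_s.sum()
    G_s = G_s.reshape(H, W)
    # Probability-normalized channel attention (softmax over channels)
    G_c = np.abs(A_s).mean(axis=(1, 2))
    G_c = np.exp(G_c - G_c.max()); G_c /= G_c.sum()
    # Background mask: 1 in background, 0 in foreground, weighted by bg area
    S_b = np.ones((H, W))
    for (y0, y1, x0, x1) in targets:
        S_b[y0:y1, x0:x1] = 0.0
    S_b /= max(S_b.sum(), 1.0)
    return S_b[None] * G_s[None] * G_c[:, None, None]

A_s = np.random.randn(4, 8, 8)                        # toy student feature map
J_b = background_attention_mask(A_s, [(1, 3, 2, 5)])  # one target box
```

Here `J_b` is zero inside the target boxes, so only background positions contribute to the background term of the distillation loss.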
S3, design of the L2 distillation loss function.
As shown in fig. 5, after the foreground attention mask and the background attention mask are obtained, they are multiplied with the output feature maps of the teacher and student networks respectively to calculate the distillation loss; the loss function is the L2 function:

$$L_d = \delta\, L_2(A^s \cdot J_f,\ A^t \cdot J_f) + \varepsilon\, L_2(A^s \cdot J_b,\ A^t \cdot J_b)$$

where δ and ε are parameters controlling the relative weights of the foreground and background loss terms, $A^s$ and $A^t$ represent the feature maps of the student model and the teacher model respectively, and $J_f$ and $J_b$ represent the foreground attention mask and the background attention mask respectively.

The L2 loss function computes the Euclidean distance between two vectors X and Y:

$$L_2(X, Y) = \sqrt{\sum_{i=1}^{n}\left(x_i - y_i\right)^2}$$

where $x_i$ and $y_i$ represent the i-th components of the n-dimensional vectors X and Y, respectively.
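A minimal sketch of the loss computation above, assuming δ = ε = 1 and using random placeholder masks purely for shape illustration:

```python
import numpy as np

def l2_distance(x, y):
    # Euclidean distance between two tensors, treated as flattened vectors
    return np.sqrt(((x - y) ** 2).sum())

def distillation_loss(A_s, A_t, J_f, J_b, delta=1.0, eps=1.0):
    # L_d = delta * L2(A_s*J_f, A_t*J_f) + eps * L2(A_s*J_b, A_t*J_b)
    return (delta * l2_distance(A_s * J_f, A_t * J_f)
            + eps * l2_distance(A_s * J_b, A_t * J_b))

A_t = np.random.randn(4, 8, 8)               # teacher feature map
A_s = A_t + 0.1 * np.random.randn(4, 8, 8)   # student feature map, perturbed
J_f = np.random.rand(4, 8, 8)                # placeholder foreground mask
J_b = np.random.rand(4, 8, 8)                # placeholder background mask
loss = distillation_loss(A_s, A_t, J_f, J_b)
```

During training of the student network, this loss would be added to the ordinary detection loss, pulling the student's masked feature responses toward the teacher's.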
Through the distillation loss function, the knowledge of the complex model can be transferred to the lightweight model, so that the detection performance of the lightweight model is improved. In practical application, a complex model is firstly trained aiming at a target task, then the pre-trained complex model is used as additional supervision information, and an additional knowledge distillation loss function based on feature separation attention is added when a lightweight model is trained. After the training convergence, the obtained lightweight remote sensing target detection model has improved detection performance compared with a model without distillation loss.
For the lightweight target detection task, the invention fuses the information of the feature map into attention maps by way of spatial attention and channel attention, thereby avoiding mutual interference among channels. Missed detections of targets in the foreground region and false detections in the background region are distilled separately through the foreground mask and the background mask, greatly improving the detection effect of the lightweight model.
The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Since the device disclosed in an embodiment corresponds to the method disclosed therein, its description is kept brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A remote sensing target detection knowledge distillation method based on feature separation attention is characterized by comprising the following steps:
respectively extracting feature attention maps from the feature maps output by a teacher network and a student network;
separating a foreground region and a background region of the feature map, calculating a foreground attention mask from the feature attention map of the teacher network, and calculating a background attention mask from the feature attention map of the student network;
calculating an L2 distillation loss using the foreground attention mask and the background attention mask; and
migrating the knowledge of the teacher network to the student network based on the L2 distillation loss.
2. The feature-separation-attention-based remote sensing target detection knowledge distillation method according to claim 1, wherein the feature attention map comprises a spatial attention and a channel attention, and the spatial attention is calculated as:
G_s(A)_{i,j} = (1/C) · Σ_{k=1}^{C} |A_{k,i,j}|,
the channel attention is calculated as:
G_c(A)_k = (1/(H·W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} |A_{k,i,j}|,
where G_s denotes the spatial attention, characterizing the importance of each pixel position within each channel, and G_c denotes the channel attention, characterizing the importance of each channel of the feature map; H, W and C denote the height, width and number of channels of the feature map, respectively; and A_{k,i,j} denotes the pixel value of feature map A at coordinate position (i, j) in its k-th channel.
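The original equation images are not reproduced in this text; consistent with the symbol definitions above, the standard mean-absolute-value form of the two attentions can be sketched as follows (the exact form is an assumption):

```python
import numpy as np

def spatial_attention(A):
    # G_s(A)_{i,j} = (1/C) * sum_k |A_{k,i,j}| : an H x W importance map,
    # averaging absolute activations over the C channels.
    return np.abs(A).mean(axis=0)

def channel_attention(A):
    # G_c(A)_k = (1/(H*W)) * sum_{i,j} |A_{k,i,j}| : one importance score
    # per channel, averaging absolute activations over spatial positions.
    return np.abs(A).mean(axis=(1, 2))
```

Both reductions start from the same C x H x W feature map; only the axis over which the absolute values are averaged differs.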
3. The feature-separation-attention-based remote sensing target detection knowledge distillation method according to claim 1, wherein calculating the foreground attention mask from the feature attention map of the teacher network comprises:
calculating the spatial attention and the channel attention of the feature map A_t output by the teacher network;
performing probability normalization on the spatial attention and the channel attention of the feature map A_t, respectively, using a Softmax function;
multiplying the probability-normalized spatial attention and channel attention of the feature map A_t by the labeled foreground mask S_f to obtain the weighted foreground attention mask J_f.
4. The feature-separation-attention-based remote sensing target detection knowledge distillation method according to claim 3, wherein the foreground attention mask is calculated as:
Ĝ_s^t = φ(G_s(A_t)),
Ĝ_c^t = φ(G_c(A_t)),
J_f = Ĝ_s^t · Ĝ_c^t · S_f,
where the probability normalization function φ(z)_i = e^{z_i} / Σ_j e^{z_j} maps each component z_i of the variable z to a probability in (0, 1) such that all probabilities sum to 1; H, W and C denote the height, width and number of channels of the feature map A_t, respectively; G_s(A_t) and G_c(A_t) denote the spatial attention and the channel attention of the teacher network, respectively; and S_f is the foreground mask, which takes the value 1 in foreground regions and 0 in background regions.
5. The feature-separation-attention-based remote sensing target detection knowledge distillation method according to claim 4, further comprising: normalizing the foreground mask S_f according to target size to obtain a normalized foreground mask Ŝ_f, and replacing the foreground mask S_f with the normalized foreground mask Ŝ_f when calculating the foreground attention mask; the normalized foreground mask Ŝ_f is calculated as:
Ŝ_f(i, j) = 1/s_t if pixel (i, j) belongs to target t ∈ T, and 0 otherwise,
where T denotes the set of all targets, t denotes an individual target, and s_t denotes the area of the region of target t.
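The formula image is not reproduced here; a per-target area normalization consistent with the stated symbols (t, s_t) can be sketched as follows, with hypothetical axis-aligned half-open boxes standing in for the labeled target regions:

```python
import numpy as np

def normalized_foreground_mask(shape, boxes):
    # S_f_hat(i, j) = 1 / s_t inside the region of target t (area s_t),
    # 0 elsewhere, so that small and large targets contribute comparably
    # to the distillation loss.
    H, W = shape
    S = np.zeros((H, W))
    for r0, c0, r1, c1 in boxes:             # half-open pixel boxes
        area = (r1 - r0) * (c1 - c0)
        S[r0:r1, c0:c1] = 1.0 / area
    return S
```

With this scaling, each target region sums to 1 in the mask regardless of its size, so small remote sensing targets are not drowned out by large ones.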
6. The feature-separation-attention-based remote sensing target detection knowledge distillation method according to claim 1, wherein calculating the background attention mask from the feature attention map of the student network comprises:
calculating the spatial attention and the channel attention of the feature map A_s output by the student network;
performing probability normalization on the spatial attention and the channel attention of the feature map A_s, respectively, using a Softmax function;
multiplying the probability-normalized spatial attention and channel attention of the feature map A_s by the labeled background mask S_b to obtain the weighted background attention mask J_b.
7. The feature-separation-attention-based remote sensing target detection knowledge distillation method according to claim 6, wherein the background attention mask J_b is calculated as:
Ĝ_s^s = φ(G_s(A_s)),
Ĝ_c^s = φ(G_c(A_s)),
J_b = Ĝ_s^s · Ĝ_c^s · S_b,
where H, W and C denote the height, width and number of channels of the feature map A_s, respectively; G_s(A_s) and G_c(A_s) denote the spatial attention and the channel attention of the student network, respectively; and S_b is the background mask, which takes the value 0 in foreground regions and 1 in background regions.
8. The feature-separation-attention-based remote sensing target detection knowledge distillation method according to claim 7, further comprising: normalizing the background mask S_b according to target size to obtain a normalized background mask Ŝ_b, and replacing the background mask S_b with the normalized background mask Ŝ_b when calculating the background attention mask; the normalized background mask Ŝ_b is calculated as:
Ŝ_b(i, j) = 1/(H·W − Σ_{t∈T} s_t) if pixel (i, j) belongs to the background, and 0 otherwise,
where T denotes the set of all targets, t denotes an individual target, and s_t denotes the area of the region of target t.
9. The feature-separation-attention-based remote sensing target detection knowledge distillation method according to claim 1, wherein the L2 distillation loss is calculated as:
L_d = δ·L2(A_s·J_f, A_t·J_f) + ε·L2(A_s·J_b, A_t·J_b),
where δ and ε are parameters controlling the relative weights of the foreground and background attention mask loss terms, A_s and A_t denote the feature maps of the student and teacher networks, respectively, and J_f and J_b denote the foreground attention mask and the background attention mask, respectively; the L2 loss function computes the Euclidean distance between two vectors X and Y, as follows:
L2(X, Y) = sqrt(Σ_{i=1}^{n} (x_i − y_i)²),
where x_i and y_i denote the i-th components of the n-dimensional vectors X and Y, respectively.
CN202210194931.4A 2022-03-01 2022-03-01 Remote sensing target detection knowledge distillation method based on feature separation attention Pending CN114565045A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210194931.4A CN114565045A (en) 2022-03-01 2022-03-01 Remote sensing target detection knowledge distillation method based on feature separation attention


Publications (1)

Publication Number Publication Date
CN114565045A (en) 2022-05-31

Family

ID=81714945



Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114882324A (en) * 2022-07-11 2022-08-09 浙江大华技术股份有限公司 Target detection model training method, device and computer readable storage medium
CN115019060A (en) * 2022-07-12 2022-09-06 北京百度网讯科技有限公司 Target recognition method, and training method and device of target recognition model
WO2024051686A1 (en) * 2022-09-05 2024-03-14 东声(苏州)智能科技有限公司 Compression and training method and apparatus for defect detection model
CN116664840A (en) * 2023-05-31 2023-08-29 博衍科技(珠海)有限公司 Semantic segmentation method, device and equipment based on mutual relationship knowledge distillation
CN116664840B (en) * 2023-05-31 2024-02-13 博衍科技(珠海)有限公司 Semantic segmentation method, device and equipment based on mutual relationship knowledge distillation
CN116778300A (en) * 2023-06-25 2023-09-19 北京数美时代科技有限公司 Knowledge distillation-based small target detection method, system and storage medium
CN116778300B (en) * 2023-06-25 2023-12-05 北京数美时代科技有限公司 Knowledge distillation-based small target detection method, system and storage medium
CN116994068A (en) * 2023-09-19 2023-11-03 湖北省长投智慧停车有限公司 Target detection method and device based on knowledge distillation
CN117974988A (en) * 2024-03-28 2024-05-03 南京邮电大学 Lightweight target detection method, lightweight target detection device and computer program product
CN117974988B (en) * 2024-03-28 2024-05-31 南京邮电大学 Lightweight target detection method, lightweight target detection device and computer program product


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination