CN116524351A - Rotary target detection light-weight method and system based on knowledge distillation - Google Patents

Rotary target detection light-weight method and system based on knowledge distillation

Info

Publication number
CN116524351A
CN116524351A
Authority
CN
China
Prior art keywords
gaussian distribution
divergence
detection
model
student model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310299030.6A
Other languages
Chinese (zh)
Inventor
康健
童风雨
陈曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202310299030.6A priority Critical patent/CN116524351A/en
Publication of CN116524351A publication Critical patent/CN116524351A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a knowledge-distillation-based lightweight method and system for rotated target detection. The method comprises: decoding the coordinate codes of the detection targets output by a student model and a trained teacher model into rotation coordinate form; converting the rotation coordinates into two-dimensional Gaussian distributions; calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model, and the KL divergence between the Gaussian distribution representation of the detection target output by the student model and that obtained from the ground-truth label box; normalizing the KL divergences; calculating, from the normalized KL divergences, the overall loss function used while the teacher model distills knowledge into the student model; and performing prediction with the distilled student model. The invention provides more accurate localization information for training the lightweight network and improves its detection performance on optical remote sensing images.

Description

Rotary target detection light-weight method and system based on knowledge distillation
Technical Field
The invention relates to the technical field of high-resolution remote sensing image information processing, in particular to a rotation target detection light-weight method and system based on knowledge distillation.
Background
Target detection in remote sensing images aims at locating objects of interest (e.g. vehicles, aircraft, ships) on the earth's surface and predicting their classes. Remote sensing images are mostly captured from an overhead viewpoint, cover large and complex spatial scenes, and contain unevenly distributed targets of interest, with both sparse and densely packed regions. Remote sensing target detection therefore faces the difficulties of small targets, dense distribution and arbitrary orientation. When a target of interest has a large aspect ratio and is arranged in an inclined, compact pattern, a horizontal bounding box cannot faithfully reflect the object's size and aspect ratio, object and background pixels cannot be separated effectively, and densely packed objects are hard to tell apart: each detection box contains parts of neighboring objects, the intersection-over-union between horizontal detection boxes is high, boxes are easily suppressed during non-maximum suppression, and the final detection accuracy is low. Moreover, because the target orientation (the rotation angle) is periodic, a suitable design of the loss function is crucial for optimizing model training.
Traditional target detection methods often require laborious manual feature extraction and fine parameter tuning, generalize poorly, and cannot adapt to changing environments. In recent years, with improving hardware computing power, advances in remote sensing imaging technology and easier access to remote sensing image resources, deep learning has developed rapidly; convolutional neural network models in particular achieve strong generalization thanks to their robust feature extraction capability, function-fitting capability and end-to-end network design. Most state-of-the-art target detection methods based on convolutional neural networks focus on designing advanced network structures or loss functions to improve detection performance, but such methods are computationally expensive, so some work instead focuses on lightweight methods that distill knowledge from a large (teacher) model to improve detection performance. Knowledge distillation based on intermediate-layer features, for example, has proven the importance of such features for training and prediction of the detection network, but distilling intermediate-layer features alone loses the information of the coordinate space and class space, which affects the final detection result. Knowledge distillation based on generalized distribution information treats the regression problem, including rotated target position and rotation angle, as a classification loss and addresses localization uncertainty, but suffers from quantization error. Converting rotation coordinates into Gaussian distributions has seen many applications in rotated detection in recent years, yet there has been little study of lightweight methods for such detectors.
Therefore, there is an urgent need to propose a method for lightening the detection of a rotating target based on knowledge distillation to overcome the above technical drawbacks of the prior art.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is to overcome the technical defects in the prior art, and provide a rotation target detection light-weight method and system based on knowledge distillation, which utilize coordinate coding output of a teacher model and a student model to carry out knowledge distillation, calculate KL divergence between Gaussian distribution representations of rotation coordinates as added distillation loss to train student model parameters, provide more accurate positioning information for training a light-weight network, and improve the detection performance of the light-weight network on an optical remote sensing image.
In order to solve the technical problems, the invention provides a rotation target detection light-weight method based on knowledge distillation, which comprises the following steps:
S1: decoding the coordinate codes of the detection targets output by the student model and the trained teacher model into rotation coordinate form, respectively, to obtain rotation coordinate representations of the detection targets;
S2: converting the rotation coordinate representations into two-dimensional Gaussian distributions to obtain Gaussian distribution representations of the detection targets;
S3: calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model to obtain a first KL divergence, and calculating the KL divergence between the Gaussian distribution representation of the detection target output by the student model and that obtained from the ground-truth label box to obtain a second KL divergence;
S4: normalizing the first KL divergence and the second KL divergence;
S5: calculating, from the normalized first and second KL divergences, the overall loss function used while the teacher model distills knowledge into the student model;
S6: performing prediction with the distilled student model.
In one embodiment of the present invention, in step S1, the method for decoding the coordinate codes of the detection targets output by the student model and the trained teacher model into rotation coordinate form comprises:
decoding the coordinate code Y = (dx, dy, dw, dh, dθ) of the detection target output by the student model and the trained teacher model into the rotation coordinate form (x, y, w, h, θ), wherein:
x = x_a + dx · w_a
y = y_a + dy · h_a
w = w_a · e^dw
h = h_a · e^dh
θ = θ_a + dθ
where A = (x_a, y_a, w_a, h_a, θ_a) is the anchor box preset by the model.
In one embodiment of the invention, in step S2, the method of converting the rotation coordinate representation into a two-dimensional Gaussian distribution comprises:
converting the rotation coordinate representation (x, y, w, h, θ) into a two-dimensional Gaussian distribution (μ, Σ) by the formulas:
μ = (x, y)^T
Σ = R(θ) Λ R(θ)^T, where R(θ) = [[cosθ, −sinθ], [sinθ, cosθ]] is the rotation matrix and Λ = diag(w²/4, h²/4),
wherein (x, y, w, h, θ) is the rotation coordinate representation of the detection target, denoting the abscissa and ordinate of its center point and its width, height and rotation angle, and (μ, Σ) are the mean vector and covariance matrix of the Gaussian distribution.
In one embodiment of the present invention, in step S3, the method for calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model comprises:
calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model as:
D_KL(N_S ‖ N_T) = (1/2)(μ_T − μ_S)^T Σ_T^(-1) (μ_T − μ_S) + (1/2)Tr(Σ_T^(-1) Σ_S) + (1/2)ln(|Σ_T|/|Σ_S|) − 1
where (x_S, y_S, w_S, h_S, θ_S), (μ_S, Σ_S) and (x_T, y_T, w_T, h_T, θ_T), (μ_T, Σ_T) are the rotation coordinate representations and Gaussian distribution representations of the detection targets output by the student model and the teacher model respectively, and Δx = x_S − x_T, Δy = y_S − y_T, Δθ = θ_S − θ_T denote the differences between the center-point abscissas, ordinates and rotation angles of the targets output by the two models.
In one embodiment of the present invention, in step S3, the method for calculating the KL divergence between the Gaussian distribution representation of the detection target output by the student model and that obtained from the ground-truth label box comprises:
calculating the KL divergence between the Gaussian distribution representation of the detection target output by the student model and the Gaussian distribution representation obtained from the ground-truth label box as:
D_KL(N_S ‖ N_G) = (1/2)(μ_G − μ_S)^T Σ_G^(-1) (μ_G − μ_S) + (1/2)Tr(Σ_G^(-1) Σ_S) + (1/2)ln(|Σ_G|/|Σ_S|) − 1
where (x_S, y_S, w_S, h_S, θ_S), (μ_S, Σ_S) and (x_G, y_G, w_G, h_G, θ_G), (μ_G, Σ_G) are the rotation coordinate representations and Gaussian distribution representations of the detection target output by the student model and of the ground-truth label box respectively, and Δx = x_S − x_G, Δy = y_S − y_G, Δθ = θ_S − θ_G denote the differences between the center-point abscissas, ordinates and rotation angles of the student output and the ground-truth box.
In one embodiment of the present invention, in step S6, after prediction is performed using the distilled student model, the prediction boxes output by the student model are post-processed.
In one embodiment of the present invention, the method for post-processing the prediction boxes output by the student model comprises the following specific steps:
arranging all prediction boxes within each category in descending order of score;
taking the highest-scoring prediction box as the reference and calculating the intersection-over-union of every other prediction box with it;
screening the prediction boxes within each category by this intersection-over-union and deleting every box that fails the screening condition, the screening condition being that the intersection-over-union between the box to be screened and the reference box is smaller than or equal to a preset threshold;
selecting the highest-scoring box among the remaining boxes that satisfy the screening condition as the new reference and repeating the screening step until every box that satisfies the condition has served as the reference.
In addition, the invention further provides a knowledge-distillation-based lightweight system for rotated target detection, comprising:
a coordinate decoding module for decoding the coordinate codes of the detection targets output by the student model and the trained teacher model into rotation coordinate form, respectively, to obtain rotation coordinate representations of the detection targets;
a coordinate conversion module for converting the rotation coordinate representations into two-dimensional Gaussian distributions to obtain Gaussian distribution representations of the detection targets;
a KL divergence calculation module for calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model to obtain a first KL divergence, and calculating the KL divergence between the Gaussian distribution representation of the detection target output by the student model and that obtained from the ground-truth label box to obtain a second KL divergence;
a normalization processing module for normalizing the first KL divergence and the second KL divergence;
a loss function calculation module for calculating, from the normalized first and second KL divergences, the overall loss function used while the teacher model distills knowledge into the student model;
and a prediction module for performing prediction with the distilled student model.
In one embodiment of the present invention, in the coordinate conversion module, the method of converting the rotation coordinate representation into a two-dimensional Gaussian distribution comprises:
converting the rotation coordinate representation (x, y, w, h, θ) into a two-dimensional Gaussian distribution (μ, Σ) by the formulas:
μ = (x, y)^T
Σ = R(θ) Λ R(θ)^T, where R(θ) = [[cosθ, −sinθ], [sinθ, cosθ]] is the rotation matrix and Λ = diag(w²/4, h²/4),
wherein (x, y, w, h, θ) is the rotation coordinate representation of the detection target, denoting the abscissa and ordinate of its center point and its width, height and rotation angle, and (μ, Σ) are the mean vector and covariance matrix of the Gaussian distribution.
In one embodiment of the present invention, in the KL divergence calculation module, the method for calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model comprises:
calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model as:
D_KL(N_S ‖ N_T) = (1/2)(μ_T − μ_S)^T Σ_T^(-1) (μ_T − μ_S) + (1/2)Tr(Σ_T^(-1) Σ_S) + (1/2)ln(|Σ_T|/|Σ_S|) − 1
where (x_S, y_S, w_S, h_S, θ_S), (μ_S, Σ_S) and (x_T, y_T, w_T, h_T, θ_T), (μ_T, Σ_T) are the rotation coordinate representations and Gaussian distribution representations of the detection targets output by the student model and the teacher model respectively, and Δx = x_S − x_T, Δy = y_S − y_T, Δθ = θ_S − θ_T denote the differences between the center-point abscissas, ordinates and rotation angles of the targets output by the two models.
Compared with the prior art, the technical scheme of the invention has the following advantages:
according to the method and the system for detecting the rotation target based on the knowledge distillation, disclosed by the invention, the knowledge distillation is performed by utilizing the coordinate code output of a teacher model and a student model, the KL divergence between Gaussian distribution representations of rotation coordinates is calculated as the added distillation loss to perform training of student model parameters, more accurate positioning information is provided for training of a light-weight network, and the detection performance of the light-weight network on an optical remote sensing image is improved.
Drawings
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings, in which
Fig. 1 is a diagram of the training process of the knowledge-distillation-based lightweight method for rotated target detection according to the present invention.
Fig. 2 is a flowchart of the prediction stage of the knowledge-distillation-based lightweight method for rotated target detection according to the present invention.
FIG. 3 is a graph of detection accuracy on the HRSC2016 dataset over 20 training rounds for the ResNet18-FPN-RetinaNet student under the distillation method of the present invention and other distillation methods, wherein w/o-d is the original network trained without distillation; FitNets is a distillation method based on all intermediate-layer features; DeFeat is a distillation method based on decoupled features; LD is a distillation method based on generalized distribution information of the coordinates; KLDD is the proposed distillation method based on the KL divergence between coordinate Gaussian distributions.
FIG. 4 is the corresponding graph of detection accuracy on the HRSC2016 dataset over 20 training rounds for the MobileNetv2-FPN-RetinaNet student, comparing the same methods.
Fig. 5 compares the detection results obtained on the HRSC2016 dataset by different distillation methods: fig. 5a shows the original labels of the dataset; fig. 5b the detection results of the original ResNet18-FPN-RetinaNet detector without any distillation; fig. 5c the results of the FitNets distillation method based on all intermediate-layer features; fig. 5d the results of the DeFeat distillation method based on decoupled features of the target area; fig. 5e the results of the LD distillation method based on generalized distribution information of the target position coordinates; and fig. 5f the results of the proposed KLDD distillation method based on the KL divergence between coordinate Gaussian distributions. Cyan boxes indicate correctly detected targets (true positives, TP), yellow boxes indicate false detections (false positives, FP), and red boxes indicate missed targets (false negatives, FN).
Detailed Description
The present invention will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the invention and practice it.
The embodiment of the invention provides a rotation target detection light-weight method based on knowledge distillation, which comprises the following steps of:
S1: decoding the coordinate codes of the detection targets output by the student model and the teacher model into rotation coordinate form, respectively, to obtain rotation coordinate representations of the detection targets;
S2: converting the rotation coordinate representations into two-dimensional Gaussian distributions to obtain Gaussian distribution representations of the detection targets;
S3: calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model to obtain a first KL divergence, and calculating the KL divergence between the Gaussian distribution representation of the detection target output by the student model and that obtained from the ground-truth label box to obtain a second KL divergence;
S4: normalizing the first KL divergence and the second KL divergence;
S5: calculating, from the normalized first and second KL divergences, the overall loss function used while the teacher model distills knowledge into the student model;
S6: performing prediction with the distilled student model.
The knowledge-distillation-based lightweight method for rotated target detection performs knowledge distillation on the coordinate-code outputs of the teacher and student models, calculating the KL divergence between the Gaussian distribution representations of the rotation coordinates as an added distillation loss for training the student model parameters. This provides more accurate localization information for training the lightweight network and improves its detection performance on optical remote sensing images.
Referring to fig. 1 and 2, the method for detecting and lightening a rotation target based on knowledge distillation mainly includes the following three parts:
(1) Training a deep convolutional neural network with higher detection accuracy as the teacher model:
This part aims at producing a teacher model that provides additional supervision information for training the student model. The experiments select a teacher model with ResNet50-FPN as the feature-extraction backbone and RetinaNet as the detection network.
(2) The KLDD distillation method based on the KL divergence between coordinate Gaussian distributions:
This part generates a two-dimensional Gaussian representation of the rotation coordinates and calculates the KL divergence between the Gaussian distributions. The specific steps comprise coordinate decoding, Gaussian distribution conversion, and calculation of the KL divergence loss between Gaussian distributions.
The specific coordinate decoding formulas are:
x = x_a + dx · w_a
y = y_a + dy · h_a
w = w_a · e^dw
h = h_a · e^dh
θ = θ_a + dθ
yielding the decoded rotation coordinate representation (x, y, w, h, θ), where Y = (dx, dy, dw, dh, dθ) is the coordinate code output by the model and A = (x_a, y_a, w_a, h_a, θ_a) is the anchor box preset by the model.
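As a concrete illustration, the decoding step above can be sketched in a few lines of NumPy (a sketch of the stated formulas, not the patent's implementation; the function and variable names are ours):

```python
import numpy as np

def decode_rbox(delta, anchor):
    """Decode a coordinate code Y = (dx, dy, dw, dh, dtheta) against a
    preset anchor A = (xa, ya, wa, ha, theta_a) into the rotated box
    (x, y, w, h, theta), following the formulas above."""
    dx, dy, dw, dh, dtheta = delta
    xa, ya, wa, ha, ta = anchor
    return np.array([
        xa + dx * wa,       # x = x_a + dx * w_a
        ya + dy * ha,       # y = y_a + dy * h_a
        wa * np.exp(dw),    # w = w_a * e^dw
        ha * np.exp(dh),    # h = h_a * e^dh
        ta + dtheta,        # theta = theta_a + dtheta
    ])
```

A quick sanity check: an all-zero code decodes back to the anchor itself.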
The rotation coordinate representation (x, y, w, h, θ) is converted into a two-dimensional Gaussian distribution (μ, Σ) by the specific formulas:
μ = (x, y)^T
Σ = R(θ) Λ R(θ)^T, where R(θ) = [[cosθ, −sinθ], [sinθ, cosθ]] is the rotation matrix and Λ = diag(w²/4, h²/4),
wherein (x, y, w, h, θ) is the rotation coordinate representation of the detection target, denoting the abscissa and ordinate of its center point and its width, height and rotation angle, and (μ, Σ) are the mean vector and covariance matrix of the Gaussian distribution.
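The box-to-Gaussian conversion can be sketched as follows (our own illustrative code; the covariance follows the standard construction Σ = RΛR^T used in Gaussian-based rotated detection):

```python
import numpy as np

def rbox_to_gaussian(rbox):
    """Convert a rotated box (x, y, w, h, theta) into a 2-D Gaussian:
    mu is the box center; Sigma = R(theta) diag(w^2/4, h^2/4) R(theta)^T."""
    x, y, w, h, theta = rbox
    mu = np.array([x, y])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])            # rotation matrix R(theta)
    Lam = np.diag([w * w / 4.0, h * h / 4.0])  # squared half-axes of the box
    return mu, R @ Lam @ R.T
```

For an axis-aligned box (θ = 0) the covariance is simply diag(w²/4, h²/4); rotating the box by 90° swaps the two eigenvalues, so the representation is continuous in the angle.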
The KL divergence between the Gaussian distribution representations of the detection targets output by the student model S and the teacher model T is calculated as:
D_KL(N_S ‖ N_T) = (1/2)(μ_T − μ_S)^T Σ_T^(-1) (μ_T − μ_S) + (1/2)Tr(Σ_T^(-1) Σ_S) + (1/2)ln(|Σ_T|/|Σ_S|) − 1
where (x_S, y_S, w_S, h_S, θ_S), (μ_S, Σ_S) and (x_T, y_T, w_T, h_T, θ_T), (μ_T, Σ_T) are the rotation coordinate representations and Gaussian distribution representations of the detection targets output by the student model and the teacher model respectively, and Δx = x_S − x_T, Δy = y_S − y_T, Δθ = θ_S − θ_T denote the differences between the center-point abscissas, ordinates and rotation angles of the targets output by the two models.
The KL divergence between the Gaussian distribution representation of the detection target output by the student model S and the Gaussian distribution representation of the ground-truth label box G is calculated as:
D_KL(N_S ‖ N_G) = (1/2)(μ_G − μ_S)^T Σ_G^(-1) (μ_G − μ_S) + (1/2)Tr(Σ_G^(-1) Σ_S) + (1/2)ln(|Σ_G|/|Σ_S|) − 1
where (x_S, y_S, w_S, h_S, θ_S), (μ_S, Σ_S) and (x_G, y_G, w_G, h_G, θ_G), (μ_G, Σ_G) are the rotation coordinate representations and Gaussian distribution representations of the detection target output by the student model and of the ground-truth label box respectively, and Δx = x_S − x_G, Δy = y_S − y_G, Δθ = θ_S − θ_G denote the differences between the center-point abscissas, ordinates and rotation angles of the student output and the ground-truth box.
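The closed-form KL divergence between two 2-D Gaussians, as used for both the student-teacher and the student-ground-truth terms, can be sketched as (illustrative code, not the patent's implementation):

```python
import numpy as np

def gaussian_kl(mu_s, sig_s, mu_t, sig_t):
    """KL(N_S || N_T) between two 2-D Gaussians, closed form:
    0.5 * [ (mu_t - mu_s)^T Sig_t^-1 (mu_t - mu_s)
            + Tr(Sig_t^-1 Sig_s) + ln(|Sig_t| / |Sig_s|) - 2 ]."""
    inv_t = np.linalg.inv(sig_t)
    d = np.asarray(mu_t, dtype=float) - np.asarray(mu_s, dtype=float)
    maha = float(d @ inv_t @ d)                 # Mahalanobis (center) term
    trace = float(np.trace(inv_t @ sig_s))      # shape/scale term
    logdet = float(np.log(np.linalg.det(sig_t) / np.linalg.det(sig_s)))
    return 0.5 * (maha + trace + logdet - 2.0)
```

Note that KL is asymmetric: swapping the student and teacher arguments gives a different value, so the direction of the divergence matters in the distillation loss.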
The KL divergence between Gaussian distributions is then normalized by a function F(KL), giving the final calculation formula of the KL divergence loss between the Gaussian distributions of the coordinate parameters.
during training, the overall loss function of the modelThe method comprises the following steps:
wherein lambda is 0 ,λ 1 ,λ 2 Is a weight parameter, and is respectively set to be 1 and 15,5.5;for the positive sample area mask, determining by the intersection ratio of the anchor frame and the real target frame; c (C) S ,C T ,C G Independent-heat codes of category confidence and tag true categories output by students and teacher models respectively;The rotation coordinates and the real coordinates of the labels are respectively output by the student model and the teacher model;is cross entropy loss;Loss of KL divergence between category distributions;And->The KL divergence between the normalized student-truth value and the student-teacher Gaussian distribution is obtained.
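The exact normalization F(KL) is not legible in the source text. The sketch below uses the bounded form 1 − 1/(τ + ln(1 + KL)), which is common in KLD-style rotated-detection losses, and the pairing of λ_0, λ_1, λ_2 with the three distillation terms is our assumption:

```python
import numpy as np

def normalize_kl(kl, tau=1.0):
    """Map an unbounded KL divergence into [0, 1).
    ASSUMED form F(KL) = 1 - 1/(tau + ln(1 + KL)); the patent's exact
    normalization is not legible in the source text."""
    return 1.0 - 1.0 / (tau + np.log1p(kl))

def overall_loss(l_ce, l_kl_cls, kl_sg, kl_st,
                 lam0=1.0, lam1=15.0, lam2=5.5):
    """Combine the loss terms listed in the description: cross-entropy,
    class-distribution KL, and the two normalized coordinate KL terms
    (student-ground-truth and student-teacher). Which weight multiplies
    which term is an assumption, not stated legibly in the source."""
    return (l_ce + lam0 * l_kl_cls
            + lam1 * normalize_kl(kl_sg)
            + lam2 * normalize_kl(kl_st))
```

With this form, F(0) = 0 and F(KL) approaches 1 as the divergence grows, so a single badly localized box cannot dominate the total loss.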
(3) In the prediction stage, only the distilled student model is used for prediction.
This part post-processes the prediction boxes output by the network with non-maximum suppression (NMS) to avoid repeated detections of the same target. The specific post-processing steps are as follows:
arranging all prediction boxes within each category in descending order of score;
taking the highest-scoring prediction box as the reference and calculating the intersection-over-union of every other prediction box with it;
screening the prediction boxes within each category by this intersection-over-union and deleting every box that fails the screening condition, the screening condition being that the intersection-over-union between the box to be screened and the reference box is smaller than or equal to a preset threshold;
selecting the highest-scoring box among the remaining boxes that satisfy the screening condition as the new reference and repeating the screening step until every box that satisfies the condition has served as the reference.
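The post-processing steps above amount to greedy non-maximum suppression, which can be sketched as follows (illustrative code; for rotated boxes, `iou_fn` would be a rotated-IoU routine such as the one provided by a detection toolbox):

```python
def nms(boxes, scores, iou_fn, iou_thr=0.5):
    """Greedy NMS: sort boxes by descending score, keep the best one,
    suppress every box whose overlap with it exceeds the threshold,
    then repeat on the survivors. Returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # retain only boxes that satisfy the screening condition
        # (IoU with the reference box <= threshold)
        order = [i for i in order if iou_fn(boxes[best], boxes[i]) <= iou_thr]
    return keep
```

This is run per category, so boxes of different classes never suppress each other.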
The beneficial effects of the knowledge-distillation-based lightweight method for rotated target detection are described below by way of experimental verification.
Examples:
1. Experimental data
In order to verify the effectiveness of the detection model light weight method provided by the invention, two optical remote sensing rotation target detection data sets are adopted in the experiment: DOTA and HRSC2016.
1) DOTA dataset:
DOTA is one of the largest rotated-target detection datasets for aerial images. It contains 2806 aerial images ranging from 800×800 to 4000×4000 pixels, with more than 188000 object instances of different scales, orientations and shapes, each annotated by an arbitrary quadrilateral. It covers 15 object categories: plane, baseball diamond, bridge, ground track field, small vehicle, large vehicle, ship, tennis court, basketball court, storage tank, soccer ball field, roundabout, harbor, swimming pool and helicopter. The training, validation and test sets are split in proportions of 1/2, 1/6 and 1/3. Because of the large image sizes, the images are cut here into 1024×1024-pixel sub-images with a 200-pixel overlap between adjacent sub-images.
2) HRSC2016 dataset:
HRSC2016 is a single-class ship detection dataset whose images come from six well-known harbors and cover two scenes: inshore and offshore ships. The spatial resolution of the images ranges from 0.4 m to 2 m. The dataset contains 1070 images with 2976 ship targets in total. The training set, validation set and test set contain 436, 181 and 453 images, respectively. Because the image sizes vary, the images are scaled here to 800×512 pixels. Fig. 5a shows some images and target box annotations from the dataset.
2. Experimental results
The experiments were built on the MMRotate framework and run on a single NVIDIA RTX 3090 GPU. A RetinaNet with a ResNet50-FPN backbone is selected as the teacher model, and RetinaNet networks with ResNet18-FPN and MobileNetV2-FPN backbones are selected as the student models. In the training stage, random horizontal, vertical and diagonal flipping is adopted for data augmentation; a standard momentum optimizer is used with weight decay 0.0001 and momentum 0.9, and the initial learning rate is set to 0.0025. The models are trained for 24 epochs on the DOTA dataset and 72 epochs on the HRSC2016 dataset, with a batch size of 2 in both cases. In the test stage, mAP50 (the average precision when the IoU threshold with the ground-truth object is set to 0.5) is adopted as the index for evaluating the accuracy of the detection model.
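The training hyper-parameters listed above can be restated as a hypothetical MMRotate-style configuration fragment. The key names below follow the usual mmdetection/MMRotate config convention; the authors' actual configuration file is not given, so this is only an illustrative restatement:

```python
# Hypothetical config fragment restating the stated hyper-parameters;
# the authors' real config keys and values beyond these are unknown.
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)

schedule = {
    'DOTA':     dict(max_epochs=24, samples_per_gpu=2),
    'HRSC2016': dict(max_epochs=72, samples_per_gpu=2),
}
```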
1) DOTA dataset:
Tables 1 and 2 show the experimental results obtained with different comparative distillation methods, including the reference teacher model ResNet50-FPN and the reference student models ResNet18-FPN and MobileNetV2-FPN. As the tables show, compared with the baseline student models (ResNet18-FPN and MobileNetV2-FPN), the proposed method (KLDD) improves the overall accuracy from 65.2% and 56.8% to 68.1% and 61.0%, respectively, and outperforms the detection results of current mainstream distillation methods (FitNets, DeFeat, GI distillation, LD, etc.); the overall accuracy improvements are 4.45% and 7.39%, respectively.
Table 3 compares the accuracy (mAP50), floating-point operations (FLOPs) and number of model parameters (Params) of the two lightweight models, ResNet18-FPN-RetinaNet and MobileNetV2-FPN-RetinaNet, with those of the teacher model ResNet50-FPN-RetinaNet; the input image size is set to 3×1024×1024.
Table 1 Experimental results of different comparative distillation methods on the DOTA dataset
Table 2 Experimental results of different comparative distillation methods on the DOTA dataset
Table 3 Experimental results before and after the KLDD distillation method on the DOTA dataset
2) HRSC2016 dataset:
Tables 4 and 5 show the experimental results obtained with different comparative distillation methods, including the reference teacher model ResNet50-FPN and the reference student models ResNet18-FPN and MobileNetV2-FPN. As the tables show, compared with the baseline student models (ResNet18-FPN and MobileNetV2-FPN), the proposed method (KLDD) improves the overall accuracy from 86.0% and 79.2% to 89.2% and 80.7%, respectively, and outperforms the detection results of current mainstream distillation methods (FitNets, DeFeat, GI distillation, LD, etc.); the overall accuracy improvements are 3.72% and 1.89%, respectively.
Table 6 compares the accuracy (mAP50), floating-point operations (FLOPs) and number of model parameters (Params) of the two lightweight models, ResNet18-FPN-RetinaNet and MobileNetV2-FPN-RetinaNet, with those of the teacher model ResNet50-FPN-RetinaNet; the input image size is set to 3×1024×1024.
Table 4 Experimental results of different comparative distillation methods on the HRSC2016 dataset
Table 5 Experimental results of different comparative distillation methods on the HRSC2016 dataset
Table 6 Experimental results before and after the KLDD distillation method on the HRSC2016 dataset
It can be seen intuitively from Fig. 5 that the proposed method outperforms the other distillation methods. For the first image, although a false alarm also occurs, the proposed method is still better than the other distillation methods overall; for the second image, the proposed method correctly detects all ship targets, while the decoupled-feature-based DeFeat distillation and the LD distillation based on generalized distribution information among coordinates produce false alarms and missed detections; for the third image, the proposed method accurately detects all ship targets, whereas the other distillation methods generally produce false alarms and missed detections.
The knowledge distillation-based rotation target detection light-weight system is described below; the system described below and the knowledge distillation-based rotation target detection light-weight method described above may be referred to in correspondence with each other.
The invention also provides a knowledge distillation-based rotation target detection light-weight system, which comprises:
the coordinate decoding module is used for respectively decoding coordinate codes of the detection targets output by the student model and the trained teacher model into rotary coordinate forms to obtain rotary coordinate representations of the detection targets;
the coordinate conversion module is used for converting the rotation coordinate representation into two-dimensional Gaussian distribution to obtain Gaussian distribution representation of the detection target;
the KL divergence calculation module is used for calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model to obtain a first KL divergence, and calculating the KL divergence between the Gaussian distribution representation of the detection target output by the student model and the Gaussian distribution representation of the detection target obtained from the ground-truth label frame to obtain a second KL divergence;
the normalization processing module is used for normalizing the first KL divergence and the second KL divergence;
the loss function calculation module is used for calculating and obtaining an overall loss function in the process that the teacher model carries out knowledge distillation on the student model according to the normalized first KL divergence and the normalized second KL divergence;
and the prediction module is used for predicting by using the distilled student model.
In one embodiment of the present invention, in the coordinate conversion module, the method of converting the rotation coordinate representation into a two-dimensional Gaussian distribution includes:
converting the rotation coordinate representation B(x, y, w, h, θ) into the two-dimensional Gaussian distribution N(μ, Σ) by:
μ = (x, y)^T
Σ = R(θ) · diag(w²/4, h²/4) · R(θ)^T
where R(θ) is the two-dimensional rotation matrix for angle θ; (x, y, w, h, θ) is the rotation coordinate representation of the detection target, denoting the abscissa and ordinate of its center point and the width, height and rotation angle of the detection target; and (μ, Σ) denote the mean and covariance matrix of the Gaussian distribution, respectively.
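The rotated-box-to-Gaussian conversion described above can be written out elementwise. This is a minimal, dependency-free illustration assuming the standard decomposition Σ = R(θ) · diag(w²/4, h²/4) · R(θ)^T; it is not the authors' code:

```python
import math

def rbox_to_gaussian(x, y, w, h, theta):
    """Convert a rotated box (x, y, w, h, theta) to (mu, sigma) with
    mu = (x, y)^T and sigma = R(theta) diag(w^2/4, h^2/4) R(theta)^T."""
    c, s = math.cos(theta), math.sin(theta)
    a, b = w * w / 4.0, h * h / 4.0           # eigenvalues of sigma
    # R diag(a, b) R^T written out elementwise for a 2x2 matrix
    sigma = [[c * c * a + s * s * b, c * s * (a - b)],
             [c * s * (a - b), s * s * a + c * c * b]]
    return (x, y), sigma
```

For θ = 0 the covariance is simply diag(w²/4, h²/4); rotating the box by 90° swaps the two diagonal entries, as expected.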
In one embodiment of the present invention, in the KL divergence calculation module, the method for calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model includes:
calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model as:
D_KL(N_S ‖ N_T) = 1/2 [ (μ_S − μ_T)^T Σ_T^(−1) (μ_S − μ_T) + Tr(Σ_T^(−1) Σ_S) + ln(|Σ_T| / |Σ_S|) − 2 ]
where B_S(x_S, y_S, w_S, h_S, θ_S), N_S(μ_S, Σ_S) and B_T(x_T, y_T, w_T, h_T, θ_T), N_T(μ_T, Σ_T) denote the rotation coordinate representations and Gaussian distribution representations of the detection targets output by the student model and the teacher model, respectively; Δx = x_S − x_T, Δy = y_S − y_T and Δθ = θ_S − θ_T denote the differences between the abscissas, ordinates and rotation angles of the center points of the detection targets output by the two models.
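The closed-form KL divergence between two 2-D Gaussians can be sketched as below, with explicit 2×2 inverses and determinants to stay dependency-free; the −2 constant reflects the dimension d = 2. This is an illustrative implementation, not the authors' code:

```python
import math

def kl_2d_gaussian(mu_s, sigma_s, mu_t, sigma_t):
    """KL(N_s || N_t) for two 2-D Gaussians given as (x, y) means and
    2x2 covariance lists-of-lists."""
    def det(m):
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]

    def inv(m):
        d = det(m)
        return [[m[1][1] / d, -m[0][1] / d],
                [-m[1][0] / d, m[0][0] / d]]

    it = inv(sigma_t)
    dx, dy = mu_s[0] - mu_t[0], mu_s[1] - mu_t[1]
    # Mahalanobis term (mu_s - mu_t)^T sigma_t^{-1} (mu_s - mu_t)
    maha = it[0][0] * dx * dx + (it[0][1] + it[1][0]) * dx * dy + it[1][1] * dy * dy
    # trace(sigma_t^{-1} sigma_s)
    tr = (it[0][0] * sigma_s[0][0] + it[0][1] * sigma_s[1][0]
          + it[1][0] * sigma_s[0][1] + it[1][1] * sigma_s[1][1])
    return 0.5 * (maha + tr + math.log(det(sigma_t) / det(sigma_s)) - 2.0)
```

The divergence is zero for identical distributions and grows with the Mahalanobis distance between the centers, which is what lets the distillation loss penalize center, size and angle errors jointly.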
The knowledge distillation-based rotation target detection light-weight system of the present embodiment is used to implement the foregoing embodiment part of the knowledge distillation-based rotation target detection light-weight method, so the specific implementation manner thereof may refer to the description of the corresponding embodiment of each part, and will not be further described herein.
In addition, since the knowledge distillation-based rotation target detection light-weight system of the present embodiment is used to implement the aforementioned knowledge distillation-based rotation target detection light-weight method, the functions thereof correspond to those of the above method, and will not be described again here.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present invention will be apparent to those of ordinary skill in the art in light of the foregoing description. It is not necessary here nor is it exhaustive of all embodiments. And obvious variations or modifications thereof are contemplated as falling within the scope of the present invention.

Claims (10)

1. A rotary target detection light weight method based on knowledge distillation is characterized in that: the method comprises the following steps:
s1: respectively decoding coordinate codes of the detection targets output by the student model and the trained teacher model into rotary coordinate forms to obtain rotary coordinate representations of the detection targets;
s2: converting the rotation coordinate representation into two-dimensional Gaussian distribution to obtain Gaussian distribution representation of a detection target;
s3: calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model to obtain a first KL divergence, and calculating the KL divergence between the Gaussian distribution representation of the detection target output by the student model and the Gaussian distribution representation of the detection target obtained from the ground-truth label frame to obtain a second KL divergence;
s4: normalizing the first KL divergence and the second KL divergence;
s5: according to the normalized first KL divergence and the normalized second KL divergence, calculating to obtain an overall loss function in the process that the teacher model carries out knowledge distillation on the student model;
s6: prediction was performed using a student model after distillation.
2. The knowledge distillation-based rotation target detection light-weight method according to claim 1, wherein in step S1, the method for decoding the coordinate codes of the detection targets output by the student model and the trained teacher model into the rotation coordinate form comprises:
decoding the coordinate code Y(dx, dy, dw, dh, dθ) of the detection target output by the student model or the trained teacher model into the rotation coordinate form B(x, y, w, h, θ), wherein:
x = x_a + dx · w_a
y = y_a + dy · h_a
w = w_a · e^(dw)
h = h_a · e^(dh)
θ = θ_a + dθ
where A(x_a, y_a, w_a, h_a, θ_a) is the anchor frame preset by the model.
3. The knowledge distillation-based rotation target detection light-weight method according to claim 1 or 2, wherein in step S2, the method of converting the rotation coordinate representation into a two-dimensional Gaussian distribution comprises:
converting the rotation coordinate representation B(x, y, w, h, θ) into the two-dimensional Gaussian distribution N(μ, Σ) by:
μ = (x, y)^T
Σ = R(θ) · diag(w²/4, h²/4) · R(θ)^T
where R(θ) is the two-dimensional rotation matrix for angle θ; (x, y, w, h, θ) is the rotation coordinate representation of the detection target, denoting the abscissa and ordinate of its center point and the width, height and rotation angle of the detection target; and (μ, Σ) denote the mean and covariance matrix of the Gaussian distribution, respectively.
4. The knowledge distillation-based rotation target detection light-weight method according to claim 3, wherein in step S3, the method for calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model comprises:
calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model as:
D_KL(N_S ‖ N_T) = 1/2 [ (μ_S − μ_T)^T Σ_T^(−1) (μ_S − μ_T) + Tr(Σ_T^(−1) Σ_S) + ln(|Σ_T| / |Σ_S|) − 2 ]
where B_S(x_S, y_S, w_S, h_S, θ_S), N_S(μ_S, Σ_S) and B_T(x_T, y_T, w_T, h_T, θ_T), N_T(μ_T, Σ_T) denote the rotation coordinate representations and Gaussian distribution representations of the detection targets output by the student model and the teacher model, respectively; Δx = x_S − x_T, Δy = y_S − y_T and Δθ = θ_S − θ_T denote the differences between the abscissas, ordinates and rotation angles of the center points of the detection targets output by the two models.
5. The knowledge distillation-based rotation target detection light-weight method according to claim 3, wherein in step S3, the method for calculating the KL divergence between the Gaussian distribution representation of the detection target output by the student model and the Gaussian distribution representation of the detection target obtained from the ground-truth label frame comprises:
calculating the KL divergence between the Gaussian distribution representation of the detection target output by the student model and that obtained from the ground-truth label frame as:
D_KL(N_S ‖ N_G) = 1/2 [ (μ_S − μ_G)^T Σ_G^(−1) (μ_S − μ_G) + Tr(Σ_G^(−1) Σ_S) + ln(|Σ_G| / |Σ_S|) − 2 ]
where B_S(x_S, y_S, w_S, h_S, θ_S), N_S(μ_S, Σ_S) and B_G(x_G, y_G, w_G, h_G, θ_G), N_G(μ_G, Σ_G) denote the rotation coordinate representations and Gaussian distribution representations of the detection target output by the student model and of the detection target obtained from the ground-truth label frame, respectively; Δx = x_S − x_G, Δy = y_S − y_G and Δθ = θ_S − θ_G denote the differences between the abscissas, ordinates and rotation angles of their center points.
6. The knowledge distillation-based rotation target detection light-weight method according to claim 1, wherein after the prediction of step S6 is performed using the distilled student model, the prediction frames output by the student model are post-processed.
7. The knowledge distillation-based rotation target detection light-weight method according to claim 6, wherein the method for post-processing the prediction frames output by the student model comprises the following specific steps:
arranging all prediction frames in each category in ascending or descending order of scores;
selecting the prediction frame with the highest score as the reference, and calculating the intersection-over-union (IoU) between each of the other prediction frames and the reference frame;
screening the prediction frames within each category according to the IoU, and deleting the prediction frames that do not satisfy the screening condition, where the screening condition is that the IoU between the frame to be screened and the reference frame is less than or equal to a preset threshold.
Then selecting the prediction frame with the highest score among the remaining frames that satisfy the screening condition as the new reference frame, and repeating the screening step until every prediction frame satisfying the screening condition has served as a reference.
8. A knowledge distillation-based rotation target detection light-weight system, characterized by comprising:
the coordinate decoding module is used for respectively decoding coordinate codes of the detection targets output by the student model and the trained teacher model into rotary coordinate forms to obtain rotary coordinate representations of the detection targets;
the coordinate conversion module is used for converting the rotation coordinate representation into two-dimensional Gaussian distribution to obtain Gaussian distribution representation of the detection target;
the KL divergence calculation module is used for calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model to obtain a first KL divergence, and calculating the KL divergence between the Gaussian distribution representation of the detection target output by the student model and the Gaussian distribution representation of the detection target obtained from the ground-truth label frame to obtain a second KL divergence;
the normalization processing module is used for normalizing the first KL divergence and the second KL divergence;
the loss function calculation module is used for calculating and obtaining an overall loss function in the process that the teacher model carries out knowledge distillation on the student model according to the normalized first KL divergence and the normalized second KL divergence;
and the prediction module is used for predicting by using the distilled student model.
9. The knowledge distillation-based rotation target detection light-weight system according to claim 8, wherein in the coordinate conversion module, the method of converting the rotation coordinate representation into a two-dimensional Gaussian distribution comprises:
converting the rotation coordinate representation B(x, y, w, h, θ) into the two-dimensional Gaussian distribution N(μ, Σ) by:
μ = (x, y)^T
Σ = R(θ) · diag(w²/4, h²/4) · R(θ)^T
where R(θ) is the two-dimensional rotation matrix for angle θ; (x, y, w, h, θ) is the rotation coordinate representation of the detection target, denoting the abscissa and ordinate of its center point and the width, height and rotation angle of the detection target; and (μ, Σ) denote the mean and covariance matrix of the Gaussian distribution, respectively.
10. The knowledge distillation-based rotation target detection light-weight system according to claim 8, wherein in the KL divergence calculation module, the method for calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model comprises:
calculating the KL divergence between the Gaussian distribution representations of the detection targets output by the student model and the teacher model as:
D_KL(N_S ‖ N_T) = 1/2 [ (μ_S − μ_T)^T Σ_T^(−1) (μ_S − μ_T) + Tr(Σ_T^(−1) Σ_S) + ln(|Σ_T| / |Σ_S|) − 2 ]
where B_S(x_S, y_S, w_S, h_S, θ_S), N_S(μ_S, Σ_S) and B_T(x_T, y_T, w_T, h_T, θ_T), N_T(μ_T, Σ_T) denote the rotation coordinate representations and Gaussian distribution representations of the detection targets output by the student model and the teacher model, respectively; Δx = x_S − x_T, Δy = y_S − y_T and Δθ = θ_S − θ_T denote the differences between the abscissas, ordinates and rotation angles of the center points of the detection targets output by the two models.
CN202310299030.6A 2023-03-24 2023-03-24 Rotary target detection light-weight method and system based on knowledge distillation Pending CN116524351A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310299030.6A CN116524351A (en) 2023-03-24 2023-03-24 Rotary target detection light-weight method and system based on knowledge distillation


Publications (1)

Publication Number Publication Date
CN116524351A true CN116524351A (en) 2023-08-01

Family

ID=87389373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310299030.6A Pending CN116524351A (en) 2023-03-24 2023-03-24 Rotary target detection light-weight method and system based on knowledge distillation

Country Status (1)

Country Link
CN (1) CN116524351A (en)


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115577305A (en) * 2022-10-31 2023-01-06 中国人民解放军军事科学院系统工程研究院 Intelligent unmanned aerial vehicle signal identification method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YINGJIE CUI et al.: "Quantitative short-term precipitation model using multimodal data fusion based on a cross-attention mechanism", MDPI, 31 December 2022 (2022-12-31) *
WANG YAO: "Research on Knowledge Distillation Based on Information Quantification", China Master's Theses Full-text Database, 15 February 2023 (2023-02-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117521848A (en) * 2023-11-10 2024-02-06 中国科学院空天信息创新研究院 Remote sensing basic model light-weight method and device for resource-constrained scene
CN117521848B (en) * 2023-11-10 2024-05-28 中国科学院空天信息创新研究院 Remote sensing basic model light-weight method and device for resource-constrained scene

Similar Documents

Publication Publication Date Title
CN113567984B (en) Method and system for detecting artificial small target in SAR image
US20230351573A1 (en) Intelligent detection method and unmanned surface vehicle for multiple type faults of near-water bridges
CN110276269B (en) Remote sensing image target detection method based on attention mechanism
Hou et al. Refined one-stage oriented object detection method for remote sensing images
CN110189304B (en) Optical remote sensing image target on-line rapid detection method based on artificial intelligence
CN110084093B (en) Method and device for detecting and identifying target in remote sensing image based on deep learning
CN111079739B (en) Multi-scale attention feature detection method
CN107967451A (en) A kind of method for carrying out crowd's counting to static image using multiple dimensioned multitask convolutional neural networks
CN111563473A (en) Remote sensing ship identification method based on dense feature fusion and pixel level attention
Wang et al. Gaussian focal loss: Learning distribution polarized angle prediction for rotated object detection in aerial images
CN111968088B (en) Building detection method based on pixel and region segmentation decision fusion
CN116229295A (en) Remote sensing image target detection method based on fusion convolution attention mechanism
CN113569788B (en) Building semantic segmentation network model training method, system and application method
CN113191296A (en) Method for detecting five parameters of target in any orientation based on YOLOV5
CN116524351A (en) Rotary target detection light-weight method and system based on knowledge distillation
CN116563726A (en) Remote sensing image ship target detection method based on convolutional neural network
CN116824335A (en) YOLOv5 improved algorithm-based fire disaster early warning method and system
CN113326763A (en) Remote sensing target detection method based on boundary frame consistency
CN115861756A (en) Earth background small target identification method based on cascade combination network
CN114565842A (en) Unmanned aerial vehicle real-time target detection method and system based on Nvidia Jetson embedded hardware
CN113487600A (en) Characteristic enhancement scale self-adaptive sensing ship detection method
CN110826485B (en) Target detection method and system for remote sensing image
Cheng et al. YOLOv3 Object Detection Algorithm with Feature Pyramid Attention for Remote Sensing Images.
Yuan et al. Dynamic Pyramid Attention Networks for multi-orientation object detection
Deng et al. Towards hierarchical adaptive alignment for aerial object detection in remote sensing images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination