CN116958687A - Unmanned aerial vehicle-oriented small target detection method and device based on improved DETR - Google Patents

Unmanned aerial vehicle-oriented small target detection method and device based on improved DETR

Info

Publication number
CN116958687A
Authority
CN
China
Prior art keywords
detr
improved
small target
target detection
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310931094.3A
Other languages
Chinese (zh)
Inventor
杜强
姜明新
洪远
王杰
项靖
黄俊闻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaiyin Institute of Technology filed Critical Huaiyin Institute of Technology
Priority to CN202310931094.3A
Publication of CN116958687A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The invention discloses a small target detection method and device for unmanned aerial vehicles based on an improved DETR. A data set for UAV small-target detection is constructed and divided into a training set and a test set; a small target detection network based on the improved DETR is constructed: ShuffleNet-d is adopted as the feature extraction network of the DETR, and a 1×1 convolution module is introduced to extract features along the channel dimension, where ShuffleNet-d is ShuffleNetV2 with the original global pooling and fully connected layers deleted; the self-attention in the DETR encoder is replaced with FlashAttention-2; the Neck layer of the DETR adopts the deformable cross-scale feature fusion module Deformable-CCFM; Smooth-L1 Loss and DIoU Loss are adopted as the loss function of the improved, FlashAttention-2-based lightweight DETR feature extraction network; and the data set is used to train and evaluate the small target detection network based on the improved DETR. Aiming at the difficulty of detecting small targets in unmanned aerial vehicle scenes, the invention redesigns the network structure, fuses multi-scale and multi-level information, improves the representation capability of the network, and improves the detection precision of small targets.

Description

Unmanned aerial vehicle-oriented small target detection method and device based on improved DETR
Technical Field
The invention belongs to the application of deep learning in the field of computer vision, and particularly relates to an unmanned aerial vehicle-oriented small target detection method and device based on improved DETR.
Background
Unmanned aerial vehicle target recognition is becoming more and more popular, and unmanned aerial vehicles have gradually become an indispensable part of complex scenes; UAV aerial data somewhat resemble remote sensing image data, and small targets abound in aerial images. Unmanned aerial vehicles are typically deployed in large scenes, which means that the various objects of interest in an image, such as pedestrians, bicycles and automobiles, are small in scale because of the high shooting altitude and are easily disturbed by the environment, making them difficult to detect with conventional object detection methods. Traditional UAV aerial-image target detection methods suffer from a high miss rate, a low detection success rate, and a large model size. Improving an algorithm's ability to detect small targets in UAV aerial images has therefore become a challenging research direction in the field of target detection; at the same time, considering deployment on small devices in UAV scenes, the model must be quantized, the computation reduced, and memory utilization improved.
DETR is the first end-to-end Transformer-based algorithm, with no anchor pre-processing and no NMS post-processing, but it converges slowly and is slow to train and to infer; although subsequent optimized algorithms have steadily accelerated convergence and improved inference speed, real-time requirements still cannot be met. DETR also performs poorly on small object detection: existing detectors typically rely on multi-scale features, and small targets are usually detected on high-resolution feature maps, whereas DETR does not detect with multi-scale features, mainly because high-resolution feature maps would add unacceptable computational complexity to it. In the prior art, DETR needs a long training time to converge, and DETR-series variants increase model depth and stack parameters in pursuit of detection precision, yielding complex structures with large parameter counts that are unsuited to mid- and low-end devices and ignore practical application conditions. Moreover, the time and memory complexity of the Transformer's core self-attention module is quadratic in sequence length; approximate attention methods that reduce compute and memory requirements have been proposed, but they focus too heavily on reducing floating-point operations and tend to ignore the overhead of memory access.
Therefore, aiming at the difficulty the existing DETR target detection algorithm has in detecting small targets in unmanned aerial vehicle scenes, and at the relatively large size of aerial-image detection models deployed on unmanned aerial vehicles, a small target detection method oriented to UAV scenes is needed, so that an aerial-image detection system deployed on a UAV can perform the target detection task in aerial images quickly and accurately under hardware resource constraints.
Disclosure of Invention
The invention aims to: the invention provides an unmanned aerial vehicle-oriented small target detection method and device based on an improved DETR, which fuse multi-scale and multi-level information and improve the characterization capability of the network so as to improve small-target detection precision.
The technical scheme is as follows: the invention provides an unmanned aerial vehicle-oriented small target detection method based on improved DETR, which specifically comprises the following steps:
(1) Constructing a data set aiming at unmanned aerial vehicle small target detection, and dividing the data set into a training set and a testing set;
(2) Constructing a small target detection network based on the improved DETR: ShuffleNet-d is adopted as the feature extraction network of the DETR, and a 1×1 convolution module is introduced to extract features along the channel dimension, wherein ShuffleNet-d is ShuffleNetV2 with the original global pooling and fully connected layers deleted; the self-attention in the DETR encoder is replaced with FlashAttention-2; and the Neck layer of the DETR adopts the deformable cross-scale feature fusion module Deformable-CCFM;
(3) Adopting Smooth-L1 Loss and DIoU Loss as the loss function of the improved, FlashAttention-2-based lightweight DETR feature extraction network;
(4) Training a small target detection network based on improved DETR by using a training set;
(5) Inputting the test set into the trained small target detection network based on the improved DETR, and evaluating the network to realize unmanned aerial vehicle-oriented small target detection.
Further, the implementation process of ShuffleNet-d in step (2) is as follows:
firstly, the initial image passes through a 3×3 convolution layer with stride 2, which uses filters to extract image features and brings the image to 1/2 of its original size; a max-pooling operation is performed on the resulting feature map with 2×2 pooling kernels, taking the maximum value in each 2×2 region, so that the spatial dimensions of the feature map are halved and the image becomes 1/4 of its original size; a stage module consists of ShuffleNetV2 unit 1 and ShuffleNetV2 unit 2, and the repetition counts of unit 1 and unit 2 differ between stage modules; the first block of each stage consists of ShuffleNetV2 unit 1 with stride 2, completing the downsampling and doubling the output channels; in the stage2 module, unit 1 and unit 2 are repeated 1 and 3 times respectively, and the output image becomes 1/8 of the original size; in the stage3 module, unit 1 and unit 2 are repeated 1 and 7 times, and the output image becomes 1/16 of the original size; in the stage4 module, unit 1 and unit 2 are repeated 1 and 3 times, and the output image becomes 1/32 of the original size; the outputs of stage2, stage3 and stage4 are then used as multi-scale features, with their channel counts unified by 1×1 conv, as the input of the multi-scale feature fusion module: the stage2 output passes through a 1×1 conv to give S3, the stage3 output gives S4, and the stage4 output gives S5.
Further, the deformable cross-scale feature Fusion module Deformable-CCFM of step (2) completes feature fusion through a Fusion module: F5 is taken as F_high and S4 as F_low; F_high is first upsampled to a feature map of the same size as F_low and concatenated with F_low along the channel dimension; a 1×1 convolution then reduces the number of channels to the previous dimension; the output is divided into two parts, one performing feature interaction through n repeated RepVGG-blocks, the other being a residual edge connected directly to the output; finally the two parts are added element by element. The output fused by the first Fusion module is then taken as F_high and S3 as F_low, and the second Fusion module is completed in the same way, outputting the final fused feature map.
Further, the RepVGG-block has three branches: a main branch with a 3×3 convolution kernel, a branch with a 1×1 convolution kernel, and a branch with only BN; the three branches are added element by element, and finally a PReLU activation function is applied.
Further, the loss function used for training in step (4) is given by the following formula:
$\mathcal{L}_{reg}=\mathcal{L}_{SmoothL1}\big(b_{\sigma(i)},\hat{b}_{\sigma(i)}\big)+\mathcal{L}_{DIoU}\big(b,b^{gt}\big),\qquad \mathcal{L}_{DIoU}=1-IoU+\frac{\rho^{2}(b,b^{gt})}{c^{2}}$
wherein $b_{\sigma(i)}$ represents the target box of the i-th index, $\hat{b}_{\sigma(i)}$ represents the prediction box of the i-th index, $b$ and $b^{gt}$ represent the center points of the prediction box and the target box respectively, $\rho$ represents the Euclidean distance between the two center points, and $c$ represents the diagonal distance of the smallest rectangle that can cover both the prediction box and the target box.
Based on the same inventive concept, the present invention also provides an apparatus comprising a memory and a processor, wherein:
a memory for storing a computer program capable of running on the processor;
a processor for performing the unmanned aerial vehicle-oriented small target detection method steps based on the improved DETR as described above when running said computer program.
Based on the same inventive concept, the present invention also provides a storage medium having stored thereon a computer program which, when executed by at least one processor, implements the unmanned aerial vehicle-oriented small target detection method steps based on the improved DETR as described above.
The beneficial effects are that: compared with the prior art, aiming at the difficulty existing target detection algorithms have in detecting small targets in unmanned aerial vehicle scenes and on detection models deployed on unmanned aerial vehicles, the network structure is redesigned, multi-scale and multi-level information is fused, the characterization capability of the network is improved, and small-target detection precision is improved; a lightweight Backbone is used so that the model can be better deployed in practical scenarios such as unmanned aerial vehicles; and using the novel FlashAttention-2 attention greatly reduces computational complexity while improving computational efficiency and memory utilization.
Drawings
FIG. 1 is a schematic diagram of the improved DETR small target detection network according to the present invention;
FIG. 2 is a schematic diagram of the ShuffleNet-d module according to the present invention;
FIG. 3 is a schematic diagram of the FlashAttention-2 module structure according to the present invention;
FIG. 4 is a schematic diagram of the Deformable-CCFM module according to the present invention;
FIG. 5 is a schematic diagram of the Fusion module according to the present invention;
FIG. 6 is a schematic diagram of the modified RepVGG-block module structure according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
The invention provides an unmanned aerial vehicle-oriented small target detection method based on improved DETR, which specifically comprises the following steps:
step S1: and constructing a data set aiming at the detection of the small target of the unmanned aerial vehicle, and dividing the data set into a training set and a verification set.
The VisDrone2019 data set is selected. It was collected with different unmanned aerial vehicle platforms in different scenes and under different weather and illumination conditions, and includes categories such as pedestrian, car and bicycle against varied backgrounds; this diversity of categories and backgrounds helps improve the generalization capability of the detection model. The data set contains 6471 training samples and 1610 test samples, and the annotations of the VisDrone data set are converted to COCO format.
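As a concrete illustration, the conversion step can be sketched as follows in Python, assuming the standard VisDrone2019 annotation layout (one comma-separated .txt per image with fields bbox_left, bbox_top, bbox_width, bbox_height, score, category, truncation, occlusion); the file paths and the category list are illustrative assumptions rather than values fixed by the invention:

```python
# Minimal VisDrone -> COCO conversion sketch under the stated assumptions.
import json
from pathlib import Path
from PIL import Image

def visdrone_to_coco(img_dir: str, ann_dir: str, out_json: str) -> None:
    images, annotations = [], []
    ann_id = 0
    for img_id, img_path in enumerate(sorted(Path(img_dir).glob("*.jpg"))):
        w, h = Image.open(img_path).size
        images.append({"id": img_id, "file_name": img_path.name,
                       "width": w, "height": h})
        txt = Path(ann_dir) / (img_path.stem + ".txt")
        for line in txt.read_text().strip().splitlines():
            x, y, bw, bh, score, cat = [int(v) for v in line.split(",")[:6]]
            if cat == 0:          # skip the "ignored regions" pseudo-class
                continue
            annotations.append({"id": ann_id, "image_id": img_id,
                                "category_id": cat,
                                "bbox": [x, y, bw, bh],   # COCO xywh format
                                "area": bw * bh, "iscrowd": 0})
            ann_id += 1
    categories = [{"id": i, "name": n} for i, n in enumerate(
        ["pedestrian", "people", "bicycle", "car", "van", "truck", "tricycle",
         "awning-tricycle", "bus", "motor", "others"], start=1)]
    Path(out_json).write_text(json.dumps(
        {"images": images, "annotations": annotations,
         "categories": categories}))
```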
Step S2: constructing a small target detection network based on the improved DETR, as shown in FIG. 1: ShuffleNet-d is adopted as the feature extraction network of the DETR, and a 1×1 convolution module is introduced to extract features along the channel dimension, wherein ShuffleNet-d is ShuffleNetV2 with the original global pooling and fully connected layers deleted; the self-attention in the DETR encoder is replaced with FlashAttention-2; and the Neck layer of the DETR adopts the deformable cross-scale feature fusion module Deformable-CCFM.
The invention changes the backbone of the original DETR to a lightweight feature extraction network, ShuffleNet-d, shown in FIG. 2. First, the initial image passes through a 3×3 convolution layer with stride 2; this layer uses a number of filters (also called convolution kernels or convolution weights) to extract image features, and the image becomes 1/2 of its original size. The resulting feature map is then max-pooled, typically with a 2×2 pooling kernel that takes the maximum value in each 2×2 region, thereby halving the spatial dimensions of the feature map and bringing the image to 1/4 of its original size. A stage module consists of ShuffleNetV2 unit 1 and ShuffleNetV2 unit 2, and the repetition counts of unit 1 and unit 2 differ between stage modules; the first block of each stage consists of ShuffleNetV2 unit 1 with stride 2, which completes the downsampling and doubles the output channels. In the stage2 module, unit 1 and unit 2 are repeated 1 and 3 times respectively, and the output image becomes 1/8 of the original size; in the stage3 module they are repeated 1 and 7 times, and the output becomes 1/16 of the original size; in the stage4 module they are repeated 1 and 3 times, and the output becomes 1/32 of the original size. The outputs of stage2, stage3 and stage4 are then used as multi-scale features, with their channel counts unified by 1×1 conv, as the input of the multi-scale feature fusion module: the stage2 output passes through a 1×1 conv to give S3, the stage3 output gives S4, and the stage4 output gives S5.
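A minimal PyTorch sketch of this backbone wiring follows. It assumes torchvision's ShuffleNetV2 as the source of the stem and the three stages (torchvision's stem uses a 3×3 max pool with stride 2 where the text describes 2×2; the spatial strides match), and hidden_dim = 256 is an illustrative choice:

```python
# ShuffleNet-d sketch: drop the global pooling / FC head of ShuffleNetV2,
# tap stage2/3/4, and unify channels with 1x1 convs to get S3, S4, S5.
import torch
import torch.nn as nn
from torchvision.models import shufflenet_v2_x1_0

class ShuffleNetD(nn.Module):
    def __init__(self, hidden_dim: int = 256):
        super().__init__()
        net = shufflenet_v2_x1_0()
        self.stem = nn.Sequential(net.conv1, net.maxpool)   # 1/2 then 1/4
        self.stage2, self.stage3, self.stage4 = net.stage2, net.stage3, net.stage4
        # stage output channels for the x1_0 variant: 116, 232, 464
        self.proj = nn.ModuleList(
            nn.Conv2d(c, hidden_dim, kernel_size=1) for c in (116, 232, 464))

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        c3 = self.stage2(x)     # stride 8
        c4 = self.stage3(c3)    # stride 16
        c5 = self.stage4(c4)    # stride 32
        s3, s4, s5 = (p(c) for p, c in zip(self.proj, (c3, c4, c5)))
        return s3, s4, s5

feats = ShuffleNetD()(torch.randn(1, 3, 640, 640))
print([f.shape for f in feats])   # strides 8 / 16 / 32, 256 channels each
```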
As shown in FIG. 3, S5 is embedded as the input x, and three different linear transformations, called Query, Key and Value, are represented by Q, K and V respectively and fed into FlashAttention-2. Q is split across several warps while K and V are kept accessible to all warps; each warp performs a matrix multiplication to obtain a slice of QK^T and then multiplies only with a shared slice of V to obtain the corresponding output slice, so no communication is required between warps. Speed is further increased by reducing reads and writes to shared memory.
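A sketch of such an encoder self-attention layer is given below. It uses PyTorch's torch.nn.functional.scaled_dot_product_attention (available since PyTorch 2.0), which dispatches to a FlashAttention-style fused kernel on supported GPUs, as a stand-in for FlashAttention-2; where the dedicated flash-attn package is installed, its flash_attn_func could be substituted. Shapes and the head count are illustrative assumptions:

```python
# Encoder self-attention with a fused attention kernel (FlashAttention stand-in).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlashSelfAttention(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, 3 * dim)   # the Query/Key/Value projections
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (b, n, d) -> (b, heads, n, d_head), the layout SDPA expects
        q, k, v = (t.reshape(b, n, self.num_heads, -1).transpose(1, 2)
                   for t in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v)   # fused attention kernel
        return self.out(y.transpose(1, 2).reshape(b, n, d))

# S5 flattened to a token sequence is the encoder input in this design
tokens = torch.randn(2, 400, 256)             # e.g. a 20x20 S5 map
print(FlashSelfAttention()(tokens).shape)     # torch.Size([2, 400, 256])
```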
After features are extracted from images of different sizes, three effective feature maps are obtained and input into the Neck layer. To enhance the expressive power of the network features, the embodiment of the invention provides Deformable-CCFM in the Neck layer: the output of the Encoder is reshaped back to two dimensions and denoted F5 in order to complete the subsequent cross-scale feature fusion. As shown in FIG. 4, with S3, S4 and F5 as the inputs of Deformable-CCFM, feature fusion is completed by the Fusion module. As shown in FIG. 5, in the Fusion module F5 is first taken as F_high and S4 as F_low. F_high is first upsampled to a feature map of the same size as F_low, then concatenated with F_low along the channel dimension, after which a 1×1 convolution reduces the number of channels to the previous dimension. The output is then divided into two parts. One part performs feature interaction through n repeated RepVGG-blocks; as shown in FIG. 6, a RepVGG-block consists of three branches, namely a main branch with a 3×3 convolution kernel, a branch with a 1×1 convolution kernel, and a branch with only BN, which are added element by element and followed by a PReLU activation function. The other part is a residual edge connected directly to the output, and finally the two parts are added element by element. The fused output is then taken as F_high and S3 as F_low, and the above Fusion step is performed in the same way. The invention introduces multi-scale feature fusion to enhance small-target detection capability and improve small-target detection precision. The finally fused feature map is flattened to two dimensions, and query selection picks the Top-K features from the encoder to initialize the target queries, with K = 100 by default.
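The query-selection step at the end can be sketched as follows, assuming an auxiliary classification head scores every encoder token and the K highest-scoring tokens initialize the object queries; the helper name and the class count are illustrative:

```python
# Top-K query selection over flattened encoder features.
import torch
import torch.nn as nn

def select_queries(memory: torch.Tensor, class_head: nn.Linear, k: int = 100):
    """memory: (batch, tokens, dim) flattened fused feature map."""
    scores = class_head(memory).max(dim=-1).values   # best class score per token
    topk = scores.topk(k, dim=1).indices             # (batch, k)
    idx = topk.unsqueeze(-1).expand(-1, -1, memory.size(-1))
    return memory.gather(1, idx)                     # (batch, k, dim)

memory = torch.randn(2, 8400, 256)
queries = select_queries(memory, nn.Linear(256, 11))   # 11 VisDrone classes
print(queries.shape)                                   # torch.Size([2, 100, 256])
```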
The Fusion module first upsamples the high-level feature map, concatenates it with the low-level feature map along the channel dimension, and applies a 1×1 convolution to reduce the number of channels to the previous dimension; the output is then divided into two parts, one of which performs feature interaction through n repeated improved RepVGG-blocks. When the ReLU activation function in the RepVGG-block processes negative inputs, the output is constantly 0, so the gradient vanishes there; the PReLU activation function instead adjusts the slope of the negative part through a learnable parameter, avoiding this vanishing-gradient problem.
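A combined sketch of the modified RepVGG-block and the Fusion step it serves is given below, under two stated reading assumptions: the channel "addition" before the 1×1 convolution is taken as concatenation (since that convolution restores the previous channel count), and the "two parts" are taken as a residual identity edge plus a stack of n blocks operating on the same tensor:

```python
# Modified RepVGG-block (PReLU in place of ReLU) and one Fusion step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepVGGBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, bias=False), nn.BatchNorm2d(dim))
        self.branch1x1 = nn.Sequential(
            nn.Conv2d(dim, dim, 1, bias=False), nn.BatchNorm2d(dim))
        self.branch_id = nn.BatchNorm2d(dim)   # BN-only identity branch
        self.act = nn.PReLU(dim)               # learnable negative slope

    def forward(self, x):
        return self.act(self.branch3x3(x) + self.branch1x1(x) + self.branch_id(x))

class Fusion(nn.Module):
    def __init__(self, dim: int = 256, n: int = 3):
        super().__init__()
        self.reduce = nn.Conv2d(2 * dim, dim, 1)   # back to the previous dim
        self.blocks = nn.Sequential(*[RepVGGBlock(dim) for _ in range(n)])

    def forward(self, f_high, f_low):
        f_high = F.interpolate(f_high, size=f_low.shape[-2:], mode="nearest")
        y = self.reduce(torch.cat([f_high, f_low], dim=1))
        return y + self.blocks(y)   # feature-interaction path + residual edge

f5, s4 = torch.randn(1, 256, 20, 20), torch.randn(1, 256, 40, 40)
print(Fusion()(f5, s4).shape)       # torch.Size([1, 256, 40, 40])
```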
Step S3: smooth-L1 and DIoULSs were used as loss functions for the improved DETR lightweight feature extraction network based on Flashatttion-2.
The 100 target queries are taken as the input of the decoder, and 100 tokens are output after attention and mapping; the tokens are then fed into two FFNs simultaneously, giving the positions and class scores of 100 boxes. Finally, bipartite graph matching is performed between the prediction boxes and the ground-truth boxes, and the loss function is computed. The invention optimizes the regression loss of the original DETR: to improve detection accuracy, the Smooth-L1 and DIoU loss functions are combined as the regression loss for predicting and regressing the detection boxes.
$\mathcal{L}_{SmoothL1}=\sum_{i}\mathrm{smooth}_{L1}\big(b_{\sigma(i)}-\hat{b}_{\sigma(i)}\big),\qquad \mathrm{smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\ |x|-0.5, & \text{otherwise}\end{cases}$
wherein $b_{\sigma(i)}$ represents the target box of the i-th index and $\hat{b}_{\sigma(i)}$ represents the prediction box of the i-th index.
The Smooth-L1 loss function uses only the coordinate values and the width and height of the prediction box and the target box when computing the loss, and cannot describe whether a containment relationship exists between the prediction box and the target box. To address this, the DIoU loss function is introduced when computing the regression loss to measure the overlap loss between the prediction box and the target box.
$\mathcal{L}_{DIoU}=1-IoU+\frac{\rho^{2}(b,b^{gt})}{c^{2}}$
wherein $b$ and $b^{gt}$ represent the center points of the prediction box and the target box respectively, $\rho$ represents the Euclidean distance between the two center points, and $c$ represents the diagonal distance of the smallest rectangle that can cover both the prediction box and the target box.
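A sketch of the combined regression loss implied by the two formulas is given below, assuming boxes in (x1, y1, x2, y2) form and an equal, unweighted sum of the two terms (the weighting is an illustrative assumption):

```python
# Combined Smooth-L1 + DIoU regression loss, implemented from the formulas above.
import torch
import torch.nn.functional as F

def diou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2)."""
    # intersection-over-union
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).prod(dim=1)
    area_t = (target[:, 2:] - target[:, :2]).prod(dim=1)
    iou = inter / (area_p + area_t - inter + 1e-7)
    # rho^2: squared distance between the two box centers
    rho2 = ((pred[:, :2] + pred[:, 2:]) / 2
            - (target[:, :2] + target[:, 2:]) / 2).pow(2).sum(dim=1)
    # c^2: squared diagonal of the smallest enclosing rectangle
    c_lt = torch.min(pred[:, :2], target[:, :2])
    c_rb = torch.max(pred[:, 2:], target[:, 2:])
    c2 = (c_rb - c_lt).pow(2).sum(dim=1) + 1e-7
    return (1 - iou + rho2 / c2).mean()

def box_regression_loss(pred, target):
    return F.smooth_l1_loss(pred, target) + diou_loss(pred, target)

pred = torch.tensor([[10., 10., 50., 60.]])
target = torch.tensor([[12., 8., 48., 62.]])
print(box_regression_loss(pred, target))
```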
Step S4: training the improved DETR-based lightweight feature extraction network with the training set; the test set is then input into the trained network based on the improved DETR, and the network is evaluated.
Based on the same inventive concept, the present invention also provides an apparatus comprising a memory and a processor, wherein: the memory is used for storing a computer program capable of running on the processor; and the processor is used for performing the unmanned aerial vehicle-oriented small target detection method steps based on the improved DETR as described above when running said computer program.
Based on the same inventive concept, the present invention also provides a storage medium having stored thereon a computer program which, when executed by at least one processor, implements the unmanned aerial vehicle-oriented small target detection method steps based on the improved DETR as described above.
Thus far, the technical solution of the present invention has been described in connection with the specific experimental procedure shown in the drawings, but the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims (7)

1. An unmanned aerial vehicle-oriented small target detection method based on improved DETR, characterized by comprising the following steps:
(1) Constructing a data set aiming at unmanned aerial vehicle small target detection, and dividing the data set into a training set and a testing set;
(2) Constructing a small target detection network based on the improved DETR: ShuffleNet-d is adopted as the feature extraction network of the DETR, and a 1×1 convolution module is introduced to extract features along the channel dimension, wherein ShuffleNet-d is ShuffleNetV2 with the original global pooling and fully connected layers deleted; the self-attention in the DETR encoder is replaced with FlashAttention-2; and the Neck layer of the DETR adopts the deformable cross-scale feature fusion module Deformable-CCFM;
(3) Adopting Smooth-L1 Loss and DIoU Loss as the loss function of the improved, FlashAttention-2-based lightweight DETR feature extraction network;
(4) Training a small target detection network based on improved DETR by using a training set;
(5) Inputting the test set into the trained small target detection network based on the improved DETR, and evaluating the network to realize unmanned aerial vehicle-oriented small target detection.
2. The unmanned aerial vehicle-oriented small target detection method based on improved DETR of claim 1, wherein the ShuffleNet-d implementation procedure of step (2) is as follows:
firstly, the initial image passes through a 3×3 convolution layer with stride 2, which uses filters to extract image features and brings the image to 1/2 of its original size; a max-pooling operation is performed on the resulting feature map with 2×2 pooling kernels, taking the maximum value in each 2×2 region, so that the spatial dimensions of the feature map are halved and the image becomes 1/4 of its original size; a stage module consists of ShuffleNetV2 unit 1 and ShuffleNetV2 unit 2, and the repetition counts of unit 1 and unit 2 differ between stage modules; the first block of each stage consists of ShuffleNetV2 unit 1 with stride 2, completing the downsampling and doubling the output channels; in the stage2 module, unit 1 and unit 2 are repeated 1 and 3 times respectively, and the output image becomes 1/8 of the original size; in the stage3 module, unit 1 and unit 2 are repeated 1 and 7 times, and the output image becomes 1/16 of the original size; in the stage4 module, unit 1 and unit 2 are repeated 1 and 3 times, and the output image becomes 1/32 of the original size; the outputs of stage2, stage3 and stage4 are then used as multi-scale features, with their channel counts unified by 1×1 conv, as the input of the multi-scale feature fusion module: the stage2 output passes through a 1×1 conv to give S3, the stage3 output gives S4, and the stage4 output gives S5.
3. The unmanned aerial vehicle-oriented small target detection method based on improved DETR of claim 1, wherein the deformable cross-scale feature Fusion module Deformable-CCFM of step (2) completes feature fusion through a Fusion module: F5 is taken as F_high and S4 as F_low; F_high is first upsampled to a feature map of the same size as F_low and concatenated with F_low along the channel dimension; a 1×1 convolution then reduces the number of channels to the previous dimension; the output is divided into two parts, one performing feature interaction through n repeated RepVGG-blocks, the other being a residual edge connected directly to the output; finally the two parts are added element by element. The output fused by the first Fusion module is then taken as F_high and S3 as F_low, and the second Fusion module is completed in the same way, outputting the final fused feature map.
4. The unmanned aerial vehicle-oriented small target detection method based on improved DETR according to claim 3, wherein the RepVGG-block has three parallel branches: a main branch with a 3×3 convolution kernel, a branch with a 1×1 convolution kernel, and a branch with only BN; the three branches are added element by element, and finally a PReLU activation function is applied.
5. The unmanned aerial vehicle-oriented small target detection method based on the improved DETR of claim 1, wherein the loss function used for training in step (4) is given by the following formula:
$\mathcal{L}_{reg}=\mathcal{L}_{SmoothL1}\big(b_{\sigma(i)},\hat{b}_{\sigma(i)}\big)+\mathcal{L}_{DIoU}\big(b,b^{gt}\big),\qquad \mathcal{L}_{DIoU}=1-IoU+\frac{\rho^{2}(b,b^{gt})}{c^{2}}$
wherein $b_{\sigma(i)}$ represents the target box of the i-th index, $\hat{b}_{\sigma(i)}$ represents the prediction box of the i-th index, $b$ and $b^{gt}$ represent the center points of the prediction box and the target box respectively, $\rho$ represents the Euclidean distance between the two center points, and $c$ represents the diagonal distance of the smallest rectangle that can cover both the prediction box and the target box.
6. An apparatus comprising a memory and a processor, wherein:
a memory for storing a computer program capable of running on the processor;
a processor for performing the unmanned aerial vehicle-oriented small target detection method steps based on the improved DETR as claimed in any of claims 1-5 when running said computer program.
7. A storage medium having stored thereon a computer program which, when executed by at least one processor, implements the unmanned aerial vehicle-oriented small target detection method steps based on the improved DETR as claimed in any of claims 1-5.
CN202310931094.3A 2023-07-27 2023-07-27 Unmanned aerial vehicle-oriented small target detection method and device based on improved DETR Pending CN116958687A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310931094.3A CN116958687A (en) 2023-07-27 2023-07-27 Unmanned aerial vehicle-oriented small target detection method and device based on improved DETR

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310931094.3A CN116958687A (en) 2023-07-27 2023-07-27 Unmanned aerial vehicle-oriented small target detection method and device based on improved DETR

Publications (1)

Publication Number Publication Date
CN116958687A true CN116958687A (en) 2023-10-27

Family

ID=88444236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310931094.3A Pending CN116958687A (en) 2023-07-27 2023-07-27 Unmanned aerial vehicle-oriented small target detection method and device based on improved DETR

Country Status (1)

Country Link
CN (1) CN116958687A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117191821A (en) * 2023-11-03 2023-12-08 山东宇影光学仪器有限公司 High-light-transmittance Fresnel lens real-time detection method based on defocable-DAB-DETR
CN117191821B (en) * 2023-11-03 2024-02-06 山东宇影光学仪器有限公司 High-light-transmittance Fresnel lens real-time detection method based on defocable-DAB-DETR


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination