CN115471670A - Space target detection method based on improved YOLOX network model - Google Patents

Space target detection method based on improved YOLOX network model

Info

Publication number
CN115471670A
Authority
CN
China
Prior art keywords
module
layer
yolox
conv
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210874032.9A
Other languages
Chinese (zh)
Inventor
张海峰
艾汗
董森
任龙
冯佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XiAn Institute of Optics and Precision Mechanics of CAS
Original Assignee
XiAn Institute of Optics and Precision Mechanics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XiAn Institute of Optics and Precision Mechanics of CAS filed Critical XiAn Institute of Optics and Precision Mechanics of CAS
Priority to CN202210874032.9A priority Critical patent/CN115471670A/en
Publication of CN115471670A publication Critical patent/CN115471670A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an image detection method, in particular to a space target detection method based on an improved YOLOX network model, and solves the technical problems of high algorithm complexity, low detection rate and poor generalization capability of traditional space target detection methods in extremely complex space environments. The space target detection method based on the improved YOLOX network model comprises the following steps: step S1: acquiring a labelled and annotated space target detection data set; step S2: constructing a YOLOX network model; step S3: inputting the training set and the verification set obtained in step S1 into the YOLOX network model constructed in step S2 for training and verification to obtain a space target detection model and its prediction weights, and continuously iteratively optimizing the prediction weights through forward propagation and backward propagation to obtain a trained YOLOX network model; step S4: inputting the space target images in the test set into the trained YOLOX network model for space target detection. High-precision detection of space targets is thereby realized.

Description

Space target detection method based on improved YOLOX network model
Technical Field
The invention relates to an image detection method, in particular to a space target detection method based on an improved YOLOX network model.
Background
With the continuous development of, and competition in, space-related technologies, space target detection intersects with many other important fields; it has become an important foundation of aerospace technology and is of great research significance.
For space target detection, traditional detection algorithms mainly extract features such as straight lines, polygons and ellipses from a selected area and then judge the target type from these features. However, traditional methods suffer from high algorithm complexity, a low detection rate and poor generalization capability in extremely complex space environments. Research on fast, real-time algorithms with high accuracy and high reliability has therefore become a hotspot.
In recent years, with the rapid development of computer technology and image processing technology, convolutional neural networks have made great progress in the field of target detection; compared with traditional identification methods, they have a stronger feature expression capability for targets. Two-stage detection algorithms such as Faster R-CNN, Mask R-CNN and R-FCN first generate candidate regions and then classify the selected regions; their detection precision is high, but the detection speed is still unsatisfactory. One-stage detection algorithms such as SSD, YOLOv3, YOLOv4 and YOLOv5 directly localize the target and output its category information. However, much target detection work still faces the following problems: 1. targets differ greatly in size and cannot be detected and identified effectively; 2. the background is complicated and easily causes misjudgment. With the continuous development of artificial intelligence technology, deep learning methods have begun to penetrate various fields, so intelligent supervision of the space environment is urgently required.
Disclosure of Invention
The invention aims to provide a space target detection method based on an improved YOLOX network model, addressing the technical problems of the low detection rate and poor generalization capability of traditional space target detection methods in extremely complex space environments, thereby improving the detection precision of space targets; experiments show that the model detects targets more accurately and more quickly.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a space target detection method based on an improved YOLOX network model is characterized by comprising the following steps:
step S1: acquiring a spatial target detection data set with labels and tags, and dividing the spatial target detection data set into a training set, a verification set and a test set;
step S2: constructing a YOLOX network model, wherein the YOLOX network model comprises a Backbone feature extraction module Backbone network, an enhanced feature extraction module Dilated Encoder network and a decoupling output module YoloHead network;
s3, inputting the training set and the verification set obtained in the step S1 into the YOLOX network model constructed in the step S2, training and verifying to obtain a spatial target detection model and prediction weights thereof, and continuously performing iterative optimization on the prediction weights through forward propagation and backward propagation to obtain a trained YOLOX network model;
and S4, inputting the space target image in the test set into a trained YOLOX network model for space target detection.
Further, step S1 specifically includes:
S11, obtaining an image containing a space target, and performing Copy-Reduce-Paste data enhancement on the image to obtain an enhanced image;
S12, labeling the enhanced image obtained in step S11 to obtain an XML annotation file recording the position and type of the space target corresponding to the enhanced image; establishing a space target detection data set from the enhanced image and the corresponding XML annotation file;
S13, randomly dividing the space target detection data set obtained in step S12 into a training set, a verification set and a test set at a ratio of 8:1:1.
Further, step S2 specifically includes:
S21: constructing a Backbone feature extraction module Backbone network;
S22: constructing an enhanced feature extraction module Dilated Encoder network;
S23: constructing a decoupling output module YoloHead network, completing the construction of the YOLOX network model.
Further, the Backbone feature extraction module Backbone network in step S21 comprises a Focus module, depth separable convolutional layers, residual modules and an SPPBottleneck module;
the depth separable convolutional layers comprise a first depth separable convolutional layer, a second depth separable convolutional layer, a third depth separable convolutional layer, a fourth depth separable convolutional layer and a fifth depth separable convolutional layer;
the residual modules comprise a first CspLayer module, a second CspLayer module, a third CspLayer module and a fourth CspLayer module;
the Focus module, the first depth separable convolutional layer, the second depth separable convolutional layer, the first CspLayer module, the third depth separable convolutional layer, the second CspLayer module, the fourth depth separable convolutional layer, the third CspLayer module, the fifth depth separable convolutional layer, the SPPBottleneck module and the fourth CspLayer module are connected in sequence;
the second CspLayer module generates a first feature layer, the third CspLayer module generates a second feature layer, and the fourth CspLayer module generates a third feature layer.
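By way of illustration only, a minimal PyTorch-style sketch of the connection order described above is given below; the channel widths, strides, the stem convolution applied to the Focus slicing output, and the identity placeholders standing in for the CspLayer and SPPBottleneck modules are all assumptions:

```python
import torch
import torch.nn as nn

def dwconv(c_in, c_out, stride):
    """Depth separable convolution block: depthwise 3x3 followed by pointwise 1x1 (Conv + BN + SiLU)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, stride, 1, groups=c_in, bias=False),
        nn.Conv2d(c_in, c_out, 1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class Backbone(nn.Module):
    """Wiring only: Focus stem -> depth separable convs and CspLayers -> SPPBottleneck,
    emitting the first, second and third feature layers. CspLayer/SPPBottleneck are assumed
    to be provided elsewhere; identity placeholders keep the sketch runnable."""
    def __init__(self, csp=None, spp=None):
        super().__init__()
        csp = csp or (lambda c: nn.Identity())
        spp = spp or (lambda c: nn.Identity())
        self.stem = nn.Conv2d(12, 64, 3, 1, 1)                 # consumes the 12-channel Focus slicing output
        self.dw1, self.dw2 = dwconv(64, 128, 2), dwconv(128, 128, 1)
        self.csp1 = csp(128)
        self.dw3, self.csp2 = dwconv(128, 256, 2), csp(256)    # csp2 -> first feature layer
        self.dw4, self.csp3 = dwconv(256, 512, 2), csp(512)    # csp3 -> second feature layer
        self.dw5 = dwconv(512, 1024, 2)
        self.spp, self.csp4 = spp(1024), csp(1024)             # csp4 -> third feature layer

    def forward(self, x_sliced):
        x = self.csp1(self.dw2(self.dw1(self.stem(x_sliced))))
        f1 = self.csp2(self.dw3(x))
        f2 = self.csp3(self.dw4(f1))
        f3 = self.csp4(self.spp(self.dw5(f2)))
        return f1, f2, f3

feats = Backbone()(torch.randn(1, 12, 320, 320))               # a 640x640 RGB picture after Focus slicing
print([tuple(f.shape) for f in feats])                         # 80x80, 40x40 and 20x20 feature layers
```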
Further, in step S22, the enhanced feature extraction module Dilated Encoder network comprises 1 initial convolutional layer module, Z dilated residual blocks and Z-1 attention mechanism feedback modules CBAM, where Z is a positive integer;
step S22 specifically includes:
s221: constructing an initial convolutional layer module, and adding the first characteristic layer m obtained in the step S21 1 Using 1 × 1 convolutional layer as input of initial convolutional layer module to reduce channel dimension, adding 3 × 3 convolutional layer to refine semantic context, and obtaining output x of initial convolutional layer module 1
x 1 =conv 2 (conv 1 (m 1 ))
In the formula, conv 1 Is 1 × 1 convolutional layer, conv 2 Is a 3 × 3 convolutional layer;
s222: building an extended residual block, and outputting x of the initial convolutional layer module obtained in step S221 1 Performing convolution layer operation to obtain output X of the residual error block i
X i =x i +conv 5 (conv 4 (conv 3 (x i )))
In the formula, x i For the input of the i-th extended residual block, conv 3 、conv 5 Are all 1X 1 convolutional layers, conv 4 Is a 3X 3 convolution layer, X i For the output of the ith residual block for expansion,
Figure BDA0003756143960000031
s223: constructing an attention mechanism feedback module CBAM, and expanding the output X of the residual error block in the step S222 i Inputting an attention mechanism feedback module CBAM to obtain a channel attention output characteristic diagram
Figure BDA0003756143960000032
And spatial attention output profile
Figure BDA0003756143960000033
And outputting the spatial attention as a feature map
Figure BDA0003756143960000034
Output characteristic diagram Y as attention mechanism feedback module CBAM i
S224: establishing the recursive enhanced feature extraction module Dilated Encoder network:
x_{i+1} = Y_i
Through steps S222 to S223, the output feature map Y_{Z-1} of the (Z-1)-th attention mechanism feedback module CBAM gives the input x_Z of the Z-th dilated residual block; substituting x_Z into the dilated residual block output X_i = x_i + conv_5(conv_4(conv_3(x_i))) of step S222 yields the enhanced feature layer of the enhanced feature extraction module Dilated Encoder network corresponding to the first feature layer;
s225: and repeating the steps S221 to S224 to obtain a second feature layer and a third feature layer corresponding to the enhanced feature layer output by the enhanced feature extraction module related Encoder network, and completing the construction of the enhanced feature extraction module related Encoder network.
Further, in step S23, the decoupling output module YoloHead network includes a dynamic convolution layer, a layer attention mechanism and a prediction parameter layer;
step S23 specifically includes:
s231: calculating task interaction characteristics of a decoupling output module YoloHead network to obtain a dynamic convolution layer
Figure BDA0003756143960000041
Comprises the following steps:
X∈R×H×W×C
Figure BDA0003756143960000042
wherein, X is one of the enhanced feature layers obtained in step S225, R, H, W and C respectively represent the number of images batchSize, the image height, the image width and the number of channels of the YOLOX network model input each time, δ denotes a relu activation function, conv k Refers to the k-th convolutional layer,
Figure BDA0003756143960000043
s232: using the dynamic convolution layer obtained in step S231 using a layer attention mechanism
Figure BDA0003756143960000044
Feature layer for computational classification and regression tasks
Figure BDA0003756143960000045
w=σ(fc 2 (δ(fc 1 (x inter ))))
Figure BDA0003756143960000046
Wherein x is inter Is a spliced dynamic convolution layer
Figure BDA0003756143960000047
Characteristic map, fc, obtained thereafter 1 Is the first fully-connected layer, fc 2 Is a second fully-connected layer, w is x inter The k-dimensional weight variable calculated by the layer attention mechanism can capture the dependency relationship, w, between k convolutional layers k Is the kth element of w, σ is the sigmoid function;
s233: according to the characteristic layer in S232
Figure BDA0003756143960000048
Obtaining a prediction parameter Z for classification or regression obtained by the enhanced feature layer through a decoupling output module YoloHead network task
Z task =conv 12 (δ(conv 11 (X task )))
Wherein, X task Is a characteristic layer
Figure BDA0003756143960000051
Splicing characteristic map of (1), conv 11 1 × 1 convolution layer for adjusting the number of channels, conv 12 Convolutional layer for generating prediction parameter Z task
S234: and repeating the steps S231 to S233 to obtain prediction parameters of all the reinforced feature layers obtained through a decoupling output module YoloHead network, and completing the construction of a Yolox network model.
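By way of illustration only, a minimal PyTorch-style sketch of steps S231 to S233 for one enhanced feature layer is given below; the 3×3 interaction convolutions, the global average pooling used to form x_inter before fc_1, the channel widths and the separate Cls/Reg/Obj prediction convolutions are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskDecoupledHead(nn.Module):
    """Stacked task-interaction convolutions, a layer attention over the k interaction features,
    and 1x1 prediction convolutions producing Cls, Reg and Obj for one enhanced feature layer."""
    def __init__(self, channels, num_convs=4, num_classes=3):
        super().__init__()
        self.inter_convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_convs))
        self.fc1 = nn.Linear(channels * num_convs, channels)          # layer attention: fc_1
        self.fc2 = nn.Linear(channels, num_convs)                     # layer attention: fc_2
        self.conv11 = nn.Conv2d(channels * num_convs, channels, 1)    # adjust the channel number
        self.cls_pred = nn.Conv2d(channels, num_classes, 1)           # category prediction parameters Cls
        self.reg_pred = nn.Conv2d(channels, 4, 1)                     # target frame parameters Reg
        self.obj_pred = nn.Conv2d(channels, 1, 1)                     # foreground/background parameters Obj

    def forward(self, x):
        inter = []
        for conv in self.inter_convs:                                 # X_k^inter = relu(conv_k(.))
            x = F.relu(conv(x))
            inter.append(x)
        x_inter = torch.cat(inter, dim=1)
        pooled = F.adaptive_avg_pool2d(x_inter, 1).flatten(1)
        w = torch.sigmoid(self.fc2(F.relu(self.fc1(pooled))))         # w = sigma(fc2(relu(fc1(x_inter))))
        x_task = torch.cat([w[:, k, None, None, None] * inter[k]      # X_k^task = w_k * X_k^inter
                            for k in range(len(inter))], dim=1)
        feat = F.relu(self.conv11(x_task))                            # Z_task = conv12(relu(conv11(X_task)))
        return self.cls_pred(feat), self.reg_pred(feat), self.obj_pred(feat)

cls, reg, obj = TaskDecoupledHead(channels=256)(torch.randn(1, 256, 80, 80))
```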
Further, step S3 specifically includes:
s31: inputting the RGB pictures of the training set and the verification set in the step S13 in the YOLOX network model, and carrying out slicing operation on the RGB pictures by using the Focus module in the step S21;
s32: inputting the RGB picture processed in the step S31 into a Backbone feature extraction module Backbone network, and obtaining an effective feature layer through the residual error module and the depth separable convolution layer in the step S21;
s33: respectively inputting the effective characteristic layers obtained in the step S32 into a reinforced characteristic extraction module scaled Encoder network to obtain effective reinforced characteristic layers;
s34: inputting the effective reinforced characteristic layer obtained in the step S33 into a YoloHead network of a decoupling output module to obtain a prediction parameter of the effective reinforced characteristic layer; the prediction parameters comprise category prediction parameters Cls, target frame parameters Reg and foreground/background parameters Obj;
s35: stacking the category prediction parameters Cls, the target frame parameters Reg and the foreground/background parameters Obj in the step S34 to obtain a prediction characteristic layer;
s36: and calculating the prediction parameters of the prediction feature layer in the step S35 and the category prediction parameters Cls and the target frame parameters Reg and the cross entropy loss of the foreground/background parameters Obj in the training set in the enhanced image in the step S12 and the corresponding XML annotation file, and continuously performing iterative optimization according to the model prediction weight of the cross entropy loss until a space target detection model is obtained.
Further, step S223 specifically includes:
the channel attention output feature map is calculated as:
X_i^c = σ(conv_7(δ(conv_6(AvgPool(X_i)))) + conv_9(δ(conv_8(MaxPool(X_i))))) ⊗ X_i
the spatial attention output feature map is calculated from the channel attention output feature map:
X_i^s = σ(conv_10(cat(AvgPool(X_i^c), MaxPool(X_i^c)))) ⊗ X_i^c
the spatial attention output feature map is taken as the output feature map of the attention mechanism feedback module CBAM:
Y_i = X_i^s
where AvgPool is the average pooling operation, MaxPool is the maximum pooling operation, conv_6, conv_7, conv_8 and conv_9 are 1×1 convolutional layers, conv_10 is a 7×7 convolutional layer, cat is the concatenation operation along one dimension, δ is the relu activation function, σ is the sigmoid function, and ⊗ denotes element-wise multiplication.
Further, the method also comprises the step S5:
and (3) inputting the space target image of the test set into the YOLOX network model constructed in the step (S2), and evaluating the overall detection performance of the YOLOX network model.
Further, in step S5, the method for evaluating the overall detection performance of the YOLOX network model is specifically: the overall detection performance of the YOLOX network model is evaluated by the average detection precision AP of each class and by the mean of the APs of all classes, namely the mean average precision mAP;
P = TP / (TP + FP)
R = TP / (TP + FN)
AP = ∫_0^1 P(R) dR
where P denotes the precision Precision, used to evaluate how accurate the predictions are; R denotes the recall Recall, used to evaluate how many of the correct samples are predicted; TP denotes positive samples predicted by the model as the positive class, FP denotes negative samples predicted by the model as the positive class, and FN denotes positive samples predicted by the model as the negative class.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
(1) A YOLOX network model is constructed that comprises a Backbone feature extraction module Backbone network, an enhanced feature extraction module Dilated Encoder network and a decoupling output module YoloHead network; its detection precision and speed on natural images reach the current state of the art. On this basis, the network structure of the YOLOX network model is improved, and a labelled and annotated space target detection data set is used for training and testing to obtain the space target detection model. In practical applications, the improved YOLOX algorithm achieves a better accuracy-speed trade-off than the YOLOv3, YOLOv4 and YOLOv5 networks.
(2) The method can detect space targets in real time; the prediction weights of the YOLOX network model are continuously iteratively optimized through forward propagation and backward propagation, and every model evaluation index of the YOLOX network model reaches a good level, so that the YOLOX network model can effectively detect and identify targets of specific types.
Drawings
FIG. 1 is a flow chart of the method for detecting a spatial target based on improved YOLOX.
FIG. 2 is a schematic diagram of a YOLOX network in accordance with an embodiment of the invention.
FIG. 3 is a schematic diagram of the SPPBottleneck module in the embodiment of the present invention.
FIG. 4 is a schematic diagram of the Dilated Encoder network in YOLOX in the embodiment of the present invention.
FIG. 5 is a schematic diagram of the attention mechanism feedback module CBAM in the Dilated Encoder network in YOLOX in the embodiment of the present invention.
FIG. 6 is a schematic diagram of the YoloHead network in YOLOX in the embodiment of the present invention.
Fig. 7 is a schematic diagram illustrating a spatial target detection effect according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments obtained by those skilled in the art without creative efforts based on the technical solutions of the present invention belong to the protection scope of the present invention.
The invention discloses a space target detection method based on an improved YOLOX network model, which comprises the following steps as shown in figure 1:
step S1: acquiring a spatial target detection data set with labels and labels, and dividing the spatial target detection data set into a training set, a verification set and a test set;
s11, obtaining an image with a space target, and performing Copy-Reduce-Paste data enhancement on the image to obtain an enhanced image;
s12, labeling the enhanced image obtained in the step S11, and acquiring a space target position corresponding to the enhanced image and an XML labeling file of the type of the space target position; establishing a space target detection data set by the enhanced image and the corresponding XML markup file;
s13, the space target detection data set obtained in the step S12 is processed according to the following steps of 8:1:1 is randomly divided into a training set, a validation set and a test set.
Step S2: constructing a YOLOX network model, wherein the YOLOX network model comprises a Backbone feature extraction module Backbone network, an enhanced feature extraction module Dilated Encoder network and a decoupling output module YoloHead network;
s21: constructing a Backbone feature extraction module Backbone network;
the Backbone feature extraction module Backbone network comprises a Focus module, a depth separable convolution layer, a residual module and an SPPBottlenck module;
the depth-separable convolutional layers include a first depth-separable convolutional layer, a second depth-separable convolutional layer, a third depth-separable convolutional layer, a fourth depth-separable convolutional layer, and a fifth depth-separable convolutional layer;
the residual module comprises a first CspLayer module, a second CspLayer module, a third CspLayer module and a fourth CspLayer module;
the device comprises a Focus module, a first depth separable convolution layer, a second depth separable convolution layer, a first CspLayer module, a third depth separable convolution layer, a second CspLayer module, a fourth depth separable convolution layer, a third CspLayer module, a fifth depth separable convolution layer, an SPPBottleeck module and a fourth CspLayer module which are sequentially arranged;
the second CspLayer module generates a first feature layer; a third csplyer module produces a second feature layer and a fourth csplyer module produces a third feature layer.
S22: constructing an enhanced feature extraction module Dilated Encoder network;
in step S22, the enhanced feature extraction module Dilated Encoder network comprises 1 initial convolutional layer module, Z dilated residual blocks and Z-1 attention mechanism feedback modules CBAM, where Z is a positive integer;
s221: constructing an initial convolutional layer module, and adding the first characteristic layer m obtained in the step S21 1 Using 1 × 1 convolutional layer as input of initial convolutional layer module to reduce channel dimension, adding 3 × 3 convolutional layer to refine semantic context, and obtaining output x of initial convolutional layer module 1
x 1 =conv 2 (conv 1 (m 1 ))
In the formula, conv 1 Is 1 × 1 convolutional layer, conv 2 Is a 3 × 3 convolutional layer;
s222: building an extended residual block, and outputting x of the initial convolutional layer module obtained in step S221 1 Performing convolution layer operation to obtain output X of the residual block i
X i =x i +conv 5 (conv 4 (conv 3 (x i )))
In the formula, x i For the input of the i-th extended residual block, conv 3 、conv 5 All are 1 × 1 convolutional layers, conv 4 Is a 3X 3 convolution layer, X i For the output of the ith residual block for expansion,
Figure BDA0003756143960000081
s223: constructing an attention mechanism feedback module CBAM, and expanding the output X of the residual error block in the step S222 i Inputting an attention mechanism feedback module CBAM to obtain a channel attention output characteristic diagram
Figure BDA0003756143960000082
And spatial attention output profile
Figure BDA0003756143960000083
And outputting the spatial attention as a feature map
Figure BDA0003756143960000084
Output characteristic diagram Y as attention mechanism feedback module CBAM i
The channel attention output characteristic graph calculation formula is as follows:
Figure BDA0003756143960000091
calculating a spatial attention output feature map from the channel attention output feature map:
Figure BDA0003756143960000092
taking the spatial attention output characteristic map as an output characteristic map of an attention mechanism feedback module CBAM:
Figure BDA0003756143960000093
in the formula, avgPool is average pooling, maxpool is maximum pooling operation, conv 6 、conv 7 、conv 8 、conv 9 Are all 1X 1 convolutional layers, conv 10 Is a 7 x 7 convolutional layer, cat is based on a one-dimensional stitching operation.
S224: establishing the recursive enhanced feature extraction module Dilated Encoder network:
x_{i+1} = Y_i
Through steps S222 to S223, the output feature map Y_{Z-1} of the (Z-1)-th attention mechanism feedback module CBAM gives the input x_Z of the Z-th dilated residual block; substituting x_Z into the dilated residual block output X_i = x_i + conv_5(conv_4(conv_3(x_i))) of step S222 yields the enhanced feature layer of the enhanced feature extraction module Dilated Encoder network corresponding to the first feature layer;
s225: and repeating the steps S221 to S224 to obtain a second feature layer and a third feature layer corresponding to the enhanced feature layer output by the enhanced feature extraction module related Encoder network, and completing the construction of the enhanced feature extraction module related Encoder network.
S23: constructing a decoupling output module YoloHead network, completing the construction of the YOLOX network model.
In the step S23, the decoupling output module YoloHead network comprises a dynamic convolution layer, a layer attention mechanism and a prediction parameter layer;
s231: calculating task interaction characteristics of a decoupling output module YoloHead network to obtain a dynamic convolution layer
Figure BDA0003756143960000094
Comprises the following steps:
X∈R×H×W×C
Figure BDA0003756143960000095
wherein, X is one of the enhancement feature layers obtained in step S225, R, H, W and C respectively represent the number of images batchSize, image height, image width and channel number of the YOLOX network model input each time, δ indicates the relu activation function, conv k Refers to the k-th convolutional layer,
Figure BDA0003756143960000096
S232: using a layer attention mechanism on the dynamic convolutional layer features X_k^inter obtained in step S231 to compute the feature layers X_k^task for the classification and regression tasks:
w = σ(fc_2(δ(fc_1(x_inter))))
X_k^task = w_k · X_k^inter
where x_inter is the feature map obtained by concatenating the dynamic convolutional layer features X_k^inter; fc_1 is the first fully connected layer and fc_2 is the second fully connected layer; w is the k-dimensional weight variable computed from x_inter by the layer attention mechanism, which captures the dependencies among the k convolutional layers; w_k is the k-th element of w; and σ is the sigmoid function;
s233: according to the characteristic layer in S232
Figure BDA0003756143960000105
Obtaining a prediction parameter Z for classification or regression obtained by the enhanced feature layer through a decoupling output module YoloHead network task
Z task =conv 12 (δ(conv 11 (X task )))
Wherein, X task Is a characteristic layer
Figure BDA0003756143960000106
Splicing characteristic map of (1), conv 11 Is 1 × 1 convolution layer for adjusting the number of channels, conv 12 Is 1 × 1 convolutional layer, and is used for generating prediction parameter Z task
S234: and repeating the steps S231 to S233 to obtain prediction parameters of all the reinforced feature layers obtained through a decoupling output module YoloHead network, and completing the construction of a Yolox network model.
S3, inputting the training set and the verification set obtained in the step S1 into the YOLOX network model constructed in the step S2, training and verifying to obtain a spatial target detection model and a prediction weight thereof, and continuously performing iterative optimization on the prediction weight through forward propagation and backward propagation to obtain a trained YOLOX network model;
s31: inputting the RGB pictures of the training set and the verification set in the step S13 in the YOLOX network model, and slicing the RGB pictures by using the Focus module in the step S21;
s32: inputting the RGB picture processed in the step S31 into a Backbone feature extraction module Backbone network, and obtaining an effective feature layer through the residual error module and the depth separable convolution layer in the step S21;
s33: respectively inputting the effective characteristic layers obtained in the step S32 into a reinforced characteristic extraction module scaled Encoder network to obtain effective reinforced characteristic layers;
s34: inputting the effective reinforced characteristic layer obtained in the step S33 into a YoloHead network of a decoupling output module to obtain a prediction parameter of the effective reinforced characteristic layer; the prediction parameters comprise category prediction parameters Cls, target frame parameters Reg and foreground/background parameters Obj;
s35: stacking the category prediction parameters Cls, the target frame parameters Reg and the foreground/background parameters Obj in the step S34 to obtain a prediction characteristic layer;
s36: and (4) calculating the prediction parameters of the prediction feature layer in the step (S35) and the cross entropy losses of the category prediction parameters Cls and the target frame parameters Reg and the foreground/background parameters Obj in the enhanced image in the step (S12) and the corresponding XML annotation file, and continuously performing iterative optimization according to the model prediction weight of the cross entropy losses until a space target detection model is obtained.
And S4, inputting the space target image in the test set into a trained YOLOX network model for space target detection.
S41: loading the weights of the space target detection model trained in step S36 into the YOLOX network model constructed in step S2;
S42: inputting the space target images of the test set obtained in step S13 into the YOLOX network model loaded in step S41, and evaluating the overall detection performance of the space target detection model.
The overall detection performance of the space target detection model is evaluated by the average detection precision AP of each class and by the mean average precision mAP, where mAP is the mean of the average detection precisions AP of all classes.
Step S5: and (3) inputting the space target image of the test set into the YOLOX network model constructed in the step (S2), and evaluating the overall detection performance of the YOLOX network model.
The present invention will be described in detail with reference to specific examples.
S1: acquiring a space target detection data set with labels and tags;
s11, obtaining an image with a space target, performing Copy-Reduce-Paste data enhancement on the image, wherein the Copy-Reduce-Paste data enhancement refers to the steps of reducing/amplifying the space target of the image, pasting the image to an original image, increasing the number of space targets with different sizes, obtaining an enhanced image with a plurality of space targets with different sizes, and improving the extraction effect of a YOLOX network model on image features.
S12, labeling the enhanced images obtained in the step S11 by using a deep learning image labeling tool LabelImg, and acquiring an XML labeling file which comprises a space target position and a type thereof and corresponds to each enhanced image; the annotation category comprises three types of targets, namely a Satellite body (Satellite), a Satellite Cabin body (Cabin) and a solar sailboard (Windsurfing), and the space target image and the corresponding XML annotation file are used for establishing a space target detection data set.
S13, randomly dividing the space target detection data set obtained in step S12 into a training set, a verification set and a test set at a ratio of 8:1:1, for training and testing the space target detection model.
S2: a YOLOX network model, a schematic diagram of the YOLOX network, was constructed, as shown in fig. 2.
The Backbone feature extraction module Backbone network comprises a Focus module, depth separable convolutional layers (Conv2D_BN_SiLU), residual modules (CspLayer) and an SPPBottleneck module; the depth separable convolutional layers comprise a first depth separable convolutional layer, a second depth separable convolutional layer, a third depth separable convolutional layer, a fourth depth separable convolutional layer and a fifth depth separable convolutional layer; the residual modules comprise a first residual module, a second residual module, a third residual module and a fourth residual module.
The Backbone feature extraction module Backbone network is formed by connecting, in sequence, the Focus module, the first depth separable convolutional layer, the second depth separable convolutional layer, the first CspLayer module, the third depth separable convolutional layer, the second CspLayer module, the fourth depth separable convolutional layer, the third CspLayer module, the fifth depth separable convolutional layer, the SPPBottleneck module and the fourth CspLayer module.
The Focus module takes the value of every other pixel of the enhanced picture to obtain four independent feature layers and then stacks them, expanding the input channels fourfold, so that the spliced feature layer has twelve channels instead of the original three. The depth separable convolutional layers change the way the convolution is performed to reduce the number of convolution operations. Each residual module has two branches: one branch applies convolution, normalization and activation-function operations to the input feature layer; the other branch is processed by the activation function and then by n residual blocks; finally the two branches are connected. As shown in FIG. 3, the SPPBottleneck module performs feature extraction through maximum pooling with different pooling kernel sizes, enlarging the receptive field of the Backbone network. Finally, the second CspLayer module generates the first feature layer, the third CspLayer module generates the second feature layer, and the fourth CspLayer module generates the third feature layer.
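By way of illustration only, a minimal PyTorch sketch of the Focus slicing described above (every other pixel taken in each direction, the four sub-maps stacked on the channel axis, 3 channels becoming 12) is given below:

```python
import torch

def focus_slice(x):
    """Stack the four pixel-interleaved sub-maps along the channel axis: (B, 3, H, W) -> (B, 12, H/2, W/2)."""
    return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                      x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)

print(focus_slice(torch.randn(1, 3, 640, 640)).shape)   # torch.Size([1, 12, 320, 320])
```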
S22: constructing the enhanced feature extraction module Dilated Encoder network.
As shown in FIG. 4, the enhanced feature extraction module Dilated Encoder network includes an initial convolutional layer module, dilated residual blocks and an attention mechanism feedback module CBAM (Convolutional Block Attention Module). The dilated residual blocks use dilated convolutional layers to extract context information of targets at various scales from a single feature layer of the Backbone feature extraction module. As shown in FIG. 5, the attention mechanism feedback module CBAM compensates for the performance gap between the single-in single-out architecture and the multiple-in single-out architecture. As shown in FIG. 6, in step S23, the decoupling output module YoloHead network performs a series of convolutional layer operations with activation functions and branch decoupling operations on the feature layers extracted by the enhanced feature extraction module Dilated Encoder network to obtain the final prediction parameters, which include the category prediction parameters Cls, the target frame parameters Reg and the foreground/background parameters Obj.
The enhanced feature extraction module Dilated Encoder network comprises three main components: the initial convolutional layer module, the dilated residual blocks and the attention mechanism feedback module CBAM. First, the initial convolutional layer module is constructed, using a 1×1 convolutional layer to reduce the channel dimension; then a 3×3 convolutional layer is added to refine the semantic context. Next, four consecutive dilated residual blocks are stacked, with different dilation rates in their 3×3 convolutional layers, outputting features with multiple receptive fields and covering the scales of all objects. Finally, the attention mechanism feedback module CBAM is established to make up the performance difference between the single-input single-output structure and the multi-input single-output structure.
S221: constructing the initial convolutional layer module: taking the first feature layer m_1 obtained in step S21 as the input of the initial convolutional layer module, using a 1×1 convolutional layer to reduce the channel dimension and adding a 3×3 convolutional layer to refine the semantic context, obtaining the output x_1 of the initial convolutional layer module, as shown in equation (1):
x_1 = conv_2(conv_1(m_1))    (1)
where x_1 is the output of the initial convolutional layer module, conv_1 is a 1×1 convolutional layer and conv_2 is a 3×3 convolutional layer;
s222: four expansion residual blocks, bottleneck1, bottleneck2, bottleneck3, and Bottleneck4, are stacked as shown in equation (2):
Figure BDA0003756143960000131
wherein x is i For the input of the i-th extended residual block, conv 3 、conv 5 All are 1 × 1 convolutional layers, conv 4 Is a 3 × 3 convolution layer, expands the number of residual blocks
Figure BDA0003756143960000137
Z is an integer;
in the present embodiment, the first and second electrodes are,
Figure BDA0003756143960000138
the expansion ratios of the four void convolution layers are 2,4,6,8, respectively.
S223: establishing the attention mechanism feedback module CBAM, which combines spatial and channel attention mechanisms to make up the performance gap between the single-in single-out structure and the multi-in single-out structure:
X_i^c = σ(conv_7(δ(conv_6(AvgPool(X_i)))) + conv_9(δ(conv_8(MaxPool(X_i))))) ⊗ X_i    (3)
X_i^s = σ(conv_10(cat(AvgPool(X_i^c), MaxPool(X_i^c)))) ⊗ X_i^c    (4)
Y_i = X_i^s    (5)
where X_i^c is the channel attention output, X_i^s is the spatial attention output, Y_i is the output of the attention mechanism feedback module CBAM, X_i is the input feature map, AvgPool and MaxPool are the average pooling and maximum pooling operations respectively, conv_6, conv_7, conv_8 and conv_9 are 1×1 convolutional layers, conv_10 is a 7×7 convolutional layer, δ denotes the relu activation function, σ is the sigmoid function, cat is the concatenation operation along one dimension, and ⊗ denotes element-wise multiplication.
As shown in equation (3), X_i is subjected to width- and height-based AvgPool average pooling and MaxPool maximum pooling respectively, each result is passed through convolutional layers, the feature maps output by the convolutional layers are summed, and the final channel attention output feature map is generated by the sigmoid operation.
As shown in equation (4), X_i^c is subjected to AvgPool average pooling and MaxPool maximum pooling respectively, the results are concatenated, and a convolutional layer operation is applied to obtain the final spatial attention output feature map.
As shown in equation (5), the final spatial attention output feature map is taken as the output feature map of the attention mechanism feedback module CBAM.
S224: establishing the recursive Dilated Encoder network:
x_{i+1} = Y_i,  i = 1, 2, ..., Z-1    (6)
S23: constructing a YoloHead network of a decoupling output module;
s231: in order to enhance the interaction between classification and positioning, a stack of task interaction features is learned from a plurality of convolutional layers by using a feature extractor, and the design not only facilitates the task interaction, but also provides multi-level features with multi-scale effective receptive fields. Formally, let X ∈ R × H × W × C denote a single feature layer in step S21, where R, H, W, and C denote the number of images batchSize, image height, image width, and channel number of the YOLOX network model input each time, respectively, and 4 consecutive convolution layers with activation functions are used to calculate task interaction features, and the obtained convolution layers are used to calculate task interaction features
Figure BDA0003756143960000142
Dynamics, as shown in equation (7):
Figure BDA0003756143960000143
wherein, conv k Refers to the kth convolutional layer.
S232: using the layer attention mechanism, task-specific features are computed from the dynamic convolutional layer features X_k^inter obtained in step S231 to perform the decomposition of the tasks. The calculation for each task is shown in equation (8):
X_k^task = w_k · X_k^inter    (8)
w = σ(fc_2(δ(fc_1(x_inter))))    (9)
where w_k is the k-th element of w obtained from the layer attention calculation; as shown in equation (9), w is computed from the cross-layer task interaction features and captures the dependencies between layers; fc_1 and fc_2 are two fully connected layers; and x_inter is the feature map obtained by concatenating the X_k^inter.
S233: the result for classification or localization is obtained as shown in equation (10):
Z^task = conv_12(δ(conv_11(X^task)))    (10)
where X^task is the feature map obtained by concatenating the X_k^task, conv_11 is a 1×1 convolutional layer used to adjust the number of channels, δ is the relu activation function, and conv_12 is a 1×1 convolutional layer used to generate the prediction parameters Z^task, namely the category prediction parameters Cls, the target frame parameters Reg and the foreground/background parameters Obj of target detection.
S3: and respectively inputting the training set and the verification set into an improved YOLOX network model for training and verification to obtain a space target detection model.
S31: inputting 3-channel RGB pictures of arbitrary size and applying normalization, cropping, random up-down and left-right flipping, scaling, random colour change, Mosaic and CutMix processing to the images; the images are scaled to 640 × 640 as the input of the YOLOX network model, and the Focus structure in the Backbone feature extraction module Backbone network slices the input images;
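By way of illustration only, a simplified sketch of part of the preprocessing in step S31 (normalization, scaling to 640 × 640 and random flipping) is given below; Mosaic, CutMix, cropping and colour change are omitted, the 0-255 input range is an assumption, and the corresponding flipping of the annotation boxes is left out:

```python
import torch
import torch.nn.functional as F

def preprocess(image, size=640):
    """image: uint8 tensor of shape (3, H, W). Returns a normalized 640x640 float tensor."""
    x = image.float() / 255.0
    x = F.interpolate(x.unsqueeze(0), size=(size, size), mode="bilinear", align_corners=False)
    if torch.rand(1).item() < 0.5:
        x = torch.flip(x, dims=[-1])    # random left-right flip (boxes must be flipped accordingly)
    if torch.rand(1).item() < 0.5:
        x = torch.flip(x, dims=[-2])    # random up-down flip
    return x.squeeze(0)
```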
s32: and (3) inputting the image processed in the step (S31) into a Backbone feature extraction module Backbone network, stacking a series of residual modules comprising a plurality of residual blocks and a depth separable convolution layer to deepen the network to realize the initial extraction of features and reduce a large number of training parameters, so as to obtain three effective feature layers of 20 × 20, 40 × 40 and 80 × 80.
S33: inputting the three effective feature layers of sizes 20 × 20, 40 × 40 and 80 × 80 obtained in step S32 into the enhanced feature extraction module Dilated Encoder network respectively to obtain the effective enhanced feature layers.
S34: inputting the effective enhanced feature layers obtained in step S33 into the decoupling output module YoloHead network; three prediction results are obtained for each effective enhanced feature layer: Reg(h, w, 4) is used to judge the regression parameters of each feature point, from which a prediction box is obtained after adjustment; Obj(h, w, 1) is used to judge whether each feature point contains an object; and Cls(h, w, num_classes) is used to judge the type of object contained at each feature point. The three prediction results are stacked, and the result obtained for each feature layer is Out(h, w, 4 + 1 + num_classes): the first four parameters are the regression parameters of each feature point, from which the prediction box is obtained after adjustment; the fifth parameter judges whether each feature point contains an object; and the last num_classes parameters judge the type of object contained at each feature point.
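By way of illustration only, a minimal sketch of stacking the three prediction branches into Out(h, w, 4 + 1 + num_classes) is given below; the channel-first input layout is an assumption:

```python
import torch

def stack_head_outputs(reg_out, obj_out, cls_out):
    """reg_out: (B, 4, H, W), obj_out: (B, 1, H, W), cls_out: (B, num_classes, H, W)
    -> Out of shape (B, H, W, 4 + 1 + num_classes)."""
    return torch.cat([reg_out.permute(0, 2, 3, 1),
                      obj_out.permute(0, 2, 3, 1),
                      cls_out.permute(0, 2, 3, 1)], dim=-1)
```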
S35: stacking the category prediction parameters Cls, the target frame parameters Reg and the foreground/background parameters Obj in the step S34 to obtain a prediction characteristic layer;
s36: and calculating the prediction parameters of the prediction feature layer in the step S35 and the category prediction parameters Cls and the target frame parameters Reg and the cross entropy loss of the foreground/background parameters Obj in the training set in the enhanced image in the step S12 and the corresponding XML annotation file, and continuously performing iterative optimization according to the model prediction weight of the cross entropy loss until a space target detection model is obtained.
S4: and inputting the space target images in the test set into a trained YOLOX network model for space target detection.
S5: evaluating the space target detection algorithm of the improved YOLOX network model using the space target detection results of step S4.
The trained weights of the space target detection model are loaded into the YOLOX network model, undetected space target images are input into the YOLOX network model, and the overall detection performance of the space target detection model is evaluated.
The mean average precision mAP (mean Average Precision) is used as the evaluation index of detection accuracy, and the number of frames processed per second (FPS) is used as the evaluation index of detection speed. The mAP is defined as the mean of the average precisions (AP) of all classes. The average precision is:
P = TP / (TP + FP)
R = TP / (TP + FN)
AP = ∫_0^1 P(R) dR
where P denotes the precision Precision, used to evaluate how accurate the predictions are; R denotes the recall Recall, used to evaluate how many of the correct samples are predicted; TP denotes positive samples predicted by the model as the positive class, FP denotes negative samples predicted by the model as the positive class, and FN denotes positive samples predicted by the model as the negative class.
Fig. 7 shows the detection results obtained by the present invention on a space target image. It can be seen from Fig. 7 that the Satellite body (Satellite), the Satellite Cabin (Cabin) and the solar sailboard (Windsurfing) are all detected; the value on each box is the confidence, used to judge whether the object in the bounding box is a positive sample or a negative sample: a value above the confidence threshold indicates a positive sample, and a value below it indicates a negative sample, i.e. background. In the present invention, the confidence threshold is set to 0.60. The detection method of the invention can accurately detect the type and number of space targets. As shown in Table 1, the AP, mAP and FPS results of the present invention and the YOLOX network model are compared while keeping the training and test images consistent. Compared with YOLOX, the average precision of the method is improved by about 4 points, and the inference speed meets the requirement of real-time detection.
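By way of illustration only, a minimal sketch of the confidence-threshold judgement described above (threshold 0.60) is given below; the detection record layout is an assumption:

```python
def filter_by_confidence(detections, threshold=0.60):
    """Keep predicted boxes whose confidence exceeds the threshold (positive samples);
    boxes at or below the threshold are treated as background."""
    return [det for det in detections if det["confidence"] > threshold]
```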
TABLE 1 Comparison of detection results of different detection methods

Algorithm         mAP      FPS    Satellite   Windsurfing   Cabin
YOLOX             0.9117   58.1   0.95        0.90          0.89
Improved YOLOX    0.9528   59.2   0.98        0.96          0.91
In conclusion, the method can detect space targets in real time; the prediction weights of the YOLOX network model are continuously iteratively optimized through forward propagation and backward propagation, and every model evaluation index reaches a good level, so that the YOLOX network model can effectively detect and identify targets of specific types.

Claims (10)

1. A space target detection method based on an improved YOLOX network model is characterized by comprising the following steps:
step S1: acquiring a spatial target detection data set with labels and labels, and dividing the spatial target detection data set into a training set, a verification set and a test set;
step S2: constructing a YOLOX network model, wherein the YOLOX network model comprises a Backbone feature extraction module Backbone network, an enhanced feature extraction module Dilated Encoder network and a decoupling output module YoloHead network;
s3, inputting the training set and the verification set obtained in the step S1 into the YOLOX network model constructed in the step S2, training and verifying to obtain a spatial target detection model and prediction weights thereof, and continuously performing iterative optimization on the prediction weights through forward propagation and backward propagation to obtain a trained YOLOX network model;
and S4, inputting the space target image in the test set into a trained YOLOX network model for space target detection.
2. The method for detecting the spatial target based on the improved YOLOX network model as claimed in claim 1, wherein the step S1 is specifically:
s11, obtaining an image with a space target, and performing Copy-Reduce-Paste data enhancement on the image to obtain an enhanced image;
s12, labeling the enhanced image obtained in the step S11, and obtaining a space target position corresponding to the enhanced image and an XML labeling file of the type of the space target position; establishing a space target detection data set by the enhanced image and the corresponding XML markup file;
s13, the space target detection data set obtained in the step S12 is processed according to the following steps of 8:1:1 are randomly divided into a training set, a validation set and a test set.
3. The method for detecting the spatial target based on the improved YOLOX network model as claimed in claim 1 or 2, wherein the step S2 is specifically:
s21: constructing a Backbone feature extraction module Backbone network;
s22: constructing a reinforced feature extraction module related Encoder network;
s23: and constructing a decoupling output module Yolohead network to complete the construction of a Yolox network model.
4. The method for detecting the spatial target based on the improved YOLOX network model as claimed in claim 3, wherein the Backbone feature extraction module Backbone network in step S21 comprises a Focus module, depth separable convolutional layers, residual modules and an SPPBottleneck module;
the depth separable convolutional layers comprise a first depth separable convolutional layer, a second depth separable convolutional layer, a third depth separable convolutional layer, a fourth depth separable convolutional layer and a fifth depth separable convolutional layer;
the residual modules comprise a first CspLayer module, a second CspLayer module, a third CspLayer module and a fourth CspLayer module;
the Focus module, the first depth separable convolutional layer, the second depth separable convolutional layer, the first CspLayer module, the third depth separable convolutional layer, the second CspLayer module, the fourth depth separable convolutional layer, the third CspLayer module, the fifth depth separable convolutional layer, the SPPBottleneck module and the fourth CspLayer module are connected in sequence;
the second CspLayer module generates a first feature layer, the third CspLayer module generates a second feature layer, and the fourth CspLayer module generates a third feature layer.
5. The method as claimed in claim 4, wherein the enhanced feature extraction module Dilated Encoder network in step S22 includes 1 initial convolutional layer module, Z dilated residual blocks and Z-1 attention mechanism feedback modules CBAM, where Z is a positive integer;
step S22 specifically includes:
s221: constructing an initial convolutional layer module, and adding the first characteristic layer m obtained in the step S21 1 Using 1 × 1 convolutional layer as input of initial convolutional layer module to reduce channel dimension, adding 3 × 3 convolutional layer to refine semantic context, and obtaining output x of initial convolutional layer module 1
x 1 =conv 2 (conv 1 (m 1 ))
In the formula, conv 1 Is 1 × 1 convolutional layer, conv 2 Is a 3 × 3 convolutional layer;
s222: building an extended residual block, and outputting x of the initial convolutional layer module obtained in step S221 1 Performing convolution layer operation to obtain output X of the residual block i
X i =x i +conv 5 (conv 4 (conv 3 (x i )))
In the formula, x i Is a firsti input of the extended residual block, conv 3 、conv 5 Are all 1X 1 convolutional layers, conv 4 Is a 3X 3 convolutional layer, X i For the output of the ith residual block for expansion,
Figure FDA0003756143950000021
s223: constructing an attention mechanism feedback module CBAM, and expanding the output X of the residual error block in the step S222 i Inputting an attention mechanism feedback module CBAM to obtain a channel attention output characteristic diagram
Figure FDA0003756143950000031
And spatial attention output feature map
Figure FDA0003756143950000032
And outputting the spatial attention as a feature map
Figure FDA0003756143950000033
Output characteristic diagram Y as attention mechanism feedback module CBAM i
S224: establishing the recursive enhanced feature extraction module Dilated Encoder network:

x_{i+1} = Y_i, i = 1, 2, …, Z-1
through steps S222 to S223, the output feature map Y_{Z-1} of the attention mechanism feedback module CBAM at the (Z-1)-th recursion is obtained, which gives the input x_Z of the Z-th dilated residual block; substituting x_Z into the dilated residual block output X_i = x_i + conv_5(conv_4(conv_3(x_i))) of step S222 yields the enhanced feature layer of the enhanced feature extraction module Dilated Encoder network corresponding to the first feature layer;
S225: repeating steps S221 to S224 for the second feature layer and the third feature layer to obtain their corresponding enhanced feature layers output by the enhanced feature extraction module Dilated Encoder network, thereby completing the construction of the enhanced feature extraction module Dilated Encoder network.
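A minimal sketch of the recursive construction in steps S221 to S224, assuming a dilation rate of 2^i for the i-th dilated residual block and treating the CBAM module as a pluggable attention block; both choices are illustrative assumptions rather than part of the claim.

```python
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """X_i = x_i + conv_5(conv_4(conv_3(x_i))): 1x1 reduce, 3x3 (here dilated), 1x1 restore."""
    def __init__(self, channels, dilation=1, mid=None):
        super().__init__()
        mid = mid or max(channels // 4, 1)
        self.conv3 = nn.Conv2d(channels, mid, 1)
        self.conv4 = nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation)
        self.conv5 = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        return x + self.conv5(self.conv4(self.conv3(x)))

class DilatedEncoderSketch(nn.Module):
    """Recursive construction of claim 5: initial 1x1 + 3x3 convolutions, Z dilated
    residual blocks, Z-1 attention feedback modules (CBAM), with x_{i+1} = Y_i."""
    def __init__(self, in_channels, channels, Z=4, cbam_factory=None):
        super().__init__()
        self.init_conv = nn.Sequential(
            nn.Conv2d(in_channels, channels, 1),           # conv_1: reduce channel dimension
            nn.Conv2d(channels, channels, 3, padding=1))   # conv_2: refine semantic context
        self.blocks = nn.ModuleList(
            [DilatedResidualBlock(channels, dilation=2 ** i) for i in range(Z)])
        cbam_factory = cbam_factory or (lambda c: nn.Identity())
        self.cbams = nn.ModuleList([cbam_factory(channels) for _ in range(Z - 1)])

    def forward(self, m):
        x = self.init_conv(m)                              # x_1 = conv_2(conv_1(m_1))
        for i, block in enumerate(self.blocks):
            X = block(x)                                   # X_i = x_i + conv_5(conv_4(conv_3(x_i)))
            x = self.cbams[i](X) if i < len(self.cbams) else X   # x_{i+1} = Y_i
        return x                                           # enhanced feature layer
```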
6. The method as claimed in claim 5, wherein the decoupling output module YoloHead network in step S23 includes a dynamic convolution layer, a layer attention mechanism and a prediction parameter layer;
step S23 specifically includes:
S231: calculating the task interaction features of the decoupling output module YoloHead network to obtain the dynamic convolution layers X_k^inter:

X ∈ R × H × W × C

X_k^inter = δ(conv_k(X)), k = 1, 2, …

where X is one of the enhanced feature layers obtained in step S225, R, H, W and C respectively denote the number of images batchSize input to the YOLOX network model each time, the image height, the image width and the number of channels, δ denotes the ReLU activation function, and conv_k denotes the k-th convolutional layer;
S232: using a layer attention mechanism on the dynamic convolution layers X_k^inter obtained in step S231 to calculate the feature layers X_k^task for the classification and regression tasks:

w = σ(fc_2(δ(fc_1(x_inter))))

X_k^task = w_k · X_k^inter

where x_inter is the feature map obtained by splicing the dynamic convolution layers X_k^inter, fc_1 is the first fully connected layer, fc_2 is the second fully connected layer, w is the k-dimensional weight variable calculated from x_inter by the layer attention mechanism, capable of capturing the dependency relationship among the k convolutional layers, w_k is the k-th element of w, and σ is the sigmoid function;
S233: according to the feature layers X_k^task obtained in step S232, obtaining the prediction parameter Z_task for classification or regression produced by the enhanced feature layer through the decoupling output module YoloHead network:

Z_task = conv_12(δ(conv_11(X_task)))

where X_task is the feature map obtained by splicing the feature layers X_k^task, conv_11 is a 1 × 1 convolutional layer used to adjust the number of channels, and conv_12 is the convolutional layer that generates the prediction parameter Z_task;
S234: repeating steps S231 to S233 to obtain all the prediction parameters of the enhanced feature layers through the decoupling output module YoloHead network, thereby completing the construction of the YOLOX network model.
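A hedged sketch of one branch of the decoupled head described in steps S231 to S233: dynamic convolution layers, layer attention weights w computed with two fully connected layers, re-weighting by w_k, and the final conv_11/conv_12 stage. The number of dynamic convolution layers K, the global average pooling before fc_1 and the direct (non-stacked) application of each conv_k are assumptions, not stated in the claim.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractiveHeadBranch(nn.Module):
    """One classification-or-regression branch of the claim-6 YoloHead sketch:
    K dynamic convolution layers, layer attention weights w from two fully connected
    layers, re-weighting by w_k, then Z_task = conv_12(relu(conv_11(X_task)))."""
    def __init__(self, channels, K=6, out_channels=1):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(K)])
        self.fc1 = nn.Linear(K * channels, channels)
        self.fc2 = nn.Linear(channels, K)
        self.conv11 = nn.Conv2d(K * channels, channels, 1)     # 1x1: adjust channel count
        self.conv12 = nn.Conv2d(channels, out_channels, 3, padding=1)

    def forward(self, x):
        inters = [F.relu(conv(x)) for conv in self.convs]      # dynamic convolution layers
        x_inter = torch.cat(inters, dim=1)                     # spliced dynamic conv layers
        # Layer attention: w = sigmoid(fc_2(relu(fc_1(GAP(x_inter)))))
        pooled = F.adaptive_avg_pool2d(x_inter, 1).flatten(1)
        w = torch.sigmoid(self.fc2(F.relu(self.fc1(pooled))))
        # Re-weight each dynamic convolution layer by its attention weight w_k, then splice.
        tasks = [inters[k] * w[:, k].view(-1, 1, 1, 1) for k in range(len(inters))]
        x_task = torch.cat(tasks, dim=1)
        return self.conv12(F.relu(self.conv11(x_task)))        # prediction parameters Z_task
```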
7. The method for detecting the space target based on the improved YOLOX network model as claimed in claim 6, wherein step S3 specifically comprises:
S31: inputting the RGB pictures of the training set and the validation set of step S13 into the YOLOX network model, and performing the slicing operation on the RGB pictures with the Focus module of step S21;
S32: inputting the RGB pictures processed in step S31 into the Backbone feature extraction module Backbone network, and obtaining effective feature layers through the residual module and the depth separable convolutional layers of step S21;
S33: respectively inputting the effective feature layers obtained in step S32 into the enhanced feature extraction module Dilated Encoder network to obtain effective enhanced feature layers;
S34: inputting the effective enhanced feature layers obtained in step S33 into the decoupling output module YoloHead network to obtain the prediction parameters of the effective enhanced feature layers, the prediction parameters comprising category prediction parameters Cls, target frame parameters Reg and foreground/background parameters Obj;
S35: stacking the category prediction parameters Cls, the target frame parameters Reg and the foreground/background parameters Obj of step S34 to obtain a prediction feature layer;
S36: calculating the cross-entropy losses between the prediction parameters of the prediction feature layer of step S35 and the category prediction parameters Cls, target frame parameters Reg and foreground/background parameters Obj given by the enhanced images of step S12 and the XML annotation files corresponding to the training set, and iteratively optimizing the model weights according to the cross-entropy losses until the space target detection model is obtained.
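A schematic training loop for step S3, for illustration only. The model, data loader, target encoding and loss weighting are placeholders; the sketch follows the claim's wording in applying cross-entropy-style losses to Cls, Reg and Obj, although in practice a box-regression loss (for example an IoU loss) would typically replace the Reg term.

```python
import torch
import torch.nn as nn

def train_space_target_detector(model, train_loader, epochs=100, lr=1e-3, device="cuda"):
    """Schematic loop for step S3: predict Cls/Reg/Obj, compare against the annotations,
    and iteratively update the model weights with the resulting losses."""
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for images, targets in train_loader:   # targets: dict with 'cls', 'reg', 'obj' tensors
            images = images.to(device)
            cls_pred, reg_pred, obj_pred = model(images)
            loss = (bce(cls_pred, targets["cls"].to(device))
                    + bce(reg_pred, targets["reg"].to(device))
                    + bce(obj_pred, targets["obj"].to(device)))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```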
8. The method for detecting the space target based on the improved YOLOX network model as claimed in claim 7, wherein step S223 specifically comprises:
the channel attention output feature map is calculated as:

X_i^c = σ(conv_7(conv_6(AvgPool(X_i))) + conv_9(conv_8(MaxPool(X_i)))) ⊗ X_i

the spatial attention output feature map is calculated from the channel attention output feature map:

X_i^s = σ(conv_10(cat(AvgPool(X_i^c), MaxPool(X_i^c)))) ⊗ X_i^c

the spatial attention output feature map is taken as the output feature map of the attention mechanism feedback module CBAM:

Y_i = X_i^s

where AvgPool is the average pooling operation, MaxPool is the maximum pooling operation, conv_6, conv_7, conv_8 and conv_9 are 1 × 1 convolutional layers, conv_10 is a 7 × 7 convolutional layer, cat is the concatenation operation along the channel dimension, σ is the sigmoid function, and ⊗ denotes element-wise multiplication.
9. The method for detecting the space target based on the improved YOLOX network model as claimed in claim 8, further comprising step S5:
S5: inputting the space target images of the test set into the YOLOX network model constructed in step S2, and evaluating the overall detection performance of the YOLOX network model.
10. The method for detecting the space target based on the improved YOLOX network model as claimed in claim 9, wherein in step S5, the overall detection performance of the YOLOX network model is evaluated by the average detection precision AP of each category and the mean of the AP values over all categories, namely the mean average precision mAP;
P = TP / (TP + FP)

R = TP / (TP + FN)

AP = ∫_0^1 P(R) dR,  mAP = (1/N) Σ_{i=1}^{N} AP_i
where P denotes the precision (Precision), used to evaluate the correctness of the predictions; R denotes the recall (Recall), used to evaluate the proportion of positive samples that are correctly predicted; TP is the number of positive samples predicted as positive by the model, FP is the number of negative samples predicted as positive by the model, and FN is the number of positive samples predicted as negative by the model; AP_i is the average detection precision of the i-th category and N is the number of target categories.
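For illustration, the evaluation quantities of claim 10 can be computed as in the sketch below; the rectangle-summation approximation of AP and the example numbers are assumptions, not prescribed by the claim.

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN)."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return p, r

def average_precision(precisions, recalls):
    """Approximate AP as the area under the precision-recall curve (rectangle summation)."""
    ap, prev_r = 0.0, 0.0
    for r, p in sorted(zip(recalls, precisions)):
        ap += p * (r - prev_r)
        prev_r = r
    return ap

def mean_average_precision(per_class_ap):
    """mAP: mean of the AP values over all target categories."""
    return sum(per_class_ap) / len(per_class_ap)

# Hypothetical example with illustrative precision-recall points for two categories.
ap_a = average_precision([1.0, 0.8, 0.6], [0.2, 0.5, 0.9])
ap_b = average_precision([0.9, 0.7], [0.3, 0.8])
print(mean_average_precision([ap_a, ap_b]))
```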
CN202210874032.9A 2022-07-20 2022-07-20 Space target detection method based on improved YOLOX network model Pending CN115471670A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210874032.9A CN115471670A (en) 2022-07-20 2022-07-20 Space target detection method based on improved YOLOX network model

Publications (1)

Publication Number Publication Date
CN115471670A (en) 2022-12-13

Family

ID=84366005

Country Status (1)

Country Link
CN (1) CN115471670A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115631402A (en) * 2022-12-22 2023-01-20 联通(四川)产业互联网有限公司 AI algorithm service platform construction method suitable for intelligent breeding
CN115631402B (en) * 2022-12-22 2023-05-23 联通(四川)产业互联网有限公司 AI algorithm service platform construction method suitable for intelligent cultivation
CN116469014A (en) * 2023-01-10 2023-07-21 南京航空航天大学 Small sample satellite radar image sailboard identification and segmentation method based on optimized Mask R-CNN
CN116469014B (en) * 2023-01-10 2024-04-30 南京航空航天大学 Small sample satellite radar image sailboard identification and segmentation method based on optimized Mask R-CNN
CN117237741A (en) * 2023-11-08 2023-12-15 烟台持久钟表有限公司 Campus dangerous behavior detection method, system, device and storage medium
CN117237741B (en) * 2023-11-08 2024-02-13 烟台持久钟表有限公司 Campus dangerous behavior detection method, system, device and storage medium
CN117668669A (en) * 2024-02-01 2024-03-08 齐鲁工业大学(山东省科学院) Pipeline safety monitoring method and system based on improved YOLOv7
CN117668669B (en) * 2024-04-19 Pipeline safety monitoring method and system based on improved YOLOv7


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination