CN115995042A - Video SAR moving target detection method and device

Info

Publication number
CN115995042A
CN115995042A
Authority
CN
China
Prior art keywords
feature
features
video sar
moving target
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310099920.2A
Other languages
Chinese (zh)
Inventor
李银伟
张慧萍
朱亦鸣
毛倩倩
李晓鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN202310099920.2A
Publication of CN115995042A
Legal status: Pending

Abstract

The invention provides a video SAR moving target detection method and device. The method comprises the following steps: framing the video SAR to be trained, labeling each frame, and expanding the data set by data enhancement; performing preliminary feature extraction, and inputting the extracted features into a BiFPN for further feature fusion and extraction; inputting the shallow features output by the BiFPN into a coordinate attention (CA) module, and outputting features that attend more strongly to spatial coordinates; fusing the high-level features output by the BiFPN with the features output by the CA module, inputting the result into an adaptive feature fusion module that adaptively fuses the input features, and classifying and regressing with the detection heads; performing iterative training on the deep neural network to obtain optimal weights; and inputting the video SAR to be detected into the trained deep neural network, and outputting the detected moving targets. The invention improves both the efficiency and the accuracy of video SAR moving target detection.

Description

Video SAR moving target detection method and device
Technical Field
The invention relates to the technical field of radar image processing, in particular to a method and a device for detecting a video SAR moving target.
Background
Synthetic aperture radar (SAR) is an active Earth observation system that can image a variety of targets at high resolution around the clock and in all weather conditions. Video SAR can continuously observe and image a region of interest, enabling continuous tracking and monitoring of a target.
For moving targets, Doppler modulation causes them to shift and defocus during imaging, so they appear as irregularly shaped shadows in the image; moving target detection can therefore be achieved by detecting these shadows in the video SAR image. Conventional SAR image processing algorithms generally require preprocessing of the image, such as registration, segmentation and extraction, whereas applying a deep neural network to shadow detection of moving targets achieves end-to-end detection without a complex preprocessing pipeline.
Target detection algorithms based on deep learning fall into two main categories: one-stage and two-stage methods. A one-stage method divides the image into S×S grid cells and computes, for each cell, the probability that a target center falls within it. A two-stage method divides the detection process into two stages: candidate boxes are first extracted according to the position of the target in the image, and then classified and regressed. One-stage methods detect targets much faster than two-stage methods and include some of the most widely used algorithms in this field. However, in existing neural network models, moving target detection accuracy is low because of factors such as the low contrast of video SAR images and speckle noise.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a video SAR moving target detection method and device that improve the detection efficiency of video SAR moving targets and achieve higher moving target detection accuracy.
To solve the above problems, the technical solution of the invention is as follows:
A video SAR moving target detection method comprises the following steps:
framing the video SAR to be trained, labeling each frame, and expanding the data set by data enhancement;
performing preliminary feature extraction, and inputting the extracted features into a BiFPN for further feature fusion and extraction;
inputting the shallow features output by the BiFPN into a coordinate attention (CA) module, and outputting features that attend more strongly to spatial coordinates;
fusing the high-level features output by the BiFPN with the features output by the CA module, inputting the result into an adaptive feature fusion module that adaptively fuses the input features, and classifying and regressing with the detection heads;
performing iterative training on the deep neural network to obtain optimal weights;
and inputting the video SAR to be detected into the trained deep neural network, and outputting the detected moving targets.
Preferably, the step of framing the video SAR to be trained, labeling each frame, and expanding the data set by data enhancement specifically comprises: reading the video SAR to be trained and obtaining the frame rate, width and height of the video; framing the video SAR and labeling each frame image; enhancing the labeled data set with augmentation functions such as cropping, mirroring and rotation; renaming and storing the augmented images and labels in one-to-one correspondence, and dividing the enhanced data set into a training set and a test set in a certain proportion.
Preferably, in the step of performing preliminary feature extraction and inputting the extracted features into the BiFPN for further feature fusion and extraction, the preliminary feature extraction is performed with CSPDarknet.
Preferably, the step of inputting the shallow features output by the BiFPN into the CA module and outputting features that attend more strongly to spatial coordinates specifically comprises: the CA attention separates the height and width of the input image and encodes them independently, performing global average pooling along the width and along the height of the input feature map to obtain 1D feature maps in the two directions:
$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$$
$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$
The input x is encoded channel by channel with pooling kernels of size H×1 and 1×W along the horizontal and vertical coordinate directions, respectively. The feature maps in the two directions are concatenated and passed through convolution, batch normalization and a nonlinearity; each direction is then convolved separately, activated, and multiplied with the input x to obtain the attention weight map. The smaller-scale feature map is up-sampled, the two features are stacked along the channel dimension once their scales match, and the feature that attends more strongly to spatial coordinates is output.
Preferably, the step of fusing the high-level features output by the BiFPN with the features output by the CA module, inputting the result into the adaptive feature fusion module, adaptively fusing the input features in the adaptive feature fusion module, and classifying and regressing with the detection heads specifically comprises: three decoupled head structures receiving feature layers of different scales are used as the detection heads of the network; within each decoupled head, a 1×1 convolution kernel reduces the number of channels, followed by convolution, batch normalization and activation blocks; the resulting values are concatenated, the coordinates of the grid cells on the corresponding feature maps are computed, grid coordinate points of the feature maps are created, and the prediction boxes obtained by forward inference of the neural network are projected onto the original image to obtain the final prediction boxes.
Preferably, in the step of performing iterative training on the deep neural network to obtain the optimal weights, a loss function is defined before training of the neural network begins:
$$\mathrm{loss} = \frac{1}{M}\sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2$$
where i is the index over the training data, $y_i$ is the label data, $\hat{y}_i$ is the predicted data, and there are M training samples.
Preferably, in the step of performing iterative training on the deep neural network to obtain the optimal weights, the training loss optimizer is a function optimization algorithm based on stochastic gradient descent: it computes the gradient of the loss function with respect to the weights and moves the weights in the opposite direction until the loss function converges to a local minimum. The weight values are updated in each training iteration, with the weight update formula:
$$w_{j+1} = w_j - lr \cdot \frac{\partial\,\mathrm{loss}}{\partial w_j}$$
where $w_j$ is the weight at the j-th iteration, $w_{j+1}$ is the weight at the (j+1)-th iteration, $lr$ is the learning rate and $\mathrm{loss}$ is the loss function; the weight obtained at each training iteration is computed from the weight of the previous iteration.
Preferably, in the step of performing iterative training on the deep neural network to obtain the optimal weights, the loss type is the intersection over union (IoU); IoU is the overlap ratio of the generated prediction box and the ground-truth box, and the IoU formula is:
$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$
The weights are saved once per iteration, and the optimal weights of the deep neural network are obtained after multiple training iterations.
Further, the invention also provides a video SAR moving target detection apparatus, comprising a processor and a memory for storing executable instructions of the processor, the processor being configured to perform the video SAR moving target detection method described above by executing the executable instructions.
Compared with the prior art, the method takes the YOLOX backbone network CSPDarknet as the baseline of the network, uses a BiFPN to further fuse and extract features, applies the coordinate attention mechanism CA to strengthen the attention of part of the output feature layers, fuses the result with the BiFPN output and feeds it into the adaptive feature fusion module ASFF, and finally uses the three feature layers output by the adaptive feature fusion module for classification and regression. The designed deep neural network is applied to video SAR moving target detection and achieves good results on blurred video SAR moving targets. Compared with conventional image processing methods, the method does not need to preprocess the video SAR image, which improves detection efficiency; compared with conventional deep neural networks such as YOLOX and Fast-RCNN, it achieves higher moving target detection accuracy.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
fig. 1 is a flow chart of a method for detecting a moving target of a video SAR according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a deep neural network in a method for detecting a moving target of a video SAR according to an embodiment of the present invention;
fig. 3 is a schematic diagram of Fusion structure in a deep neural network structure according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a CBS structure in a deep neural network structure according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
Specifically, the invention provides a method for detecting a moving target of a video SAR, as shown in figure 1, comprising the following steps:
s1: framing the video SAR to be trained, then respectively labeling, and expanding a data set in a data enhancement mode;
specifically, in step S1, a video SAR image to be trained is read, a frame rate, a width and a height of the video are obtained, and each frame of image after framing the video SAR is labeled; enhancing the labeled data set by using the enhancing functions such as clipping, mirroring, rotation and the like; renaming and storing the amplified images and labels in a one-to-one correspondence in sequence, and distributing the enhanced data set into a training set and a testing set according to a certain proportion.
S2: performing preliminary feature extraction, and inputting the extracted features into a BiFPN for further feature fusion and extraction;
Specifically, as shown in Figs. 2, 3 and 4, the training or test images are input into the CSPDarknet of the deep neural network for preliminary feature extraction, and the three resulting feature layers are named "dark3", "dark4" and "dark5".
Further, in order to improve the detection accuracy of the network, the feature layers after preliminary feature extraction are input into a weighted bidirectional feature pyramid network (BiFPN) for multi-scale feature fusion and extraction.
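As an illustration of the weighted fusion performed inside a BiFPN node, the following PyTorch sketch implements the fast normalized fusion of several same-resolution feature maps with learnable non-negative weights; the channel width and the 3×3 convolution after fusion are assumptions made for the example, not details taken from the invention.

```python
# Hedged sketch of a BiFPN-style weighted fusion node: learnable, non-negative
# weights combine same-shaped feature maps, followed by a conv-BN-activation block.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, num_inputs, channels, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )

    def forward(self, features):
        w = torch.relu(self.weights)
        w = w / (w.sum() + self.eps)                 # normalize so the weights sum to ~1
        fused = sum(wi * fi for wi, fi in zip(w, features))
        return self.conv(fused)
```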
S3: inputting the shallow features output by the BiFPN into the CA attention mechanism, and outputting features that attend more strongly to spatial coordinates;
specifically, in step S3, the shallow feature layers p3_out and p4_out obtained from the BiFPN are input into the CA attention mechanism.
The CA attention converts the 2D encoding of the input into two 1D encodings: the height and width of the image are separated and encoded independently, and global average pooling is applied along the width and along the height of the input feature map to obtain 1D feature maps in the two directions:
$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i) \qquad (1)$$
$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w) \qquad (2)$$
the pooling kernels of H1 and 1*W sizes are used for the input x to perform channel-by-channel encoding along the horizontal and vertical coordinate directions, respectively, where equation (1) is the output representation of the c-th channel of height H and equation (2) is the c-th channel representation of width w. The feature graphs in two directions are overlapped, convolved, normalized in batches and nonlinear, respectively convolved and activated, and multiplied by the input x.
After the two feature maps of different sizes have passed through the CA attention mechanism and yielded attention weight maps, the smaller-scale feature map is up-sampled; once the two features have the same scale they are stacked along the channel dimension, and features that attend more strongly to spatial coordinates are output.
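A minimal PyTorch sketch of a coordinate attention block of this kind is given below; the channel reduction ratio and the Hardswish activation are common choices assumed here for illustration, not values stated in the invention.

```python
# Hedged sketch of coordinate attention (CA): pool along height and width
# separately, encode the two 1D maps jointly, then produce per-direction
# attention weights that multiply the input feature map.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1): average over width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W): average over height
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                            # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)        # (B, C, W, 1)
        y = torch.cat([x_h, x_w], dim=2)                # stack the two 1D encodings
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w                            # attention-weighted features
```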
S4: fusing the high-level features output by the BiFPN with the features output by the CA module, inputting the result into the adaptive feature fusion module, adaptively fusing the input features in the adaptive feature fusion module, and classifying and regressing through the detection heads;
specifically, in step S4, the high-level feature layer output in step S2 and the feature layer output in step S3 are subjected to feature fusion and then input to an adaptive feature fusion module (ASFF).
Specifically, three decoupled head structures receiving feature layers of different scales are used as the detection heads of the network; within each decoupled head, a 1×1 convolution kernel reduces the number of channels, followed by convolution, batch normalization and activation blocks; the resulting values are concatenated, the coordinates of the grid cells on the corresponding feature maps are computed, grid coordinate points of the feature maps are created, and the prediction boxes obtained by forward inference of the neural network are projected onto the original image to obtain the final prediction boxes.
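For illustration, the following PyTorch sketch shows one decoupled detection head of this kind: a 1×1 convolution reduces the channel count, and separate convolution / batch-normalization / activation branches produce the classification and regression outputs. The channel width and the objectness branch are assumptions borrowed from common YOLOX-style heads rather than details fixed by the invention.

```python
# Hedged sketch of a decoupled detection head: shared 1x1 stem, then separate
# classification and regression branches built from conv-BN-SiLU (CBS) blocks.
import torch.nn as nn

def cbs(c_in, c_out, k=3, s=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class DecoupledHead(nn.Module):
    def __init__(self, in_channels, num_classes, width=256):
        super().__init__()
        self.stem = cbs(in_channels, width, k=1)          # 1x1 conv reduces channels
        self.cls_branch = nn.Sequential(cbs(width, width), cbs(width, width))
        self.reg_branch = nn.Sequential(cbs(width, width), cbs(width, width))
        self.cls_pred = nn.Conv2d(width, num_classes, 1)  # class scores
        self.reg_pred = nn.Conv2d(width, 4, 1)            # box offsets
        self.obj_pred = nn.Conv2d(width, 1, 1)            # objectness

    def forward(self, x):
        x = self.stem(x)
        c = self.cls_branch(x)
        r = self.reg_branch(x)
        return self.cls_pred(c), self.reg_pred(r), self.obj_pred(r)
```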
Screening of the prediction boxes comprises two steps (a sketch of the first step follows this list):
In the first step, positive-sample prediction boxes are screened preliminarily: all prediction boxes whose center points fall inside a ground-truth box are selected, together with the prediction boxes whose centers fall inside a square obtained by expanding the ground-truth box by 2.5 strides.
In the second step, a simplified Optimal Transport Assignment (OTA) algorithm is used to further screen the prediction boxes.
S5: performing iterative training on the deep neural network to obtain optimal weights;
specifically, in step S5, a loss function is defined before training of the neural network is started:
$$\mathrm{loss} = \frac{1}{M}\sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2$$
where i is the index over the training data, $y_i$ is the label data, $\hat{y}_i$ is the predicted data, and there are M training samples. The optimizer of the training loss is stochastic gradient descent (SGD) and the loss type is the intersection over union (IoU); the weights are saved once per iteration, and the optimal weights of the deep neural network are obtained after multiple training iterations.
SGD is a function optimization algorithm that computes the gradient of the loss function with respect to the weights and moves the weights in the opposite direction until the loss function converges to a local minimum. The weight value is updated in each training iteration according to the formula:
$$w_{j+1} = w_j - lr \cdot \frac{\partial\,\mathrm{loss}}{\partial w_j}$$
where $w_j$ is the weight at the j-th iteration, $w_{j+1}$ is the weight at the (j+1)-th iteration, $lr$ is the learning rate and $\mathrm{loss}$ is the loss function; the weight obtained at each training iteration is computed from the weight of the previous iteration.
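The update rule can be illustrated with a small, self-contained NumPy sketch; the least-squares toy problem below is purely illustrative and is not part of the invention.

```python
# Hedged sketch of the SGD-style update w_{j+1} = w_j - lr * d(loss)/d(w_j),
# demonstrated on a one-parameter least-squares toy problem.
import numpy as np

def sgd_step(w, grad, lr):
    return w - lr * grad                  # move against the gradient

# Toy example: fit y = 2x with a single weight w.
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x
w, lr = 0.0, 0.05
for _ in range(100):
    grad = np.mean(2 * (w * x - y) * x)   # gradient of the mean squared error
    w = sgd_step(w, grad, lr)
print(w)                                  # converges toward 2.0
```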
IoU is a criterion used to measure the accuracy of target detection on a data set when computing the loss. The IoU formula is:
$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$
IoU is the overlap ratio of the generated prediction box and the ground-truth box, i.e. the ratio of their intersection (area of overlap) to their union (area of union).
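A minimal sketch of this IoU computation for axis-aligned boxes in (x1, y1, x2, y2) format is given below; the box format is an assumption made for illustration.

```python
# Hedged sketch of IoU for two axis-aligned boxes (x1, y1, x2, y2).
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))    # 1 / 7, approximately 0.143
```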
S6: inputting the video SAR to be detected into the trained deep neural network, and outputting the detected moving targets.
Specifically, in step S6, each frame image of the video SAR to be detected is input into the trained deep neural network to obtain the detected moving targets.
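A minimal sketch of this frame-by-frame inference loop is shown below; model, postprocess and the normalization used are placeholders assumed for illustration and stand in for the trained network and its decoding step.

```python
# Hedged sketch of step S6: run the trained network over every frame of the
# video SAR to be detected and collect the per-frame detections.
import cv2
import torch

def detect_video(video_path, model, postprocess, device="cuda"):
    model.eval()
    cap = cv2.VideoCapture(video_path)
    results = []
    ok, frame = cap.read()
    while ok:
        tensor = torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        with torch.no_grad():
            pred = model(tensor.to(device))
        results.append(postprocess(pred))   # boxes, scores and classes per frame
        ok, frame = cap.read()
    cap.release()
    return results
```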
Compared with the prior art, the method takes the YOLOX backbone network CSPDarknet as the baseline of the network, uses a BiFPN to further fuse and extract features, applies the coordinate attention mechanism CA to strengthen the attention of part of the output feature layers, fuses the result with the BiFPN output and feeds it into the adaptive feature fusion module ASFF, and finally uses the three feature layers output by the adaptive feature fusion module for classification and regression. The designed deep neural network is applied to video SAR moving target detection and achieves good results on blurred video SAR moving targets. Compared with conventional image processing methods, the method does not need to preprocess the video SAR image, which improves detection efficiency; compared with conventional deep neural networks such as YOLOX and Fast-RCNN, it achieves higher moving target detection accuracy.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the invention. The embodiments of the present application and features in the embodiments may be combined with each other arbitrarily without conflict.

Claims (9)

1. A method for detecting a moving target of a video SAR, comprising the following steps:
framing the video SAR to be trained, labeling each frame, and expanding the data set by data enhancement;
performing preliminary feature extraction, and inputting the extracted features into a BiFPN for further feature fusion and extraction;
inputting the shallow features output by the BiFPN into a coordinate attention (CA) module, and outputting features that attend more strongly to spatial coordinates;
fusing the high-level features output by the BiFPN with the features output by the CA module, inputting the result into an adaptive feature fusion module that adaptively fuses the input features, and classifying and regressing with the detection heads;
performing iterative training on the deep neural network to obtain optimal weights;
and inputting the video SAR to be detected into the trained deep neural network, and outputting the detected moving targets.
2. The method for detecting a moving target of a video SAR according to claim 1, wherein the step of framing the video SAR to be trained, labeling each frame, and expanding the data set by data enhancement specifically comprises: reading the video SAR to be trained and obtaining the frame rate, width and height of the video; framing the video SAR and labeling each frame image; enhancing the labeled data set with augmentation functions such as cropping, mirroring and rotation; renaming and storing the augmented images and labels in one-to-one correspondence, and dividing the enhanced data set into a training set and a test set in a certain proportion.
3. The method for detecting a moving target of a video SAR according to claim 1, wherein in the step of performing preliminary feature extraction and inputting the extracted features into the BiFPN for further feature fusion and extraction, the preliminary feature extraction is performed with CSPDarknet.
4. The method for detecting a moving target of a video SAR according to claim 1, wherein the step of inputting the shallow features output by the BiFPN into the CA module and outputting features that attend more strongly to spatial coordinates specifically comprises: the CA attention separates the height and width of the input image and encodes them independently, performing global average pooling along the width and along the height of the input feature map to obtain 1D feature maps in the two directions:
$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$$
$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$
The input x is encoded channel by channel with pooling kernels of size H×1 and 1×W along the width direction and the height direction, respectively. The feature maps in the two directions are concatenated and passed through convolution, batch normalization and a nonlinearity; each direction is then convolved and activated separately and multiplied with the input x. After the attention weight map is obtained, the smaller-scale feature map is up-sampled, the two feature maps are stacked along the channel dimension once their scales match, and the feature that attends more strongly to spatial coordinates is output.
5. The method for detecting a moving target of a video SAR according to claim 1, wherein the step of fusing the high-level features output by the BiFPN with the features output by the CA module, inputting the result into the adaptive feature fusion module, adaptively fusing the input features in the adaptive feature fusion module, and classifying and regressing with the detection heads specifically comprises: three decoupled head structures receiving feature layers of different scales are used as the detection heads of the network; within each decoupled head, a 1×1 convolution kernel reduces the number of channels, followed by convolution, batch normalization and activation blocks; the resulting values are concatenated, the coordinates of the grid cells on the corresponding feature maps are computed, grid coordinate points of the feature maps are created, and the prediction boxes obtained by forward inference of the neural network are projected onto the original image to obtain the final prediction boxes.
6. The method for detecting a moving target of a video SAR according to claim 1, wherein in the step of performing iterative training on the deep neural network to obtain the optimal weights, a loss function is defined before training of the neural network begins:
$$\mathrm{loss} = \frac{1}{M}\sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2$$
where i is the index over the training data, $y_i$ is the label data, $\hat{y}_i$ is the predicted data, and there are M training samples.
7. The method for detecting a moving target of a video SAR according to claim 6, wherein in the step of performing iterative training on the deep neural network to obtain the optimal weights, the training loss optimizer is a function optimization algorithm based on stochastic gradient descent: it computes the gradient of the loss function with respect to the weights and moves the weights in the opposite direction until the loss function converges to a local minimum; the weights are updated in each training iteration, with the weight update formula:
$$w_{j+1} = w_j - lr \cdot \frac{\partial\,\mathrm{loss}}{\partial w_j}$$
where $w_j$ is the weight at the j-th iteration, $w_{j+1}$ is the weight at the (j+1)-th iteration, $lr$ is the learning rate and $\mathrm{loss}$ is the loss function; the weight obtained at each training iteration is computed from the weight of the previous iteration.
8. The method for detecting a video SAR moving target according to claim 7, wherein in the step of performing iterative training on the deep neural network to obtain the optimal weights, the loss type is the intersection over union IoU, which is the overlap ratio of the generated prediction box and the ground-truth box, and the IoU formula is:
$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$
The weights are saved once per iteration, and the optimal weights of the deep neural network are obtained after multiple training iterations.
9. A video SAR moving target detection apparatus, comprising a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to perform the video SAR moving target detection method of any one of claims 1 to 8 by executing the executable instructions.
CN202310099920.2A 2023-02-09 2023-02-09 Video SAR moving target detection method and device Pending CN115995042A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310099920.2A CN115995042A (en) 2023-02-09 2023-02-09 Video SAR moving target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310099920.2A CN115995042A (en) 2023-02-09 2023-02-09 Video SAR moving target detection method and device

Publications (1)

Publication Number Publication Date
CN115995042A true CN115995042A (en) 2023-04-21

Family

ID=85993406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310099920.2A Pending CN115995042A (en) 2023-02-09 2023-02-09 Video SAR moving target detection method and device

Country Status (1)

Country Link
CN (1) CN115995042A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912290A (en) * 2023-09-11 2023-10-20 四川都睿感控科技有限公司 Memory-enhanced method for detecting small moving targets of difficult and easy videos
CN116912290B (en) * 2023-09-11 2023-12-15 四川都睿感控科技有限公司 Memory-enhanced method for detecting small moving targets of difficult and easy videos
CN117372935A (en) * 2023-12-07 2024-01-09 神思电子技术股份有限公司 Video target detection method, device and medium
CN117372935B (en) * 2023-12-07 2024-02-20 神思电子技术股份有限公司 Video target detection method, device and medium

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN110135267B (en) Large-scene SAR image fine target detection method
CN108509978B (en) Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion
CN108399362B (en) Rapid pedestrian detection method and device
Sameen et al. Classification of very high resolution aerial photos using spectral-spatial convolutional neural networks
CN110047069B (en) Image detection device
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN111310861A (en) License plate recognition and positioning method based on deep neural network
CN112800964B (en) Remote sensing image target detection method and system based on multi-module fusion
CN115995042A (en) Video SAR moving target detection method and device
CN110659664B (en) SSD-based high-precision small object identification method
CN114022830A (en) Target determination method and target determination device
CN111368769A (en) Ship multi-target detection method based on improved anchor point frame generation model
CN116645592B (en) Crack detection method based on image processing and storage medium
CN115631344B (en) Target detection method based on feature self-adaptive aggregation
CN111898432A (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN112016569A (en) Target detection method, network, device and storage medium based on attention mechanism
Abdollahi et al. Road extraction from high-resolution orthophoto images using convolutional neural network
CN113850129A (en) Target detection method for rotary equal-variation space local attention remote sensing image
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
Rafique et al. Smart traffic monitoring through pyramid pooling vehicle detection and filter-based tracking on aerial images
CN114764856A (en) Image semantic segmentation method and image semantic segmentation device
CN115937659A (en) Mask-RCNN-based multi-target detection method in indoor complex environment
Wang Remote sensing image semantic segmentation algorithm based on improved ENet network
CN110852255B (en) Traffic target detection method based on U-shaped characteristic pyramid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination