CN115995042A - Video SAR moving target detection method and device - Google Patents
- Publication number: CN115995042A
- Application number: CN202310099920.2A
- Authority: CN (China)
- Prior art keywords: feature, video SAR, moving target, weight
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention provides a video SAR moving target detection method and device. The method comprises the following steps: framing the video SAR to be trained, labeling each frame, and expanding the data set by data enhancement; performing preliminary feature extraction, and inputting the extracted features into a BiFPN for further feature fusion and extraction; inputting the shallow features output by the BiFPN into a coordinate attention (CA) module, and outputting features that attend more to spatial coordinates; fusing the high-level features output by the BiFPN with the features output by the CA module and inputting the result to an adaptive feature fusion module, which performs adaptive fusion on the input features, followed by classification and regression by the detection heads; performing iterative training on the deep neural network to obtain optimal weights; and inputting the video SAR to be detected into the trained deep neural network, and outputting the detected moving targets. The invention improves both the efficiency and the accuracy of video SAR moving target detection.
Description
Technical Field
The invention relates to the technical field of radar image processing, in particular to a method and a device for detecting a video SAR moving target.
Background
Synthetic aperture radar (Synthetic Aperture Radar, SAR) is an active earth observation system that can image a variety of targets at high resolution all day and in all weather conditions. Video SAR can continuously observe and image a region of interest, enabling continuous tracking and monitoring of a target.
For moving targets, Doppler modulation causes them to shift and defocus when imaged, so that they appear as irregularly shaped shadows in the image; detection of moving targets can therefore be achieved by detecting shadows in the video SAR image. Conventional SAR image processing algorithms generally require preprocessing of the image, such as registration, segmentation, and extraction, whereas applying a deep neural network to shadow detection of moving targets achieves end-to-end detection without a complex preprocessing pipeline.
Target detection algorithms based on deep learning fall mainly into two categories: one-stage and two-stage methods. A one-stage method divides the image into S×S grid cells and, for each cell, predicts the probability that a target center falls within it. A two-stage method splits the whole detection process into two stages: candidate boxes are first extracted according to the target positions in the image and are then classified and regressed. One-stage methods detect much faster than two-stage methods, and the YOLO family is among the more classical algorithms in this class. However, in existing neural network models, moving target detection accuracy is low due to factors such as the low contrast and speckle noise of video SAR images.
Disclosure of Invention
To address the defects of the prior art, the invention provides a video SAR moving target detection method and device that improve the detection efficiency of video SAR moving targets and achieve higher detection accuracy.
In order to solve the problems, the technical scheme of the invention is as follows:
a video SAR moving target detection method comprises the following steps:
framing the video SAR to be trained, labeling each frame, and expanding the data set by data enhancement;
performing preliminary feature extraction, and inputting the extracted features into a BiFPN for further feature fusion and extraction;
inputting the shallow features output by the BiFPN into a CA module, and outputting features that attend more to spatial coordinates;
fusing the high-level features output by the BiFPN with the features output by the CA module and inputting the result to an adaptive feature fusion module, which performs adaptive fusion on the input features, followed by classification and regression by the detection heads;
performing iterative training on the deep neural network to obtain optimal weights;
and inputting the video SAR to be detected into the trained deep neural network, and outputting the detected moving targets.
Preferably, the step of framing the video SAR to be trained, labeling each frame, and expanding the data set by data enhancement specifically includes: reading the video SAR to be trained and obtaining its frame rate, width, and height; labeling each frame image obtained by framing the video SAR; augmenting the labeled data set with operations such as cropping, mirroring, and rotation; renaming and storing the augmented images and labels in one-to-one correspondence, and splitting the enhanced data set into a training set and a test set in a given ratio.
Preferably, in the step of performing preliminary feature extraction and inputting the extracted features into the BiFPN for further feature fusion and extraction, the preliminary feature extraction is performed using CSPDarknet.
Preferably, the step of inputting the shallow features output by the BiFPN into the CA module and outputting features that attend more to spatial coordinates specifically includes: the CA attention separates the height and the width of the input image and encodes them individually, performing global average pooling over the width and the height of the input feature map respectively to obtain 1D feature maps in the two directions:

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i) \quad (1)$$

$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w) \quad (2)$$

Pooling kernels of size H×1 and 1×W are applied to the input x to encode each channel along the horizontal and vertical coordinate directions respectively. The feature maps in the two directions are concatenated and passed through convolution, batch normalization, and a nonlinearity; each branch is then convolved, activated, and multiplied with the input x to obtain the attention weight map. The smaller-scale feature map is up-sampled, the two features are concatenated along the channel dimension once their dimensions match, and features that attend more to spatial coordinates are output.
Preferably, the step of fusing the high-level features output by the BiFPN with the features output by the CA module, inputting the result to the adaptive feature fusion module, performing adaptive fusion on the input features, and classifying and regressing with the detection heads specifically includes: three decoupled head structures, each receiving a feature layer of a different scale, are used as the detection heads of the network; within each decoupled head a 1×1 convolution kernel reduces the number of channels, followed by convolution, batch normalization, and activation blocks; the resulting values are concatenated, the coordinates of the grid cells on the corresponding feature maps are calculated to create the grid coordinate points of the feature maps, and the prediction boxes obtained by forward inference of the neural network are projected onto the original image.
Preferably, in the step of performing iterative training on the deep neural network to obtain the optimal weights, a loss function is defined before training of the neural network starts:

$$L = \frac{1}{M}\sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2$$

where i is the index of a training sample, $y_i$ is the label data, $\hat{y}_i$ is the predicted data, and there are M training samples.
Preferably, in the step of performing iterative training on the deep neural network to obtain the optimal weights, the optimizer for the training loss is a function optimization algorithm based on stochastic gradient descent: the gradient of the loss function with respect to the weights is computed, and the weights are changed in the opposite direction until the loss function converges to a local minimum. The weight values are updated in each training iteration according to

$$w_{j+1} = w_j - lr \cdot \frac{\partial L}{\partial w_j}$$

where $w_j$ is the weight of the j-th iteration, $w_{j+1}$ is the weight of the (j+1)-th iteration, $lr$ is the learning rate, and $L$ is the loss function; the weight obtained in each training iteration is computed from the weight of the previous iteration.
Preferably, in the step of performing iterative training on the deep neural network to obtain the optimal weights, the loss type is the intersection over union IoU, which is the overlap ratio between the generated prediction box and the ground-truth box:

$$IoU = \frac{\text{Area of overlap}}{\text{Area of union}}$$

The weight values are stored once per iteration, and the optimal weight values of the deep neural network are obtained after multiple training iterations.
Further, the invention also provides a video SAR moving target detection device, comprising a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to perform the video SAR moving target detection method described above by executing the executable instructions.
Compared with the prior art, the method takes the YOLOX backbone network CSPDarknet as the baseline of the network, uses a BiFPN for further feature fusion and extraction, adopts the coordinate attention mechanism CA to strengthen the attention of part of the output feature layers, fuses the result with the BiFPN output and inputs it into the adaptive feature fusion module ASFF, and finally uses the three feature layers output from the adaptive feature fusion module for classification and regression. The designed deep neural network is applied to video SAR moving target detection and achieves good results on blurred video SAR moving targets. Compared with traditional image processing methods, it requires no preprocessing of the video SAR images, improving detection efficiency; it also detects moving targets more accurately than conventional deep neural networks such as YOLOX and Faster R-CNN.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
fig. 1 is a flow chart of a method for detecting a moving target of a video SAR according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a deep neural network in a method for detecting a moving target of a video SAR according to an embodiment of the present invention;
fig. 3 is a schematic diagram of Fusion structure in a deep neural network structure according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a CBS structure in a deep neural network structure according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
Specifically, the invention provides a method for detecting a moving target of a video SAR, as shown in figure 1, comprising the following steps:
s1: framing the video SAR to be trained, then respectively labeling, and expanding a data set in a data enhancement mode;
specifically, in step S1, a video SAR image to be trained is read, a frame rate, a width and a height of the video are obtained, and each frame of image after framing the video SAR is labeled; enhancing the labeled data set by using the enhancing functions such as clipping, mirroring, rotation and the like; renaming and storing the amplified images and labels in a one-to-one correspondence in sequence, and distributing the enhanced data set into a training set and a testing set according to a certain proportion.
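Of the augmentation operations listed, mirroring is the one that must also transform the box labels together with the pixels. A minimal NumPy sketch (the function name and the [x_min, y_min, x_max, y_max] label layout are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def mirror_augment(image, boxes):
    """Horizontally mirror one video SAR frame and its bounding-box labels.

    image: (H, W) array (a single frame).
    boxes: (N, 4) array of [x_min, y_min, x_max, y_max] pixel coordinates.
    Returns the mirrored image and the correspondingly flipped boxes.
    """
    h, w = image.shape[:2]
    mirrored = image[:, ::-1].copy()     # flip along the width axis
    flipped = boxes.astype(float).copy()
    flipped[:, 0] = w - boxes[:, 2]      # new x_min = W - old x_max
    flipped[:, 2] = w - boxes[:, 0]      # new x_max = W - old x_min
    return mirrored, flipped

frame = np.arange(12).reshape(3, 4)      # toy 3x4 "frame"
boxes = np.array([[0, 0, 2, 2]])         # one shadow box
m, b = mirror_augment(frame, boxes)
print(b.tolist())                        # [[2.0, 0.0, 4.0, 2.0]]
```

Cropping and rotation would transform the labels analogously; only operations that keep the shadow inside the frame should be retained.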
S2: performing preliminary feature extraction, and inputting the extracted features into a BiFPN for further feature fusion and extraction;
Specifically, as shown in figs. 2, 3 and 4, training or test images are input into the CSPDarknet of the deep neural network for preliminary feature extraction, and the three output feature layers are named "dark3", "dark4" and "dark5".
Further, to improve the detection accuracy of the network, the feature layers obtained from the preliminary feature extraction are input into a weighted bidirectional feature pyramid network (BiFPN) for multi-scale feature fusion and extraction.
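The weighted fusion inside a BiFPN node is commonly implemented as "fast normalized fusion" (from the EfficientDet paper); the patent does not spell out the variant, so the following is a sketch under that assumption:

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style weighted fusion of same-shaped feature maps.

    Each learned weight is passed through ReLU and normalized so the
    fused output is a convex combination of the inputs:
        O = sum_i ( w_i / (eps + sum_j w_j) ) * I_i
    """
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps weights non-negative
    w = w / (eps + w.sum())
    return sum(wi * f for wi, f in zip(w, features))

p_a = np.ones((2, 2))
p_b = 3 * np.ones((2, 2))
fused = fast_normalized_fusion([p_a, p_b], [1.0, 1.0])
print(round(float(fused[0, 0]), 3))  # 2.0 (equal weights average the two maps, up to eps)
```

In the real network each input would first be resized to a common scale and the fused map passed through a convolution.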
S3: inputting the shallow features output by the BiFPN into a CA attention mechanism, and outputting features that attend more to spatial coordinates;
Specifically, in step S3, the shallow feature layers p3_out and p4_out obtained from the BiFPN are input into the CA attention mechanism.
CA attention converts the input 2D coordinates into 1D: the height and the width of the image are separated and encoded individually, with global average pooling applied over the width and the height of the input feature map respectively, yielding 1D feature maps in the two directions:

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i) \quad (1)$$

$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w) \quad (2)$$

Pooling kernels of size H×1 and 1×W are applied to the input x to encode each channel along the horizontal and vertical coordinate directions respectively, where equation (1) is the output of the c-th channel at height h and equation (2) is the output of the c-th channel at width w. The feature maps in the two directions are concatenated and passed through convolution, batch normalization, and a nonlinearity; each branch is then convolved, activated, and multiplied with the input x.
After the feature maps of the two different sizes are passed through the CA attention mechanism to obtain attention weight maps, the smaller-scale feature map is up-sampled, the two features are concatenated along the channel dimension once their scales match, and features that attend more to spatial coordinates are output.
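As an illustration of the directional pooling in equations (1) and (2), a minimal NumPy sketch (the array layout and function name are illustrative assumptions, not from the patent):

```python
import numpy as np

def coordinate_pooling(x):
    """Directional global average pooling from equations (1) and (2).

    x: feature map of shape (C, H, W).
    z_h: (C, H), each height position averaged over the width axis;
    z_w: (C, W), each width position averaged over the height axis.
    """
    z_h = x.mean(axis=2)   # z^h_c(h) = (1/W) * sum_i x_c(h, i)
    z_w = x.mean(axis=1)   # z^w_c(w) = (1/H) * sum_j x_c(j, w)
    return z_h, z_w

x = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)  # C=2, H=3, W=4
z_h, z_w = coordinate_pooling(x)
print(z_h.shape, z_w.shape)  # (2, 3) (2, 4)
```

The two 1D maps are what the subsequent concatenation, convolution, and activation steps operate on.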
S4: fusing the high-level features output by the BiFPN with the features output by the CA module and inputting the result to an adaptive feature fusion module, which performs adaptive fusion on the input features, followed by classification and regression by the detection heads;
Specifically, in step S4, the high-level feature layers output in step S2 and the feature layers output in step S3 are fused and then input to the adaptive feature fusion module (ASFF).
Specifically, three decoupled head structures, each receiving a feature layer of a different scale, are used as the detection heads of the network. Within each decoupled head, a 1×1 convolution kernel reduces the number of channels, followed by convolution, batch normalization, and activation blocks; the resulting values are concatenated, the coordinates of the grid cells on the corresponding feature maps are calculated to create the grid coordinate points of the feature maps, and the prediction boxes obtained by forward inference of the neural network are projected onto the original image.
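The projection of per-cell regression outputs back onto the original image can be sketched as follows; YOLOX-style decoding is assumed here, since the patent does not give the exact parameterization:

```python
import numpy as np

def decode_boxes(preds, grid, stride):
    """Project per-cell regression outputs back onto the original image.

    preds:  (N, 4) rows of [dx, dy, log_w, log_h] relative to each grid cell;
    grid:   (N, 2) integer (x, y) coordinates of the cells on the feature map;
    stride: down-sampling factor of that feature level.
    """
    cxcy = (grid + preds[:, :2]) * stride              # box center in image pixels
    wh = np.exp(preds[:, 2:]) * stride                 # box size in image pixels
    return np.hstack([cxcy - wh / 2, cxcy + wh / 2])   # [x1, y1, x2, y2]

preds = np.array([[0.5, 0.5, 0.0, 0.0]])
grid = np.array([[3, 2]])
boxes = decode_boxes(preds, grid, stride=8)
print(boxes.tolist())  # [[24.0, 16.0, 32.0, 24.0]]
```

Each of the three feature levels uses its own stride, so all prediction boxes end up in a common original-image coordinate frame.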
The prediction boxes are screened in two steps:
In the first step, positive-sample prediction boxes are coarsely screened: all prediction boxes whose center points lie inside a ground-truth box are selected, followed by the prediction boxes whose centers lie inside a square obtained by expanding the ground-truth box by 2.5 strides;
In the second step, a simplified Optimal Transport Assignment (OTA) algorithm is used to further screen the prediction boxes.
S5: performing iterative training on the deep neural network to obtain optimal weights;
specifically, in step S5, a loss function is defined before training of the neural network is started:
where i is the index value of the training dataset, y i Is the label data of the label which is to be read,is predictive data, there are M training data sets. The optimizer of training loss is random gradient descent (stochastic gradient descent, SGD), the loss type is cross ratio (Intersection over Union, ioU), weight is stored once for each iteration, and the training iteration obtains the optimal weight of the deep neural network multiple times.
SGD is a function optimization algorithm based on stochastic gradient descent: the gradient of the loss function with respect to the weights is computed, and the weights are changed in the opposite direction until the loss function converges to a local minimum. The weight values are updated in each training iteration according to

$$w_{j+1} = w_j - lr \cdot \frac{\partial L}{\partial w_j}$$

where $w_j$ is the weight of the j-th iteration, $w_{j+1}$ is the weight of the (j+1)-th iteration, $lr$ is the learning rate, and $L$ is the loss function; the weight obtained in each training iteration is computed from the weight of the previous iteration.
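The update rule can be demonstrated on a toy one-dimensional loss:

```python
# Minimal SGD on the quadratic loss L(w) = (w - 3)^2, whose gradient is
# dL/dw = 2 * (w - 3); the update w <- w - lr * grad drives w toward the
# minimum at w = 3, mirroring the per-iteration weight update above.
w, lr = 0.0, 0.1
for _ in range(100):
    grad = 2.0 * (w - 3.0)
    w = w - lr * grad       # w_{j+1} = w_j - lr * dL/dw_j
print(round(w, 4))          # converges to 3.0
```

In the actual network the gradient is taken over a mini-batch of training samples rather than the full loss.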
IoU is a criterion used to measure the accuracy of target detection on a data set when calculating the loss. The IoU formula is as follows:

$$IoU = \frac{\text{Area of overlap}}{\text{Area of union}}$$

i.e., IoU is the overlap ratio between the generated prediction box and the ground-truth box: the ratio of their intersection (area of overlap) to their union (area of union).
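A direct implementation of this ratio for axis-aligned boxes:

```python
def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # area of overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)            # area of union in denominator

print(iou([0, 0, 2, 2], [1, 1, 3, 3]))  # 0.14285714285714285 (= 1/7)
```

IoU is 1 for identical boxes and 0 for disjoint ones, which is why it works both as a loss term and as an evaluation criterion.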
S6: and inputting the video SAR to be detected into a trained deep neural network, and outputting the detected moving target.
Specifically, in step S6, each frame of image of the video SAR to be detected is input into a trained deep neural network, so as to obtain the detected moving target.
Compared with the prior art, the method takes the YOLOX backbone network CSPDarknet as the baseline of the network, uses a BiFPN for further feature fusion and extraction, adopts the coordinate attention mechanism CA to strengthen the attention of part of the output feature layers, fuses the result with the BiFPN output and inputs it into the adaptive feature fusion module ASFF, and finally uses the three feature layers output from the adaptive feature fusion module for classification and regression. The designed deep neural network is applied to video SAR moving target detection and achieves good results on blurred video SAR moving targets. Compared with traditional image processing methods, it requires no preprocessing of the video SAR images, improving detection efficiency; it also detects moving targets more accurately than conventional deep neural networks such as YOLOX and Faster R-CNN.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the invention. The embodiments of the present application and features in the embodiments may be combined with each other arbitrarily without conflict.
Claims (9)
1. A method for detecting a moving target of a video SAR, comprising the steps of:
framing the video SAR to be trained, labeling each frame, and expanding the data set by data enhancement;
performing preliminary feature extraction, and inputting the extracted features into a BiFPN for further feature fusion and extraction;
inputting the shallow features output by the BiFPN into a CA module, and outputting features that attend more to spatial coordinates;
fusing the high-level features output by the BiFPN with the features output by the CA module and inputting the result to an adaptive feature fusion module, which performs adaptive fusion on the input features, followed by classification and regression by the detection heads;
performing iterative training on the deep neural network to obtain optimal weights;
and inputting the video SAR to be detected into the trained deep neural network, and outputting the detected moving targets.
2. The method for detecting a video SAR moving target according to claim 1, wherein the step of framing the video SAR to be trained, labeling each frame, and expanding the data set by data enhancement specifically comprises: reading the video SAR to be trained and obtaining its frame rate, width, and height; labeling each frame image obtained by framing the video SAR; augmenting the labeled data set with operations such as cropping, mirroring, and rotation; renaming and storing the augmented images and labels in one-to-one correspondence, and splitting the enhanced data set into a training set and a test set in a given ratio.
3. The method for detecting a video SAR moving target according to claim 1, wherein in the step of performing preliminary feature extraction and inputting the extracted features into the BiFPN for further feature fusion and extraction, the preliminary feature extraction is performed using CSPDarknet.
4. The method for detecting a video SAR moving target according to claim 1, wherein the step of inputting the shallow features output by the BiFPN into the CA module and outputting features that attend more to spatial coordinates specifically comprises: the CA attention separates the height and the width of the input image and encodes them individually, performing global average pooling over the width and the height of the input feature map respectively to obtain 1D feature maps in the two directions:

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i) \quad (1)$$

$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w) \quad (2)$$

Pooling kernels of size H×1 and 1×W are applied to the input x to encode each channel along the horizontal and vertical coordinate directions respectively; the feature maps in the two directions are concatenated and passed through convolution, batch normalization, and a nonlinearity, then each branch is convolved, activated, and multiplied with the input x to obtain the attention weight map; the smaller-scale feature map is up-sampled, the two feature maps are concatenated along the channel dimension once their dimensions match, and features that attend more to spatial coordinates are output.
5. The method for detecting a video SAR moving target according to claim 1, wherein the step of fusing the high-level features output by the BiFPN with the features output by the CA module, inputting the result to the adaptive feature fusion module, performing adaptive fusion on the input features, and classifying and regressing with the detection heads specifically comprises: three decoupled head structures, each receiving a feature layer of a different scale, are used as the detection heads of the network; within each decoupled head a 1×1 convolution kernel reduces the number of channels, followed by convolution, batch normalization, and activation blocks; the resulting values are concatenated, the coordinates of the grid cells on the corresponding feature maps are calculated to create the grid coordinate points of the feature maps, and the prediction boxes obtained by forward inference of the neural network are projected onto the original image.
6. The method for detecting a video SAR moving target according to claim 1, wherein in the step of iteratively training the deep neural network to obtain the optimal weights, a loss function is defined before training of the neural network starts:

$$L = \frac{1}{M}\sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2$$

where i is the index of a training sample, $y_i$ is the label data, $\hat{y}_i$ is the predicted data, and there are M training samples.
7. The method for detecting a video SAR moving target according to claim 6, wherein in the step of iteratively training the deep neural network to obtain the optimal weights, the optimizer for the training loss is a function optimization algorithm based on stochastic gradient descent: the gradient of the loss function with respect to the weights is computed, and the weights are changed in the opposite direction until the loss function converges to a local minimum; the weight values are updated in each training iteration according to

$$w_{j+1} = w_j - lr \cdot \frac{\partial L}{\partial w_j}$$

where $w_j$ is the weight of the j-th iteration, $w_{j+1}$ is the weight of the (j+1)-th iteration, $lr$ is the learning rate, and $L$ is the loss function; the weight obtained in each training iteration is computed from the weight of the previous iteration.
8. The method for detecting a video SAR moving target according to claim 7, wherein in the step of iteratively training the deep neural network to obtain the optimal weights, the loss type is the intersection over union IoU, which is the overlap ratio between the generated prediction box and the ground-truth box:

$$IoU = \frac{\text{Area of overlap}}{\text{Area of union}}$$

The weight values are stored once per iteration, and the optimal weight values of the deep neural network are obtained after multiple training iterations.
9. A video SAR moving target detection device, comprising a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to perform the video SAR moving target detection method of any one of claims 1 to 8 by executing the executable instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310099920.2A CN115995042A (en) | 2023-02-09 | 2023-02-09 | Video SAR moving target detection method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115995042A true CN115995042A (en) | 2023-04-21 |
Family
ID=85993406
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116912290A (en) * | 2023-09-11 | 2023-10-20 | 四川都睿感控科技有限公司 | Memory-enhanced method for detecting small moving targets of difficult and easy videos |
CN116912290B (en) * | 2023-09-11 | 2023-12-15 | 四川都睿感控科技有限公司 | Memory-enhanced method for detecting small moving targets of difficult and easy videos |
CN117372935A (en) * | 2023-12-07 | 2024-01-09 | 神思电子技术股份有限公司 | Video target detection method, device and medium |
CN117372935B (en) * | 2023-12-07 | 2024-02-20 | 神思电子技术股份有限公司 | Video target detection method, device and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |