CN113469073A - SAR image ship detection method and system based on lightweight deep learning - Google Patents
- Publication number
- CN113469073A (application number CN202110765081.4A)
- Authority
- CN
- China
- Prior art keywords
- model
- yolov5s
- training
- module
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a SAR image ship detection method and system based on lightweight deep learning. The method comprises: preprocessing large-size SAR images and selecting training samples; introducing the Ghost module and Ghost Bottleneck to upgrade YOLOv5s, obtaining a preliminary lightweight YOLOv5s model; on the basis of this preliminary lightweight model, lightening the model further with the conventional model compression algorithms of network pruning and knowledge distillation; accelerating inference of the lightweight YOLOv5s model with the TensorRT inference optimizer and deploying it on an NVIDIA Jetson TX2; cutting the large-size SAR image to be detected into sub-images and feeding them to the model in sequence to complete detection; and merging the detection results and screening the prediction boxes on the final large-size SAR image with non-maximum suppression (NMS). With an acceptable loss of accuracy, the invention compresses the parameter count and floating-point operations of the model and improves detection speed.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an SAR image ship detection method and system based on lightweight deep learning.
Background
AlexNet appeared in 2012 and set off a wave of applications of deep convolutional neural networks in the computing field. A deeper model usually has better nonlinear expressive power, can perform more complex transformations, and can therefore fit more complex features. On this assumption, deep convolutional neural networks have grown deeper and wider. Although they perform better on a variety of tasks, the models have also grown larger, which runs counter to the hardware constraints of today's mobile and embedded devices, so the results of deep neural network research often remain theoretical and cannot be put into practice. The development of mobile devices has not kept pace with that of deep neural networks: such devices usually lack a high-performance graphics processing unit (GPU) computing cluster and rely on a central processing unit (CPU) alone for computation, so at present they cannot provide the storage space and computing conditions required by the large convolutional neural networks that extract more expressive deep features. This seriously hinders the development and application of deep convolutional neural networks on portable devices. To bring artificial intelligence into practical use, many researchers in academia and industry have invested in network model lightweighting algorithms, in order to improve the performance and efficiency of portable devices in image processing.
Existing methods for lightening a network fall mainly into two categories: model compression and compact model design. Model compression compresses a neural network according to its structure and parameters, reducing the model's demands on storage and computing resources so that it meets the memory and computing-power limits of portable mobile devices. Model compression targets the redundant parts of the network structure and weights, sacrificing some accuracy to obtain a model that is less redundant, faster and simpler. The algorithms proposed so far include network pruning, model quantization, binarization, low-rank decomposition and knowledge distillation. Because the degree of redundancy differs from layer to layer in a deep neural network, a conventional model compression algorithm tends to be tailored to one specific model, and manually searching for a compression scheme suited to the layer-wise redundancy of each model is time-consuming and laborious. This has driven the development of automated machine learning (AutoML), which learns and searches for locally optimal network hyperparameters and structures automatically, without manual intervention, so that automatic model compression can be generalized to all models.
Building on AutoML, a team from Xi'an Jiaotong University and Google proposed an automatic model compression algorithm (AMC), which introduces reinforcement learning into model compression and achieves a higher compression ratio than conventional rule-based compression strategies while preserving network performance. A series of compact models such as Xception, MobileNetV1, MobileNetV2, MobileNetV3, ShuffleNet and ShuffleNetV2 has also been proposed in recent years. These network models generally set out to reduce convolution-kernel redundancy, compress the number of channels, and replace conventional convolution with efficient convolution modules. Using small convolution kernels in the convolutional layers reduces kernel redundancy and thus effectively reduces the number of network parameters. The Fire module proposed in SqueezeNet consists of a squeeze layer and an expand layer; reducing the number of 1 × 1 convolution kernels in the squeeze layer further reduces the number of input channels of the 3 × 3 convolution kernels. MobileNetV1 uses depthwise separable convolution to decompose a conventional convolution into a depthwise convolution and a pointwise convolution; ShuffleNet further proposes a channel shuffle operation and pointwise group convolution, rearranging the features so that information flows between channel groups; MobileNetV2 proposes the inverted residual block; and MobileNetV3 uses neural architecture search (NAS), introduces the SE (squeeze-and-excitation) module, and adopts the h-swish activation function to compress the network structure further.
These lightweight network models achieve good compression and acceleration with only a small loss of accuracy.
Target detection, also called target category detection or target classification detection, returns the category and position of the targets of interest in an image. It has been a research hotspot in computer vision and digital image processing for the last two decades. Before AlexNet was proposed in 2012, target detection relied on traditional hand-crafted features; well-known methods include V-J detection, HOG detection, and DPM detection combined with bounding-box regression. After 2012, with the rise of convolutional neural networks and the exponential growth of GPU performance, deep learning developed explosively and target detection entered the deep learning era. Depending on whether candidate boxes must be generated, deep-learning-based target detection algorithms can be divided into single-stage (one-stage) and two-stage algorithms. Representative single-stage networks include the YOLO series, SSD and RetinaNet; their main characteristics are lower detection accuracy and high detection speed. Typical two-stage networks are R-CNN, SPP-Net, Fast R-CNN and Faster R-CNN; unlike single-stage algorithms, they offer high detection accuracy at a high time cost. Even the best target detection algorithms still struggle to match human vision, and target detection still faces many challenges. With respect to accuracy, the robustness of an algorithm to intra-class variation is affected by the diversity of texture, color and material among objects of the same class, the diversity of poses and deformations of target instances, differences in the imaging environment, and image noise.
Inter-class distinctiveness, in turn, is generally determined by the similarity between classes and the diversity of classes. With respect to efficiency in time and memory, the richness of natural categories, the dual localization-and-classification nature of the detection task, and the ever-growing volume of image data place higher demands on current target detection algorithms, and this is an area in which many researchers are actively working.
High-resolution image target detection based on big data has long been a popular research direction in remote sensing image processing. Traditional detection and recognition methods cannot adapt to the massive volume of remote sensing data: they require a large number of hand-designed image features, which costs a great deal of time and demands deep domain expertise and understanding of the data characteristics, and finding an efficient classifier that fully captures the data is like fishing for a needle in the sea. By contrast, the powerful high-level (more abstract and semantically meaningful) feature representation and learning capability of deep learning provides an effective framework for extracting targets from images. Related research includes the detection of vehicles, ships, crops, and ground objects such as buildings.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the above shortcomings of the prior art, a SAR image ship detection method based on lightweight deep learning, in which a lightweight target detection network is deployed on the embedded device NVIDIA Jetson TX2 to detect ships in large-size SAR images. The target detection network YOLOv5 is taken as the baseline network and is made lightweight by combining conventional model compression algorithms with the Ghost lightweight module.
The invention adopts the following technical scheme:
A SAR image ship detection method based on lightweight deep learning comprises the following steps:
S1, preprocessing the large-size SAR images and selecting sub-images containing target information as training samples;
S2, introducing the Ghost module and Ghost Bottleneck to upgrade the YOLOv5s model, obtaining a preliminary lightweight YOLOv5s model, and training this model with the training samples selected in step S1;
S3, distilling the YOLOv5s model trained in step S2, then performing sparsity training and pruning, and fine-tuning the pruned YOLOv5s model;
S4, accelerating inference of the YOLOv5s model fine-tuned in step S3 with the TensorRT inference optimizer and deploying it on an NVIDIA Jetson TX2;
S5, cutting the SAR image to be detected into sub-images and feeding them in sequence to the YOLOv5s model deployed on the NVIDIA Jetson TX2 in step S4, obtaining the corresponding sub-image detection results;
S6, stitching together the sub-image detection results obtained in step S5, screening the prediction boxes on the final large-size SAR image with non-maximum suppression (NMS), drawing the retained prediction boxes on the original large-size image and labeling their categories, thereby achieving SAR image ship detection.
Specifically, step S1 comprises:
S101, dicing the 5 single-channel TIF images of Img10K and the 31 single-channel TIFF images of AIR-SARShip-1.0 with a 50% overlap ratio to obtain sub-images of the large-size remote sensing images;
S102, augmenting the 1000 8-bit JPG images of SAR-train-int;
S103, unifying the Img-10K and AIR-SARShip-1.0 sub-images obtained in step S101 and the SAR-train-int images obtained in step S102 into 8-bit single-channel TIF images, giving a data set of 2551 pictures, of which 2351 are used as training samples and 200 as validation samples;
S104, applying the Mosaic data augmentation algorithm to the training samples of step S103, stitching every four training pictures together with random scaling, random cropping and random arrangement.
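The 50% overlap dicing of step S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the 960-pixel tile size is an assumption borrowed from the 960 × 960 network input mentioned later, since the text does not state the tile size used for dicing.

```python
def tile_origins(length, tile, overlap=0.5):
    """Start offsets of tiles along one axis; the last tile sits flush with the edge."""
    if length <= tile:
        return [0]
    stride = max(1, int(tile * (1 - overlap)))  # 50% overlap -> stride of half a tile
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] != length - tile:
        origins.append(length - tile)  # cover the remainder at the border
    return origins


def tile_boxes(width, height, tile=960, overlap=0.5):
    """Crop windows (x0, y0, x1, y1) that cover a width x height image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


# Example: a 2000 x 960 image is covered by four overlapping windows.
windows = tile_boxes(2000, 960)
```

Keeping each window's origin alongside its detections makes it straightforward to map sub-image results back to the large image later.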
Specifically, step S2 comprises:
S201, replacing the convolution modules and bottleneck modules in the backbone network of the YOLOv5s model with the Ghost module and GhostBottleneck, thereby upgrading the YOLOv5s model;
S202, setting the width multiplier to 0.15 and the depth multiplier to 0.35, reducing the number of network layers to 212 and obtaining the preliminary lightweight YOLOv5s model.
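As a rough illustration of why replacing ordinary convolutions with Ghost modules shrinks the model, the sketch below compares weight counts. It assumes the Ghost module layout of the GhostNet design: a primary convolution produces 1/s of the output channels, and cheap d × d depthwise operations generate the rest. The function names, the ratio s = 2 and the kernel sizes are illustrative assumptions, not values stated in this document; biases and batch-normalization parameters are ignored.

```python
import math


def conv_params(c_in, c_out, k):
    """Weights of an ordinary k x k convolution (no bias)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, s=2, d=3):
    """Weights of a Ghost module (assumed layout, see lead-in).

    s: ratio between output channels and intrinsic channels (assumption)
    d: kernel size of the cheap depthwise operation (assumption)
    """
    intrinsic = math.ceil(c_out / s)           # channels from the primary conv
    primary = c_in * intrinsic * k * k         # ordinary convolution
    cheap = intrinsic * (s - 1) * d * d        # one d x d filter per ghost channel
    return primary + cheap


standard = conv_params(64, 128, 3)
ghost = ghost_params(64, 128, 3)
compression = standard / ghost  # close to s when c_in is large
```

With s = 2 the Ghost module needs roughly half the weights of the convolution it replaces, which is consistent with the parameter reduction the method aims for.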
Specifically, step S3 comprises:
S301, using YOLOv5m as the teacher model and the L2 loss as the distillation basis function, setting the balance coefficient dist of the distillation term in the loss to 1, and carrying out distillation training for 100 epochs;
S302, after an over-parameterized model has been obtained through normal training, setting the sparsity parameter to 6e-4 and applying L1 regularization to the γ parameters of the BN layers through sparsity training, producing a sparse weight matrix that serves as the criterion for each neuron's contribution; determining a threshold from a 30% sparsity rate and pruning the layers below the threshold together with the layers that depend on them; if all channels of a layer would be removed, the channel with the largest value is kept;
S303, after the pruning of step S302, training the resulting model for a further 50 epochs so that fine-tuning learns the final weights of the sparse connections.
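The distillation and pruning steps above can be sketched in plain Python. This is a simplified illustration under several assumptions: outputs are flat lists of numbers rather than tensors, the 30% threshold is taken globally over all BN γ values (the text does not say whether it is global or per layer), pruning is read as acting on channels, and all function names are hypothetical.

```python
def distill_loss(det_loss, student_out, teacher_out, dist_coeff=1.0):
    """Detection loss plus an L2 (mean squared error) term toward the teacher."""
    l2 = sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(student_out)
    return det_loss + dist_coeff * l2


def l1_penalty(gammas, sparsity=6e-4):
    """L1 regularization added to the loss during sparsity training."""
    return sparsity * sum(abs(g) for g in gammas)


def channel_masks(layer_gammas, prune_ratio=0.3):
    """Keep/prune mask per layer from a global threshold on |gamma|."""
    all_g = sorted(abs(g) for layer in layer_gammas.values() for g in layer)
    k = int(len(all_g) * prune_ratio)
    threshold = all_g[k] if k < len(all_g) else float("inf")
    masks = {}
    for name, gammas in layer_gammas.items():
        mask = [abs(g) >= threshold for g in gammas]
        if not any(mask):
            # Never remove a whole layer: keep its largest channel.
            largest = max(range(len(gammas)), key=lambda i: abs(gammas[i]))
            mask[largest] = True
        masks[name] = mask
    return masks


gammas = {"a": [0.9, 0.01, 0.5, 0.7], "b": [0.02, 0.03], "c": [0.8, 0.6, 0.4, 0.3]}
masks = channel_masks(gammas)
```

In this toy example the three smallest γ values fall below the threshold; layer "b" would lose both channels, so its larger one is retained.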
Specifically, in step S4, deployment with the TensorRT inference optimizer comprises a Build stage and a Deployment stage:
S401, in the Build stage, training the model in PyTorch to obtain a .pt file, converting the .pt file into an ONNX model, loading the ONNX model into TensorRT and converting it into an optimized TensorRT model; the TensorRT model is then serialized to disk or memory, and the serialized result is called a plan file;
S402, in the Deployment stage, deploying the lightweight YOLOv5 model by deserializing the plan file obtained in step S401, creating a runtime engine and carrying out forward inference.
Specifically, step S5 comprises:
S501, feeding a sub-image of the picture to be detected into the trained lightweight YOLOv5s model for detection; if the sub-image does not meet the model's input size requirement, applying adaptive picture scaling; passing the sub-image through the feature extraction network to obtain a feature map of size S × S, which divides the input image into S × S grid cells;
S502, predicting B bounding boxes for each grid cell using logistic regression; if the center of a predicted bounding box lies in a grid cell, the B bounding boxes of that cell classify the target and predict its box; the prediction of each grid cell for its B bounding boxes gives the position of each bounding box, a confidence indicating whether the cell contains a target, and the probabilities of C classes; each bounding box predicts five values t_x, t_y, t_w, t_h and t_o, where t_x and t_y are the offsets of the bounding box center relative to the current grid cell, normalized to the range 0 to 1 by the logistic activation, t_w and t_h are the scaling of the bounding box width and height, and t_o is the confidence;
S503, fusing the detection results at three scales using a feature pyramid network, which upsamples and passes strong semantic features from top to bottom, and a path aggregation network, which downsamples and passes strong localization features from bottom to top; for an input size of 960 × 960, the output feature maps are 120 × 120, 60 × 60 and 30 × 30, corresponding to 8×, 16× and 32× downsampling respectively.
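The size bookkeeping in step S5 can be illustrated with a short sketch. The scaling function below is a simplified fixed-size variant (scale to fit, then pad), assumed here for illustration; the adaptive scaling actually used by the method may pad differently. The function names are hypothetical, while the input size 960 and the strides 8, 16 and 32 come from the text.

```python
def letterbox_params(width, height, target=960):
    """Scale factor, resized size and total padding to fit an image into target x target."""
    scale = min(target / width, target / height)   # preserve aspect ratio
    new_w, new_h = round(width * scale), round(height * scale)
    return scale, (new_w, new_h), (target - new_w, target - new_h)


def grid_sizes(input_size=960, strides=(8, 16, 32)):
    """Side length of the output feature map at each detection scale."""
    return [input_size // s for s in strides]


scale, resized, padding = letterbox_params(1280, 720)
grids = grid_sizes()
```

For a 960 × 960 input this reproduces the 120 × 120, 60 × 60 and 30 × 30 grids given in step S503.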
Further, in step S502, the center coordinates b_x, b_y of the predicted bounding box in the whole feature map and its width and height b_w, b_h are obtained from the 5 values predicted by each bounding box as follows:
$$b_x = \sigma(t_x) + c_x$$
$$b_y = \sigma(t_y) + c_y$$
$$b_w = p_w e^{t_w}$$
$$b_h = p_h e^{t_h}$$
where $\sigma$ is the logistic (sigmoid) activation function, $c_x$ and $c_y$ are the offsets of the current grid cell from the top-left corner of the feature map, and $p_w$ and $p_h$ are the width and height of the prior box.
Further, the coordinate offsets and the confidence are limited to the range 0 to 1. When a real box lies in the grid cell, Pr(object) = 1; otherwise Pr(object) = 0. With Pr(class_i | object) denoting the probability that the grid cell belongs to a given class conditional on containing a target, the class confidence is expressed as
$$\Pr(\mathrm{class}_i \mid \mathrm{object}) \cdot \Pr(\mathrm{object}) \cdot \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}} = \Pr(\mathrm{class}_i) \cdot \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}$$
where $\mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}$ is the intersection over union of the real box and the predicted box, and Pr(class_i) is the probability of the class of the target in a given cell.
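A minimal sketch of decoding one predicted box is given below. The center mapping follows b_x = σ(t_x) + c_x and b_y = σ(t_y) + c_y; for width and height the sketch takes the common YOLOv3-style mapping b_w = p_w·e^{t_w}, b_h = p_h·e^{t_h}, which is an assumption here. It also assumes that prior sizes are given in input pixels and that the grid stride converts cell units to pixels; the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell, prior, stride):
    """Turn raw outputs (tx, ty, tw, th, to) into a box in input-pixel units.

    cell:   (cx, cy) index of the grid cell in the feature map
    prior:  (pw, ph) prior box size in pixels (assumption)
    stride: downsampling factor of this feature map (8, 16 or 32)
    """
    tx, ty, tw, th, to = t
    cx, cy = cell
    pw, ph = prior
    bx = (sigmoid(tx) + cx) * stride   # center x in pixels
    by = (sigmoid(ty) + cy) * stride   # center y in pixels
    bw = pw * math.exp(tw)             # width scaled from the prior
    bh = ph * math.exp(th)             # height scaled from the prior
    return bx, by, bw, bh, sigmoid(to)


def class_score(class_prob_given_object, confidence):
    """Class-specific score: conditional class probability times box confidence."""
    return class_prob_given_object * confidence


box = decode_box((0.0, 0.0, 0.0, 0.0, 0.0), cell=(3, 4), prior=(20, 40), stride=8)
```

With all raw outputs at zero, the box center lands in the middle of its grid cell and the box takes the size of the prior, which is a quick sanity check on the mapping.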
Specifically, step S6 comprises:
S601, computing the position of each target on the large image from its position on the sub-image and the position of the sub-image within the large image;
S602, for each class, setting the NMS threshold to 0.65, selecting the bounding box with the highest confidence, computing its DIoU with every other bounding box and filtering out all bounding boxes whose DIoU exceeds the NMS threshold; once the prediction boxes have been screened, drawing the retained prediction boxes to complete ship detection on the large-size SAR image.
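Steps S601 and S602 can be sketched as follows. This is a plain-Python illustration for a single class, using the standard DIoU definition (IoU minus the squared center distance divided by the squared diagonal of the smallest enclosing box); the function names and the (x0, y0, x1, y1) box layout are assumptions made for the example.

```python
def to_global(box, tile_origin):
    """Shift a sub-image box (x0, y0, x1, y1) by the tile's offset in the large image."""
    x0, y0, x1, y1 = box
    ox, oy = tile_origin
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)


def diou(a, b):
    """Distance-IoU of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between box centers.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - d2 / c2 if c2 > 0 else iou


def diou_nms(detections, threshold=0.65):
    """Greedy NMS over (box, score) pairs; returns kept detections, best first."""
    remaining = sorted(detections, key=lambda det: det[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [det for det in remaining if diou(best[0], det[0]) <= threshold]
    return kept


# The same ship seen in two overlapping tiles, plus one distinct ship.
detections = [
    (to_global((500, 100, 560, 160), (0, 0)), 0.9),
    (to_global((21, 100, 81, 160), (480, 0)), 0.8),
    (to_global((300, 300, 340, 340), (480, 0)), 0.7),
]
kept = diou_nms(detections)
```

Because the tiles overlap by 50%, the same ship is often detected in neighboring tiles; running NMS in large-image coordinates removes these duplicates.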
Another technical solution of the invention is a SAR image ship detection system based on lightweight deep learning, comprising:
a data module for preprocessing the large-size SAR images and selecting sub-images containing target information as training samples;
a processing module for introducing the Ghost module and Ghost Bottleneck to upgrade the YOLOv5s model, obtaining a preliminary lightweight YOLOv5s model, and training this model with the training samples selected by the data module;
a fine-tuning module for distilling the YOLOv5s model trained by the processing module, then performing sparsity training and pruning, and fine-tuning the pruned YOLOv5s model;
an inference module for accelerating inference of the YOLOv5s model fine-tuned by the fine-tuning module with the TensorRT inference optimizer and deploying it on an NVIDIA Jetson TX2;
a detection module for cutting the SAR image to be detected into sub-images and feeding them in sequence to the YOLOv5s model deployed on the NVIDIA Jetson TX2 by the inference module, obtaining the corresponding sub-image detection results;
a removal module for stitching together the sub-image detection results obtained by the detection module, screening the prediction boxes on the final large-size SAR image with non-maximum suppression (NMS), drawing the retained prediction boxes on the original large-size image and labeling their categories, thereby achieving ship detection on large-size SAR images.
Compared with the prior art, the invention has at least the following beneficial effects:
The SAR image ship detection method based on lightweight deep learning of the invention uses network pruning, knowledge distillation and the Ghost algorithm. Because remote sensing images are large, scaling them directly to the network input loses too much information; the image is therefore cut into blocks with a certain overlap, which avoids this information loss and ensures that the image size matches the network input. The conventional model compression algorithms of network pruning and knowledge distillation are combined with the manually designed lightweight model Ghost to upgrade the target detection network YOLOv5, greatly reducing the parameter count and floating-point operations of the model and improving inference speed.
Furthermore, the blocks cut with a certain overlap from data sets of different sizes and formats are unified in size and format, ensuring that the training samples fed to the model are consistent in size and format.
Further, the convolution modules and bottleneck modules in the YOLOv5s model are optimized and upgraded with the lightweight model Ghost, the width multiplier is set to 0.15, the depth multiplier is set to 0.35, and the number of network layers is reduced to 212, reducing the model's parameters and floating-point operations.
Furthermore, to compress the model further, the conventional model compression algorithms of network pruning and knowledge distillation are introduced. Knowledge distillation transfers the superior performance of the large model YOLOv5m to the lightweight YOLOv5s, improving its performance to some extent; network pruning measures the importance of neurons and removes the relatively unimportant ones, further reducing model parameters and floating-point operations.
Further, the purpose of a lightweight model is to enable deployment of the deep learning model on embedded devices; step S4 deploys the lightweight YOLOv5s on the NVIDIA Jetson TX2 using the TensorRT inference optimizer.
Further, step S5 describes how the lightweight YOLOv5s detects targets in a large-size SAR image, finally producing a result image marked with prediction boxes and category information.
Further, step S502 describes the calculation of the center coordinates, width and height of the predicted bounding box in the whole feature map.
Further, the probability Pr(class_i | object) that a grid cell belongs to a given class, conditional on containing a target, is the result output by the network.
Further, step S6 maps the sub-image results of the picture to be detected back onto the original-size picture and filters out highly overlapping prediction boxes with NMS to obtain the final detection result.
In conclusion, the invention provides a complete model lightweighting process, finally obtains a lightweight YOLOv5s model, deploys it on the embedded device NVIDIA Jetson TX2, and completes the ship detection task on large-size SAR images.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a graphical representation of the Ghost model;
FIG. 3 is a GhostBottleneck diagram;
FIG. 4 is a schematic diagram of a key part of the result obtained by the complex model on a complex scene, with the confidence hidden;
FIG. 5 is a schematic diagram of a key part of the result obtained by the complex model on a complex scene, with the confidence shown;
FIG. 6 is a schematic diagram of a key part of the result obtained by the complex model on a complex scene, with the confidence shown;
FIG. 7 is a schematic diagram of a key part of the result obtained by the simple model on a simple scene, with the confidence hidden;
FIG. 8 is a schematic diagram of a key part of the result obtained by the simple model on a simple scene, with the confidence shown.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be understood that the terms "comprises" and/or "comprising" indicate the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers and their relative sizes and positional relationships shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, according to actual needs.
The invention provides a SAR image ship detection method based on lightweight deep learning, oriented to the embedded device NVIDIA Jetson TX2. It relates to model compression: a traditional model compression algorithm and a manually designed lightweight model are combined to compress and optimize a target detection network, and the method can be applied to the detection of specific targets in large-size synthetic aperture radar images. Within an acceptable loss of precision, the parameter count and floating-point operations of the model are compressed and the detection speed is improved.
Referring to fig. 1, the SAR image ship detection method based on lightweight deep learning of the present invention includes the following steps:
S1, preprocessing the large-size SAR image and selecting sub-images containing target information as training samples, on which the lightweight YOLOv5s model of step S2 is trained for 500 epochs;
S101, cutting the 5 Img10K images (10000 × 10000 pixels, 16-bit single-channel TIF) and the 31 AIR-SARShip-1.0 images (3000 × 3000 pixels, 16-bit single-channel TIFF) into blocks with an overlap ratio of 50%; the sub-images of the large-size remote sensing images obtained by cutting are input into the network as training samples;
S102, enlarging the 1000 SAR-train-int images (800 × 800 pixels, 8-bit JPG) to 1000 × 1000 pixels;
S103, unifying the Img-10K, AIR-SARShip-1.0, AIR-SARShip-2.0 and SAR-train-int images into 8-bit single-channel TIF format; the final data set contains 2551 images, divided into 2351 training samples and 200 validation samples;
S104, using the Mosaic data enhancement algorithm to splice four pictures by random scaling, random cropping and random arrangement, which increases the number of small-target samples and makes the distribution of the training data more uniform.
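The cutting in step S101 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the 1000 × 1000 tile size is taken from the detection step described later, and the function names and the handling of the image border are assumptions.

```python
def tile_origins(length, tile=1000, overlap=0.5):
    """Offsets along one axis so that tiles of size `tile` cover [0, length)."""
    if length <= tile:
        return [0]  # image smaller than one tile: a single (padded) tile
    stride = int(tile * (1 - overlap))  # 50% overlap -> stride of 500 pixels
    origins = list(range(0, length - tile + 1, stride))
    # Assumption: add one extra tile flush with the border if needed.
    if origins[-1] + tile < length:
        origins.append(length - tile)
    return origins


def tile_boxes(width, height, tile=1000, overlap=0.5):
    """Return (x1, y1, x2, y2) crop windows in row-major order."""
    return [(x, y, x + tile, y + tile)
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]


big = tile_boxes(10000, 10000)   # one 10000 x 10000 image
small = tile_boxes(3000, 3000)   # one 3000 x 3000 image
print(len(big), len(small))
```

With these assumptions a 10000 × 10000 image yields 19 × 19 windows and a 3000 × 3000 image yields 5 × 5 windows; only windows that contain targets would be kept as training samples.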
S2, introducing the Ghost module and Ghost Bottleneck to upgrade the YOLOv5s model, completing the preliminary lightweighting of the YOLOv5s model, and training the resulting model for 500 epochs on the training samples selected in step S1;
The idea of replacing standard convolution with the Ghost module is to take a small number of intrinsic feature maps and apply cheap linear transformations to them to produce "Ghost" maps, which together form the output feature map. The method exploits the similarity between pairs of redundant feature maps: it assumes that a large number of similar, redundant feature maps can be obtained from a small number of intrinsic feature maps by simple linear transformation, which compresses both the convolution parameters and the operation count. The Ghost module decomposes a standard convolution into two parts: the first part generates a small number of intrinsic feature maps with a small standard convolution, and the second part generates a large number of "Ghost" feature maps (the redundant feature maps) at very low cost by applying simple linear operations to the intrinsic feature maps.
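To make the saving concrete, the following sketch counts the weights of a standard convolution and of a Ghost module that produces the same number of output maps. It rests on assumptions that the text does not state: the ratio s of output maps to intrinsic maps, the d × d depthwise kernel standing in for the cheap linear operation, and the example channel counts are all hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, s=2, d=3):
    """Weights of a Ghost module with ratio s and d x d cheap operations.

    Assumes c_out is divisible by s.
    """
    intrinsic = c_out // s                 # maps made by the standard convolution
    primary = c_in * intrinsic * k * k     # first part: small standard convolution
    cheap = (s - 1) * intrinsic * d * d    # second part: one d x d filter per ghost map
    return primary + cheap


standard = conv_params(64, 128, 3)
ghost = ghost_params(64, 128, 3)
print(standard, ghost, round(standard / ghost, 2))
```

For this hypothetical layer the Ghost module needs roughly half the weights, which is close to the chosen ratio s = 2.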
S201, upgrading YOLOv5s with the Ghost module and Ghost Bottleneck specifically means replacing the convolution modules and bottleneck modules in the backbone network of the YOLOv5s model with the Ghost module and the Ghost Bottleneck respectively, as shown in FIG. 3.
S202, since the Ghost module greatly increases the network depth, the depth multiplier is changed to offset this increase. The depth multiplier is adjusted to 0.15 and the width multiplier to 0.35 (the values listed in Table 3), and the number of network layers is reduced to 212, giving the preliminary lightweight YOLOv5s model.
Adjusting these two multipliers, the width multiplier and the depth multiplier, to reduce the number of network layers is what is referred to as preliminary lightweighting.
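The effect of the two multipliers can be illustrated with a short sketch. It follows the scaling rule used by the open-source YOLOv5 code as far as it is understood here (repeat counts are rounded with a minimum of 1, channel counts are rounded up to a multiple of 8); the function names and the example layer sizes are hypothetical, and the multiplier pairs are the ones listed in Table 3 (depth 0.33 / width 0.50 and depth 0.15 / width 0.35).

```python
import math


def scaled_repeats(n, depth_multiple):
    """Number of times a block is repeated after depth scaling."""
    return max(round(n * depth_multiple), 1) if n > 1 else n


def scaled_channels(c, width_multiple, divisor=8):
    """Output channels after width scaling, rounded up to a multiple of `divisor`."""
    return math.ceil(c * width_multiple / divisor) * divisor


for depth, width in [(0.33, 0.50), (0.15, 0.35)]:
    repeats = [scaled_repeats(n, depth) for n in (3, 9, 9, 3)]
    channels = [scaled_channels(c, width) for c in (64, 128, 256, 512, 1024)]
    print(depth, width, repeats, channels)
```

A smaller depth multiplier lowers the repeat counts, and therefore the layer count, while a smaller width multiplier narrows every layer.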
S3, on the basis of the preliminary lightweight YOLOv5s model obtained in step S2, further lightening the model with network pruning and knowledge distillation, two traditional model lightweighting algorithms, to obtain the final lightweight YOLOv5s model;
The preliminary lightweight YOLOv5s model obtained in step S2 is distilled; after distillation, sparse training and pruning are performed, and the pruned model is fine-tuned to recover precision.
S301, using YOLOv5m as the teacher model (T-model) and L2 loss as the distillation loss function, setting the balance coefficient of the distillation term in the loss to 1, and carrying out distillation training for 100 epochs;
S302, after the over-parameterized model is obtained through normal training, setting the sparsity parameter to 6e-4 and applying L1 regularization to the γ parameters of the BN layers through sparse training, which generates a sparse weight matrix. The γ values serve as the criterion for evaluating the contribution of each neuron, and a threshold is determined according to the 30% sparsity rate. Channels below the threshold and their dependent layers are cut off; if all channels in a layer would be removed, the channel with the largest value is retained to preserve the network structure;
Step S301 performs distillation optimization on the preliminary lightweight model, and step S302 further prunes the distilled model to obtain the pruned model.
S303, after pruning, to ensure that the precision of the model does not drop greatly, continuing to train the pruned model obtained in step S302 for 50 epochs, so that the final weights of the sparse connections are learned through fine-tuning.
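The channel selection in step S302 can be sketched as below. This is one plausible reading of the step, not the authors' code: a single global threshold is taken at the 30% quantile of the absolute BN γ values, and the layer names and γ values are made up for illustration.

```python
def bn_prune_masks(gammas_per_layer, sparsity=0.30):
    """Return (threshold, keep-masks) from per-layer BN gamma values.

    Requires 0 <= sparsity < 1.
    """
    ranked = sorted(abs(g) for gammas in gammas_per_layer.values() for g in gammas)
    threshold = ranked[int(len(ranked) * sparsity)]  # global 30% cut-off
    masks = {}
    for name, gammas in gammas_per_layer.items():
        keep = [abs(g) >= threshold for g in gammas]
        if not any(keep):
            # Every channel fell below the threshold: keep the largest one
            # so that the layer, and with it the network structure, survives.
            keep[max(range(len(gammas)), key=lambda i: abs(gammas[i]))] = True
        masks[name] = keep
    return threshold, masks


# Hypothetical gamma values after sparse training.
layers = {
    "bn1": [0.9, 0.8, 0.05, 0.7],
    "bn2": [0.01, 0.02, 0.03],
    "bn3": [0.6, 0.5, 0.4],
}
threshold, masks = bn_prune_masks(layers)
print(threshold, masks)
```

In this toy example every channel of `bn2` falls below the threshold, so the channel with the largest γ is retained, as the step requires.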
S4, carrying out inference acceleration on the YOLOv5s model obtained in step S3 using the TensorRT inference optimizer and deploying the model on NVIDIA Jetson TX2, where deployment with TensorRT comprises a Build stage and a Deployment stage;
S401, in the Build stage, the model trained with PyTorch is saved as a .pt file, the .pt file is converted into an ONNX model, the ONNX model is loaded into TensorRT, optimized and converted into a TensorRT model, and the TensorRT model is serialized to disk or memory as a file called a plan file;
S402, in the Deployment stage, the lightweight YOLOv5 model is deployed and the forward inference process is completed: the plan file obtained in the Build stage is first deserialized, and a runtime engine is created for inference.
S5, after the large-size SAR image to be detected is cut, the sub-images are sequentially sent to the YOLOv5s model deployed on NVIDIA Jetson TX2 in step S4 to complete detection;
Similar to the generation of training samples, the large-size SAR image is cut into 1000 × 1000 sub-images with an overlap ratio of 50%, and the sub-images are sequentially sent to the model for detection.
S501, sending a sub-image of the picture to be detected into the trained lightweight YOLOv5s model for detection; if the sub-image does not meet the model's input size requirement, adaptive picture scaling is applied; the sub-image is then sent to the feature extraction network to obtain a feature map of size S × S, which divides the input image into S × S grid cells;
S502, predicting B bounding boxes for each grid cell using logistic regression; if the center of a bounding box lies in a grid cell, the B bounding boxes of that cell perform classification and box prediction for the target, giving the prediction result of each grid cell over its B bounding boxes;
The output comprises the position information of the bounding box, a confidence indicating whether the grid cell contains a target, and probability information for C categories. Each bounding box predicts 5 values: $t_x$, $t_y$, $t_w$, $t_h$ and $t_o$. $t_x$ and $t_y$ are the offsets of the bounding box center coordinates relative to the current grid cell. To ensure that the center of the bounding box is constrained to the current grid cell, a logistic activation is applied to $t_x$ and $t_y$, which limits their values to the range 0 to 1 and makes model training more stable. $t_w$ and $t_h$ are the scaling factors of the bounding box width and height, and $t_o$ is the confidence; the calculation of $t_h$ follows the same approach as mentioned in RCNN.
From the 5 values predicted for each bounding box, the center coordinates $b_x$, $b_y$ and the width and height $b_w$, $b_h$ of the predicted bounding box in the whole feature map can be calculated according to the following formulas.
$b_x = \sigma(t_x) + c_x$ (1)
$b_y = \sigma(t_y) + c_y$ (2)
where $c_x$ and $c_y$ are the offsets of the current grid cell from the upper left corner of the feature map, and $p_w$ and $p_h$ are the width and height of the prior box. $\sigma$ is the logistic (sigmoid) activation function, which limits the coordinate offsets and the confidence to the range 0 to 1. When a ground-truth box falls within a grid cell, Pr(object) is 1 for that cell; otherwise Pr(object) is 0.
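The decoding of one predicted box can be sketched as follows. Equations (1) and (2) are taken from the text. The width and height rule is an assumption: the text defines the prior box sizes and the scaling factors but does not reproduce the formula, so the exponential scaling commonly used in YOLOv3-style detectors is used here, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, t_o, c_x, c_y, p_w, p_h):
    """Turn raw predictions into a box in feature-map units plus a confidence."""
    b_x = sigmoid(t_x) + c_x    # equation (1)
    b_y = sigmoid(t_y) + c_y    # equation (2)
    b_w = p_w * math.exp(t_w)   # assumed YOLOv3-style width scaling
    b_h = p_h * math.exp(t_h)   # assumed YOLOv3-style height scaling
    confidence = sigmoid(t_o)
    return b_x, b_y, b_w, b_h, confidence


# Zero-valued predictions in cell (3, 4) with a 2 x 5 prior box.
box = decode_box(0.0, 0.0, 0.0, 0.0, 0.0, c_x=3, c_y=4, p_w=2.0, p_h=5.0)
print(box)
```

Because the sigmoid output lies strictly between 0 and 1, the decoded center always stays inside the grid cell that made the prediction.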
Given that a grid cell contains an object, the probability $\Pr(\text{class}_i \mid \text{object})$ that it belongs to a certain class is expressed as:
$\Pr(\text{class}_i \mid \text{object}) \cdot \Pr(\text{object}) \cdot \mathrm{IOU}_{\text{pred}}^{\text{truth}} = \Pr(\text{class}_i) \cdot \mathrm{IOU}_{\text{pred}}^{\text{truth}}$
where $\mathrm{IOU}_{\text{pred}}^{\text{truth}}$ is the intersection-over-union of the ground-truth box and the predicted box, and $\Pr(\text{class}_i)$ is the probability of the corresponding class for the target in the cell.
S503, the feature pyramid network (FPN) conveys strong semantic features from top to bottom through upsampling, and the path aggregation network (PAN) conveys strong localization features from bottom to top through downsampling, so that the detection results of the three scales are fused.
For an input picture size of 960 × 960, the output feature maps are 120 × 120, 60 × 60 and 30 × 30, the results of 8-fold, 16-fold and 32-fold downsampling respectively.
S6, splicing the sub-image detection results obtained in step S5, screening the prediction boxes on the final large-size SAR image using non-maximum suppression (NMS), drawing the retained prediction boxes on the original large-size image according to their values, and marking the category, which completes target detection on the large-size SAR image.
S601, the splicing process is the reverse of the cutting process: the position of a target on the large image is calculated from its position on the sub-image and the relative position of the sub-image on the large image;
S602, for a given class, setting the NMS threshold to 0.65, selecting the bounding box with the highest confidence, and filtering out all other bounding boxes whose DIoU value with it exceeds the NMS threshold, which removes highly overlapping boxes; after the prediction boxes have been screened by NMS, the retained boxes are drawn on the image, completing ship detection on the large-size SAR image.
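Steps S601 and S602 can be sketched together as follows. This is a simplified single-class illustration under stated assumptions: boxes are (x1, y1, x2, y2, score) tuples, DIoU is computed as IoU minus the squared center distance divided by the squared diagonal of the smallest enclosing box, and the tile offsets and detections are invented for the example.

```python
def to_global(box, tile_x, tile_y):
    """S601: shift a sub-image box by the tile's position on the large image."""
    x1, y1, x2, y2, score = box
    return (x1 + tile_x, y1 + tile_y, x2 + tile_x, y2 + tile_y, score)


def diou(a, b):
    """Distance-IoU of two (x1, y1, x2, y2, ...) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
        + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
        + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - d2 / c2 if c2 > 0 else iou


def diou_nms(boxes, threshold=0.65):
    """S602: keep the best box, drop boxes whose DIoU with it exceeds the threshold."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if diou(best, b) <= threshold]
    return kept


# The same ship seen in two overlapping tiles, plus a second ship.
detections = [
    to_global((600, 100, 700, 150, 0.9), 0, 0),
    to_global((102, 100, 202, 150, 0.8), 500, 0),
    to_global((400, 400, 450, 430, 0.7), 500, 0),
]
kept = diou_nms(detections)
print(kept)
```

Because neighbouring tiles overlap by 50%, the same ship is usually reported more than once; mapping every box to large-image coordinates before NMS is what allows these duplicates to be removed.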
In another embodiment of the present invention, a SAR image ship detection system based on lightweight deep learning is provided, which can be used to implement the above SAR image ship detection method based on lightweight deep learning. Specifically, the system includes a data module, a processing module, a fine-tuning module, an inference module, a detection module, and a removal module.
The data module is used for preprocessing the large-size SAR image and selecting sub-images containing target information as training samples;
The processing module is used for introducing the Ghost module and Ghost Bottleneck to upgrade the YOLOv5s model into a preliminary lightweight YOLOv5s model, and training that model with the training samples selected by the data module;
The fine-tuning module is used for distilling the YOLOv5s model trained by the processing module, then performing sparse training and pruning, and fine-tuning the pruned YOLOv5s model;
The inference module is used for carrying out inference acceleration on the YOLOv5s model fine-tuned by the fine-tuning module using the TensorRT inference optimizer, and deploying the model on NVIDIA Jetson TX2;
The detection module is used for cutting the SAR image to be detected and sequentially sending the sub-images to the YOLOv5s model deployed on NVIDIA Jetson TX2 by the inference module for detection, obtaining the corresponding sub-image detection results;
The removal module is used for splicing the sub-image detection results obtained by the detection module, screening the prediction boxes on the final large-size SAR image using non-maximum suppression (NMS), and drawing the retained prediction boxes on the original large-size image according to their values, realizing SAR image ship detection.
In yet another embodiment of the present invention, a terminal device is provided that includes a processor and a memory, the memory storing a computer program comprising program instructions, and the processor being configured to execute the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. It is the computing core and control core of the terminal, adapted to load and execute one or more instructions to implement a corresponding method flow or function. The processor provided by the embodiment of the invention can be used to run the SAR image ship detection method based on lightweight deep learning, which comprises the following steps:
preprocessing a large-size SAR image, and selecting sub-images containing target information as training samples; introducing the Ghost module and Ghost Bottleneck to upgrade the YOLOv5s model into a preliminary lightweight YOLOv5s model, and training that model with the training samples; distilling the trained YOLOv5s model, then performing sparse training and pruning, and fine-tuning the pruned YOLOv5s model; carrying out inference acceleration on the fine-tuned YOLOv5s model using the TensorRT inference optimizer, and deploying the model on NVIDIA Jetson TX2; cutting the SAR image to be detected and sequentially sending the sub-images to the YOLOv5s model deployed on NVIDIA Jetson TX2 for detection, obtaining the corresponding sub-image detection results; and splicing the obtained sub-image detection results, screening the prediction boxes on the final large-size SAR image using non-maximum suppression (NMS), and drawing the retained prediction boxes on the original large-size image according to their values, realizing SAR image ship detection.
In still another embodiment of the present invention, the present invention further provides a storage medium, specifically a computer-readable storage medium (Memory), which is a Memory device in a terminal device and is used for storing programs and data. It is understood that the computer readable storage medium herein may include a built-in storage medium in the terminal device, and may also include an extended storage medium supported by the terminal device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also, one or more instructions, which may be one or more computer programs (including program code), are stored in the memory space and are adapted to be loaded and executed by the processor. It should be noted that the computer-readable storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory.
The processor can load and execute one or more instructions stored in the computer readable storage medium to realize the corresponding steps of the SAR image ship detection method based on the lightweight deep learning in the embodiment; one or more instructions in the computer-readable storage medium are loaded by the processor and perform the steps of:
preprocessing a large-size SAR image, and selecting sub-images containing target information as training samples; introducing the Ghost module and Ghost Bottleneck to upgrade the YOLOv5s model into a preliminary lightweight YOLOv5s model, and training that model with the training samples; distilling the trained YOLOv5s model, then performing sparse training and pruning, and fine-tuning the pruned YOLOv5s model; carrying out inference acceleration on the fine-tuned YOLOv5s model using the TensorRT inference optimizer, and deploying the model on NVIDIA Jetson TX2; cutting the SAR image to be detected and sequentially sending the sub-images to the YOLOv5s model deployed on NVIDIA Jetson TX2 for detection, obtaining the corresponding sub-image detection results; and splicing the obtained sub-image detection results, screening the prediction boxes on the final large-size SAR image using non-maximum suppression (NMS), and drawing the retained prediction boxes on the original large-size image according to their values, realizing SAR image ship detection.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The effects of the present invention can be further illustrated by the following experiments:
1. Experimental environment
The environment of the training host is: Ubuntu 18.04, Intel(R) Xeon(R) Gold 5118 CPU, GeForce RTX 2080 Ti GPUs, Python 3.8.5, CUDA 10.0.130 and cuDNN 7.0.
The Jetson TX2 inference environment is: Ubuntu 18.04, HMP Dual Denver 2 (2 MB L2) + Quad A57 (2 MB L2) CPU, NVIDIA Pascal GPU with 256 CUDA cores, Python 3.8.5, CUDA 10.2.89, cuDNN 8.0.0.180, TensorRT 7.1.3.0 and JetPack 4.4.1.
2. Content of the experiment
(1) With YOLOv5s as the baseline model, the effectiveness of the Ghost module for model lightweighting is verified on the GPU and on the CPU of the training host respectively, recording the parameter count, floating-point operations, average precision AP50 and AP50:95, precision P, recall R, inference time for processing one 1000 × 1000 picture, and total processing time. This demonstrates how well the Ghost module, shown in FIG. 2, compresses parameters and floating-point operations.
(2) The preliminary lightweight model is distilled and pruned, the model is tested on NVIDIA Jetson TX2, and the inference time and total processing time for processing one 1000 × 1000 picture are recorded. This verifies the model's performance in terms of inference acceleration.
3. Simulation experiment results
Experimental results show that the Ghost module has large compression on the baseline model YOLOv5s in parameter quantity and floating point operand. The distillation and pruning strategies used can further compress the model and greatly increase the inference speed.
TABLE 1 comparison of the Performance of Mobile and Ghost modules in YOLOv5
Model | All | L | Weights(M) | FLOPs(G) | AP50 | AP50:95 | P | R | Infer(ms) | Total(ms) |
YOLOv5s | \ | 224 | 6.72 | 16.3 | 61.0% | 33.5% | 84.5% | 56.6% | 8 | 54 |
YOLOv5m | \ | 308 | 20.06 | 50.3 | 64.3% | 33.5% | 77.9% | 60.1% | 13 | 61 |
GhostYOLOv5 | | 326 | 4.44 | 9.7 | 62.8% | 33.7% | 78.5% | 59% | 11 | 60 |
GhostYOLOv5 | √ | 362 | 2.69 | 6.5 | 60.5% | 32.9% | 80.3% | 58.4% | 12 | 60.9 |
The performance comparison of the models is shown in Table 1. All experiments in the table were run on a laboratory server, with images resized to 960 × 960 to fit the network input. "All" indicates whether all convolution modules and bottleneck blocks in the network are replaced, and L is the number of network layers. Ghost compresses the parameter count and floating-point operations of the model significantly: the parameter count of YOLOv5s falls from 6.72M to 4.44M and the floating-point operations fall from 16.3G to 9.7G, while precision is maintained, with Ghost YOLOv5s slightly higher than YOLOv5s on both AP50 and AP50:95. However, the inference speed on the GPU does not improve in line with the reduction in parameters and floating-point operations; both the inference time and the total time are larger than those of YOLOv5s. Considering that the computational bottleneck of the GPU is memory access bandwidth, only the backbone network is replaced, which limits the number of network layers and improves inference speed. The model inference speed is also tested on the CPU, and the results are shown in Table 2.
TABLE 2 model CPU inference time comparison
Model | Weights(M) | FLOPs(G) | AP50 | AP50:95 | P | R | Infer(ms) | Total(ms) |
YOLOv5s | 6.72 | 16.3 | 61.0% | 33.5% | 84.5% | 56.6% | 510 | 546 |
GhostYOLOv5 | 4.44 | 9.7 | 62.8% | 33.7% | 78.5% | 59% | 440 | 499 |
The inference speed of Ghost YOLOv5s on the CPU is significantly faster than that of YOLOv5s, and its total processing time is also shorter, which shows that Ghost is effective for lightweighting the network model and demonstrates the strong performance of the Ghost module in network compression.
TABLE 3 Depth multiplier and Width multiplier impact on inference time
Model | Depth | Width | Weights(M) | FLOPs(G) | AP50 | AP50:95 | Infer(ms) | Total(ms) |
GhostYOLOv5 | 0.33 | 0.50 | 4.44 | 9.7 | 62.8% | 33.7% | 11 | 60 |
GhostYOLOv5 | 0.15 | 0.35 | 2.22 | 5.1 | 63.0% | 34.8% | 12 | 60.9 |
YOLOv5 implements four models of different sizes by adjusting the width multiplier (width_multiple) and the depth multiplier (depth_multiple). Since the Ghost module brings a large increase in network depth, the depth multiplier is changed to offset this increase. As listed in Table 3, the depth multiplier is adjusted to 0.15 and the width multiplier to 0.35, and the number of network layers is reduced to 212. The inference speed and total processing speed tested on the server GPU both improve to a certain extent, while the precision of the model is not reduced. This confirms that adjusting the width multiplier and the depth multiplier is an effective way to control the width and depth of the model.
TABLE 4 TensorRT inference Performance comparison
Model | Pruning | Distillation | Weights(M) | FLOPs(G) | AP50 | AP50:95 | Infer(ms) | Total(ms) |
YOLOv5s | | | 6.72 | 16.3 | 61.1% | 33.5% | 70.38 | 121.82 |
GhostYOLOv5 | | | 2.22 | 5.1 | 63.0% | 34.8% | 61 | 109.77 |
GhostYOLOv5 | √ | | 1.62 | 3.0 | 61.6% | 32.2% | 40.5 | 90.7 |
GhostYOLOv5 | | √ | 2.22 | 5.1 | 63.0% | 32.3% | 59.4 | 108.66 |
GhostYOLOv5 | √ | √ | 0.89 | 1.8 | 57.3% | 27.7% | 30.2 | 84.5 |
After the Ghost-lightweighted YOLOv5s undergoes distillation, pruning and fine-tuning, the parameter count and floating-point operations of the model are greatly reduced. Some loss of accuracy is inevitable, but it stays within an acceptable range. The inference time for one 1000 × 1000 picture on TX2 is only 30.2 ms, and the total time including reading the picture and post-processing is only 84.5 ms, which demonstrates the advantage of GhostYOLOv5. Analysis of Table 4 shows that distillation does little to improve the accuracy of the model, but a model pruned with the same pruning strategy after distillation becomes more lightweight. A possible explanation is that distillation concentrates the weight distribution of the model, making important weights more important and unimportant weights less important, so that a sparser matrix is obtained in sparse training.
FIG. 4 shows a key part of the detection result on a 10000 × 10000 test image. The confidence information is hidden so that small targets are not occluded. The experiments finally yield two models: a complex model that has only been pruned, and a simple model that has been distilled and then pruned. The two models differ greatly in parameter count and floating-point operations, and they target images of different complexity. For the complex model, the GhostYOLOv5s inference time for a single picture on TX2 is 40 ms, and the total processing time for a single picture, including model loading, picture reading and post-processing, is about 92 ms. For the simple model, the GhostYOLOv5s inference time for a single picture on TX2 is 30.2 ms, and the total processing time for a single picture, including model loading, picture reading and post-processing, is about 84.5 ms.
TABLE 5 comparison of detection effects of different models in complex scenes
Model | Weights(M) | FLOPs(G) | AP50 | AP50:95 | Miss | Fake | F1 | Infer(s) | Total(s) |
Complex | 1.62 | 3.0 | 69% | 42.3% | 35.6% | 13.5% | 0.738 | 14.44 | 32.2 |
Simple | 0.89 | 1.8 | 56.1% | 27.3% | 49.5% | 22.9% | 0.61 | 10.9 | 30.5 |
TABLE 6 comparison of detection effects of different models in simple scene
Model | Weights(M) | FLOPs(G) | AP50 | AP50:95 | Miss | Fake | F1 | Infer(s) | Total(s) |
Complex | 1.62 | 3.0 | 81.6% | 50% | 27.9% | 13% | 0.789 | 14.44 | 32.2 |
Simple | 0.89 | 1.8 | 55.1% | 24.6% | 46.3% | 36.7% | 0.581 | 10.9 | 30.5 |
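The F1 column of Tables 5 and 6 can be cross-checked against the Miss and Fake columns. The text does not define these columns, so the sketch below assumes that Miss is the missed-detection rate (1 − recall) and Fake is the false-alarm rate (1 − precision); the function name is hypothetical. Under that assumption the tabulated F1 values are reproduced.

```python
def f1_from_rates(miss_rate, false_alarm_rate):
    """F1 score from a miss rate and a false-alarm rate (both as fractions)."""
    recall = 1.0 - miss_rate             # assumed: Miss = 1 - recall
    precision = 1.0 - false_alarm_rate   # assumed: Fake = 1 - precision
    return 2 * precision * recall / (precision + recall)


rows = {
    "complex model, complex scene": (0.356, 0.135),
    "simple model, complex scene": (0.495, 0.229),
    "complex model, simple scene": (0.279, 0.130),
    "simple model, simple scene": (0.463, 0.367),
}
for name, (miss, fake) in rows.items():
    print(name, round(f1_from_rates(miss, fake), 3))
```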
FIG. 5 and FIG. 6 show detection results of the complex model with the confidence values not hidden. The images show river channels, where detection near the shore is highly complex, yet the detection confidence stays at a high level.
The complex model mainly targets complex scenes, and the simple model mainly targets simple scenes. FIG. 7 and FIG. 8 show the detection results of the simple model on a simple scene. The simple model performs well in open sea areas. Selecting the appropriate model for images of different complexity therefore optimizes the inference speed.
In conclusion, the SAR image ship detection method based on lightweight deep learning deploys the resulting lightweight YOLOv5s model on the embedded device NVIDIA Jetson TX2, completes ship detection tasks on large-size SAR images, and can effectively detect ships in both simple and complex scenes of SAR images.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.
Claims (10)
1. A SAR image ship detection method based on lightweight deep learning, characterized by comprising the following steps:
S1, preprocessing large-size SAR images and selecting sub-images containing target information as training samples;
S2, introducing a Ghost module and a Ghost Bottleneck to upgrade the YOLOv5s model, obtaining a preliminary lightweight YOLOv5s model, and training this model with the training samples selected in step S1;
S3, distilling the YOLOv5s model trained in step S2, then performing sparse training and pruning, and fine-tuning the pruned YOLOv5s model;
S4, accelerating inference of the YOLOv5s model fine-tuned in step S3 with the TensorRT inference optimizer, and deploying the model on an NVIDIA Jetson TX2;
S5, cutting the SAR image to be detected into sub-images and feeding them in sequence to the YOLOv5s model deployed on the NVIDIA Jetson TX2 in step S4 for detection, obtaining the corresponding sub-image detection results;
S6, stitching the sub-image detection results obtained in step S5, screening the prediction boxes on the final large-size SAR image with non-maximum suppression (NMS), drawing the retained prediction boxes on the original large-size image and labeling their categories, thereby realizing SAR image ship detection.
2. The method according to claim 1, wherein step S1 is specifically:
S101, slicing the 5 single-channel TIF images of Img10K and the 31 single-channel TIFF images of AIR-SARShip-1.0 with a 50% overlap rate to obtain sub-images of the large-size remote sensing images;
S102, augmenting the 1000 8-bit JPG images of SAR-train-int;
S103, unifying the Img-10K and AIR-SARShip-1.0 sub-images obtained in step S101 and the SAR-train-int images obtained in step S102 into 8-bit single-channel TIF images to obtain a data set of 2551 pictures, of which 2351 are used as training samples and 200 as verification samples;
S104, applying the Mosaic data enhancement algorithm to the training samples of step S103, stitching every four training pictures together by random scaling, random cropping and random arrangement.
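As an illustration of the 50% overlap slicing in step S101, the following Python sketch computes the sub-image windows for a large image. The tile size of 960 pixels is borrowed from the input size mentioned in claim 6 and is only an assumption here, as are the function names `tile_origins` and `tile_boxes`. The claim does not say how border tiles are handled, so the sketch simply adds a last window flush with the image edge.

```python
def tile_origins(length, tile, overlap=0.5):
    """Start offsets of windows of size `tile` along an axis of `length` pixels."""
    if length <= tile:
        return [0]
    # With 50% overlap the stride is half the tile size.
    stride = max(1, int(tile * (1 - overlap)))
    starts = list(range(0, length - tile + 1, stride))
    # Assumption: add one last window flush with the border so no pixels are missed.
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts


def tile_boxes(height, width, tile=960, overlap=0.5):
    """List of (x0, y0, x1, y1) windows covering an image of the given size."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]
```

The window origins returned here are also what a later stitching step would need in order to map sub-image detections back onto the large image.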
3. The method according to claim 1, wherein step S2 is specifically:
S201, replacing the convolution modules and bottleneck modules in the backbone network of the YOLOv5s model with Ghost modules and GhostBottleneck modules, thereby upgrading the YOLOv5s model;
S202, setting the width multiplier to 0.15 and the depth multiplier to 0.35, and reducing the number of network layers to 212, to obtain the preliminary lightweight YOLOv5s model.
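To give a feel for why replacing ordinary convolutions with Ghost modules lightens the network, the sketch below compares weight counts. It follows the general Ghost module idea (a smaller primary convolution plus cheap depthwise operations that generate the remaining feature maps). The ratio of 2, the depthwise kernel size of 3 and the channel numbers are illustrative assumptions, not values stated in the claims; biases and batch-normalization parameters are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, ratio=2, dw_k=3):
    """Weight count of a Ghost-style module (assumed layout, bias ignored).

    A primary convolution produces c_out / ratio intrinsic maps; the other
    maps come from depthwise dw_k x dw_k filters applied to the intrinsic maps.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio in this sketch")
    intrinsic = c_out // ratio
    primary = conv_params(c_in, intrinsic, k)
    # Depthwise: one dw_k x dw_k filter per generated map.
    cheap = intrinsic * (ratio - 1) * dw_k * dw_k
    return primary + cheap
```

Under these assumptions, a 3 × 3 layer with 64 input and 128 output channels drops from 73728 to 37440 weights, roughly a factor of two.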
4. The method according to claim 1, wherein step S3 is specifically:
S301, using YOLOv5m as the teacher model and the L2 loss as the basic distillation function, setting the distillation balance coefficient dist in the loss to 1, and performing distillation training for 100 epochs;
S302, after an over-parameterized model is obtained through normal training, setting the sparsity parameter to 6e-4 and applying L1 regularization to the gamma parameters of the BN layers through sparse training, which produces a sparse weight matrix used as the criterion for evaluating neuron contribution; determining a threshold according to a 30% sparsity rate, pruning the layers below the threshold together with the layers that depend on them, and, if all channels of a layer would be removed, keeping the channel with the largest value;
S303, after the pruning of step S302 is finished, continuing to train the resulting model for 50 epochs, so that fine-tuning learns the final weights of the sparse connections.
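The sparse training and pruning rule of step S302 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the threshold is taken as a global percentile over the absolute BN gamma values of all layers (the claim only says it is determined from the 30% sparsity rate), the layer names and gamma values are made up, and the removal of dependent layers is not modeled. `l1_penalty` and `prune_masks` are hypothetical helper names.

```python
def l1_penalty(gammas, lam=6e-4):
    """L1 term added to the training loss: lam * sum(|gamma|)."""
    return lam * sum(abs(g) for g in gammas)


def prune_masks(gammas_per_layer, sparsity=0.30):
    """Return (threshold, {layer: [keep flags]}) for BN-gamma channel pruning.

    gammas_per_layer maps a layer name to the list of its BN gamma values.
    """
    all_abs = sorted(abs(g) for gs in gammas_per_layer.values() for g in gs)
    # Number of channels that fall below the global threshold.
    cut = int(round(len(all_abs) * sparsity))
    threshold = all_abs[cut] if cut < len(all_abs) else float("inf")
    masks = {}
    for name, gs in gammas_per_layer.items():
        keep = [abs(g) >= threshold for g in gs]
        if not any(keep):
            # Rule from the claim: never empty a layer, keep its largest channel.
            best = max(range(len(gs)), key=lambda i: abs(gs[i]))
            keep[best] = True
        masks[name] = keep
    return threshold, masks
```

For example, with one layer holding gammas 0.9 down to 0.3 and another holding 0.01, 0.02 and 0.03, the second layer falls entirely below the threshold, and only its 0.03 channel survives.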
5. The method according to claim 1, wherein in step S4 the deployment with the TensorRT inference optimizer includes a Build phase and a Deployment phase, specifically:
S401, in the Build phase, training the model with PyTorch to obtain a .pt file, converting the .pt file into an ONNX model, loading the ONNX model in TensorRT and converting it into a TensorRT model; the TensorRT model is then serialized and stored on disk or in memory, and is called a plan file;
S402, in the Deployment phase, deploying the lightweight YOLOv5 model by deserializing the plan file obtained in step S401, creating a runtime engine, and completing the forward inference process.
6. The method according to claim 1, wherein step S5 is specifically:
S501, feeding a sub-image of the picture to be detected into the trained lightweight YOLOv5s model for detection; if the sub-image does not meet the model's input size requirement, applying adaptive picture scaling; feeding the sub-image into the feature extraction network to obtain a feature map of size S × S, which divides the input image into S × S grid cells;
S502, predicting B bounding boxes for each grid cell by logistic regression; if the center of a predicted bounding box lies in a grid cell, the B bounding boxes of that grid cell classify the target and predict its box, giving the prediction result of each grid cell for its B bounding boxes; the output comprises the position information of the bounding boxes, the confidence indicating whether the grid cell contains a target, and the probability information of C classes; each bounding box predicts five values $t_x$, $t_y$, $t_w$, $t_h$, $t_o$, where $t_x$ and $t_y$ are the offsets of the bounding box center coordinates relative to the current grid cell and are normalized to the range 0-1 by the logistic activation, $t_w$ and $t_h$ are the scaling of the bounding box width and height, and $t_o$ is the confidence;
S503, using a feature pyramid network, which downsamples and passes strong semantic features from top to bottom, and a path aggregation network, which upsamples and passes strong localization features from bottom to top, to fuse the detection results of the three scales; for a picture input size of 960 × 960, the output feature maps are 120 × 120, 60 × 60 and 30 × 30, i.e. the results of 8-fold, 16-fold and 32-fold downsampling, respectively.
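The relation between input size, downsampling factor and grid size in step S503 is simple integer arithmetic, sketched below. The helper names are hypothetical, and `responsible_cell` assumes box centers are given in input-image pixels.

```python
def feature_map_sizes(input_size=960, strides=(8, 16, 32)):
    """Side length of each output feature map for a square input."""
    return [input_size // s for s in strides]


def responsible_cell(cx, cy, stride):
    """(column, row) of the grid cell that contains the centre (cx, cy)."""
    return int(cx // stride), int(cy // stride)
```

For a 960 × 960 input this gives side lengths 120, 60 and 30, matching the three scales named in the claim.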
7. The method according to claim 6, wherein in step S502 the center-point coordinates $b_x$, $b_y$ and the width and height $b_w$, $b_h$ of each predicted bounding box in the whole feature map are obtained from the 5 predicted values of each bounding box as follows:
$b_x = \sigma(t_x) + c_x$
$b_y = \sigma(t_y) + c_y$
where $\sigma$ is the logistic activation function, $c_x$ and $c_y$ are the offsets of the current grid cell from the top-left corner of the feature map, and $p_w$ and $p_h$ are the width and height of the prior box.
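The two center equations above can be sketched directly in Python. Only the center decoding is shown, because the width and height equations are not reproduced in the text. The optional `stride` argument, which converts feature-map units to input pixels, is an assumption added for illustration.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def decode_center(tx, ty, cx, cy, stride=1):
    """b_x = sigmoid(t_x) + c_x and b_y = sigmoid(t_y) + c_y, scaled by stride."""
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    return bx * stride, by * stride
```

Because the logistic function maps any real number into the range 0 to 1, the decoded center always stays inside the grid cell whose top-left corner is `(cx, cy)`.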
8. The method of claim 7, wherein the coordinate offsets and the confidence are limited to the range 0-1; when a real box lies in the grid cell, Pr(object) is 1, otherwise Pr(object) is 0; and the probability that the grid cell belongs to class i given that it contains an object, $\Pr(\mathrm{class}_i \mid \mathrm{object})$, is expressed as
9. The method according to claim 1, wherein step S6 is specifically:
S601, computing the position of each target on the large image from its position on the sub-image and the position of the sub-image within the large image;
S602, for a given class, setting the NMS threshold to 0.65, selecting the bounding box with the highest confidence, and filtering out every other bounding box whose DIoU value with it exceeds the NMS threshold; after the prediction boxes are screened, drawing the retained prediction boxes, which completes ship detection on the large-size SAR image.
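Steps S601 and S602 can be sketched in plain Python as follows. The sketch assumes a single class, boxes given as `(x1, y1, x2, y2, score)` tuples in pixels, and the usual definition of DIoU as IoU minus the squared center distance divided by the squared diagonal of the smallest enclosing box. The function names are hypothetical.

```python
def to_global(box, tile_x0, tile_y0):
    """Shift a sub-image box by the sub-image origin on the large image."""
    x1, y1, x2, y2, score = box
    return (x1 + tile_x0, y1 + tile_y0, x2 + tile_x0, y2 + tile_y0, score)


def diou(a, b):
    """Distance-IoU of two boxes (first four entries are x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between box centres.
    d2 = ((a[0] + a[2] - b[0] - b[2]) ** 2 + (a[1] + a[3] - b[1] - b[3]) ** 2) / 4.0
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - d2 / c2 if c2 > 0 else iou


def diou_nms(boxes, threshold=0.65):
    """Greedy NMS: keep the best box, drop boxes whose DIoU with it exceeds threshold."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if diou(best, b) <= threshold]
    return kept
```

Running `diou_nms` on the globally shifted boxes of all sub-images removes the duplicate detections that arise where overlapping sub-images see the same ship.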
10. A SAR image ship detection system based on lightweight deep learning, characterized by comprising:
a data module, configured to preprocess large-size SAR images and select sub-images containing target information as training samples;
a processing module, configured to introduce a Ghost module and a Ghost Bottleneck to upgrade the YOLOv5s model, obtaining a preliminary lightweight YOLOv5s model, and to train this model with the training samples selected by the data module;
a fine-tuning module, configured to distill the YOLOv5s model trained by the processing module, then perform sparse training and pruning, and fine-tune the pruned YOLOv5s model;
an inference module, configured to accelerate inference of the YOLOv5s model fine-tuned by the fine-tuning module with the TensorRT inference optimizer and deploy the model on an NVIDIA Jetson TX2;
a detection module, configured to cut the SAR image to be detected into sub-images and feed them in sequence to the YOLOv5s model deployed on the NVIDIA Jetson TX2 by the inference module for detection, obtaining the corresponding sub-image detection results;
a removing module, configured to stitch the sub-image detection results obtained by the detection module, screen the prediction boxes on the final large-size SAR image with non-maximum suppression (NMS), draw the retained prediction boxes on the original large-size image and label their categories, thereby realizing ship detection on large-size SAR images.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110765081.4A CN113469073B (en) | 2021-07-06 | 2021-07-06 | SAR image ship detection method and system based on lightweight deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113469073A (en) | 2021-10-01 |
CN113469073B (en) | 2024-02-20 |
Family
ID=77878682
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110765081.4A, Active, CN113469073B (en) | 2021-07-06 | 2021-07-06 | SAR image ship detection method and system based on lightweight deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113469073B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN113469073B (en) | 2024-02-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||