CN116229069A - Infrared shore line segmentation and target detection fusion method for unmanned surface vehicle under dark condition - Google Patents
- Publication number
- CN116229069A (application number CN202310166583.4A)
- Authority
- CN
- China
- Prior art keywords
- model
- infrared
- target detection
- convolution
- yolov5m
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/143—Sensing or illuminating at different wavelengths
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/05—Underwater scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A10/00—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
- Y02A10/40—Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biodiversity & Conservation Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
In the infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions, an infrared shoreline segmentation dataset and an infrared target detection dataset are first established. A DeepLabV3+ model and a YOLOv5m model are then built and trained on these datasets to obtain training weights. The DeepLabV3+ and YOLOv5m models under the training weights are evaluated and used for prediction, and their hyperparameters are adjusted accordingly. A PyTorch-framework network model is then established in which a decision level cascades the DeepLabV3+ model and the YOLOv5m model. Finally, the weight file of the PyTorch-framework network model is converted into a weight file for a TensorRT-framework network model and transferred to an edge computing platform on the unmanned surface vehicle, realizing recognition of on-water targets and the feasible (navigable) region on the edge computing platform.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an infrared shoreline segmentation and target detection fusion method for a surface unmanned ship under dark conditions.
Background
Traditional unmanned-ship water-surface sensing mainly depends on millimeter-wave radar, laser radar (LiDAR), inertial measurement units, GPS and other sensors carried on the unmanned ship. In recent years, perception technology based on computer vision has developed rapidly; optical images contain much richer detail about target regions, so vision-based perception can distinguish water-surface targets more effectively. However, research on shoreline segmentation based on infrared thermal imaging remains scarce, and night navigation of unmanned ships is still a great challenge. Research on unmanned-ship water-target recognition and water-environment perception and positioning based on infrared thermal-imaging visual images is therefore particularly important.
Disclosure of Invention
To realize effective identification of on-water targets and the feasible region by an unmanned ship under dark conditions, the invention provides an infrared shoreline segmentation and target detection fusion method for a surface unmanned ship.
To solve the above technical problem, the invention adopts the following technical scheme. An infrared shoreline segmentation and target detection fusion method for a surface unmanned ship under dark conditions comprises the following steps:
step S1, dataset establishment: an unmanned aerial vehicle carries infrared thermal imagers and shoots at low altitude over water-area scenes, simulating the navigation viewing angle of the unmanned ship, and records video of the water surface. The video is processed to obtain original images, which are annotated with annotation tools to obtain an infrared shoreline segmentation dataset and an infrared target detection dataset; the two datasets are each divided into a training set, a validation set and a test set according to a preset ratio;
step S2, model establishment: a DeepLabV3+ model is adopted as the shoreline segmentation network model and a YOLOv5m model as the target detection network model. Hyperparameters of the DeepLabV3+ model and the YOLOv5m model are set, and YOLOv5m weights pre-trained on the VOC2012 dataset are used for transfer learning. The DeepLabV3+ model is then repeatedly trained and validated on the training set of the infrared shoreline segmentation dataset, and the YOLOv5m model on the training set of the infrared target detection dataset, yielding training weights based on the two datasets. Next, the DeepLabV3+ model under the training weights is evaluated and used for prediction with the test and validation sets of the infrared shoreline segmentation dataset, and the YOLOv5m model with the test and validation sets of the infrared target detection dataset; the hyperparameters of both models are adjusted continuously according to the evaluation and test results. After hyperparameter optimization, a PyTorch-framework network model is established in which a decision level cascades the DeepLabV3+ model and the YOLOv5m model. The weight file of the PyTorch-framework network model is converted into a weight file for a TensorRT-framework network model and transferred to the edge computing platform on the unmanned ship;
step S3, model application: the infrared shoreline segmentation data and infrared target detection data, obtained by processing (per the method of step S1) the water-area video shot in real time by the thermal infrared imager carried on the unmanned ship, are transmitted to the edge computing platform and processed by the TensorRT-framework network model to obtain the recognition results for on-water targets and the feasible region.
Further, in step S1, dataset establishment: the infrared thermal imager of a DJI M300 unmanned aerial vehicle is used; the imager shoots at low altitude over water-area scenes, simulating the navigation viewing angle of the unmanned ship, and records video of the water surface. Frame extraction, de-duplication and screening are performed on the video to obtain original images. The annotation tool LabelImg is used to draw rectangular boxes around on-water targets, and the annotation tool Labelme to draw polygonal outlines of the feasible region, yielding an infrared shoreline segmentation dataset and an infrared target detection dataset. The infrared shoreline segmentation dataset contains three categories: background, water and obstacle. The infrared target detection dataset contains five categories: ship (boat), person on shore (person_shore), shipboard person (person_body), person in the water (swimming), and dolphin No. 1 (dolphin1). The infrared shoreline segmentation dataset and the infrared target detection dataset are each divided 8:1:1 into three sub-datasets (training set, validation set and test set), with the number of samples of each category consistent across the sub-datasets.
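The 8:1:1 split described above can be sketched as follows (a minimal illustration; the file names, random seed and helper function are our assumptions, not part of the patent):

```python
import random

def split_dataset(files, seed=0):
    """Shuffle a list of annotated image files and split it 8:1:1
    into training, validation and test subsets."""
    files = list(files)
    random.Random(seed).shuffle(files)  # deterministic shuffle
    n = len(files)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    test = files[n_train + n_val:]
    return train, val, test

# Example with 100 hypothetical infrared frames
frames = [f"ir_frame_{i:04d}.png" for i in range(100)]
train, val, test = split_dataset(frames)
print(len(train), len(val), len(test))  # 80 10 10
```

In practice the split would additionally be stratified per category, as the patent requires consistent per-category sample counts across the sub-datasets.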
Further, when the DeepLabV3+ model is used as the shoreline segmentation network model:
A ResNet is adopted as the backbone feature extraction network. The encoder body decomposes each standard convolution into a depthwise convolution and a pointwise convolution; the depthwise convolution applies a spatial convolution to each channel independently, and the pointwise convolution combines the outputs of the depthwise convolution. In the encoder, the preliminary effective feature layer that has been compressed four times is processed by parallel atrous (dilated) convolutions with different rates, the results are concatenated (concat), and a 1×1 convolution compresses the features to obtain a feature map. In the decoder, a 1×1 convolution adjusts the channel number of the preliminary effective feature layer that has been compressed twice; this is then stacked with the upsampled effective features output by the encoder's atrous convolutions. After stacking, two depthwise separable convolutions produce the final effective feature layer, one 1×1 convolution adjusts the channels to num_classes, and finally a resize upsampling makes the width and height of the final output layer the same as the input picture.
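The parameter saving of the depthwise-plus-pointwise decomposition described above can be illustrated with a small count (a sketch; the function names are ours, and bias terms are ignored for simplicity):

```python
def standard_conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise (one k x k filter per channel) plus pointwise (1x1) convolution."""
    depthwise = k * k * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out   # 1x1 conv combining the depthwise outputs
    return depthwise + pointwise

# A 3x3 convolution from 256 to 256 channels:
print(standard_conv_params(3, 256, 256))   # 589824
print(separable_conv_params(3, 256, 256))  # 67840
```

The roughly 8.7x reduction here is why the encoder uses the decomposed form.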
Further, when the YOLOv5m model is used as the target detection network model:
A Darknet-53 is adopted as the backbone feature extraction network. Feature extraction is performed on the input image by Darknet-53, and three feature layers are taken in the feature utilization part. The three feature layers each undergo 5 convolution operations; part of the processed output is used to produce the prediction result corresponding to that feature layer, and part is upsampled (UpSampling2D) and merged with the other feature layers.
Preferably, Darknet-53 consists of DarknetConv2D layers and residual (Residual) modules. Each residual block in Darknet-53 first performs a 3×3 convolution with stride 2 and saves that convolution layer, then performs a 1×1 convolution followed by a 3×3 convolution and adds the result to the saved layer as the final output; a large number of residual skip connections are used. Five downsampling steps are performed, each with stride 2 and kernel size 3, with feature dimensions 64, 128, 256, 512 and 1024 respectively; no average pooling layer or fully connected layer is used. L2 regularization is applied at each convolution, and batch normalization and a Leaky ReLU activation follow each convolution, where the Leaky ReLU activation function is:
f(x) = x for x ≥ 0, and f(x) = αx for x < 0, with a small slope α.
the YOLOV5m model features are subjected to target detection by partially extracting multiple feature layers, three feature layers are extracted in total, the three feature layers are positioned at different positions of a trunk part dark 53 and are respectively positioned at a middle layer, a middle lower layer and a bottom layer, and shapes of the three feature layers are (52,52,256), (26,26,512) and (13,13,1024) respectively; and carrying out convolution treatment on the three feature layers for 5 times, wherein after the treatment is finished, part of the three feature layers are used for outputting a prediction result corresponding to the feature layers, and the other part of the three feature layers are used for carrying out deconvolution Umsampling2d and then are combined with other feature layers.
Still further, the hyperparameters of the DeepLabV3+ model and the YOLOv5m model include the size of the input image samples, the batch size, the number of iterations, the learning rate, and the number of categories.
In step S2, after the repeated training and validation of the DeepLabV3+ model and the YOLOv5m model, a cross-entropy loss function and the Adam optimizer are used to continuously optimize the two models, yielding training weights based on the infrared shoreline segmentation dataset and the infrared target detection dataset.
Preferably, in step S2, when evaluating the DeepLabV3+ model and the YOLOv5m model under the training weights and using them for prediction:
1) Evaluation: the obtained training weights are screened, and the weights with the lowest total loss and val loss are selected as the weights of the DeepLabV3+ model and the YOLOv5m model. The DeepLabV3+ model and the YOLOv5m model are tested with the test sets of the infrared shoreline segmentation dataset and the infrared target detection dataset respectively, yielding the mIoU of the DeepLabV3+ model and the mean average precision (mAP) of the YOLOv5m model; the hyperparameters of both models are adjusted according to the required values of the evaluation indices and the models retrained until the requirements are met. Meanwhile, the TensorBoard tool module under the TensorFlow framework is used to draw the train loss and val loss curves;
2) Prediction: the DeepLabV3+ model is tested with the validation set of the infrared shoreline segmentation dataset to obtain masks, and IoU is calculated for each category to compute the mIoU; the YOLOv5m model is tested with the test set of the infrared target detection dataset, and a mAP-plotting program yields the precision (AP) of each target class and the mean average precision (mAP).
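The per-category IoU and mIoU computation mentioned above can be sketched on flat per-pixel label lists (a minimal illustration; the helper names are hypothetical):

```python
def iou_per_class(pred, gt, num_classes):
    """IoU of each category from flat per-pixel label lists."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        ious.append(inter / union if union else 0.0)
    return ious

def mean_iou(pred, gt, num_classes):
    ious = iou_per_class(pred, gt, num_classes)
    return sum(ious) / num_classes

# Toy masks with the three shoreline categories 0=background, 1=water, 2=obstacle
gt   = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(iou_per_class(pred, gt, 3))  # [1/3, 2/3, 1/2]
print(mean_iou(pred, gt, 3))       # 0.5
```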
In summary, to solve the problem that unmanned ships cannot navigate autonomously and intelligently under dark conditions due to insufficient illumination, the invention provides an infrared shoreline segmentation and target detection fusion method with the following features:
1. The invention adopts infrared thermal imaging to solve target visualization and data acquisition in dark environments. The acquired video undergoes frame extraction, structural-similarity de-duplication and manual screening to build an original database. The annotation tool LabelImg is used for rectangular-box labeling of the on-water target recognition task, and the annotation tool Labelme for polygonal-box labeling of the feasible-region recognition task, thereby building an infrared shoreline segmentation dataset and an infrared target detection dataset.
2. For the feasible-region recognition task, the invention trains the DeepLabV3+ network on the infrared shoreline segmentation dataset and adopts a human-in-the-loop dataset and network optimization scheme to optimize the dataset and network weights, finally obtaining high-performance weights.
3. For the on-water target recognition task, the invention trains the YOLOv5m network on the infrared target detection dataset and adopts the same human-in-the-loop dataset and network optimization scheme to optimize the dataset and network weights, finally obtaining high-performance weights.
4. The invention obtains the on-water target detection weights and feasible-region recognition weights of the PyTorch-framework network model through network training, realizes online inference of the two networks on the same input image with a dual-thread architecture, performs decision-level fusion of the inference results, and realizes localization of on-water targets by combining the target detection result with the semantic segmentation result.
5. The invention converts the weight file of the PyTorch-framework network model into a weight file for a TensorRT-framework network model and migrates it to the edge computing platform, realizing recognition of on-water targets and the feasible region on the edge computing platform. On the edge computing platform, the target detection mAP is not lower than 92.65%, the shoreline segmentation mIoU is not lower than 74.15%, and the inference speed is above 20 FPS.
Thus, the method is based mainly on visual image processing of infrared thermal imagery: the shoreline is segmented while obstacles on the water surface are identified, and the DeepLabV3+ infrared shoreline segmentation model and the YOLOv5m target detection model of the method are deployed simultaneously on an edge computing platform, realizing real-time recognition of on-water targets and the feasible region. With this deep-learning model deployment architecture, the infrared-based intelligent perception system effectively integrates information across different stages and participants to recognize on-water targets and the feasible region, achieving very good results that largely satisfy the practical navigation requirements of unmanned ships.
Drawings
FIG. 1 is a flow chart of the infrared shoreline segmentation and target detection fusion method for a surface unmanned ship under dark conditions according to the invention;
FIG. 2 is a schematic diagram of the system involved in the infrared shoreline segmentation and target detection fusion method of the invention;
FIG. 3 is a schematic diagram of the fusion of infrared shoreline segmentation and object detection in the invention;
FIG. 4 is a schematic diagram of experimental results in an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the examples and drawings, which are not intended to limit its scope.
In view of the major challenges faced by night navigation of unmanned craft, the invention takes an infrared target recognition system as its core, makes full use of modern artificial-intelligence algorithms, continuously and iteratively optimizes the recognition performance of the deep convolutional network model weights while continuously expanding the dataset, and then completes real-time operation on the unmanned ship through application and deployment. Accordingly, the invention provides a fusion method for infrared shoreline segmentation and target detection of a surface unmanned ship under dark conditions, shown in FIGS. 1 and 2, comprising the following steps.
And S1, establishing a data set.
Various types of thermal infrared imagers are carried, including the imager of a DJI M300 unmanned aerial vehicle; the imagers shoot at low altitude over water-area scenes, simulating the navigation viewing angle of the unmanned ship, and record video of the water surface. Frame extraction, structural-similarity de-duplication and manual screening are then applied to the video to obtain original images. The annotation tool LabelImg is used for rectangular-box labeling of on-water targets and the annotation tool Labelme for polygonal-box labeling of the feasible region, yielding an infrared shoreline segmentation dataset with three categories (background, water and obstacle) and an infrared target detection dataset with five categories: ship (boat), person on shore (person_shore), shipboard person (person_body), person in the water (swimming), and dolphin No. 1 (dolphin1). The infrared shoreline segmentation dataset and the infrared target detection dataset are each divided 8:1:1 into three sub-datasets (training set, validation set and test set), with the number of samples of each category consistent across the sub-datasets.
It is worth noting that since the invention adopts a deep-learning scheme, a large amount of multi-scene field data is needed as support. The water-area scenes collected in this step preferably cover oceans, inland waters, riversides, lakes and other different water areas as experimental scenes. To increase scene richness, collection is performed mainly at night and in the early morning, enriching the infrared image data across different night-time periods; five video segments at different angles are collected in total, and the original dataset is obtained after frame extraction, de-duplication, data cleaning and labeling.
In addition, for data processing, dual histogram equalization and Gamma transformation are used to process the images, improving overall image contrast and enhancing detail.
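As one sketch of the contrast enhancement described here, the Gamma transformation on 8-bit grayscale values can be written as follows (a minimal illustration on plain Python lists; the function name and the gamma value are our assumptions):

```python
def gamma_transform(pixels, gamma=0.5):
    """Apply out = 255 * (in/255)**gamma to 8-bit grayscale pixel values.
    gamma < 1 lifts dark values, which helps low-contrast infrared frames."""
    return [round(255.0 * (p / 255.0) ** gamma) for p in pixels]

row = [0, 16, 64, 128, 255]
print(gamma_transform(row))  # dark values are lifted; 0 and 255 are fixed points
```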
Moreover, multiple thermal imagers are used to collect data, compensating for the disadvantage of visible-light perception during night-time navigation. Current vision-based research schemes basically use visible-light images, but in practical applications the night-driving safety of intelligent traffic must inevitably be considered. Since different thermal imagers use different color palettes, the invention collects data with different thermal imagers in two palettes, white-hot and ironbow (iron oxide red), so that the network model weights suit a variety of thermal imagers.
And S2, establishing a model.
S21: A DeepLabV3+ model whose backbone feature extraction network is a ResNet is adopted as the shoreline segmentation network model. The DeepLabV3+ model is compiled with a compile function, its hyperparameters are set, and the model is repeatedly trained and validated with the training and validation sets of the infrared shoreline segmentation dataset to obtain training weights based on that dataset. During training and validation, the encoder body of the DeepLabV3+ model decomposes each standard convolution into a depthwise convolution and a pointwise convolution; the depthwise convolution applies a spatial convolution to each channel independently, and the pointwise convolution combines the depthwise outputs. In the encoder, the preliminary effective feature layer compressed four times is processed by parallel atrous (dilated) convolutions with different rates, the results are concatenated, and a 1×1 convolution compresses the features into a feature map. In the decoder, a 1×1 convolution adjusts the channel number of the preliminary effective feature layer compressed twice; this is stacked with the upsampled effective features output by the encoder's atrous convolutions. After stacking, two depthwise separable convolutions produce the final effective feature layer, one 1×1 convolution adjusts the channels to num_classes, and finally a resize upsampling makes the final output layer's width and height match the input picture.
S22: A YOLOv5m model whose backbone feature extraction network is Darknet-53 is adopted as the target detection network model. The YOLOv5m model is compiled with a compile function, its hyperparameters are set, and YOLOv5m weights pre-trained on the VOC2012 dataset are used for transfer learning; the model is then repeatedly trained and validated with the training and validation sets of the infrared target detection dataset to obtain training weights based on that dataset. During training and validation, the YOLOv5m model extracts features from the input image through Darknet-53; three feature layers are taken in the feature utilization part, located at different depths of the Darknet-53 backbone (the middle, lower-middle and bottom layers), with shapes (52, 52, 256), (26, 26, 512) and (13, 13, 1024) respectively. The three feature layers each undergo 5 convolution operations; part of the processed output produces the prediction result corresponding to the feature layer, and part is upsampled (UpSampling2D) and merged with the other feature layers.
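The three feature-layer shapes quoted above follow from downsampling strides of 8, 16 and 32 applied to the 416×416 input; a small sketch (the helper is ours, not patent code):

```python
def head_shapes(input_size, strides=(8, 16, 32), channels=(256, 512, 1024)):
    """Spatial grid and channel depth of each detection feature layer."""
    return [(input_size // s, input_size // s, c)
            for s, c in zip(strides, channels)]

print(head_shapes(416))  # [(52, 52, 256), (26, 26, 512), (13, 13, 1024)]
```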
Preferably, the aforementioned Darknet-53 consists of DarknetConv2D layers and residual (Residual) modules. Each residual block in Darknet-53 first performs a 3×3 convolution with stride 2 and saves that convolution layer, then performs a 1×1 convolution followed by a 3×3 convolution and adds the result to the saved layer as the final output; a large number of residual skip connections are used. Five downsampling steps are performed, each with stride 2 and kernel size 3, with feature dimensions 64, 128, 256, 512 and 1024 respectively; no average pooling layer or fully connected layer is used. L2 regularization is applied at each convolution, and batch normalization and a Leaky ReLU activation follow each convolution, where the Leaky ReLU activation function is:
f(x) = x for x ≥ 0, and f(x) = αx for x < 0, with a small slope α.
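A minimal implementation of the Leaky ReLU activation above (the slope value 0.1 is a typical choice, not specified in the patent):

```python
def leaky_relu(x, alpha=0.1):
    """f(x) = x for x >= 0, alpha * x for x < 0: keeps a small gradient
    for negative inputs instead of zeroing them like plain ReLU."""
    return x if x >= 0 else alpha * x

print([leaky_relu(v) for v in (-2.0, -0.5, 0.0, 3.0)])  # [-0.2, -0.05, 0.0, 3.0]
```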
it should be noted that the hyperparameters of the deep v3 Plus model and the YOLOV5m model in the present invention include at least the size (input_shape), the batch size (batch_size), the iteration number (epochs), the learning rate (lr), and the category number (num_class) of the image sample in the dataset to be input. In the present embodiment, the super parameters set are as follows:
Input image sample size: input_shape = 416×416×3;
Batch size: freeze_batch_size = 8, unfreeze_batch_size = 4; typically a power of two (2^n), such as 32, 64 or 128;
Number of iterations: freeze_epochs = 50, unfreeze_epochs = 100;
Learning rate: freeze_lr = 1e-3, unfreeze_lr = 1e-4;
Number of categories: num_class = 10.
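Collected as one configuration object, the embodiment's settings read as follows (a sketch; the dictionary and its key names simply mirror the symbols above and are not part of the patent):

```python
hyperparams = {
    "input_shape": (416, 416, 3),  # input image sample size
    "freeze_batch_size": 8,
    "unfreeze_batch_size": 4,      # smaller batch once the backbone is unfrozen
    "freeze_epochs": 50,
    "unfreeze_epochs": 100,
    "freeze_lr": 1e-3,
    "unfreeze_lr": 1e-4,           # lower rate for full fine-tuning
    "num_class": 10,
}
print(hyperparams["num_class"])  # 10
```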
S23, continuously optimizing hyper-parameters of the deep V3 Plus model and the YOLOV5m model which are repeatedly trained and verified by adopting a cross entropy loss function and an Adam loss function optimizer to obtain training weights based on the infrared bank line segmentation data set and the infrared target detection data set. The cross entropy loss function is a smooth function, the essence of which is the application of cross entropy in information theory in classification problems, and the formula is as follows:
adam loss function optimizer is an optimization method that calculates the adaptive learning rate of each parameter, i.e. stores the square of the past gradientIs also maintained with the past gradient +.>Is an exponential decay average value of:
where $m_t$ is the exponential moving average of the gradient, $v_t$ is the exponential moving average of the squared gradient, and $g_t$ is the gradient at time step $t$.
If $m_t$ and $v_t$ are initialized to the zero vector, they are biased toward 0, so bias-corrected estimates $\hat{m}_t$ and $\hat{v}_t$ are computed to counteract these biases:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
The gradient update rule is:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t$$
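A scalar pure-Python sketch of one Adam update implementing these three steps (the default values of beta1, beta2 and eps are the common Adam defaults, which the patent does not specify):

```python
def adam_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter theta with gradient g at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * g        # EMA of past gradients
    v = beta2 * v + (1 - beta2) * g * g    # EMA of past squared gradients
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v
```

On the very first step (t = 1) with gradient 1, the bias-corrected step size is almost exactly lr, which is the intended effect of the correction.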
S24, evaluating and predicting the DeepLabV3 Plus model and the YOLOV5m model under the training weights.
1) Evaluation: the obtained training weights are screened, and the weight with the lowest total loss and val loss is selected as the weight of the DeepLabV3 Plus model and the YOLOV5m model; the DeepLabV3 Plus model and the YOLOV5m model are tested with the test sets of the infrared shore line segmentation dataset and the infrared target detection dataset respectively, obtaining the mIoU of the DeepLabV3 Plus model and the mean average precision (mAP) of the YOLOV5m model; the hyperparameters of the DeepLabV3 Plus model and the YOLOV5m model are adjusted according to the required values of the evaluation indices, and the models are retrained until the requirements are met. To see how many epochs of training reach saturation and to prevent overfitting (in this embodiment, the backbone network parameters are frozen for the first 50 epochs, after which all parameters are unfrozen and trained for another 50 epochs), the TensorBoard tool module under the TensorFlow framework is used to draw the train loss and val loss curves.
2) Prediction: the DeepLabV3 Plus model is tested with the validation set of the infrared shore line segmentation dataset to obtain masks, and the IoU of each category is calculated to compute the mIoU; the YOLOV5m model is tested with the validation set of the infrared target detection dataset, and the precision (AP) value of each target class and the mean average precision (mAP) value are obtained by running the mAP-plotting program. The purpose of this prediction is to verify the final effect of the method according to the invention; if the effect is not ideal, the amount of training data is increased, or the hyperparameters of the model are adjusted further until the desired effect is achieved.
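The per-class IoU statistic used for the mask evaluation can be sketched over flattened per-pixel label arrays (a simplified illustration, not the evaluation script of the patent):

```python
def mean_iou(pred, gt, num_classes):
    """mIoU over flattened per-pixel class labels; classes absent from both are skipped."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)
```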
S25, a PyTorch framework network model for unmanned-boat infrared shore line segmentation and target detection under dark conditions is established, comprising the hyperparameter-optimized DeepLabV3 Plus model for infrared shore line segmentation and the YOLOV5m model for target detection, which are fused at the decision level in a serial dual-thread inference mode, as shown in FIG. 3. In order to put the PyTorch framework network model into online use on the unmanned boat, the invention converts the weight file of the PyTorch framework network model into a weight file for the TensorRT framework network model and migrates it to the edge computing platform on the unmanned boat; preferably, the edge computing platform is an NVIDIA Jetson embedded platform.
It should be noted that decision-level fusion of the DeepLabV3 Plus model and the YOLOv5m model guarantees the navigation safety of the vessel in all directions: recognizing obstacles in the water poses a great challenge to the segmentation network, whereas the target detection network retains good capability in complex environments, so the target detection scheme is adopted to assist obstacle recognition.
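One way such a decision-level combination might look, checking each detection against the segmentation mask, is sketched below; the box format, the convention that mask value 1 means "water", and the function name are all hypothetical illustrations, not the patent's implementation:

```python
def fuse_detections_with_mask(boxes, mask):
    """For each box (x1, y1, x2, y2, cls), flag whether its centre lies on water (mask == 1)."""
    fused = []
    for x1, y1, x2, y2, cls in boxes:
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
        fused.append((x1, y1, x2, y2, cls, mask[cy][cx] == 1))
    return fused
```

A detection flagged as lying on water can then be treated as an in-water obstacle by the navigation logic.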
Step S3, model application: the water-area scene video shot in real time by the onboard thermal infrared imager is processed by the method of step S1 to obtain infrared shore line segmentation data and infrared target detection data, which are transmitted to the edge computing platform and processed by the TensorRT framework network model to obtain the recognition results of on-water targets and the feasible region.
Next, in this embodiment, a shoreline segmentation network model comparison experiment is performed: three segmentation networks are tried and compared in terms of accuracy, speed and deployability. The experimental results are shown in Table 1 below, which shows that the DeeplabV3 Plus model using ResNet as backbone achieves 20 FPS in recognition speed, meets actual navigation requirements, and can be deployed on the embedded platform of the unmanned boat.
TABLE 1
Network model | mIoU | FPS | Deployment |
DeeplabV3 (ResNet) | 0.956802 | 20 | Deployable |
DeeplabV3 (Xception) | 0.953353 | 5 | Not deployable |
U-Net | 0.979853 | 12 | Deployable |
In addition, this embodiment performs a comparison experiment on the target detection network model. An infrared target detection dataset of 4186 pictures in total is established and divided into a training set and a test set at a ratio of 8:2, yielding 3347 training images and 839 test images. The target number of iteration rounds is set to 400 while the training loss is monitored; training is stopped when the loss changes only minimally over a long period, ensuring that the network is fully trained to convergence. In this experiment, the default parameters were optimized according to actual requirements, as shown in Table 2 below.
TABLE 2
Parameter name | Before optimization | After optimization |
initial learning rate | 0.01 | 0.00816 |
final OneCycleLR learning rate | 0.2 | 0.25725 |
SGD momentum | 0.937 | 0.98 |
warmup_bias_lr | 0.1 | 0.11521 |
image HSV-Hue augmentation | 0.015 | 0.01734 |
image HSV-Saturation augmentation | 0.7 | 0.9 |
image HSV-Value augmentation | 0.4 | 0.44829 |
box loss gain | 0.05 | 0.03384 |
cls loss gain | 0.5 | 0.6195 |
warmup_epochs | 3.0 | 2.71044 |
warmup_momentum | 0.8 | 0.66111 |
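The 8:2 train/test split described above can be sketched as follows (the seed and the shuffling step are assumptions; the patent does not state how images were assigned, and its 3347/839 split is close to, but not exactly, 8:2):

```python
import random

def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle a list of samples and split it into train/test partitions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    k = round(len(items) * train_ratio)
    return items[:k], items[k:]
```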
In the experiment, several network models such as SSD, YOLOv3 and YOLOv5 were tried; their performance on the thermal-image water-target dataset is compared in Table 3 below. FPS is used to judge the recognition speed of a target detection network model, and mean average precision (mAP) is used to judge recognition capability, where a detection counts as correct only when the IoU between the predicted box and the ground-truth box exceeds 0.5. Comprehensively considering detection accuracy, recognition speed, training time and weight size, the results in Table 3 show that choosing YOLOv5m as the base network enables fast inference on water targets and fast deployment of the model. The final effect of the experiment is shown in FIG. 4: the left image in FIG. 4 shows the white-hot infrared image recognition result, and the right image shows the iron-red infrared image recognition result.
TABLE 3
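The IoU > 0.5 matching criterion used in the mAP evaluation above can be sketched as:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # intersection area (0 if disjoint)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction is counted in the mAP statistic only when IoU exceeds the threshold."""
    return box_iou(pred_box, gt_box) > threshold
```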
The foregoing embodiments are preferred embodiments of the present invention, and in addition, the present invention may be implemented in other ways, and any obvious substitution is within the scope of the present invention without departing from the concept of the present invention.
In order to facilitate understanding of the improvements of the present invention over the prior art, some of the figures and descriptions of the present invention have been simplified, and some other elements have been omitted from this document for clarity, as will be appreciated by those of ordinary skill in the art.
Claims (8)
1. The infrared shore line segmentation and target detection fusion method of the unmanned surface vehicle under the dark condition is characterized by comprising the following steps of:
step S1, data set establishment: an unmanned aerial vehicle carries a plurality of infrared thermal imagers so that the infrared thermal imagers shoot at low altitude over a water area scene, simulating the navigation view angle of the unmanned boat; video of the water surface is shot and processed to obtain original images; an annotation tool is used to annotate the original image data to obtain an infrared shore line segmentation data set and an infrared target detection data set, and the two data sets are each divided into a training set, a test set and a validation set according to a preset ratio;
step S2, model establishment: the DeepLabV3 Plus model is adopted as the shoreline segmentation network model, and the YOLOV5m model is adopted as the target detection network model; the hyperparameters of the DeepLabV3 Plus model and the YOLOV5m model are set, and YOLOV5m model weights based on the VOC2012 dataset are adopted as pre-training weights for transfer learning; the DeepLabV3 Plus model is then repeatedly trained and verified with the training set of the infrared shore line segmentation data set, and the YOLOV5m model is repeatedly trained and verified with the training set of the infrared target detection data set, obtaining training weights based on the infrared shore line segmentation data set and the infrared target detection data set; then the DeepLabV3 Plus model under the training weights is evaluated and predicted with the test set and validation set of the infrared shore line segmentation data set, the YOLOV5m model under the training weights is evaluated and predicted with the test set and validation set of the infrared target detection data set, the hyperparameters of the DeepLabV3 Plus model and the YOLOV5m model are continuously adjusted according to the evaluation and test results, and a PyTorch framework network model is established in which the hyperparameter-optimized DeepLabV3 Plus model and YOLOV5m model are cascaded at the decision level; the weight file of the PyTorch framework network model is converted into a weight file for the TensorRT framework network model and migrated to the edge computing platform on the unmanned boat;
step S3, model application: the water-area scene video shot in real time by the onboard thermal infrared imager is processed by the method of step S1 to obtain infrared shore line segmentation data and infrared target detection data, which are transmitted to the edge computing platform and processed by the TensorRT framework network model to obtain the recognition results of on-water targets and the feasible region.
2. The fusion method for infrared shore line segmentation and target detection of the unmanned surface vehicle under the dark condition according to claim 1, characterized in that, in step S1, data set establishment: the infrared thermal imager of a DJI M300 unmanned aerial vehicle is mounted so that the infrared thermal imager shoots at low altitude over a water area scene, simulating the navigation view angle of the unmanned boat; video of the water surface is shot, and frame extraction, de-duplication and screening are performed on the video to obtain original images; the annotation tool LabelImg is used to annotate rectangular boxes for on-water targets, and the annotation tool Labelme is used to annotate polygonal regions for the feasible region, obtaining the infrared shore line segmentation data set and the infrared target detection data set, wherein the infrared shore line segmentation data set comprises three categories: background, water and obstacle; the infrared target detection data set comprises five categories: ship (body), onshore person (person_shore), ship person (person_body), underwater person (swimming) and dolphin No. 1 (dolphin1); the infrared shore line segmentation data set and the infrared target detection data set are each divided at a ratio of 8:1:1 into three sub-datasets, a training set, a validation set and a test set, and the number of samples of each category in each sub-dataset is kept consistent.
3. The fusion method for infrared shore line segmentation and target detection of the unmanned surface vehicle under the dark condition according to claim 2, characterized in that, when the DeepLabV3 Plus model is used as the network model for shoreline segmentation:
a ResNet is adopted as the backbone feature extraction network; the encoder body decomposes a standard convolution into a depthwise convolution and a pointwise convolution, wherein the depthwise convolution applies a spatial convolution independently to each channel and the pointwise convolution combines the outputs of the depthwise convolution; in the encoder, parallel atrous (dilated) convolutions with different rates are applied to the four-times-compressed primary effective feature layer for feature extraction, the results are merged by concatenation (concat), and a 1×1 convolution compresses the features to obtain the feature map; in the decoder, a 1×1 convolution adjusts the channel number of the twice-compressed primary effective feature layer, which is then stacked with the upsampled effective features output by the atrous convolution branch of the encoder; after stacking, two depthwise separable convolutions are performed to obtain the final effective feature layer, one 1×1 convolution adjusts the channels to num_classes, and finally a resize upsampling makes the width and height of the final output layer the same as those of the input picture.
4. The fusion method for infrared shore line segmentation and target detection of the unmanned surface vehicle under the dark condition according to claim 2 or 3, characterized in that the YOLOV5m model is used as the network model for target detection:
Darknet-53 is adopted as the backbone feature extraction network; feature extraction is performed on the input image through Darknet-53, and three feature layers are extracted in total in the feature utilization part; the three feature layers undergo convolution processing 5 times, after which one part is used to output the prediction result corresponding to that feature layer, and the other part is combined with other feature layers after deconvolution UpSampling2D.
5. The fusion method for infrared shore line segmentation and target detection of the unmanned surface vehicle under the dark condition according to claim 4, characterized in that: the Darknet-53 is composed of darknet Conv2D blocks and residual modules; a residual block in Darknet-53 first performs one 3×3 convolution with stride 2 and saves that convolution layer, then performs one 1×1 convolution and one 3×3 convolution, and adds the saved layer to the result as the final output; a large number of residual skip connections are used in this way; five downsampling steps are performed, each with stride 2 and kernel size 3, with feature dimensions of 64, 128, 256, 512 and 1024 respectively; no average pooling layer or fully connected layer is used; L2 regularization is applied to each convolution, and batch normalization and the LeakyReLU activation are applied after each convolution, wherein the Leaky ReLU activation function is:

$$f(x) = \begin{cases} x, & x > 0 \\ \lambda x, & x \le 0 \end{cases}$$

where $\lambda$ is a small positive leak coefficient.
the YOLOV5m model features are subjected to target detection by partially extracting multiple feature layers, three feature layers are extracted in total, the three feature layers are positioned at different positions of a trunk part dark 53 and are respectively positioned at a middle layer, a middle lower layer and a bottom layer, and shapes of the three feature layers are (52,52,256), (26,26,512) and (13,13,1024) respectively; and carrying out convolution treatment on the three feature layers for 5 times, wherein after the treatment is finished, part of the three feature layers are used for outputting a prediction result corresponding to the feature layers, and the other part of the three feature layers are used for carrying out deconvolution Umsampling2d and then are combined with other feature layers.
6. The fusion method for infrared shore line segmentation and target detection of the unmanned surface vehicle under the dark condition according to claim 5, characterized in that the hyperparameters of the DeepLabV3 Plus model and the YOLOV5m model comprise the size of the image samples in the dataset to be input, the batch size, the number of iterations, the learning rate and the number of categories.
7. The fusion method for infrared shore line segmentation and target detection of the unmanned surface vehicle under the dark condition according to claim 6, characterized in that, in step S2, after the DeepLabV3 Plus model and the YOLOV5m model are repeatedly trained and verified, the cross entropy loss function and the Adam optimizer are adopted to continuously optimize the hyperparameters of the two models, obtaining training weights based on the infrared shore line segmentation dataset and the infrared target detection dataset.
8. The fusion method for infrared shore line segmentation and target detection of the unmanned surface vehicle under the dark condition according to claim 7, characterized in that, in step S2, when evaluating and predicting the DeepLabV3 Plus model and the YOLOV5m model under the training weights:
1) Evaluation: the obtained training weights are screened, and the weight with the lowest total loss and val loss is selected as the weight of the DeepLabV3 Plus model and the YOLOV5m model; the DeepLabV3 Plus model and the YOLOV5m model are tested with the test sets of the infrared shore line segmentation dataset and the infrared target detection dataset respectively, obtaining the mIoU of the DeepLabV3 Plus model and the mean average precision (mAP) of the YOLOV5m model; the hyperparameters of the DeepLabV3 Plus model and the YOLOV5m model are adjusted according to the required values of the evaluation indices, and the models are retrained until the requirements are met; meanwhile, the TensorBoard tool module under the TensorFlow framework is used to draw the train loss and val loss curves;
2) Prediction: the DeepLabV3 Plus model is tested with the validation set of the infrared shore line segmentation dataset to obtain masks, and the IoU of each category is calculated to compute the mIoU; the YOLOV5m model is tested with the test set of the infrared target detection dataset, and the precision (AP) value of each target class and the mean average precision (mAP) value are obtained by running the mAP-plotting program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310166583.4A CN116229069A (en) | 2023-02-27 | 2023-02-27 | Infrared shore line segmentation and target detection fusion method for unmanned surface vehicle under dark condition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310166583.4A CN116229069A (en) | 2023-02-27 | 2023-02-27 | Infrared shore line segmentation and target detection fusion method for unmanned surface vehicle under dark condition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116229069A true CN116229069A (en) | 2023-06-06 |
Family
ID=86588768
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310166583.4A Pending CN116229069A (en) | 2023-02-27 | 2023-02-27 | Infrared shore line segmentation and target detection fusion method for unmanned surface vehicle under dark condition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116229069A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116721346A (en) * | 2023-06-14 | 2023-09-08 | 山东省煤田地质规划勘察研究院 | Shore line intelligent recognition method based on deep learning algorithm |
CN116721346B (en) * | 2023-06-14 | 2024-05-07 | 山东省煤田地质规划勘察研究院 | Shore line intelligent recognition method based on deep learning algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113065558B (en) | Lightweight small target detection method combined with attention mechanism | |
CN108647655B (en) | Low-altitude aerial image power line foreign matter detection method based on light convolutional neural network | |
WO2020156028A1 (en) | Outdoor non-fixed scene weather identification method based on deep learning | |
CN110458844B (en) | Semantic segmentation method for low-illumination scene | |
CN103927531B (en) | It is a kind of based on local binary and the face identification method of particle group optimizing BP neural network | |
CN107145846A (en) | A kind of insulator recognition methods based on deep learning | |
CN109800735A (en) | Accurate detection and segmentation method for ship target | |
CN111079739B (en) | Multi-scale attention feature detection method | |
CN113052834B (en) | Pipeline defect detection method based on convolution neural network multi-scale features | |
CN112001868A (en) | Infrared and visible light image fusion method and system based on generation of antagonistic network | |
CN114724019A (en) | Remote sensing image sea ice intelligent monitoring method based on wavelet transformation and separable convolution semantic segmentation | |
CN113642606B (en) | Marine ship detection method based on attention mechanism | |
CN111985274A (en) | Remote sensing image segmentation algorithm based on convolutional neural network | |
CN113408340B (en) | Dual-polarization SAR small ship detection method based on enhanced feature pyramid | |
CN116229069A (en) | Infrared shore line segmentation and target detection fusion method for unmanned surface vehicle under dark condition | |
CN115410087A (en) | Transmission line foreign matter detection method based on improved YOLOv4 | |
CN113469097B (en) | Multi-camera real-time detection method for water surface floaters based on SSD network | |
CN114358178A (en) | Airborne thermal imaging wild animal species classification method based on YOLOv5 algorithm | |
CN113989612A (en) | Remote sensing image target detection method based on attention and generation countermeasure network | |
CN116659516B (en) | Depth three-dimensional attention visual navigation method and device based on binocular parallax mechanism | |
Meng et al. | A modified fully convolutional network for crack damage identification compared with conventional methods | |
CN112132207A (en) | Target detection neural network construction method based on multi-branch feature mapping | |
CN116820131A (en) | Unmanned aerial vehicle tracking method based on target perception ViT | |
CN116977866A (en) | Lightweight landslide detection method | |
CN116310892A (en) | Marine rescue method based on improved YOLOV4 target detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||