CN116229069A - Infrared shoreline segmentation and target detection fusion method for unmanned surface vehicle under dark conditions


Info

Publication number
CN116229069A
CN116229069A (application number CN202310166583.4A)
Authority
CN
China
Prior art keywords
model
infrared
target detection
convolution
yolov5m
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310166583.4A
Other languages
Chinese (zh)
Inventor
何赟泽
刘圳康
熊锐
郭海艳
邓海明
谯灵俊
王洪金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202310166583.4A
Publication of CN116229069A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/12Details of acquisition arrangements; Constructional details thereof
    • G06V10/14Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/143Sensing or illuminating at different wavelengths
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/05Underwater scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping


Abstract

According to the infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions, an infrared shoreline segmentation dataset and an infrared target detection dataset are first established, and a DeepLabV3+ model and a YOLOv5m model are then built and trained on these datasets to obtain training weights. The DeepLabV3+ and YOLOv5m models under the training weights are evaluated and used for prediction, and the hyperparameters are adjusted; a PyTorch framework network model is then established in which the DeepLabV3+ model and the YOLOv5m model are cascaded and fused at the decision level. Finally, the weight file of the PyTorch framework network model is converted into a weight file of a TensorRT framework network model and migrated to an edge computing platform on the unmanned surface vehicle, realizing recognition of on-water targets and the feasible region on the edge computing platform.

Description

Infrared shoreline segmentation and target detection fusion method for unmanned surface vehicle under dark conditions
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions.
Background
Traditional unmanned-boat water-surface sensing mainly relies on sensors carried on the boat, such as millimeter-wave radar, laser radar (LiDAR), inertial measurement units and GPS. In recent years, perception technology based on computer vision has developed rapidly. Optical images contain richer detail about target regions, so vision-based perception can distinguish water-surface targets more effectively. However, research on shoreline segmentation based on infrared thermal imaging remains very scarce, and night navigation of unmanned boats is still a great challenge. Research on unmanned-boat water-target recognition and water-area environment perception and positioning based on infrared thermal imaging is therefore particularly important.
Disclosure of Invention
In order to realize effective identification of on-water targets and feasible regions by an unmanned boat under dark conditions, the invention provides an infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle.
In order to solve the above technical problem, the invention adopts the following technical scheme: an infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions, comprising the following steps:
step S1, dataset establishment: an unmanned aerial vehicle carries one or more thermal infrared imagers, which shoot at low altitude over water-area scenes to simulate the navigation viewing angle of the unmanned surface vehicle; video of the water surface is captured and processed to obtain original images; annotation tools are used to label the original image data, yielding an infrared shoreline segmentation dataset and an infrared target detection dataset; the two datasets are each divided into a training set, a verification set and a test set according to a preset ratio;
step S2, model establishment: a DeepLabV3+ model is adopted as the shoreline segmentation network model, and a YOLOv5m model as the target detection network model; hyperparameters of the DeepLabV3+ and YOLOv5m models are set, and YOLOv5m weights pre-trained on the VOC2012 dataset are used for transfer learning; the DeepLabV3+ model is then repeatedly trained and verified on the training set of the infrared shoreline segmentation dataset, and the YOLOv5m model on the training set of the infrared target detection dataset, obtaining training weights based on the two datasets; next, the DeepLabV3+ model under the training weights is evaluated and used for prediction with the test and verification sets of the infrared shoreline segmentation dataset, and the YOLOv5m model likewise with the test and verification sets of the infrared target detection dataset; the hyperparameters of both models are continuously adjusted according to the evaluation and test results, and a PyTorch framework network model is established in which the hyperparameter-optimized DeepLabV3+ and YOLOv5m models are cascaded and fused at the decision level; the weight file of the PyTorch framework network model is converted into a weight file of a TensorRT framework network model and migrated to the edge computing platform on the unmanned boat;
step S3, model application: infrared shoreline segmentation data and infrared target detection data, obtained by processing (as in step S1) the water-area scene video shot in real time by the thermal infrared imager carried on the unmanned boat, are transmitted to the edge computing platform and processed by the TensorRT framework network model to obtain the recognition results for on-water targets and the feasible region.
Further, in step S1, dataset establishment: the thermal infrared imager of a DJI M300 unmanned aerial vehicle shoots at low altitude over water-area scenes, simulating the navigation viewing angle of the unmanned surface vehicle, and video of the water surface is captured; frame extraction, de-duplication and screening are performed on the video to obtain original images; the annotation tool Labelimg is used for rectangular-box labeling of on-water targets, and the annotation tool Labelme for polygonal-box labeling of the feasible region, yielding the infrared shoreline segmentation dataset and the infrared target detection dataset. The infrared shoreline segmentation dataset contains three categories: background, water and obstacle. The infrared target detection dataset contains five categories: ship (body), person on shore (person_shore), person on a ship (person_body), person in the water (swimming) and dolphin No. 1 (dolphin1). The two datasets are each divided at a ratio of 8:1:1 into three sub-datasets (training, verification and test sets), keeping the number of samples of each category consistent across the sub-datasets.
Further, when the DeepLabV3+ model is used as the shoreline segmentation network model:
a ResNet is adopted as the backbone feature extraction network; the encoder body decomposes the standard convolution into a depthwise convolution and a pointwise convolution, the depthwise convolution applying a spatial convolution to each channel independently and the pointwise convolution combining the depthwise outputs; in the encoder, the preliminary effective feature layer compressed four times undergoes feature extraction with parallel atrous (dilated) convolutions at different rates, the results are concatenated (concat) and compressed by a 1×1 convolution to obtain the feature map; in the decoder, the preliminary effective feature layer compressed twice has its channel count adjusted by a 1×1 convolution and is then stacked with the upsampled atrous-convolution output of the encoder; after stacking, two depthwise separable convolutions yield the final effective feature layer, one 1×1 convolution adjusts the channels to num_classes, and a final resize upsamples the output so its width and height match the input picture.
Further, when the YOLOv5m model is used as the target detection network model:
Darknet-53 is adopted as the backbone feature extraction network; feature extraction is performed on the input image by Darknet-53, and three feature layers are extracted in the feature-utilization part; each of the three feature layers undergoes five convolution operations, after which one part is used to output the prediction result corresponding to that feature layer and the other part is upsampled (UpSampling2D) and merged with other feature layers.
Preferably, Darknet-53 is composed of DarknetConv2D and residual (Residual) modules. The residual unit in Darknet-53 first performs a 3×3 convolution with stride 2 and saves that convolution layer; it then performs a 1×1 convolution and a 3×3 convolution and adds the result to the saved layer as the final output, after which a large number of residual skip connections are used. Five downsampling operations are performed with stride 2 and kernel size 3, with feature dimensions of 64, 128, 256, 512 and 1024 respectively. No average pooling layer or fully connected layer is used; L2 regularization is applied at each convolution, and batch normalization and the Leaky ReLU activation follow each convolution, where the Leaky ReLU activation function is

$$f(x)=\begin{cases}x, & x>0\\ \alpha x, & x\le 0\end{cases}$$

with $\alpha$ a small positive slope.
The YOLOv5m model performs target detection by extracting multiple feature layers, three in total, located at different depths of the Darknet-53 backbone (middle, lower-middle and bottom layers), with shapes (52,52,256), (26,26,512) and (13,13,1024) respectively; each of the three feature layers undergoes five convolution operations, after which one part outputs the prediction result corresponding to that feature layer and the other part is upsampled (UpSampling2D) and merged with other feature layers.
Still further, the hyperparameters of the DeepLabV3+ and YOLOv5m models include the size of the input image samples, the batch size, the number of iterations, the learning rate and the number of categories.
In step S2, after repeated training and verification of the DeepLabV3+ and YOLOv5m models, a cross-entropy loss function and the Adam optimizer are adopted to continuously optimize the hyperparameters of the two models, obtaining training weights based on the infrared shoreline segmentation dataset and the infrared target detection dataset.
Preferably, in step S2, when the DeepLabV3+ and YOLOv5m models under the training weights are evaluated and used for prediction:
1) Evaluation: the obtained training weights are screened, and the weights with the lowest total loss and val loss are selected as the weights of the DeepLabV3+ and YOLOv5m models; the two models are tested with the test sets of the infrared shoreline segmentation dataset and the infrared target detection dataset respectively, obtaining the mIoU of the DeepLabV3+ model and the mean average precision (mAP) of the YOLOv5m model; the hyperparameters of both models are adjusted according to the required values of the evaluation indices and the models are retrained until the requirements are met; meanwhile, train loss and val loss curves are drawn with the TensorBoard tool module under the TensorFlow framework;
2) Prediction: the DeepLabV3+ model is tested with the verification set of the infrared shoreline segmentation dataset to obtain masks, and the IoU of each category is calculated to compute the mIoU; the YOLOv5m model is tested with the test set of the infrared target detection dataset, and the precision (AP) of each target class and the mean average precision (mAP) are obtained with a mAP-plotting program.
In summary, in order to solve the problem that unmanned boats cannot navigate autonomously and intelligently under dark conditions due to insufficient illumination, the invention provides an infrared shoreline segmentation and target detection fusion method with the following features:
1. The invention adopts infrared thermal imaging to solve target visualization and data acquisition in dark environments. Frame extraction, structural-similarity de-duplication and manual screening are performed on the acquired video to build the original database. The annotation tool Labelimg is used for rectangular-box labeling in the water-target recognition task, and the annotation tool Labelme for polygonal-box labeling in the feasible-region recognition task, thereby building the infrared shoreline segmentation dataset and the infrared target detection dataset.
2. For the feasible-region recognition task, the invention trains the DeepLabV3+ network on the infrared shoreline segmentation dataset and adopts a human-in-the-loop dataset and network optimization scheme to optimize the dataset and the network weights, finally obtaining high-performance weights.
3. For the water-target recognition task, the invention trains a YOLOv5m network on the infrared target detection dataset and adopts the same human-in-the-loop dataset and network optimization scheme, finally obtaining high-performance weights.
4. The invention obtains the water-target detection weights and the feasible-region recognition weights under the PyTorch framework network model by network training, realizes online inference of the two networks on the same input image with a dual-thread architecture, performs decision-level fusion on the inference results, and realizes localization of on-water targets by combining the target detection results with the semantic segmentation results.
5. The invention converts the weight file of the PyTorch framework network model into a weight file of the TensorRT framework network model and migrates it to the edge computing platform, realizing recognition of on-water targets and the feasible region on the edge computing platform. On the edge computing platform, the target detection mAP is not lower than 92.65%, the shoreline segmentation mIoU is not lower than 74.15%, and the inference speed is above 20 FPS.
Thus, the method is mainly based on visual image processing of infrared thermal imaging: the shoreline is segmented while obstacles on the water surface are recognized, and the DeepLabV3+ model for infrared shoreline segmentation and the YOLOv5m model for target detection are deployed together on an edge computing platform, realizing real-time recognition of on-water targets and the feasible region. With this deep-learning deployment architecture, the infrared-based intelligent perception system effectively integrates information across different stages, achieving recognition of on-water targets and the feasible region with very good results and largely meeting the practical navigation needs of unmanned boats.
Drawings
FIG. 1 is a flow chart of the infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions according to the invention;
FIG. 2 is a schematic diagram of the system involved in the infrared shoreline segmentation and target detection fusion method of the invention;
FIG. 3 is a schematic diagram of the fusion of infrared shoreline segmentation and target detection in the invention;
FIG. 4 is a schematic diagram of experimental results in an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the embodiments and drawings, which are not intended to limit the scope of protection.
In view of the major challenges faced by unmanned craft in night navigation, chiefly insufficient illumination, the invention takes an infrared target recognition system as its core, makes full use of modern artificial-intelligence algorithms, continuously and iteratively optimizes the recognition performance of the deep convolutional network weights while the dataset is continuously expanded, and then completes real-time operation on the unmanned boat through application deployment. Accordingly, the invention provides an infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions, shown in FIGS. 1 and 2, which specifically comprises the following steps.
Step S1: establishing the dataset.
Thermal infrared imagers of several types, including that of the DJI M300 unmanned aerial vehicle, shoot at low altitude over water-area scenes, simulating the navigation viewing angle of the unmanned surface vehicle, and video of the water surface is captured. Frame extraction, structural-similarity de-duplication and manual screening are then applied to the video to obtain original images. The annotation tool Labelimg is used for rectangular-box labeling of on-water targets, and the annotation tool Labelme for polygonal-box labeling of the feasible region, yielding the infrared shoreline segmentation dataset and the infrared target detection dataset. The infrared shoreline segmentation dataset contains three categories: background, water and obstacle. The infrared target detection dataset contains five categories: ship (body), person on shore (person_shore), person on a ship (person_body), person in the water (swimming) and dolphin No. 1 (dolphin1). The two datasets are each divided at a ratio of 8:1:1 into three sub-datasets (training, verification and test sets), keeping the number of samples of each category consistent across the sub-datasets; a minimal split sketch follows.
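The following is a small sketch of the 8:1:1 split, assuming a flat directory of JPEG images; the directory layout and file extension are assumptions, and a stratified split (not shown) would additionally balance the per-category sample counts as required above.

```python
import random
from pathlib import Path

def split_dataset(image_dir: str, seed: int = 0):
    """Shuffle deterministically, then cut 80/10/10."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (images[:n_train],                  # training set
            images[n_train:n_train + n_val],   # verification set
            images[n_train + n_val:])          # test set
```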
It is worth noting that, since the invention adopts a deep-learning scheme, a large amount of multi-scene real data is needed as support. The water-area scenes collected in this step preferably include different waters such as ocean, inland waters, riverside and lake as experimental scenes. To improve scene richness, collection took place mainly at night and in the early morning, enriching the infrared image data across different night-time periods; five videos from different angles were collected in total, and the original dataset was obtained after frame extraction, de-duplication, data cleaning and labeling. A minimal sketch of the de-duplication step follows.
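A minimal sketch of the frame-extraction and structural-similarity de-duplication step using OpenCV and scikit-image; the frame stride and SSIM threshold are assumed example values, not quoted from the patent.

```python
import cv2
from skimage.metrics import structural_similarity as ssim

def extract_unique_frames(video_path: str, stride: int = 10,
                          ssim_thresh: float = 0.90):
    """Sample every `stride`-th frame, dropping near-duplicates by SSIM."""
    cap = cv2.VideoCapture(video_path)
    kept, last_gray, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # Keep the frame only if it differs enough from the last kept one.
            if last_gray is None or ssim(last_gray, gray) < ssim_thresh:
                kept.append(frame)
                last_gray = gray
        idx += 1
    cap.release()
    return kept
```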
In addition, for data processing, bi-histogram equalization and gamma transformation are adopted to process the images, improving overall contrast and enhancing detail, as sketched below.
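A sketch of this contrast-enhancement step, under the assumption that "bi-histogram equalization" refers to the brightness-preserving variant that equalizes the sub-histograms below and above the mean separately; the gamma value of 0.8 is an assumed example.

```python
import cv2
import numpy as np

def bi_histogram_equalize(gray: np.ndarray) -> np.ndarray:
    """Equalize the sub-histograms below and above the mean separately."""
    mean = int(gray.mean())
    out = gray.copy()
    for mask, lo, hi in ((gray <= mean, 0, mean), (gray > mean, mean + 1, 255)):
        vals = gray[mask]
        if vals.size == 0:
            continue
        hist = np.bincount(vals, minlength=256).astype(np.float64)
        cdf = hist.cumsum()
        cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1.0)
        # Map each sub-range onto itself so overall brightness is preserved.
        out[mask] = (lo + cdf[vals] * (hi - lo)).astype(np.uint8)
    return out

def gamma_transform(gray: np.ndarray, gamma: float = 0.8) -> np.ndarray:
    """Apply a gamma correction via a 256-entry look-up table."""
    lut = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    return cv2.LUT(gray, lut)
```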
Moreover, data are collected with multiple thermal imagers, compensating for the weakness of visible-light perception during night navigation. Current vision-based research schemes basically use visible-light images, but practical applications must inevitably consider night-driving safety in intelligent transport. Taking into account the different palettes of different thermal imagers, the invention collects data with different imagers in two palettes, white-hot and ironbow (iron red), so that the network model weights suit a variety of thermal imagers.
Step S2: establishing the model.
S21: a DeepLabV3+ model whose backbone feature extraction network is ResNet is adopted as the shoreline segmentation network model; the DeepLabV3+ model is compiled with a compile function, its hyperparameters are set, and the model is repeatedly trained and verified with the training and verification sets of the infrared shoreline segmentation dataset to obtain training weights based on that dataset. During training and verification, the encoder body of the DeepLabV3+ model decomposes the standard convolution into a depthwise convolution and a pointwise convolution; the depthwise convolution applies a spatial convolution to each channel independently, and the pointwise convolution combines the depthwise outputs. In the encoder, the preliminary effective feature layer compressed four times undergoes feature extraction with parallel atrous convolutions at different rates; the results are concatenated and compressed by a 1×1 convolution to obtain the feature map. In the decoder, the preliminary effective feature layer compressed twice has its channels adjusted by a 1×1 convolution and is stacked with the upsampled atrous-convolution output of the encoder; after stacking, two depthwise separable convolutions yield the final effective feature layer, one 1×1 convolution adjusts the channels to num_classes, and a final resize upsamples the output so its width and height match the input picture. A simplified sketch of this encoder-decoder flow follows.
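The following is a simplified PyTorch sketch of the flow just described, not the patent's exact implementation: parallel atrous convolutions at several rates are concatenated and compressed by a 1×1 convolution, and the decoder stacks a channel-reduced low-level feature with the upsampled encoder output before two depthwise separable convolutions and a final resize. The rates and channel widths are common DeepLabV3+ defaults, assumed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Parallel atrous convolutions, concatenated and compressed by 1x1 conv."""
    def __init__(self, in_ch: int, out_ch: int = 256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3 if r > 1 else 1,
                      padding=r if r > 1 else 0, dilation=r)
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)  # 1x1 compress

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)   # concat merge
        return self.project(feats)

class Decoder(nn.Module):
    def __init__(self, low_ch: int, num_classes: int):
        super().__init__()
        self.reduce = nn.Conv2d(low_ch, 48, 1)   # 1x1 channel adjustment
        self.fuse = nn.Sequential(               # two depthwise separable convs
            nn.Conv2d(256 + 48, 256 + 48, 3, padding=1, groups=256 + 48),
            nn.Conv2d(256 + 48, 256, 1),
            nn.Conv2d(256, 256, 3, padding=1, groups=256),
            nn.Conv2d(256, 256, 1),
        )
        self.classify = nn.Conv2d(256, num_classes, 1)  # adjust to num_classes

    def forward(self, aspp_out, low_feat, out_size):
        x = F.interpolate(aspp_out, size=low_feat.shape[2:],
                          mode="bilinear", align_corners=False)
        x = torch.cat([self.reduce(low_feat), x], dim=1)  # stack features
        x = self.classify(self.fuse(x))
        # Final resize so output width/height match the input picture.
        return F.interpolate(x, size=out_size, mode="bilinear",
                             align_corners=False)
```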
S22: a YOLOv5m model whose backbone feature extraction network is Darknet-53 is adopted as the target detection network model; the YOLOv5m model is compiled with a compile function, its hyperparameters are set, the YOLOv5m weights pre-trained on the VOC2012 dataset are used for transfer learning, and the model is then repeatedly trained and verified with the training and verification sets of the infrared target detection dataset to obtain training weights based on that dataset. During training and verification, the YOLOv5m model extracts features from the input image through Darknet-53; three feature layers are extracted in the feature-utilization part, located at different depths of the Darknet-53 backbone (middle, lower-middle and bottom layers), with shapes (52,52,256), (26,26,512) and (13,13,1024) respectively. Each feature layer undergoes five convolution operations; one part of the processed output gives the prediction result for that feature layer, and the other part is upsampled (UpSampling2D) and merged with other feature layers.
Preferably, the aforementioned Darknet-53 is composed of DarknetConv2D and residual (Residual) modules. The residual unit in Darknet-53 first performs a 3×3 convolution with stride 2 and saves that convolution layer; it then performs a 1×1 convolution and a 3×3 convolution and adds the result to the saved layer as the final output, after which a large number of residual skip connections are used. Five downsampling operations are performed with stride 2 and kernel size 3, with feature dimensions of 64, 128, 256, 512 and 1024 respectively. No average pooling layer or fully connected layer is used; L2 regularization is applied at each convolution, and batch normalization and the Leaky ReLU activation follow each convolution, where the Leaky ReLU activation function is

$$f(x)=\begin{cases}x, & x>0\\ \alpha x, & x\le 0\end{cases}$$

with $\alpha$ a small positive slope. A sketch of this unit follows.
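As an illustration of the unit just described, the following PyTorch sketch builds a DarknetConv2D block (convolution, batch normalization, LeakyReLU) and a residual unit; the channel counts and the LeakyReLU slope of 0.1 are assumed values, not quoted from the patent.

```python
import torch.nn as nn

def darknet_conv(in_ch: int, out_ch: int, k: int, stride: int = 1):
    """Conv + batch norm + LeakyReLU, the DarknetConv2D pattern."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride=stride, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

class ResidualBlock(nn.Module):
    """1x1 squeeze then 3x3 expand, added back to the saved input layer."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            darknet_conv(channels, channels // 2, 1),
            darknet_conv(channels // 2, channels, 3),
        )

    def forward(self, x):
        return x + self.body(x)  # residual skip connection

class DownsampleStage(nn.Module):
    """One of the five stride-2 downsampling stages (64..1024 channels)."""
    def __init__(self, in_ch: int, out_ch: int, num_blocks: int):
        super().__init__()
        layers = [darknet_conv(in_ch, out_ch, 3, stride=2)]
        layers += [ResidualBlock(out_ch) for _ in range(num_blocks)]
        self.stage = nn.Sequential(*layers)

    def forward(self, x):
        return self.stage(x)
```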
It should be noted that the hyperparameters of the DeepLabV3+ and YOLOv5m models in the invention include at least the size of the input image samples (input_shape), the batch size (batch_size), the number of iterations (epochs), the learning rate (lr) and the number of categories (num_class). In this embodiment the hyperparameters are set as follows:
Size of the input image samples: input_shape = 416 × 416 × 3;
Batch size: Freeze_batch_size = 8, Unfreeze_batch_size = 4 (typically a power of 2, such as 32, 64 or 128);
Number of iterations: Freeze_epochs = 50, Unfreeze_epochs = 100;
Learning rate: Freeze_lr = 1e-3, Unfreeze_lr = 1e-4;
Number of categories: num_class = 10.
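For readability, the embodiment's settings can be collected in a single configuration object; the dict below is a hypothetical convenience for scripts, not an interface defined by the patent.

```python
# Hypothetical config dict mirroring the embodiment's hyperparameters.
hyperparams = {
    "input_shape": (416, 416, 3),
    "Freeze_batch_size": 8,
    "Unfreeze_batch_size": 4,
    "Freeze_epochs": 50,
    "Unfreeze_epochs": 100,
    "Freeze_lr": 1e-3,
    "Unfreeze_lr": 1e-4,
    "num_class": 10,
}
```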
S23: the hyperparameters of the repeatedly trained and verified DeepLabV3+ and YOLOv5m models are continuously optimized with a cross-entropy loss function and the Adam optimizer, obtaining training weights based on the infrared shoreline segmentation dataset and the infrared target detection dataset. The cross-entropy loss function is a smooth function whose essence is the application of information-theoretic cross entropy to classification problems:

$$L=-\sum_{i} y_{i}\log \hat{y}_{i}$$

where $y_i$ is the ground-truth label and $\hat{y}_i$ the predicted probability of class $i$.

The Adam optimizer is an optimization method that computes an adaptive learning rate for each parameter: it stores an exponentially decaying average of past squared gradients $v_t$ and also maintains an exponentially decaying average of past gradients $m_t$:

$$m_t=\beta_1 m_{t-1}+(1-\beta_1)g_t,\qquad v_t=\beta_2 v_{t-1}+(1-\beta_2)g_t^{2}$$

where $m_t$ is the exponential moving average of the gradient, $v_t$ that of the squared gradient, and $g_t$ the gradient at time step $t$.

If $m_t$ and $v_t$ are initialized as zero vectors, they are biased towards 0, so bias-corrected estimates $\hat{m}_t$ and $\hat{v}_t$ are computed to counteract these biases:

$$\hat{m}_t=\frac{m_t}{1-\beta_1^{t}},\qquad \hat{v}_t=\frac{v_t}{1-\beta_2^{t}}$$

The gradient update rule is:

$$\theta_{t+1}=\theta_t-\frac{\eta}{\sqrt{\hat{v}_t}+\epsilon}\,\hat{m}_t$$

where $\theta_t$ are the model parameters, $\eta$ the learning rate and $\epsilon$ a small constant for numerical stability.
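A minimal PyTorch training-step sketch pairing nn.CrossEntropyLoss with torch.optim.Adam, which applies exactly the bias-corrected update above; the model and data loader are placeholders, and the learning rate follows the embodiment's unfrozen value.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, device: str = "cuda"):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Unfreeze_lr
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # cross entropy on logits
        loss.backward()
        optimizer.step()  # Adam applies the bias-corrected update rule
```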
s24, evaluating and predicting the deep labV3 Plus model and the YOLOV5m model under the training weight.
1) Evaluation: screening the obtained training weights, and selecting the weight with the lowest total loss and valloss as the weight of the deep V3 Plus model and the YOLOV5m model; and testing the deep V3 Plus model and the YOLOV5m model by using test sets of the infrared bank line segmentation data set and the infrared target detection data set respectively to obtain average precision MAP values of the mIoU and the YOLOV5m model of the deep V3 Plus model, and adjusting super parameters of the deep V3 Plus model and the YOLOV5m model according to the required value of the evaluation index, and retraining until the requirements are met. To see how many epochs have been trained to saturate and prevent overfitting (in this embodiment, 50 epochs are trained by selecting frozen network parameters and then all parameters are thawed and 50 epochs are trained), the tensorbard tool module under the tensorface framework is used to draw the train loss, val loss curves.
2) Prediction: the DeepLabV3+ model is tested with the verification set of the infrared shoreline segmentation dataset to obtain masks, and the IoU of each category is calculated to compute the mIoU; the YOLOv5m model is tested with the verification set of the infrared target detection dataset, and the precision (AP) of each target class and the mean average precision (mAP) are obtained with a mAP-plotting program. The purpose of this prediction is to verify the final effect of the method; if the effect is not ideal, the amount of training data is increased, or the hyperparameters of the models are further adjusted, until the desired effect is achieved. A sketch of the mIoU computation follows.
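A short sketch of the per-class IoU and mIoU computation used in this prediction step, assuming integer-labelled prediction and ground-truth masks of equal shape.

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Per-class intersection-over-union, averaged over classes present."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))
```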
S25: a PyTorch framework network model for unmanned-boat infrared shoreline segmentation and target detection under dark conditions is established, comprising the hyperparameter-optimized DeepLabV3+ model for infrared shoreline segmentation and the YOLOv5m model for target detection, fused at the decision level in a serial dual-thread inference mode, as shown in FIG. 3. To put the PyTorch framework network model into online use on the unmanned boat, the invention converts its weight file into a weight file of the TensorRT framework network model and migrates it to the edge computing platform on the boat; preferably, the edge computing platform is an NVIDIA Jetson embedded platform. One common conversion route is sketched below.
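A hedged sketch of the PyTorch-to-TensorRT weight conversion: export the trained model to ONNX with torch.onnx.export, then build an engine with NVIDIA's stock trtexec tool on the Jetson. The input size matches the embodiment's input_shape; the file names and the FP16 flag are assumptions.

```python
import torch

def export_to_tensorrt(model: torch.nn.Module, onnx_path: str = "model.onnx"):
    """Export the trained PyTorch network to ONNX for TensorRT engine building."""
    model.eval()
    dummy = torch.randn(1, 3, 416, 416)  # matches input_shape above
    torch.onnx.export(model, dummy, onnx_path,
                      input_names=["images"], output_names=["outputs"],
                      opset_version=12)
    # Then, on the Jetson edge platform (shell):
    #   trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
```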
It should be noted that decision-level fusion of the DeepLabV3+ and YOLOv5m models helps guarantee navigation safety in all respects: recognizing obstacles in the water poses a great challenge to the segmentation network, whereas the target detection network retains good capability in complex environments, so the target detection scheme is adopted to assist obstacle recognition. A sketch of the dual-thread fusion follows.
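A minimal sketch of the dual-thread decision-level fusion, assuming the segmentation model returns a per-pixel class map (NumPy array) and the detector returns boxes; the water-class index and the 50% overlap rule used to flag an on-water target are illustrative assumptions, not the patent's exact fusion rule.

```python
import threading

def fuse_frame(frame, seg_model, det_model, water_class: int = 1):
    """Run both networks on the same frame in parallel, then fuse results."""
    results = {}

    def run_seg():
        results["mask"] = seg_model(frame)   # per-pixel class map (H x W)

    def run_det():
        results["boxes"] = det_model(frame)  # [(x1, y1, x2, y2, cls, conf), ...]

    threads = [threading.Thread(target=run_seg), threading.Thread(target=run_det)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    mask, fused = results["mask"], []
    for (x1, y1, x2, y2, cls, conf) in results["boxes"]:
        region = mask[int(y1):int(y2), int(x1):int(x2)]
        # Flag the detection as an on-water target if most of its box is water.
        on_water = region.size > 0 and (region == water_class).mean() > 0.5
        fused.append({"box": (x1, y1, x2, y2), "cls": cls,
                      "conf": conf, "on_water": bool(on_water)})
    return fused
```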
Step S3, model application: infrared shoreline segmentation data and infrared target detection data, obtained by processing (as in step S1) the water-area scene video shot in real time by the thermal infrared imager carried on the unmanned boat, are transmitted to the edge computing platform and processed by the TensorRT framework network model to obtain the recognition results for on-water targets and the feasible region.
Next, in this embodiment, a comparison experiment of shoreline segmentation network models is performed: three segmentation networks are tried and compared in accuracy, speed and deployability. The experimental results in table 1 below show that the DeepLabV3+ model with a ResNet backbone reaches 20 FPS in recognition speed, meets practical navigation requirements, and can be deployed on the embedded platform of the unmanned boat.
TABLE 1

Network model          mIoU      FPS   Deployment
DeeplabV3 (ResNet)     0.956802  20    Deployable
DeeplabV3 (Xception)   0.953353  5     Not deployable
U-Net                  0.979853  12    Deployable
In addition, this embodiment performs a comparison experiment of target detection network models. An infrared target detection dataset of 4186 pictures in total is established and divided into training and test sets at a ratio of 8:2, giving 3347 training pictures and 839 test pictures. The target number of iteration rounds is set to 400 while the training-set loss value is monitored; training stops when the change has been extremely small for a long time, ensuring that the network is fully trained to convergence. In this experiment the default parameters were optimized according to actual requirements, as shown in table 2 below.
TABLE 2

Parameter name                     Before optimization   After optimization
Initial learning rate              0.01                  0.00816
Final OneCycleLR learning rate     0.2                   0.25725
SGD momentum                       0.937                 0.98
warmup_bias_lr                     0.1                   0.11521
Image HSV-Hue augmentation         0.015                 0.01734
Image HSV-Saturation augmentation  0.7                   0.9
Image HSV-Value augmentation       0.4                   0.44829
Box loss gain                      0.05                  0.03384
cls loss gain                      0.5                   0.6195
warmup_epochs                      3.0                   2.71044
warmup_momentum                    0.8                   0.66111
In the experiment, several network models such as SSD, YOLOv3 and YOLOv5 were evaluated, and their performance on the thermal-image water-target dataset is compared in table 3 below. FPS is used to judge the recognition speed of the target detection network models, and mean average precision (mAP) to judge recognition capability, a prediction counting as correct when the IoU between the predicted and ground-truth boxes exceeds 0.5. Considering detection accuracy, recognition speed, training time and weight size comprehensively, the results in table 3 show that choosing YOLOv5m as the base network enables fast inference on water targets and fast deployment of the model. The final effect of the experiment is shown in FIG. 4: the left image shows the white-hot infrared recognition result and the right image the ironbow (iron red) recognition result.
TABLE 3 (rendered as an image in the original publication; it compares the detection accuracy, recognition speed, training time and weight size of SSD, YOLOv3 and YOLOv5 models on the thermal-image water-target dataset)
The foregoing embodiments are preferred embodiments of the invention; the invention may also be implemented in other ways, and any obvious substitution that does not depart from the concept of the invention falls within its scope of protection.
To facilitate understanding of the improvements of the invention over the prior art, some figures and descriptions have been simplified, and some other elements have been omitted for clarity, as those of ordinary skill in the art will appreciate.

Claims (8)

1. An infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions, characterized by comprising the following steps:
step S1, dataset establishment: an unmanned aerial vehicle carries one or more thermal infrared imagers, which shoot at low altitude over water-area scenes to simulate the navigation viewing angle of the unmanned surface vehicle; video of the water surface is captured and processed to obtain original images; annotation tools are used to label the original image data, yielding an infrared shoreline segmentation dataset and an infrared target detection dataset; the two datasets are each divided into a training set, a test set and a verification set according to a preset ratio;
step S2, model establishment: a DeepLabV3+ model is adopted as the shoreline segmentation network model, and a YOLOv5m model as the target detection network model; hyperparameters of the DeepLabV3+ and YOLOv5m models are set, and YOLOv5m weights pre-trained on the VOC2012 dataset are used for transfer learning; the DeepLabV3+ model is then repeatedly trained and verified on the training set of the infrared shoreline segmentation dataset, and the YOLOv5m model on the training set of the infrared target detection dataset, obtaining training weights based on the two datasets; next, the DeepLabV3+ model under the training weights is evaluated and used for prediction with the test and verification sets of the infrared shoreline segmentation dataset, and the YOLOv5m model likewise with the test and verification sets of the infrared target detection dataset; the hyperparameters of both models are continuously adjusted according to the evaluation and test results, and a PyTorch framework network model is established in which the hyperparameter-optimized DeepLabV3+ and YOLOv5m models are cascaded and fused at the decision level; the weight file of the PyTorch framework network model is converted into a weight file of a TensorRT framework network model and migrated to the edge computing platform on the unmanned boat;
step S3, model application: infrared shoreline segmentation data and infrared target detection data, obtained by processing (as in step S1) the water-area scene video shot in real time by the thermal infrared imager carried on the unmanned boat, are transmitted to the edge computing platform and processed by the TensorRT framework network model to obtain the recognition results for on-water targets and the feasible region.
2. The infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions according to claim 1, characterized in that, in step S1, dataset establishment: the thermal infrared imager of a DJI M300 unmanned aerial vehicle shoots at low altitude over water-area scenes, simulating the navigation viewing angle of the unmanned surface vehicle, and video of the water surface is captured; frame extraction, de-duplication and screening are performed on the video to obtain original images; the annotation tool Labelimg is used for rectangular-box labeling of on-water targets, and the annotation tool Labelme for polygonal-box labeling of the feasible region, yielding the infrared shoreline segmentation dataset and the infrared target detection dataset; the infrared shoreline segmentation dataset contains three categories: background, water and obstacle; the infrared target detection dataset contains five categories: ship (body), person on shore (person_shore), person on a ship (person_body), person in the water (swimming) and dolphin No. 1 (dolphin1); the two datasets are each divided at a ratio of 8:1:1 into three sub-datasets (training, verification and test sets), keeping the number of samples of each category consistent across the sub-datasets.
3. The infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions according to claim 2, characterized in that, when the DeepLabV3+ model is used as the shoreline segmentation network model:
a ResNet is adopted as the backbone feature extraction network; the encoder body decomposes the standard convolution into a depthwise convolution and a pointwise convolution, the depthwise convolution applying a spatial convolution to each channel independently and the pointwise convolution combining the depthwise outputs; in the encoder, the preliminary effective feature layer compressed four times undergoes feature extraction with parallel atrous convolutions at different rates, the results are concatenated and compressed by a 1×1 convolution to obtain the feature map; in the decoder, the preliminary effective feature layer compressed twice has its channels adjusted by a 1×1 convolution and is stacked with the upsampled atrous-convolution output of the encoder; after stacking, two depthwise separable convolutions yield the final effective feature layer, one 1×1 convolution adjusts the channels to num_classes, and a final resize upsamples the output so its width and height match the input picture.
4. The infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions according to claim 2 or 3, characterized in that, when the YOLOv5m model is used as the target detection network model:
Darknet-53 is adopted as the backbone feature extraction network; feature extraction is performed on the input image by Darknet-53, and three feature layers are extracted in the feature-utilization part; each of the three feature layers undergoes five convolution operations, after which one part is used to output the prediction result corresponding to that feature layer and the other part is upsampled (UpSampling2D) and merged with other feature layers.
5. The infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions according to claim 4, characterized in that Darknet-53 is composed of DarknetConv2D and residual (Residual) modules; the residual unit in Darknet-53 first performs a 3×3 convolution with stride 2 and saves that convolution layer, then performs a 1×1 convolution and a 3×3 convolution and adds the result to the saved layer as the final output, after which a large number of residual skip connections are used; five downsampling operations are performed with stride 2 and kernel size 3, with feature dimensions of 64, 128, 256, 512 and 1024 respectively; no average pooling layer or fully connected layer is used; L2 regularization is applied at each convolution, and batch normalization and the Leaky ReLU activation follow each convolution, where the Leaky ReLU activation function is

$$f(x)=\begin{cases}x, & x>0\\ \alpha x, & x\le 0\end{cases}$$

with $\alpha$ a small positive slope; the YOLOv5m model performs target detection by extracting multiple feature layers, three in total, located at different depths of the Darknet-53 backbone (middle, lower-middle and bottom layers), with shapes (52,52,256), (26,26,512) and (13,13,1024) respectively; each of the three feature layers undergoes five convolution operations, after which one part outputs the prediction result corresponding to that feature layer and the other part is upsampled (UpSampling2D) and merged with other feature layers.
6. The infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions according to claim 5, characterized in that the hyperparameters of the DeepLabV3+ and YOLOv5m models include the size of the input image samples, the batch size, the number of iterations, the learning rate and the number of categories.
7. The infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions according to claim 6, characterized in that, in step S2, after repeated training and verification of the DeepLabV3+ and YOLOv5m models, a cross-entropy loss function and the Adam optimizer are adopted to continuously optimize the hyperparameters of the two models, obtaining training weights based on the infrared shoreline segmentation dataset and the infrared target detection dataset.
8. The infrared shoreline segmentation and target detection fusion method for an unmanned surface vehicle under dark conditions according to claim 7, characterized in that, in step S2, when the DeepLabV3+ and YOLOv5m models under the training weights are evaluated and used for prediction:
1) Evaluation: the obtained training weights are screened, and the weights with the lowest total loss and val loss are selected as the weights of the DeepLabV3+ and YOLOv5m models; the two models are tested with the test sets of the infrared shoreline segmentation dataset and the infrared target detection dataset respectively, obtaining the mIoU of the DeepLabV3+ model and the mean average precision (mAP) of the YOLOv5m model; the hyperparameters of both models are adjusted according to the required values of the evaluation indices and the models are retrained until the requirements are met; meanwhile, train loss and val loss curves are drawn with the TensorBoard tool module under the TensorFlow framework;
2) Prediction: the DeepLabV3+ model is tested with the verification set of the infrared shoreline segmentation dataset to obtain masks, and the IoU of each category is calculated to compute the mIoU; the YOLOv5m model is tested with the test set of the infrared target detection dataset, and the precision (AP) of each target class and the mean average precision (mAP) are obtained with a mAP-plotting program.
CN202310166583.4A 2023-02-27 2023-02-27 Infrared shoreline segmentation and target detection fusion method for unmanned surface vehicle under dark conditions Pending CN116229069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310166583.4A CN116229069A (en) 2023-02-27 2023-02-27 Infrared shoreline segmentation and target detection fusion method for unmanned surface vehicle under dark conditions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310166583.4A CN116229069A (en) 2023-02-27 2023-02-27 Infrared shoreline segmentation and target detection fusion method for unmanned surface vehicle under dark conditions

Publications (1)

Publication Number Publication Date
CN116229069A true CN116229069A (en) 2023-06-06

Family

ID=86588768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310166583.4A CN116229069A (en) Infrared shoreline segmentation and target detection fusion method for unmanned surface vehicle under dark conditions

Country Status (1)

Country Link
CN (1) CN116229069A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721346A (en) * 2023-06-14 2023-09-08 山东省煤田地质规划勘察研究院 Shore line intelligent recognition method based on deep learning algorithm
CN116721346B (en) * 2023-06-14 2024-05-07 山东省煤田地质规划勘察研究院 Shore line intelligent recognition method based on deep learning algorithm

Similar Documents

Publication Publication Date Title
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN108647655B (en) Low-altitude aerial image power line foreign matter detection method based on light convolutional neural network
WO2020156028A1 (en) Outdoor non-fixed scene weather identification method based on deep learning
CN110458844B (en) Semantic segmentation method for low-illumination scene
CN103927531B (en) It is a kind of based on local binary and the face identification method of particle group optimizing BP neural network
CN107145846A (en) A kind of insulator recognition methods based on deep learning
CN109800735A (en) Accurate detection and segmentation method for ship target
CN111079739B (en) Multi-scale attention feature detection method
CN113052834B (en) Pipeline defect detection method based on convolution neural network multi-scale features
CN112001868A (en) Infrared and visible light image fusion method and system based on generation of antagonistic network
CN114724019A (en) Remote sensing image sea ice intelligent monitoring method based on wavelet transformation and separable convolution semantic segmentation
CN113642606B (en) Marine ship detection method based on attention mechanism
CN111985274A (en) Remote sensing image segmentation algorithm based on convolutional neural network
CN113408340B (en) Dual-polarization SAR small ship detection method based on enhanced feature pyramid
CN116229069A (en) Infrared shore line segmentation and target detection fusion method for unmanned surface vehicle under dark condition
CN115410087A (en) Transmission line foreign matter detection method based on improved YOLOv4
CN113469097B (en) Multi-camera real-time detection method for water surface floaters based on SSD network
CN114358178A (en) Airborne thermal imaging wild animal species classification method based on YOLOv5 algorithm
CN113989612A (en) Remote sensing image target detection method based on attention and generation countermeasure network
CN116659516B (en) Depth three-dimensional attention visual navigation method and device based on binocular parallax mechanism
Meng et al. A modified fully convolutional network for crack damage identification compared with conventional methods
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN116820131A (en) Unmanned aerial vehicle tracking method based on target perception ViT
CN116977866A (en) Lightweight landslide detection method
CN116310892A (en) Marine rescue method based on improved YOLOV4 target detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination