CN115471746A - Ship target identification detection method based on deep learning

Ship target identification detection method based on deep learning

Info

Publication number
CN115471746A
CN115471746A
Authority
CN
China
Prior art keywords
target
network
deep learning
feature
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211030296.2A
Other languages
Chinese (zh)
Inventor
郭富海
李晨浩
王鸿显
张政
杜鹏
胡春洋
陈秀敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cssc Marine Technology Co ltd
Original Assignee
Cssc Marine Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cssc Marine Technology Co ltd filed Critical Cssc Marine Technology Co ltd
Priority to CN202211030296.2A
Publication of CN115471746A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V10/7515 Shifting the patterns to accommodate for positional errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a ship target identification and detection method based on deep learning, which comprises the following steps: (1) collecting image samples of the water area in which ships sail; (2) preprocessing and labeling the images; (3) performing data enhancement to produce a data set for training; (4) constructing a deep learning network model based on the YOLOv4 network; (5) training the deep learning network using pre-trained parameters as initial weights; (6) inputting the processed picture to be detected into the backbone network for feature extraction, performing feature fusion through the neck network, and performing non-maximum suppression to complete the prediction of the ship target; (7) performing output post-processing, filtering the results using a confidence threshold, and judging in combination with other indexes to obtain the optimal detection result. The invention improves the small-target detection capability and the multi-target classification effect in complex sea areas.

Description

Ship target identification detection method based on deep learning
Technical Field
The invention belongs to the field of target identification, and particularly relates to a ship target identification detection method based on deep learning.
Background
Target identification is an important subject in the field of intelligent transportation. With the continuous development of economic globalization, the demand for marine transportation has increased rapidly, and with the growing number of ships, the safety of marine navigation has attracted increasing attention. In order to improve the efficiency, reliability and safety of ship navigation, the shipping industry is gradually developing towards intelligent, fully automated operation such as automatic ship driving, automatic obstacle avoidance and automatic berthing and unberthing, so intelligent ships have gradually become a new research direction.
As the density of traffic flow on the water increases, the navigation environment becomes more complex, and a ship's target identification capability is directly related to the safety of sea (river) navigation. Before convolutional neural networks were applied to ship target detection, traditional ship target detection algorithms mainly relied on region selection, combined feature extraction, background texture modeling and the like. In recent years, deep learning technology has been widely used in the field of target detection, which has further improved the real-time target identification and detection capability of ships. However, actual sea navigation is often accompanied by variable natural conditions and complex activity scenes, so the target identification and detection effect in complex sea areas is not ideal. For example, clusters of small targets, dense ships of varied types, and targets blurred by sea-surface fog make detection in such sea areas highly complex and make the targets difficult to distinguish and identify accurately, placing higher requirements on ship target identification and detection capability.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, and provides a ship target identification and detection method based on deep learning, which can improve the small target detection capability and the multi-target classification effect in a complex sea area.
In order to achieve the above object, the present invention provides a ship target identification and detection method based on deep learning, which comprises the following steps: (1) collecting image samples of the water area in which ships sail; (2) preprocessing and labeling the images; (3) performing data enhancement to produce a data set for training; (4) constructing a deep learning network model based on the YOLOv4 network; (5) training the deep learning network using pre-trained parameters as initial weights; (6) inputting the processed picture to be detected into the backbone network for feature extraction, performing feature fusion through the neck network, and performing non-maximum suppression to complete the prediction of the ship target; (7) performing output post-processing, filtering the results using a confidence threshold, and judging in combination with other indexes to obtain the optimal detection result.
Further, in the step (2), labeling at least the following five sea surface targets on the acquired image by using Labelimage: bulk carriers, container ships, fishing boats, cruise ships, islands.
Further, in the step (3), a data set for training is produced using Mosaic data enhancement: groups of 4 pictures are spliced by random scaling, random cropping and random arrangement to obtain 4 new images, so that the total number of input images remains unchanged, and random occlusion is applied to the new images obtained.
Further, in the step (4), the deep learning network model includes a backbone feature extraction network, an SPP structure, a PANet multipath feature fusion structure and a Head detection structure. The backbone feature extraction network uses an RGB image of size 640 × 640 as input and, after convolution, Batch Normalization and Mish activation functions, passes through residual block structures with output sizes of (320, 320, 64), (160, 160, 128), (80, 80, 256), (40, 40, 512) and (20, 20, 1024) respectively. After feature extraction, the output of the last residual block passes through the SPP structure; after splicing, the result of the CSP and CBL structures together with the outputs of the second-to-last and third-to-last residual blocks of the backbone network are used as the inputs of the PANet structure. The PANet structure performs a series of up-sampling, down-sampling and convolution operations, performs multi-path feature fusion on the three inputs, and feeds the result into the Head. Before decoding, the Head outputs the target coordinate information of the ship, including the target frame abscissa x, ordinate y, width w, height h, the classification confidence and the target existence confidence.
Further, in the step (5), an adaptive anchor frame calculation module is introduced to automatically calculate anchor frames during training: the network outputs prediction frames on the basis of the initial anchor frames, compares them with the real frames to calculate the difference between the two, and then updates the network parameters by back-propagation and iterates.
Further, the backbone network in the step (6) is Darknet-53. A feature attention module FA is embedded into the adjusted residual structure in Darknet-53 to redistribute the feature weights in the feature-channel relationship, and 1 × 1 and 3 × 3 convolutions are added before global average pooling to realize cross-channel information integration and enhance the spatial connectivity of the ship image; the global spatial information of the feature map is then converted into a one-dimensional vector sum through global average pooling to obtain the global information of the feature map.
Further, the global average pooling formula is as follows:
Gc = (1 / (H × W)) × Σ_{i=1..H} Σ_{j=1..W} Uc(i, j)
where Gc is the vector sum after global average pooling of feature maps, H and W are the width and height of the input feature map, and Uc (i, j) is the value of the c-th channel Uc at (i, j).
Further, in the step (6), minimal black borders are adaptively added to the Mosaic-enhanced image, which is normalized, scaled to 640 × 640 and converted into an RGB picture; the normalized picture is input into the trained network to obtain the output of the Head. The output of the Head comprises three feature layers, divided into 20 × 20, 40 × 40 and 80 × 80 grids; each grid point corresponds to three anchors, and each anchor performs a centre shift and length-width scaling within its grid cell. For decoding, the prediction is first scaled according to the original size of the corresponding anchor, the length, width and position of the prediction frame relative to the normalized input image are then calculated from the grid division and the offset from the anchor centre, and finally redundant predictions are filtered according to the gray border added during normalization. After decoding, non-maximum suppression is performed, and the single target with the highest confidence is directly selected as the output.
Further, in the step (7), output region threshold filtering is first performed on the output of the step (6), so that the network is prevented from giving a prediction when no ship target exists and false detections are reduced; the confidence threshold is then used for final result filtering, i.e. ship targets whose confidence is greater than the threshold are output as the final prediction.
Further, the output region threshold comprises a width direction threshold and a height direction threshold, wherein the width direction threshold is a distance between a center coordinate of the ship target and a boundary of the picture where the ship target is located, and the height direction threshold is a ratio of the width to the height of the whole picture.
Further, the other indexes in the step (7) comprise a target boundary box, a positioning confidence coefficient and all category probability maps, and the ship target with the positioning confidence coefficient larger than a threshold value is output as a prediction result; the offset between the target bounding box and the prediction bounding box is smaller than a certain value, and the probability in all the class probability graphs is larger than the detected target.
Compared with the prior art, the invention has the beneficial effects that:
1. YOLOv4 is improved to obtain the deep learning network model Ship-YOLOv4. To address the problem of insufficient training data, Mosaic data enhancement, adaptive picture scaling optimization and K-means-clustering-based adaptive anchor frames are applied at the input end, which improves the generalization ability of the framework and avoids overfitting.
2. A feature attention module is constructed based on an attention mechanism and embedded into Darknet-53 for feature recalibration, which improves the feature extraction capability of the model in complex environments.
3. To address the problems of insufficient semantic information in low-level features and feature vanishing caused by an overly deep network during feature fusion, the PANet multi-path feature fusion structure is optimized and multi-level feature information is fused, which strengthens the association between the receptive field of the network layers and the feature extraction network and improves the small-target detection capability and multi-target classification effect in complex sea areas. Experiments on a custom data set fully verify the superiority of the method in ship target identification and detection.
Drawings
FIG. 1 is a diagram of an algorithm architecture according to one embodiment of the present invention;
FIG. 2 is a flow chart of one embodiment of the present invention;
FIG. 3 is a graph of data set analysis results according to one embodiment of the present invention;
FIG. 4 is a flow diagram of a data enhancement implementation of one embodiment of the present invention;
FIG. 5 is a network architecture diagram of one embodiment of the present invention;
FIG. 6 is a block diagram of an optimized PANET implementation according to one embodiment of the present invention;
FIG. 7 is a diagram illustrating the effectiveness of network training according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating the detection effect of one embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further explained with reference to the accompanying drawings and the specific embodiments.
As shown in fig. 1 to 8, an embodiment of the ship target identification and detection method based on deep learning of the present invention comprises the following steps: (1) collecting image samples of the water area in which ships sail; (2) preprocessing and labeling the images; (3) performing data enhancement to produce a data set for training; (4) constructing a deep learning network model based on the YOLOv4 network; (5) training the deep learning network using pre-trained parameters as initial weights; (6) inputting the processed picture to be detected into the backbone network for feature extraction, performing feature fusion through the neck network, and performing non-maximum suppression to complete the prediction of the ship target; (7) performing output post-processing, filtering the results using a confidence threshold, and judging in combination with other indexes to obtain the optimal detection result.
As shown in fig. 1, a single-stage detection and identification algorithm framework is adopted, mainly comprising a feature extraction module, a feature fusion module and a detection-classification module. The feature extraction module is constructed on the basis of an attention mechanism; the obtained feature information is fused by the optimized PANet multipath feature fusion structure; finally, the image features are passed to the detection classifier, which predicts the position of the bounding box and the class of the ship inside the box.
As shown in fig. 2, the method can be divided into seven steps: the method comprises the steps of image acquisition, image annotation, data enhancement, model construction, network training, network prediction and output post-processing.
In one embodiment, in the step (2), at least the following five sea surface targets are labeled on the acquired images by using LabelImage: bulk carriers, container ships, fishing boats, cruise ships, islands. Specifically, a numerical label in COCO format is added to each ship target image sample, for example "0" for an image without a ship target, "1" for an image containing a cruise ship, "2" for a container ship, "3" for a bulk carrier, "4" for a fishing boat and "5" for an island or reef. Only with correct labels can the trained model be guaranteed to perform well in actual operation. The normalized data contains five sea surface targets: bulk carriers, container ships, fishing boats, cruise ships, islands. The data analysis results are shown in fig. 3.
In one embodiment, in the step (3), a data set for training is produced using Mosaic data enhancement: groups of 4 pictures are spliced by random scaling, random cropping and random arrangement to obtain new images, so that the total number of input images remains unchanged, and random occlusion is applied to the images obtained. This greatly enriches the detection data set; in particular, random scaling adds many small targets, which makes the network more robust. However, Mosaic data enhancement shrinks the targets and, if overused, degrades the generalization ability of the model, so the number of pictures spliced per Mosaic is chosen to be 4. Based on the collected image information, the training data set is produced with Mosaic data enhancement; the implementation process is shown in fig. 4. The image size required for a single iteration does not need to be large, which is friendlier to single-GPU training. The training set accounts for 80% of the generated data set and the test set for 20%.
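For illustration only, the following Python sketch (using NumPy and OpenCV) shows one way the four-picture Mosaic splicing and random occlusion described above could be implemented; the scale range, occlusion size and padding value are assumptions rather than values disclosed by the patent, and box/label handling is omitted.

```python
import random
import numpy as np
import cv2

def mosaic4(images, out_size=640):
    """Splice 4 pictures into one Mosaic image by random scaling, random
    cropping and random arrangement, then apply random occlusion."""
    assert len(images) == 4
    # Random mosaic centre, kept away from the borders
    cx = int(random.uniform(0.3, 0.7) * out_size)
    cy = int(random.uniform(0.3, 0.7) * out_size)
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)
    quads = [(0, 0, cx, cy), (cx, 0, out_size, cy),                 # top row
             (0, cy, cx, out_size), (cx, cy, out_size, out_size)]   # bottom row
    for img, (x1, y1, x2, y2) in zip(images, quads):
        scale = random.uniform(0.5, 1.5)                            # random scaling
        img = cv2.resize(img, None, fx=scale, fy=scale)
        h, w = img.shape[:2]
        qw, qh = x2 - x1, y2 - y1
        left = random.randint(0, max(w - qw, 0))                    # random crop origin
        top = random.randint(0, max(h - qh, 0))
        crop = img[top:top + qh, left:left + qw]
        canvas[y1:y1 + crop.shape[0], x1:x1 + crop.shape[1]] = crop
    for _ in range(random.randint(0, 3)):                           # random occlusion
        ox = random.randint(0, out_size - 60)
        oy = random.randint(0, out_size - 60)
        canvas[oy:oy + 60, ox:ox + 60] = 114
    return canvas
```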
In one embodiment, in the step (4), the deep learning network model includes a backbone feature extraction network, an SPP structure, a PANet multipath feature fusion structure and a Head detection Head structure; the backbone feature extraction network uses an RGB image with the size of 640 x 640 as input, and passes through residual block structures with the sizes of (320, 320, 64), (160, 160, 128), (80, 80, 256), (40, 40, 512), (20, 20, 1024) after convolution, batch Normalization and Mish activation functions; after feature extraction, the output of the last residual block passes through an SPP structure, and after splicing, the result of the CSP and CBL structure and the output results of the penultimate and penultimate residual blocks of the main network are used as the input of a PANet structure; the PANet structure performs a series of up-sampling, down-sampling and convolution operations, performs multi-path feature fusion processing on three inputs, and inputs a Head; the Head outputs target coordinate information of the ship before decoding, including a target frame abscissa x, a target frame ordinate y, a target frame width w, a target frame height h, a classification confidence and a target existence confidence.
In this embodiment, with reference to the YOLOv4 network structure, the input end, backbone network, neck network and output end are improved, and a deep learning network model Ship-YOLOv4 for ship target identification and detection is constructed. The network structure is shown in fig. 5 and can be divided into three major parts: the Backbone is the feature extraction network CSPDarknet53 with the feature attention module FA introduced, the Neck is composed of an SPP and the optimized PANet multipath feature fusion structure, and the Head is the detection structure. The feature extraction network uses an RGB image of size 640 × 640 as input and, after convolution, Batch Normalization and Mish activation functions, passes through residual block structures with output sizes of (320, 320, 64), (160, 160, 128), (80, 80, 256), (40, 40, 512) and (20, 20, 1024) respectively. After feature extraction, the output of the last residual block passes through the SPP structure; after splicing, the result of the CSP and CBL structures together with the outputs of the second-to-last and third-to-last residual blocks of the backbone network are used as the inputs of the PANet structure.
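As a rough illustration of the basic backbone unit described above (convolution followed by Batch Normalization and the Mish activation), a minimal PyTorch sketch is given below; the class name and default kernel size are assumptions made purely for illustration.

```python
import torch.nn as nn

class ConvBNMish(nn.Module):
    """Conv + Batch Normalization + Mish: the basic unit of the backbone.
    Stacking such units with residual blocks takes a 640x640x3 input through
    feature maps of size (320, 320, 64), (160, 160, 128), (80, 80, 256),
    (40, 40, 512) and (20, 20, 1024), as described above."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.Mish()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```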
The PANet architecture is shown in fig. 6, where parts (a) and (b) are top-down network structures with feature fusion by lateral connection, and part (c) is a bottom-up structure that keeps the lateral connections while aggregating lower-level features. Each top-level feature is generated by fusing features from three different paths, so that in fig. 6 the low-level, intermediate and top-level features are fused with one another. A prediction result is given for each layer of features as the features are passed upwards, and the result is then fed into the Head. Before decoding, the Head outputs the x, y, w and h coordinate information of the ship target, the classification confidence and the target existence confidence.
In one embodiment, in the step (5), an adaptive anchor frame calculation module is introduced to automatically calculate anchor frames during training: the network outputs prediction frames on the basis of the initial anchor frames, compares them with the real frames to calculate the difference between the two, and then updates the network parameters by back-propagation and iterates. When the network is trained, the pre-trained parameters are used as initial weights; the pre-trained weight file is yolov4.conv.137, and the following training parameters are used: learning rate 0.001, batch 64, subdivisions 16, and a training/validation split of the data set of 0.9 and 0.1. The training strategy is to use the pre-trained parameters as initial weights, freeze the weights of the Backbone part and train the remaining parts for 50 epochs, then unfreeze all weights and train for another 50 epochs. The training equipment uses 2 Nvidia RTX 2080Ti GPUs for about 16 hours. After training for 100 epochs, the validation-set loss is 3.1534. The training effect is shown in fig. 7: during training the mAP value becomes larger and larger while the loss value becomes smaller and smaller.
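To illustrate the K-means-based adaptive anchor calculation mentioned above, a minimal NumPy sketch is given below; it clusters the width/height pairs of the labeled boxes using 1 - IoU as the distance, which is a common choice but an assumption here, since the patent does not spell out the distance metric.

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Cluster training-box (width, height) pairs into k anchor boxes.

    wh: array of shape (N, 2); returns k anchors sorted by area."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # IoU between every box and every anchor, treating boxes as co-centred
        inter = np.minimum(wh[:, None, :], anchors[None, :, :]).prod(axis=2)
        union = wh.prod(axis=1)[:, None] + anchors.prod(axis=1)[None, :] - inter
        assign = np.argmax(inter / union, axis=1)      # nearest anchor by IoU
        new = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]
```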
In one embodiment, the backbone network in the step (6) is Darknet-53; the feature attention module FA is embedded into the adjusted residual structure in Darknet-53, the feature weights in the feature-channel relationship are redistributed, and 1 × 1 and 3 × 3 convolutions are added before global average pooling to realize cross-channel information integration and enhance the spatial connectivity of the ship image; the global spatial information of the feature map is then converted into a one-dimensional vector sum through global average pooling to obtain the global information of the feature map.
In one embodiment, the global average pooling formula is as follows:
Gc = (1 / (H × W)) × Σ_{i=1..H} Σ_{j=1..W} Uc(i, j)
where Gc is the vector sum after global average pooling of feature maps, H and W are the width and height of the input feature map, and Uc (i, j) is the value of the c-th channel Uc at (i, j).
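A minimal PyTorch sketch of a feature attention module consistent with the description above (1 × 1 and 3 × 3 convolutions for cross-channel integration, global average pooling for Gc, then re-weighting of the channels) is given below; the gating network after pooling is not detailed in the patent, so the SE-style two-layer reduction used here is an assumption.

```python
import torch.nn as nn

class FeatureAttention(nn.Module):
    """FA-style channel attention: mix channels, pool to Gc, re-weight."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mix = nn.Sequential(                      # cross-channel integration
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)             # Gc: mean over H x W
        self.gate = nn.Sequential(                     # assumed SE-style gate
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        u = self.mix(x)
        b, c, _, _ = u.shape
        g = self.gap(u).view(b, c)                     # global information per channel
        w = self.gate(g).view(b, c, 1, 1)              # redistributed feature weights
        return x * w                                   # recalibrated features
```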
In the SPP structure, max pooling with kernel sizes of 5 × 5, 9 × 9 and 13 × 13 is applied while the spatial dimensions are preserved, and the feature maps from the different kernel sizes are concatenated together as the output. Compared with pure k × k max pooling, this effectively increases the receptive field of the backbone features and clearly separates out the most important context features.
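For illustration, a minimal PyTorch sketch of the SPP block described above follows; concatenating the un-pooled input alongside the pooled maps follows the usual YOLOv4 convention and is an assumption rather than something the text states explicitly.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Parallel max pooling with 5x5, 9x9 and 13x13 kernels (stride 1,
    padding k//2 so spatial size is preserved), concatenated channel-wise."""
    def __init__(self, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernels)

    def forward(self, x):
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)
```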
The CBL is the basic component of the PANet multipath feature fusion structure and is composed of three layers: an ordinary convolutional layer Conv, a normalization layer BN and an activation function layer SiLU.
The SiLU function is a variant of the Sigmoid function, with the functional form:
SiLU(x) = x × Sigmoid(x)
Sigmoid(x) = 1 / (1 + e^(-x))
concat concatenates the two tensors, expanding the dimensionality of the two tensors.
The CSP structure replaces the Resunit with an ordinary CBL and is applied to the Neck. The Backbone is a deeper network; adding a residual structure enhances the gradient values propagated backwards between layers, which avoids the vanishing gradient caused by deepening the network and allows finer-grained features to be extracted without worrying about network degradation.
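The CBL unit and the CSP structure described above can be sketched in PyTorch as follows; the number of stacked CBL units inside the CSP branch is an assumption made purely for illustration.

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """Conv + BN + SiLU, the basic unit of the Neck fusion path."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()                 # SiLU(x) = x * Sigmoid(x)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class CSPBlock(nn.Module):
    """CSP-style block used in the Neck: the input is split into two branches,
    one passing through stacked CBL units (replacing the residual unit), the
    other acting as a shortcut; both are concatenated and fused by a 1x1 CBL."""
    def __init__(self, channels, n=2):
        super().__init__()
        half = channels // 2
        self.split1 = CBL(channels, half, 1)
        self.split2 = CBL(channels, half, 1)
        self.blocks = nn.Sequential(*[CBL(half, half, 3) for _ in range(n)])
        self.fuse = CBL(channels, channels, 1)

    def forward(self, x):
        y1 = self.blocks(self.split1(x))
        y2 = self.split2(x)
        return self.fuse(torch.cat([y1, y2], dim=1))
```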
In one embodiment, in the step (6), minimal black borders are adaptively added to the Mosaic-enhanced image, which is normalized, scaled to 640 × 640 and converted into an RGB picture; the normalized picture is input into the trained network to obtain the output of the Head. The output of the Head comprises three feature layers, divided into 20 × 20, 40 × 40 and 80 × 80 grids; each grid point corresponds to three anchors, and each anchor performs a centre shift and length-width scaling within its grid cell. For decoding, the prediction is first scaled according to the original size of the corresponding anchor, the length, width and position of the prediction frame relative to the normalized input image are then calculated from the grid division and the offset from the anchor centre, and finally redundant predictions are filtered according to the gray border added during normalization. After decoding, non-maximum suppression is performed, and the single target with the highest confidence is directly selected as the output, which improves the detection speed.
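To make the decoding and suppression step concrete, a minimal PyTorch sketch for one Head feature layer is given below; it implements standard YOLO-style decoding (sigmoid centre offsets plus grid coordinates, exponential width/height scaling by the anchor) and ordinary IoU-based non-maximum suppression from torchvision, which is an assumption where the text simply says the highest-confidence target is selected. Batch size 1 is assumed when flattening.

```python
import torch
import torchvision

def decode_and_nms(head_out, anchors, stride, conf_thres=0.5, iou_thres=0.45):
    """head_out: (1, num_anchors, g, g, 5 + num_classes) raw Head output.
    anchors: (num_anchors, 2) anchor widths/heights in input pixels.
    stride:  input pixels per grid cell, e.g. 640/80, 640/40 or 640/20."""
    _, na, g, _, _ = head_out.shape
    gy, gx = torch.meshgrid(torch.arange(g), torch.arange(g), indexing="ij")
    # Centre: b_x = sigmoid(t_x) + C_x, b_y = sigmoid(t_y) + C_y (grid units -> pixels)
    xy = (head_out[..., :2].sigmoid() + torch.stack((gx, gy), dim=-1)) * stride
    # Size: b_w = p_w * exp(t_w), b_h = p_h * exp(t_h)
    wh = head_out[..., 2:4].exp() * anchors.view(1, na, 1, 1, 2)
    obj = head_out[..., 4].sigmoid()
    cls = head_out[..., 5:].sigmoid()
    score, label = (obj.unsqueeze(-1) * cls).max(dim=-1)
    boxes = torch.cat((xy - wh / 2, xy + wh / 2), dim=-1).reshape(-1, 4)
    score, label = score.reshape(-1), label.reshape(-1)
    keep = score > conf_thres                            # confidence filtering
    boxes, score, label = boxes[keep], score[keep], label[keep]
    idx = torchvision.ops.nms(boxes, score, iou_thres)   # non-maximum suppression
    return boxes[idx], score[idx], label[idx]
```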
In one embodiment, in the step (7), output region threshold filtering is first performed on the output of the step (6), so that the network is prevented from giving a prediction when no ship target exists and false detections are reduced; the confidence threshold is then used for final result filtering, i.e. ship targets whose confidence is greater than the threshold are output as the final prediction.
In one embodiment, the output region threshold includes two parts, namely a width direction threshold and a height direction threshold, wherein the width direction threshold is the distance between the center coordinate of the ship target and the boundary of the picture, and the height direction threshold is the ratio of the width to the height of the whole picture.
In one embodiment, the other indexes in the step (7) comprise an object boundary box, a positioning confidence level and a probability map of all categories, and the ship object with the positioning confidence level larger than a threshold value is output as a prediction result; the offset between the target bounding box and the prediction bounding box is smaller than a certain value, and the probability in all the category probability graphs is larger than the detected target.
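A minimal Python sketch of the output post-processing described in these paragraphs (output-region threshold filtering followed by confidence filtering) is given below; the function name, detection format and border-margin ratios are illustrative assumptions, since the patent does not disclose concrete threshold values.

```python
def filter_predictions(dets, img_w, img_h, conf_thres=0.5,
                       w_margin_ratio=0.02, h_margin_ratio=0.02):
    """dets: iterable of dicts like {"box": (x1, y1, x2, y2), "score": s, "cls": c}.
    Drops detections whose centre lies inside a border margin of the picture
    (output-region threshold), then keeps only detections above conf_thres."""
    w_margin = w_margin_ratio * img_w
    h_margin = h_margin_ratio * img_h
    kept = []
    for det in dets:
        x1, y1, x2, y2 = det["box"]
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        inside = (w_margin < cx < img_w - w_margin and
                  h_margin < cy < img_h - h_margin)
        if inside and det["score"] > conf_thres:
            kept.append(det)
    return kept
```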
In the embodiment of the invention, Ship-YOLOv4 does not need to generate regions of interest in advance; the network can be trained directly in a regression manner. The bounding boxes of the training samples are clustered with the K-means algorithm, 3 groups of predefined bounding boxes are preset for each of the 3 scales, and subsequent positioning prediction is based on these 9 bounding boxes. First, feature extraction is performed on the original 640 × 640 input image through the feature extraction network, and the feature vectors are then fed into the SPP and PANet structures to generate 3 grid scales of 20 × 20, 40 × 40 and 80 × 80. Three bounding boxes are predicted per grid cell, giving (80 × 80 + 40 × 40 + 20 × 20) × 3 = 25200 bounding boxes, and for each bounding box a vector N is predicted. The composition of the vector N is as follows:
N = (t_x + t_y + t_w + t_h) + N_0 + (N_1 + N_2 + … + N_n)
t_x and t_y denote the horizontal and vertical coordinates of the bounding box, and t_w and t_h denote the width and height of the bounding box. N_0 … N_n represent the probability values for the objects in the prediction box.
The distance from the centre of the final predicted bounding box to the upper left corner of the feature map and the length and width of the predicted bounding box are calculated as follows:
b_x = δ(t_x) + C_x
b_y = δ(t_y) + C_y
b_w = p_w × e^(t_w)
b_h = p_h × e^(t_h)
where δ denotes the Sigmoid function, C_x and C_y denote the offset of the grid cell to which the bounding box belongs relative to the upper left corner of the picture, p_w and p_h denote the width and height of the predefined bounding box, b_x and b_y denote the distance from the centre of the final predicted bounding box to the upper left corner of the picture, and b_w and b_h denote the width and height of the predicted bounding box.
In training, CIOU_Loss is used as the loss for the target bounding box; it takes the aspect ratio of the prediction box and the target box into account. The CIOU_Loss formula is as follows:
CIOU_Loss = 1 - IoU + ρ²(b, b_gt) / c² + α × v
where ρ²(b, b_gt) is the squared distance between the centre points of the prediction box and the target box, and c is the diagonal length of the smallest box enclosing both.
α is a weight parameter defined as:
α = v / ((1 - IoU) + v)
v is a measure of the similarity of aspect ratios, defined as:
v = (4 / π²) × (arctan(w_gt / h_gt) - arctan(w / h))²
In optimizing CIOU_Loss, the partial derivatives of v with respect to w and h need to be defined, namely:
∂v/∂w = (8 / π²) × (arctan(w_gt / h_gt) - arctan(w / h)) × h / (w² + h²)
∂v/∂h = -(8 / π²) × (arctan(w_gt / h_gt) - arctan(w / h)) × w / (w² + h²)
Since w and h are normalized to [0, 1], w² + h² is usually very small and may lead to gradient explosion; to avoid this problem, the factor 1 / (w² + h²) is replaced by 1 in the implementation.
In summary, CIOU_Loss regresses the target box while taking three important geometric factors into account: the overlap area, the centre-point distance and the aspect ratio.
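For completeness, a PyTorch sketch of CIOU_Loss matching the formulas above is given below for boxes in (x1, y1, x2, y2) format; letting autograd differentiate v directly, instead of replacing the 1/(w² + h²) factor by 1 as described above, is a simplification made for readability.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """pred, target: tensors of shape (N, 4) in (x1, y1, x2, y2) format."""
    # Intersection-over-union of the two boxes
    iw = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(min=0)
    ih = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(min=0)
    inter = iw * ih
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Squared centre distance and squared diagonal of the smallest enclosing box
    cpx, cpy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    ctx, cty = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (cpx - ctx) ** 2 + (cpy - cty) ** 2
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term v and its weight alpha
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps)) - torch.atan(wp / (hp + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return (1 - iou + rho2 / c2 + alpha * v).mean()
```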
As shown in fig. 8, the method of the embodiment of the present invention can identify the target information of bulk carriers, container ships, fishing boats, cruise ships and islands, and the performance and the identification accuracy are significantly improved compared with YOLOv4.
Finally, it should be noted that: although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (11)

1. A ship target identification detection method based on deep learning is characterized by comprising the following steps:
(1) Acquiring an image sample of a ship navigation water area;
(2) Preprocessing and labeling the image;
(3) Performing data enhancement to make a data set for training;
(4) Constructing a deep learning network model based on a YOLOv4 network;
(5) Training a deep learning network by using the pre-trained parameters as initial weights;
(6) Inputting the processed picture to be detected into a backbone network for feature extraction, performing feature fusion through a neck network, and performing non-maximum suppression operation to complete the prediction of the ship target;
(7) Performing output post-processing, performing result filtering by using a confidence threshold, and judging by combining other indexes to obtain an optimal detection result.
2. The vessel target recognition detection method based on deep learning of claim 1, wherein in the step (2), labeling at least the following five sea surface targets is performed on the acquired image by using Labelimage: bulk carriers, container ships, fishing boats, cruise ships, islands.
3. The vessel target recognition detection method based on deep learning according to claim 1, wherein in the step (3), Mosaic data enhancement is used to produce a data set for training: groups of 4 pictures are spliced by random scaling, random cropping and random arrangement to obtain 4 new images, so that the total number of input images remains unchanged, and random occlusion is applied to the new images obtained.
4. The deep learning based ship target identification detection method according to claim 3, wherein in the step (4), the deep learning network model comprises a backbone feature extraction network, an SPP structure, a PANet multipath feature fusion structure and a Head detection structure; the backbone feature extraction network uses an RGB image of size 640 × 640 as input and, after convolution, Batch Normalization and Mish activation functions, passes through residual block structures with output sizes of (320, 320, 64), (160, 160, 128), (80, 80, 256), (40, 40, 512) and (20, 20, 1024) respectively; after feature extraction, the output of the last residual block passes through the SPP structure, and after splicing, the result of the CSP and CBL structures together with the outputs of the second-to-last and third-to-last residual blocks of the backbone network are used as the inputs of the PANet structure; the PANet structure performs a series of up-sampling, down-sampling and convolution operations, performs multi-path feature fusion on the three inputs, and feeds the result into the Head; before decoding, the Head outputs the target coordinate information of the ship, including the target frame abscissa x, ordinate y, width w, height h, the classification confidence and the target existence confidence.
5. The vessel target recognition detection method based on deep learning of claim 1, wherein in the step (5), an adaptive anchor frame calculation module is introduced to automatically calculate anchor frames during training: the network outputs prediction frames on the basis of the initial anchor frames, compares them with the real frames to calculate the difference between the two, and then reversely updates and iterates the network parameters.
6. The vessel target identification detection method based on deep learning of claim 1, wherein the backbone network in step (6) is Darknet-53; a feature attention module FA is embedded in the adjusted residual structure in Darknet-53, the feature weights in the feature-channel relationship are redistributed, and 1 × 1 and 3 × 3 convolutions are added before global average pooling to realize cross-channel information integration and enhance the spatial connectivity of the vessel image; the global spatial information of the feature map is then converted into a one-dimensional vector sum through global average pooling to obtain the global information of the feature map.
7. The deep learning-based ship target identification detection method according to claim 6, wherein the global average pooling formula is as follows:
Gc = (1 / (H × W)) × Σ_{i=1..H} Σ_{j=1..W} Uc(i, j)
where Gc is the vector sum after global average pooling of feature maps, H and W are the width and height of the input feature map, and Uc (i, j) is the value of the c-th channel Uc at (i, j).
8. The vessel target recognition detection method based on deep learning of claim 4, wherein in the step (6), minimal black borders are adaptively added to the Mosaic-enhanced image, which is normalized, scaled to 640 × 640 and converted into an RGB picture; the normalized picture is input into the trained network to obtain the output of the Head; the output of the Head comprises three feature layers, divided into 20 × 20, 40 × 40 and 80 × 80 grids, each grid point corresponding to three anchors, and each anchor performs a centre shift and length-width scaling within its grid cell; for decoding, the prediction is first scaled according to the original size of the corresponding anchor, then the length, width and position of the prediction frame relative to the normalized input image are calculated from the grid division and the offset from the anchor centre, and finally redundant predictions are filtered according to the gray border added during normalization; after decoding, non-maximum suppression is performed, and the single target with the highest confidence is directly selected as the output.
9. The vessel target recognition detection method based on deep learning of claim 4, wherein in the step (7), output region threshold filtering is first performed on the output of the step (6), so that the network is prevented from giving a prediction when no vessel target exists and false detections are reduced; the confidence threshold is then used for final result filtering, i.e. vessel targets whose confidence is greater than the threshold are output as the final prediction.
10. The vessel target recognition detection method based on deep learning of claim 9, wherein the output region threshold includes two parts, namely a width direction threshold and a height direction threshold, wherein the width direction threshold is a distance from a center coordinate of the vessel target to a boundary of a picture where the vessel target is located, and the height direction threshold is a ratio of a width to a height of the whole picture.
11. The vessel target recognition detection method based on deep learning of claim 1, wherein the other indexes in step (7) include a target bounding box, a positioning confidence level, and all class probability maps, and the vessel target with the positioning confidence level greater than a threshold is output as a prediction result; the offset between the target bounding box and the prediction bounding box is smaller than a certain value, and the probability in all the category probability graphs is larger than the detected target.
CN202211030296.2A 2022-08-26 2022-08-26 Ship target identification detection method based on deep learning Pending CN115471746A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211030296.2A CN115471746A (en) 2022-08-26 2022-08-26 Ship target identification detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211030296.2A CN115471746A (en) 2022-08-26 2022-08-26 Ship target identification detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN115471746A true CN115471746A (en) 2022-12-13

Family

ID=84370529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211030296.2A Pending CN115471746A (en) 2022-08-26 2022-08-26 Ship target identification detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN115471746A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152580A (en) * 2023-04-18 2023-05-23 江西师范大学 Data processing detection method and data training method for small targets in complex scene
CN116152580B (en) * 2023-04-18 2023-08-15 江西师范大学 Data training method for small target in complex scene
CN116503737A (en) * 2023-05-10 2023-07-28 中国人民解放军61646部队 Ship detection method and device based on space optical image
CN116503737B (en) * 2023-05-10 2024-01-09 中国人民解放军61646部队 Ship detection method and device based on space optical image
CN117058081A (en) * 2023-08-02 2023-11-14 苏州弗莱威智能科技有限公司 Corner and surface defect detection method for photovoltaic glass

Similar Documents

Publication Publication Date Title
Chen et al. A deep neural network based on an attention mechanism for SAR ship detection in multiscale and complex scenarios
Zhang et al. Balance learning for ship detection from synthetic aperture radar remote sensing imagery
CN110084234B (en) Sonar image target identification method based on example segmentation
CN115471746A (en) Ship target identification detection method based on deep learning
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
CN111079739B (en) Multi-scale attention feature detection method
CN117253154B (en) Container weak and small serial number target detection and identification method based on deep learning
CN109359661B (en) Sentinel-1 radar image classification method based on convolutional neural network
CN110991257B (en) Polarized SAR oil spill detection method based on feature fusion and SVM
CN111723632B (en) Ship tracking method and system based on twin network
CN113420759B (en) Anti-occlusion and multi-scale dead fish identification system and method based on deep learning
CN113743322A (en) Offshore ship detection method based on improved YOLOv3 algorithm
CN113743505A (en) Improved SSD target detection method based on self-attention and feature fusion
Zhou et al. YOLO-ship: a visible light ship detection method
CN114565824A (en) Single-stage rotating ship detection method based on full convolution network
Chen et al. Orientation-aware ship detection via a rotation feature decoupling supported deep learning approach
Liu et al. An approach to ship target detection based on combined optimization model of dehazing and detection
CN117036656A (en) Water surface floater identification method under complex scene
Ruan et al. Dual-Path Residual “Shrinkage” Network for Side-Scan Sonar Image Classification
CN116863293A (en) Marine target detection method under visible light based on improved YOLOv7 algorithm
Zhou et al. A real-time scene parsing network for autonomous maritime transportation
CN114255385B (en) Optical remote sensing image ship detection method and system based on sensing vector
Li et al. Research on ROI algorithm of ship image based on improved YOLO
CN115496998A (en) Remote sensing image wharf target detection method
Cai et al. Obstacle Detection of Unmanned Surface Vessel based on Faster RCNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination