CN114241189A - Ship black smoke identification method based on deep learning - Google Patents

Ship black smoke identification method based on deep learning

Info

Publication number
CN114241189A
Authority
CN
China
Prior art keywords
module
cbm
black smoke
output
convolution kernels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111441778.2A
Other languages
Chinese (zh)
Inventor
胡里阳
叶智锐
邵宜昌
王超
张耀玉
吴浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202111441778.2A
Publication of CN114241189A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks


Abstract

The invention discloses a ship black smoke identification method based on deep learning, which comprises the following steps: S1, constructing a data set; S2, preprocessing the data set; S3, constructing a ship black smoke recognition model; and S4, real-time monitoring. The method is based on an improved YOLO v4 network model: by modifying the loss function to increase the loss weights of difficult samples, it overcomes the identification difficulty caused by the imbalance of black smoke sample sets. The method maintains a high identification speed while achieving high identification precision, meets the requirements of the relevant management departments on the accuracy and real-time performance of ship black smoke identification, and is suitable for both picture detection and video detection. In addition, the hybrid mosaic data amplification method provided by the invention allows the network model to be trained stably and quickly in a single-GPU environment, saving the computing resources and training time of the target recognition algorithm, and has great industrial production and popularization value.

Description

Ship black smoke identification method based on deep learning
Technical Field
The invention relates to a ship black smoke identification method based on deep learning, and belongs to the technical field of ship black smoke identification.
Background
Black smoke from a ship is caused by insufficient combustion of the diesel oil injected while the diesel engine works. It increases machine wear to a certain extent and shortens the service life of the diesel engine, while the large amount of carbon deposit produced seriously affects air quality at sea, so ships need to be identified accurately and in real time from the viewpoint of energy conservation and environmental protection. Current target identification methods mainly comprise traditional pixel-level image processing methods and emerging deep learning methods. Taking the ViBe algorithm as an example, a traditional method can only detect that an object is moving but cannot determine what the moving object actually is; in addition, it differs greatly from deep learning methods in detection precision, so most industrial target identification tasks adopt algorithms based on deep learning. Target recognition algorithms based on deep learning mainly comprise two major classes. One class is two-stage: the method first generates a number of candidate regions, and then corrects and classifies each candidate region; typical algorithms mainly comprise the R-CNN series, such as Fast R-CNN, Faster R-CNN and the like. The other class is one-stage: the method completes the whole identification process by sending a picture into the model only once; typical algorithms are the YOLO series (v1, v2, v3, v4), SSD and the like.
Generally, one-stage algorithms have a higher recognition speed than two-stage ones but are inferior in recognition accuracy, and few algorithms can currently strike a good balance between the two. The YOLO v4 network model proposed by Alexey Bochkovskiy in 2020 ensures accuracy while offering excellent recognition speed and can achieve real-time performance, but the network is mainly designed to recognize common categories in daily life and struggles with the special category of black smoke. In addition, recognition methods based on deep learning need a large amount of training data for support, with high requirements on both the number and the diversity of samples; black smoke samples are relatively few and not diverse enough, and existing methods face many technical problems in recognizing such an unbalanced sample set.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a ship black smoke identification method based on deep learning that can overcome the difficulty of unbalanced black smoke ship training samples, solve the problem of automatic detection of ships in violation, and achieve real-time and accurate identification of underway black smoke ships, thereby helping the relevant management departments reduce manpower, improve law enforcement efficiency, and respond to the national call for air pollution prevention and control.
The invention adopts the following technical scheme for solving the technical problems:
a ship black smoke identification method based on deep learning comprises the following steps:
step 1, acquiring an original data set, wherein the original data set contains not less than 2000 valid pictures, the resolution of each picture is not less than 608 × 608, and the pictures are numbered in sequence;
step 2, preprocessing the original data set to obtain a preprocessed data set;
step 3, constructing a ship black smoke recognition network model, and training the ship black smoke recognition network model by using the preprocessed data set to obtain a trained ship black smoke recognition network model;
and step 4, inputting pictures or videos captured in real time into the trained ship black smoke recognition network model, outputting the detection result, and judging from the detection result whether a ship is illegally discharging black smoke.
As a preferred embodiment of the present invention, the specific process of acquiring the original data set in step 1 is as follows:
step 1-1, capturing ship pictures with a pan-tilt camera to form a snapshot data set, wherein the snapshot data set comprises ship pictures with black smoke and ship pictures without black smoke;
step 1-2, crawling black smoke related pictures from a picture website by using a web crawler technology, and manually screening to form an amplification data set;
the snapshot data set and the amplification data set together constitute the original data set.
As a preferred embodiment of the present invention, the specific process of step 2 is as follows:
step 2-1, performing enhancement transformation on the original data set by image enhancement to obtain an enhanced data set, wherein the image enhancement modes comprise random flipping, random cropping, random erasing, random brightness adjustment, random blurring and histogram equalization;
step 2-2, manually annotating the enhanced data set with the labelImg image annotation tool, marking the positions of the ship and the black smoke in each picture with bounding boxes and assigning the corresponding class labels;
and 2-3, amplifying the enhanced data set by using a mixed mosaic amplification method, and fusing manual labeling frames to obtain a preprocessed data set.
As a preferred scheme of the invention, the specific process of the step 2-3 is as follows:
firstly, randomly selecting two pictures from the enhanced data set and superposing them at a transparency of 0.5 to form a new picture set, wherein the number of pictures in the new picture set is

$$C_n^2=\frac{n(n-1)}{2}$$

where n is the number of pictures in the enhanced data set; then four pictures are randomly selected from the new picture set and spliced together to synthesize a new picture, which is scaled to 608 × 608 size; finally the manual labeling frames are fused to obtain the preprocessed data set, wherein the number of pictures in the preprocessed data set is

$$C_{n(n-1)/2}^4$$
As a preferred embodiment of the present invention, the specific process of step 3 is as follows:
step 3-1, configuring the running environment of the ship black smoke recognition network model YOLO v4;
step 3-2, building the ship black smoke recognition network model YOLO v4, wherein the model comprises a feature extraction backbone network CSPDarknet53, a neck network SPP + PANnet and a head network YOLO head;
step 3-3, loading a pre-training weight file into YOLO v4 to obtain a pre-trained ship black smoke recognition network model;
step 3-4, modifying the regression loss part in the loss function of the pre-trained ship black smoke recognition network model into the CIOU loss, and adding an attenuation coefficient to the classification loss part to turn it into a focal classification loss aimed at unbalanced samples, obtaining the improved ship black smoke recognition network model, wherein the improved loss function is calculated as follows:
$$
\begin{aligned}
Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[1-IoU\left(B_{ij},B_i\right)+\frac{\rho^2\left(A_{ctr},B_{ctr}\right)}{c^2}+\alpha\upsilon\right]\\
&-\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[\hat{C}_i\log C_i+\left(1-\hat{C}_i\right)\log\left(1-C_i\right)\right]\\
&-\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{noobj}\left[\hat{C}_i\log C_i+\left(1-\hat{C}_i\right)\log\left(1-C_i\right)\right]\\
&-\sum_{i=0}^{S^2}\sum_{c\in classes}\left(1-p_i(c)\right)^{\gamma}\left[\hat{p}_i(c)\log p_i(c)+\left(1-\hat{p}_i(c)\right)\log\left(1-p_i(c)\right)\right]
\end{aligned}
$$

in the formula, the first row is the CIOU loss between the regression box and the real box (ground truth box); the second and third rows are the confidence losses with and without a detected object, respectively; and the fourth row is the focal classification loss of each category;

$S^2$ indicates that the input picture is split into $S^2$ detection grids, each grid generating $B$ possible prediction boxes, and $\lambda_{coord}$ represents the weight occupied by the CIOU loss part; the indicator $I_{ij}^{obj}$ takes the value 1 if the $j$th prediction box of the $i$th grid is responsible for some detection object obj, and 0 otherwise; $IoU$ denotes the ratio of the intersection to the union of the $j$th prediction box $B_{ij}$ of the $i$th grid and the real box $B_i$ existing in the $i$th grid; $\rho\left(A_{ctr},B_{ctr}\right)$ denotes the Euclidean distance between the center point $A_{ctr}$ of the $j$th prediction box of the $i$th grid and the center point $B_{ctr}$ of the real box existing in the $i$th grid; $c$ is the diagonal length of the minimum bounding box formed by the real box contained in the $i$th grid and the $j$th prediction box of the $i$th grid; $\upsilon$ represents a parameter used to measure the aspect ratio, and $\alpha$ represents the weight coefficient of $\upsilon$; the specific calculation formulas are as follows:

$$\upsilon=\frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}}-\arctan\frac{w}{h}\right)^{2},\qquad \alpha=\frac{\upsilon}{1-IoU+\upsilon}$$

wherein $w^{gt}$ and $h^{gt}$ respectively represent the width and height of the real box, and $w$ and $h$ respectively represent the width and height of the prediction box;

the confidence loss adopts a binomial cross-entropy loss function, where $C_i$ and $\hat{C}_i$ respectively represent the true and predicted confidence of the detected object, $\lambda_{noobj}$ is the weight of the confidence loss when no detected target is contained, and the indicator $I_{ij}^{noobj}$ takes the value 1 if the $j$th prediction box of the $i$th grid is not responsible for any detection object obj, and 0 otherwise;

the classification loss adopts a binomial cross-entropy loss function, where classes represents the number of classes of the detection task, $p_i(c)$ is the probability that the $i$th grid belongs to the $c$th category, and $\gamma$ is an attenuation coefficient greater than 1;
and step 3-5, inputting the preprocessed data set into the improved ship black smoke recognition network model, training with a random mini-batch gradient descent algorithm, updating the weights of the neurons in each layer of the network, and stopping when the loss function value no longer decreases, obtaining the trained ship black smoke recognition network model.
As a preferred scheme of the present invention, the feature extraction backbone network CSPDarknet53 in step 3-2 includes the following modules connected in sequence: a first CBM module, a second CBM module, a first CSP module, a third CBM module, a second CSP module, a fourth CBM module, a third CSP module, a fifth CBM module, a fourth CSP module, a sixth CBM module and a fifth CSP module;
each CBM module is formed by connecting a 2D convolution layer, a batch normalization layer and a Mish activation function layer in series; the first to sixth CBM modules respectively contain 32 3 × 3 convolution kernels, 64 3 × 3 convolution kernels, 128 3 × 3 convolution kernels, 256 3 × 3 convolution kernels, 512 3 × 3 convolution kernels and 1024 3 × 3 convolution kernels;
the input of the first CSP module passes through 1 CBM module, 1 residual unit and 1 CBM module in sequence to obtain a first output; meanwhile, the input of the first CSP module passes through 1 CBM module to obtain a second output; the first output and the second output are concatenated as the output of the first CSP module; the 3 CBM modules in the first CSP module each contain 64 1 × 1 convolution kernels; the input of the residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in the residual unit contain 32 1 × 1 convolution kernels and 64 3 × 3 convolution kernels, respectively;
the input of the second CSP module passes through 1 CBM module, 2 residual units and 1 CBM module in sequence to obtain a third output; meanwhile, the input of the second CSP module passes through 1 CBM module to obtain a fourth output; the third output and the fourth output are concatenated as the output of the second CSP module; the 3 CBM modules in the second CSP module each contain 64 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 64 1 × 1 convolution kernels and 64 3 × 3 convolution kernels, respectively;
the input of the third CSP module passes through 1 CBM module, 8 residual units and 1 CBM module in sequence to obtain a fifth output; meanwhile, the input of the third CSP module passes through 1 CBM module to obtain a sixth output; the fifth output and the sixth output are concatenated as the output of the third CSP module; the 3 CBM modules in the third CSP module each contain 128 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 128 1 × 1 convolution kernels and 128 3 × 3 convolution kernels, respectively;
the input of the fourth CSP module passes through 1 CBM module, 8 residual units and 1 CBM module in sequence to obtain a seventh output; meanwhile, the input of the fourth CSP module passes through 1 CBM module to obtain an eighth output; the seventh output and the eighth output are concatenated as the output of the fourth CSP module; the 3 CBM modules in the fourth CSP module each contain 256 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 256 1 × 1 convolution kernels and 256 3 × 3 convolution kernels, respectively;
the input of the fifth CSP module passes through 1 CBM module, 4 residual units and 1 CBM module in sequence to obtain a ninth output; meanwhile, the input of the fifth CSP module passes through 1 CBM module to obtain a tenth output; the ninth output and the tenth output are concatenated as the output of the fifth CSP module; the 3 CBM modules in the fifth CSP module each contain 512 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 512 1 × 1 convolution kernels and 512 3 × 3 convolution kernels, respectively.
As a preferred embodiment of the present invention, the SPP module in the neck network SPP + PANnet in step 3-2 includes three parallel maximum pooling layers with pooling kernel sizes of 13 × 13, 9 × 9 and 5 × 5 and a moving step of 1; the output of the SPP module is the concatenation of the outputs of the three parallel maximum pooling layers with the input of the SPP module.
As a preferred embodiment of the present invention, the PANnet module in the neck network SPP + PANnet in step 3-2 includes four combination modules:
(1) combination module 1: composed of 3 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, 1 SPP module, 3 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, 1 upsampling layer, 1 connection layer and 5 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3, 1 × 1, 3 × 3 and 1 × 1;
(2) combination module 2: composed of 1 upsampling layer, a CBL module containing 128 1 × 1 convolution kernels and 1 connection layer;
(3) combination module 3: composed of 1 downsampling layer, a CBL module containing 256 3 × 3 convolution kernels and 1 connection layer;
(4) combination module 4: composed of 1 downsampling layer, a CBL module containing 512 3 × 3 convolution kernels and 1 connection layer;
each CBL module is formed by connecting a 2D convolution layer, a batch normalization layer and a Leaky_relu activation function layer in series;
the feature map scale output by combination module 2 is 76 × 76; the feature map scale output by combination module 3 is 38 × 38; the feature map scale output by combination module 4 is 19 × 19.
As a preferred solution of the present invention, the head network YOLO head in step 3-2 consists of CBL modules with 1 × 1 and 3 × 3 convolution kernels applied alternately three times, followed by one Conv2D convolution layer; the outputs of the head network are feature maps at the three scales 76 × 76, 38 × 38 and 19 × 19; each CBL module is formed by connecting a 2D convolution layer, a batch normalization layer and a Leaky_relu activation function layer in series.
As a preferred aspect of the present invention, the initial hyper-parameter setting of the ship black smoke recognition network model is as follows: the learning rate is 0.001, the learning rate attenuation rate is 0.0005, and the batch size batch _ size is 64.
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
1. the improved YOLO v4 network model is adopted, the difficulty of uneven black smoke sample sets can be overcome, the recognition precision is improved, the recognition speed is improved, the requirements of management departments on the accuracy and the real-time performance of black smoke detection are met, and the method is suitable for picture detection and video detection.
2. The hybrid mosaic data amplification method adopted by the invention can ensure that the network model can be stably and quickly trained in a single GPU environment, saves the computing resources and the training time of a target recognition algorithm, and has industrial popularization value.
Drawings
FIG. 1 is a flow chart of a ship black smoke identification method based on deep learning of the invention;
FIG. 2 is a diagram illustrating an exemplary hybrid mosaic amplification method according to the present invention;
FIG. 3 is a block diagram of various modules in a ship black smoke identification network constructed by the invention;
FIG. 4 is a diagram of the YOLO v4 network architecture according to the present invention;
fig. 5 is a graph showing the result of identifying ship black smoke, in which (a) and (b) are two different scenes, respectively.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As shown in fig. 1, a flowchart of a ship black smoke recognition method based on deep learning according to the present invention includes the following specific steps:
s1, constructing a data set:
s1-1, snapshot data set: ship pictures captured by a pan-tilt camera form the snapshot data set, which comprises ship pictures with black smoke and ship pictures without black smoke;
s1-2, amplified data set: because the number of black smoke samples is relatively small, a number of black-smoke-related pictures are crawled from picture websites by web crawler technology and manually screened to form the amplification data set;
the snapshot data set and the amplification data set jointly form the original data set, which contains not less than 2000 valid pictures stored under the JPEGImages folder directory; the resolution of each picture is not less than 608 × 608, and the pictures are numbered in sequence. The original data set of this embodiment has a total of 2505 valid pictures.
S2, preprocessing of the data set:
s2-1, image enhancement: enhancement transformations are applied to the original data set to increase the diversity of the training samples; the image enhancement modes adopted comprise random flipping, random cropping, random erasing, random brightness adjustment, random blurring and histogram equalization, where the purpose of histogram equalization is to enhance the contrast between different objects in an image and help the model learn the detail and color information of the target;
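A minimal sketch of this enhancement step, assuming OpenCV and NumPy are available; the function name and probabilities are illustrative, and random cropping/erasing are omitted for brevity:

```python
import random
import cv2
import numpy as np

def enhance(img: np.ndarray) -> np.ndarray:
    """Apply a random subset of the augmentations named above."""
    if random.random() < 0.5:  # random flipping (horizontal)
        img = cv2.flip(img, 1)
    if random.random() < 0.5:  # random brightness adjustment
        shift = random.randint(-40, 40)
        img = np.clip(img.astype(np.int16) + shift, 0, 255).astype(np.uint8)
    if random.random() < 0.5:  # random blurring
        img = cv2.GaussianBlur(img, (5, 5), 0)
    # histogram equalization on the luma channel to raise contrast
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```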
s2-2, image annotation: the enhanced data set is manually annotated with the labelImg image annotation tool, marking the positions of the ship and the black smoke in each picture and assigning the corresponding class labels; the generated xml annotation files are finally stored under the Annotations folder directory;
s2-3, the enhanced data set is amplified by the mixed mosaic (Mix Mosaic) amplification method, and the manual labeling frames are fused;
the mixed mosaic amplification (Mix Mosaic) method combines the traditional Mix up method with the newer Mosaic method, and mainly comprises the following steps: firstly, two pictures are randomly selected from the data set and superposed at a transparency of 0.5 to form a new picture set; then four pictures are randomly selected from the superposed picture set and spliced together into a new picture; finally, the new picture is scaled to 608 × 608 size and used as input data for training the network. At this point each input enhanced picture fuses the information of eight pictures of the original data set, increasing the amount of information in each round of the model's learning. An example of the hybrid mosaic amplification method is shown in fig. 2, and a code sketch is given below.
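A minimal sketch of the Mix Mosaic step under the same OpenCV/NumPy assumptions; the fusion of the labeling frames, which follows the same geometry, is omitted here:

```python
import random
import cv2
import numpy as np

def mix_up(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Superpose two pictures at 0.5 transparency (the Mix up half)."""
    b = cv2.resize(b, (a.shape[1], a.shape[0]))
    return cv2.addWeighted(a, 0.5, b, 0.5, 0.0)

def mix_mosaic(pool: list) -> np.ndarray:
    """Splice four superposed pictures into one 608 x 608 mosaic, so each
    output fuses information from eight pictures of the original set."""
    blended = [mix_up(*random.sample(pool, 2)) for _ in range(4)]
    tiles = [cv2.resize(p, (304, 304)) for p in blended]  # 608 / 2 per tile
    return np.vstack([np.hstack(tiles[:2]), np.hstack(tiles[2:])])
```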
S3, constructing a ship black smoke recognition model:
s3-1, configuring the running environment of the target recognition network YOLO v4, mainly comprising the Ubuntu operating system, Python 3.7, OpenCV, CUDA, cuDNN, and GeForce RTX 2070 hardware;
s3-2, the feature extraction backbone network CSPDarknet53, the neck network SPP + PANnet and the head network YOLO head are constructed to form the basic framework of the recognition model; the structure of each module is shown in fig. 3 and the constructed network framework in fig. 4. The recognition process is as follows: first, the backbone network CSPDarknet53 extracts high-dimensional features of the input image; the neck network SPP + PANnet then combines the features of the three scales extracted in the previous step from top to bottom and then from bottom to top, making full use of multi-scale features; finally, the head network YOLO head predicts from the combined multi-scale features and outputs results comprising the coordinates, classification categories and confidences of the prediction boxes;
the feature extraction backbone network CSPDarknet53 mainly comprises, connected in sequence: 1 CBM module containing 32 3 × 3 convolution kernels, 1 CBM module containing 64 3 × 3 convolution kernels, 1 CSP module containing 1 residual unit, 1 CBM module containing 128 3 × 3 convolution kernels, 1 CSP module containing 2 residual units, 1 CBM module containing 256 3 × 3 convolution kernels, 1 CSP module containing 8 residual units, 1 CBM module containing 512 3 × 3 convolution kernels, 1 CSP module containing 8 residual units, 1 CBM module containing 1024 3 × 3 convolution kernels, and 1 CSP module containing 4 residual units;
the CBM module is formed by connecting a 2D convolution layer (Conv2D), a batch normalization layer (BN) and a Mish activation function layer in series;
the residual unit consists of 2 CBM modules and an addition layer (Add) connected with the original input;
the CSP module is composed of 1 CBM module, residual units and 1 CBM module on its main path, and a connection layer (Concat) that joins this path with the original input passed through 1 CBM module, as sketched below.
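A hedged PyTorch sketch of the CBM module, residual unit and CSP module just described; stride and padding choices are assumptions, and the halved inner channels of the first stage's residual unit are simplified away:

```python
import torch
import torch.nn as nn

class CBM(nn.Module):
    """Conv2D -> batch normalization -> Mish, as the CBM module is defined."""
    def __init__(self, c_in, c_out, k, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, stride, k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.Mish(),
        )

    def forward(self, x):
        return self.block(x)

class ResUnit(nn.Module):
    """Two CBM modules (1x1 then 3x3) plus the additive shortcut (Add)."""
    def __init__(self, c, c_mid):
        super().__init__()
        self.body = nn.Sequential(CBM(c, c_mid, 1), CBM(c_mid, c, 3))

    def forward(self, x):
        return x + self.body(x)

class CSP(nn.Module):
    """CBM -> residual units -> CBM on the main path, one CBM on the
    shortcut, the two paths joined by concatenation (Concat)."""
    def __init__(self, c_in, c_hidden, n_res):
        super().__init__()
        self.main = nn.Sequential(
            CBM(c_in, c_hidden, 1),
            *[ResUnit(c_hidden, c_hidden) for _ in range(n_res)],
            CBM(c_hidden, c_hidden, 1),
        )
        self.shortcut = CBM(c_in, c_hidden, 1)

    def forward(self, x):
        return torch.cat([self.main(x), self.shortcut(x)], dim=1)
```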
The neck network SPP module mainly comprises three parallel maximum pooling layers with pooling kernel sizes of 13 × 13, 9 × 9 and 5 × 5 and a moving step of 1; the output of the SPP module is the concatenation of the outputs of the three parallel maximum pooling layers with the input of the module.
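A minimal PyTorch sketch of this SPP module; with stride 1 and matching padding the spatial size is preserved while the channel count quadruples:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Three parallel stride-1 max-pool layers concatenated with the input."""
    def __init__(self):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in (13, 9, 5)]
        )

    def forward(self, x):
        # spatial size is preserved; the channel count quadruples
        return torch.cat([p(x) for p in self.pools] + [x], dim=1)
```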
The neck network PANnet module mainly comprises four combination modules:
(1) combination module 1: composed of 3 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, 1 SPP module, 3 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, 1 upsampling layer, 1 connection layer and 5 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3, 1 × 1, 3 × 3 and 1 × 1;
(2) combination module 2: composed of 1 upsampling layer, 1 CBL module containing 128 1 × 1 convolution kernels and 1 connection layer;
(3) combination module 3: composed of 1 downsampling layer, a CBL module containing 256 3 × 3 convolution kernels and 1 connection layer;
(4) combination module 4: composed of 1 downsampling layer, a CBL module containing 512 3 × 3 convolution kernels and 1 connection layer;
the CBL module is formed by connecting a 2D convolution layer (Conv2D), a batch normalization layer (BN) and a Leaky_relu activation function layer in series;
the feature map scale output by combination module 2 is 76 × 76; the feature map scale output by combination module 3 is 38 × 38; the feature map scale output by combination module 4 is 19 × 19.
The head network YOLO head mainly consists of CBL modules with 1 × 1 and 3 × 3 convolution kernels applied alternately three times, followed by one Conv2D convolution layer; the outputs of the head network are feature maps at the three scales 76 × 76, 38 × 38 and 19 × 19.
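A hedged PyTorch sketch of the CBL block and one head branch; the output channel count 3 × (5 + classes) follows the usual YOLO convention and is an assumption here, as are the halved mid channels:

```python
import torch.nn as nn

class CBL(nn.Module):
    """Conv2D -> batch normalization -> Leaky_relu, as the CBL block is defined."""
    def __init__(self, c_in, c_out, k, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, stride, k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return self.block(x)

def yolo_head(c_in, num_classes=2, num_anchors=3):
    """1x1 and 3x3 CBL convolutions alternated three times, then one Conv2D."""
    layers = []
    for _ in range(3):
        layers += [CBL(c_in, c_in // 2, 1), CBL(c_in // 2, c_in, 3)]
    layers.append(nn.Conv2d(c_in, num_anchors * (5 + num_classes), 1))
    return nn.Sequential(*layers)
```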
S3-3, loading a pre-training weight file to obtain a pre-training target recognition network;
s3-4, the regression loss part in the loss function of the pre-training model is modified into the CIOU loss, and an attenuation coefficient is added to the classification loss part to turn it into a focal loss for unbalanced samples; the CIOU loss mainly improves the coordinate regression loss of the target detection frame, while the focal loss mainly improves the classification loss, increasing the relative loss value of difficult samples;
the improved loss function calculation formula is as follows:
$$
\begin{aligned}
Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[1-IoU\left(B_{ij},B_i\right)+\frac{\rho^2\left(A_{ctr},B_{ctr}\right)}{c^2}+\alpha\upsilon\right]\\
&-\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[\hat{C}_i\log C_i+\left(1-\hat{C}_i\right)\log\left(1-C_i\right)\right]\\
&-\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{noobj}\left[\hat{C}_i\log C_i+\left(1-\hat{C}_i\right)\log\left(1-C_i\right)\right]\\
&-\sum_{i=0}^{S^2}\sum_{c\in classes}\left(1-p_i(c)\right)^{\gamma}\left[\hat{p}_i(c)\log p_i(c)+\left(1-\hat{p}_i(c)\right)\log\left(1-p_i(c)\right)\right]
\end{aligned}
$$

in the formula, the first row is the CIOU loss between the regression box and the real box (ground truth box); the second and third rows are the confidence losses with and without a detected object, respectively; and the fourth row is the focal classification loss of each category;

in the detection process, the algorithm first segments the input image into $S^2$ detection grids, each grid generating $B$ possible prediction boxes; in the regression loss, $\lambda_{coord}$ represents the weight occupied by the regression loss part; the indicator $I_{ij}^{obj}$ takes the value 1 if the $j$th prediction box of the $i$th grid is responsible for some detection object obj, and 0 otherwise; $IoU$ is the ratio of the intersection to the union of the $j$th prediction box $B_{ij}$ of the $i$th grid and the real box $B_i$ existing in the $i$th grid; $\rho\left(A_{ctr},B_{ctr}\right)$ is the Euclidean distance between the center point $A_{ctr}$ of the $j$th prediction box of the $i$th grid and the center point $B_{ctr}$ of the real box existing in the $i$th grid; $c$ is the diagonal length of the minimum bounding box formed by the real box contained in the $i$th grid and the $j$th prediction box of the $i$th grid; $\upsilon$ is a parameter used to measure the aspect ratio, and $\alpha$ can be regarded as the weight coefficient of $\upsilon$; the specific calculation formulas are as follows:

$$\upsilon=\frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}}-\arctan\frac{w}{h}\right)^{2},\qquad \alpha=\frac{\upsilon}{1-IoU+\upsilon}$$

wherein $w^{gt}$ and $h^{gt}$ respectively represent the width and height of the real box, and $w$ and $h$ respectively represent the width and height of the prediction box;

the confidence loss adopts a binomial cross-entropy loss function, where $C_i$ and $\hat{C}_i$ respectively represent the true and predicted confidence of the detected object, $\lambda_{noobj}$ is the weight of the confidence loss when no detected target is contained, and the indicator $I_{ij}^{noobj}$ takes the value 1 if the $j$th prediction box of the $i$th grid is not responsible for any detection object obj, and 0 otherwise;

the classification loss adopts a binomial cross-entropy loss function; assuming the detection task has classes categories in total, $p_i(c)$ is the probability that the $i$th grid belongs to the $c$th category and $\gamma$ is an attenuation coefficient greater than 1. When the prediction probability is high enough, the classification loss generated is strongly attenuated, while the classification loss generated by samples with a small prediction probability is hardly attenuated at all; this increases the relative loss value of difficult samples and thus their weight in each round of loss calculation, and in this way the problem of imbalance between the classes of the training data set is addressed.
In this embodiment, classes = 2 and the hyper-parameter γ = 2, so that the loss values generated by easily classified samples are attenuated, the relative loss values of difficult samples are increased, and the weight of difficult samples in each round of loss calculation grows accordingly; the hyper-parameters are set as λ_coord = 5 and λ_noobj = 0.5 to balance the weight of each partial loss in the total loss function value.
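A hedged PyTorch sketch of the two modified loss terms (CIOU regression loss and focal classification loss); the (x_center, y_center, w, h) box encoding, tensor layouts and epsilon guards are assumptions, not taken from the patent:

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Per-box CIOU loss: 1 - IoU + rho^2 / c^2 + alpha * v."""
    # corners of both (x_center, y_center, w, h) boxes
    px1, py1 = pred[..., 0] - pred[..., 2] / 2, pred[..., 1] - pred[..., 3] / 2
    px2, py2 = pred[..., 0] + pred[..., 2] / 2, pred[..., 1] + pred[..., 3] / 2
    tx1, ty1 = target[..., 0] - target[..., 2] / 2, target[..., 1] - target[..., 3] / 2
    tx2, ty2 = target[..., 0] + target[..., 2] / 2, target[..., 1] + target[..., 3] / 2
    inter = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0) * \
            (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    union = pred[..., 2] * pred[..., 3] + target[..., 2] * target[..., 3] - inter
    iou = inter / (union + eps)
    # squared centre distance over squared diagonal of the enclosing box
    rho2 = (pred[..., 0] - target[..., 0]) ** 2 + (pred[..., 1] - target[..., 1]) ** 2
    c2 = (torch.max(px2, tx2) - torch.min(px1, tx1)) ** 2 + \
         (torch.max(py2, ty2) - torch.min(py1, ty1)) ** 2 + eps
    # aspect-ratio term v and its weight alpha
    v = (4 / math.pi ** 2) * (torch.atan(target[..., 2] / (target[..., 3] + eps))
                              - torch.atan(pred[..., 2] / (pred[..., 3] + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

def focal_class_loss(p, p_hat, gamma=2.0, eps=1e-7):
    """Binomial cross entropy attenuated for well-classified (easy) samples."""
    bce = -(p_hat * torch.log(p + eps) + (1 - p_hat) * torch.log(1 - p + eps))
    weight = p_hat * (1 - p) ** gamma + (1 - p_hat) * p ** gamma
    return weight * bce
```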
S3-5, the amplified data set is input into the improved network, with the network hyper-parameters initialized as follows: learning rate 0.001, learning rate attenuation rate 0.0005, batch size batch_size 64; training uses a random mini-batch gradient descent algorithm, updating the weights of the neurons in each layer of the network, and stops when the loss function value no longer decreases, yielding the trained ship black smoke recognition network model.
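A minimal training-loop sketch under the stated hyper-parameters; `model`, `yolo_loss` and `train_set` are assumed placeholders, and the "attenuation rate" of 0.0005 is interpreted here as weight decay:

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_set, yolo_loss, epochs=100, device="cuda"):
    loader = DataLoader(train_set, batch_size=64, shuffle=True)  # random mini-batches
    opt = torch.optim.SGD(model.parameters(), lr=0.001, weight_decay=0.0005)
    model.to(device).train()
    for _ in range(epochs):
        for imgs, targets in loader:
            opt.zero_grad()
            loss = yolo_loss(model(imgs.to(device)), targets.to(device))
            loss.backward()
            opt.step()  # update the weights of each layer's neurons
    # in practice training stops once the loss value no longer decreases
    return model
```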
S4, real-time monitoring:
and the real-time picture or video captured by the cloud camera is input into the trained ship black smoke recognition network model, the detection result is output, and whether regulations are violated is judged, thus achieving the purpose of real-time monitoring.
A picture to be detected, or a real-time picture/video captured by the cloud camera, is input into the trained ship black smoke recognition network model and the detection result is output; (a) and (b) in fig. 5 show the black smoke ship recognition results for two different scenes. The model overcomes the problem of scarce sample diversity and can effectively identify ship black smoke in the photos.
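A hedged sketch of such a monitoring loop; `detect` stands in for the trained model's inference call, and the class label `smoke` and confidence threshold are illustrative:

```python
import cv2

def monitor(stream_url, model, detect, conf_thres=0.5):
    cap = cv2.VideoCapture(stream_url)  # cloud / pan-tilt camera stream
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, (608, 608))
        # detect() is assumed to return (x1, y1, x2, y2, label, confidence) tuples
        for x1, y1, x2, y2, label, conf in detect(model, frame):
            if label == "smoke" and conf >= conf_thres:
                print("possible illegal black-smoke emission detected")
    cap.release()
```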
In summary, the ship black smoke recognition method based on deep learning provided by the invention can overcome the difficulty of unbalanced black smoke ship training samples, and by locally deploying the running environment of the black smoke recognition model it can detect real-time snapshot pictures transmitted from the cloud, thereby helping the relevant management departments reduce manpower and improve the automation level of monitoring ships in violation.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.

Claims (10)

1. A ship black smoke identification method based on deep learning is characterized by comprising the following steps:
step 1, acquiring an original data set, wherein the original data set contains not less than 2000 valid pictures, the resolution of each picture is not less than 608 × 608, and the pictures are numbered in sequence;
step 2, preprocessing the original data set to obtain a preprocessed data set;
step 3, constructing a ship black smoke recognition network model, and training the ship black smoke recognition network model by using the preprocessed data set to obtain a trained ship black smoke recognition network model;
and step 4, inputting pictures or videos captured in real time into the trained ship black smoke recognition network model, outputting the detection result, and judging from the detection result whether a ship is illegally discharging black smoke.
2. The deep learning-based ship black smoke identification method according to claim 1, wherein the specific process of obtaining the original data set in step 1 is as follows:
step 1-1, capturing ship pictures with a pan-tilt camera to form a snapshot data set, wherein the snapshot data set comprises ship pictures with black smoke and ship pictures without black smoke;
step 1-2, crawling black smoke related pictures from a picture website by using a web crawler technology, and manually screening to form an amplification data set;
the snapshot data set and the amplification data set together constitute the original data set.
3. The deep learning-based ship black smoke identification method according to claim 1, wherein the specific process of the step 2 is as follows:
step 2-1, performing enhancement transformation on the original data set by image enhancement to obtain an enhanced data set, wherein the image enhancement modes comprise random flipping, random cropping, random erasing, random brightness adjustment, random blurring and histogram equalization;
step 2-2, manually annotating the enhanced data set with the labelImg image annotation tool, marking the positions of the ship and the black smoke in each picture with bounding boxes and assigning the corresponding class labels;
and 2-3, amplifying the enhanced data set by using a mixed mosaic amplification method, and fusing manual labeling frames to obtain a preprocessed data set.
4. The deep learning-based ship black smoke identification method according to claim 3, wherein the specific process of the step 2-3 is as follows:
firstly, randomly selecting two pictures from the enhanced data set and superposing them at a transparency of 0.5 to form a new picture set, wherein the number of pictures in the new picture set is

$$C_n^2=\frac{n(n-1)}{2}$$

where n is the number of pictures in the enhanced data set; then four pictures are randomly selected from the new picture set and spliced together to synthesize a new picture, which is scaled to 608 × 608 size; finally the manual labeling frames are fused to obtain the preprocessed data set, wherein the number of pictures in the preprocessed data set is

$$C_{n(n-1)/2}^4$$
5. The deep learning-based ship black smoke identification method according to claim 1, wherein the specific process of the step 3 is as follows:
step 3-1, configuring the running environment of the ship black smoke recognition network model YOLO v4;
step 3-2, building the ship black smoke recognition network model YOLO v4, wherein the model comprises a feature extraction backbone network CSPDarknet53, a neck network SPP + PANnet and a head network YOLO head;
step 3-3, loading a pre-training weight file into YOLO v4 to obtain a pre-trained ship black smoke recognition network model;
step 3-4, modifying the regression loss part in the loss function of the pre-trained ship black smoke recognition network model into the CIOU loss, and adding an attenuation coefficient to the classification loss part to turn it into a focal classification loss aimed at unbalanced samples, obtaining the improved ship black smoke recognition network model, wherein the improved loss function is calculated as follows:
$$
\begin{aligned}
Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[1-IoU\left(B_{ij},B_i\right)+\frac{\rho^2\left(A_{ctr},B_{ctr}\right)}{c^2}+\alpha\upsilon\right]\\
&-\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[\hat{C}_i\log C_i+\left(1-\hat{C}_i\right)\log\left(1-C_i\right)\right]\\
&-\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{noobj}\left[\hat{C}_i\log C_i+\left(1-\hat{C}_i\right)\log\left(1-C_i\right)\right]\\
&-\sum_{i=0}^{S^2}\sum_{c\in classes}\left(1-p_i(c)\right)^{\gamma}\left[\hat{p}_i(c)\log p_i(c)+\left(1-\hat{p}_i(c)\right)\log\left(1-p_i(c)\right)\right]
\end{aligned}
$$

in the formula, the first row is the CIOU loss between the regression box and the real box (ground truth box); the second and third rows are the confidence losses with and without a detected object, respectively; and the fourth row is the focal classification loss of each category;

$S^2$ indicates that the input picture is split into $S^2$ detection grids, each grid generating $B$ possible prediction boxes, and $\lambda_{coord}$ represents the weight occupied by the CIOU loss part; the indicator $I_{ij}^{obj}$ takes the value 1 if the $j$th prediction box of the $i$th grid is responsible for some detection object obj, and 0 otherwise; $IoU$ denotes the ratio of the intersection to the union of the $j$th prediction box $B_{ij}$ of the $i$th grid and the real box $B_i$ existing in the $i$th grid; $\rho\left(A_{ctr},B_{ctr}\right)$ denotes the Euclidean distance between the center point $A_{ctr}$ of the $j$th prediction box of the $i$th grid and the center point $B_{ctr}$ of the real box existing in the $i$th grid; $c$ is the diagonal length of the minimum bounding box formed by the real box contained in the $i$th grid and the $j$th prediction box of the $i$th grid; $\upsilon$ represents a parameter used to measure the aspect ratio, and $\alpha$ represents the weight coefficient of $\upsilon$; the specific calculation formulas are as follows:

$$\upsilon=\frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}}-\arctan\frac{w}{h}\right)^{2},\qquad \alpha=\frac{\upsilon}{1-IoU+\upsilon}$$

wherein $w^{gt}$ and $h^{gt}$ respectively represent the width and height of the real box, and $w$ and $h$ respectively represent the width and height of the prediction box;

the confidence loss adopts a binomial cross-entropy loss function, where $C_i$ and $\hat{C}_i$ respectively represent the true and predicted confidence of the detected object, $\lambda_{noobj}$ is the weight of the confidence loss when no detected target is contained, and the indicator $I_{ij}^{noobj}$ takes the value 1 if the $j$th prediction box of the $i$th grid is not responsible for any detection object obj, and 0 otherwise;

the classification loss adopts a binomial cross-entropy loss function, where classes represents the number of classes of the detection task, $p_i(c)$ is the probability that the $i$th grid belongs to the $c$th category, and $\gamma$ is an attenuation coefficient greater than 1;
and step 3-5, inputting the preprocessed data set into the improved ship black smoke recognition network model, training with a random mini-batch gradient descent algorithm, updating the weights of the neurons in each layer of the network, and stopping when the loss function value no longer decreases, obtaining the trained ship black smoke recognition network model.
6. The deep learning-based ship black smoke identification method according to claim 5, wherein the feature extraction backbone network CSPDarknet53 in step 3-2 includes the following modules connected in sequence: a first CBM module, a second CBM module, a first CSP module, a third CBM module, a second CSP module, a fourth CBM module, a third CSP module, a fifth CBM module, a fourth CSP module, a sixth CBM module and a fifth CSP module;
each CBM module is formed by connecting a 2D convolution layer, a batch normalization layer and a Mish activation function layer in series; the first to sixth CBM modules respectively contain 32 3 × 3 convolution kernels, 64 3 × 3 convolution kernels, 128 3 × 3 convolution kernels, 256 3 × 3 convolution kernels, 512 3 × 3 convolution kernels and 1024 3 × 3 convolution kernels;
the input of the first CSP module passes through 1 CBM module, 1 residual unit and 1 CBM module in sequence to obtain a first output; meanwhile, the input of the first CSP module passes through 1 CBM module to obtain a second output; the first output and the second output are concatenated as the output of the first CSP module; the 3 CBM modules in the first CSP module each contain 64 1 × 1 convolution kernels; the input of the residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in the residual unit contain 32 1 × 1 convolution kernels and 64 3 × 3 convolution kernels, respectively;
the input of the second CSP module passes through 1 CBM module, 2 residual units and 1 CBM module in sequence to obtain a third output; meanwhile, the input of the second CSP module passes through 1 CBM module to obtain a fourth output; the third output and the fourth output are concatenated as the output of the second CSP module; the 3 CBM modules in the second CSP module each contain 64 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 64 1 × 1 convolution kernels and 64 3 × 3 convolution kernels, respectively;
the input of the third CSP module passes through 1 CBM module, 8 residual units and 1 CBM module in sequence to obtain a fifth output; meanwhile, the input of the third CSP module passes through 1 CBM module to obtain a sixth output; the fifth output and the sixth output are concatenated as the output of the third CSP module; the 3 CBM modules in the third CSP module each contain 128 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 128 1 × 1 convolution kernels and 128 3 × 3 convolution kernels, respectively;
the input of the fourth CSP module passes through 1 CBM module, 8 residual units and 1 CBM module in sequence to obtain a seventh output; meanwhile, the input of the fourth CSP module passes through 1 CBM module to obtain an eighth output; the seventh output and the eighth output are concatenated as the output of the fourth CSP module; the 3 CBM modules in the fourth CSP module each contain 256 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 256 1 × 1 convolution kernels and 256 3 × 3 convolution kernels, respectively;
the input of the fifth CSP module passes through 1 CBM module, 4 residual units and 1 CBM module in sequence to obtain a ninth output; meanwhile, the input of the fifth CSP module passes through 1 CBM module to obtain a tenth output; the ninth output and the tenth output are concatenated as the output of the fifth CSP module; the 3 CBM modules in the fifth CSP module each contain 512 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 512 1 × 1 convolution kernels and 512 3 × 3 convolution kernels, respectively.
7. The deep learning-based ship black smoke identification method according to claim 5, wherein the SPP module in the neck network SPP + PANnet in step 3-2 includes three parallel maximum pooling layers with pooling kernel sizes of 13 × 13, 9 × 9 and 5 × 5 and a moving step of 1, and the output of the SPP module is the concatenation of the outputs of the three parallel maximum pooling layers with the input of the SPP module.
8. The deep learning-based ship black smoke identification method according to claim 7, wherein the PANnet module in the neck network SPP + PANnet in step 3-2 comprises four combination modules:
(1) combination module 1: composed of 3 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, 1 SPP module, 3 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, 1 upsampling layer, 1 connection layer and 5 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3, 1 × 1, 3 × 3 and 1 × 1;
(2) combination module 2: composed of 1 upsampling layer, a CBL module containing 128 1 × 1 convolution kernels and 1 connection layer;
(3) combination module 3: composed of 1 downsampling layer, a CBL module containing 256 3 × 3 convolution kernels and 1 connection layer;
(4) combination module 4: composed of 1 downsampling layer, a CBL module containing 512 3 × 3 convolution kernels and 1 connection layer;
each CBL module is formed by connecting a 2D convolution layer, a batch normalization layer and a Leaky_relu activation function layer in series;
the feature map scale output by combination module 2 is 76 × 76; the feature map scale output by combination module 3 is 38 × 38; the feature map scale output by combination module 4 is 19 × 19.
9. The deep learning-based ship black smoke identification method according to claim 5, wherein the head network YOLO head in step 3-2 consists of CBL modules with 1 × 1 and 3 × 3 convolution kernels applied alternately three times, followed by one Conv2D convolution layer, and the outputs of the head network are feature maps at the three scales 76 × 76, 38 × 38 and 19 × 19; the CBL module is formed by connecting a 2D convolution layer, a batch normalization layer and a Leaky_relu activation function layer in series.
10. The deep learning-based ship black smoke identification method according to claim 5, wherein the initial hyper-parameters of the ship black smoke identification network model are set as follows: the learning rate is 0.001, the learning rate attenuation rate is 0.0005, and the batch size batch _ size is 64.
CN202111441778.2A 2021-11-30 2021-11-30 Ship black smoke identification method based on deep learning Pending CN114241189A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111441778.2A CN114241189A (en) 2021-11-30 2021-11-30 Ship black smoke identification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111441778.2A CN114241189A (en) 2021-11-30 2021-11-30 Ship black smoke identification method based on deep learning

Publications (1)

Publication Number Publication Date
CN114241189A true CN114241189A (en) 2022-03-25

Family ID: 80752122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111441778.2A Pending CN114241189A (en) 2021-11-30 2021-11-30 Ship black smoke identification method based on deep learning

Country Status (1)

Country Link
CN (1) CN114241189A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019048604A1 (en) * 2017-09-09 2019-03-14 Fcm Dienstleistungs Ag Automatic early detection of smoke, soot and fire with increased detection reliability using machine learning
CN112464883A (en) * 2020-12-11 2021-03-09 武汉工程大学 Automatic detection and identification method and system for ship target in natural scene
CN112800838A (en) * 2020-12-28 2021-05-14 浙江万里学院 Channel ship detection and identification method based on deep learning
CN112884090A (en) * 2021-04-14 2021-06-01 安徽理工大学 Fire detection and identification method based on improved YOLOv3

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165602A (en) * 2018-08-27 2019-01-08 成都华安视讯科技有限公司 A kind of black smoke vehicle detection method based on video analysis


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination