CN114241189A - Ship black smoke identification method based on deep learning - Google Patents

Ship black smoke identification method based on deep learning

Info

Publication number
CN114241189A
Authority
CN
China
Prior art keywords
module
cbm
black smoke
output
convolution kernels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111441778.2A
Other languages
Chinese (zh)
Inventor
胡里阳
叶智锐
邵宜昌
王超
张耀玉
吴浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202111441778.2A
Publication of CN114241189A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks


Abstract

The invention discloses a ship black smoke identification method based on deep learning, which comprises the following steps: S1, constructing a data set; S2, preprocessing the data set; S3, constructing a ship black smoke recognition model; and S4, real-time monitoring. The method is based on an improved YOLO v4 network model: by modifying the loss function to increase the loss weights of difficult samples, it overcomes the identification difficulty caused by the imbalance of black smoke sample sets. The method maintains a high identification speed while achieving high identification precision, meets the requirements of the relevant management departments on the accuracy and real-time performance of ship black smoke identification, and is suitable for both picture detection and video detection. In addition, the hybrid mosaic data amplification method provided by the invention allows the network model to be trained stably and quickly in a single-GPU environment, saving the computing resources and training time of the target recognition algorithm, and has great industrial production and popularization value.

Description

Ship black smoke identification method based on deep learning
Technical Field
The invention relates to a ship black smoke identification method based on deep learning, and belongs to the technical field of ship black smoke identification.
Background
Black smoke from a ship is caused by insufficient combustion of the diesel oil injected while the diesel engine works. It increases machine wear to a certain extent and shortens the service life of the diesel engine, while the large amount of carbon deposit produced seriously affects air quality at sea, so ships need to be identified accurately and in real time from the viewpoint of energy conservation and environmental protection. Current target identification methods mainly comprise traditional pixel-level image processing methods and emerging deep learning methods. Taking the ViBe algorithm as an example, a traditional method can only detect that an object is moving but cannot determine what the moving object actually is; in addition, it differs greatly from deep learning methods in detection precision, so most industrial target identification tasks adopt algorithms based on deep learning. Target recognition algorithms based on deep learning mainly comprise two major classes. One class is two-stage: the method first generates a number of candidate regions, and then corrects and classifies each candidate region; typical algorithms mainly comprise the R-CNN series, such as Fast R-CNN, Faster R-CNN and the like. The other class is one-stage: the method completes the whole identification process by sending a picture into the model only once; typical algorithms are the YOLO series (v1, v2, v3, v4), SSD and the like.
Generally, one-stage algorithms have a higher recognition speed than two-stage ones but are inferior in recognition accuracy, and few algorithms can currently strike a good balance between the two. The YOLO v4 network model proposed by Alexey Bochkovskiy in 2020 ensures accuracy while offering excellent recognition speed and can achieve real-time performance, but the network is mainly designed to recognize common categories in daily life and struggles with the special category of black smoke. In addition, recognition methods based on deep learning need a large amount of training data for support, with high requirements on both the number and the diversity of samples; black smoke samples are relatively few and not diverse enough, and existing methods face many technical problems in recognizing such an unbalanced sample set.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a ship black smoke identification method based on deep learning that can overcome the difficulty of unbalanced black smoke ship training samples, solve the problem of automatic detection of ships in violation, and achieve real-time and accurate identification of underway black smoke ships, thereby helping the relevant management departments reduce manpower, improve law enforcement efficiency, and respond to the national call for air pollution prevention and control.
The invention adopts the following technical scheme for solving the technical problems:
a ship black smoke identification method based on deep learning comprises the following steps:
step 1, acquiring an original data set, wherein the original data set contains not less than 2000 valid pictures, the resolution of each picture is not less than 608 × 608, and the pictures are numbered in sequence;
step 2, preprocessing the original data set to obtain a preprocessed data set;
step 3, constructing a ship black smoke recognition network model, and training the ship black smoke recognition network model by using the preprocessed data set to obtain a trained ship black smoke recognition network model;
and step 4, inputting pictures or videos captured in real time into the trained ship black smoke recognition network model, outputting the detection result, and judging from the detection result whether a ship is illegally discharging black smoke.
As a preferred embodiment of the present invention, the specific process of acquiring the original data set in step 1 is as follows:
step 1-1, capturing ship pictures with a pan-tilt camera to form a snapshot data set, wherein the snapshot data set comprises ship pictures with black smoke and ship pictures without black smoke;
step 1-2, crawling black smoke related pictures from a picture website by using a web crawler technology, and manually screening to form an amplification data set;
the snapshot data set and the amplification data set together constitute the original data set.
As a preferred embodiment of the present invention, the specific process of step 2 is as follows:
step 2-1, performing enhancement transformation on the original data set by image enhancement to obtain an enhanced data set, wherein the image enhancement modes comprise random flipping, random cropping, random erasing, random brightness adjustment, random blurring and histogram equalization;
step 2-2, manually annotating the enhanced data set with the labelImg image annotation tool, marking the positions of the ship and the black smoke in each picture with bounding boxes and assigning the corresponding class labels;
and 2-3, amplifying the enhanced data set by using a mixed mosaic amplification method, and fusing manual labeling frames to obtain a preprocessed data set.
As a preferred scheme of the invention, the specific process of the step 2-3 is as follows:
firstly, randomly selecting two pictures from the enhanced data set and superposing them at a transparency of 0.5 to form a new picture set, wherein the number of pictures in the new picture set is

$$C_n^2=\frac{n(n-1)}{2}$$

where n is the number of pictures in the enhanced data set; then four pictures are randomly selected from the new picture set and spliced together to synthesize a new picture, which is scaled to 608 × 608 size; finally the manual labeling frames are fused to obtain the preprocessed data set, wherein the number of pictures in the preprocessed data set is

$$C_{n(n-1)/2}^4$$
As a preferred embodiment of the present invention, the specific process of step 3 is as follows:
step 3-1, configuring the running environment of the ship black smoke recognition network model YOLO v4;
step 3-2, building the ship black smoke recognition network model YOLO v4, wherein the model comprises a feature extraction backbone network CSPDarknet53, a neck network SPP + PANnet and a head network YOLO head;
step 3-3, loading a pre-training weight file into YOLO v4 to obtain a pre-trained ship black smoke recognition network model;
step 3-4, modifying the regression loss part in the loss function of the pre-trained ship black smoke recognition network model into the CIOU loss, and adding an attenuation coefficient to the classification loss part to turn it into a focal classification loss aimed at unbalanced samples, obtaining the improved ship black smoke recognition network model, wherein the improved loss function is calculated as follows:
$$
\begin{aligned}
Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[1-IoU\left(B_{ij},B_i\right)+\frac{\rho^2\left(A_{ctr},B_{ctr}\right)}{c^2}+\alpha\upsilon\right]\\
&-\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[\hat{C}_i\log C_i+\left(1-\hat{C}_i\right)\log\left(1-C_i\right)\right]\\
&-\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{noobj}\left[\hat{C}_i\log C_i+\left(1-\hat{C}_i\right)\log\left(1-C_i\right)\right]\\
&-\sum_{i=0}^{S^2}\sum_{c\in classes}\left(1-p_i(c)\right)^{\gamma}\left[\hat{p}_i(c)\log p_i(c)+\left(1-\hat{p}_i(c)\right)\log\left(1-p_i(c)\right)\right]
\end{aligned}
$$

in the formula, the first row is the CIOU loss between the regression box and the real box (ground truth box); the second and third rows are the confidence losses with and without a detected object, respectively; and the fourth row is the focal classification loss of each category;

$S^2$ indicates that the input picture is split into $S^2$ detection grids, each grid generating $B$ possible prediction boxes, and $\lambda_{coord}$ represents the weight occupied by the CIOU loss part; the indicator $I_{ij}^{obj}$ takes the value 1 if the $j$th prediction box of the $i$th grid is responsible for some detection object obj, and 0 otherwise; $IoU$ denotes the ratio of the intersection to the union of the $j$th prediction box $B_{ij}$ of the $i$th grid and the real box $B_i$ existing in the $i$th grid; $\rho\left(A_{ctr},B_{ctr}\right)$ denotes the Euclidean distance between the center point $A_{ctr}$ of the $j$th prediction box of the $i$th grid and the center point $B_{ctr}$ of the real box existing in the $i$th grid; $c$ is the diagonal length of the minimum bounding box formed by the real box contained in the $i$th grid and the $j$th prediction box of the $i$th grid; $\upsilon$ represents a parameter used to measure the aspect ratio, and $\alpha$ represents the weight coefficient of $\upsilon$; the specific calculation formulas are as follows:

$$\upsilon=\frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}}-\arctan\frac{w}{h}\right)^{2},\qquad \alpha=\frac{\upsilon}{1-IoU+\upsilon}$$

wherein $w^{gt}$ and $h^{gt}$ respectively represent the width and height of the real box, and $w$ and $h$ respectively represent the width and height of the prediction box;

the confidence loss adopts a binomial cross-entropy loss function, where $C_i$ and $\hat{C}_i$ respectively represent the true and predicted confidence of the detected object, $\lambda_{noobj}$ is the weight of the confidence loss when no detected target is contained, and the indicator $I_{ij}^{noobj}$ takes the value 1 if the $j$th prediction box of the $i$th grid is not responsible for any detection object obj, and 0 otherwise;

the classification loss adopts a binomial cross-entropy loss function, where classes represents the number of classes of the detection task, $p_i(c)$ is the probability that the $i$th grid belongs to the $c$th category, and $\gamma$ is an attenuation coefficient greater than 1;
and step 3-5, inputting the preprocessed data set into the improved ship black smoke recognition network model, training with a random mini-batch gradient descent algorithm, updating the weights of the neurons in each layer of the network, and stopping when the loss function value no longer decreases, obtaining the trained ship black smoke recognition network model.
As a preferred scheme of the present invention, the feature extraction backbone network CSPDarknet53 in step 3-2 includes the following modules connected in sequence: a first CBM module, a second CBM module, a first CSP module, a third CBM module, a second CSP module, a fourth CBM module, a third CSP module, a fifth CBM module, a fourth CSP module, a sixth CBM module and a fifth CSP module;
each CBM module is formed by connecting a 2D convolution layer, a batch normalization layer and a Mish activation function layer in series; the first to sixth CBM modules respectively contain 32 3 × 3 convolution kernels, 64 3 × 3 convolution kernels, 128 3 × 3 convolution kernels, 256 3 × 3 convolution kernels, 512 3 × 3 convolution kernels and 1024 3 × 3 convolution kernels;
the input of the first CSP module passes through 1 CBM module, 1 residual unit and 1 CBM module in sequence to obtain a first output; meanwhile, the input of the first CSP module passes through 1 CBM module to obtain a second output; the first output and the second output are concatenated as the output of the first CSP module; the 3 CBM modules in the first CSP module each contain 64 1 × 1 convolution kernels; the input of the residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in the residual unit contain 32 1 × 1 convolution kernels and 64 3 × 3 convolution kernels, respectively;
the input of the second CSP module passes through 1 CBM module, 2 residual units and 1 CBM module in sequence to obtain a third output; meanwhile, the input of the second CSP module passes through 1 CBM module to obtain a fourth output; the third output and the fourth output are concatenated as the output of the second CSP module; the 3 CBM modules in the second CSP module each contain 64 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 64 1 × 1 convolution kernels and 64 3 × 3 convolution kernels, respectively;
the input of the third CSP module passes through 1 CBM module, 8 residual units and 1 CBM module in sequence to obtain a fifth output; meanwhile, the input of the third CSP module passes through 1 CBM module to obtain a sixth output; the fifth output and the sixth output are concatenated as the output of the third CSP module; the 3 CBM modules in the third CSP module each contain 128 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 128 1 × 1 convolution kernels and 128 3 × 3 convolution kernels, respectively;
the input of the fourth CSP module passes through 1 CBM module, 8 residual units and 1 CBM module in sequence to obtain a seventh output; meanwhile, the input of the fourth CSP module passes through 1 CBM module to obtain an eighth output; the seventh output and the eighth output are concatenated as the output of the fourth CSP module; the 3 CBM modules in the fourth CSP module each contain 256 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 256 1 × 1 convolution kernels and 256 3 × 3 convolution kernels, respectively;
the input of the fifth CSP module passes through 1 CBM module, 4 residual units and 1 CBM module in sequence to obtain a ninth output; meanwhile, the input of the fifth CSP module passes through 1 CBM module to obtain a tenth output; the ninth output and the tenth output are concatenated as the output of the fifth CSP module; the 3 CBM modules in the fifth CSP module each contain 512 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 512 1 × 1 convolution kernels and 512 3 × 3 convolution kernels, respectively.
As a preferred embodiment of the present invention, the SPP module in the neck network SPP + PANnet in step 3-2 includes three parallel maximum pooling layers with pooling kernel sizes of 13 × 13, 9 × 9 and 5 × 5 and a moving step of 1; the output of the SPP module is the concatenation of the outputs of the three parallel maximum pooling layers with the input of the SPP module.
As a preferred embodiment of the present invention, the PANnet module in the neck network SPP + PANnet in step 3-2 includes four combination modules:
(1) combination module 1: composed of 3 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, 1 SPP module, 3 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, 1 upsampling layer, 1 connection layer and 5 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3, 1 × 1, 3 × 3 and 1 × 1;
(2) combination module 2: composed of 1 upsampling layer, a CBL module containing 128 1 × 1 convolution kernels and 1 connection layer;
(3) combination module 3: composed of 1 downsampling layer, a CBL module containing 256 3 × 3 convolution kernels and 1 connection layer;
(4) combination module 4: composed of 1 downsampling layer, a CBL module containing 512 3 × 3 convolution kernels and 1 connection layer;
each CBL module is formed by connecting a 2D convolution layer, a batch normalization layer and a Leaky_relu activation function layer in series;
the feature map scale output by combination module 2 is 76 × 76; the feature map scale output by combination module 3 is 38 × 38; the feature map scale output by combination module 4 is 19 × 19.
As a preferred solution of the present invention, the head network YOLO head in step 3-2 consists of CBL modules with 1 × 1 and 3 × 3 convolution kernels applied alternately three times, followed by one Conv2D convolution layer; the outputs of the head network are feature maps at the three scales 76 × 76, 38 × 38 and 19 × 19; each CBL module is formed by connecting a 2D convolution layer, a batch normalization layer and a Leaky_relu activation function layer in series.
As a preferred aspect of the present invention, the initial hyper-parameter setting of the ship black smoke recognition network model is as follows: the learning rate is 0.001, the learning rate attenuation rate is 0.0005, and the batch size batch _ size is 64.
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
1. the improved YOLO v4 network model is adopted, the difficulty of uneven black smoke sample sets can be overcome, the recognition precision is improved, the recognition speed is improved, the requirements of management departments on the accuracy and the real-time performance of black smoke detection are met, and the method is suitable for picture detection and video detection.
2. The hybrid mosaic data amplification method adopted by the invention can ensure that the network model can be stably and quickly trained in a single GPU environment, saves the computing resources and the training time of a target recognition algorithm, and has industrial popularization value.
Drawings
FIG. 1 is a flow chart of a ship black smoke identification method based on deep learning of the invention;
FIG. 2 is a diagram illustrating an exemplary hybrid mosaic amplification method according to the present invention;
FIG. 3 is a block diagram of various modules in a ship black smoke identification network constructed by the invention;
FIG. 4 is a diagram of the YOLO v4 network architecture according to the present invention;
fig. 5 is a graph showing the result of identifying ship black smoke, in which (a) and (b) are two different scenes, respectively.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As shown in fig. 1, a flowchart of a ship black smoke recognition method based on deep learning according to the present invention includes the following specific steps:
s1, constructing a data set:
s1-1, snapshot data set: ship pictures captured by a pan-tilt camera form the snapshot data set, which comprises ship pictures with black smoke and ship pictures without black smoke;
s1-2, amplified data set: because the number of black smoke samples is relatively small, a number of black-smoke-related pictures are crawled from picture websites by web crawler technology and manually screened to form the amplification data set;
the snapshot data set and the amplification data set jointly form the original data set, which contains not less than 2000 valid pictures stored under the JPEGImages folder directory; the resolution of each picture is not less than 608 × 608, and the pictures are numbered in sequence. The original data set of this embodiment has a total of 2505 valid pictures.
S2, preprocessing of the data set:
s2-1, image enhancement: enhancement transformations are applied to the original data set to increase the diversity of the training samples; the image enhancement modes adopted comprise random flipping, random cropping, random erasing, random brightness adjustment, random blurring and histogram equalization, where the purpose of histogram equalization is to enhance the contrast between different objects in an image and help the model learn the detail and color information of the target;
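A minimal sketch of this enhancement step, assuming OpenCV and NumPy are available; the function name and probabilities are illustrative, and random cropping/erasing are omitted for brevity:

```python
import random
import cv2
import numpy as np

def enhance(img: np.ndarray) -> np.ndarray:
    """Apply a random subset of the augmentations named above."""
    if random.random() < 0.5:  # random flipping (horizontal)
        img = cv2.flip(img, 1)
    if random.random() < 0.5:  # random brightness adjustment
        shift = random.randint(-40, 40)
        img = np.clip(img.astype(np.int16) + shift, 0, 255).astype(np.uint8)
    if random.random() < 0.5:  # random blurring
        img = cv2.GaussianBlur(img, (5, 5), 0)
    # histogram equalization on the luma channel to raise contrast
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```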
s2-2, image annotation: the enhanced data set is manually annotated with the labelImg image annotation tool, marking the positions of the ship and the black smoke in each picture and assigning the corresponding class labels; the generated xml annotation files are finally stored under the Annotations folder directory;
s2-3, the enhanced data set is amplified by the mixed mosaic (Mix Mosaic) amplification method, and the manual labeling frames are fused;
the mixed mosaic amplification (Mix Mosaic) method combines the traditional Mix up method with the newer Mosaic method, and mainly comprises the following steps: firstly, two pictures are randomly selected from the data set and superposed at a transparency of 0.5 to form a new picture set; then four pictures are randomly selected from the superposed picture set and spliced together into a new picture; finally, the new picture is scaled to 608 × 608 size and used as input data for training the network. At this point each input enhanced picture fuses the information of eight pictures of the original data set, increasing the amount of information in each round of the model's learning. An example of the hybrid mosaic amplification method is shown in fig. 2, and a code sketch is given below.
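A minimal sketch of the Mix Mosaic step under the same OpenCV/NumPy assumptions; the fusion of the labeling frames, which follows the same geometry, is omitted here:

```python
import random
import cv2
import numpy as np

def mix_up(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Superpose two pictures at 0.5 transparency (the Mix up half)."""
    b = cv2.resize(b, (a.shape[1], a.shape[0]))
    return cv2.addWeighted(a, 0.5, b, 0.5, 0.0)

def mix_mosaic(pool: list) -> np.ndarray:
    """Splice four superposed pictures into one 608 x 608 mosaic, so each
    output fuses information from eight pictures of the original set."""
    blended = [mix_up(*random.sample(pool, 2)) for _ in range(4)]
    tiles = [cv2.resize(p, (304, 304)) for p in blended]  # 608 / 2 per tile
    return np.vstack([np.hstack(tiles[:2]), np.hstack(tiles[2:])])
```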
S3, constructing a ship black smoke recognition model:
s3-1, configuring the running environment of the target recognition network YOLO v4, mainly comprising the Ubuntu operating system, Python 3.7, OpenCV, CUDA, cuDNN, and GeForce RTX 2070 hardware;
s3-2, the feature extraction backbone network CSPDarknet53, the neck network SPP + PANnet and the head network YOLO head are constructed to form the basic framework of the recognition model; the structure of each module is shown in fig. 3 and the constructed network framework in fig. 4. The recognition process is as follows: first, the backbone network CSPDarknet53 extracts high-dimensional features of the input image; the neck network SPP + PANnet then combines the features of the three scales extracted in the previous step from top to bottom and then from bottom to top, making full use of multi-scale features; finally, the head network YOLO head predicts from the combined multi-scale features and outputs results comprising the coordinates, classification categories and confidences of the prediction boxes;
the feature extraction backbone network CSPDarknet53 mainly comprises, connected in sequence: 1 CBM module containing 32 3 × 3 convolution kernels, 1 CBM module containing 64 3 × 3 convolution kernels, 1 CSP module containing 1 residual unit, 1 CBM module containing 128 3 × 3 convolution kernels, 1 CSP module containing 2 residual units, 1 CBM module containing 256 3 × 3 convolution kernels, 1 CSP module containing 8 residual units, 1 CBM module containing 512 3 × 3 convolution kernels, 1 CSP module containing 8 residual units, 1 CBM module containing 1024 3 × 3 convolution kernels, and 1 CSP module containing 4 residual units;
the CBM module is formed by connecting a 2D convolution layer (Conv2D), a batch normalization layer (BN) and a Mish activation function layer in series;
the residual unit consists of 2 CBM modules and an addition layer (Add) connected with the original input;
the CSP module is composed of 1 CBM module, residual units and 1 CBM module on its main path, and a connection layer (Concat) that joins this path with the original input passed through 1 CBM module, as sketched below.
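A hedged PyTorch sketch of the CBM module, residual unit and CSP module just described; stride and padding choices are assumptions, and the halved inner channels of the first stage's residual unit are simplified away:

```python
import torch
import torch.nn as nn

class CBM(nn.Module):
    """Conv2D -> batch normalization -> Mish, as the CBM module is defined."""
    def __init__(self, c_in, c_out, k, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, stride, k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.Mish(),
        )

    def forward(self, x):
        return self.block(x)

class ResUnit(nn.Module):
    """Two CBM modules (1x1 then 3x3) plus the additive shortcut (Add)."""
    def __init__(self, c, c_mid):
        super().__init__()
        self.body = nn.Sequential(CBM(c, c_mid, 1), CBM(c_mid, c, 3))

    def forward(self, x):
        return x + self.body(x)

class CSP(nn.Module):
    """CBM -> residual units -> CBM on the main path, one CBM on the
    shortcut, the two paths joined by concatenation (Concat)."""
    def __init__(self, c_in, c_hidden, n_res):
        super().__init__()
        self.main = nn.Sequential(
            CBM(c_in, c_hidden, 1),
            *[ResUnit(c_hidden, c_hidden) for _ in range(n_res)],
            CBM(c_hidden, c_hidden, 1),
        )
        self.shortcut = CBM(c_in, c_hidden, 1)

    def forward(self, x):
        return torch.cat([self.main(x), self.shortcut(x)], dim=1)
```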
The neck network SPP module mainly comprises three parallel maximum pooling layers with pooling kernel sizes of 13 × 13, 9 × 9 and 5 × 5 and a moving step of 1; the output of the SPP module is the concatenation of the outputs of the three parallel maximum pooling layers with the input of the module.
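A minimal PyTorch sketch of this SPP module; with stride 1 and matching padding the spatial size is preserved while the channel count quadruples:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Three parallel stride-1 max-pool layers concatenated with the input."""
    def __init__(self):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in (13, 9, 5)]
        )

    def forward(self, x):
        # spatial size is preserved; the channel count quadruples
        return torch.cat([p(x) for p in self.pools] + [x], dim=1)
```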
The neck network PANnet module mainly comprises four combination modules:
(1) combination module 1: composed of 3 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, 1 SPP module, 3 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, 1 upsampling layer, 1 connection layer and 5 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3, 1 × 1, 3 × 3 and 1 × 1;
(2) combination module 2: composed of 1 upsampling layer, 1 CBL module containing 128 1 × 1 convolution kernels and 1 connection layer;
(3) combination module 3: composed of 1 downsampling layer, a CBL module containing 256 3 × 3 convolution kernels and 1 connection layer;
(4) combination module 4: composed of 1 downsampling layer, a CBL module containing 512 3 × 3 convolution kernels and 1 connection layer;
the CBL module is formed by connecting a 2D convolution layer (Conv2D), a batch normalization layer (BN) and a Leaky_relu activation function layer in series;
the feature map scale output by combination module 2 is 76 × 76; the feature map scale output by combination module 3 is 38 × 38; the feature map scale output by combination module 4 is 19 × 19.
The head network YOLO head mainly consists of CBL modules with 1 × 1 and 3 × 3 convolution kernels applied alternately three times, followed by one Conv2D convolution layer; the outputs of the head network are feature maps at the three scales 76 × 76, 38 × 38 and 19 × 19.
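A hedged PyTorch sketch of the CBL block and one head branch; the output channel count 3 × (5 + classes) follows the usual YOLO convention and is an assumption here, as are the halved mid channels:

```python
import torch.nn as nn

class CBL(nn.Module):
    """Conv2D -> batch normalization -> Leaky_relu, as the CBL block is defined."""
    def __init__(self, c_in, c_out, k, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, stride, k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return self.block(x)

def yolo_head(c_in, num_classes=2, num_anchors=3):
    """1x1 and 3x3 CBL convolutions alternated three times, then one Conv2D."""
    layers = []
    for _ in range(3):
        layers += [CBL(c_in, c_in // 2, 1), CBL(c_in // 2, c_in, 3)]
    layers.append(nn.Conv2d(c_in, num_anchors * (5 + num_classes), 1))
    return nn.Sequential(*layers)
```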
S3-3, loading a pre-training weight file to obtain a pre-training target recognition network;
s3-4, the regression loss part in the loss function of the pre-training model is modified into the CIOU loss, and an attenuation coefficient is added to the classification loss part to turn it into a focal loss for unbalanced samples; the CIOU loss mainly improves the coordinate regression loss of the target detection frame, while the focal loss mainly improves the classification loss, increasing the relative loss value of difficult samples;
the improved loss function calculation formula is as follows:
$$
\begin{aligned}
Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[1-IoU\left(B_{ij},B_i\right)+\frac{\rho^2\left(A_{ctr},B_{ctr}\right)}{c^2}+\alpha\upsilon\right]\\
&-\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[\hat{C}_i\log C_i+\left(1-\hat{C}_i\right)\log\left(1-C_i\right)\right]\\
&-\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{noobj}\left[\hat{C}_i\log C_i+\left(1-\hat{C}_i\right)\log\left(1-C_i\right)\right]\\
&-\sum_{i=0}^{S^2}\sum_{c\in classes}\left(1-p_i(c)\right)^{\gamma}\left[\hat{p}_i(c)\log p_i(c)+\left(1-\hat{p}_i(c)\right)\log\left(1-p_i(c)\right)\right]
\end{aligned}
$$

in the formula, the first row is the CIOU loss between the regression box and the real box (ground truth box); the second and third rows are the confidence losses with and without a detected object, respectively; and the fourth row is the focal classification loss of each category;

in the detection process, the algorithm first segments the input image into $S^2$ detection grids, each grid generating $B$ possible prediction boxes; in the regression loss, $\lambda_{coord}$ represents the weight occupied by the regression loss part; the indicator $I_{ij}^{obj}$ takes the value 1 if the $j$th prediction box of the $i$th grid is responsible for some detection object obj, and 0 otherwise; $IoU$ is the ratio of the intersection to the union of the $j$th prediction box $B_{ij}$ of the $i$th grid and the real box $B_i$ existing in the $i$th grid; $\rho\left(A_{ctr},B_{ctr}\right)$ is the Euclidean distance between the center point $A_{ctr}$ of the $j$th prediction box of the $i$th grid and the center point $B_{ctr}$ of the real box existing in the $i$th grid; $c$ is the diagonal length of the minimum bounding box formed by the real box contained in the $i$th grid and the $j$th prediction box of the $i$th grid; $\upsilon$ is a parameter used to measure the aspect ratio, and $\alpha$ can be regarded as the weight coefficient of $\upsilon$; the specific calculation formulas are as follows:

$$\upsilon=\frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}}-\arctan\frac{w}{h}\right)^{2},\qquad \alpha=\frac{\upsilon}{1-IoU+\upsilon}$$

wherein $w^{gt}$ and $h^{gt}$ respectively represent the width and height of the real box, and $w$ and $h$ respectively represent the width and height of the prediction box;

the confidence loss adopts a binomial cross-entropy loss function, where $C_i$ and $\hat{C}_i$ respectively represent the true and predicted confidence of the detected object, $\lambda_{noobj}$ is the weight of the confidence loss when no detected target is contained, and the indicator $I_{ij}^{noobj}$ takes the value 1 if the $j$th prediction box of the $i$th grid is not responsible for any detection object obj, and 0 otherwise;

the classification loss adopts a binomial cross-entropy loss function; assuming the detection task has classes categories in total, $p_i(c)$ is the probability that the $i$th grid belongs to the $c$th category and $\gamma$ is an attenuation coefficient greater than 1. When the prediction probability is high enough, the classification loss generated is strongly attenuated, while the classification loss generated by samples with a small prediction probability is hardly attenuated at all; this increases the relative loss value of difficult samples and thus their weight in each round of loss calculation, and in this way the problem of imbalance between the classes of the training data set is addressed.
In this embodiment, classes = 2 and the hyper-parameter γ = 2, so that the loss values generated by easily classified samples are attenuated, the relative loss values of difficult samples are increased, and the weight of difficult samples in each round of loss calculation grows accordingly; the hyper-parameters are set as λ_coord = 5 and λ_noobj = 0.5 to balance the weight of each partial loss in the total loss function value.
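A hedged PyTorch sketch of the two modified loss terms (CIOU regression loss and focal classification loss); the (x_center, y_center, w, h) box encoding, tensor layouts and epsilon guards are assumptions, not taken from the patent:

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Per-box CIOU loss: 1 - IoU + rho^2 / c^2 + alpha * v."""
    # corners of both (x_center, y_center, w, h) boxes
    px1, py1 = pred[..., 0] - pred[..., 2] / 2, pred[..., 1] - pred[..., 3] / 2
    px2, py2 = pred[..., 0] + pred[..., 2] / 2, pred[..., 1] + pred[..., 3] / 2
    tx1, ty1 = target[..., 0] - target[..., 2] / 2, target[..., 1] - target[..., 3] / 2
    tx2, ty2 = target[..., 0] + target[..., 2] / 2, target[..., 1] + target[..., 3] / 2
    inter = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0) * \
            (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    union = pred[..., 2] * pred[..., 3] + target[..., 2] * target[..., 3] - inter
    iou = inter / (union + eps)
    # squared centre distance over squared diagonal of the enclosing box
    rho2 = (pred[..., 0] - target[..., 0]) ** 2 + (pred[..., 1] - target[..., 1]) ** 2
    c2 = (torch.max(px2, tx2) - torch.min(px1, tx1)) ** 2 + \
         (torch.max(py2, ty2) - torch.min(py1, ty1)) ** 2 + eps
    # aspect-ratio term v and its weight alpha
    v = (4 / math.pi ** 2) * (torch.atan(target[..., 2] / (target[..., 3] + eps))
                              - torch.atan(pred[..., 2] / (pred[..., 3] + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

def focal_class_loss(p, p_hat, gamma=2.0, eps=1e-7):
    """Binomial cross entropy attenuated for well-classified (easy) samples."""
    bce = -(p_hat * torch.log(p + eps) + (1 - p_hat) * torch.log(1 - p + eps))
    weight = p_hat * (1 - p) ** gamma + (1 - p_hat) * p ** gamma
    return weight * bce
```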
S3-5, the amplified data set is input into the improved network, with the network hyper-parameters initialized as follows: learning rate 0.001, learning rate attenuation rate 0.0005, batch size batch_size 64; training uses a random mini-batch gradient descent algorithm, updating the weights of the neurons in each layer of the network, and stops when the loss function value no longer decreases, yielding the trained ship black smoke recognition network model.
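A minimal training-loop sketch under the stated hyper-parameters; `model`, `yolo_loss` and `train_set` are assumed placeholders, and the "attenuation rate" of 0.0005 is interpreted here as weight decay:

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_set, yolo_loss, epochs=100, device="cuda"):
    loader = DataLoader(train_set, batch_size=64, shuffle=True)  # random mini-batches
    opt = torch.optim.SGD(model.parameters(), lr=0.001, weight_decay=0.0005)
    model.to(device).train()
    for _ in range(epochs):
        for imgs, targets in loader:
            opt.zero_grad()
            loss = yolo_loss(model(imgs.to(device)), targets.to(device))
            loss.backward()
            opt.step()  # update the weights of each layer's neurons
    # in practice training stops once the loss value no longer decreases
    return model
```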
S4, real-time monitoring:
and the real-time picture or video captured by the cloud camera is input into the trained ship black smoke recognition network model, the detection result is output, and whether regulations are violated is judged, thus achieving the purpose of real-time monitoring.
A picture to be detected, or a real-time picture/video captured by the cloud camera, is input into the trained ship black smoke recognition network model and the detection result is output; (a) and (b) in fig. 5 show the black smoke ship recognition results for two different scenes. The model overcomes the problem of scarce sample diversity and can effectively identify ship black smoke in the photos.
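A hedged sketch of such a monitoring loop; `detect` stands in for the trained model's inference call, and the class label `smoke` and confidence threshold are illustrative:

```python
import cv2

def monitor(stream_url, model, detect, conf_thres=0.5):
    cap = cv2.VideoCapture(stream_url)  # cloud / pan-tilt camera stream
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, (608, 608))
        # detect() is assumed to return (x1, y1, x2, y2, label, confidence) tuples
        for x1, y1, x2, y2, label, conf in detect(model, frame):
            if label == "smoke" and conf >= conf_thres:
                print("possible illegal black-smoke emission detected")
    cap.release()
```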
In summary, the ship black smoke recognition method based on deep learning provided by the invention can overcome the difficulty of unbalanced black smoke ship training samples, and by locally deploying the running environment of the black smoke recognition model it can detect real-time snapshot pictures transmitted from the cloud, thereby helping the relevant management departments reduce manpower and improve the automation level of monitoring ships in violation.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.

Claims (10)

1. A ship black smoke identification method based on deep learning is characterized by comprising the following steps:
step 1, acquiring an original data set, wherein the original data set contains not less than 2000 valid pictures, the resolution of each picture is not less than 608 × 608, and the pictures are numbered in sequence;
step 2, preprocessing the original data set to obtain a preprocessed data set;
step 3, constructing a ship black smoke recognition network model, and training the ship black smoke recognition network model by using the preprocessed data set to obtain a trained ship black smoke recognition network model;
and step 4, inputting pictures or videos captured in real time into the trained ship black smoke recognition network model, outputting the detection result, and judging from the detection result whether a ship is illegally discharging black smoke.
2. The deep learning-based ship black smoke identification method according to claim 1, wherein the specific process of obtaining the original data set in step 1 is as follows:
step 1-1, capturing ship pictures with a pan-tilt camera to form a snapshot data set, wherein the snapshot data set comprises ship pictures with black smoke and ship pictures without black smoke;
step 1-2, crawling black smoke related pictures from a picture website by using a web crawler technology, and manually screening to form an amplification data set;
the snapshot data set and the amplification data set together constitute the original data set.
3. The deep learning-based ship black smoke identification method according to claim 1, wherein the specific process of the step 2 is as follows:
step 2-1, performing enhancement transformation on the original data set by image enhancement to obtain an enhanced data set, wherein the image enhancement modes comprise random flipping, random cropping, random erasing, random brightness adjustment, random blurring and histogram equalization;
step 2-2, manually annotating the enhanced data set with the labelImg image annotation tool, marking the positions of the ship and the black smoke in each picture with bounding boxes and assigning the corresponding class labels;
and 2-3, amplifying the enhanced data set by using a mixed mosaic amplification method, and fusing manual labeling frames to obtain a preprocessed data set.
4. The deep learning-based ship black smoke identification method according to claim 3, wherein the specific process of the step 2-3 is as follows:
firstly, randomly selecting two pictures from the enhanced data set and superposing them at a transparency of 0.5 to form a new picture set, wherein the number of pictures in the new picture set is

$$C_n^2=\frac{n(n-1)}{2}$$

where n is the number of pictures in the enhanced data set; then four pictures are randomly selected from the new picture set and spliced together to synthesize a new picture, which is scaled to 608 × 608 size; finally the manual labeling frames are fused to obtain the preprocessed data set, wherein the number of pictures in the preprocessed data set is

$$C_{n(n-1)/2}^4$$
5. The deep learning-based ship black smoke identification method according to claim 1, wherein the specific process of the step 3 is as follows:
step 3-1, configuring the running environment of the ship black smoke recognition network model YOLO v4;
step 3-2, building the ship black smoke recognition network model YOLO v4, wherein the model comprises a feature extraction backbone network CSPDarknet53, a neck network SPP + PANnet and a head network YOLO head;
step 3-3, loading a pre-training weight file into YOLO v4 to obtain a pre-trained ship black smoke recognition network model;
step 3-4, modifying the regression loss part in the loss function of the pre-trained ship black smoke recognition network model into the CIOU loss, and adding an attenuation coefficient to the classification loss part to turn it into a focal classification loss aimed at unbalanced samples, obtaining the improved ship black smoke recognition network model, wherein the improved loss function is calculated as follows:
$$
\begin{aligned}
Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[1-IoU\left(B_{ij},B_i\right)+\frac{\rho^2\left(A_{ctr},B_{ctr}\right)}{c^2}+\alpha\upsilon\right]\\
&-\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[\hat{C}_i\log C_i+\left(1-\hat{C}_i\right)\log\left(1-C_i\right)\right]\\
&-\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{noobj}\left[\hat{C}_i\log C_i+\left(1-\hat{C}_i\right)\log\left(1-C_i\right)\right]\\
&-\sum_{i=0}^{S^2}\sum_{c\in classes}\left(1-p_i(c)\right)^{\gamma}\left[\hat{p}_i(c)\log p_i(c)+\left(1-\hat{p}_i(c)\right)\log\left(1-p_i(c)\right)\right]
\end{aligned}
$$

in the formula, the first row is the CIOU loss between the regression box and the real box (ground truth box); the second and third rows are the confidence losses with and without a detected object, respectively; and the fourth row is the focal classification loss of each category;

$S^2$ indicates that the input picture is split into $S^2$ detection grids, each grid generating $B$ possible prediction boxes, and $\lambda_{coord}$ represents the weight occupied by the CIOU loss part; the indicator $I_{ij}^{obj}$ takes the value 1 if the $j$th prediction box of the $i$th grid is responsible for some detection object obj, and 0 otherwise; $IoU$ denotes the ratio of the intersection to the union of the $j$th prediction box $B_{ij}$ of the $i$th grid and the real box $B_i$ existing in the $i$th grid; $\rho\left(A_{ctr},B_{ctr}\right)$ denotes the Euclidean distance between the center point $A_{ctr}$ of the $j$th prediction box of the $i$th grid and the center point $B_{ctr}$ of the real box existing in the $i$th grid; $c$ is the diagonal length of the minimum bounding box formed by the real box contained in the $i$th grid and the $j$th prediction box of the $i$th grid; $\upsilon$ represents a parameter used to measure the aspect ratio, and $\alpha$ represents the weight coefficient of $\upsilon$; the specific calculation formulas are as follows:

$$\upsilon=\frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}}-\arctan\frac{w}{h}\right)^{2},\qquad \alpha=\frac{\upsilon}{1-IoU+\upsilon}$$

wherein $w^{gt}$ and $h^{gt}$ respectively represent the width and height of the real box, and $w$ and $h$ respectively represent the width and height of the prediction box;

the confidence loss adopts a binomial cross-entropy loss function, where $C_i$ and $\hat{C}_i$ respectively represent the true and predicted confidence of the detected object, $\lambda_{noobj}$ is the weight of the confidence loss when no detected target is contained, and the indicator $I_{ij}^{noobj}$ takes the value 1 if the $j$th prediction box of the $i$th grid is not responsible for any detection object obj, and 0 otherwise;

the classification loss adopts a binomial cross-entropy loss function, where classes represents the number of classes of the detection task, $p_i(c)$ is the probability that the $i$th grid belongs to the $c$th category, and $\gamma$ is an attenuation coefficient greater than 1;
and step 3-5, inputting the preprocessed data set into the improved ship black smoke recognition network model, training with a random mini-batch gradient descent algorithm, updating the weights of the neurons in each layer of the network, and stopping when the loss function value no longer decreases, obtaining the trained ship black smoke recognition network model.
6. The deep learning-based ship black smoke identification method according to claim 5, wherein the feature extraction backbone network CSPDarknet53 in step 3-2 includes the following modules connected in sequence: a first CBM module, a second CBM module, a first CSP module, a third CBM module, a second CSP module, a fourth CBM module, a third CSP module, a fifth CBM module, a fourth CSP module, a sixth CBM module and a fifth CSP module;
each CBM module is formed by connecting a 2D convolution layer, a batch normalization layer and a Mish activation function layer in series; the first to sixth CBM modules respectively contain 32 3 × 3 convolution kernels, 64 3 × 3 convolution kernels, 128 3 × 3 convolution kernels, 256 3 × 3 convolution kernels, 512 3 × 3 convolution kernels and 1024 3 × 3 convolution kernels;
the input of the first CSP module passes through 1 CBM module, 1 residual unit and 1 CBM module in sequence to obtain a first output; meanwhile, the input of the first CSP module passes through 1 CBM module to obtain a second output; the first output and the second output are concatenated as the output of the first CSP module; the 3 CBM modules in the first CSP module each contain 64 1 × 1 convolution kernels; the input of the residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in the residual unit contain 32 1 × 1 convolution kernels and 64 3 × 3 convolution kernels, respectively;
the input of the second CSP module passes through 1 CBM module, 2 residual units and 1 CBM module in sequence to obtain a third output; meanwhile, the input of the second CSP module passes through 1 CBM module to obtain a fourth output; the third output and the fourth output are concatenated as the output of the second CSP module; the 3 CBM modules in the second CSP module each contain 64 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 64 1 × 1 convolution kernels and 64 3 × 3 convolution kernels, respectively;
the input of the third CSP module passes through 1 CBM module, 8 residual units and 1 CBM module in sequence to obtain a fifth output; meanwhile, the input of the third CSP module passes through 1 CBM module to obtain a sixth output; the fifth output and the sixth output are concatenated as the output of the third CSP module; the 3 CBM modules in the third CSP module each contain 128 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 128 1 × 1 convolution kernels and 128 3 × 3 convolution kernels, respectively;
the input of the fourth CSP module passes through 1 CBM module, 8 residual units and 1 CBM module in sequence to obtain a seventh output; meanwhile, the input of the fourth CSP module passes through 1 CBM module to obtain an eighth output; the seventh output and the eighth output are concatenated as the output of the fourth CSP module; the 3 CBM modules in the fourth CSP module each contain 256 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 256 1 × 1 convolution kernels and 256 3 × 3 convolution kernels, respectively;
the input of the fifth CSP module passes through 1 CBM module, 4 residual units and 1 CBM module in sequence to obtain a ninth output; meanwhile, the input of the fifth CSP module passes through 1 CBM module to obtain a tenth output; the ninth output and the tenth output are concatenated as the output of the fifth CSP module; the 3 CBM modules in the fifth CSP module each contain 512 1 × 1 convolution kernels; the input of each residual unit passes through 2 CBM modules in sequence and then enters the addition layer together with the original input of the residual unit; the 2 CBM modules in each residual unit contain 512 1 × 1 convolution kernels and 512 3 × 3 convolution kernels, respectively.
7. The deep learning-based ship black smoke identification method according to claim 5, wherein the SPP module in the neck network SPP + PANnet in step 3-2 includes three parallel maximum pooling layers with pooling kernel sizes of 13 × 13, 9 × 9 and 5 × 5 and a moving step of 1, and the output of the SPP module is the concatenation of the outputs of the three parallel maximum pooling layers with the input of the SPP module.
8. The deep learning-based ship black smoke identification method according to claim 7, wherein the PANnet module in the neck network SPP + PANnet in step 3-2 comprises four combination modules:
(1) combination module 1: composed of 3 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, 1 SPP module, 3 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, 1 upsampling layer, 1 connection layer and 5 CBL modules with convolution kernel sizes of 1 × 1, 3 × 3, 1 × 1, 3 × 3 and 1 × 1;
(2) combination module 2: composed of 1 upsampling layer, a CBL module containing 128 1 × 1 convolution kernels and 1 connection layer;
(3) combination module 3: composed of 1 downsampling layer, a CBL module containing 256 3 × 3 convolution kernels and 1 connection layer;
(4) combination module 4: composed of 1 downsampling layer, a CBL module containing 512 3 × 3 convolution kernels and 1 connection layer;
each CBL module is formed by connecting a 2D convolution layer, a batch normalization layer and a Leaky_relu activation function layer in series;
the feature map scale output by combination module 2 is 76 × 76; the feature map scale output by combination module 3 is 38 × 38; the feature map scale output by combination module 4 is 19 × 19.
9. The deep learning-based ship black smoke identification method according to claim 5, wherein the head network YOLO head in step 3-2 consists of CBL modules with 1 × 1 and 3 × 3 convolution kernels applied alternately three times, followed by one Conv2D convolution layer, and the outputs of the head network are feature maps at the three scales 76 × 76, 38 × 38 and 19 × 19; the CBL module is formed by connecting a 2D convolution layer, a batch normalization layer and a Leaky_relu activation function layer in series.
10. The deep learning-based ship black smoke identification method according to claim 5, wherein the initial hyper-parameters of the ship black smoke identification network model are set as follows: the learning rate is 0.001, the learning rate attenuation rate is 0.0005, and the batch size batch _ size is 64.
CN202111441778.2A 2021-11-30 2021-11-30 Ship black smoke identification method based on deep learning Pending CN114241189A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111441778.2A CN114241189A (en) 2021-11-30 2021-11-30 Ship black smoke identification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111441778.2A CN114241189A (en) 2021-11-30 2021-11-30 Ship black smoke identification method based on deep learning

Publications (1)

Publication Number Publication Date
CN114241189A true CN114241189A (en) 2022-03-25

Family ID: 80752122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111441778.2A Pending CN114241189A (en) 2021-11-30 2021-11-30 Ship black smoke identification method based on deep learning

Country Status (1)

Country Link
CN (1) CN114241189A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019048604A1 (en) * 2017-09-09 2019-03-14 Fcm Dienstleistungs Ag Automatic early detection of smoke, soot and fire with increased detection reliability using machine learning
CN112464883A (en) * 2020-12-11 2021-03-09 武汉工程大学 Automatic detection and identification method and system for ship target in natural scene
CN112800838A (en) * 2020-12-28 2021-05-14 浙江万里学院 Channel ship detection and identification method based on deep learning
CN112884090A (en) * 2021-04-14 2021-06-01 安徽理工大学 Fire detection and identification method based on improved YOLOv3

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165602A (en) * 2018-08-27 2019-01-08 成都华安视讯科技有限公司 A kind of black smoke vehicle detection method based on video analysis


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination