CN114241189B - Ship black smoke recognition method based on deep learning - Google Patents


Info

Publication number
CN114241189B
Authority
CN
China
Prior art keywords
module
cbm
black smoke
output
convolution kernels
Prior art date
Legal status
Active
Application number
CN202111441778.2A
Other languages
Chinese (zh)
Other versions
CN114241189A (en)
Inventor
胡里阳
叶智锐
邵宜昌
王超
张耀玉
吴浩
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN202111441778.2A
Publication of CN114241189A
Application granted
Publication of CN114241189B
Legal status: Active

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a ship black smoke identification method based on deep learning, which comprises the following steps: S1, constructing a data set; S2, preprocessing the data set; S3, constructing a ship black smoke identification model; S4, monitoring in real time. The method is based on an improved YOLO v4 network model and increases the loss weight of difficult samples by purposefully modifying the loss function, so that it can overcome the recognition difficulty caused by the imbalance of the black smoke sample set. The method maintains a high recognition speed on top of high recognition precision, can meet the requirements of the relevant management departments on the accuracy and real-time performance of ship black smoke recognition, and is suitable for both picture detection and video detection. In addition, the mixed mosaic data amplification method provided by the invention ensures that the network model trains stably and at high speed in a single-GPU environment, saving the computing resources and training time of the target recognition algorithm, and has considerable industrial production and popularization value.

Description

Ship black smoke recognition method based on deep learning
Technical Field
The invention relates to a ship black smoke recognition method based on deep learning, and belongs to the technical field of ship black smoke recognition.
Background
Black smoke from a ship is caused by insufficient combustion of the diesel oil injected while the diesel engine is working. It increases machine wear to a certain extent and shortens the service life of the diesel engine, and the large amount of carbon deposit generated at the same time seriously affects offshore air quality. From the standpoint of energy conservation and environmental protection, it is therefore necessary to identify ships emitting black smoke accurately and in real time. Current target recognition methods mainly comprise traditional pixel-level image processing methods and emerging deep learning methods. Taking the ViBe algorithm as an example, a traditional method can only detect moving objects but cannot determine what the moving objects actually are; in addition, its detection accuracy differs greatly from that of deep learning methods, so most industrial target recognition tasks adopt deep-learning-based algorithms. Deep-learning-based target recognition algorithms fall into two major classes. The first is the two-stage class, which first generates a number of candidate regions and then corrects and classifies each candidate region; typical algorithms are mainly the R-CNN series, such as Fast R-CNN and Faster R-CNN. The other is the one-stage class, which completes the whole recognition pipeline by feeding the picture into the model only once; typical algorithms are the YOLO series (v1, v2, v3, v4), SSD, and the like.
Generally, the recognition speed of one-stage algorithms is faster than that of two-stage algorithms, but their recognition accuracy is inferior; at present, few algorithms strike a good balance between recognition accuracy and recognition speed. The YOLO v4 network model proposed by Alexey Bochkovskiy in 2020 guarantees accuracy while offering excellent recognition speed and can achieve real-time performance, but the network was designed mainly to recognize general categories in daily life and has difficulty with the special category of black smoke. In addition, recognition methods based on deep learning need a large amount of training data for support and place high demands on sample quantity and diversity, whereas black smoke samples are relatively scarce and insufficiently diverse, and existing methods face many technical problems in recognizing such an unbalanced sample set.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a ship black smoke recognition method based on deep learning that can overcome the difficulty of unbalanced black smoke ship training samples, solve the problem of automatic detection of ships in violation, and achieve real-time and accurate recognition of ships emitting black smoke, thereby helping the relevant management departments to reduce manpower, improve law enforcement efficiency, and respond to the country's call for air pollution prevention and control.
The invention adopts the following technical scheme for solving the technical problems:
A ship black smoke identification method based on deep learning comprises the following steps:
step 1, acquiring an original data set, wherein the original data set contains no fewer than 2000 valid pictures, the resolution of each picture is not lower than 608×608, and the pictures are numbered in sequence;
step 2, preprocessing an original data set to obtain a preprocessed data set;
step 3, constructing a ship black smoke recognition network model, and training the ship black smoke recognition network model by using the preprocessed data set to obtain a trained ship black smoke recognition network model;
And 4, inputting pictures or videos captured in real time into the trained ship black smoke recognition network model, outputting the detection result, and judging from the detection result whether the ship is discharging black smoke, namely whether the ship is in violation.
As a preferred embodiment of the present invention, the specific process of acquiring the original data set in step 1 is as follows:
step 1-1, using a pan-tilt camera to snap ship pictures to form a snapshot data set, wherein the snapshot data set comprises ship pictures with black smoke and ship pictures without black smoke;
Step 1-2, crawling black smoke related pictures from a picture website by using a web crawler technology, and performing manual screening to form an amplification data set;
The snapshot dataset and the augmentation dataset together comprise the original dataset.
As a preferable scheme of the invention, the specific process of the step 2 is as follows:
step 2-1, performing enhancement transformation on the original data set by using image enhancement modes to obtain an enhanced data set, wherein the image enhancement modes comprise random flipping, random cropping, random erasure, random brightness adjustment, random blurring and histogram equalization;
step 2-2, manually annotating the enhanced data set by using the labelImg image annotation tool, marking the positions of the ship and the black smoke in each picture with annotation frames, labeled ship and smoke respectively;
And 2-3, amplifying the enhanced data set by using a mixed mosaic amplification method, and fusing the artificial marking frames to obtain a preprocessed data set.
As a preferable scheme of the invention, the specific process of the step 2-3 is as follows:
Firstly, two pictures are randomly selected from the enhanced data set and overlapped at transparency 0.5 to form a new picture set; the number of pictures in the new picture set is $\binom{N}{2}=\frac{N(N-1)}{2}$, where N is the number of pictures in the enhanced data set. Four pictures are then randomly selected from the new picture set and spliced together to form a new picture, and the new picture is scaled to 608×608. Finally, the artificial labeling frames are fused to obtain the preprocessed data set, in which the number of obtainable pictures is $\binom{N(N-1)/2}{4}$.
As a preferable scheme of the invention, the specific process of the step 3 is as follows:
step 3-1, configuring a ship black smoke identification network model YOLO v4 operation environment;
Step 3-2, building the ship black smoke identification network model YOLO v4, which comprises a feature extraction backbone network CSPDarknet, a neck network SPP+PANet and a head network YOLO head;
Step 3-3, loading a pre-training weight file into the YOLO v4 to obtain a pre-training ship black smoke recognition network model;
And 3-4, modifying the regression loss part in the loss function of the pre-trained ship black smoke identification network model into the CIoU loss, and adding an attenuation coefficient to the classification loss part so that it becomes a focal classification loss aimed at unbalanced samples, obtaining an improved ship black smoke identification network model; the improved loss function is calculated by the following formula:

$$
\begin{aligned}
Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[1-IoU\left(A_i^j,B_i\right)+\frac{\rho^2\left(A_{ctr},B_{ctr}\right)}{c^2}+\alpha v\right]\\
&-\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[C_i\log\hat{C}_i+\left(1-C_i\right)\log\left(1-\hat{C}_i\right)\right]\\
&-\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left[C_i\log\hat{C}_i+\left(1-C_i\right)\log\left(1-\hat{C}_i\right)\right]\\
&-\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left[\hat{p}_i(c)\left(1-p_i(c)\right)^{\gamma}\log p_i(c)+\left(1-\hat{p}_i(c)\right)p_i(c)^{\gamma}\log\left(1-p_i(c)\right)\right]
\end{aligned}
$$

In the formula, the first row represents the CIoU loss between the regression box and the real box (ground truth box); the second and third rows represent the confidence loss when a detection object is present and when it is absent, respectively; and the fourth row represents the focal classification loss of each category;

$S^2$ denotes that the input picture is divided into $S^2$ detection grids, each generating $B$ possible prediction frames; $\lambda_{coord}$ denotes the weight occupied by the CIoU loss part; $\mathbb{1}_{ij}^{obj}$ is an indicator variable whose value is 1 if the $j$-th prediction frame of the $i$-th grid is responsible for some detection object obj, and 0 otherwise; $IoU\left(A_i^j,B_i\right)$ denotes the intersection-over-union ratio between the $j$-th prediction frame $A_i^j$ of the $i$-th grid and the real frame $B_i$ present in the $i$-th grid; $\rho\left(A_{ctr},B_{ctr}\right)$ denotes the Euclidean distance between the centre point coordinate $A_{ctr}$ of the $j$-th prediction frame of the $i$-th grid and the centre point coordinate $B_{ctr}$ of the real frame present in the $i$-th grid; $c$ is the diagonal length of the minimum bounding box enclosing the real frame contained in the $i$-th grid and the $j$-th prediction frame of the $i$-th grid; $v$ is a parameter used to measure aspect-ratio consistency; $\alpha$ is the weight coefficient of $v$, with the specific calculation formula:

$$
v=\frac{4}{\pi^{2}}\left(\arctan\frac{w_{gt}}{h_{gt}}-\arctan\frac{w}{h}\right)^{2},\qquad
\alpha=\frac{v}{\left(1-IoU\right)+v}
$$

wherein $w_{gt}$ and $h_{gt}$ denote the width and height of the real frame, and $w$ and $h$ denote the width and height of the prediction frame, respectively;

the confidence loss uses a binary cross-entropy loss function, with $C_i$ and $\hat{C}_i$ denoting the real and predicted confidence of the detection target respectively, and $\lambda_{noobj}$ being the weight of the confidence loss when no detection target is contained; $\mathbb{1}_{ij}^{noobj}$ is an indicator variable whose value is 1 if the $j$-th prediction frame of the $i$-th grid is not responsible for any detection object obj, and 0 otherwise;

the classification loss also uses a binary cross-entropy loss function, where classes denotes the number of categories of the detection task, $p_i(c)$ is the predicted probability that the $i$-th grid belongs to category $c$, $\hat{p}_i(c)$ is the corresponding 0/1 ground-truth indicator, and $\gamma$ is an attenuation coefficient larger than 1;
And 3-5, inputting the preprocessed data set into the improved ship black smoke recognition network model, training with a stochastic mini-batch gradient descent algorithm, updating the weights of each layer of neurons of the network, and stopping when the loss function value no longer decreases, so as to obtain the trained ship black smoke recognition network model.
As a preferred embodiment of the present invention, the feature extraction backbone network CSPDarknet in step 3-2 includes the following modules connected in sequence: a first CBM module, a second CBM module, a first CSP module, a third CBM module, a second CSP module, a fourth CBM module, a third CSP module, a fifth CBM module, a fourth CSP module, a sixth CBM module, and a fifth CSP module;
Each CBM module is formed by connecting a 2D convolution layer, a batch normalization layer and a Mish activation function layer in series; and the first through sixth CBM modules respectively comprise 32 3×3 convolution kernels, 64 3×3 convolution kernels, 128 3×3 convolution kernels, 256 3×3 convolution kernels, 512 3×3 convolution kernels, and 1024 3×3 convolution kernels;
the input of the first CSP module sequentially passes through 1 CBM module, 1 residual unit and 1 CBM module to obtain a first output, and meanwhile, the input of the first CSP module passes through 1 CBM module to obtain a second output, and the first output and the second output are connected to be used as the output of the first CSP module; the 3 CBM modules in the first CSP module all contain 64 1×1 convolution kernels, and the input of the residual unit sequentially passes through the 2 CBM modules and then enters an addition layer together with the input of the residual unit, and the convolution kernels contained by the 2 CBM modules in the residual unit are respectively: 32 1 x1 convolution kernels, 64 3 x 3 convolution kernels;
the input of the second CSP module sequentially passes through the 1 CBM module, the 2 residual error units and the 1 CBM module to obtain a third output, and meanwhile, the input of the second CSP module passes through the 1 CBM module to obtain a fourth output, and the third output and the fourth output are connected to be used as the output of the second CSP module; the 3 CBM modules in the second CSP module each comprise 64 1×1 convolution kernels, and the inputs of each residual unit enter the addition layer together with the inputs of the residual units after passing through the 2 CBM modules in sequence, and the convolution kernels contained in the 2 CBM modules in the residual units are respectively: 64 1 x 1 convolution kernels, 64 3 x 3 convolution kernels;
The input of the third CSP module sequentially passes through the 1 CBM module, the 8 residual error units and the 1 CBM module to obtain a fifth output, and meanwhile, the input of the third CSP module passes through the 1 CBM module to obtain a sixth output, and the fifth output and the sixth output are connected to be used as the output of the third CSP module; the 3 CBM modules in the third CSP module each include 128 1×1 convolution kernels, and the inputs of each residual unit sequentially pass through the 2 CBM modules and then enter the addition layer together with the inputs of the residual units, where the convolution kernels included in the 2 CBM modules in the residual units are respectively: 128 1 x1 convolution kernels, 128 3 x3 convolution kernels;
the input of the fourth CSP module sequentially passes through the 1 CBM module, the 8 residual error units and the 1 CBM module to obtain a seventh output, and meanwhile, the input of the fourth CSP module passes through the 1 CBM module to obtain an eighth output, and the seventh output and the eighth output are connected to be used as the output of the fourth CSP module; the 3 CBM modules in the fourth CSP module each include 256 1×1 convolution kernels, and the inputs of each residual unit sequentially pass through the 2 CBM modules and then enter the addition layer together with the inputs of the residual units, where the convolution kernels included in the 2 CBM modules in the residual units are respectively: 256 1 x1 convolution kernels, 256 3 x3 convolution kernels;
the input of the fifth CSP module sequentially passes through the 1 CBM module, the 4 residual error units and the 1 CBM module to obtain a ninth output, and meanwhile, the input of the fifth CSP module passes through the 1 CBM module to obtain a tenth output, and the ninth output and the tenth output are connected to be used as the output of the fifth CSP module; the 3 CBM modules in the fifth CSP module each include 512 1×1 convolution kernels, and the input of each residual unit sequentially passes through the 2 CBM modules and then enters the addition layer together with the input of the residual unit, where the convolution kernels included in the 2 CBM modules in the residual unit are respectively: 512 1 x1 convolution kernels, 512 3 x3 convolution kernels.
As a preferred scheme of the invention, the SPP module in the neck network SPP+PANet in step 3-2 comprises three parallel maximum pooling layers with pooling kernel sizes of 13×13, 9×9 and 5×5 respectively and a moving step length of 1, and the output of the SPP module is the concatenation of the outputs of the three parallel maximum pooling layers with the input of the SPP module.
As a preferred embodiment of the present invention, the PANet module in the neck network SPP+PANet in step 3-2 includes four combination modules:
(1) Combination module 1: consists of 3 CBL modules with convolution kernel sizes 1×1, 3×3 and 1×1; 1 SPP module; 3 CBL modules with convolution kernel sizes 1×1, 3×3 and 1×1; 1 upsampling layer; 1 connection layer; and 5 CBL modules with convolution kernel sizes 1×1, 3×3, 1×1, 3×3 and 1×1;
(2) Combining module 2: consists of 1 up-sampling layer, CBL module containing 128 1×1 convolution kernels, 1 connection layer;
(3) Combining module 3: consists of 1 downsampling layer, a CBL module containing 256 3×3 convolution kernels, and 1 connecting layer;
(4) Combining module 4: consists of 1 downsampling layer, a CBL module containing 512 3×3 convolution kernels, 1 connecting layer;
the CBL module is formed by connecting a 2D convolution layer, a batch normalization layer and a Leaky ReLU activation function layer in series;
The feature map output by the combination module 2 has a scale of 76×76; the feature map output by the combination module 3 has a scale of 38×38; the feature map output by the combination module 4 has a dimension of 19×19.
As a preferred scheme of the present invention, the head network YOLO head in step 3-2 comprises CBL modules with convolution kernels of 1×1 and 3×3 alternating three times with a 1-layer Conv2D convolution, and the output of the head network consists of feature maps at three scales, 76×76, 38×38 and 19×19 respectively; the CBL module is formed by serially connecting a 2D convolution layer, a batch normalization layer and a Leaky ReLU activation function layer.
As a preferable scheme of the invention, the initial super-parameters of the ship black smoke identification network model are set as follows: the learning rate was 0.001, the learning rate decay rate was 0.0005, and the batch size batch_size was 64.
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
1. The invention adopts an improved YOLO v4 network model, which can overcome the difficulty caused by an unbalanced black smoke sample set, improves recognition accuracy and recognition speed, meets the management departments' requirements on the accuracy and real-time performance of black smoke detection, and is suitable for both picture detection and video detection.
2. The mixed mosaic data amplification method adopted by the invention ensures that the network model trains stably and at high speed in a single-GPU environment, saving the computing resources and training time of the target recognition algorithm, and has industrial popularization value.
Drawings
FIG. 1 is a flow chart of a deep learning-based ship black smoke identification method;
FIG. 2 is an exemplary diagram of a hybrid mosaic amplification method according to the present invention;
FIG. 3 is a block diagram of each module in the marine black smoke identification network constructed by the invention;
FIG. 4 is a diagram of the network structure of YOLO v4 of the present invention;
fig. 5 is a view of a result of identifying black smoke of a ship, wherein (a) and (b) are two different scenes respectively.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
Referring to fig. 1, a flowchart of a deep learning-based ship black smoke recognition method according to the present invention includes the following specific steps:
s1, constructing a data set:
S1-1, snapshot data set: ship pictures captured by a pan-tilt camera form the snapshot data set, which comprises ship pictures with black smoke and ship pictures without black smoke;
S1-2, amplifying a data set: because the black smoke sample size is relatively small, a plurality of black smoke related pictures are crawled from a picture website by using a web crawler technology, and are manually screened to form an amplification data set;
the snapshot data set and the amplification data set together form the original data set; the data set contains no fewer than 2000 valid pictures, which are stored under the JPEGImages folder directory, the resolution of each picture is not lower than 608×608, and the pictures are numbered in sequence. The original data set of this embodiment has 2505 valid pictures in total.
S2, preprocessing a data set:
S2-1, image enhancement: the original data set is subjected to enhancement transformations to increase the diversity of training samples; the image enhancement modes adopted are random flipping, random cropping, random erasure, random brightness adjustment, random blurring and histogram equalization, where the purpose of histogram equalization is to enhance the contrast between different objects in an image, making it easier for the model to learn the detail and color information of targets;
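As a concrete illustration of S2-1, the following sketch applies these transforms with OpenCV and NumPy; the probabilities, crop ratios and brightness range are illustrative assumptions of ours rather than values fixed by the invention, and the adjustment of annotation frames after cropping is omitted:

```python
import cv2
import numpy as np

def enhance(img: np.ndarray) -> np.ndarray:
    """Randomly apply the S2-1 enhancement transforms to a BGR picture (illustrative parameters)."""
    h, w = img.shape[:2]
    # Random flip
    if np.random.rand() < 0.5:
        img = cv2.flip(img, 1)
    # Random crop (keep 80-100% of each side), then restore the original size
    ch, cw = int(h * np.random.uniform(0.8, 1.0)), int(w * np.random.uniform(0.8, 1.0))
    y0, x0 = np.random.randint(0, h - ch + 1), np.random.randint(0, w - cw + 1)
    img = cv2.resize(img[y0:y0 + ch, x0:x0 + cw], (w, h))
    # Random erasure of a small rectangle
    if np.random.rand() < 0.3:
        eh, ew = h // 10, w // 10
        y0, x0 = np.random.randint(0, h - eh), np.random.randint(0, w - ew)
        img[y0:y0 + eh, x0:x0 + ew] = 0
    # Random brightness adjustment
    img = np.clip(img.astype(np.int16) + np.random.randint(-30, 31), 0, 255).astype(np.uint8)
    # Random blurring
    if np.random.rand() < 0.3:
        img = cv2.GaussianBlur(img, (5, 5), 0)
    # Histogram equalization on the luminance channel to enhance contrast
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```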
S2-2, image labeling: the enhanced data set is manually annotated using the labelImg image annotation tool, marking the positions of the ship and the black smoke in each picture with the labels ship and smoke respectively, and the generated XML annotation files are finally stored under the Annotations folder directory;
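labelImg writes its annotations in the standard Pascal VOC XML layout, so the stored boxes can be read back with the standard library alone; a minimal sketch (the file name shown is hypothetical):

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path: str):
    """Return (label, xmin, ymin, xmax, ymax) tuples from a labelImg VOC XML file."""
    boxes = []
    for obj in ET.parse(xml_path).getroot().iter("object"):
        label = obj.find("name").text  # "ship" or "smoke"
        bb = obj.find("bndbox")
        boxes.append((label,
                      int(bb.find("xmin").text), int(bb.find("ymin").text),
                      int(bb.find("xmax").text), int(bb.find("ymax").text)))
    return boxes

print(read_voc_boxes("Annotations/000001.xml"))  # hypothetical file under Annotations/
```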
S2-3, amplifying the enhanced data set by using a mixed Mosaic (Mix Mosaic) amplification method, and fusing the artificial annotation frames;
The hybrid Mosaic amplification (Mix Mosaic) method combines the traditional Mixup method with the newer Mosaic method and mainly comprises the following steps: two pictures are randomly selected from the data set and overlapped at transparency 0.5 to form a new picture set; four pictures are randomly selected from the overlapped picture set and spliced together to form a new picture; finally, the new picture is scaled to 608×608 as training data for the input network. At this point each enhanced input picture fuses the information of eight pictures from the original data set, increasing the amount of information the model learns in each round. An example of the hybrid mosaic amplification method is shown in fig. 2.
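A minimal sketch of producing one Mix Mosaic training picture, assuming the annotation-frame fusion is handled separately; each of the four tiles is resized to 304×304 before blending so that the spliced result is already 608×608:

```python
import cv2
import numpy as np

def mix_mosaic(pictures: list, out_size: int = 608) -> np.ndarray:
    """Fuse eight source pictures: four Mixup tiles spliced into one mosaic."""
    rng = np.random.default_rng()
    half = out_size // 2
    tiles = []
    for _ in range(4):
        i, j = rng.choice(len(pictures), size=2, replace=False)
        a = cv2.resize(pictures[i], (half, half))
        b = cv2.resize(pictures[j], (half, half))
        tiles.append(cv2.addWeighted(a, 0.5, b, 0.5, 0.0))  # Mixup at transparency 0.5
    top = np.hstack([tiles[0], tiles[1]])                    # Mosaic splice
    bottom = np.hstack([tiles[2], tiles[3]])
    return np.vstack([top, bottom])
```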
S3, constructing a ship black smoke identification model:
S3-1, configuring the target identification network YOLO v4 running environment, which mainly comprises the Ubuntu operating system, Python 3.7, OpenCV, CUDA, cuDNN, GeForce RTX 2070 hardware, and the like;
S3-2, constructing the feature extraction backbone network CSPDarknet, the neck network SPP+PANet and the head network YOLO head to form the basic framework of the identification model. The structures of the component modules are shown in fig. 3 and the constructed network framework is shown in fig. 4. The identification process is as follows: first, the backbone network CSPDarknet extracts high-dimensional features of the input image; the neck network SPP+PANet then combines the features of the three scales extracted in the previous step, first top-down and then bottom-up, making full use of the multi-scale features; finally, the head network YOLO head predicts using the combined multi-scale features, and the output result comprises the coordinates, classification categories and confidences of the prediction frames;
The feature extraction backbone network CSPDarknet mainly comprises, connected in sequence: 1 CBM module containing 32 3×3 convolution kernels, 1 CBM module containing 64 3×3 convolution kernels, 1 CSP module containing 1 residual unit, 1 CBM module containing 128 3×3 convolution kernels, 1 CSP module containing 2 residual units, 1 CBM module containing 256 3×3 convolution kernels, 1 CSP module containing 8 residual units, 1 CBM module containing 512 3×3 convolution kernels, 1 CSP module containing 8 residual units, 1 CBM module containing 1024 3×3 convolution kernels, and 1 CSP module containing 4 residual units;
The CBM module is formed by serially connecting a 2D convolution layer (Conv2D), a batch normalization layer (BN) and a Mish activation function layer;
the residual unit consists of 2 CBM modules and an addition layer (Add) connected with the original input;
The CSP module is composed of 1 CBM module, the residual unit(s), and 1 CBM module on the main path, plus a connection layer (Concat) that joins this path with the original input passed through 1 CBM module.
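Read this way, the three building blocks translate directly into PyTorch; the following is an illustrative sketch in which the channel counts are left to the caller rather than taken from the tables above:

```python
import torch
import torch.nn as nn

class CBM(nn.Module):
    """Conv2D -> batch normalization -> Mish, as described above."""
    def __init__(self, c_in, c_out, k):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.Mish()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ResidualUnit(nn.Module):
    """Two CBM modules plus an addition layer joined with the unit's own input."""
    def __init__(self, c, c_mid):
        super().__init__()
        self.block = nn.Sequential(CBM(c, c_mid, 1), CBM(c_mid, c, 3))

    def forward(self, x):
        return x + self.block(x)

class CSP(nn.Module):
    """Main path CBM -> residual units -> CBM, concatenated with a one-CBM shortcut."""
    def __init__(self, c_in, c, n_res, c_mid):
        super().__init__()
        self.main = nn.Sequential(CBM(c_in, c, 1),
                                  *[ResidualUnit(c, c_mid) for _ in range(n_res)],
                                  CBM(c, c, 1))
        self.shortcut = CBM(c_in, c, 1)

    def forward(self, x):
        return torch.cat([self.main(x), self.shortcut(x)], dim=1)
```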
The neck network SPP module mainly comprises three parallel maximum pooling layers with pooling kernel sizes of 13×13, 9×9 and 5×5 respectively and a moving step length of 1; the output of the SPP module is the concatenation of the outputs of the three parallel maximum pooling layers with the input of the module.
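The SPP module follows directly from this description; a sketch (stride 1 with half-kernel padding keeps the spatial size, so the four tensors concatenate along the channel axis):

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Three parallel max-pooling layers (13x13, 9x9, 5x5, stride 1) concatenated with the input."""
    def __init__(self):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in (13, 9, 5))

    def forward(self, x):
        return torch.cat([p(x) for p in self.pools] + [x], dim=1)
```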
The neck network PANet module mainly comprises four combination modules:
(1) Combination module 1: consists of 3 CBL modules with convolution kernel sizes 1×1, 3×3 and 1×1; 1 SPP module; 3 CBL modules with convolution kernel sizes 1×1, 3×3 and 1×1; 1 upsampling layer; 1 connection layer; and 5 CBL modules with convolution kernel sizes 1×1, 3×3, 1×1, 3×3 and 1×1;
(2) Combining module 2: consists of 1 up-sampling layer, 1 CBL module containing 128 1×1 convolution kernels, 1 connection layer;
(3) Combining module 3: consists of 1 downsampling layer, a CBL module containing 256 3×3 convolution kernels, and 1 connecting layer;
(4) Combining module 4: consists of 1 downsampling layer, a CBL module containing 512 3×3 convolution kernels, 1 connecting layer;
the CBL module is formed by serially connecting a 2D convolution layer (Conv2D), a batch normalization layer (BN) and a Leaky ReLU activation function layer;
the feature map output by combination module 2 has a scale of 76×76; the feature map output by combination module 3 has a scale of 38×38; the feature map output by combination module 4 has a scale of 19×19.
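The routing pattern of the combination modules can be sketched as follows; the CBL stacks are collapsed to single layers and the channel counts are illustrative, the point being the up-sample/concatenate (top-down) and down-sample/concatenate (bottom-up) structure:

```python
import torch
import torch.nn as nn

def cbl(c_in, c_out, k, s=1):
    """Conv2D -> batch normalization -> Leaky ReLU (the CBL module described above)."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False),
                         nn.BatchNorm2d(c_out), nn.LeakyReLU(0.1))

# Combination module 2 pattern: up-sample the deeper feature, concatenate with the shallower one.
up = nn.Sequential(cbl(256, 128, 1), nn.Upsample(scale_factor=2))
# Combination module 3/4 pattern: down-sample with a stride-2 CBL, concatenate with the deeper one.
down = cbl(128, 256, 3, s=2)

deep = torch.randn(1, 256, 38, 38)      # deeper branch
shallow = torch.randn(1, 128, 76, 76)   # shallower branch
fused_76 = torch.cat([up(deep), shallow], dim=1)    # top-down fusion at 76x76
fused_38 = torch.cat([down(shallow), deep], dim=1)  # bottom-up fusion at 38x38
```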
The head network YOLO head mainly comprises CBL modules with convolution kernels of 1×1 and 3×3 alternating three times with a 1-layer Conv2D convolution, and the output of the head network consists of feature maps at three scales, 76×76, 38×38 and 19×19 respectively.
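For one scale, the head pattern can be sketched in PyTorch; the output channel count 3×(5+2) follows the usual YOLO convention of B anchors times (4 box coordinates + 1 confidence + class scores), with B = 3 and 2 classes (ship, smoke) assumed from this embodiment:

```python
import torch.nn as nn

def cbl(c_in, c_out, k):
    """Conv2D -> batch normalization -> Leaky ReLU."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
                         nn.BatchNorm2d(c_out), nn.LeakyReLU(0.1))

def yolo_head(c_in, num_anchors=3, num_classes=2):
    """Three alternating 1x1/3x3 CBL pairs, then a bare Conv2D prediction layer."""
    layers = []
    for _ in range(3):
        layers += [cbl(c_in, c_in // 2, 1), cbl(c_in // 2, c_in, 3)]
    layers.append(nn.Conv2d(c_in, num_anchors * (5 + num_classes), kernel_size=1))
    return nn.Sequential(*layers)
```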
S3-3, loading a pre-training weight file to obtain a pre-training target identification network;
S3-4, modifying the regression loss part in the loss function of the pre-training model into the CIoU loss, and adding an attenuation coefficient to the classification loss part to turn it into a focal loss for unbalanced samples; the CIoU loss mainly improves the coordinate regression loss of the target detection frame, while the focal loss mainly improves the classification loss, the aim being to increase the relative loss value of difficult samples;
The improved loss function calculation formula is as follows:

$$
\begin{aligned}
Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[1-IoU\left(A_i^j,B_i\right)+\frac{\rho^2\left(A_{ctr},B_{ctr}\right)}{c^2}+\alpha v\right]\\
&-\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[C_i\log\hat{C}_i+\left(1-C_i\right)\log\left(1-\hat{C}_i\right)\right]\\
&-\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left[C_i\log\hat{C}_i+\left(1-C_i\right)\log\left(1-\hat{C}_i\right)\right]\\
&-\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left[\hat{p}_i(c)\left(1-p_i(c)\right)^{\gamma}\log p_i(c)+\left(1-\hat{p}_i(c)\right)p_i(c)^{\gamma}\log\left(1-p_i(c)\right)\right]
\end{aligned}
$$

In the formula, the first row represents the CIoU loss between the regression box and the real box (ground truth box); the second and third rows represent the confidence loss when a detection object is present and when it is absent, respectively; and the fourth row represents the focal classification loss of each category.

In the detection process, the algorithm first divides the input image into $S^2$ detection grids, each grid generating $B$ possible prediction frames; $\lambda_{coord}$ in the regression loss denotes the weight occupied by the regression loss part; $\mathbb{1}_{ij}^{obj}$ is an indicator variable whose value is 1 if the $j$-th prediction frame of the $i$-th grid is responsible for some detection object obj, and 0 otherwise; $IoU\left(A_i^j,B_i\right)$ is the intersection-over-union ratio between the $j$-th prediction frame $A_i^j$ of the $i$-th grid and the real frame $B_i$ present in the $i$-th grid; $\rho\left(A_{ctr},B_{ctr}\right)$ denotes the Euclidean distance between the centre point coordinate $A_{ctr}$ of the $j$-th prediction frame of the $i$-th grid and the centre point coordinate $B_{ctr}$ of the real frame present in the $i$-th grid; $c$ is the diagonal length of the minimum enclosing frame formed by the real frame contained in the $i$-th grid and the $j$-th prediction frame of the $i$-th grid; $v$ is a parameter used to measure the aspect ratio; $\alpha$ can be regarded as the weight coefficient of $v$, with the specific calculation formula:

$$
v=\frac{4}{\pi^{2}}\left(\arctan\frac{w_{gt}}{h_{gt}}-\arctan\frac{w}{h}\right)^{2},\qquad
\alpha=\frac{v}{\left(1-IoU\right)+v}
$$

wherein $w_{gt}$ and $h_{gt}$ represent the width and height of the real frame, and $w$ and $h$ represent the width and height of the prediction frame, respectively;
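The CIoU term for a single prediction/ground-truth pair, written out for corner-format boxes (x1, y1, x2, y2); guards against degenerate boxes are omitted in this sketch:

```python
import math

def ciou_loss(pred, gt):
    """CIoU loss = 1 - IoU + rho^2 / c^2 + alpha * v, following the definitions above."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union
    # Squared distance between the two centre points (rho^2)
    rho2 = ((px1 + px2 - gx1 - gx2) / 2) ** 2 + ((py1 + py2 - gy1 - gy2) / 2) ** 2
    # Squared diagonal of the minimum enclosing box (c^2)
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2
    # Aspect-ratio consistency v and its weight alpha
    v = (4 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1))
                              - math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / ((1 - iou) + v)
    return 1 - iou + rho2 / c2 + alpha * v
```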
The confidence loss uses a binary cross-entropy loss function, with $C_i$ and $\hat{C}_i$ denoting the real and predicted confidence of the detection target respectively, and $\lambda_{noobj}$ being the weight of the confidence loss when no detection target is contained; $\mathbb{1}_{ij}^{noobj}$ is an indicator variable whose value is 1 if the $j$-th prediction frame of the $i$-th grid is not responsible for any detection object obj, and 0 otherwise;
The classification loss uses a binary cross-entropy loss function; assuming the detection task has classes categories, $p_i(c)$ is the predicted probability that the $i$-th grid belongs to category $c$, $\hat{p}_i(c)$ is the corresponding 0/1 ground-truth indicator, and $\gamma$ is an attenuation coefficient larger than 1. When the predicted probability is high enough, the classification loss generated is strongly attenuated, while the classification loss generated by samples with small predicted probability is almost unattenuated; this increases the relative loss value of difficult samples and hence their weight in each round of loss calculation, and it is in this way that the problem of imbalance among the training data set categories is overcome.
In this embodiment, classes = 2, and the hyper-parameter $\gamma$ = 2 is set to attenuate the loss values generated by easily classified samples, thereby increasing the relative loss values of difficult samples and their weight in each round of loss calculation; the hyper-parameters $\lambda_{coord}$ = 5 and $\lambda_{noobj}$ = 0.5 are set to balance the weight of each partial loss in the total loss function value.
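A sketch of the focal-style classification term with these settings; the tensor shapes and names are ours, and the (1 − p)^γ / p^γ attenuation factors implement the behaviour described above:

```python
import torch

def focal_class_loss(p: torch.Tensor, target: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """p: predicted class probabilities, target: 0/1 labels, both shaped (cells, classes).

    Confident correct predictions are attenuated by (1 - p)^gamma (resp. p^gamma),
    so difficult samples dominate each round of loss computation."""
    eps = 1e-7
    p = p.clamp(eps, 1 - eps)
    pos = target * (1 - p) ** gamma * torch.log(p)
    neg = (1 - target) * p ** gamma * torch.log(1 - p)
    return -(pos + neg).sum()
```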
S3-5, inputting the amplified data set into the improved network and initializing the network hyper-parameters as follows: learning rate 0.001, learning rate decay 0.0005, batch size batch_size 64; training with a stochastic mini-batch gradient descent algorithm, updating the weights of each layer of neurons of the network, and stopping when the loss function value no longer decreases, to obtain the ship black smoke recognition network model.
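A minimal training-loop sketch with the stated hyper-parameters; we read the 0.0005 "decay" as SGD weight decay (as in the usual Darknet configuration), and model, dataset and the combined loss function are assumed to be defined as above:

```python
import torch
from torch.utils.data import DataLoader

def train(model, dataset, loss_fn, device="cuda"):
    loader = DataLoader(dataset, batch_size=64, shuffle=True)  # random mini-batches
    opt = torch.optim.SGD(model.parameters(), lr=0.001,
                          weight_decay=0.0005, momentum=0.9)
    model.to(device).train()
    best = float("inf")
    while True:
        epoch_loss = 0.0
        for imgs, targets in loader:
            opt.zero_grad()
            loss = loss_fn(model(imgs.to(device)), targets)
            loss.backward()
            opt.step()                # update each layer's neuron weights
            epoch_loss += loss.item()
        if epoch_loss >= best:        # stop once the loss no longer decreases
            return model
        best = epoch_loss
```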
S4, monitoring in real time:
The real-time pictures or videos captured by the pan-tilt camera are input into the trained ship black smoke recognition network model and the detection result is output, judging whether regulations are violated and thus achieving the purpose of real-time monitoring.
A picture to be detected, or a real-time picture/video captured by the pan-tilt camera, is input into the trained ship black smoke recognition network model and the detection results are output; (a) and (b) in fig. 5 show the black smoke ship recognition results for two different scenes. It can be seen that the model overcomes the problem of limited sample diversity and can effectively identify ship black smoke in the photos.
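The violation judgment itself reduces to a simple rule over the detector output; a sketch, where the (label, confidence, box) tuples and the 0.5 threshold are assumptions of ours rather than values fixed by the invention:

```python
def is_violation(detections, conf_thresh=0.5):
    """detections: (label, confidence, box) tuples produced by the trained model on one frame."""
    ship_seen = any(lbl == "ship" and conf >= conf_thresh for lbl, conf, _ in detections)
    smoke_seen = any(lbl == "smoke" and conf >= conf_thresh for lbl, conf, _ in detections)
    return ship_seen and smoke_seen  # flag the frame when a ship and black smoke co-occur
```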
In summary, the deep learning-based ship black smoke recognition method provided by the invention can overcome the difficulty of unbalanced black smoke ship training samples; by locally deploying the running environment of the black smoke recognition model, it realizes detection of the real-time snapshot pictures transmitted from the cloud, helping the relevant management departments to reduce manpower and improve the automation level of monitoring ships in violation.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereto, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention.

Claims (9)

1. The ship black smoke identification method based on deep learning is characterized by comprising the following steps of:
step 1, acquiring an original data set, wherein the original data set contains no fewer than 2000 valid pictures, the resolution of each picture is not lower than 608×608, and the pictures are numbered in sequence;
step 2, preprocessing an original data set to obtain a preprocessed data set;
step 3, constructing a ship black smoke recognition network model, and training the ship black smoke recognition network model by using the preprocessed data set to obtain a trained ship black smoke recognition network model; the specific process is as follows:
step 3-1, configuring a ship black smoke identification network model YOLO v4 operation environment;
Step 3-2, building the ship black smoke identification network model YOLO v4, which comprises a feature extraction backbone network CSPDarknet, a neck network SPP+PANet and a head network YOLO head;
Step 3-3, loading a pre-training weight file into the YOLO v4 to obtain a pre-training ship black smoke recognition network model;
And 3-4, modifying the regression loss part in the loss function of the pre-trained ship black smoke identification network model into the CIoU loss, and adding an attenuation coefficient to the classification loss part so that it becomes a focal classification loss aimed at unbalanced samples, obtaining an improved ship black smoke identification network model; the improved loss function is calculated by the following formula:

$$
\begin{aligned}
Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[1-IoU\left(A_i^j,B_i\right)+\frac{\rho^2\left(A_{ctr},B_{ctr}\right)}{c^2}+\alpha v\right]\\
&-\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[C_i\log\hat{C}_i+\left(1-C_i\right)\log\left(1-\hat{C}_i\right)\right]\\
&-\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left[C_i\log\hat{C}_i+\left(1-C_i\right)\log\left(1-\hat{C}_i\right)\right]\\
&-\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left[\hat{p}_i(c)\left(1-p_i(c)\right)^{\gamma}\log p_i(c)+\left(1-\hat{p}_i(c)\right)p_i(c)^{\gamma}\log\left(1-p_i(c)\right)\right]
\end{aligned}
$$

In the formula, the first row represents the CIoU loss between the regression box and the real box (ground truth box); the second and third rows represent the confidence loss when a detection object is present and when it is absent, respectively; and the fourth row represents the focal classification loss of each category;

$S^2$ denotes that the input picture is divided into $S^2$ detection grids, each generating $B$ possible prediction frames; $\lambda_{coord}$ denotes the weight occupied by the CIoU loss part; $\mathbb{1}_{ij}^{obj}$ is an indicator variable whose value is 1 if the $j$-th prediction frame of the $i$-th grid is responsible for some detection object obj, and 0 otherwise; $IoU\left(A_i^j,B_i\right)$ denotes the intersection-over-union ratio between the $j$-th prediction frame $A_i^j$ of the $i$-th grid and the real frame $B_i$ present in the $i$-th grid; $\rho\left(A_{ctr},B_{ctr}\right)$ denotes the Euclidean distance between the centre point coordinate $A_{ctr}$ of the $j$-th prediction frame of the $i$-th grid and the centre point coordinate $B_{ctr}$ of the real frame present in the $i$-th grid; $c$ is the diagonal length of the minimum bounding box enclosing the real frame contained in the $i$-th grid and the $j$-th prediction frame of the $i$-th grid; $v$ is a parameter used to measure aspect-ratio consistency; $\alpha$ is the weight coefficient of $v$, with the specific calculation formula:

$$
v=\frac{4}{\pi^{2}}\left(\arctan\frac{w_{gt}}{h_{gt}}-\arctan\frac{w}{h}\right)^{2},\qquad
\alpha=\frac{v}{\left(1-IoU\right)+v}
$$

wherein $w_{gt}$ and $h_{gt}$ denote the width and height of the real frame, and $w$ and $h$ denote the width and height of the prediction frame, respectively;

the confidence loss uses a binary cross-entropy loss function, with $C_i$ and $\hat{C}_i$ denoting the real and predicted confidence of the detection target respectively, and $\lambda_{noobj}$ being the weight of the confidence loss when no detection target is contained; $\mathbb{1}_{ij}^{noobj}$ is an indicator variable whose value is 1 if the $j$-th prediction frame of the $i$-th grid is not responsible for any detection object obj, and 0 otherwise;

the classification loss also uses a binary cross-entropy loss function, where classes denotes the number of categories of the detection task, $p_i(c)$ is the predicted probability that the $i$-th grid belongs to category $c$, $\hat{p}_i(c)$ is the corresponding 0/1 ground-truth indicator, and $\gamma$ is an attenuation coefficient larger than 1;
step 3-5, inputting the preprocessed data set into the improved ship black smoke recognition network model, training with a stochastic mini-batch gradient descent algorithm, updating the weights of each layer of neurons of the network, and stopping when the loss function value no longer decreases, to obtain the trained ship black smoke recognition network model;
And 4, inputting pictures or videos captured in real time into the trained ship black smoke recognition network model, outputting the detection result, and judging from the detection result whether the ship is discharging black smoke, namely whether the ship is in violation.
2. The deep learning-based ship black smoke identification method according to claim 1, wherein the specific process of acquiring the original data set in step 1 is as follows:
step 1-1, using a pan-tilt camera to snap ship pictures to form a snapshot data set, wherein the snapshot data set comprises ship pictures with black smoke and ship pictures without black smoke;
Step 1-2, crawling black smoke related pictures from a picture website by using a web crawler technology, and performing manual screening to form an amplification data set;
The snapshot dataset and the augmentation dataset together comprise the original dataset.
3. The deep learning-based ship black smoke identification method according to claim 1, wherein the specific process of the step 2 is as follows:
step 2-1, performing enhancement transformation on the original data set by using image enhancement modes to obtain an enhanced data set, wherein the image enhancement modes comprise random flipping, random cropping, random erasure, random brightness adjustment, random blurring and histogram equalization;
step 2-2, manually annotating the enhanced data set by using the labelImg image annotation tool, marking the positions of the ship and the black smoke in each picture with annotation frames, labeled ship and smoke respectively;
And 2-3, amplifying the enhanced data set by using a mixed mosaic amplification method, and fusing the artificial marking frames to obtain a preprocessed data set.
4. The deep learning-based ship black smoke identification method according to claim 3, wherein the specific process of the step 2-3 is as follows:
Firstly, two pictures are randomly selected from the enhanced data set and overlapped at transparency 0.5 to form a new picture set; the number of pictures in the new picture set is $\binom{N}{2}=\frac{N(N-1)}{2}$, where N is the number of pictures in the enhanced data set. Four pictures are then randomly selected from the new picture set and spliced together to form a new picture, and the new picture is scaled to 608×608. Finally, the artificial labeling frames are fused to obtain the preprocessed data set, in which the number of obtainable pictures is $\binom{N(N-1)/2}{4}$.
5. The deep learning based marine soot identification method of claim 1, wherein the feature extraction backbone network CSPDarknet of step 3-2 comprises the following modules connected in sequence: a first CBM module, a second CBM module, a first CSP module, a third CBM module, a second CSP module, a fourth CBM module, a third CSP module, a fifth CBM module, a fourth CSP module, a sixth CBM module, and a fifth CSP module;
Each CBM module is formed by connecting a 2D convolution layer, a batch normalization layer and a Mish activation function layer in series; and the first through sixth CBM modules respectively comprise 32 3×3 convolution kernels, 64 3×3 convolution kernels, 128 3×3 convolution kernels, 256 3×3 convolution kernels, 512 3×3 convolution kernels, and 1024 3×3 convolution kernels;
the input of the first CSP module sequentially passes through 1 CBM module, 1 residual unit and 1 CBM module to obtain a first output, and meanwhile, the input of the first CSP module passes through 1 CBM module to obtain a second output, and the first output and the second output are connected to be used as the output of the first CSP module; the 3 CBM modules in the first CSP module all contain 64 1×1 convolution kernels, and the input of the residual unit sequentially passes through the 2 CBM modules and then enters an addition layer together with the input of the residual unit, and the convolution kernels contained by the 2 CBM modules in the residual unit are respectively: 32 1 x1 convolution kernels, 64 3 x 3 convolution kernels;
the input of the second CSP module sequentially passes through the 1 CBM module, the 2 residual error units and the 1 CBM module to obtain a third output, and meanwhile, the input of the second CSP module passes through the 1 CBM module to obtain a fourth output, and the third output and the fourth output are connected to be used as the output of the second CSP module; the 3 CBM modules in the second CSP module each comprise 64 1×1 convolution kernels, and the inputs of each residual unit enter the addition layer together with the inputs of the residual units after passing through the 2 CBM modules in sequence, and the convolution kernels contained in the 2 CBM modules in the residual units are respectively: 64 1 x 1 convolution kernels, 64 3 x 3 convolution kernels;
The input of the third CSP module sequentially passes through the 1 CBM module, the 8 residual error units and the 1 CBM module to obtain a fifth output, and meanwhile, the input of the third CSP module passes through the 1 CBM module to obtain a sixth output, and the fifth output and the sixth output are connected to be used as the output of the third CSP module; the 3 CBM modules in the third CSP module each include 128 1×1 convolution kernels, and the inputs of each residual unit sequentially pass through the 2 CBM modules and then enter the addition layer together with the inputs of the residual units, where the convolution kernels included in the 2 CBM modules in the residual units are respectively: 128 1 x1 convolution kernels, 128 3 x3 convolution kernels;
the input of the fourth CSP module sequentially passes through the 1 CBM module, the 8 residual error units and the 1 CBM module to obtain a seventh output, and meanwhile, the input of the fourth CSP module passes through the 1 CBM module to obtain an eighth output, and the seventh output and the eighth output are connected to be used as the output of the fourth CSP module; the 3 CBM modules in the fourth CSP module each include 256 1×1 convolution kernels, and the inputs of each residual unit sequentially pass through the 2 CBM modules and then enter the addition layer together with the inputs of the residual units, where the convolution kernels included in the 2 CBM modules in the residual units are respectively: 256 1 x1 convolution kernels, 256 3 x3 convolution kernels;
the input of the fifth CSP module sequentially passes through the 1 CBM module, the 4 residual error units and the 1 CBM module to obtain a ninth output, and meanwhile, the input of the fifth CSP module passes through the 1 CBM module to obtain a tenth output, and the ninth output and the tenth output are connected to be used as the output of the fifth CSP module; the 3 CBM modules in the fifth CSP module each include 512 1×1 convolution kernels, and the input of each residual unit sequentially passes through the 2 CBM modules and then enters the addition layer together with the input of the residual unit, where the convolution kernels included in the 2 CBM modules in the residual unit are respectively: 512 1 x1 convolution kernels, 512 3 x3 convolution kernels.
6. The deep learning-based ship black smoke recognition method according to claim 1, wherein the SPP module in the neck network SPP+PANet in step 3-2 comprises three parallel maximum pooling layers with pooling kernel sizes of 13×13, 9×9 and 5×5 respectively and a moving step length of 1, and the output of the SPP module is the concatenation of the outputs of the three parallel maximum pooling layers with the input of the SPP module.
7. The deep learning based marine soot identification method of claim 6, wherein the PANet module in the neck network SPP+PANet of step 3-2 comprises four combination modules:
(1) Combination module 1: consists of 3 CBL modules with convolution kernel sizes 1×1, 3×3 and 1×1; 1 SPP module; 3 CBL modules with convolution kernel sizes 1×1, 3×3 and 1×1; 1 upsampling layer; 1 connection layer; and 5 CBL modules with convolution kernel sizes 1×1, 3×3, 1×1, 3×3 and 1×1;
(2) Combining module 2: consists of 1 up-sampling layer, CBL module containing 128 1×1 convolution kernels, 1 connection layer;
(3) Combining module 3: consists of 1 downsampling layer, a CBL module containing 256 3×3 convolution kernels, and 1 connecting layer;
(4) Combining module 4: consists of 1 downsampling layer, a CBL module containing 512 3×3 convolution kernels, 1 connecting layer;
the CBL module is formed by connecting a 2D convolution layer, a batch normalization layer and a Leaky ReLU activation function layer in series;
The feature map output by the combination module 2 has a scale of 76×76; the feature map output by the combination module 3 has a scale of 38×38; the feature map output by the combination module 4 has a dimension of 19×19.
8. The deep learning-based ship black smoke recognition method according to claim 1, wherein the head network YOLO head in step 3-2 comprises CBL modules with convolution kernels of 1×1 and 3×3 alternating three times with a 1-layer Conv2D convolution, and the output of the head network consists of feature maps at three scales, 76×76, 38×38 and 19×19 respectively; the CBL module is formed by serially connecting a 2D convolution layer, a batch normalization layer and a Leaky ReLU activation function layer.
9. The deep learning-based ship black smoke recognition method according to claim 1, wherein the initial super-parameters of the ship black smoke recognition network model are set as follows: the learning rate was 0.001, the learning rate decay rate was 0.0005, and the batch size batch_size was 64.
CN202111441778.2A 2021-11-30 2021-11-30 Ship black smoke recognition method based on deep learning Active CN114241189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111441778.2A CN114241189B (en) 2021-11-30 2021-11-30 Ship black smoke recognition method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111441778.2A CN114241189B (en) 2021-11-30 2021-11-30 Ship black smoke recognition method based on deep learning

Publications (2)

Publication Number Publication Date
CN114241189A (en) 2022-03-25
CN114241189B (en) 2024-06-07

Family

ID=80752122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111441778.2A Active CN114241189B (en) 2021-11-30 2021-11-30 Ship black smoke recognition method based on deep learning

Country Status (1)

Country Link
CN (1) CN114241189B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165602B (en) * 2018-08-27 2023-05-19 成都华安视讯科技有限公司 Black smoke vehicle detection method based on video analysis

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019048604A1 (en) * 2017-09-09 2019-03-14 Fcm Dienstleistungs Ag Automatic early detection of smoke, soot and fire with increased detection reliability using machine learning
CN112464883A (en) * 2020-12-11 2021-03-09 武汉工程大学 Automatic detection and identification method and system for ship target in natural scene
CN112800838A (en) * 2020-12-28 2021-05-14 浙江万里学院 Channel ship detection and identification method based on deep learning
CN112884090A (en) * 2021-04-14 2021-06-01 安徽理工大学 Fire detection and identification method based on improved YOLOv3

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019048604A1 (en) * 2017-09-09 2019-03-14 Fcm Dienstleistungs Ag Automatic early detection of smoke, soot and fire with increased detection reliability using machine learning
CN112464883A (en) * 2020-12-11 2021-03-09 武汉工程大学 Automatic detection and identification method and system for ship target in natural scene
CN112800838A (en) * 2020-12-28 2021-05-14 浙江万里学院 Channel ship detection and identification method based on deep learning
CN112884090A (en) * 2021-04-14 2021-06-01 安徽理工大学 Fire detection and identification method based on improved YOLOv3

Also Published As

Publication number Publication date
CN114241189A (en) 2022-03-25

Similar Documents

Publication Publication Date Title
CN110503112B (en) Small target detection and identification method for enhancing feature learning
CN108564097B (en) Multi-scale target detection method based on deep convolutional neural network
Wang et al. RENet: Rectangular convolution pyramid and edge enhancement network for salient object detection of pavement cracks
CN111368690B (en) Deep learning-based video image ship detection method and system under influence of sea waves
CN113807464B (en) Unmanned aerial vehicle aerial image target detection method based on improved YOLO V5
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
CN111738258A (en) Pointer instrument reading identification method based on robot inspection
CN105574550A (en) Vehicle identification method and device
CN110991444B (en) License plate recognition method and device for complex scene
CN112949633B (en) Improved YOLOv 3-based infrared target detection method
CN110598693A (en) Ship plate identification method based on fast-RCNN
CN114255403A (en) Optical remote sensing image data processing method and system based on deep learning
CN116824335A (en) YOLOv5 improved algorithm-based fire disaster early warning method and system
CN114241189B (en) Ship black smoke recognition method based on deep learning
CN115937659A (en) Mask-RCNN-based multi-target detection method in indoor complex environment
CN117036931A (en) Ecological landscape engineering small target pest detection method based on convolutional neural network
CN117036656A (en) Water surface floater identification method under complex scene
Ouyang et al. An anchor-free detector with channel-based prior and bottom-enhancement for underwater object detection
CN110188682B (en) Optical remote sensing image target detection method based on geometric structure double-path convolution network
CN116863271A (en) Lightweight infrared flame detection method based on improved YOLO V5
CN111598140A (en) Remote sensing image classification method based on capsule network
Wu et al. Research on asphalt pavement disease detection based on improved YOLOv5s
CN114529906A (en) Method and system for detecting abnormity of digital instrument of power transmission equipment based on character recognition
CN114529821A (en) Offshore wind power safety monitoring and early warning method based on machine vision
CN117218606B (en) Escape door detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant