CN111160354B - Ship image segmentation method based on joint image information under sea and sky background - Google Patents

Ship image segmentation method based on joint image information under sea and sky background

Info

Publication number
CN111160354B
Authority
CN
China
Prior art keywords
ship
image
feature map
segmentation
conv
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911388248.9A
Other languages
Chinese (zh)
Other versions
CN111160354A (en)
Inventor
张雯
何旭杰
张智
苏丽
宋浩
崔浩浩
张秋雨
贺金夯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201911388248.9A priority Critical patent/CN111160354B/en
Publication of CN111160354A publication Critical patent/CN111160354A/en
Application granted granted Critical
Publication of CN111160354B publication Critical patent/CN111160354B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a ship image segmentation method under a sea-sky background based on joint image information. For a ship image to be segmented, a trained interference factor discriminator first discriminates the environment type corresponding to the image; the ship extractor corresponding to that environment type then segments and extracts the ship. The interference factor discriminator is constructed with a neural-network-based classification network and trained with a training set; the ship extractors for different environments are constructed with a neural-network-based segmentation network and trained separately with the ship images of each environment in the training set, yielding trained ship extractors corresponding to ship images under different environments. The method is mainly used to segment and extract ships in images, and solves the problem that the segmentation precision of existing segmentation algorithms decreases when segmenting ship images under the sea-sky background.

Description

Ship image segmentation method based on joint image information under sea and sky background
Technical Field
The invention relates to a ship image segmentation method, in particular to a ship image segmentation method based on joint image information under a sea-sky background, and belongs to the field of digital image processing.
Background
Image segmentation belongs to a branch of the digital image processing field, and is an important and extremely challenging digital image processing task. The main purpose is to extract the region of interest in an image. Generally, image segmentation is widely used in image compression, image retrieval, medical diagnosis, target recognition, video surveillance, and automatic driving systems.
Existing image segmentation algorithms are mainly divided into the following categories: threshold-based, edge-based, region-based, superpixel-based, and segmentation algorithms based on specific theories. The algorithms based on specific theories include those based on genetic algorithms, level sets, clustering, fuzzy theory, morphological watersheds and neural networks. However, when an existing segmentation algorithm is applied directly to the sea-sky background, the segmentation result is strongly degraded by the particularity of that background. In practical sea-sky background image segmentation it is found that conventional segmentation algorithms can hardly obtain satisfactory results at the times of sea fog, heavy wake, heavy waves, sea-surface glare, reflection and dusk that constantly occur at sea, as shown in the three rows of segmentation results in Fig. 1, where the third column is the result obtained with a conventional segmentation method.
In addition, most existing segmentation algorithms for ships in a maritime background are based on infrared images, and only a small portion are based on visible-light images; infrared images are known to suffer from low signal-to-noise ratio, low foreground-background contrast, and loss of target detail.
Disclosure of Invention
In view of the deficiencies of the prior art, the technical problem to be solved by the invention is to provide a ship image segmentation method based on joint image information under the sea-sky background, which addresses the reduced segmentation precision that occurs when existing segmentation algorithms are used to segment ship images under the sea-sky background.
In order to solve the technical problem, the invention provides a method for segmenting a ship image based on joint image information under a sea-sky background, which comprises the following steps:
Step 1: preprocessing an image, including a normalization operation and an image scaling operation;
Step 2: constructing an interference factor discriminator by adopting a classification network based on a neural network, and training the interference factor discriminator by utilizing a training set containing ship images under different environmental backgrounds to obtain a trained interference factor discriminator;
Step 3: inputting the image obtained in Step 1 into the trained interference factor discriminator obtained in Step 2, and discriminating the image into ship images under different environmental backgrounds;
Step 4: constructing ship extractors under different environments by adopting a neural-network-based segmentation network, and respectively carrying out targeted training on the ship extractors by utilizing the ship images under each environment in the training set to obtain trained ship extractors corresponding to the ship images under different environments;
Step 5: respectively inputting the ship images under different backgrounds obtained by the discrimination in Step 3 into the corresponding trained ship extractors for ship extraction.
The invention also includes:
1. In Step 1, the image scaling operation adopts bilinear interpolation, and the interpolation formula is as follows:
I(m,n)=am+bn+cmn+d
in the formula, m and n are coordinates of pixel points, a, b, c and d are four dynamically determined parameters, and I is a pixel value obtained through interpolation.
2. The interference factor discriminator constructed with the neural-network-based classification network in Step 2 is realized based on SqueezeNet, specifically as follows:
the input image is first subjected to a first combining operation, including Conv, ReLU, Maxpool;
then passing through Fire 2-Fire 9 modules, adding MaxPool after Fire3 and Fire5, and passing through the last combination operation after Fire9, including Dropout, Conv, ReLU, AvgPool;
and finally, outputting a judgment result.
3. The Fire 2-Fire 9 modules specifically include:
the Fire 2-Fire 9 modules each contain a Squeeze Layer, an Expand Layer and 2 ReLUs, wherein the Squeeze Layer performs a 1 × 1 convolution; the Expand Layer performs a 1 × 1 convolution and a 3 × 3 convolution in parallel.
4. The step 4 of constructing the ship extractor under different environments by adopting the neural network-based segmentation network is realized based on DeepLabv3+, and specifically comprises the following steps:
firstly, preliminarily extracting features from the ship images under each environment in the training set through a DCNN (deep convolutional neural network), and then dividing the extracted features into 5 branches for processing;
performing a hole convolution operation and a Batch Normalization operation in each of branch 2 to branch 5;
branch 1 performs a combining operation comprising: AvgPool, Conv, BatchNorm, ReLU; then the feature map of this branch is up-sampled back to 1/16 of the original image size and spliced with the feature maps processed by branches 2 to 5;
subjecting the spliced feature map to a combination operation, including Conv, BatchNorm, ReLU, Dropout;
finally, directly up-sampling by a factor of 4 to 1/4 of the original image size, thereby forming the feature map F1 of the Encoder;
at the stage where the DCNN extracts the feature map F1, a low-level feature map whose size is 1/4 of the original image size is extracted, and the low-level features are subjected to one combination operation, including Conv, BatchNorm, ReLU; this feature map is then spliced with the feature map F1 to form a feature map F2, F2 is subjected to two Conv + BatchNorm + ReLU combination operations and one Dropout + Conv combination operation, and finally the obtained feature map is directly up-sampled by a factor of 4 to the original image size so as to output the segmentation map.
The invention has the following beneficial effects: the invention is an image segmentation method based on deep learning which, unlike conventional deep-learning segmentation algorithms, structurally integrates an interference factor discriminator at the front end of the ship extractor and improves DeepLabv3+, making the method better suited to the special problems of the sea-sky background. The method ultimately solves the problem that the segmentation precision of conventional deep-learning-based segmentation algorithms is low owing to the particularity of the sea-sky background, such as sea fog on the sea surface, and, without loss of generality, can also be used to handle in a targeted manner other interference factors present in the sea-sky background, such as heavy wake, heavy waves, sea-surface glare, reflection and dusk.
Drawings
FIG. 1 is a diagram of the effect of a conventional segmentation algorithm applied to a sea-sky background; the first column is the original image, the second column is the labeled graph, and the third column is the segmentation result of the conventional algorithm;
FIG. 2 is an overall structure diagram of a novel ship segmentation algorithm under the sea-sky background combined with image information according to the present invention;
FIG. 3 is a view showing the structure of SqueezeNet;
FIG. 4 is a diagram of a Fire Module structure;
FIG. 5(a) is a schematic of a standard convolution;
FIG. 5(b) is a schematic diagram of hole convolution;
FIG. 6 is a schematic representation of DeepLabv3 +;
FIG. 7 is a diagram of a Dense Block structure;
FIG. 8 is a comparison of Transition Layer and Tran-Expansion Layer;
FIG. 9 is a modified DeepLabv3+ structure diagram;
FIG. 10 is a graph comparing the segmentation results of the example with the results of DeepLabv3+: the first column is the original image, the second column is the labeled graph, the third column is the segmentation result of DeepLabv3+, and the fourth column is the segmentation result of the present invention.
Detailed Description
The invention aims to solve the problem that the segmentation precision is reduced when the existing segmentation algorithm is used for segmenting the ship image under the sea-sky background. Aiming at a ship image to be segmented, firstly, a trained interference factor discriminator is used for discriminating the environment type corresponding to the ship image; then, the ship extractor corresponding to the environment type is used for carrying out segmentation and extraction on the ship; adopting a classification network based on a neural network to construct an interference factor discriminator; training by using a training set to obtain a trained interference factor discriminator; constructing ship extractors under different environments by adopting a neural network-based segmentation network; and respectively training by using the ship images in each environment in the training set to obtain the trained ship extractors corresponding to the ship images in different environments. The method is mainly used for segmentation and extraction of ships in the image. The method comprises the following steps:
aiming at a ship image to be segmented, firstly, a trained interference factor discriminator is used for discriminating the environment type corresponding to the ship image; then, the ship extractor corresponding to the environment type is used for carrying out segmentation and extraction on the ship;
the interference factor discriminator and the ship extractors corresponding to different environment types are as follows:
adopting a classification network based on a neural network to construct an interference factor discriminator; training by using a training set to obtain a trained interference factor discriminator;
constructing ship extractors under different environments by adopting a neural network-based segmentation network; and respectively training by using the ship images in each environment in the training set to obtain the trained ship extractors corresponding to the ship images in different environments.
Further, before training the interference factor discriminator and the ship extractor, normalization operation and image scaling operation are required to be performed on the pictures.
Further, the image scaling operation adopts a bilinear interpolation mode, and an interpolation formula is as follows:
I(m,n)=am+bn+cmn+d (1)
in the formula, m and n are coordinates of pixel points, a, b, c and d are four dynamically determined parameters, and I is a pixel value obtained through interpolation.
Further, the interference factor discriminator constructed by the classification network based on the neural network is realized based on the SqueezeNet, and the method specifically comprises the following steps:
the input image is first subjected to a first combining operation, including Conv, ReLU, Maxpool;
then passing through Fire 2-Fire 9 modules, adding MaxPool after Fire3 and Fire5, and passing through the last combination operation after Fire9, including Dropout, Conv, ReLU, AvgPool;
and finally, outputting a judgment result.
Further, the Fire 2-Fire 9 modules are as follows:
the Fire module contains a Squeeze Layer, an Expand Layer and 2 ReLUs, wherein the Squeeze Layer performs a 1 × 1 convolution; the Expand Layer performs a 1 × 1 convolution and a 3 × 3 convolution in parallel.
Further, the number of channels of the feature map output after the input picture is subjected to the first combination operation is 96, and the size of the feature map is 1/2 of the size of the input picture;
the number of channels of the feature map in Fire2 is compressed from 96 to 16 and then expanded to 64, and so on through the remaining Fire modules up to Fire9, at which point the number of output feature-map channels is 256 and the size is 1/8 of the size of the input map.
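By way of a non-limiting illustration of this structure, the following PyTorch sketch implements a Fire module as described above (Squeeze Layer as a 1 × 1 convolution, Expand Layer as parallel 1 × 1 and 3 × 3 convolutions, two ReLUs); the channel numbers follow the Fire2 example, and whether the Expand output is 64 per branch or 64 in total is not specified in the text, so 64 per branch is assumed here.

import torch
import torch.nn as nn

class Fire(nn.Module):
    # Sketch of a Fire module: Squeeze (1x1 conv), then parallel 1x1 / 3x3 Expand convolutions.
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.squeeze(x))            # compress the number of channels
        out = torch.cat([self.expand1x1(s),       # expand in parallel and splice channel-wise
                         self.expand3x3(s)], dim=1)
        return self.relu(out)

# Example: a Fire2-like block compressing 96 channels to 16 and expanding each branch to 64
fire2 = Fire(96, 16, 64)
y = fire2(torch.randn(1, 96, 112, 112))   # -> torch.Size([1, 128, 112, 112])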
Further, the construction of the ship extractors under different environments with the neural-network-based segmentation network is realized based on DeepLabv3+, specifically as follows:
firstly, features of the input image are preliminarily extracted through a DCNN (deep convolutional neural network), and the extracted features are then divided into 5 branches for processing;
a hole convolution operation and a Batch Normalization operation are performed in each of branch 2 to branch 5;
branch 1 performs a combined operation comprising: AvgPool, Conv, BatchNorm, ReLU; the feature map of this branch is then up-sampled back to 1/16 of the original image size and spliced with the feature maps processed by branches 2 to 5;
the spliced feature map is subjected to a combination operation, including Conv, BatchNorm, ReLU, Dropout;
finally, it is directly up-sampled by a factor of 4 to 1/4 of the original image size, thereby forming the feature map F1 of the Encoder;
at the stage where the DCNN extracts the feature map F1, a low-level feature map whose size is 1/4 of the original image size is also extracted and subjected to one combination operation, including Conv, BatchNorm, ReLU; this feature map is then spliced with the feature map F1 to form a feature map F2, F2 is subjected to two Conv + BatchNorm + ReLU combination operations and one Dropout + Conv combination operation, and finally the obtained feature map is directly up-sampled by a factor of 4 to the original image size so as to output the segmentation map.
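A minimal sketch of the decoder path just described follows (low-level features refined by Conv + BatchNorm + ReLU, spliced with F1, two Conv + BatchNorm + ReLU operations, Dropout + Conv, then 4x up-sampling); the channel widths 48 and 256 and the number of classes are illustrative assumptions, not values fixed by the method.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    # Fuse the low-level feature map with the encoder feature map F1 and output class scores.
    def __init__(self, low_ch, f1_ch, mid_ch=256, num_classes=2):
        super().__init__()
        self.reduce = nn.Sequential(                    # Conv + BatchNorm + ReLU on low-level features
            nn.Conv2d(low_ch, 48, 1, bias=False),
            nn.BatchNorm2d(48), nn.ReLU(inplace=True))
        self.fuse = nn.Sequential(                      # two Conv+BatchNorm+ReLU, then Dropout+Conv
            nn.Conv2d(48 + f1_ch, mid_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Conv2d(mid_ch, num_classes, 1))

    def forward(self, low, f1):
        f2 = torch.cat([self.reduce(low), f1], dim=1)   # splice to form the feature map F2
        return F.interpolate(self.fuse(f2), scale_factor=4,   # up-sample 4x towards the original size
                             mode='bilinear', align_corners=False)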
Further, the DCNN network is a DenseNet69 network, which is specifically as follows:
4 identical Dense blocks are designated Dense Block 1-Dense Block 4; each Dense Block contains 8 identical Dense layers, namely Dense layer 1-Dense layer 8; the 4 identical TE layers are marked as TE Layer 1-TE Layer 4;
firstly, features of the image are extracted through a Conv Layer; the extracted features C then enter Dense Block 1. Inside Dense Block 1, the features C first enter Dense Layer 1, i.e., undergo two BatchNorm + ReLU + Conv combination operations; the output of Dense Layer 1 is then spliced with the features C, and the spliced features serve as the input of Dense Layer 2. Similarly, the spliced whole of the outputs of Dense Layers 1 and 2 and the features C serves as the input of Dense Layer 3, and so on for the remaining Dense Layers;
the output feature map of Dense Block 1 then enters TE Layer 1, where it undergoes a BatchNorm + ReLU + Conv + AvgPooling combination operation; the output feature map is 1/2 the size of the input feature map, and the number of Conv output channels is expanded in TE Layer 1;
the output feature map of TE Layer 1 enters Dense Block 2, the output of Dense Block 2 enters TE Layer 2, and the output of TE Layer 2 enters Dense Block 3; after Dense Block 3 a BatchNorm + ReLU combination operation is performed and the output is divided into 2 branches, one being the low-level feature map and the other entering TE Layer 3; the output of TE Layer 3 enters Dense Block 4, and the output of Dense Block 4 enters TE Layer 4.
Further, a Conv + BatchNorm + ReLU combination operation is appended after the DenseNet69 network, and the extracted features are then divided into the 5 branches for processing.
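The following sketch illustrates the Dense Layer / Dense Block connectivity described above (each Dense Layer performs two BatchNorm + ReLU + Conv combinations, and each layer receives the channel-wise splice of the block input with all earlier layer outputs); the growth rate of 32 and the bottleneck width are assumptions borrowed from the usual DenseNet121 convention, not values fixed by the method.

import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    # Two BatchNorm + ReLU + Conv combinations producing `growth` new feature channels.
    def __init__(self, in_ch, growth=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, 4 * growth, 1, bias=False),
            nn.BatchNorm2d(4 * growth), nn.ReLU(inplace=True),
            nn.Conv2d(4 * growth, growth, 3, padding=1, bias=False))

    def forward(self, x):
        return self.body(x)

class DenseBlock(nn.Module):
    # 8 Dense Layers; each receives the splice of the block input and all previous outputs.
    def __init__(self, in_ch, num_layers=8, growth=32):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_ch + i * growth, growth) for i in range(num_layers))

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

block = DenseBlock(in_ch=64)
print(block(torch.randn(1, 64, 56, 56)).shape)   # -> torch.Size([1, 320, 56, 56]); spatial size unchanged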
The first embodiment is as follows:
The embodiment is a marine vessel segmentation method based on joint image information under the sea-sky background. In the present embodiment and example, the data set used is partially derived from the Singapore Maritime Dataset (SMD), partially from the Maritime Detection, Classification, and Tracking Dataset (MarDCT) created by the RoCoCo laboratory of the University of Rome, and partially from the Internet.
The method for segmenting the ship under the sea and sky background based on the combined image information comprises the following steps:
step a: preprocessing the picture:
a normalization operation and an image scaling operation are performed on all pictures;
the normalization operation is adopted to prevent the absolute value of the net input from becoming so large that the neuron outputs saturate;
the image scaling operation is adopted because the pictures are inconsistent in size, which is unfavorable for training a neural network. In the present embodiment, bilinear interpolation is used, and the interpolation formula is as follows:
I(m,n)=am+bn+cmn+d (1)
in the formula, m and n are coordinates of pixel points, a, b, c and d are four dynamically determined parameters, and I is a pixel value obtained through interpolation.
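A minimal sketch of this preprocessing step is given below, assuming PyTorch; the function name preprocess, the target size of 512 × 512 and the [0, 1] normalization are example choices, not values prescribed by the method.

import numpy as np
import torch
import torch.nn.functional as F

def preprocess(image, size=(512, 512)):
    # `image` is an H x W x 3 uint8 array; normalize to [0, 1] and resize with bilinear interpolation.
    x = torch.as_tensor(np.asarray(image), dtype=torch.float32).permute(2, 0, 1) / 255.0
    x = F.interpolate(x.unsqueeze(0), size=size, mode='bilinear', align_corners=False)
    return x

x = preprocess(np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8))
print(x.shape)   # -> torch.Size([1, 3, 512, 512])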
Step b: training an interference factor discriminator:
The invention adopts a discriminator based on a neural network to judge the interference factors contained in an image; in the specific implementation, a SqueezeNet classification network is adopted. The strategies used by SqueezeNet in its architecture are: using 1 × 1 convolution kernels to reduce the model parameters, and using a compression layer to reduce the number of input channels.
The main purpose of these two strategies is to keep the architecture lightweight without loss of final precision. Considering the final model size and running speed of the algorithm, the one of the three current SqueezeNet versions with the smallest memory footprint is selected, and the number of classes at the output end is adjusted (class = 2) so as to suit the requirements of the sea-sky-background framework.
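As an illustrative sketch (not the patented implementation itself), such a two-class discriminator can be instantiated from torchvision's SqueezeNet 1.1, the smaller of the publicly available variants; the use of torchvision here is an assumption made purely for illustration.

import torch
from torchvision import models

# Two output classes: normal sea-sky background vs. a sea-sky background containing sea fog.
discriminator = models.squeezenet1_1(num_classes=2)
logits = discriminator(torch.randn(1, 3, 224, 224))   # -> torch.Size([1, 2])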
The overall network framework of the interference factor discriminator is shown in Fig. 3. As can be seen from Fig. 3, the input picture is first subjected to the first combination operation (Conv, ReLU, MaxPool); the number of channels of the output feature map is 96 and its size is 1/2 of the input size, i.e., C96/2 in Fig. 3. The picture then passes through the Fire modules; for example, (96-16-64) in Fire2 indicates that the number of channels of the feature map is compressed from 96 to 16 and then expanded to 64, and the remaining Fire modules behave analogously. Both Fire3 and Fire5 are followed by a MaxPool operation in order to reduce the feature map to half the size of its input, which is what all the '/2' marks in Fig. 3 denote. By Fire9 the number of output feature-map channels is 256 and the size is 1/8 of the input size, and the judgment result for the picture is given after the last combination operation (Dropout, Conv, ReLU, AvgPool). The most important structure in the network is the Fire Module, whose internal structure is shown in Fig. 4. The Fire Module contains a Squeeze Layer and an Expand Layer: the Squeeze Layer mainly performs a convolution with kernel size 1 × 1, whose main function is to compress the number of input feature-map channels and thereby reduce parameters; the Expand Layer comprises, in parallel, a convolution with kernel size 1 × 1 and a convolution with kernel size 3 × 3, and serves to alleviate the drop in final classification performance that the strong channel compression of the Squeeze Layer would otherwise cause. The two ReLUs in the Fire Module introduce non-linearity and improve the expressive capacity of the network. Referring to Fig. 3 and taking Fire2 as an example, (96-16-64) is completed by the Squeeze Layer (96-16) and the Expand Layer (16-64). The whole process of a feature map passing through the Fire Module can be summarized as follows:
x' = φ(E_1(φ(S(x))) ⊕ E_3(φ(S(x))))
wherein x denotes the input feature map, x' denotes the output feature map, S denotes the mapping of the Squeeze Layer, E denotes the mapping of the Expand Layer (subscript 1 denotes the 1 × 1 convolution, subscript 3 denotes the 3 × 3 convolution), ⊕ denotes the channel-wise splicing of feature maps, and φ denotes the rectified linear (ReLU) activation function.
Training by using a training set to obtain a trained interference factor discriminator; the training set comprises ship images under the background of normal sea and sky, ship images under sea fog, ship images corresponding to large wake flow, ship images in large waves, ship images corresponding to sea surface reflection, ship images corresponding to reflection, ship images under dusk and the like;
In the practical application of ship extraction, the images are input into the trained interference factor discriminator, which distinguishes ship images under the normal sea-sky background from ship images under the other condition (ship images under sea fog in this embodiment); the images are then respectively input into the corresponding trained ship extractors to carry out ship extraction.
Step c: carrying out targeted training on the ship extractor:
the invention adopts an extractor based on a neural network to extract ships contained in an image;
In some embodiments, the DeepLabv3+ segmentation network, which has good overall performance, is improved so as to be more suitable for the ship segmentation problem in the sea-sky background. In DeepLabv3+, the main work can be summarized as taking the earlier DeepLabv3 as the encoding module of the new network structure and proposing a decoding module, i.e., an encoding-decoding structure for image segmentation. The algorithms of the DeepLab series all use hole convolution. Hole convolution is standard convolution with holes inserted: the standard convolution takes a continuous area of the image for the convolution operation, whereas hole convolution samples the image at intervals, the spacing being determined by a new hyper-parameter introduced by hole convolution, the hole rate. The standard and hole convolution processes are shown in Fig. 5(a) and Fig. 5(b), respectively, where Fig. 5(a) is a schematic diagram of the standard convolution and Fig. 5(b) of the hole convolution. It is readily observed that introducing hole convolution gives a larger receptive field for a filter of the same size, which is of great benefit for image segmentation.
For the hole convolution process, the formula can be summarized as follows:
y(g_i) = Σ_{f_i} w(f_i) · x(g_i + β · f_i)
wherein f_i indexes the filter, g_i indexes the pixels, w denotes the filter weights, x and y denote the input and output feature maps, and β denotes the hole rate.
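A short PyTorch sketch of the difference between standard and hole convolution follows; the channel count of 256 and the hole rate of 3 are example values only.

import torch
import torch.nn as nn

beta = 3   # hole rate: filter taps are sampled beta pixels apart, enlarging the receptive field
standard_conv = nn.Conv2d(256, 256, kernel_size=3, padding=1)
hole_conv = nn.Conv2d(256, 256, kernel_size=3, padding=beta, dilation=beta)

x = torch.randn(1, 256, 64, 64)
print(standard_conv(x).shape, hole_conv(x).shape)   # both keep the 64 x 64 spatial size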
Referring to Fig. 6, the operation of DeepLabv3+ can be summarized as follows: for an input image, features are first preliminarily extracted by the DCNN network, the feature-map size being 1/16 of the input image size; the extracted features are then subjected to a hole convolution operation with a certain number of hole rates (generally 4 different hole rates; the first Dilated Conv layer of the combination operations after arrows 2-5 in Fig. 6) followed by a Batch Normalization operation (the second Batch Norm layer of those combination operations), see branches 2-5 (arrows 2-5) in Fig. 6. A further branch whose feature-map size is also 1/16 of the input image is led out of the DCNN, see branch 1 (arrow 1) in Fig. 6, and subjected to a combination operation (AvgPool, Conv 1 × 1, BatchNorm, ReLU); the feature map of this branch is then up-sampled back to 1/16 of the original image size and spliced with the feature maps obtained after the hole convolution and Batch Normalization operations, i.e., the feature maps obtained under different fields of view are spliced, see the splicing operation in the Encoder of Fig. 6.
The spliced feature map is then subjected to a combination operation (Conv, BatchNorm, ReLU, Dropout) and finally up-sampled by a factor of 4 to 1/4 of the original size, thereby forming the feature map F1 of the Encoder. Meanwhile, at the stage where the DCNN extracts the feature map F1, a low-level feature map whose size is 1/4 of the original image is extracted, see branch 6 (arrow 6) in Fig. 6; the low-level feature is subjected to one combination operation (Conv, BatchNorm, ReLU) and then spliced with F1 to form a feature map F2; F2 is subjected to two combination operations (Conv, BatchNorm, ReLU) and one combination operation (Dropout, Conv), and finally the obtained feature map is directly up-sampled by a factor of 4 to the original image size to output the segmentation map, i.e., the Decoder.
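The five-branch Encoder head described above can be sketched as follows; the hole rates (1, 6, 12, 18) and the 256 output channels follow the usual DeepLabv3+ convention and are assumptions rather than values fixed by the patent.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FiveBranchHead(nn.Module):
    # Branches 2-5: hole convolution + BatchNorm + ReLU; branch 1: AvgPool + Conv + BatchNorm + ReLU.
    def __init__(self, in_ch, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        for r in rates:
            k, p = (1, 0) if r == 1 else (3, r)
            self.branches.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=p, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)))
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.project = nn.Sequential(                    # Conv + BatchNorm + ReLU + Dropout on the splice
            nn.Conv2d(5 * out_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True), nn.Dropout(0.5))

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        pooled = F.interpolate(self.pool(x), size=x.shape[-2:],   # back to the 1/16 resolution
                               mode='bilinear', align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

head = FiveBranchHead(in_ch=512).eval()
print(head(torch.randn(1, 512, 32, 32)).shape)   # -> torch.Size([1, 256, 32, 32])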
In order to obtain higher segmentation accuracy, in some embodiments the feature extraction network DenseNet69 is used as the DCNN module (the DCNN module in Fig. 6), so that its feature-extraction capability is brought to bear on the segmentation problem.
The DenseNet69 network is obtained by improving a DenseNet121 network which has excellent performance in the classification problem, and the DenseNet121 is improved according to the following four strategies:
(i) in order to maintain the Encoder-Decoder structure in the DeepLabv3+ original network, firstly, a low-order characteristic diagram is led out behind the 3 rd Dense Block in the DenseNet 121;
(ii) in order to obtain a 16-time down-sampling structure as the output of the Encoder, a Transition Layer is added at the tail end of the DenseNet 121;
(iii) in order to further alleviate the problem that DenseNet heavily occupies GPU memory, the number of layers of DenseNet121 is greatly reduced, i.e., each Dense Block used contains 8 Dense Layers; referring to Fig. 7, Fig. 7 shows the internal structure of a Dense Block;
(iv) in order to balance the decrease in feature-extraction capability caused by the drastic reduction of layers in strategy (iii), all Transition Layers in DenseNet121 are removed and the newly proposed Tran-Expansion Layer is embedded instead, see Fig. 8; the left side of Fig. 8 shows the structure of the Transition Layer used in the original DenseNet121, and the right side shows the Tran-Expansion Layer (TE Layer) newly proposed by the present invention. Unlike the Transition Layer, the TE Layer controls the feature map according to a new hyper-parameter θ (expansion coefficient); in the present embodiment θ is a constant greater than 1, so that for an input feature map with ξ layers the output feature map has θ × ξ layers (for example, if the number ξ of input layers is 3, the number of output layers is 3 × θ, i.e., the number of input layers is expanded by a factor of θ), thereby balancing the reduced feature-extraction performance caused by the large reduction of layers in strategy (iii);
Figs. 7 and 8 are enlarged views of the two structures; the output of Fig. 7 is connected to the TE Layer on the right of Fig. 8, and the Transition Layer is drawn on the left of Fig. 8 only to compare the original Transition Layer with the present TE Layer; the Transition Layer on the left does not exist in the present invention.
The modified DenseNet121 is named DenseNet69, according to the number of layers in the final network, and serves as the DCNN module of DeepLabv3+; in some embodiments a combination operation (Conv, BatchNorm, ReLU) is added after DenseNet69. The overall improved DeepLabv3+ structure is shown in Fig. 9. For the specific implementation of strategy (i), see arrow A drawn after Dense Block3 in Fig. 9; this is the low-level feature map, whose size is 1/4 of the input image. For the specific implementation of strategy (ii), see arrow B drawn after TE Layer4 in Fig. 9; this is the high-level feature map, whose size is 1/16 of the input image size, and it can be seen that a combination operation (Conv, BatchNorm, ReLU) is performed on the feature map output by DenseNet69, where the Conv aims to compress again the number of layers of the feature map output by DenseNet, and BatchNorm and ReLU are added to accelerate convergence and to learn features better. For the specific implementation of strategy (iii), see Fig. 9: there are 4 Dense Blocks, each containing 8 Dense Layers. For the specific implementation of strategy (iv), see the 4 TE Layers in Fig. 9. The overall feature-extraction process has been described for DeepLabv3+, so only the process of preliminary feature extraction when DenseNet69 serves as the DCNN is explained here. Referring to Fig. 9, the input picture first undergoes feature extraction through the Conv Layer; the extracted features C then enter Dense Block1, whose internal structure is shown in Fig. 7. The features C first enter Dense Layer1, i.e., undergo two combination operations (BatchNorm, ReLU, Conv); the output of Dense Layer1 is then spliced with the features C (the first splicing operation in Fig. 7), and the spliced features serve as the input of Dense Layer2. Similarly, the spliced whole of the outputs of Dense Layers 1 and 2 and the features C serves as the input of Dense Layer3, and so on for the remaining Dense Layers; the size of the image is unchanged after passing through a Dense Block. Referring to Fig. 9, the output feature map of Dense Block1 next enters TE Layer1; see Fig. 8 for the internal structure of the TE Layer, whose right side is the expansion structure newly proposed by this method. After entering the TE Layer, the feature map undergoes a combination operation (BatchNorm, ReLU, Conv, AvgPooling); the output feature map is 1/2 the size of the input feature map, and the number of Conv output channels in the TE Layer is expanded to make up for the decreased feature-extraction capability caused by the large reduction of layers, as described in strategy (iv). The remaining Dense Blocks and TE Layers behave similarly; note that there is a combination operation (BatchNorm, ReLU) after Dense Block3 to further mitigate overfitting and prevent vanishing gradients.
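A minimal sketch of the Tran-Expansion Layer described above follows; the expansion coefficient θ = 2 is only an example of a constant greater than 1, not a value prescribed by the method.

import torch
import torch.nn as nn

class TELayer(nn.Module):
    # BatchNorm + ReLU + Conv + AvgPooling; channels expanded by theta, spatial size halved.
    def __init__(self, in_ch, theta=2.0):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, int(theta * in_ch), 1, bias=False),
            nn.AvgPool2d(kernel_size=2, stride=2))

    def forward(self, x):
        return self.body(x)

te = TELayer(in_ch=64, theta=2.0)
print(te(torch.randn(1, 64, 56, 56)).shape)   # -> torch.Size([1, 128, 28, 28])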
Respectively training the ship images in each environment in the training set to obtain trained ship extractors corresponding to the ship images in different environments;
In the actual application of ship extraction, the images are input into the corresponding trained ship extractors, according to the judgment result of the trained interference factor discriminator, to carry out ship extraction.
Examples
Example 1:
the present embodiment describes the present invention by taking the ship image under the background of normal sea and sky and the ship image under sea fog as examples,
the method specifically comprises the following steps:
first, data preparation phase
All data used for training and testing in Embodiment 1 are derived from SMD, MarDCT (these data sets are well known to those skilled in the art and are not described in detail in the embodiments of the present invention) and the Internet. All data were manually labeled using LabelMe. The newly consolidated vessel database is named Mari-Data. Because the interference factor discriminator and the ship extractor need to be trained separately, the training data set is also divided into two parts.
For the data of the interference factor discriminator, this embodiment considers only the sea-fog scene, which affects precision most severely, so as to improve the overall segmentation precision; therefore the interference-factor-discriminator part of the invention only needs to judge whether a picture contains sea fog, and its overall structure is shown in Fig. 2. A library of 1669 pictures consisting of normal sea-sky scenes and sea-fog scenes is picked from Mari-Data to train the interference factor discriminator. For the data of the ship extractor, the invention only needs to train ship extractor models for two cases, namely sea fog and the normal sea-sky background. Finally, 972 pictures are picked from Mari-Data to train the ship extractor under the normal sea-sky background, and 745 pictures are picked to train the ship extractor under the sea-fog scene.
1) LabelMe: annotation software dedicated to image-segmentation annotation; after an image is annotated with this software, a JSON file is generated, and the corresponding annotation information is used by reading the JSON file.
Second, training stage
1) The training process of the interference factor discriminator repeats steps b1 and b2.
The interference factor discriminator is trained with a deep-learning-based discrimination network (here SqueezeNet), and a judgment of whether the image contains sea fog is given; the specific steps are as follows:
Step b1: initialize the SqueezeNet network, set the required number of judgment classes class to 2 (normal sea-sky background and sea-sky background containing sea fog), set the maximum number of iterations Epoch to 500, set the Batch-Size to 32, set the initial learning rate initial_lr to 0.001, select the Adam optimization algorithm, which introduces first-order moment estimation of the gradient, as the optimizer, set the weight attenuation coefficient L2 to 0.00001, and select the cross-entropy loss function Cross_Entropy_loss as the loss function.
Step b 2: and (c) taking the image obtained in the step a as the input of the Squeezenet network, and starting to train the interference factor discriminator. The training is an iterative process as a whole, each iteration firstly calculates the cross entropy loss value of the forward propagation, and then reversely updates all parameters of the network with the goal of minimizing the cross entropy loss value.
2) The training process of the ship extractor repeats steps c1 and c2.
The ship extractor is trained with a deep-learning-based segmentation network (here the original DeepLabv3+ and the improved I-DeepLabv3+ are trained at the same time) to segment the ships, and the segmentation results of all ships contained in the image are given; the specific steps are as follows:
Step c1: initialize the DeepLabv3+ network and the modified I-DeepLabv3+ network, set the maximum number of iterations Epoch to 900, set the Batch-Size to 20 for DeepLabv3+ and to 5 for I-DeepLabv3+ owing to the difference in networks and the limitation of hardware, and use PyTorch's own checkpoint function to optimize the memory usage of I-DeepLabv3+, which brings 10%-20% extra execution time. In order to converge faster early in training, the initial learning rate initial_lr is set to 0.01; to further reduce the loss later in training, the learning rate is set to 0.008 at epoch 300, 0.005 at epoch 600 and 0.002 at epoch 800. The Adam optimization algorithm, which introduces first-order moment estimation of the gradient, is selected as the optimizer, the weight attenuation coefficient L2 is set to 0.0001, and the cross-entropy loss function BCE_loss is selected as the loss function.
Step c 2: in order to obtain the best extraction result, all images containing sea fog in all images after the step a are extracted as input of a DeepLabv3+ network and an I-DeepLabv3+ network for training ship extractors A1 and A2 in a sea fog scene, and all images under a normal sea-sky background in all images after the step a are extracted as input of a DeepLabv3+ network and an I-DeepLabv3+ network for training ship extractors B1 and B2 in a normal sea-sky background scene. The training is an iterative process as a whole, each iteration firstly calculates the cross entropy loss value of the forward propagation, and then reversely updates all parameters of the network with the goal of minimizing the cross entropy loss value.
Third, testing stage
The process of the test stage repeats the following steps one to three.
Step one: repeat step a on the test picture;
Step two: input the test image obtained in step one into the SqueezeNet network trained in steps b1 and b2 for type judgment, which finally gives a judgment result in the form of a probability;
Step three: input the result of step two into the different ship extractors trained in steps c1 and c2 according to the probability value to extract the ship: if the probability value is close to 1, the picture is a picture under the normal sea-sky background and only needs to be input into ship extractor B1 or B2 for ship extraction; if the probability value is close to 0, the picture contains sea fog and only needs to be input into ship extractor A1 or A2 for ship extraction. By combination, the structure of the ship extractor can therefore be: A1+B1, A1+B2, A2+B1 or A2+B2.
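The routing of step three can be sketched as follows, with the judgment formalized as a probability threshold gamma (set to 0.5 here purely as an example) and with the class-index convention (index 1 = normal sea-sky) assumed for illustration only.

import torch

def segment_ship(image, discriminator, extractor_normal, extractor_fog, gamma=0.5):
    # Route the picture to the ship extractor chosen by the interference factor discriminator.
    with torch.no_grad():
        prob_normal = torch.softmax(discriminator(image), dim=1)[:, 1]   # assumed class order
        if prob_normal.item() >= gamma:
            return extractor_normal(image)   # B1 / B2: normal sea-sky background
        return extractor_fog(image)          # A1 / A2: picture containing sea fog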
The embodiment of the invention adopts A1+B1, A1+B2 and A2+B2; A2+B1 was omitted in the experimental process. Because the probability value in step two is not itself a decision, a threshold Γ is set so that step two yields a judgment: when the probability value is greater than or equal to the threshold Γ, the input picture is considered a picture under the normal sea-sky background, otherwise a picture with sea fog. In order to obtain the optimal segmentation result, 15 different values of Γ are tested, and MIoU (a segmentation index well known to those skilled in the art and not described in detail here) and Avg are selected as the evaluation criteria, because MIoU is regarded as the most representative evaluation criterion in the image segmentation field and Avg comprehensively reflects the performance over all four evaluation indexes. Experimental result 1 is shown in Table 1. It can be observed from Table 1 that as Γ increases, the optimal Γ values of A2+B2 and A1+B2 become smaller, for which three reasons are given: (i) in the data preparation stage the number of sea-fog pictures is 224 fewer than that of normal sea-sky pictures, so the generalization ability of the network on sea-fog pictures is necessarily inferior to that on normal sea-sky-background pictures; (ii) from step c1 we know that the batch size of I-DeepLabv3+ is comparatively small due to hardware limitations, which can cause intermittent oscillation during training and slow convergence; (iii) the diversity of sea fog, such as dense versus light fog and its distance from the observer, may reduce the generalization ability of the network on sea-fog pictures, making the sea-fog model harder to train. In summary, only a relatively small value can be set, so that the sea-fog network specifically handles severe sea-fog situations, thereby improving the overall segmentation precision; the experimental results also show that the segmentation effect improves as the threshold Γ gradually decreases, which confirms this idea. For these reasons, and in view of the significance of the experiment and time constraints, version A2+B1 was finally abandoned. As can be seen from Table 1, if Avg is used as the evaluation criterion, the precision of the three versions increases progressively, the best being Our Approach A1+B2 = 88.96%. The best performance is shown in bold.
TABLE 1
[Table 1 is reproduced as an image in the original publication; it lists MIoU and Avg for A1+B1, A1+B2 and A2+B2 under the 15 tested threshold values Γ.]
Experimental result 2 is shown in Table 2; the results show that under the four indexes commonly used in the image segmentation field (PA, MPA, MIoU and FWIoU are segmentation indexes known to those skilled in the art and are not described in detail here) the segmentation effect obtained by this algorithm is clearly more suitable for the complex sea-sky background than DeepLabv3+. Table 2 shows that on all five indexes (except Time) the segmentation results of all three of our versions are better than those of DeepLabv3+, which otherwise has excellent segmentation performance across different scenarios. As for Time, our versions take only 10-20 ms more than DeepLabv3+ and still qualify as real-time segmentation algorithms. The higher time consumption is explained from the following 3 aspects: (i) the framework of the algorithm is formed by two parts, so some extra time is unavoidable, but the implementation also tries to reduce it, e.g., the classification network selected for the front end is the lightweight network SqueezeNet; (ii) the checkpoint function is used in I-DeepLabv3+ to optimize memory usage, which brings about 10%-20% additional execution time, so when hardware conditions allow, removing the checkpoint function further reduces the time consumption; (iii) the time consumption can also be reduced by adopting the existing Network Slimming algorithm.
TABLE 2
[Table 2 is reproduced as an image in the original publication; it compares DeepLabv3+ with the three proposed versions on PA, MPA, MIoU, FWIoU, Avg and Time.]
1) Avg: the average of the four evaluation indexes, i.e., Avg = (PA + MPA + MIoU + FWIoU)/4.
2) Time: the time it takes to process a picture.
Nine experimental segmentation results were selected for comparative demonstration, see Fig. 10. It can be seen that the new algorithm proposed here is more suitable for the sea-sky background than DeepLabv3+.
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.

Claims (5)

1. A ship image segmentation method based on joint image information under a sea and sky background is characterized by comprising the following steps:
Step 1: preprocessing an image, including a normalization operation and an image scaling operation;
Step 2: constructing an interference factor discriminator by adopting a classification network based on a neural network, and training the interference factor discriminator by utilizing a training set containing ship images under different environmental backgrounds to obtain a trained interference factor discriminator;
Step 3: inputting the image obtained in Step 1 into the trained interference factor discriminator obtained in Step 2, and discriminating the image into ship images under different environmental backgrounds;
Step 4: constructing ship extractors under different environments by adopting a neural-network-based segmentation network, and respectively carrying out targeted training on the ship extractors by utilizing ship images under each environment in a training set to obtain trained ship extractors corresponding to the ship images under different environments;
Step 5: respectively inputting the ship images under different backgrounds obtained by the discrimination in Step 3 into the corresponding trained ship extractors for ship extraction.
2. The marine vessel image segmentation method based on joint image information as claimed in claim 1, wherein: the image scaling operation in Step 1 adopts bilinear interpolation, and the interpolation formula is as follows:
I(m,n)=am+bn+cmn+d
in the formula, m and n are coordinates of pixel points, a, b, c and d are four dynamically determined parameters, and I is a pixel value obtained through interpolation.
3. The marine vessel image segmentation method based on joint image information as claimed in claim 1, wherein: constructing the interference factor discriminator by adopting a classification network based on a neural network in Step 2 is realized based on SqueezeNet, specifically as follows:
the input image is first subjected to a first combining operation, including Conv, ReLU, Maxpool;
then passing through Fire 2-Fire 9 modules, adding MaxPool after Fire3 and Fire5, and passing through the last combination operation after Fire9, including Dropout, Conv, ReLU, AvgPool;
and finally, outputting a judgment result.
4. The marine vessel image segmentation method based on joint image information as claimed in claim 3, wherein: the Fire 2-Fire 9 modules are specifically:
the Fire 2-Fire 9 modules each contain a Squeeze Layer, an Expand Layer and 2 ReLUs, wherein the Squeeze Layer performs a 1 × 1 convolution; the Expand Layer performs a 1 × 1 convolution and a 3 × 3 convolution in parallel.
5. The marine vessel image segmentation method based on joint image information as claimed in claim 1, wherein: constructing the ship extractors under different environments by adopting the neural-network-based segmentation network in Step 4 is realized based on DeepLabv3+, specifically as follows:
firstly, preliminarily extracting features from the ship images under each environment in the training set through a DCNN (deep convolutional neural network), and then dividing the extracted features into 5 branches for processing;
performing a hole convolution operation and a Batch Normalization operation in each of branch 2 to branch 5;
branch 1 performs a combined operation comprising: AvgPool, Conv, BatchNorm, ReLU; then the feature map of this branch is up-sampled back to 1/16 of the original image size and spliced with the feature maps processed by branches 2 to 5;
subjecting the spliced feature map to a combination operation, including Conv, BatchNorm, ReLU, Dropout;
finally, directly up-sampling by a factor of 4 to 1/4 of the original image size, thereby forming the feature map F1 of the Encoder;
extracting a low-level feature map, whose size is 1/4 of the original image size, at the stage where the DCNN extracts the feature map F1, and subjecting the low-level features to one combination operation including Conv, BatchNorm, ReLU; then splicing this feature map with the feature map F1 to form a feature map F2, subjecting F2 to two Conv + BatchNorm + ReLU combination operations and one Dropout + Conv combination operation, and finally directly up-sampling the obtained feature map by a factor of 4 to the original image size so as to output the segmentation map.
CN201911388248.9A 2019-12-30 2019-12-30 Ship image segmentation method based on joint image information under sea and sky background Active CN111160354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911388248.9A CN111160354B (en) 2019-12-30 2019-12-30 Ship image segmentation method based on joint image information under sea and sky background

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911388248.9A CN111160354B (en) 2019-12-30 2019-12-30 Ship image segmentation method based on joint image information under sea and sky background

Publications (2)

Publication Number Publication Date
CN111160354A CN111160354A (en) 2020-05-15
CN111160354B true CN111160354B (en) 2022-06-17

Family

ID=70558957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911388248.9A Active CN111160354B (en) 2019-12-30 2019-12-30 Ship image segmentation method based on joint image information under sea and sky background

Country Status (1)

Country Link
CN (1) CN111160354B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210279594A1 (en) * 2020-03-06 2021-09-09 Tencent America LLC Method and apparatus for video coding

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766155A (en) * 2021-01-19 2021-05-07 山东华宇航天空间技术有限公司 Deep learning-based mariculture area extraction method


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101520896A (en) * 2009-03-30 2009-09-02 中国电子科技集团公司第十研究所 Method for automatically detecting cloud interfering naval vessel target by optical remote sensing image
JP2013105392A (en) * 2011-11-15 2013-05-30 Kanazawa Univ Driving support system, driving support method and program
US8437509B1 (en) * 2011-11-16 2013-05-07 The United States Of America As Represented By The Secretary Of The Navy System and method for inferring vessel speed from overhead images
CN102663348A (en) * 2012-03-21 2012-09-12 中国人民解放军国防科学技术大学 Marine ship detection method in optical remote sensing image
CN104778695A (en) * 2015-04-10 2015-07-15 哈尔滨工程大学 Water sky line detection method based on gradient saliency
CN105701508A (en) * 2016-01-12 2016-06-22 西安交通大学 Global-local optimization model based on multistage convolution neural network and significant detection algorithm
CN105787950A (en) * 2016-03-24 2016-07-20 电子科技大学 Infrared image sea-sky-line detection algorithm based on line gradient accumulation
CN109389607A (en) * 2018-10-12 2019-02-26 上海鹰觉科技有限公司 Ship Target dividing method, system and medium based on full convolutional neural networks
CN109800735A (en) * 2019-01-31 2019-05-24 中国人民解放军国防科技大学 Accurate detection and segmentation method for ship target
CN109903303A (en) * 2019-02-25 2019-06-18 秦皇岛燕大滨沅科技发展有限公司 A kind of drauht line drawing method based on convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multiple Features Based Low-Contrast Infrared Ship Image Segmentation Using Fuzzy Inference System; Tao Wang et al.; IEEE; 20141127; pp. 1-6 *
Research on ship recognition algorithm based on YOLO object segmentation; 徐小茹; Journal of Wuzhou University (《梧州学院学报》); 20191215; pp. 1-9 *


Also Published As

Publication number Publication date
CN111160354A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN110110624B (en) Human body behavior recognition method based on DenseNet and frame difference method characteristic input
Tang et al. A multi-stage framework with context information fusion structure for skin lesion segmentation
CN111652321A (en) Offshore ship detection method based on improved YOLOV3 algorithm
CN112950477B (en) Dual-path processing-based high-resolution salient target detection method
CN110443761B (en) Single image rain removing method based on multi-scale aggregation characteristics
CN113468996B (en) Camouflage object detection method based on edge refinement
CN111915525A (en) Low-illumination image enhancement method based on improved depth separable generation countermeasure network
CN111553934B (en) Multi-ship tracking method adopting multi-dimensional fusion
CN113989890A (en) Face expression recognition method based on multi-channel fusion and lightweight neural network
CN113222824B (en) Infrared image super-resolution and small target detection method
CN111160354B (en) Ship image segmentation method based on joint image information under sea and sky background
Saqib et al. Person head detection in multiple scales using deep convolutional neural networks
Li et al. Primary video object segmentation via complementary CNNs and neighborhood reversible flow
Liu et al. Two-stage underwater object detection network using swin transformer
Liu et al. A shadow imaging bilinear model and three-branch residual network for shadow removal
Shankar et al. Comparing YOLOV3, YOLOV5 & YOLOV7 Architectures for Underwater Marine Creatures Detection
CN110942463B (en) Video target segmentation method based on generation countermeasure network
Liang et al. Coarse-to-fine foreground segmentation based on co-occurrence pixel-block and spatio-temporal attention model
CN117115632A (en) Underwater target detection method, device, equipment and medium
CN116563343A (en) RGBT target tracking method based on twin network structure and anchor frame self-adaptive thought
CN115861810A (en) Remote sensing image change detection method and system based on multi-head attention and self-supervision learning
CN114998124B (en) Image sharpening processing method for target detection
CN116342877A (en) Semantic segmentation method based on improved ASPP and fusion module in complex scene
CN113344825B (en) Image rain removing method and system
CN112164078B (en) RGB-D multi-scale semantic segmentation method based on encoder-decoder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant