CN110059758B - Remote sensing image culture pond detection method based on semantic segmentation - Google Patents

Remote sensing image culture pond detection method based on semantic segmentation

Info

Publication number
CN110059758B
Authority
CN
China
Prior art keywords
remote sensing
culture pond
sensing image
image
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910333358.9A
Other languages
Chinese (zh)
Other versions
CN110059758A (en)
Inventor
胡永利
田德宇
朱济帅
吴泰琦
彭小松
李海霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan Changguang Satellite Information Technology Co ltd
Original Assignee
Hainan Changguang Satellite Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainan Changguang Satellite Information Technology Co ltd filed Critical Hainan Changguang Satellite Information Technology Co ltd
Priority to CN201910333358.9A priority Critical patent/CN110059758B/en
Publication of CN110059758A publication Critical patent/CN110059758A/en
Application granted granted Critical
Publication of CN110059758B publication Critical patent/CN110059758B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image culture pond detection method based on semantic segmentation. The method is completed in four steps: first, the sub-meter-level remote sensing image is cut into blocks and the index information of each image block is stored; second, the culture pond detection semantic segmentation model is trained in advance until convergence, yielding the culture pond semantic segmentation model; then, the remote sensing image blocks are input into the culture pond semantic segmentation model to obtain a culture pond binary mask image for each block; finally, the basic index information of the cut remote sensing image blocks is used to merge the binary mask images and generate a culture pond mask raster image. The method trains the culture pond detection model with deep learning, giving accurate extraction results, good robustness and high adaptability; an integrated remote sensing image culture pond detection workflow is designed, which reduces manual participation and has practical value.

Description

Remote sensing image culture pond detection method based on semantic segmentation
Technical Field
The invention relates to deep learning technology, mainly to pixel-level semantic segmentation, and discloses an end-to-end remote sensing image culture pond detection method.
Background
Aquaculture refers to the commercial raising of aquatic organisms (including fish, mollusks, crustaceans and aquatic plants) and can be divided into three categories according to the operating substrate: land-based, water-surface-based and mudflat-based. Land-based systems mainly include ponds, rice fields and other facilities constructed on land; water-surface-based systems include bay, fence, net-cage and raft culture and are usually located in coastal or inland waters enclosed by fences; mudflat-based systems include pond culture and elevated-pond culture. China is the world's largest aquaculture country and the only country whose aquaculture output exceeds its capture-fishery output, and the scale of aquaculture is still growing rapidly. While making a great contribution to meeting the world's demand for aquatic products, aquaculture in China faces pressure from many directions, such as deteriorating water environment conditions, public scrutiny, policy and regulatory monitoring, and ever higher requirements on product quality, so aquaculture is increasingly becoming one of the current research hotspots.
Because aquaculture in the land areas of the coastal zone is convenient and economical, a large number of aquaculture ponds have been built in coastal ecological protection areas; wastewater with excessive nitrogen and phosphorus is discharged directly into the sea through concealed pipes, coastal shelter-forest land is illegally occupied, and the ecological environment is greatly damaged. Remote sensing has the advantages of wide detection range, fast data acquisition, short revisit period, high timeliness, low cost and high economic benefit. In recent years, with the development of remote sensing technology, multi-platform, multi-type and multi-resolution remote sensing data have provided opportunities for acquiring aquaculture information, and culture ponds can be identified and located quickly from remote sensing images.
Existing methods for extracting culture ponds from remote sensing images mainly include the following. The first is visual interpretation, which relies on interpretation marks (such as position, shape, size, tone, shadow and texture) and interpreter experience; it places high demands on the professional level of the interpreter, involves a heavy workload, is labor- and time-consuming, and offers poor timeliness for quantitative analysis of massive spatial information. The second, based on spatial structure analysis, is not suitable for extracting isolated, small-patch culture ponds. In addition, object-oriented information extraction first segments the image into image objects with certain meaning and then determines the class of each segmented object by combining spectral characteristics, texture, shape, adjacency relations and other information of ground features; it is strongly affected by feature selection and still needs to be combined with visual interpretation.
Semantic segmentation algorithms based on deep convolutional neural networks have achieved good results in the field of computer vision. Early semantic segmentation algorithms were designed on the basis of label-classification network structures such as AlexNet, VGGNet and GoogleNet; Long et al. replaced the fully connected layers of the convolutional neural network with convolutional layers and gradually restored the resolution by interpolation, generating a semantic segmentation mask map of the same size as the original input.
In order to effectively and quickly extract the information of the culture pond and enable the extraction of the culture pond to be more intelligent and automatic, a remote sensing image culture pond detection method based on semantic segmentation is provided.
Disclosure of Invention
The invention aims to provide a remote sensing image culture pond detection method based on semantic segmentation which can effectively and quickly identify culture pond information in remote sensing images, avoids the heavy workload and the time and labor consumption caused by manual intervention in traditional methods, and provides technical support and a basis for culture pond supervision and the protection of the coastal-zone ecological environment.
The invention provides a semantic segmentation-based remote sensing image culture pond detection method, which is realized by adopting the following technical scheme:
the method comprises the following steps that firstly, a sub-meter-level remote sensing image to be extracted is cut and processed in a blocking mode, remote sensing image blocks are obtained after cutting, and index information of each image block is stored;
step two, when information extraction is carried out on the remote sensing image block, a culture pond detection semantic segmentation model needs to be trained in advance, the sub-meter remote sensing image is used as training basic data, a visual interpretation mode is adopted, culture pond information in the remote sensing image is obtained, a training sample is made, data are preprocessed and input into a designed semantic segmentation model for training, and after cost errors are converged, the culture pond semantic segmentation model is obtained;
extracting culture pond information of the remote sensing image block, preprocessing the culture pond information, and inputting the preprocessed culture pond information into a trained semantic segmentation model to obtain a binary mask map of the culture pond of the remote sensing image block;
combining the stored basic index information of the remote sensing image blocks and the real parameter size of the merging area, merging the extracted binary mask images of the remote sensing image blocks to obtain the mask raster image of the culture pond with the geographic information.
Further, the remote sensing image in the first step is cut into blocks, and the specific cutting standard is determined by the preset block size and the overlap area size.
further, the remote sensing image cutting and blocking in the first step comprises the following steps:
a) setting the cutting parameters: the pixel size of each cut image block is set to 1024 × 1024, and the length of the overlap area, denoted o, is set to 144 pixels;
b) the original sub-meter-level remote sensing image is cut block by block in row order, and the stored index information is (x, y, w, h), where x is the column number of the remote sensing image block in the original sub-meter-level image, y is the row number, and w and h are the width and height of the block; because a GPU is used to train the model and the GPU video memory must be taken into account, w and h are uniformly set to 1024 (a sketch of this cutting step is given below).
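The following Python sketch illustrates this cutting-and-blocking step under the parameters stated above (1024 × 1024 blocks, 144-pixel overlap, index information (x, y, w, h)). The function and variable names are illustrative and not taken from the patent, and padding border blocks to the full block size is an assumption.

```python
import numpy as np

BLOCK = 1024            # block size in pixels
OVERLAP = 144           # overlap length o in pixels
STRIDE = BLOCK - OVERLAP

def cut_into_blocks(image):
    """image: H x W x 3 array; returns a list of (block, (x, y, w, h)) with x, y the column/row index."""
    h, w = image.shape[:2]
    blocks = []
    for row, y0 in enumerate(range(0, max(h - OVERLAP, 1), STRIDE)):
        for col, x0 in enumerate(range(0, max(w - OVERLAP, 1), STRIDE)):
            tile = image[y0:y0 + BLOCK, x0:x0 + BLOCK]
            if tile.shape[0] < BLOCK or tile.shape[1] < BLOCK:
                # border tiles are padded so that every block is exactly 1024 x 1024 (assumption)
                pad = np.zeros((BLOCK, BLOCK, image.shape[2]), dtype=image.dtype)
                pad[:tile.shape[0], :tile.shape[1]] = tile
                tile = pad
            blocks.append((tile, (col, row, BLOCK, BLOCK)))
    return blocks
```

The returned index tuples are the information later used to stitch the predicted masks back together in the fourth step.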
Further, the training of the semantic segmentation model of the culture pond in the second step comprises the following steps:
a) producing training samples from sub-meter satellite images, extracting the culture pond information in the images by visual interpretation with professional remote sensing software;
b) processing the extracted culture pond information data and the original sub-meter satellite image to generate a training sample and a test sample containing culture pond mask information;
c) preprocessing the training data input into the model with random mirroring, random rotation, random blur, random affine transformation and random color-stretching transformation;
d) setting the training parameters, and taking the set-similarity measure Dice coefficient together with the binary cross entropy BCE as the error cost function, which is specifically:
DiceLoss = 1 - (2·Σ_n p_n·r_n + s) / (Σ_n p_n + Σ_n r_n + s)   (1)
BCELoss = -(1/N)·Σ_n [ r_n·log(p_n) + (1 - r_n)·log(1 - p_n) ]   (2)
Loss=DiceLoss+BCELoss (3)
where p_n is the predicted value of the nth pixel point, r_n is the actual label value of the nth pixel point, s is a smoothing coefficient, and N is the number of pixel points; the final error cost function, shown in (3), is the sum of the Dice and BCE error costs (a sketch of this combined loss is given below);
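A minimal PyTorch sketch of the error cost function in equations (1)-(3), assuming the network outputs per-pixel probabilities in [0, 1]; the class name and the value of the smoothing coefficient are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class DiceBCELoss(nn.Module):
    """Loss = DiceLoss + BCELoss, following equations (1)-(3); a sketch, not the patented implementation."""
    def __init__(self, smooth=1e-8):
        super().__init__()
        self.smooth = smooth
        self.bce = nn.BCELoss()

    def forward(self, pred, target):
        # pred: per-pixel probabilities p_n in [0, 1]; target: per-pixel labels r_n in {0, 1}
        p, r = pred.reshape(-1), target.reshape(-1).float()
        dice = (2.0 * (p * r).sum() + self.smooth) / (p.sum() + r.sum() + self.smooth)
        return (1.0 - dice) + self.bce(p, r)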
training the model by using an Adam optimizer, wherein a specific training strategy is shown as a formula (4):
m_t = β1·m_{t-1} + (1 - β1)·g_t;  v_t = β2·v_{t-1} + (1 - β2)·g_t²;
m̂_t = m_t / (1 - β1^t);  v̂_t = v_t / (1 - β2^t);
W_{t+1} = W_t - η·m̂_t / (sqrt(v̂_t) + ε)   (4)
where m_t and v_t are the first- and second-order momentum terms respectively, β1 and β2 are the exponential decay rates, usually taken as 0.9 and 0.999 respectively; m̂_t and v̂_t are the corresponding bias-corrected values; W_t represents the parameters at time t, i.e. of the t-th iteration of the model; g_t = ∇J(W_t) represents the gradient of the cost function with respect to W at the t-th iteration; η is the learning rate; and ε is a very small number (generally 1e-8) used to prevent the denominator from being 0 (an illustrative update-step sketch is given after this paragraph);
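In practice the standard torch.optim.Adam optimizer would simply be used; the following NumPy sketch only spells out the update rule of formula (4), with β1 = 0.9, β2 = 0.999 and ε = 1e-8, and an assumed learning rate η.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of parameters w for gradient g_t at iteration t, following formula (4)."""
    m = beta1 * m + (1 - beta1) * grad           # first-order momentum m_t
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-order momentum v_t
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # parameter update W_{t+1}
    return w, m, v
```

The equivalent PyTorch call is torch.optim.Adam(model.parameters(), lr=..., betas=(0.9, 0.999), eps=1e-8).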
e) parallel training is carried out with 4 Titan Xp GPUs until the model converges, and the model is tested on the test set, where the test IoU is 0.75.
Further, the extraction of culture pond information from the remote sensing image block in the third step comprises the following steps:
a) the model structure is divided into two parts, encoding and decoding; after the input remote sensing block data is standardized, it first passes through the encoding part of the model, i.e. the feature extraction stage; the encoding layer adopts a ResNet34 network and is formed by connecting FirstConv-BN-Relu, FirstMaxPool, Encoder1, Encoder2, Encoder3 and Encoder4 in sequence, where Conv denotes convolution, BN denotes batch normalization and Relu is the activation function, with Relu(x) = max(0, x); the input is an R, G, B three-channel image of size 1024 × 1024, i.e. three-dimensional data; the convolution filter of FirstConv-BN-Relu is 7 × 7 with convolution stride 2 and contains batch normalization BN and the activation function Relu; each Encoder is a residual block of the ResNet34 network at a different scale, the output of each Encoder part is retained as a bottom-layer feature, and each successive output is down-sampled by a factor of 2;
b) the output of the encoding layer is connected to a dilated convolution module, DilationBlock, which is formed by stacking dilated (atrous) convolutions with different dilation rates so as to enlarge the receptive field of the model; the module is a sequence of dilated convolution layers with dilation rates 1, 2, 4 and 8, and its final output is the superposition of the outputs of the dilated convolutions at the different rates;
c) the output of the dilated convolution module serves as the input of the decoding layer, which is formed by connecting DecoderBlock4, DecoderBlock3, DecoderBlock2, DecoderBlock1, Final-Deconv-Relu-1, Final-Conv-Relu-2 and Final-Conv-Sigmoid-3 in sequence; the output of the dilated convolution module is connected to DecoderBlock4, and the bottom-layer features saved in the encoding layer are connected to the decoding layer after being processed by LinkBlock connections: Encoder3 is connected to LinkBlock3, Encoder2 to LinkBlock2 and Encoder1 to LinkBlock1; the output of DecoderBlock4 is combined with the output of LinkBlock3 as the input of DecoderBlock3, the output of DecoderBlock3 is combined with the output of LinkBlock2 as the input of DecoderBlock2, and the output of DecoderBlock2 is combined with the output of LinkBlock1 as the input of DecoderBlock1; each DecoderBlock contains convolution, deconvolution and activation functions, and the final layers contain deconvolution, convolution and the Sigmoid activation function, where
Sigmoid(x) = 1 / (1 + e^(-x));
d) a predicted value for each pixel point is obtained through the decoding layer, and the remote sensing image culture pond binary mask image is generated from these predicted values, where pixels in culture pond areas have the value 1 and all other pixels have the value 0 (a compressed sketch of this encoder, dilation and decoder structure is given below).
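As a concrete illustration of steps a) to d), the following PyTorch sketch assembles a ResNet34 encoding layer, a DilationBlock with dilation rates 1, 2, 4 and 8 whose stage outputs are summed, LinkBlock skip connections and DecoderBlocks into a single network. The channel widths, the LinkBlock internals (a 1 × 1 projection here), the use of element-wise addition to combine decoder and LinkBlock outputs, and the final-layer channel counts are assumptions made for illustration; the class name PondSegNet is hypothetical.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34


class DilationBlock(nn.Module):
    """Dilated convolutions with rates 1, 2, 4, 8 applied in sequence; their outputs are summed."""
    def __init__(self, ch):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=d, dilation=d), nn.ReLU(inplace=True))
            for d in (1, 2, 4, 8)])

    def forward(self, x):
        outs, h = [], x
        for stage in self.stages:
            h = stage(h)          # successive dilated convolutions
            outs.append(h)
        return sum(outs)          # superposition of the outputs at all rates


class LinkBlock(nn.Module):
    """Projection of encoder (bottom-layer) features before merging into the decoder (assumed 1x1 conv)."""
    def __init__(self, ch):
        super().__init__()
        self.proj = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.proj(x)


class DecoderBlock(nn.Module):
    """Conv -> transposed conv (2x upsampling) -> conv, each followed by BN and Relu."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        mid = in_ch // 4
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(mid, mid, 4, stride=2, padding=1),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, out_ch, 1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)


class PondSegNet(nn.Module):
    """Encoder (ResNet34) -> DilationBlock -> decoder with LinkBlock skip connections."""
    def __init__(self):
        super().__init__()
        r = resnet34(weights=None)  # torchvision >= 0.13; pretrained weights optional
        self.first = nn.Sequential(r.conv1, r.bn1, r.relu)  # FirstConv-BN-Relu: 7x7 conv, stride 2
        self.pool = r.maxpool                               # FirstMaxPool
        self.enc1, self.enc2, self.enc3, self.enc4 = r.layer1, r.layer2, r.layer3, r.layer4
        self.dilation = DilationBlock(512)
        self.link3, self.link2, self.link1 = LinkBlock(256), LinkBlock(128), LinkBlock(64)
        self.dec4 = DecoderBlock(512, 256)
        self.dec3 = DecoderBlock(256, 128)
        self.dec2 = DecoderBlock(128, 64)
        self.dec1 = DecoderBlock(64, 64)
        self.final = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),  # Final-Deconv-Relu-1
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),                     # Final-Conv-Relu-2
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid())                               # Final-Conv-Sigmoid-3

    def forward(self, x):                       # x: standardized (B, 3, 1024, 1024) blocks
        e0 = self.first(x)                      # 1/2 resolution
        e1 = self.enc1(self.pool(e0))           # 1/4
        e2 = self.enc2(e1)                      # 1/8
        e3 = self.enc3(e2)                      # 1/16
        e4 = self.enc4(e3)                      # 1/32
        d4 = self.dec4(self.dilation(e4))       # back to 1/16
        d3 = self.dec3(d4 + self.link3(e3))     # combine with LinkBlock3 output -> 1/8
        d2 = self.dec2(d3 + self.link2(e2))     # combine with LinkBlock2 output -> 1/4
        d1 = self.dec1(d2 + self.link1(e1))     # combine with LinkBlock1 output -> 1/2
        return self.final(d1)                   # (B, 1, 1024, 1024) per-pixel probabilities
```

For example, PondSegNet()(torch.randn(1, 3, 1024, 1024)) returns a (1, 1, 1024, 1024) map of per-pixel culture pond probabilities, which is thresholded to obtain the binary mask of step d).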
Further, the merging of the culture pond binary mask maps in the fourth step is specifically:
and the two-value mask map of the culture pond obtained by the semantic segmentation model does not contain geographic information, each generated culture pond block corresponds to the stored index information of the remote sensing image block, the two-value mask maps of the culture pond are merged through coordinate mapping, the geographic information is added, and a grid map of the culture pond corresponding to the original input remote sensing image is generated.
Compared with the prior art, the invention has the following beneficial effects:
compared with the traditional method, the method for detecting the remote sensing image culture pond based on semantic segmentation improves the universality of the model, and can effectively extract culture pond information in the remote sensing image with relative blur and color deviation;
the network of the invention is an end-to-end structure, namely, a remote sensing image is input, a mask image containing culture pond information is directly output, no human intervention is needed, and the degree of automation is higher.
Brief description of the drawings
FIG. 1 is an overall flow chart of the remote sensing image culture pond detection method.
FIG. 2 is a schematic diagram of a network structure of the remote sensing image culture pond detection method of the invention.
FIG. 3 is a schematic diagram of a coding layer structure in the remote sensing image culture pond extraction model.
FIG. 4 is a schematic structural diagram of the dilated convolution module DilationBlock in the remote sensing image culture pond extraction model.
FIG. 5 is a schematic structural diagram of a DecoderBlock module of a decoding layer in the remote sensing image culture pond extraction model.
FIG. 6 is a schematic structural diagram of the LinkBlock module in the remote sensing image culture pond extraction model.
FIG. 7 is a schematic diagram showing changes in training error cost values of the remote sensing image culture pond extraction model.
FIG. 8 is a Jilin-1 sub-meter-level remote sensing image used in the embodiment of the present invention.
FIG. 9 shows the overall culture pond extraction result for the Jilin-1 sub-meter-level remote sensing image.
Detailed Description
The technical scheme of the invention is explained in detail below with reference to the attached drawings:
FIG. 1 is an overall flow chart of the remote sensing image culture pond detection method, which comprises the following specific steps:
the first step, the sub-meter remote sensing grid image size is bigger under general conditions, needs to be cut and blocked for prediction:
1.1, the cutting parameters are set first: the pixel size of each cut block is set to 1024 × 1024 and the overlap length o is set to 144; the overlap area means that adjacent cut blocks share a common region, and because the prediction result of each remote sensing block may show discontinuities at the seams during final stitching, an overlap area is set and the intersection of the predictions within the overlap is taken to eliminate this effect;
1.2, cutting the original sub-meter-level remote sensing image in blocks according to the line sequence, and storing index information.
The second step is the model training part, which actually needs to be carried out in advance; the specific training process is as follows:
2.1, training samples are produced from Jilin-1 sub-meter-level remote sensing images, and the culture pond information in the images is extracted by visual interpretation with professional remote sensing software;
2.2, processing the extracted culture pond information data and the original sub-meter-level remote sensing image to generate 2000 training samples containing culture pond mask information and 300 test samples;
2.3, the training data input into the model are preprocessed with random mirroring, random rotation, random blur, random affine transformation and random color-stretching transformation; preprocessing the input data expands the data set, increases sample diversity and improves the robustness of the model to a certain extent (a sketch of this preprocessing is given below);
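A minimal sketch of the preprocessing in 2.3, using torchvision functional transforms: geometric transforms are applied with the same random parameters to the image and its mask, while blur and color stretching are applied to the image only. The parameter ranges and probabilities are assumptions, since the patent does not specify them.

```python
import random
import torchvision.transforms.functional as TF
from torchvision import transforms

def augment(image, mask):
    """Jointly augment a PIL image and its PIL mask (parameter ranges are assumed)."""
    if random.random() < 0.5:                                  # random mirror
        image, mask = TF.hflip(image), TF.hflip(mask)
    angle = random.uniform(-15, 15)                            # random rotation
    image, mask = TF.rotate(image, angle), TF.rotate(mask, angle)
    if random.random() < 0.3:                                  # random blur (image only)
        image = TF.gaussian_blur(image, kernel_size=5)
    # random affine: the same randomly drawn parameters are used for image and mask
    params = transforms.RandomAffine.get_params(
        (0.0, 0.0), (0.1, 0.1), (0.9, 1.1), None, list(image.size))
    image, mask = TF.affine(image, *params), TF.affine(mask, *params)
    # random color stretching (image only)
    image = transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2)(image)
    return image, mask
```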
and 2.4, setting training parameters, adopting a set similarity measurement function Dice coefficient and a binary cross entropy BCE as error cost functions, and improving the performance of the model by using different loss functions.
The model is trained with the Adam optimizer, which is simple to implement, computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, adjusts the learning rate automatically, suits large-scale data and parameter settings, and performs well.
FIG. 7 shows how the error cost value decreases with the number of iterations; the graph shows that the model converges quickly, and training is stopped once the training error cost value only fluctuates around a stable level.
2.5, parallel training is performed with 4 Titan Xp GPUs, and parallel computation accelerates model convergence during training; during testing and in practical applications, multi-GPU parallel computing also greatly improves the detection efficiency of the model. The concrete framework of the model is shown in FIG. 2. Training continues until the model converges, and the model is then tested on the test set, where the test IoU is 0.75 (a sketch of the parallel training loop is given below).
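A minimal sketch of the 4-GPU parallel training loop of 2.5, using torch.nn.DataParallel; the batch construction, number of epochs and learning rate are assumptions, and the model and criterion arguments are expected to be instances such as the PondSegNet and DiceBCELoss sketched earlier.

```python
import torch
import torch.nn as nn

def train_parallel(model, criterion, train_loader, num_epochs=100, lr=1e-4):
    """Train the segmentation model on 4 GPUs with DataParallel (illustrative hyperparameters)."""
    model = nn.DataParallel(model, device_ids=[0, 1, 2, 3]).cuda()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999), eps=1e-8)
    for _ in range(num_epochs):                   # in the patent, training runs until the cost converges
        for images, masks in train_loader:        # loader yields standardized 1024 x 1024 tiles and labels
            images, masks = images.cuda(), masks.cuda()
            loss = criterion(model(images), masks)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```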
The third step: the cut remote sensing image blocks are input into the trained model to obtain the culture pond binary mask information:
3.1, after the input remote sensing block data is standardized, it first enters the model encoding stage, i.e. the feature extraction stage; the encoding layer adopts a ResNet34 network and is formed by connecting FirstConv-BN-Relu, FirstMaxPool, Encoder1, Encoder2, Encoder3 and Encoder4 in sequence, as shown in FIG. 3; the output of each Encoder part is retained as a bottom-layer feature, and each successive output is down-sampled by a factor of 2;
3.2, the output of the encoding layer is connected to the dilated convolution module, which is formed by stacking dilated convolutions with different dilation rates, as shown in FIG. 4, and enlarges the receptive field of the model. The module is a sequence of dilated convolution layers with dilation rates 1, 2, 4 and 8, and its final output is the superposition of the outputs of the dilated convolutions at the different rates.
3.3, the output of the dilated convolution module is used as the input of the decoding layer; dilated convolution enlarges the receptive field of the convolution, and using dilated convolutions with different dilation rates performs the operation at different scales and enriches the contextual semantic information of the features.
The decoding layer is formed by connecting DecoderBlock4, DecoderBlock3, DecoderBlock2, DecoderBlock1, Final-Deconv-Relu-1, Final-Conv-Relu-2 and Final-Conv-Sigmoid-3 in sequence. The output of the dilated convolution module is connected to DecoderBlock4, and the bottom-layer features saved in the encoding layer are connected to the decoding layer after being processed by LinkBlock connections (the DecoderBlock module is shown in FIG. 5 and the LinkBlock module in FIG. 6): Encoder3 is connected to LinkBlock3, Encoder2 to LinkBlock2 and Encoder1 to LinkBlock1; the output of DecoderBlock4 is combined with the output of LinkBlock3 as the input of DecoderBlock3, the output of DecoderBlock3 is combined with the output of LinkBlock2 as the input of DecoderBlock2, and the output of DecoderBlock2 is combined with the output of LinkBlock1 as the input of DecoderBlock1.
In the fourth step, the culture pond binary mask images are merged: a number of culture pond binary mask blocks are obtained from the semantic segmentation model, each generated culture pond block corresponds to the stored index information of its remote sensing image block, the binary mask images are merged through coordinate mapping, geographic information is added, and a culture pond raster image corresponding to the original input remote sensing image is generated; the input sub-meter-level remote sensing image and the corresponding culture pond binary mask image are shown in FIG. 8 and FIG. 9, respectively.
The above embodiments are only for illustrating the invention and are not to be construed as limiting the invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention, therefore, all equivalent technical solutions also belong to the scope of the invention, and the scope of the invention is defined by the claims.

Claims (6)

1. A remote sensing image culture pond detection method based on semantic segmentation is characterized by comprising the following steps:
the method comprises the following steps that firstly, a sub-meter-level remote sensing image to be extracted is cut and processed in a blocking mode, remote sensing image blocks are obtained after cutting, and index information of each image block is stored;
secondly, when information extraction is carried out on the remote sensing image block, a model structure needs to be designed, the model is trained, the sub-meter remote sensing image is used as training basic data, a visual interpretation mode is adopted, culture pond information in the remote sensing image is obtained, a training sample is made, data are preprocessed and input into a designed semantic segmentation model for training, and after cost errors are converged, the culture pond semantic segmentation model is obtained;
extracting culture pond information of the remote sensing image block, preprocessing the culture pond information, and inputting the preprocessed culture pond information into a trained semantic segmentation model to obtain a binary mask image of the culture pond of the remote sensing image block, wherein the method comprises the following steps of:
a) the model structure is divided into two major parts, encoding and decoding; after the input remote sensing block data is standardized, it first passes through the encoding part of the model, i.e. the feature extraction stage; the encoding layer adopts a ResNet34 network and is formed by connecting FirstConv-BN-Relu, FirstMaxPool, Encoder1, Encoder2, Encoder3 and Encoder4 in sequence, wherein the convolution filter of FirstConv-BN-Relu is 7 × 7 with convolution stride 2 and contains batch normalization BN and the activation function Relu; each Encoder is a residual block of the ResNet34 network at a different scale, the output of each Encoder part is retained as a bottom-layer feature, and each successive output is down-sampled by a factor of 2;
b) the output of the encoding layer is connected to a dilated convolution module, DilationBlock, which is formed by stacking dilated convolutions with different dilation rates so as to enlarge the receptive field of the model; the module is a sequence of dilated convolution layers with dilation rates 1, 2, 4 and 8, and its final output is the superposition of the outputs of the dilated convolutions at the different rates;
c) the output of the dilated convolution module is used as the input of the decoding layer, which is formed by connecting DecoderBlock4, DecoderBlock3, DecoderBlock2, DecoderBlock1, Final-Deconv-Relu-1, Final-Conv-Relu-2 and Final-Conv-Sigmoid-3 in sequence; the output of the dilated convolution module is connected to DecoderBlock4, and the bottom-layer features saved in the encoding layer are connected to the decoding layer after being processed by LinkBlock connections: Encoder3 is connected to LinkBlock3, Encoder2 to LinkBlock2 and Encoder1 to LinkBlock1; the output of DecoderBlock4 is combined with the output of LinkBlock3 as the input of DecoderBlock3, the output of DecoderBlock3 is combined with the output of LinkBlock2 as the input of DecoderBlock2, and the output of DecoderBlock2 is combined with the output of LinkBlock1 as the input of DecoderBlock1; each DecoderBlock contains convolution, deconvolution and activation functions, and the final layers contain deconvolution, convolution and the Sigmoid activation function, where
Sigmoid(x) = 1 / (1 + e^(-x)), and
x represents the input characteristic value of a neuron in the forward transmission process of the convolutional neural network;
d) obtaining a predicted value of each pixel point through a decoding layer, and generating a remote sensing image culture pond binary mask image according to the predicted value, wherein the pixel value of a culture pond area in the image is 1, and the pixel values of other areas are 0;
combining the stored basic index information of the remote sensing image blocks and the real parameter size of the merging area, merging the extracted binary mask images of the remote sensing image blocks to obtain the mask raster image of the culture pond with the geographic information.
2. The method for detecting the remote sensing image culture pond based on the semantic segmentation as claimed in claim 1, wherein the remote sensing image in the first step is cut into blocks, and the specific cutting standard is determined by a preset block size and an overlapping area size.
3. The method for detecting the remote sensing image culture pond based on the semantic segmentation as claimed in claim 1 or 2, wherein the remote sensing image cutting and blocking in the first step comprises the following steps:
a) setting the cutting parameters: the pixel size of each cut image block is set to 1024 × 1024, and the length of the overlap area, denoted o, is set to 144 pixels;
b) the original sub-meter-level remote sensing image is cut block by block in row order, and the stored index information is (x, y, w, h), where x is the column number of the remote sensing image block in the original sub-meter-level image, y is the row number, and w and h are the width and height of the block, uniformly set to 1024.
4. The remote-sensing image culture pond detection method based on semantic segmentation as claimed in claim 3, wherein the culture pond semantic segmentation model training in the second step comprises the following steps:
a) producing training samples from sub-meter satellite images, extracting the culture pond information in the images by visual interpretation with professional remote sensing software;
b) processing the extracted culture pond information data and the original sub-meter satellite image to generate a training sample and a test sample containing culture pond mask information;
c) preprocessing the training data input into the model with combinations of the 5 transformations: random mirroring, random rotation, random blur, random affine transformation and random color-stretching transformation;
d) setting the training parameters, and taking the set-similarity measure Dice coefficient together with the binary cross entropy BCE as the error cost function, which is specifically:
DiceLoss = 1 - (2·Σ_n p_n·r_n + s) / (Σ_n p_n + Σ_n r_n + s)   (1)
BCELoss = -(1/N)·Σ_n [ r_n·log(p_n) + (1 - r_n)·log(1 - p_n) ]   (2)
Loss=DiceLoss+BCELoss (3)
where p_n is the predicted value of the nth pixel point, r_n is the actual label value of the nth pixel point, the smoothing coefficient s is set to 1e-8, and N is the number of pixel points; the final error cost function, shown in (3), is the sum of the Dice and BCE error costs, and w_n represents the weight of the nth iteration batch;
training the model by using an Adam optimizer, wherein a specific training strategy is shown as a formula (4):
m_t = β1·m_{t-1} + (1 - β1)·g_t;  v_t = β2·v_{t-1} + (1 - β2)·g_t²;
m̂_t = m_t / (1 - β1^t);  v̂_t = v_t / (1 - β2^t);
W_{t+1} = W_t - η·m̂_t / (sqrt(v̂_t) + ε)   (4)
wherein m_t and v_t are the first- and second-order momentum terms respectively, β1 and β2 are the exponential decay rates, 0.9 and 0.999 respectively; m̂_t and v̂_t are the corresponding bias-corrected values; W_t represents the parameters at time t, i.e. of the t-th iteration of the model; g_t = ∇J(W_t) represents the gradient of the cost function with respect to W at the t-th iteration, J denoting the cost function, i.e. the Loss value above; η is the learning rate; and, in order to prevent the denominator from being 0, the smoothing coefficient ε is set to 1e-8;
e) performing parallel training with 4 Titan Xp GPUs until the model converges, and performing the model test on the test set, where the test IoU is 0.75; IoU refers to the ratio of the intersection to the union of the real culture pond area and the culture pond area predicted by the model, and represents the performance index of the semantic segmentation model.
5. The remote sensing image culture pond detection method based on semantic segmentation as claimed in claim 1 or 4, wherein the merging of the culture pond binary mask maps in step four is specifically:
merging the culture pond binary mask maps through coordinate mapping, adding geographic information, and generating a culture pond raster image corresponding to the original input remote sensing image.
6. The remote sensing image culture pond detection method based on semantic segmentation as claimed in claim 4, characterized in that the generated training samples are 2000 and the test samples are 300.
CN201910333358.9A 2019-04-24 2019-04-24 Remote sensing image culture pond detection method based on semantic segmentation Active CN110059758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910333358.9A CN110059758B (en) 2019-04-24 2019-04-24 Remote sensing image culture pond detection method based on semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910333358.9A CN110059758B (en) 2019-04-24 2019-04-24 Remote sensing image culture pond detection method based on semantic segmentation

Publications (2)

Publication Number Publication Date
CN110059758A CN110059758A (en) 2019-07-26
CN110059758B (en) 2020-07-10

Family

ID=67320418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910333358.9A Active CN110059758B (en) 2019-04-24 2019-04-24 Remote sensing image culture pond detection method based on semantic segmentation

Country Status (1)

Country Link
CN (1) CN110059758B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110599452B (en) * 2019-08-07 2022-02-22 全球能源互联网研究院有限公司 Rust detection system, method, computer device and readable storage medium
CN110781775B (en) * 2019-10-10 2022-06-14 武汉大学 Remote sensing image water body information accurate segmentation method supported by multi-scale features
CN110751075A (en) * 2019-10-12 2020-02-04 海南长光卫星信息技术有限公司 Remote sensing image culture pond detection method based on example segmentation
CN111080602B (en) * 2019-12-12 2020-10-09 哈尔滨市科佳通用机电股份有限公司 Method for detecting foreign matters in water leakage hole of railway wagon
CN111080650B (en) * 2019-12-12 2020-10-09 哈尔滨市科佳通用机电股份有限公司 Method for detecting looseness and loss faults of small part bearing blocking key nut of railway wagon
CN111259797A (en) * 2020-01-16 2020-06-09 南开大学 Iterative remote sensing image road extraction method based on points
CN111339237A (en) * 2020-02-28 2020-06-26 深圳前海微众银行股份有限公司 Farm risk prediction method, device, equipment and storage medium
CN113112446A (en) * 2020-03-05 2021-07-13 成都理工大学 Tunnel surrounding rock level intelligent judgment method based on residual convolutional neural network
CN111368843B (en) * 2020-03-06 2022-06-10 电子科技大学 Method for extracting lake on ice based on semantic segmentation
CN111461130B (en) * 2020-04-10 2021-02-09 视研智能科技(广州)有限公司 High-precision image semantic segmentation algorithm model and segmentation method
CN111539435A (en) * 2020-04-15 2020-08-14 创新奇智(合肥)科技有限公司 Semantic segmentation model construction method, image segmentation equipment and storage medium
CN111753647B (en) * 2020-05-11 2021-01-29 广东无限阵列科技有限公司 Automatic identification method and device for livestock breeding shed and storage medium
CN112836727B (en) * 2020-07-27 2021-08-10 盐城郅联空间科技有限公司 Image interpretation optimization method based on space-time big data mining analysis technology
CN112733745A (en) * 2021-01-14 2021-04-30 北京师范大学 Cultivated land image extraction method and system
CN112766144A (en) * 2021-01-15 2021-05-07 雾实(福建)科技有限公司 Method, device, equipment and medium for distinguishing routing inspection detail difference of island infrastructure
CN112883900B (en) * 2021-03-12 2022-03-04 中科三清科技有限公司 Method and device for bare-ground inversion of visible images of remote sensing images
CN113743383B (en) * 2021-11-05 2022-06-07 航天宏图信息技术股份有限公司 SAR image water body extraction method and device, electronic equipment and storage medium
CN113763384B (en) * 2021-11-10 2022-03-15 常州微亿智造科技有限公司 Defect detection method and defect detection device in industrial quality inspection
CN115019192B (en) * 2022-05-30 2023-06-06 杭州电子科技大学 Flood variation detection method and system based on dual-channel backbone network and joint loss function

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193877A (en) * 2017-04-24 2017-09-22 中国科学院遥感与数字地球研究所 Land cover classification system and method
US10860919B2 (en) * 2017-09-27 2020-12-08 Google Llc End to end network model for high resolution image segmentation
CN108376232A (en) * 2018-01-04 2018-08-07 北京星衡科技有限公司 A kind of method and apparatus of automatic interpretation for remote sensing image
CN108875659B (en) * 2018-06-26 2022-04-22 上海海事大学 Sea chart cultivation area identification method based on multispectral remote sensing image
CN109145730B (en) * 2018-07-13 2021-08-13 安徽大学 Automatic semantic segmentation method for mining area in remote sensing image

Also Published As

Publication number Publication date
CN110059758A (en) 2019-07-26

Similar Documents

Publication Publication Date Title
CN110059758B (en) Remote sensing image culture pond detection method based on semantic segmentation
CN110751075A (en) Remote sensing image culture pond detection method based on example segmentation
CN111986099B (en) Tillage monitoring method and system based on convolutional neural network with residual error correction fused
CN111259898B (en) Crop segmentation method based on unmanned aerial vehicle aerial image
CN112561876B (en) Image-based water quality detection method and system for ponds and reservoirs
CN110414509B (en) Port docking ship detection method based on sea-land segmentation and characteristic pyramid network
CN111178304B (en) High-resolution remote sensing image pixel level interpretation method based on full convolution neural network
CN114898212B (en) Method for extracting multi-feature change information of high-resolution remote sensing image
CN114022408A (en) Remote sensing image cloud detection method based on multi-scale convolution neural network
US20230306730A1 (en) Information extraction method of offshore raft culture based on multi-temporal optical remote sensing images
CN112990085A (en) Method and device for detecting change of culture pond and computer readable storage medium
CN112001293A (en) Remote sensing image ground object classification method combining multi-scale information and coding and decoding network
CN116935043A (en) Typical object remote sensing image generation method based on multitasking countermeasure network
CN115359029A (en) Semi-supervised medical image segmentation method based on heterogeneous cross pseudo-supervised network
CN115810149A (en) High-resolution remote sensing image building extraction method based on superpixel and image convolution
CN113469097B (en) Multi-camera real-time detection method for water surface floaters based on SSD network
CN114387446A (en) Automatic water body extraction method for high-resolution remote sensing image
CN111862134B (en) Offshore aquaculture pond extraction method based on Sentinel-1 and GEE
CN117058367A (en) Semantic segmentation method and device for high-resolution remote sensing image building
CN116543165A (en) Remote sensing image fruit tree segmentation method based on dual-channel composite depth network
CN114494910B (en) Multi-category identification and classification method for facility agricultural land based on remote sensing image
CN116012707A (en) Remote sensing identification method for fish light complementary photovoltaic cultivation mode
CN115578645A (en) SAR image buoyant raft culture information extraction method for generating confrontation network through semi-supervised cyclic consistency
CN114998587A (en) Remote sensing image building semantic segmentation method and system
Qin et al. U_EFF_NET: High-precision segmentation of offshore farms from high-resolution SAR remote sensing images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant