CN112800851B - Water body contour automatic extraction method and system based on full convolution neuron network - Google Patents


Publication number: CN112800851B (application CN202011633088.2A)
Authority: CN (China)
Prior art keywords: convolution, layer, water body, kernel size
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202011633088.2A
Other languages: Chinese (zh)
Other versions: CN112800851A (en)
Inventors: 余华芬, 季顺平, 顾春墚, 聂晨晖, 张志力
Current and original assignee: Zhejiang Institute Of Surveying And Mapping Science And Technology (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Zhejiang Institute Of Surveying And Mapping Science And Technology
Priority to CN202011633088.2A (the priority date is an assumption and is not a legal conclusion)
Publication of CN112800851A; application granted; publication of CN112800851B
Legal status: Active; anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/10: Terrestrial scenes
    • G06V 20/13: Satellite images

Abstract

The invention relates to a method and a system for automatically extracting water body contours based on a fully convolutional neural network. The method comprises the following steps: respectively constructing a training sample library and an image prediction library; performing iterative training on the training sample library through a multi-receptive-field feature joint full convolution network to obtain a network model; and extracting water body contours from the image prediction library using the network model to obtain a surface water body contour extraction result. The training sample library is constructed from surface images and water body labeling data, and the image prediction library is constructed from surface images. The multi-receptive-field feature joint fully convolutional network has strong scale robustness, can adapt to water body extraction from high-resolution remote sensing images under different complex situations and at different scales, and can be continuously and iteratively optimized.

Description

Water body contour automatic extraction method and system based on a fully convolutional neural network
Technical Field
The invention relates to a coordinate fitting method, and in particular to a method and a system for automatically extracting water body contours based on a fully convolutional neural network.
Background
Water body extraction is of great significance for applications such as water resource monitoring, natural disaster assessment, and environmental protection, and acquiring surface information by remote sensing technology is the most common technical means. Traditional methods mainly include band thresholding, supervised classification, water body indices, and inter-spectral relation methods. The main work of these methods is to empirically design a suitable feature, based on the spectral characteristics of each band of the remote sensing image, for water body identification; they pay little attention to features of the water body such as shape, size, texture, edge, semantics, and shadow, which seriously limits extraction accuracy. In addition, for processing massive remote sensing data, traditional methods generally suffer from a low degree of automation, poor efficiency, and low accuracy.
Therefore, improving the accuracy and automation of surface water extraction is important. Convolutional neural networks in deep learning show extremely strong performance in image classification, image retrieval, object detection, and semantic segmentation, which is largely attributable to their powerful feature representation capability. A convolutional neural network can hierarchically abstract image features using local operations and automatically learn a multi-layer feature representation; this ability to learn features automatically exceeds that of traditional methods that design features empirically.
Extraction of water bodies from remote sensing images must pay particular attention to the contour of the water body; the interior of the water body is a secondary and simpler problem. The contour information of a water body involves various kinds of semantic information, such as the boundaries between ridges and paddy fields, between river banks and river water, between channels and the ground, and between shadows and weeds, which are the main difficulties in water body extraction. When making various topographic and thematic maps, the plotter has to manually trace the contours of water bodies on satellite or aerial images, which is obviously a tedious and inefficient task. Therefore, research on efficient, automatic extraction of water bodies from remote sensing images is very important.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method and a system for automatically extracting water body contours based on a fully convolutional neural network.
To achieve this purpose, the invention adopts the following technical scheme: the automatic water body contour extraction method based on the fully convolutional neural network comprises the following steps:
respectively constructing a training sample library and an image prediction library;
performing iterative training on the training sample library through a multi-receptive-field feature joint full convolution network to obtain a network model;
and extracting water body contours from the image prediction library using the network model to obtain a surface water body contour extraction result.
The training sample library is constructed from surface images and water body labeling data, and the image prediction library is constructed from surface images.
Preferably, constructing the training sample library includes:
acquiring surface images and water body labeling data;
preprocessing the water body labeling data to obtain surface image and labeling raster pairs;
and slicing the surface image and labeling raster pairs to obtain the training samples.
Preferably, the preprocessing comprises:
rasterizing the water body labeling data to obtain labeling raster data;
and resampling and cutting the surface images and the water body labeling raster data to obtain the surface image and labeling raster pairs.
Preferably, the multi-receptive-field feature joint full convolution network comprises an encoding part, a decoding part, and an output part, wherein
the encoding part consists of 5 multi-receptive-field feature combination modules and 4 max pooling layers;
the decoding part consists of 4 different semantic feature fusion modules and 4 upsampling layers;
and the output part consists of 5 output layers and 1 feature multi-scale prediction fusion module.
Preferably, the 1st and 2nd multi-receptive-field feature combination modules of the encoding part are each preceded by a convolution layer with a kernel size of 3×3 and a step size of 1, a batch normalization layer, and a rectified linear unit;
the 3rd, 4th, and 5th multi-receptive-field feature combination modules of the encoding part are each preceded and followed by a convolution layer with a kernel size of 3×3 and a step size of 1, a batch normalization layer, and a rectified linear unit. The max pooling layers of the encoding part have a step size of 2×2, and after each pooling layer the height and width of the output feature map become one half of those of the input feature map.
Preferably, the multi-receptive-field feature combination module is composed of three feature extractions with different receptive fields, namely a short-distance feature extraction module, a middle-distance feature extraction module, and a long-distance feature extraction module, wherein
the short-distance feature extraction module consists of a convolution layer with a kernel size of 3×3 and a step size of 1, a batch normalization layer, and a Sigmoid function;
the middle-distance feature extraction module consists of a convolution layer with a kernel size of 7×7 and a step size of 4, a batch normalization layer, and a rectified linear unit, followed by a convolution layer with a kernel size of 3×3 and a step size of 1, a batch normalization layer, a rectified linear unit, and a 4× bilinear upsampling layer;
the long-distance feature extraction module consists of a Global Pooling (GP) layer and two fully connected layers;
and the multi-receptive-field feature combination module ends with a convolution layer with a kernel size of 1×1 and a step size of 1, a batch normalization layer, and a rectified linear unit.
Preferably, the different semantic feature fusion module of the decoding part includes:
performing convolution with a kernel size of 1×1 and a step size of 1, batch normalization, and a rectified linear unit on the input to obtain a feature map F1;
performing global pooling on the F1 feature map to obtain a feature GP1, and then applying a convolution with a kernel size of 1×1 and a step size of 1, batch normalization, a rectified linear unit, another convolution with a kernel size of 1×1 and a step size of 1, and a Sigmoid function to obtain the global information GP2;
performing the matrix operation GP2 * F1 + F1 to obtain a feature map F2;
applying a convolution with a kernel size of 3×3 and a step size of 1, batch normalization, and a rectified linear unit to the F2 feature map;
and each convolution layer of the decoding part takes as input the concatenation of the feature map obtained from the upsampling layer and the feature map of corresponding size from the encoding part.
Preferably, each output layer of the output part consists of a convolution layer with a kernel size of 1×1 and a step size of 1 and a Sigmoid function; specifically:
the feature multi-scale prediction fusion module first performs convolution with a kernel size of 1×1 and a step size of 1, batch normalization, and a rectified linear unit on the input to obtain a feature map F'1;
F'1 is then processed by convolution kernels of sizes 3×3, 5×5, and 7×7, batch normalization, and Sigmoid functions to obtain W1, W2, and W3 respectively;
the concatenation of F'1, F'1*W1, F'1*W2, and F'1*W3 is subjected to a convolution with a kernel size of 3×3 and a step size of 1, batch normalization, a rectified linear unit, and a convolution layer with a kernel size of 1×1 and a step size of 1 to obtain a feature map F'2;
and F'2 is processed by a Sigmoid function.
The input of the feature multi-scale prediction fusion module of the output part is the concatenation of the upsampled results of the 3rd and 4th output layers and the result of the 5th output layer.
Preferably, after the surface water body contour extraction result is obtained, edge vectorization is performed using the Douglas-Peucker algorithm.
The invention also provides an automatic water body contour extraction system based on a fully convolutional neural network, comprising:
a training sample library construction unit for constructing training samples from surface images and water body labeling data;
an image prediction library construction unit for constructing an image prediction library from surface images;
a network model training unit for iteratively training on the training samples acquired by the training sample library construction unit to obtain a network model;
and a water body contour extraction unit for extracting water body contours from the image prediction library using the network model to obtain a surface water body contour extraction result.
Compared with the prior art, the invention has the beneficial effects that:
according to the method, a training sample library can be constructed according to the existing high-resolution aerial or satellite image and water body labeling data; then training a Multi-field Features combined Full convolution Network (MFU-FCN) to learn the Features of the water body in the high-resolution remote sensing image; after the network training is finished, the trained parameters and the network are utilized to predict the high-resolution remote sensing image to obtain a high-precision extraction result of the surface water coverage of the high-resolution remote sensing image, wherein the multiple receptive field characteristics are combined with the full convolution neural network, so that the robustness is strong, the method is suitable for extracting the water of the high-resolution remote sensing image under different complex conditions and different scales, and the continuous iteration optimization can be realized continuously.
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart of the construction of the training sample library and the image prediction library in this embodiment.
Fig. 2 is a schematic diagram of the multi-receptive-field feature combination module in this embodiment.
Fig. 3 is a schematic diagram of the middle-distance feature extraction module in this embodiment.
Fig. 4 is a schematic diagram of the long-distance feature extraction module in this embodiment.
Fig. 5 is a schematic diagram of the feature multi-scale prediction fusion module in this embodiment.
Fig. 6 is a schematic structural diagram of the multi-receptive-field feature joint full convolution network in this embodiment.
Fig. 7 is a schematic diagram of the framework of the automatic water body contour extraction system based on a fully convolutional neural network in this embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1 to fig. 6, in the present embodiment, a multi-receptive-field feature joint full convolution network (MFU-FCN) is used to learn water body features in high-resolution satellite or aerial remote sensing images, and pixel-level prediction of the water coverage of the remote sensing images is then performed. The method specifically comprises the following steps:
s1, respectively constructing a training sample library and an image prediction library;
Training the network model requires training samples. The training sample library is constructed from surface images and water body labeling data, and the image prediction library is constructed from surface images. The process of constructing the training sample library, shown in fig. 1, is as follows:
first, high-resolution satellite or aerial images and water body labeling vector data are prepared;
then, the data are preprocessed: the water body labeling vector data are rasterized, and the images and labeling raster data are resampled and cut to obtain image and labeling raster pairs with suitable resolution and consistent size.
Finally, the training sample library is produced with a suitable slice size (such as 512×512 or 256×256), chosen in view of GPU memory resources, characteristics of the ground objects, and other factors. In addition, the same preprocessing is applied to the images to be predicted, establishing an image prediction library for direct prediction by the subsequent model. Note that the image prediction library contains no water body labeling data.
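The slicing step above can be sketched as follows. `tile_origins` is a hypothetical helper, not named in the patent; it computes the upper-left corners of fixed-size slices covering an image, assuming edge tiles are shifted inward so that every slice stays fully inside the image (slices may therefore overlap at the borders).

```python
def tile_origins(width, height, tile=512):
    """Upper-left corners of tile x tile slices covering an image.

    Edge tiles are shifted inward so every slice lies fully inside
    the image; border slices may overlap their neighbours.
    """
    def starts(extent):
        if extent <= tile:
            return [0]
        s = list(range(0, extent - tile, tile))
        s.append(extent - tile)  # final slice flush with the border
        return s
    return [(x, y) for y in starts(height) for x in starts(width)]

# A 1280 x 1024 image sliced into 512 x 512 training samples:
origins = tile_origins(1280, 1024, tile=512)
```

Each origin pair would then index a crop of both the image and its labeling raster, keeping the pair aligned.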
S2, performing iterative training on the training sample library through the multi-receptive-field feature joint full convolution network to obtain a network model;
The multi-receptive-field feature joint full convolution network in this embodiment comprises 3 parts: encoding, decoding, and output. Specifically,
the encoding part consists of 5 Multi-receptive-Field feature Units (MFU) and 4 Max Pooling Layers;
the decoding part consists of 4 Different Semantic Feature Fusion (DSFF) modules and 4 Upsampling Layers;
the output part consists of 5 output layers and 1 feature Multi-scale Prediction Fusion (MPF) module.
The 1st and 2nd multi-receptive-field feature combination modules of the encoding part are each preceded by a Convolution layer with a kernel size of 3×3 and a step size of 1, a Batch Normalization (BN) layer, and a Rectified Linear Unit (ReLU). The 3rd, 4th, and 5th multi-receptive-field feature combination modules of the encoding part are each preceded and followed by a convolution layer with a kernel size of 3×3 and a step size of 1, a batch normalization layer, and a rectified linear unit.
In addition, the multi-receptive-field feature combination module in this embodiment is composed of three feature extractions with different receptive fields: a short-distance feature extraction module, a middle-distance feature extraction module, and a long-distance feature extraction module.
The short-distance feature extraction module consists of a convolution layer with a kernel size of 3×3 and a step size of 1, a batch normalization layer, and a Sigmoid function; the middle-distance feature extraction module consists of a convolution layer with a kernel size of 7×7 and a step size of 4, a batch normalization layer, a rectified linear unit, a convolution layer with a kernel size of 3×3 and a step size of 1, a batch normalization layer, a rectified linear unit, and a 4× bilinear upsampling layer; the long-distance feature extraction module consists of a Global Pooling (GP) layer and two fully connected layers; and the multi-receptive-field feature combination module ends with a convolution layer with a kernel size of 1×1 and a step size of 1, a batch normalization layer, and a rectified linear unit.
The max pooling layers of the encoding part have a step size of 2×2; after each pooling layer, the height and width of the output feature map become one half of those of the input.
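As a quick check of the encoder geometry described above (5 MFU stages interleaved with 4 max pooling layers, each halving height and width), the following sketch propagates a feature-map side length through the encoding part; the input size of 512 is only an example, matching the slice sizes suggested earlier.

```python
def encoder_sizes(input_size=512, num_pools=4):
    """Feature-map side lengths at each stage of the encoding part.

    Each max pooling layer halves height and width, so 5 MFU stages
    separated by 4 poolings see 5 successively halved sizes.
    """
    sizes = [input_size]
    for _ in range(num_pools):
        sizes.append(sizes[-1] // 2)
    return sizes

# A 512 x 512 slice passes through the encoder at sizes
# 512 -> 256 -> 128 -> 64 -> 32
stage_sizes = encoder_sizes(512)
```

These five scales are also what the decoding part receives through the skip concatenations described below.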
In this embodiment, the different semantic feature fusion module of the decoding part first performs convolution with a kernel size of 1×1 and a step size of 1, batch normalization, and a rectified linear unit on the input to obtain a feature map F1; global pooling is then applied to F1 to obtain a feature GP1, which passes through a convolution with a kernel size of 1×1 and a step size of 1, batch normalization, a rectified linear unit, another convolution with a kernel size of 1×1 and a step size of 1, and a Sigmoid function to obtain the global information GP2; the matrix operation GP2 * F1 + F1 is then performed to obtain a feature map F2. Finally, F2 is processed by a convolution with a kernel size of 3×3 and a step size of 1, batch normalization, and a rectified linear unit.
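The matrix operation GP2 * F1 + F1 above is an elementwise scaling of the feature map by a global weight followed by a residual addition. A minimal plain-Python sketch (single channel, with GP2 reduced to a scalar weight and illustrative values only; in the real module GP2 is a per-channel vector produced by global pooling):

```python
def dsff_fuse(f1, gp2):
    """Apply GP2 * F1 + F1: scale every element of the feature map F1
    by the global weight GP2, then add F1 back as a residual."""
    return [[gp2 * v + v for v in row] for row in f1]

f1 = [[1.0, 2.0],
      [3.0, 4.0]]
fused = dsff_fuse(f1, gp2=0.5)  # each value scaled by (gp2 + 1)
```

Because GP2 comes from a Sigmoid it lies in (0, 1), so the fusion can only amplify F1, never suppress it below its input values.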
Each convolution layer of the decoding part takes as input the concatenation of the feature map obtained from the upsampling layer and the feature map of corresponding size from the encoding part.
In this embodiment, each output layer of the output part consists of a convolution layer with a kernel size of 1×1 and a step size of 1 and a Sigmoid function; specifically:
the feature multi-scale prediction fusion module of the output part performs convolution with a kernel size of 1×1 and a step size of 1, batch normalization, and a rectified linear unit on the input to obtain a feature map F'1.
F'1 is then processed by convolution kernels of sizes 3×3, 5×5, and 7×7, batch normalization, and Sigmoid functions to obtain W1, W2, and W3 respectively. The concatenation of F'1, F'1*W1, F'1*W2, and F'1*W3 is then subjected to a convolution with a kernel size of 3×3 and a step size of 1, batch normalization, a rectified linear unit, and a convolution layer with a kernel size of 1×1 and a step size of 1 to obtain a feature map F'2.
Finally, F'2 is processed by a Sigmoid function.
The input of the feature multi-scale prediction fusion module of the output part is the concatenation of the upsampled results of the 3rd and 4th output layers and the result of the 5th output layer.
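The gating structure of the feature multi-scale prediction fusion module can be sketched schematically. Here the 3×3, 5×5, and 7×7 convolutions are replaced by placeholder scalar responses (an assumption for illustration, not the patent's code); the sketch only shows how each Sigmoid weight Wi gates a copy of F'1 before the copies are concatenated for the final fusing convolutions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mpf_sketch(f1, branch_responses):
    """Schematic of the MPF module on a single scalar feature.

    branch_responses stand in for the 3x3 / 5x5 / 7x7 convolution
    outputs; each is squashed to a weight W in (0, 1) by Sigmoid and
    gates a copy of F'1. The gated copies plus F'1 itself form the
    concatenation that the final convolutions would fuse into F'2.
    """
    weights = [sigmoid(r) for r in branch_responses]  # W1, W2, W3
    gated = [f1 * w for w in weights]                 # F'1 * Wi
    return [f1] + gated                               # concatenated input

concat = mpf_sketch(2.0, branch_responses=[0.0, 1.0, -1.0])
```

The ungated copy of F'1 in the concatenation preserves the original features even when all three weight maps are near zero.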
And S3, extracting the water body contour of the image prediction library by using the network model to obtain an extraction result of the surface water body contour.
In this embodiment, after the training sample library is produced, the network model is iteratively trained until it is optimal. After training is complete, water body extraction is performed on the image prediction library using the trained model, yielding the remote sensing image water body extraction result. After the water body extraction result is obtained, the water body edges are vectorized from the raster using the Douglas-Peucker algorithm.
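The Douglas-Peucker simplification used for edge vectorization can be sketched in plain Python. This is a generic textbook implementation, not the patent's code, and the tolerance value in the example is an assumption; in practice it would be chosen relative to the ground sampling distance of the imagery.

```python
import math

def douglas_peucker(points, tolerance):
    """Simplify a polyline: keep the point farthest from the chord
    between the endpoints if it deviates more than `tolerance`,
    recurse on both halves; otherwise keep only the endpoints."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy)

    def dist(p):
        # perpendicular distance from p to the endpoint chord
        if norm == 0.0:
            return math.hypot(p[0] - x1, p[1] - y1)
        return abs(dy * (p[0] - x1) - dx * (p[1] - y1)) / norm

    idx, dmax = max(((i, dist(p)) for i, p in enumerate(points[1:-1], 1)),
                    key=lambda t: t[1])
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:idx + 1], tolerance)
    right = douglas_peucker(points[idx:], tolerance)
    return left[:-1] + right  # drop the duplicated split point

# A traced water edge with small jitter and two genuine corners:
edge = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
simplified = douglas_peucker(edge, tolerance=1.0)
```

The jittered near-straight runs collapse to their endpoints while the sharp corner survives, which is exactly the behaviour wanted when turning a pixel-level water mask boundary into a compact vector contour.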
Referring to fig. 7, the embodiment further provides a system for automatically extracting a water body contour based on a full convolution neural network, including:
a training sample library construction unit 1 for constructing training samples from surface images and water body labeling data;
an image prediction library construction unit 2 for constructing an image prediction library from surface images;
a network model training unit 3 for iteratively training on the training samples acquired by the training sample library construction unit to obtain a network model;
and a water body contour extraction unit 4 for extracting water body contours from the image prediction library using the network model to obtain a surface water body contour extraction result.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program includes program instructions, and the computer program may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the steps of:
constructing a training sample library and an image prediction library;
performing iterative training on the training sample library through the multi-receptive-field characteristic combined full convolution network to obtain a network model;
and extracting the water body contour from the image prediction library by using the network model to obtain an extraction result of the surface water body contour.
The training sample library is constructed from surface images and water body labeling data, and the image prediction library is constructed from surface images.
The storage medium may be any of various computer-readable storage media capable of storing a computer program, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both; the components and steps of the examples have been described above in general functional terms in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be merged, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. An automatic water body contour extraction method based on a fully convolutional neural network, characterized by comprising the following steps:
respectively constructing a training sample library and an image prediction library;
performing iterative training on the training sample library through a multi-receptive-field feature joint full convolution network to obtain a network model;
extracting water body contours from the image prediction library using the network model to obtain a surface water body contour extraction result;
wherein the training sample library is constructed from surface images and water body labeling data, and the image prediction library is constructed from surface images;
the multi-receptive-field feature joint fully convolutional network comprises an encoding part, a decoding part and an output part, wherein
the encoding part consists of 5 multi-receptive-field feature combination modules and 4 max pooling layers;
the decoding part consists of 4 different-semantic feature fusion modules and 4 upsampling layers;
the output part consists of 5 output layers and 1 feature multi-scale prediction fusion module;
the 1st and 2nd multi-receptive-field feature combination modules of the encoding part are each preceded by a convolution layer with a kernel size of 3×3 and a stride of 1, a batch normalization layer and a rectified linear unit;
the 3rd, 4th and 5th multi-receptive-field feature combination modules of the encoding part are each preceded and followed by a convolution layer with a kernel size of 3×3 and a stride of 1, a batch normalization layer and a rectified linear unit; the max pooling layers of the encoding part have a stride of 2×2, so that after each pooling layer the height and width of the output feature map are one half of those of the input feature map;
the multi-receptive-field feature combination module performs three feature extractions with different receptive fields, through a short-range feature extraction module, a medium-range feature extraction module and a long-range feature extraction module, wherein
the short-range feature extraction module consists of a convolution layer with a kernel size of 3×3 and a stride of 1, a batch normalization layer and a Sigmoid function;
the medium-range feature extraction module consists of a convolution layer with a kernel size of 7×7 and a stride of 4, a batch normalization layer, a rectified linear unit, a convolution layer with a kernel size of 3×3 and a stride of 1, a batch normalization layer, a rectified linear unit and a 4× bilinear upsampling layer;
the long-range feature extraction module consists of a global pooling layer and two fully connected layers;
the multi-receptive-field feature combination module ends with a convolution layer with a kernel size of 1×1 and a stride of 1, a batch normalization layer and a rectified linear unit;
the different-semantic feature fusion module of the decoding part operates as follows:
performing, on the input, a convolution with a kernel size of 1×1 and a stride of 1, batch normalization and a rectified linear unit to obtain a feature map F1;
performing global pooling on the feature map F1 to obtain a feature GP1, and then applying a convolution with a kernel size of 1×1 and a stride of 1, batch normalization, a rectified linear unit, a convolution with a kernel size of 1×1 and a stride of 1 and a Sigmoid function to obtain the global information GP2;
performing the matrix operation F2 = GP2 × F1 + F1 to obtain the feature map F2;
performing, on the feature map F2, a convolution with a kernel size of 3×3 and a stride of 1, batch normalization and a rectified linear unit;
wherein the input of each convolution layer of the decoding part is the concatenation of the feature map obtained from an upsampling layer and the feature map of the corresponding size from the encoding part;
each output layer of the output part consists of a convolution layer with a kernel size of 1×1 and a stride of 1 and a Sigmoid function; specifically:
the feature multi-scale prediction fusion module first performs, on the input, a convolution with a kernel size of 1×1 and a stride of 1, batch normalization and a rectified linear unit to obtain a feature map F′1;
F′1 is then processed by convolutions with kernel sizes of 3×3, 5×5 and 7×7 respectively, each followed by batch normalization and a Sigmoid function, to obtain W1, W2 and W3;
the concatenation of F′1, F′1 × W1, F′1 × W2 and F′1 × W3 is processed by a convolution with a kernel size of 3×3 and a stride of 1, batch normalization, a rectified linear unit and a convolution layer with a kernel size of 1×1 and a stride of 1 to obtain a feature map F′2;
F′2 is processed by a Sigmoid function;
and the input of the feature multi-scale prediction fusion module of the output part is the concatenation of the upsampled results of the 3rd and 4th output layers and the result of the 5th output layer.
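The gating step of the decoder's different-semantic feature fusion module (F2 = GP2 × F1 + F1) can be sketched in NumPy. This is only an illustration of the data flow: the two 1×1 convolutions applied to the pooled vector are stood in for by hypothetical weight matrices `w1` and `w2`, not trained parameters from the patent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def semantic_feature_fusion(f1, w1, w2):
    """Sketch of the fusion gate: global pooling of F1, two 1x1 convs
    (here hypothetical (C, C) matrices) with ReLU/Sigmoid, then the
    residual gating F2 = GP2 x F1 + F1.

    f1 : feature map of shape (C, H, W), the output of the 1x1 conv+BN+ReLU
    w1, w2 : (C, C) stand-ins for the two 1x1 convolutions on the pooled vector
    """
    gp1 = f1.mean(axis=(1, 2))            # global average pooling -> (C,)
    hidden = np.maximum(w1 @ gp1, 0.0)    # first 1x1 conv + ReLU
    gp2 = sigmoid(w2 @ hidden)            # second 1x1 conv + Sigmoid gate
    return gp2[:, None, None] * f1 + f1   # GP2 x F1 + F1 (residual gating)
```

Because the gate stays in (0, 1), the module can only amplify channels relative to the identity path, never zero them out, which keeps the skip-connected encoder features intact.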
2. The method for automatically extracting the water body contour based on the fully convolutional neural network according to claim 1, wherein constructing the training sample library comprises:
acquiring surface images and water body labeling data;
preprocessing the water body labeling data to obtain surface image and labeling raster pairs;
and slicing the surface image and labeling raster pairs to obtain the training samples.
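The slicing step above can be sketched as follows. The tile size and stride are illustrative choices, not values fixed by the claim:

```python
import numpy as np

def slice_pairs(image, label, tile=512, stride=512):
    """Slice an aligned (surface image, labeling raster) pair into
    fixed-size training tiles.

    image : (H, W, C) surface image array
    label : (H, W) rasterized water-body mask co-registered with `image`
    tile, stride : hypothetical sample-library parameters
    """
    assert image.shape[:2] == label.shape, "pair must be co-registered"
    h, w = label.shape
    samples = []
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            samples.append((image[y:y + tile, x:x + tile],
                            label[y:y + tile, x:x + tile]))
    return samples
```

With stride equal to the tile size the tiles do not overlap; a smaller stride would yield overlapping tiles and a larger sample library.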
3. The method for automatically extracting the water body contour based on the fully convolutional neural network according to claim 2, wherein the preprocessing comprises:
rasterizing the water body labeling data to obtain labeling raster data;
and resampling and cropping the surface images and the water body labeling raster data to obtain the surface image and labeling raster pairs.
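The rasterization step can be illustrated with a minimal even-odd polygon fill; a production pipeline would typically use GDAL or rasterio's `rasterize` rather than this sketch, and the polygon here is a made-up example:

```python
import numpy as np

def rasterize_polygon(poly, h, w):
    """Rasterize one labeling polygon into an (h, w) binary mask using
    even-odd point-in-polygon tests at pixel centers."""
    ys, xs = np.mgrid[0:h, 0:w]
    px, py = xs + 0.5, ys + 0.5            # test pixel centers
    inside = np.zeros((h, w), dtype=bool)
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        crosses = (y0 <= py) != (y1 <= py)  # edge spans the pixel's scanline
        with np.errstate(divide="ignore", invalid="ignore"):
            xint = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (px < xint)     # toggle on each crossing to the right
    return inside.astype(np.uint8)
```

Each crossing of the scanline to the right of a pixel toggles its in/out state, so pixels whose centers fall inside the polygon end up marked 1.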
4. The method for automatically extracting the water body contour based on the fully convolutional neural network according to claim 1, wherein after the surface water body contour extraction result is obtained, edge vectorization is performed using the Douglas-Peucker algorithm.
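The Douglas-Peucker simplification used for edge vectorization can be sketched as a short NumPy recursion; the tolerance and the coordinate layout are illustrative:

```python
import numpy as np

def douglas_peucker(points, epsilon):
    """Simplify a polyline (e.g. a traced water-body edge) with the
    Douglas-Peucker algorithm.

    points  : (N, 2) array of edge coordinates
    epsilon : distance tolerance; points closer than this to the chord
              between the endpoints are dropped
    """
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    dx, dy = end - start
    seg_len = np.hypot(dx, dy)
    if seg_len == 0.0:
        dists = np.hypot(points[:, 0] - start[0], points[:, 1] - start[1])
    else:
        # perpendicular distance of every point to the start-end chord
        dists = np.abs(dx * (points[:, 1] - start[1])
                       - dy * (points[:, 0] - start[0])) / seg_len
    idx = int(np.argmax(dists))
    if dists[idx] > epsilon:
        # farthest point is kept; recurse on the two halves around it
        left = douglas_peucker(points[:idx + 1], epsilon)
        right = douglas_peucker(points[idx:], epsilon)
        return np.vstack([left[:-1], right])
    return np.vstack([points[0], points[-1]])
```

Larger tolerances yield fewer vertices and smoother vector contours, at the cost of geometric fidelity to the predicted raster edge.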
5. An automatic water body contour extraction system based on a fully convolutional neural network, characterized by comprising:
a training sample library construction unit, configured to construct training samples from surface images and water body labeling data;
an image prediction library construction unit, configured to construct an image prediction library from surface images;
a network model training unit, configured to iteratively train, through a multi-receptive-field feature joint fully convolutional network, on the training samples acquired by the training sample library construction unit to obtain a network model;
a water body contour extraction unit, configured to extract the water body contour from the image prediction library using the network model to obtain the surface water body contour extraction result;
wherein the training sample library is constructed from surface images and water body labeling data, and the image prediction library is constructed from surface images;
the multi-receptive-field feature joint fully convolutional network comprises an encoding part, a decoding part and an output part, wherein
the encoding part consists of 5 multi-receptive-field feature combination modules and 4 max pooling layers;
the decoding part consists of 4 different-semantic feature fusion modules and 4 upsampling layers;
the output part consists of 5 output layers and 1 feature multi-scale prediction fusion module;
the 1st and 2nd multi-receptive-field feature combination modules of the encoding part are each preceded by a convolution layer with a kernel size of 3×3 and a stride of 1, a batch normalization layer and a rectified linear unit;
the 3rd, 4th and 5th multi-receptive-field feature combination modules of the encoding part are each preceded and followed by a convolution layer with a kernel size of 3×3 and a stride of 1, a batch normalization layer and a rectified linear unit; the max pooling layers of the encoding part have a stride of 2×2, so that after each pooling layer the height and width of the output feature map are one half of those of the input feature map;
the multi-receptive-field feature combination module performs three feature extractions with different receptive fields, through a short-range feature extraction module, a medium-range feature extraction module and a long-range feature extraction module, wherein
the short-range feature extraction module consists of a convolution layer with a kernel size of 3×3 and a stride of 1, a batch normalization layer and a Sigmoid function;
the medium-range feature extraction module consists of a convolution layer with a kernel size of 7×7 and a stride of 4, a batch normalization layer, a rectified linear unit, a convolution layer with a kernel size of 3×3 and a stride of 1, a batch normalization layer, a rectified linear unit and a 4× bilinear upsampling layer;
the long-range feature extraction module consists of a global pooling layer and two fully connected layers;
the multi-receptive-field feature combination module ends with a convolution layer with a kernel size of 1×1 and a stride of 1, a batch normalization layer and a rectified linear unit;
the different-semantic feature fusion module of the decoding part operates as follows:
performing, on the input, a convolution with a kernel size of 1×1 and a stride of 1, batch normalization and a rectified linear unit to obtain a feature map F1;
performing global pooling on the feature map F1 to obtain a feature GP1, and then applying a convolution with a kernel size of 1×1 and a stride of 1, batch normalization, a rectified linear unit, a convolution with a kernel size of 1×1 and a stride of 1 and a Sigmoid function to obtain the global information GP2;
performing the matrix operation F2 = GP2 × F1 + F1 to obtain the feature map F2;
performing, on the feature map F2, a convolution with a kernel size of 3×3 and a stride of 1, batch normalization and a rectified linear unit;
wherein the input of each convolution layer of the decoding part is the concatenation of the feature map obtained from an upsampling layer and the feature map of the corresponding size from the encoding part;
each output layer of the output part consists of a convolution layer with a kernel size of 1×1 and a stride of 1 and a Sigmoid function; specifically:
the feature multi-scale prediction fusion module first performs, on the input, a convolution with a kernel size of 1×1 and a stride of 1, batch normalization and a rectified linear unit to obtain a feature map F′1;
F′1 is then processed by convolutions with kernel sizes of 3×3, 5×5 and 7×7 respectively, each followed by batch normalization and a Sigmoid function, to obtain W1, W2 and W3;
the concatenation of F′1, F′1 × W1, F′1 × W2 and F′1 × W3 is processed by a convolution with a kernel size of 3×3 and a stride of 1, batch normalization, a rectified linear unit and a convolution layer with a kernel size of 1×1 and a stride of 1 to obtain a feature map F′2;
F′2 is processed by a Sigmoid function;
and the input of the feature multi-scale prediction fusion module of the output part is the concatenation of the upsampled results of the 3rd and 4th output layers and the result of the 5th output layer.
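The output part's multi-scale prediction fusion can be sketched as follows. A mean (box) filter stands in for each learned k×k convolution, so the weight maps W1, W2, W3 here only illustrate the data flow of the claim, not trained behavior:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def box_conv(f, k):
    """k x k mean filter with zero padding -- a hypothetical stand-in for
    the learned k x k convolution."""
    p = k // 2
    padded = np.pad(f, p)
    out = np.zeros_like(f)
    h, w = f.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def multiscale_prediction_fusion(f1):
    """Sketch of the fusion: sigmoid weight maps W1, W2, W3 from 3x3 / 5x5
    / 7x7 convolutions, applied to F'1 and concatenated (channel stack)
    before the final 3x3 and 1x1 convolutions of the claim."""
    weights = [sigmoid(box_conv(f1, k)) for k in (3, 5, 7)]  # W1, W2, W3
    branches = [f1] + [f1 * w for w in weights]              # F'1, F'1*W1, ...
    return np.stack(branches, axis=0)                        # channel concat
```

Each weight map attends to structure at a different spatial scale, so the concatenated stack lets the final convolutions weigh fine edges against broad water regions.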
CN202011633088.2A 2020-12-31 2020-12-31 Water body contour automatic extraction method and system based on full convolution neuron network Active CN112800851B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011633088.2A CN112800851B (en) 2020-12-31 2020-12-31 Water body contour automatic extraction method and system based on full convolution neuron network


Publications (2)

Publication Number Publication Date
CN112800851A CN112800851A (en) 2021-05-14
CN112800851B true CN112800851B (en) 2022-09-23

Family

ID=75808446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011633088.2A Active CN112800851B (en) 2020-12-31 2020-12-31 Water body contour automatic extraction method and system based on full convolution neuron network

Country Status (1)

Country Link
CN (1) CN112800851B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378731B (en) * 2021-06-17 2022-04-15 武汉大学 Green space water system vector extraction method based on convolutional neural network and energy constraint
CN115423829B (en) * 2022-07-29 2024-03-01 江苏省水利科学研究院 Method and system for rapidly extracting water body of single-band remote sensing image

Citations (4)

Publication number Priority date Publication date Assignee Title
CN110110692A (en) * 2019-05-17 2019-08-09 南京大学 A kind of realtime graphic semantic segmentation method based on the full convolutional neural networks of lightweight
CN110334656A (en) * 2019-07-08 2019-10-15 中国人民解放军战略支援部队信息工程大学 Multi-source Remote Sensing Images Clean water withdraw method and device based on information source probability weight
EP3614308A1 (en) * 2018-08-24 2020-02-26 Ordnance Survey Limited Joint deep learning for land cover and land use classification
CN111860351A (en) * 2020-07-23 2020-10-30 中国石油大学(华东) Remote sensing image fishpond extraction method based on line-row self-attention full convolution neural network

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
EP3614308A1 (en) * 2018-08-24 2020-02-26 Ordnance Survey Limited Joint deep learning for land cover and land use classification
CN110110692A (en) * 2019-05-17 2019-08-09 南京大学 A kind of realtime graphic semantic segmentation method based on the full convolutional neural networks of lightweight
CN110334656A (en) * 2019-07-08 2019-10-15 中国人民解放军战略支援部队信息工程大学 Multi-source Remote Sensing Images Clean water withdraw method and device based on information source probability weight
CN111860351A (en) * 2020-07-23 2020-10-30 中国石油大学(华东) Remote sensing image fishpond extraction method based on line-row self-attention full convolution neural network

Non-Patent Citations (2)

Title
《Automatic Water-Body Segmentation From High-Resolution Satellite Images via Deep Networks》;Ziming Miao,et al;《IEEE Geoscience and Remote Sensing Letters》;20180228;第15卷(第4期);第602-606页 *
《全卷积神经网络用于遥感影像水体提取》;王雪,等;《测绘通报》;20181231(第6期);第41-45页 *

Also Published As

Publication number Publication date
CN112800851A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN108154192B (en) High-resolution SAR terrain classification method based on multi-scale convolution and feature fusion
CN109446992B (en) Remote sensing image building extraction method and system based on deep learning, storage medium and electronic equipment
CN110059698B (en) Semantic segmentation method and system based on edge dense reconstruction for street view understanding
CN107239751B (en) High-resolution SAR image classification method based on non-subsampled contourlet full convolution network
CN110059768B (en) Semantic segmentation method and system for fusion point and region feature for street view understanding
CN114120102A (en) Boundary-optimized remote sensing image semantic segmentation method, device, equipment and medium
CN111767801A (en) Remote sensing image water area automatic extraction method and system based on deep learning
CN107784288B (en) Iterative positioning type face detection method based on deep neural network
CN111860233B (en) SAR image complex building extraction method and system based on attention network selection
CN110570440A (en) Image automatic segmentation method and device based on deep learning edge detection
CN109740686A (en) A kind of deep learning image multiple labeling classification method based on pool area and Fusion Features
CN112800851B (en) Water body contour automatic extraction method and system based on full convolution neuron network
CN111680755B (en) Medical image recognition model construction and medical image recognition method, device, medium and terminal
CN111524117A (en) Tunnel surface defect detection method based on characteristic pyramid network
CN114783034A (en) Facial expression recognition method based on fusion of local sensitive features and global features
He et al. Remote sensing image super-resolution using deep–shallow cascaded convolutional neural networks
CN116994140A (en) Cultivated land extraction method, device, equipment and medium based on remote sensing image
CN114998756A (en) Yolov 5-based remote sensing image detection method and device and storage medium
Lu et al. Edge-reinforced convolutional neural network for road detection in very-high-resolution remote sensing imagery
CN116563649B (en) Tensor mapping network-based hyperspectral image lightweight classification method and device
CN116309612B (en) Semiconductor silicon wafer detection method, device and medium based on frequency decoupling supervision
CN111274936A (en) Multispectral image ground object classification method, system, medium and terminal
CN113408651B (en) Unsupervised three-dimensional object classification method based on local discriminant enhancement
CN113743487A (en) Enhanced remote sensing image target detection method and system
Eken et al. Vectorization and spatial query architecture on island satellite images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant