CN115641515A - Remote sensing image bare land image identification method based on U-Net network

Remote sensing image bare land image identification method based on U-Net network

Info

Publication number
CN115641515A
CN115641515A
Authority
CN
China
Prior art keywords
image
remote sensing
sensing image
layer
net network
Prior art date
Legal status
Pending
Application number
CN202211233795.1A
Other languages
Chinese (zh)
Inventor
刘寓
罗斌
黄心
付娟娟
谭婷婷
龚巧灵
刘冠伸
邬斐妮
邓可欣
黄孝艳
蒲锐
蒋一凡
况舒文
颜春波
王衡
邓未江
黄川
全鸿欣
Current Assignee
Chongqing Ecological Environment Big Data Application Center
CHONGQING CYBERCITY SCI-TECH CO LTD
Original Assignee
Chongqing Ecological Environment Big Data Application Center
CHONGQING CYBERCITY SCI-TECH CO LTD
Priority date
Filing date
Publication date
Application filed by Chongqing Ecological Environment Big Data Application Center and CHONGQING CYBERCITY SCI-TECH CO LTD
Priority to CN202211233795.1A
Publication of CN115641515A

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a remote sensing image bare land image identification method based on a U-Net network, which comprises the following steps: S1: preprocessing the obtained original remote sensing image to obtain a first remote sensing image; S2: pixel-marking the land parcels with bare conditions in the first remote sensing image to construct training set sample data; S3: building a U-net network model comprising an encoder and a decoder, and then iteratively training the U-net network model with the training set sample data constructed in S2; S4: identifying the first remote sensing image with the U-net network model trained in S3, and outputting the bare land image. The invention can extract depth features, requires little human participation, and reduces cost; meanwhile, the classification map boundaries obtained by the U-net classification method are closer to the real ground objects, smoother, and more attractive, which improves identification efficiency.

Description

Remote sensing image bare land image identification method based on U-Net network
Technical Field
The invention relates to the technical field of ecological environment, in particular to a remote sensing image bare land image identification method based on a U-Net network.
Background
Urban development and construction often produce bare land parcels. If these are not treated in time, they easily generate solid particles and other pollution sources harmful to the urban environment, greatly polluting the city and affecting public health.
Identifying bare land from remote sensing images is a scientific and efficient means, and existing approaches include pixel-based classification and object-oriented classification. Pixel-based classification suffers from low identification efficiency, making it difficult to find and confirm bare land parcels in a timely and effective manner; object-oriented classification can comprehensively use various features to extract the required information, but needs considerable human participation. How to supervise bare land rapidly, intelligently, and automatically, and to extract land use information quickly and efficiently, is therefore an urgent problem to be solved.
Disclosure of Invention
Aiming at the problems of low efficiency and low speed of bare land parcel recognition from remote sensing images in the prior art, the invention provides a remote sensing image bare land image identification method based on a U-Net network.
In order to achieve the purpose, the invention provides the following technical scheme:
a remote sensing image bare land image identification method based on a U-Net network specifically comprises the following steps:
s1: preprocessing the obtained original remote sensing image to obtain a first remote sensing image;
s2: carrying out pixel marking on the land parcel with the bare condition in the first remote sensing image so as to construct training set sample data;
s3: building a U-net network model, wherein the U-net network model comprises an encoder and a decoder, and then performing iterative training on the U-net network model by adopting training set sample data built in the S2;
s4: and (4) identifying the first remote sensing image by using the U-net network model trained in the step (S3), and outputting the first remote sensing image to the bare land image.
Preferably, said S1 comprises the steps of:
s1-1: performing geometric fine correction and image registration on the obtained original remote sensing image to obtain a first sub remote sensing image;
s1-2: performing radiation correction on the first sub remote sensing image to obtain a radiation correction image, namely a second sub remote sensing image;
the formula for calculating the radiation correction is:
ρ = (π × L_λ × d²) / (ESUN_λ × cos θ) (1)
in formula (1), ρ is the reflectivity and is dimensionless; d is the Sun-Earth distance parameter and is dimensionless; ESUN_λ is the solar spectral irradiance in W·m⁻²·μm⁻¹; θ is the solar zenith angle in degrees; L_λ is the radiance received by the satellite, and its relation to the brightness DN of the original remote sensing image is as follows:
L_λ = Gain × DN + Bias (2)
in formula (2), DN is the image brightness value and L_λ represents the radiance received by the satellite in W·m⁻²·sr⁻¹·μm⁻¹; Gain is the gain in W·m⁻²·sr⁻¹·μm⁻¹, with Gain = (L_max - L_min)/255, where L_max and L_min are the maximum and minimum spectral radiance values; 255 is the maximum gray level of the 8-bit quantized image, whose range is [1,255]; Bias is the bias;
s1-3: carrying out spatial color homogenizing on the second sub remote sensing image to obtain a spatial color homogenizing remote sensing image, namely a third sub remote sensing image;
s1-4: and combining the plurality of third sub remote sensing images into one image, namely the first remote sensing image.
Preferably, said S1-1 comprises:
s1-1-1: aiming at a flat area, a polynomial model is adopted, control points are selected, and resampling of the image points is completed by interpolation, so that geometric fine correction and image registration are completed on the original remote sensing image to obtain a first sub remote sensing image;
the polynomial model is shown below:
x = a₀ + a₁X + a₂Y + a₃X² + a₄XY + a₅Y² + a₆X³ + a₇X²Y + a₈XY² + a₉Y³
y = b₀ + b₁X + b₂Y + b₃X² + b₄XY + b₅Y² + b₆X³ + b₇X²Y + b₈XY² + b₉Y³ (3)
in formula (3), x and y are the image plane coordinates of the image point, and X, Y are the geodetic coordinates of the ground point corresponding to the image point; aᵢ, bᵢ are polynomial coefficients, i = 0,1,2,3,4,5,6,7,8,9;
s1-1-2: aiming at areas with large topographic relief, an RPC model is adopted to associate the image coordinates P(r, c) with the ground coordinates P(X, Y, Z) and establish a ratio-polynomial mathematical relation,
the expression form is as follows:
rₙ = P₁(Xₙ, Yₙ, Zₙ) / P₂(Xₙ, Yₙ, Zₙ)
cₙ = P₃(Xₙ, Yₙ, Zₙ) / P₄(Xₙ, Yₙ, Zₙ) (4)
in formula (4), Pₙ(Xₙ, Yₙ, Zₙ) denotes a polynomial function of the normalized X-axis, Y-axis, and Z-axis coordinates;
in the calculation, the image coordinates P (r, c) and the ground coordinates P (X, Y, Z) need to be normalized by scaling and translation to obtain normalized coordinates with a value range of (-1,1), and the transformation form of formula (4) is as follows:
Xₙ = (X - X₀)/Xₛ, Yₙ = (Y - Y₀)/Yₛ, Zₙ = (Z - Z₀)/Zₛ, rₙ = (r - r₀)/rₛ, cₙ = (c - c₀)/cₛ (5)
in formula (5), X₀, Y₀, Z₀, r₀, c₀ represent the normalization translation parameters, Xₛ, Yₛ, Zₛ, rₛ, cₛ represent the normalization scale parameters, and Xₙ, Yₙ, Zₙ, rₙ, cₙ are the normalized coordinates.
Preferably, in S1-3, the method for spatially homogenizing includes:
preparing a template image whose coverage is larger than that of the second sub remote sensing image needing color homogenizing; then, by traversing the pixel value of each unit pixel in the second sub remote sensing image, matching the local geographic area information of the second sub remote sensing image with the local geographic area information of the corresponding reference region of the template image, so that the color at each position of the second sub remote sensing image approaches the color at the corresponding position of the template image.
Preferably, said S2 comprises the steps of:
s2-1: marking the bare land parcels in the first remote sensing image to form a sample data set by the following method:
adopting manual vectorization: plotting the judged bare parcel areas in the first remote sensing image to form bare parcel vector data, and then, through vector-to-raster data conversion, giving the plotted areas the value '1' and the non-plotted areas the value '0' to form a 0/1 binary raster map;
s2-2: clipping the first remote sensing image and the binary raster map from S2-1; the clipping method is as follows:
clipping with a regular grid whose long-edge and short-edge pixel counts follow the 2ⁿ rule; the calculation formula for the length of the regular grid is as follows:
L=Cell×B (6)
in formula (6), L is the length of the regular grid, Cell is the image resolution, and B is the number of long-edge or short-edge pixels.
Preferably, in S3, the encoder includes a first downsampling layer, a second downsampling layer, a third downsampling layer, a fourth downsampling layer, and a fifth downsampling layer that are sequentially connected from top to bottom;
the first down-sampling layer performs two 3 × 3 convolutions on the input image and one 2 × 2 pooling; the second down-sampling layer performs two 3 × 3 convolutions and one 2 × 2 pooling on the feature map output by the first down-sampling layer; the third down-sampling layer performs two 3 × 3 convolutions and one 2 × 2 pooling on the feature map output by the second down-sampling layer; the fourth down-sampling layer performs two 3 × 3 convolutions and one 2 × 2 pooling on the feature map output by the third down-sampling layer; the fifth down-sampling layer performs two 3 × 3 convolutions on the feature map output by the fourth down-sampling layer.
Preferably, in S3, the decoder includes a first upsampling layer, a second upsampling layer, a third upsampling layer, and a fourth upsampling layer, which are sequentially connected from bottom to top;
the first up-sampling layer performs one 2 × 2 up-sampling on the feature map output by the fifth down-sampling layer, then superimposes it with the pre-pooling feature map output by the fourth down-sampling layer; after the superimposition it performs two 3 × 3 convolutions and then one 2 × 2 up-sampling; the second up-sampling layer superimposes the feature map output by the first up-sampling layer with the pre-pooling feature map output by the third down-sampling layer, then performs two 3 × 3 convolutions and one 2 × 2 up-sampling; the third up-sampling layer superimposes the feature map output by the second up-sampling layer with the pre-pooling feature map output by the second down-sampling layer, then performs two 3 × 3 convolutions and one 2 × 2 up-sampling; and the fourth up-sampling layer superimposes the feature map output by the third up-sampling layer with the pre-pooling feature map output by the first down-sampling layer, then performs two 3 × 3 convolutions and one 1 × 1 convolution, and outputs the segmentation result.
In summary, by adopting the above technical scheme, the invention has at least the following beneficial effects compared with the prior art:
1. Preprocessing the sample set reduces the influence on the identification result of different acquisition equipment, seasonal variation, and background image differences, and improves detection precision.
2. For classification over large areas, traditional classification is affected by many factors such as segmentation results, feature selection, and the classification algorithm, and rarely achieves good results; the invention can extract depth features, requires little human participation, and reduces cost; meanwhile, the classification map boundaries obtained by the U-net classification method are closer to the real ground objects, smoother, and more attractive, which improves identification efficiency.
Description of the drawings:
fig. 1 is a schematic flow chart of a remote sensing image bare land image identification method based on a U-Net network according to an exemplary embodiment of the invention.
Fig. 2 is a schematic diagram of a U-Net network structure according to an exemplary embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and embodiments. It should be understood that the scope of the above-described subject matter of the present invention is not limited to the following examples, and any technique realized based on the contents of the present invention is within the scope of the present invention.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.
As shown in FIG. 1, the invention provides a remote sensing image bare land image identification method based on a U-Net network, which specifically comprises the following steps:
s1: the method mainly utilizes the original remote sensing images of high score one, high score six, resource three and the like to carry out preprocessing to obtain a first remote sensing image.
S1-1: aiming at flat areas, a polynomial model is adopted, the model is simple, external orientation elements and the imaging process of remote sensing images are not required to be considered, meanwhile, the calculation efficiency is high, the model corrects the coordinate relation between corresponding points of the images by adopting a proper polynomial, and the method is as follows:
x = a₀ + a₁X + a₂Y + a₃X² + a₄XY + a₅Y² + a₆X³ + a₇X²Y + a₈XY² + a₉Y³
y = b₀ + b₁X + b₂Y + b₃X² + b₄XY + b₅Y² + b₆X³ + b₇X²Y + b₈XY² + b₉Y³ (1)
in formula (1), x and y are the image plane coordinates of the image point, and X, Y are the geodetic coordinates of the corresponding ground point; aᵢ, bᵢ are polynomial coefficients, i = 0,1,2,3,4,5,6,7,8,9.
In the original remote sensing image, each pixel represents an image point coordinate, and each image point coordinate corresponds to the geodetic coordinates of a real ground point; in practice, however, a real coordinate cannot be acquired for every pixel, so n control points are selected for the polynomial model and the image points are resampled by interpolation, thereby completing geometric fine correction and image registration of the original remote sensing image and obtaining the first sub remote sensing image.
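As an illustrative sketch only (not part of the original disclosure), the coefficient estimation for formula (1) can be posed as a linear least-squares fit over the n control points; the NumPy usage and all function and parameter names below are assumptions:

import numpy as np

def fit_cubic_polynomial(gcp_ground, gcp_image):
    # Fit the cubic polynomial of formula (1) by least squares.
    # gcp_ground: (n, 2) geodetic coordinates (X, Y) of the control points
    # gcp_image:  (n, 2) image plane coordinates (x, y) of the same points
    # At least 10 control points are needed for the 10 coefficients.
    X, Y = gcp_ground[:, 0], gcp_ground[:, 1]
    # Design matrix with the ten cubic terms 1, X, Y, X^2, XY, Y^2, X^3, X^2Y, XY^2, Y^3
    A = np.column_stack([np.ones_like(X), X, Y, X**2, X*Y, Y**2,
                         X**3, X**2*Y, X*Y**2, Y**3])
    a, *_ = np.linalg.lstsq(A, gcp_image[:, 0], rcond=None)  # coefficients a_i
    b, *_ = np.linalg.lstsq(A, gcp_image[:, 1], rcond=None)  # coefficients b_i
    return a, b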
S1-2: aiming at areas with large topographic relief, an RPC model is adopted, and a ratio polynomial relationship is established by the RPC model according to three-dimensional coordinates of ground points and two-dimensional coordinates of image points to obtain a model which is approximate to a strict physical model in precision and simple in form. The RPC model is to associate the image coordinates P (r, c) with the ground coordinates P (X, Y, Z) to establish the mathematical relationship of a ratio polynomial, and the expression form is as follows:
rₙ = P₁(Xₙ, Yₙ, Zₙ) / P₂(Xₙ, Yₙ, Zₙ)
cₙ = P₃(Xₙ, Yₙ, Zₙ) / P₄(Xₙ, Yₙ, Zₙ) (2)
in formula (2), each Pₙ(Xₙ, Yₙ, Zₙ) is a polynomial function; the maximum power of X, Y, Z in any term is 3, and the sum of the powers of X, Y, Z in any term is also not higher than 3. The specific expression is:
P(X,Y,Z) = a₀ + a₁X + a₂Y + a₃Z + a₄XY + a₅XZ + a₆YZ + a₇X² + a₈Y² + a₉Z² + a₁₀XYZ + a₁₁X²Y + a₁₂X²Z + a₁₃Y²X + a₁₄Y²Z + a₁₅Z²X + a₁₆Z²Y + a₁₇X³ + a₁₈Y³ + a₁₉Z³ (3)
in formula (3), a₀, a₁, ..., a₁₉ represent the polynomial coefficients;
in the calculation, the image coordinates P (r, c) and the ground coordinates P (X, Y, Z) need to be normalized by scaling and translation to obtain normalized coordinates with a value range of (-1,1), and the transformation form of formula (2) is as follows:
Xₙ = (X - X₀)/Xₛ, Yₙ = (Y - Y₀)/Yₛ, Zₙ = (Z - Z₀)/Zₛ, rₙ = (r - r₀)/rₛ, cₙ = (c - c₀)/cₛ (4)
in formula (4), X₀, Y₀, Z₀, r₀, c₀ represent the normalization translation parameters, Xₛ, Yₛ, Zₛ, rₛ, cₛ represent the normalization scale parameters, and Xₙ, Yₙ, Zₙ, rₙ, cₙ are the normalized coordinates.
And according to the standardized coordinates, performing geometric fine correction and image registration on the original remote sensing image to obtain a first sub remote sensing image.
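For illustration only (not part of the original disclosure), the normalization of formula (4) and the ratio-polynomial projection of formula (2) can be sketched as follows; P1..P4 are assumed to be callables implementing the 20-term cubic of formula (3), and all names and the parameter dictionary layout are hypothetical:

def normalize(value, offset, scale):
    # Formula (4): translate and scale a coordinate into roughly (-1, 1)
    return (value - offset) / scale

def rpc_project(P1, P2, P3, P4, X, Y, Z, params):
    # Evaluate the ratio polynomials of formula (2) for one ground point.
    # params holds the normalization offsets/scales, e.g. {"X0": ..., "Xs": ...}
    Xn = normalize(X, params["X0"], params["Xs"])
    Yn = normalize(Y, params["Y0"], params["Ys"])
    Zn = normalize(Z, params["Z0"], params["Zs"])
    rn = P1(Xn, Yn, Zn) / P2(Xn, Yn, Zn)  # normalized image row
    cn = P3(Xn, Yn, Zn) / P4(Xn, Yn, Zn)  # normalized image column
    # De-normalize back to pixel coordinates (inverse of formula (4))
    return rn * params["rs"] + params["r0"], cn * params["cs"] + params["c0"]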
S1-3: and performing radiation correction on the first sub remote sensing image to obtain a radiation correction image, namely a second sub remote sensing image.
Because the remote sensing image is easily disturbed by external factors such as the atmosphere, the reflectivity ρ needs to be calculated with the following formula to complete the radiation correction:
ρ = (π × L_λ × d²) / (ESUN_λ × cos θ) (5)
in formula (5), ρ is the reflectivity (top-of-atmosphere reflectance, TOA) and is dimensionless; d is the Sun-Earth distance parameter and is dimensionless; ESUN_λ is the solar spectral irradiance in W·m⁻²·μm⁻¹; θ is the solar zenith angle in degrees; L_λ is the radiance received by the satellite, and its relation to the original remote sensing image brightness DN is:
L_λ = Gain × DN + Bias (6)
in formula (6), DN is the image brightness value and L_λ represents the radiance received by the satellite in W·m⁻²·sr⁻¹·μm⁻¹; Gain is the gain in W·m⁻²·sr⁻¹·μm⁻¹, with Gain = (L_max - L_min)/255, where L_max and L_min are the maximum and minimum spectral radiance values in the same units as the gain; 255 is the maximum gray level of an 8-bit quantized image, whose range is [1,255]; Bias is the bias.
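For illustration only (not part of the original disclosure), formulas (5) and (6) combine into a short DN-to-reflectance routine; the function name and NumPy usage are assumptions:

import numpy as np

def toa_reflectance(dn, gain, bias, esun, d, theta_deg):
    # Formula (6): DN -> at-sensor radiance L_lambda
    radiance = gain * np.asarray(dn, dtype=np.float64) + bias
    # Formula (5): radiance -> top-of-atmosphere reflectance rho
    cos_theta = np.cos(np.deg2rad(theta_deg))  # theta given in degrees
    return np.pi * radiance * d**2 / (esun * cos_theta)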
S1-4: and carrying out spatial color homogenizing on the second sub remote sensing image to obtain a spatial color homogenizing remote sensing image, namely a third sub remote sensing image.
In this embodiment, the second sub remote sensing images have multi-source, multi-resolution, and multi-temporal characteristics and differ considerably from one another, so spatial color homogenization improves training precision on one hand and improves recognition precision of bare land on the other. The method corrects the colors of the image to be processed toward the colors of a template image according to the geographic position correspondence of the two images. The main steps are: prepare a low-resolution template image, ensuring that its coverage is larger than the image needing color homogenization and that the coordinate systems of the two images are consistent; then, by traversing the pixel value of each unit pixel, statistically match the local geographic area information of the image needing color homogenization with the local geographic area information of the corresponding reference template image, so that the image approaches the colors of the corresponding positions of the template image.
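The patent does not fully specify the matching statistic; as a hedged stand-in, per-band histogram matching against the template image performs the same kind of color homogenization (a minimal sketch assuming scikit-image, not the patented procedure itself):

from skimage.exposure import match_histograms

def homogenize_color(image, template):
    # Pull the per-band statistics of the image toward the template image.
    # Both arrays are (H, W, bands); the template must cover at least the
    # image footprint and share its coordinate system.
    return match_histograms(image, template, channel_axis=-1)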
S1-5: in this embodiment, when the coverage of a single remote sensing image in the research area is insufficient, the plurality of third sub remote sensing images need to be merged into one image, that is, the first remote sensing image, so as to solve the problem of incomplete data coverage.
S2: Training set sample data are constructed from the first remote sensing image obtained in step S1, with semantic-level pixel marking of the land parcels that are bare.
S2-1: the method mainly comprises the steps of adopting manual vectorization, judging a bare land area through reading, plotting to form bare land vector data, and giving a plotted area a value of '1' and a non-plotted area a value of '0' through vector raster data conversion to form a 0,1 binary raster image.
S2-2: because the mosaic combined first remote sensing image is often large in size and cannot be applied to the U-net network model, the first remote sensing image and the binary raster image in the S2-1 need to be cut to obtain training set sample data. The cutting method is to use a regular grid to cut, and the cutting follows the long and short edge pixels 2 n For a rule, e.g., 256 × 256, the rule grid length calculation formula is as follows:
L=Cell×B (7)
in formula (7), L is the length of the regular grid, Cell is the image resolution, and B is the number of long-edge or short-edge pixels.
When the width or height of the first remote sensing image is not 2ⁿ, the image is rejected by the neural network and cannot be used for training on the sample data set, so it needs to be enlarged or reduced in equal proportion according to the long-edge pixels; for example, if the long-edge pixel count is set to 256 but the actual clipping yields 257 or 255, proportional scaling up or down is necessary.
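A minimal clipping sketch consistent with the 2ⁿ rule and the proportional rescaling described above (illustrative only; the scikit-image resize and the function name are assumptions):

from skimage.transform import resize

def clip_to_tiles(image, tile=256):
    # Clip an image into tile x tile samples with a regular grid; edge tiles
    # that come out slightly off (e.g. 255 or 257 pixels) are rescaled in
    # equal proportion back to the 2^n size.
    tiles = []
    h, w = image.shape[:2]
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            patch = image[top:top + tile, left:left + tile]
            if patch.shape[:2] != (tile, tile):
                patch = resize(patch, (tile, tile), preserve_range=True)
            tiles.append(patch)
    return tiles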
S3: As shown in Fig. 2, a U-net network model is constructed, comprising an encoder on the left and a decoder on the right; the constructed U-net network model is then iteratively trained for subsequent bare land identification.
S3-1: In this embodiment, the encoder comprises a first, second, third, fourth, and fifth down-sampling layer connected sequentially from top to bottom, where each later down-sampling layer operates on the feature map output by the previous layer with two 3 × 3 convolutions (ReLU) followed by one 2 × 2 max pooling (stride 2), i.e., 2× down-sampling. Therefore, as the down-sampling layers deepen, the resolution of the feature map becomes smaller and smaller while the number of channels gradually increases; detail features such as spatial localization are gradually lost, but the receptive field keeps enlarging and more global, abstract features are extracted.
For example, the first down-sampling layer performs two 3 × 3 convolutions (ReLU) on the input image (256 × 256 pixels) and one 2 × 2 pooling (stride 2); the second down-sampling layer performs two 3 × 3 convolutions (ReLU) and one 2 × 2 pooling (stride 2) on the feature map output by the first down-sampling layer (128 × 128 pixels); the third down-sampling layer performs two 3 × 3 convolutions (ReLU) and one 2 × 2 pooling (stride 2) on the feature map output by the second down-sampling layer (64 × 64 pixels); the fourth down-sampling layer performs two 3 × 3 convolutions (ReLU) and one 2 × 2 pooling (stride 2) on the feature map output by the third down-sampling layer (32 × 32 pixels); the fifth down-sampling layer performs two 3 × 3 convolutions (ReLU) on the feature map output by the fourth down-sampling layer (16 × 16 pixels); at this point the image is 16 × 16 pixels with 1024 channels in total.
S3-2: In this embodiment, the decoder comprises a first, second, third, and fourth up-sampling layer connected sequentially from bottom to top; each later up-sampling layer performs a deconvolution operation on the feature map output by the previous up-sampling layer. Meanwhile, to fuse more shallow feature information, each up-sampling layer is connected with the corresponding down-sampling layer; the combined feature map is convolved twice with 3 × 3 kernels, and then one 2 × 2 up-sampling (stride 2) is performed to recover the compressed features. At the network end, a 1 × 1 convolution outputs the segmentation result over the feature channels.
For example, the first up-sampling layer performs one 2 × 2 up-sampling (stride 2) on the feature map output by the fifth down-sampling layer (16 × 16 pixels) to obtain a new feature map (32 × 32 pixels), then superimposes it with the pre-pooling feature map output by the fourth down-sampling layer (32 × 32 pixels); after the superimposition it performs two 3 × 3 convolutions (ReLU) and then one 2 × 2 up-sampling (stride 2). The second up-sampling layer superimposes the feature map output by the first up-sampling layer (64 × 64 pixels) with the pre-pooling feature map output by the third down-sampling layer (64 × 64 pixels), then performs two 3 × 3 convolutions (ReLU) and one 2 × 2 up-sampling (stride 2). The third up-sampling layer superimposes the feature map output by the second up-sampling layer (128 × 128 pixels) with the pre-pooling feature map output by the second down-sampling layer (128 × 128 pixels), then performs two 3 × 3 convolutions (ReLU) and one 2 × 2 up-sampling (stride 2). The fourth up-sampling layer superimposes the feature map output by the third up-sampling layer (256 × 256 pixels) with the pre-pooling feature map output by the first down-sampling layer (256 × 256 pixels), then performs two 3 × 3 convolutions (ReLU) and one 1 × 1 convolution, outputting the segmentation result.
As can be seen from Fig. 2, the left encoder comprises 5 layers of convolutions and 4 poolings, and the right decoder comprises 4 layers of convolutions and 4 up-samplings; the pixel counts of the input and output images are kept consistent, and the result is then scaled back to the original image pixels in equal proportion, reversing the scaling described in S2-2.
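For illustration only (not part of the patent disclosure), the encoder-decoder of Fig. 2 could be sketched in PyTorch as follows; the channel widths 64 to 1024 follow the standard U-Net (the text confirms only the 1024-channel bottleneck), and the use of transposed convolutions for the 2 × 2 up-sampling is an assumption:

import torch
import torch.nn as nn

def double_conv(cin, cout):
    # Two 3x3 convolutions with ReLU, as used in every layer of Fig. 2
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class UNet(nn.Module):
    def __init__(self, in_ch=3, n_classes=2):  # 2 classes: bare / not bare
        super().__init__()
        self.pool = nn.MaxPool2d(2)             # 2x2 max pooling, stride 2
        self.d1 = double_conv(in_ch, 64)
        self.d2 = double_conv(64, 128)
        self.d3 = double_conv(128, 256)
        self.d4 = double_conv(256, 512)
        self.d5 = double_conv(512, 1024)        # 16x16 bottleneck, 1024 channels
        self.u1 = nn.ConvTranspose2d(1024, 512, 2, stride=2)
        self.c1 = double_conv(1024, 512)
        self.u2 = nn.ConvTranspose2d(512, 256, 2, stride=2)
        self.c2 = double_conv(512, 256)
        self.u3 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.c3 = double_conv(256, 128)
        self.u4 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.c4 = double_conv(128, 64)
        self.out = nn.Conv2d(64, n_classes, 1)  # final 1x1 convolution

    def forward(self, x):                 # x: (N, in_ch, 256, 256)
        f1 = self.d1(x)                   # pre-pooling maps kept for the skips
        f2 = self.d2(self.pool(f1))
        f3 = self.d3(self.pool(f2))
        f4 = self.d4(self.pool(f3))
        f5 = self.d5(self.pool(f4))
        y = self.c1(torch.cat([self.u1(f5), f4], dim=1))  # first up-sampling layer
        y = self.c2(torch.cat([self.u2(y), f3], dim=1))
        y = self.c3(torch.cat([self.u3(y), f2], dim=1))
        y = self.c4(torch.cat([self.u4(y), f1], dim=1))
        return self.out(y)                # (N, n_classes, 256, 256)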
S4: The first remote sensing image processed in step S1 is identified with the U-net network model trained in step S3 to obtain the bare land parcel results; subsequently, the '1'-valued data are converted into vector data through raster-to-vector conversion to obtain bare land parcel spatial data.
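A minimal inference-and-vectorization sketch for this step (illustrative only; the rasterio polygonization and all names are assumptions):

import numpy as np
import torch
from rasterio import features

def bare_land_polygons(model, image, transform):
    # image:     (H, W, bands) float array, preprocessed as in S1 and clipped
    #            to a 2^n size as in S2-2 (H and W divisible by 16)
    # transform: affine georeference used to map pixels back to ground space
    model.eval()
    with torch.no_grad():
        x = torch.from_numpy(image).float().permute(2, 0, 1).unsqueeze(0)
        mask = model(x).argmax(dim=1).squeeze(0).numpy().astype(np.uint8)
    # Raster-to-vector conversion: keep only the regions predicted as '1'
    return [geom for geom, value in features.shapes(mask, transform=transform)
            if value == 1]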
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (7)

1. A remote sensing image bare land image identification method based on a U-Net network is characterized by comprising the following steps:
s1: preprocessing the obtained original remote sensing image to obtain a first remote sensing image;
s2: carrying out pixel marking on the land parcel with the bare condition in the first remote sensing image so as to construct training set sample data;
s3: building a U-net network model, wherein the U-net network model comprises an encoder and a decoder, and then performing iterative training on the U-net network model by adopting training set sample data built in the S2;
s4: identifying the first remote sensing image by using the U-net network model trained in step S3, and outputting the bare land image.
2. The method for recognizing the bare land image of the remote sensing image based on the U-Net network as claimed in claim 1, wherein the S1 comprises the following steps:
s1-1: performing geometric fine correction and image registration on the obtained original remote sensing image to obtain a first sub remote sensing image;
s1-2: performing radiation correction on the first sub remote sensing image to obtain a radiation correction image, namely a second sub remote sensing image;
the formula for the calculation of the radiation correction is:
ρ = (π × L_λ × d²) / (ESUN_λ × cos θ) (1)
in formula (1), ρ is the reflectivity and is dimensionless; d is the Sun-Earth distance parameter and is dimensionless; ESUN_λ is the solar spectral irradiance in W·m⁻²·μm⁻¹; θ is the solar zenith angle in degrees; L_λ is the radiance received by the satellite, and its relation to the brightness DN of the original remote sensing image is as follows:
L_λ = Gain × DN + Bias (2)
in formula (2), DN is the image brightness value and L_λ represents the radiance received by the satellite in W·m⁻²·sr⁻¹·μm⁻¹; Gain is the gain in W·m⁻²·sr⁻¹·μm⁻¹, with Gain = (L_max - L_min)/255, where L_max and L_min are the maximum and minimum spectral radiance values; 255 is the maximum gray level of an 8-bit quantized image, whose range is [1,255]; Bias is the bias;
s1-3: carrying out spatial color homogenizing on the second sub remote sensing image to obtain a spatial color homogenizing remote sensing image, namely a third sub remote sensing image;
s1-4: and combining the plurality of third sub remote sensing images into one image, namely the first remote sensing image.
3. The method for recognizing the bare land image of the remote sensing image based on the U-Net network as claimed in claim 2, wherein the S1-1 comprises:
s1-1-1: aiming at a flat area, a polynomial model is adopted, control points are selected, and resampling of the image points is completed by interpolation, so that geometric fine correction and image registration are completed on the original remote sensing image to obtain a first sub remote sensing image;
the polynomial model is shown below:
x = a₀ + a₁X + a₂Y + a₃X² + a₄XY + a₅Y² + a₆X³ + a₇X²Y + a₈XY² + a₉Y³
y = b₀ + b₁X + b₂Y + b₃X² + b₄XY + b₅Y² + b₆X³ + b₇X²Y + b₈XY² + b₉Y³ (3)
in formula (3), x and y are the image plane coordinates of the image point, and X, Y are the geodetic coordinates of the ground point corresponding to the image point; aᵢ, bᵢ are polynomial coefficients, i = 0,1,2,3,4,5,6,7,8,9;
s1-1-2: aiming at areas with large topographic relief, an RPC model is adopted to associate the image coordinates P(r, c) with the ground coordinates P(X, Y, Z) and establish a ratio-polynomial mathematical relation,
the expression form is as follows:
rₙ = P₁(Xₙ, Yₙ, Zₙ) / P₂(Xₙ, Yₙ, Zₙ)
cₙ = P₃(Xₙ, Yₙ, Zₙ) / P₄(Xₙ, Yₙ, Zₙ) (4)
in formula (4), Pₙ(Xₙ, Yₙ, Zₙ) denotes a polynomial function of the normalized X-axis, Y-axis, and Z-axis coordinates;
in the calculation, the image coordinates P (r, c) and the ground coordinates P (X, Y, Z) need to be normalized by scaling and translation to obtain normalized coordinates with a value range of (-1,1), and the transformation form of formula (4) is as follows:
Xₙ = (X - X₀)/Xₛ, Yₙ = (Y - Y₀)/Yₛ, Zₙ = (Z - Z₀)/Zₛ, rₙ = (r - r₀)/rₛ, cₙ = (c - c₀)/cₛ (5)
in formula (5), X₀, Y₀, Z₀, r₀, c₀ represent the normalization translation parameters, Xₛ, Yₛ, Zₛ, rₛ, cₛ represent the normalization scale parameters, and Xₙ, Yₙ, Zₙ, rₙ, cₙ are the normalized coordinates.
4. The method for recognizing the bare land image of the remote sensing image based on the U-Net network as claimed in claim 2, wherein in S1-3, the method for spatially homogenizing color comprises:
preparing a template image whose coverage is larger than that of the second sub remote sensing image needing color homogenizing; then, by traversing the pixel value of each unit pixel in the second sub remote sensing image, matching the local geographic area information of the second sub remote sensing image with the local geographic area information of the corresponding reference region of the template image, so that the color at each position of the second sub remote sensing image approaches the color at the corresponding position of the template image.
5. The method for recognizing the bare land image of the remote sensing image based on the U-Net network as claimed in claim 1, wherein the S2 comprises the following steps:
s2-1: marking the bare land parcels in the first remote sensing image to form a sample data set by the following method:
adopting manual vectorization: plotting the judged bare parcel areas in the first remote sensing image to form bare parcel vector data, and then, through vector-to-raster data conversion, giving the plotted areas the value '1' and the non-plotted areas the value '0' to form a 0/1 binary raster map;
s2-2: clipping the first remote sensing image and the binary raster map from S2-1; the clipping method is as follows:
clipping with a regular grid, wherein the clipping follows the 2ⁿ rule for the long-edge and short-edge pixel counts, and the calculation formula for the length of the regular grid is as follows:
L=Cell×B (6)
in formula (6), L is the length of the regular grid, Cell is the image resolution, and B is the number of long-edge or short-edge pixels.
6. The method for recognizing the bare land image of the remote sensing image based on the U-Net network as claimed in claim 1, wherein in S3, the encoder comprises a first downsampling layer, a second downsampling layer, a third downsampling layer, a fourth downsampling layer and a fifth downsampling layer which are sequentially connected from top to bottom;
the first down-sampling layer performs two 3 × 3 convolutions on the input image and one 2 × 2 pooling; the second down-sampling layer performs two 3 × 3 convolutions and one 2 × 2 pooling on the feature map output by the first down-sampling layer; the third down-sampling layer performs two 3 × 3 convolutions and one 2 × 2 pooling on the feature map output by the second down-sampling layer; the fourth down-sampling layer performs two 3 × 3 convolutions and one 2 × 2 pooling on the feature map output by the third down-sampling layer; the fifth down-sampling layer performs two 3 × 3 convolutions on the feature map output by the fourth down-sampling layer.
7. The method for recognizing the bare land image of the remote sensing image based on the U-Net network as claimed in claim 1, wherein in S3, the decoder comprises a first up-sampling layer, a second up-sampling layer, a third up-sampling layer and a fourth up-sampling layer which are connected in sequence from bottom to top;
the first up-sampling layer performs one 2 × 2 up-sampling on the feature map output by the fifth down-sampling layer, then superimposes it with the pre-pooling feature map output by the fourth down-sampling layer; after the superimposition it performs two 3 × 3 convolutions and then one 2 × 2 up-sampling; the second up-sampling layer superimposes the feature map output by the first up-sampling layer with the pre-pooling feature map output by the third down-sampling layer, then performs two 3 × 3 convolutions and one 2 × 2 up-sampling; the third up-sampling layer superimposes the feature map output by the second up-sampling layer with the pre-pooling feature map output by the second down-sampling layer, then performs two 3 × 3 convolutions and one 2 × 2 up-sampling; and the fourth up-sampling layer superimposes the feature map output by the third up-sampling layer with the pre-pooling feature map output by the first down-sampling layer, then performs two 3 × 3 convolutions and one 1 × 1 convolution, outputting the segmentation result.
Application CN202211233795.1A (priority date 2022-10-10, filing date 2022-10-10): Remote sensing image bare land image identification method based on U-Net network. Publication CN115641515A, status Pending.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211233795.1A CN115641515A (en) 2022-10-10 2022-10-10 Remote sensing image bare land image identification method based on U-Net network

Publications (1)

Publication Number Publication Date
CN115641515A true CN115641515A (en) 2023-01-24

Family

ID=84942299

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116563724A (en) * 2023-04-20 2023-08-08 生态环境部卫星环境应用中心 Urban solid waste extraction method and system based on multisource high-resolution satellite remote sensing image
CN116563724B (en) * 2023-04-20 2024-05-14 生态环境部卫星环境应用中心 Urban solid waste extraction method and system based on multisource high-resolution satellite remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination