CN108764039B - Neural network, building extraction method of remote sensing image, medium and computing equipment - Google Patents


Info

Publication number
CN108764039B
CN108764039B (application number CN201810373725.3A)
Authority
CN
China
Prior art keywords
layers
scale
remote sensing
neural network
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810373725.3A
Other languages
Chinese (zh)
Other versions
CN108764039A (en)
Inventor
李祥
彭玲
胡媛
肖莎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Remote Sensing and Digital Earth of CAS
Original Assignee
Institute of Remote Sensing and Digital Earth of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Remote Sensing and Digital Earth of CAS filed Critical Institute of Remote Sensing and Digital Earth of CAS
Priority to CN201810373725.3A priority Critical patent/CN108764039B/en
Publication of CN108764039A publication Critical patent/CN108764039A/en
Application granted granted Critical
Publication of CN108764039B publication Critical patent/CN108764039B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G06V20/176 - Urban or other man-made structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a neural network, a building extraction method for remote sensing images, a medium and computing equipment. The disclosed neural network is used for building extraction from remote sensing images and comprises: an input layer, first to fifth convolutional layers, and first to fourth pooling layers from the VGG network; a first single-scale fusion layer, the input end of which is connected to the output end of the first convolutional layer; second to fifth single-scale fusion layers, the input ends of which are respectively connected to the output ends of the second to fifth convolutional layers; first to fourth upsampling layers, the input ends of which are respectively connected to the output ends of the second to fifth single-scale fusion layers; a multi-scale stitching-fusion layer, the input end of which is connected to the output ends of the first single-scale fusion layer and the first to fourth upsampling layers; and an output layer. The disclosed neural network can effectively process densely distributed buildings of various sizes and improves the precision of automatic building extraction.

Description

Neural network, building extraction method of remote sensing image, medium and computing equipment
Technical Field
The invention relates to the field of neural networks and image processing, in particular to a neural network, a building extraction method of remote sensing images, a medium and computing equipment.
Background
With the rapid development of sensor technology, the spatial resolution of remote sensing images has continuously improved. Inspired by deep learning algorithms from the field of computer vision, most scholars currently adopt convolutional neural networks for the semantic segmentation of remote sensing images. Although some state-of-the-art methods have achieved good results on this task, certain characteristics of remote sensing images are not taken into account. Firstly, in a conventional computer vision semantic segmentation task, an image to be detected generally contains only a few to dozens of targets, and the targets are loosely distributed, as shown in fig. 1(a). In remote sensing images, however, buildings are generally distributed much more densely, especially in residential areas, as shown in fig. 1(b). Secondly, in a conventional semantic segmentation task the objects to be detected are generally large, with lengths and widths of dozens to hundreds of pixels, whereas buildings in remote sensing images are generally much smaller and vary greatly in scale (the number of pixels covered by different buildings), as shown in fig. 1(c).
To ensure the accuracy of semantic segmentation, the accuracy of building (target) extraction must first be ensured. Some prior-art solutions do combine convolutional neural networks to extract specific targets from remote sensing images. For example, patent application publication No. CN107025440A discloses a method for extracting roads from remote sensing images based on a fully convolutional neural network; the disclosed solution uses the fully convolutional network to produce structured output and thereby fully mine the two-dimensional geometric structure of roads in the remote sensing image. However, the prior art contains no effective method that fully exploits a convolutional neural network to extract feature information of buildings at different scales in a remote sensing image.
Therefore, a new technical scheme needs to be provided to combine the convolutional neural network to fuse the image features under different scales, so as to effectively improve the accuracy of the automatic extraction of buildings with different scales.
Disclosure of Invention
The neural network system according to the invention is used for building extraction from remote sensing images and comprises:
an input layer, first to fifth convolutional layers, first to fourth pooling layers in the VGG network;
the input end of the first single-scale fusion layer is connected to the output end of the first convolution layer and is used for fusing the first-scale multi-channel feature map output by the first convolution layer and outputting the fused first-scale fusion single-channel feature map;
input ends of the second to fifth single-scale fusion layers are respectively connected to output ends of the second to fifth convolution layers and are used for respectively fusing second to fifth scale multichannel feature maps output by the second to fifth convolution layers and respectively outputting fused second to fifth scale fusion single-channel feature maps;
the input ends of the first to fourth up-sampling layers are respectively connected to the output ends of the second to fifth single-scale fusion layers;
the input end of the multi-scale stitching-fusion layer is connected to the output ends of the first single-scale fusion layer and the first to fourth upsampling layers, and the layer is used for fusing the feature maps output by the first single-scale fusion layer and the first to fourth upsampling layers and outputting the fused multi-scale fusion single-channel feature map;
an output layer, the input end of which is connected to the output end of the multi-scale stitching-fusion layer and which outputs the building feature map based on the multi-scale fusion single-channel feature map,
wherein the output ends of the first single-scale fusion layer, the first to fourth upsampling layers, and the multi-scale stitching-fusion layer each output a two-dimensional single-channel feature map with the same resolution as the remote sensing image.
The neural network system according to the present invention, further comprising:
and the first to fourth clipping layers are respectively arranged between the first to fourth up-sampling layers and the multi-scale splicing and fusing layer and are used for respectively clipping the feature maps output by the first to fourth up-sampling layers to the resolution ratio same as that of the original input image.
According to the neural network system of the present invention, the following layers are further included after the first to fifth convolutional layers:
the first to fifth ReLU layers, the first to fifth Batch Normalization layers and the first to fifth Dropout layers are used for avoiding over-fitting and improving the generalization capability of the neural network system.
The building extraction method for the remote sensing image comprises the following steps:
constructing a trained neural network system as described above;
and acquiring a building characteristic diagram corresponding to the remote sensing image by using the trained neural network system.
According to the building extraction method for remote sensing images, before the step of constructing the trained neural network system, the method further comprises the following steps:
and training the neural network system by using a data set comprising the remote sensing training image of the building and the label image corresponding to the remote sensing training image to obtain the trained neural network system.
According to the building extraction method for the remote sensing image, after the building characteristic map corresponding to the remote sensing image is obtained, the final building distribution map is obtained by using a threshold value method.
According to the building extraction method for remote sensing images, a Sigmoid Cross Entropy Loss function and a stochastic gradient descent algorithm are used when training the neural network system.
A computer-readable storage medium according to the invention, having stored thereon a computer program, which when executed by a processor, carries out the steps of the method as described above.
The computing device according to the invention comprises a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as described above when executing the program.
According to the technical scheme of the invention, the multi-scale information in the deep convolutional neural network is directly utilized, so that the building with dense distribution and various scales can be effectively processed, and the precision of automatic extraction of the building is improved. In addition, according to the above technical solution of the present invention, the whole image is used as an input, and the segmentation (i.e., building extraction) result is directly output without performing overlapped slices, thereby greatly improving the efficiency of building extraction.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention. In the drawings, like reference numerals are used to indicate like elements. The drawings in the following description are directed to some, but not all embodiments of the invention. For a person skilled in the art, other figures can be derived from these figures without inventive effort.
Fig. 1 shows a schematic diagram of a conventional image to be detected and a remote sensing image to be detected according to the present invention.
Fig. 2 schematically shows a schematic block diagram of a neural network system according to the present invention.
Fig. 3 schematically shows a flow chart of a building extraction method for remote sensing images according to the invention.
FIG. 4 schematically illustrates different image maps output by the various layers of the neural network system shown in FIG. 2.
Fig. 5 exemplarily shows an original satellite remote sensing image, a corresponding real label image thereof, and a building feature diagram actually output according to the technical solution of the present invention.
Fig. 6 shows exemplary accuracy vs. recall curves for different relaxation coefficients according to the solution of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
Fig. 1 shows a schematic diagram of a conventional image to be detected and a remote sensing image to be detected according to the present invention.
As described in the background art with reference to fig. 1, because the remote sensing image and the conventional image have the above difference, a new technical solution needs to be provided to combine the convolutional neural network to fuse the image features at different scales, so as to effectively improve the accuracy of the automatic extraction of buildings at different scales.
Fig. 2 schematically shows a schematic block diagram of a neural network system according to the present invention.
As shown in fig. 2, the neural network system according to the present invention, for building extraction of a remote sensing image, includes:
input layer (corresponding to "input image" in fig. 2), first to fifth convolutional layers (corresponding to the layer sets containing "Conv1_2", "Conv2_2", "Conv3_3", "Conv4_3" and "Conv5_3" in fig. 2, respectively), and first to fourth pooling layers (corresponding to the "Pool1", "Pool2", "Pool3" and "Pool4" layers in fig. 2, respectively) from the VGG network;
a first single-scale fusion layer (corresponding to the 1 st "Conv" on the left side of the horizontal distribution in fig. 2), an input end of the first single-scale fusion layer being connected to an output end of the first convolution layer, for fusing the first-scale multi-channel feature map output by the first convolution layer and outputting a fused first-scale fused single-channel feature map;
second to fifth single-scale fusion layers (corresponding to 2 nd to 5 th "Conv" horizontally distributed in fig. 2, respectively), the input ends of which are connected to the output ends of the second to fifth convolution layers, respectively, for fusing the second to fifth scale multi-channel feature maps output by the second to fifth convolution layers, respectively, and outputting the fused second to fifth scale fusion single-channel feature maps, respectively;
first to fourth Upsampling layers (corresponding to "2 × Upsampling", "4 × Upsampling", "8 × Upsampling", and "16 × Upsampling" in fig. 2, respectively), input ends of the first to fourth Upsampling layers being connected to output ends of the second to fifth single-scale fusion layers, respectively;
a multi-scale splicing and fusing layer (corresponding to the "Concat" layer in fig. 2), wherein an input end of the multi-scale splicing and fusing layer is connected to output ends of the first single-scale fusing layer and the first to fourth upsampling layers, and is used for fusing the feature maps output by the first single-scale fusing layer and the first to fourth upsampling layers and outputting a fused multi-scale fusion single-channel feature map;
an output layer (corresponding to "Conv" above "P" in fig. 2), an input end of which is connected to an output end of the multi-scale stitch-fusion layer, for outputting a building feature map (corresponding to "P" in fig. 2) based on the multi-scale fusion single-channel feature map,
and the output ends of the first single-scale fusion layer, the first to fourth upsampling layers, and the multi-scale stitching-fusion layer each output a two-dimensional single-channel feature map with the same resolution as the remote sensing image.
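The first to fourth upsampling layers restore the coarser fused maps toward the input resolution before stitching. As an illustration only, a nearest-neighbour upsampling can be sketched in NumPy as follows; the function name and the choice of nearest-neighbour repetition are assumptions, since fig. 2 specifies only the factors 2x, 4x, 8x and 16x, not the interpolation scheme:

```python
import numpy as np

def upsample_nearest(fmap, factor):
    """Enlarge a (C, H, W) feature map by an integer factor using
    nearest-neighbour repetition along the two spatial axes."""
    return fmap.repeat(factor, axis=-2).repeat(factor, axis=-1)

coarse = np.arange(4.0).reshape(1, 2, 2)   # a tiny 2x2 single-channel map
fine = upsample_nearest(coarse, 2)          # -> shape (1, 4, 4)
```

In the actual network a bilinear or learned upsampling would behave analogously, only with smoother interpolated values.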
In the above technical solution, although the number of channels C of the input feature maps of the first to fifth single-scale fusion layers differs (64, 128, 256, 512 and 512, respectively, as shown in fig. 2), each layer uses a 1 × 1 convolution kernel of dimension 1 × 1 × C × 1 (with C = 64, 128, 256, 512 and 512, respectively) to fuse all input feature maps at its scale, so that each finally outputs a single-channel feature map (the first to fifth scale fusion single-channel feature maps, which are single-channel feature maps of 5 different resolutions: 256², 128², 64², 32² and 16², respectively).
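The per-scale fusion just described is a 1 × 1 convolution that collapses C channels into one: each output pixel is a weighted sum over the C channel values at that location. A minimal NumPy sketch (with illustrative, untrained weights) is:

```python
import numpy as np

def single_scale_fuse(fmap, weights, bias=0.0):
    """Fuse a (C, H, W) multi-channel feature map into one channel with a
    1x1 convolution; tensordot over the channel axis is exactly that
    per-pixel weighted sum."""
    return np.tensordot(weights, fmap, axes=([0], [0])) + bias

fmap = np.ones((64, 8, 8))            # e.g. the first-scale 64-channel map
w = np.full(64, 1.0 / 64)             # hypothetical averaging weights
fused = single_scale_fuse(fmap, w)    # -> single-channel (8, 8) map
```

In training, the weights and bias would be learned rather than fixed to an average as here.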
For the multi-scale stitching-fusion layer, the fusion is analogous to that of the first to fifth single-scale fusion layers, except that the number of input channels C is 5 (the feature map output by the first single-scale fusion layer plus the 4 feature maps output by the first to fourth upsampling layers, all with the same resolution as the original remote sensing image). These 5 feature maps are therefore stitched into one 5-channel map, and a single-channel prediction map (the multi-scale fusion single-channel feature map) is obtained with a 1 × 1 convolution kernel having 5 input channels and 1 output channel.
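The stitch-and-fuse step can be sketched the same way: stack the 5 same-resolution single-channel maps ("Concat") and collapse them with a 1 × 1 convolution over the 5 channels. The weights below are illustrative placeholders, not the trained values:

```python
import numpy as np

def multi_scale_fuse(maps, weights, bias=0.0):
    """Stack 5 same-resolution single-channel maps into a (5, H, W)
    tensor and collapse them with a 1x1 convolution over the 5 channels,
    yielding the single-channel prediction map."""
    stacked = np.stack(maps, axis=0)                   # (5, H, W)
    return np.tensordot(weights, stacked, axes=([0], [0])) + bias

H = W = 16
maps = [np.full((H, W), v) for v in (0.1, 0.2, 0.3, 0.4, 0.5)]
w = np.ones(5) / 5                                     # hypothetical weights
pred = multi_scale_fuse(maps, w)                       # -> (16, 16)
```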
Although the solution shown in fig. 2 does not require clipping of the first to fourth upsampling layers, optionally, the neural network system may further include:
the first to fourth clipping layers (respectively corresponding to "Crop" in the layer set where "P2" to "P5" in fig. 2) are respectively disposed between the first to fourth upsampling layers and the multi-scale stitching fusion layer, and are used for respectively clipping the feature maps output by the first to fourth upsampling layers to the same resolution as the original input image. And automatically adapting to the condition that the resolution of the remote sensing image is inconsistent with the resolution of the output feature maps of the first to fourth upsampling layers.
Optionally, after the first to fifth convolutional layers, the neural network system further includes the following layers (not shown in fig. 2):
the first to fifth ReLU layers, the first to fifth Batch Normalization layers and the first to fifth Dropout layers are used for avoiding over-fitting and improving the generalization capability of the neural network system.
The shallow layers of the network shown in fig. 2 generate feature maps with fine spatial resolution but low-level semantic information; the deep layers generate coarse feature maps with high-level semantic information; and the middle layers produce feature maps corresponding to intermediate-level features. The present technical solution integrates these different feature maps, so that buildings with different appearances, or partially occluded buildings, can be effectively extracted.
Fig. 3 schematically shows a flow chart of a building extraction method for remote sensing images according to the invention.
The building extraction method for the remote sensing image comprises the following steps:
step S302: constructing a trained neural network system as described above;
step S304: the trained neural network system is used to obtain the building probability map (the building feature map, i.e. the building extraction prediction map "P" described above) corresponding to the remote sensing image (the "input image" in fig. 2).
Optionally, the building extraction method for remote sensing images further includes, before step S302:
step S302': the neural network system is trained using a data set comprising remote sensing training images of buildings (the "input image" in fig. 2) and the label images corresponding to those training images, to obtain the trained neural network system.
Optionally, in step S304 and step S302', the final building extraction result is obtained from the building feature map using a threshold method.
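Such a threshold method can be as simple as the following sketch; the cut-off value 0.5 is an illustrative assumption, since the patent does not fix a threshold:

```python
import numpy as np

def threshold_buildings(prob_map, t=0.5):
    """Binarise the building probability map: pixels with probability >= t
    are labelled building (1), the rest background (0)."""
    return (prob_map >= t).astype(np.uint8)

prob = np.array([[0.1, 0.8],
                 [0.6, 0.4]])
binary = threshold_buildings(prob)   # -> [[0, 1], [1, 0]]
```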
Optionally, in step S302', a Sigmoid Cross Entropy Loss function (the calculation corresponding to "Loss" in fig. 2) and a stochastic gradient descent algorithm are used when training the neural network system.
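The sigmoid cross entropy loss and a single stochastic-gradient-descent update can be sketched as follows. The mean reduction and the learning rate are assumptions; the patent names the loss and the optimizer but not these details:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_cross_entropy(logits, labels):
    """Per-pixel binary cross entropy on sigmoid outputs, averaged over
    the image (mean reduction assumed; clipped for numerical safety)."""
    p = np.clip(sigmoid(logits), 1e-12, 1.0 - 1e-12)
    return -np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

def sgd_step(params, grad, lr=0.01):
    """One stochastic-gradient-descent update (learning rate illustrative)."""
    return params - lr * grad

logits = np.array([[2.0, -2.0], [0.0, 3.0]])
labels = np.array([[1.0, 0.0], [1.0, 1.0]])
loss = sigmoid_cross_entropy(logits, labels)
# Gradient of the mean-reduced loss w.r.t. the logits: (sigmoid(z) - y) / N.
grad = (sigmoid(logits) - labels) / logits.size
# Illustrative update (in practice SGD updates the network weights).
new_logits = sgd_step(logits, grad)
```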
So that those skilled in the art may better understand the technical advantages of the present invention, specific embodiments are described below.
FIG. 4 schematically illustrates different image maps output by the various layers of the neural network system shown in FIG. 2.
As shown in fig. 4, fig. 4(a) is an original satellite remote sensing image (the "input image" in fig. 2, at the first scale) selected from the Massachusetts remote sensing data set. Fig. 4(b) is the feature map ("P2" in fig. 2) obtained by interpolating the second-scale feature map, which has a small receptive field; from it, low-level features such as edges and corners of the original image can be extracted. Fig. 4(c) is the interpolated feature map ("P3" in fig. 2) of the third-scale feature map, which has a larger receptive field and can delineate a preliminary outline of a building. Fig. 4(d) is the feature map ("P4" in fig. 2) obtained by interpolating the fourth-scale feature map, which has a still larger receptive field; from it, non-building regions such as lakes can be identified. Fig. 4(e) is the feature map ("P5" in fig. 2) obtained by interpolating the fifth-scale feature map, which has the largest receptive field; from it, non-building areas such as lakes and bare land can be identified. Finally, integrating the semantic and spatial information of these multiple levels yields a reliable prediction (the multi-scale fusion single-channel feature map, i.e. "P" in fig. 2), as shown in fig. 4(f).
Fig. 5 exemplarily shows an original satellite remote sensing image, a corresponding real label image thereof, and a building feature diagram actually output according to the technical solution of the present invention.
As shown in fig. 5, fig. 5(a) is an original satellite remote sensing image selected from Massachusetts remote sensing data sets, fig. 5(b) is a real label image thereof, and fig. 5(c) is a predicted label image (i.e., a building feature image actually output according to the technical solution of the present invention). According to the technical scheme, the building distribution condition can be well predicted, and the building boundary is accurate.
Fig. 6 shows exemplary accuracy vs. recall curves for different relaxation coefficients according to the solution of the invention.
The accuracy (precision) is defined as the proportion of detected pixels that lie within ρ pixels of a true pixel, and the recall is defined as the proportion of true pixels that lie within ρ pixels of a detected pixel. Fig. 6(a) is the accuracy-recall curve of the present invention at ρ = 3, corresponding to a model accuracy of about 0.9668 (the breakeven point, where accuracy and recall are equal, marked with the symbol x in fig. 6(a)). Fig. 6(b) is the accuracy-recall curve at ρ = 0, corresponding to a model accuracy of about 0.8424 (the breakeven point marked with the symbol x in fig. 6(b)).
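The relaxed precision and recall defined above can be computed by dilating each binary mask by ρ pixels before intersecting, as in this sketch (a square neighbourhood is assumed; ρ = 0 reduces to the ordinary per-pixel metrics):

```python
import numpy as np

def dilate(mask, rho):
    """Binary dilation of a 2-D mask with a (2*rho+1)-square neighbourhood."""
    h, w = mask.shape
    padded = np.pad(mask, rho)
    out = np.zeros_like(mask)
    for dy in range(2 * rho + 1):
        for dx in range(2 * rho + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def relaxed_precision_recall(pred, truth, rho):
    """A detected pixel counts as correct if any true pixel lies within
    rho pixels of it, and vice versa for recall."""
    precision = (pred & dilate(truth, rho)).sum() / max(pred.sum(), 1)
    recall = (truth & dilate(pred, rho)).sum() / max(truth.sum(), 1)
    return precision, recall

truth = np.zeros((8, 8), dtype=bool)
truth[3, 3] = True
pred = np.zeros((8, 8), dtype=bool)
pred[3, 5] = True                    # a detection 2 pixels off target
```

With this offset detection, both metrics are 0 at ρ = 0 but 1 at ρ = 3, illustrating why the breakeven score rises with the relaxation coefficient.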
Table 1 compares building extraction performance between the technical scheme of the present invention and several other schemes: the Mnih-CNN and Mnih-CNN+CRF schemes disclosed in Mnih's doctoral thesis "Machine learning for aerial image labeling" (2013), and the Saito-multi-MA and Saito-multi-MA&CIS schemes disclosed in Saito's "Multiple object extraction from aerial imagery with convolutional neural networks".
TABLE 1 comparison of Performance between different technical solutions
Model Breakeven (ρ=3) Breakeven (ρ=0) Prediction time (s)
Mnih-CNN 0.9271 0.7661 8.7
Mnih-CNN+CRF 0.9282 0.7638 26.6
Saito-multi-MA 0.9503 0.7873 67.72
Saito-multi-MA&CIS 0.9509 0.7872 67.84
Technical scheme of the invention 0.9668 0.8424 2.05
Note: the prediction time is the average time required to predict a single 1500 × 1500 test image; the graphics card used is an NVIDIA TITAN X.
As can be seen from the results in Table 1, the above solution according to the present invention achieves better results both in model accuracy under different relaxation coefficients (ρ = 3 and ρ = 0) and in prediction time: it not only significantly improves extraction precision but also reduces running time.
In connection with the above technical solution according to the present invention, a computer-readable storage medium is also provided, on which a computer program is stored, which when executed by a processor implements the steps of the method shown in fig. 3.
In combination with the above technical solution according to the present invention, a computing device is further provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the steps of the method shown in fig. 3 are implemented.
According to the technical scheme of the invention, the VGG network is used as the basic structure; the last layer at each feature-map resolution in the network is extracted, and a convolution operation fuses each set of feature maps into a single-channel feature map. The final prediction result is then obtained through upsampling and feature-map stitching.
According to the technical scheme of the invention, the multi-scale information in the deep convolutional neural network is directly utilized, so that the building with dense distribution and various scales can be effectively processed, and the precision of automatic extraction of the building is improved. In addition, according to the above technical solution of the present invention, the whole image is used as an input, and the segmentation (i.e., building extraction) result is directly output without performing overlapped slices, thereby greatly improving the efficiency of building extraction.
According to the technical scheme of the invention, the method also has the following advantages: 1) feature maps at multiple resolutions can be fused, so multi-scale information is extracted from the input image and buildings are extracted accurately; 2) since no model ensembling is required during prediction (i.e., extraction) and no post-processing is needed, building extraction efficiency is greatly improved; 3) since a fully convolutional network is used, an input image of arbitrary size can be accepted, as GPU memory permits.
In addition, according to the technical scheme of the invention, the whole image is directly used as input, and the segmentation (namely, building extraction) result can be obtained through one-time network forward propagation, so that model integration in a mode of overlapping slices is not required, post-processing operation is not required, and the building extraction efficiency is greatly improved. The result of the comparison test based on the Massachusetts remote sensing data set shows that the technical scheme of the invention is obviously superior to other methods in precision and efficiency.
The above-described aspects may be implemented individually or in various combinations, and such variations are within the scope of the present invention.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media, as known to those skilled in the art.
Finally, it should be noted that: the above examples are only for illustrating the technical solutions of the present invention, and are not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A neural network system for building extraction from remote sensing images, comprising:
an input layer, and the first to fifth convolutional layers and first to fourth pooling layers of a VGG network;
a first single-scale fusion layer, the input end of which is connected to the output end of the first convolutional layer, for fusing the first-scale multi-channel feature map output by the first convolutional layer and outputting a fused first-scale single-channel feature map;
second to fifth single-scale fusion layers, the input ends of which are connected to the output ends of the second to fifth convolutional layers, respectively, for fusing the second- to fifth-scale multi-channel feature maps output by the second to fifth convolutional layers and outputting the corresponding fused second- to fifth-scale single-channel feature maps;
first to fourth upsampling layers, the input ends of which are connected to the output ends of the second to fifth single-scale fusion layers, respectively;
a multi-scale splicing and fusion layer, the input ends of which are connected to the output ends of the first single-scale fusion layer and the first to fourth upsampling layers, for fusing the feature maps output by the first single-scale fusion layer and the first to fourth upsampling layers and outputting a fused multi-scale single-channel feature map;
and an output layer, the input of which is connected to the output of the multi-scale splicing and fusion layer, for outputting a building feature map based on the multi-scale fused single-channel feature map,
wherein the first single-scale fusion layer, the first to fourth upsampling layers, and the multi-scale splicing and fusion layer each output a two-dimensional single-channel feature map with the same resolution as the remote sensing image.
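For illustration only (not part of the claims), the data flow of claim 1 can be sketched with toy numpy arrays: each single-scale fusion layer collapses a multi-channel feature map to one channel (a 1x1 convolution), the upsampling layers restore the original resolution, and the splicing-and-fusion layer combines the five single-channel maps before the output layer. The channel count, the 16x16 image size, nearest-neighbour upsampling, and the fusion weights are all assumptions of this sketch, not specifics taken from the patent.

```python
import numpy as np

def single_scale_fuse(feat, w, b=0.0):
    # 1x1 convolution: collapse a (C, H, W) multi-channel map to an (H, W) single-channel map.
    return np.tensordot(w, feat, axes=([0], [0])) + b

def upsample_nearest(img, factor):
    # Nearest-neighbour upsampling of a 2-D map back to the input resolution (assumed mode).
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Toy multi-channel feature maps at five scales for a 16x16 "remote sensing image";
# each scale halves the resolution, as the VGG pooling layers would.
feats = [rng.standard_normal((8, 16 // 2**s, 16 // 2**s)) for s in range(5)]

# Single-scale fusion: one single-channel map per scale.
fused = [single_scale_fuse(f, rng.standard_normal(8)) for f in feats]

# Upsample scales 2-5 back to the original resolution (scale 1 already matches it).
full_res = [fused[0]] + [upsample_nearest(fused[s], 2**s) for s in range(1, 5)]

# Multi-scale splicing and fusion: stack the five maps, fuse them with another 1x1
# convolution, and let a sigmoid output layer produce the building probability map.
prob = sigmoid(np.tensordot(np.full(5, 0.2), np.stack(full_res), axes=([0], [0])))
```

All intermediate single-channel maps and the final probability map come out at the input resolution, which is exactly the property claim 1 states for the fusion and upsampling layers.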
2. The neural network system of claim 1, further comprising:
first to fourth cropping layers, arranged between the first to fourth upsampling layers and the multi-scale splicing and fusion layer, respectively, for cropping the feature maps output by the first to fourth upsampling layers to the same resolution as the original input image.
3. The neural network system of claim 1 or 2, further comprising, after each of the first to fifth convolutional layers:
first to fifth ReLU layers, first to fifth Batch Normalization layers, and first to fifth Dropout layers, for avoiding overfitting and improving the generalization capability of the neural network system.
4. A building extraction method for remote sensing images is characterized by comprising the following steps:
constructing a trained neural network system according to any one of claims 1 to 3;
and acquiring a building feature map corresponding to the remote sensing image by using the trained neural network system.
5. The building extraction method for remote sensing images according to claim 4, further comprising, before the step of constructing the trained neural network system according to any one of claims 1 to 3:
training the neural network system with a data set comprising remote sensing training images of buildings and their corresponding label images, to obtain the trained neural network system.
6. The building extraction method for remote sensing images according to claim 4 or 5, characterized in that, after the building feature map corresponding to the remote sensing image is obtained, a final building distribution map is obtained by using a threshold method.
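As an illustrative sketch of the threshold method in claim 6 (the 0.5 cut-off and the toy probability values are assumptions; the claim does not fix a threshold): pixels of the building feature map at or above the threshold are marked as building, the rest as background.

```python
import numpy as np

# Hypothetical building probability map produced by the network's sigmoid output layer.
prob = np.array([[0.10, 0.72],
                 [0.91, 0.43]])

threshold = 0.5  # assumed cut-off value
building_map = (prob >= threshold).astype(np.uint8)  # 1 = building, 0 = background
```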
7. The building extraction method for remote sensing images according to claim 5, characterized in that the neural network system is trained using a Sigmoid Cross Entropy Loss function and a stochastic gradient descent algorithm.
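The loss and optimizer named in claim 7 can be sketched as follows; the single-weight toy model, the data values, and the learning rate are assumptions for illustration, not details from the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_cross_entropy(logits, labels):
    # Numerically stable mean sigmoid cross-entropy:
    # max(z, 0) - z*y + log(1 + exp(-|z|))
    return float(np.mean(np.maximum(logits, 0) - logits * labels
                         + np.log1p(np.exp(-np.abs(logits)))))

# Toy model with a single weight: logit z = w * x, one gradient descent step.
x = np.array([1.0, -2.0, 3.0])
y = np.array([1.0, 0.0, 1.0])
w, lr = 0.0, 0.1

loss_before = sigmoid_cross_entropy(w * x, y)    # equals log(2) at w = 0
grad = np.mean((sigmoid(w * x) - y) * x)         # dL/dw of the sigmoid cross-entropy
w -= lr * grad                                   # stochastic gradient descent update
loss_after = sigmoid_cross_entropy(w * x, y)     # smaller than loss_before
```

One step moves the weight against the gradient and lowers the loss, which is the whole training loop of claim 7 in miniature.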
8. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 4 to 7.
9. A computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 4 to 7 when executing the program.
CN201810373725.3A 2018-04-24 2018-04-24 Neural network, building extraction method of remote sensing image, medium and computing equipment Active CN108764039B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810373725.3A CN108764039B (en) 2018-04-24 2018-04-24 Neural network, building extraction method of remote sensing image, medium and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810373725.3A CN108764039B (en) 2018-04-24 2018-04-24 Neural network, building extraction method of remote sensing image, medium and computing equipment

Publications (2)

Publication Number Publication Date
CN108764039A (en) 2018-11-06
CN108764039B (en) 2020-12-01

Family

ID=64011327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810373725.3A Active CN108764039B (en) 2018-04-24 2018-04-24 Neural network, building extraction method of remote sensing image, medium and computing equipment

Country Status (1)

Country Link
CN (1) CN108764039B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109859167A * 2018-12-28 2019-06-07 China Agricultural University Method and device for assessing the severity of cucumber downy mildew
CN109753928B (en) * 2019-01-03 2022-03-29 北京百度网讯科技有限公司 Method and device for identifying illegal buildings
CN109871798B (en) * 2019-02-01 2021-06-29 浙江大学 Remote sensing image building extraction method based on convolutional neural network
CN109934110B (en) * 2019-02-02 2021-01-12 广州中科云图智能科技有限公司 Method for identifying illegal buildings near river channel
CN110163207B (en) * 2019-05-20 2022-03-11 福建船政交通职业学院 Ship target positioning method based on Mask-RCNN and storage device
CN110263797B (en) * 2019-06-21 2022-07-12 北京字节跳动网络技术有限公司 Method, device and equipment for estimating key points of skeleton and readable storage medium
CN110991252B (en) * 2019-11-07 2023-07-21 郑州大学 Detection method for people group distribution and counting in unbalanced scene
CN113486840B (en) * 2021-07-21 2022-08-30 武昌理工学院 Building rapid extraction method based on composite network correction
CN116052019B (en) * 2023-03-31 2023-07-25 深圳市规划和自然资源数据管理中心(深圳市空间地理信息中心) High-quality detection method suitable for built-up area of large-area high-resolution satellite image

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056628A * 2016-05-30 2016-10-26 中国科学院计算技术研究所 Target tracking method and system based on deep convolutional neural network feature fusion
CN106886977A * 2017-02-08 2017-06-23 徐州工程学院 Multi-image auto-registration and fusion stitching method
CN107092870A * 2017-04-05 2017-08-25 武汉大学 High-resolution image semantic information extraction method and system
CN107092871A * 2017-04-06 2017-08-25 重庆市地理信息中心 Remote sensing image building detection method based on multi-scale multi-feature fusion
CN107123083A * 2017-05-02 2017-09-01 中国科学技术大学 Face editing method
CN107169974A * 2017-05-26 2017-09-15 中国科学技术大学 Image segmentation method based on multi-supervised fully convolutional neural networks
CN107220657A * 2017-05-10 2017-09-29 中国地质大学(武汉) Method for high-resolution remote sensing image scene classification oriented to small data sets

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10489703B2 (en) * 2015-05-20 2019-11-26 Nec Corporation Memory efficiency for convolutional neural networks operating on graphics processing units
US11055537B2 (en) * 2016-04-26 2021-07-06 Disney Enterprises, Inc. Systems and methods for determining actions depicted in media contents based on attention weights of media content frames


Also Published As

Publication number Publication date
CN108764039A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN108764039B (en) Neural network, building extraction method of remote sensing image, medium and computing equipment
US11151725B2 (en) Image salient object segmentation method and apparatus based on reciprocal attention between foreground and background
CN108664981B (en) Salient image extraction method and device
CN108876792B (en) Semantic segmentation method, device and system and storage medium
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN112016614B (en) Construction method of optical image target detection model, target detection method and device
CN110781756A (en) Urban road extraction method and device based on remote sensing image
CN109003297B (en) Monocular depth estimation method, device, terminal and storage medium
CN111369581A (en) Image processing method, device, equipment and storage medium
CN110838125A (en) Target detection method, device, equipment and storage medium of medical image
CN111275034B (en) Method, device, equipment and storage medium for extracting text region from image
CN107506792B (en) Semi-supervised salient object detection method
CN111291825A (en) Focus classification model training method and device, computer equipment and storage medium
CN112307853A (en) Detection method of aerial image, storage medium and electronic device
CN113689373B (en) Image processing method, device, equipment and computer readable storage medium
CN110335228B (en) Method, device and system for determining image parallax
CN112132867B (en) Remote sensing image change detection method and device
CN113516697A (en) Image registration method and device, electronic equipment and computer-readable storage medium
CN116994000A (en) Part edge feature extraction method and device, electronic equipment and storage medium
CN116798041A (en) Image recognition method and device and electronic equipment
CN112085652A (en) Image processing method and device, computer storage medium and terminal
CN110570376A (en) image rain removing method, device, equipment and computer readable storage medium
CN112651351B (en) Data processing method and device
CN115512405A (en) Background image processing method and device, electronic equipment and storage medium
CN109961083A Method and image processing entity for applying convolutional neural networks to an image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant