CN113361662B - Urban rail transit remote sensing image data processing system and method - Google Patents

Urban rail transit remote sensing image data processing system and method

Info

Publication number
CN113361662B
CN113361662B (application number CN202110831395.XA)
Authority
CN
China
Prior art keywords
frame
feature
prediction
unit
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110831395.XA
Other languages
Chinese (zh)
Other versions
CN113361662A (en)
Inventor
张开婷
李俊
周立荣
蔺陆洲
贾蔡
祝宏
邓平科
杨军
马长斗
张迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Quantutong Position Network Co ltd
Original Assignee
Quantutong Position Network Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Quantutong Position Network Co ltd filed Critical Quantutong Position Network Co ltd
Priority to CN202110831395.XA priority Critical patent/CN113361662B/en
Publication of CN113361662A publication Critical patent/CN113361662A/en
Application granted granted Critical
Publication of CN113361662B publication Critical patent/CN113361662B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A30/00Adapting or protecting infrastructure or their operation
    • Y02A30/60Planning or developing urban green infrastructure

Abstract

The invention relates to the technical field of urban rail transit remote sensing image data processing, and in particular to a system and method for processing urban rail transit remote sensing image data. The system comprises a remote sensing image feature extraction module, a region recommendation module and an object prediction semantic segmentation module. By fusing a convolutional neural network with a semantic segmentation algorithm, the invention realizes objectified extraction of buildings and thereby addresses the low accuracy and low speed of existing deep-learning-based building extraction from remote sensing images.

Description

Urban rail transit remote sensing image data processing system and method
Technical Field
The invention relates to the technical field of urban rail transit remote sensing image data processing, in particular to a system and a method for processing urban rail transit remote sensing image data.
Background
Existing methods for extracting buildings from remote sensing images with deep learning mainly use convolutional neural networks (CNNs). A CNN is an improvement on the fully connected neural network: instead of feeding raw image pixels to the input nodes, it feeds features obtained by image convolution and pooling, which reduces the number of input nodes and the overall network size and makes the architecture well suited to processing two-dimensional image data.
The existing approach of extracting buildings from remote sensing images with a convolutional neural network is mainly divided into two stages. In the first stage the model is trained: typical buildings are cropped from remote sensing images as training data, convolutional, pooling and fully connected layers are designed, and the network parameters are trained with the training data through forward propagation with error calculation and backward propagation with regression convergence, so that the model learns building characteristics. In the second stage data are predicted: a remote sensing image is input, a window of the designed size slides over it, the image inside the sliding window is fed to the convolutional neural network, and forward propagation predicts whether a building is present; detected buildings are marked on the original remote sensing image, thereby realizing building identification.
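As a rough illustration of this sliding-window prediction stage, the following Python sketch (written with PyTorch purely for illustration) scans an image with a fixed-size window and records the windows whose predicted building score passes a threshold; the window size, stride, threshold and the trained classifier are placeholder assumptions, not values taken from this disclosure.

# Illustrative sliding-window prediction over a remote sensing image.
# "classifier" stands for any trained CNN that maps one window to a building
# probability; the 64-pixel window, 32-pixel stride and 0.5 threshold are
# assumptions used only to make the sketch concrete.
import torch

def sliding_window_predict(image, classifier, window=64, stride=32, threshold=0.5):
    """image: float tensor of shape (3, H, W); returns a list of (x, y, w, h) hits."""
    _, height, width = image.shape
    hits = []
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            patch = image[:, top:top + window, left:left + window]
            score = classifier(patch.unsqueeze(0))      # forward propagation
            if float(score) > threshold:                # window predicted to contain a building
                hits.append((left, top, window, window))
    return hits

Each hit is then marked on the original remote sensing image. Note that every window requires its own forward pass, which is the kind of repeated convolution that the shared-convolution design described later in this disclosure avoids.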
The prior-art method closest to the present invention is the R-CNN algorithm (Region-CNN), which is divided into four modules. The region proposal module (selective search) generates recommended regions of interest from the information in the image, proposing roughly 1000-2000 potential building region boxes per picture. The feature extraction module extracts features from each potential building region image with the classic convolutional neural network AlexNet. The extracted features are then sent to a linear classifier: a support vector machine (SVM) classifies the high-dimensional features extracted by the convolutional neural network, scores the features of each potential building region, and filters them with a threshold (0.5). Building regions that meet the requirement are sent to the bounding-box correction regression module, which regresses four sets of bounding-box parameters to locate the rectangular outer frame of the building precisely. The R-CNN model is trained in the same way as other convolutional neural networks: a building training data set is constructed and the network parameters are adjusted according to the forward-propagation error.
However, the R-CNN algorithm has the following disadvantages:
1. R-CNN only accepts single-scale remote sensing pictures as input; because remote sensing data differ in resolution and image quality, the existing method cannot compare the influence of multi-source, multi-size remote sensing images on the model;
2. R-CNN can extract buildings from remote sensing images with relatively high precision, but the preselected frames are produced by the slow selective search algorithm and the convolution network is computed repeatedly, so model training is slow and memory consumption is large;
3. the four R-CNN modules are independent of one another and connected in series, so they cannot run in parallel and computing resources are difficult to utilize fully;
4. because the classifier is a support vector machine, a classifier must be trained for each object class to be detected, which makes the training process complex;
5. instance segmentation is not fused in, so R-CNN can only extract the outer frame of a building and cannot complete instance segmentation of building objects.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a system and method for processing urban rail transit remote sensing image data that fuse a convolutional neural network with a semantic segmentation algorithm to realize objectified building extraction, thereby addressing the low accuracy and low speed of existing deep-learning-based building extraction from remote sensing images.
The technical scheme adopted by the invention is as follows:
a processing system of urban rail transit remote sensing image data, which comprises a remote sensing image feature extraction module, a region recommendation module and an object prediction semantic segmentation module,
the remote sensing image feature extraction module is used for extracting trunk features of the urban rail transit remote sensing image and constructing a feature pyramid;
the region recommendation module is used for extracting the features of the feature pyramid with shared convolution to generate suggestion frames;
the object prediction semantic segmentation module is used for generating a prediction frame after convolving the local features intercepted by the suggestion frame, and for intercepting the local features of the prediction frame from the shared feature map according to the prediction frame to generate an object mask.
The remote sensing image feature extraction module comprises a trunk feature extraction unit, a feature pyramid construction unit and a feature pyramid unit, wherein,
the trunk feature extraction unit is used for extracting features of different levels of the remote sensing image with a multi-layer residual convolutional neural network, generating four levels of feature maps, each with 256 channels;
the feature pyramid construction unit is used for starting convolution and up-sampling from the feature map of the lowest dimension and superposing it on the feature map one level higher in dimension, constructing five layers of feature maps that form the shared feature pyramid unit.
The region recommendation module comprises a region recommendation convolutional network unit, an object prediction unit, a frame adjustment unit and a suggestion frame generation unit, wherein,
the region recommendation convolutional network unit is used for extracting the features of the feature pyramid with one layer of shared convolution;
the object prediction unit is used for predicting, with two dedicated convolutions, whether the sliding window at each feature point contains an object and the adjustment parameters relative to the sliding window, and for filtering the predicted feature points by a threshold on whether their sliding windows contain an object; a sliding window that passes the threshold is a preselected frame;
the frame adjustment unit adjusts the preselected frame window with the frame adjustment parameters, so that the suggestion frame generation unit generates the suggestion frames.
The object prediction semantic segmentation module comprises a feature interception unit, an object classification prediction unit, a frame adjustment prediction unit, a mask feature extraction unit and a mask prediction unit, wherein,
the feature interception unit is used for interpolating the local features intercepted by the suggestion frame into 7×7 feature maps through bilinear interpolation;
the object classification prediction unit is used for predicting, after two layers of convolution, the classification result and the frame adjustment parameters with two separate fully connected networks;
the frame adjustment prediction unit is used for adjusting, with the frame adjustment parameters, the frame of an object whose classification result is higher than the threshold into a prediction frame;
the mask feature extraction unit is used for intercepting the local features of the prediction frame from the shared feature map;
the mask prediction unit is used for unifying the feature map size to 14×14 with bilinear interpolation and, after two layers of convolution and one layer of deconvolution, predicting the object mask by interpolation.
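To make the division of labour between the three modules easier to follow, the following Python sketch (PyTorch-style, for illustration only) shows how they could be chained at inference time. The three callable arguments, the 0.5 score threshold and the returned tensor layouts are assumptions introduced here for illustration; they are not names or values defined in this disclosure.

# Illustrative composition of the three modules described above.  The three
# callables are hypothetical placeholders, each assumed to implement the
# behaviour of the corresponding module (feature extraction, region
# recommendation, object prediction with semantic segmentation).
import torch

def detect_buildings(image, feature_extractor, region_proposer, box_mask_head,
                     score_threshold=0.5):
    """image: float tensor of shape (1, 3, H, W)."""
    # Module 1: trunk features and the shared five-level feature pyramid
    pyramid = feature_extractor(image)            # list of feature maps P2..P6
    # Module 2: suggestion frames predicted from the shared pyramid
    proposals = region_proposer(pyramid)          # (N, 4) boxes in image coordinates
    # Module 3: classification, frame adjustment and mask prediction per frame
    scores, boxes, masks = box_mask_head(pyramid, proposals)
    keep = scores > score_threshold               # keep frames classified as buildings
    return boxes[keep], masks[keep]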
A processing method of urban rail transit remote sensing image data comprises the following steps:
A. inputting a high-resolution remote sensing image, extracting trunk features with a feature interception network, and constructing a feature pyramid;
B. inputting the trunk features into the region recommendation network to predict suggestion frames that may contain buildings;
C. intercepting the trunk features according to the suggestion frame, inputting them into a fully connected network to predict the object type, adjusting the suggestion frame with the synchronously predicted frame adjustment parameters to obtain a prediction frame, intercepting a local feature map from the trunk features according to the prediction frame, and inputting it into the convolution network of the semantic segmentation module to predict the building mask;
D. marking the obtained object prediction frames and their building masks on the image.
The step A specifically comprises the following steps:
A1, first extracting features of different levels of the remote sensing image with a multi-layer residual convolutional neural network, generating four levels of feature maps with 256 channels;
A2, starting convolution and up-sampling from the feature map of the lowest dimension and superposing it on the feature map one level higher in dimension, constructing a five-layer feature map as the shared feature pyramid.
The step B specifically comprises the following steps:
Extracting the feature points of the feature pyramid with one layer of shared convolution; predicting, with two dedicated convolutions, whether the sliding window at each feature point contains an object and the adjustment parameters relative to the sliding window; filtering the predicted feature points by a threshold on whether their sliding windows contain an object, a sliding window that passes the threshold being a preselected frame; and adjusting the preselected frame window with the frame adjustment parameters to obtain the suggestion frame.
The step C specifically comprises the following steps:
C1, interpolating the local features intercepted by the suggestion frame into a 7×7 feature map with bilinear interpolation, and, after two layers of convolution, predicting the classification result and the frame adjustment parameters with two separate fully connected networks;
C2, adjusting the frame of an object whose classification result is higher than the threshold into a prediction frame with the frame adjustment parameters;
C3, intercepting the local features of the prediction frame from the shared feature map, unifying the feature map size to 14×14 with bilinear interpolation, and, after two layers of convolution and one layer of deconvolution, predicting the object mask by interpolation.
The technical scheme provided by the invention has the beneficial effects that:
(1) Multi-source, multi-size remote sensing images are fused to compare the training effect of the model, so that the algorithm model has better robustness and generalization when dealing with multi-source remote sensing images;
(2) A shared convolution layer is adopted, which greatly accelerates training and application of the algorithm: each picture needs to be convolved only once, and no separate convolution calculation is needed for each candidate frame;
(3) The candidate regions are determined with a convolutional neural network, so the preselected target frames can be learned from training data, improving the extraction efficiency of the preselected frames;
(4) A multi-task fully connected network is adopted, with type prediction and corner offset sharing the same set of parameters, which greatly improves processing speed while also improving prediction accuracy;
(5) A semantic segmentation module is added to the convolutional neural network object classification, so that buildings are identified as objects while their detailed outlines are predicted at the same time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a method for processing urban rail transit remote sensing image data.
Fig. 2 is a block diagram of a remote sensing image feature extraction module of a processing system for urban rail transit remote sensing image data according to the present invention.
Fig. 3 is a block diagram of a regional recommendation module of a processing system for urban rail transit remote sensing image data according to the present invention.
Fig. 4 is a structural block diagram of an object prediction semantic segmentation module of the urban rail transit remote sensing image data processing system.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
Example 1
As shown in fig. 1, a processing method of urban rail transit remote sensing image data comprises the following steps:
A. inputting a high-resolution remote sensing image, extracting trunk features with a feature interception network, and constructing a feature pyramid;
B. inputting the trunk features into the region recommendation network to predict suggestion frames that may contain buildings;
C. intercepting the trunk features according to the suggestion frame, inputting them into a fully connected network to predict the object type, adjusting the suggestion frame with the synchronously predicted frame adjustment parameters to obtain a prediction frame, intercepting a local feature map from the trunk features according to the prediction frame, and inputting it into the convolution network of the semantic segmentation module to predict the building mask;
D. marking the obtained object prediction frames and their building masks on the image.
The step A specifically comprises the following steps:
A1, first extracting features of different levels of the remote sensing image with a multi-layer residual convolutional neural network, generating four levels of feature maps with 256 channels;
A2, starting convolution and up-sampling from the feature map of the lowest dimension and superposing it on the feature map one level higher in dimension, constructing a five-layer feature map as the shared feature pyramid.
The step B specifically comprises the following steps:
Extracting the feature points of the feature pyramid with one layer of shared convolution; predicting, with two dedicated convolutions, whether the sliding window at each feature point contains an object and the adjustment parameters relative to the sliding window; filtering the predicted feature points by a threshold on whether their sliding windows contain an object, a sliding window that passes the threshold being a preselected frame; and adjusting the preselected frame window with the frame adjustment parameters to obtain the suggestion frame.
The step C specifically comprises the following steps:
C1, interpolating the local features intercepted by the suggestion frame into a 7×7 feature map with bilinear interpolation, and, after two layers of convolution, predicting the classification result and the frame adjustment parameters with two separate fully connected networks;
C2, adjusting the frame of an object whose classification result is higher than the threshold into a prediction frame with the frame adjustment parameters;
C3, intercepting the local features of the prediction frame from the shared feature map, unifying the feature map size to 14×14 with bilinear interpolation, and, after two layers of convolution and one layer of deconvolution, predicting the object mask by interpolation.
A specific example is given below for illustration:
the model input is an RGB three-channel high-resolution remote sensing image slice, and the dimension is as follows: 1024×1024×3 (length×width×channel).
(1) The image first enters the feature extraction module, whose purpose is to obtain the trunk features of the image (the module structure is shown in fig. 2). Four layers of trunk features are first obtained through the trunk feature extraction convolution network (with dimensions 256×256×256, 128×128×256, 64×64×256 and 32×32×256 respectively); the four layers of trunk features are then input into the pyramid construction convolution network to obtain five layers of feature maps (P2, P3, P4, P5 and P6, with dimensions 256×256×256, 128×128×256, 64×64×256, 32×32×256 and 16×16×256 respectively), which are used as the trunk features for later processing.
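A minimal PyTorch sketch of this pyramid construction is given below. It assumes a ResNet-style backbone whose four stages already output 256-channel maps C2-C5 at strides 4, 8, 16 and 32 (256×256, 128×128, 64×64 and 32×32 for a 1024×1024 input, matching the dimensions above); the 3×3 smoothing convolutions, nearest-neighbour upsampling and the max-pooling used to derive P6 from P5 are common feature-pyramid choices assumed here, since the disclosure only fixes the five output levels.

# Sketch of the pyramid-construction network of step (1): start from the
# smallest backbone map, repeatedly upsample and add the next larger map,
# and smooth each merged level; P6 is a downsampled copy of P5 (assumption).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.smooth = nn.ModuleList([nn.Conv2d(channels, channels, 3, padding=1)
                                     for _ in range(4)])
        self.pool = nn.MaxPool2d(kernel_size=1, stride=2)   # P5 -> P6

    def forward(self, c2, c3, c4, c5):
        p5 = self.smooth[3](c5)
        p4 = self.smooth[2](c4 + F.interpolate(p5, scale_factor=2, mode="nearest"))
        p3 = self.smooth[1](c3 + F.interpolate(p4, scale_factor=2, mode="nearest"))
        p2 = self.smooth[0](c2 + F.interpolate(p3, scale_factor=2, mode="nearest"))
        return [p2, p3, p4, p5, self.pool(p5)]

if __name__ == "__main__":
    fpn = FeaturePyramid()
    c2, c3, c4, c5 = (torch.zeros(1, 256, s, s) for s in (256, 128, 64, 32))
    for p in fpn(c2, c3, c4, c5):
        print(tuple(p.shape))   # (1,256,256,256) ... (1,256,16,16)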
(2) The region recommendation module is then entered; this module aims to predict suggestion frames for building regions on the picture from the trunk features (the module structure is shown in fig. 3). The five trunk feature maps P2-P6 are input into the region recommendation convolution network in turn. Taking P6 (dimension: 16×16×256) as an example to illustrate the operation of the region recommendation network: the length and width of P6 are 16×16, which in effect divides the original image (length and width: 1024×1024) into 16×16 small regions, each 64×64 in length and width, so each point on feature map P6 corresponds to a 64×64 region of the original image. The first task of the region recommendation convolution network is to predict whether the original-image region corresponding to each point on the feature map contains a building; for feature map P6 the output of the first task is 16×16 values, one per feature point, each in the range 0-1, where a value greater than 0.5 means the corresponding original-image region contains a building and otherwise the region is ignored. Because the original-image region corresponding to each feature point is fixed, while a building does not necessarily fall exactly inside that region and may be offset, the second task of the region recommendation convolution network is to predict how the corresponding original-image region of each point should be adjusted so that it exactly contains the building; for feature map P6 the output of the second task is 16×16×4, i.e. four predicted values per point, namely the adjustment parameters for the upper-left and lower-right corners of the corresponding original-image region. The regions of the feature points that contain buildings are adjusted with these parameters, giving the suggestion frames of feature map P6. To allow for differences in building size, each point of P6 corresponds to a 64×64 region of the original image while each point of P2 corresponds to a 4×4 region, so P6 can be regarded as predicting suggestion frames for large buildings and P2 for small buildings. The five layers of feature maps are input into the region recommendation convolution network in turn to obtain the suggestion frames of all five layers.
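The two prediction tasks of the region recommendation convolution network can be sketched per pyramid level as follows (PyTorch, illustrative only). The output shapes match the P6 example above, namely one objectness value and four corner-adjustment parameters per feature point; the 3×3 shared convolution, the 1×1 prediction convolutions, the ReLU and the sigmoid are assumptions, since the text fixes only the outputs.

# Sketch of the region recommendation head of step (2): one shared
# convolution, then one branch predicting a building score per feature point
# and one branch predicting the four corner adjustments.
import torch
import torch.nn as nn

class RegionRecommendationHead(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.shared = nn.Conv2d(channels, channels, 3, padding=1)  # shared convolution
        self.objectness = nn.Conv2d(channels, 1, 1)  # building / no building per point
        self.deltas = nn.Conv2d(channels, 4, 1)      # upper-left / lower-right adjustments

    def forward(self, feature_map):
        x = torch.relu(self.shared(feature_map))
        return torch.sigmoid(self.objectness(x)), self.deltas(x)

if __name__ == "__main__":
    head = RegionRecommendationHead()
    p6 = torch.zeros(1, 256, 16, 16)            # deepest pyramid level
    scores, deltas = head(p6)
    print(scores.shape, deltas.shape)           # (1,1,16,16) and (1,4,16,16)
    # feature points with score > 0.5 become preselected frames; their 64x64
    # image regions are then shifted by the four predicted parameters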
(3) The object prediction module is then entered to predict whether the interior of each suggestion frame really contains a building, together with the adjustment parameters for the corners of the suggestion frame (the module structure is shown in fig. 4). The previous module only roughly predicts whether each point of a feature map contains a building, and that result is not accurate, so this module intercepts the local features of the original-image region represented by each suggestion frame and uses fully connected neural networks for accurate prediction. A fully connected network requires a unified input dimension, but the suggestion frames differ in size and therefore in the number of feature points of their local features (for example, of two suggestion frames, one intercepted local feature may be 8×9 in length and width and the other 11×6; the first then has 72 points and the second 66, so their dimensions are not unified and cannot be input into a fully connected network). Bilinear interpolation is therefore used to obtain a 7×7×256 local feature map for every suggestion frame, which is input into two fully connected networks. The first fully connected network contains one hidden layer of 1024 nodes and outputs the classification result: a predicted value greater than 0.5 means the suggestion frame contains a building, otherwise the suggestion frame is discarded. The second fully connected network also contains one hidden layer of 1024 nodes and outputs 4 values that predict the adjustment parameters of the upper-left and lower-right corners of the suggestion frame; the corners of the suggestion frames that contain a building are then adjusted with the predicted parameters to obtain the prediction frames, which are the result of the object prediction module.
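The object prediction module of step (3) can be sketched as follows (PyTorch, illustrative only). The 7×7 resampling and the 1024-node hidden layers follow the description above; the use of torchvision's roi_align for the bilinear crop, the single sigmoid classification output and the ReLU activations are assumptions.

# Sketch of the object prediction head of step (3): resample each suggestion
# frame's local features to 7x7x256 with bilinear interpolation, then run two
# small fully connected networks, one for the building score and one for the
# corner adjustments.
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class ObjectPredictionHead(nn.Module):
    def __init__(self, channels=256, pool=7, hidden=1024):
        super().__init__()
        in_features = channels * pool * pool
        self.cls = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Sigmoid())  # building score
        self.box = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 4))                # corner adjustments

    def forward(self, feature_map, boxes, stride):
        # boxes: (N, 4) suggestion frames in image coordinates for one image
        rois = roi_align(feature_map, [boxes], output_size=7,
                         spatial_scale=1.0 / stride, aligned=True)
        rois = rois.flatten(start_dim=1)          # (N, 256*7*7)
        return self.cls(rois).squeeze(1), self.box(rois)

if __name__ == "__main__":
    head = ObjectPredictionHead()
    p2 = torch.zeros(1, 256, 256, 256)            # stride-4 pyramid level
    proposals = torch.tensor([[100.0, 120.0, 220.0, 260.0]])
    scores, deltas = head(p2, proposals, stride=4)
    print(scores.shape, deltas.shape)             # (1,) and (1, 4)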
(4) Finally, the semantic segmentation module is entered; its purpose is to obtain the mask of the building inside each prediction frame (the module structure is shown in fig. 4). The prediction frames obtained by the object prediction module in step (3) are used to intercept the corresponding regions of the trunk feature map as the local features of each prediction frame. Bilinear interpolation unifies these local features to dimension 14×14×256; deconvolution then doubles the length and width to 28×28 and convolution reduces the number of channels to 1, so the output image of the semantic segmentation convolution network has size 28×28×1, whose last-dimension values are only 0 and 1, a value of 1 indicating that the pixel belongs to the building. Finally the output is interpolated back to the size of the local features of the prediction frame as the mask result of the building. This completes the whole flow of the algorithm: all prediction frames constitute the building object detection result, and the semantic segmentation result of each prediction frame is the building semantic segmentation result.
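The semantic segmentation head of step (4) can be sketched as follows (PyTorch, illustrative only). The 14×14×256 input, the two convolutions, the single stride-2 deconvolution to 28×28 and the one-channel output follow the description above; the kernel sizes, the ReLU activations, the sigmoid with a 0.5 binarisation threshold and the use of roi_align for the bilinear resampling are assumptions.

# Sketch of the semantic segmentation head of step (4): crop the prediction
# frame's features, unify them to 14x14x256, apply two convolutions and one
# deconvolution, reduce to one channel, and rescale the 28x28 output to the
# size of the prediction frame to obtain the building mask.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import roi_align

class MaskHead(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.deconv = nn.ConvTranspose2d(channels, channels, 2, stride=2)  # 14 -> 28
        self.out = nn.Conv2d(channels, 1, 1)                               # one channel

    def forward(self, feature_map, boxes, stride):
        x = roi_align(feature_map, [boxes], output_size=14,
                      spatial_scale=1.0 / stride, aligned=True)  # (N,256,14,14)
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = torch.relu(self.deconv(x))                           # (N,256,28,28)
        return torch.sigmoid(self.out(x))                        # (N,1,28,28)

if __name__ == "__main__":
    head = MaskHead()
    p2 = torch.zeros(1, 256, 256, 256)
    frame = torch.tensor([[100.0, 120.0, 220.0, 260.0]])         # x1, y1, x2, y2
    mask28 = head(p2, frame, stride=4)
    # rescale to the prediction-frame size (here 140 x 120) and binarise
    mask = F.interpolate(mask28, size=(140, 120), mode="bilinear",
                         align_corners=False) > 0.5
    print(mask28.shape, mask.shape)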
Example 2
The embodiment provides a processing system of urban rail transit remote sensing image data, which comprises a remote sensing image feature extraction module, a region recommendation module and an object prediction semantic segmentation module,
the remote sensing image feature extraction module is used for extracting trunk features of the urban rail transit remote sensing image and constructing a feature pyramid;
the region recommendation module is used for extracting the features of the feature pyramid with shared convolution to generate suggestion frames;
the object prediction semantic segmentation module is used for generating a prediction frame after convolving the local features intercepted by the suggestion frame, and for intercepting the local features of the prediction frame from the shared feature map according to the prediction frame to generate an object mask.
As shown in fig. 2, the remote sensing image feature extraction module comprises a trunk feature extraction unit, a feature pyramid construction unit and a feature pyramid unit, wherein,
the trunk feature extraction unit is used for extracting features of different levels of the remote sensing image with a multi-layer residual convolutional neural network, generating four levels of feature maps, each with 256 channels;
the feature pyramid construction unit is used for starting convolution and up-sampling from the feature map of the lowest dimension and superposing it on the feature map one level higher in dimension, constructing five layers of feature maps that form the shared feature pyramid unit.
As shown in fig. 3, the region recommendation module comprises a region recommendation convolutional network unit, an object prediction unit, a frame adjustment unit and a suggestion frame generation unit, wherein,
the region recommendation convolutional network unit is used for extracting the features of the feature pyramid with one layer of shared convolution;
the object prediction unit is used for predicting, with two dedicated convolutions, whether the sliding window at each feature point contains an object and the adjustment parameters relative to the sliding window, and for filtering the predicted feature points by a threshold on whether their sliding windows contain an object; a sliding window that passes the threshold is a preselected frame;
the frame adjustment unit adjusts the preselected frame window with the frame adjustment parameters, so that the suggestion frame generation unit generates the suggestion frames.
As shown in fig. 4, the object prediction semantic segmentation module comprises a feature interception unit, an object classification prediction unit, a frame adjustment prediction unit, a mask feature extraction unit and a mask prediction unit, wherein,
the feature interception unit is used for interpolating the local features intercepted by the suggestion frame into 7×7 feature maps through bilinear interpolation;
the object classification prediction unit is used for predicting, after two layers of convolution, the classification result and the frame adjustment parameters with two separate fully connected networks;
the frame adjustment prediction unit is used for adjusting, with the frame adjustment parameters, the frame of an object whose classification result is higher than the threshold into a prediction frame;
the mask feature extraction unit is used for intercepting the local features of the prediction frame from the shared feature map;
the mask prediction unit is used for unifying the feature map size to 14×14 with bilinear interpolation and, after two layers of convolution and one layer of deconvolution, predicting the object mask by interpolation.
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and scope of the invention is intended to be included within the protection scope of the invention.

Claims (2)

1. A processing system of urban rail transit remote sensing image data, comprising a remote sensing image feature extraction module, a region recommendation module and an object prediction semantic segmentation module, characterized in that:
the remote sensing image feature extraction module is used for extracting trunk features of the urban rail transit remote sensing image and constructing a feature pyramid;
the region recommendation module is used for extracting the features of the feature pyramid with shared convolution to generate suggestion frames;
the object prediction semantic segmentation module is used for generating a prediction frame after convolving the local features intercepted by the suggestion frame, and for intercepting the local features of the prediction frame from the shared feature map according to the prediction frame to generate an object mask;
the remote sensing image feature extraction module comprises a trunk feature extraction unit, a feature pyramid construction unit and a feature pyramid unit, wherein,
the trunk feature extraction unit is used for extracting features of different levels of the remote sensing image with a multi-layer residual convolutional neural network, generating four levels of feature maps, each with 256 channels;
the feature pyramid construction unit is used for starting convolution and up-sampling from the feature map of the lowest dimension and superposing it on the feature map one level higher in dimension, constructing five layers of feature maps that form the shared feature pyramid unit;
the region recommendation module comprises a region recommendation convolutional network unit, an object prediction unit, a frame adjustment unit and a suggestion frame generation unit, wherein,
the region recommendation convolutional network unit is used for extracting the features of the feature pyramid with one layer of shared convolution;
the object prediction unit is used for predicting, with two dedicated convolutions, whether the sliding window at each feature point contains an object and the adjustment parameters relative to the sliding window, and for filtering the predicted feature points by a threshold on whether their sliding windows contain an object; a sliding window that passes the threshold is a preselected frame;
the frame adjustment unit adjusts the preselected frame window with the frame adjustment parameters, so that the suggestion frame generation unit generates the suggestion frames;
the object prediction semantic segmentation module comprises a feature interception unit, an object classification prediction unit, a frame adjustment prediction unit, a mask feature extraction unit and a mask prediction unit, wherein,
the feature interception unit is used for interpolating the local features intercepted by the suggestion frame into 7×7 feature maps through bilinear interpolation;
the object classification prediction unit is used for predicting, after two layers of convolution, the classification result and the frame adjustment parameters with two separate fully connected networks;
the frame adjustment prediction unit is used for adjusting, with the frame adjustment parameters, the frame of an object whose classification result is higher than the threshold into a prediction frame;
the mask feature extraction unit is used for intercepting the local features of the prediction frame from the shared feature map;
the mask prediction unit is used for unifying the feature map size to 14×14 with bilinear interpolation and, after two layers of convolution and one layer of deconvolution, predicting the object mask by interpolation.
2. A processing method of urban rail transit remote sensing image data, comprising the following steps:
A. inputting a high-resolution remote sensing image, extracting trunk features with a feature interception network, and constructing a feature pyramid;
B. inputting the trunk features into the region recommendation network to predict suggestion frames that may contain buildings;
C. intercepting the trunk features according to the suggestion frame, inputting them into a fully connected network to predict the object type, adjusting the suggestion frame with the synchronously predicted frame adjustment parameters to obtain a prediction frame, intercepting a local feature map from the trunk features according to the prediction frame, and inputting it into the convolution network of the semantic segmentation module to predict the building mask;
D. marking the obtained object prediction frames and their building masks on the image;
the step A specifically comprises the following steps:
A1, first extracting features of different levels of the remote sensing image with a multi-layer residual convolutional neural network, generating four levels of feature maps with 256 channels;
A2, starting convolution and up-sampling from the feature map of the lowest dimension and superposing it on the feature map one level higher in dimension, constructing a five-layer feature map as the shared feature pyramid;
the step B specifically comprises the following steps:
extracting the feature points of the feature pyramid with one layer of shared convolution; predicting, with two dedicated convolutions, whether the sliding window at each feature point contains an object and the adjustment parameters relative to the sliding window; filtering the predicted feature points by a threshold on whether their sliding windows contain an object, a sliding window that passes the threshold being a preselected frame; and adjusting the preselected frame window with the frame adjustment parameters to obtain the suggestion frame;
the step C specifically comprises the following steps:
C1, interpolating the local features intercepted by the suggestion frame into a 7×7 feature map with bilinear interpolation, and, after two layers of convolution, predicting the classification result and the frame adjustment parameters with two separate fully connected networks;
C2, adjusting the frame of an object whose classification result is higher than the threshold into a prediction frame with the frame adjustment parameters;
C3, intercepting the local features of the prediction frame from the shared feature map, unifying the feature map size to 14×14 with bilinear interpolation, and, after two layers of convolution and one layer of deconvolution, predicting the object mask by interpolation.
CN202110831395.XA 2021-07-22 2021-07-22 Urban rail transit remote sensing image data processing system and method Active CN113361662B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110831395.XA CN113361662B (en) 2021-07-22 2021-07-22 Urban rail transit remote sensing image data processing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110831395.XA CN113361662B (en) 2021-07-22 2021-07-22 Urban rail transit remote sensing image data processing system and method

Publications (2)

Publication Number Publication Date
CN113361662A CN113361662A (en) 2021-09-07
CN113361662B true CN113361662B (en) 2023-08-29

Family

ID=77540092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110831395.XA Active CN113361662B (en) 2021-07-22 2021-07-22 Urban rail transit remote sensing image data processing system and method

Country Status (1)

Country Link
CN (1) CN113361662B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118124B (en) * 2021-09-29 2023-09-12 北京百度网讯科技有限公司 Image detection method and device


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298298B (en) * 2019-06-26 2022-03-08 北京市商汤科技开发有限公司 Target detection and target detection network training method, device and equipment

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709465A (en) * 2016-12-29 2017-05-24 武汉大学 Polarization SAR image road extraction method based on conditional random field
WO2018214195A1 (en) * 2017-05-25 2018-11-29 中国矿业大学 Remote sensing imaging bridge detection method based on convolutional neural network
US10671878B1 (en) * 2019-01-11 2020-06-02 Capital One Services, Llc Systems and methods for text localization and recognition in an image of a document
WO2020215236A1 (en) * 2019-04-24 2020-10-29 哈尔滨工业大学(深圳) Image semantic segmentation method and system
CN110263705A (en) * 2019-06-19 2019-09-20 上海交通大学 Towards two phase of remote sensing technology field high-resolution remote sensing image change detecting method
CN110570353A (en) * 2019-08-27 2019-12-13 天津大学 Dense connection generation countermeasure network single image super-resolution reconstruction method
CN110675408A (en) * 2019-09-19 2020-01-10 成都数之联科技有限公司 High-resolution image building extraction method and system based on deep learning
CN110909642A (en) * 2019-11-13 2020-03-24 南京理工大学 Remote sensing image target detection method based on multi-scale semantic feature fusion
CN111462124A (en) * 2020-03-31 2020-07-28 武汉卓目科技有限公司 Remote sensing satellite cloud detection method based on Deep L abV3+
CN111553303A (en) * 2020-05-07 2020-08-18 武汉大势智慧科技有限公司 Remote sensing ortho image dense building extraction method based on convolutional neural network
CN112101189A (en) * 2020-09-11 2020-12-18 北京航空航天大学 SAR image target detection method and test platform based on attention mechanism
CN112183432A (en) * 2020-10-12 2021-01-05 中国科学院空天信息创新研究院 Building area extraction method and system based on medium-resolution SAR image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Xiang, Research on Building Extraction Technology from High-Resolution Remote Sensing Images Based on Deep Learning, China Doctoral Dissertations Full-text Database, Basic Sciences, No. 6, pp. A008-33 *

Also Published As

Publication number Publication date
CN113361662A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN108985181B (en) End-to-end face labeling method based on detection segmentation
CN111428586B (en) Three-dimensional human body posture estimation method based on feature fusion and sample enhancement
CN109902600B (en) Road area detection method
CN112733919B (en) Image semantic segmentation method and system based on void convolution and multi-scale and multi-branch
CN110084299B (en) Target detection method and device based on multi-head fusion attention
CN112364855A (en) Video target detection method and system based on multi-scale feature fusion
CN114821665A (en) Urban pedestrian flow small target detection method based on convolutional neural network
CN113591617B (en) Deep learning-based water surface small target detection and classification method
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN114820579A (en) Semantic segmentation based image composite defect detection method and system
CN113361662B (en) Urban rail transit remote sensing image data processing system and method
CN112084859A (en) Building segmentation method based on dense boundary block and attention mechanism
CN111353544A (en) Improved Mixed Pooling-Yolov 3-based target detection method
CN114359245A (en) Method for detecting surface defects of products in industrial scene
Li et al. Fusing taxi trajectories and RS images to build road map via DCNN
CN110956119A (en) Accurate and rapid target detection method in image
CN115861281A (en) Anchor-frame-free surface defect detection method based on multi-scale features
CN112508099A (en) Method and device for detecting target in real time
CN115457043A (en) Image segmentation network based on overlapped self-attention deformer framework U-shaped network
CN115100652A (en) Electronic map automatic generation method based on high-resolution remote sensing image
CN113408550B (en) Intelligent weighing management system based on image processing
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN116912485A (en) Scene semantic segmentation method based on feature fusion of thermal image and visible light image
CN113920479A (en) Target detection network construction method, target detection device and electronic equipment
CN115830592A (en) Overlapping cervical cell segmentation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant