CN113177956A - Semantic segmentation method for unmanned aerial vehicle remote sensing image - Google Patents

Semantic segmentation method for unmanned aerial vehicle remote sensing image

Info

Publication number
CN113177956A
CN113177956A (application CN202110508833.9A)
Authority
CN
China
Prior art keywords
image
remote sensing
semantic segmentation
unmanned aerial vehicle
Prior art date 2021-05-11
Legal status
Pending
Application number
CN202110508833.9A
Other languages
Chinese (zh)
Inventor
于扬鸿
车明亮
杨帆
周雨航
Current Assignee
Nantong University
Original Assignee
Nantong University
Priority date 2021-05-11
Filing date 2021-05-11
Publication date 2021-07-27
Application filed by Nantong University
Priority to CN202110508833.9A
Publication of CN113177956A
Status: Pending

Classifications

    • G06T 7/11 Region-based segmentation
    • G06F 18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T 5/40 Image enhancement or restoration using histogram techniques
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 5/70 Denoising; Smoothing
    • G06T 5/73 Deblurring; Sharpening
    • G06T 2207/10032 Satellite or aerial image; Remote sensing
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T 2207/20032 Median filtering
    • G06T 2207/20048 Transform domain processing
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20192 Edge enhancement; Edge preservation
    • G06T 2207/20221 Image fusion; Image merging

Abstract

The invention discloses a semantic segmentation method for unmanned aerial vehicle remote sensing images. The image is first divided into blocks, and semantic segmentation is then performed block by block, which reduces the volume of remote sensing image data read at any one time and lowers the risk of memory overflow during segmentation. The invention designs and implements a zoom device that extracts target images over different spatial ranges and, by constructing a featurized image pyramid, retains the most complete key feature information in the target images, so that the most accurate classification predictions can be obtained and high per-pixel classification accuracy is ensured. In implementation, the image slices are processed with multi-process parallel semantic segmentation to reduce total running time. The classifier uses a lightweight convolutional neural network, which minimizes model size without reducing image classification accuracy and reduces the memory and disk space consumed by the method in application.

Description

Semantic segmentation method for unmanned aerial vehicle remote sensing image
Technical Field
The invention relates to the field of image semantic segmentation, in particular to a semantic segmentation method for unmanned aerial vehicle remote sensing images.
Background
Image semantic segmentation is an important means of computer interpretation of remote sensing images. Semantic segmentation determines the category of each pixel in an image, thereby generating a land cover/land use classification map. In current land use classification surveys, low-altitude unmanned aerial vehicle remote sensing imagery greatly improves working efficiency owing to its ease of use, low acquisition cost, and high spatial resolution. However, compared with conventional medium- and high-altitude remote sensing (aerial and satellite) images, unmanned aerial vehicle remote sensing images contain more complex ground object information, which greatly limits the application of traditional segmentation methods such as support vector machine, neural network, decision tree, and expert system classification. Against the background of the rapid development of artificial intelligence, deep learning can learn image feature representations directly from massive image data and thus solve more computer vision tasks such as image classification, image recognition, and image segmentation. Deep learning has therefore increasingly become a new method for ground object classification in remote sensing images.
At present, remote sensing image classification methods based on deep learning mainly build on classical or state-of-the-art image segmentation models. The fully convolutional network (FCN) is the most basic framework for image segmentation. The FCN replaces all fully connected layers at the end of a convolutional neural network with convolutional layers to obtain feature maps from low to high dimensions. Pixel-by-pixel classification prediction is then performed on the feature maps of different dimensions, the feature maps are upsampled to the size of the original image, and the prediction results are finally fused. Since the FCN mainly relies on a deep network to extract features and classify, it is insensitive to small objects and inaccurate on segmentation details. The U-Net model is an improvement and extension of the FCN, following the same idea of semantic segmentation: convolutional and pooling layers extract features, and deconvolution layers restore the image size. The U-Net structure uses a symmetric contracting-expanding path: the contracting path captures context and extracts image features layer by layer, while the expanding path precisely localizes and restores the positional information of the image. Experiments show that the U-Net model can obtain more accurate classification results from fewer training samples. However, the model is usually used for binary semantic segmentation, and multi-class segmentation requires additional modification of the model structure. The SegNet model is similar to the FCN, with the fully connected layers removed. Its core structure comprises an encoder network, a decoder network, and a pixel-wise classification layer. The encoder uses the first 13 convolutional layers of the VGG-16 network and extracts high-dimensional feature maps by downsampling. Each encoder layer corresponds to a decoder layer that upsamples the low-resolution feature maps to full input resolution for pixel classification. Validation shows that its training speed and segmentation accuracy are better than those of the FCN, but it classifies pixels independently without considering the spatial relations among them, so the segmentation results exhibit a blocky effect. To address this, the DeepLab model introduces a fully connected conditional random field (CRF) at the last layer of the convolutional network to recover target boundary details and achieve accurate localization. Building on semantic segmentation, Mask R-CNN raises the level of the segmentation task to instance segmentation. Mask R-CNN follows the idea of Faster R-CNN: a ResNet residual network extracts features, and a mask prediction branch is added. The framework still adopts a two-stage strategy: a Region Proposal Network (RPN) first generates regions of interest, and each region is then classified and localized while a binary mask is computed. This gives Mask R-CNN high segmentation accuracy, while its Feature Pyramid Network (FPN) strategy supports multi-scale detection.
Although these segmentation models achieve good results on image test datasets, segmentation accuracy and running efficiency still face great challenges on unmanned aerial vehicle remote sensing images. First, the ground objects are relatively complex, target scales vary widely, and training samples are insufficient, so classification accuracy is low. Second, the data volume is large, while deep networks give the segmentation models many modules and large weight files, so reading image data, loading weights, and predicting classification results are inefficient. Finally, the segmentation models suffer from information attenuation when extracting target features and perform poorly when segmenting small ground objects. Current image segmentation models mainly extract features via convolutional layers, using a single feature map, a pyramid feature hierarchy, or a feature pyramid network. Although these three methods can extract high-dimensional abstract image features, all of them lose part or most of the original key features of the image. In contrast, the featurized image pyramid has strong semantics at all levels and can retain nearly all image feature information; it is widely used in algorithms ranked highly in the ImageNet and COCO detection challenges, but it consumes more time and places higher demands on computation and memory. These problems restrict the application of deep learning semantic segmentation models to unmanned aerial vehicle remote sensing image classification and urgently need to be solved.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to solve the problems of low precision and low efficiency of existing image semantic segmentation algorithms in unmanned aerial vehicle remote sensing image classification, and provides a high-precision semantic segmentation method for unmanned aerial vehicle remote sensing images. The method achieves a good segmentation effect, greatly improves the classification accuracy of remote sensing images, and accurately segments high-resolution unmanned aerial vehicle remote sensing images.
The technical scheme is as follows: the semantic segmentation method for unmanned aerial vehicle remote sensing images according to the invention comprises the following steps:
(1) Image preprocessing: performing image enhancement on the unmanned aerial vehicle remote sensing image, including image denoising, image sharpening, image equalization, and the like;
(2) Initializing parameters: initializing the parameters involved in the method, such as the image block size s, the number of image slices n, the zoom window radius r, the zoom level (i.e., focal length) f, and the like;
(3) Image blocking and slicing: dividing the original image into blocks according to a preset block size and temporarily storing them by block number, with m1 pixels overlapping between two adjacent image blocks to preserve complete boundary semantics; creating an image block queue according to the block numbers and performing semantic segmentation on each image block in dequeue order; equally dividing the image block to be processed according to the preset number of image slices, with m2 pixels overlapping between two adjacent image slices to preserve complete boundary semantics.
(4) Constructing an image segmentation process pool: constructing a process pool according to the configured number of CPU cores, with the sub-processes in the pool handling tasks in an asynchronous, non-blocking manner; sending each image slice obtained above to a sub-process and creating parallel image segmentation tasks.
(5) Traversing the pixels with skipping and extracting target sub-images with the zoom device: taking the image slice coordinate system as reference, sliding pixel by pixel and extracting a target sub-image with the initial focal length of the zoom device; when the output probability of the classified target sub-image is below a threshold, zooming up one level according to the zoom level to enlarge the range of the extracted target sub-image; after the target sub-image is classified, the zoom device restores the initial focal length, skips to the next pixel according to the span, and extracts the next target sub-image.
(6) Classification by the classifier: classifying the target sub-images extracted from the image slice with the classifier, outputting the classification result of the central pixel, and completing semantic segmentation within the image slice;
(7) Image slice merging and boundary fusion: merging all semantically segmented image slices and fusing the overlapping boundaries.
(8) Image block merging and boundary fusion: merging the semantically segmented image blocks according to their block numbers and fusing the overlapping boundaries.
(9) Post-processing: post-processing the semantic segmentation image obtained by the above steps to obtain a more accurate remote sensing image classification result.
Preferably, in step (1), image noise (such as salt-and-pepper noise) is removed by median and mean filtering with a 3 × 3 convolution kernel, i.e., a noise pixel is replaced by the median or mean of the intensity values in its neighborhood. Image sharpening, used to highlight ground object boundaries, mainly employs the Laplacian operator on 4- and 8-neighborhoods. Image equalization, used to keep the brightness of image regions consistent and improve the clarity of some regions, mainly employs global histogram equalization.
Preferably, in step (2), the parameters to be initialized mainly include: the image block size s, the number of overlapping pixels between image blocks m1, the number of image slices n, the number of overlapping pixels between image slices m2, the number of CPU cores in the process pool, the zoom level (i.e., focal length) f and translation coefficient c of the zoom device, and the classification probability threshold thr.
Preferably, in step (3), the image block number has the format: sequence number_row index_column index. The sequence number is the unique identifier of a block and is calculated from the original image size and the image block size; the row index and column index are obtained from the number of blocks along the height and the number of blocks along the width, respectively.
bid = N_row * N_col = f_ceil(H/s) * f_ceil(W/s)
where bid is the total number of image blocks, N_row is the number of blocks along the height, N_col is the number of blocks along the width, f_ceil() is the ceiling function, H and W are the pixel height and width of the original image, respectively, and s is the image block size.
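For illustration, a minimal Python sketch of this block-numbering scheme; the function name block_grid and the return format are our assumptions, not taken from the patent:

```python
import math

def block_grid(H, W, s):
    """Compute the block count and 'sequence_row_column' identifiers for an
    H x W image split into s x s blocks (sketch of the numbering scheme)."""
    n_row = math.ceil(H / s)   # number of blocks along the height
    n_col = math.ceil(W / s)   # number of blocks along the width
    bid = n_row * n_col        # total number of image blocks
    ids = [f"{i * n_col + j}_{i}_{j}"  # sequence number _ row index _ column index
           for i in range(n_row) for j in range(n_col)]
    return bid, ids

# Example: a 1000 x 1500 pixel image with 512-pixel blocks gives 2 x 3 = 6 blocks.
count, ids = block_grid(1000, 1500, 512)
print(count, ids[0])  # 6 '0_0_0'
```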
Preferably, in step (5), the zoom device mimics the automatic zoom function of a camera lens, and the zoom window size r and the zoom level f have the following relationship:
r=R(f)=2*f+c
where r is the window size, f is the zoom level, and c is the translation coefficient. The smaller f is, the smaller the spatial range of ground objects captured by the zoom device, and vice versa.
Whether the zoom device zooms is related to the classification probability of the currently extracted feature image and is expressed by the following formula:
Z(p) = 1 (zoom in), if p < thr; Z(p) = 0 (do not zoom), if p ≥ thr
in the formula, p is the maximum classification probability, and thr is the probability threshold.
The span by which the zoom device skips from the current pixel to the next is related to the zoom level and is expressed as:
k = K_ceil(f/2)
where k is the span and K_ceil() is the ceiling function.
In general, the zoom device does not activate the zoom mechanism: it calculates the window size from the initial zoom level and extracts the target image centered on the current pixel for subsequent classification. However, when the classification probability is below the corresponding threshold, the zoom device starts the zoom mechanism and gradually enlarges the range of the target image until the classification probability meets the classification requirement. Repeating this process yields a series of featurized image pyramids. Because these images retain the most complete key feature information, the most accurate classification values can be obtained.
Preferably, in step (6), the classifier that determines the class of the target sub-image mainly uses a lightweight convolutional neural network composed of a series of convolutional layers (compression and expansion layers), pooling layers, normalization layers, activation function layers, and fully connected layers. Before the classifier is used, sample regions of interest can be selected in advance from the remote sensing image for classification training, with the learning rate adjusted dynamically in steps (Multistep) during training.
Preferably, in the step (7), the image overlap boundary fusion method mainly uses a median filter with a convolution kernel of 5 × 5.
Preferably, in the step (8), the image overlap boundary fusion method mainly uses a median filter with a convolution kernel of 5 × 5.
Preferably, in step (9), the finally generated semantic segmentation image may contain a small number of false pixels, such as fragmented patches and island pixels. To remove these false pixels effectively, this step mainly uses erosion and dilation operations.
The invention has the beneficial effects that:
the method can achieve a good segmentation effect, greatly improves the classification precision of the remote sensing image, and achieves the purpose of accurately segmenting the high-resolution unmanned aerial vehicle remote sensing image.
1) Aiming at the large spatial scale of unmanned aerial vehicle remote sensing images, the invention first divides the image into blocks and then performs semantic segmentation block by block, which reduces the volume of remote sensing image data read at any one time and lowers the risk of memory overflow during segmentation.
2) The invention designs and implements a zoom device that extracts target images over different spatial ranges and, by constructing a featurized image pyramid, retains the most complete key feature information in the target images, so that the most accurate classification predictions can be obtained and high per-pixel classification accuracy is ensured.
3) The invention processes the image slices with multi-process parallel semantic segmentation to reduce total running time, and the skip traversal of the zoom device further shortens the run time. This ensures the timeliness of remote sensing image semantic segmentation.
4) The classifier uses a lightweight convolutional neural network, which minimizes model size without reducing image classification accuracy and reduces the memory and disk space consumed by the method in application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the embodiments are briefly described below. It is obvious that the following drawings depict only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of the semantic segmentation method for unmanned aerial vehicle remote sensing images.
Fig. 2 is a schematic diagram of image blocking and slicing according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a zoom device structure and a constructed image pyramid according to an embodiment of the invention.
Fig. 4 is a schematic diagram of a classifier according to an embodiment of the present invention.
Fig. 5 is a diagram of a semantic segmentation effect of an unmanned aerial vehicle remote sensing image in the embodiment of the present invention.
Detailed Description
The technical solution of the present invention will be further described in detail below with reference to the embodiments and the accompanying drawings. Specifically, the invention relates to a semantic segmentation method for unmanned aerial vehicle remote sensing images whose specific steps, as shown in fig. 1, are as follows:
(1) Image preprocessing: enhancing the unmanned aerial vehicle remote sensing image to reduce noise and highlight details, mainly including image denoising, image sharpening, and image equalization.
In the above step, image noise (such as salt-and-pepper noise) is mainly removed by median and mean filtering with a 3 × 3 convolution kernel, i.e., a noise pixel is replaced by the median or mean of the intensity values in its neighborhood. Image sharpening, used to highlight ground object boundaries, mainly employs the Laplacian operator on 4- and 8-neighborhoods. Image equalization, used to keep the brightness of image regions consistent and improve the clarity of some regions, mainly employs global histogram equalization.
In the above step, the processing procedure of the Laplacian operator is as follows: an edge enhancement operator first highlights local edges in the image, and edge points are then tracked step by step from locations of high edge strength along two different directions until the two tracks meet and form a closed contour.
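As a hedged illustration of step (1), the following Python/OpenCV sketch chains the 3 × 3 median and mean filtering, Laplacian sharpening, and global histogram equalization described above; the exact order and parameters are illustrative assumptions:

```python
import cv2
import numpy as np

def preprocess(img_bgr):
    """Denoise, sharpen, and equalize a UAV remote sensing image (sketch)."""
    # Denoising: 3x3 median filter for salt-and-pepper noise, then a 3x3
    # mean (box) filter; each noise pixel is replaced from its neighborhood.
    img = cv2.medianBlur(img_bgr, 3)
    img = cv2.blur(img, (3, 3))
    # Sharpening: subtract the Laplacian (8-neighborhood kernel) to highlight
    # ground-object boundaries: g = f - laplacian(f).
    lap = cv2.Laplacian(img, cv2.CV_16S, ksize=3)
    img = cv2.convertScaleAbs(img.astype(np.int16) - lap)
    # Equalization: global histogram equalization on the luminance channel
    # keeps region brightness consistent and improves local clarity.
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```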
(2) Initializing parameters: initializing the parameters involved in the method, mainly including: the image block size s, the number of overlapping pixels between image blocks m1, the number of image slices n, the number of overlapping pixels between image slices m2, the number of CPU cores in the process pool, the zoom level (i.e., focal length) f and translation coefficient c of the zoom device, and the classification probability threshold thr.
(3) Image blocking and slicing: dividing the original image into blocks according to a preset block size and temporarily storing them by block number, with m1 pixels overlapping between two adjacent image blocks to preserve complete boundary semantics; creating an image block queue according to the block numbers and performing semantic segmentation on each image block in dequeue order; equally dividing the image block to be processed according to the preset number of image slices, with m2 pixels overlapping between two adjacent image slices to preserve complete boundary semantics. As shown in fig. 2.
In the above step, the image block number has the format: sequence number_row index_column index. The sequence number is the unique identifier of a block and is calculated from the original image size and the image block size; the row index and column index are obtained from the number of blocks along the height and the number of blocks along the width, respectively.
bid = N_row * N_col = f_ceil(H/s) * f_ceil(W/s)
where bid is the total number of image blocks, N_row is the number of blocks along the height, N_col is the number of blocks along the width, f_ceil() is the ceiling function, H and W are the pixel height and width of the original image, respectively, and s is the image block size.
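A hedged sketch of the overlapping division follows; extending each block m pixels back toward its predecessor is one way to realize the m1/m2-pixel overlap, and the border clamping is our assumption:

```python
import math

def split_with_overlap(img, s, m):
    """Split an image into s x s blocks (or equally sized slices) that
    overlap adjacent pieces by m pixels; returns (identifier, view) pairs."""
    H, W = img.shape[:2]
    n_row, n_col = math.ceil(H / s), math.ceil(W / s)
    pieces = []
    for i in range(n_row):
        for j in range(n_col):
            y0 = max(0, i * s - m)          # reach m pixels into the neighbor
            x0 = max(0, j * s - m)
            y1 = min(H, (i + 1) * s)
            x1 = min(W, (j + 1) * s)
            pieces.append((f"{i * n_col + j}_{i}_{j}", img[y0:y1, x0:x1]))
    return pieces
```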
(4) Constructing an image segmentation process pool: constructing a process pool according to the configured number of CPU cores, with the sub-processes handling tasks in an asynchronous, non-blocking manner; sending each image slice obtained above to a sub-process and creating and executing parallel image segmentation tasks. As shown in fig. 3.
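A hedged sketch of the asynchronous, non-blocking process pool using Python's standard multiprocessing module; segment_slice is a placeholder for the zoom traversal and classification of steps (5) and (6):

```python
from multiprocessing import Pool, cpu_count

def segment_slice(task):
    """Worker: semantic segmentation of one image slice (placeholder body)."""
    slice_id, slice_img = task
    label_map = ...  # steps (5)-(6): zoom traversal + classifier over the slice
    return slice_id, label_map

def segment_block_parallel(slices, n_cores=None):
    """Dispatch each slice to a sub-process with apply_async (asynchronous,
    non-blocking) and collect the per-slice label maps when all tasks finish.
    Call from under `if __name__ == "__main__":` on spawn-based platforms."""
    with Pool(processes=n_cores or cpu_count()) as pool:
        tasks = [pool.apply_async(segment_slice, ((sid, img),))
                 for sid, img in slices]
        return dict(t.get() for t in tasks)
```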
(5) Traversing the pixels with skipping and extracting target sub-images with the zoom device: taking the image slice coordinate system as reference, sliding pixel by pixel and extracting a target sub-image with the initial focal length of the zoom device; when the output probability of the classified target sub-image is below a threshold, zooming up one level according to the zoom level to enlarge the range of the extracted target sub-image; after the target sub-image is classified, the zoom device restores the initial focal length, skips to the next pixel according to the span, and extracts the next target sub-image.
In the above step, the zoom device mimics the automatic zoom function of a camera lens, and the zoom window size r and the zoom level f have the following relationship:
r=R(f)=2*f+c
where r is the window size, f is the zoom level, and c is the translation coefficient. The smaller f is, the smaller the spatial range of ground objects captured by the zoom device, and vice versa.
Whether the zoom device zooms is related to the classification probability of the currently extracted feature image and is expressed by the following formula:
Z(p) = 1 (zoom in), if p < thr; Z(p) = 0 (do not zoom), if p ≥ thr
in the formula, p is the maximum classification probability, and thr is the probability threshold.
The span by which the zoom device skips from the current pixel to the next is related to the zoom level and is expressed as:
k = K_ceil(f/2)
where k is the span and K_ceil() is the ceiling function.
In general, the zoom device does not activate the zoom mechanism: it calculates the window size from the initial zoom level and extracts the target image centered on the current pixel for subsequent classification. However, when the classification probability is below the corresponding threshold, the zoom device starts the zoom mechanism and gradually enlarges the range of the target image until the classification probability meets the classification requirement. Repeating this process yields a series of featurized image pyramids. Because these images retain the most complete key feature information, the most accurate classification values can be obtained.
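The zoom behavior can be summarized in a hedged Python sketch; classify stands in for the classifier of step (6) and returns (label, maximum probability), while the zoom cap f_max and the window clamping are our assumptions:

```python
import math

def zoom_traverse(slice_img, f0, c, thr, f_max, classify):
    """Skip-traverse a slice, enlarging the zoom window until the classifier
    is confident, and label the central pixel of each window (sketch).
    Assumes the initial zoom level f0 >= 1."""
    H, W = slice_img.shape[:2]
    k = math.ceil(f0 / 2)                    # span k = K_ceil(f/2)
    labels = {}
    for y in range(0, H, k):                 # skip traversal by span k
        for x in range(0, W, k):
            f = f0                           # restore the initial focal length
            while True:
                r = 2 * f + c                # window size r = R(f) = 2*f + c
                win = slice_img[max(0, y - r):y + r + 1,
                                max(0, x - r):x + r + 1]
                label, p = classify(win)
                if p >= thr or f >= f_max:   # confident, or zoom cap reached
                    break
                f += 1                       # zoom up: wider spatial context
            labels[(y, x)] = label           # class of the central pixel
    return labels
```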
(6) Classification by the classifier: classifying the target sub-images extracted from the image slice with the classifier, outputting the classification result of the central pixel, and completing semantic segmentation within the image slice;
In the above step, the classifier that determines the class of the target sub-image mainly adopts a lightweight convolutional neural network. In this embodiment, SqueezeNet is used (although not limited thereto), composed of a series of convolutional layers (compression and expansion layers), pooling layers, normalization layers, activation function layers, and fully connected layers. The core module of the SqueezeNet model is the fire module, consisting of a squeeze layer (1 × 1 convolutions) and an expand layer (1 × 1 and 3 × 3 convolutions), which allows the convolutional neural network to maintain comparable accuracy on a limited parameter budget, as shown in fig. 4. The input image size of the SqueezeNet model is preset to 32 × 32 pixels, but is not limited thereto; each layer keeps its original number of channels; the number of categories depends on the semantic segmentation classes of the remote sensing image; the learning rate is initially set to 0.001 and is dynamically adjusted in steps (Multistep) during training, but is not limited thereto. Before the SqueezeNet model is used, sample regions of interest can be selected in advance from the remote sensing image for classification training. In addition, classical convolutional neural network frameworks such as VGGNet, ResNet, and GoogLeNet can also be used as the classifier in this method.
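A hedged PyTorch sketch of a SqueezeNet-style classifier for 32 × 32 target sub-images; the channel widths and depth are illustrative, since the patent fixes only the input size, learning rate, and module style:

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet fire module: 1x1 squeeze, then parallel 1x1/3x3 expand."""
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Sequential(nn.Conv2d(in_ch, squeeze_ch, 1),
                                     nn.ReLU(inplace=True))
        self.expand1 = nn.Conv2d(squeeze_ch, expand_ch, 1)
        self.expand3 = nn.Conv2d(squeeze_ch, expand_ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.squeeze(x)
        return self.relu(torch.cat([self.expand1(x), self.expand3(x)], dim=1))

class TinySqueezeNet(nn.Module):
    """Minimal classifier for 32x32 target sub-images (illustrative widths)."""
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),  # 16x16
            Fire(64, 16, 64),                        # -> 128 channels
            nn.MaxPool2d(2),                         # 8x8
            Fire(128, 32, 128),                      # -> 256 channels
            nn.MaxPool2d(2),                         # 4x4
        )
        self.classifier = nn.Sequential(
            nn.Conv2d(256, n_classes, 1),            # per-class score maps
            nn.AdaptiveAvgPool2d(1), nn.Flatten())   # global average pooling

    def forward(self, x):
        return self.classifier(self.features(x))    # logits; softmax gives p
```

The step-wise learning-rate adjustment could correspond to torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[...], gamma=0.1) starting from the initial rate of 0.001.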
(7) Image slice merging and boundary fusion: merging all semantically segmented image slices and fusing the overlapping boundaries.
In the above step, the overlapping boundaries are mainly fused with a median filter with a 5 × 5 convolution kernel, but the method is not limited thereto.
(8) Image block merging and boundary fusion: merging the semantically segmented image blocks according to their block numbers and fusing the overlapping boundaries.
In the above step, the overlapping boundaries are mainly fused with a median filter with a 5 × 5 convolution kernel, but the method is not limited thereto.
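A hedged sketch of the overlap-boundary fusion: after pasting the labeled pieces into a mosaic, a 5 × 5 median filter is applied over a narrow band around each seam; the band width and seam bookkeeping are our assumptions:

```python
import cv2
import numpy as np

def fuse_boundaries(label_map, seams, band=4):
    """Smooth a 5x5-median-filtered band around each seam of the merged
    label map. `seams` lists (axis, position): axis 0 = row, 1 = column."""
    smoothed = cv2.medianBlur(label_map.astype(np.uint8), 5)
    fused = label_map.astype(np.uint8).copy()
    for axis, pos in seams:
        lo, hi = max(0, pos - band), pos + band
        if axis == 0:
            fused[lo:hi, :] = smoothed[lo:hi, :]   # horizontal seam row band
        else:
            fused[:, lo:hi] = smoothed[:, lo:hi]   # vertical seam column band
    return fused
```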
(9) Post-processing: the finally generated semantic segmentation image may contain a small number of false pixels, such as fragmented patches and island pixels. To remove these false pixels effectively and obtain a more accurate remote sensing image classification result, post-processing is required, mainly using erosion and dilation operations. The effect of the final classified remote sensing image is shown in fig. 5.
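A hedged sketch of the erosion-and-dilation post-processing, applied per class mask with OpenCV; the 3 × 3 structuring element and the open-then-close order are illustrative choices:

```python
import cv2
import numpy as np

def clean_label_map(label_map, n_classes, ksize=3):
    """Remove island pixels and fragmented patches: morphological opening
    (erode then dilate) drops specks, closing (dilate then erode) fills
    small holes; each class mask is cleaned and the map reassembled."""
    kernel = np.ones((ksize, ksize), np.uint8)
    cleaned = np.zeros_like(label_map)
    for c in range(n_classes):
        mask = (label_map == c).astype(np.uint8)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
        cleaned[mask == 1] = c   # later classes overwrite on rare overlaps
    return cleaned
```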
The above description covers only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and these are intended to fall within the scope of the invention.

Claims (10)

1. A semantic segmentation method for unmanned aerial vehicle remote sensing images is characterized by comprising the following steps:
step 1: preprocessing the unmanned aerial vehicle remote sensing image;
step 2: initializing the related parameters;
step 3: partitioning, numbering, and slicing the preprocessed unmanned aerial vehicle remote sensing image;
step 4: constructing an image segmentation process pool, in which the sub-processes handle tasks in an asynchronous, non-blocking manner; sending each image slice obtained in step 3 to a sub-process and creating parallel image segmentation tasks;
step 5: traversing the pixels in each image slice with skipping and extracting target sub-images with the zoom device;
step 6: classifying the target sub-images extracted from the image slice with a classifier and outputting the classification result of the central pixel, completing semantic segmentation within the image slice;
step 7: merging all semantically segmented image slices and fusing the overlapping boundaries;
step 8: merging the semantically segmented image blocks according to their block numbers and fusing the overlapping boundaries;
step 9: post-processing the semantic segmentation image obtained by the above steps to obtain a more accurate remote sensing image classification result.
2. The semantic segmentation method for the unmanned aerial vehicle remote sensing image according to claim 1, wherein in step 1, the preprocessing comprises image denoising, image sharpening, and image equalization; the image is filtered with median and mean filters with a 3 × 3 convolution kernel; the image sharpening operation adopts the Laplacian operator on 4- and 8-neighborhoods; the image equalization operation adopts global histogram equalization.
3. The semantic segmentation method for the unmanned aerial vehicle remote sensing image according to claim 1, wherein in step 2, the parameters to be initialized comprise: the image block size s, the number of overlapping pixels between image blocks m1, the number of image slices n, the number of overlapping pixels between image slices m2, the number of CPU cores in the process pool, the zoom level f and translation coefficient c of the zoom device, and the classification probability threshold thr.
4. The semantic segmentation method for the unmanned aerial vehicle remote sensing image according to claim 1, wherein step 3 specifically comprises: dividing the original image into blocks according to a preset block size and temporarily storing them by block number, with m1 pixels overlapping between two adjacent image blocks; creating an image block queue according to the block numbers and performing semantic segmentation on each image block in dequeue order; equally dividing the image block to be processed according to the preset number of image slices, with m2 pixels overlapping between two adjacent image slices.
5. The semantic segmentation method for the unmanned aerial vehicle remote sensing image according to claim 4, wherein the image block numbering format is: sequence number_row index_column index, where the sequence number is the unique identifier of a block, calculated from the original image size and the image block size, and the row index and column index are obtained from the number of blocks along the height and the number of blocks along the width, respectively,
bid = N_row * N_col = f_ceil(H/s) * f_ceil(W/s)
where bid is the total number of image blocks, N_row is the number of blocks along the height, N_col is the number of blocks along the width, f_ceil() is the ceiling function, H and W are the pixel height and width of the original image, respectively, and s is the image block size.
6. The semantic segmentation method for the unmanned aerial vehicle remote sensing image according to claim 1, wherein step 5 specifically comprises: taking the image slice coordinate system as reference, sliding pixel by pixel with the initial focal length of the zoom device to extract a target sub-image; when the output probability of the classified target sub-image is below a threshold, zooming up one level according to the zoom level to enlarge the range of the extracted target sub-image; after the target sub-image is classified, the zoom device restores the initial focal length, skips to the next pixel according to the span, and extracts the next target sub-image.
7. The semantic segmentation method for the unmanned aerial vehicle remote sensing image according to claim 1, wherein in step 6, a lightweight convolutional neural network is adopted as the classifier for determining the class of the target sub-image, composed of a series of convolutional layers, pooling layers, normalization layers, activation function layers, and fully connected layers; sample regions of interest are selected in advance on the remote sensing image for classification training before the classifier is used, with the learning rate dynamically adjusted in steps during training.
8. The semantic segmentation method for the unmanned aerial vehicle remote sensing image according to claim 1, wherein in step 7, the image overlapping boundary fusion method adopts a median filter with a convolution kernel of 5 x 5.
9. The semantic segmentation method for the unmanned aerial vehicle remote sensing image according to claim 1, wherein in step 8, the image overlapping boundary fusion method adopts a median filter with a convolution kernel of 5 x 5.
10. The semantic segmentation method for unmanned aerial vehicle remote sensing images according to claim 1, wherein in step 9, the post-processing comprises erosion and dilation operations.
CN202110508833.9A 2021-05-11 2021-05-11 Semantic segmentation method for unmanned aerial vehicle remote sensing image Pending CN113177956A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110508833.9A CN113177956A (en) 2021-05-11 2021-05-11 Semantic segmentation method for unmanned aerial vehicle remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110508833.9A CN113177956A (en) 2021-05-11 2021-05-11 Semantic segmentation method for unmanned aerial vehicle remote sensing image

Publications (1)

Publication Number Publication Date
CN113177956A true CN113177956A (en) 2021-07-27

Family

ID=76928825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110508833.9A Pending CN113177956A (en) 2021-05-11 2021-05-11 Semantic segmentation method for unmanned aerial vehicle remote sensing image

Country Status (1)

Country Link
CN (1) CN113177956A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103868499A (en) * 2014-02-28 2014-06-18 北京空间机电研究所 Intelligent optical remote sensing system
CN108369635A (en) * 2015-11-08 2018-08-03 阿格洛英公司 The method with analysis is obtained for aerial image
CN107610141A (en) * 2017-09-05 2018-01-19 华南理工大学 A kind of remote sensing images semantic segmentation method based on deep learning
CN110689544A (en) * 2019-09-06 2020-01-14 哈尔滨工程大学 Method for segmenting delicate target of remote sensing image
CN110675408A (en) * 2019-09-19 2020-01-10 成都数之联科技有限公司 High-resolution image building extraction method and system based on deep learning
CN112084923A (en) * 2020-09-01 2020-12-15 西安电子科技大学 Semantic segmentation method for remote sensing image, storage medium and computing device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sergio Bernabé et al.: "A new parallel tool for classification of remotely sensed imagery", Computers & Geosciences, pages 208-218 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113946538A (en) * 2021-09-23 2022-01-18 南京大学 Convolutional layer fusion storage device and method based on line cache mechanism
CN113946538B (en) * 2021-09-23 2024-04-12 南京大学 Convolutional layer fusion storage device and method based on line caching mechanism
CN114445632A (en) * 2022-02-08 2022-05-06 支付宝(杭州)信息技术有限公司 Picture processing method and device


Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination