CN112215096B - Remote sensing image town extraction method and device based on scene and pixel information - Google Patents

Remote sensing image town extraction method and device based on scene and pixel information

Info

Publication number
CN112215096B
CN112215096B (granted from application CN202011024302.4A)
Authority
CN
China
Prior art keywords
image
town
grid
remote sensing
classified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011024302.4A
Other languages
Chinese (zh)
Other versions
CN112215096A (en)
Inventor
赵理君 (Zhao Lijun)
张伟 (Zhang Wei)
唐娉 (Tang Ping)
张正 (Zhang Zheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS filed Critical Aerospace Information Research Institute of CAS
Priority to CN202011024302.4A priority Critical patent/CN112215096B/en
Publication of CN112215096A publication Critical patent/CN112215096A/en
Application granted granted Critical
Publication of CN112215096B publication Critical patent/CN112215096B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 20/176 — Scenes; terrestrial scenes; urban or other man-made structures
    • G06F 18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 — Pattern recognition; classification techniques
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/08 — Neural networks; learning methods
    • G06V 10/267 — Image preprocessing; segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/44 — Extraction of image or video features; local feature extraction by analysis of parts of the pattern, e.g. edges, contours, loops, corners, strokes or intersections
    • Y02A 30/60 — Technologies for adaptation to climate change; planning or developing urban green infrastructure

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and device for town extraction from remote sensing images based on scene and pixel information. The method comprises: dividing the remote sensing image to be processed into a grid, generating a plurality of grid images; inputting each grid image into a target scene classification network model to obtain a grid classification image; performing edge extraction on the grid classification images and marking those that lie on an edge; inputting each marked grid classification image into a target pixel classification network model and taking the resulting pixel classification image as the town extraction result image of the corresponding region; assigning 1 to the corresponding region of the town extraction result image for each unmarked grid image of the town category; assigning 0 to the corresponding region for each grid image of the non-town category; and applying hole-filling post-processing to the town extraction result image to obtain a binary town extraction image. The invention achieves both completeness of the extracted town extent and accuracy of the extracted town boundaries.

Description

Remote sensing image town extraction method and device based on scene and pixel information
Technical Field
The invention relates to the technical field of satellite remote sensing image processing, and in particular to a method and device for town extraction from remote sensing images based on scene and pixel information.
Background
Urban agglomerations are the principal sites of spatial redistribution and transfer of production factors under economic globalization; they are a product of industrial spatial integration and occupy a core position in national and regional economic development. In this environment, economic competition between regions and countries requires the cities and industries within an agglomeration to cooperate and interact closely according to the roles they play. A scientific and reasonable delineation and understanding of urban agglomerations helps monitor, manage, and promote coordinated development among the cities within them. In traditional urban geography, urban agglomerations are delineated mainly from socio-economic statistics. Such methods place high demands on the completeness and accuracy of the statistical indicators and cannot directly reflect the spatial relationships among the cities of an agglomeration, nor its spatial form and pattern. The introduction of remote sensing provides a new way to acquire spatial information on towns and urban agglomerations, and existing remote-sensing-based studies of cities and urban agglomerations have demonstrated its great advantages in acquiring and analysing urban spatial information.
Town identification in remote sensing images is generally performed with existing land-cover/land-use classification methods, a task that belongs to the field of pattern recognition; research on such methods focuses mainly on feature extraction for town targets and on the design and selection of classifiers. Town extraction from remote sensing images must obtain the boundary of the town's distribution, yet a town usually contains many different land-cover types and appears as a complex mosaic, while impervious-surface types such as man-made structures also appear sporadically outside the town's extent. Under these conditions, traditional pixel-based classification can delineate boundaries accurately, but it usually extracts only impervious surfaces such as roads and residential areas, so lakes, rivers, vegetation, and other areas inside the town cannot all be extracted at once, while scattered man-made structures outside the town are wrongly extracted as town area; the extraction result is therefore fragmented and lacks integrity. Scene-based classification, on the other hand, can extract the overall extent of the town area grid by grid from the spatial distribution characteristics of the town, but it handles town boundaries poorly, mostly producing a sawtooth effect. Neither method alone can both preserve the integrity of the extracted town extent and achieve accurate town boundaries, so neither meets the current application requirements of town extraction from remote sensing images.
Disclosure of Invention
The technical problem solved by the invention is to provide a method and device for town extraction from remote sensing images based on scene and pixel information.
In order to solve the above technical problem, an embodiment of the present invention provides a method for town extraction from remote sensing images based on scene and pixel information, comprising:
dividing the obtained remote sensing image to be processed into a grid, generating a plurality of grid images corresponding to the remote sensing image to be processed;
inputting each grid image into a pre-trained target scene classification network model to obtain a grid classification image corresponding to each grid image;
performing an edge extraction operation on the grid classification images and, according to the extraction result, marking the grid classification images that lie on an edge;
inputting each marked grid classification image into a pre-trained target pixel classification network model to obtain a corresponding pixel classification result image, and taking the pixel classification result image as the town extraction result image of the corresponding region in the town extraction result;
assigning 1 to the corresponding region of the town extraction result image for each unmarked grid image judged to be of the town category by the target scene classification network model;
assigning 0 to the corresponding region of the town extraction result image for each grid image judged to be of the non-town category by the target scene classification network model;
and performing hole-filling post-processing on the town extraction result image to obtain a binary town extraction image.
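The sequence of steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `classify_scene` and `classify_pixels` are hypothetical stand-ins for the trained scene classification model and pixel classification model, and a simple 4-neighbour comparison stands in for the unspecified edge detection algorithm.

```python
import numpy as np

def extract_towns(image: np.ndarray, n: int, classify_scene, classify_pixels):
    """Sketch of the claimed pipeline on a P x Q image divided into n x n grids.

    classify_scene maps an n x n tile to 1 (town) or 0 (non-town);
    classify_pixels maps a tile to an n x n 0-1 array. Both are stubs
    standing in for the pre-trained network models.
    """
    p, q = image.shape[:2]
    rows, cols = p // n, q // n
    # Steps 1-2: grid division and scene classification.
    grid_map = np.zeros((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            grid_map[r, c] = classify_scene(image[r*n:(r+1)*n, c*n:(c+1)*n])
    # Step 3: mark town grids that border a non-town grid as edge grids.
    padded = np.pad(grid_map, 1, constant_values=0)
    neighbour_min = np.minimum.reduce(
        [padded[:-2, 1:-1], padded[2:, 1:-1], padded[1:-1, :-2], padded[1:-1, 2:]])
    edge = (grid_map == 1) & (neighbour_min == 0)
    # Steps 4-6: assemble the town extraction result image.
    result = np.zeros((p, q), dtype=int)
    for r in range(rows):
        for c in range(cols):
            tile = image[r*n:(r+1)*n, c*n:(c+1)*n]
            if edge[r, c]:
                # Edge grid: refine with per-pixel classification.
                result[r*n:(r+1)*n, c*n:(c+1)*n] = classify_pixels(tile)
            elif grid_map[r, c] == 1:
                # Interior town grid: assign 1 wholesale.
                result[r*n:(r+1)*n, c*n:(c+1)*n] = 1
            # Non-town grids keep their initial value 0.
    return result  # hole-filling post-processing (step 7) would follow

# Toy demo: a 4x4 "image" with one dark tile, thresholds as stub models.
demo = np.ones((4, 4))
demo[:2, :2] = 0
res = extract_towns(demo, 2,
                    lambda t: int(t.mean() > 0.5),
                    lambda t: (t > 0.5).astype(int))
```

The stubs make the control flow visible: only grids on the town/non-town boundary pay the cost of per-pixel prediction, which is the stated point of combining the two models.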
Optionally, before the step of inputting each grid image into a pre-trained target scene classification network model to obtain a grid classification image corresponding to each grid image, the method further includes:
obtaining homologous town and non-town satellite remote sensing images;
cropping the town and non-town satellite remote sensing images to size N × N to obtain, respectively, first cropped images corresponding to the town satellite remote sensing images and second cropped images corresponding to the non-town satellite remote sensing images, where N is a positive integer and an integral multiple of 2;
labelling the first and second cropped images with the town-scene and non-town-scene categories respectively, generating first labelled images corresponding to the first cropped images and second labelled images corresponding to the second cropped images;
and training an initial scene classification network model on the first and second labelled images to obtain the target scene classification network model.
Optionally, before the step of inputting each marked grid classification image into a pre-trained target pixel classification network model to obtain a corresponding pixel classification result image and taking it as the extraction result of the corresponding region in the town extraction result, the method further includes:
obtaining homologous town and non-town satellite remote sensing images;
cropping the town and non-town satellite remote sensing images to size N × N to obtain, respectively, first cropped images corresponding to the town satellite remote sensing images and second cropped images corresponding to the non-town satellite remote sensing images, where N is a positive integer and an integral multiple of 2;
labelling the pixels in the first and second cropped images with the town and non-town categories, generating first pixel-labelled images corresponding to the first cropped images and second pixel-labelled images corresponding to the second cropped images;
and training an initial pixel classification network model on the town-category pixel-labelled images among the first and second pixel-labelled images to obtain the target pixel classification network model.
Optionally, performing the edge extraction operation on the grid classification images and marking the grid classification images that lie on an edge according to the extraction result includes:
performing edge extraction on the grid classification images with an image edge detection algorithm, marking the grid images identified as edges with E, and leaving non-edge grid images unmarked.
Optionally, inputting each marked grid classification image into a pre-trained target pixel classification network model to obtain a corresponding pixel classification result image and taking it as the town extraction result image of the corresponding region in the town extraction result includes:
creating a blank image of the same size as the remote sensing image to be processed to store the town extraction result image;
performing a condition judgment on each of the grid classification images:
if the grid classification image is of the town category and is marked E, inputting it into the target pixel classification network model to predict its pixel classification result, and assigning the pixel classification result image to the same region of the blank image;
if the grid classification image is of the town category and is unmarked, assigning 1 to all pixels in the same region of the blank image;
and if the grid classification image is of the non-town category, assigning 0 to all pixels in the same region of the blank image; the final town extraction result image is a 0-1 binary image.
Optionally, performing the hole-filling post-processing on the town extraction result image to obtain the binary town extraction image includes:
post-processing the mosaicked town image with a hole-filling algorithm from morphological operations, filling in pixels that lie inside a town but have a pixel value of 0, to obtain the binary town extraction image.
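The morphological hole filling described here (available in libraries, e.g. `scipy.ndimage.binary_fill_holes`) can be sketched dependency-free with a flood fill from the image border: any background pixel not reachable from the border is an enclosed hole and is set to 1. This is an illustrative equivalent, not the patent's specific algorithm.

```python
import numpy as np
from collections import deque

def fill_holes(binary: np.ndarray) -> np.ndarray:
    """Fill 0-valued holes enclosed by 1-valued town pixels.

    Flood-fills the background starting from every border pixel; any
    0 pixel that the fill cannot reach is surrounded by town pixels,
    i.e. a hole, and is assigned 1.
    """
    h, w = binary.shape
    outside = np.zeros((h, w), dtype=bool)
    queue = deque()
    # Seed the flood fill with all background pixels on the border.
    for i in range(h):
        for j in (0, w - 1):
            if binary[i, j] == 0 and not outside[i, j]:
                outside[i, j] = True
                queue.append((i, j))
    for j in range(w):
        for i in (0, h - 1):
            if binary[i, j] == 0 and not outside[i, j]:
                outside[i, j] = True
                queue.append((i, j))
    # 4-connected breadth-first flood fill of the outside background.
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and binary[ni, nj] == 0 and not outside[ni, nj]:
                outside[ni, nj] = True
                queue.append((ni, nj))
    filled = binary.copy()
    filled[(binary == 0) & ~outside] = 1  # unreached zeros are holes
    return filled

# A town ring with one enclosed hole (1,1) and open background in column 3.
town = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
])
result = fill_holes(town)
```

The enclosed zero becomes 1 while background connected to the border stays 0, which is exactly the behaviour needed so that lakes or vegetation inside a town are kept within the extracted extent.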
In order to solve the above technical problem, an embodiment of the present invention further provides a device for town extraction from remote sensing images based on scene and pixel information, comprising:
a grid image generation module for dividing the obtained remote sensing image to be processed into a grid and generating a plurality of grid images corresponding to the remote sensing image to be processed;
a classification image acquisition module for inputting each grid image into a pre-trained target scene classification network model to obtain a grid classification image corresponding to each grid image;
a classification image marking module for performing an edge extraction operation on the grid classification images and marking those that lie on an edge according to the extraction result;
an extraction result image acquisition module for inputting each marked grid classification image into a pre-trained target pixel classification network model to obtain a corresponding pixel classification result image, taken as the town extraction result image of the corresponding region in the town extraction result;
a first image assignment module for assigning 1 to the corresponding region of the town extraction result image for each unmarked grid image judged to be of the town category by the target scene classification network model;
a second image assignment module for assigning 0 to the corresponding region of the town extraction result image for each grid image judged to be of the non-town category by the target scene classification network model;
and a town binary image acquisition module for performing hole-filling post-processing on the town extraction result image to obtain the binary town extraction image.
Optionally, the device further comprises:
a first remote sensing image acquisition module for obtaining homologous town and non-town satellite remote sensing images;
a first cropped image acquisition module for cropping the town and non-town satellite remote sensing images to size N × N to obtain, respectively, first cropped images corresponding to the town satellite remote sensing images and second cropped images corresponding to the non-town satellite remote sensing images, where N is a positive integer and an integral multiple of 2;
a first labelled image generation module for labelling the first and second cropped images with the town-scene and non-town-scene categories respectively, generating first labelled images corresponding to the first cropped images and second labelled images corresponding to the second cropped images;
and a target classification model acquisition module for training an initial scene classification network model on the first and second labelled images to obtain the target scene classification network model.
Optionally, the device further comprises:
a second remote sensing image acquisition module for obtaining homologous town and non-town satellite remote sensing images;
a second cropped image acquisition module for cropping the town and non-town satellite remote sensing images to size N × N to obtain, respectively, first cropped images corresponding to the town satellite remote sensing images and second cropped images corresponding to the non-town satellite remote sensing images, where N is, consistent with the method above, a positive integer and an integral multiple of 2;
a second labelled image generation module for labelling the pixels in the first and second cropped images with the town and non-town categories, generating first pixel-labelled images corresponding to the first cropped images and second pixel-labelled images corresponding to the second cropped images;
and a target pixel classification model acquisition module for training an initial pixel classification network model on the town-category pixel-labelled images among the first and second pixel-labelled images to obtain the target pixel classification network model.
Optionally, the classification image marking module comprises:
a classification image marking unit for performing edge extraction on the grid classification images with an image edge detection algorithm, marking the grid images identified as edges with E and leaving non-edge grid images unmarked.
Optionally, the extraction result image acquisition module comprises:
a blank image creation unit for creating a blank image of the same size as the remote sensing image to be processed to store the town extraction result image;
a grid classification image judgment unit for performing a condition judgment on each of the grid classification images;
a first image assignment unit for, if the grid classification image is of the town category and is marked E, inputting it into the target pixel classification network model to predict its pixel classification result and assigning the pixel classification result image to the same region of the blank image;
a second image assignment unit for, if the grid classification image is of the town category and is unmarked, assigning 1 to all pixels in the same region of the blank image;
and a third image assignment unit for, if the grid classification image is of the non-town category, assigning 0 to all pixels in the same region of the blank image, the final town extraction result image being a 0-1 binary image.
Optionally, the town binary image acquisition module comprises:
a town binary image acquisition unit for post-processing the mosaicked town image with a hole-filling algorithm from morphological operations, filling in pixels that lie inside a town but have a pixel value of 0, to obtain the binary town extraction image.
Compared with the prior art, the invention has the following advantages:
The embodiment of the invention provides a method and device for town extraction from remote sensing images based on scene and pixel information. The method divides the obtained remote sensing image to be processed into a grid, generating a plurality of corresponding grid images; inputs each grid image into a pre-trained target scene classification network model to obtain a corresponding grid classification image; performs edge extraction on the grid classification images and marks those that lie on an edge according to the extraction result; inputs each marked grid classification image into a pre-trained target pixel classification network model and takes the resulting pixel classification result image as the town extraction result image of the corresponding region; assigns 1 to the corresponding region of the town extraction result image for each unmarked grid image judged to be of the town category; assigns 0 to the corresponding region for each grid image judged to be of the non-town category; and performs hole-filling post-processing on the town extraction result image to obtain the binary town extraction image. The embodiment of the invention thus first uses scene classification to determine the extent of the town distribution target area, then uses pixel classification to refine the town boundary, and finally uses a hole-filling algorithm to obtain an accurate and complete town distribution area.
Drawings
FIG. 1 is a flow chart of the steps of the method for town extraction from remote sensing images based on scene and pixel information provided by an embodiment of the invention;
FIG. 2 is a schematic structural diagram of the device for town extraction from remote sensing images based on scene and pixel information provided by an embodiment of the invention.
Detailed Description
Example one
Referring to FIG. 1, a flowchart of the steps of the method for town extraction from remote sensing images based on scene and pixel information according to an embodiment of the present invention is shown. As shown in FIG. 1, the method may specifically include the following steps:
Step 101: dividing the obtained remote sensing image to be processed into a grid to generate a plurality of grid images corresponding to the remote sensing image to be processed.
The embodiment of the invention can be applied to scenarios in which the town distribution area is to be extracted from a remote sensing image.
The remote sensing image to be processed is a satellite remote sensing image from which towns are to be extracted.
After the remote sensing image to be processed is obtained, it can be divided into a grid to generate a plurality of corresponding grid images. The image must come from the same source as the satellite imagery used to train the target scene classification network model and the target pixel classification network model. Let its size be P × Q, where P and Q are positive integers. The image is divided by a sliding window of size N × N, where N is a positive integer and an integral multiple of 2; N × N is also the size of the cropped images used to train the two models, so the grid images divided from the image to be processed must be the same size as the grid images used as training samples. Each grid image is then input into the trained CNN model, which outputs its scene category (1 for the town category, 0 for the non-town category), yielding a grid classification result image of the original remote sensing image, which is a 0-1 binary image.
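The sliding-window grid division can be sketched as follows; `tile_image` is an illustrative helper name, and the sketch assumes (as the training setup implies) that the image dimensions are exact multiples of N.

```python
import numpy as np

def tile_image(image: np.ndarray, n: int):
    """Split a P x Q image into non-overlapping n x n grid images.

    Assumes P and Q are exact multiples of n. Returns a list of
    ((row, col), tile) pairs so each grid image can later be mapped
    back to its position in the full image.
    """
    p, q = image.shape[:2]
    tiles = []
    for i in range(0, p - n + 1, n):
        for j in range(0, q - n + 1, n):
            tiles.append(((i // n, j // n), image[i:i + n, j:j + n]))
    return tiles

# A 4x4 image split with N = 2 yields four 2x2 grid images.
demo = np.arange(16).reshape(4, 4)
tiles = tile_image(demo, 2)
```

Each `(row, col)` index is what the later assignment steps use to write a grid's 0/1 decision (or its pixel classification result) back into the corresponding region of the result image.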
After the remote sensing image to be processed is subjected to grid division to generate a plurality of grid images corresponding to the remote sensing image to be processed, step 102 is executed.
Step 102: and inputting each grid image into a pre-trained target scene classification network model to obtain a grid classification image corresponding to each grid image.
The target scene classification network model may be a CNN model that classifies each grid image into the town or non-town category; its training process is described in detail in the following specific implementation.
In a specific implementation manner of the present invention, before the step 102, the method may further include:
step A1: obtaining homologous town satellite remote sensing images and non-town satellite remote sensing images;
step A2: cutting the town satellite remote sensing image and the non-town satellite remote sensing image according to the size of NxN to respectively obtain a first cut image corresponding to the town satellite remote sensing image and a second cut image corresponding to the non-town satellite remote sensing image; wherein N is a positive integer and is an integral multiple of 2;
step A3: labeling the first cut image and the second cut image respectively according to the town scene category and the non-town scene category, and respectively generating a first labeled image corresponding to the first cut image and a second labeled image corresponding to the second cut image;
step A4: training an initial scene classification network model based on the first annotation image and the second annotation image to obtain the target scene classification network model.
In the embodiment of the invention, a training sample set of town-scene and non-town-scene images is first established to train a convolutional neural network (CNN) model. Specifically, satellite remote sensing imagery of a given source is selected, and images of different spatial distribution areas are downloaded to ensure that the sample set is representative. The images are cropped to size N × N (giving the first and second cropped images), where N is a positive integer; each cropped image block is then labelled, with town-scene blocks labelled 1 (the first labelled images) and non-town-scene blocks labelled 0 (the second labelled images), yielding the training sample set of town-scene and non-town-scene images.
After the training sample set is obtained, a pre-trained CNN model can be selected and its output layer modified for the two-class town/non-town classification task; the network parameters are then fine-tuned on the constructed sample set, giving the trained CNN model, i.e. the target scene classification network model.
At this point, after the plurality of grid images corresponding to the remote sensing image to be processed are generated, each grid image can be input into the target scene classification network model, which outputs its scene category (1 for the town category, 0 for the non-town category), yielding the grid classification result image of the remote sensing image to be processed, a 0-1 binary image.
After each grid image is input to a pre-trained target scene classification network model to obtain a grid classification image corresponding to each grid image, step 103 is performed.
Step 103: and performing edge extraction operation on the classified images of the grids, and marking the classified images of the grids which are edges according to the extraction result.
After the grid classified images output by the target scene classification network model for each grid image are obtained, the town grid classified images belonging to the town category among them can be identified. Specifically, edge extraction may be performed on the grid classification result image using an image processing edge detection algorithm; grids identified as edges are marked as E, and other grids that are not edges are left unmarked.
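The patent only requires "an image processing edge detection algorithm" and does not name one, so the sketch below stands in with the simplest boundary test: a town cell counts as an edge when any of its 4-neighbours is non-town. It assumes the grid classification result is a small 0-1 numpy array with one entry per grid cell.

```python
import numpy as np

def mark_edge_grids(grid):
    """Return a boolean mask: True ('E') where a town cell (1) borders a non-town cell (0).

    The area outside the image is treated as non-town, so town cells on the
    array border are always marked.
    """
    padded = np.pad(grid, 1, constant_values=0)
    core = padded[1:-1, 1:-1]
    neighbours_min = np.minimum.reduce([
        padded[:-2, 1:-1], padded[2:, 1:-1],   # up, down
        padded[1:-1, :-2], padded[1:-1, 2:],   # left, right
    ])
    return (core == 1) & (neighbours_min == 0)
```

Interior town cells (all four neighbours town) stay unmarked and can later be filled with 1 directly, which is exactly what lets the method skip pixel-level prediction for them.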
After the town mesh classified image is acquired, step 104 is performed.
Step 104: and inputting the marked grid classified images into a pre-trained target pixel classified network model to obtain pixel classification result images corresponding to the marked grid classified images, and taking the pixel classification result images as town extraction result images of corresponding areas in the town extraction results.
Step 105: assigning a value of 1 to a corresponding region in the town extraction result image for the unlabeled grid image judged as the town category by the target scene classification network model;
step 106: assigning a value of 0 to a corresponding area in the town extraction result image aiming at the grid image judged as a non-town category by the target scene classification network model;
the target pixel classification network model may be a model for classifying each pixel in the town mesh classification image, and in this example, the target pixel classification network model may be an FCN network model. The training process for the target pixel classification network model can be described in detail in conjunction with the following specific implementation manner.
In another specific implementation manner of the present invention, before the step 104, the method may further include:
step B1: obtaining homologous town satellite remote sensing images and non-town satellite remote sensing images;
and step B2: cutting the town satellite remote sensing image and the non-town satellite remote sensing image according to the size of NxN to respectively obtain a first cut image corresponding to the town satellite remote sensing image and a second cut image corresponding to the non-town satellite remote sensing image; wherein N is a positive integer and is an integral multiple of 2;
and step B3: labeling the town type and the non-town type of each pixel in the first cut image and the second cut image to respectively generate a first pixel labeling image corresponding to the first cut image and a second pixel labeling image corresponding to the second cut image;
and step B4: and training an initial pixel classification network model based on the pixel annotation images of the town categories in the first pixel annotation image and the second pixel annotation image to obtain the target pixel classification network model.
In the embodiment of the invention, a town pixel classification truth-value image set can be established in order to train a fully convolutional network (FCN) model. Satellite remote sensing images homologous with the town and non-town scene images are selected, and images in different spatial distribution areas are downloaded to ensure the representativeness of the sample set. The images are cut according to the size of NxN, wherein N is a positive integer and an integral multiple of 2. Each pixel in a cut image block is labeled with a category: a town pixel is labeled 1 and a non-town pixel is labeled 0, yielding a pixel category label image with the same size as the image block; the pixel category label images together with the original image blocks form the town pixel classification truth-value image set.
An FCN network structure is then designed that requires an input image of size NxN (N a positive integer) and produces an output image of the same size as the input. Parameter training is performed on the FCN using the constructed town pixel classification truth-value image set to obtain the trained FCN network model.
After the town grid classified image is obtained, it can be input into the target pixel classification network model to predict its pixel classification result, and the pixel classification result image is used as the extraction result of the corresponding region in the town extraction result. A grid image judged as the town category by the CNN network model but not marked has its corresponding region in the town extraction result directly assigned 1, and a grid image judged as the non-town category has its corresponding region directly assigned 0. The specific implementation process is as follows: a blank image I of the same size PxQ as the remote sensing image to be subjected to town extraction is established to store the town extraction result image, wherein P and Q are both positive integers. Condition judgment is then carried out on each grid in the grid classification result image: if the grid is of the town category and marked as E, it is input into the FCN model to predict its pixel classification result, and the pixel classification result image is assigned to the same area in the image I; if the grid is of the town category but not marked, all pixels of the same area in the image I are directly assigned 1; if the grid is of the non-town category, all pixels of the same area in the image I are directly assigned 0. The result finally obtained is a preliminary town extraction result, which is a 0-1 binary image.
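The per-grid fusion just described can be sketched as follows, under these assumptions: `grid_class` holds the CNN scene label (1 town / 0 non-town) per cell, `edge_mask` marks the cells labeled E, `n` is the tile size, `image_tiles[r][c]` is the source tile for each cell, and `predict_pixels` is a hypothetical stand-in for the trained FCN (it must return an n x n 0-1 array for a tile).

```python
import numpy as np

def fuse_results(grid_class, edge_mask, n, predict_pixels, image_tiles):
    """Build the preliminary town extraction result (a 0-1 binary image)."""
    rows, cols = grid_class.shape
    out = np.zeros((rows * n, cols * n), dtype=np.uint8)  # blank image I, size P x Q
    for r in range(rows):
        for c in range(cols):
            region = out[r * n:(r + 1) * n, c * n:(c + 1) * n]
            if grid_class[r, c] == 1 and edge_mask[r, c]:
                # edge town grid: per-pixel prediction by the FCN
                region[:] = predict_pixels(image_tiles[r][c])
            elif grid_class[r, c] == 1:
                region[:] = 1   # interior town grid: all pixels set to 1
            else:
                region[:] = 0   # non-town grid: all pixels set to 0
    return out
```

Only the edge grids pay the cost of pixel-level inference; interior and non-town grids are filled by constant assignment, which is the efficiency argument behind combining scene and pixel classification.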
After assigning values to the corresponding regions of the resulting town extraction result image, step 107 is performed.
Step 107: and carrying out hole filling post-processing on the town extraction result image to obtain a town extraction binary image.
Further, hole filling post-processing is performed on the town extraction result image to obtain the final town extraction binary image. Specifically, a hole filling algorithm from morphological operations can be used to post-process the preliminary town extraction result, filling pixels that lie inside the town region but have a pixel value of 0, thereby obtaining the final town extraction binary image.
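A minimal, self-contained sketch of this hole-filling step: flood-fill the background from the image border, then relabel any background pixel not reached by the flood fill (i.e. enclosed by town pixels) as town. This reimplements the morphological operation with a plain BFS for illustration.

```python
from collections import deque

import numpy as np

def fill_town_holes(binary_town_image):
    """Set to 1 every 0-valued pixel that is fully enclosed by 1-valued (town) pixels."""
    b = binary_town_image.astype(np.uint8)
    h, w = b.shape
    outside = np.zeros((h, w), dtype=bool)
    q = deque()
    # seed the flood fill with every background pixel on the image border
    for r in range(h):
        for c in (0, w - 1):
            if b[r, c] == 0 and not outside[r, c]:
                outside[r, c] = True
                q.append((r, c))
    for c in range(w):
        for r in (0, h - 1):
            if b[r, c] == 0 and not outside[r, c]:
                outside[r, c] = True
                q.append((r, c))
    # 4-connected flood fill of border-connected background
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and b[rr, cc] == 0 and not outside[rr, cc]:
                outside[rr, cc] = True
                q.append((rr, cc))
    filled = b.copy()
    filled[(b == 0) & ~outside] = 1   # enclosed background pixels become town
    return filled
```

In a production pipeline one would more likely call a library routine such as `scipy.ndimage.binary_fill_holes`, which implements the same morphological operation.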
The remote sensing image town extraction method based on scene and pixel information provided by the embodiment of the invention generates a plurality of grid images corresponding to the remote sensing image to be processed by grid division of the obtained remote sensing image to be processed; inputting each grid image into a pre-trained target scene classification network model to obtain a grid classification image corresponding to each grid image; performing edge extraction operation on the classified grid image, and marking the classified grid image which is the edge according to an extraction result; inputting the marked grid classified image into a pre-trained target pixel classified network model to obtain a pixel classification result image corresponding to the marked grid classified image, and taking the pixel classification result image as a town extraction result image of a corresponding region in a town extraction result; assigning a value of 1 to a corresponding area in the town extraction result image aiming at the unmarked grid image judged as the town category by the target scene classification network model; assigning a value of 0 to a corresponding region in the town extraction result image for the grid image which is judged as a non-town category by the target scene classification network model; and carrying out hole filling post-processing on the town extraction result image to obtain a town extraction binary image. The embodiment of the invention firstly utilizes a scene classification method to determine the range of the town distribution target area, then utilizes the pixel classification to subdivide the town boundary, and finally utilizes a hole filling algorithm to obtain the accurate and complete town distribution area.
Example two
Referring to fig. 2, a schematic structural diagram of a remote sensing image town extraction device based on scene and pixel information according to an embodiment of the present invention is shown, and as shown in fig. 2, the remote sensing image town extraction device based on scene and pixel information may specifically include the following modules:
a grid image generation module 210, configured to perform grid division on the obtained remote sensing image to be processed, and generate a plurality of grid images corresponding to the remote sensing image to be processed;
a classified image obtaining module 220, configured to input each grid image into a pre-trained target scene classification network model, so as to obtain a grid classified image corresponding to each grid image;
a classified image labeling module 230, configured to perform an edge extraction operation on the classified grid image, and label the classified grid image as an edge according to an extraction result;
an extraction result image obtaining module 240, configured to input the labeled grid classified image into a pre-trained target pixel classification network model, so as to obtain a pixel classification result image corresponding to the labeled grid classified image, and use the pixel classification result image as a town extraction result image of a corresponding region in a town extraction result;
a first image assignment module 250, configured to assign a value of 1 to a corresponding region in the town extraction result image, for an unlabeled mesh image determined as a town category by the target scene classification network model;
a second image assignment module 260, configured to assign a value of 0 to a corresponding area in the town extraction result image, for the mesh image determined as a non-town category by the target scene classification network model;
and a town binary image obtaining module 270, configured to perform hole filling post-processing on the town extraction result image to obtain a town extraction binary image.
Optionally, the device further comprises:
the first remote sensing image acquisition module is used for acquiring a homologous town satellite remote sensing image and a non-town satellite remote sensing image;
the first cutting image acquisition module is used for cutting the town satellite remote sensing image and the non-town satellite remote sensing image according to the size of NxN to respectively obtain a first cutting image corresponding to the town satellite remote sensing image and a second cutting image corresponding to the non-town satellite remote sensing image; wherein N is a positive integer and is an integral multiple of 2;
the first labeling image generation module is used for labeling the first cropping image and the second cropping image respectively according to town scene categories and non-town scene categories, and respectively generating a first labeling image corresponding to the first cropping image and a second labeling image corresponding to the second cropping image;
and the target classification model acquisition module is used for training an initial scene classification network model based on the first annotation image and the second annotation image to obtain the target scene classification network model.
Optionally, the device further comprises:
the second remote sensing image acquisition module is used for acquiring a homologous town satellite remote sensing image and a non-town satellite remote sensing image;
the second cutting image acquisition module is used for cutting the town satellite remote sensing image and the non-town satellite remote sensing image according to the size of NxN to respectively obtain a first cut image corresponding to the town satellite remote sensing image and a second cut image corresponding to the non-town satellite remote sensing image; wherein N is a positive integer and is an integral multiple of 2;
the second labeling image generating module is used for labeling the pixels in the first cut image and the second cut image with the town category and the non-town category, and respectively generating a first pixel labeling image corresponding to the first cut image and a second pixel labeling image corresponding to the second cut image;
and the target pixel classification model acquisition module is used for training an initial pixel classification network model based on the pixel annotation images of the town categories in the first pixel annotation image and the second pixel annotation image to obtain the target pixel classification network model.
Optionally, the classified image labeling module includes:
and the classified image marking unit is used for carrying out edge extraction on the classified images of the grids based on an image edge detection algorithm, marking the grid images identified as edges as E, and not marking the grid images which are not edges.
Optionally, the extraction result image obtaining module includes:
the blank image establishing unit is used for establishing a blank image with the same size as the remote sensing image to be processed so as to store the town extraction result image;
a grid classified image judging unit, configured to perform condition judgment on each of the grid classified images;
a first image assignment unit, configured to, if the classified grid image is of a town category and is marked as E, input the classified grid image into the target pixel classification network model to predict a pixel classification result thereof, and assign the pixel classification result image to the same region in the blank image;
a second image assignment unit, configured to assign all pixels in the same area in the blank image to 1 if the classified image of the mesh is an image of a town category and is not marked;
and the third image assignment unit is used for assigning all pixels in the same area in the blank image as 0 if the classified grid image is of a non-town type, and finally, the town extraction result image is a 0-1 binary image.
Optionally, the town binary image obtaining module includes:
and the town binary image acquisition unit is used for post-processing the town extraction result image by utilizing a hole filling algorithm in morphological operation, filling pixels which are located inside the town and have a pixel value of 0, and obtaining the town extraction binary image.
The remote sensing image town extraction device based on scene and pixel information provided by the embodiment of the invention generates a plurality of grid images corresponding to the remote sensing image to be processed by grid division of the obtained remote sensing image to be processed; inputting each grid image into a pre-trained target scene classification network model to obtain a grid classification image corresponding to each grid image; performing edge extraction operation on the classified grid image, and marking the classified grid image which is the edge according to an extraction result; inputting the marked grid classified image into a pre-trained target pixel classified network model to obtain a pixel classification result image corresponding to the marked grid classified image, and taking the pixel classification result image as a town extraction result image of a corresponding region in a town extraction result; for the grid image judged as the town category by the target scene classification network model and not marked, assigning a value of 1 to a corresponding area in the town extraction result image; assigning a value of 0 to a corresponding area in the town extraction result image aiming at the grid image judged as a non-town category by the target scene classification network model; and carrying out hole filling post-processing on the town extraction result image to obtain a town extraction binary image. The embodiment of the invention firstly utilizes a scene classification method to determine the range of the town distribution target area, then utilizes the pixel classification to subdivide the town boundary, and finally utilizes a hole filling algorithm to obtain the accurate and complete town distribution area.
It should be noted that the above-described embodiments may enable those skilled in the art to more fully understand the present invention, but do not limit the present invention in any way. Thus, it will be appreciated by those skilled in the art that the invention may be modified and equivalents may be substituted; all technical solutions and modifications thereof which do not depart from the spirit and technical essence of the present invention should be covered by the scope of the present patent.
Those skilled in the art will appreciate that those matters not described in detail in the present specification are well known in the art.

Claims (8)

1. A remote sensing image town extraction method based on scene and pixel information is characterized by comprising the following steps:
carrying out grid division on the obtained remote sensing image to be processed to generate a plurality of grid images corresponding to the remote sensing image to be processed;
inputting each grid image into a pre-trained target scene classification network model to obtain a grid classification image corresponding to each grid image;
performing edge extraction operation on the classified grid image, and marking the classified grid image which is the edge according to an extraction result;
the method comprises the following steps: performing edge extraction on the classified grid image based on an image edge detection algorithm, marking the grid image identified as the edge as E, and not marking the grid image not identified as the edge;
inputting the marked grid classified image into a pre-trained target pixel classified network model to obtain a pixel classification result image corresponding to the marked grid classified image, and taking the pixel classification result image as a town extraction result image of a corresponding region in a town extraction result;
the method comprises the following steps: establishing a blank image with the same size as the remote sensing image to be processed so as to store the town extraction result image;
performing condition judgment on each classified image of the grids in the classified images of the grids;
if the classified image of the grid is a town category and is marked as E, inputting the classified image of the grid into the target pixel classification network model to predict a pixel classification result of the target pixel classification network model, and assigning the pixel classification result image to the same area in the blank image;
if the classified image of the grid is an image of a town category and is not marked, all pixels in the same area in the blank image are assigned to be 1;
if the classified grid image is of a non-town type, all pixels in the same area in the blank image are assigned to be 0, and finally the town extraction result image is a 0-1 binary image;
assigning a value of 1 to a corresponding area in the town extraction result image aiming at the unmarked grid image judged as the town category by the target scene classification network model;
assigning a value of 0 to a corresponding area in the town extraction result image aiming at the grid image judged as a non-town category by the target scene classification network model;
and carrying out hole filling post-processing on the town extraction result image to obtain a town extraction binary image.
2. The method of claim 1, further comprising, prior to said inputting each of said mesh images to a pre-trained target scene classification network model to obtain a mesh classification image corresponding to each of said mesh images:
obtaining a homologous town satellite remote sensing image and a non-town satellite remote sensing image;
cutting the town satellite remote sensing image and the non-town satellite remote sensing image according to the size of NxN to respectively obtain a first cut image corresponding to the town satellite remote sensing image and a second cut image corresponding to the non-town satellite remote sensing image; wherein N is a positive integer and is an integral multiple of 2;
labeling the first cut image and the second cut image respectively according to the town scene category and the non-town scene category, and respectively generating a first labeled image corresponding to the first cut image and a second labeled image corresponding to the second cut image;
training an initial scene classification network model based on the first annotation image and the second annotation image to obtain the target scene classification network model.
3. The method according to claim 1, wherein before inputting the labeled classified image of the mesh into a pre-trained target pixel classification network model to obtain a pixel classification result image corresponding to the labeled classified image of the mesh, and using the pixel classification result image as an extraction result of a corresponding region in the town extraction result, the method further comprises:
obtaining homologous town satellite remote sensing images and non-town satellite remote sensing images;
cutting the town satellite remote sensing image and the non-town satellite remote sensing image according to the size of NxN to respectively obtain a first cut image corresponding to the town satellite remote sensing image and a second cut image corresponding to the non-town satellite remote sensing image; wherein N is a positive integer and is an integral multiple of 2;
labeling the town type and the non-town type of each pixel in the first cut image and the second cut image to respectively generate a first pixel labeling image corresponding to the first cut image and a second pixel labeling image corresponding to the second cut image;
and training an initial pixel classification network model based on the pixel annotation images of the town categories in the first pixel annotation image and the second pixel annotation image to obtain the target pixel classification network model.
4. The method of claim 1, wherein the performing hole filling post-processing on the town extraction result image to obtain a town extraction binary image comprises:
and performing post-processing on the town extraction result image by using a hole filling algorithm in morphological operation, and filling pixels which are located inside the town and have pixel values of 0 to obtain the town extraction binary image.
5. A remote sensing image town extraction device based on scene and pixel information is characterized by comprising:
the grid image generation module is used for carrying out grid division on the acquired remote sensing image to be processed and generating a plurality of grid images corresponding to the remote sensing image to be processed;
a classified image obtaining module, configured to input each grid image to a pre-trained target scene classification network model, so as to obtain a grid classified image corresponding to each grid image;
the classified image marking module is used for executing edge extraction operation on the grid classified image and marking the grid classified image which is the edge according to an extraction result;
the classified image labeling module includes:
a classified image marking unit, configured to perform edge extraction on the classified grid image based on an image edge detection algorithm, mark a grid image identified as an edge as E, and not mark a grid image other than the edge;
the extraction result image acquisition module is used for inputting the marked grid classified images into a pre-trained target pixel classified network model so as to acquire pixel classified result images corresponding to the marked grid classified images and taking the pixel classified result images as town extraction result images of corresponding areas in the town extraction results;
the extraction result image acquisition module includes:
the blank image establishing unit is used for establishing a blank image with the same size as the remote sensing image to be processed so as to store the town extraction result image;
a grid classified image judging unit, configured to perform condition judgment on each grid classified image in the grid classified images;
a first image assignment unit, configured to, if the classified grid image is of a town category and is labeled as E, input the classified grid image to the target pixel classification network model to predict a pixel classification result of the classified grid image, and assign the pixel classification result image to a same area in the blank image;
a second image assignment unit, configured to assign all pixels in the same area in the blank image to 1 if the classified grid image is an image of a town category and is not marked;
a third image assignment unit, configured to assign all pixels in the same region in the blank image to 0 if the classified grid image is of a non-town category, and finally obtain a town extraction result image, where the town extraction result image is a 0-1 binary image;
the first image assignment module is used for assigning a value of 1 to a corresponding area in the town extraction result image aiming at the unmarked grid image judged as the town category by the target scene classification network model;
the second image assignment module is used for assigning 0 to a corresponding area in the town extraction result image aiming at the grid image which is judged to be in a non-town category by the target scene classification network model;
and the town binary image acquisition module is used for carrying out hole filling post-processing on the town extraction result image to obtain a town extraction binary image.
6. The apparatus of claim 5, further comprising:
the first remote sensing image acquisition module is used for acquiring a homologous town satellite remote sensing image and a non-town satellite remote sensing image;
the first cutting image acquisition module is used for cutting the town satellite remote sensing image and the non-town satellite remote sensing image according to the size of NxN to respectively obtain a first cutting image corresponding to the town satellite remote sensing image and a second cutting image corresponding to the non-town satellite remote sensing image; wherein N is a positive integer and is an integral multiple of 2;
the first labeling image generating module is used for labeling the first clipping image and the second clipping image respectively according to the town scene category and the non-town scene category, and respectively generating a first labeling image corresponding to the first clipping image and a second labeling image corresponding to the second clipping image;
and the target classification model acquisition module is used for training an initial scene classification network model based on the first annotation image and the second annotation image to obtain the target scene classification network model.
7. The apparatus of claim 5, further comprising:
the second remote sensing image acquisition module is used for acquiring a homologous town satellite remote sensing image and a non-town satellite remote sensing image;
the second cutting image acquisition module is used for cutting the town satellite remote sensing image and the non-town satellite remote sensing image according to the size of NxN to respectively obtain a first cutting image corresponding to the town satellite remote sensing image and a second cutting image corresponding to the non-town satellite remote sensing image; wherein N is a positive integer and is an integral multiple of 2;
the second labeling image generating module is used for labeling each pixel in the first clipping image and the second clipping image with the town category and the non-town category, and respectively generating a first pixel labeling image corresponding to the first clipping image and a second pixel labeling image corresponding to the second clipping image;
and the target pixel classification model acquisition module is used for training an initial pixel classification network model based on the pixel annotation images of the town categories in the first pixel annotation image and the second pixel annotation image to obtain the target pixel classification network model.
8. The apparatus of claim 5, wherein the town binary map obtaining module comprises:
and the town binary image acquisition unit is used for performing post-processing on the town extraction result image by utilizing a hole filling algorithm in morphological operation, filling pixels which are located inside the town and have pixel values of 0, and obtaining the town extraction binary image.
CN202011024302.4A 2020-09-25 2020-09-25 Remote sensing image town extraction method and device based on scene and pixel information Active CN112215096B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011024302.4A CN112215096B (en) 2020-09-25 2020-09-25 Remote sensing image town extraction method and device based on scene and pixel information


Publications (2)

Publication Number Publication Date
CN112215096A CN112215096A (en) 2021-01-12
CN112215096B true CN112215096B (en) 2023-04-07

Family

ID=74051150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011024302.4A Active CN112215096B (en) 2020-09-25 2020-09-25 Remote sensing image town extraction method and device based on scene and pixel information

Country Status (1)

Country Link
CN (1) CN112215096B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766811A (en) * 2018-12-31 2019-05-17 复旦大学 The end-to-end detection and recognition methods of sea ship in a kind of satellite-borne SAR image
CN109858450A (en) * 2019-02-12 2019-06-07 中国科学院遥感与数字地球研究所 Ten meter level spatial resolution remote sensing image cities and towns extracting methods of one kind and system
CN110443143A (en) * 2019-07-09 2019-11-12 武汉科技大学 The remote sensing images scene classification method of multiple-limb convolutional neural networks fusion
CN110866494A (en) * 2019-11-14 2020-03-06 三亚中科遥感研究所 Optical remote sensing image-based town group extraction method and system


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Remote Sensing Image Scene Classification Using CNN-CapsNet; Wei Zhang et al.; Remote Sensing; 2019-02-28; full text *
SiftingGAN: Generating and Sifting Labeled Samples to Improve the Remote Sensing Image Scene Classification Baseline In Vitro; Dongao Ma et al.; IEEE Geoscience and Remote Sensing Letters; 2019-07-31; Vol. 16, No. 7; full text *
A Convolutional Neural Network Model for Land Cover Classification; Shi Lulu et al.; Remote Sensing Information; 2019-06-30; Vol. 34, No. 3; full text *

Also Published As

Publication number Publication date
CN112215096A (en) 2021-01-12

Similar Documents

Publication Publication Date Title
Yang et al. Building extraction at scale using convolutional neural network: Mapping of the United States
CN111626947B (en) Map vectorization sample enhancement method and system based on generation of countermeasure network
CN110263717B (en) Method for determining land utilization category of street view image
CN113449594B (en) Multilayer network combined remote sensing image ground semantic segmentation and area calculation method
CN109858450B (en) Ten-meter-level spatial resolution remote sensing image town extraction method and system
CN108363951B (en) Automatic acquisition method of deep learning sample library corresponding to remote sensing image land type identification
CN111028255B (en) Farmland area pre-screening method and device based on priori information and deep learning
CN112232328A (en) Remote sensing image building area extraction method and device based on convolutional neural network
CN110334719B (en) Method and system for extracting building image in remote sensing image
CN111598101A (en) Urban area intelligent extraction method, system and equipment based on remote sensing image scene segmentation
CN110781882A (en) License plate positioning and identifying method based on YOLO model
CN112836614B (en) High-resolution remote sensing image classification method based on residual error network and transfer learning
CN111666909A (en) Suspected contaminated site space identification method based on object-oriented and deep learning
CN112347970A (en) Remote sensing image ground object identification method based on graph convolution neural network
CN116343053B (en) Automatic solid waste extraction method based on fusion of optical remote sensing image and SAR remote sensing image
CN114898089B (en) Functional area extraction and classification method fusing high-resolution images and POI data
CN114943902A (en) Urban vegetation unmanned aerial vehicle remote sensing classification method based on multi-scale feature perception network
CN115424059A (en) Remote sensing land use classification method based on pixel level comparison learning
CN116597270A (en) Road damage target detection method based on attention mechanism integrated learning network
CN113610024B (en) Multi-strategy deep learning remote sensing image small target detection method
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN113628180A (en) Semantic segmentation network-based remote sensing building detection method and system
CN117078925A (en) Accurate calculation method for annual building waste output based on RDSA-DeepLabV3+ network
CN111414855B (en) Telegraph pole sign target detection and identification method based on end-to-end regression model
CN113378642A (en) Method for detecting illegal occupation buildings in rural areas

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant