CN114092423A - Intelligent extraction method for remote sensing image information label

Intelligent extraction method for remote sensing image information label

Info

Publication number
CN114092423A
Authority
CN
China
Prior art keywords
image
feature
target
module
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111334182.2A
Other languages
Chinese (zh)
Inventor
王永安
李峰
楚博策
王士成
陈金勇
帅通
彭会湘
王梅瑞
于君娜
韦二龙
杨广厅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 54 Research Institute
Original Assignee
CETC 54 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 54 Research Institute filed Critical CETC 54 Research Institute
Priority to CN202111334182.2A priority Critical patent/CN114092423A/en
Publication of CN114092423A publication Critical patent/CN114092423A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/0002 Image analysis; inspection of images, e.g. flaw detection
    • G06T 7/10 Segmentation; edge detection
    • G06F 18/214 Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/253 Fusion techniques of extracted features
    • G06N 3/045 Neural networks; combinations of networks
    • G06N 3/08 Neural networks; learning methods
    • G06T 2207/10032 Satellite or aerial image; remote sensing
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
    • G06T 2207/20021 Dividing image into blocks, subimages or windows
    • G06T 2207/20132 Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of remote sensing information services and provides an intelligent extraction method for remote sensing image information labels. The method uses artificial intelligence to understand remote sensing images and extracts labels from multiple perspectives, including ground feature classification, target detection and spatial retrieval, thereby accurately describing the content and intended use of the images and providing basic data support to help users quickly obtain remote sensing imagery of interest. The invention provides a multi-scale weight segmentation model (MWSNet) suited to ground feature classification and a CormNet model suited to image target detection, which rapidly extract ground feature and target information from remote sensing images; typical landmark information is rapidly extracted through spatial retrieval. A multi-level label system of ground features, targets and landmarks is thus constructed for remote sensing image data, enriching the semantic features of the images beyond their basic attributes, accurately describing image content and purpose, and providing basic data support for intelligent remote sensing data services.

Description

Intelligent extraction method for remote sensing image information label
Technical Field
To address the diverse requirements of users in different fields for remote sensing data products, the invention extracts multi-level label information, such as ground features, targets and landmarks, from remote sensing data products of unknown content before distribution, constructs an image label system, and stores and manages the labels, thereby accurately describing remote sensing data content and supporting users in quickly obtaining remote sensing image information of interest.
Background
The application of remote sensing image data is expanding from traditional industries, such as geological disaster management, mineral resources, urban construction, the marine sector and meteorology, to emerging industries, such as precision agriculture, environmental assessment and digital cities, and remote sensing technology and its image data products play an increasingly important role in production and daily life. However, during image application, most current users can only search for and acquire information such as geographic location and administrative division; it is often difficult to obtain targeted image products that match specific requirements on image content, and users' diverse data requirements cannot be met.
In recent years, with the rapid development of image processing technology in computer vision, pattern recognition and related fields, image processing methods have been deeply integrated with efficient, high-performance deep learning models, opening up more possibilities for automatically acquiring content information from remote sensing images. At the same time, web-based spatial retrieval technology makes it possible to exploit geospatial big data and provide multi-source, multi-modal geospatial information during image content acquisition.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an intelligent extraction method for remote sensing image information labels that avoids the problems in the background art. For massive image data of unknown content, fast, high-precision ground feature classification and target detection models are provided under the Python language and the PyTorch deep learning framework, and an automatic extraction technique for multi-level information labels, such as ground features, targets and landmarks, is realized, so that image content is described automatically from multiple angles and at multiple levels.
The technical scheme adopted by the invention is as follows:
an intelligent extraction method for remote sensing image information labels comprises the following processes:
Construct an MWSNet multi-scale weight segmentation model and train it on ground feature categories. Perform band selection and numerical normalization on the original image, crop it into block images of regular size, feed the blocks into the trained MWSNet model for ground feature classification, and merge the classification results to obtain the ground feature category and proportion label information of the original image, which is stored in an image label database.
Construct a CormNet target detection model and train it on target categories. Perform band selection and numerical normalization on the original image, crop it simultaneously at two different pixel sizes into block images of regular size, feed both sets of blocks into the trained CormNet model for target detection, and merge the detection results: small targets are retained from the smaller-pixel blocks and large targets from the larger-pixel blocks, synthesizing the final image target detection result and yielding target category, coordinate and count label information, which is stored in the image label database. The small-target and large-target sizes are configurable.
Based on the geographic range of the original image, POI information within the image range is extracted by spatial retrieval and cleaned by removing errors and duplicates, and the valid landmark label information extracted is stored in the image label database.
The MWSNet multi-scale weight segmentation model comprises a feature learning module, an up-sampling module and a multi-scale weight connection module.
The feature learning module comprises four groups of sequentially connected feature extraction modules. Each module extracts image features by stacking ESP structures and down-samples the feature map with an adaptive pooling structure; the multi-scale image features extracted by the four groups are output separately to the up-sampling module.
The up-sampling module directly up-samples the multi-scale feature maps obtained from the different feature extraction modules to 1/2 of the input image size via deconvolution, and outputs the up-sampled maps to the multi-scale weight connection module.
After receiving the up-sampled multi-scale feature maps, the multi-scale weight connection module assigns a different weight to each corresponding ground feature category in each scale's feature map, according to the scale differences among ground features, to obtain combined per-category features; the final classification is then computed from the concatenated features of all feature maps.
The CormNet target detection model comprises a feature extraction module, a pooling module and a target box positioning module.
The feature extraction module is based on a DLA backbone network improved with a multi-level feature pyramid structure. To handle detection of multi-scale targets, the multi-level feature pyramid structure is added to the DLA backbone, and multi-level, multi-scale depth features are extracted from the input block image as the feature basis of the detection task; key point offsets and the target box width and height are obtained by convolution.
The pooling module comprises a cascade corner pooling module and a center pooling module. From the features output by the feature extraction module, the center pooling module determines the target center position from the maxima of the key point feature responses in the horizontal and vertical directions, and the cascade corner pooling module determines the target corners by combining the feature maxima along the target interior and boundary directions; finally, key point heatmaps containing the center and corner information are generated.
The target box positioning module determines the final target position by matching the corner and center heatmaps with the key point offsets and the target box width and height obtained by convolving the depth features output by the feature extraction module.
In the improved DLA backbone, the features of different scales extracted by the DLA backbone are taken as the original input to the multi-level feature pyramid structure; within the pyramid, shallow and deep image features are fused by the feature fusion module (FFM), refined by thinned U-shape modules (TUM), and finally the scale-wise feature aggregation module (SFAM) aggregates the multi-level, multi-scale features generated by the TUMs to construct the multi-level feature pyramid features.
Compared with the prior art, the invention has the following advantages:
1) The invention provides an automatic multi-level label extraction method for remote sensing image data of unknown content, enriching the semantic description of images.
2) A multi-scale weight segmentation model and a CormNet model suited to remote sensing image classification and detection are provided, improving automatic label extraction efficiency with no loss of precision.
Drawings
FIG. 1 is a block diagram of a method implementation of the present invention.
FIG. 2 is a diagram of a MWSNet multi-scale weight segmentation model architecture.
Fig. 3 is a structural diagram of a feature extraction module of the MWSNet of the present invention.
FIG. 4 is a diagram of the CormNet target detection model according to the present invention.
Fig. 5 is a diagram of the structure of the improved DLA backbone network of the present invention.
FIG. 6 is a diagram illustrating an image according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of image classification results according to the present invention.
FIG. 8 is a diagram illustrating an exemplary image target detection result according to the present invention.
FIG. 9 is an exemplary diagram of an image search range according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings and detailed description, in order to facilitate the understanding and implementation of the invention by those skilled in the art.
An intelligent extraction method for remote sensing image information labels is shown in FIG. 1 and comprises the following processes:
Construct an MWSNet multi-scale weight segmentation model and train it on ground feature categories. Perform band selection and numerical normalization on the original image, crop it into block images of regular size, feed the blocks into the trained MWSNet model for ground feature classification, and merge the classification results to obtain the ground feature category and proportion label information of the original image, which is stored in an image label database.
Construct a CormNet target detection model and train it on target categories. Perform band selection and numerical normalization on the original image, crop it simultaneously at two different pixel sizes into block images of regular size, feed both sets of blocks into the trained CormNet model for target detection, and merge the detection results: small targets are retained from the smaller-pixel blocks and large targets from the larger-pixel blocks, synthesizing the final image target detection result and yielding target category, coordinate and count label information, which is stored in the image label database. The small-target and large-target sizes are configurable.
Based on the geographic range of the original image, POI information within the image range is extracted by spatial retrieval and cleaned by removing errors and duplicates, and the valid landmark label information extracted is stored in the image label database.
As shown in FIG. 2, the MWSNet multi-scale weight segmentation model introduces the ESP convolution structure from the ESPNet network to build a fast feature extraction structure, applies deconvolution up-sampling separately to features of different scales, and proposes a multi-scale weight feature combination mechanism that assigns per-category weights to the up-sampled segmentation features of each scale; the final segmentation model parameters are trained from the combined features. The current model parameters were obtained by training on a GF-1 image dataset, and verification shows an accuracy above 92.55% for ground features such as woodland, grassland, cultivated land, built-up areas, bare land and water bodies. The applicability of the model parameters to other imagery and ground feature types can be adjusted through the training dataset.
The MWSNet multi-scale weight segmentation model specifically comprises a feature learning module, an up-sampling module and a multi-scale weight connection module;
as shown in fig. 3, the feature learning module includes four sets of Feature Extraction Modules (FEM) connected in sequence, and the feature extraction modules stack and extract image features through an ESP structure, and perform downsampling on a feature map by using an adaptive Pooling structure in a posing operation, and connect a connection layer (concat) with a low-level feature extracted by convolution (conv) to form an output feature. Finally, the multi-scale image features extracted by the four groups of feature extraction modules are respectively output to an upper sampling module,
the up-sampling module directly up-samples the multi-scale image feature maps obtained from the different feature extraction modules to 1/2 size of the input image, the up-sampling process is realized by deconvolution operation, and the up-sampled image is output to the multi-scale weight connection module;
after the multi-scale weight connection module samples the multi-scale image feature map, according to the scale difference features among the image ground features, the method of giving different weights to each corresponding type of ground features in the different-scale image feature maps is adopted to obtain the combined features of the ground feature types, and then final classification calculation is carried out based on the connection features of all the image feature maps.
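The per-category weighting described above can be sketched as follows. This is a minimal NumPy sketch of the idea; the function name, array shapes and weight values are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def weighted_combination(feature_maps, weights):
    """Combine per-scale score maps with per-category weights.
    feature_maps: list of arrays of shape (C, H, W), one per scale;
    weights: per-scale list of C weights, one per ground feature
    category. Returns the weighted sum of shape (C, H, W)."""
    combined = np.zeros_like(feature_maps[0], dtype=np.float64)
    for fmap, w in zip(feature_maps, weights):
        # Broadcast the C category weights over the H x W plane.
        combined += fmap * np.asarray(w)[:, None, None]
    return combined
```

In the model, the weights would be learned so that categories best resolved at a given scale dominate that scale's contribution.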
The CormNet target detection model combines two one-stage models, CornerNet and CenterNet: targets are detected from corner points and center points, the learning capacity of the model is improved by introducing a multi-scale feature pyramid in the feature extraction stage, and the target position is finally determined from the corner and center point positions, the key point offsets, and the target box width and height. The model parameters are obtained through prior training; the current parameters were trained on GF-2 imagery and achieve detection of 9 target classes (ships, airplanes, oil storage tanks, bridges, track-and-field grounds, overpasses, expressway service areas, ports and airports) on the GF-2 dataset with an average accuracy above 85%. The applicability of the model parameters to other imagery and target classes can be adjusted through the training dataset.
As shown in FIG. 4, the CormNet model structure mainly includes a feature extraction module, a pooling module and a target box positioning module.
1) Feature extraction module
Target detection in GF-2 remote sensing images requires rich feature expression from low to high dimension, from small to large scale, and from fine-grained to coarse resolution. A deep network can extract a large amount of remote sensing semantic information and global features, but the feature expression of a single deep network is not suited to accurate recognition of varied targets. Therefore, the deep layer aggregation network DLA is selected as the backbone for image feature extraction, and a multi-level feature pyramid structure is introduced between the down-sampling and up-sampling parts of DLA. The multi-level features produced by DLA up-sampling are formed into multi-level, multi-scale target features by the multi-level feature pyramid network, then up-sampled again to output the final image features; key point offsets and the target box width and height are obtained by convolution.
As shown in FIG. 5, in the improved DLA backbone, the features of different scales extracted by the DLA backbone are taken as the original input to the multi-level feature pyramid structure; within the pyramid, shallow and deep image features are fused by the feature fusion module (FFM), refined by thinned U-shape modules (TUM), and finally the scale-wise feature aggregation module (SFAM) aggregates the multi-level, multi-scale features generated by the TUMs to construct the multi-level feature pyramid features.
2) Pooling module (two types of pooling)
In the pooling stage, each detection target is treated as 3 key points, namely 2 corner points and 1 center point, and two modules, cascade corner pooling and center pooling, are designed accordingly. From the backbone output features, the maximum of the summed horizontal and vertical feature responses through each key point determines the target center position; the maxima of the feature responses along the target's interior and boundary directions are combined to determine the target corners; finally, key point heatmaps (Heatmaps) containing the center and corner information are generated.
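The directional maxima used in corner pooling can be sketched in NumPy as cumulative maxima along each axis. This is a simplified single-channel sketch of CornerNet-style top-left corner pooling, not the patented cascade structure:

```python
import numpy as np

def corner_pool_topleft(fmap):
    """Top-left corner pooling on a 2-D feature map: at each location,
    take the max of all feature values to the right (horizontal pass)
    plus the max of all values below (vertical pass)."""
    # Reverse, running-max, reverse back => max over j' >= j.
    right_max = np.maximum.accumulate(fmap[:, ::-1], axis=1)[:, ::-1]
    # Same trick vertically => max over i' >= i.
    below_max = np.maximum.accumulate(fmap[::-1, :], axis=0)[::-1, :]
    return right_max + below_max
```

A strong response thus appears at locations that align with a target's top edge and left edge simultaneously, which is where the top-left corner lies.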
3) Target box positioning
In the target box positioning stage of the CormNet model, the key point offset information (Offsets) and the target box width and height (W & H), obtained by convolving the image depth features extracted by the backbone, are combined with the heatmaps (Heatmaps) generated by pooling to obtain the precise position of the target box.
The design idea of the invention is as follows:
1. Data preparation and experimental design
Taking a GF-1 image and a GF-2 image as examples (FIG. 6), the method extracts multi-level content information, such as ground features, targets and landmarks, from the images, generates the corresponding image labels, and stores them in the image label database.
2. File pre-processing
The original compressed files of the image batch are decompressed; in this implementation, each image comprises a multispectral image with red, green and blue bands and an image metafile.
3. Image label database definition
The image label database stores label data in the structured database PostgreSQL, serving label storage and query management; it describes image content and provides data support for subsequent precise information services. The database comprises image information, ground feature label information, target detection label information and landmark label information.
Table 1: Label storage structure table
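Since the label storage structure table survives only as images in the original publication, the store can only be sketched. The following minimal sketch uses Python's built-in sqlite3 in place of PostgreSQL for illustration; the table and column names are assumptions derived from the four label groups named above, not the actual schema:

```python
import sqlite3

def create_label_store(path=":memory:"):
    """Create a minimal image-label table covering the label groups
    described in the text: image info, ground feature labels, target
    labels and landmark labels. Schema is an illustrative assumption."""
    con = sqlite3.connect(path)
    con.execute("""
        CREATE TABLE image_labels (
            image_id    TEXT,
            label_type  TEXT,  -- 'ground_feature' | 'target' | 'landmark'
            label_name  TEXT,
            label_value TEXT   -- proportion, count/coordinates, or POI info
        )""")
    return con

# Hypothetical usage: store one ground feature proportion label.
con = create_label_store()
con.execute("INSERT INTO image_labels VALUES "
            "('GF1_001', 'ground_feature', 'water', '0.12')")
```

A production deployment would use PostgreSQL as stated, with indexes on `image_id` and `label_type` to serve label queries.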
4. Intelligent image ground feature label extraction
4.1 Image band selection
The red, green and blue bands of the read block images are selected in turn and converted into a true-color composite sequence as the classification bands.
4.2 Numerical pre-processing
Max-min normalization is applied to the block image values of the selected bands, with a maximum value of 1100 and a minimum value of 0.
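The max-min normalization above can be sketched as follows; the function name and the clipping of out-of-range values are illustrative assumptions, while the [0, 1100] range is from the text:

```python
import numpy as np

def normalize_block(block, lo=0.0, hi=1100.0):
    """Max-min normalize band values to [0, 1] using the fixed
    range [0, 1100] stated in the text; out-of-range values are
    clipped (an assumption about edge handling)."""
    block = np.asarray(block, dtype=np.float32)
    return np.clip((block - lo) / (hi - lo), 0.0, 1.0)
```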
4.3 Image block pre-processing
The read-in image is automatically cropped into 512 pixel × 512 pixel block images of regular size and cached locally; the corresponding position information is recorded in each file name in the form rownumber_columnnumber.
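The cropping and row/column naming scheme can be sketched as follows; the function name and the zero-padding of partial edge tiles are illustrative assumptions:

```python
import numpy as np

def crop_blocks(image, tile=512):
    """Split an H x W (x bands) array into tile x tile blocks, keyed
    by 'row_col' strings matching the file-naming scheme above.
    Partial edge tiles are zero-padded to full size (an assumption)."""
    h, w = image.shape[:2]
    blocks = {}
    for r, y in enumerate(range(0, h, tile)):
        for c, x in enumerate(range(0, w, tile)):
            patch = image[y:y + tile, x:x + tile]
            pad = [(0, tile - patch.shape[0]), (0, tile - patch.shape[1])]
            pad += [(0, 0)] * (patch.ndim - 2)  # leave band axes unpadded
            blocks[f"{r}_{c}"] = np.pad(patch, pad)
    return blocks
```

The recorded row/column keys are what allow the per-block classification results to be stitched back in order later.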
4.4 Intelligent ground feature classification of image
The numerically processed block images are fed into the trained multi-scale weight segmentation model for ground feature classification. According to the recorded position information, the block classification results are stitched together in order to obtain the final classification result, which is output and stored (FIG. 7).
4.5 Ground feature label information extraction
Category proportion statistics are computed from the classification result, and the corresponding information is stored in the label database.
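The category proportion statistics can be sketched as a simple pixel count over the stitched classification map; the function name and category labels are illustrative:

```python
from collections import Counter

def class_proportions(class_map):
    """Compute per-category pixel proportions from a flat iterable
    of per-pixel class labels (the stitched classification result)."""
    counts = Counter(class_map)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}
```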
5. Intelligent image target label extraction
5.1 Image band selection
The red, green and blue bands of the image are selected and converted into a true-color composite sequence as the detection data.
5.2 Numerical pre-processing
Max-min normalization is applied to the block image values of the selected bands, with a maximum value of 1100 and a minimum value of 0.
5.3 Multi-scale image block cropping
To reasonably preserve the feature information of the various targets, the image is cropped at two sizes, 500 pixels × 500 pixels and 2000 pixels × 2000 pixels. The cropped multi-scale block images are stored in a folder and named after the cropping size, row number and column number, in the form cropsize_rownumber_columnnumber.tif, for use in processing and merging the subsequent detection results.
5.4 Image target detection
CormNet-based target detection is performed on all block images in turn.
5.5 Synthesis of detection results
Detection on the block images produces corresponding per-block detection results, which are synthesized according to the cropping size and row/column numbers: smaller targets are retained from the 500-pixel cropping results and larger targets from the 2000-pixel cropping results, synthesizing the final image target detection result (FIG. 8).
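The size-based merge of the two cropping scales can be sketched as follows; the detection tuple layout and the size threshold value are illustrative assumptions (the patent says only that the small/large target sizes are configurable):

```python
def merge_detections(dets_500, dets_2000, size_threshold=100.0):
    """Merge detections from the two cropping scales: keep small boxes
    (max side below the threshold) from the 500-px crops and large
    boxes from the 2000-px crops. Each detection is (cls, x, y, w, h)."""
    small = [d for d in dets_500 if max(d[3], d[4]) < size_threshold]
    large = [d for d in dets_2000 if max(d[3], d[4]) >= size_threshold]
    return small + large
```

This keeps each scale responsible for the targets it resolves best: small targets keep full detail in the 500-pixel crops, while large targets are not split across tile boundaries in the 2000-pixel crops.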
5.6 Target label information extraction
Based on the obtained detection results, statistics on the target categories, coordinates and counts of targets within the image range are computed and stored in the image label database.
6. Intelligent image landmark label extraction
POI data for specified keywords within the image range are obtained from the Baidu map service; the implementation takes 5 common everyday landmark types as examples: hospitals, parks, stadiums, railway stations, and gas stations.
6.1 Web retrieval of POI information
According to the image metafile, the image extent is taken as the retrieval range (FIG. 9); the rectangular-area retrieval service is selected, and the specified place information within the rectangle is retrieved by setting the lower-left and upper-right coordinates of the retrieval area.
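Building the rectangular-area query from the image corners can be sketched as follows. The parameter names mirror a generic place-search API and are illustrative assumptions only, not the actual Baidu map service contract:

```python
def build_bounds_query(lower_left, upper_right, keyword):
    """Build rectangular-area retrieval parameters from the image's
    lower-left and upper-right (lat, lng) corners, as described above.
    'query' and 'bounds' are assumed parameter names."""
    (lat0, lng0), (lat1, lng1) = lower_left, upper_right
    return {
        "query": keyword,
        "bounds": f"{lat0},{lng0},{lat1},{lng1}",  # lower-left, upper-right
    }
```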
6.2 POI data cleaning
Since the acquired POI landmark information may contain category classification errors, secondary screening with tag categories is applied: POI tag classification labels are set for the keywords hospital, park, stadium, railway station and gas station respectively, so as to filter out redundant information.
POI information outside the image range is removed according to the image extent, keeping only the POI information within the actual extent of the image.
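The range filter above is a point-in-rectangle test; the POI record layout shown here is an illustrative assumption:

```python
def filter_poi_in_range(pois, lower_left, upper_right):
    """Keep only POIs whose (lat, lng) falls inside the image's
    geographic rectangle, given as lower-left and upper-right corners.
    POI records are assumed to be dicts with 'lat' and 'lng' keys."""
    (lat0, lng0), (lat1, lng1) = lower_left, upper_right
    return [p for p in pois
            if lat0 <= p["lat"] <= lat1 and lng0 <= p["lng"] <= lng1]
```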
6.3 Landmark label information extraction
The acquired POI information (including type, name, position, etc.) is aggregated to obtain landmark label information, which is stored in the image label database.
7. Image label database description
After the multi-level image labels are extracted, a corresponding label record is generated for each image and stored in the image label database for subsequent image query and management.
The foregoing is illustrative of embodiments of the present invention and is not to be construed as limiting thereof, since equivalent alterations and modifications may be effected without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (4)

1. An intelligent extraction method for remote sensing image information tags, characterized by comprising the following processes:
constructing an MWSNet multi-scale weight segmentation model and training it on ground-feature categories; performing band selection and numerical normalization on an original image, cropping the original image into block images of regular size, feeding the block images into the trained MWSNet multi-scale weight segmentation model for ground-feature classification, merging the classification results to obtain ground-feature category and proportion tag information for the original image, and storing this information in an image tag database;
constructing a CormNet target detection model and training it on target categories; performing band selection and numerical normalization on the original image, cropping the image at two different pixel sizes simultaneously to generate block images of regular size, feeding the block images of both sizes into the trained CormNet target detection model for target detection, and merging the detection results such that small targets are retained from the lower-pixel block images and large targets from the higher-pixel block images, thereby synthesizing the final image target detection result and obtaining target category, coordinate, and count tag information, which is stored in the image tag database; wherein the size thresholds defining small and large targets are configurable;
extracting POI information within the image range via a spatial retrieval technique based on the geographic extent of the original image, performing error-removal and de-duplication cleansing on the POI information, and extracting valid landmark tag information for storage in the image tag database.
2. The intelligent extraction method for remote sensing image information tags according to claim 1, wherein the MWSNet multi-scale weight segmentation model comprises a feature learning module, an up-sampling module, and a multi-scale weight connection module;
the feature learning module comprises four groups of sequentially connected feature extraction modules; within each feature extraction module, image features are extracted through stacked ESP structures and the feature map is down-sampled by an adaptive pooling structure, and the multi-scale image features extracted by the four feature extraction modules are output respectively to the up-sampling module;
the up-sampling module directly up-samples the multi-scale image feature maps obtained from the different feature extraction modules to 1/2 the size of the input image, the up-sampling being realized by deconvolution, and outputs the up-sampled feature maps to the multi-scale weight connection module;
after receiving the up-sampled multi-scale image feature maps, the multi-scale weight connection module assigns, according to the scale-difference characteristics among ground features, a different weight to each corresponding ground-feature category in the feature maps of different scales to obtain combined features for each category, and then performs the final classification calculation on the concatenated features of all image feature maps.
3. The intelligent extraction method for remote sensing image information tags according to claim 1, wherein the CormNet target detection model comprises a feature extraction module, a pooling module, and a target frame positioning module;
the feature extraction module is based on a DLA backbone network improved with a multilevel feature pyramid structure; to address the detection of multi-scale targets, the multilevel feature pyramid structure is added to the DLA backbone, multilevel and multi-scale depth features are extracted from the input block image as the feature basis of the target detection task, and key-point offsets and the target frame length and width are obtained through convolution;
the pooling module comprises a cascade corner pooling module and a center pooling module; from the features output by the feature extraction module, the center pooling module determines the target center position by taking the maximum of the summed feature values of key points in the horizontal and vertical directions, and the cascade corner pooling module determines the target corners by combining the maximum feature values along the target interior and boundary directions, finally generating a key-point heat map containing center-point and corner information;
and the target frame positioning module comprehensively determines the final detected target position by matching the key-point heat maps of the target corners and centers and by convolving the depth features output by the feature extraction module to obtain the key-point offsets and the target frame length and width.
4. The intelligent extraction method for remote sensing image information tags according to claim 3, wherein in the improved DLA backbone network, features of different scales extracted from the DLA backbone serve as the original input of the multilevel feature pyramid structure; shallow and deep image features are fused by a feature fusion module (FFM) in the multilevel feature pyramid structure and then processed in depth by thinned U-shape modules (TUM), and finally the multilevel, multi-scale features generated by the TUMs are aggregated by a scale-wise feature aggregation module (SFAM) to construct the multilevel feature pyramid features.
CN202111334182.2A 2021-11-11 2021-11-11 Intelligent extraction method for remote sensing image information label Pending CN114092423A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111334182.2A CN114092423A (en) 2021-11-11 2021-11-11 Intelligent extraction method for remote sensing image information label

Publications (1)

Publication Number Publication Date
CN114092423A true CN114092423A (en) 2022-02-25

Family

ID=80299999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111334182.2A Pending CN114092423A (en) 2021-11-11 2021-11-11 Intelligent extraction method for remote sensing image information label

Country Status (1)

Country Link
CN (1) CN114092423A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115546647A (en) * 2022-10-21 2022-12-30 河北省科学院地理科学研究所 Semantic segmentation model based on remote sensing image

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832404A (en) * 2017-11-02 2018-03-23 武汉大学 A kind of complementing method of POI
CN109255334A (en) * 2018-09-27 2019-01-22 中国电子科技集团公司第五十四研究所 Remote sensing image terrain classification method based on deep learning semantic segmentation network
CN113627288A (en) * 2021-07-27 2021-11-09 武汉大学 Intelligent information label obtaining method for massive images


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CIKY奇: "In-depth paper reading: CenterNet: Keypoint Triplets for Object Detection", 《HTTPS://BLOG.CSDN.NET/C20081052/ARTICLE/DETAILS/89631486》 *
XXZKJ: "M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid(SFAM)", 《HTTPS://BLOG.CSDN.NET/WEIXIN_38688399/ARTICLE/DETAILS/106768281》 *
傅里叶不积分1: "Lightweight networks: the ESPNet series", 《HTTPS://BLOG.CSDN.NET/QQ_39382877/ARTICLE/DETAILS/97423070》 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20220225