CN113627288A - Intelligent information label obtaining method for massive images - Google Patents

Intelligent information label obtaining method for massive images

Info

Publication number
CN113627288A
Authority
CN
China
Prior art keywords
image
information
images
intelligent
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110850382.7A
Other languages
Chinese (zh)
Other versions
CN113627288B (en)
Inventor
应申
张馨月
窦小影
唐茉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110850382.7A priority Critical patent/CN113627288B/en
Publication of CN113627288A publication Critical patent/CN113627288A/en
Application granted granted Critical
Publication of CN113627288B publication Critical patent/CN113627288B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an intelligent information label acquisition method for massive images, which mainly comprises four modules: defining an image label database, providing a multi-scale weight segmentation (MWSNet) model for ground-object classification of remote sensing images, providing a CornmNet target detection model for target detection on remote sensing images, and acquiring landmark labels by using a web retrieval technology. The proposed method realizes automatic extraction of multi-level information labels such as image ground objects, targets and landmarks, and multi-angle, multi-level automatic description of image content.

Description

Intelligent information label obtaining method for massive images
Technical Field
The invention belongs to the technical field of remote sensing image information extraction, and particularly relates to an intelligent information label acquisition method for massive images.
Background
The application of remote sensing image data is in a period of development, broadening from traditional industries such as geological disaster management, mineral resources, urban construction, the marine field and meteorology to emerging industries such as precision agriculture, environmental evaluation and the digital city, and remote sensing technology and its image data products play an increasingly important role in production and daily life. However, in current image applications, most users of image products can only search for and acquire information such as geographical location and administrative division; it is often difficult to obtain targeted image products according to specific requirements on image content, and the diverse requirements of users on image data cannot be met.
In recent years, with the rapid development of image processing technology in fields such as computer vision and pattern recognition, image processing methods are increasingly integrated with efficient, high-performance deep learning models, providing more possibilities for automatically acquiring the content information of remote sensing images. Meanwhile, web-based spatial retrieval technology makes it possible to draw on multi-source, multi-modal geospatial big data during image content acquisition.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an intelligent information label acquisition method for massive images. The method mainly comprises four modules: defining an image label database, providing a multi-scale weight segmentation (MWSNet) model for ground-object classification of remote sensing images, providing a CornmNet target detection model for target detection on remote sensing images, and acquiring landmark labels by using a web retrieval technology. It realizes automatic extraction of multi-level information labels such as image ground objects, targets and landmarks, and multi-angle, multi-level automatic description of image content.
In order to achieve the purpose, the technical scheme provided by the invention comprises the following steps:
step 1, defining an image tag database, including image information, land type tag information, target detection tag information and landmark tag information;
step 2, decompressing the original image archives in batches to obtain multispectral images comprising red, green and blue bands, together with image metafiles;
step 3, constructing and training a multi-scale weight segmentation model;
step 4, constructing and training a CornmNet target detection model;
step 5, acquiring an image intelligent ground object label by using a multi-scale weight segmentation model;
step 6, acquiring an image intelligent target label by using a CornmNet target detection model;
and 7, acquiring the intelligent image landmark tags by using a web retrieval technology.
In addition, in step 1, the image tag database stores tag data in the structured database PostgreSQL; the database serves tag storage and query management, describes image content, and provides data support for subsequent precision information services.
Moreover, the building and training of the multi-scale weight segmentation model in the step 3 includes the following steps:
3.1, introducing an ESP convolution structure in an ESPNet network, and constructing a rapid feature extraction structure;
step 3.2, deconvolution up-sampling is respectively carried out on the features with different scales extracted in the step 3.1;
step 3.3, giving different weights to the segmentation features sampled at different scales in the step 3.2 according to different ground object categories to obtain a multi-scale weight feature combination;
step 3.4, segmenting and classifying the image according to the multi-scale weight feature combination obtained in the step 3.3;
and 3.5, training the multi-scale weight segmentation model through the image data set to obtain model training parameters.
Moreover, the constructing and training of the CornmNet target detection model in the step 4 comprises the following steps:
step 4.1, adopting a DLA network as a model backbone network, and introducing an MLFPN multi-scale feature extraction module into the DLA network for feature extraction;
step 4.2, generating a key-point heatmap by means of cascade corner pooling and center point pooling;
4.3, accurately positioning the key points by a method of searching adjacent points and calculating offset;
step 4.4, obtaining the accurate position of the target frame by combining the length and the width of the target frame;
and 4.5, training the CornmNet target detection model through the image data set to obtain model training parameters.
Furthermore, the step 5 of obtaining the image intelligent ground object label by using the multi-scale weight segmentation model comprises the following steps:
step 5.1, cutting and blocking the multispectral image obtained in the step 2;
automatically cutting the read-in image into tiles of 512 pixels × 512 pixels, generating regular-sized block images of the original image, caching them locally, and recording the corresponding position information in the file name, i.e., row number_column number.tif;
step 5.2, reading in red, green and blue wave bands of the block images, and converting the red, green and blue wave bands into a true color synthesis sequence as a classification wave band;
step 5.3, carrying out normalization processing on the pixel values of the partitioned image after the band selection in the step 5.2 is finished;
the formula for the normalization calculation is presumed to be min-max scaling (the equation appears only as an image in the original):
x′ = (x − min(x)) / (max(x) − min(x))
in the formula, x is a pixel value before the block image normalization, and x′ is the pixel value after the block image normalization;
step 5.4, carrying out ground object classification on the block image subjected to the pixel value normalization processing and obtained in the step 5.3 by using the multi-scale weight segmentation model trained in the step 3;
step 5.5, sequentially splicing the classification results of the block images according to the position information of the block images to obtain a final classification result;
and 5.6, performing category proportion statistics according to the classification result obtained in the step 5.5, and storing corresponding information into a label database to finish the extraction of the ground feature label information.
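The cutting and naming convention of step 5.1 can be sketched in a few lines of NumPy. The 512-pixel tile size and the row number_column number.tif naming follow the text; the zero-padding of edge tiles is an assumption, since the patent only states that regular-sized block images are produced:

```python
import numpy as np

TILE = 512  # tile size in pixels, per step 5.1

def tile_image(img: np.ndarray) -> dict:
    """Cut an H x W x C image into TILE x TILE block images.

    Edge blocks are zero-padded to a regular size (an assumption).
    Keys follow the 'row number_column number.tif' naming of step 5.1.
    """
    h, w = img.shape[:2]
    tiles = {}
    for r in range(0, h, TILE):
        for c in range(0, w, TILE):
            block = img[r:r + TILE, c:c + TILE]
            if block.shape[:2] != (TILE, TILE):  # pad edge tiles
                pad = np.zeros((TILE, TILE, img.shape[2]), dtype=img.dtype)
                pad[:block.shape[0], :block.shape[1]] = block
                block = pad
            tiles[f"{r // TILE}_{c // TILE}.tif"] = block
    return tiles

# a 1024 x 1536 image yields 2 x 3 = 6 regular tiles
tiles = tile_image(np.zeros((1024, 1536, 3), dtype=np.uint8))
print(sorted(tiles)[:3])  # ['0_0.tif', '0_1.tif', '0_2.tif']
```

The recorded row and column numbers make the stitching of step 5.5 a simple inverse placement of each block at (row × 512, column × 512).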
Furthermore, the step 6 of acquiring the image intelligent target label by using the CornmNet target detection model includes the following steps:
step 6.1, reading in the red, green and blue wave bands of the multispectral image obtained in the step 2, and converting the multispectral image into a true color synthesis sequence as detection data;
step 6.2, carrying out normalization processing on the image pixel values, wherein the formula for the normalization calculation is presumed to be min-max scaling (the equation appears only as an image in the original):
x1′ = (x1 − min(x1)) / (max(x1) − min(x1))
in the formula, x1 is the pixel value before the image normalization, and x1′ is the pixel value after the image normalization;
6.3, performing multi-scale block cutting on the normalized image;
step 6.4, carrying out target detection on the segmented image obtained in the step 6.3 by using the CornmNet target detection model trained in the step 4;
step 6.5, synthesizing the target detection result obtained in the step 6.4 according to the cutting size and the row and column number information;
and 6.6, carrying out information statistics on the target types, coordinates and numbers of the targets contained in the image range based on the target detection result obtained in the step 6.5, and storing the information into the image tag database to finish the extraction of the target tag information.
In step 6.3, the multi-scale block cropping of the normalized image crops the image with two sizes in combination, 500 pixels × 500 pixels and 2000 pixels × 2000 pixels; the multi-scale cropped block images are stored in a folder and named according to the cropping size and the row and column numbers at cropping time, i.e., cropping size_row number_column number.tif, for subsequent processing and merging of the detection results.
In step 6.5, the final image target detection result is synthesized by retaining smaller targets (such as oil storage tanks, airplanes or ships) from the 500 pixel × 500 pixel crops and retaining larger targets (such as highway service stations, large bridges or airports) from the 2000 pixel × 2000 pixel crops.
Moreover, the step 7 of acquiring the image intelligent landmark tag by using the web retrieval technology includes the following steps:
step 7.1, retrieving POI information via the Web;
taking the image range recorded in the image metafile acquired in step 2 as the retrieval range, selecting the rectangular-area retrieval service, and retrieving the POI landmark information within the rectangle by setting the coordinates of the lower-left and upper-right corners of the retrieval area;
step 7.2, cleaning the POI data retrieved in the step 7.1;
and 7.3, counting the POI information cleaned in the step 7.2 to obtain landmark tag information, and storing the landmark tag information into an image tag database to finish the acquisition of the landmark tag information.
Moreover, the POI data retrieved in step 7.1 is cleaned in step 7.2 because the retrieved POI landmark information contains category classification errors. For these errors, secondary screening by tag category is adopted: the keywords hospital, park, stadium, railway station and fueling and gas station are assigned the POI information tag classification labels general hospital, park, stadium, railway station and fueling and gas station respectively, and redundant information is filtered. POIs outside the image range are then removed according to the image extent, and only POI information within the actual image range is retained.
Compared with the prior art, the invention has the following advantages:
(1) A multi-scale weight segmentation model suitable for ground-object classification of remote sensing images is provided. A lightweight ESP convolution structure is adopted as the basic unit of the model, and a direct sampling approach reduces model parameters and improves running speed. Aiming at characteristics of remote sensing imagery such as interleaved distribution of ground objects and large inter-class scale differences, a multi-scale weight module is added, improving the classification and recognition capability of the model.
(2) A CornmNet target detection model suitable for remote sensing target detection is provided. The backbone adopts a DLA network, effectively iterating and combining hierarchical features and enriching the hierarchical feature information of image targets. The target frame is determined by extracting corner and center points, avoiding the massive computation of anchor points and candidate frames over the image extent. An MLFPN multi-level feature pyramid extraction structure accurately extracts target features with large differences at multiple internal and external scales.
(3) An intelligent information label acquisition method for massive images is provided, realizing automatic extraction of multi-level information labels such as image ground objects, targets and landmarks, and multi-angle, multi-level automatic description of image content.
Drawings
Fig. 1 is a technical route diagram of an embodiment of the present invention.
Fig. 2 is a high-resolution remote sensing image map used in the embodiment of the present invention, in which fig. 2(a) is a GF-1 remote sensing image, and fig. 2(b) is a GF-2 remote sensing image.
Fig. 3 is an exemplary diagram of a classification result of a remote sensing image according to an embodiment of the present invention.
Fig. 4 is an exemplary diagram of a target detection result of a remote sensing image according to an embodiment of the present invention.
Fig. 5 is an exemplary diagram of a search range of a remote sensing image according to an embodiment of the invention.
Fig. 6 is an exemplary diagram of a landmark retrieval result of a remote sensing image according to an embodiment of the present invention.
Fig. 7 is an exemplary diagram of a remote sensing image tag according to an embodiment of the invention.
Detailed Description
The invention provides an intelligent information label acquisition method for massive images, which mainly comprises four modules: defining an image label database, providing a multi-scale weight segmentation (MWSNet) model for ground-object classification of remote sensing images, providing a CornmNet target detection model for target detection on remote sensing images, and acquiring landmark labels by using a web retrieval technology. The method realizes automatic extraction of multi-level information labels such as image ground objects, targets and landmarks, and multi-angle, multi-level automatic description of image content.
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
As shown in fig. 1, the process of the embodiment of the present invention includes the following steps:
step 1, defining an image tag database, which comprises image information, land type tag information, target detection tag information and landmark tag information.
The image tag database stores tag data in the structured database PostgreSQL; it serves tag storage and query management, describes image content, and provides data support for subsequent precision information services. The tag storage structure is shown in Table 1.
Table 1 tag storage structure table
(Table 1 is provided only as an image in the original and is not reproduced here.)
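Since Table 1 survives only as an image, a plausible minimal schema for the four tag groups named in step 1 can be sketched. All column names below are assumptions, not the patent's actual layout; production use would target PostgreSQL, while the sketch uses SQLite purely so it runs standalone:

```python
import sqlite3

# Hypothetical schema covering the four tag groups of step 1:
# image information, land-type tags, target tags, landmark tags.
SCHEMA = """
CREATE TABLE image_info (
    image_id TEXT PRIMARY KEY,
    acquired TEXT,             -- acquisition date from the image metafile
    extent   TEXT              -- image bounding box
);
CREATE TABLE landtype_tag (
    image_id   TEXT REFERENCES image_info(image_id),
    class_name TEXT,           -- e.g. water, vegetation, built-up
    proportion REAL            -- class area proportion from step 5.6
);
CREATE TABLE target_tag (
    image_id    TEXT REFERENCES image_info(image_id),
    target_type TEXT,          -- e.g. airplane, ship, airport
    x REAL, y REAL,            -- target coordinates (step 6.6)
    count INTEGER
);
CREATE TABLE landmark_tag (
    image_id  TEXT REFERENCES image_info(image_id),
    poi_name  TEXT,
    poi_class TEXT             -- e.g. general hospital, park (step 7)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO image_info VALUES ('GF2_001', '2021-07-27', NULL)")
conn.execute("INSERT INTO landtype_tag VALUES ('GF2_001', 'water', 0.18)")
rows = conn.execute(
    "SELECT class_name, proportion FROM landtype_tag "
    "WHERE image_id = 'GF2_001'").fetchall()
print(rows)  # [('water', 0.18)]
```

Keying every tag table on `image_id` lets the later steps append their results independently and lets a user query all label levels of one image with simple joins.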
And 2, decompressing the batch of original image archives to obtain multispectral images comprising red, green and blue bands, together with image metafiles.
Step 3, constructing and training a multi-scale weight segmentation model, and specifically comprising the following substeps:
and 3.1, introducing an ESP convolution structure in the ESPNet network, and constructing a rapid feature extraction structure.
And 3.2, performing deconvolution upsampling on the features with different scales extracted in the step 3.1 respectively.
And 3.3, giving different weights to the segmentation features sampled in different scales in the step 3.2 according to different ground object types to obtain a multi-scale weight feature combination.
And 3.4, segmenting and classifying the image according to the multi-scale weight feature combination obtained in the step 3.3.
And 3.5, training the multi-scale weight segmentation model through the image data set to obtain model training parameters.
Step 4, constructing and training a CornmNet target detection model, and specifically comprising the following substeps:
and 4.1, adopting a DLA network as a model backbone network, and introducing an MLFPN multi-scale feature extraction module into the DLA network for feature extraction.
And 4.2, generating a key-point heatmap by means of cascade corner pooling and center point pooling.
And 4.3, accurately positioning the key points by a method of searching adjacent points and calculating offset.
And 4.4, obtaining the accurate position of the target frame by combining the length and the width of the target frame.
And 4.5, training the CornmNet target detection model through the image data set to obtain model training parameters.
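Steps 4.2 to 4.4 can be illustrated with a minimal decoding sketch: take a peak in a key-point heatmap, refine it with a predicted offset, and expand it into a box using a predicted width and height. The array shapes, the stride value and the single-peak decoding are illustrative assumptions; the patent does not specify the CornmNet heads at this level of detail:

```python
import numpy as np

def decode_single(heatmap, offset, size, stride=4):
    """Decode one detection from dense head outputs.

    heatmap: (H, W) key-point scores          (cf. step 4.2)
    offset:  (H, W, 2) sub-pixel offsets      (cf. step 4.3)
    size:    (H, W, 2) box width and height   (cf. step 4.4)
    stride:  feature-map-to-image scale (assumed value)
    """
    # locate the strongest key point on the heatmap
    r, c = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    dx, dy = offset[r, c]          # refine the key point position
    cx = (c + dx) * stride         # center in image coordinates
    cy = (r + dy) * stride
    w, h = size[r, c]              # expand into a target frame
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, heatmap[r, c])

# toy example: a single peak at feature cell (10, 20)
hm = np.zeros((32, 32)); hm[10, 20] = 0.9
off = np.zeros((32, 32, 2)); off[10, 20] = (0.5, 0.25)
sz = np.zeros((32, 32, 2)); sz[10, 20] = (40.0, 24.0)
x0, y0, x1, y1, score = decode_single(hm, off, sz)
print(f"box=({x0}, {y0}, {x1}, {y1}), score={score}")
```

Because the box is read off directly at key points, no anchor grid or candidate-frame enumeration is needed, matching the advantage claimed for the model.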
And 5, acquiring the image intelligent ground object label by using the multi-scale weight segmentation model.
And 5.1, cutting and blocking the multispectral image acquired in the step 2.
The read-in image is automatically cut into tiles of 512 pixels × 512 pixels, generating regular-sized block images of the original image, which are cached locally; the corresponding position information is recorded in the file name, i.e., row number_column number.tif.
And 5.2, reading in red, green and blue wave bands of the block images, and converting the red, green and blue wave bands into a true color synthesis sequence as a classification wave band.
And 5.3, performing normalization processing on the pixel values of the partitioned image after the band selection is completed in the step 5.2.
The formula for the normalization calculation is presumed to be min-max scaling (the equation appears only as an image in the original):
x′ = (x − min(x)) / (max(x) − min(x))
In the formula, x is a pixel value before the block image normalization, and x′ is the pixel value after the block image normalization.
And 5.4, carrying out ground object classification on the segmented image subjected to the pixel value normalization processing and obtained in the step 5.3 by using the multi-scale weight segmentation model trained in the step 3.
And 5.5, sequentially splicing the classification results of the block images according to the position information of the block images to obtain a final classification result.
And 5.6, performing category proportion statistics according to the classification result obtained in the step 5.5, and storing corresponding information into a label database to finish the extraction of the ground feature label information.
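The normalization of step 5.3 and the category-proportion statistics of step 5.6 reduce to a few lines. The min-max form of the normalization is presumed (the original equation survives only as an image), the guard for a constant tile is an added assumption, and the class codes are illustrative:

```python
import numpy as np

def normalize(block: np.ndarray) -> np.ndarray:
    """Presumed min-max normalization of step 5.3: x' = (x - min)/(max - min)."""
    x = block.astype(np.float64)
    rng = x.max() - x.min()
    # constant tiles would divide by zero; mapping them to 0 is an assumption
    return (x - x.min()) / rng if rng else np.zeros_like(x)

def class_proportions(class_map: np.ndarray, n_classes: int) -> np.ndarray:
    """Step 5.6: proportion of pixels assigned to each ground-object class."""
    counts = np.bincount(class_map.ravel(), minlength=n_classes)
    return counts / class_map.size

block = np.array([[0, 51], [102, 255]], dtype=np.uint8)
norm = normalize(block)                   # values scaled into [0, 1]
cls = np.array([[0, 0], [1, 2]])          # e.g. 0=water, 1=vegetation, 2=built-up
props = class_proportions(cls, 3)         # per-class area proportions
print(props)
```

The resulting proportion vector is exactly what step 5.6 stores in the `landtype` portion of the tag database.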
And 6, acquiring an image intelligent target label by using a CornmNet target detection model.
And 6.1, reading in the red, green and blue wave bands of the multispectral image acquired in the step 2, and converting the multispectral image into a true color synthesis sequence to be used as detection data.
Step 6.2, carrying out normalization processing on the image pixel values, wherein the formula for the normalization calculation is presumed to be min-max scaling (the equation appears only as an image in the original):
x1′ = (x1 − min(x1)) / (max(x1) − min(x1))
In the formula, x1 is the pixel value before the image normalization, and x1′ is the pixel value after the image normalization.
And 6.3, performing multi-scale block cutting on the image subjected to the normalization processing.
In order to reasonably retain the characteristic information of various targets, the images are cropped with two sizes in combination, 500 pixels × 500 pixels and 2000 pixels × 2000 pixels. The multi-scale cropped block images are stored in a folder and named with the cropping size and the row and column numbers at cropping time, i.e., cropping size_row number_column number.tif, for subsequent processing and merging of the detection results.
And 6.4, carrying out target detection on the segmented image obtained in the step 6.3 by using the CornmNet target detection model trained in the step 4.
And 6.5, synthesizing the target detection result obtained in the step 6.4 according to the cutting size and the row and column number information.
The final image target detection result is synthesized by retaining smaller targets (e.g., oil storage tanks, airplanes, ships) from the 500 pixel × 500 pixel crops and retaining larger targets (e.g., highway service stations, large bridges, airports) from the 2000 pixel × 2000 pixel crops, as shown in fig. 4.
And 6.6, carrying out information statistics on the target types, coordinates, numbers and other conditions of the targets contained in the image range based on the target detection result obtained in the step 6.5, and storing the information into the image tag database to finish the extraction of the target tag information.
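The merging rule of step 6.5, keep small targets from the 500-pixel crops and large targets from the 2000-pixel crops, can be sketched as a filter over detections tagged with their crop size. The class-to-size grouping mirrors the examples in the text, while the detection record layout is an assumption:

```python
# classes treated as "small" vs "large" targets, per the text's examples
SMALL = {"oil_tank", "airplane", "ship"}
LARGE = {"service_station", "large_bridge", "airport"}

def merge_detections(dets):
    """dets: list of (class_name, crop_size, box) tuples.

    Step 6.5: small targets survive only from 500 px crops,
    large targets only from 2000 px crops.
    """
    kept = []
    for cls, crop, box in dets:
        if cls in SMALL and crop == 500:
            kept.append((cls, crop, box))
        elif cls in LARGE and crop == 2000:
            kept.append((cls, crop, box))
    return kept

dets = [
    ("ship", 500, (10, 10, 40, 30)),        # kept: small target, small crop
    ("ship", 2000, (12, 11, 41, 29)),       # duplicate from large crop, dropped
    ("airport", 2000, (0, 0, 1800, 1500)),  # kept: large target, large crop
    ("airport", 500, (0, 0, 499, 499)),     # truncated in small crop, dropped
]
print(merge_detections(dets))
```

Routing each class to the crop scale that can contain it whole avoids both truncated large targets and duplicate counts across overlapping scales.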
And 7, acquiring the intelligent image landmark tags by using a web retrieval technology.
POI data for specified keywords within the image range is obtained based on the Baidu map service; this embodiment takes five types of common landmarks in daily life as an example: hospitals, parks, stadiums, railway stations, and fueling and gas stations.
Step 7.1, retrieving POI information via the Web.
Taking the image range recorded in the image metafile acquired in step 2 as the retrieval range, the rectangular-area retrieval service is selected, and the POI landmark information within the rectangle is retrieved by setting the coordinates of the lower-left and upper-right corners of the retrieval area.
And 7.2, cleaning the POI data retrieved in the step 7.1.
Because the POI landmark information acquired in step 7.1 contains category classification errors, secondary screening by tag category is adopted for these errors: the keywords hospital, park, stadium, railway station and fueling and gas station are set to general hospital, park, stadium, railway station and fueling and gas station respectively, and redundant information is filtered. Information on POIs outside the image range is then eliminated according to the image extent, and only POI information within the actual image range is retained.
And 7.3, counting the POI information cleaned in the step 7.2 to obtain landmark tag information, and storing the landmark tag information into an image tag database to finish the acquisition of the landmark tag information.
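Steps 7.1 to 7.3 amount to a rectangle query, a category-based cleaning pass, and a per-class count. The sketch below fakes the retrieval step with a static list, since calling the actual Baidu map rectangular-area service requires credentials; the POI record fields and all coordinates are illustrative assumptions:

```python
# bounding box of the image extent from the metafile (lower-left, upper-right)
BBOX = (114.20, 30.40, 114.40, 30.60)   # illustrative coordinates

# step 7.1 stand-in: in practice these records come from the Baidu map
# rectangular-area retrieval service, one query per keyword
RAW_POIS = [
    {"name": "X General Hospital", "cls": "general hospital",
     "lon": 114.30, "lat": 30.50},
    {"name": "Y Pet Clinic", "cls": "veterinary",       # category error
     "lon": 114.31, "lat": 30.51},
    {"name": "Z Park", "cls": "park",
     "lon": 114.90, "lat": 30.50},                      # outside the image
]

VALID = {"general hospital", "park", "stadium",
         "railway station", "fueling and gas station"}

def clean(pois, bbox):
    """Step 7.2: secondary screening by tag class, then clip to the extent."""
    w, s, e, n = bbox
    return [p for p in pois
            if p["cls"] in VALID and w <= p["lon"] <= e and s <= p["lat"] <= n]

def landmark_stats(pois):
    """Step 7.3: count cleaned POIs per class as landmark tag information."""
    stats = {}
    for p in pois:
        stats[p["cls"]] = stats.get(p["cls"], 0) + 1
    return stats

cleaned = clean(RAW_POIS, BBOX)
print(landmark_stats(cleaned))  # {'general hospital': 1}
```

The resulting per-class counts are what step 7.3 writes into the landmark portion of the image tag database.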
In specific implementation, the above process can be run automatically using computer software technology.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (9)

1. An intelligent information label acquisition method for massive images is characterized by comprising the following steps:
step 1, defining an image tag database, including image information, land type tag information, target detection tag information and landmark tag information;
step 2, decompressing the original compressed files of the images in batches to obtain multispectral images and image metafiles comprising red, green and blue wave bands;
step 3, constructing and training a multi-scale weight segmentation model;
step 4, constructing and training a CornmNet target detection model;
step 5, acquiring an image intelligent ground object label by using a multi-scale weight segmentation model;
step 6, acquiring an image intelligent target label by using a CornmNet target detection model;
and 7, acquiring the intelligent image landmark tags by using a web retrieval technology.
2. The method for acquiring intelligent information labels for massive images as claimed in claim 1, wherein the constructing and training of the multi-scale weight segmentation model in the step 3 comprises the following steps:
3.1, introducing an ESP convolution structure in an ESPNet network, and constructing a rapid feature extraction structure;
step 3.2, deconvolution up-sampling is respectively carried out on the features with different scales extracted in the step 3.1;
step 3.3, giving different weights to the segmentation features sampled at different scales in the step 3.2 according to different ground object categories to obtain a multi-scale weight feature combination;
step 3.4, segmenting and classifying the image according to the multi-scale weight feature combination obtained in the step 3.3;
and 3.5, training the multi-scale weight segmentation model through the image data set to obtain model training parameters.
3. The method for acquiring intelligent information labels for massive images as claimed in claim 1, wherein the construction and training of the CornmNet target detection model in the step 4 comprises the following steps:
step 4.1, adopting a DLA network as a model backbone network, and introducing an MLFPN multi-scale feature extraction module into the DLA network for feature extraction;
step 4.2, generating a key-point heatmap by means of cascade corner pooling and center point pooling;
4.3, accurately positioning the key points by a method of searching adjacent points and calculating offset;
step 4.4, obtaining the accurate position of the target frame by combining the length and the width of the target frame;
and 4.5, training the CornmNet target detection model through the image data set to obtain model training parameters.
4. The method for acquiring intelligent information labels for massive images as claimed in claim 2, wherein the step 5 of obtaining the image intelligent ground object label by using the multi-scale weight segmentation model comprises the following steps:
step 5.1, cutting and blocking the multispectral image obtained in the step 2;
automatically cutting the read-in image according to a certain size, generating regular-sized block images of the original image, caching them locally, and recording the corresponding position information in the file name, i.e., row number_column number.tif;
step 5.2, reading in red, green and blue wave bands of the block images, and converting the red, green and blue wave bands into a true color synthesis sequence as a classification wave band;
step 5.3, carrying out normalization processing on the pixel values of the block images after the band selection in the step 5.2 is completed, wherein the formula for the normalization calculation is presumed to be min-max scaling (the equation appears only as an image in the original):
x′ = (x − min(x)) / (max(x) − min(x))
in the formula, x is a pixel value before the block image normalization, and x′ is the pixel value after the block image normalization;
step 5.4, carrying out ground object classification on the block image subjected to the pixel value normalization processing and obtained in the step 5.3 by using the multi-scale weight segmentation model trained in the step 3;
step 5.5, sequentially splicing the classification results of the block images according to the position information of the block images to obtain a final classification result;
and 5.6, performing category proportion statistics according to the classification result obtained in the step 5.5, and storing corresponding information into a label database to finish the extraction of the ground feature label information.
5. The method for acquiring intelligent information labels for massive images as claimed in claim 3, wherein the step 6 of acquiring the intelligent target labels of an image by using the CornerNet target detection model comprises the following steps:
step 6.1, reading in the red, green and blue bands of the multispectral image obtained in the step 2 and converting them into a true-color composite sequence used as the detection data;
step 6.2, normalizing the image pixel values, the normalization formula being:
[normalization formula given only as an image (FDA0003182235590000022) in the original]
in the formula, x1 is the pixel value before image normalization, and x'1 is the pixel value after normalization;
step 6.3, cutting the normalized image into multi-scale blocks;
step 6.4, performing target detection on the block images obtained in the step 6.3 by using the CornerNet target detection model trained in the step 4;
step 6.5, merging the target detection results of the step 6.4 according to the crop size and the row and column number information;
and 6.6, compiling statistics on the target types, coordinates and counts within the image range based on the detection result of the step 6.5, and storing the information into the image tag database, completing the extraction of the target label information.
6. The method for acquiring intelligent information labels for massive images as claimed in claim 5, wherein the multi-scale block cutting of the normalized image in the step 6.3 crops the image jointly at two sizes, 500 pixels × 500 pixels and 2000 pixels × 2000 pixels; the multi-scale block images are stored in a folder and named by the crop size and the row and column numbers used during cropping, in the form crop-size_row-number_column-number.tif, for the processing and merging of the subsequent detection results.
7. The method for acquiring intelligent information labels for massive images as claimed in claim 6, wherein in the step 6.5 the results from the 500 pixel × 500 pixel crops retain small targets such as oil storage tanks, airplanes and ships, and the results from the 2000 pixel × 2000 pixel crops retain large targets such as highway service stations, large bridges and airports, the two being merged into the final image target detection result.
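The size-dependent merge described in claims 6 and 7 can be sketched as below. The class names and the detection-tuple layout are illustrative assumptions; only the two crop sizes and the small/large split come from the claims:

```python
# Sketch of the multi-scale merge of steps 6.3-6.5: detections from the
# 500 px crops keep the small-object classes, detections from the 2000 px
# crops keep the large-object classes, and the crop offsets (parsed from
# the crop-size_row_column file names) shift points back to image space.
SMALL = {"storage_tank", "airplane", "ship"}      # kept at 500 px crops
LARGE = {"service_station", "bridge", "airport"}  # kept at 2000 px crops

def merge_detections(dets):
    """dets: list of (crop_size, row, col, cls, x, y) in crop coordinates.
    Returns (cls, x, y) tuples in whole-image coordinates."""
    merged = []
    for size, row, col, cls, x, y in dets:
        keep = cls in SMALL if size == 500 else cls in LARGE
        if keep:
            merged.append((cls, col * size + x, row * size + y))
    return merged
```

A detection of the same airport at both scales is therefore counted only once, from the 2000 px crop.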
8. The method for acquiring intelligent information labels for massive images as claimed in claim 1, wherein the step 7 of acquiring the intelligent landmark labels of an image by using the web retrieval technology comprises the following steps:
step 7.1, Web retrieves POI information;
taking the image footprint recorded in the image metadata file acquired in the step 2 as the retrieval range, selecting the rectangular-area retrieval service, and retrieving the POI landmark information inside the rectangle by setting the coordinates of the lower-left and upper-right corners of the retrieval area;
step 7.2, cleaning the POI data retrieved in the step 7.1;
and 7.3, compiling statistics on the POI information cleaned in the step 7.2 to obtain the landmark label information, and storing it into the image tag database, completing the acquisition of the landmark label information.
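The rectangle-area request of the step 7.1 can be sketched as below. The parameter names and the "lat,lon|lat,lon" bounds layout are placeholders, since the patent does not name the POI service it queries:

```python
# Hypothetical sketch of step 7.1: building a rectangle-area POI query
# from the lower-left and upper-right corners of the image footprint.
# The parameter keys are assumptions; real map-service POI APIs differ.
def build_poi_query(min_lon, min_lat, max_lon, max_lat, keyword):
    """Return query parameters for a rectangle search bounded by the
    lower-left (min) and upper-right (max) corners of the footprint."""
    return {
        "bounds": f"{min_lat},{min_lon}|{max_lat},{max_lon}",  # LL | UR
        "keyword": keyword,
        "output": "json",
    }
```

One such query is issued per landmark keyword (hospital, park, stadium, and so on), and the responses are passed on to the cleaning of the step 7.2.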
9. The method for acquiring intelligent information labels for massive images as claimed in claim 8, wherein the POI data retrieved in the step 7.1 are cleaned in the step 7.2 because the POI landmark information acquired in the step 7.1 contains category classification errors; tag categories are used for a secondary screening of these errors, with the keywords hospital, park, stadium, railway station and fueling and gas station assigned the POI tag labels general hospital, park, stadium, railway station and fueling and gas station respectively, so that redundant information is filtered out; the POIs outside the image footprint are then removed according to the image range, and only the POI information within the actual image range is retained.
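The two-stage cleaning of claim 9 can be sketched as below. The dictionary of expected tags mirrors the keyword-to-label pairing stated in the claim; the POI record layout is an illustrative assumption:

```python
# Sketch of step 7.2: re-screen each keyword's results against the
# expected POI tag (e.g. "hospital" results must carry the "general
# hospital" tag), then drop POIs outside the image footprint.
EXPECTED_TAG = {  # keyword -> tag label used for the secondary screening
    "hospital": "general hospital",
    "park": "park",
    "stadium": "stadium",
    "railway station": "railway station",
    "gas station": "gas station",
}

def clean_pois(pois, min_lon, min_lat, max_lon, max_lat):
    """pois: list of dicts with 'keyword', 'tag', 'lon', 'lat' keys."""
    kept = []
    for p in pois:
        if p["tag"] != EXPECTED_TAG.get(p["keyword"]):
            continue  # category error: tag does not match the keyword
        if not (min_lon <= p["lon"] <= max_lon
                and min_lat <= p["lat"] <= max_lat):
            continue  # outside the actual image range
        kept.append(p)
    return kept
```

Screening by tag before clipping to the footprint keeps the expensive spatial test off the records that will be discarded anyway.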
CN202110850382.7A 2021-07-27 2021-07-27 Intelligent information label acquisition method for massive images Active CN113627288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110850382.7A CN113627288B (en) 2021-07-27 2021-07-27 Intelligent information label acquisition method for massive images

Publications (2)

Publication Number Publication Date
CN113627288A true CN113627288A (en) 2021-11-09
CN113627288B CN113627288B (en) 2023-08-18

Family

ID=78381107

Country Status (1)

Country Link
CN (1) CN113627288B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492570A (en) * 2018-11-02 2019-03-19 电子科技大学 A kind of SAR image target recognition method based on multiple dimensioned rarefaction representation
CN110852393A (en) * 2019-11-14 2020-02-28 吉林高分遥感应用研究院有限公司 Remote sensing image segmentation method and system
CN111797697A (en) * 2020-06-10 2020-10-20 河海大学 Angle high-resolution remote sensing image target detection method based on improved CenterNet
CN111814597A (en) * 2020-06-20 2020-10-23 南通大学 Urban function partitioning method coupling multi-label classification network and YOLO
CN112001293A (en) * 2020-08-19 2020-11-27 四创科技有限公司 Remote sensing image ground object classification method combining multi-scale information and coding and decoding network
CN112084872A (en) * 2020-08-10 2020-12-15 浙江工业大学 High-resolution remote sensing target accurate detection method fusing semantic segmentation and edge

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092423A (en) * 2021-11-11 2022-02-25 中国电子科技集团公司第五十四研究所 Intelligent extraction method for remote sensing image information label
CN117274272A (en) * 2023-09-08 2023-12-22 青岛市市立医院 Optimization method for segmentation of coronary artery mapping based on deep learning
CN117274272B (en) * 2023-09-08 2024-04-30 青岛市市立医院 Optimization method for segmentation of coronary artery mapping based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant