CN113420738B - Self-adaptive network remote sensing image classification method, computer equipment and storage medium - Google Patents

Self-adaptive network remote sensing image classification method, computer equipment and storage medium

Info

Publication number
CN113420738B
CN113420738B (application CN202110971318.4A)
Authority
CN
China
Prior art keywords
region
loss
training
network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110971318.4A
Other languages
Chinese (zh)
Other versions
CN113420738A (en)
Inventor
唐厂
李显巨
孙琨
王力哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences
Priority to CN202110971318.4A
Publication of CN113420738A
Application granted
Publication of CN113420738B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a self-adaptive network remote sensing image classification method, computer equipment and a storage medium. The method comprises the following steps: acquiring an image to be detected; inputting the image to be detected into a trained region generator, which extracts at least one image sub-region of the image to be detected as a target region, extracts the information degree of each target region, and screens the target regions according to the information degree to obtain discriminant regions; extracting the regional features of the discriminant regions and the global features of the image to be detected with a feature extraction network, and performing self-adaptive weighted convolution transformation on each regional feature and the global feature to obtain second transformation features; and inputting the second transformation features into a trained scorer to obtain a classification result. The method reduces the limitation that redundant and noisy regions in remote sensing scene images impose on network classification performance, and effectively locates the discriminant regions in the image to improve that performance.

Description

Self-adaptive network remote sensing image classification method, computer equipment and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a self-adaptive network remote sensing image classification method, computer equipment and a storage medium.
Background
Remote sensing scene classification assigns remote sensing scenes to specific categories based on the content of remote sensing images, and is widely applied in land management, urban planning, wildfire prevention, crop growth monitoring, target detection and other fields. However, because the imaging sensor, usually mounted on a satellite, is far from the earth's surface, remote sensing scene images exhibit large scale differences, which poses many challenges for remote sensing scene classification.
Many scholars have proposed methods for remote sensing scene classification, which can be divided into two types according to how features are characterized: first, methods based on traditional hand-crafted features; second, methods based on deep learning. The first type extracts features with techniques such as scale-invariant feature transform, histogram of oriented gradients and local binary patterns, and then trains a classifier on the extracted features; however, feature extraction is slow, the representational power of such features is limited, and expert domain knowledge is needed to guide the extraction process, consuming considerable manpower and material resources. The second type has good feature representation and learning ability, but lower-level features in the network are not fully exploited, so classification is affected by redundant and noisy regions in the scene as well as by the large scale differences of objects in remote sensing images, leaving these classification methods with poor robustness.
Disclosure of Invention
The invention solves the problem of how to improve the classification performance for remote sensing images.
To solve the above problem, the present invention provides a self-adaptive network remote sensing image classification method, in which a self-adaptive discriminant region learning network comprises a feature extraction network, a region generator, a discriminator and a scorer. The method comprises:
acquiring an image to be detected; inputting the image to be detected into the trained region generator, which extracts at least one image sub-region of the image to be detected as a target region, extracts the information degree of each target region, and screens the target regions according to the information degree to obtain at least one discriminant region; extracting the regional features of the discriminant regions and the global features of the image to be detected with the feature extraction network, performing self-adaptive weighted convolution transformation on each regional feature and the global feature to obtain the corresponding first transformation features, and combining all the first transformation features to obtain a second transformation feature; and inputting the second transformation feature into the trained scorer to obtain a classification result.
Compared with the prior art, extracting and screening the target regions of the image to be detected to obtain discriminant regions reduces the limitation that redundant and noisy regions in remote sensing scene images impose on network classification performance; extracting the regional features and performing the self-adaptive weighted convolution transformation allows features focused on different regions to be connected and classified, yielding the classification result and effectively locating the discriminant regions in the image to improve network classification performance.
Optionally, before the image to be detected is acquired, the method further comprises a network training step, including:
acquiring a training image; processing the training image through the region generator: selecting at least one image sub-block of the training image as a training target region, extracting the information degree of each training target region, screening the training target regions according to the information degree to obtain at least one training preferred region, and extracting the training regional features of the training preferred regions and the training global features of the training image using the feature extraction network; performing self-adaptive weighted convolution transformation and combination on each training regional feature and the training global feature to obtain a fused feature; calculating the confidence of the training preferred regions through the discriminator, sorting the information degree and the confidence from high to low, and screening the training preferred regions whose information degree meets a preset condition as training discriminant regions; scoring the fused feature through the scorer; and calculating the network loss, which comprises a region generation loss constructed based on the information degree, a discriminant loss constructed based on the confidence, and a score loss constructed based on the classification result, and back-propagating the network loss to optimize the network.
Thus, during training the network is adjusted backward according to the loss, which gives it higher robustness; the three modules of the network are optimized simultaneously from the network loss, which improves training efficiency; and the network is fine-tuned according to the information degree, the confidence and the classification result, which further improves classification accuracy.
Optionally, calculating the confidence of the training preferred regions through the discriminator, sorting the information degree and the confidence from high to low, and screening the training preferred regions whose information degree meets a preset condition as training discriminant regions includes:
optimizing the constraints for constructing the training preferred regions using a pairwise ranking loss function, whose construction comprises: sorting the training preferred regions according to their information degree and numbering them; establishing a non-increasing function with the number as independent variable and the information degree as dependent variable as a first loss function; judging whether a second loss function, with the number as independent variable and the confidence as dependent variable, is monotonically consistent with the first loss function; and if not, acquiring the training preferred regions from the training target regions again.
Thus, a wrong target-region constraint can be discarded quickly: when the monotonicity of the first and second loss functions is inconsistent, the process returns directly to the step of selecting discriminant regions from the target regions again, which improves training efficiency.
Optionally, calculating the network loss, which comprises the region generation loss, the discriminant loss and the score loss, and optimizing the network by back-propagating the network loss includes:
weighting the region generation loss, the discriminant loss and the score loss to obtain the network loss; and back-propagating the network loss to optimize the feature extraction network, the region generator, the discriminator and the scorer.
Thus, the network and its algorithms are optimized by back-propagating the network loss, and the trained network has higher accuracy and robustness.
Optionally, calculating the network loss and optimizing the network by back-propagation further includes:
constructing the region generation loss through a hinge loss function, and constructing the discriminant loss and the score loss through a cross-entropy loss function.
Thus, constructing the region generation loss through the hinge loss function prevents wrong target regions from influencing the loss value: only the loss of correct target regions is considered, which reduces interference and improves training efficiency. Constructing the discriminant loss and the score loss through the cross-entropy loss function makes the optimization converge quickly, which also improves training efficiency.
Optionally, extracting the information degree of each target region and screening the target regions according to the information degree to obtain at least one discriminant region includes:
extracting the information degree of each target region; screening the target regions using non-maximum suppression based on the information degree to obtain preferred regions; and sorting the preferred regions from large to small according to the information degree and selecting a preset number of preferred regions as the discriminant regions.
Thus, for each target element, the most accurate target region can be obtained as the preferred region.
Optionally, extracting the regional features of the discriminant regions and the global features of the image to be detected using the feature extraction network, performing self-adaptive weighted convolution transformation on each regional feature and the global feature to obtain the corresponding first transformation features, and combining all the first transformation features to obtain the second transformation feature includes:
extracting the regional features of the discriminant regions and the global features of the image to be detected; performing self-adaptive weighted convolution transformation on the regional features and the global feature to obtain the first transformation features, namely performing a convolution operation on each feature with its own weight and adding a corresponding bias term to each convolution result; and performing vector connection on the first transformation features to obtain the second transformation feature.
Thus, the self-adaptive weighted convolution transformation of the features yields accurate transformation features.
Optionally, the image sub-regions come in at least three sizes, namely one-twelfth, one-sixth and one-third of the short side of the image to be detected, with aspect ratios of 1:1, 3:2 and 2:3 respectively.
Thus, the most appropriate image sub-region can be selected as the target region for the target element to be framed, which reduces the amount of computation and improves classification efficiency.
In another aspect, the present invention further provides computer equipment comprising a computer-readable storage medium storing a computer program and a processor; when the computer program is read and executed by the processor, the adaptive network remote sensing image classification method described above is implemented.
Compared with the prior art, the computer equipment has the same advantages as the self-adaptive network remote sensing image classification method, which are not repeated here.
The invention also provides a computer storage medium storing a computer program; when the computer program is read and executed by a processor, the adaptive network remote sensing image classification method described above is implemented.
Compared with the prior art, the computer storage medium has the same advantages as the self-adaptive network remote sensing image classification method, which are not repeated here.
Drawings
FIG. 1 is a schematic flow chart of a method for classifying an adaptive network remote sensing image according to an embodiment of the present invention;
FIG. 2 is another schematic flow chart of the adaptive network remote sensing image classification method according to the embodiment of the present invention;
FIG. 3 is a refined flow chart of step S300 of the adaptive network remote sensing image classification method according to the embodiment of the present invention;
FIG. 4 is a refined flow chart of step S400 of the adaptive network remote sensing image classification method according to the embodiment of the present invention;
FIG. 5 is a schematic diagram of a classification method for an adaptive network remote sensing image according to an embodiment of the present invention;
FIG. 6 is a diagram of the classification OA results on the AID data set for the embodiment of the present invention and other algorithms;
FIG. 7 is a diagram of the results on the UC Merced data set for the embodiment of the present invention and other algorithms;
FIG. 8 is a diagram of the results on the NWPU data set for the embodiment of the present invention and other algorithms;
FIG. 9 is a diagram of the results on the WHU-RS19 data set for the embodiment of the present invention and other algorithms.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
The invention provides a remote sensing image classification method which, referring to fig. 1, comprises the following steps:
and step S200, acquiring an image to be detected.
In an embodiment, the image to be detected comprises a remote sensing scene image. Because a remote sensing scene image usually contains both background elements and the target elements to be classified, the remote sensing image is first processed so that at least one image sub-region is selected as a target region.
Step S300, inputting the image to be detected into the trained region generator, extracting at least one image sub-region in the image to be detected as a target region by the region generator, extracting the information degree of each target region, and screening the target region according to the information degree to obtain at least one discriminant region.
In an embodiment, the region generator comprises a convolutional neural network.
In one embodiment, if 3 target elements are identified in the image to be detected, at least three image sub-blocks are selected as target regions.
Optionally, as shown in fig. 5, the image sub-regions come in at least three sizes, namely one-twelfth, one-sixth and one-third of the short side of the image to be detected, with aspect ratios of 1:1, 3:2 and 2:3 respectively.
In an embodiment, as shown in fig. 5, the framed areas in the image are image sub-regions; the sub-regions come in three sizes, chosen according to the size of the target in the image to be detected. First, the length and width of the image to be detected are obtained; the short side of the image serves as the size reference for the sub-regions, and each sub-region size corresponds one-to-one with an aspect ratio. After the short side is obtained, a target element in the image is identified, its pixel dimensions are judged, and it is framed with the best-fitting frame size. For example, if the length and width of the image are 1000 and 600 pixels respectively, the 600-pixel side is taken as the size reference. Suppose the identified target element measures 150 by 200 pixels: the one-twelfth (50-pixel) and one-sixth (100-pixel) sizes cannot completely frame the target, so the largest size, one-third of the short side (200 pixels), is used; and since the corresponding aspect ratio is specified as 2:3, a 200 by 300 pixel rectangle frames the target element, as sketched below.
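The sizing rule above can be made concrete with a short sketch. The following Python snippet is a minimal illustration, not code from the patent: the nine candidate box shapes are derived from the short side of the image, and the selection rule (smallest covering shape whose aspect ratio is closest to the target's) is our reading of the example rather than an explicit formula from the text.

```python
from itertools import product

def candidate_shapes(short_side):
    """Nine (height, width) box shapes: three scales x three aspect ratios."""
    scales = [short_side / 12, short_side / 6, short_side / 3]
    ratios = [(1, 1), (3, 2), (2, 3)]           # height : width
    shapes = []
    for s, (rh, rw) in product(scales, ratios):
        unit = s / min(rh, rw)                   # shorter edge of the box equals s
        shapes.append((round(unit * rh), round(unit * rw)))
    return shapes

def best_frame(target_h, target_w, short_side):
    """Smallest candidate that fully covers the target, preferring the
    aspect ratio closest to the target's own."""
    fits = [(h, w) for h, w in candidate_shapes(short_side)
            if h >= target_h and w >= target_w]
    if not fits:
        return None
    t = target_h / target_w
    return min(fits, key=lambda hw: (abs(hw[0] / hw[1] - t), hw[0] * hw[1]))

# Example from the text: a 1000 x 600 image with a 150 x 200 target element.
print(best_frame(150, 200, short_side=600))      # -> (200, 300)
```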
The preliminarily framed target regions may overlap; in that case the same target element may be framed multiple times with differing accuracy, so the higher-quality target regions need to be selected, and target regions that frame more or less than the complete target element need to be filtered out.
For example, in an embodiment, four target regions frame the same target element. One of them uses a smaller size and does not frame the element completely, so its information degree is low and it is rejected; in two others the target element does not appear in the center of the region, so their information degree is also low; the last target region is therefore selected as the discriminant region.
Optionally, if there are n target elements, at least n target regions are screened out as discriminant regions.
In an embodiment, the image to be detected has 3 target elements and 7 target regions are framed, so at least 3 target regions should be selected as discriminant regions. Specifically, when more than 3 of the 7 target regions satisfy the requirement, for example 5 of them, all 5 are taken as discriminant regions.
Alternatively, referring to fig. 3, step S300 includes:
step S301, extracting the information degree of each target region.
In an embodiment, the information degree of a target region is calculated by a convolutional neural network. The information degree indicates the amount of information the target region contains, that is, how completely it covers the target element. After the target regions are selected, their information degree is calculated, and whether the selected target regions are accurate can then be screened based on it.
Step S302, based on the information degree, screening the target regions using non-maximum suppression to obtain preferred regions.
In an embodiment, non-maximum suppression is used to screen the target regions. During target detection, a large number of candidate boxes are generated at the same target position, and these candidates may overlap one another; non-maximum suppression is then needed to find the optimal target region and eliminate redundant bounding boxes. Specifically: sort by information degree score and select the bounding box with the highest information degree, i.e. the most informative target region, adding it to the output list; calculate the areas of all target regions; calculate the IoU (intersection over union) between the most informative target region and each other candidate; delete the target regions whose IoU exceeds the threshold; and repeat this process until only one target region remains per target element. The final remaining target region is the preferred region.
Optionally, the non-maximum suppression threshold is set to 0.3.
The intersection over union is the ratio of the intersection to the union of the areas of two rectangular boxes.
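As a concrete illustration of this screening step, the following Python sketch implements the standard non-maximum suppression procedure described above, with the 0.3 threshold; it is a minimal version for axis-aligned boxes, and the function and variable names are ours, not the patent's.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, info_degree, iou_threshold=0.3):
    """Keep the most informative box per target, dropping overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: info_degree[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                      # highest remaining information degree
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep                                  # indices of the preferred regions

boxes = [(0, 0, 200, 300), (10, 20, 210, 320), (400, 400, 500, 500)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))        # -> [0, 2]
```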
Step S303, sorting the preferred regions from large to small according to the information degree, and selecting a preset number of the preferred regions as the discriminant regions.
Optionally, the preset number is 4 or 6.
In an embodiment, if the preset number is 4, the plurality of preferred regions screened in step S302 are sorted from large to small according to the information degree, and the 4 preferred regions with the highest information degree are selected as discriminant regions for subsequent image classification.
Step S400, extracting the regional features of the discriminant regions and the global features of the image to be detected using the feature extraction network, performing self-adaptive weighted convolution transformation on each regional feature and the global feature to obtain the corresponding first transformation features, and combining all the first transformation features to obtain the second transformation feature.
The regional features of the discriminant regions are extracted with the feature extraction network, giving each discriminant region its own feature vector. Because each region frames a different target element and the target elements are of different types, the regional features extracted from the discriminant regions have different characteristics and need to be adaptively transformed; after the adaptive transformation, all features are connected to obtain the second transformation feature.
In an embodiment, in addition to connecting all the first transformation features, the features of the original image, that is, of the image to be detected, are also connected. Since the regional features are extracted from the discriminant regions, the weighted first transformation features are only the adaptive features of the target elements and contain no global information. Remote sensing image classification classifies the whole image, and the features of the original image contain the global information, so besides all the first transformation features, the features of the original image must also be connected to obtain the second transformation feature.
Alternatively, as shown in fig. 4, step S400 includes:
step S401, extracting the regional characteristics of the discriminant region and the global characteristics of the image to be detected.
Specifically, in step S400, the regional features of the discriminant region and the global features of the image to be detected are extracted first.
Step S402, performing self-adaptive weighted convolution transformation on the regional features and the global feature to obtain the first transformation features: a convolution operation is performed on each feature with its own weight, and a corresponding bias term is added to each convolution result.
Different weights are added to different regional features, and the weighted features are convolved. The transformation of a feature F_i can be calculated as

\tilde{F}_i = W_i * F_i + b_i

where * denotes the convolution operation, \tilde{F}_i represents the transformed feature, W_i represents the weight corresponding to F_i, and b_i represents the bias term corresponding to F_i. This ensures that features focused on different elements can be adaptively connected, in preparation for the subsequent fully connected classification layer.
Step S403, performing vector connection on the first transformation features to obtain the second transformation feature.
All the transformed feature vectors are connected to obtain the second transformation feature, which can be expressed as

F = W_c \odot \mathrm{CatF} + b_c

where \odot represents the element-wise product, W_c represents the weight for the feature connection, b_c represents the bias term for the feature connection, and CatF is the connection of the first transformation features extracted and transformed from the different discriminant regions, calculated as

\mathrm{CatF} = \mathrm{Cat}(\tilde{F}_1, \tilde{F}_2, \ldots, \tilde{F}_N)

where Cat is the connection operation and F is the second transformation feature.
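To make the two transformations above concrete, here is a minimal PyTorch sketch, our own illustration rather than the patent's implementation; the 1 x 1 convolution kernels, the global average pooling used to flatten each map, and all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveWeightedTransform(nn.Module):
    """Per-feature weighted convolution, then weighted concatenation."""
    def __init__(self, num_regions, channels):
        super().__init__()
        # One convolution (weight W_i) and bias b_i per regional feature,
        # plus one for the global feature: ~F_i = W_i * F_i + b_i.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1)
            for _ in range(num_regions + 1))
        cat_dim = (num_regions + 1) * channels
        self.w_cat = nn.Parameter(torch.ones(cat_dim))   # element-wise weight W_c
        self.b_cat = nn.Parameter(torch.zeros(cat_dim))  # bias term b_c

    def forward(self, region_feats, global_feat):
        # region_feats: list of (B, C, H, W) maps; global_feat: (B, C, H, W).
        feats = region_feats + [global_feat]
        first = [conv(f) for conv, f in zip(self.branches, feats)]  # first transformation features
        pooled = [f.mean(dim=(2, 3)) for f in first]     # flatten each map to a vector
        cat_f = torch.cat(pooled, dim=1)                 # CatF = Cat(~F_1, ..., ~F_N, global)
        return self.w_cat * cat_f + self.b_cat           # F = W_c (element-wise) CatF + b_c

transform = AdaptiveWeightedTransform(num_regions=4, channels=512)
regions = [torch.randn(2, 512, 7, 7) for _ in range(4)]
print(transform(regions, torch.randn(2, 512, 7, 7)).shape)  # torch.Size([2, 2560])
```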
Step S500, inputting the second transformation feature into the trained scorer to obtain a classification result.
The second transformation feature acquired in step S400 is input into the fully connected layer for the classification operation to obtain the classification result.
In one embodiment, with categories such as mine land, cultivated land, forest land, water area, road, residential land and unused land as the preset classification results, the classification result includes a probability for each category, and the category with the highest probability is selected as the classification result. For example, if the probability of mine land is 0.9, cultivated land 0.6, forest land 0.6, water area 0.8, road 0.1, residential land 0.2 and unused land 0.2, the classification result of the image to be detected is mine land.
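Selecting the final label from the per-class probabilities is a one-line argmax; the sketch below simply encodes the example above (category names and values come from that example only):

```python
probs = {"mine land": 0.9, "cultivated land": 0.6, "forest land": 0.6,
         "water area": 0.8, "road": 0.1, "residential land": 0.2,
         "unused land": 0.2}
print(max(probs, key=probs.get))  # -> mine land
```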
Optionally, as shown in fig. 2, before step S200, the method further includes:
step S100, acquiring a training image.
Step S110, processing the training image through the region generator: selecting at least one image sub-block of the training image as a training target region, extracting the information degree of each training target region, screening the training target regions according to the information degree to obtain at least one training preferred region, and extracting the training regional features of the training preferred regions and the training global features of the training image using the feature extraction network.
A training image is acquired, and at least one image sub-block is selected as a training target region.
In one embodiment, at least as many training target regions as target elements are obtained; if 3 target elements are detected in the image, at least 3 training target regions are acquired.
In another embodiment, according to the size of the target element, the most suitable image sub-block size is selected from preset sizes for the training target region. Specifically, the preset image sub-block sizes are one-twelfth, one-sixth and one-third of the width of the training image, with aspect ratios of 1:1, 3:2 and 2:3 respectively.
The information degree of the training target regions is extracted, the qualifying training target regions are screened using non-maximum suppression, and the training target regions meeting the information degree requirement are screened as training discriminant regions.
In an embodiment, the feature extraction network is used to extract the regional features of the training discriminant regions as well as the global features of the training image; each training discriminant region obtains its own regional features independently. The feature extraction network is trained before step S100, specifically by optimizing the network with stochastic gradient descent with momentum; the initial learning rate is set to 0.01 and is multiplied by 0.1 every 10 epochs.
Optionally, the feature extraction network is one of AlexNet, VGG16 or ResNet50.
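For reference, the training configuration described above translates into the following PyTorch setup; this is a sketch under the stated settings, and the momentum value of 0.9 is our assumption, since the text only specifies stochastic gradient descent with momentum.

```python
import torch
import torchvision

# Backbone: one of AlexNet, VGG16 or ResNet50 (ResNet50 shown here).
backbone = torchvision.models.resnet50(weights=None)

# SGD with momentum, initial learning rate 0.01, multiplied by 0.1 every 10 epochs.
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    # ... one pass over the training set, with optimizer.step() per batch ...
    scheduler.step()
```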
Step S120, performing self-adaptive weighted convolution transformation and combination on each training regional feature and the training global feature to obtain a fused feature.
In one embodiment, different weights are added to the different regional features and the global features and a convolution operation is performed; corresponding bias terms are added after the convolution, and the transformed features are then vector-connected to obtain the fused feature.
Step S130, calculating the confidence of the training preferred regions through the discriminator, sorting the information degree and the confidence from high to low, and screening the training preferred regions whose information degree meets a preset condition as training discriminant regions.
Step S140, scoring the fused features through the scorer.
The fused features are input into a fully connected layer for scoring to obtain a prediction confidence; the prediction confidence is compared with the real classification result to calculate the discriminant loss, and the network is finely adjusted backward based on this loss.
Step S150, calculating the network loss, which comprises the region generation loss, the discriminant loss and the score loss, and optimizing the network by back-propagating the network loss, wherein the region generation loss is constructed based on the information degree, the discriminant loss is constructed based on the confidence, and the score loss is constructed based on the classification result.
The classification result is obtained through the fully connected layer and comprises the prediction probabilities of the different preset results.
The learning network is optimized by back-propagating the network loss. The self-adaptive discriminant region learning network comprises all the algorithms and modules of steps S110 to S140; the errors of the results of steps S110 to S140 are calculated through the network loss and back-propagated to the feature extraction network, the region generator, the discriminator and the scorer, so that more accurate algorithms are obtained by optimizing against these errors.
Optionally, step S150 includes:
and weighting the area generation loss, the discriminant loss and the fractional loss to obtain the network loss.
And performing back propagation on the network loss, and optimizing the feature extraction network, the region generator, the discriminator and the scorer.
Weighting the region generation loss, the discriminant loss and the score loss yields the network loss, which is back-propagated as a whole to optimize the feature extraction network, the region generator, the discriminator and the scorer. The weighted network loss can be expressed as

L = L_G + \lambda_1 L_D + \lambda_2 L_S

where L_G denotes the region generation loss, L_D denotes the discriminant loss, L_S denotes the score loss, and \lambda_1 and \lambda_2 are balance parameters.

Preferably, \lambda_1 and \lambda_2 are both 0.5.
Optionally, the region generation loss is constructed through a hinge loss function, and the discriminant loss and the score loss are constructed through a cross-entropy loss function.
The target-region acquisition algorithm is optimized by minimizing the cross entropy between the real class and the prediction confidence.
The hinge loss ensures that a wrongly constructed training target region is far enough from the correct training target region: if the gap reaches a preset threshold, the error of the wrongly constructed training target region can be taken as 0; otherwise the error is accumulated. Constructing the region generation loss using the hinge loss function therefore reduces the error penalty incurred by wrong training target regions and considers only the error incurred by correct ones.
The cross-entropy loss function measures the difference between the true probability distribution and the predicted probability distribution; the smaller the cross entropy, the better the model's prediction. The discriminant loss can be expressed as

L_D = -\sum_{i=1}^{N} \log M(R_i)

where M is the confidence function that maps each region R_i to the probability of the true class of the original image X, and N is the number of training discriminant regions. The score loss divides the remote sensing scene into specific categories using the original image and the features extracted from the different discriminant regions, and can be expressed as

L_S = -\log M(F)

where F is the classification result.

The final network loss function is

L = L_G + \lambda_1 L_D + \lambda_2 L_S

and the network is reversely optimized through L.
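A compact sketch of how the three terms could be combined and back-propagated follows; it is our illustration, with the tensor shapes, the margin value of 1 and the pairwise hinge form of the region generation loss as assumptions consistent with the formulas above.

```python
import torch
import torch.nn.functional as F

def ranking_loss(info_degree, confidence, margin=1.0):
    """Hinge-style pairwise term: for every pair with C_i < C_j, the
    information degree I_j should exceed I_i by at least the margin."""
    diff_c = confidence.unsqueeze(0) - confidence.unsqueeze(1)    # [i, j] = C_j - C_i
    diff_i = info_degree.unsqueeze(0) - info_degree.unsqueeze(1)  # [i, j] = I_j - I_i
    mask = (diff_c > 0).float()                                   # pairs with C_i < C_j
    return (mask * F.relu(margin - diff_i)).sum() / mask.sum().clamp(min=1)

def network_loss(region_logits, score_logits, label,
                 info_degree, confidence, lam1=0.5, lam2=0.5):
    """L = L_G + lam1 * L_D + lam2 * L_S, with both balance parameters 0.5."""
    l_g = ranking_loss(info_degree, confidence)                   # region generation loss
    n = region_logits.size(0)
    l_d = F.cross_entropy(region_logits, label.expand(n))         # each region vs. true class
    l_s = F.cross_entropy(score_logits, label)                    # fused-feature prediction
    return l_g + lam1 * l_d + lam2 * l_s

# Toy example: 4 training discriminant regions, 7 scene classes, one image.
loss = network_loss(torch.randn(4, 7), torch.randn(1, 7), torch.tensor([2]),
                    torch.randn(4), torch.randn(4))
print(float(loss))  # in training: loss.backward(); optimizer.step()
```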
Optionally, calculating the confidence of the training preferred regions through the discriminator, sorting the information degree and the confidence from high to low, and screening the training preferred regions whose information degree meets a preset condition as training discriminant regions includes:
optimizing the constraints for constructing the training preferred regions using a pairwise ranking loss function, whose construction comprises: sorting the training preferred regions according to their information degree and numbering them; and establishing a non-increasing function, with the number as independent variable and the information degree as dependent variable, as a first loss function.
Denote the confidence by C, the information degree by I, and the regions by R_1, R_2, \ldots, R_k, where the k target regions with the highest information degree are selected through non-maximum suppression. The pairwise ranking loss function optimizes the constraint that the sequences I(R_1), \ldots, I(R_k) and C(R_1), \ldots, C(R_k) have the same order.
Optionally, it is judged whether a second loss function, with the number as independent variable and the confidence as dependent variable, is monotonically consistent with the first loss function;
if not, the process returns to the step of extracting the information degree of the training target regions, sorting the information degree from high to low and screening the training target regions whose information degree meets the preset condition as training discriminant regions, so as to obtain the training discriminant regions again.
Judging whether the monotonicity is consistent means judging whether

I(R_1) \ge I(R_2) \ge \ldots \ge I(R_k) \;\Rightarrow\; C(R_1) \ge C(R_2) \ge \ldots \ge C(R_k)

holds. The information degree and confidence loss is then defined as the hinge-style pairwise term

L_{I,C} = \sum_{(i,j):\, C(R_i) < C(R_j)} \max\{0,\; 1 - (I(R_j) - I(R_i))\}

If the monotonicity is not consistent, the process returns to step S120, and the training discriminant regions are selected from the training target regions again.
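The monotonicity check itself is simple to state in code; the sketch below (our illustration, with names assumed) sorts the regions by information degree and verifies that the confidences are non-increasing along that order:

```python
def same_order(info_degree, confidence):
    """True if sorting by information degree also sorts the confidence,
    i.e. I(R_1) >= ... >= I(R_k) implies C(R_1) >= ... >= C(R_k)."""
    order = sorted(range(len(info_degree)),
                   key=lambda i: info_degree[i], reverse=True)
    c = [confidence[i] for i in order]
    return all(c[i] >= c[i + 1] for i in range(len(c) - 1))

# If this returns False, training returns to step S120 and the
# discriminant regions are re-selected from the target regions.
print(same_order([0.9, 0.7, 0.5], [0.8, 0.6, 0.4]))  # -> True
print(same_order([0.9, 0.7, 0.5], [0.3, 0.6, 0.4]))  # -> False
```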
In one embodiment, in order to verify the effectiveness of the ADRL-Net method based on the adaptive discriminant region learning network, its performance is tested on 4 remote sensing scene image data sets (AID, UC Merced, NWPU and WHU-RS19), and it is compared with 10 currently popular remote sensing scene classification methods.
The information for the 4 remote sensing scene image datasets is as follows:
(1) Aerial Image Dataset (AID): contains 10000 images of 30 different scene categories; each image is an RGB image of size 600 × 600, with spatial resolution varying from 0.5 m to 8 m.
(2) UC Merced Land Use dataset: contains 2100 images of 21 different scene categories; each image is an RGB image of size 256 × 256.
(3) NWPU-RESISC45 dataset: contains 31500 images of 45 different scene categories, 700 per category; each image is an RGB image of size 256 × 256, with spatial resolution varying from 0.2 m to 30 m.
(4) WHU-RS19 dataset: contains 950 images of 19 different scene categories; each image is an RGB image of size 600 × 600.
The information of 10 remote sensing scene classification methods is as follows:
(1) Fine-tuned AlexNet and VGG16: these methods replace the fully connected layer of the convolutional neural network with a randomly initialized fully connected layer whose dimensionality is the number of remote sensing scene categories.
(2) VGG-M: the method extracts features with VGGNet, obtains the final features with two fully connected layers, and then obtains the classification result with a linear support vector machine.
(3) BoVW: the method generates visual words from features extracted from existing convolutional neural networks.
(4) DFF: the method is a remote sensing scene classification method based on a depth feature fusion network.
(5) MSCP: the method combines a multi-layer stacked covariance pool with a pre-trained convolutional neural network.
(6) MCNN: the method solves the problem of large scale in the remote sensing scene image by using the multi-scale convolution neural network.
(7) DCNN: the method combines metric learning and a convolutional neural network to enhance discrimination capability.
(8) ARCnet: the method is an end-to-end attention cycle convolution neural network for remote sensing scene classification based on a human visual system.
(9) SCCov: the method embeds skip connections and covariance pooling in an MSCP network.
(10) GBNet: the method integrates multi-feature aggregation into, and weakens interference information in, an end-to-end remote sensing scene classification convolutional neural network.
In this experiment, we randomly generated training and test sets; to reduce the effect of randomness on the results, we repeated the training and testing experiment 5 times and report the mean and variance of the overall accuracy (OA) results.
Analysis of results on AID dataset:
for the AID dataset we used two training test data partitioning approaches. For the first, we randomly chose 20% of the samples for training, and for the second, we chose 50% of the samples for training. Fig. 6 shows the OA results of the different algorithms. When the training ratio is 20%, the classification effect of ADRL-Net using ResNet50 and VGG16 as the backbone network is significantly better than that of other comparative algorithms. When the backbone networks are ResNet50 and VGG16, the classification OA values are 94.24% and 93.67%, respectively. When the training ratio is 50%, the classification effect of ADRL-Net using VGG16 as the backbone network is not the best, but is better than that of the DCNN method. In addition, schools, squares, villages, for example, are relatively difficult to identify scenes because many different or noisy object objects are contained in these scenes. In addition, the ADRL-Net can achieve 100% classification accuracy in the scenes such as airports, sand beaches, forests, mountains, ports, viaducts and the like.
Analysis of results on UC Merced dataset:
in this experiment, we randomly selected 50% and 80% of the samples in each category as training sets and the rest as test sets. Fig. 7 shows the classification accuracy results of different algorithms on the UC merceded data set. From the results, it can be seen that when the training sample ratio is 50%, the classification OA value of ADRL-Net using ResNet50 as the backbone network is 98.72%, which is significantly better than other comparison algorithms. The classification OA value for ADRL-Net using VGG16 as the backbone network was 97.31%.
Analysis of results on NWPU dataset:
in this experiment, we randomly selected 10% and 20% of the samples in each category as training sets and the rest as test sets. Figure 8 shows the classification accuracy results of different algorithms on NWPU datasets. As can be seen from the results, the OA values of the ADRL-Net classification are all optimal.
Analysis of results on WHU-RS19 dataset:
for the WHU-RS19 data set, we randomly selected 40% and 60% of the samples in each category as training sets, and the rest as test sets. FIG. 9 shows the classification accuracy results of different algorithms on the WHU-RS19 data set. From the results, it can be seen that although the classification OA of other comparison algorithms can reach the accuracy of more than 95%, the ADRL-Net still has effective effect improvement.
Effectiveness analysis of ADRL-Net:
to visually verify the effectiveness of ADRL-Net, we present the partial region visualization results generated by the discriminative region generator of ADRL-Net in fig. 5. As can be seen from the results, ADRL-Net can effectively extract the regions that provide valid information for a particular scene analogy.
Network convergence:
we trained the network using 10 cycles. To verify the convergence of ADRL-Net, the loss and OA values of the network over different periods on the AID data set were recorded. ADRL-Net converges around 10 cycles, and OA levels plateau around 8 cycles.
A computer device according to another embodiment of the present invention includes a computer-readable storage medium storing a computer program, and a processor, where the computer program is read by the processor and executed to implement the adaptive network remote sensing image classification method as described above.
Compared with the prior art, the computer equipment has the same advantages as the self-adaptive network remote sensing image classification method, and the description is omitted.
A computer storage medium according to another embodiment of the present invention stores a computer program, which when read and executed by a processor, implements the adaptive network remote sensing image classification method as described above.
Compared with the prior art, the computer storage medium has the same advantages as the self-adaptive network remote sensing image classification method, and the description is omitted here.
Although the present disclosure has been described above, the scope of the present disclosure is not limited thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present disclosure, and these changes and modifications are intended to be within the scope of the present disclosure.

Claims (9)

1. A self-adaptive network remote sensing image classification method is characterized in that a self-adaptive discriminant area learning network comprises a feature extraction network, an area generator, a discriminator and a scorer, and the self-adaptive network remote sensing image classification method comprises the following steps:
acquiring a training image;
processing the training image through the region generator, selecting at least one image sub-block in the training image as a training target region, extracting the information degree of each training target region, screening the training target region according to the information degree to obtain at least one training preferred region, and extracting the training region features of the training preferred region and the training global features in the training image by using the feature extraction network;
carrying out self-adaptive weighted convolution transformation and combination on each training region characteristic and the training global characteristic to obtain a fusion characteristic;
calculating the confidence of the training preferred regions through the discriminator, sorting the information degree and the confidence from high to low, and screening the training preferred regions whose information degree meets a preset condition as training discriminant regions;
scoring, by the scorer, the fused features;
calculating a network loss, wherein the network loss comprises a region generation loss, a discriminant loss and a score loss, and the network is optimized based on back propagation of the network loss, wherein the region generation loss is constructed based on the information degree, the discriminant loss is constructed based on the confidence, and the score loss is constructed based on the classification result;
acquiring an image to be detected;
inputting the image to be detected into the trained region generator, extracting at least one image subregion in the image to be detected by the region generator to serve as a target region, extracting the information degree of each target region, and screening the target region according to the information degree to obtain at least one discriminant region;
extracting regional features of the discriminant region and global features of the image to be detected by using the feature extraction network, performing self-adaptive weighted convolution transformation on each regional feature and the global features to obtain first transformation features corresponding to the regional features and the global features, combining all the first transformation features to obtain second transformation features, wherein the self-adaptive weighted convolution transformation comprises using different weights to perform convolution operation on each feature respectively, and adding a corresponding bias term into each convolution result;
and inputting the second transformation characteristic into the trained scorer to obtain a classification result.
2. The adaptive network remote sensing image classification method according to claim 1, wherein the calculating of the confidence of the training preferred regions through the discriminator, the sorting of the information degree and the confidence from high to low, and the screening of the training preferred regions whose information degree meets a preset condition as training discriminant regions comprises:
optimizing constraints for constructing the training preferred regions using a pairwise ranking loss function, wherein the construction of the pairwise ranking loss function comprises: sorting the training preferred regions according to their information degree and numbering them; establishing a non-increasing function, with the number as independent variable and the information degree as dependent variable, as a first loss function;
judging whether a second loss function with the number as an independent variable and the confidence coefficient as a dependent variable is consistent with the monotonicity of the first loss function;
and if not, acquiring the training preferred area from the training target area again.
3. The adaptive network remote sensing image classification method according to claim 1, wherein the calculating of the network loss, the network loss comprising the region generation loss, the discriminant loss and the score loss, and the optimizing of the network by back-propagating the network loss comprises:
weighting the region generation loss, the discriminant loss and the score loss to obtain the network loss;
and back-propagating the network loss, and optimizing the feature extraction network, the region generator, the discriminator and the scorer.
4. The adaptive network remote sensing image classification method according to claim 3, wherein the calculating of the network loss and the optimizing of the network by back-propagating the network loss further comprises:
constructing the region generation loss through a hinge loss function, and constructing the discriminant loss and the score loss through a cross-entropy loss function.
5. The method for classifying the self-adaptive network remote sensing image according to claim 1, wherein the extracting the information degree of each target area and screening the target areas according to the information degree to obtain at least one discriminant area comprises:
extracting the information degree of each target area;
based on the information degree, using non-maximum value to inhibit and screen the target area to obtain a preferred area;
and sorting the preferred regions from large to small according to the information degree, and selecting a preset number of the preferred regions as the discriminant regions.
6. The method for classifying the self-adaptive network remote sensing image according to claim 5, wherein the step of extracting the regional features of the discriminant region and the global features of the image to be detected by using the feature extraction network, performing self-adaptive weighted convolution transformation on each regional feature and each global feature to obtain each regional feature and first transformation features corresponding to the global features, and combining all the first transformation features to obtain second transformation features comprises the steps of:
extracting the regional characteristics of the discriminant region and the global characteristics of the image to be detected;
carrying out self-adaptive weighted convolution transformation on the region characteristic and the global characteristic to obtain the first transformation characteristic;
and carrying out vector connection on the first transformation characteristics to obtain the second transformation characteristics.
7. The adaptive network remote sensing image classification method according to any one of claims 1 to 6, wherein the image sub-regions come in at least three sizes, namely one-twelfth, one-sixth and one-third of the short side of the image to be detected, with aspect ratios of 1:1, 3:2 and 2:3 respectively.
8. A computer device, comprising a computer-readable storage medium storing a computer program and a processor, wherein when the computer program is read and executed by the processor, the adaptive network remote sensing image classification method according to any one of claims 1 to 7 is implemented.
9. A computer storage medium, characterized in that the computer storage medium stores a computer program which, when read and executed by a processor, implements the adaptive network remote sensing image classification method according to any one of claims 1 to 7.
CN202110971318.4A 2021-08-24 2021-08-24 Self-adaptive network remote sensing image classification method, computer equipment and storage medium Active CN113420738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110971318.4A CN113420738B (en) 2021-08-24 2021-08-24 Self-adaptive network remote sensing image classification method, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110971318.4A CN113420738B (en) 2021-08-24 2021-08-24 Self-adaptive network remote sensing image classification method, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113420738A CN113420738A (en) 2021-09-21
CN113420738B true CN113420738B (en) 2021-11-09

Family

ID=77719442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110971318.4A Active CN113420738B (en) 2021-08-24 2021-08-24 Self-adaptive network remote sensing image classification method, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113420738B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173172B (en) * 2023-11-02 2024-01-26 深圳市富邦新材科技有限公司 Machine vision-based silica gel molding effect detection method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845499A (en) * 2017-01-19 2017-06-13 清华大学 A kind of image object detection method semantic based on natural language
CN107665336A (en) * 2017-09-20 2018-02-06 厦门理工学院 Multi-target detection method based on Faster RCNN in intelligent refrigerator
CN108875588A (en) * 2018-05-25 2018-11-23 武汉大学 Across camera pedestrian detection tracking based on deep learning
CN110135502A (en) * 2019-05-17 2019-08-16 东南大学 A kind of image fine granularity recognition methods based on intensified learning strategy
CN110335270A (en) * 2019-07-09 2019-10-15 华北电力大学(保定) Transmission line of electricity defect inspection method based on the study of hierarchical regions Fusion Features
CN110689091A (en) * 2019-10-18 2020-01-14 中国科学技术大学 Weak supervision fine-grained object classification method
CN111914599A (en) * 2019-05-09 2020-11-10 四川大学 Fine-grained bird recognition method based on semantic information multi-layer feature fusion

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2570970A1 (en) * 2011-09-16 2013-03-20 Technische Universität Berlin Method and system for the automatic analysis of an image of a biological sample
US10748281B2 (en) * 2018-07-21 2020-08-18 International Business Machines Corporation Negative sample enhanced object detection machine

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845499A (en) * 2017-01-19 2017-06-13 清华大学 A kind of image object detection method semantic based on natural language
CN107665336A (en) * 2017-09-20 2018-02-06 厦门理工学院 Multi-target detection method based on Faster RCNN in intelligent refrigerator
CN108875588A (en) * 2018-05-25 2018-11-23 武汉大学 Across camera pedestrian detection tracking based on deep learning
CN111914599A (en) * 2019-05-09 2020-11-10 四川大学 Fine-grained bird recognition method based on semantic information multi-layer feature fusion
CN110135502A (en) * 2019-05-17 2019-08-16 东南大学 A kind of image fine granularity recognition methods based on intensified learning strategy
CN110335270A (en) * 2019-07-09 2019-10-15 华北电力大学(保定) Transmission line of electricity defect inspection method based on the study of hierarchical regions Fusion Features
CN110689091A (en) * 2019-10-18 2020-01-14 中国科学技术大学 Weak supervision fine-grained object classification method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An Efficient Approach for Polyps Detection in Endoscopic Videos Based on Faster R-CNN; X. Mo et al.; 2018 24th International Conference on Pattern Recognition (ICPR); 2018-08-24; pp. 3929-3934 *
Faster-RCNN SAR image vehicle target detection method based on improved RPN; Cao Lei et al.; Journal of Southeast University (Natural Science Edition); 2021-01-20; Vol. 51, No. 1; pp. 87-90 *

Also Published As

Publication number Publication date
CN113420738A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN109614985B (en) Target detection method based on densely connected feature pyramid network
CN110334765B (en) Remote sensing image classification method based on attention mechanism multi-scale deep learning
US10984532B2 (en) Joint deep learning for land cover and land use classification
CN109670528B (en) Data expansion method facing pedestrian re-identification task and based on paired sample random occlusion strategy
US8233712B2 (en) Methods of segmenting a digital image
Qin et al. Saliency detection via cellular automata
Poggi et al. Supervised segmentation of remote sensing images based on a tree-structured MRF model
CN109360232B (en) Indoor scene layout estimation method and device based on condition generation countermeasure network
CN110309781B (en) House damage remote sensing identification method based on multi-scale spectrum texture self-adaptive fusion
EP1700269A2 (en) Detection of sky in digital color images
CN108805151B (en) Image classification method based on depth similarity network
CN113033520A (en) Tree nematode disease wood identification method and system based on deep learning
CN104680193B (en) Online objective classification method and system based on quick similitude network integration algorithm
CN110738132B (en) Target detection quality blind evaluation method with discriminant perception capability
CN110222767A (en) Three-dimensional point cloud classification method based on nested neural and grating map
CN113487600A (en) Characteristic enhancement scale self-adaptive sensing ship detection method
Li et al. Incorporating open source data for Bayesian classification of urban land use from VHR stereo images
CN113420738B (en) Self-adaptive network remote sensing image classification method, computer equipment and storage medium
CN110334628B (en) Outdoor monocular image depth estimation method based on structured random forest
CN114510594A (en) Traditional pattern subgraph retrieval method based on self-attention mechanism
CN107563296B (en) Method and system for extracting bedrock coast shoreline
Sjahputera et al. Clustering of detected changes in high-resolution satellite imagery using a stabilized competitive agglomeration algorithm
CN111626321A (en) Image data clustering method and device
Naeini et al. Improving the dynamic clustering of hyperspectral data based on the integration of swarm optimization and decision analysis
CN112364747B (en) Target detection method under limited sample

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant