CN113420738B - Self-adaptive network remote sensing image classification method, computer equipment and storage medium - Google Patents

Self-adaptive network remote sensing image classification method, computer equipment and storage medium

Info

Publication number
CN113420738B
CN113420738B (application CN202110971318.4A)
Authority
CN
China
Prior art keywords
region
loss
training
network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110971318.4A
Other languages
Chinese (zh)
Other versions
CN113420738A (en)
Inventor
唐厂
李显巨
孙琨
王力哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences
Priority to CN202110971318.4A
Publication of CN113420738A
Application granted
Publication of CN113420738B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a self-adaptive network remote sensing image classification method, computer equipment and a storage medium. The method comprises the following steps: acquiring an image to be detected; inputting the image to be detected into a trained region generator, which extracts at least one image sub-region of the image to be detected as a target region, extracts the information degree of each target region, and screens the target regions according to the information degree to obtain discriminant regions; extracting the regional features of the discriminant regions and the global features of the image to be detected with a feature extraction network, and performing self-adaptive weighted convolution transformation on each regional feature and the global feature to obtain second transformation features; and inputting the second transformation features into a trained scorer to obtain a classification result. The method reduces the limitation that redundant and noisy regions in remote sensing scene images impose on network classification performance, and effectively locates the discriminant regions in the image to improve that performance.

Description

Self-adaptive network remote sensing image classification method, computer equipment and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a self-adaptive network remote sensing image classification method, computer equipment and a storage medium.
Background
Remote sensing scene classification assigns remote sensing scenes to specific categories based on the content of remote sensing images, and is widely applied in land management, urban planning, wildfire prevention, crop growth monitoring, target detection and other fields. However, because the imaging sensor, usually mounted on a satellite, is far from the earth's surface, remote sensing scene images exhibit large scale differences, which poses many challenges for remote sensing scene classification.
Many scholars have proposed methods for remote sensing scene classification, which can be divided into two types according to how features are characterized: first, methods based on traditional hand-crafted features; second, methods based on deep learning. The first type extracts features with techniques such as scale-invariant feature transform, histogram of oriented gradients and local binary patterns, and then trains a classifier on the extracted features; however, feature extraction is slow, the representational power of such features is limited, and expert domain knowledge is needed to guide the extraction process, consuming considerable manpower and material resources. The second type has good feature representation and learning ability, but lower-level features in the network are not fully exploited, so classification is affected by redundant and noisy regions in the scene as well as by the large scale differences of objects in remote sensing images, leaving these classification methods with poor robustness.
Disclosure of Invention
The invention solves the problem of how to improve the classification performance for remote sensing images.
To solve the above problem, the present invention provides a self-adaptive network remote sensing image classification method, in which a self-adaptive discriminant region learning network comprises a feature extraction network, a region generator, a discriminator and a scorer. The method comprises:
acquiring an image to be detected; inputting the image to be detected into the trained region generator, which extracts at least one image sub-region of the image to be detected as a target region, extracts the information degree of each target region, and screens the target regions according to the information degree to obtain at least one discriminant region; extracting the regional features of the discriminant regions and the global features of the image to be detected with the feature extraction network, performing self-adaptive weighted convolution transformation on each regional feature and the global feature to obtain the corresponding first transformation features, and combining all the first transformation features to obtain a second transformation feature; and inputting the second transformation feature into the trained scorer to obtain a classification result.
Compared with the prior art, extracting and screening the target regions of the image to be detected to obtain discriminant regions reduces the limitation that redundant and noisy regions in remote sensing scene images impose on network classification performance; extracting the regional features and performing the self-adaptive weighted convolution transformation allows features focused on different regions to be connected and classified, yielding the classification result and effectively locating the discriminant regions in the image to improve network classification performance.
Optionally, before the image to be detected is acquired, the method further comprises a network training step, including:
acquiring a training image; processing the training image through the region generator: selecting at least one image sub-block of the training image as a training target region, extracting the information degree of each training target region, screening the training target regions according to the information degree to obtain at least one training preferred region, and extracting the training regional features of the training preferred regions and the training global features of the training image using the feature extraction network; performing self-adaptive weighted convolution transformation and combination on each training regional feature and the training global feature to obtain a fused feature; calculating the confidence of the training preferred regions through the discriminator, sorting the information degree and the confidence from high to low, and screening the training preferred regions whose information degree meets a preset condition as training discriminant regions; scoring the fused feature through the scorer; and calculating the network loss, which comprises a region generation loss constructed based on the information degree, a discriminant loss constructed based on the confidence, and a score loss constructed based on the classification result, and back-propagating the network loss to optimize the network.
Thus, during training the network is adjusted backward according to the loss, which gives it higher robustness; the three modules of the network are optimized simultaneously from the network loss, which improves training efficiency; and the network is fine-tuned according to the information degree, the confidence and the classification result, which further improves classification accuracy.
Optionally, calculating the confidence of the training preferred regions through the discriminator, sorting the information degree and the confidence from high to low, and screening the training preferred regions whose information degree meets a preset condition as training discriminant regions includes:
optimizing the constraints for constructing the training preferred regions using a pairwise ranking loss function, whose construction comprises: sorting the training preferred regions according to their information degree and numbering them; establishing a non-increasing function with the number as independent variable and the information degree as dependent variable as a first loss function; judging whether a second loss function, with the number as independent variable and the confidence as dependent variable, is monotonically consistent with the first loss function; and if not, acquiring the training preferred regions from the training target regions again.
Thus, a wrong target-region constraint can be discarded quickly: when the monotonicity of the first and second loss functions is inconsistent, the process returns directly to the step of selecting discriminant regions from the target regions again, which improves training efficiency.
Optionally, calculating the network loss, which comprises the region generation loss, the discriminant loss and the score loss, and optimizing the network by back-propagating the network loss includes:
weighting the region generation loss, the discriminant loss and the score loss to obtain the network loss; and back-propagating the network loss to optimize the feature extraction network, the region generator, the discriminator and the scorer.
Thus, the network and its algorithms are optimized by back-propagating the network loss, and the trained network has higher accuracy and robustness.
Optionally, calculating the network loss and optimizing the network by back-propagation further includes:
constructing the region generation loss through a hinge loss function, and constructing the discriminant loss and the score loss through a cross-entropy loss function.
Thus, constructing the region generation loss through the hinge loss function prevents wrong target regions from influencing the loss value: only the loss of correct target regions is considered, which reduces interference and improves training efficiency. Constructing the discriminant loss and the score loss through the cross-entropy loss function makes the optimization converge quickly, which also improves training efficiency.
Optionally, extracting the information degree of each target region and screening the target regions according to the information degree to obtain at least one discriminant region includes:
extracting the information degree of each target region; screening the target regions using non-maximum suppression based on the information degree to obtain preferred regions; and sorting the preferred regions from large to small according to the information degree and selecting a preset number of preferred regions as the discriminant regions.
Thus, for each target element, the most accurate target region can be obtained as the preferred region.
Optionally, extracting the regional features of the discriminant regions and the global features of the image to be detected using the feature extraction network, performing self-adaptive weighted convolution transformation on each regional feature and the global feature to obtain the corresponding first transformation features, and combining all the first transformation features to obtain the second transformation feature includes:
extracting the regional features of the discriminant regions and the global features of the image to be detected; performing self-adaptive weighted convolution transformation on the regional features and the global feature to obtain the first transformation features, namely performing a convolution operation on each feature with its own weight and adding a corresponding bias term to each convolution result; and performing vector connection on the first transformation features to obtain the second transformation feature.
Thus, the self-adaptive weighted convolution transformation of the features yields accurate transformation features.
Optionally, the image sub-regions come in at least three sizes, namely one-twelfth, one-sixth and one-third of the short side of the image to be detected, with aspect ratios of 1:1, 3:2 and 2:3 respectively.
Thus, the most appropriate image sub-region can be selected as the target region for the target element to be framed, which reduces the amount of computation and improves classification efficiency.
In another aspect, the present invention further provides computer equipment comprising a computer-readable storage medium storing a computer program and a processor; when the computer program is read and executed by the processor, the adaptive network remote sensing image classification method described above is implemented.
Compared with the prior art, the computer equipment has the same advantages as the self-adaptive network remote sensing image classification method, which are not repeated here.
The invention also provides a computer storage medium storing a computer program; when the computer program is read and executed by a processor, the adaptive network remote sensing image classification method described above is implemented.
Compared with the prior art, the computer storage medium has the same advantages as the self-adaptive network remote sensing image classification method, which are not repeated here.
Drawings
FIG. 1 is a schematic flow chart of a method for classifying an adaptive network remote sensing image according to an embodiment of the present invention;
FIG. 2 is another schematic flow chart of the adaptive network remote sensing image classification method according to the embodiment of the present invention;
FIG. 3 is a refined flow chart of step S300 of the adaptive network remote sensing image classification method according to the embodiment of the present invention;
FIG. 4 is a refined flow chart of step S400 of the adaptive network remote sensing image classification method according to the embodiment of the present invention;
FIG. 5 is a schematic diagram of a classification method for an adaptive network remote sensing image according to an embodiment of the present invention;
FIG. 6 is a diagram of the classification OA results on the AID data set for the embodiment of the present invention and other algorithms;
FIG. 7 is a diagram of the results on the UC Merced data set for the embodiment of the present invention and other algorithms;
FIG. 8 is a diagram of the results on the NWPU data set for the embodiment of the present invention and other algorithms;
FIG. 9 is a diagram of the results on the WHU-RS19 data set for the embodiment of the present invention and other algorithms.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
The invention provides a remote sensing image classification method which, referring to fig. 1, comprises the following steps:
and step S200, acquiring an image to be detected.
In an embodiment, the image to be detected comprises a remote sensing scene image. Because a remote sensing scene image usually contains both background elements and the target elements to be classified, the remote sensing image is first processed so that at least one image sub-region is selected as a target region.
Step S300, inputting the image to be detected into the trained region generator, extracting at least one image sub-region in the image to be detected as a target region by the region generator, extracting the information degree of each target region, and screening the target region according to the information degree to obtain at least one discriminant region.
In an embodiment, the region generator comprises a convolutional neural network.
In one embodiment, if 3 target elements are identified in the image to be detected, at least three image sub-blocks are selected as target regions.
Optionally, as shown in fig. 5, the image sub-regions come in at least three sizes, namely one-twelfth, one-sixth and one-third of the short side of the image to be detected, with aspect ratios of 1:1, 3:2 and 2:3 respectively.
In an embodiment, as shown in fig. 5, the framed areas in the image are image sub-regions; the sub-regions come in three sizes, chosen according to the size of the target in the image to be detected. First, the length and width of the image to be detected are obtained; the short side of the image serves as the size reference for the sub-regions, and each sub-region size corresponds one-to-one with an aspect ratio. After the short side is obtained, a target element in the image is identified, its pixel dimensions are judged, and it is framed with the best-fitting frame size. For example, if the length and width of the image are 1000 and 600 pixels respectively, the 600-pixel side is taken as the size reference. Suppose the identified target element measures 150 by 200 pixels: the one-twelfth (50-pixel) and one-sixth (100-pixel) sizes cannot completely frame the target, so the largest size, one-third of the short side (200 pixels), is used; and since the corresponding aspect ratio is specified as 2:3, a 200 by 300 pixel rectangle frames the target element, as sketched below.
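The sizing rule above can be made concrete with a short sketch. The following Python snippet is a minimal illustration, not code from the patent: the nine candidate box shapes are derived from the short side of the image, and the selection rule (smallest covering shape whose aspect ratio is closest to the target's) is our reading of the example rather than an explicit formula from the text.

```python
from itertools import product

def candidate_shapes(short_side):
    """Nine (height, width) box shapes: three scales x three aspect ratios."""
    scales = [short_side / 12, short_side / 6, short_side / 3]
    ratios = [(1, 1), (3, 2), (2, 3)]           # height : width
    shapes = []
    for s, (rh, rw) in product(scales, ratios):
        unit = s / min(rh, rw)                   # shorter edge of the box equals s
        shapes.append((round(unit * rh), round(unit * rw)))
    return shapes

def best_frame(target_h, target_w, short_side):
    """Smallest candidate that fully covers the target, preferring the
    aspect ratio closest to the target's own."""
    fits = [(h, w) for h, w in candidate_shapes(short_side)
            if h >= target_h and w >= target_w]
    if not fits:
        return None
    t = target_h / target_w
    return min(fits, key=lambda hw: (abs(hw[0] / hw[1] - t), hw[0] * hw[1]))

# Example from the text: a 1000 x 600 image with a 150 x 200 target element.
print(best_frame(150, 200, short_side=600))      # -> (200, 300)
```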
The preliminarily framed target regions may overlap; in that case the same target element may be framed multiple times with differing accuracy, so the higher-quality target regions need to be selected, and target regions that frame more or less than the complete target element need to be filtered out.
For example, in an embodiment, four target regions frame the same target element. One of them uses a smaller size and does not frame the element completely, so its information degree is low and it is rejected; in two others the target element does not appear in the center of the region, so their information degree is also low; the last target region is therefore selected as the discriminant region.
Optionally, if there are n target elements, at least n target regions are screened out as discriminant regions.
In an embodiment, the image to be detected has 3 target elements and 7 target regions are framed, so at least 3 target regions should be selected as discriminant regions. Specifically, when more than 3 of the 7 target regions satisfy the requirement, for example 5 of them, all 5 are taken as discriminant regions.
Alternatively, referring to fig. 3, step S300 includes:
step S301, extracting the information degree of each target region.
In an embodiment, the information degree of a target region is calculated by a convolutional neural network. The information degree indicates the amount of information the target region contains, that is, how completely it covers the target element. After the target regions are selected, their information degree is calculated, and whether the selected target regions are accurate can then be screened based on it.
Step S302, based on the information degree, screening the target regions using non-maximum suppression to obtain preferred regions.
In an embodiment, non-maximum suppression is used to screen the target regions. During target detection, a large number of candidate boxes are generated at the same target position, and these candidates may overlap one another; non-maximum suppression is then needed to find the optimal target region and eliminate redundant bounding boxes. Specifically: sort by information degree score and select the bounding box with the highest information degree, i.e. the most informative target region, adding it to the output list; calculate the areas of all target regions; calculate the IoU (intersection over union) between the most informative target region and each other candidate; delete the target regions whose IoU exceeds the threshold; and repeat this process until only one target region remains per target element. The final remaining target region is the preferred region.
Optionally, the non-maximum suppression threshold is set to 0.3.
The intersection over union is the ratio of the intersection to the union of the areas of two rectangular boxes.
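As a concrete illustration of this screening step, the following Python sketch implements the standard non-maximum suppression procedure described above, with the 0.3 threshold; it is a minimal version for axis-aligned boxes, and the function and variable names are ours, not the patent's.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, info_degree, iou_threshold=0.3):
    """Keep the most informative box per target, dropping overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: info_degree[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                      # highest remaining information degree
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep                                  # indices of the preferred regions

boxes = [(0, 0, 200, 300), (10, 20, 210, 320), (400, 400, 500, 500)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))        # -> [0, 2]
```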
Step S303, sorting the preferred regions from large to small according to the information degree, and selecting a preset number of the preferred regions as the discriminant regions.
Optionally, the preset number is 4 or 6.
In an embodiment, if the preset number is 4, the plurality of preferred regions screened in step S302 are sorted from large to small according to the information degree, and the 4 preferred regions with the highest information degree are selected as discriminant regions for subsequent image classification.
Step S400, extracting the regional features of the discriminant regions and the global features of the image to be detected using the feature extraction network, performing self-adaptive weighted convolution transformation on each regional feature and the global feature to obtain the corresponding first transformation features, and combining all the first transformation features to obtain the second transformation feature.
The regional features of the discriminant regions are extracted with the feature extraction network, giving each discriminant region its own feature vector. Because each region frames a different target element and the target elements are of different types, the regional features extracted from the discriminant regions have different characteristics and need to be adaptively transformed; after the adaptive transformation, all features are connected to obtain the second transformation feature.
In an embodiment, in addition to connecting all the first transformation features, the features of the original image, that is, of the image to be detected, are also connected. Since the regional features are extracted from the discriminant regions, the weighted first transformation features are only the adaptive features of the target elements and contain no global information. Remote sensing image classification classifies the whole image, and the features of the original image contain the global information, so besides all the first transformation features, the features of the original image must also be connected to obtain the second transformation feature.
Alternatively, as shown in fig. 4, step S400 includes:
step S401, extracting the regional characteristics of the discriminant region and the global characteristics of the image to be detected.
Specifically, in step S400, the regional features of the discriminant region and the global features of the image to be detected are extracted first.
Step S402, performing self-adaptive weighted convolution transformation on the regional features and the global feature to obtain the first transformation features: a convolution operation is performed on each feature with its own weight, and a corresponding bias term is added to each convolution result.
Different weights are added to different regional features, and the weighted features are convolved. The transformation of a feature F_i can be calculated as

\tilde{F}_i = W_i * F_i + b_i

where * denotes the convolution operation, \tilde{F}_i represents the transformed feature, W_i represents the weight corresponding to F_i, and b_i represents the bias term corresponding to F_i. This ensures that features focused on different elements can be adaptively connected, in preparation for the subsequent fully connected classification layer.
Step S403, performing vector connection on the first transformation features to obtain the second transformation feature.
All the transformed feature vectors are connected to obtain the second transformation feature, which can be expressed as

F = W_c \odot \mathrm{CatF} + b_c

where \odot represents the element-wise product, W_c represents the weight for the feature connection, b_c represents the bias term for the feature connection, and CatF is the connection of the first transformation features extracted and transformed from the different discriminant regions, calculated as

\mathrm{CatF} = \mathrm{Cat}(\tilde{F}_1, \tilde{F}_2, \ldots, \tilde{F}_N)

where Cat is the connection operation and F is the second transformation feature.
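To make the two transformations above concrete, here is a minimal PyTorch sketch, our own illustration rather than the patent's implementation; the 1 x 1 convolution kernels, the global average pooling used to flatten each map, and all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveWeightedTransform(nn.Module):
    """Per-feature weighted convolution, then weighted concatenation."""
    def __init__(self, num_regions, channels):
        super().__init__()
        # One convolution (weight W_i) and bias b_i per regional feature,
        # plus one for the global feature: ~F_i = W_i * F_i + b_i.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1)
            for _ in range(num_regions + 1))
        cat_dim = (num_regions + 1) * channels
        self.w_cat = nn.Parameter(torch.ones(cat_dim))   # element-wise weight W_c
        self.b_cat = nn.Parameter(torch.zeros(cat_dim))  # bias term b_c

    def forward(self, region_feats, global_feat):
        # region_feats: list of (B, C, H, W) maps; global_feat: (B, C, H, W).
        feats = region_feats + [global_feat]
        first = [conv(f) for conv, f in zip(self.branches, feats)]  # first transformation features
        pooled = [f.mean(dim=(2, 3)) for f in first]     # flatten each map to a vector
        cat_f = torch.cat(pooled, dim=1)                 # CatF = Cat(~F_1, ..., ~F_N, global)
        return self.w_cat * cat_f + self.b_cat           # F = W_c (element-wise) CatF + b_c

transform = AdaptiveWeightedTransform(num_regions=4, channels=512)
regions = [torch.randn(2, 512, 7, 7) for _ in range(4)]
print(transform(regions, torch.randn(2, 512, 7, 7)).shape)  # torch.Size([2, 2560])
```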
Step S500, inputting the second transformation feature into the trained scorer to obtain a classification result.
The second transformation feature acquired in step S400 is input into the fully connected layer for the classification operation to obtain the classification result.
In one embodiment, with categories such as mine land, cultivated land, forest land, water area, road, residential land and unused land as the preset classification results, the classification result includes a probability for each category, and the category with the highest probability is selected as the classification result. For example, if the probability of mine land is 0.9, cultivated land 0.6, forest land 0.6, water area 0.8, road 0.1, residential land 0.2 and unused land 0.2, the classification result of the image to be detected is mine land.
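Selecting the final label from the per-class probabilities is a one-line argmax; the sketch below simply encodes the example above (category names and values come from that example only):

```python
probs = {"mine land": 0.9, "cultivated land": 0.6, "forest land": 0.6,
         "water area": 0.8, "road": 0.1, "residential land": 0.2,
         "unused land": 0.2}
print(max(probs, key=probs.get))  # -> mine land
```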
Optionally, as shown in fig. 2, before step S200, the method further includes:
step S100, acquiring a training image.
Step S110, processing the training image through the region generator: selecting at least one image sub-block of the training image as a training target region, extracting the information degree of each training target region, screening the training target regions according to the information degree to obtain at least one training preferred region, and extracting the training regional features of the training preferred regions and the training global features of the training image using the feature extraction network.
A training image is acquired, and at least one image sub-block is selected as a training target region.
In one embodiment, at least as many training target regions as target elements are obtained; if 3 target elements are detected in the image, at least 3 training target regions are acquired.
In another embodiment, according to the size of the target element, the most suitable image sub-block size is selected from preset sizes for the training target region. Specifically, the preset image sub-block sizes are one-twelfth, one-sixth and one-third of the width of the training image, with aspect ratios of 1:1, 3:2 and 2:3 respectively.
The information degree of the training target regions is extracted, the qualifying training target regions are screened using non-maximum suppression, and the training target regions meeting the information degree requirement are screened as training discriminant regions.
In an embodiment, the feature extraction network is used to extract the regional features of the training discriminant regions as well as the global features of the training image; each training discriminant region obtains its own regional features independently. The feature extraction network is trained before step S100, specifically by optimizing the network with stochastic gradient descent with momentum; the initial learning rate is set to 0.01 and is multiplied by 0.1 every 10 epochs.
Optionally, the feature extraction network is one of AlexNet, VGG16 or ResNet50.
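For reference, the training configuration described above translates into the following PyTorch setup; this is a sketch under the stated settings, and the momentum value of 0.9 is our assumption, since the text only specifies stochastic gradient descent with momentum.

```python
import torch
import torchvision

# Backbone: one of AlexNet, VGG16 or ResNet50 (ResNet50 shown here).
backbone = torchvision.models.resnet50(weights=None)

# SGD with momentum, initial learning rate 0.01, multiplied by 0.1 every 10 epochs.
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    # ... one pass over the training set, with optimizer.step() per batch ...
    scheduler.step()
```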
Step S120, performing self-adaptive weighted convolution transformation and combination on each training regional feature and the training global feature to obtain a fused feature.
In one embodiment, different weights are added to the different regional features and the global features and a convolution operation is performed; corresponding bias terms are added after the convolution, and the transformed features are then vector-connected to obtain the fused feature.
Step S130, calculating the confidence of the training preferred regions through the discriminator, sorting the information degree and the confidence from high to low, and screening the training preferred regions whose information degree meets a preset condition as training discriminant regions.
Step S140, scoring the fused features through the scorer.
The fused features are input into a fully connected layer for scoring to obtain a prediction confidence; the prediction confidence is compared with the real classification result to calculate the discriminant loss, and the network is finely adjusted backward based on this loss.
Step S150, calculating the network loss, which comprises the region generation loss, the discriminant loss and the score loss, and optimizing the network by back-propagating the network loss, wherein the region generation loss is constructed based on the information degree, the discriminant loss is constructed based on the confidence, and the score loss is constructed based on the classification result.
The classification result is obtained through the fully connected layer and comprises the prediction probabilities of the different preset results.
The learning network is optimized by back-propagating the network loss. The self-adaptive discriminant region learning network comprises all the algorithms and modules of steps S110 to S140; the errors of the results of steps S110 to S140 are calculated through the network loss and back-propagated to the feature extraction network, the region generator, the discriminator and the scorer, so that more accurate algorithms are obtained by optimizing against these errors.
Optionally, step S150 includes:
and weighting the area generation loss, the discriminant loss and the fractional loss to obtain the network loss.
And performing back propagation on the network loss, and optimizing the feature extraction network, the region generator, the discriminator and the scorer.
Weighting the region generation loss, the discriminant loss and the score loss yields the network loss, which is back-propagated as a whole to optimize the feature extraction network, the region generator, the discriminator and the scorer. The weighted network loss can be expressed as

L = L_G + \lambda_1 L_D + \lambda_2 L_S

where L_G denotes the region generation loss, L_D denotes the discriminant loss, L_S denotes the score loss, and \lambda_1 and \lambda_2 are balance parameters.

Preferably, \lambda_1 and \lambda_2 are both 0.5.
Optionally, the region generation loss is constructed through a hinge loss function, and the discriminant loss and the score loss are constructed through a cross-entropy loss function.
The target-region acquisition algorithm is optimized by minimizing the cross entropy between the real class and the prediction confidence.
The hinge loss ensures that a wrongly constructed training target region is far enough from the correct training target region: if the gap reaches a preset threshold, the error of the wrongly constructed training target region can be taken as 0; otherwise the error is accumulated. Constructing the region generation loss using the hinge loss function therefore reduces the error penalty incurred by wrong training target regions and considers only the error incurred by correct ones.
The cross-entropy loss function measures the difference between the true probability distribution and the predicted probability distribution; the smaller the cross entropy, the better the model's prediction. The discriminant loss can be expressed as

L_D = -\sum_{i=1}^{N} \log M(R_i)

where M is the confidence function that maps each region R_i to the probability of the true class of the original image X, and N is the number of training discriminant regions. The score loss divides the remote sensing scene into specific categories using the original image and the features extracted from the different discriminant regions, and can be expressed as

L_S = -\log M(F)

where F is the classification result.

The final network loss function is

L = L_G + \lambda_1 L_D + \lambda_2 L_S

and the network is reversely optimized through L.
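A compact sketch of how the three terms could be combined and back-propagated follows; it is our illustration, with the tensor shapes, the margin value of 1 and the pairwise hinge form of the region generation loss as assumptions consistent with the formulas above.

```python
import torch
import torch.nn.functional as F

def ranking_loss(info_degree, confidence, margin=1.0):
    """Hinge-style pairwise term: for every pair with C_i < C_j, the
    information degree I_j should exceed I_i by at least the margin."""
    diff_c = confidence.unsqueeze(0) - confidence.unsqueeze(1)    # [i, j] = C_j - C_i
    diff_i = info_degree.unsqueeze(0) - info_degree.unsqueeze(1)  # [i, j] = I_j - I_i
    mask = (diff_c > 0).float()                                   # pairs with C_i < C_j
    return (mask * F.relu(margin - diff_i)).sum() / mask.sum().clamp(min=1)

def network_loss(region_logits, score_logits, label,
                 info_degree, confidence, lam1=0.5, lam2=0.5):
    """L = L_G + lam1 * L_D + lam2 * L_S, with both balance parameters 0.5."""
    l_g = ranking_loss(info_degree, confidence)                   # region generation loss
    n = region_logits.size(0)
    l_d = F.cross_entropy(region_logits, label.expand(n))         # each region vs. true class
    l_s = F.cross_entropy(score_logits, label)                    # fused-feature prediction
    return l_g + lam1 * l_d + lam2 * l_s

# Toy example: 4 training discriminant regions, 7 scene classes, one image.
loss = network_loss(torch.randn(4, 7), torch.randn(1, 7), torch.tensor([2]),
                    torch.randn(4), torch.randn(4))
print(float(loss))  # in training: loss.backward(); optimizer.step()
```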
Optionally, calculating the confidence of the training preferred regions through the discriminator, sorting the information degree and the confidence from high to low, and screening the training preferred regions whose information degree meets a preset condition as training discriminant regions includes:
optimizing the constraints for constructing the training preferred regions using a pairwise ranking loss function, whose construction comprises: sorting the training preferred regions according to their information degree and numbering them; and establishing a non-increasing function, with the number as independent variable and the information degree as dependent variable, as a first loss function.
Denote the confidence by C, the information degree by I, and the regions by R_1, R_2, \ldots, R_k, where the k target regions with the highest information degree are selected through non-maximum suppression. The pairwise ranking loss function optimizes the constraint that the sequences I(R_1), \ldots, I(R_k) and C(R_1), \ldots, C(R_k) have the same order.
Optionally, it is judged whether a second loss function, with the number as independent variable and the confidence as dependent variable, is monotonically consistent with the first loss function;
if not, the process returns to the step of extracting the information degree of the training target regions, sorting the information degree from high to low and screening the training target regions whose information degree meets the preset condition as training discriminant regions, so as to obtain the training discriminant regions again.
Judging whether the monotonicity is consistent means judging whether

I(R_1) \ge I(R_2) \ge \ldots \ge I(R_k) \;\Rightarrow\; C(R_1) \ge C(R_2) \ge \ldots \ge C(R_k)

holds. The information degree and confidence loss is then defined as the hinge-style pairwise term

L_{I,C} = \sum_{(i,j):\, C(R_i) < C(R_j)} \max\{0,\; 1 - (I(R_j) - I(R_i))\}

If the monotonicity is not consistent, the process returns to step S120, and the training discriminant regions are selected from the training target regions again.
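The monotonicity check itself is simple to state in code; the sketch below (our illustration, with names assumed) sorts the regions by information degree and verifies that the confidences are non-increasing along that order:

```python
def same_order(info_degree, confidence):
    """True if sorting by information degree also sorts the confidence,
    i.e. I(R_1) >= ... >= I(R_k) implies C(R_1) >= ... >= C(R_k)."""
    order = sorted(range(len(info_degree)),
                   key=lambda i: info_degree[i], reverse=True)
    c = [confidence[i] for i in order]
    return all(c[i] >= c[i + 1] for i in range(len(c) - 1))

# If this returns False, training returns to step S120 and the
# discriminant regions are re-selected from the target regions.
print(same_order([0.9, 0.7, 0.5], [0.8, 0.6, 0.4]))  # -> True
print(same_order([0.9, 0.7, 0.5], [0.3, 0.6, 0.4]))  # -> False
```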
In one embodiment, in order to verify the effectiveness of the ADRL-Net method based on the adaptive discriminant region learning network, its performance is tested on 4 remote sensing scene image data sets (AID, UC Merced, NWPU and WHU-RS19), and it is compared with 10 currently popular remote sensing scene classification methods.
The information for the 4 remote sensing scene image datasets is as follows:
(1) Aerial Image Dataset (AID): contains 10000 images of 30 different scene categories; each image is an RGB image of size 600 × 600, with spatial resolution varying from 0.5 m to 8 m.
(2) UC Merced Land Use dataset: contains 2100 images of 21 different scene categories; each image is an RGB image of size 256 × 256.
(3) NWPU-RESISC45 dataset: contains 31500 images of 45 different scene categories, 700 per category; each image is an RGB image of size 256 × 256, with spatial resolution varying from 0.2 m to 30 m.
(4) WHU-RS19 dataset: contains 950 images of 19 different scene categories; each image is an RGB image of size 600 × 600.
The information of 10 remote sensing scene classification methods is as follows:
(1) Fine-tuned AlexNet and VGG16: these methods replace the fully connected layer of the convolutional neural network with a randomly initialized fully connected layer whose dimensionality is the number of remote sensing scene categories.
(2) VGG-M: the method extracts features with VGGNet, obtains the final features with two fully connected layers, and then obtains the classification result with a linear support vector machine.
(3) BoVW: the method generates visual words from features extracted from existing convolutional neural networks.
(4) DFF: the method is a remote sensing scene classification method based on a depth feature fusion network.
(5) MSCP: the method combines a multi-layer stacked covariance pool with a pre-trained convolutional neural network.
(6) MCNN: the method solves the problem of large scale in the remote sensing scene image by using the multi-scale convolution neural network.
(7) DCNN: the method combines metric learning and a convolutional neural network to enhance discrimination capability.
(8) ARCnet: the method is an end-to-end attention cycle convolution neural network for remote sensing scene classification based on a human visual system.
(9) SCCov: the method embeds skip connections and covariance pooling in an MSCP network.
(10) GBNet: the method integrates multi-feature aggregation into, and weakens interference information in, an end-to-end remote sensing scene classification convolutional neural network.
In this experiment, we randomly generated training and test sets; to reduce the effect of randomness on the results, we repeated the training and testing experiment 5 times and report the mean and variance of the overall accuracy (OA) results.
Analysis of results on AID dataset:
for the AID dataset we used two training test data partitioning approaches. For the first, we randomly chose 20% of the samples for training, and for the second, we chose 50% of the samples for training. Fig. 6 shows the OA results of the different algorithms. When the training ratio is 20%, the classification effect of ADRL-Net using ResNet50 and VGG16 as the backbone network is significantly better than that of other comparative algorithms. When the backbone networks are ResNet50 and VGG16, the classification OA values are 94.24% and 93.67%, respectively. When the training ratio is 50%, the classification effect of ADRL-Net using VGG16 as the backbone network is not the best, but is better than that of the DCNN method. In addition, schools, squares, villages, for example, are relatively difficult to identify scenes because many different or noisy object objects are contained in these scenes. In addition, the ADRL-Net can achieve 100% classification accuracy in the scenes such as airports, sand beaches, forests, mountains, ports, viaducts and the like.
Analysis of results on UC Merced dataset:
in this experiment, we randomly selected 50% and 80% of the samples in each category as training sets and the rest as test sets. Fig. 7 shows the classification accuracy results of different algorithms on the UC merceded data set. From the results, it can be seen that when the training sample ratio is 50%, the classification OA value of ADRL-Net using ResNet50 as the backbone network is 98.72%, which is significantly better than other comparison algorithms. The classification OA value for ADRL-Net using VGG16 as the backbone network was 97.31%.
Analysis of results on NWPU dataset:
in this experiment, we randomly selected 10% and 20% of the samples in each category as training sets and the rest as test sets. Figure 8 shows the classification accuracy results of different algorithms on NWPU datasets. As can be seen from the results, the OA values of the ADRL-Net classification are all optimal.
Analysis of results on WHU-RS19 dataset:
for the WHU-RS19 data set, we randomly selected 40% and 60% of the samples in each category as training sets, and the rest as test sets. FIG. 9 shows the classification accuracy results of different algorithms on the WHU-RS19 data set. From the results, it can be seen that although the classification OA of other comparison algorithms can reach the accuracy of more than 95%, the ADRL-Net still has effective effect improvement.
Effectiveness analysis of ADRL-Net:
to visually verify the effectiveness of ADRL-Net, we present the partial region visualization results generated by the discriminative region generator of ADRL-Net in fig. 5. As can be seen from the results, ADRL-Net can effectively extract the regions that provide valid information for a particular scene analogy.
Network convergence:
we trained the network using 10 cycles. To verify the convergence of ADRL-Net, the loss and OA values of the network over different periods on the AID data set were recorded. ADRL-Net converges around 10 cycles, and OA levels plateau around 8 cycles.
A computer device according to another embodiment of the present invention includes a computer-readable storage medium storing a computer program, and a processor, where the computer program is read by the processor and executed to implement the adaptive network remote sensing image classification method as described above.
Compared with the prior art, the computer equipment has the same advantages as the self-adaptive network remote sensing image classification method, and the description is omitted.
A computer storage medium according to another embodiment of the present invention stores a computer program, which when read and executed by a processor, implements the adaptive network remote sensing image classification method as described above.
Compared with the prior art, the computer storage medium has the same advantages as the self-adaptive network remote sensing image classification method, and the description is omitted here.
Although the present disclosure has been described above, the scope of the present disclosure is not limited thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present disclosure, and these changes and modifications are intended to be within the scope of the present disclosure.

Claims (9)

1. A self-adaptive network remote sensing image classification method is characterized in that a self-adaptive discriminant area learning network comprises a feature extraction network, an area generator, a discriminator and a scorer, and the self-adaptive network remote sensing image classification method comprises the following steps:
acquiring a training image;
processing the training image through the region generator, selecting at least one image sub-block in the training image as a training target region, extracting the information degree of each training target region, screening the training target region according to the information degree to obtain at least one training preferred region, and extracting the training region features of the training preferred region and the training global features in the training image by using the feature extraction network;
carrying out self-adaptive weighted convolution transformation and combination on each training region characteristic and the training global characteristic to obtain a fusion characteristic;
calculating the confidence of the training preferred regions through the discriminator, sorting the information degree and the confidence from high to low, and screening the training preferred regions whose information degree meets a preset condition as training discriminant regions;
scoring, by the scorer, the fused features;
calculating a network loss, wherein the network loss comprises a region generation loss, a discriminant loss and a score loss, and the network is optimized based on back propagation of the network loss, wherein the region generation loss is constructed based on the information degree, the discriminant loss is constructed based on the confidence, and the score loss is constructed based on the classification result;
acquiring an image to be detected;
inputting the image to be detected into the trained region generator, extracting at least one image subregion in the image to be detected by the region generator to serve as a target region, extracting the information degree of each target region, and screening the target region according to the information degree to obtain at least one discriminant region;
extracting regional features of the discriminant region and global features of the image to be detected by using the feature extraction network, performing self-adaptive weighted convolution transformation on each regional feature and the global features to obtain first transformation features corresponding to the regional features and the global features, combining all the first transformation features to obtain second transformation features, wherein the self-adaptive weighted convolution transformation comprises using different weights to perform convolution operation on each feature respectively, and adding a corresponding bias term into each convolution result;
and inputting the second transformation characteristic into the trained scorer to obtain a classification result.
2. The adaptive network remote sensing image classification method according to claim 1, wherein the calculating of the confidence of the training preferred regions through the discriminator, the sorting of the information degree and the confidence from high to low, and the screening of the training preferred regions whose information degree meets a preset condition as training discriminant regions comprises:
optimizing constraints for constructing the training preferred regions using a pairwise ranking loss function, wherein the construction of the pairwise ranking loss function comprises: sorting the training preferred regions according to their information degree and numbering them; establishing a non-increasing function, with the number as independent variable and the information degree as dependent variable, as a first loss function;
judging whether a second loss function with the number as an independent variable and the confidence coefficient as a dependent variable is consistent with the monotonicity of the first loss function;
and if not, acquiring the training preferred area from the training target area again.
3. The adaptive network remote sensing image classification method according to claim 1, wherein the calculating of the network loss, the network loss comprising the region generation loss, the discriminant loss and the score loss, and the optimizing of the network by back-propagating the network loss comprises:
weighting the region generation loss, the discriminant loss and the score loss to obtain the network loss;
and back-propagating the network loss, and optimizing the feature extraction network, the region generator, the discriminator and the scorer.
4. The adaptive network remote sensing image classification method according to claim 3, wherein the calculating of the network loss and the optimizing of the network by back-propagating the network loss further comprises:
constructing the region generation loss through a hinge loss function, and constructing the discriminant loss and the score loss through a cross-entropy loss function.
5. The method for classifying the self-adaptive network remote sensing image according to claim 1, wherein the extracting the information degree of each target area and screening the target areas according to the information degree to obtain at least one discriminant area comprises:
extracting the information degree of each target area;
based on the information degree, using non-maximum value to inhibit and screen the target area to obtain a preferred area;
and sorting the preferred regions from large to small according to the information degree, and selecting a preset number of the preferred regions as the discriminant regions.
6. The method for classifying the self-adaptive network remote sensing image according to claim 5, wherein the step of extracting the regional features of the discriminant region and the global features of the image to be detected by using the feature extraction network, performing self-adaptive weighted convolution transformation on each regional feature and each global feature to obtain each regional feature and first transformation features corresponding to the global features, and combining all the first transformation features to obtain second transformation features comprises the steps of:
extracting the regional characteristics of the discriminant region and the global characteristics of the image to be detected;
carrying out self-adaptive weighted convolution transformation on the region characteristic and the global characteristic to obtain the first transformation characteristic;
and carrying out vector connection on the first transformation characteristics to obtain the second transformation characteristics.
7. The adaptive network remote sensing image classification method according to any one of claims 1 to 6, wherein the image sub-regions come in at least three sizes, namely one-twelfth, one-sixth and one-third of the short side of the image to be detected, with aspect ratios of 1:1, 3:2 and 2:3 respectively.
8. A computer device, comprising a computer-readable storage medium storing a computer program and a processor, wherein when the computer program is read and executed by the processor, the adaptive network remote sensing image classification method according to any one of claims 1 to 7 is implemented.
9. A computer storage medium, characterized in that the computer storage medium stores a computer program which, when read and executed by a processor, implements the adaptive network remote sensing image classification method according to any one of claims 1 to 7.
CN202110971318.4A 2021-08-24 2021-08-24 Self-adaptive network remote sensing image classification method, computer equipment and storage medium Active CN113420738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110971318.4A CN113420738B (en) 2021-08-24 2021-08-24 Self-adaptive network remote sensing image classification method, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110971318.4A CN113420738B (en) 2021-08-24 2021-08-24 Self-adaptive network remote sensing image classification method, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113420738A CN113420738A (en) 2021-09-21
CN113420738B true CN113420738B (en) 2021-11-09

Family

ID=77719442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110971318.4A Active CN113420738B (en) 2021-08-24 2021-08-24 Self-adaptive network remote sensing image classification method, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113420738B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173172B (en) * 2023-11-02 2024-01-26 深圳市富邦新材科技有限公司 Machine vision-based silica gel molding effect detection method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845499A (en) * 2017-01-19 2017-06-13 清华大学 A kind of image object detection method semantic based on natural language
CN107665336A (en) * 2017-09-20 2018-02-06 厦门理工学院 Multi-target detection method based on Faster RCNN in intelligent refrigerator
CN108875588A (en) * 2018-05-25 2018-11-23 武汉大学 Across camera pedestrian detection tracking based on deep learning
CN110135502A (en) * 2019-05-17 2019-08-16 东南大学 A kind of image fine granularity recognition methods based on intensified learning strategy
CN110335270A (en) * 2019-07-09 2019-10-15 华北电力大学(保定) Transmission line of electricity defect inspection method based on the study of hierarchical regions Fusion Features
CN110689091A (en) * 2019-10-18 2020-01-14 中国科学技术大学 Weak supervision fine-grained object classification method
CN111914599A (en) * 2019-05-09 2020-11-10 四川大学 Fine-grained bird recognition method based on semantic information multi-layer feature fusion

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2570970A1 (en) * 2011-09-16 2013-03-20 Technische Universität Berlin Method and system for the automatic analysis of an image of a biological sample
US10748281B2 (en) * 2018-07-21 2020-08-18 International Business Machines Corporation Negative sample enhanced object detection machine

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845499A (en) * 2017-01-19 2017-06-13 清华大学 A kind of image object detection method semantic based on natural language
CN107665336A (en) * 2017-09-20 2018-02-06 厦门理工学院 Multi-target detection method based on Faster RCNN in intelligent refrigerator
CN108875588A (en) * 2018-05-25 2018-11-23 武汉大学 Across camera pedestrian detection tracking based on deep learning
CN111914599A (en) * 2019-05-09 2020-11-10 四川大学 Fine-grained bird recognition method based on semantic information multi-layer feature fusion
CN110135502A (en) * 2019-05-17 2019-08-16 东南大学 A kind of image fine granularity recognition methods based on intensified learning strategy
CN110335270A (en) * 2019-07-09 2019-10-15 华北电力大学(保定) Transmission line of electricity defect inspection method based on the study of hierarchical regions Fusion Features
CN110689091A (en) * 2019-10-18 2020-01-14 中国科学技术大学 Weak supervision fine-grained object classification method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An Efficient Approach for Polyps Detection in Endoscopic Videos Based on Faster R-CNN; X. Mo et al.; 2018 24th International Conference on Pattern Recognition (ICPR); 2018-08-24; pp. 3929-3934 *
Faster-RCNN SAR image vehicle target detection method based on improved RPN; Cao Lei et al.; Journal of Southeast University (Natural Science Edition); 2021-01-20; Vol. 51, No. 1; pp. 87-90 *

Also Published As

Publication number Publication date
CN113420738A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN109614985B (en) Target detection method based on densely connected feature pyramid network
CN110334765B (en) Remote sensing image classification method based on attention mechanism multi-scale deep learning
US10984532B2 (en) Joint deep learning for land cover and land use classification
CN109670528B (en) Data expansion method facing pedestrian re-identification task and based on paired sample random occlusion strategy
US8233712B2 (en) Methods of segmenting a digital image
Qin et al. Saliency detection via cellular automata
Poggi et al. Supervised segmentation of remote sensing images based on a tree-structured MRF model
CN109360232B (en) Indoor scene layout estimation method and device based on condition generation countermeasure network
CN110309781B (en) House damage remote sensing identification method based on multi-scale spectrum texture self-adaptive fusion
EP1700269A2 (en) Detection of sky in digital color images
CN108805151B (en) Image classification method based on depth similarity network
CN113033520A (en) Tree nematode disease wood identification method and system based on deep learning
CN104680193B (en) Online objective classification method and system based on quick similitude network integration algorithm
CN110738132B (en) Target detection quality blind evaluation method with discriminant perception capability
CN110222767A (en) Three-dimensional point cloud classification method based on nested neural and grating map
CN113487600A (en) Characteristic enhancement scale self-adaptive sensing ship detection method
Li et al. Incorporating open source data for Bayesian classification of urban land use from VHR stereo images
CN113420738B (en) Self-adaptive network remote sensing image classification method, computer equipment and storage medium
CN110334628B (en) Outdoor monocular image depth estimation method based on structured random forest
CN114510594A (en) Traditional pattern subgraph retrieval method based on self-attention mechanism
CN107563296B (en) Method and system for extracting bedrock coast shoreline
Sjahputera et al. Clustering of detected changes in high-resolution satellite imagery using a stabilized competitive agglomeration algorithm
CN111626321A (en) Image data clustering method and device
Naeini et al. Improving the dynamic clustering of hyperspectral data based on the integration of swarm optimization and decision analysis
CN112364747B (en) Target detection method under limited sample

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant