CN113435383B - Remote sensing aircraft target classification method and device based on double triple pseudo-twin framework - Google Patents

Remote sensing aircraft target classification method and device based on double triple pseudo-twin framework

Info

Publication number
CN113435383B (application CN202110769103.4A)
Authority
CN
China
Prior art keywords
classification
sample
loss
mask
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110769103.4A
Other languages
Chinese (zh)
Other versions
CN113435383A (en)
Inventor
邹焕新
曹旭
李润林
应昕怡
贺诗甜
李美霖
成飞
魏娟
孙丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202110769103.4A priority Critical patent/CN113435383B/en
Publication of CN113435383A publication Critical patent/CN113435383A/en
Application granted granted Critical
Publication of CN113435383B publication Critical patent/CN113435383B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a remote sensing aircraft target classification method and device based on a double triple pseudo-twin architecture. The method comprises: obtaining a sample set of remote sensing aircraft targets, the sample set comprising anchor samples, positive samples and negative samples; constructing a remote sensing aircraft target classification network based on the double triple pseudo-twin architecture, the network comprising three classification subnets based on the pseudo-twin architecture, each subnet comprising two classification branches consisting of a feature extraction network, a full connection layer and a classification layer; constructing a loss function; training the network with the anchor, positive and negative samples according to the loss function to obtain a remote sensing aircraft target classification model; and inputting samples to be classified into the model to obtain the remote sensing aircraft target classification result. The method uses a contrast loss to improve the model's ability to distinguish non-homologous features, and a triplet loss to reduce the distance of same-class targets and increase the distance of different-class targets in the feature space, improving the classification accuracy.

Description

Remote sensing aircraft target classification method and device based on double triple pseudo-twin framework
Technical Field
The application relates to the technical field of image processing, in particular to a remote sensing aircraft target classification method and device based on a double triple pseudo-twin framework.
Background
With the rapid development of remote sensing technology, the resolution of remote sensing images has steadily improved, and the spatial and texture information they carry has become increasingly rich. Fine-grained recognition of remote sensing aircraft targets has been a popular research topic in the remote sensing field in recent years. The main process is to classify targets according to features extracted by a network. How to effectively extract target feature information and distinguish different classes of targets with small differences is the key to remote sensing aircraft target recognition. Because aircraft of different models are similar in size and shape, the feature differences between classes are small; meanwhile, aircraft of the same model are affected by the varying opening angles of variable-sweep wings and by shadows, so the feature differences within a class are large, which makes the targets difficult to distinguish.
Yann et al. proposed the twin architecture, in which two sub-networks with identical structure and shared parameters take two images as input, and a contrastive loss is measured over the similarity of their outputs (e.g., the Euclidean distance). The twin architecture can extract the difference between the two input images and determine whether they belong to the same class: if the two images belong to the same class, the distance learned by the architecture should be small, and vice versa.
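For illustration, a minimal PyTorch sketch of such a twin architecture follows (PyTorch and all names here are illustrative assumptions, not taken from the patent); the returned feature distance is what a contrastive loss supervises:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwinNetwork(nn.Module):
    """Classic twin architecture: the SAME subnetwork (shared weights)
    embeds both inputs; similarity is the distance between embeddings."""
    def __init__(self, embed: nn.Module):
        super().__init__()
        self.embed = embed  # single shared feature extractor

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        f1 = self.embed(x1).flatten(1)
        f2 = self.embed(x2).flatten(1)
        # Small distance -> likely same class; large -> different class.
        return F.pairwise_distance(f1, f2)
```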
Existing twin-architecture-based classification models have achieved fairly advanced performance, but their fine-grained recognition accuracy is still not high.
Disclosure of Invention
In view of the above technical problem, there is a need for a remote sensing aircraft target classification method and device based on a double triple pseudo-twin architecture.
A remote sensing aircraft target classification method based on a double-triple pseudo-twin architecture comprises the following steps:
obtaining a sample set of remote sensing aircraft targets, the sample set comprising: anchor samples, positive samples and negative samples; the anchor sample and the positive sample belong to the same category, and the anchor sample and the negative sample belong to different categories.
Constructing a remote sensing airplane target classification network based on a double triple pseudo-twin framework; the remote sensing aircraft target classification network comprises three classification subnetworks based on a pseudo-twin framework; the classification sub-network based on the pseudo-twin framework comprises two classification branches consisting of a feature extraction network, a full connection layer and a classification layer; the feature extraction networks in the two classification branches are two convolutional neural networks with unshared weights, and the feature extraction networks are used for extracting features of input samples.
Constructing a loss function; the loss function comprises classification loss, contrast loss and triple loss; the triplet loss is determined according to the distance between the features of the anchor sample and the features of the positive sample, the distance between the features of the anchor sample and the features of the negative sample, and a preset distance threshold.
Respectively inputting the anchor sample, the positive sample and the negative sample into the remote sensing aircraft target classification network, and training the remote sensing aircraft target classification network according to the loss function to obtain a trained remote sensing aircraft target classification model.
Acquiring a to-be-detected anchor sample, a to-be-detected positive sample and a to-be-detected negative sample of the to-be-detected remote sensing aircraft target, and inputting them into the remote sensing aircraft target classification model to obtain a remote sensing aircraft target classification result.
According to the above method and device for classifying remote sensing aircraft targets based on the double triple pseudo-twin architecture, a sample set of remote sensing aircraft targets is obtained, comprising anchor samples, positive samples and negative samples; a remote sensing aircraft target classification network based on the double triple pseudo-twin architecture is constructed, comprising three classification subnets based on the pseudo-twin architecture, each consisting of two classification branches composed of a feature extraction network, a full connection layer and a classification layer, where the feature extraction networks in the two branches are two convolutional neural networks with unshared weights; a loss function is constructed, the classification network is trained with the anchor, positive and negative samples according to the loss function to obtain a remote sensing aircraft target classification model, and samples to be classified are input into the model to obtain the remote sensing aircraft target classification result. The method uses the contrast loss to improve the model's ability to distinguish non-homologous features, and the triplet loss to reduce the distance of same-class targets and increase the distance of different-class targets in the feature space, improving the classification accuracy.
Drawings
FIG. 1 is a schematic flow chart of a remote sensing aircraft target classification method based on a double triple pseudo-twin architecture in one embodiment;
FIG. 2 is a schematic diagram of a remote sensing aircraft target classification network training process based on a double-triple pseudo-twin architecture in another embodiment;
FIG. 3 is a schematic diagram of a remote sensing aircraft target classification network testing process based on a double-triple pseudo-twin architecture in another embodiment;
fig. 4 is a structural block diagram of a remote sensing aircraft target classification device based on a double-triple pseudo-twin architecture in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a remote sensing aircraft target classification method based on a double triple pseudo-twin architecture is provided, and the method comprises the following steps:
step 100: and acquiring a sample set of the remote sensing airplane target.
The sample set includes: anchor samples, positive samples and negative samples; the anchor sample and the positive sample belong to the same category, and the anchor sample and the negative sample belong to different categories.
The sample set of the remote sensing airplane target is obtained by processing the image of the remote sensing airplane target.
The anchor sample, the positive sample and the negative sample are taken together as one triplet input. The anchor, positive and negative samples each include an image and a mask, so the triplet input includes an image triplet and a mask triplet.
Step 102: constructing a remote sensing aircraft target classification network based on the double triple pseudo-twin architecture.
The remote sensing airplane target classification network comprises three classification sub-networks based on a pseudo-twin architecture.
The classification sub-network based on the pseudo-twin framework comprises two classification branches consisting of a feature extraction network, a full connection layer and a classification layer.
The feature extraction networks in the two classification branches are two convolutional neural networks with unshared weights, and the feature extraction networks are used for extracting image features and mask features of input samples.
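As a concrete illustration of one such classification subnet, the following is a minimal PyTorch sketch (the framework, layer sizes and names are assumptions for illustration; the ResNet-50/ResNet-18 backbone choice follows the embodiments described below):

```python
import torch
import torch.nn as nn
from torchvision import models

class PseudoTwinSubnet(nn.Module):
    """One pseudo-twin classification subnet: two branches with
    unshared weights, each being backbone -> FC -> classification."""
    def __init__(self, num_classes: int):
        super().__init__()
        # Unshared feature extractors: CNN1 for images, CNN2 for masks
        # (single-channel masks assumed replicated to 3 channels).
        self.cnn1 = nn.Sequential(*list(models.resnet50(weights=None).children())[:-1])
        self.cnn2 = nn.Sequential(*list(models.resnet18(weights=None).children())[:-1])
        # Two parallel fully connected + classification layers.
        self.fc_img = nn.Linear(2048, num_classes)
        self.fc_msk = nn.Linear(512, num_classes)

    def forward(self, image: torch.Tensor, mask: torch.Tensor):
        f_img = self.cnn1(image).flatten(1)  # image features
        f_msk = self.cnn2(mask).flatten(1)   # mask features
        # Softmax is applied by the loss during training / at prediction.
        return f_img, f_msk, self.fc_img(f_img), self.fc_msk(f_msk)
```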
Step 104: a loss function is constructed.
The loss functions include classification loss, contrast loss, and triplet loss.
The triplet loss is determined according to the distance between the features of the anchor sample and the features of the positive sample, the distance between the features of the anchor sample and the features of the negative sample, and a preset distance threshold.
The classification loss trains the remote sensing aircraft classification network in the form of a cross-entropy loss.
The contrast loss is computed from the Euclidean distance between the image features and the mask features of the input sample extracted by the two feature extraction networks in the classification subnet.
Step 106: respectively inputting the anchor sample, the positive sample and the negative sample into the remote sensing aircraft target classification network, and training the remote sensing aircraft target classification network according to the loss function to obtain a trained remote sensing aircraft target classification model.
Triplet loss functions are used to supervise the triplet network training to reduce feature distances of the same class and increase feature distances of different classes.
And taking the anchor sample, the positive sample and the negative sample as a triple input, inputting the triple input into a remote sensing airplane target classification network, and training the remote sensing airplane target classification network according to a loss function to obtain a trained remote sensing airplane target classification model.
Step 108: acquiring a to-be-detected anchor sample, a to-be-detected positive sample and a to-be-detected negative sample of the to-be-detected remote sensing aircraft target, and inputting them into the remote sensing aircraft target classification model to obtain a remote sensing aircraft target classification result.
In the above method for classifying remote sensing aircraft targets based on the double triple pseudo-twin architecture, a sample set of remote sensing aircraft targets is obtained, comprising anchor samples, positive samples and negative samples; a remote sensing aircraft target classification network based on the double triple pseudo-twin architecture is constructed, comprising three classification subnets based on the pseudo-twin architecture, each consisting of two classification branches composed of a feature extraction network, a full connection layer and a classification layer, where the feature extraction networks in the two branches are two convolutional neural networks with unshared weights; a loss function is constructed, the classification network is trained with the anchor, positive and negative samples according to the loss function to obtain a remote sensing aircraft target classification model, and samples to be classified are input into the model to obtain the remote sensing aircraft target classification result. The method uses the contrast loss to improve the model's ability to distinguish non-homologous features, and the triplet loss to reduce the distance of same-class targets and increase the distance of different-class targets in the feature space, improving the classification accuracy.
In one embodiment, step 104 further comprises: carrying out a weighted summation of the classification loss, the contrast loss and the triplet loss to obtain a total loss. The total loss function is expressed as:

$$Loss = w_1 \cdot L_{cls} + w_2 \cdot L_{contrastive} + w_3 \cdot L_{triplet} \tag{1}$$

where $Loss$ denotes the total loss, $L_{cls}$ the classification loss, $L_{contrastive}$ the contrast loss, $L_{triplet}$ the triplet loss, and $w_1$, $w_2$, $w_3$ the loss weights, which preferably default to 1. The functional expression of the triplet loss is:

$$L_{triplet} = \sum_{i=1}^{N} \max\left( \left\| f(x_i^a) - f(x_i^p) \right\|_2^2 - \left\| f(x_i^a) - f(x_i^n) \right\|_2^2 + \alpha,\; 0 \right) \tag{2}$$

where $x_i^a$, $x_i^p$ and $x_i^n$ denote the $i$-th group of anchor, positive and negative samples respectively, $f(\cdot)$ denotes the feature map extracted by the feature extraction network, $\alpha$ denotes the distance threshold (preferably $\alpha = 1$), and $N$ denotes the number of groups of anchor, positive and negative samples.

The classification loss is the cross-entropy:

$$L_{cls} = -\sum_i \hat{y}_i \log y_i \tag{3}$$

where $y$ denotes the classification prediction and $\hat{y}$ denotes the true label.

The expression of the contrast loss is:

$$L_{contrastive} = \frac{1}{2}\left[\max\left(0,\; m - D_W\right)\right]^2 \tag{4}$$

where $D_W$ denotes the Euclidean distance between the two feature variables and $m$ denotes a distance threshold.
(1) Construction principle of the contrast loss function:

The contrast loss is the core of the underlying twin network. In essence, it measures the distance between two variables (the features extracted by the feature extraction networks), usually the Euclidean distance. The general contrast loss function is calculated as follows:

$$L_{contrastive}(W, Y, X_1, X_2) = (1 - Y)\, L_S\!\left(D_W\right) + Y\, L_D\!\left(D_W\right) \tag{5}$$

where $W$ denotes the weights of the network, $D_W = \left\| G_W(X_1) - G_W(X_2) \right\|_2$ denotes the Euclidean distance between the two feature variables $G_W(X_1)$ and $G_W(X_2)$, and $Y$ denotes the matching relationship between the first and second inputs: if the inputs are of the same kind, $Y = 0$ and the loss function simplifies to $L_S = \frac{1}{2} D_W^2$; if not, $Y = 1$ and the loss function simplifies to $L_D = \frac{1}{2}\left[\max(0,\; m - D_W)\right]^2$. The coefficients of $L_S$ and $L_D$ default to $\frac{1}{2}$, and the exponent 2 denotes the squared Euclidean distance. Here, however, the input samples are the target image and its mask, which belong to different domains, so $Y$ is set to 1 in order to further improve the feature extraction capability of the CNN.

Although the contrast loss encourages the distance between samples of different kinds to be as large as possible, the distance cannot be infinite, and pushing it without bound would harm training. A distance threshold $m$ is therefore added: once the computed distance between non-matching inputs exceeds $m$, the network is considered well trained and the contrast loss is set to 0. In the present invention, the threshold $m$ is 20. In summary, the expression of the contrast loss function in the present invention is as shown in formula (4).
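A minimal sketch of formula (4) as used here (Y = 1, threshold m = 20), assuming the two inputs are the image-branch and mask-branch feature vectors of the same samples:

```python
import torch
import torch.nn.functional as F

def contrast_loss(f_img: torch.Tensor, f_msk: torch.Tensor, m: float = 20.0) -> torch.Tensor:
    """Contrast loss of formula (4): image and mask come from different
    domains, so Y = 1 and only the non-matching term remains."""
    d_w = F.pairwise_distance(f_img, f_msk)  # Euclidean distance D_W
    # The loss vanishes once D_W exceeds m; averaging over the batch is
    # an implementation choice, not specified in the text.
    return (0.5 * torch.clamp(m - d_w, min=0.0) ** 2).mean()
```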
(2) Construction principle of the triplet loss function

The triplet loss requires three inputs: an anchor sample, a positive sample with the same class label as the anchor, and a negative sample with a different class label. The training goal is to minimize the distance between the anchor and the positive sample, so that features with the same label lie as close as possible in the feature space (reducing the intra-class distance), and to maximize the distance between the anchor and the negative sample, so that features with different labels lie as far apart as possible (increasing the inter-class distance), which makes classification easier. The loss function needs to satisfy the following relation:

$$\left\| f(x_i^a) - f(x_i^p) \right\|_2^2 + \alpha < \left\| f(x_i^a) - f(x_i^n) \right\|_2^2 \tag{6}$$

where $x_i^a$, $x_i^p$ and $x_i^n$ denote the anchor, positive and negative samples respectively, $f(\cdot)$ denotes the extracted last-level feature map, and $\alpha$ denotes the distance threshold, set here to 1. This relation states that even after adding the distance threshold, the feature distance between the anchor and the positive sample must remain smaller than that between the anchor and the negative sample; in other words, the network pulls the anchor and positive features together and pushes the anchor and negative features apart. The expression of the triplet loss function is as shown in formula (2).
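A corresponding sketch of formula (2), summing the hinged margin over the N triplets of a batch (illustrative only):

```python
import torch

def triplet_loss(f_a: torch.Tensor, f_p: torch.Tensor, f_n: torch.Tensor,
                 alpha: float = 1.0) -> torch.Tensor:
    """Triplet loss of formula (2) on (N, D) batches of features."""
    d_ap = (f_a - f_p).pow(2).sum(dim=1)  # squared anchor-positive distances
    d_an = (f_a - f_n).pow(2).sum(dim=1)  # squared anchor-negative distances
    return torch.clamp(d_ap - d_an + alpha, min=0.0).sum()
```

PyTorch's built-in nn.TripletMarginLoss implements a similar criterion, though it uses non-squared distances and mean reduction by default.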
In one embodiment, the anchor samples, the positive samples and the negative samples each include: an image sample and a mask sample; the full connection layer of each classification subnet comprises two parallel full connection layers; the classification layer of each classification subnet comprises two parallel classification layers. Step 106 further comprises: forming an image triplet from the image sample of the anchor sample, the image sample of the positive sample and the image sample of the negative sample; forming a mask triplet from the mask sample of the anchor sample, the mask sample of the positive sample and the mask sample of the negative sample; inputting the image triplet and the mask triplet into the feature extraction networks of the three classification subnets of the remote sensing aircraft target classification network to obtain triplet image features and triplet mask features, where the triplet image features include anchor sample image features, positive sample image features and negative sample image features, and the triplet mask features include anchor sample mask features, positive sample mask features and negative sample mask features; inputting the triplet image features and the triplet mask features into the full connection layers of the three classification subnets, and inputting the resulting outputs into the classification layers of the three classification subnets, to obtain an image triplet prediction classification and a mask triplet prediction classification; obtaining an image triplet classification loss from the image triplet prediction classification and the true classes of the corresponding input image triplet, and a mask triplet classification loss from the mask triplet prediction classification and the true classes of the corresponding input mask triplet; obtaining an anchor sample contrast loss from the anchor sample image features and the anchor sample mask features, a positive sample contrast loss from the positive sample image features and the positive sample mask features, and a negative sample contrast loss from the negative sample image features and the negative sample mask features; obtaining an image triplet loss from the anchor sample image features, the positive sample image features and the negative sample image features; obtaining a mask triplet loss from the anchor sample mask features, the positive sample mask features and the negative sample mask features; carrying out a weighted fusion of the image triplet classification loss, the mask triplet classification loss, the anchor sample contrast loss, the positive sample contrast loss, the negative sample contrast loss, the image triplet loss and the mask triplet loss to obtain a total loss; and training the remote sensing aircraft target classification network by back-propagation according to the total loss until the total loss meets a preset condition or the number of training rounds reaches a preset value, at which point training ends and the trained remote sensing aircraft target classification model is obtained.
In another embodiment, a process for training the remote sensing aircraft target classification network is shown in FIG. 2. The network inputs take the form of triplets, namely anchor, positive and negative samples: the image triplet consists of three target images and the mask triplet of three mask images. The anchor sample belongs to the same class as the positive sample and to a different class from the negative sample. Inputting samples in triplet form, together with the triplet loss, helps the network distinguish the small feature differences between classes, i.e. it reduces the intra-class distance and increases the inter-class distance. CNN1 and CNN2 are different feature extraction networks; the CNN1 instances share weights across the subnets, as do the CNN2 instances. ResNet is selected as the feature extraction network. Each last-level feature map passes through an FC layer and Softmax to obtain a classification prediction, and the classification loss is calculated between the prediction and the corresponding ground truth. Meanwhile, the contrast loss is calculated between the last-level image and mask feature maps of the same sample; it helps the network use the mask information to extract image features more specifically, i.e. the mask guides the feature extraction. Finally, the triplet losses are calculated among the last-level image feature maps and among the mask feature maps of the three different samples.
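Combining the sketches above, one training iteration corresponding to FIG. 2 might look as follows (all names are illustrative; a single subnet instance serves all three triplet members precisely because the CNN1 weights, and likewise the CNN2 weights, are shared across the three subnets):

```python
import torch.nn.functional as F

def train_step(net, batch, optimizer, w=(1.0, 1.0, 1.0)):
    """One optimization step on an (anchor, positive, negative) batch;
    each member carries an image, a mask and a class label."""
    (img_a, msk_a, y_a), (img_p, msk_p, y_p), (img_n, msk_n, y_n) = batch
    fia, fma, cia, cma = net(img_a, msk_a)   # anchor features and logits
    fip, fmp, cip, cmp_ = net(img_p, msk_p)  # positive
    fin, fmn, cin, cmn = net(img_n, msk_n)   # negative

    # Six classification losses: image and mask predictions of each sample.
    l_cls = sum(F.cross_entropy(c, y) for c, y in
                [(cia, y_a), (cip, y_p), (cin, y_n),
                 (cma, y_a), (cmp_, y_p), (cmn, y_n)])
    # Three contrast losses: image vs. mask features of the same sample.
    l_con = (contrast_loss(fia, fma) + contrast_loss(fip, fmp)
             + contrast_loss(fin, fmn))
    # Two triplet losses: one over image features, one over mask features.
    l_tri = triplet_loss(fia, fip, fin) + triplet_loss(fma, fmp, fmn)

    loss = w[0] * l_cls + w[1] * l_con + w[2] * l_tri  # formula (1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```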
In one embodiment, step 106 further comprises: inputting the image triplet into the feature extraction networks of the first classification branches of the three classification subnets of the remote sensing aircraft target classification network to obtain the triplet image features, which include anchor sample image features, positive sample image features and negative sample image features; and inputting the mask triplet into the feature extraction networks of the second classification branches of the three classification subnets to obtain the triplet mask features, which include anchor sample mask features, positive sample mask features and negative sample mask features.
In one embodiment, the image triplet prediction classification includes: the method comprises the steps of image anchor sample prediction classification, image positive sample prediction classification and image negative sample prediction classification; the mask triplet prediction classification includes: a mask anchor sample prediction classification, a mask positive sample prediction classification, and a mask negative sample prediction classification. Step 106 further comprises: inputting the image characteristics of the anchor sample into a full-connection layer of a first classification branch of a first classification subnet, inputting the obtained characteristics into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain the prediction classification of the image anchor sample; inputting the mask characteristics of the anchor samples into a full-connection layer of a second classification branch of the first classification subnet, inputting the obtained characteristics into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain the prediction classification of the anchor samples of the masks; inputting the image characteristics of the positive sample into a full-connection layer of a first classification branch of a second classification subnet, inputting the obtained characteristics into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain the prediction classification of the image positive sample; inputting the mask characteristics of the positive sample into a full-connection layer of a second classification branch of a second classification subnet, inputting the obtained characteristics into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain the prediction classification of the positive sample of the mask; inputting the image characteristics of the negative sample into a full-connection layer of a first classification branch of a third classification subnet, inputting the obtained characteristics into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain the prediction classification of the image negative sample; inputting the mask characteristics of the negative samples into the full-connection layer of the second classification branch of the third classification sub-network, inputting the obtained characteristics into the classification layer, and classifying by adopting a Softmax logistic regression model to obtain the prediction classification of the negative samples of the mask.
In one embodiment, the anchor sample contrast loss, the positive sample contrast loss and the negative sample contrast loss are calculated with the same contrast loss function; the image triplet classification loss and the mask triplet classification loss are calculated with the same classification loss function; and the image triplet loss and the mask triplet loss are calculated with the same triplet loss function.
In one embodiment, the classification loss of the image triplet includes an anchor sample image classification loss, a positive sample image classification loss and a negative sample image classification loss; the classification loss of the mask triplet includes an anchor sample mask classification loss, a positive sample mask classification loss and a negative sample mask classification loss. Step 106 further comprises: adding the anchor sample image classification loss, the positive sample image classification loss, the negative sample image classification loss, the anchor sample mask classification loss, the positive sample mask classification loss and the negative sample mask classification loss to obtain the classification loss; adding the anchor sample contrast loss, the positive sample contrast loss and the negative sample contrast loss to obtain the contrast loss; adding the image triplet loss and the mask triplet loss to obtain the triplet loss; and performing a weighted fusion of the classification loss, the contrast loss and the triplet loss to obtain the total loss.
In one embodiment, step 106 is followed by: acquiring a test sample, the test sample being a pair consisting of an image and a mask; inputting the image into the feature extraction network of the first classification branch of the first sub-network of the remote sensing aircraft target classification model to obtain test image features; inputting the test image features into the full connection layer of the first classification branch of the first sub-network, inputting the resulting output features into the classification layer, and classifying with a Softmax logistic regression model to obtain an image classification prediction; measuring the distance between the test image features and the predefined standard template features, inputting the resulting measurement into the classification layer, and classifying with a Softmax logistic regression model to obtain an image contrast prediction; inputting the mask into the second classification branch of the first sub-network of the remote sensing aircraft target classification model to obtain a mask classification prediction and a mask contrast prediction; fusing the image classification prediction, the mask classification prediction, the image contrast prediction and the mask contrast prediction with a discriminative fusion method to obtain the final prediction result of the first sub-network; and testing the second sub-network and the third sub-network with the same steps as the first sub-network to obtain their final prediction results.
In another embodiment, a test procedure for the remote sensing aircraft target classification model is shown in FIG. 3. The network takes as input a pair of samples consisting of an image and a mask. The image is input into the trained CNN1 to obtain the last-level feature map, and the extracted features are then sent to two branches: 1) the extracted features are sent to the FC layer and the Softmax layer to generate a classification prediction; 2) the distance between the extracted features and the predefined standard template features is measured, and the result is input into the Softmax layer to generate a contrast prediction.
The predefined standard template is constructed as follows: 5 samples of each category are randomly selected from the training dataset and input into CNN1 for feature extraction.

The Euclidean distance between the extracted features and the predefined standard template features is measured. Since each target class contains 5 standard samples, the 5 distances are averaged to obtain an average distance. The reciprocal of the average distance is then input into Softmax to generate the contrast prediction.
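An illustrative sketch of this contrast-prediction step (shapes and names are assumptions):

```python
import torch

def contrast_prediction(feat: torch.Tensor, templates: torch.Tensor) -> torch.Tensor:
    """Contrast prediction against the predefined standard templates.

    feat:      (D,)      feature of the test sample from CNN1.
    templates: (C, 5, D) features of the 5 template samples per class.
    Returns:   (C,)      probability vector over the C classes.
    """
    d = (templates - feat).pow(2).sum(dim=-1).sqrt()  # (C, 5) Euclidean distances
    avg_d = d.mean(dim=1)                             # average over the 5 templates
    return torch.softmax(1.0 / avg_d, dim=0)          # nearer class -> higher score
```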
The classification prediction and contrast prediction process for the mask is the same as that for the image. The network thus obtains 2 classification predictions and 2 contrast predictions; the 4 prediction results are fused with a discriminative fusion method to obtain the final prediction result, which improves the prediction accuracy.
Preferably, the discriminative fusion method employs DS evidence theory. DS (Dempster-Shafer) evidence theory is an imprecise reasoning theory proposed and refined by Dempster and Shafer; it can handle uncertain information and is widely used in expert systems. Compared with probabilistic inference theories, the prior data it requires are more intuitive and easier to obtain, it satisfies weaker conditions than Bayesian probability theory, and it can fuse various kinds of data and knowledge.
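For illustration, a minimal sketch of Dempster's rule of combination restricted to singleton class hypotheses (a simplification of full DS theory, which also admits compound hypotheses); each normalized prediction vector is treated as a mass function:

```python
import torch

def ds_combine(m1: torch.Tensor, m2: torch.Tensor) -> torch.Tensor:
    """Dempster's rule for two mass functions whose focal elements are
    all singleton classes: keep agreements, renormalize away conflict."""
    prod = m1 * m2           # mass where both sources agree on a class
    k = 1.0 - prod.sum()     # conflict between the two sources
    return prod / (1.0 - k)  # normalized combined mass

def ds_fuse(predictions):
    """Fuse several prediction vectors, e.g. the 2 classification and
    2 contrast predictions of a subnet, into one final prediction."""
    fused = predictions[0]
    for p in predictions[1:]:
        fused = ds_combine(fused, p)
    return fused
```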
In one embodiment, step 106 is followed by: measuring the Euclidean distance between the test image characteristics and the predefined standard template characteristics to obtain a measurement result; and inputting the measurement result into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain image contrast prediction.
It should be understood that, although the steps in the flowchart of fig. 1 are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the order of these steps is not strictly limited and they may be performed in other orders. Moreover, at least a portion of the steps in fig. 1 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; their order of performance is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided a remote sensing aircraft target classification device based on a double triple pseudo-twin architecture, the device comprising: the system comprises a triple sample acquisition module, a remote sensing airplane target classification network construction module, a loss function construction module, a remote sensing airplane target classification network training module and a remote sensing airplane target classification module, wherein:
the triple sample acquisition module is used for acquiring a sample set of a remote sensing airplane target, and the sample set comprises: anchor, positive and negative examples; the anchor sample and the positive sample belong to the same category, and the anchor sample and the negative sample belong to different categories.
The remote sensing aircraft target classification network construction module is used for constructing a remote sensing aircraft target classification network based on a double triple pseudo-twin framework; the remote sensing aircraft target classification network comprises three classification subnetworks based on a pseudo-twin framework; the classification sub-network based on the pseudo-twin framework comprises two classification branches consisting of a feature extraction network, a full connection layer and a classification layer; the feature extraction networks in the two classification branches are two convolutional neural networks with unshared weights, and the feature extraction networks are used for extracting features of input samples.
The loss function constructing module is used for constructing a loss function; the loss function comprises classification loss, contrast loss and triple loss; the triplet loss is determined according to the distance between the features of the anchor sample and the features of the positive sample, the distance between the features of the anchor sample and the features of the negative sample, and a preset distance threshold.
And the remote sensing airplane target classification network training module is used for respectively inputting the anchor sample, the positive sample and the negative sample into the remote sensing airplane target classification network, and training the remote sensing airplane target classification network according to the loss function to obtain a trained remote sensing airplane target classification model.
And the remote sensing airplane target classification module is used for obtaining an anchor sample to be detected, a positive sample to be detected and a negative sample to be detected of the remote sensing airplane target to be detected, and inputting the anchor sample, the positive sample to be detected and the negative sample to be detected into the remote sensing airplane target classification model to obtain a remote sensing airplane target classification result.
In one embodiment, the loss function constructing module is further configured to carry out a weighted summation of the classification loss, the contrast loss and the triplet loss to obtain a total loss; the total loss function is expressed as:

$$Loss = w_1 \cdot L_{cls} + w_2 \cdot L_{contrastive} + w_3 \cdot L_{triplet}$$

where $Loss$ denotes the total loss, $L_{cls}$ the classification loss, $L_{contrastive}$ the contrast loss, $w_1$, $w_2$, $w_3$ the loss weights, and $L_{triplet}$ the triplet loss, whose functional expression is:

$$L_{triplet} = \sum_{i=1}^{N} \max\left( \left\| f(x_i^a) - f(x_i^p) \right\|_2^2 - \left\| f(x_i^a) - f(x_i^n) \right\|_2^2 + \alpha,\; 0 \right)$$

where $x_i^a$, $x_i^p$ and $x_i^n$ denote the $i$-th group of anchor, positive and negative samples respectively, $f(\cdot)$ denotes the feature map extracted by the feature extraction network, $\alpha$ denotes the distance threshold, and $N$ denotes the number of groups of anchor, positive and negative samples.
In one embodiment, the anchor samples, the positive samples and the negative samples each include: an image sample and a mask sample; the full connection layer of each classification subnet comprises two parallel full connection layers; the classification layer of each classification subnet comprises two parallel classification layers. The remote sensing aircraft target classification network training module is further configured to form an image triplet from the image sample of the anchor sample, the image sample of the positive sample and the image sample of the negative sample; form a mask triplet from the mask sample of the anchor sample, the mask sample of the positive sample and the mask sample of the negative sample; input the image triplet and the mask triplet into the feature extraction networks of the three classification subnets of the remote sensing aircraft target classification network to obtain triplet image features and triplet mask features, where the triplet image features include anchor sample image features, positive sample image features and negative sample image features, and the triplet mask features include anchor sample mask features, positive sample mask features and negative sample mask features; input the triplet image features and the triplet mask features into the full connection layers of the three classification subnets, and input the resulting outputs into the classification layers of the three classification subnets, to obtain an image triplet prediction classification and a mask triplet prediction classification; obtain an image triplet classification loss from the image triplet prediction classification and the true classes of the corresponding input image triplet, and a mask triplet classification loss from the mask triplet prediction classification and the true classes of the corresponding input mask triplet; obtain an anchor sample contrast loss from the anchor sample image features and the anchor sample mask features, a positive sample contrast loss from the positive sample image features and the positive sample mask features, and a negative sample contrast loss from the negative sample image features and the negative sample mask features; obtain an image triplet loss from the anchor sample image features, the positive sample image features and the negative sample image features; obtain a mask triplet loss from the anchor sample mask features, the positive sample mask features and the negative sample mask features; carry out a weighted fusion of the image triplet classification loss, the mask triplet classification loss, the anchor sample contrast loss, the positive sample contrast loss, the negative sample contrast loss, the image triplet loss and the mask triplet loss to obtain a total loss; and train the remote sensing aircraft target classification network by back-propagation according to the total loss until the total loss meets a preset condition or the number of training rounds reaches a preset value, at which point training ends and the trained remote sensing aircraft target classification model is obtained.
In one embodiment, the remote sensing aircraft target classification network training module is further configured to input the image triplet into a feature extraction network of a first classification branch of three classification subnets of the remote sensing aircraft target classification network to obtain triplet image features, wherein the triplet image features include an anchor sample image feature, a positive sample image feature and a negative sample image feature; and inputting the mask triple into a feature extraction network of a second classification branch of three classification subnets of a remote sensing airplane target classification network to obtain triple mask features, wherein the triple mask features comprise anchor sample mask features, positive sample mask features and negative sample mask features.
In one embodiment, the image triplet prediction classification includes: the method comprises the steps of image anchor sample prediction classification, image positive sample prediction classification and image negative sample prediction classification; the mask triplet prediction classification includes: a mask anchor sample prediction classification, a mask positive sample prediction classification, and a mask negative sample prediction classification. The remote sensing aircraft target classification network training module is also used for inputting the image characteristics of the anchor sample into the full-link layer of the first classification branch of the first classification subnet, inputting the obtained characteristics into the classification layer, and classifying by adopting a Softmax logistic regression model to obtain the prediction classification of the image anchor sample; inputting the mask characteristics of the anchor samples into a full-connection layer of a second classification branch of the first classification subnet, inputting the obtained characteristics into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain the prediction classification of the anchor samples of the masks; inputting the image characteristics of the positive sample into a full-connection layer of a first classification branch of a second classification subnet, inputting the obtained characteristics into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain the prediction classification of the image positive sample; inputting the mask characteristics of the positive sample into a full-connection layer of a second classification branch of a second classification subnet, inputting the obtained characteristics into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain the prediction classification of the positive sample of the mask; inputting the image characteristics of the negative sample into a full-connection layer of a first classification branch of a third classification subnet, inputting the obtained characteristics into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain the prediction classification of the image negative sample; inputting the mask characteristics of the negative samples into the full-connection layer of the second classification branch of the third classification sub-network, inputting the obtained characteristics into the classification layer, and classifying by adopting a Softmax logistic regression model to obtain the prediction classification of the negative samples of the mask.
In one embodiment, the anchor sample contrast loss, the positive sample contrast loss and the negative sample contrast loss are calculated with the same contrast loss function; the image triplet classification loss and the mask triplet classification loss are obtained with the same classification loss function; and the image triplet loss and the mask triplet loss are calculated with the same triplet loss function.
In one embodiment, the classification loss of the image triplet includes an anchor sample image classification loss, a positive sample image classification loss and a negative sample image classification loss; the classification loss of the mask triplet includes an anchor sample mask classification loss, a positive sample mask classification loss and a negative sample mask classification loss. The remote sensing aircraft target classification network training module is further configured to add the anchor sample image classification loss, the positive sample image classification loss, the negative sample image classification loss, the anchor sample mask classification loss, the positive sample mask classification loss and the negative sample mask classification loss to obtain the classification loss; add the anchor sample contrast loss, the positive sample contrast loss and the negative sample contrast loss to obtain the contrast loss; add the image triplet loss and the mask triplet loss to obtain the triplet loss; and perform a weighted fusion of the classification loss, the contrast loss and the triplet loss to obtain the total loss.
In one embodiment, the remote sensing aircraft target classification network training module further comprises a network testing module configured to acquire a test sample, the test sample being a pair consisting of an image and a mask; input the image into the feature extraction network of the first classification branch of the first sub-network of the remote sensing aircraft target classification model to obtain test image features; input the test image features into the full connection layer of the first classification branch of the first sub-network, input the resulting output features into the classification layer, and classify with a Softmax logistic regression model to obtain an image classification prediction; measure the distance between the test image features and the predefined standard template features, input the resulting measurement into the classification layer, and classify with a Softmax logistic regression model to obtain an image contrast prediction; input the mask into the second classification branch of the first sub-network of the remote sensing aircraft target classification model to obtain a mask classification prediction and a mask contrast prediction; fuse the image classification prediction, the mask classification prediction, the image contrast prediction and the mask contrast prediction with a discriminative fusion method to obtain the final prediction result of the first sub-network; and test the second sub-network and the third sub-network with the same steps as the first sub-network to obtain their final prediction results.
In one embodiment, the network test module is further configured to measure a euclidean distance between the test image feature and a predefined standard template feature to obtain a measurement result; and inputting the measurement result into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain image contrast prediction.
For specific limitations of the remote sensing aircraft target classification device based on the double triple pseudo-twin architecture, reference may be made to the above limitations of the remote sensing aircraft target classification method based on the double triple pseudo-twin architecture, and details are not repeated here. The modules in the remote sensing airplane target classification device based on the double-triple pseudo-twin architecture can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In a verification embodiment, a remote sensing aircraft target dataset is used as the input, and several groups of experiments are carried out using the solutions of the invention to verify its function and performance. The results are as follows:
(1) Classification performance comparison experiment with other algorithms
The remote sensing aircraft target classification network based on the double triple pseudo-twin architecture is compared with 6 other classification networks, with and without twin structures (namely AlexNet, Siamese AlexNet, VGG-16, Siamese VGG-16, ResNet-50 and Siamese ResNet-50). The experimental results are shown in Table 1.
Table 1 Experimental results obtained by different methods

[Table 1 is rendered as an image in the original document and is not reproduced here.]
where Cls Pred denotes the initial classification accuracy and DS Pred denotes the final classification accuracy after fusion with the DS criterion.
As shown in Table 1, the remote sensing aircraft target classification network based on the double triple pseudo-twin architecture achieves the best accuracy on the dataset: compared with the other twin architectures, its classification prediction improves by 7% on average and its DS prediction by 12% on average. The network uses the pseudo-twin architecture to alleviate overfitting in mask training while using the mask information to assist image classification. In addition, it uses the contrast loss to improve the model's ability to distinguish non-homologous features, and the triplet loss to reduce the distance of same-class targets and increase the distance of different-class targets in the feature space. Finally, contrast prediction and discriminative fusion further improve the classification accuracy.
(2) Ablation experiment
Different training strategies are investigated, together with the potential influence of the backbone network and the architecture. Three architectures are compared for network training: (1) directly using a classification network; (2) the twin architecture; (3) the pseudo-twin architecture. Under each architecture, a ResNet convolutional neural network is used for feature extraction. Specifically: (1) ResNet-50 is used directly for target classification, serving as the baseline of this experiment; (2) ResNet-50 is used as the feature extraction network in the twin architecture; (3) ResNet-50 is used as the image feature extraction network (CNN1) and ResNet-18 as the mask feature extraction network (CNN2) in the pseudo-twin architecture. Then, with the model determined, ablation experiments are performed on whether mask information, the contrast loss, the triplet loss and contrast prediction are used. Finally, the 4 prediction results are fused by discriminative fusion. Note that '/' in the experiments indicates that the model lacks the corresponding capability. The ablation results obtained by applying the different modification strategies to the networks of Table 1 are shown in Table 2.
Table 2 Ablation experimental results

[Table 2 is rendered as an image in the original document and is not reproduced here.]
where Cls Pred denotes the initial image classification accuracy, Mask Pred denotes the initial mask classification accuracy, and DS Pred denotes the final classification accuracy after fusion with the DS criterion.
Using ResNet-50 directly as the classification network, images are classified with an accuracy of 90.35%; this model serves as the image-classification baseline of the ablation experiment. After contrast prediction against the standard image template and discriminative fusion, the accuracy is 90.51%, an increase of 0.16%: discriminative fusion of homologous predictions brings only a limited improvement. Using ResNet-50 to classify masks directly yields an accuracy of 40.33%; this model serves as the mask-classification baseline. Since the mask is a single-channel image without background interference, the model overfits and the classification accuracy is low; this is also why ResNet-18 is chosen as the mask feature extraction network in the pseudo-twin architecture. In the twin architecture, ResNet-50 is used as the feature extraction network for both images and masks. Classifying images and masks directly with this architecture gives 88.71% and 38.25% accuracy respectively, a decrease of 1.64% and 2.08% relative to the baselines: because the twin sub-networks share weights, the network finds it difficult to learn the difference information of non-homologous samples when only the classification loss is used. After adding the contrast loss, the image and mask classification accuracy improve by 2.39% and 13.09% respectively: the contrast loss helps the model learn the differences between non-homologous samples, benefiting both classifications. Adding the triplet loss further increases the accuracies by 2.19% and 4.58% respectively. Meanwhile, discriminative fusion of the prediction results helps improve the classification performance. To further address overfitting of the mask branch, this embodiment adopts the pseudo-twin architecture, in which sub-network weights are not shared, with ResNet-50 as the image feature extraction network and ResNet-18 as the mask feature extraction network. With only the classification loss, the image and mask classification accuracy reach 94.07% and 81.31% respectively, increases of 3.72% and 40.98% over the baselines; the mask training overfitting problem is substantially alleviated. After the contrast loss, the triplet loss and discriminative fusion are added, the accuracy improves further: the final model reaches 95.67% image classification accuracy and 85.43% mask classification accuracy. Finally, with contrast prediction against the image and mask standard templates and discriminative fusion, the classification accuracy reaches 97.28%.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; nevertheless, any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present application, and their description, while specific and detailed, is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art may make several variations and improvements without departing from the concept of the present application, and these all fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (9)

1. A remote sensing aircraft target classification method based on a double-triple pseudo-twin framework is characterized by comprising the following steps:
obtaining a sample set of remotely sensed aircraft targets, the sample set comprising: anchor, positive and negative examples; the anchor sample and the positive sample belong to the same category, and the anchor sample and the negative sample belong to different categories;
constructing a remote sensing airplane target classification network based on a double triple pseudo-twin framework; the remote sensing aircraft target classification network comprises three classification subnetworks based on a pseudo-twin framework; the classification sub-network based on the pseudo-twin framework comprises two classification branches consisting of a feature extraction network, a full connection layer and a classification layer; the feature extraction networks in the two classification branches are two convolutional neural networks with unshared weights, and the feature extraction networks are used for extracting features of input samples;
Constructing a loss function; the loss function comprises classification loss, contrast loss and triple loss; the triplet loss is determined according to the distance between the characteristics of the anchor sample and the characteristics of the positive sample, the distance between the characteristics of the anchor sample and the characteristics of the negative sample and a preset distance threshold;
inputting the anchor sample, the positive sample and the negative sample into the remote sensing airplane target classification network respectively, and training the remote sensing airplane target classification network according to the loss function to obtain a trained remote sensing airplane target classification model;
acquiring a to-be-detected anchor sample, a to-be-detected positive sample and a to-be-detected negative sample of a to-be-detected remote sensing airplane target, and inputting the to-be-detected anchor sample, the to-be-detected positive sample and the to-be-detected negative sample into the remote sensing airplane target classification model to obtain a remote sensing airplane target classification result;
wherein, constructing a loss function comprises:
carrying out weighted summation on the classification loss, the contrast loss and the triple loss to obtain a total loss; the expression of the loss function for the total loss is:
$$\mathrm{Loss} = w_1 \cdot L_{cls} + w_2 \cdot L_{contrastive} + w_3 \cdot L_{triplet}$$

wherein $\mathrm{Loss}$ denotes the total loss, $L_{cls}$ the classification loss, $L_{contrastive}$ the contrast loss, $L_{triplet}$ the triple loss, and $w_1, w_2, w_3$ the loss weights; the functional expression of the triple loss is:

$$L_{triplet} = \sum_{i=1}^{N}\left[\,\left\|f(x_i^a)-f(x_i^p)\right\|_2^2-\left\|f(x_i^a)-f(x_i^n)\right\|_2^2+\alpha\,\right]_+$$

wherein $x_i^a$, $x_i^p$ and $x_i^n$ respectively denote the anchor sample, positive sample and negative sample of the $i$-th group, $f(\cdot)$ denotes the feature map extracted by the feature extraction network, $\alpha$ denotes the distance threshold, and $N$ denotes the number of groups of anchor samples, positive samples and negative samples.
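For illustration, a minimal PyTorch sketch of this triple loss follows (function and parameter names are hypothetical, and the default margin value for alpha is an assumption; PyTorch's built-in nn.TripletMarginLoss offers a comparable ready-made alternative):

```python
import torch

def triplet_loss(f_a: torch.Tensor, f_p: torch.Tensor, f_n: torch.Tensor,
                 alpha: float = 0.2) -> torch.Tensor:
    """Triple loss over N groups of (anchor, positive, negative) features:
    sum over groups of max(||f_a - f_p||^2 - ||f_a - f_n||^2 + alpha, 0).
    f_a, f_p, f_n: (N, D) feature tensors; alpha: distance threshold."""
    d_pos = (f_a - f_p).pow(2).sum(dim=1)  # squared anchor-positive distance
    d_neg = (f_a - f_n).pow(2).sum(dim=1)  # squared anchor-negative distance
    return torch.clamp(d_pos - d_neg + alpha, min=0).sum()
```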
2. The method of claim 1, wherein the anchor sample, the positive sample, and the negative sample each comprise: an image sample and a mask sample;
the full connection layer of each classification subnet comprises two parallel full connection layers;
the classification layer of each classification subnet comprises two parallel classification layers;
inputting the anchor sample, the positive sample and the negative sample into three classification subnets respectively, and training the remote sensing aircraft target classification network according to the loss function to obtain a trained remote sensing aircraft target classification model, wherein the method comprises the following steps:
composing image samples of the anchor sample, image samples of the positive sample, and image samples of the negative sample into image triples;
combining mask samples of the anchor samples, mask samples of the positive samples, and mask samples of the negative samples into mask triples;
inputting the image triple and the mask triple into a feature extraction network of three classification subnets of the remote sensing aircraft target classification network to obtain triple image features and triple mask features; the triplet image features include: anchor sample image features, positive sample image features, and negative sample image features; the triplet mask features include: an anchor sample mask feature, a positive sample mask feature, and a negative sample mask feature;
Inputting the triple image characteristics and the triple mask characteristics into a full-connection layer network of three classification subnets of the remote sensing aircraft target classification network, and inputting the obtained output into classification layers of the three classification subnets of the remote sensing aircraft target classification network to obtain image triple prediction classification and mask triple prediction classification;
obtaining image triple classification losses according to the image triple prediction classification and the real classification of the correspondingly input image triple, and obtaining mask triple classification losses according to the mask triple prediction classification and the real classification of the correspondingly input mask triple;
obtaining an anchor sample contrast loss according to the anchor sample image characteristic and the anchor sample mask characteristic, obtaining a positive sample contrast loss according to the positive sample image characteristic and the positive sample mask characteristic, and obtaining a negative sample contrast loss according to the negative sample image characteristic and the negative sample mask characteristic;
obtaining image triple losses according to the anchor sample image characteristics, the positive sample image characteristics and the negative sample image characteristics; obtaining mask triple losses according to the anchor sample mask features, the positive sample mask features and the negative sample mask features;
carrying out weighted fusion on the classification loss of the image triples, the classification loss of the mask triples, the anchor sample contrast loss, the positive sample contrast loss, the negative sample contrast loss, the image triple loss and the mask triple loss to obtain a total loss;
and carrying out reverse training on the remote sensing airplane target classification network according to the total loss until the total loss meets a preset condition or the number of training rounds reaches a preset value, and finishing training to obtain a trained remote sensing airplane target classification model.
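Claims 2 and 5 use a contrast loss between each sample's image feature and its mask feature. The exact functional form is not given in this excerpt; the sketch below assumes the classic margin-based contrastive loss of Hadsell et al., with y = 1 for homologous (same-sample) pairs; the margin value and all names are hypothetical:

```python
import torch

def contrastive_loss(f_img: torch.Tensor, f_mask: torch.Tensor,
                     y: float = 1.0, margin: float = 1.0) -> torch.Tensor:
    """Margin-based contrastive loss between a sample's image feature and
    its mask feature. y = 1 marks a homologous (same-sample) pair to be
    pulled together; y = 0 marks a non-homologous pair to be pushed apart
    up to the margin. Inputs are (N, D) feature tensors."""
    d = torch.norm(f_img - f_mask, p=2, dim=1)
    loss = y * d.pow(2) + (1.0 - y) * torch.clamp(margin - d, min=0).pow(2)
    return loss.sum()
```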
3. The method of claim 2, wherein inputting the image triples and the mask triples into a feature extraction network of three classification subnets of the remote sensing aircraft target classification network to obtain triplet image features and triplet mask features comprises:
inputting the image triples into a feature extraction network of a first classification branch of three classification subnets of the remote sensing aircraft target classification network to obtain triple image features, wherein the triple image features comprise anchor sample image features, positive sample image features and negative sample image features;
and inputting the mask triple into a feature extraction network of a second classification branch of the three classification subnets of the remote sensing aircraft target classification network to obtain triple mask features, wherein the triple mask features comprise anchor sample mask features, positive sample mask features and negative sample mask features.
4. The method of claim 3, wherein the image triplet prediction classification comprises: the method comprises the steps of image anchor sample prediction classification, image positive sample prediction classification and image negative sample prediction classification;
the mask triplet prediction classification includes: a mask anchor sample prediction classification, a mask positive sample prediction classification, and a mask negative sample prediction classification;
inputting the triple image characteristics and the triple mask characteristics into a full-connection layer network of three classification subnets of the remote sensing aircraft target classification network, and inputting the obtained output into classification layers of the three classification subnets of the remote sensing aircraft target classification network to obtain image triple prediction classification and mask triple prediction classification, wherein the image triple prediction classification and the mask triple prediction classification comprise:
inputting the image characteristics of the anchor sample into a full-connection layer of a first classification branch of a first classification subnet, inputting the obtained characteristics into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain image anchor sample prediction classification;
inputting the anchor sample mask features into a full-connection layer of a second classification branch of the first classification subnet, inputting the obtained features into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain a mask anchor sample prediction classification;
Inputting the positive sample image characteristics into a full-connection layer of a first classification branch of a second classification subnet, inputting the obtained characteristics into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain image positive sample prediction classification;
inputting the positive sample mask features into a full-connection layer of a second classification branch of a second classification subnet, inputting the obtained features into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain a mask positive sample prediction classification;
inputting the negative sample image characteristics into a full-connection layer of a first classification branch of a third classification sub-network, inputting the obtained characteristics into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain image negative sample prediction classification;
inputting the negative sample mask features into a full-connection layer of a second classification branch of a third classification subnet, inputting the obtained features into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain a mask negative sample prediction classification.
5. The method of claim 2, wherein the anchor sample contrast loss, the positive sample contrast loss, and the negative sample contrast loss are calculated using the same contrast loss function;
The classification loss of the image triples and the classification loss of the mask triples are calculated by adopting the same classification loss function;
and the image triplet loss and the mask triplet loss are obtained by adopting the same triplet loss function.
6. The method of claim 2, wherein the classification loss of the image triples comprises an anchor sample image classification loss, a positive sample image classification loss and a negative sample image classification loss;
the classification loss of the mask triples comprises an anchor sample mask classification loss, a positive sample mask classification loss and a negative sample mask classification loss;
weighting and fusing the classification loss of the image triples, the classification loss of the mask triples, the anchor sample contrast loss, the positive sample contrast loss, the negative sample contrast loss, the image triple loss and the mask triple loss to obtain a total loss, wherein the weighting and fusing comprises the following steps:
adding the anchor sample pattern classification loss, the positive sample image classification loss, the negative sample image classification loss, the anchor sample mask classification loss, the positive sample mask classification loss and the negative sample mask classification loss to obtain a classification loss;
Adding the anchor sample contrast loss, the positive sample contrast loss and the negative sample contrast loss to obtain a contrast loss;
adding the image triplet loss and the mask triplet loss to obtain a triplet loss;
and performing weighted fusion on the classification loss, the contrast loss and the triple loss to obtain a total loss.
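Putting the pieces together, the following sketch combines the losses as claims 2 and 6 describe: the six classification losses, the three contrast losses and the two triple losses are summed and then fused with weights w1, w2, w3. It reuses the PseudoTwinSubnet, triplet_loss and contrastive_loss sketches above; the weight values are hypothetical:

```python
import torch.nn.functional as F

def total_loss(subnets, triplet_batch, w1=1.0, w2=1.0, w3=1.0, alpha=0.2):
    """Total loss for one (anchor, positive, negative) triplet batch.
    subnets: three PseudoTwinSubnet instances, one per triplet member.
    triplet_batch: three (image, mask, label) tuples for the anchor,
    positive and negative samples."""
    l_cls, f_imgs, f_masks = 0.0, [], []
    for net, (img, msk, label) in zip(subnets, triplet_batch):
        f_img, f_mask, logit_img, logit_mask = net(img, msk)
        # Image and mask classification losses (cross-entropy applies
        # the Softmax internally).
        l_cls = l_cls + F.cross_entropy(logit_img, label) \
                      + F.cross_entropy(logit_mask, label)
        f_imgs.append(f_img)
        f_masks.append(f_mask)
    # Anchor, positive and negative contrast losses between each sample's
    # image feature and mask feature.
    l_con = sum(contrastive_loss(fi, fm) for fi, fm in zip(f_imgs, f_masks))
    # Image triple loss and mask triple loss.
    l_tri = triplet_loss(*f_imgs, alpha=alpha) + triplet_loss(*f_masks, alpha=alpha)
    return w1 * l_cls + w2 * l_con + w3 * l_tri
```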
7. The method of claim 1, wherein, after inputting the anchor sample, the positive sample and the negative sample into the remote sensing airplane target classification network respectively and training the remote sensing airplane target classification network according to the loss function to obtain a trained remote sensing airplane target classification model, the method further comprises:
obtaining a test sample, wherein the test sample is a pair of samples consisting of an image and a mask;
inputting the image into a feature extraction network of a first classification branch of a first sub-network of the remote sensing airplane target classification model to obtain a test image feature;
inputting the test image characteristics to a full connection layer of a first classification branch of a first sub-network of the remote sensing aircraft target classification model, inputting the obtained output characteristics to the classification layer, and classifying by adopting a Softmax logistic regression model to obtain image classification prediction;
Measuring the distance between the test image characteristic and a predefined standard template characteristic, inputting the obtained measurement result into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain an image contrast prediction;
inputting the mask into a second classification branch of the first sub-network of the remote sensing airplane target classification model in the same manner to obtain a mask classification prediction and a mask comparison prediction;
according to the image classification prediction, the mask classification prediction, the image comparison prediction and the mask comparison prediction, a judgment fusion method is adopted for fusion to obtain a final prediction result of a first sub-network;
and testing the second sub-network and the third sub-network by adopting the same steps as the first sub-network to obtain the final prediction result of the second sub-network and the final prediction result of the third sub-network.
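The excerpt calls this step "judgment fusion" under the "DS criterion", which commonly refers to Dempster-Shafer evidence combination; on that assumption (the patent's exact rule is not reproduced here), a minimal sketch for fusing class-probability vectors is:

```python
import torch

def ds_fuse(m1: torch.Tensor, m2: torch.Tensor,
            eps: float = 1e-12) -> torch.Tensor:
    """Dempster's rule of combination for two probability vectors over the
    same K classes, treating each class probability as the mass of a
    singleton hypothesis. With singleton masses only, the rule reduces to
    an element-wise product renormalised by the non-conflicting mass."""
    joint = m1 * m2                   # agreeing evidence per class
    conflict = 1.0 - joint.sum()      # mass lost to conflicting class pairs
    return joint / (1.0 - conflict + eps)

# Hypothetical usage: pairwise fusion of a subnet's four predictions.
# fused = ds_fuse(ds_fuse(img_cls_pred, img_cmp_pred),
#                 ds_fuse(mask_cls_pred, mask_cmp_pred))
```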
8. The method of claim 7, wherein measuring the distance between the test image feature and a predefined standard template feature, inputting the obtained measurement result into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain the image contrast prediction comprises the following steps:
Measuring the Euclidean distance between the test image characteristic and a predefined standard template characteristic to obtain a measurement result;
and inputting the measurement result into a classification layer, and classifying by adopting a Softmax logistic regression model to obtain image contrast prediction.
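As a sketch of this comparison step (converting the distances to class scores by negation is an assumed convention, since the claim does not fix the sign handling of the classification layer):

```python
import torch
import torch.nn.functional as F

def contrast_prediction(test_feat: torch.Tensor,
                        template_feats: torch.Tensor) -> torch.Tensor:
    """Euclidean distances between one test image feature (D,) and K
    predefined standard template features (K, D), mapped to class
    probabilities with Softmax so that smaller distances yield higher
    probabilities."""
    d = torch.cdist(test_feat.unsqueeze(0), template_feats).squeeze(0)  # (K,)
    return F.softmax(-d, dim=0)
```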
9. A remote sensing aircraft target classification device based on a double triple pseudo-twin architecture, which is characterized by comprising:
the triple sample acquisition module is used for acquiring a sample set of a remote sensing airplane target, and the sample set comprises: anchor samples, positive samples and negative samples; the anchor sample and the positive sample belong to the same category, and the anchor sample and the negative sample belong to different categories;
the remote sensing aircraft target classification network construction module is used for constructing a remote sensing aircraft target classification network based on a double triple pseudo-twin framework; the remote sensing aircraft target classification network comprises three classification subnetworks based on a pseudo-twin framework; the classification sub-network based on the pseudo-twin framework comprises two classification branches consisting of a feature extraction network, a full connection layer and a classification layer; the feature extraction networks in the two classification branches are two convolutional neural networks with unshared weights, and the feature extraction networks are used for extracting features of input samples;
The loss function constructing module is used for constructing a loss function; the loss function comprises classification loss, contrast loss and triple loss; the triplet loss is determined according to the distance between the characteristics of the anchor sample and the characteristics of the positive sample, the distance between the characteristics of the anchor sample and the characteristics of the negative sample and a preset distance threshold;
the remote sensing airplane target classification network training module is used for inputting the anchor sample, the positive sample and the negative sample into the remote sensing airplane target classification network respectively, and training the remote sensing airplane target classification network according to the loss function to obtain a trained remote sensing airplane target classification model;
the remote sensing airplane target classification module is used for acquiring a to-be-detected anchor sample, a to-be-detected positive sample and a to-be-detected negative sample of a to-be-detected remote sensing airplane target, and inputting the to-be-detected anchor sample, the to-be-detected positive sample and the to-be-detected negative sample into the remote sensing airplane target classification model to obtain a remote sensing airplane target classification result;
the loss function building module is further configured to perform weighted summation on the classification loss, the contrast loss and the triple loss to obtain a total loss; the expression of the loss function for the total loss is:
$$\mathrm{Loss} = w_1 \cdot L_{cls} + w_2 \cdot L_{contrastive} + w_3 \cdot L_{triplet}$$

wherein $\mathrm{Loss}$ denotes the total loss, $L_{cls}$ the classification loss, $L_{contrastive}$ the contrast loss, $L_{triplet}$ the triple loss, and $w_1, w_2, w_3$ the loss weights; the functional expression of the triple loss is:

$$L_{triplet} = \sum_{i=1}^{N}\left[\,\left\|f(x_i^a)-f(x_i^p)\right\|_2^2-\left\|f(x_i^a)-f(x_i^n)\right\|_2^2+\alpha\,\right]_+$$

wherein $x_i^a$, $x_i^p$ and $x_i^n$ respectively denote the anchor sample, positive sample and negative sample of the $i$-th group, $f(\cdot)$ denotes the feature map extracted by the feature extraction network, $\alpha$ denotes the distance threshold, and $N$ denotes the number of groups of anchor samples, positive samples and negative samples.
CN202110769103.4A 2021-07-07 2021-07-07 Remote sensing aircraft target classification method and device based on double triple pseudo-twin framework Active CN113435383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110769103.4A CN113435383B (en) 2021-07-07 2021-07-07 Remote sensing aircraft target classification method and device based on double triple pseudo-twin framework


Publications (2)

Publication Number Publication Date
CN113435383A CN113435383A (en) 2021-09-24
CN113435383B true CN113435383B (en) 2022-07-29

Family

ID=77759616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110769103.4A Active CN113435383B (en) 2021-07-07 2021-07-07 Remote sensing aircraft target classification method and device based on double triple pseudo-twin framework

Country Status (1)

Country Link
CN (1) CN113435383B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant