CN108596108B - Aerial remote sensing image change detection method based on triple semantic relation learning

Aerial remote sensing image change detection method based on triple semantic relation learning

Info

Publication number
CN108596108B
Authority
CN
China
Prior art keywords
data set
remote sensing
feature
image
triple
Legal status
Active
Application number
CN201810385526.4A
Other languages
Chinese (zh)
Other versions
CN108596108A (en)
Inventor
Chen Keming
Zhang Mengya
Xu Guangluan
Yan Menglong
Current Assignee
Institute of Electronics of CAS
Original Assignee
Institute of Electronics of CAS
Application filed by Institute of Electronics of CAS
Priority to CN201810385526.4A
Publication of CN108596108A
Application granted
Publication of CN108596108B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an aerial remote sensing image change detection method based on triple semantic relation learning, which comprises the following steps. Step A: construct a two-way deep neural network model based on triple semantic relation learning. Step B: train the two-way deep neural network model with a training data set. Step C: obtain a feature representation of the test data set based on the test data set and the trained two-way deep neural network model. Step D: calculate the Euclidean distance between the two time-phase images based on the feature representation of the test data set to obtain a difference image. Step E: process the difference image with a threshold method to obtain the change detection result. The method selects the features of multi-temporal aerial remote sensing images automatically by deep learning, expresses the images more comprehensively and deeply, requires no manual feature selection, saves time and labor, and is convenient for engineering application.

Description

Aerial remote sensing image change detection method based on triple semantic relation learning
Technical Field
The disclosure relates to the technical field of remote sensing image processing, and in particular to an aerial remote sensing image change detection method based on triple semantic relation learning.
Background
Human activities strongly influence the Earth's surface, and this influence is reflected in many aspects such as environmental change and urban development. Accurately and promptly grasping changes in land cover is therefore significant for environmental monitoring and resource management. Change detection determines changes on the Earth's surface by observing the distribution of ground objects in the same region at different times. Remote sensing images can provide surface information over large areas and long periods, and thus have important applications in change detection. In recent years, with the development of aerial remote sensing technology, the volume of aerial image data has grown enormously, making aerial remote sensing image change detection an important topic in the remote sensing field.
Aerial remote sensing image change detection methods fall mainly into two categories. In the first, the two time-phase remote sensing images are classified separately and the resulting classification maps are compared and analyzed to obtain the change detection result. In the second, the multi-temporal images are compared to generate a difference image, which is then analyzed to obtain the change detection result. The latter is the mainstream approach, and how to generate a high-quality difference image is an important research direction in change detection.
However, in implementing the present disclosure, the inventors found that the common way to generate a difference image is to compare features extracted from the different time phases. Traditional change detection methods extract features manually, and the expressive power of such features is limited. Change detection methods combined with deep learning extract features through a deep neural network, which are more robust and more abstract; however, the features extracted by existing deep learning methods ignore the semantic relation between pixels and the multi-scale nature of change areas, and therefore cannot generate a high-quality difference image.
BRIEF SUMMARY OF THE PRESENT DISCLOSURE
(I) Technical problem to be solved
In view of the above technical problems, the invention provides an aerial remote sensing image change detection method based on triple semantic relation learning, so as to solve the technical problems that prior-art change detection methods ignore the semantic relation between pixels and the multi-scale nature of change areas, and cannot generate a high-quality difference image.
(II) Technical scheme
The invention provides an aerial remote sensing image change detection method based on triple semantic relation learning, which comprises the following steps. Step A: construct a two-way deep neural network model based on triple semantic relation learning. Step B: train the two-way deep neural network model with a training data set. Step C: obtain a feature representation of the test data set based on the test data set and the trained two-way deep neural network model. Step D: calculate the Euclidean distance between the two time-phase images based on the feature representation of the test data set to obtain a difference image. Step E: process the difference image with a threshold method to obtain the change detection result.
In some embodiments of the present disclosure, the step A comprises: Step A1: constructing the two-way deep neural network model for feature extraction based on a 101-layer residual network; Step A2: acquiring a triplet selection layer for training; and Step A3: setting a loss function (Triplet loss) layer.
In some embodiments of the present disclosure, the step A1 includes: Step A1a: replacing the fully connected layer in the 101-layer residual network with a fully convolutional layer; Step A1b: enlarging the receptive field with atrous (dilated) convolution; and Step A1c: extracting features of different scales with atrous spatial pyramid pooling.
In some embodiments of the present disclosure, the step A2 includes: Step A2a: concatenating, through a cascade layer, the feature representations of the two time-phase training data in the training data set into one feature map, wherein the feature map satisfies the following formula:

f_w(X) = {f_w(x_ij) | 1 ≤ i ≤ H, 1 ≤ j ≤ W}

wherein f_w(x_ij) denotes the feature vector of the pixel at position (i, j) of the feature map, and H and W denote the height and width of the current feature map.
The step A2 further includes: Step A2b: obtaining the feature vector f_w^a(x_ij) of each pixel on the feature map and marking it as the anchor; Step A2c: obtaining the feature vector f_w^p(x_ij) that has the same label as the anchor and is farthest from it, and marking it as the positive; Step A2d: obtaining the feature vector f_w^n(x_ij) that has a different label from the anchor and is closest to it, and marking it as the negative; and Step A2e: combining the anchor, positive and negative feature vectors into the triplet {f_w^a(x_ij), f_w^p(x_ij), f_w^n(x_ij)}.
In some embodiments of the present disclosure, in step a 3: the Triplet loss layer satisfies the following forward calculation formula:
Figure GDA0002501958140000031
wherein L (w) represents a loss function, LpRepresenting functions that only consider inter-class losses, LtRepresenting a conventional triplet loss function, P representing the number of pairs of images input to the network, w representing a network parameter, λ representing a weight to measure the two losses, m1Is a constant, m2Is a ratio m1A small constant.
The step A3 further includes calculating the partial derivative of the Triplet loss layer according to the following formula:

∂L(w)/∂w = (1/P) Σ_{i,j} [ h_2(w) + λ · h_1(w) ]

wherein h_1(w) denotes the partial derivative of L_p with respect to the parameters and h_2(w) denotes the partial derivative of L_t with respect to the parameters:

h_1(w) = 2 (f_w^a(x_ij) - f_w^p(x_ij))^T (∂f_w^a(x_ij)/∂w - ∂f_w^p(x_ij)/∂w), if ||f_w^a(x_ij) - f_w^p(x_ij)||² - m_2 > 0;

h_1(w) = 0, otherwise;

h_2(w) = 2 (f_w^a(x_ij) - f_w^p(x_ij))^T (∂f_w^a(x_ij)/∂w - ∂f_w^p(x_ij)/∂w) - 2 (f_w^a(x_ij) - f_w^n(x_ij))^T (∂f_w^a(x_ij)/∂w - ∂f_w^n(x_ij)/∂w), if m_1 + ||f_w^a(x_ij) - f_w^p(x_ij)||² - ||f_w^a(x_ij) - f_w^n(x_ij)||² > 0;

h_2(w) = 0, otherwise.
In some embodiments of the present disclosure, in the step B, the two-way deep neural network is trained on the training data set by stochastic gradient descent.
In some embodiments of the present disclosure, the step C comprises: using the test data set as the input of the trained two-way deep neural network model obtained in step B, removing the cascade layer, the triplet selection layer and the loss function layer at the end of the model, and keeping the output of the multi-scale feature fusion layer as the learned depth feature representation of the test data set.
In some embodiments of the present disclosure, the step D comprises: for the feature representations of the test images acquired in step C, restoring the resolution of the feature maps to the input image size by bilinear interpolation with a factor of 8, and calculating the Euclidean distance between the two feature maps to obtain a difference image.
In some embodiments of the present disclosure, the step E comprises processing the difference image obtained in step D according to the following strategy: when d(x_mn) > th, the pixel is set to 255; when d(x_mn) ≤ th, the pixel is set to 0, wherein d(x_mn) denotes the distance value of the pixel at coordinates (m, n) of the difference image, and th denotes a constant threshold.
In some embodiments of the present disclosure, the multi-temporal remote sensing image data to be detected are preprocessed to obtain a training data set and a test data set. The preprocessing of the multi-temporal remote sensing image data to be detected comprises: performing relative radiometric correction on multiple groups of images of the same region at different times by histogram matching, to eliminate radiometric differences between the images of different time phases; and/or cropping or selecting the registered images to obtain the training data set and the test data set of the remote sensing images, wherein cropping the registered images comprises: cropping an arbitrary region of each group of two time-phase images as a test region and using the remaining region of the two time-phase images as a training region; and/or, after the two time-phase images of the training region are cropped, applying horizontal and vertical flips and rotations to obtain an expanded training data set.
(III) Advantageous effects
According to the above technical scheme, the aerial remote sensing image change detection method based on triple semantic relation learning has at least one of the following beneficial effects:
(1) the features of multi-temporal aerial remote sensing images are selected automatically by a deep learning method, so the images can be expressed more comprehensively and deeply without manual feature selection, saving time and labor and facilitating engineering application;
(2) the two-way network can process two images simultaneously, and the two branches of the network share weights, which is equivalent to extracting features from the two images by the same method;
(3) atrous convolution alleviates the reduced feature-map resolution caused by downsampling and pooling, lowers the factor by which the picture size shrinks, and effectively enlarges the receptive field without increasing the number of parameters or the amount of computation, yielding a denser feature map;
(4) atrous spatial pyramid pooling enables the extraction of feature representations of change areas at different scales;
(5) with the improved triplet loss function, learning the semantic relation between pixels makes the distance between feature vectors with the same label smaller than that between feature vectors with different labels, enhancing the compactness of same-label pixels;
(6) the change detection result image is obtained by a threshold method; because the semantic relation between pixels has been learned, same-label pixels are closer in the result image and noise points are fewer.
Drawings
Fig. 1 is a schematic flow chart of an aerial remote sensing image change detection method based on triple semantic relation learning according to an embodiment of the present disclosure.
Fig. 2 is a multi-temporal remote sensing image dataset to be detected selected by the embodiment of the present disclosure.
Fig. 3 is a graph of test results for an embodiment of the disclosure.
Detailed Description
In the aerial remote sensing image change detection method based on triple semantic relation learning provided by the present disclosure, a deep learning method is used to learn the semantic relation between image pixels and thereby realize remote sensing image change detection. No manual feature extraction is needed, saving time and labor; image features can be extracted more comprehensively and deeply to obtain the difference image, which is then analyzed to obtain the change result image, achieving excellent results in the field of remote sensing image change detection.
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
Fig. 1 is a schematic flow chart of an aerial remote sensing image change detection method based on triple semantic relation learning according to an embodiment of the present disclosure.
The embodiment of the disclosure provides an aerial remote sensing image change detection method based on triple semantic relation learning, as shown in fig. 1, including:
Step A: constructing a two-way deep neural network model based on triple semantic relation learning;
Step B: training the two-way deep neural network model with a training data set;
Step C: obtaining a feature representation of the test data set based on the test data set and the trained two-way deep neural network model;
Step D: calculating the Euclidean distance between the two time-phase images based on the feature representation of the test data set to obtain a difference image; and
Step E: processing the difference image with a threshold method to obtain the change detection result. The features of multi-temporal aerial remote sensing images are thus selected automatically by deep learning, so the images can be expressed more comprehensively and deeply without manual feature selection, saving time and labor and facilitating engineering application.
In some embodiments of the present disclosure, step A comprises:
Step A1: constructing the two-way deep neural network model for feature extraction based on a 101-layer residual network;
Step A2: acquiring a triplet selection layer for training; and
Step A3: setting a loss function (Triplet loss) layer.
In the step A1, a training data set is used as input to construct the two-way deep neural network model for feature extraction. The network structure is based on a 101-layer residual network; the two-way network can process two images simultaneously, and the two branches of the network share weights, which is equivalent to extracting features from the two images by the same method.
In practical application, the 101-layer residual network can be divided into five convolution blocks. The first block consists of a convolution layer and a pooling layer, where the convolution kernel is 7 × 7 and the number of filters is 64; the second block consists of 3 groups of [(1 × 1, 64) (3 × 3, 64) (1 × 1, 256)] convolution filters, where the first term (1 × 1, 3 × 3, 1 × 1) is the kernel size and the second term (64, 256) is the number of filters; the third block consists of 4 groups of [(1 × 1, 128) (3 × 3, 128) (1 × 1, 512)] convolution filters; the fourth block consists of 23 groups of [(1 × 1, 256) (3 × 3, 256) (1 × 1, 1024)] convolution filters; the fifth block consists of 3 groups of [(1 × 1, 512) (3 × 3, 512) (1 × 1, 2048)] convolution filters; and the last layer is a fully connected layer. To make the network structure more suitable for change detection, the residual network is improved as follows:
Step A1a: the fully connected layer in the 101-layer residual network is replaced by a fully convolutional layer, namely a convolution layer with a 1 × 1 kernel and 16 filters.
Step A1b: atrous (dilated) convolution is adopted to address the reduced feature-map resolution caused by downsampling and pooling. In the embodiment of the disclosure, with atrous convolution the picture size is reduced by only a factor of 8 instead of the original factor of 32, and the receptive field is effectively enlarged without increasing the number of parameters or the amount of computation, yielding a denser feature map. Both the fourth and fifth convolution blocks use atrous convolution layers, with a dilation rate of 2 in the fourth block and 4 in the fifth block.
Step A1c: atrous spatial pyramid pooling is adopted to address the multi-scale target problem; since different dilation rates correspond to different receptive field sizes, features of different scales can be extracted with different rates. In the embodiment of the disclosure, an atrous spatial pyramid structure is appended after the fifth convolution block: several parallel atrous convolutions, all with 3 × 3 kernels and dilation rates of 6, 12, 18 and 24 respectively, capture objects and context information at multiple scales, and the multi-scale features are finally fused to obtain the feature representation of the image.
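To make steps A1a-A1c concrete, the following minimal PyTorch sketch assembles the modified backbone in one way; the class name, the use of torchvision's resnet101, and fusing the ASPP branches by summation are illustrative assumptions rather than details fixed by the text:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

class FeatureExtractor(nn.Module):
    """One-branch feature extractor of the two-way network (steps A1a-A1c):
    ResNet-101 with output stride 8, atrous convolution in blocks 4-5, and
    an atrous spatial pyramid with rates 6/12/18/24 whose 16-filter branches
    stand in for the removed fully connected layer."""

    def __init__(self):
        super().__init__()
        # Step A1b: dilate conv blocks 4 and 5 (rates 2 and 4) so the feature
        # map shrinks only 8x; weights start from a pre-trained model.
        net = resnet101(weights="IMAGENET1K_V1",
                        replace_stride_with_dilation=[False, True, True])
        self.backbone = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool,
                                      net.layer1, net.layer2, net.layer3, net.layer4)
        # Step A1c: parallel 3x3 atrous convolutions at rates 6, 12, 18, 24.
        self.aspp = nn.ModuleList(
            nn.Conv2d(2048, 16, kernel_size=3, padding=r, dilation=r)
            for r in (6, 12, 18, 24))

    def forward(self, x):
        x = self.backbone(x)
        # Multi-scale feature fusion; summation of branches is assumed here.
        return torch.stack([branch(x) for branch in self.aspp]).sum(dim=0)
```

Both branches of the two-way network can simply call the same FeatureExtractor instance, which realizes the weight sharing described above.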
As described above, in step a2, the feature representations of the training data at two time phases in the training data set are cascaded by the cascade layer into a feature map satisfying the following equation:
fw(X)={fw(xij)|1≤i≤H,1≤j≤W}
wherein f isw(xij) Representing the feature vector of the corresponding pixel on the feature icon (i, j), and H, W representing the height and width of the current feature map;
In some embodiments of the present disclosure, the feature vector f_w^a(x_ij) of each pixel on the feature map is obtained and marked as the anchor; the feature vector f_w^p(x_ij) that has the same label as the anchor and is farthest from it is obtained and marked as the positive; the feature vector f_w^n(x_ij) that has a different label from the anchor and is closest to it is then obtained and marked as the negative; finally, the anchor, positive and negative feature vectors are combined into the triplet {f_w^a(x_ij), f_w^p(x_ij), f_w^n(x_ij)}. A triplet is thus obtained for every pixel on the feature map; all the positive feature vectors form a corresponding positive feature map, and all the negative feature vectors form a corresponding negative feature map.
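The triplet selection layer of step A2 can be sketched as follows, assuming binary per-pixel change labels already downsampled to the feature-map resolution; the brute-force distance matrix is for clarity only and is practical only for small feature maps:

```python
import torch

def select_triplets(feat, labels):
    """Triplet selection (step A2) over a cascaded two-phase feature map.

    feat:   (C, H, W) output of the cascade layer
    labels: (H, W) binary change labels (1 = changed, 0 = unchanged)
    Returns anchor, positive, negative tensors, each of shape (H*W, C).
    """
    C, H, W = feat.shape
    anchor = feat.reshape(C, H * W).t()      # feature vector of every pixel
    lab = labels.reshape(H * W)
    dist = torch.cdist(anchor, anchor)       # pairwise Euclidean distances
    same = lab[:, None] == lab[None, :]
    # Step A2c: farthest feature vector with the same label -> positive.
    positive = anchor[dist.masked_fill(~same, float("-inf")).argmax(dim=1)]
    # Step A2d: closest feature vector with a different label -> negative.
    # If the sample is entirely changed or unchanged, no valid negative
    # exists (the case the improved loss below is designed to handle).
    negative = anchor[dist.masked_fill(same, float("inf")).argmin(dim=1)]
    return anchor, positive, negative
```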
As described above, in step A3, the anchor feature map, the positive feature map, and the negative feature map obtained in step A2 are used to compute the loss function Triplet loss. The similarity between feature vectors is measured by the Euclidean distance. The conventional triplet loss only requires that, within a triplet, the distance between feature vectors with different labels, ||f_w^a(x_ij) - f_w^n(x_ij)||², exceed the distance between feature vectors with the same label, ||f_w^a(x_ij) - f_w^p(x_ij)||², by a specified value. In change detection, however, a training sample may be entirely changed or entirely unchanged, in which case the negative feature map does not exist; in addition, the conventional loss function places no constraint on ||f_w^a(x_ij) - f_w^p(x_ij)||² itself, so this distance may remain large, which does not meet the requirements of the embodiments of the present disclosure. The conventional triplet loss function is therefore not suitable here. The loss function provided in the embodiment of the disclosure adds to the conventional loss a constraint on ||f_w^a(x_ij) - f_w^p(x_ij)||² that keeps this distance within a specified range, so that same-label feature vectors become closer in the feature space and different-label feature vectors become farther apart.
In some embodiments of the present disclosure, the Triplet loss layer satisfies the following forward calculation formula:

L(w) = (1/P) Σ_{i,j} [ max(0, m_1 + ||f_w^a(x_ij) - f_w^p(x_ij)||² - ||f_w^a(x_ij) - f_w^n(x_ij)||²) + λ · max(0, ||f_w^a(x_ij) - f_w^p(x_ij)||² - m_2) ]

wherein L(w) denotes the loss function; the first max term is L_t, the conventional triplet loss function; the second max term is L_p, which considers only the intra-class (same-label) distance; P denotes the number of image pairs input to the network; w denotes the network parameters; λ denotes a weight balancing the two losses; m_1 is a constant; and m_2 is a constant smaller than m_1. In some embodiments of the disclosure, λ is 0.5, m_1 is 0.5, and m_2 is 0.3.
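A minimal sketch of this forward computation follows. Because the published formula is reproduced only as an image, the composition implemented here, L = L_t + λ·L_p with hinge margins m_1 and m_2, is an assumption consistent with the surrounding description and the stated values λ = 0.5, m_1 = 0.5, m_2 = 0.3:

```python
import torch

def improved_triplet_loss(anchor, positive, negative,
                          lam=0.5, m1=0.5, m2=0.3, has_negative=True):
    """Improved triplet loss: conventional hinge L_t plus an extra hinge
    L_p that keeps the same-label (anchor-positive) distance under m2."""
    d_ap = (anchor - positive).pow(2).sum(dim=1)   # squared distances
    l_p = torch.clamp(d_ap - m2, min=0)
    if has_negative:
        d_an = (anchor - negative).pow(2).sum(dim=1)
        l_t = torch.clamp(m1 + d_ap - d_an, min=0)
    else:
        # Sample entirely changed or unchanged: no negative feature map
        # exists, so only the same-label constraint contributes.
        l_t = torch.zeros_like(l_p)
    return (l_t + lam * l_p).mean()
```

Under automatic differentiation, the piecewise derivatives h_1(w) and h_2(w) given in the next formula fall out of the clamp (hinge) terms without being coded explicitly.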
In some embodiments of the present disclosure, step A3 includes calculating the partial derivative of the Triplet loss layer according to the following formula:

∂L(w)/∂w = (1/P) Σ_{i,j} [ h_2(w) + λ · h_1(w) ]

wherein h_1(w) denotes the partial derivative of L_p with respect to the parameters and h_2(w) denotes the partial derivative of L_t with respect to the parameters:

h_1(w) = 2 (f_w^a(x_ij) - f_w^p(x_ij))^T (∂f_w^a(x_ij)/∂w - ∂f_w^p(x_ij)/∂w), if ||f_w^a(x_ij) - f_w^p(x_ij)||² - m_2 > 0;

h_1(w) = 0, otherwise;

h_2(w) = 2 (f_w^a(x_ij) - f_w^p(x_ij))^T (∂f_w^a(x_ij)/∂w - ∂f_w^p(x_ij)/∂w) - 2 (f_w^a(x_ij) - f_w^n(x_ij))^T (∂f_w^a(x_ij)/∂w - ∂f_w^n(x_ij)/∂w), if m_1 + ||f_w^a(x_ij) - f_w^p(x_ij)||² - ||f_w^a(x_ij) - f_w^n(x_ij)||² > 0;

h_2(w) = 0, otherwise.
In some embodiments of the present disclosure, in step B, the two-way deep neural network is trained on the training data set by stochastic gradient descent. The initialization and gradient updates of the two branches of the network are identical; training is complete when the loss function of the whole deep neural network approaches a local optimum. Because the network has many layers and is difficult to train to the optimal state from scratch, it is initialized with a pre-trained model.
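A minimal training loop for step B might look as follows, reusing the sketches above; the hyperparameters, the batch size of one, and the assumption that labels arrive pre-downsampled to the feature-map resolution are all illustrative:

```python
import torch

def train(model, loader, epochs=50, lr=1e-3):
    """Step B: stochastic gradient descent on the shared-weight network.
    `loader` is assumed to yield (img_t1, img_t2, labels) patches, with
    labels at feature-map resolution; `model` is a FeatureExtractor whose
    backbone is already initialized from a pre-trained model."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for img_t1, img_t2, labels in loader:
            f1 = model(img_t1)                    # the two branches share
            f2 = model(img_t2)                    # weights: same module
            feat = torch.cat([f1, f2], dim=1)[0]  # cascade layer, (2C, H', W')
            a, p, n = select_triplets(feat, labels[0])
            loss = improved_triplet_loss(a, p, n)
            opt.zero_grad()
            loss.backward()
            opt.step()
```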
In some embodiments of the disclosure, step C comprises: using the test data set as the input of the trained two-way deep neural network model obtained in step B, removing the cascade layer, the triplet selection layer and the loss function layer at the end of the model, and keeping the output of the multi-scale feature fusion layer as the learned depth feature representation of the test data set.
In some embodiments of the present disclosure, step D comprises: for the feature representations of the test images acquired in step C, restoring the resolution of the feature maps to the input image size by bilinear interpolation with a factor of 8, and calculating the Euclidean distance between the two feature maps to obtain a difference image.
In some embodiments of the disclosure, step E comprises processing the difference image obtained in step D according to the following strategy:

when d(x_mn) > th, the pixel is set to 255;

when d(x_mn) ≤ th, the pixel is set to 0;

where d(x_mn) denotes the distance value of the pixel at coordinates (m, n) of the difference image and th is a constant threshold. The change detection result image is obtained by this threshold method; because the semantic relation between pixels has been learned, same-label pixels are closer in the result image and noise points are fewer.
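Taken together, steps C-E reduce to a short inference routine. The sketch below assumes the FeatureExtractor above and mirrors the x8 bilinear upsampling, the per-pixel Euclidean distance, and the 0/255 thresholding just described:

```python
import torch
import torch.nn.functional as F

def detect_changes(model, img_t1, img_t2, th):
    """Steps C-E: extract features, build the difference image, threshold."""
    model.eval()
    with torch.no_grad():
        f1, f2 = model(img_t1), model(img_t2)        # (1, C, H/8, W/8) each
    # Step D: restore the input resolution with x8 bilinear interpolation.
    f1 = F.interpolate(f1, scale_factor=8, mode="bilinear", align_corners=False)
    f2 = F.interpolate(f2, scale_factor=8, mode="bilinear", align_corners=False)
    diff = (f1 - f2).pow(2).sum(dim=1).sqrt()        # d(x_mn) per pixel
    # Step E: threshold into a binary change map (255 changed, 0 unchanged).
    return (diff > th).to(torch.uint8) * 255
```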
The data sets adopted by the embodiment of the disclosure are the public SZADA and TISZADOB data sets, comprising 12 groups of images in total.
Fig. 2 shows the multi-temporal remote sensing image data sets to be detected that were selected for the embodiment of the present disclosure. Part (a) of fig. 2 is the first group of different-phase remote sensing images (SZADA data set) in the SZTAKI AirChange Benchmark Set. Part (b) of fig. 2 is the third group of different-phase remote sensing images in the TISZADOB data set of the SZTAKI AirChange Benchmark Set.
It should be noted that fig. 2 shows the original data set images after grayscale processing; in practical applications the input images are color images, for which reference may be made to the public SZADA and TISZADOB data sets.
In some embodiments of the present disclosure, the multi-temporal remote sensing image data to be detected (as shown in fig. 2) is preprocessed to obtain a training data set and a test data set.
In some embodiments of the present disclosure, preprocessing the multi-temporal remote sensing image data to be detected comprises: performing relative radiometric correction on multiple groups of images of the same region at different times by histogram matching, to eliminate radiometric differences between the images of different time phases; and cropping or selecting the registered images to obtain the training data set and the test data set of the remote sensing images.
In some embodiments of the present disclosure, when cropping the registered images, an arbitrary region of each group of two time-phase images is cropped as the test region, and the remaining region of the two time-phase images serves as the training region. After the two time-phase images of the training region are cropped, horizontal and vertical flips and rotations are applied to obtain an expanded training data set.
The upper-left area of each group of image pairs, of size 784 × 448, is cropped as the test region. The remaining area of each image pair serves as the training region; it is cropped, with overlap, into 113 × 113 patches as training samples, and the cropped samples are flipped horizontally and vertically and rotated by 90°, 180° and 270° to expand the training set, giving 2744 training samples in total.
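The preprocessing just described, relative radiometric correction by histogram matching followed by overlapped cropping into 113 × 113 patches and flip/rotation augmentation, can be sketched as follows; the 56-pixel stride chosen for the overlapped cropping is an assumption, since the text does not state the degree of overlap:

```python
import numpy as np
from skimage.exposure import match_histograms

def make_training_set(img_t1, img_t2, label, patch=113, stride=56):
    """Histogram matching plus overlapped cropping and 8-fold augmentation
    (four rotations, each also flipped), as described in the text."""
    # Relative radiometric correction: match phase 2 to phase 1.
    img_t2 = match_histograms(img_t2, img_t1, channel_axis=-1)
    samples = []
    H, W = label.shape
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            tile = tuple(a[i:i + patch, j:j + patch]
                         for a in (img_t1, img_t2, label))
            for k in range(4):                        # 0/90/180/270 degrees
                rot = tuple(np.rot90(a, k) for a in tile)
                samples.append(rot)
                samples.append(tuple(np.fliplr(a) for a in rot))
    return samples
```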
In practical applications, if the number of images included in the data set is large, one part of the data set may be directly selected as a training data set, and the other part of the data set may be selected as a testing data set.
Examples of applications of the present disclosure are further illustrated below:
fig. 3 is a graph of test results for an embodiment of the disclosure.
In fig. 3, (a) shows a standard reference result. Part (b) in fig. 3 is a result of the detection method provided by the embodiment of the present disclosure. Part (c) of fig. 3 is the result of the first comparative method. Part (d) of fig. 3 shows the results of the second comparative method. The upper half of FIG. 3 is the test results for the SZADA/1 data set. The lower half of FIG. 3 is the test results for the TISZADOB/3 data set.
In order to verify the effectiveness of the change detection method provided by the embodiment of the disclosure, the scheme of the invention was tested on a real test data set; test results on a typical group of data sets are given here, with the test data shown in fig. 2. In addition, the change detection results obtained by the detection method provided by the embodiment of the present disclosure are compared with those of two existing methods [Y. Zhan, K. Fu, M. Yan, X. Sun, H. Wang, and X. Qiu, Change detection based on deep siamese convolutional network for optical aerial images, IEEE Geoscience and Remote Sensing Letters, 14(10): 1845-1849, 2017] (comparison method one) and [...: 3384-3394, 2016] (comparison method two), and the corresponding test results are shown in fig. 3. From left to right in fig. 3 are, in order, the standard reference result (a), the result of the method of the invention (b), the result of comparison method one (c), and the result of comparison method two (d).
Here, the change detection experiment result images are analyzed quantitatively:

A. Count the missed detections: the number of pixels that changed in the reference image but are detected as unchanged in the experimental result image is recorded as the missed detection count FN;

B. Count the false detections: the number of pixels that did not change in the reference image but are detected as changed in the experimental result image is recorded as the false detection count FP; correspondingly, the number of changed pixels correctly detected as changed is recorded as the true positive count TP;

C. Calculate the precision Pr = TP/(TP + FP);

D. Calculate the recall Re = TP/(TP + FN);

E. Calculate the evaluation index F-measure = (2 × Re × Pr)/(Re + Pr), which measures the consistency between the experimental result image and the reference image.
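These indices translate directly into a few lines of NumPy; the inputs are assumed to be 0/255 change maps as produced by the threshold method above:

```python
import numpy as np

def evaluate(result, reference):
    """Pixel-wise FN/FP/TP counts and the Pr, Re and F-measure indices."""
    res, ref = result > 0, reference > 0
    TP = np.sum(res & ref)          # changed, detected as changed
    FP = np.sum(res & ~ref)         # unchanged, detected as changed
    FN = np.sum(~res & ref)         # changed, detected as unchanged (missed)
    Pr = TP / (TP + FP)             # precision
    Re = TP / (TP + FN)             # recall
    F = 2 * Re * Pr / (Re + Pr)     # consistency with the reference map
    return Pr, Re, F
```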
Table 1: Performance indexes of the detection results of the method provided by the embodiment of the disclosure, comparison method one, and comparison method two.
The performance indexes of the detection results of the method provided by the embodiment of the disclosure, comparison method one and comparison method two are shown in Table 1. Observation and analysis of fig. 3 and Table 1 show that on the SZADA/1 test data, where the change areas are small and dispersed, the result image of the detection method provided by the embodiment of the disclosure has few noise points and good compactness, and change areas of different scales are well detected. On the TISZADOB/3 test data, where the change area is large and regular, the method provided by the embodiment of the disclosure also achieves a good change detection effect and a clear improvement in the F-measure evaluation index.
The embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. It should be noted that implementations not shown or described in the drawings or the text are forms known to those of ordinary skill in the art and are not described in detail. Further, the above definitions of the various elements and methods are not limited to the specific structures, shapes or arrangements of parts mentioned in the examples, which may be easily modified or substituted by those of ordinary skill in the art.
From the above description, those skilled in the art should have a clear understanding of the method for detecting changes in aerial remote sensing images based on triple semantic relation learning provided by the present disclosure.
In conclusion, the aerial remote sensing image change detection method based on triple semantic relation learning provided by the present disclosure learns the semantic relation among image pixels by a deep learning method to realize remote sensing image change detection; while saving time and labor, it extracts image features more comprehensively and deeply and achieves excellent results in the field of remote sensing image change detection.
It should also be noted that directional terms, such as "upper", "lower", "front", "rear", "left", "right", and the like, used in the embodiments are only directions referring to the drawings, and are not intended to limit the scope of the present disclosure. Throughout the drawings, like elements are represented by like or similar reference numerals. Conventional structures or constructions will be omitted when they may obscure the understanding of the present disclosure.
And the shapes and sizes of the respective components in the drawings do not reflect actual sizes and proportions, but merely illustrate the contents of the embodiments of the present disclosure. Furthermore, in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the disclosure, various features of the disclosure are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various disclosed aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, disclosed aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this disclosure.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (10)

1. An aerial remote sensing image change detection method based on triple semantic relation learning, comprising the following steps:

Step A: constructing a two-way deep neural network model based on triple semantic relation learning;

Step B: training the two-way deep neural network model with a training data set;

Step C: obtaining a feature representation of the test data set based on the test data set and the trained two-way deep neural network model;

Step D: calculating the Euclidean distance between the two time-phase images based on the feature representation of the test data set to obtain a difference image; and

Step E: processing the difference image with a threshold method to obtain a change detection result;
the step A comprises the step of setting a loss function Triplet loss layer, wherein the Triplet loss layer meets the following forward calculation formula:
Figure FDA0002717808950000011
wherein L (w) represents a loss function, LpRepresenting functions that only consider inter-class losses, LtRepresenting a conventional triplet loss function, P representing the number of pairs of images input to the network, w representing a network parameter, λ representing a weight to measure the two losses, m1Is a constant, m2Is a ratio m1A small constant of the number of the first and second,
Figure FDA0002717808950000012
a feature vector representing each pixel on the input feature map,
Figure FDA0002717808950000013
is shown and
Figure FDA0002717808950000014
the labels of the corresponding feature vectors are the same and the distance is the greatestThe feature vectors of the far are,
Figure FDA0002717808950000015
is shown and
Figure FDA0002717808950000016
the labels of the corresponding feature vectors are different and the feature vector with the nearest distance is obtained.
2. The method for detecting changes in aerial remote sensing images based on triple semantic relation learning according to claim 1, wherein the step A further comprises the following steps:

Step A1: constructing the two-way deep neural network model for feature extraction based on a 101-layer residual network;

Step A2: acquiring a triplet selection layer for training.
3. The method for detecting changes in aerial remote sensing images based on triple semantic relation learning according to claim 2, wherein the step A1 comprises the following steps:

Step A1a: replacing the fully connected layer in the 101-layer residual network with a fully convolutional layer;

Step A1b: enlarging the receptive field with atrous (dilated) convolution; and

Step A1c: extracting features of different scales with atrous spatial pyramid pooling.
4. The method for detecting changes in aerial remote sensing images based on triple semantic relation learning according to claim 2, wherein the step A2 comprises the following steps:

Step A2a: concatenating, through a cascade layer, the feature representations of the two time-phase training data in the training data set into one feature map, wherein the feature map satisfies the following formula:

f_w(X) = {f_w(x_ij) | 1 ≤ i ≤ H, 1 ≤ j ≤ W}

wherein f_w(x_ij) denotes the feature vector of the pixel at position (i, j) of the feature map, and H and W denote the height and width of the current feature map;

Step A2b: obtaining the feature vector f_w^a(x_ij) of each pixel on the feature map and marking it as the anchor;

Step A2c: obtaining the feature vector f_w^p(x_ij) that has the same label as the anchor and is farthest from it, and marking it as the positive;

Step A2d: obtaining the feature vector f_w^n(x_ij) that has a different label from the anchor and is closest to it, and marking it as the negative; and

Step A2e: combining the anchor, positive and negative feature vectors into the triplet {f_w^a(x_ij), f_w^p(x_ij), f_w^n(x_ij)}.
5. The method for detecting changes in aerial remote sensing images based on triple semantic relation learning according to claim 2, wherein the setting of the loss function Triplet loss layer further comprises calculating the partial derivative of the Triplet loss layer according to the following formula:

∂L(w)/∂w = (1/P) Σ_{i,j} [ h_2(w) + λ · h_1(w) ]

wherein h_1(w) denotes the partial derivative of L_p with respect to the parameters and h_2(w) denotes the partial derivative of L_t with respect to the parameters:

h_1(w) = 2 (f_w^a(x_ij) - f_w^p(x_ij))^T (∂f_w^a(x_ij)/∂w - ∂f_w^p(x_ij)/∂w) if ||f_w^a(x_ij) - f_w^p(x_ij)||² - m_2 > 0, and h_1(w) = 0 otherwise;

h_2(w) = 2 (f_w^a(x_ij) - f_w^p(x_ij))^T (∂f_w^a(x_ij)/∂w - ∂f_w^p(x_ij)/∂w) - 2 (f_w^a(x_ij) - f_w^n(x_ij))^T (∂f_w^a(x_ij)/∂w - ∂f_w^n(x_ij)/∂w) if m_1 + ||f_w^a(x_ij) - f_w^p(x_ij)||² - ||f_w^a(x_ij) - f_w^n(x_ij)||² > 0, and h_2(w) = 0 otherwise.
6. The method for detecting changes in aerial remote sensing images based on triple semantic relation learning according to claim 1, wherein in the step B, the two-way deep neural network is trained on the training data set by stochastic gradient descent.
7. The method for detecting changes in aerial remote sensing images based on triple semantic relation learning according to claim 1, wherein the step C comprises: using the test data set as the input of the trained two-way deep neural network model obtained in step B, removing the cascade layer, the triplet selection layer and the loss function layer at the end of the model, and keeping the output of the multi-scale feature fusion layer as the learned depth feature representation of the test data set.
8. The method for detecting changes in aerial remote sensing images based on triple semantic relation learning according to claim 1, wherein the step D comprises: for the feature representations of the test images acquired in step C, restoring the resolution of the feature maps to the input image size by bilinear interpolation with a factor of 8, and calculating the Euclidean distance between the two feature maps to obtain the difference image.
9. The method for detecting changes in aerial remote sensing images based on triple semantic relation learning according to claim 1, wherein the step E comprises processing the difference image obtained in step D according to the following strategy:

when d(x_mn) > th, the pixel is set to 255;

when d(x_mn) ≤ th, the pixel is set to 0;

wherein d(x_mn) denotes the distance value of the pixel at coordinates (m, n) of the difference image, and th denotes a constant threshold.
10. The method for detecting changes in aerial remote sensing images based on triple semantic relation learning according to claim 1, wherein multi-temporal remote sensing image data to be detected are preprocessed to obtain the training data set and a test data set;

the preprocessing of the multi-temporal remote sensing image data to be detected comprises:

performing relative radiometric correction on multiple groups of images of the same region at different times by histogram matching, to eliminate radiometric differences between the images of different time phases; and/or

cropping or selecting the registered images to obtain the training data set and the test data set of the remote sensing images, wherein cropping the registered images comprises:

cropping an arbitrary region of each group of two time-phase images as a test region and using the remaining region of the two time-phase images as a training region; and/or

after the two time-phase images of the training region are cropped, applying horizontal and vertical flips and rotations to obtain an expanded training data set.
CN201810385526.4A 2018-04-26 2018-04-26 Aerial remote sensing image change detection method based on triple semantic relation learning Active CN108596108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810385526.4A CN108596108B (en) 2018-04-26 2018-04-26 Aerial remote sensing image change detection method based on triple semantic relation learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810385526.4A CN108596108B (en) 2018-04-26 2018-04-26 Aerial remote sensing image change detection method based on triple semantic relation learning

Publications (2)

Publication Number Publication Date
CN108596108A CN108596108A (en) 2018-09-28
CN108596108B 2021-02-23

Family

ID=63610200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810385526.4A Active CN108596108B (en) 2018-04-26 2018-04-26 Aerial remote sensing image change detection method based on triple semantic relation learning

Country Status (1)

Country Link
CN (1) CN108596108B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558806B (en) * 2018-11-07 2021-09-14 北京科技大学 Method for detecting high-resolution remote sensing image change
CN109635842A (en) * 2018-11-14 2019-04-16 平安科技(深圳)有限公司 A kind of image classification method, device and computer readable storage medium
CN109685141B (en) * 2018-12-25 2022-10-04 合肥哈工慧拣智能科技有限公司 Robot article sorting visual detection method based on deep neural network
CN109919320B (en) * 2019-01-23 2022-04-01 西北工业大学 Triplet network learning method based on semantic hierarchy
CN110059658B (en) * 2019-04-26 2020-11-24 北京理工大学 Remote sensing satellite image multi-temporal change detection method based on three-dimensional convolutional neural network
CN110120020A (en) * 2019-04-30 2019-08-13 西北工业大学 A kind of SAR image denoising method based on multiple dimensioned empty residual error attention network
CN110263644B (en) * 2019-05-21 2021-08-10 华南师范大学 Remote sensing image classification method, system, equipment and medium based on triplet network
CN110378237B (en) * 2019-06-21 2021-06-11 浙江工商大学 Facial expression recognition method based on depth measurement fusion network
CN111178213B (en) * 2019-12-23 2022-11-18 大连理工大学 Aerial photography vehicle detection method based on deep learning
CN112131968A (en) * 2020-09-01 2020-12-25 河海大学 Double-time-phase remote sensing image change detection method based on DCNN
CN112818966B (en) * 2021-04-16 2021-07-30 武汉光谷信息技术股份有限公司 Multi-mode remote sensing image data detection method and system

Citations (3)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106023154A (en) * 2016-05-09 2016-10-12 西北工业大学 Multi-temporal SAR image change detection method based on dual-channel convolutional neural network (CNN)
CN106875395A (en) * 2017-01-12 2017-06-20 西安电子科技大学 Super-pixel level SAR image change detection based on deep neural network
CN107194346A (en) * 2017-05-19 2017-09-22 福建师范大学 A kind of fatigue drive of car Forecasting Methodology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Learning Relationship for Very High Resolution Image Change Detection; Huo Chunlei et al.; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; Aug. 31, 2016; Vol. 9, No. 8; full text *

Also Published As

Publication number Publication date
CN108596108A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
CN108596108B (en) Aerial remote sensing image change detection method based on triple semantic relation learning
CN108573276B (en) Change detection method based on high-resolution remote sensing image
Wang et al. Scene classification of high-resolution remotely sensed image based on ResNet
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN108596101B (en) Remote sensing image multi-target detection method based on convolutional neural network
CN110009010B (en) Wide-width optical remote sensing target detection method based on interest area redetection
JP6397379B2 (en) CHANGE AREA DETECTION DEVICE, METHOD, AND PROGRAM
CN110163213B (en) Remote sensing image segmentation method based on disparity map and multi-scale depth network model
CN108960404B (en) Image-based crowd counting method and device
Asokan et al. Machine learning based image processing techniques for satellite image analysis-a survey
CN107067405B (en) Remote sensing image segmentation method based on scale optimization
CN110309781B (en) House damage remote sensing identification method based on multi-scale spectrum texture self-adaptive fusion
CN109871823B (en) Satellite image ship detection method combining rotating frame and context information
WO2018076138A1 (en) Target detection method and apparatus based on large-scale high-resolution hyper-spectral image
CN110826428A (en) Ship detection method in high-speed SAR image
CN107909018B (en) Stable multi-mode remote sensing image matching method and system
CN106548169A (en) Fuzzy literal Enhancement Method and device based on deep neural network
CN113989662A (en) Remote sensing image fine-grained target identification method based on self-supervision mechanism
CN105389799B (en) SAR image object detection method based on sketch map and low-rank decomposition
CN111640116B (en) Aerial photography graph building segmentation method and device based on deep convolutional residual error network
Song et al. Extraction and reconstruction of curved surface buildings by contour clustering using airborne LiDAR data
CN113610070A (en) Landslide disaster identification method based on multi-source data fusion
CN108734200A (en) Human body target visible detection method and device based on BING features
Awad Toward robust segmentation results based on fusion methods for very high resolution optical image and lidar data
CN116883588A (en) Method and system for quickly reconstructing three-dimensional point cloud under large scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant