CN112861672B - Heterogeneous remote sensing image matching method based on optical-SAR - Google Patents

Heterogeneous remote sensing image matching method based on optical-SAR

Info

Publication number
CN112861672B
CN112861672B (application CN202110111049.4A)
Authority
CN
China
Prior art keywords
image
optical
sar
matching
local neighborhood
Prior art date
Legal status
Active
Application number
CN202110111049.4A
Other languages
Chinese (zh)
Other versions
CN112861672A (en)
Inventor
Li Bin (李斌)
Zhou Shijie (周世杰)
Yin Junkai (阴俊恺)
Zhang Zhengqiang (张正强)
Wu Zhen (吴震)
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202110111049.4A
Publication of CN112861672A
Application granted
Publication of CN112861672B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an optical-SAR-based heterogeneous remote sensing image matching method, which comprises the following steps: S1, collecting training data containing positive and negative samples; S2, training a full convolution backbone network with the training data to obtain an image matching model; and S3, matching the optical image to be matched with the SAR image by using the image matching model. By adaptively learning the detail features of the images with a deep learning method, the invention improves the accuracy and efficiency of remote sensing image matching and is suitable for genuine commercial use.

Description

Heterogeneous remote sensing image matching method based on optical-SAR
Technical Field
The invention relates to the technical field of heterogeneous remote sensing matching, in particular to an optical-SAR-based heterogeneous remote sensing image matching method.
Background
At present, heterogeneous remote sensing matching mostly relies on hand-designed methods, which cannot adequately quantify and exploit the detail features of heterogeneous remote sensing images; this causes substantial precision loss and fails to meet production requirements.
Disclosure of Invention
The invention aims to provide an optical-SAR-based heterogeneous remote sensing image matching method, so as to solve the problems that existing heterogeneous remote sensing matching methods are mostly hand-designed, cannot adequately quantify and exploit the detail features of heterogeneous remote sensing images, suffer large precision loss, and do not meet productization requirements.
The invention provides an optical-SAR-based heterogeneous remote sensing image matching method, which comprises the following steps:
s1, collecting training data containing a positive sample and a negative sample;
s2, training a full convolution backbone network by using training data to obtain an image matching model;
and S3, matching the optical image to be matched with the SAR image by using the image matching model.
Further, step S1 includes the following sub-steps:
s11, acquiring optical images and SAR images of a plurality of same areas;
s12, carrying out image preprocessing on the optical image and the SAR image;
S13, extracting SAR feature points and the optical homonymous feature points corresponding to them; an SAR feature point and its optical homonymous feature point are together called a homonymous point pair;
s14, collecting local neighborhood image blocks in a preset shape with SAR characteristic points as centers in the preprocessed SAR image;
s15, collecting a local neighborhood image block of a preset shape taking an optical homonymous feature point as a center in the preprocessed optical image;
s16, matching the local neighborhood image blocks obtained in the step S14 and the step S15 in pairs, wherein the local neighborhood image blocks corresponding to the same-name point pairs are called same-name image pairs, and the rest local neighborhood image blocks matched in pairs are called heterogeneous image pairs;
S17, the homonymous image pairs are used as positive samples of the training data, and the heterogeneous image pairs are used as negative samples of the training data.
Further, step S13 includes the following sub-steps:
s131, searching feature points in the preprocessed optical image and SAR image by using a Harris algorithm, detecting the feature points by using an SIFT algorithm, and extracting the optical feature points and the SAR feature points;
s132, collecting local neighborhood image blocks in a preset shape by taking the optical characteristic points and the SAR characteristic points as centers respectively;
s133, generating 128-dimensional feature vectors for each local neighborhood image block by using a deep neural network;
s134, calculating Euclidean spatial similarity of feature vectors of image blocks of local neighborhoods of different sources by using a K-means algorithm;
S135, according to the calculated Euclidean spatial similarity, and taking the SAR feature point as the reference, the nearest-neighbor optical feature point is taken as the optical homonymous feature point;
s136, eliminating false-reported optical homonymous feature points by using a RANSAC algorithm to obtain an affine transformation matrix of a matching point set;
s137, calculating an accurate regression coordinate by using the affine transformation matrix of the matching point set, and constructing a matching block with the radius of H by taking the coordinate as a center; secondly, by taking the SAR image characteristic points as a reference, re-matching the optical characteristic points in the matching block as new optical homonymous characteristic points;
and S138, repeating the steps S136 to S137 for a plurality of times to obtain the required optical homonymous feature points.
Further, the network structure of the full convolution backbone network in step S2 includes 1 local response normalization layer LRN and 11 network modules; each network module is composed of a convolutional layer Conv and a batch normalization layer BN and is followed by 1 ReLU activation function; the local response normalization layer LRN is connected after the last ReLU.
Further, the ratio of positive samples to negative samples of the training data input into the full convolution backbone network is controlled to be 1:1.
Further, the method for training the full convolution backbone network in step S2 includes: pre-training the full convolution backbone network with the Brown and HPatches data sets, and then training with the positive and negative samples of the training data to obtain the image matching model, which alleviates the problem of small data volume to a certain extent.
Further, step S3 includes the following sub-steps:
s31, acquiring an optical image and an SAR image to be matched;
s32, carrying out image preprocessing on the optical image to be matched and the SAR image;
s33, searching characteristic points in the preprocessed SAR image to be matched by using a Harris algorithm, detecting the characteristic points by using an SIFT algorithm, and extracting SAR characteristic points;
s34, collecting local neighborhood image blocks in a preset shape with SAR feature points as centers in the preprocessed SAR image to be matched;
s35, acquiring local neighborhood image blocks in different positions and in a preset shape in the preprocessed optical image to be matched;
S36, inputting the local neighborhood image blocks of the optical image to be matched and of the SAR image, in pairs, into the image matching model to obtain the matched local neighborhood image block of the optical image; the optical feature point at the center of the matched local neighborhood image block of the optical image is the optical homonymous feature point.
Further, the image preprocessing method comprises the following steps:
(1) improving the local contrast ratio by using image equalization on the optical image and the SAR image;
(2) gaussian filtering is also used on the SAR image to filter out image noise.
Further, the preset shape of the local neighborhood image block is a square.
Further, the size of the local neighborhood image block is 16 × 16, 32 × 32, or 64 × 64.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. The optical-SAR-based heterogeneous remote sensing image matching method can adaptively learn the detail features of images by using a deep learning method, improves the accuracy and efficiency of remote sensing image matching, and is genuinely ready for commercial use.
2. By applying image equalization to the optical image and the SAR image, the method improves the quality of both images, which raises the effectiveness of feature point extraction and the reliability of the feature extraction stage; by additionally applying Gaussian filtering to the SAR image, false detections during feature point extraction are avoided.
3. The method uses the Harris algorithm and the SIFT algorithm together to extract feature points, which avoids the problem of detecting non-corresponding (different-name) feature points on the optical image and the SAR image.
4. The invention eliminates misreported optical homonymous feature points with the RANSAC algorithm, obtaining more accurate homonymous feature points.
5. The method controls the ratio of positive to negative samples of the training data at 1:1, which ensures that the trained image matching model can accurately pick, from the local neighborhood image blocks of the optical image containing a large number of negative samples, the target image block that best matches the local neighborhood image block of the SAR image, and avoids the situation where the image matching model cannot converge because of excessive negative samples.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flow chart of an optical-SAR heterogeneous remote sensing image matching method according to an embodiment of the present invention.
Fig. 2 is a block diagram of a process for collecting training data containing positive and negative samples according to an embodiment of the present invention.
Fig. 3 is a block diagram of a process of extracting SAR feature points and optical homonymous feature points according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a pair of homonymous images and homonymous feature points according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a homonymous image pair and a heterogeneous image pair according to an embodiment of the invention.
Fig. 6 is a schematic network structure diagram of the full-convolution backbone network according to the present invention.
Fig. 7 is a block diagram of a process of matching an optical image to be matched with an SAR image by using an image matching model according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
Referring to fig. 1, the present embodiment provides an optical-SAR-based heterogeneous remote sensing image matching method, where the matching method includes the following steps:
s1, collecting training data containing positive and negative samples, see fig. 2:
s11, acquiring optical images and SAR images of a plurality of same areas;
S12, image preprocessing is carried out on the optical image and the SAR image: (1) image equalization is applied to both the optical image and the SAR image to improve local contrast; (2) Gaussian filtering is additionally applied to the SAR image to filter out image noise. Original optical and SAR images are often of poor quality, frequently over- or under-exposed; image equalization improves the quality of both images, which raises the effectiveness of feature point extraction and the reliability of the feature extraction stage, and the additional Gaussian filtering of the SAR image avoids false detections when feature points are extracted. A minimal preprocessing sketch follows.
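Under the assumption that the "image equalization" is global histogram equalization and that a 5 × 5 Gaussian kernel is used (the patent specifies neither), a minimal OpenCV sketch of this preprocessing could be:

```python
import cv2

def preprocess(optical_gray, sar_gray, ksize=5):
    """Step S12 sketch: equalize both images, then Gaussian-filter the SAR image.

    optical_gray, sar_gray: single-channel uint8 images.
    ksize: assumed Gaussian kernel size (not specified in the patent).
    """
    # (1) image equalization improves local contrast on both images
    optical_eq = cv2.equalizeHist(optical_gray)
    sar_eq = cv2.equalizeHist(sar_gray)
    # (2) Gaussian filtering additionally suppresses noise on the SAR image
    sar_eq = cv2.GaussianBlur(sar_eq, (ksize, ksize), 0)
    return optical_eq, sar_eq
```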
S13, extracting SAR feature points and the optical homonymous feature points corresponding to them; an SAR feature point and its optical homonymous feature point are together called a homonymous point pair, see Fig. 3:
S131, feature points are searched in the preprocessed optical image and SAR image with the Harris algorithm, detected with the SIFT algorithm, and the optical feature points and SAR feature points are extracted. For optical and SAR images the gray value generally changes abruptly at object edges and similar regions, where the Harris algorithm detects feature points well; however, Harris requires the gray value to change markedly in every direction, so the number of detected feature points is too small, and the feature points detected on the optical image and the SAR image may not even correspond to each other. The SIFT algorithm is therefore used to assist the extraction and resolve this problem; a sketch of the combined detection follows.
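A sketch of this combined detection, assuming OpenCV's Harris response (via goodFeaturesToTrack) and SIFT detector as stand-ins; the thresholds and the duplicate-merging radius are illustrative values, not from the patent:

```python
import cv2
import numpy as np

def detect_feature_points(img, max_corners=2000, quality=0.01):
    """Step S131 sketch: Harris corners plus SIFT keypoints, merged."""
    # Harris detection via goodFeaturesToTrack with the Harris response
    corners = cv2.goodFeaturesToTrack(img, maxCorners=max_corners,
                                      qualityLevel=quality, minDistance=5,
                                      useHarrisDetector=True)
    harris_pts = [tuple(c.ravel()) for c in corners] if corners is not None else []
    # SIFT supplements Harris, which needs gray-value change in every direction
    sift_pts = [kp.pt for kp in cv2.SIFT_create().detect(img, None)]
    # merge the two sets, dropping SIFT points too close to a Harris corner
    merged = harris_pts + [p for p in sift_pts
                           if all(np.hypot(p[0] - q[0], p[1] - q[1]) > 3
                                  for q in harris_pts)]
    return merged
```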
S132, local neighborhood image blocks of a preset shape are collected, centered on the optical feature points and the SAR feature points respectively. In this embodiment the preset shape of the local neighborhood image block is a square; typically the size of the block is 16 × 16, 32 × 32 or 64 × 64. A block-collection sketch follows.
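A sketch of the block collection, using the 64 × 64 size from the list above (the border handling is an assumption):

```python
def extract_patch(img, center, size=64):
    """Steps S132/S14/S15 sketch: the square local neighborhood image block
    centered on a feature point, or None if it would leave the image."""
    x, y = int(round(center[0])), int(round(center[1]))
    h = size // 2
    if x - h < 0 or y - h < 0 or x + h > img.shape[1] or y + h > img.shape[0]:
        return None
    return img[y - h:y + h, x - h:x + h]
```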
S133, a 128-dimensional feature vector is generated for each local neighborhood image block by a deep neural network; existing techniques can be adopted for this feature extraction step, so it is not described further;
S134, the Euclidean spatial similarity of the feature vectors of local neighborhood image blocks from different sources (i.e., from the optical image and from the SAR image respectively) is calculated using the K-means algorithm; this embodiment uses the L2 distance as the Euclidean spatial similarity;
S135, according to the calculated Euclidean spatial similarity and taking each SAR feature point as the reference, the nearest-neighbor optical feature point is taken as its optical homonymous feature point. A sketch of steps S133 to S135 is given below.
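A sketch of steps S133 to S135, assuming the 128-dimensional descriptors have already been produced by the deep network of step S133 (array names are illustrative):

```python
import numpy as np

def nearest_optical_points(sar_desc, opt_desc, opt_pts):
    """For each SAR descriptor (rows of sar_desc, shape [m, 128]), pick the
    optical feature point whose descriptor (rows of opt_desc, shape [k, 128])
    has the smallest L2 distance, as its candidate homonymous point."""
    # pairwise L2 distances, shape [m, k] (fine as a sketch; memory-heavy at scale)
    d = np.linalg.norm(sar_desc[:, None, :] - opt_desc[None, :, :], axis=-1)
    nn = d.argmin(axis=1)
    return [opt_pts[j] for j in nn], d[np.arange(len(nn)), nn]
```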
S136, the misreported optical homonymous feature points are eliminated with the RANSAC algorithm to obtain the affine transformation matrix of the matching point set. The accuracy of the optical homonymous feature points obtained in step S135 needs further improvement, so the invention uses the RANSAC algorithm to eliminate misreported points. RANSAC is prior art; in this embodiment: several optical homonymous feature points are selected at random, an affine transformation matrix is computed by least squares, and the goodness of the model is quantified by the number of homonymous points the matrix can match. The loop repeats until a model meeting the requirement is obtained or the maximum number of iterations is reached, and then exits; the resulting matrix is the affine transformation matrix of the matching point set. A sketch using an off-the-shelf RANSAC estimator follows.
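A sketch that delegates the RANSAC loop described above to OpenCV's affine estimator (the reprojection threshold is an assumed value):

```python
import cv2
import numpy as np

def ransac_affine(sar_pts, opt_pts, thresh=3.0):
    """Step S136 sketch: estimate the affine transform of the matching point
    set with RANSAC and flag misreported homonymous points as outliers."""
    src = np.float32(sar_pts).reshape(-1, 1, 2)
    dst = np.float32(opt_pts).reshape(-1, 1, 2)
    M, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                      ransacReprojThreshold=thresh)
    if M is None:                                # estimation failed
        return None, None
    return M, inliers.ravel().astype(bool)       # M: 2x3 affine matrix
```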
S137, an accurate regression coordinate is calculated with the affine transformation matrix of the matching point set, and a matching block of radius H is constructed centered on that coordinate; then, taking the SAR image feature points as the reference, the optical feature points inside the matching block are re-matched as the new optical homonymous feature points;
S138, steps S136 to S137 are repeated several times to obtain the required optical homonymous feature points. A sketch of the re-matching step is given below.
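A sketch of the regression and re-matching of step S137, with the matching-block radius H left as a parameter (the patent does not give its value):

```python
import numpy as np

def rematch(M, sar_pts, opt_pts, H=10):
    """Project each SAR point through the 2x3 affine matrix M to a regressed
    coordinate, then re-match it to the nearest optical point inside the
    radius-H matching block centered on that coordinate."""
    reg = np.float32(sar_pts) @ M[:, :2].T + M[:, 2]   # regressed coordinates
    opt = np.float32(opt_pts)
    pairs = []
    for s, r in zip(sar_pts, reg):
        d = np.linalg.norm(opt - r, axis=1)
        j = int(d.argmin())
        if d[j] <= H:                                  # inside the matching block
            pairs.append((tuple(s), tuple(opt[j])))
    return pairs
```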
s14, collecting local neighborhood image blocks in a preset shape with SAR characteristic points as centers in the preprocessed SAR image; as before, the preset shape of the local neighborhood image block here is also square; likewise, the size of the local neighborhood image block is 16 × 16, 32 × 32 or 64 × 64;
s15, collecting a local neighborhood image block in a preset shape with an optical homonymous feature point as a center in the preprocessed optical image; as before, the preset shape of the local neighborhood image block here is also square; likewise, the size of the local neighborhood image block is 16 × 16, 32 × 32 or 64 × 64;
S16, matching the local neighborhood image blocks obtained in step S14 and step S15 in pairs, wherein the local neighborhood image blocks corresponding to a homonymous point pair are called a homonymous image pair, and the remaining pairwise-matched local neighborhood image blocks are called heterogeneous image pairs; as shown in Fig. 4, the square frames in the optical image and the SAR image form a homonymous image pair, and their centers form a homonymous point pair.
S17, the homonymous image pairs are used as positive samples of the training data, and the heterogeneous image pairs are used as negative samples of the training data.
S2, training a full convolution backbone network by using training data to obtain an image matching model;
as shown in fig. 5, for positive samples far lower in number than negative samples, any two heterogeneous image pairs can constitute a negative sample in addition to the homonymous image pair; however, for hundreds or thousands of homonymous image pairs in a batch, traversing all negative examples would increase the computational complexity significantly, and having many negative examples would not actually result in an effective gradient update. Therefore, the proportion of the positive samples and the negative samples of the training data input into the full convolution backbone network is controlled to be 1:1, so that the trained image matching model can ensure that the target image block which is most matched with the local neighborhood image block of the SAR image can be accurately matched from the local neighborhood image block of the optical image containing a large number of negative samples, and the situation that the image matching model cannot be converged due to excessive negative samples is avoided.
As shown in Fig. 6, the network structure of the full convolution backbone network in step S2 includes 1 local response normalization layer LRN and 11 network modules; each network module is composed of a convolutional layer Conv and a batch normalization layer BN and is followed by 1 ReLU activation function, and the local response normalization layer LRN is connected after the last ReLU. A fine label loss function is defined, the positive and negative samples of the training data are input into the full convolution backbone network, and the network is trained with the defined loss function to obtain the image matching model. A minimal sketch of the backbone is given below; the loss function is described after it.
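A minimal PyTorch sketch of such a backbone. Only the module count, the Conv + BN + ReLU ordering, the trailing LRN, and the 128-dimensional descriptor come from the text; the channel widths, kernel sizes, strides, input patch size, and final pooling are assumptions:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, stride=1):
    """One network module: Conv + BN, followed by its ReLU."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class Backbone(nn.Module):
    """11 Conv+BN+ReLU modules, with a local response normalization layer
    after the last ReLU; outputs one 128-d descriptor per input patch."""
    def __init__(self):
        super().__init__()
        widths = [(1, 32), (32, 32), (32, 64), (64, 64), (64, 64), (64, 128),
                  (128, 128), (128, 128), (128, 128), (128, 128), (128, 128)]
        strides = [1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1]    # assumed downsampling
        self.features = nn.Sequential(
            *[conv_block(ci, co, s) for (ci, co), s in zip(widths, strides)],
            nn.LocalResponseNorm(size=5),              # LRN after the last ReLU
        )

    def forward(self, x):                              # x: [B, 1, 64, 64]
        f = self.features(x)
        f = nn.functional.adaptive_avg_pool2d(f, 1).flatten(1)
        return nn.functional.normalize(f, dim=1)       # unit-norm 128-d descriptor
```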
The loss function is constructed as follows. Denote an image block by $x_{\tau_i}^j$, where $\tau_i$ identifies the source image ($i = 1$ for the optical image, $i = 2$ for the SAR image) and the superscript $j$ indexes the block within image $\tau_i$; let $c(x_{\tau_i}^j)$ be the coordinates of the block's center point. Suppose an image block pair participating in training comes from the same heterogeneous image pair $(\tau_1, \tau_2)$, and let $x_{ij}$ be the distance between the two center points (i.e., between the coordinates of the key feature points corresponding to the blocks). Each training sample is passed through the improved L2-Net to obtain its descriptive feature vector, where $n$ denotes the number of same-name image block pairs in a batch (the batch size) during training and $p$ denotes the feature vector output by L2-Net. After sampling, a spatial distance matrix $L = [l_{ij}]_{n \times n}$ and a feature distance matrix $D = [d_{ij}]_{n \times n}$ are obtained, where, in this embodiment, $d_{ij}$ is the L2 distance between the descriptors of the $i$-th block from one source and the $j$-th block from the other.
A smaller $x_{ij}$ means a larger overlapping area and hence more similar blocks; when $x_{ij}$ is larger than the block radius $b$, the blocks have no overlapping area. Likewise, when two training samples do not belong to the same heterogeneous image pair they have no overlapping region, and the entries $l_{ij}$ of the spatial distance matrix are assigned accordingly; here $\theta$ is a magnification factor, set to 1 in this embodiment, and the larger $\theta$ is, the more sensitive the loss is to changes in the key-feature-point distance. From the homonymy information of the images a label matrix $Y = [y_{ij}]_{n \times n}$ is obtained, with $y_{ij} = 1$ when blocks $i$ and $j$ form a same-name pair and $y_{ij} = 0$ otherwise. The fine label loss function then combines the feature distance matrix $D$ with the spatial distance matrix $L$ under the labels $Y$. (The exact expressions for $d_{ij}$, $l_{ij}$ and the loss are given in the original only as formula images.)
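A sketch of the recoverable part of this construction: the feature distance matrix $D$ computed from a batch of descriptors, and a label matrix $Y$ that marks the same-name pairs on the diagonal. The exact $l_{ij}$ and loss expressions survive only as formula images in the source, so they are not sketched here:

```python
import torch

def distance_and_labels(p_opt, p_sar):
    """p_opt, p_sar: [n, 128] descriptors for the optical and SAR blocks of a
    batch, ordered so that row i on each side is the i-th same-name pair.
    Returns D = [d_ij] (pairwise L2 feature distances) and Y = [y_ij]."""
    D = torch.cdist(p_opt, p_sar, p=2)              # d_ij = L2 descriptor distance
    Y = torch.eye(len(p_opt), device=p_opt.device)  # 1 for same-name pairs, else 0
    return D, Y
```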
in some embodiments, the full convolution backbone network may be pre-trained using Brown and HPatches data sets, and then trained using positive and negative samples of training data to obtain an image matching model, which may eliminate the problem of small data size to some extent.
S3, matching the optical image to be matched with the SAR image by using the image matching model, see fig. 7:
s31, acquiring an optical image and an SAR image to be matched;
S32, image preprocessing is carried out on the optical image to be matched and the SAR image, in the same way as step S12: (1) image equalization is applied to the optical image and the SAR image to be matched to improve local contrast; (2) Gaussian filtering is applied to the SAR image to be matched to filter out image noise.
S33, searching characteristic points in the preprocessed SAR image to be matched by using a Harris algorithm, detecting the characteristic points by using an SIFT algorithm, and extracting SAR characteristic points;
s34, collecting local neighborhood image blocks in a preset shape with SAR feature points as centers in the preprocessed SAR image to be matched; similarly, the preset shape of the local neighborhood image block is a square, and the size of the local neighborhood image block is 16 × 16, 32 × 32 or 64 × 64;
s35, collecting local neighborhood image blocks in different positions and in a preset shape in the preprocessed optical image to be matched; similarly, the preset shape of the local neighborhood image block is a square, and the size of the local neighborhood image block is 16 × 16, 32 × 32 or 64 × 64;
S36, the local neighborhood image blocks of the optical image to be matched and of the SAR image are input, in pairs, into the image matching model to obtain the matched local neighborhood image block of the optical image; the optical feature point at the center of that block is the optical homonymous feature point. An inference-time sketch is given below.
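An inference-time sketch under the same assumptions as the backbone sketch above (function and variable names are illustrative):

```python
import numpy as np
import torch

def match_sar_block(model, sar_patch, opt_patches, opt_centers):
    """Step S36 sketch: score one SAR local neighborhood block against the
    candidate optical blocks and return the center of the best-matching one,
    i.e., the optical homonymous feature point."""
    model.eval()
    to_t = lambda a: torch.from_numpy(np.asarray(a, np.float32) / 255.0)[None, None]
    with torch.no_grad():
        d_sar = model(to_t(sar_patch))                            # [1, 128]
        d_opt = torch.cat([model(to_t(p)) for p in opt_patches])  # [k, 128]
        dists = torch.cdist(d_sar, d_opt)[0]                      # L2 distances
    return opt_centers[int(dists.argmin())]
```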
With the optical-SAR-based heterogeneous remote sensing image matching method of the invention, the detail features of the images are learned adaptively by a deep learning method, which improves the accuracy and efficiency of remote sensing image matching and makes the method genuinely ready for commercial use.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A heterogeneous remote sensing image matching method based on optical-SAR is characterized by comprising the following steps:
s1, collecting training data containing a positive sample and a negative sample;
s2, training a full convolution backbone network by using training data to obtain an image matching model;
s3, matching the optical image to be matched with the SAR image by using the image matching model;
step S1 includes the following sub-steps:
s11, acquiring optical images and SAR images of a plurality of same areas;
s12, carrying out image preprocessing on the optical image and the SAR image;
S13, extracting SAR feature points and the optical homonymous feature points corresponding to them; an SAR feature point and its optical homonymous feature point are together called a homonymous point pair;
s14, collecting local neighborhood image blocks in a preset shape with SAR characteristic points as centers in the preprocessed SAR image;
s15, collecting a local neighborhood image block in a preset shape with an optical homonymous feature point as a center in the preprocessed optical image;
s16, matching the local neighborhood image blocks obtained in the step S14 and the step S15 in pairs, wherein the local neighborhood image blocks corresponding to the same-name point pairs are called same-name image pairs, and the rest local neighborhood image blocks matched in pairs are called heterogeneous image pairs;
S17, using the homonymous image pairs as positive samples of the training data, and using the heterogeneous image pairs as negative samples of the training data;
step S13 includes the following sub-steps:
s131, searching feature points in the preprocessed optical image and SAR image by using a Harris algorithm, detecting the feature points by using an SIFT algorithm, and extracting the optical feature points and the SAR feature points;
s132, respectively collecting a local neighborhood image block in a preset shape by taking the optical characteristic point and the SAR characteristic point as centers;
s133, generating 128-dimensional feature vectors for each local neighborhood image block by using a deep neural network;
s134, calculating Euclidean spatial similarity of feature vectors of image blocks of local neighborhoods of different sources by using a K-means algorithm;
S135, according to the calculated Euclidean spatial similarity, taking the SAR feature point as the reference, taking the nearest-neighbor optical feature point as the optical homonymous feature point;
s136, eliminating false-reported optical homonymous feature points by using a RANSAC algorithm to obtain an affine transformation matrix of a matching point set;
s137, calculating an accurate regression coordinate by using the affine transformation matrix of the matching point set, and constructing a matching block with the radius of H by taking the coordinate as a center; secondly, by taking the SAR image characteristic points as a reference, re-matching the optical characteristic points in the matching block as new optical homonymous characteristic points;
and S138, repeating the steps S136 to S137 for a plurality of times to obtain the required optical homonymous feature points.
2. The optical-SAR-based heterogeneous remote sensing image matching method according to claim 1, wherein the network structure of the full convolution backbone network in step S2 comprises 1 local response normalization layer LRN and 11 network modules, each network module is composed of one convolutional layer Conv and one batch normalization layer BN, and each network module is followed by 1 ReLU activation function; the local response normalization layer LRN is connected after the last ReLU.
3. The optical-SAR-based heterogeneous remote sensing image matching method according to claim 2, wherein the ratio of positive samples to negative samples of the training data input into the full convolution backbone network is controlled to be 1:1.
4. The optical-SAR-based heterogeneous remote sensing image matching method according to claim 3, wherein the method for training the full convolution backbone network in step S2 comprises: pre-training the full convolution backbone network with the Brown and HPatches data sets, and then training with the positive and negative samples of the training data to obtain the image matching model, which alleviates the problem of small data volume to a certain extent.
5. The optical-SAR-based heterogeneous remote sensing image matching method according to claim 4, wherein the step S3 comprises the following sub-steps:
s31, acquiring an optical image and an SAR image to be matched;
s32, carrying out image preprocessing on the optical image to be matched and the SAR image;
s33, searching characteristic points in the preprocessed SAR image to be matched by using a Harris algorithm, detecting the characteristic points by using an SIFT algorithm, and extracting SAR characteristic points;
s34, collecting local neighborhood image blocks in a preset shape with SAR feature points as centers in the preprocessed SAR image to be matched;
s35, collecting local neighborhood image blocks in different positions and in a preset shape in the preprocessed optical image to be matched;
S36, inputting the local neighborhood image blocks of the optical image to be matched and of the SAR image, in pairs, into the image matching model to obtain the matched local neighborhood image block of the optical image; the optical feature point at the center of the matched local neighborhood image block of the optical image is the optical homonymous feature point.
6. The optical-SAR-based heterogeneous remote sensing image matching method according to claim 5, wherein the image preprocessing method comprises the following steps:
(1) improving the local contrast ratio by using image equalization on the optical image and the SAR image;
(2) gaussian filtering is also used on the SAR image to filter out image noise.
7. The optical-SAR-based heterogeneous remote sensing image matching method according to any one of claims 1-6, wherein the preset shape of the local neighborhood image block is a square.
8. The optical-SAR-based heterogeneous remote sensing image matching method according to claim 7, wherein the size of the local neighborhood image blocks is 16 × 16, 32 × 32 or 64 × 64.
CN202110111049.4A 2021-01-27 2021-01-27 Heterogeneous remote sensing image matching method based on optical-SAR Active CN112861672B (en)

Priority Applications (1)

Application Number: CN202110111049.4A; Priority Date: 2021-01-27; Filing Date: 2021-01-27; Title: Heterogeneous remote sensing image matching method based on optical-SAR

Applications Claiming Priority (1)

Application Number: CN202110111049.4A; Priority Date: 2021-01-27; Filing Date: 2021-01-27; Title: Heterogeneous remote sensing image matching method based on optical-SAR

Publications (2)

CN112861672A (en), published 2021-05-28
CN112861672B (en), published 2022-08-05

Family

ID=76009596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110111049.4A Active CN112861672B (en) 2021-01-27 2021-01-27 Heterogeneous remote sensing image matching method based on optical-SAR

Country Status (1)

Country Link
CN (1) CN112861672B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538536B (en) * 2021-07-21 2022-06-07 中国人民解放军国防科技大学 SAR image information-assisted remote sensing optical image dense cloud detection method and system
CN114581771B (en) * 2022-02-23 2023-04-25 南京信息工程大学 Method for detecting collapse building by high-resolution heterogeneous remote sensing
CN114565653B (en) * 2022-03-02 2023-07-21 哈尔滨工业大学 Heterologous remote sensing image matching method with rotation change and scale difference
CN115019071B (en) * 2022-05-19 2023-09-19 昆明理工大学 Optical image and SAR image matching method and device, electronic equipment and medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1736928A1 (en) * 2005-06-20 2006-12-27 Mitsubishi Electric Information Technology Centre Europe B.V. Robust image registration
CN108510532B (en) * 2018-03-30 2022-07-15 西安电子科技大学 Optical and SAR image registration method based on deep convolution GAN
CN109035315A (en) * 2018-08-28 2018-12-18 武汉大学 Merge the remote sensing image registration method and system of SIFT feature and CNN feature
CN109492108B (en) * 2018-11-22 2020-12-15 上海唯识律简信息科技有限公司 Deep learning-based multi-level fusion document classification method and system
US11494615B2 (en) * 2019-03-28 2022-11-08 Baidu Usa Llc Systems and methods for deep skip-gram network based text classification
CN110609897B (en) * 2019-08-12 2023-08-04 北京化工大学 Multi-category Chinese text classification method integrating global and local features
CN111476251A (en) * 2020-03-26 2020-07-31 中国人民解放军战略支援部队信息工程大学 Remote sensing image matching method and device
CN111709980A (en) * 2020-06-10 2020-09-25 北京理工大学 Multi-scale image registration method and device based on deep learning
CN112085772B (en) * 2020-08-24 2022-10-18 南京邮电大学 Remote sensing image registration method and device

Also Published As

CN112861672A (en), published 2021-05-28


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant