CN108510532B - Optical and SAR image registration method based on deep convolution GAN

Optical and SAR image registration method based on deep convolution GAN

Info

Publication number
CN108510532B
CN108510532B (application CN201810276562.7A)
Authority
CN
China
Prior art keywords
image
network
image block
matching
layer
Prior art date
Legal status
Active
Application number
CN201810276562.7A
Other languages
Chinese (zh)
Other versions
CN108510532A (en)
Inventor
权豆
焦李成
王爽
王若静
梁雪峰
方帅
马晶晶
孙莉
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
2018-03-30
Filing date
2018-03-30
Application filed by Xidian University
Publication of CN108510532A: 2018-09-07
Application granted; publication of CN108510532B: 2022-07-15
Legal status: Active

Classifications

    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T2207/10024 Color image
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G06T2207/10044 Radar image
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30181 Earth observation


Abstract

The invention discloses an optical and SAR image registration method based on a deep convolution GAN. The method comprises the steps of obtaining training samples, constructing two generative adversarial networks, training the generative adversarial networks, expanding the training sample data, constructing a feature extraction and matching network, training the feature extraction and matching network with a cross-iteration strategy, predicting the matching relation, removing mismatched points, calculating a geometric transformation matrix, and registering the images. The method addresses the problems of insufficient training data for neural networks, low sample diversity, and loss of spatial information in the extracted image features in the prior art; it effectively improves the robustness of heterogeneous image registration and achieves higher-precision registration of SAR and optical images.

Description

Optical and SAR image registration method based on deep convolution GAN
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an optical and SAR image registration method based on a deep convolution GAN.
Background
In recent years, image registration technology has been widely applied in fields such as the military, scientific research and daily life, in particular remote sensing image analysis, medical image recognition and processing, and computer vision in artificial intelligence. The complementary information between images from different sensors is very beneficial for acquiring complete image information, so heterogeneous image registration has important research value and significance.
Existing image registration methods fall into two main categories: intensity-based methods and methods based on local salient features. Intensity-based methods measure the similarity between images using global or local gray-level information; they are simple to implement but computationally expensive and sensitive to noise. Methods based on local salient features first extract salient features from a given image, then match the features, and obtain an image transformation model and its parameters from the positional correspondence between the features. These methods have a small computational cost, low complexity, strong adaptability to gray-level changes and strong robustness, and are widely used at present. However, the features of heterogeneous images differ greatly, in particular because of the speckle noise present in SAR images, and conventional feature-based registration methods cannot adapt to this difference, which easily leads to erroneous matching results. Deep neural networks, an active research topic in recent years, can extract more essential feature information from an image than traditional feature-based registration methods and can improve the accuracy and robustness of heterogeneous image registration. Deep generative adversarial networks can be used to expand the diversity and number of training data samples.
A method for removing mismatched points based on improved SIFT and spatial constraints is proposed by Bin Fan et al. in the paper "On the Point availability of Conditional additive Networks for Multi-Sensor Image Matching" (IEEE Geoscience and Remote Sensing Letters 10(3), 2013). The method first extracts features from the optical image and the SAR image with an improved SIFT method, performs coarse feature matching with a nearest-neighbour method, removes mismatched points using the spatial relationship between the images to obtain the final matching points, and then calculates the transformation for image registration to complete the registration. The drawback of this method is that the image features extracted from the optical and SAR images by the improved-SIFT approach differ greatly; in particular, the features extracted from the SAR image are easily affected by speckle noise, which produces many wrong matching points, and the spatial-relationship constraint is strongly affected by noise, so the registration robustness is poor.
An image-patch feature extraction and matching method based on a two-branch feature-vector neural network structure is proposed by Xufeng Han et al. in the paper "MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching" (In CVPR, IEEE, 2015). The method uses a convolutional network to extract a feature vector from each image block to be registered, connects the feature vectors, and feeds them into a matching network for matching detection. Its drawbacks are that the feature extraction network produces only a feature vector for each image block, so the spatial information of the image features is lost, which is unfavourable for the matching network's decision, and that the method is only used for matching detection between image blocks from the same sensor, not for registration between images from different sensors.
Disclosure of Invention
The invention aims to provide an optical and SAR image registration method based on a deep convolution GAN that overcomes the poor accuracy and robustness of existing methods for the heterogeneous registration of optical and SAR images.
In order to achieve the purpose, the invention adopts the technical scheme that:
the invention provides an optical and SAR image registration method based on a deep convolution GAN, which comprises the following steps:
step 1), acquiring training samples from an optical image and an SAR image to obtain training samples for the generative adversarial networks and training sample image block pairs for the matching network;
step 2), constructing two generative adversarial networks with the same network structure, each comprising a generator network and a discriminator network;
step 3), training the generative adversarial networks: inputting the training samples obtained in step 1) into the two generative adversarial networks G1 and G2 constructed in step 2), where the first network G1 converts SAR image blocks into optical image blocks and the second network G2 converts optical image blocks into SAR image blocks, and then training them separately to obtain the training weights of the two generator networks;
step 4), expanding the training sample data: first, inputting the SAR image blocks of the matching-network training sample image block pairs obtained in step 1) into the trained generator network G1 to obtain converted optical image blocks, where each converted optical image block is combined with its corresponding SAR image block to form a positive sample; then keeping the order of the SAR image blocks unchanged, shuffling the order of the converted optical image blocks, and combining them to form negative samples; second, inputting the optical image blocks of the matching-network training sample image block pairs obtained in step 1) into the trained generator network G2 to obtain converted SAR image blocks, where each converted SAR image block is combined with its corresponding optical image block to form a positive sample; then keeping the order of the optical image blocks unchanged, shuffling the order of the converted SAR image blocks, and combining them to form negative samples; finally, shuffling the order of the obtained positive and negative samples respectively to obtain the expanded training sample data;
step 5), constructing a feature extraction and matching network;
step 6), training the feature extraction and matching network, and obtaining the training weight of the trained feature extraction and matching network by adopting a cross iterative training strategy;
step 7), predicting the matching relation: first, inputting the images to be registered, namely an SAR image I1 and an optical image I2; second, extracting SIFT feature points A from the two images I1 and I2 respectively, and then, taking each extracted SIFT feature point A as a centre, cutting SAR image blocks and optical image blocks of size 32 × 32 from the SAR image I1 and the optical image I2 respectively; finally, performing feature extraction and matching judgment on the obtained SAR and optical image blocks with the trained feature extraction and matching network, and outputting the matching prediction value of the centre point of each image block;
step 8), removing mismatching points;
step 9), calculating a geometric transformation matrix: calculating a geometric transformation matrix T between the images to be registered by using a least square method;
step 10), registering the images: applying the geometric transformation matrix T to the optical image I2 to perform the geometric transformation and obtain the final registration result of I1 and I2.
Preferably, in step 1), the method for obtaining the training sample is as follows:
(1a) reading in the registered SAR image and the optical image;
(1b) extracting SIFT feature points A from the optical image by adopting an SIFT method, and cutting 32 × 32 optical image blocks on the optical image by taking the coordinates of the extracted SIFT feature points A as a central point; meanwhile, taking a feature point B corresponding to the coordinate of the SIFT feature point A from the SAR image;
(1c) Randomly selecting a feature point C on the SAR image, then respectively taking the coordinates of the feature point B and the feature point C as centers on the SAR image, cutting two SAR image blocks of 32 x 32, and finally respectively combining the two SAR image blocks with corresponding optical image blocks to respectively obtain a positive sample image block pair and a negative sample image block pair of the training data;
(1d) rotating the SAR image by an angle within a given range, calculating the transformed position of the extracted feature point B after rotation according to the rotation matrix of the SAR image while keeping the optical image unchanged, and repeating step (1c);
(1e) taking all the obtained positive samples as training samples for generating the confrontation network;
(1f) and mixing all the obtained positive and negative samples to obtain a training sample image block pair of the matching network.
Preferably, in step 2), the generator network comprises four convolutional layers and four deconvolution layers; each of the four convolutional layers is followed by a max-pooling layer, and the numbers of filters of the first to fourth convolutional layers are 32, 64, 128 and 256 respectively; each max-pooling layer has a size of 2 × 2;
the numbers of filters of the first to fourth deconvolution layers are 128, 64, 32 and 1 respectively, and the filter size in both the convolutional and deconvolution layers is 3 × 3;
the discriminator network comprises five convolutional layers and three fully connected layers; each convolutional layer is followed by a max-pooling layer of size 2 × 2; the numbers of filters of the first to fifth convolutional layers are 32, 64, 128 and 256 respectively, with a filter size of 3 × 3; the numbers of nodes of the first to third fully connected layers are 512, 128 and 1 respectively.
Preferably, in step 3), the generative adversarial networks are trained according to the loss functions of the following formulas:

L_{cGAN}(G, D) = \mathbb{E}_{x, y \sim p_{data}(x, y)}[\log D(x, y)] + \mathbb{E}_{x \sim p_{data}(x),\, z \sim p_z(z)}[\log(1 - D(x, G(x, z)))]

L_{L1}(G) = \mathbb{E}_{x, y \sim p_{data}(x, y),\, z \sim p_z(z)}\left[\left\| y - G(x, z) \right\|_1\right]

G^* = \arg\min_G \max_D \; L_{cGAN}(G, D) + \lambda\, L_{L1}(G)

wherein L_{cGAN}(G, D) represents the adversarial loss constraint of the generator and the discriminator, and L_{L1}(G) represents the pixel-level constraint between the image block produced by the generator and the real image block; D(x, y) represents the matching prediction of the discriminator on the image block pair (x, y), G(x, z) represents the output image block of the generator, and D(x, G(x, z)) represents the matching prediction of the discriminator on the image block pair (x, G(x, z)); the adversarial loss constraint requires the discriminator to correctly distinguish real image blocks from the pseudo image blocks generated by the generator; x represents the image block to be converted, y represents the target image block, z represents the input noise data, E represents the mathematical expectation, x, y ~ p_data(x, y) denotes that the variables (x, y) obey the data distribution p_data(x, y), ||·||_1 denotes the L1 norm, λ denotes a coefficient constant, and z ~ p_z(z) denotes that the variable z obeys the data distribution p_z(z).
Preferably, in step 5), (5a), a feature extraction network is constructed: the feature extraction network is a two-branch framework whose branches extract the feature information of the SAR and optical image blocks respectively; the two branches have the same network structure but do not share network weights; each branch comprises three convolutional layers, each followed by a max-pooling layer of size 2 × 2; the numbers of filters of the first to third convolutional layers are 32, 64 and 128 respectively, with a filter size of 3 × 3;
(5b), a matching network is constructed: it comprises one convolutional layer and three fully connected layers; the convolutional layer is followed by a max-pooling layer of size 2 × 2, has 256 filters and a filter size of 3 × 3; the numbers of nodes of the first to third fully connected layers are 512, 128 and 1 respectively;
(5c), the feature extraction network of the two-branch framework is used to extract the feature maps of the SAR and optical image blocks respectively; the two extracted feature maps are connected and input into the matching network for matching judgment, and the matching label of the image block pair is output.
Preferably, the feature extraction and matching network is trained in step 5) according to the loss function of the following formula:

L = -\frac{1}{n}\sum_{i=1}^{n}\left[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\,\right]

wherein y_i represents the true matching label of the i-th image block pair, \hat{y}_i represents the predicted matching probability obtained for the i-th image block pair by the feature extraction and matching network, and n is the number of image block pairs.
Preferably, in step 6), (6a), the feature extraction and matching network constructed in step 5) is trained with the matching-network training samples obtained in step 1) for one iteration;
(6b), the feature extraction and matching network constructed in step 5) is trained with the expanded-data training samples obtained in step 4) for one iteration;
(6c), steps (6a) and (6b) are repeated in turn for 40 cross-iterations to obtain the training weights of the trained feature extraction and matching network.
Preferably, in step 8), the method for removing the mismatch point is as follows:
(8a) according to the images to be registered I1 and I2, calculating the edge correlation between the image blocks of each pair with matching label 1 in the candidate matching point set, and setting the label of any matching point that does not satisfy the correlation condition to 0;
(8b) performing multiple iterations of the RANSAC method and screening the matching points from the candidate point set.
Preferably, in step 9), the geometric transformation matrix T is calculated as follows:

T = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}

wherein T represents the geometric transformation matrix between the two images I1 and I2, θ represents the rotation angle of image I2 relative to image I1, s represents the scaling of image I2 relative to image I1, cos(·) represents the cosine function, sin(·) represents the sine function, t_x represents the horizontal translation parameter and t_y represents the vertical translation parameter.
Compared with the prior art, the invention has the beneficial effects that:
First, the optical and SAR image registration method based on the deep convolution GAN uses generative adversarial networks to expand the original training data, which overcomes the insufficient amount of neural-network training data in the prior art and improves the accuracy of the registration result. The data expansion by the generative adversarial networks also increases the diversity of the training samples, overcoming the problem of low sample diversity in the prior art and helping to improve the robustness of image registration.
Second, the invention uses an improved neural network to extract and match features of the image blocks; the improved network extracts a feature-map structure from each image block, which overcomes the problem in the prior art that the extracted feature vectors are not representative and differ greatly. At the same time, a neural-network-based matching method fuses the information of the feature maps extracted from the image blocks, which largely preserves the spatial feature information useful for matching, overcomes the loss of spatial information in the feature vectors extracted in the prior art, and improves the accuracy of image registration.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a simulation of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, the optical and SAR image registration method based on the deep convolution GAN provided by the present invention comprises the following steps:
step one, obtaining a training sample:
(1a) reading in a registered heterogeneous image pair comprising an SAR image and an optical image;
(1b) extracting SIFT feature points A from the optical image with the SIFT method, and cutting 32 × 32 optical image blocks from the optical image with the coordinates of the extracted SIFT feature points A as centre points; meanwhile, taking from the SAR image the feature points B corresponding to the coordinates of the SIFT feature points A extracted in step (1b);
(1c) randomly selecting a feature point C on the SAR image, then respectively taking the coordinates of the feature point B and the feature point C as centers on the SAR image, cutting two SAR image blocks of 32 x 32, and finally respectively combining the two SAR image blocks with corresponding optical image blocks to respectively obtain a group of positive sample and negative sample image block pairs of training data;
(1d) rotating the SAR image by an angle within a given range, calculating the transformed position of the extracted feature point B after rotation according to the rotation matrix of the SAR image while keeping the optical image unchanged, and repeating step (1c); in this example the SAR image is rotated by every integer angle in the range (-30°, 30°), i.e. in steps of 1°;
(1e) taking all the obtained positive samples as the training samples of the generative adversarial networks; in this example the number of these training samples is 70,000 image block pairs;
(1f) mixing all the obtained positive and negative samples to obtain the training sample image block pairs of the matching network; in this example the number of matching-network training samples is 200,000 image block pairs (a sampling sketch is given after this step).
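For concreteness, the block sampling of step one can be sketched as follows. This is a minimal illustration using OpenCV and NumPy, not the inventors' code: the function names, the use of cv2.SIFT_create, and the omission of the rotation augmentation of step (1d) are assumptions of the sketch.

```python
# Illustrative sketch of the step-one sampling (assumed helper names; OpenCV >= 4.4 for SIFT_create).
import cv2
import numpy as np

PATCH = 32  # image block size used throughout the method

def crop_patch(img, x, y, size=PATCH):
    """Crop a size x size block centred at (x, y); return None near the border."""
    h = size // 2
    x, y = int(round(x)), int(round(y))
    if x - h < 0 or y - h < 0 or x + h > img.shape[1] or y + h > img.shape[0]:
        return None
    return img[y - h:y + h, x - h:x + h]

def sample_pairs(opt_img, sar_img, rng=None):
    """Build positive and negative (optical, SAR) block pairs from a registered image pair."""
    rng = rng or np.random.default_rng(0)
    sift = cv2.SIFT_create()
    keypoints = sift.detect(opt_img, None)       # SIFT feature points A on the optical image
    pos, neg = [], []
    for kp in keypoints:
        x, y = kp.pt
        opt_patch = crop_patch(opt_img, x, y)
        sar_patch = crop_patch(sar_img, x, y)    # point B: same coordinates, since the images are registered
        if opt_patch is None or sar_patch is None:
            continue
        pos.append((opt_patch, sar_patch))
        # point C: a random location gives a non-matching SAR block for the negative pair
        rx = rng.integers(PATCH, sar_img.shape[1] - PATCH)
        ry = rng.integers(PATCH, sar_img.shape[0] - PATCH)
        neg.append((opt_patch, crop_patch(sar_img, rx, ry)))
    return pos, neg
```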
Step two, constructing two generative adversarial networks with the same network structure, where each generative adversarial network comprises a generator network and a discriminator network:
(2a) the generator network: it comprises four convolutional layers and four deconvolution layers; each of the four convolutional layers is followed by a max-pooling layer of size 2 × 2, and the numbers of filters of the first to fourth convolutional layers are 32, 64, 128 and 256 respectively.
The numbers of filters of the first to fourth deconvolution layers are 128, 64, 32 and 1 respectively, and the filter size in both the convolutional and deconvolution layers is 3 × 3;
(2b) the discriminator network: it comprises four convolutional layers and three fully connected layers; each of the four convolutional layers is followed by a max-pooling layer of size 2 × 2; the numbers of filters of the first to fourth convolutional layers are 32, 64, 128 and 256 respectively, with a filter size of 3 × 3; the numbers of nodes of the first to third fully connected layers are 512, 128 and 1 respectively. An architecture sketch is given after this step.
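The sketch below is a minimal tf.keras rendering of the generator and discriminator described above. It is written with modern tf.keras rather than the TensorFlow 1.2 platform reported in the embodiment; only the filter counts, kernel sizes and pooling sizes follow the text, while padding, strides, activations and the two-channel pairing of the discriminator input are assumptions.

```python
# Minimal tf.keras sketch of the generator (2a) and discriminator (2b); padding, strides,
# activations and the two-channel pairing of the discriminator input are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_generator():
    inp = layers.Input(shape=(32, 32, 1))
    x = inp
    for f in (32, 64, 128, 256):                     # four conv layers, each followed by 2x2 max pooling
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    for f in (128, 64, 32, 1):                       # four deconvolution (transposed conv) layers
        x = layers.Conv2DTranspose(f, 3, strides=2, padding="same",
                                   activation="relu" if f > 1 else "tanh")(x)
    return models.Model(inp, x, name="generator")    # 32x32 input -> 32x32 converted image block

def build_discriminator():
    inp = layers.Input(shape=(32, 32, 2))            # conditioning block stacked with real/generated block
    x = inp
    for f in (32, 64, 128, 256):                     # convolution + 2x2 max-pooling blocks
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    for n in (512, 128):                             # first two fully connected layers
        x = layers.Dense(n, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid")(x)   # real/fake prediction for the block pair
    return models.Model(inp, out, name="discriminator")
```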
Step three, training the generative adversarial networks:
The training samples obtained in step (1e) are input into the two generative adversarial networks G1 and G2 constructed in step two, where the first network G1 converts SAR image blocks into optical image blocks and the second network G2 converts optical image blocks into SAR image blocks; the two networks are then trained separately to obtain the training weights of the two generator networks.
In the embodiment of the invention, the generative adversarial networks are trained according to the loss functions of the following formulas:

L_{cGAN}(G, D) = \mathbb{E}_{x, y \sim p_{data}(x, y)}[\log D(x, y)] + \mathbb{E}_{x \sim p_{data}(x),\, z \sim p_z(z)}[\log(1 - D(x, G(x, z)))]

L_{L1}(G) = \mathbb{E}_{x, y \sim p_{data}(x, y),\, z \sim p_z(z)}\left[\left\| y - G(x, z) \right\|_1\right]

G^* = \arg\min_G \max_D \; L_{cGAN}(G, D) + \lambda\, L_{L1}(G)

wherein L_{cGAN}(G, D) represents the adversarial loss constraint of the generator and the discriminator, and L_{L1}(G) represents the pixel-level constraint between the image block produced by the generator and the real image block; D(x, y) represents the matching prediction of the discriminator on the image block pair (x, y), G(x, z) represents the output image block of the generator, and D(x, G(x, z)) represents the matching prediction of the discriminator on the image block pair (x, G(x, z)); the adversarial loss constraint requires the discriminator to correctly distinguish real image blocks from the pseudo image blocks generated by the generator; x represents the image block to be converted, y represents the target image block, z represents the input noise data, E represents the mathematical expectation, x, y ~ p_data(x, y) denotes that the variables (x, y) obey the data distribution p_data(x, y), ||·||_1 denotes the L1 norm, λ denotes a coefficient constant with a value of 100, and z ~ p_z(z) denotes that the variable z obeys the data distribution p_z(z).
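The objective above can be sketched in code as follows. This mirrors the reconstructed pix2pix-style formulas with λ = 100; the function and variable names are illustrative and assume the generator/discriminator models sketched in step two.

```python
# Sketch of the reconstructed objective above (tf.keras); `bce` plays the role of the log terms.
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
LAMBDA = 100.0                                        # coefficient constant lambda from the text

def generator_loss(disc_fake_output, fake_y, real_y):
    # Adversarial term: the generator wants D(x, G(x, z)) to be judged real (label 1).
    adv = bce(tf.ones_like(disc_fake_output), disc_fake_output)
    # Pixel-level L1 constraint between the generated block and the real target block.
    l1 = tf.reduce_mean(tf.abs(real_y - fake_y))
    return adv + LAMBDA * l1

def discriminator_loss(disc_real_output, disc_fake_output):
    # The discriminator should label real pairs (x, y) as 1 and generated pairs (x, G(x, z)) as 0.
    real = bce(tf.ones_like(disc_real_output), disc_real_output)
    fake = bce(tf.zeros_like(disc_fake_output), disc_fake_output)
    return real + fake
```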
Step four, expanding the training sample data:
(4a) the SAR image blocks of the matching-network training sample image block pairs obtained in step (1f) are input into the trained generator network G1 to obtain converted optical image blocks; each converted optical image block is combined with its corresponding SAR image block to form a positive sample; then the order of the SAR image blocks is kept unchanged, the order of the converted optical image blocks is shuffled, and the two are combined to form negative samples;
(4b) the optical image blocks of the matching-network training sample image block pairs obtained in step (1f) are input into the trained generator network G2 to obtain converted SAR image blocks; each converted SAR image block is combined with its corresponding optical image block to form a positive sample; then the order of the optical image blocks is kept unchanged, the order of the converted SAR image blocks is shuffled, and the two are combined to form negative samples;
(4c) the orders of the positive and negative samples obtained in steps (4a) and (4b) are shuffled respectively to obtain the expanded training sample data.
In the embodiment of the invention, the total amount of expanded training data is 4 times the original data amount, i.e. 800,000 image block pairs, with a 1:1 ratio of positive to negative samples (a sketch of this expansion is given below).
Step five, constructing a feature extraction network and a matching network:
(5a) constructing the feature extraction network: the feature extraction network is a two-branch framework whose branches extract the feature information of the SAR and optical image blocks respectively; the two branches have the same network structure but do not share network weights; each branch comprises three convolutional layers, each followed by a max-pooling layer of size 2 × 2; the numbers of filters of the first to third convolutional layers are 32, 64 and 128 respectively, with a filter size of 3 × 3;
(5b) constructing the matching network: it comprises one convolutional layer and three fully connected layers; the convolutional layer is followed by a max-pooling layer of size 2 × 2, has 256 filters and a filter size of 3 × 3; the numbers of nodes of the first to third fully connected layers are 512, 128 and 1 respectively;
(5c) the feature extraction network of the two-branch framework is used to extract the feature maps of the SAR and optical image blocks respectively; the two feature maps are connected and input into the matching network for matching judgment, and the matching label of the image block pair is output (see the architecture sketch below).
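The following is a minimal tf.keras sketch of the two-branch feature extraction network and the matching network of step five; layer counts and sizes follow the text, while padding, activations and the channel-wise concatenation of the two feature maps are assumptions.

```python
# Minimal tf.keras sketch of the two-branch feature extraction network and the matching network.
import tensorflow as tf
from tensorflow.keras import layers, models

def feature_branch(name):
    inp = layers.Input(shape=(32, 32, 1))
    x = inp
    for f in (32, 64, 128):                            # three conv layers, each followed by 2x2 max pooling
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    return models.Model(inp, x, name=name)             # output is a 4x4x128 feature map, not a vector

def build_match_net():
    sar_in = layers.Input(shape=(32, 32, 1), name="sar_block")
    opt_in = layers.Input(shape=(32, 32, 1), name="optical_block")
    sar_feat = feature_branch("sar_branch")(sar_in)     # separate branches: weights are not shared
    opt_feat = feature_branch("opt_branch")(opt_in)
    x = layers.Concatenate()([sar_feat, opt_feat])      # connect the two feature maps channel-wise
    x = layers.Conv2D(256, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dense(128, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid")(x)      # matching probability of the block pair
    return models.Model([sar_in, opt_in], out, name="match_net")
```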
In the embodiment of the invention, the feature extraction and matching network is trained with the following loss function:

L = -\frac{1}{n}\sum_{i=1}^{n}\left[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\,\right]

wherein y_i represents the true matching label of the i-th image block pair, \hat{y}_i represents the predicted matching probability obtained for the i-th image block pair by the feature extraction and matching network, and n is the number of image block pairs.
Step six, training the feature extraction and matching network with a cross-iterative training strategy:
(6a) the feature extraction and matching network constructed in step five is trained with the matching-network training samples obtained in step (1f) for one iteration;
(6b) the feature extraction and matching network constructed in step five is trained with the expanded-data training samples obtained in step (4c) for one iteration;
(6c) steps (6a) and (6b) are repeated in turn; the cross-iterative training stops when the computed loss value is less than 1 × 10^-3, which in this embodiment takes 40 cross-iterations, giving the training weights of the trained feature extraction and matching network (see the training-loop sketch below).
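The cross-iterative strategy can be sketched as a simple alternating training loop, assuming the matching network is a tf.keras model and the two sample sets are available as datasets; the 1 × 10^-3 stopping criterion and the 40 rounds follow the text, everything else is illustrative.

```python
# Sketch of the cross-iterative training strategy of step six (names and datasets are assumptions).
def cross_iterative_train(match_net, original_ds, expanded_ds, rounds=40, loss_tol=1e-3):
    match_net.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    for _ in range(rounds):
        match_net.fit(original_ds, epochs=1, verbose=0)         # (6a) one pass over the original samples
        hist = match_net.fit(expanded_ds, epochs=1, verbose=0)  # (6b) one pass over the expanded samples
        if hist.history["loss"][-1] < loss_tol:                 # stop once the loss falls below 1e-3
            break
    return match_net
```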
Step seven, predicting the matching relationship:
(7a) inputting the images to be registered: an SAR image I1 and an optical image I2;
(7b) extracting SIFT feature points A from the two images I1 and I2 respectively, and, taking each extracted SIFT feature point A as a centre, cutting SAR image blocks and optical image blocks of size 32 × 32 from the SAR image I1 and the optical image I2 respectively;
(7c) performing feature extraction and matching judgment on the SAR and optical image blocks obtained in step (7b) with the trained feature extraction and matching network, and outputting the matching prediction value of the centre point of each image block; the label of an image block pair whose output matching prediction value is greater than a threshold is set to 1, otherwise it is set to 0, where 1 denotes a match and 0 denotes a non-match, and the pairs labelled 1 are taken as candidate matching points; the threshold is 0.5 in this embodiment.
In the embodiment of the invention, the image blocks extracted from the SAR image I1 form a set A1 = {a_i | i = 0, 1, 2, ..., n1}, where a_i denotes the i-th image block and n1 is the number of image blocks extracted from I1; the image blocks extracted from the optical image I2 form a set A2 = {b_i | i = 0, 1, 2, ..., n2}, where b_i denotes the i-th image block and n2 is the number of image blocks extracted from I2; the elements of A1 and A2 are combined in order to obtain the set of image block pairs whose matching relations are to be predicted, P = {c_i | i = 0, 1, 2, ..., n1*n2}, where c_i denotes the i-th image block pair and n1*n2 is the number of image block pair combinations; the image block pairs in P are input into the trained feature extraction and matching network to obtain a matching label of 1 or 0, where 1 denotes a match and 0 denotes a non-match (a sketch of this pairwise prediction is given below).
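A sketch of this exhaustive pairwise prediction is given below; since n1*n2 pairs may be large in practice, batching is assumed, and the names are illustrative rather than part of the invention.

```python
# Sketch of the exhaustive pairwise prediction of step seven (names and batching are illustrative).
import itertools
import numpy as np

def predict_candidates(match_net, sar_blocks, opt_blocks, sar_pts, opt_pts, thresh=0.5):
    """Return candidate correspondences whose predicted matching probability exceeds thresh."""
    pairs = list(itertools.product(range(len(sar_blocks)), range(len(opt_blocks))))  # n1 * n2 combinations
    sar_batch = np.stack([sar_blocks[i] for i, _ in pairs])
    opt_batch = np.stack([opt_blocks[j] for _, j in pairs])
    probs = match_net.predict([sar_batch, opt_batch]).ravel()
    return [(sar_pts[i], opt_pts[j], p) for (i, j), p in zip(pairs, probs) if p > thresh]
```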
Step eight, removing mismatching points:
(8a) correlation constraint: according to the images to be registered I1 and I2, the edge correlation of every image block pair with matching label 1 in the candidate matching point set is calculated, and the label of any matching point that does not satisfy the correlation condition is set to 0; in this example the correlation constraint means that, for an image block, the image block with the highest correlation among all the image blocks matched to it is found, and their correlation must be greater than a threshold, which is 0.1.
(8b) random sample consensus (RANSAC) constraint: iterative computation is performed with the RANSAC method until the transformation with the largest consistent point set is found, which gives the screened result, and the matching points are thus selected from the candidate point set; the number of iterations is 100 in this example (a RANSAC sketch is given below).
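The RANSAC screening of (8b) can be sketched with OpenCV's estimateAffinePartial2D, which fits a rotation-scale-translation model consistent with the similarity transform of step nine; the correlation constraint of (8a) is not shown, and the reprojection threshold is an assumption.

```python
# Sketch of the RANSAC screening (8b) with OpenCV; the correlation constraint (8a) is omitted
# and the reprojection threshold is an assumption.
import cv2
import numpy as np

def ransac_filter(candidates, max_iters=100, reproj_thresh=3.0):
    """candidates: list of ((x1, y1), (x2, y2), prob) produced by the matching network."""
    src = np.float32([c[0] for c in candidates]).reshape(-1, 1, 2)   # points in the SAR image
    dst = np.float32([c[1] for c in candidates]).reshape(-1, 1, 2)   # points in the optical image
    # A partial affine model (rotation + uniform scale + translation) matches the similarity
    # transform used in step nine.
    M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC,
                                             ransacReprojThreshold=reproj_thresh,
                                             maxIters=max_iters)
    if inliers is None:
        return None, []
    kept = [c for c, keep in zip(candidates, inliers.ravel()) if keep]
    return M, kept
```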
Step nine, calculating a geometric transformation matrix: calculating a geometric transformation matrix T between the images to be registered by using a least square method;
in the embodiment of the present invention, the calculated geometric transformation matrix T is represented by the following formula:
Figure BDA0001613764240000121
wherein T represents two images I1And I2A geometric transformation matrix between theta and theta, theta representing the image I2Compared with image I1S denotes the image I2Compared with image I1The scaling scale of (1), cos (×) represents a cosine trigonometric function, sin (×) represents a sine trigonometric function, t xRepresenting a horizontal translation parameter, tyRepresenting the vertical translation parameter.
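A least-squares estimate of T, parameterised as a = s·cosθ and b = s·sinθ so that the normal equations are linear, can be sketched as follows; this illustrates the least-squares step under those assumptions rather than reproducing the inventors' implementation.

```python
# Linear least-squares estimate of the similarity transform T reconstructed above,
# with a = s*cos(theta) and b = s*sin(theta) as the unknowns (NumPy sketch).
import numpy as np

def fit_similarity(src_pts, dst_pts):
    """src_pts, dst_pts: (N, 2) arrays of matched points, N >= 2."""
    A, rhs = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([x, -y, 1, 0]); rhs.append(u)      # u = a*x - b*y + tx
        A.append([y,  x, 0, 1]); rhs.append(v)      # v = b*x + a*y + ty
    a, b, tx, ty = np.linalg.lstsq(np.asarray(A, float), np.asarray(rhs, float), rcond=None)[0]
    T = np.array([[a, -b, tx],
                  [b,  a, ty],
                  [0.0, 0.0, 1.0]])
    return T                                         # s = hypot(a, b), theta = atan2(b, a)
```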
Step ten, registering the images: the geometric transformation matrix T is applied to the optical image I2 to perform the geometric transformation and obtain the final registration result of I1 and I2.
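Applying T to the optical image can be sketched with OpenCV as below; whether the matrix or its inverse is passed depends on the direction in which T was fitted, which the sketch notes in a comment.

```python
# Sketch of step ten: warp the optical image I2 with the estimated T (OpenCV).
import cv2

def register(optical_img, T, sar_shape):
    h, w = sar_shape[:2]
    # T is assumed to map optical coordinates into the SAR frame; if it was fitted in the
    # opposite direction, pass flags=cv2.WARP_INVERSE_MAP instead of inverting the matrix.
    return cv2.warpPerspective(optical_img, T, (w, h))
```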
The effect of the present invention will be further described with reference to simulation experiments.
1. Simulation experiment conditions are as follows:
the hardware platform of the simulation experiment of the invention is as follows: intel (r) Core5 processor of dell computer, main frequency 3.20GHz, memory 64 GB; the simulation software platform is as follows: python3.5, tensorflow1.2 platform.
2. And (3) analyzing the experimental content and the result:
the simulation experiment of the invention is divided into two simulation experiments.
Experimental image data: Figs. 2(a) and 2(b) are cut respectively from an SAR image and an optical image of the Yellow River estuary, with sizes 964 × 519 and 940 × 470. Fig. 2(a) is the SAR image I1 to be registered, and Fig. 2(b) is the optical image I2 to be registered.
Simulation experiment 1: the network of the invention is compared with the prior-art network based on a two-branch feature-vector structure and the prior-art network based on a two-branch feature-map structure; the networks are trained on the same training-set samples and then evaluated on the same test-set samples. The results are shown in Table 1, where Alg1 denotes the network of the invention, Alg2 denotes the network with a two-branch feature-vector structure, and Alg3 denotes the network with a two-branch feature-map structure.
TABLE 1 accuracy of three network simulation experiment test sets
As can be seen from Table 1, compared with the network based on a two-branch feature-vector structure and the network based on a two-branch feature-map structure, the network of the invention achieves the highest matching accuracy on the test-set image block pairs. The network also converges faster than the other two structures during training, reaching a higher accuracy by the 20th iteration, which shows that the proposed network structure has better feature extraction and matching performance.
Simulation experiment 2: the SAR and optical images are registered with the method of the invention and with the traditional SIFT method, and the registration results are compared. When the traditional SIFT method is used to register the images to be registered in Figs. 2(a) and 2(b), no valid matching key points are obtained and the images cannot be registered, as shown in Fig. 2(c); when the invention registers the images in Figs. 2(a) and 2(b), 14 pairs of matching key points are obtained, as shown in Fig. 2(d), and the final registration result of the invention is shown in Fig. 2(e).
TABLE 2 Experimental result quantitative data of optical and SAR image registration method based on deep convolution GAN
As can be seen from Fig. 2(e), the final registration result of the invention, the deep convolution GAN-based optical and SAR image registration method can correctly register the SAR and optical images, whereas the traditional SIFT-based method cannot effectively register them, as shown in Fig. 2(c). In conclusion, the method of the invention achieves better results in SAR and optical image registration, with high registration precision and strong robustness.

Claims (7)

1. The optical and SAR image registration method based on the deep convolution GAN is characterized by comprising the following steps:
step 1), acquiring training samples from an optical image and an SAR image to obtain training samples for the generative adversarial networks and training sample image block pairs for the matching network;
step 2), constructing two generative adversarial networks with the same network structure, each comprising a generator network and a discriminator network;
step 3), training the generative adversarial networks: inputting the training samples obtained in step 1) into the two generative adversarial networks G1 and G2 constructed in step 2), wherein the first network G1 converts SAR image blocks into optical image blocks and the second network G2 converts optical image blocks into SAR image blocks, and then training them separately to obtain the training weights of the two generator networks;
step 4), expanding the training sample data: firstly, inputting the SAR image blocks of the matching-network training sample image block pairs obtained in step 1) into the trained generator network G1 to obtain converted optical image blocks, wherein each converted optical image block is combined with its corresponding SAR image block to form a positive sample; then keeping the order of the SAR image blocks unchanged, shuffling the order of the converted optical image blocks, and combining them to form negative samples; secondly, inputting the optical image blocks of the matching-network training sample image block pairs obtained in step 1) into the trained generator network G2 to obtain converted SAR image blocks, wherein each converted SAR image block is combined with its corresponding optical image block to form a positive sample; then keeping the order of the optical image blocks unchanged, shuffling the order of the converted SAR image blocks, and combining them to form negative samples; finally, shuffling the order of the obtained positive and negative samples respectively to obtain the expanded training sample data;
step 5), constructing a feature extraction and matching network;
Step 6), training the feature extraction and matching network, and obtaining the training weight of the trained feature extraction and matching network by adopting a cross iterative training strategy;
step 7), predicting the matching relation: firstly, inputting the images to be registered, namely an SAR image I1 and an optical image I2; secondly, extracting SIFT feature points A from the two images I1 and I2 respectively, and then, taking each extracted SIFT feature point A as a centre, cutting SAR image blocks and optical image blocks of size 32 × 32 from the SAR image I1 and the optical image I2 respectively; finally, performing feature extraction and matching judgment on the obtained SAR and optical image blocks with the trained feature extraction and matching network, and outputting the matching prediction value of the centre point of each image block;
step 8), removing mismatching points;
step 9), calculating a geometric transformation matrix: calculating a geometric transformation matrix T between the images to be registered by using a least square method;
step 10), registering the images: applying the geometric transformation matrix T to the optical image I2 to perform the geometric transformation and obtain the final registration result of I1 and I2;
in step 5), (5a) constructing the feature extraction network: the feature extraction network is a two-branch framework whose branches extract the feature information of the SAR and optical image blocks respectively; the two branches have the same network structure but do not share network weights; each branch comprises three convolutional layers, each followed by a max-pooling layer of size 2 × 2; the numbers of filters of the first to third convolutional layers are 32, 64 and 128 respectively, with a filter size of 3 × 3;
(5b) constructing the matching network: it comprises one convolutional layer and three fully connected layers; the convolutional layer is followed by a max-pooling layer of size 2 × 2, has 256 filters and a filter size of 3 × 3; the numbers of nodes of the first to third fully connected layers are 512, 128 and 1 respectively;
(5c) extracting the feature maps of the SAR and optical image blocks respectively with the feature extraction network of the two-branch framework, connecting the two extracted feature maps, inputting them into the matching network for matching judgment, and outputting the matching label of the image block pair;
in step 2), the generator network comprises four convolutional layers and four deconvolution layers; each of the four convolutional layers is followed by a max-pooling layer, and the numbers of filters of the first to fourth convolutional layers are 32, 64, 128 and 256 respectively; each max-pooling layer has a size of 2 × 2;
the numbers of filters of the first to fourth deconvolution layers are 128, 64, 32 and 1 respectively, and the filter size in both the convolutional and deconvolution layers is 3 × 3;
the discriminator network comprises five convolutional layers and three fully connected layers; each of the five convolutional layers is followed by a max-pooling layer of size 2 × 2; the numbers of filters of the first to fifth convolutional layers are 32, 64, 128 and 256 respectively, with a filter size of 3 × 3; the numbers of nodes of the first to third fully connected layers are 512, 128 and 1 respectively.
2. The method for deep convolutional GAN-based optical and SAR image registration as claimed in claim 1, wherein in step 1), the method for obtaining training samples is as follows:
(1a) reading in the registered SAR image and the optical image;
(1b) extracting SIFT feature points A from the optical image by adopting an SIFT method, and cutting 32 x 32 optical image blocks from the optical image by taking the coordinates of the extracted SIFT feature points A as a central point; meanwhile, taking a feature point B corresponding to the coordinates of the SIFT feature point A from the SAR image;
(1c) randomly selecting a feature point C on the SAR image, then respectively cutting two SAR image blocks of 32 x 32 on the SAR image by taking the coordinates of the feature point B and the feature point C as centers, and finally respectively combining the two SAR image blocks with corresponding optical image blocks to respectively obtain a positive sample image block pair and a negative sample image block pair of the training data;
(1d) rotating the SAR image by an angle within a given range, calculating the transformed position of the extracted feature point B after rotation according to the rotation matrix of the SAR image while keeping the optical image unchanged, and repeating step (1c);
(1e) taking all the obtained positive samples as training samples for generating the countermeasure network;
(1f) And mixing all the obtained positive and negative samples to obtain a training sample image block pair of the matching network.
3. The method as claimed in claim 1, wherein in step 3) the generative adversarial networks are trained according to the loss functions of the following formulas:

L_{cGAN}(G, D) = \mathbb{E}_{x, y \sim p_{data}(x, y)}[\log D(x, y)] + \mathbb{E}_{x \sim p_{data}(x),\, z \sim p_z(z)}[\log(1 - D(x, G(x, z)))]

L_{L1}(G) = \mathbb{E}_{x, y \sim p_{data}(x, y),\, z \sim p_z(z)}\left[\left\| y - G(x, z) \right\|_1\right]

G^* = \arg\min_G \max_D \; L_{cGAN}(G, D) + \lambda\, L_{L1}(G)

wherein L_{cGAN}(G, D) represents the adversarial loss constraint of the generator and the discriminator, and L_{L1}(G) represents the pixel-level constraint between the image block produced by the generator and the real image block; D(x, y) represents the matching prediction of the discriminator on the image block pair (x, y), G(x, z) represents the output image block of the generator, and D(x, G(x, z)) represents the matching prediction of the discriminator on the image block pair (x, G(x, z)); the adversarial loss constraint requires the discriminator to correctly distinguish real image blocks from the pseudo image blocks generated by the generator; x represents the image block to be converted, y represents the target image block, z represents the input noise data, E represents the mathematical expectation, x, y ~ p_data(x, y) denotes that the variables (x, y) obey the data distribution p_data(x, y), ||·||_1 denotes the L1 norm, λ denotes a coefficient constant, and z ~ p_z(z) denotes that the variable z obeys the data distribution p_z(z).
4. The method of claim 1, wherein the feature extraction and matching network is trained in step 5) according to the loss function of the following formula:

L = -\frac{1}{n}\sum_{i=1}^{n}\left[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\,\right]

wherein y_i represents the true matching label of the i-th image block pair, \hat{y}_i represents the predicted matching probability obtained for the i-th image block pair by the feature extraction and matching network, and n is the number of image block pairs.
5. The method for optical and SAR image registration based on deep convolution GAN of claim 1, wherein in step 6), (6a), the feature extraction and matching network constructed in step 5) is trained with the matching-network training samples obtained in step 1) for one iteration;
(6b) the feature extraction and matching network constructed in step 5) is trained with the expanded-data training samples obtained in step 4) for one iteration;
(6c) steps (6a) and (6b) are repeated in turn for 40 cross-iterations to obtain the training weights of the trained feature extraction and matching network.
6. The method for deep convolutional GAN based optical and SAR image registration as claimed in claim 1, wherein the method for removing the mis-matching points in step 8) is as follows:
(8a) according to the images to be registered I1 and I2, calculating the edge correlation between the image blocks of each pair with matching label 1 in the candidate matching point set, and setting the label of any matching point that does not satisfy the correlation condition to 0;
(8b) performing multiple iterations of the RANSAC method and screening the matching points from the candidate point set.
7. The method for deep convolutional GAN based optical and SAR image registration as claimed in claim 1, wherein in step 9) the geometric transformation matrix T is computed as follows:

T = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}

wherein T represents the geometric transformation matrix between the two images I1 and I2, θ represents the rotation angle of image I2 relative to image I1, s represents the scaling of image I2 relative to image I1, cos(·) and sin(·) are the cosine and sine functions, t_x represents the horizontal translation parameter and t_y represents the vertical translation parameter.
CN201810276562.7A 2018-03-30 2018-03-30 Optical and SAR image registration method based on deep convolution GAN Active CN108510532B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810276562.7A CN108510532B (en) 2018-03-30 2018-03-30 Optical and SAR image registration method based on deep convolution GAN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810276562.7A CN108510532B (en) 2018-03-30 2018-03-30 Optical and SAR image registration method based on deep convolution GAN

Publications (2)

Publication Number Publication Date
CN108510532A CN108510532A (en) 2018-09-07
CN108510532B true CN108510532B (en) 2022-07-15

Family

ID=63379611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810276562.7A Active CN108510532B (en) 2018-03-30 2018-03-30 Optical and SAR image registration method based on deep convolution GAN

Country Status (1)

Country Link
CN (1) CN108510532B (en)

Families Citing this family (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543674B (en) * 2018-10-19 2023-04-07 天津大学 Image copy detection method based on generation countermeasure network
CN109448035A (en) * 2018-11-14 2019-03-08 重庆邮电大学 Infrared image and visible light image registration method based on deep learning
CN109636742B (en) * 2018-11-23 2020-09-22 中国人民解放军空军研究院航空兵研究所 Mode conversion method of SAR image and visible light image based on countermeasure generation network
CN109671125B (en) * 2018-12-17 2023-04-07 电子科技大学 Highly-integrated GAN network device and method for realizing text image generation
CN109711444B (en) * 2018-12-18 2024-07-19 中国科学院遥感与数字地球研究所 Novel remote sensing image registration method based on deep learning
CN109801317A (en) * 2018-12-29 2019-05-24 天津大学 The image matching method of feature extraction is carried out based on convolutional neural networks
CN109887014A (en) * 2019-02-27 2019-06-14 中国科学院上海技术物理研究所 Method for registering images between a kind of stationary orbit spectral coverage neural network based
CN109993782B (en) * 2019-04-02 2020-12-04 中国矿业大学 Heterogeneous remote sensing image registration method and device for ring-shaped generation countermeasure network
CN109903299B (en) * 2019-04-02 2021-01-05 中国矿业大学 Registration method and device for heterogenous remote sensing image of conditional generation countermeasure network
CN110363060B (en) * 2019-04-04 2021-07-20 杭州电子科技大学 Small sample target identification method for generating countermeasure network based on feature subspace
CN109978897B (en) * 2019-04-09 2020-05-08 中国矿业大学 Registration method and device for heterogeneous remote sensing images of multi-scale generation countermeasure network
US10977765B2 (en) 2019-04-10 2021-04-13 Eagle Technology, Llc Hierarchical neural network image registration
CN110021037B (en) * 2019-04-17 2020-12-29 南昌航空大学 Image non-rigid registration method and system based on generation countermeasure network
CN110363215B (en) * 2019-05-31 2020-07-28 中国矿业大学 Method for converting SAR image into optical image based on generating type countermeasure network
CN110472627B (en) * 2019-07-02 2022-11-08 五邑大学 End-to-end SAR image recognition method, device and storage medium
CN110689060B (en) * 2019-09-16 2022-01-28 西安电子科技大学 Heterogeneous image matching method based on aggregation feature difference learning network
CN111028277B (en) * 2019-12-10 2023-01-10 中国电子科技集团公司第五十四研究所 SAR and optical remote sensing image registration method based on pseudo-twin convolution neural network
CN111160128B * 2019-12-11 2023-07-18 中国四维测绘技术有限公司 Remote sensing image processing method and system based on adversarial neural network model
CN111369601B (en) * 2020-02-12 2023-04-07 西北工业大学 Remote sensing image registration method based on twin network
CN111414968B (en) * 2020-03-26 2022-05-03 西南交通大学 Multi-mode remote sensing image matching method based on convolutional neural network characteristic diagram
CN111428678B * 2020-04-02 2023-06-23 山东卓智软件股份有限公司 Remote sensing image sample augmentation method using generative adversarial network under spatial constraints
CN111462012A * 2020-04-02 2020-07-28 武汉大学 SAR image simulation method based on conditional generative adversarial network
CN111899269B (en) * 2020-07-16 2022-07-05 武汉大学 Unmanned aerial vehicle image and SAR satellite image matching method based on edge structure information
CN112085745B (en) * 2020-09-07 2024-10-18 福建农林大学 Retina blood vessel image segmentation method of multichannel U-shaped full convolution neural network based on balanced sampling and splicing
CN112433213B (en) * 2020-10-21 2023-05-23 中国电子科技集团公司第二十九研究所 SAR interferometry result and optical image position deviation comprehensive correction method
CN112200845A (en) * 2020-10-22 2021-01-08 清华大学 Image registration method and device
CN112861672B (en) * 2021-01-27 2022-08-05 电子科技大学 Heterogeneous remote sensing image matching method based on optical-SAR
CN113223065B (en) * 2021-03-30 2023-02-03 西南电子技术研究所(中国电子科技集团公司第十研究所) Automatic matching method for SAR satellite image and optical image
CN113221923B (en) * 2021-05-31 2023-02-24 西安电子科技大学 Feature decomposition method and system for multi-mode image block matching
CN113487623A (en) * 2021-06-11 2021-10-08 天津大学 Optical and SAR registration method and system based on cGANs and image conversion
CN113808180B (en) * 2021-09-18 2023-10-17 中山大学 Heterologous image registration method, system and device
CN114037845B * 2021-11-30 2024-04-09 昆明理工大学 Method and system for determining the principal orientation of heterogeneous image feature blocks based on GAN (generative adversarial network)
CN114511012A (en) * 2022-01-20 2022-05-17 云南览易网络科技有限责任公司 SAR image and optical image matching method based on feature matching and position matching
CN114565653B (en) * 2022-03-02 2023-07-21 哈尔滨工业大学 Heterologous remote sensing image matching method with rotation change and scale difference
CN114359359B (en) * 2022-03-11 2022-07-01 北京化工大学 Multitask optical and SAR remote sensing image registration method, equipment and medium
CN117541833B (en) * 2024-01-10 2024-04-02 中山大学 Multi-mode image matching method, system, terminal equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101398901B (en) * 2008-10-31 2012-04-11 中国航空无线电电子研究所 Rapid image matching method for auxiliary navigation
US9483816B2 (en) * 2013-09-03 2016-11-01 Litel Instruments Method and system for high accuracy and reliability registration of multi modal imagery
US10445616B2 (en) * 2015-01-22 2019-10-15 Bae Systems Information And Electronic Systems Integration Inc. Enhanced phase correlation for image registration
CN105184801B (en) * 2015-09-28 2018-01-23 武汉大学 It is a kind of based on multi-level tactful optics and SAR image high-precision method for registering

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729852A (en) * 2014-01-09 2014-04-16 北京航空航天大学 Double-bright-line-based registration method for oil depot SAR (Synthetic Aperture Radar) image and optical image
CN105321172A (en) * 2015-08-31 2016-02-10 哈尔滨工业大学 SAR, infrared and visible light image fusion method
CN105809693A (en) * 2016-03-10 2016-07-27 西安电子科技大学 SAR image registration method based on deep neural networks
CN107292922A (en) * 2017-06-23 2017-10-24 电子科技大学 Method for registering optical and synthetic aperture radar images
CN107480701A (en) * 2017-07-19 2017-12-15 同济大学 Optical image and radar image matching method based on multichannel convolutional neural network
CN107563428A (en) * 2017-08-25 2018-01-09 西安电子科技大学 Polarimetric SAR image classification method based on generative adversarial network
CN107679465A (en) * 2017-09-20 2018-02-09 上海交通大学 Pedestrian re-identification data generation and augmentation method based on generative networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Data Augmentation Generative Adversarial Networks; Antreas Antoniou et al.; Machine Learning; 2018-03-21; pp. 1-14 *
Exploiting Deep Matching and SAR Data for the Geo-Localization Accuracy Improvement of Optical Satellite Images; Nina Merkle et al.; Remote Sensing; 2017-06-10; pp. 1-18 *
Image-to-Image Translation with Conditional Adversarial Networks; Phillip Isola et al.; Computer Vision and Pattern Recognition; 2017-11-22; pp. 1125-1134 *
Using deep neural networks for synthetic; Quan Dou et al.; 2016 IEEE International Geoscience and Remote Sensing Symposium; 2016-11-03; pp. 2799-2802 *

Also Published As

Publication number Publication date
CN108510532A (en) 2018-09-07

Similar Documents

Publication Publication Date Title
CN108510532B (en) Optical and SAR image registration method based on deep convolution GAN
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN111401384B (en) Transformer equipment defect image matching method
CN108564606B (en) Heterogeneous image block matching method based on image conversion
CN110969088B (en) Remote sensing image change detection method based on significance detection and deep twin neural network
CN110929736B Multi-feature cascaded RGB-D salient object detection method
CN112488210A (en) Three-dimensional point cloud automatic classification method based on graph convolution neural network
Dong et al. Local descriptor learning for change detection in synthetic aperture radar images via convolutional neural networks
CN109743642B (en) Video abstract generation method based on hierarchical recurrent neural network
CN109446894B (en) Multispectral image change detection method based on probability segmentation and Gaussian mixture clustering
CN105809693A (en) SAR image registration method based on deep neural networks
CN110458192B (en) Hyperspectral remote sensing image classification method and system based on visual saliency
CN107301643B Salient object detection method based on robust sparse representation and Laplacian regularization
Yu et al. Capsule feature pyramid network for building footprint extraction from high-resolution aerial imagery
CN112967210B (en) Unmanned aerial vehicle image denoising method based on full convolution twin network
CN110598564A (en) OpenStreetMap-based high-spatial-resolution remote sensing image transfer learning classification method
CN116664892A (en) Multi-temporal remote sensing image registration method based on cross attention and deformable convolution
CN115937552A (en) Image matching method based on fusion of manual features and depth features
Du et al. Training SegNet for cropland classification of high resolution remote sensing images
Lu et al. Image-specific prior adaptation for denoising
CN107358625B (en) SAR image change detection method based on SPP Net and region-of-interest detection
CN117689702A (en) Point cloud registration method and device based on geometric attention mechanism
CN117853596A (en) Unmanned aerial vehicle remote sensing mapping method and system
CN116129280B (en) Method for detecting snow in remote sensing image
CN117671666A (en) Target identification method based on self-adaptive graph convolution neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant