CN108564606B - Heterogeneous image block matching method based on image conversion - Google Patents

Heterogeneous image block matching method based on image conversion

Info

Publication number
CN108564606B
CN108564606B
Authority
CN
China
Prior art keywords
network
image block
matching
image
layer
Prior art date
2018-03-30
Legal status
Active
Application number
CN201810276986.3A
Other languages
Chinese (zh)
Other versions
CN108564606A (en)
Inventor
王爽 (Wang Shuang)
焦李成 (Jiao Licheng)
王若静 (Wang Ruojing)
权豆 (Quan Dou)
方帅 (Fang Shuai)
梁雪峰 (Liang Xuefeng)
郭雨薇 (Guo Yuwei)
刘飞航 (Liu Feihang)
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
2018-03-30
Filing date
2018-03-30
Publication date
2022-06-24
Application filed by Xidian University
Priority to CN201810276986.3A
Publication of CN108564606A
Application granted
Publication of CN108564606B
Legal status: Active (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33: Image registration using feature-based methods
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/14: Transformations for image registration, e.g. adjusting or mapping for alignment of images
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10032: Satellite or aerial image; Remote sensing
    • G06T 2207/10044: Radar image
    • G06T 2207/10052: Images from lightfield camera
    • G06T 2207/20021: Dividing image into blocks, subimages or windows
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20112: Image segmentation details
    • G06T 2207/20164: Salient point detection; Corner detection


Abstract

The invention provides a heterogeneous image block matching method based on image conversion, comprising the following steps: obtaining training and test samples, constructing an image conversion network, training the image conversion network, constructing a feature extraction and matching network, training the feature extraction and matching network, and predicting the matching result. The method solves the prior-art problems of large feature differences between heterogeneous images and inaccurate feature extraction, effectively reduces the matching difficulty, and improves the accuracy and robustness of heterogeneous image block matching.

Description

Heterogeneous image block matching method based on image conversion
Technical Field
The invention belongs to the technical field of image processing, and more particularly relates to a heterogeneous image block matching method based on image conversion among image matching methods.
Background
Image registration techniques have proven useful in many practical applications, including remote sensing image analysis, image fusion, medical image processing, three-dimensional reconstruction, and pattern recognition. As a key technology in deep-neural-network-based image registration, the image block matching method has very important research significance and value.
The widely used image block matching methods fall into two categories: methods based on traditional hand-crafted feature extraction and methods based on neural network feature extraction. Hand-crafted methods extract locally salient features using information such as the gray levels and contour shapes of image blocks, and then match image block pairs by measuring similarity through feature distances. Neural-network-based methods extract image block features through a network and then perform matching measurement on the extracted features with a matching network. Because heterogeneous images carry complementary feature information (for example, SAR images and optical images reflect the electromagnetic-radiation and optical-radiation information of ground objects, respectively), matching heterogeneous images yields more complete and exact image content, which greatly benefits the subsequent registration of heterogeneous images. However, the feature differences between heterogeneous images are large, especially given the speckle noise present in SAR images; hand-crafted feature extraction cannot obtain feature information that is consistent and correlated across the images and easily leads to matching errors. Generative adversarial networks, a recent research hotspot, can convert heterogeneous images into homologous images for matching, which reduces the matching difficulty. In addition, using a matching network to perform matching detection on the converted homologous images can extract higher-level feature information from the images than traditional matching methods can, which helps improve matching precision and robustness.
A method based on a generative adversarial network and Normalized Cross-Correlation (NCC) matching is proposed in the paper "On the Possibility of Conditional Adversarial Networks for Multi-Sensor Image Matching" (In IGARSS, IEEE, 2017) by Merkle et al. The method uses a generative adversarial network to convert a heterogeneous remote sensing image into a homologous image, namely converting an optical image into a SAR image, and then performs similarity matching on the converted homologous images through NCC. The drawback of this method is that pixel-based NCC similarity matching has a large computational load and a slow matching speed, is strongly affected by noise such as local illumination, and therefore has poor matching robustness.
An image block feature extraction and matching method based on neural networks is proposed in the paper "MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching" (In CVPR, IEEE, 2015) by Xufeng Han et al. The method uses deep convolutional networks to extract the feature information of the image blocks to be registered, and then feeds the extracted features into a matching network for matching detection. The drawback of this method is that the two-branch feature extraction network shares parameters, so the structure and parameters of the feature extraction network are identical for the two image blocks to be registered and targeted feature extraction cannot be performed for different images; moreover, the extracted image block information is a feature vector, which loses the spatial information of the features, resulting in poor network matching precision and generalization.
Disclosure of Invention
The invention aims to provide a heterogeneous image block matching method based on image conversion, which overcomes the defects that existing matching methods cannot perform targeted feature extraction for different images when matching heterogeneous images, that the extracted image block information is a feature vector which loses the spatial information of the features, and that network matching precision and generalization are poor.
In order to achieve this purpose, the invention adopts the following technical scheme:
The invention provides a heterogeneous image block matching method based on image conversion, comprising the following steps:
step 1), obtaining training samples from the registered optical image and SAR image: training samples for the generative adversarial network and training samples for the matching network;
step 2), constructing an image conversion network, which comprises a generator network and a discriminator network;
step 3), training the image conversion network: training the image conversion network constructed in step 2) with the generative-adversarial-network training samples obtained in step 1) to obtain the weights of the generator network;
step 4), constructing a feature extraction and matching network;
step 5), training the feature extraction and matching network to obtain the training weights of the constructed feature extraction and matching network;
step 6), predicting the matching result: first, converting the SAR image blocks in the matching network test samples obtained in step 1) to obtain pseudo optical image blocks of the test samples; second, loading the feature extraction and matching network weights obtained in step 5) into the network constructed in step 4), combining the optical image blocks and pseudo optical image blocks of the matching network test samples obtained in step 1) into image block pairs, inputting the image block pairs into the weight-loaded feature extraction and matching network for matching prediction, outputting the predicted matching probability that each image block pair shares a central point, and calculating the matching precision of the network from the predicted matching results.
Preferably, the training samples in step 1) are obtained as follows:
(1a) reading in the registered heterogeneous images, which comprise a SAR image and an optical image;
(1b) performing corner detection on the optical image with the Harris method, and cutting the optical image into blocks with the coordinates of each detected corner A as the center and a fixed value w as the side length to obtain optical image blocks;
(1c) taking the feature point B at the corner coordinates on the SAR image and randomly selecting a feature point C on the SAR image; then cutting the SAR image with the coordinates of feature points B and C as centers and the fixed value w as side length to obtain two corresponding SAR image blocks; and combining each of the two SAR image blocks with the optical image block of corner A to obtain a positive-sample image block pair and a negative-sample image block pair of training data;
(1d) rotating the SAR image by angles in an arbitrary range and repeating step (1c);
(1e) taking all the obtained positive samples as training samples for the generative adversarial network;
(1f) mixing all the obtained positive and negative samples, with 3/4 of the mixed samples used as training samples for the matching network and 1/4 used as test samples for the matching network.
Preferably, the image conversion network in step 2) is constructed according to the following steps:
(2a) constructing the generator network: it comprises three convolutional layers and three deconvolutional layers, with each of the three convolutional layers followed by a 2 x 2 max pooling layer; the numbers of filters of the first to third convolutional layers are 32, 64 and 128, respectively; the numbers of filters of the first to third deconvolutional layers are 64, 32 and 1, respectively; and the filter size of every layer is 3 x 3;
(2b) constructing the discriminator network: it comprises three convolutional layers and three fully connected layers, with each of the three convolutional layers followed by a 2 x 2 max pooling layer; the numbers of filters of the first to third convolutional layers are 32, 64 and 128, respectively, and the filter size of each convolutional layer is 3 x 3; the numbers of nodes of the first to third fully connected layers are 64, 32 and 1, respectively.
Preferably, in step 3), the generative adversarial network is trained according to the loss functions of the following formulas:

$$L_{cGAN}(G,D)=\mathbb{E}_{x,y\sim p_{data}(x,y)}[\log D(x,y)]+\mathbb{E}_{x\sim p_{data}(x),\,z\sim p_z(z)}[\log(1-D(x,G(x,z)))]$$

$$L_{L1}(G)=\mathbb{E}_{x,y\sim p_{data}(x,y),\,z\sim p_z(z)}\big[\|y-G(x,z)\|_1\big]$$

$$G^{*}=\arg\min_{G}\max_{D}\,L_{cGAN}(G,D)+\lambda L_{L1}(G)$$

where $L_{cGAN}(G,D)$ denotes the adversarial loss constraint between the generator and the discriminator, and $L_{L1}(G)$ denotes the pixel-level constraint between the generator's image blocks and the real image blocks. $D(x,y)$ denotes the discriminator's matching prediction for the image block pair $(x,y)$, $G(x,z)$ denotes the generator's output image block, and $D(x,G(x,z))$ denotes the discriminator's matching prediction for the pair $(x,G(x,z))$; the adversarial loss constrains the discriminator to correctly distinguish real image blocks from fake image blocks produced by the generator. Here $x$ denotes the image block to be converted, $y$ the target image block, $z$ the input noise data, and $\mathbb{E}$ the mathematical expectation; $x,y\sim p_{data}(x,y)$ means the pair $(x,y)$ obeys the data distribution $p_{data}(x,y)$, $z\sim p_z(z)$ means $z$ obeys the noise distribution $p_z(z)$, $\|\cdot\|_1$ denotes the L1 norm, and $\lambda$ denotes a constant coefficient.
Preferably, the feature extraction and matching network in step 4) is constructed according to the following steps:
(4a) constructing the feature extraction network: it comprises three convolutional layers, each followed by a 2 x 2 max pooling layer; the numbers of filters of the first to third convolutional layers are 32, 64 and 128, respectively, and the filter size of each convolutional layer is 3 x 3;
(4b) constructing the matching network: it comprises one convolutional layer and three fully connected layers; the convolutional layer is followed by a 2 x 2 max pooling layer, has 256 filters, and each filter is 3 x 3 in size; the numbers of nodes of the first to third fully connected layers are 512, 128 and 1, respectively.
Preferably, the feature extraction and matching network in step 5) is trained according to the following steps:
(5a) obtaining converted image blocks of the SAR image blocks: loading the generator weights obtained in step 3) into the generator network constructed in step 2), and then inputting the SAR image blocks of the image block pairs in the matching network training samples obtained in step 1) into the weight-loaded generator network to obtain the converted image blocks corresponding to the SAR image blocks, namely pseudo optical image blocks;
(5b) combining the optical image blocks of the image block pairs in the matching network training samples obtained in step 1) with the pseudo optical image blocks obtained in step (5a) into image block pairs, and training the feature extraction and matching network constructed in step 4) to obtain the training weights of the constructed feature extraction and matching network.
Preferably, in step 5), the feature extraction and matching network is trained according to the loss function of the following formula:

$$L=-\frac{1}{n}\sum_{i=1}^{n}\big[y_i\log\hat{y}_i+(1-y_i)\log(1-\hat{y}_i)\big]$$

where $y_i$ denotes the true matching label of the i-th image block pair, $\hat{y}_i$ denotes the matching probability predicted for the i-th image block pair by the feature extraction and matching network, and $n$ is the number of image block pairs.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a heterogeneous image block matching method based on image conversion, comprising the following steps: obtaining training and test samples, constructing an image conversion network, training the image conversion network, constructing a feature extraction and matching network, training the feature extraction and matching network, and predicting the matching result. The method solves the prior-art problems of large feature differences between heterogeneous images and inaccurate feature extraction, effectively reduces the matching difficulty, and improves the accuracy and robustness of heterogeneous image block matching.
Furthermore, the method uses a generative adversarial network to convert heterogeneous remote sensing images into homologous images for matching, which overcomes the prior-art problems of large feature differences and inaccurate feature extraction for heterogeneous images and reduces the matching difficulty. Meanwhile, the image conversion greatly reduces the influence of the speckle noise in SAR images on the matching result and finds the correlation among image features, overcoming the low matching accuracy and poor robustness of heterogeneous image matching in the prior art.
Furthermore, the invention uses a two-branch feature extraction network without shared parameters to obtain feature map information of the image blocks, which overcomes the prior-art problems that the feature extraction network cannot extract features in a targeted way for different images and that the extracted image features lose spatial information. Meanwhile, a neural network is used for matching prediction on the extracted image features, which overcomes the inaccuracy of computing distances on features in the prior art and improves the accuracy and robustness of image block matching.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 shows the SAR and optical images to be cut into blocks, where FIG. 2a is the optical image and FIG. 2b is the SAR image;
FIG. 3 shows the converted images produced by the generator of the invention, where FIG. 3a is a real SAR image block, FIG. 3b is a real optical image block, and FIG. 3c is a pseudo optical image block.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, the heterogeneous image block matching method based on image conversion provided by the invention comprises the following steps:
Step 1, obtaining training and test samples:
(1a) reading in the registered heterogeneous images, which comprise a SAR image and an optical image;
(1b) performing corner detection on the optical image with the Harris method, and cutting the optical image into blocks with the coordinates of each detected corner A as the center and a fixed value w as the side length to obtain optical image blocks; in this example, w is 32;
(1c) taking the feature point B at the corner coordinates on the SAR image and randomly selecting a feature point C on the SAR image; then cutting the SAR image with the coordinates of feature points B and C as centers and the fixed value w as side length to obtain two corresponding SAR image blocks; and combining each of the two SAR image blocks with the optical image block of corner A to obtain a group of positive-sample and negative-sample image block pairs of training data;
(1d) rotating the SAR image by angles in an arbitrary range and repeating step (1c); in this example, the SAR image is rotated to every integer angle in the range [-20°, 20°] in steps of 1°;
(1e) taking all the obtained positive samples as training samples for the generative adversarial network;
(1f) mixing all the obtained positive and negative samples, with 3/4 used as training samples for the matching network and the rest used as test samples for the matching network.
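To make step 1 concrete, here is a minimal sketch of the block-cutting and pair-building procedure using OpenCV and numpy. All function and variable names are illustrative assumptions rather than identifiers from the patent, the rotation step (1d) is omitted for brevity, and the first return value corresponds to the generative-adversarial-network training samples of step (1e).

```python
import cv2
import numpy as np

def make_sample_pairs(optical, sar, w=32, max_corners=500):
    """Cut w x w image block pairs around Harris corners of the optical image."""
    gray = optical if optical.ndim == 2 else cv2.cvtColor(optical, cv2.COLOR_BGR2GRAY)
    # Harris corner detection (step 1b); goodFeaturesToTrack applies the Harris
    # response when useHarrisDetector=True
    corners = cv2.goodFeaturesToTrack(gray, max_corners, qualityLevel=0.01,
                                      minDistance=w, useHarrisDetector=True)
    half = w // 2
    positives, negatives = [], []
    for x, y in corners.reshape(-1, 2).astype(int):
        if not (half <= y < gray.shape[0] - half and half <= x < gray.shape[1] - half):
            continue  # corner too close to the border to cut a full block
        opt_blk = optical[y - half:y + half, x - half:x + half]
        sar_blk = sar[y - half:y + half, x - half:x + half]    # point B (step 1c)
        ry = np.random.randint(half, sar.shape[0] - half)      # point C: random center
        rx = np.random.randint(half, sar.shape[1] - half)
        neg_blk = sar[ry - half:ry + half, rx - half:rx + half]
        positives.append((opt_blk, sar_blk, 1))   # matching pair, label 1
        negatives.append((opt_blk, neg_blk, 0))   # non-matching pair, label 0
    mixed = positives + negatives
    np.random.shuffle(mixed)
    split = int(0.75 * len(mixed))                # 3/4 train, 1/4 test (step 1f)
    return positives, mixed[:split], mixed[split:]
```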
Step 2, constructing the image conversion network, which comprises a generator network and a discriminator network:
(2a) constructing the generator network: it comprises three convolutional layers and three deconvolutional layers, with each of the three convolutional layers followed by a 2 x 2 max pooling layer; the numbers of filters of the first to third convolutional layers are 32, 64 and 128, respectively; the numbers of filters of the first to third deconvolutional layers are 64, 32 and 1, respectively; and the filter size of every layer is 3 x 3;
(2b) constructing the discriminator network: it comprises three convolutional layers and three fully connected layers, with each of the three convolutional layers followed by a 2 x 2 max pooling layer; the numbers of filters of the first to third convolutional layers are 32, 64 and 128, respectively, and the filter size of each convolutional layer is 3 x 3; the numbers of nodes of the first to third fully connected layers are 64, 32 and 1, respectively;
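For readability, the step-2 networks can be sketched in tf.keras as below. The original experiments ran on TensorFlow 1.2, so this modern Keras form, together with the activation choices, is an assumption rather than the patent's exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(w=32):
    """Three conv layers (32/64/128 filters, each followed by 2x2 max pooling)
    and three deconv layers (64/32/1 filters), all with 3x3 kernels."""
    inp = layers.Input((w, w, 1))
    x = inp
    for f in (32, 64, 128):
        x = layers.Conv2D(f, 3, padding='same', activation='relu')(x)
        x = layers.MaxPooling2D(2)(x)
    for f in (64, 32):
        x = layers.Conv2DTranspose(f, 3, strides=2, padding='same',
                                   activation='relu')(x)
    x = layers.Conv2DTranspose(1, 3, strides=2, padding='same',
                               activation='tanh')(x)
    return tf.keras.Model(inp, x, name='generator')

def build_discriminator(w=32):
    """Three conv layers (32/64/128 filters + 2x2 pooling) and three dense
    layers (64/32/1 nodes); the input pair is stacked on the channel axis."""
    inp = layers.Input((w, w, 2))
    x = inp
    for f in (32, 64, 128):
        x = layers.Conv2D(f, 3, padding='same', activation='relu')(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    for n, act in ((64, 'relu'), (32, 'relu'), (1, 'sigmoid')):
        x = layers.Dense(n, activation=act)(x)
    return tf.keras.Model(inp, x, name='discriminator')
```

With w = 32, the generator downsamples to a 4 x 4 x 128 feature map and the three deconvolution layers restore the 32 x 32 x 1 block size.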
step 3, training the image conversion network: training the image conversion network constructed in the step (2) by using the training sample for generating the countermeasure network obtained in the step (1e) to obtain the weight corresponding to the generator network;
in the embodiment of the invention, a countermeasure network is generated according to the loss function training of the following formula:
Figure BDA0001613843120000071
Figure BDA0001613843120000072
Figure BDA0001613843120000073
wherein L iscGAN(G, D) represents the competing loss constraints of the generator and the arbiter,
Figure BDA0001613843120000074
representing pixel-level constraints between image blocks of the generator and real image blocks; d (x, y) represents the matching prediction of the image block pair (x, y) by the discriminator, G (x, z) represents the output image block of the generator, D (x, G (x, z)) represents the matching prediction of the image block pair (x, G (x, z)) by the discriminator, and the anti-loss constraint is that the discriminator correctly distinguishes a real image block from a false image block generated by the generator, wherein x represents the image block to be converted, y represents the target image block, z represents the input noise data, E represents the mathematical expectation, and x, y-pdata(x, y) denotes that the variable (x, y) obeys the data distribution pdata(x,y),||*||1Denotes a norm, λ denotes a constant coefficient, z to pz(z) denotes that the variable z obeys the data distribution pz (z).
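Under the same tf.keras assumption, the two loss terms can be written as below; the value lam = 100 follows the common pix2pix choice and is an illustrative assumption, since the text only states that λ is a constant coefficient.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def generator_loss(d_fake, fake_blocks, real_blocks, lam=100.0):
    # adversarial term: push the discriminator to call generated pairs "real" (1)
    adv = bce(tf.ones_like(d_fake), d_fake)
    # L1 term: pixel-level constraint between generated and real target blocks
    l1 = tf.reduce_mean(tf.abs(real_blocks - fake_blocks))
    return adv + lam * l1

def discriminator_loss(d_real, d_fake):
    # the discriminator should label real pairs 1 and generated pairs 0
    return bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
```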
Step 4, constructing the feature extraction and matching network:
(4a) constructing the feature extraction network: it comprises three convolutional layers, each followed by a 2 x 2 max pooling layer; the numbers of filters of the first to third convolutional layers are 32, 64 and 128, respectively, and the filter size of each convolutional layer is 3 x 3;
(4b) constructing the matching network: it comprises one convolutional layer and three fully connected layers; the convolutional layer is followed by a 2 x 2 max pooling layer, has 256 filters, and each filter is 3 x 3 in size; the numbers of nodes of the first to third fully connected layers are 512, 128 and 1, respectively;
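A sketch of the step-4 network in the same assumed tf.keras style follows. The two feature branches are deliberately built as separate models, so no parameters are shared between the optical branch and the pseudo-optical branch, and the branch outputs remain feature maps rather than vectors.

```python
import tensorflow as tf
from tensorflow.keras import layers

def feature_branch(w, name):
    """Three conv layers (32/64/128 filters, 3x3), each followed by 2x2 pooling;
    the output stays a spatial feature map."""
    inp = layers.Input((w, w, 1))
    x = inp
    for f in (32, 64, 128):
        x = layers.Conv2D(f, 3, padding='same', activation='relu')(x)
        x = layers.MaxPooling2D(2)(x)
    return tf.keras.Model(inp, x, name=name)

def build_matcher(w=32):
    opt_in, fake_in = layers.Input((w, w, 1)), layers.Input((w, w, 1))
    # two separate branches: no weight sharing, so each modality gets its own filters
    f_opt = feature_branch(w, 'branch_optical')(opt_in)
    f_fake = feature_branch(w, 'branch_pseudo_optical')(fake_in)
    x = layers.Concatenate()([f_opt, f_fake])      # 4 x 4 x 256 for w = 32
    x = layers.Conv2D(256, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    for n, act in ((512, 'relu'), (128, 'relu'), (1, 'sigmoid')):
        x = layers.Dense(n, activation=act)(x)
    return tf.keras.Model([opt_in, fake_in], x, name='match_net')
```

Concatenating the two feature maps before the matching convolution keeps their spatial layout, which is what the method relies on to avoid the spatial-information loss of feature-vector approaches.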
step 5, training a feature extraction and matching network:
(5a) obtaining a conversion image block of the SAR image block: loading the generator weight obtained in the step (3) into the generator network constructed in the step (2a), and then inputting the SAR image block of the image block pair in the matching network training sample obtained in the step (1f) into the generator network loaded with the weight to obtain a conversion image block corresponding to the SAR image block, namely a pseudo-optical image block; in this embodiment, fig. 3(a) is a real SAR image block, and a pseudo optical image block as shown in fig. 3(c) is finally obtained, where the real SAR image block is shown in fig. 3 (b);
(5b) combining the optical image blocks of the image block pairs in the matching network training sample obtained in the step (1f) and the pseudo optical image blocks obtained in the step (5a) into image block pairs, and training the feature extraction and matching network constructed in the step (4) to obtain training weights of the constructed feature extraction and matching network;
in the embodiment of the invention, a loss function training characteristic extraction and matching network is as follows:
Figure BDA0001613843120000081
wherein, yiA true match label representing the ith image block pair,
Figure BDA0001613843120000091
and the prediction matching probability obtained by the ith image block pair through the feature extraction and matching network is shown, and n is the number of the image block pairs.
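For reference, a plain numpy sketch of this loss, assuming y_true and y_pred are arrays of true labels and predicted probabilities:

```python
import numpy as np

def match_loss(y_true, y_pred, eps=1e-7):
    # clip to avoid log(0) when predictions saturate at 0 or 1
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```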
Step 6, predicting the matching result:
(6a) converting the SAR image blocks in the matching network test samples obtained in step (1f) by the same method as in step (5a) to obtain pseudo optical image blocks of the test samples;
(6b) loading the feature extraction and matching network weights obtained in step (5b) into the network constructed in step 4; combining the optical image blocks of the image block pairs in the matching network test samples obtained in step (1f) with the pseudo optical image blocks obtained in step (6a) into image block pairs; inputting the image block pairs into the weight-loaded feature extraction and matching network to predict the matching result, outputting the predicted matching probability, in the range [0, 1], that each image block pair shares a central point; and calculating the matching precision of the network from the predicted matching results.
In the embodiment of the invention, the image blocks to be matched are input into the network to predict the matching result, yielding a matching label of 1 or 0, where 1 indicates a match and 0 indicates a non-match.
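A sketch of this prediction step, assuming the generator and match_net models from the earlier sketches, hypothetical test arrays opt_test and sar_test, and ground-truth labels true_labels; the 0.5 decision threshold is an assumption consistent with the stated probability range [0, 1].

```python
# convert SAR test blocks to pseudo-optical blocks, then predict matches
fake_opt = generator.predict(sar_test)
probs = match_net.predict([opt_test, fake_opt]).ravel()
pred_labels = (probs >= 0.5).astype(int)     # 1 = match, 0 = non-match
accuracy = float((pred_labels == true_labels).mean())
print('matching accuracy: %.4f' % accuracy)
```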
The effect of the present invention will be further described with reference to simulation experiments.
1. Simulation experiment conditions:
The hardware platform of the simulation experiments is a Dell computer with an Intel(R) Core i5 processor (clock speed 3.20 GHz) and 64 GB of memory; the software platform is Python 3.5 with TensorFlow 1.2.
2. Experimental content and result analysis:
Experimental image data: figs. 2(a) and 2(b) are cut from the optical image and the SAR image of the Yellow River estuary, respectively, each of size 940 x 470; the registered heterogeneous images are cut into blocks to obtain the training and test samples.
The experiments compare the method of the invention with the prior art, namely the method based on a generative adversarial network and normalized cross-correlation matching and the method based on neural network image block feature extraction and matching. Each method is trained with the same training set samples and then evaluated with the same test set samples. The evaluation results are shown in Table 1, where Alg1 denotes the method of the invention, Alg2 denotes the neural-network-based image block feature extraction and matching method, and Alg3 denotes the method based on a generative adversarial network and normalized cross-correlation matching.
TABLE 1 accuracy of three network simulation experiment test sets
[Table 1 is provided as an image in the original publication]
As can be seen from Table 1, the methods that use a network for matching prediction on the extracted image block features achieve 13% to 15% higher accuracy than the method based on normalized cross-correlation matching, indicating that the matching-network approach has better feature matching performance. Compared with the neural-network-based image block feature extraction and matching method, the method of the invention converges faster, already reaching high accuracy by the 20th iteration, and attains higher final matching accuracy. This shows that converting heterogeneous images into homologous images, as proposed by the invention, markedly reduces the difficulty of heterogeneous image block matching while providing better feature extraction and matching performance.

Claims (5)

1. A heterogeneous image block matching method based on image conversion, characterized by comprising the following steps:
step 1), obtaining training samples from the registered optical image and SAR image: training samples for the generative adversarial network and training samples for the matching network;
step 2), constructing an image conversion network, which comprises a generator network and a discriminator network;
step 3), training the image conversion network: training the image conversion network constructed in step 2) with the generative-adversarial-network training samples obtained in step 1) to obtain the weights of the generator network;
step 4), constructing a feature extraction and matching network;
step 5), training the feature extraction and matching network to obtain the training weights of the constructed feature extraction and matching network;
step 6), predicting the matching result: first, converting the SAR image blocks in the matching network test samples obtained in step 1) to obtain pseudo optical image blocks of the test samples; second, loading the feature extraction and matching network weights obtained in step 5) into the network constructed in step 4), combining the optical image blocks and pseudo optical image blocks of the matching network test samples obtained in step 1) into image block pairs, inputting the image block pairs into the weight-loaded feature extraction and matching network for matching prediction, outputting the predicted matching probability that each image block pair shares a central point, and calculating the matching precision of the network from the predicted matching results;
wherein in step 4), the feature extraction and matching network is constructed according to the following steps:
(4a) constructing the feature extraction network: it comprises three convolutional layers, each followed by a 2 x 2 max pooling layer; the numbers of filters of the first to third convolutional layers are 32, 64 and 128, respectively, and the filter size of each convolutional layer is 3 x 3;
(4b) constructing the matching network: it comprises one convolutional layer and three fully connected layers; the convolutional layer is followed by a 2 x 2 max pooling layer, has 256 filters, and each filter is 3 x 3 in size; the numbers of nodes of the first to third fully connected layers are 512, 128 and 1, respectively;
and wherein in step 5), the feature extraction and matching network is trained according to the following steps:
(5a) obtaining converted image blocks of the SAR image blocks: loading the generator weights obtained in step 3) into the generator network constructed in step 2), and then inputting the SAR image blocks of the image block pairs in the matching network training samples obtained in step 1) into the weight-loaded generator network to obtain the converted image blocks corresponding to the SAR image blocks, namely pseudo optical image blocks;
(5b) combining the optical image blocks of the image block pairs in the matching network training samples obtained in step 1) with the pseudo optical image blocks obtained in step (5a) into image block pairs, and training the feature extraction and matching network constructed in step 4) to obtain the training weights of the constructed feature extraction and matching network.
2. The heterogeneous image block matching method based on image conversion according to claim 1, wherein the training samples in step 1) are obtained as follows:
(1a) reading in the registered heterogeneous images, which comprise a SAR image and an optical image;
(1b) performing corner detection on the optical image with the Harris method, and cutting the optical image into blocks with the coordinates of each detected corner A as the center and a fixed value w as the side length to obtain optical image blocks;
(1c) taking the feature point B at the corner coordinates on the SAR image and randomly selecting a feature point C on the SAR image; then cutting the SAR image with the coordinates of feature points B and C as centers and the fixed value w as side length to obtain two corresponding SAR image blocks; and combining each of the two SAR image blocks with the optical image block of corner A to obtain a positive-sample image block pair and a negative-sample image block pair of training data;
(1d) rotating the SAR image by angles in an arbitrary range and repeating step (1c);
(1e) taking all the obtained positive samples as training samples for the generative adversarial network;
(1f) mixing all the obtained positive and negative samples, with 3/4 of the mixed samples used as training samples for the matching network and 1/4 used as test samples for the matching network.
3. The heterogeneous image block matching method based on image conversion according to claim 1, wherein in step 2) the image conversion network is constructed according to the following steps:
(2a) constructing the generator network: it comprises three convolutional layers and three deconvolutional layers, with each of the three convolutional layers followed by a 2 x 2 max pooling layer; the numbers of filters of the first to third convolutional layers are 32, 64 and 128, respectively; the numbers of filters of the first to third deconvolutional layers are 64, 32 and 1, respectively; and the filter size of every layer is 3 x 3;
(2b) constructing the discriminator network: it comprises three convolutional layers and three fully connected layers, with each of the three convolutional layers followed by a 2 x 2 max pooling layer; the numbers of filters of the first to third convolutional layers are 32, 64 and 128, respectively, and the filter size of each convolutional layer is 3 x 3; the numbers of nodes of the first to third fully connected layers are 64, 32 and 1, respectively.
4. The heterogeneous image block matching method based on image conversion according to claim 1, wherein in step 3) the generative adversarial network is trained according to the loss functions of the following formulas:

$$L_{cGAN}(G,D)=\mathbb{E}_{x,y\sim p_{data}(x,y)}[\log D(x,y)]+\mathbb{E}_{x\sim p_{data}(x),\,z\sim p_z(z)}[\log(1-D(x,G(x,z)))]$$

$$L_{L1}(G)=\mathbb{E}_{x,y\sim p_{data}(x,y),\,z\sim p_z(z)}\big[\|y-G(x,z)\|_1\big]$$

$$G^{*}=\arg\min_{G}\max_{D}\,L_{cGAN}(G,D)+\lambda L_{L1}(G)$$

where $L_{cGAN}(G,D)$ denotes the adversarial loss constraint between the generator and the discriminator, and $L_{L1}(G)$ denotes the pixel-level constraint between the generator's image blocks and the real image blocks. $D(x,y)$ denotes the discriminator's matching prediction for the image block pair $(x,y)$, $G(x,z)$ denotes the generator's output image block, and $D(x,G(x,z))$ denotes the discriminator's matching prediction for the pair $(x,G(x,z))$; the adversarial loss constrains the discriminator to correctly distinguish real image blocks from fake image blocks produced by the generator. Here $x$ denotes the image block to be converted, $y$ the target image block, $z$ the input noise data, and $\mathbb{E}$ the mathematical expectation; $x,y\sim p_{data}(x,y)$ means the pair $(x,y)$ obeys the data distribution $p_{data}(x,y)$, $z\sim p_z(z)$ means $z$ obeys the noise distribution $p_z(z)$, $\|\cdot\|_1$ denotes the L1 norm, and $\lambda$ denotes a constant coefficient.
5. The heterogeneous image block matching method based on image conversion according to claim 1, wherein in step 5) the feature extraction and matching network is trained according to the loss function of the following formula:

$$L=-\frac{1}{n}\sum_{i=1}^{n}\big[y_i\log\hat{y}_i+(1-y_i)\log(1-\hat{y}_i)\big]$$

where $y_i$ denotes the true matching label of the i-th image block pair, $\hat{y}_i$ denotes the matching probability predicted for the i-th image block pair by the feature extraction and matching network, and $n$ is the number of image block pairs.
CN201810276986.3A 2018-03-30 2018-03-30 Heterogeneous image block matching method based on image conversion Active CN108564606B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810276986.3A CN108564606B (en) 2018-03-30 2018-03-30 Heterogeneous image block matching method based on image conversion


Publications (2)

Publication Number Publication Date
CN108564606A CN108564606A (en) 2018-09-21
CN108564606B (en) 2022-06-24

Family

ID=63533550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810276986.3A Active CN108564606B (en) 2018-03-30 2018-03-30 Heterogeneous image block matching method based on image conversion

Country Status (1)

Country Link
CN (1) CN108564606B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109636742B (en) * 2018-11-23 2020-09-22 中国人民解放军空军研究院航空兵研究所 Mode conversion method of SAR image and visible light image based on countermeasure generation network
CN111238524B (en) * 2018-11-28 2021-12-14 驭势科技(北京)有限公司 Visual positioning method and device
CN109801317A (en) * 2018-12-29 2019-05-24 天津大学 The image matching method of feature extraction is carried out based on convolutional neural networks
CN109903299B (en) * 2019-04-02 2021-01-05 中国矿业大学 Registration method and device for heterogenous remote sensing image of conditional generation countermeasure network
CN109993782B (en) * 2019-04-02 2020-12-04 中国矿业大学 Heterogeneous remote sensing image registration method and device for ring-shaped generation countermeasure network
CN109978897B (en) * 2019-04-09 2020-05-08 中国矿业大学 Registration method and device for heterogeneous remote sensing images of multi-scale generation countermeasure network
CN110689060B (en) * 2019-09-16 2022-01-28 西安电子科技大学 Heterogeneous image matching method based on aggregation feature difference learning network
CN111462012A (en) * 2020-04-02 2020-07-28 武汉大学 SAR image simulation method for generating countermeasure network based on conditions
CN112926534B (en) * 2021-04-02 2023-04-28 北京理工大学重庆创新中心 SAR graphics ship target detection method based on transform domain information fusion
CN113221923B (en) * 2021-05-31 2023-02-24 西安电子科技大学 Feature decomposition method and system for multi-mode image block matching
CN114565653B (en) * 2022-03-02 2023-07-21 哈尔滨工业大学 Heterologous remote sensing image matching method with rotation change and scale difference
CN115830462B (en) * 2023-02-24 2023-04-28 中国人民解放军国防科技大学 SAR image reconstruction method and device based on cyclic consistency countermeasure network


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101398901B * 2008-10-31 2012-04-11 China National Aeronautical Radio Electronics Research Institute Rapid image matching method for auxiliary navigation
US9483816B2 (en) * 2013-09-03 2016-11-01 Litel Instruments Method and system for high accuracy and reliability registration of multi modal imagery
US10445616B2 (en) * 2015-01-22 2019-10-15 Bae Systems Information And Electronic Systems Integration Inc. Enhanced phase correlation for image registration
CN105184801B * 2015-09-28 2018-01-23 Wuhan University High-precision optical and SAR image registration method based on a multi-level strategy

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729852A * 2014-01-09 2014-04-16 Beihang University Double-bright-line-based registration method for oil depot SAR (Synthetic Aperture Radar) and optical images
CN105321172A * 2015-08-31 2016-02-10 Harbin Institute of Technology SAR, infrared and visible light image fusion method
CN105809693A * 2016-03-10 2016-07-27 Xidian University SAR image registration method based on deep neural networks
CN106874956A * 2017-02-27 2017-06-20 Shaanxi Normal University Construction method of a convolutional neural network structure for image classification
CN107292922A * 2017-06-23 2017-10-24 University of Electronic Science and Technology of China Method for registering optical and synthetic aperture radar images
CN107480701A * 2017-07-19 2017-12-15 Tongji University Optical image and radar image matching method based on multichannel convolutional neural networks
CN107563428A * 2017-08-25 2018-01-09 Xidian University Polarimetric SAR image classification method based on generative adversarial networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Nina Merkle et al.; "Exploiting Deep Matching and SAR Data for the Geo-Localization Accuracy Improvement of Optical Satellite Images"; Remote Sensing; 2017-06-10; pp. 1-18 *
Nina Merkle et al.; "Exploring the Potential of Conditional Adversarial Networks for Optical and SAR Image Matching"; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; 2018-03-29; vol. 11, no. 6; pp. 1811-1820 *
Phillip Isola et al.; "Image-to-Image Translation with Conditional Adversarial Networks"; Computer Vision and Pattern Recognition; 2017-11-22; pp. 1125-1134 *
Quan Dou et al.; "Using deep neural networks for synthetic..."; 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS); 2016-11-03; pp. 2799-2802 *

Also Published As

Publication number Publication date
CN108564606A (en) 2018-09-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant