CN114494372B - Remote sensing image registration method based on unsupervised deep learning - Google Patents
Remote sensing image registration method based on unsupervised deep learning
- Publication number
- CN114494372B CN114494372B CN202210026370.7A CN202210026370A CN114494372B CN 114494372 B CN114494372 B CN 114494372B CN 202210026370 A CN202210026370 A CN 202210026370A CN 114494372 B CN114494372 B CN 114494372B
- Authority
- CN
- China
- Prior art keywords
- image
- scale
- model network
- corrected
- transformation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 34
- 238000013135 deep learning Methods 0.000 title claims abstract description 17
- 230000009466 transformation Effects 0.000 claims abstract description 113
- 230000006870 function Effects 0.000 claims abstract description 49
- 238000012549 training Methods 0.000 claims abstract description 20
- 238000000605 extraction Methods 0.000 claims abstract description 16
- 238000011524 similarity measure Methods 0.000 claims abstract description 14
- 239000011159 matrix material Substances 0.000 claims description 23
- 238000012937 correction Methods 0.000 claims description 14
- 230000001131 transforming effect Effects 0.000 claims description 3
- 238000005457 optimization Methods 0.000 abstract description 2
- 230000003287 optical effect Effects 0.000 description 7
- 238000004364 calculation method Methods 0.000 description 4
- 238000013527 convolutional neural network Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 3
- 238000013519 translation Methods 0.000 description 3
- 238000002474 experimental method Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012952 Resampling Methods 0.000 description 1
- 230000004913 activation Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000006073 displacement reaction Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 230000001681 protective effect Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a remote sensing image registration method based on unsupervised deep learning, which casts image registration as regression optimization and can integrate feature extraction networks, image similarity measures and feature descriptors of various forms and parameters. The method uses model networks to extract depth features of the images to be registered at multiple scales, obtains geometric transformation parameters through parameter regression, and geometrically corrects the images with these parameters, thereby realizing coarse-to-fine multi-scale progressive registration. No registration ground truth is required as training samples: loss functions based on inter-image similarity measures and feature descriptors are constructed and jointly trained over the multiple scales, the parameters of each model network are updated by back propagation, and the geometric transformation parameters are optimized, achieving high-precision, high-robustness multi-source remote sensing image registration.
Description
Technical Field
The invention belongs to the technical field of remote sensing, and particularly relates to a design of a remote sensing image registration method based on unsupervised deep learning.
Background
With the rapid development of aerospace and remote sensing technologies, the means of acquiring remote sensing images keep increasing and the image types keep diversifying. Because sensors differ in equipment technology and imaging mechanism, a remote sensing image from a single data source can hardly reflect ground-object characteristics comprehensively. To make full use of multi-source remote sensing data acquired by different types of sensors and to achieve data integration and information complementarity, the multi-source remote sensing images must first be registered.
Multi-source remote sensing image registration is the process of aligning and superimposing multi-sensor remote sensing images of the same area acquired at different times, from different viewing angles or by different sensors, so that corresponding (same-name) points on the aligned images share the same geographic coordinates. Existing methods for registering multi-source remote sensing images fall into traditional methods, which do not use deep learning, and deep-learning-based methods. Traditional methods are based on features or region templates and rely on manually designed features, which typically have to be redesigned for remote sensing images of different modalities from different sensors. Deep-learning-based methods extract deep features from the multi-source remote sensing images and generalize better than handcrafted features. However, current supervised deep-learning methods require a large number of samples with ground-truth labels as training data, and such large labeled datasets are not yet available in the remote sensing field, so cost factors limit the practical application of these methods.
Disclosure of Invention
The invention aims to solve the problem that large numbers of labeled training samples are difficult to obtain for existing remote sensing image registration methods based on supervised deep learning, and provides a remote sensing image registration method based on unsupervised deep learning that achieves accurate registration between remote sensing images without any registration ground truth for training.
The technical scheme of the invention is as follows: a remote sensing image registration method based on unsupervised deep learning comprises the following steps:
S1, a multi-source remote sensing image registration data set comprising two groups of image data is established, the images of the two groups corresponding to each other one by one, wherein one group of image data is used as a reference image data set and the other group is used as the image data set to be corrected.
S2, selecting a reference image f from the reference image data set, selecting the image m to be corrected corresponding to the reference image f from the image data set to be corrected, and taking the reference image f and the image m to be corrected as the end-to-end input on a training sample.
S3, calculating, on 3 scales, the transformation parameters μ1, μ2, μ3 of the image on the model network of each scale, gradually correcting the image m to be corrected to generate corrected images m1, m2, m3, back-propagating the loss function of the model network of each scale, and taking the corrected image m3 and the transformation parameter μ3 as the end-to-end output on the training sample.
S4, respectively initializing the model network parameters of the 3 scales.
S5, performing joint training on the model networks of the 3 scales in an end-to-end mode, and optimizing the joint loss function over the 3 scales.
S6, searching, through a deep learning optimizer, the direction in which the joint loss function value decreases most rapidly, carrying out back propagation on the model networks along that direction, iteratively updating the model network parameters, storing the network model parameters at the moment when the joint loss function has decreased to a preset threshold and converged, and outputting the registered reference image f and corrected image m3.
Further, step S3 includes the following sub-steps:
S3-1, inputting the reference image f and the image m to be corrected into the model network of the 1st scale to obtain the transformation parameter μ1 of the 1st scale.
S3-2, using the transformation parameter μ1 to geometrically correct the image m to be corrected and generate the corrected image m1.
S3-3, calculating the loss function of the model network of the 1st scale.
S3-4, inputting the reference image f and the corrected image m1 into the model network of the 2nd scale to obtain the residual Δμ1 of the transformation parameters, and combining it with the transformation parameter μ1 to obtain the 2nd-scale transformation parameter μ2.
S3-5, using the transformation parameter μ2 to geometrically correct the corrected image m1 and generate the corrected image m2.
S3-6, calculating the loss function of the model network of the 2nd scale.
S3-7, inputting the reference image f and the corrected image m2 into the model network of the 3rd scale to obtain the residual Δμ2 of the transformation parameters, and combining it with the transformation parameter μ2 to obtain the 3rd-scale transformation parameter μ3.
S3-8, using the transformation parameter μ3 to geometrically correct the corrected image m2 and generate the corrected image m3.
S3-9, calculating the loss function of the model network of the 3rd scale.
S3-10, taking the corrected image m3 and the transformation parameter μ3 as the end-to-end output on the training sample.
Further, step S3-1 includes the following sub-steps:
S3-1-1, downsampling the reference image f and the image m to be corrected to 1/4 of their original size respectively, and stacking the two downsampled images in the channel direction to generate a stacked image.
S3-1-2, inputting the stacked image into the feature extraction part of the model network of the 1st scale to generate depth features.
S3-1-3, passing the depth features through the parameter regression part of the model network of the 1st scale to obtain the transformation parameter μ1 of the 1st scale.
Further, step S3-2 includes the following sub-steps:
S3-2-1, composing the geometric transformation matrix T_μ1 from the transformation parameter μ1.
S3-2-2, geometrically transforming the image m to be corrected through the geometric transformation matrix T_μ1 to generate the corrected image m1.
Further, step S3-4 includes the following sub-steps:
S3-4-1, downsampling the reference image f and the corrected image m1 to 1/2 of their original size respectively, and stacking the two downsampled images in the channel direction to generate a stacked image.
S3-4-2, inputting the stacked image into the feature extraction part of the model network of the 2nd scale to generate depth features.
S3-4-3, passing the depth features through the parameter regression part of the model network of the 2nd scale to obtain the residual Δμ1 of the transformation parameters.
S3-4-4, combining the residual Δμ1 with the transformation parameter μ1 to obtain the 2nd-scale transformation parameter μ2.
Further, step S3-5 includes the following sub-steps:
S3-5-1, composing the geometric transformation matrix T_μ2 from the transformation parameter μ2.
S3-5-2, geometrically transforming the corrected image m1 through the geometric transformation matrix T_μ2 to generate the corrected image m2.
Further, step S3-7 includes the following sub-steps:
S3-7-1, stacking the reference image f and the corrected image m2 in the channel direction to generate a stacked image.
S3-7-2, inputting the stacked image into the feature extraction part of the model network of the 3rd scale to generate depth features.
S3-7-3, passing the depth features through the parameter regression part of the model network of the 3rd scale to obtain the residual Δμ2 of the transformation parameters.
S3-7-4, combining the residual Δμ2 with the transformation parameter μ2 to obtain the 3rd-scale transformation parameter μ3.
Further, step S3-8 includes the following sub-steps:
S3-8-1, composing the geometric transformation matrix T_μ3 from the transformation parameter μ3.
S3-8-2, geometrically transforming the corrected image m2 through the geometric transformation matrix T_μ3 to generate the corrected image m3.
Further, the loss function Loss_sim(f, m, μ1) of the model network of the 1st scale in step S3-3 is constructed from the similarity measure between the reference image f and the corrected image m1 = T_μ1(m).
The loss function Loss_sim(f, m1, μ2) of the model network of the 2nd scale in step S3-6 is constructed from the similarity measure between the reference image f and the corrected image m2 = T_μ2(m1).
The loss function Loss_sim(f, m2, μ3) of the model network of the 3rd scale in step S3-9 is constructed from the similarity measure between the reference image f and the corrected image m3 = T_μ3(m2).
The joint loss function Loss in step S5 is:
Loss = λ1 × Loss_sim(f, m, μ1) + λ2 × Loss_sim(f, m1, μ2) + λ3 × Loss_sim(f, m2, μ3)
where Sim(·) denotes the similarity measure and λ1, λ2, λ3 are the weight factors of the loss functions of the model networks of the respective scales.
Further, step S4 includes the following sub-steps:
S4-1, training the model network of the 1st scale by minimizing the loss function Loss_sim(f, m, μ1).
S4-2, fixing the parameters of the model network of the 1st scale and training the model network of the 2nd scale by minimizing the loss function Loss_sim(f, m1, μ2).
S4-3, fixing the parameters of the model networks of the 1st and 2nd scales and training the model network of the 3rd scale by minimizing the loss function Loss_sim(f, m2, μ3).
The beneficial effects of the invention are as follows:
(1) The invention converts image registration into regression optimization, can integrate feature extraction networks, image similarity measures and feature descriptors of various forms and parameters, and realizes accurate multi-scale image registration that is learned in a completely unsupervised manner and mapped end to end.
(2) According to the invention, the depth characteristics of the images to be registered are extracted on a plurality of scales by using a model network, geometric transformation parameters are obtained through parameter regression, the images are geometrically corrected by using the parameters, and the multi-scale gradual registration of the images from coarse to fine is realized.
(3) The invention requires no registration ground truth as training samples; by constructing loss functions based on inter-image similarity measures and feature descriptors, jointly training the loss functions over multiple scales, updating the parameters of each model network through back propagation and optimizing the geometric transformation parameters, it realizes high-precision and high-robustness multi-source remote sensing image registration.
Drawings
Fig. 1 is a flowchart of a remote sensing image registration method based on unsupervised deep learning according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a reference image, an image to be corrected, and a corrected image according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an overall frame of a remote sensing image registration method according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a model network 1 according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of calculating a similarity measure of a multisource remote sensing image according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It is to be understood that the embodiments shown and described in the drawings are merely illustrative of the principles and spirit of the invention and are not intended to limit the scope of the invention.
The embodiment of the invention provides a remote sensing image registration method based on unsupervised deep learning, which is shown in fig. 1 and comprises the following steps S1 to S6:
S1, a multi-source remote sensing image registration data set comprising two groups of image data is established, the images of the two groups corresponding to each other one by one, wherein one group of image data is used as the reference image data set and the other group is used as the image data set to be corrected.
In the embodiment of the present invention, the image to be corrected in the image data set to be corrected should be an image with geometric distortion and overlapping with the ground feature information contained in the reference image to a certain extent (in the embodiment of the present invention, greater than or equal to 70%).
In one embodiment of the present invention, step S1 is further described by taking the registration of an optical image with a synthetic aperture radar (Synthetic Aperture Radar, SAR) image as an example. As shown in fig. 2, an image a with a fixed resolution is used as the reference image, an image b that overlaps a partial region of image a and carries geometric distortion is used as the image to be corrected, and after registration and correction by the registration method provided by the invention, an image c aligned pixel by pixel with the overlapping region of image a is obtained. The multi-source remote sensing image dataset comprises many pairs of regional images similar to images a and b. It should be understood that other embodiments of the present invention include, but are not limited to, registration of multi-source optical images, registration of optical images with infrared images, registration of optical images with LiDAR (Light Detection and Ranging) intensity and elevation images, and registration of optical images with grid maps; registration with the method provided by the invention in such cases also falls within the protection scope of the present invention.
S2, selecting a reference image f from the reference image data set, selecting an image m to be corrected corresponding to the reference image f from the image data set to be corrected, and taking the reference image f and the image m to be corrected as end-to-end input on a training sample.
S3, calculating, on 3 scales, the transformation parameters μ1, μ2, μ3 of the image on the model network of each scale, gradually correcting the image m to be corrected to generate corrected images m1, m2, m3, back-propagating the loss function of the model network of each scale, and taking the corrected image m3 and the transformation parameter μ3 as the end-to-end output on the training sample.
The embodiment of the invention adopts a coarse-to-fine multi-scale matching strategy: the model networks on the 3 scales are jointly trained in an end-to-end framework to predict the transformation parameters and their residuals, thereby realizing accurate registration of the images. The end-to-end framework means that, in the embodiment of the invention, the reference image f and the image m to be corrected are taken as input and the corrected image m3 and the transformation parameter μ3 are produced as output, which together constitute an end-to-end mapping.
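For illustration only, the following is a minimal sketch of this coarse-to-fine, three-scale forward pass. A PyTorch implementation is assumed (the patent does not name a framework); net1, net2 and net3 stand for the three model networks, and warp_affine and compose are the geometric-correction and parameter-combination helpers sketched in the later steps.

```python
import torch
import torch.nn.functional as F

def multiscale_forward(f, m, net1, net2, net3, warp_affine, compose):
    """Coarse-to-fine forward pass over 3 scales: (f, m) -> (m3, mu3).
    f, m are (B, C, H, W) tensors for the reference image and the image
    to be corrected."""
    # Scale 1: work at 1/4 resolution and regress the initial parameters mu1.
    x1 = torch.cat([F.interpolate(f, scale_factor=0.25, mode='bilinear', align_corners=False),
                    F.interpolate(m, scale_factor=0.25, mode='bilinear', align_corners=False)], dim=1)
    mu1 = net1(x1)                      # (B, 2, 3) affine parameters
    m1 = warp_affine(m, mu1)            # corrected image m1

    # Scale 2: work at 1/2 resolution, regress the residual and combine it with mu1.
    x2 = torch.cat([F.interpolate(f, scale_factor=0.5, mode='bilinear', align_corners=False),
                    F.interpolate(m1, scale_factor=0.5, mode='bilinear', align_corners=False)], dim=1)
    mu2 = compose(mu1, net2(x2))        # mu2 = mu1 * d_mu1
    m2 = warp_affine(m1, mu2)           # corrected image m2

    # Scale 3: work at full resolution, regress the residual and combine it with mu2.
    x3 = torch.cat([f, m2], dim=1)
    mu3 = compose(mu2, net3(x3))        # mu3 = mu2 * d_mu2
    m3 = warp_affine(m2, mu3)           # corrected image m3
    return (m1, mu1), (m2, mu2), (m3, mu3)
```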
As shown in fig. 3, step S3 includes the following substeps S3-1 to S3-10:
S3-1, inputting the reference image f and the image m to be corrected into the model network of the 1st scale (abbreviated in this embodiment as "model network 1"; the 1st scale is abbreviated as "scale 1") to obtain the transformation parameter μ1 of the 1st scale.
Step S3-1 includes the following substeps S3-1-1 to S3-1-3:
S3-1-1, downsampling the reference image f and the image m to be corrected to 1/4 of their original size respectively, and stacking the two downsampled images in the channel direction to generate a stacked image.
In the embodiment of the present invention, the size of the reference image f is fixed; if the size of the image m to be corrected is inconsistent with that of the reference image f, the image m to be corrected is adjusted to the size of the reference image f by zero padding or cropping.
S3-1-2, inputting the stacked image into the feature extraction part of the model network of the 1st scale to generate depth features.
In one embodiment of the invention, as shown in fig. 4, the feature extraction part of the model network 1 consists of k groups of interconnected convolution blocks and downsampling layers; each convolution block comprises a convolution layer, a local response normalization layer and a linear-unit activation function layer, and each downsampling layer reduces the image resolution to 1/2 of its original value. Experiments show that reasonably choosing the value of k so that the size of the feature map generated by the last convolution block lies within [4, 7], with the number of convolution kernel channels of the convolution layers set to 1/4 of the size of the (downsampled) image to be corrected, is favourable for generating more accurate transformation parameters μ1 in the subsequent steps. In the embodiment of the present invention, if the size of the reference image f and the image m to be corrected is 512×512, the 1/4-downsampled images have size 128×128; the number of convolution kernel channels of each convolution block is set to 32, the value of k is set to 5, and the feature map generated from the stacked image after 5 groups of convolution blocks and downsampling layers has size 4×4.
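A minimal sketch of this feature-extraction part is given below, again assuming PyTorch; the 3×3 kernel size, the local-response-normalization neighborhood and the use of max pooling as the downsampling layer are assumptions not fixed by the patent.

```python
import torch.nn as nn

def make_feature_extractor(in_ch, out_ch, k):
    """k groups of (convolution block + 2x downsampling layer); each convolution
    block is convolution -> local response normalization -> linear-unit activation,
    as described above. Kernel size, LRN size and pooling type are assumptions."""
    layers, ch = [], in_ch
    for _ in range(k):
        layers += [nn.Conv2d(ch, out_ch, kernel_size=3, padding=1),
                   nn.LocalResponseNorm(size=5),
                   nn.ReLU(inplace=True),
                   nn.MaxPool2d(2)]     # halves the resolution
        ch = out_ch
    return nn.Sequential(*layers)

# Scale-1 numbers from the embodiment: a 2-channel stacked input (single-band
# f and m), 32 convolution kernel channels, k = 5; a 128x128 stacked image
# then yields a 4x4 feature map.
feat1 = make_feature_extractor(in_ch=2, out_ch=32, k=5)
```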
In another embodiment of the invention, the feature extraction portion of the model network 1 includes, but is not limited to, the use of a U-shaped structural network (U-Net), a fully convolutional neural network (FCN), and the like.
S3-1-3, passing the depth features through the parameter regression part of the model network of the 1st scale to obtain the transformation parameter μ1 of the 1st scale.
As shown in fig. 4, in one embodiment of the present invention the parameter regression part of the model network 1 consists of t fully connected layers connected in parallel; the value of t can be chosen by weighing computation speed against the expected range of image scale change and is not limited by the present invention. Experiments show that when the scale factor between the images lies between 0.5 and 2, setting 4 parallel fully connected layers gives better results. The parallel fully connected layers are similar to the pyramid strategy used in conventional image registration, except that the initial values of the output spatial transformation parameters differ in scale. Compared with outputting the parameters from a single fully connected layer, computing multiple fully connected layers in parallel greatly accelerates the convergence of the loss function.
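A sketch of such a parameter regression part with t parallel fully connected layers is shown below (PyTorch assumed). The patent does not state how the parallel branches are merged or exactly how their initial scales are chosen, so the averaging of the branches and the bias initialization to identity transforms at scales 0.5–2 are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class ParallelAffineRegressor(nn.Module):
    """t fully connected layers in parallel, each regressing the 6 affine
    parameters; biases are initialized to identity transforms at different
    scales so the branches start from different scale hypotheses."""
    def __init__(self, feat_dim, t=4, init_scales=(0.5, 1.0, 1.5, 2.0)):
        super().__init__()
        self.branches = nn.ModuleList([nn.Linear(feat_dim, 6) for _ in range(t)])
        for fc, s in zip(self.branches, init_scales):
            nn.init.zeros_(fc.weight)
            fc.bias.data = torch.tensor([s, 0., 0., 0., s, 0.])      # scaled identity
    def forward(self, feat):
        v = feat.flatten(1)                                           # (B, feat_dim)
        mu = torch.stack([fc(v) for fc in self.branches]).mean(0)    # merge branches (assumed)
        return mu.view(-1, 2, 3)                                      # (B, 2, 3) affine parameters

# Scale-1 numbers from the embodiment: a 4x4 feature map with 32 channels.
head1 = ParallelAffineRegressor(feat_dim=32 * 4 * 4)
```

Starting the branches from differently scaled identity transforms mirrors the pyramid-like role the patent attributes to the parallel layers.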
It should be understood that the implementation of the feature extraction part and the parameter regression part of the model network 1 is not limited to a particular form or set of parameters; any scheme that takes images stacked in the channel direction as input, extracts depth features through convolutional neural networks (Convolutional Neural Network, CNN) of various forms and parameters, and outputs geometric transformation parameters falls within the protection scope of the present invention.
S3-2, using the transformation parameter μ1 to geometrically correct the image m to be corrected and generate the corrected image m1.
The step S3-2 comprises the following substeps S3-2-1 to S3-2-2:
S3-2-1, composing the geometric transformation matrix T_μ1 from the transformation parameter μ1.
In one embodiment of the present invention, as shown in FIG. 4, 6 geometric transformation parameters a1, a2, a3, a4, a5, a6 are output in step S3-1-3, which form the two-dimensional affine matrix T_μ1 = [a1, a2, a3; a4, a5, a6].
The 6 parameters of the affine transformation matrix represent operations such as translation, rotation, scaling and shearing of the pixel coordinates of the image. Suppose the geometric transformation of the image consists of: a translation Dx in the x direction and a translation Dy in the y direction; a scaling factor Sx in the x direction and a scaling factor Sy in the y direction; a clockwise rotation angle θ; and a shear (miscut) angle φ in the x direction and a shear angle ω in the y direction. The 6 parameters of the two-dimensional affine matrix T_μ1 are then obtained as compositions (permutations and combinations) of these elementary operations.
in one embodiment of the present invention, the parameter regression section of the model network 1 outputs a greater or lesser number of geometric transformation parameters to construct other geometric transformation matrices than affine transformation, such as perspective transformation, rigid transformation, etc., to which the present invention is not limited.
S3-2-2, geometrically transforming the image m to be corrected through the geometric transformation matrix T_μ1 to generate the corrected image m1:
m1 = T_μ1(m)
Specifically, for each pixel with coordinates (x, y) and gray value σ on the image m to be corrected, the coordinates (X, Y) of that pixel on the corrected image are computed through the spatial transformation, and the corrected image m1 is generated with a chosen resampling and interpolation method. In the affine transformation embodiment, X = a1·x + a2·y + a3 and Y = a4·x + a5·y + a6.
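The geometric correction m1 = T_μ1(m) with bilinear resampling can be sketched as follows (PyTorch assumed). Note that affine_grid works in normalized [−1, 1] image coordinates, so μ is interpreted in normalized coordinates here, whereas the formulas above are written in pixel coordinates.

```python
import torch
import torch.nn.functional as F

def warp_affine(img, mu):
    """Apply the 2x3 affine parameter matrix mu to img with bilinear
    resampling, i.e. return T_mu(img). mu is expressed in the normalized
    coordinates expected by affine_grid (an assumption of this sketch)."""
    grid = F.affine_grid(mu, img.size(), align_corners=False)
    return F.grid_sample(img, grid, mode='bilinear',
                         padding_mode='zeros', align_corners=False)

# Identity parameters leave the image unchanged.
m = torch.rand(1, 1, 512, 512)
mu_id = torch.tensor([[[1., 0., 0.], [0., 1., 0.]]])
m1 = warp_affine(m, mu_id)
```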
S3-3, calculating the loss function Loss_sim(f, m, μ1) of the model network of the 1st scale, which is constructed from the similarity measure between the reference image f and the corrected image m1 = T_μ1(m).
Sim(·) denotes a similarity measure, i.e. Sim(A, B) is some similarity measure computed between image A and image B. Common similarity measures include the sum of squared gray-level differences (Sum of Squared Difference, SSD), normalized cross correlation (Normalized Cross Correlation, NCC), phase correlation (Phase Correlation), and the like:
SSD(A, B) = Σ (A(i, j) − B(i, j))², summed over all pixels (i, j);
NCC(A, B) = Σ (A(i, j) − Ā)(B(i, j) − B̄) / sqrt( Σ (A(i, j) − Ā)² · Σ (B(i, j) − B̄)² )
where the dimensions of image A and image B are both w×w, and Ā and B̄ are the gray-level means of image A and image B, respectively.
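The two measures written above translate directly into code; a minimal sketch (PyTorch assumed):

```python
import torch

def ssd(a, b):
    """Sum of squared gray-level differences (lower means more similar)."""
    return ((a - b) ** 2).flatten(1).sum(dim=1)

def ncc(a, b, eps=1e-8):
    """Normalized cross correlation in [-1, 1] (higher means more similar)."""
    a = a.flatten(1); b = b.flatten(1)
    a = a - a.mean(dim=1, keepdim=True)
    b = b - b.mean(dim=1, keepdim=True)
    return (a * b).sum(dim=1) / (a.norm(dim=1) * b.norm(dim=1) + eps)
```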
Computing traditional similarity measures such as SSD or NCC is time-consuming; since the correlation or convolution of two images in the spatial domain equals a product of their transforms in the frequency domain, phase correlation, which is faster to compute, is adopted. The specific steps are as follows:
Suppose image A and image B are related by a displacement (x0, y0), i.e. B(x, y) = A(x − x0, y − y0), and denote their Fourier transforms by F_A(u, v) and F_B(u, v), respectively. Then the following relationship holds in the frequency domain:
F_B(u, v) = F_A(u, v) · exp(−i(u·x0 + v·y0))
The normalized cross-power spectrum of the two is expressed as:
F_A(u, v) · F_B*(u, v) / |F_A(u, v) · F_B*(u, v)| = exp(i(u·x0 + v·y0))
where the superscript * denotes the complex conjugate.
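A frequency-domain sketch of this phase correlation (PyTorch assumed): the normalized cross-power spectrum is inverted back to the spatial domain, where its peak sits at the displacement (x0, y0); taking the peak value as the scalar similarity score is an assumption of this sketch.

```python
import torch

def phase_correlation(a, b, eps=1e-8):
    """Correlation surface from the normalized cross-power spectrum of a and b;
    the surface peaks at the relative displacement (x0, y0)."""
    Fa = torch.fft.fft2(a)
    Fb = torch.fft.fft2(b)
    cross = Fa * torch.conj(Fb)
    surface = torch.fft.ifft2(cross / (cross.abs() + eps)).real
    return surface.flatten(1).max(dim=1).values   # peak value as similarity score
```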
In one embodiment of the present invention, image A and image B are multi-source optical remote sensing images of the same region acquired by the same type of sensor, and the gray values are used directly as the input for computing the similarity measure between image A and image B.
In another embodiment of the present invention, image A and image B are remote sensing images of the same region acquired by different types of sensors (e.g. optical, infrared, SAR); instead of using the gray values directly, local feature descriptors of image A and image B are computed pixel by pixel, such as channel features of orientated gradients (Channel Features of Orientated Gradients, CFOG), histograms of oriented gradients (Histogram of Oriented Gradients, HOG), local self-similarity descriptors (Local Self-Similarity Descriptor, LSS), or histograms of orientated phase congruency (Histogram of Orientated Phase Congruency, HOPC). As shown in fig. 5, the SSD, NCC or phase correlation between the feature descriptor images of the two images is then used as the similarity measure.
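For the multimodal case, a much simplified orientated-gradient feature channel in the spirit of CFOG/HOG (not the exact published descriptors) can be sketched as follows, with SSD between the two descriptor images serving as the similarity; the bin count and smoothing choices are assumptions.

```python
import torch
import torch.nn.functional as F

def orientated_gradient_channels(img, n_bins=8):
    """Simplified per-pixel orientated-gradient channels: gradient magnitude is
    softly assigned to n_bins orientation channels and lightly smoothed."""
    kx = torch.tensor([[[[-1., 0., 1.]]]])          # horizontal gradient kernel
    ky = kx.transpose(2, 3)                         # vertical gradient kernel
    gx = F.conv2d(img, kx, padding=(0, 1))
    gy = F.conv2d(img, ky, padding=(1, 0))
    mag = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    ang = torch.atan2(gy, gx)
    centers = torch.linspace(-torch.pi, torch.pi, n_bins + 1)[:-1]
    chans = [mag * torch.relu(torch.cos(ang - c)) for c in centers]
    desc = torch.cat(chans, dim=1)                  # (B, n_bins, H, W)
    return F.avg_pool2d(desc, 3, stride=1, padding=1)

def descriptor_ssd(a, b):
    """SSD between the descriptor images of a and b (single-band inputs)."""
    return ((orientated_gradient_channels(a) - orientated_gradient_channels(b)) ** 2).mean()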
Steps S3-1 to S3-3 above describe in detail the generation of the transformation parameters and the corrected image and the calculation of the loss function on scale 1. The subsequent steps (S3-4 to S3-9) repeat similar operations on the other scales and differ from the scale-1 operations only in parameter settings, so their flow is summarized briefly below without repeating the underlying principles.
S3-4, inputting the reference image f and the corrected image m1 into the model network of the 2nd scale (abbreviated in this embodiment as "model network 2"; the 2nd scale is abbreviated as "scale 2") to obtain the residual Δμ1 of the transformation parameters, and combining it with the transformation parameter μ1 to obtain the 2nd-scale transformation parameter μ2.
The step S3-4 comprises the following substeps S3-4-1 to S3-4-4:
S3-4-1, downsampling the reference image f and the corrected image m1 to 1/2 of their original size respectively, and stacking the two downsampled images in the channel direction to generate a stacked image.
S3-4-2, inputting the stacked image into the feature extraction part of the model network of the 2nd scale to generate depth features.
In the embodiment of the present invention, the network structure of the model network 2 is similar to that of the model network 1 described above and differs only in parameter settings. To further illustrate the feature extraction of step S3-4-2 with a specific embodiment: if the size of the reference image f and the corrected image m1 is 512×512, the 1/2-downsampled images have size 256×256; the number of convolution kernel channels of each convolution block is set to 64, the value of k is set to 6, and the feature map generated from the stacked image after 6 groups of convolution blocks and downsampling layers has size 4×4.
S3-4-3, passing the depth features through the parameter regression part of the model network of the 2nd scale to obtain the residual Δμ1 of the transformation parameters.
S3-4-4, combining the residual Δμ1 with the transformation parameter μ1 to obtain the 2nd-scale transformation parameter μ2:
μ2 = μ1 * Δμ1
where * denotes matrix multiplication.
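The combination μ2 = μ1 * Δμ1 can be sketched by lifting both 2×3 affine parameter sets to 3×3 homogeneous matrices, multiplying them, and dropping the last row again (PyTorch assumed):

```python
import torch

def compose(mu, d_mu):
    """mu_new = mu * d_mu as matrix multiplication of the corresponding
    3x3 homogeneous matrices; mu and d_mu are (B, 2, 3) tensors."""
    bottom = torch.tensor([[[0., 0., 1.]]]).expand(mu.size(0), 1, 3)
    A = torch.cat([mu, bottom], dim=1)        # (B, 3, 3)
    B = torch.cat([d_mu, bottom], dim=1)      # (B, 3, 3)
    return torch.bmm(A, B)[:, :2, :]          # back to (B, 2, 3)
```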
S3-5, using the transformation parameter μ2 to geometrically correct the corrected image m1 and generate the corrected image m2.
The step S3-5 comprises the following substeps S3-5-1 to S3-5-2:
S3-5-1, composing the geometric transformation matrix T_μ2 from the transformation parameter μ2.
S3-5-2, geometrically transforming the corrected image m1 through the geometric transformation matrix T_μ2 to generate the corrected image m2:
m2 = T_μ2(m1)
S3-6, calculating the loss function Loss_sim(f, m1, μ2) of the model network of the 2nd scale, which is constructed from the similarity measure between the reference image f and the corrected image m2 = T_μ2(m1).
S3-7, inputting the reference image f and the corrected image m2 into the model network of the 3rd scale (abbreviated in this embodiment as "model network 3"; the 3rd scale is abbreviated as "scale 3") to obtain the residual Δμ2 of the transformation parameters, and combining it with the transformation parameter μ2 to obtain the 3rd-scale transformation parameter μ3.
The step S3-7 comprises the following substeps S3-7-1 to S3-7-4:
S3-7-1, stacking the reference image f and the corrected image m2 in the channel direction to generate a stacked image.
S3-7-2, inputting the stacked image into the feature extraction part of the model network of the 3rd scale to generate depth features.
In the embodiment of the present invention, the network structure of the model network 3 is similar to those of the model network 1 and the model network 2 described above and differs only in parameter settings. To further illustrate the feature extraction of step S3-7-2 with a specific embodiment: if the size of the reference image f and the corrected image m2 is 512×512, the number of convolution kernel channels of each convolution block is set to 128, the value of k is set to 7, and the feature map generated from the stacked image after 7 groups of convolution blocks and downsampling layers has size 4×4.
S3-7-3, passing the depth features through the parameter regression part of the model network of the 3rd scale to obtain the residual Δμ2 of the transformation parameters.
S3-7-4, combining the residual Δμ2 with the transformation parameter μ2 to obtain the 3rd-scale transformation parameter μ3:
μ3 = μ2 * Δμ2
where * denotes matrix multiplication.
S3-8, using the transformation parameter μ3 to geometrically correct the corrected image m2 and generate the corrected image m3.
The step S3-8 comprises the following substeps S3-8-1 to S3-8-2:
S3-8-1, composing the geometric transformation matrix T_μ3 from the transformation parameter μ3.
S3-8-2, geometrically transforming the corrected image m2 through the geometric transformation matrix T_μ3 to generate the corrected image m3:
m3 = T_μ3(m2)
S3-9, calculating the loss function Loss_sim(f, m2, μ3) of the model network of the 3rd scale, which is constructed from the similarity measure between the reference image f and the corrected image m3 = T_μ3(m2).
S3-10, taking the corrected image m3 and the transformation parameter μ3 as the end-to-end output on the training sample.
S4, respectively initializing model network parameters of 3 scales.
Step S4 includes the following sub-steps S4-1 to S4-3:
S4-1, training the model network of the 1st scale by minimizing the loss function Loss_sim(f, m, μ1).
S4-2, fixing the parameters of the model network of the 1st scale and training the model network of the 2nd scale by minimizing the loss function Loss_sim(f, m1, μ2).
S4-3, fixing the parameters of the model networks of the 1st and 2nd scales and training the model network of the 3rd scale by minimizing the loss function Loss_sim(f, m2, μ3).
S5, performing joint training on the model network with 3 scales in an end-to-end mode, and optimizing joint loss functions on the 3 scales.
In the embodiment of the invention, before the model networks of the 3 scales are jointly trained, the parameters of all the model networks must be released from being fixed.
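The per-scale initialization of step S4 and the subsequent release of the fixed parameters can be sketched with requires_grad flags (PyTorch assumed); train_one_scale stands for any routine that minimizes the corresponding per-scale loss and is not specified by the patent.

```python
def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

def initialize_scales(net1, net2, net3, train_one_scale):
    """S4: train scale 1 alone, then scale 2 with scale 1 fixed, then scale 3
    with scales 1 and 2 fixed; finally release all parameters for the joint
    training of step S5."""
    train_one_scale(net1)                 # minimize Loss_sim(f, m,  mu1)
    set_trainable(net1, False)
    train_one_scale(net2)                 # minimize Loss_sim(f, m1, mu2)
    set_trainable(net2, False)
    train_one_scale(net3)                 # minimize Loss_sim(f, m2, mu3)
    for net in (net1, net2, net3):        # un-fix everything before joint training
        set_trainable(net, True)
```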
In the embodiment of the invention, the joint Loss function Loss is:
Loss = λ1 × Loss_sim(f, m, μ1) + λ2 × Loss_sim(f, m1, μ2) + λ3 × Loss_sim(f, m2, μ3)
where λ1, λ2, λ3 are the weight factors of the loss functions of the model networks of the respective scales; in the embodiment of the invention the values of λ1, λ2, λ3 are 0.05, 0.05 and 0.9, respectively.
S6, searching, through a deep learning optimizer, the direction in which the joint loss function value decreases most rapidly, carrying out back propagation through the model networks along that direction and iteratively updating the model network parameters. When the joint loss function has decreased to a preset threshold and converged, all the end-to-end-mapped model networks hold globally optimal parameters and the reference image f and the corrected image m3 have the best similarity; the network model parameters at that moment are stored, and the registered reference image f and corrected image m3 are output.
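A sketch of this joint optimization (steps S5/S6) is given below, with PyTorch assumed. Adam as the deep learning optimizer, the learning rate, the iteration cap and the per-scale loss form 1 − Sim are assumptions of the sketch; forward_fn is a wrapper around the multi-scale forward pass sketched earlier and sim is the chosen similarity measure.

```python
import torch

def joint_train(net1, net2, net3, f, m, forward_fn, sim,
                weights=(0.05, 0.05, 0.9), lr=1e-4, threshold=0.05, max_iter=2000):
    """Jointly optimize the three model networks end to end on one sample pair."""
    params = list(net1.parameters()) + list(net2.parameters()) + list(net3.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    l1, l2, l3 = weights
    for _ in range(max_iter):
        (m1, mu1), (m2, mu2), (m3, mu3) = forward_fn(f, m)
        loss = (l1 * (1 - sim(f, m1)).mean()       # Loss_sim(f, m,  mu1), assumed 1 - Sim
                + l2 * (1 - sim(f, m2)).mean()     # Loss_sim(f, m1, mu2)
                + l3 * (1 - sim(f, m3)).mean())    # Loss_sim(f, m2, mu3)
        opt.zero_grad()
        loss.backward()                            # back-propagate through all 3 scales
        opt.step()
        if loss.item() < threshold:                # preset convergence threshold
            break
    return m3.detach(), mu3.detach()
```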
The invention thus realizes accurate multi-scale remote sensing image registration that is learned in a completely unsupervised manner and mapped end to end.
Those of ordinary skill in the art will recognize that the embodiments described herein are intended to help the reader understand the principles of the present invention, and it should be understood that the scope of the invention is not limited to these specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations based on the teachings of the present disclosure without departing from its spirit, and such modifications and combinations remain within the scope of the present disclosure.
Claims (2)
1. The remote sensing image registration method based on the unsupervised deep learning is characterized by comprising the following steps of:
S1, establishing a multi-source remote sensing image registration data set comprising two groups of image data whose images correspond to each other one by one, wherein one group of image data is used as a reference image data set and the other group of image data is used as an image data set to be corrected;
S2, selecting a reference image f from the reference image data set, selecting the image m to be corrected corresponding to the reference image f from the image data set to be corrected, and taking the reference image f and the image m to be corrected as the end-to-end input on a training sample;
S3, calculating, on 3 scales, the transformation parameters μ1, μ2, μ3 of the image on the model network of each scale, gradually correcting the image m to be corrected to generate corrected images m1, m2, m3, back-propagating the loss function of the model network of each scale, and taking the corrected image m3 and the transformation parameter μ3 as the end-to-end output on the training sample;
S4, respectively initializing the model network parameters of the 3 scales;
S5, performing joint training on the model networks of the 3 scales in an end-to-end mode, and optimizing the joint loss function over the 3 scales;
S6, searching, through a deep learning optimizer, the direction in which the joint loss function value decreases most rapidly, carrying out back propagation on the model networks along that direction, iteratively updating the model network parameters, storing the network model parameters at the moment when the joint loss function has decreased to a preset threshold and converged, and outputting the registered reference image f and corrected image m3;
The step S3 comprises the following sub-steps:
S3-1, inputting the reference image f and the image m to be corrected into the model network of the 1st scale to obtain the transformation parameter μ1 of the 1st scale;
S3-2, using the transformation parameter μ1 to geometrically correct the image m to be corrected and generate the corrected image m1;
S3-3, calculating the loss function of the model network of the 1st scale;
S3-4, inputting the reference image f and the corrected image m1 into the model network of the 2nd scale to obtain the residual Δμ1 of the transformation parameters, and combining it with the transformation parameter μ1 to obtain the 2nd-scale transformation parameter μ2;
S3-5, using the transformation parameter μ2 to geometrically correct the corrected image m1 and generate the corrected image m2;
S3-6, calculating the loss function of the model network of the 2nd scale;
S3-7, inputting the reference image f and the corrected image m2 into the model network of the 3rd scale to obtain the residual Δμ2 of the transformation parameters, and combining it with the transformation parameter μ2 to obtain the 3rd-scale transformation parameter μ3;
S3-8, using the transformation parameter μ3 to geometrically correct the corrected image m2 and generate the corrected image m3;
S3-9, calculating the loss function of the model network of the 3rd scale;
S3-10, taking the corrected image m3 and the transformation parameter μ3 as the end-to-end output on the training sample;
the step S3-1 comprises the following sub-steps:
S3-1-1, downsampling the reference image f and the image m to be corrected to 1/4 of their original size respectively, and stacking the two downsampled images in the channel direction to generate a stacked image;
S3-1-2, inputting the stacked image into the feature extraction part of the model network of the 1st scale to generate depth features;
S3-1-3, passing the depth features through the parameter regression part of the model network of the 1st scale to obtain the transformation parameter μ1 of the 1st scale;
The step S3-2 comprises the following sub-steps:
S3-2-1, composing the geometric transformation matrix T_μ1 from the transformation parameter μ1;
S3-2-2, geometrically transforming the image m to be corrected through the geometric transformation matrix T_μ1 to generate the corrected image m1;
The step S3-4 comprises the following sub-steps:
S3-4-1, downsampling the reference image f and the corrected image m1 to 1/2 of their original size respectively, and stacking the two downsampled images in the channel direction to generate a stacked image;
S3-4-2, inputting the stacked image into the feature extraction part of the model network of the 2nd scale to generate depth features;
S3-4-3, passing the depth features through the parameter regression part of the model network of the 2nd scale to obtain the residual Δμ1 of the transformation parameters;
S3-4-4, combining the residual Δμ1 with the transformation parameter μ1 to obtain the 2nd-scale transformation parameter μ2;
The step S3-5 comprises the following sub-steps:
S3-5-1, composing the geometric transformation matrix T_μ2 from the transformation parameter μ2;
S3-5-2, geometrically transforming the corrected image m1 through the geometric transformation matrix T_μ2 to generate the corrected image m2;
The step S3-7 comprises the following sub-steps:
S3-7-1, stacking the reference image f and the corrected image m2 in the channel direction to generate a stacked image;
S3-7-2, inputting the stacked image into the feature extraction part of the model network of the 3rd scale to generate depth features;
S3-7-3, passing the depth features through the parameter regression part of the model network of the 3rd scale to obtain the residual Δμ2 of the transformation parameters;
S3-7-4, combining the residual Δμ2 with the transformation parameter μ2 to obtain the 3rd-scale transformation parameter μ3;
The step S3-8 comprises the following sub-steps:
S3-8-1, composing the geometric transformation matrix T_μ3 from the transformation parameter μ3;
S3-8-2, geometrically transforming the corrected image m2 through the geometric transformation matrix T_μ3 to generate the corrected image m3;
The loss function Loss_sim(f, m, μ1) of the model network of the 1st scale in step S3-3 is constructed from the similarity measure between the reference image f and the corrected image m1 = T_μ1(m);
The loss function Loss_sim(f, m1, μ2) of the model network of the 2nd scale in step S3-6 is constructed from the similarity measure between the reference image f and the corrected image m2 = T_μ2(m1);
The loss function Loss_sim(f, m2, μ3) of the model network of the 3rd scale in step S3-9 is constructed from the similarity measure between the reference image f and the corrected image m3 = T_μ3(m2);
The joint loss function Loss in step S5 is:
Loss = λ1 × Loss_sim(f, m, μ1) + λ2 × Loss_sim(f, m1, μ2) + λ3 × Loss_sim(f, m2, μ3)
where Sim(·) denotes the similarity measure and λ1, λ2, λ3 are the weight factors of the loss functions of the model networks of the respective scales.
2. The remote sensing image registration method according to claim 1, wherein the step S4 comprises the following sub-steps:
S4-1, training the model network of the 1st scale by minimizing the loss function Loss_sim(f, m, μ1);
S4-2, fixing the parameters of the model network of the 1st scale and training the model network of the 2nd scale by minimizing the loss function Loss_sim(f, m1, μ2);
S4-3, fixing the parameters of the model networks of the 1st scale and the 2nd scale and training the model network of the 3rd scale by minimizing the loss function Loss_sim(f, m2, μ3).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210026370.7A CN114494372B (en) | 2022-01-11 | 2022-01-11 | Remote sensing image registration method based on unsupervised deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210026370.7A CN114494372B (en) | 2022-01-11 | 2022-01-11 | Remote sensing image registration method based on unsupervised deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114494372A CN114494372A (en) | 2022-05-13 |
CN114494372B true CN114494372B (en) | 2023-04-21 |
Family
ID=81509569
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210026370.7A Active CN114494372B (en) | 2022-01-11 | 2022-01-11 | Remote sensing image registration method based on unsupervised deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114494372B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114693755B (en) * | 2022-05-31 | 2022-08-30 | 湖南大学 | Non-rigid registration method and system for multimode image maximum moment and space consistency |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109345575B (en) * | 2018-09-17 | 2021-01-19 | 中国科学院深圳先进技术研究院 | Image registration method and device based on deep learning |
CN109711444B (en) * | 2018-12-18 | 2024-07-19 | 中国科学院遥感与数字地球研究所 | Novel remote sensing image registration method based on deep learning |
CN111414968B (en) * | 2020-03-26 | 2022-05-03 | 西南交通大学 | Multi-mode remote sensing image matching method based on convolutional neural network characteristic diagram |
CN113901900A (en) * | 2021-09-29 | 2022-01-07 | 西安电子科技大学 | Unsupervised change detection method and system for homologous or heterologous remote sensing image |
-
2022
- 2022-01-11 CN CN202210026370.7A patent/CN114494372B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN114494372A (en) | 2022-05-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ye et al. | Fast and robust matching for multimodal remote sensing image registration | |
Ye et al. | A robust multimodal remote sensing image registration method and system using steerable filters with first-and second-order gradients | |
CN108765476B (en) | Polarized image registration method | |
Chen et al. | Convolutional neural network based dem super resolution | |
Fang et al. | SAR-optical image matching by integrating Siamese U-Net with FFT correlation | |
CN114693755B (en) | Non-rigid registration method and system for multimode image maximum moment and space consistency | |
KR101941878B1 (en) | System for unmanned aircraft image auto geometric correction | |
CN114494372B (en) | Remote sensing image registration method based on unsupervised deep learning | |
Zhang et al. | GPU-accelerated large-size VHR images registration via coarse-to-fine matching | |
CN117788296B (en) | Infrared remote sensing image super-resolution reconstruction method based on heterogeneous combined depth network | |
CN113034371B (en) | Infrared and visible light image fusion method based on feature embedding | |
CN117689702A (en) | Point cloud registration method and device based on geometric attention mechanism | |
CN107358625B (en) | SAR image change detection method based on SPP Net and region-of-interest detection | |
CN117765039A (en) | Point cloud coarse registration method, device and equipment | |
CN114998630B (en) | Ground-to-air image registration method from coarse to fine | |
Dong et al. | An Intelligent Detection Method for Optical Remote Sensing Images Based on Improved YOLOv7. | |
Ansari et al. | Curvelet Based U-Net Framework for Building Footprint Identification | |
Dou et al. | Object detection based on hierarchical visual perception mechanism | |
WO2024222610A1 (en) | Remote-sensing image change detection method based on deep convolutional network | |
Tang et al. | MU-NET: A multiscale unsupervised network for remote sensing image registration | |
Ge et al. | a Novel Remote Sensing Image Registration Algorithm Based on the Adaptive Pcnn Segmentation | |
Li | Accuracy improved image registration based on pre-estimation and compensation | |
CN117765045A (en) | Visible light-SAR image registration method and device based on multipoint matching constraint | |
Nie et al. | Sea-land segmentation based on template matching | |
Zhou et al. | Research on unsupervised learning-based image stitching |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |