WO2022241877A1 - Neural network-based heterogeneous image pose estimation and registration method, device and medium - Google Patents
Neural network-based heterogeneous image pose estimation and registration method, device and medium
- Publication number
- WO2022241877A1 (application PCT/CN2021/099255)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- matched
- heterogeneous
- feature map
- transformation
- image
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 53
- 238000013528 artificial neural network Methods 0.000 title claims abstract description 28
- 230000009466 transformation Effects 0.000 claims description 110
- 238000013519 translation Methods 0.000 claims description 61
- 238000001228 spectrum Methods 0.000 claims description 29
- 238000004590 computer program Methods 0.000 claims description 16
- 230000006870 function Effects 0.000 claims description 11
- 238000013507 mapping Methods 0.000 claims description 9
- 238000012549 training Methods 0.000 claims description 9
- 238000000844 transformation Methods 0.000 claims description 7
- 238000012545 processing Methods 0.000 abstract description 7
- 238000011156 evaluation Methods 0.000 abstract description 6
- 238000004422 calculation algorithm Methods 0.000 abstract description 5
- 238000012360 testing method Methods 0.000 abstract description 2
- 238000004904 shortening Methods 0.000 abstract 1
- 238000010586 diagram Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 238000005259 measurement Methods 0.000 description 3
- 230000000717 retained effect Effects 0.000 description 3
- 238000003672 processing method Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 238000011478 gradient descent method Methods 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/32—Determination of transform parameters for the alignment of images, i.e. image registration using correlation-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
- G06T7/337—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20048—Transform domain processing
- G06T2207/20056—Discrete and fast Fourier transform, [DFT, FFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30244—Camera pose
Definitions
- The invention belongs to the field of image processing, and in particular relates to an image pose estimation and matching method.
- Self-localization is one of the most fundamental problems for mobile robots. After more than ten years of research, localizing a given observation within a map built by the same sensor is relatively mature, but matching measurements from heterogeneous sensors remains an open problem. Limited by the characteristics of each sensor, the two images obtained by heterogeneous sensors are heterogeneous images that differ in angle, scale, and viewpoint; moreover, a sensor is subject to different forms of interference such as lighting, shadows, and occlusions when acquiring images, and these disturbances make pose estimation extremely difficult. Considering the positive progress researchers have made in map building in recent years, we also hope to match the heterogeneous images obtained by multiple sensors through map building, so that the map formed after matching can be shared by multiple robots equipped with heterogeneous sensors.
- The purpose of the present invention is to solve the prior-art problem that pose estimation and registration are difficult to achieve for heterogeneous images, and to provide a neural network-based method for pose estimation and registration of heterogeneous images.
- The present invention provides a neural network-based method for estimating the pose of heterogeneous images, whose steps are as follows:
- S3: The two amplitude spectra obtained in S2 are each subjected to a log-polar transformation, converting them from the Cartesian coordinate system to the log-polar coordinate system, so that the rotation transformation between the two amplitude spectra in Cartesian coordinates is mapped to a translation along the y direction in log-polar coordinates;
- S7: The two amplitude spectra obtained in S6 are each subjected to a log-polar transformation, converting them from the Cartesian coordinate system to the log-polar coordinate system, so that the scaling transformation between the two amplitude spectra in Cartesian coordinates is mapped to a translation along the x direction in log-polar coordinates;
- S12: Using the pre-trained seventh U-Net network and eighth U-Net network as two feature extractors, the template image and the new image to be matched are taken as the respective original input images of the two feature extractors; the isomorphic features of the two original input images are extracted to obtain a seventh feature map and an eighth feature map that are isomorphic and retain only the translation transformation relationship between the original input images;
- The present invention further provides a neural network-based method for registering heterogeneous images: the pose estimate between the template image and the image to be matched is obtained according to the heterogeneous image pose estimation method of the first aspect, and the image to be matched is then simultaneously rotated, scaled, and translated according to the estimated transformations so that it is registered to the template image, realizing the matching and stitching of the template image and the image to be matched.
- Compared with the prior art, the present invention has the following beneficial effects:
- The invention makes the phase correlation algorithm differentiable and embeds it into an end-to-end learning network framework, constructing a neural network-based heterogeneous image pose estimation method.
- This method can find the optimal feature extractor for the image-matching result, obtains the solution without exhaustive evaluation, and has good interpretability and generalization ability.
- Test results show that the present invention achieves accurate pose estimation and registration of heterogeneous images in a relatively short time, with high accuracy and real-time performance; it meets practical application requirements and can be applied to fields such as robot self-localization.
- Fig. 1 is a schematic diagram of the network framework structure of the pose estimator in the present invention.
- Fig. 2 is a schematic flow chart of the two stages of the pose estimation method of the present invention.
- Fig. 3 is a schematic diagram of an image registration process in an embodiment of the present invention.
- Fig. 4 is a schematic diagram of an image registration result in another embodiment of the present invention.
- A heterogeneous sensor is limited by its own characteristics, and the two images obtained are heterogeneous images that differ in angle, scale, and viewpoint. Moreover, a sensor is subject to different forms of interference such as illumination, shadow, and occlusion when acquiring images, and these interferences make pose estimation extremely difficult. For example, O1 is captured by a drone's bird's-eye camera in the early morning, while O2 is a local elevation map constructed by a ground robot using lidar; these two are heterogeneous images and cannot be matched directly. To solve this problem, the general approach is to extract features from the two images and use the features, instead of the raw sensor measurements, to estimate the relative pose.
- For heterogeneous images acquired by heterogeneous sensors, the present invention constructs a neural network-based pose estimation method to estimate the pose transformation between any two heterogeneous images.
- The estimation method is realized by a neural network-based pose estimator, which is essentially a differentiable phase correlation algorithm.
- Phase correlation is a similarity-based matcher that performs well for inputs of the same modality, but it can only complete the match in the presence of small high-frequency noise.
- The present invention takes conventional phase correlation and endows the Fast Fourier Transform layer (FFT), the Log-Polar Transform layer (LPT), and the Phase Correlation layer (DC) with differentiable properties, making it usable for training an end-to-end pose estimator.
- As shown in FIG. 1, the network framework of the pose estimator built in a preferred embodiment of the present invention has as its core 8 independent U-Net networks together with the Fourier transform layer (FFT), the log-polar transform layer (LPT), and the phase correlation layer (DC).
- The input of the pose estimator is a pair of heterogeneous images, recorded as the template image (Source) and the image to be matched (Template); its final output is the three pose transformations needed to register the image to be matched to the template image: translation, rotation, and scaling.
- The template image serves as the matching template, and the image to be matched can be matched and stitched onto the template image after undergoing the pose transformation.
- The general approach is to extract features from the two images and use the features, instead of the raw sensor measurements, to estimate the relative pose.
- In the traditional phase correlation algorithm, a high-pass filter is used to suppress the random noise of the two inputs, and this process can be regarded as a feature extractor.
- For a pair of heterogeneous input images, however, with their pronounced differences, a single high-pass filter is far from enough.
- Since there is no common feature with which to supervise the feature extractor directly, the present invention uses end-to-end learning to solve this problem.
- 8 independent trainable U-Net networks (denoted U-Net1 to U-Net8) are constructed for the template image and the source image in the rotation-scaling stage and the translation stage. After pre-training under the supervision of the translation, rotation, and scaling losses, these 8 U-Net networks can extract the isomorphic, i.e., common, features from heterogeneous images, so that two heterogeneous images are converted into two isomorphic feature maps.
- Each U-Net consists of 4 downsampling encoder layers and 4 upsampling decoder layers to extract features. As training progresses, the parameters of the 8 U-Nets are adjusted. Note that this network is lightweight, so it is efficient enough for real-time use and meets the requirements of the application scenario.
- The function of the Fourier transform layer (FFT) is to apply a Fourier transform to the feature maps extracted by the U-Net networks, removing the translation relationship between images while retaining the rotation and scaling relationships. According to the properties of the Fourier transform, only rotation and scale affect the magnitude of the spectrum; the magnitude spectrum is insensitive to translation. Introducing the FFT therefore yields a representation that is insensitive to translation but particularly sensitive to scaling and rotation, so translation can be ignored when subsequently solving for scaling and rotation.
- The function of the log-polar transform layer (LPT) is to apply a log-polar coordinate transformation to the FFT-transformed image, mapping it from the Cartesian coordinate system to the log-polar coordinate system. In this mapping, scaling and rotation in Cartesian coordinates are converted into translations in log-polar coordinates. This coordinate transformation, which yields a cross-correlation form with respect to scale and rotation, removes all exhaustive evaluation from the overall pose estimator.
- The role of the phase correlation layer (DC) is to solve the phase correlation, i.e., to compute the cross-correlation between two amplitude spectra; from the resulting correlation, the translation relationship between the two can be obtained. The specific computation of the cross-correlation belongs to the prior art and is not repeated here.
- S2: The first feature map and the second feature map obtained in S1 undergo the first Fourier transform operation (denoted FFT1), after which their respective amplitude spectra are taken. At this point the two amplitude spectra retain the rotation and scaling relationships between the original input images, while the translation relationship has been filtered out by FFT1.
- S3: The two amplitude spectra obtained in S2 each undergo the first log-polar transformation operation (denoted LPT1), converting them from the Cartesian coordinate system to the log-polar coordinate system, so that the rotation transformation between the two amplitude spectra in Cartesian coordinates is mapped to a translation along the y direction in log-polar coordinates.
- S4: The two coordinate-transformed amplitude spectra from S3 undergo a phase correlation solution in the phase correlation layer (DC), yielding the translation relationship between them; this is then converted back according to the mapping between the Cartesian and log-polar coordinate systems in S3, giving the rotation relationship between the template image and the image to be matched.
- The above rotation relationship is essentially the angle theta by which the image to be matched must be rotated to achieve registration with the template image.
- S5: The pre-trained third U-Net network U-Net3 and fourth U-Net network U-Net4 are used as two feature extractors, with the heterogeneous template image and the image to be matched as their respective original input images.
- The feature extractors U-Net3 and U-Net4 extract the isomorphic features of the two original input images, yielding an isomorphic third feature map and fourth feature map.
- The translation, rotation, and scaling relationships between the original input images are preserved in the third and fourth feature maps.
- S6: The third feature map and the fourth feature map obtained in S5 each undergo the second Fourier transform operation (denoted FFT2), after which their respective amplitude spectra are taken. As before, the two amplitude spectra retain the rotation and scaling relationships between the original input images, while the translation relationship has been filtered out by FFT2.
- S7: The two amplitude spectra obtained in S6 each undergo the second log-polar transformation operation (denoted LPT2), converting them from the Cartesian coordinate system to the log-polar coordinate system, so that the scaling transformation between the two amplitude spectra in Cartesian coordinates is mapped to a translation along the x direction in log-polar coordinates.
- S8: The two coordinate-transformed amplitude spectra from S7 undergo a phase correlation solution in the phase correlation layer (DC), yielding the translation relationship between them. In the LPT2 of S7 there is a mapping between the scaling transformation in the Cartesian coordinate system and the translation along the x direction in the log-polar coordinate system, so this translation can be converted back according to the mapping between the two coordinate systems in S7, giving the scaling relationship between the aforementioned template image and image to be matched.
- The above scaling relationship is essentially the scale by which the image to be matched must be scaled to achieve registration with the template image.
- S11: The fifth feature map and the sixth feature map obtained in S10 undergo a phase correlation solution in the phase correlation layer (DC), yielding the translation relationship X along the x direction between the template image and the image to be matched.
- The above translation relationships along the x and y directions are essentially the distance X along x and the distance Y along y by which the image to be matched must be translated to achieve registration with the template image.
- The pose estimation of the present invention is implemented in two stages, yielding estimates of four degrees of freedom (X, Y, theta, scale) in total.
- The rotation and scaling relationships are estimated in the rotation-scaling stage of S1-S9, and the translation relationship is then estimated in the translation stage of S10-S13.
- The processing of S1-S9 is shown in FIG. 2 a), and the processing of S10-S13 is shown in FIG. 2 b).
- The pose estimates of the three transformation relationships (rotation, scaling, and translation) between the heterogeneous template image and the image to be matched are thus obtained, completing the pose estimation of the two images; the heterogeneous images can then be registered according to the corresponding estimates.
- The 8 U-Net networks are pre-trained. To ensure that each U-Net network can accurately extract isomorphic features, a reasonable loss function must be set.
- The total training loss should be the weighted sum of the rotation loss, the scaling loss, the x-direction translation loss, and the y-direction translation loss between the template image and the image to be matched; the weights can be adjusted according to the actual situation.
- In this embodiment, the weights of the four losses in the total loss function are all 1, and all four losses adopt the L1 loss.
- The model parameters of the 8 U-Net networks are optimized by gradient descent to minimize the total loss function.
- The trained 8 U-Net networks form a pose estimator for estimating the pose of actual heterogeneous images.
- With this pose estimator, the pose estimation of two heterogeneous images can be performed according to steps S1-S13 above, and image registration can be carried out based on the estimation results.
- A neural network-based heterogeneous image registration method can further be provided.
- The method is as follows: the image to be matched is simultaneously rotated, scaled, and translated according to the estimated values of the transformation relationships (X, Y, theta, and scale), so that it is registered to the template image; the template image and the registered image to be matched are then matched and stitched.
- FIG. 3 shows a specific example of pose estimation and registration of a single group of heterogeneous images using the above pose estimator.
- The single group of heterogeneous images contains one template image and one image to be matched.
- After the above pose estimator produces estimates of the 4 degrees of freedom (X, Y, theta, scale), 4-degree-of-freedom matching can be performed: the two images on the left are input and the matching result on the right is output. The method evidently achieves good matching and registration of the two heterogeneous images.
- FIG. 4 shows another specific example, in which the above pose estimator performs pose estimation and registration of multiple groups of heterogeneous images.
- The multiple groups of heterogeneous images contain one template image and two images to be matched. After the estimates of the 4 degrees of freedom (X, Y, theta, scale) are obtained by the above pose estimator, 4-degree-of-freedom matching can be performed, matching multiple observation images into a global map serving as the template image and thereby realizing multi-source data fusion.
- The simulated dataset consists of computer-generated random graphics together with their 4-degree-of-freedom and appearance transformations.
- Real dataset 1 consists of maps collected by a ground robot with a black-and-white camera and ground maps collected by the color camera of an aerial drone.
- Real dataset 2 consists of maps collected by a ground robot with a lidar and ground maps collected by the color camera of an aerial drone.
- Real dataset 3 consists of maps collected by a ground robot with a color camera and ground maps collected by the color camera of an aerial drone.
- The results show that the present invention achieves accurate pose estimation and registration of heterogeneous images in a relatively short time, with high accuracy and real-time performance; it meets practical application requirements and can be applied to fields such as robot self-localization.
- A neural network-based heterogeneous image pose estimation device includes a memory and a processor;
- the memory is used to store a computer program;
- the processor is configured to, when executing the computer program, implement the aforementioned neural network-based heterogeneous image pose estimation method.
- A computer-readable storage medium may also be provided, on which a computer program is stored.
- When the computer program is executed by a processor, the aforementioned neural network-based heterogeneous image pose estimation method is implemented.
- A neural network-based heterogeneous image registration device includes a memory and a processor;
- the memory is used to store a computer program;
- the processor is configured to implement the aforementioned neural network-based heterogeneous image registration method when executing the computer program.
- A computer-readable storage medium may also be provided, on which a computer program is stored.
- When the computer program is executed by a processor, the aforementioned neural network-based heterogeneous image registration method is implemented.
- The above-mentioned memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory.
- The above-mentioned processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- The device should also have the components necessary for the program to run, such as a power supply and a communication bus.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a neural network-based heterogeneous image pose estimation and registration method, belonging to the field of image processing. The invention makes the phase correlation algorithm differentiable and embeds it into an end-to-end learning network framework, constructing a neural network-based heterogeneous image pose estimation method. The method can find the optimal feature extractor for the image-matching result, obtains the solution without exhaustive evaluation, and has good interpretability and generalization ability. Test results show that the invention achieves accurate pose estimation and registration of heterogeneous images in a relatively short time, with high accuracy and real-time performance; it meets practical application requirements and can be applied to fields such as robot self-localization.
Description
The present invention belongs to the field of image processing, and in particular relates to an image pose estimation and matching method.
Self-localization is one of the most fundamental problems for mobile robots. After more than ten years of research, localizing a given observation within a map built by the same sensor is relatively mature, but matching measurements from heterogeneous sensors remains an open problem. Limited by the characteristics of each sensor, the two images obtained by heterogeneous sensors are heterogeneous images that differ in angle, scale, and viewpoint; moreover, a sensor is subject to different forms of interference such as lighting, shadows, and occlusions when acquiring images, and these disturbances make pose estimation extremely difficult. Considering the positive progress researchers have made in map building in recent years, we also hope to match the heterogeneous images obtained by multiple sensors through map building, so that the map formed after matching can be shared by multiple robots equipped with heterogeneous sensors.
Existing techniques for matching isomorphic images of the same style fall into two categories: one relies on point-feature matching to localize in specific scenarios, and the other applies correlation methods to search for the best candidate position in the solution space. However, none of these methods performs well when facing heterogeneous images.
Therefore, designing a method for pose estimation and registration of heterogeneous images is a technical problem urgently awaiting a solution in the prior art.
Summary of the Invention
The purpose of the present invention is to solve the prior-art problem that pose estimation and registration are difficult to achieve for heterogeneous images, and to provide a neural network-based method for pose estimation and registration of heterogeneous images.
The specific technical solutions adopted by the present invention are as follows:
In a first aspect, the present invention provides a neural network-based heterogeneous image pose estimation method, whose steps are as follows:
S1: Using the pre-trained first U-Net network and second U-Net network as two feature extractors, the heterogeneous template image and image to be matched are taken as the respective original input images of the two feature extractors; the isomorphic features of the two original input images are extracted to obtain an isomorphic first feature map and second feature map;
S2: The first feature map and the second feature map obtained in S1 are each Fourier-transformed and their respective amplitude spectra are taken;
S3: The two amplitude spectra obtained in S2 are each subjected to a log-polar transformation, converting them from the Cartesian coordinate system to the log-polar coordinate system, so that the rotation transformation between the two amplitude spectra in Cartesian coordinates is mapped to a translation along the y direction in log-polar coordinates;
S4: A phase correlation solution is performed on the two coordinate-transformed amplitude spectra from S3 to obtain the translation relationship between them, which is then converted back according to the mapping between the Cartesian and log-polar coordinate systems in S3 to obtain the rotation relationship between the template image and the image to be matched;
S5: Using the pre-trained third U-Net network and fourth U-Net network as two feature extractors, the heterogeneous template image and image to be matched are taken as the respective original input images of the two feature extractors; the isomorphic features of the two original input images are extracted to obtain an isomorphic third feature map and fourth feature map;
S6: The third feature map and the fourth feature map obtained in S5 are each Fourier-transformed and their respective amplitude spectra are taken;
S7: The two amplitude spectra obtained in S6 are each subjected to a log-polar transformation, converting them from the Cartesian coordinate system to the log-polar coordinate system, so that the scaling transformation between the two amplitude spectra in Cartesian coordinates is mapped to a translation along the x direction in log-polar coordinates;
S8: A phase correlation solution is performed on the two coordinate-transformed amplitude spectra from S7 to obtain the translation relationship between them, which is then converted back according to the mapping between the Cartesian and log-polar coordinate systems in S7 to obtain the scaling relationship between the template image and the image to be matched;
S9: The image to be matched undergoes the corresponding rotation and scaling according to the rotation and scaling relationships obtained in S4 and S8, yielding a new image to be matched;
S10: Using the pre-trained fifth U-Net network and sixth U-Net network as two feature extractors, the template image and the new image to be matched are taken as the respective original input images of the two feature extractors; the isomorphic features of the two original input images are extracted to obtain an isomorphic fifth feature map and sixth feature map;
S11: A phase correlation solution is performed on the fifth feature map and sixth feature map obtained in S10 to obtain the translation relationship along the x direction between the template image and the image to be matched;
S12: Using the pre-trained seventh U-Net network and eighth U-Net network as two feature extractors, the template image and the new image to be matched are taken as the respective original input images of the two feature extractors; the isomorphic features of the two original input images are extracted to obtain a seventh feature map and an eighth feature map that are isomorphic and retain only the translation relationship between the original input images;
S13: A phase correlation solution is performed on the seventh feature map and eighth feature map obtained in S12 to obtain the translation relationship along the y direction between the template image and the image to be matched, completing the pose estimation of the three transformation relationships (rotation, scaling, and translation) between the heterogeneous template image and image to be matched.
In a second aspect, the present invention provides a neural network-based heterogeneous image registration method: the pose estimate between the template image and the image to be matched is obtained according to the heterogeneous image pose estimation method of the first aspect, and the image to be matched is then simultaneously rotated, scaled, and translated according to the estimated transformations so that it is registered to the template image, realizing the matching and stitching of the template image and the image to be matched.
Compared with the prior art, the present invention has the following beneficial effects:
The invention makes the phase correlation algorithm differentiable and embeds it into an end-to-end learning network framework, constructing a neural network-based heterogeneous image pose estimation method. The method can find the optimal feature extractor for the image-matching result, obtains the solution without exhaustive evaluation, and has good interpretability and generalization ability. Test results show that the present invention achieves accurate pose estimation and registration of heterogeneous images in a relatively short time, with high accuracy and real-time performance; it meets practical application requirements and can be applied to fields such as robot self-localization.
FIG. 1 is a schematic diagram of the network framework structure of the pose estimator in the present invention;
FIG. 2 is a schematic flow chart of the two stages of the pose estimation method of the present invention;
FIG. 3 is a schematic diagram of an image registration process in one embodiment of the present invention;
FIG. 4 is a schematic diagram of an image registration result in another embodiment of the present invention.
The present invention is further elaborated and described below with reference to the drawings and specific embodiments. The technical features of the various embodiments of the present invention may be combined accordingly provided they do not conflict with one another.
A heterogeneous sensor is limited by its own characteristics, and the two images obtained are heterogeneous images that differ in angle, scale, and viewpoint. Moreover, a sensor is subject to different forms of interference such as illumination, shadow, and occlusion when acquiring images, and these interferences make pose estimation extremely difficult. For example, O1 is captured by a drone's bird's-eye camera in the early morning, while O2 is a local elevation map constructed by a ground robot using lidar; these two are heterogeneous images and cannot be matched directly. To solve this problem, the general approach is to extract features from the two images and use the features, instead of the raw sensor measurements, to estimate the relative pose.
For heterogeneous images acquired by heterogeneous sensors, the present invention constructs a neural network-based pose estimation method to estimate the pose transformation between any two heterogeneous images. The estimation method is realized by a pose estimator built on a neural network, which is essentially a differentiable phase correlation algorithm. Phase correlation is a similarity-based matcher that performs well for inputs of the same modality, but it can only complete the match in the presence of small high-frequency noise. We make the phase correlation algorithm differentiable and embed it into our end-to-end learning network framework to form the pose estimator. This architecture enables our system to find the optimal feature extractor for the image-matching result. Specifically, the present invention takes conventional phase correlation and endows the Fast Fourier Transform layer (FFT), the Log-Polar Transform layer (LPT), and the Phase Correlation layer (DC) with differentiable properties, making it usable for training an end-to-end pose estimator.
As shown in FIG. 1, the network framework of the pose estimator built in a preferred embodiment of the present invention has as its core 8 independent U-Net networks together with the Fourier transform layer (FFT), the log-polar transform layer (LPT), and the phase correlation layer (DC). The input of the pose estimator is a pair of heterogeneous images, recorded as the template image (Source) and the image to be matched (Template); its final output is the three pose transformations needed to register the image to be matched to the template image: translation, rotation, and scaling. The template image serves as the matching template, and the image to be matched can be matched and stitched onto the template image after undergoing the pose transformation.
To address the fact that heterogeneous images cannot be registered directly, the general approach is to extract features from the two images and use the features, instead of the raw sensor measurements, to estimate the relative pose. In the traditional phase correlation algorithm, a high-pass filter is used to suppress the random noise of the two inputs, and this process can be regarded as a feature extractor. For a pair of heterogeneous input images, however, with their pronounced differences, a single high-pass filter is far from enough. Since there is no common feature with which to supervise the feature extractor directly, the present invention uses end-to-end learning to solve this problem. In the present invention, 8 independent trainable U-Net networks (denoted U-Net1 to U-Net8) are constructed for the template image and the source image in the rotation-scaling stage and the translation stage; after pre-training under the supervision of the translation, rotation, and scaling losses, these 8 U-Net networks can extract the isomorphic, i.e., common, features from heterogeneous images, converting two heterogeneous images into two isomorphic feature maps. If only 4 U-Net networks were provided, the solving of the rotation and scaling transformations would have to be coupled, as would the solving of the x-direction and y-direction translations, and the features extracted by the resulting extractors could perform poorly. We therefore decouple rotation, scaling, x translation, and y translation and train a separate U-Net network for each, obtaining 8 U-Net networks in total and thereby improving accuracy.
In this embodiment, each of the 8 independent U-Net networks has input and output sizes of 256×256. Each U-Net extracts features through 4 downsampling encoder layers and 4 upsampling decoder layers. As training progresses, the parameters of the 8 U-Nets are adjusted. Note that this network is lightweight, so it is efficient enough for real-time use and meets the requirements of the application scenario.
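For illustration only, the following is a minimal PyTorch sketch of such a 4-down/4-up U-Net feature extractor. The channel widths, the use of batch normalization, and the single-channel output are assumptions of the sketch, not details taken from this disclosure.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class UNetExtractor(nn.Module):
    """Lightweight U-Net: 4 downsampling encoder layers, 4 upsampling decoder layers."""
    def __init__(self, c=16):
        super().__init__()
        self.enc = nn.ModuleList([conv_block(1, c), conv_block(c, 2 * c),
                                  conv_block(2 * c, 4 * c), conv_block(4 * c, 8 * c)])
        self.pool = nn.MaxPool2d(2)
        self.mid = conv_block(8 * c, 16 * c)
        self.up = nn.ModuleList([nn.ConvTranspose2d(16 * c, 8 * c, 2, stride=2),
                                 nn.ConvTranspose2d(8 * c, 4 * c, 2, stride=2),
                                 nn.ConvTranspose2d(4 * c, 2 * c, 2, stride=2),
                                 nn.ConvTranspose2d(2 * c, c, 2, stride=2)])
        self.dec = nn.ModuleList([conv_block(16 * c, 8 * c), conv_block(8 * c, 4 * c),
                                  conv_block(4 * c, 2 * c), conv_block(2 * c, c)])
        self.head = nn.Conv2d(c, 1, 1)          # single-channel isomorphic feature map

    def forward(self, x):                       # x: (B, 1, 256, 256)
        skips = []
        for enc in self.enc:
            x = enc(x)
            skips.append(x)                     # keep the pre-pooling activation
            x = self.pool(x)
        x = self.mid(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))
        return self.head(x)                     # (B, 1, 256, 256), same size as input
```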
In addition, the function of the Fourier transform layer (FFT) is to apply a Fourier transform to the feature maps extracted by the U-Net networks, removing the translation relationship between images while retaining the rotation and scaling relationships. According to the properties of the Fourier transform, only rotation and scale affect the magnitude of the spectrum; the magnitude spectrum is insensitive to translation. Introducing the FFT therefore yields a representation that is insensitive to translation but particularly sensitive to scaling and rotation, so translation can be ignored when subsequently solving for scaling and rotation.
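The effect of this layer can be sketched in a few lines; torch.fft is differentiable, which is what allows the FFT layer to sit inside an end-to-end trainable network (the helper name is ours):

```python
import torch

def magnitude_spectrum(feat):                      # feat: (B, 1, H, W) feature map
    spec = torch.fft.fft2(feat)                    # 2-D discrete Fourier transform
    spec = torch.fft.fftshift(spec, dim=(-2, -1))  # move zero frequency to the centre
    return spec.abs()                              # translation only affects the phase,
                                                   # so the magnitude discards it
```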
In addition, the function of the log-polar transform layer (LPT) is to apply a log-polar coordinate transformation to the FFT-transformed image, mapping it from the Cartesian coordinate system to the log-polar coordinate system. In this mapping, scaling and rotation in Cartesian coordinates are converted into translations in log-polar coordinates. This coordinate transformation, which yields a cross-correlation form with respect to scale and rotation, removes all exhaustive evaluation from the overall pose estimator.
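A hedged sketch of such an LPT layer built on grid_sample, which keeps the resampling differentiable; the grid layout, output size, and radius range are assumptions of the sketch:

```python
import math
import torch
import torch.nn.functional as F

def log_polar(mag, out_h=256, out_w=256):
    """Resample a centred magnitude spectrum (B,1,H,W) onto a log-polar grid.
    Rows sweep the angle and columns sweep log-radius, so a rotation becomes a
    shift along y and a scaling becomes a shift along x, as described above."""
    B, _, H, W = mag.shape
    theta = torch.linspace(0, 2 * math.pi, out_h, device=mag.device)
    log_r = torch.linspace(0, math.log(min(H, W) / 2), out_w, device=mag.device)
    r = torch.exp(log_r)
    # Cartesian sampling positions, normalised to [-1, 1] for grid_sample
    x = (r[None, :] * torch.cos(theta[:, None])) / (W / 2)
    y = (r[None, :] * torch.sin(theta[:, None])) / (H / 2)
    grid = torch.stack([x, y], dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
    return F.grid_sample(mag, grid, align_corners=False)
```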
In addition, the role of the phase correlation layer (DC) is to solve the phase correlation, i.e., to compute the cross-correlation between two amplitude spectra; from the resulting correlation, the translation relationship between the two can be obtained. The specific computation of the cross-correlation belongs to the prior art and is not repeated here.
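For convenience, the classical solution is sketched below; it takes an argmax over the correlation surface. One plausible way to make the solution differentiable end to end (an assumption of this sketch, not a detail taken from this disclosure) is to replace that argmax with a softmax-weighted expectation, which the sketch shows:

```python
import torch

def phase_correlation(a, b, temperature=100.0):
    """Estimate the (dy, dx) shift between two same-size maps a, b of shape (B,1,H,W)."""
    A = torch.fft.fft2(a)
    B2 = torch.fft.fft2(b)
    cross = A * torch.conj(B2)
    cross = cross / (cross.abs() + 1e-8)        # normalised cross-power spectrum
    corr = torch.fft.ifft2(cross).real          # correlation surface; its peak is the shift
    Bn, _, H, W = corr.shape
    w = torch.softmax(corr.reshape(Bn, -1) * temperature, dim=-1).reshape(Bn, H, W)
    ys = torch.arange(H, dtype=corr.dtype, device=corr.device)
    xs = torch.arange(W, dtype=corr.dtype, device=corr.device)
    dy = (w.sum(dim=2) * ys).sum(dim=1)         # soft-argmax over rows
    dx = (w.sum(dim=1) * xs).sum(dim=1)         # soft-argmax over columns
    return dy, dx                               # wrap-around handling omitted for brevity
```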
Based on the above pose estimator, the specific heterogeneous image pose estimation process in a preferred embodiment of the present invention is described in detail below; its steps are as follows:
S1: The pre-trained first U-Net network U-Net1 and second U-Net network U-Net2 are used as two feature extractors, with the heterogeneous template image and image to be matched as the respective original input images of U-Net1 and U-Net2 (i.e., the template image is input to U-Net1 and the image to be matched is input to U-Net2, and likewise below); the isomorphic features of the two original input images are extracted to obtain an isomorphic first feature map and second feature map. At this point, the first and second feature maps retain the translation, rotation, and scaling relationships between the original input images.
S2: The first feature map and the second feature map obtained in S1 undergo the first Fourier transform operation (denoted FFT1), after which their respective amplitude spectra are taken. At this point the two amplitude spectra retain the rotation and scaling relationships between the original input images, while the translation relationship has been filtered out by FFT1.
S3: The two amplitude spectra obtained in S2 each undergo the first log-polar transformation operation (denoted LPT1), converting them from the Cartesian coordinate system to the log-polar coordinate system, so that the rotation transformation between the two amplitude spectra in Cartesian coordinates is mapped to a translation along the y direction in log-polar coordinates.
S4: The two coordinate-transformed amplitude spectra from S3 undergo a phase correlation solution in the phase correlation layer (DC), yielding the translation relationship between them. Note that in the LPT1 of S3 there is a mapping between the rotation transformation in the Cartesian coordinate system and the translation along the y direction in the log-polar coordinate system, so this translation relationship can be converted back according to the mapping between the Cartesian and log-polar coordinate systems in S3 to obtain the rotation relationship between the aforementioned template image and image to be matched.
The above rotation relationship is essentially the angle theta by which the image to be matched must be rotated to achieve registration with the template image.
S5: Similarly, the pre-trained third U-Net network U-Net3 and fourth U-Net network U-Net4 are used as two feature extractors, with the heterogeneous template image and image to be matched as the respective original input images of U-Net3 and U-Net4; the isomorphic features of the two original input images are extracted to obtain an isomorphic third feature map and fourth feature map. At this point, the third and fourth feature maps likewise retain the translation, rotation, and scaling relationships between the original input images.
S6: The third feature map and the fourth feature map obtained in S5 undergo the second Fourier transform operation (denoted FFT2), after which their respective amplitude spectra are taken. As before, the two amplitude spectra retain the rotation and scaling relationships between the original input images, while the translation relationship has been filtered out by FFT2.
S7: The two amplitude spectra obtained in S6 each undergo the second log-polar transformation operation (denoted LPT2), converting them from the Cartesian coordinate system to the log-polar coordinate system, so that the scaling transformation between the two amplitude spectra in Cartesian coordinates is mapped to a translation along the x direction in log-polar coordinates.
S8: The two coordinate-transformed amplitude spectra from S7 undergo a phase correlation solution in the phase correlation layer (DC), yielding the translation relationship between them. Similarly, in the LPT2 of S7 there is a mapping between the scaling transformation in the Cartesian coordinate system and the translation along the x direction in the log-polar coordinate system, so this translation can be converted back according to the mapping between the Cartesian and log-polar coordinate systems in S7 to obtain the scaling relationship between the aforementioned template image and image to be matched.
The above scaling relationship is essentially the scale by which the image to be matched must be scaled to achieve registration with the template image.
Thus, through the above steps, the rotation and scaling relationships between the template image and the image to be matched have been obtained.
S9: The image to be matched undergoes the corresponding rotation and scaling according to the rotation and scaling relationships obtained in S4 and S8, yielding a new image to be matched. Since the rotation and scaling have removed the differences in angle and scale between the template image and the image to be matched, only a translation relationship now remains between the new image to be matched and the input template image; no rotation or scaling relationship remains, so only the translation difference between the two needs to be eliminated subsequently. The translation relationships along the x and y directions can be obtained simply by solving the phase correlation.
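The rotation-and-scaling warp of S9 can itself stay differentiable, e.g. via affine_grid/grid_sample, so that gradients can flow back through theta and scale during training; the sign conventions in this sketch are assumptions:

```python
import math
import torch
import torch.nn.functional as F

def rot_and_scale(img, theta_deg, scale):
    """Rotate img (B,1,H,W) by theta_deg degrees about its centre and scale it.
    theta_deg and scale are tensors of shape (B,)."""
    t = theta_deg * math.pi / 180.0
    cos, sin = torch.cos(t), torch.sin(t)
    # affine_grid maps output coordinates to input coordinates, so use the inverse warp
    inv = torch.stack([
        torch.stack([cos / scale,  sin / scale, torch.zeros_like(cos)], dim=-1),
        torch.stack([-sin / scale, cos / scale, torch.zeros_like(cos)], dim=-1),
    ], dim=-2)                                   # (B, 2, 3) inverse affine matrices
    grid = F.affine_grid(inv, img.shape, align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)
```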
S10: The pre-trained fifth U-Net network U-Net5 and sixth U-Net network U-Net6 are used as two feature extractors, with the template image and the new image to be matched as the respective original input images of U-Net5 and U-Net6; the isomorphic features of the two original input images are extracted to obtain an isomorphic fifth feature map and sixth feature map. At this point, the fifth and sixth feature maps retain only the translation relationship between the original input images, with no rotation or scaling relationship.
S11: The fifth feature map and the sixth feature map obtained in S10 undergo a phase correlation solution in the phase correlation layer (DC), yielding the translation relationship X along the x direction between the template image and the image to be matched.
S12: The pre-trained seventh U-Net network U-Net7 and eighth U-Net network U-Net8 are used as two feature extractors, with the template image and the new image to be matched as the respective original input images of U-Net7 and U-Net8; the isomorphic features of the two original input images are extracted to obtain an isomorphic seventh feature map and eighth feature map. At this point, the seventh and eighth feature maps retain only the translation relationship between the original input images, with no rotation or scaling relationship.
S13: The seventh feature map and the eighth feature map obtained in S12 undergo a phase correlation solution in the phase correlation layer (DC), yielding the translation relationship Y along the y direction between the template image and the image to be matched.
The above translation relationships along the x and y directions are essentially the distance X along x and the distance Y along y by which the image to be matched must be translated to achieve registration with the template image.
It can thus be seen that the pose estimation of the present invention is implemented in two stages, yielding estimates of four degrees of freedom (X, Y, theta, scale) in total. First, the rotation and scaling relationships are estimated in the rotation-scaling stage of S1-S9, and the translation relationship is then estimated in the translation stage of S10-S13. The processing of S1-S9 is shown in FIG. 2 a), and the processing of S10-S13 is shown in FIG. 2 b).
Combining the results of S4, S8, S11, and S13 above gives the pose estimates of the three transformation relationships (rotation, scaling, and translation) between the heterogeneous template image and the image to be matched, completing the pose estimation of the two images; the heterogeneous images can then be registered according to the corresponding estimates.
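Putting the pieces together, the two-stage process of S1-S13 can be sketched compactly by chaining the helpers sketched above (magnitude_spectrum, log_polar, phase_correlation, rot_and_scale); unets stands for the list of the eight trained extractors, and the sign and wrap-around conventions of the recovered shifts are omitted:

```python
import math

def estimate_pose(source, template, unets):
    # Rotation-scaling stage (S1-S8)
    lp_s = log_polar(magnitude_spectrum(unets[0](source)))
    lp_t = log_polar(magnitude_spectrum(unets[1](template)))
    dy, _ = phase_correlation(lp_s, lp_t)        # y shift in log-polar -> rotation
    theta = dy / lp_s.shape[-2] * 360.0          # the rows span one full turn of angle

    lp_s = log_polar(magnitude_spectrum(unets[2](source)))
    lp_t = log_polar(magnitude_spectrum(unets[3](template)))
    _, dx = phase_correlation(lp_s, lp_t)        # x shift in log-polar -> scale
    log_r_max = math.log(lp_s.shape[-1] / 2)     # must match the grid used in log_polar
    scale = (dx / lp_s.shape[-1] * log_r_max).exp()

    # Translation stage (S9-S13) on the rotation/scale-compensated image
    new_template = rot_and_scale(template, theta, scale)
    _, x = phase_correlation(unets[4](source), unets[5](new_template))
    y, _ = phase_correlation(unets[6](source), unets[7](new_template))
    return x, y, theta, scale
```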
Note that in the above pose estimator, all 8 U-Net networks are trained in advance. To ensure that each U-Net network can accurately extract isomorphic features, a reasonable loss function must be set. The total training loss should be the weighted sum of the rotation loss, the scaling loss, the x-direction translation loss, and the y-direction translation loss between the template image and the image to be matched; the weights can be adjusted according to the actual situation.
In this embodiment, the weights of the four losses in the total loss function are all 1, and all four losses adopt the L1 loss; the four loss functions are as follows:
The rotation relationship theta predicted in S4 is denoted theta_predict, the scaling relationship scale predicted in S8 is denoted scale_predict, the x-direction translation X predicted in S11 is denoted x_predict, and the y-direction translation Y predicted in S13 is denoted y_predict. In each training round, the translation (x_predict, y_predict), rotation (theta_predict), and scaling (scale_predict) relationships between the two heterogeneous images are thus obtained.
1) In the model, the 1-norm distance loss between the obtained theta_predict and its ground truth theta_gt is computed as L_theta = |theta_gt - theta_predict|, and L_theta is back-propagated to train U-Net1 and U-Net2 so that they can extract better features for solving theta_predict.
2) The 1-norm distance loss between the obtained scale_predict and its ground truth scale_gt is computed as L_scale = |scale_gt - scale_predict|, and L_scale is back-propagated to train U-Net3 and U-Net4 so that they can extract better features for solving scale_predict.
3) The 1-norm distance loss between the obtained x_predict and its ground truth x_gt is computed as L_x = |x_gt - x_predict|, and L_x is back-propagated to train U-Net5 and U-Net6 so that they can extract better features for solving x_predict.
4) The 1-norm distance loss between the obtained y_predict and its ground truth y_gt is computed as L_y = |y_gt - y_predict|, and L_y is back-propagated to train U-Net7 and U-Net8 so that they can extract better features for solving y_predict.
The total loss function is therefore L = L_x + L_y + L_theta + L_scale. During training, the model parameters of the 8 U-Net networks are optimized by gradient descent to minimize the total loss. The trained 8 U-Net networks form the pose estimator used for pose estimation of actual heterogeneous images; with this estimator, the pose estimation of two heterogeneous images can be performed according to steps S1-S13 above, and image registration can be carried out based on the estimation results.
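A minimal sketch of one training step under this total loss, reusing the estimate_pose sketch above; opt is assumed to be an optimizer covering the parameters of all eight U-Net networks:

```python
import torch.nn.functional as F

def train_step(batch, unets, opt):
    src, tpl, x_gt, y_gt, theta_gt, scale_gt = batch
    x_p, y_p, theta_p, scale_p = estimate_pose(src, tpl, unets)
    loss = (F.l1_loss(x_p, x_gt) + F.l1_loss(y_p, y_gt)          # L_x + L_y
            + F.l1_loss(theta_p, theta_gt)                       # L_theta
            + F.l1_loss(scale_p, scale_gt))                      # L_scale, all weights 1
    opt.zero_grad()
    loss.backward()   # gradients flow through DC, LPT and FFT because each is differentiable
    opt.step()
    return loss.item()
```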
In the present invention, on the basis of the pose estimate between the template image and the image to be matched obtained by the above pose estimation method, a neural network-based heterogeneous image registration method can further be provided, as follows: the image to be matched is simultaneously rotated, scaled, and translated according to the estimated values of the three transformation relationships (X, Y, theta, scale), so that it is registered to the template image; the template image and the registered image to be matched are then matched and stitched.
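By way of illustration, the final registration could compose the four estimated values into a single affine warp, e.g. with OpenCV as sketched below; the overlay-style stitching and the assumption that the two images share the same size are simplifications:

```python
import cv2
import numpy as np

def register(template_img, match_img, x, y, theta, scale):
    """Warp match_img by the estimated (x, y, theta, scale) and overlay it on template_img."""
    h, w = match_img.shape[:2]
    # rotation about the image centre by theta degrees, combined with the scaling
    M = cv2.getRotationMatrix2D((w / 2, h / 2), theta, scale)
    M[0, 2] += x                                # then translate by (x, y)
    M[1, 2] += y
    warped = cv2.warpAffine(match_img, M, (w, h))
    return np.where(warped > 0, warped, template_img)   # simple stitch by overlay
```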
It should be noted, however, that in the above pose estimator the image to be matched may be one image or several; if there are several images to be matched, the same pose estimation process is simply repeated, and each image is then registered to the template image.
As shown in FIG. 3, a specific example of pose estimation and registration of a single group of heterogeneous images using the above pose estimator is given. The single group of heterogeneous images contains one template image and one image to be matched. After the above pose estimator produces estimates of the 4 degrees of freedom (X, Y, theta, scale), 4-degree-of-freedom matching can be performed: the two images on the left are input and the matching result on the right is output. The method evidently achieves good matching and registration of the two heterogeneous images.
As shown in FIG. 4, another specific example is given, in which the above pose estimator performs pose estimation and registration of multiple groups of heterogeneous images. The multiple groups of heterogeneous images contain one template image and two images to be matched; after the estimates of the 4 degrees of freedom (X, Y, theta, scale) are obtained by the above pose estimator, 4-degree-of-freedom matching can be performed, matching multiple observation images into a global map serving as the template image and thereby realizing multi-source data fusion.
To further evaluate the technical effect of the above method of the present invention, an exhaustive evaluation was carried out on different datasets, with the results shown in Table 1. The simulated dataset consists of computer-generated random graphics together with their 4-degree-of-freedom and appearance transformations; real dataset 1 consists of maps collected by a ground robot with a black-and-white camera and ground maps collected by the color camera of an aerial drone; real dataset 2 consists of maps collected by a ground robot with a lidar and ground maps collected by the color camera of an aerial drone; real dataset 3 consists of maps collected by a ground robot with a color camera and ground maps collected by the color camera of an aerial drone.
Table 1 Evaluation results of the present invention on different datasets

Dataset | X accuracy (%) | Y accuracy (%) | Rotation accuracy (%) | Scale accuracy (%) | Runtime (ms) |
---|---|---|---|---|---|
Simulated dataset | 98.7 | 97.9 | 99.3 | 98.1 | 102 |
Real dataset 1 | 95.2 | 92.3 | 99.1 | 97.5 | 101 |
Real dataset 2 | 97.6 | 91.4 | 98.9 | 95.0 | 105 |
Real dataset 3 | 92.9 | 94.7 | 99.1 | 98.6 | 99 |
As can be seen from the table, the present invention achieves accurate pose estimation and registration of heterogeneous images in a relatively short time, with high accuracy and real-time performance; it meets practical application requirements and can be applied to fields such as robot self-localization.
In addition, in other embodiments of the present invention, a neural network-based heterogeneous image pose estimation device may also be provided, comprising a memory and a processor;
the memory is used to store a computer program;
the processor is configured to, when executing the computer program, implement the aforementioned neural network-based heterogeneous image pose estimation method.
In addition, in other embodiments of the present invention, a computer-readable storage medium may also be provided, on which a computer program is stored; when the computer program is executed by a processor, the aforementioned neural network-based heterogeneous image pose estimation method is implemented.
In addition, in other embodiments of the present invention, a neural network-based heterogeneous image registration device may also be provided, comprising a memory and a processor;
the memory is used to store a computer program;
the processor is configured to, when executing the computer program, implement the aforementioned neural network-based heterogeneous image registration method.
In addition, in other embodiments of the present invention, a computer-readable storage medium may also be provided, on which a computer program is stored; when the computer program is executed by a processor, the aforementioned neural network-based heterogeneous image registration method is implemented.
It should be noted that the above memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory. The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Of course, the device should also have the components necessary for the program to run, such as a power supply and a communication bus.
The embodiments described above are merely preferred solutions of the present invention and are not intended to limit it. Those of ordinary skill in the relevant art may make various changes and modifications without departing from the spirit and scope of the present invention. Therefore, all technical solutions obtained by equivalent substitution or equivalent transformation fall within the protection scope of the present invention.
Claims (10)
- A neural network-based heterogeneous image pose estimation method, characterized in that its steps are as follows: S1: using a pre-trained first U-Net network and second U-Net network as two feature extractors, taking a heterogeneous template image and an image to be matched as the respective original input images of the two feature extractors, and extracting the isomorphic features of the two original input images to obtain an isomorphic first feature map and second feature map; S2: Fourier-transforming the first feature map and second feature map obtained in S1 and taking their respective amplitude spectra; S3: subjecting the two amplitude spectra obtained in S2 to a log-polar transformation, converting them from the Cartesian coordinate system to the log-polar coordinate system, so that the rotation transformation between the two amplitude spectra in Cartesian coordinates is mapped to a translation along the y direction in log-polar coordinates; S4: performing a phase correlation solution on the two coordinate-transformed amplitude spectra from S3 to obtain the translation relationship between them, and converting this back according to the mapping between the Cartesian and log-polar coordinate systems in S3 to obtain the rotation relationship between the template image and the image to be matched; S5: using a pre-trained third U-Net network and fourth U-Net network as two feature extractors, taking the heterogeneous template image and image to be matched as the respective original input images of the two feature extractors, and extracting the isomorphic features of the two original input images to obtain an isomorphic third feature map and fourth feature map; S6: Fourier-transforming the third feature map and fourth feature map obtained in S5 and taking their respective amplitude spectra; S7: subjecting the two amplitude spectra obtained in S6 to a log-polar transformation, converting them from the Cartesian coordinate system to the log-polar coordinate system, so that the scaling transformation between the two amplitude spectra in Cartesian coordinates is mapped to a translation along the x direction in log-polar coordinates; S8: performing a phase correlation solution on the two coordinate-transformed amplitude spectra from S7 to obtain the translation relationship between them, and converting this back according to the mapping between the Cartesian and log-polar coordinate systems in S7 to obtain the scaling relationship between the template image and the image to be matched; S9: subjecting the image to be matched to the corresponding rotation and scaling according to the rotation and scaling relationships obtained in S4 and S8 to obtain a new image to be matched; S10: using a pre-trained fifth U-Net network and sixth U-Net network as two feature extractors, taking the template image and the new image to be matched as the respective original input images of the two feature extractors, and extracting the isomorphic features of the two original input images to obtain an isomorphic fifth feature map and sixth feature map; S11: performing a phase correlation solution on the fifth feature map and sixth feature map obtained in S10 to obtain the translation relationship along the x direction between the template image and the image to be matched; S12: using a pre-trained seventh U-Net network and eighth U-Net network as two feature extractors, taking the template image and the new image to be matched as the respective original input images of the two feature extractors, and extracting the isomorphic features of the two original input images to obtain a seventh feature map and an eighth feature map that are isomorphic and retain only the translation relationship between the original input images; S13: performing a phase correlation solution on the seventh feature map and eighth feature map obtained in S12 to obtain the translation relationship along the y direction between the template image and the image to be matched, completing the pose estimation of the three transformation relationships (rotation, scaling, and translation) between the heterogeneous template image and the image to be matched.
- The neural network-based heterogeneous image pose estimation method according to claim 1, characterized in that all 8 U-Net networks in the estimation method are trained in advance, and the total training loss function is the weighted sum of the rotation loss, the scaling loss, the x-direction translation loss, and the y-direction translation loss between the template image and the image to be matched.
- The neural network-based heterogeneous image pose estimation method according to claim 2, characterized in that the weights of the four losses in the total loss function are all 1.
- The neural network-based heterogeneous image pose estimation method according to claim 1, characterized in that all four losses in the total loss function are L1 losses.
- The neural network-based heterogeneous image pose estimation method according to claim 1, characterized in that the 8 U-Net networks in the estimation method are mutually independent, and each extracts features through 4 downsampling encoder layers and 4 upsampling decoder layers.
- A neural network-based heterogeneous image registration method, characterized in that the pose estimate between a template image and an image to be matched is obtained according to the heterogeneous image pose estimation method of any one of claims 1 to 5, and the image to be matched is then simultaneously rotated, scaled, and translated according to the estimated transformations so that it is registered to the template image, realizing the matching and stitching of the template image and the image to be matched.
- A neural network-based heterogeneous image pose estimation device, characterized by comprising a memory and a processor; the memory is used to store a computer program; the processor is configured to, when executing the computer program, implement the neural network-based heterogeneous image pose estimation method of any one of claims 1 to 5.
- A computer-readable storage medium, characterized in that a computer program is stored on the storage medium, and when the computer program is executed by a processor, the neural network-based heterogeneous image pose estimation method of any one of claims 1 to 5 is implemented.
- A neural network-based heterogeneous image registration device, characterized by comprising a memory and a processor; the memory is used to store a computer program; the processor is configured to, when executing the computer program, implement the neural network-based heterogeneous image registration method of claim 6.
- A computer-readable storage medium, characterized in that a computer program is stored on the storage medium, and when the computer program is executed by a processor, the neural network-based heterogeneous image registration method of claim 6 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/512,075 US20240169584A1 (en) | 2021-05-18 | 2023-11-17 | Neural network-based pose estimation and registration method and device for heterogeneous images, and medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110540496.1A CN113240743B (zh) | 2021-05-18 | 2021-05-18 | Neural network-based heterogeneous image pose estimation and registration method, device and medium |
CN202110540496.1 | 2021-05-18 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/512,075 Continuation US20240169584A1 (en) | 2021-05-18 | 2023-11-17 | Neural network-based pose estimation and registration method and device for heterogeneous images, and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022241877A1 true WO2022241877A1 (zh) | 2022-11-24 |
Family
ID=77135024
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/099255 WO2022241877A1 (zh) | 2021-05-18 | 2021-06-09 | Neural network-based heterogeneous image pose estimation and registration method, device and medium |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240169584A1 (zh) |
CN (1) | CN113240743B (zh) |
WO (1) | WO2022241877A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115619835B (zh) * | 2022-09-13 | 2023-09-01 | Zhejiang University | Heterogeneous three-dimensional observation registration method, medium and device based on deep phase correlation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521834A (zh) * | 2011-12-12 | 2012-06-27 | Shanghai Maritime University | Image registration method using fractional-order Fourier transform represented in log-polar coordinates |
CN103020945A (zh) * | 2011-09-21 | 2013-04-03 | Institute of Electronics, Chinese Academy of Sciences | Remote sensing image registration method for multi-source sensors |
CN103606139A (zh) * | 2013-09-09 | 2014-02-26 | Shanghai University | A sonar image stitching method |
US20200184641A1 (en) * | 2018-12-06 | 2020-06-11 | Definiens Gmbh | A Deep Learning Method For Predicting Patient Response To A Therapy |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104596502B (zh) * | 2015-01-23 | 2017-05-17 | Zhejiang University | An object pose measurement method based on a CAD model and monocular vision |
CN105427298B (zh) * | 2015-11-12 | 2018-03-06 | Xidian University | Remote sensing image registration method based on anisotropic gradient scale space |
US9830732B1 (en) * | 2016-05-16 | 2017-11-28 | The Governing Council Of The University Of Toronto | Methods and systems for image alignment of at least one image to a model |
CN107036594A (zh) * | 2017-05-07 | 2017-08-11 | Zhengzhou University | Localization and multi-granularity environment perception technology for intelligent power-station inspection agents |
CN108765479A (zh) * | 2018-04-04 | 2018-11-06 | Shanghai University of Engineering Science | Method for optimizing monocular-view depth estimation in video sequences using deep learning |
KR102275572B1 (ko) * | 2018-12-21 | 2021-07-09 | Electronics and Telecommunications Research Institute | Method and apparatus for matching 3D terrain information using heterogeneous-altitude aerial images |
CN111325794B (zh) * | 2020-02-23 | 2023-05-26 | Harbin Institute of Technology | A visual simultaneous localization and mapping method based on a deep convolutional autoencoder |
-
2021
- 2021-05-18 CN CN202110540496.1A patent/CN113240743B/zh active Active
- 2021-06-09 WO PCT/CN2021/099255 patent/WO2022241877A1/zh active Application Filing
-
2023
- 2023-11-17 US US18/512,075 patent/US20240169584A1/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103020945A (zh) * | 2011-09-21 | 2013-04-03 | Institute of Electronics, Chinese Academy of Sciences | Remote sensing image registration method for multi-source sensors |
CN102521834A (zh) * | 2011-12-12 | 2012-06-27 | Shanghai Maritime University | Image registration method using fractional-order Fourier transform represented in log-polar coordinates |
CN103606139A (zh) * | 2013-09-09 | 2014-02-26 | Shanghai University | A sonar image stitching method |
US20200184641A1 (en) * | 2018-12-06 | 2020-06-11 | Definiens Gmbh | A Deep Learning Method For Predicting Patient Response To A Therapy |
Also Published As
Publication number | Publication date |
---|---|
CN113240743B (zh) | 2022-03-25 |
US20240169584A1 (en) | 2024-05-23 |
CN113240743A (zh) | 2021-08-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109559340B (zh) | A parallel automated registration method for 3D point cloud data | |
Yao et al. | Application of convolutional neural network in classification of high resolution agricultural remote sensing images | |
CN107358629B (zh) | An indoor mapping and localization method based on object recognition | |
CN108197605A (zh) | Yak identity recognition method based on deep learning | |
Tan et al. | Image co-saliency detection by propagating superpixel affinities | |
WO2023284070A1 (zh) | Weakly paired image style transfer method based on a pose self-supervised generative adversarial network | |
CN102122359B (zh) | An image registration method and device | |
CN105160686B (zh) | A low-altitude multi-view remote sensing image matching method based on an improved SIFT operator | |
CN111028292A (zh) | A sub-pixel image matching navigation and localization method | |
US20240169584A1 (en) | Neural network-based pose estimation and registration method and device for heterogeneous images, and medium | |
Xingteng et al. | Image matching method based on improved SURF algorithm | |
CN107862680A (zh) | A target tracking optimization method based on correlation filters | |
CN104050674B (zh) | A salient region detection method and device | |
Kim et al. | Ep2p-loc: End-to-end 3d point to 2d pixel localization for large-scale visual localization | |
CN116188249A (zh) | Remote sensing image registration method based on three-stage image block matching | |
Lin et al. | 6D object pose estimation with pairwise compatible geometric features | |
Liu et al. | An efficient edge-feature constraint visual SLAM | |
Popovic et al. | Surface normal clustering for implicit representation of manhattan scenes | |
Chen et al. | Feature Points Extraction and Matching Based on Improved Surf Algorithm | |
Pramanik et al. | Image registration using discrete wavelet transform and particle swarm optimization | |
Jia et al. | A particle filter human tracking method based on HOG and Hu moment | |
Wei et al. | Matching filter-based vslam optimization in indoor environments | |
WO2024055493A1 (zh) | Heterogeneous three-dimensional observation registration method, medium and device based on deep phase correlation | |
Anagnostopoulos et al. | Reviewing Deep Learning-Based Feature Extractors in a Novel Automotive SLAM Framework | |
WO2024011455A1 (zh) | A mobile robot position re-identification method based on lidar-estimable pose | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21940326 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21940326 Country of ref document: EP Kind code of ref document: A1 |