CN115546268A

CN115546268A - Multi-mode remote sensing image registration method, system, terminal device and storage medium

Info

Publication number: CN115546268A
Application number: CN202211162936.5A
Authority: CN
Inventors: 汪璞; 安玮; 石添鑫; 邓新蒲; 盛卫东; 林再平; 曾瑶源; 李振; 李骏
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2022-09-23
Filing date: 2022-09-23
Publication date: 2022-12-30

Abstract

The invention discloses a multi-modal remote sensing image registration method, system, terminal device and storage medium. Based on the structural similarity between multi-modal remote sensing image pairs, a pre-trained edge detection network extracts local feature descriptors (edge features), and a conventional template matching method performs feature matching, so that accurate and efficient registration is achieved across images of various modalities. The method obtains higher matching accuracy and better robustness in multi-source image matching.

Description

Multi-mode remote sensing image registration method, system, terminal device and storage medium

Technical Field

The invention relates to the technical field of image registration, in particular to a multi-mode remote sensing image registration method, a multi-mode remote sensing image registration system, terminal equipment and a storage medium.

Background

With the rapid development of remote sensing technology, Earth observation images from sensors such as visible-light cameras and synthetic aperture radar (SAR) have become increasingly abundant. Owing to factors such as image size, cloud occlusion and imaging quality, a single type of image often cannot provide global information about a target area in a complex environment, whereas multi-modal images acquired by different platforms and sensors are complementary. Multi-modal image registration makes it possible to use these sources jointly, yielding a larger data volume and shorter revisit times. Multi-modal remote sensing image registration has therefore attracted wide attention in recent years. It is the process of identifying corresponding (same-name) points in two or more images acquired by different sensors, from different viewing angles or at different times. It is a low-level task and a preprocessing step for many remote sensing image analysis tasks such as image super-resolution and image fusion. However, because of the large amount of non-linear radiometric distortion and geometric deformation between multi-modal images, high-precision registration between them remains challenging. As with conventional image matching, multi-modal image registration can be implemented with conventional methods or with deep learning methods. Conventional methods can be broadly divided into feature-based and region-based methods, the central challenge being how to design and use an appropriate similarity measure to drive the iterative process that estimates the geometric transformation.
One straightforward solution is to use or adapt common information-theoretic metrics. Another is to apply the similarity measure indirectly after reducing the images to a unified domain, for example through the fast Fourier transform (FFT), structural information extraction, or mapping image intensities into a high-dimensional space with descriptors. In the prior art, based on the structural attributes of images, a phase congruency descriptor invariant to illumination and contrast has been proposed and extended to a new image registration method, in which the Euclidean distance between multi-scale phase congruency (MS-PC) descriptors serves as the similarity measure for establishing correspondences. Experimental results show that MS-PC is strongly robust to radiometric differences between images and outperforms two common methods (SIFT and SAR-SIFT) in quantitative accuracy and in the number of tie points. However, because remote sensing images have high resolution and strong noise caused by the atmosphere, using similarity measures to guide the optimization can be computationally burdensome. In addition, remote sensing images usually exhibit significant geometric variation, such as large rotation, scaling, deformation and small overlapping areas, which makes the solution space complex and difficult to optimize.
Feature-based methods attempt to estimate the geometric transformation between images by identifying matching features, which may be points, lines, surfaces and so on, but the features must be salient and stable, such as Harris corners and scale-invariant feature transform (SIFT) features. Conventional SIFT algorithms tend to have problems with the distribution of the extracted features and with the sensitivity of the descriptors to significant radiometric differences, especially in multi-source remote sensing imaging. Considering that the structural similarity between images is well preserved across modalities and can be used for registration, the prior art proposes a fast and robust multi-modal matching framework. Specifically, a dense descriptor image is first generated from existing local descriptors such as HOG and LSS. A frequency-domain similarity measure is then defined using the 3D FFT and oriented gradients, and feature points extracted by the Harris corner detector are matched with a template matching scheme. The performance on whole large-size image pairs is verified under a piecewise linear transformation model, combined with an iterative mismatch-removal process based on cubic polynomial model estimation and consistency checking, and with local affine estimation. Experimental results show that the matching performance of the framework is superior to that of existing matching methods. However, the method is not ideal when there is a large geometric offset between the two images.

With the development of artificial intelligence, deep convolutional neural networks (CNNs) have been introduced into image registration as advanced feature extractors. Through the non-linear operations and hierarchical structure of a CNN, image information is learned progressively from low level to high level, yielding complex high-level image features and representations, so that matching can use high-level features that carry more abstract semantic information. The prior art provides a multi-temporal remote sensing image registration method based on CNN features, which improves matching performance by learning multi-scale feature descriptors and gradually increasing the selection of inliers. The multi-scale feature descriptors are generated by a pre-trained VGG network, and a thin-plate spline (TPS) model is integrated to account for non-rigid transformation under the Gaussian mixture model (GMM) and expectation-maximization (EM) frameworks. However, deep learning methods for multi-modal remote sensing image registration are not as abundant as in general computer vision. On the one hand, in terms of data acquisition, no public multi-modal remote sensing image matching dataset is currently available for training and testing. On the other hand, remote sensing images have high resolution, mixed noise and large geometric variation, which makes it harder to design an effective deep registration framework. As a result, the prior art struggles to achieve accurate and efficient image registration.

Disclosure of Invention

The technical problem to be solved by the invention is to provide, in view of the deficiencies of the prior art, a multi-modal remote sensing image registration method, system, terminal device and storage medium that achieve accurate and efficient image registration.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows: a multi-modal remote sensing image registration method comprises the following steps:

S1, extracting edge features of a reference image and a remote sensing image at different scales;

S2, selecting the largest-scale edge features of the reference image and the largest-scale edge features of the remote sensing image, respectively, and calculating the offset between the selected edge features;

S3, concatenating the edge features of different scales corresponding to the reference image and to the remote sensing image, respectively, to obtain a reference image feature map and a remote sensing image feature map;

S4, determining a search area of the remote sensing image feature map by using the offset;

S5, selecting a plurality of corner points on the reference image feature map and on the search area of the remote sensing image feature map, respectively, and registering the remote sensing image by a pixel-by-pixel matching method.

The invention calculates the offset between the reference image and the remote sensing image from the largest-scale edge features, selects the search area of the remote sensing image feature map based on this offset, and performs image registration within the search area in combination with corner detection, which greatly reduces the amount of registration computation and improves the efficiency of image registration. Meanwhile, the method does not involve a complex deep registration framework, its implementation is simple, and experiments show that it obtains higher matching accuracy and better robustness in multi-source image matching.

In step S1, the edge features of the reference image and the remote sensing image at different scales are extracted by a pre-trained edge detection network. The invention therefore does not need to design a complex deep registration framework, which further simplifies the image registration process and improves its efficiency. Meanwhile, the output of the edge detection network is first coarsely matched (to determine the offset) and then finely matched, which further improves the precision of image registration.

In the invention, the pre-trained edge detection network adopts a convolutional neural network.

In the invention, in order to ensure the image registration precision and simultaneously give consideration to the image registration efficiency, in step S1, edge features on three scales of a reference image and a remote sensing image are respectively extracted.

In step S2 of the present invention, the calculation formula of the offset $(\delta_x, \delta_y)$ is:

wherein $(x, y)$ is the position coordinate of a pixel in the reference image feature map $F_{ref\_3}$, and $(x+\delta_x, y+\delta_y)$ is the position coordinate of a pixel in the remote sensing image feature map $F_{sen\_3}$.

The offset calculation is simple, so coarse registration of the reference image and the remote sensing image (i.e., the image to be registered) is easy to achieve.
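The coarse offset search can be illustrated with a minimal sketch. The text above does not reproduce the offset formula, so this example assumes that the offset is the shift maximizing the correlation between the two largest-scale edge maps; the function name `coarse_offset`, the parameter `max_shift`, and the wrap-around shifting with `np.roll` are illustrative choices, not part of the disclosed method.

```python
import numpy as np

def coarse_offset(f_ref, f_sen, max_shift):
    """Return (dx, dy) maximizing the correlation of f_ref(x, y) with f_sen(x+dx, y+dy).

    Assumptions: correlation is used as the similarity score, and shifts
    wrap around the image border (np.roll) to keep the sketch short.
    """
    best_score, best = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Bring f_sen(x+dx, y+dy) back onto (x, y) before comparing.
            shifted = np.roll(f_sen, shift=(-dy, -dx), axis=(0, 1))
            score = float(np.sum(f_ref * shifted))
            if score > best_score:
                best_score, best = score, (dx, dy)
    return best

# Toy data: a random binary "edge map" and a copy displaced by dx=-2, dy=3.
rng = np.random.default_rng(0)
f_ref = (rng.random((32, 32)) > 0.8).astype(float)
f_sen = np.roll(f_ref, shift=(3, -2), axis=(0, 1))
offset = coarse_offset(f_ref, f_sen, max_shift=5)
print(offset)
```

Because only the coarsest maps are compared, the exhaustive search stays cheap even when `max_shift` is large.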

In step S5, the pixel-by-pixel matching method registers the remote sensing image as follows: on the reference image feature map, a matching window centered on each corner point of the reference image is determined; each matching window is used as a template and slid within the search area to find the point on the remote sensing image feature map that has the highest similarity to the corner point in that window; the point with the highest similarity is the point registered with the corner point of the corresponding matching window.

The invention uses template-matching-based pixel-by-pixel matching to achieve image registration, and the fine registration is carried out within the search area, which ensures registration accuracy while maintaining efficiency, thereby achieving efficient, high-precision image registration.
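The sliding-window matching described above can be sketched as follows. The similarity measure is not specified in the text, so this example assumes the negative sum of squared differences; `match_corner`, `search_radius` and the toy data are hypothetical names and values introduced only for illustration.

```python
import numpy as np

def match_corner(feat_ref, feat_sen, corner, r, offset, search_radius):
    """Find the point of feat_sen that best matches `corner` of feat_ref.

    feat_ref, feat_sen: (H, W, C) feature maps; corner: (x, y); r: odd window size.
    offset: coarse (dx, dy); search_radius: half-size of the search area (hypothetical).
    Similarity is the negative sum of squared differences (an assumption).
    The corner is assumed to lie at least r // 2 pixels from the border.
    """
    h = r // 2
    x, y = corner
    template = feat_ref[y - h:y + h + 1, x - h:x + h + 1]
    cx, cy = x + offset[0], y + offset[1]      # center of the search area
    H, W = feat_sen.shape[:2]
    best_score, best_pt = -np.inf, None
    for v in range(cy - search_radius, cy + search_radius + 1):
        for u in range(cx - search_radius, cx + search_radius + 1):
            if u - h < 0 or v - h < 0 or u + h >= W or v + h >= H:
                continue                        # window would leave the image
            window = feat_sen[v - h:v + h + 1, u - h:u + h + 1]
            score = -float(np.sum((template - window) ** 2))
            if score > best_score:
                best_score, best_pt = score, (u, v)
    return best_pt

# Toy data: the sensed map is the reference map displaced by dx=5, dy=7,
# while the coarse stage is assumed to have returned the rough offset (4, 6).
rng = np.random.default_rng(1)
feat_ref = rng.random((64, 64, 3))
feat_sen = np.roll(feat_ref, shift=(7, 5), axis=(0, 1))
match = match_corner(feat_ref, feat_sen, corner=(30, 30), r=9,
                     offset=(4, 6), search_radius=3)
print(match)
```

Restricting the loop to the search area is what keeps the per-corner cost proportional to the search radius rather than to the image size.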

In the invention, to further improve registration efficiency and accelerate the whole calculation, the reference image feature map and the remote sensing image feature map are converted into frequency-domain feature maps by a three-dimensional fast Fourier transform before the matching window is determined; the pixel-by-pixel matching that registers the remote sensing image is then carried out on the frequency-domain feature maps.
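A minimal sketch of why a three-dimensional FFT speeds up the matching: by the correlation theorem, one forward and one inverse transform evaluate the similarity for every candidate shift at once. The sketch assumes circular cross-correlation as the similarity and uses NumPy's `np.fft.fftn` and `np.fft.ifftn`; the function name `correlation_surface_fft3` is illustrative, and the exact similarity used in the embodiment is not given in the text.

```python
import numpy as np

def correlation_surface_fft3(feat_ref, feat_sen):
    """Circular cross-correlation of two (H, W, C) feature maps via a 3D FFT.

    Entry [dy, dx] of the result is the sum over y, x, c of
    feat_ref[y, x, c] * feat_sen[y + dy, x + dx, c] (indices wrap around).
    """
    spec = np.conj(np.fft.fftn(feat_ref)) * np.fft.fftn(feat_sen)
    corr = np.fft.ifftn(spec).real
    return corr[:, :, 0]          # keep zero shift along the channel axis

# Toy data: the sensed map is the reference map displaced by dx=-2, dy=3.
rng = np.random.default_rng(2)
feat_ref = rng.random((16, 16, 3))
feat_sen = np.roll(feat_ref, shift=(3, -2), axis=(0, 1))

surface = correlation_surface_fft3(feat_ref, feat_sen)
dy, dx = np.unravel_index(np.argmax(surface), surface.shape)
H, W = surface.shape
# Map wrapped indices to signed shifts.
dy = int(dy) - H if dy > H // 2 else int(dy)
dx = int(dx) - W if dx > W // 2 else int(dx)
print(dx, dy)
```

The position of the maximum of `surface` plays the role of the matching position; a direct sliding-window evaluation would compute the same values one shift at a time.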

As an inventive concept, the present invention also provides a multi-modal remote sensing image registration system, which includes:

the edge feature extraction module is used for extracting edge features of the reference image and the remote sensing image at different scales;

the offset calculation module is used for selecting the largest-scale edge features of the reference image and the largest-scale edge features of the remote sensing image, respectively, and calculating the offset between the selected edge features;

the first feature map generation module is used for concatenating the edge features of the reference image at different scales to obtain a reference image feature map;

the second feature map generation module is used for concatenating the edge features of the remote sensing image at different scales to obtain a remote sensing image feature map;

the search area determining module is used for determining a search area of the remote sensing image feature map by using the offset;

and the registration module is used for selecting a plurality of corner points on the reference image feature map and on the search area of the remote sensing image feature map, respectively, and registering the remote sensing image by a pixel-by-pixel matching method.

As an inventive concept, the present invention also provides a terminal device comprising a memory, a processor, and a computer program stored on the memory; characterized in that the processor executes the computer program to implement the steps of the above-mentioned multi-modal image registration method of the present invention.

As an inventive concept, the present invention also provides a computer-readable storage medium having stored thereon a computer program/instructions; the computer program/instructions, when executed by a processor, implement the steps of the above-described multi-modality image registration method of the present invention.

Compared with the prior art, the invention has the following beneficial effects:

1. The invention performs both coarse registration and fine registration on the images, which greatly improves the accuracy of image registration.

2. The method is based on the structural similarity between the multi-mode remote sensing images, extracts local feature descriptors (edge features) by utilizing the pre-trained edge detection network, and performs feature matching by utilizing the traditional template matching method, thereby realizing accurate and efficient registration under various modal images.

3. The method obtains a result with higher matching accuracy and better robustness in the multi-source image matching. Experiments show that compared with the classical traditional method or the deep learning method, the method provided by the invention has higher registration precision and higher registration speed, and provides a good basis for high-precision robust matching of the multi-source remote sensing image.

Drawings

FIG. 1 is a schematic diagram of a method according to embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of an edge detection result in embodiment 1 of the present invention;

FIG. 3 is a schematic diagram of a pixel-by-pixel matching process according to embodiment 1 of the present invention;

FIGS. 4 (a) to 4 (d) are multi-modal image samples in the MMD data set constructed in embodiment 1 of the present invention; FIG. 4 (a) GF (23 ° 30'34 ' N,113 ' 09' E), LDS8 (23 ° 33'23 ' N,113 '06 '46 ' E); FIG. 4 (b) GF (23 '30' N,113 '23' E), LDS8 (23 '31' 50 'N, 113' 25 '02' E); FIG. 4 (c) GF (22 ° 16 '07' N,112 ° 36 '46' E), LDS8 (22 ° 18 '05' N,112 ° 36 '06' E); FIG. 4 (d) GF (22 '46 '15 ' N,114 ' 18' E), LDS8 (22 ' 48'54 ' N,114 ' 19'31 ' E); wherein GF denotes Gaofen satellite data, LDS8 denotes Landsat 8 satellite data, and the values in parentheses are latitude and longitude;

FIG. 5 (a) is the result on a LiDAR depth-optical image pair with the method of embodiment 1 of the present invention; FIG. 5 (b) is the result on a LiDAR depth-optical image pair with the DFM method; FIG. 5 (c) is the result on a visible-infrared image pair with the method of embodiment 1 of the present invention; FIG. 5 (d) is the result on a visible-SAR image pair with the method of embodiment 1 of the present invention; FIG. 5 (e) is the result on an infrared-optical image pair with the method of embodiment 1 of the present invention; FIG. 5 (f) is the result on an infrared-optical image pair with the SURF method; FIG. 5 (g) is the result on an infrared-optical image pair with the ORB method; FIG. 5 (h) is the result on an infrared-optical image pair with the SIFT method.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In describing embodiments of the present invention, the terms "first," "second," and the like are not intended to imply any order, quantity, or importance, but are used only to distinguish one element from another. As used herein, the terms "a," "an," and other similar terms do not mean that there is only one of the referenced item; rather, the description refers to one such item, of which there may be one or more. In the description of the embodiments of the present invention, the terms "comprise," "include," and other similar terms are intended to represent logical interrelationships, and are not to be construed as representing spatial structural relationships. For example, "A includes B" is intended to mean that logically B belongs to A, and not that spatially B is located inside A. Furthermore, the terms "comprising," "including," and other similar words are to be construed as open-ended rather than closed-ended. For example, "A includes B" is intended to mean that B belongs to A, but B does not necessarily constitute all of A, and A may also include other elements such as C, D, E, and the like.

Example 1

The principle of the method of embodiment 1 of the present invention is shown in FIG. 1. This embodiment is based on the observation that, although a modality gap exists between multi-modal images, their edges are generally consistent (FIG. 2). Furthermore, after edge extraction, the coarse-scale features typically contain only abstract information, while the fine-scale edge features provide finer-grained edge details. Inspired by this observation, this embodiment proposes using edge features as modality-robust descriptors for image matching and performing progressive matching from coarse to fine.

Specifically, a pre-trained edge detection network, namely RCF (Liu Y, Cheng M M, Hu X, et al. Richer convolutional features for edge detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3000-3009), is used as a descriptor network to extract edge features of the reference image and the image to be registered at three different scales. In this embodiment, two pre-trained edge detection networks are used, one for extracting the edge features of the reference image and the other for extracting the edge features of the image to be registered. In the fine matching stage, the edge features of the three scales (i.e., $F_{ref\_1}$ to $F_{ref\_3}$ and $F_{sen\_1}$ to $F_{sen\_3}$) are concatenated into local feature descriptors, which retain both detail information and overall structural information. After extraction, the edge features are fed to the subsequent coarse matching stage and feature matching (fine matching) stage. The pre-trained network used in this embodiment is a convolutional neural network (CNN), and the convolutional features in a CNN become gradually coarser as the receptive field increases (Liu Y, Cheng M M, Hu X, et al. Richer convolutional features for edge detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3000-3009).
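The concatenation of multi-scale edge features into a per-pixel descriptor can be sketched as below. This is an illustration under stated assumptions: the three edge maps are taken to have resolutions that divide the finest one exactly, and nearest-neighbor upsampling with `np.repeat` stands in for whatever resizing the edge detection network applies; `concat_multiscale` is a hypothetical helper name.

```python
import numpy as np

def concat_multiscale(edge_maps):
    """Stack edge maps from several scales into one (H, W, C) descriptor map.

    edge_maps: list of 2D arrays ordered from fine to coarse. Each coarser map
    is assumed to have a resolution that divides the finest one exactly.
    """
    H, W = edge_maps[0].shape
    channels = []
    for m in edge_maps:
        fy, fx = H // m.shape[0], W // m.shape[1]
        # Nearest-neighbor upsampling to the finest resolution (an assumption).
        up = np.repeat(np.repeat(m, fy, axis=0), fx, axis=1)
        channels.append(up)
    return np.stack(channels, axis=-1).astype(float)

# Toy edge maps at three scales.
fine = np.arange(64).reshape(8, 8)
mid = np.arange(16).reshape(4, 4)
coarse = np.arange(4).reshape(2, 2)
feat = concat_multiscale([fine, mid, coarse])
print(feat.shape)
```

Each pixel of `feat` then carries one value per scale, so a matching window compares detail and overall structure at the same time.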

FIG. 2 depicts edge features extracted from two images of different modalities. As can be seen from the first column of FIG. 2, the multi-modal images differ greatly in appearance, yet they share consistent edges at different scales (see the second and third columns of FIG. 2).

Given the coarsest-scale edge features ($F_{ref\_3}$) extracted from the reference image, where the coarsest-scale edge features are the edge features of the largest scale, and the largest-scale edge features ($F_{sen\_3}$) extracted from the image to be registered, these two sets of largest-scale edge features are used for coarse registration. Template matching is performed between the largest-scale edge features of the reference image and those of the image to be registered to calculate the offset $(\delta_x, \delta_y)$ between the reference image and the image to be registered:

Compared with global matching, the method has stronger robustness when larger displacement deviation exists between the multi-modal remote sensing images. The superiority of the matching accuracy and the calculation time in the embodiment of the present invention will be discussed later.

After the coarse matching stage, the edge features extracted at both coarse and fine scales, i.e., all the edge features extracted previously, are used for finer-grained feature matching. That is, the edge features of different scales corresponding to the reference image and to the remote sensing image are concatenated, respectively, to obtain the reference image feature map and the remote sensing image feature map. First, the search area of the remote sensing image feature map is determined using the calculated offset $(\delta_x, \delta_y)$. Then, corner points on the reference image feature map are obtained by the Harris corner detection method (Harris, Chris, Mike Stephens, et al. 1988. "A combined corner and edge detector." In Alvey Vision Conference 15 (50): 10-5244), and unstable corner points are filtered out by non-maximum suppression (Neubeck, A., and L. Van Gool. 2006. "Efficient Non-Maximum Suppression." In 18th International Conference on Pattern Recognition (ICPR'06), vol. 3, 850-855.). Based on the local consistency of features, for each stable feature point $P(x_p, y_p)$, a block $W_p$ of size $r \times r$ centered on $P$ is extracted (i.e., a matching window; see the solid-line and dashed-line frames in FIG. 3), and the similarity between $W_p$ and the candidate corner points on the target image (i.e., the search area of the feature map of the image to be registered) is calculated. In other words, a matching window centered on each corner point of the reference image is determined on the reference image feature map; each matching window is used as a template and slid within the search area to find the point on the feature map of the image to be registered that has the highest similarity to the corner point in that window; the point with the highest similarity is the point registered with the corner point of the corresponding matching window.
To accelerate the whole calculation, this embodiment converts the pixel-wise feature representation maps (i.e., the reference image feature map and the feature map of the image to be registered) into the frequency domain using the three-dimensional fast Fourier transform (FFT) (De Castro, E., and C. Morandi. 1987. "Registration of Translated and Rotated Images Using Finite Fourier Transforms." IEEE Transactions on Pattern Analysis and Machine Intelligence (5): 700-703.), and performs the pixel-by-pixel matching described above to obtain a similarity map; the position of the maximum of the similarity map is the matching position in the image to be registered.
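The corner selection step described above can be sketched with NumPy alone. This is a simplified illustration: the Harris response uses a box window instead of a Gaussian, and the suppression is a plain local-maximum test rather than the efficient algorithm of Neubeck and Van Gool. The parameters `k`, `win`, `nms_radius` and `rel_thresh` are hypothetical values chosen for the toy image.

```python
import numpy as np

def box_sum(a, radius):
    """Sum of `a` over a (2*radius+1)^2 neighborhood, zero-padded at the border."""
    h, w = a.shape
    p = np.pad(a, radius, mode="constant")
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out

def local_max(a, radius):
    """Maximum of `a` over a (2*radius+1)^2 neighborhood."""
    h, w = a.shape
    p = np.pad(a, radius, mode="constant", constant_values=-np.inf)
    out = np.full((h, w), -np.inf)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out = np.maximum(out, p[dy:dy + h, dx:dx + w])
    return out

def harris_corners(img, k=0.04, win=2, nms_radius=3, rel_thresh=0.1):
    """Return (x, y) corners: Harris response followed by simple non-maximum suppression."""
    iy, ix = np.gradient(img.astype(float))
    sxx = box_sum(ix * ix, win)
    syy = box_sum(iy * iy, win)
    sxy = box_sum(ix * iy, win)
    # Harris response: det(M) - k * trace(M)^2 of the structure tensor M.
    resp = sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2
    # Keep pixels that are the maximum of their neighborhood and strong enough.
    keep = (resp == local_max(resp, nms_radius)) & (resp > rel_thresh * resp.max())
    ys, xs = np.nonzero(keep)
    return list(zip(xs.tolist(), ys.tolist()))

img = np.zeros((30, 30))
img[10:20, 10:20] = 1.0           # a bright square with four corners
corners = harris_corners(img)
print(corners)
```

On the toy square the detections cluster at the four corners, while points along the straight edges receive a negative response and are discarded.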

As shown in FIGS. 4 (a) to 4 (d), this embodiment introduces a multi-modal remote sensing image matching dataset, the MMD dataset. The MMD dataset comprises 40 pairs of multi-modal satellite remote sensing images acquired from the Landsat 8 satellite (infrared data) and the GF1-WFV satellite sensor (optical data). The data are divided into four sub-datasets according to season, corresponding to spring, summer, autumn and winter. For each image pair, the GF1-WFV image is 801 × 801 pixels and the Landsat 8 image is 512 × 512 pixels. Using key points with the same geographic coordinates, a transformation matrix can be computed and used as the label of the image pair in the image matching task.
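How a transformation matrix can be computed from key points with the same geographic coordinates is sketched below. The text does not state the transformation model, so an affine model fitted by least squares with `np.linalg.lstsq` is assumed; `fit_affine` and the sample matrix are hypothetical.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares 3x3 affine matrix T mapping [x, y, 1] to [x', y', 1]."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])       # (N, 3) design matrix
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)   # (3, 2) solution
    T = np.eye(3)
    T[:2, :] = params.T
    return T

# Hypothetical ground truth: a scale change plus a translation.
T_true = np.array([[0.64, 0.0, 12.0],
                   [0.0, 0.64, -7.0],
                   [0.0, 0.0, 1.0]])
src = np.array([(0, 0), (800, 0), (0, 800), (800, 800), (400, 300)], dtype=float)
dst = (np.hstack([src, np.ones((len(src), 1))]) @ T_true.T)[:, :2]
T_fit = fit_affine(src, dst)
print(np.round(T_fit, 3))
```

At least three non-collinear point pairs are needed for the affine fit; additional pairs are averaged in the least-squares sense.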

Ablation experiments are performed below to demonstrate the effectiveness of the method of this embodiment. The method of this embodiment is then compared with prior-art methods in terms of matching performance, computational complexity, and inference time. Finally, the performance of the method of this embodiment is compared across different modalities, including Visible-SAR, Visible-Infrared, and Optical-LiDAR. The number of correct matches (CMN), the matching accuracy (MA) and the running time (Time) are used as performance metrics. All experiments were performed on a PC equipped with an AMD Ryzen 7 5800H CPU and an NVIDIA GeForce RTX 3060 Laptop GPU.
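The two accuracy metrics can be illustrated with a short sketch. The text does not define how a match is judged correct or how MA is computed, so this example assumes that a match is correct when it lies within a pixel tolerance of the ground-truth projection, and that MA is CMN divided by the total number of matches; `evaluate_matches` and `tol` are hypothetical.

```python
import numpy as np

def evaluate_matches(ref_pts, sen_pts, T_gt, tol=3.0):
    """Return (CMN, MA) for matched points given a ground-truth 3x3 transform.

    A match counts as correct when the ground-truth projection of the reference
    point lies within `tol` pixels of the matched point (`tol` is hypothetical).
    MA is taken here as CMN divided by the number of matches (an assumption).
    """
    ref = np.asarray(ref_pts, dtype=float)
    sen = np.asarray(sen_pts, dtype=float)
    if len(ref) == 0:
        return 0, 0.0
    homog = np.hstack([ref, np.ones((len(ref), 1))]) @ T_gt.T
    proj = homog[:, :2] / homog[:, 2:3]
    err = np.linalg.norm(proj - sen, axis=1)
    cmn = int(np.sum(err <= tol))
    return cmn, cmn / len(ref)

# Toy ground truth: a pure translation by (5, -3); the last match is wrong.
T_gt = np.array([[1.0, 0.0, 5.0],
                 [0.0, 1.0, -3.0],
                 [0.0, 0.0, 1.0]])
ref_pts = [(10, 10), (20, 20), (30, 30), (40, 40)]
sen_pts = [(15, 7), (25, 18), (35, 27), (60, 60)]
cmn, ma = evaluate_matches(ref_pts, sen_pts, T_gt)
print(cmn, ma)
```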

This embodiment first studies the effectiveness of the coarse-to-fine matching strategy. Specifically, the scheme is verified with a variant from which the coarse-to-fine strategy is deleted (model 2): the coarse matching stage is removed and the edge features are fed directly into the feature matching (fine matching) stage. As shown in Table 1, without the coarse-to-fine strategy, well-matched points cannot be obtained. Because of the large offset between the two images, it is difficult to obtain an accurate matching result directly in the feature matching stage. With the coarse-to-fine strategy of embodiment 1 of the present invention, the influence of a large offset is overcome in the coarse matching stage, so that a better matching result is obtained.

Table 1 lists the ablation results achieved by the different model variants. C2F denotes the coarse-to-fine matching strategy of the embodiment of the invention, FFT denotes the three-dimensional fast Fourier transform, MSF denotes the multi-scale feature descriptors, and Features denotes the different edge descriptors in the MSF. As can be seen from Table 1, the registration result obtained by the method of embodiment 1 of the present invention is the best on all three metrics: number of correct matches (CMN), matching accuracy (MA), and running time (Time).

TABLE 1 matching results under different methods

In this embodiment, a 3D FFT (three-dimensional fast Fourier transform) is used for acceleration in the feature matching stage. To verify its effectiveness, a variant of the method of this embodiment (model 3) is introduced by removing the 3D FFT. As shown in Table 1, the baseline of this embodiment (model 1) benefits from the 3D FFT, and its image registration speed is more than 200 times that of model 3. This demonstrates the effectiveness of the 3D FFT in this embodiment.

In this embodiment, multi-scale edge features are used in the feature matching stage. To demonstrate the importance of multi-scale features for matching, a variant of the method of this embodiment (model 4) is introduced that uses edge features at a single scale only. As can be seen from Table 1, using features of only one scale in the feature matching stage significantly degrades the performance of model 4. In contrast, the multi-scale feature descriptor helps the baseline of this embodiment (model 1) achieve better performance.

To further investigate the effectiveness of the edge features in the method of this embodiment, two variants (models 5 and 6) are introduced by replacing the extracted features with CHOG features (Chandrasekhar, V., G. Takacs, D. M. Chen, S. S. Tsai, Y. Reznik, R. Grzeszczuk, and B. Girod. 2012. "Compressed histogram of gradients: A low-bitrate descriptor." International Journal of Computer Vision 96 (3): 384-399.) and VGG features (Simonyan, Karen, and Andrew Zisserman. 2014. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556), respectively. The CHOG feature is a widely used handcrafted feature, while the VGG feature is a widely used CNN-based feature. As can be seen from Table 1 above, models 5 and 6 have inferior performance and longer processing time compared to the model of this embodiment (model 1). This demonstrates that the multi-scale edge features of this embodiment are simple and effective, and contribute to efficient, high-accuracy registration.

The method of this embodiment is experimentally compared with existing image matching methods. The compared methods include SIFT (Lowe, David G. 2004. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60 (2): 91-110.), SURF (Neubeck, A., and L. Van Gool. 2006. "Efficient Non-Maximum Suppression." In 18th International Conference on Pattern Recognition (ICPR'06), vol. 3, 850-855.), ORB (Rublee, Ethan, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. "ORB: An efficient alternative to SIFT or SURF." In 2011 International Conference on Computer Vision, 2564-2571.), RIFT (Li, Jiayuan, Qingwu Hu, and Mingyao Ai. 2019. "RIFT: Multi-modal image matching based on radiation-variation insensitive feature transform." IEEE Transactions on Image Processing 29: 3296-3310.), DFM (Efe, Ufuk, Kutalmis Gokalp Ince, and Aydin Alatan. 2021. "DFM: A performance baseline for deep feature matching." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4284-4293.), Patch2Pix (Zhou, Qunjie, Torsten Sattler, and Laura Leal-Taixe. 2021. "Patch2Pix: Epipolar-guided pixel-level correspondences." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4669-4678.), and SuperGlue (Sarlin, Paul-Edouard, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. 2020. "SuperGlue: Learning feature matching with graph neural networks." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4938-4947.). Among them, RIFT, SURF, SIFT, and ORB are traditional image registration methods, while DFM, Patch2Pix, and SuperGlue are learning-based methods.

The results of the quantitative analysis are given in Table 2. As can be seen from Table 2, in most cases the method of the present embodiment achieves the best matching accuracy in terms of the MA index. Although traditional methods such as SIFT, SURF, and ORB are efficient, their handcrafted descriptors limit their performance. The RIFT method yields many correct matches and good results but takes too long. In contrast, in most cases the learning-based descriptors help the DFM, Patch2Pixel, and SuperGlue methods achieve higher registration accuracy. However, learning-based methods generalize poorly. For example, SuperGlue has an MA index score of 0.504 on the sub-dataset corresponding to spring, while its MA index score on the sub-dataset corresponding to autumn is only 0.096. By integrating the CNN descriptor into the traditional framework, the method of the present embodiment achieves higher matching accuracy while maintaining excellent generalization performance across different scenes.

Table 2 Registration results of the method of the present embodiment and existing methods

FIGS. 5(a)-5(h) show several registration results produced by the method of this embodiment. The method of the present embodiment achieves multi-modal image registration well and overcomes large offsets to produce more accurate matching results.

Generalization performance is crucial to the usefulness of an image registration method in practical applications. To further verify the generalization ability of the method of this embodiment to out-of-distribution modalities, three additional sets of images with different modalities were collected, as shown in Table 3. As can be seen from Table 3, the method of this embodiment is clearly superior to the other methods on these modalities. Although the method of this embodiment is trained only on visible-infrared scenes, it achieves an MA score above 0.8 on the visible-SAR and lidar depth-visible modalities. This demonstrates the superior generalization performance of the method of this embodiment.

TABLE 3 three sets of multimodal image datasets for generalization experiments

In Table 3, GSD denotes the ground sample distance.

This embodiment provides a hybrid multi-modal remote sensing image matching method that integrates a CNN-based descriptor into a traditional template matching framework. Furthermore, this embodiment introduces a new dataset for multi-modal remote sensing image matching, namely MMD. The results show that the method of this embodiment benefits from the CNN-based descriptor and achieves high image matching accuracy while maintaining excellent generalization performance. Extensive experiments show that the method of this embodiment is superior to the traditional methods and learning-based methods of the prior art in both accuracy and registration efficiency.
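The template matching step summarized above can be illustrated with a minimal sketch. It assumes the reference and remote sensing feature maps are NumPy arrays of shape (H, W, C), for example edge features of several scales stacked along the last axis, and it uses the sum of squared differences (SSD) as the similarity measure. The function name `match_corner`, the window half-size `half_win`, the search radius `search`, and the choice of SSD are illustrative assumptions, not details taken from this embodiment.

```python
import numpy as np

def match_corner(f_ref, f_sen, corner, half_win=4, search=6):
    """Return the (row, col) on f_sen most similar to `corner` on f_ref.

    f_ref, f_sen: feature maps of shape (H, W, C).
    corner: (row, col) of a corner point on the reference feature map,
            assumed to lie at least half_win pixels from the border.
    half_win, search: hypothetical window half-size and search radius.
    """
    r, c = corner
    h, w = f_sen.shape[:2]
    # Matching window centred on the corner point (used as the template).
    template = f_ref[r - half_win:r + half_win + 1,
                     c - half_win:c + half_win + 1]
    best_ssd, best_pos = np.inf, None
    # Slide the template over the search area, pixel by pixel.
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            # Skip candidate windows that fall outside the image.
            if rr - half_win < 0 or cc - half_win < 0:
                continue
            if rr + half_win >= h or cc + half_win >= w:
                continue
            cand = f_sen[rr - half_win:rr + half_win + 1,
                         cc - half_win:cc + half_win + 1]
            # Assumed similarity measure: lower SSD means more similar.
            ssd = float(np.sum((cand - template) ** 2))
            if ssd < best_ssd:
                best_ssd, best_pos = ssd, (rr, cc)
    return best_pos
```

The double loop is written for readability. A frequency-domain formulation based on the fast Fourier transform computes the same kind of similarity surface far more efficiently for large search areas.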

Example 2

Embodiment 2 of the present invention provides a multi-modal remote sensing image registration system corresponding to embodiment 1, including:

the offset calculation module is used for respectively selecting the edge feature with the maximum scale of the reference image and the edge feature with the maximum scale of the remote sensing image and calculating the offset between the selected edge features;
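As an illustration of what an offset calculation between two largest-scale edge feature maps can look like, the sketch below estimates a global translation by circular cross-correlation computed with a 2-D FFT. This is a substituted technique chosen for illustration, not the offset formula of the embodiment, which is not reproduced in this section. The function name `estimate_offset` and the single-channel (H, W) input shape are assumptions.

```python
import numpy as np

def estimate_offset(e_ref, e_sen):
    """Estimate (dx, dy) such that e_sen[y + dy, x + dx] ~ e_ref[y, x].

    e_ref, e_sen: single-channel edge maps of identical shape (H, W).
    Uses circular cross-correlation, so shifts are taken modulo the size.
    """
    h, w = e_ref.shape
    # Remove the mean so the correlation peak is not dominated by the DC term.
    a = e_ref - e_ref.mean()
    b = e_sen - e_sen.mean()
    # Cross-correlation theorem: corr[s] = sum_p a[p] * b[p + s].
    corr = np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real
    iy, ix = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak indices in [0, N) to signed shifts in (-N/2, N/2].
    dy = int(iy) - h if iy > h // 2 else int(iy)
    dx = int(ix) - w if ix > w // 2 else int(ix)
    return dx, dy
```

Under these assumptions, the returned pair plays the role of the offset (δ_x, δ_y): a pixel at (x, y) on the reference edge map corresponds to the pixel at (x + δ_x, y + δ_y) on the remote sensing edge map.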

Example 3

Embodiment 3 of the present invention provides a terminal device corresponding to embodiment 1, where the terminal device may be a processing device for a client, such as a mobile phone, a notebook computer, a tablet computer, a desktop computer, and the like, so as to execute the method of the above embodiment.

The terminal device of the embodiment comprises a memory, a processor and a computer program stored on the memory; the processor executes the computer program on the memory to implement the steps of the method of embodiment 1 described above.

In some implementations, the memory may be a high-speed random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory.

In other implementations, the processor may be various general-purpose processors such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), and the like, and is not limited herein.

Example 4

Embodiment 4 of the present invention provides a computer-readable storage medium corresponding to the above embodiments, on which a computer program/instruction is stored. The computer program/instructions, when executed by the processor, implement the steps of the method of embodiment 1 described above.

The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any combination of the foregoing.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the application can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A multi-modal remote sensing image registration method is characterized by comprising the following steps:

S2, respectively selecting the edge feature with the maximum scale of the reference image and the edge feature with the maximum scale of the remote sensing image, and calculating the offset between the selected edge features;

S3, respectively concatenating the edge features of the reference image at different scales and the edge features of the remote sensing image at different scales to obtain a reference image feature map and a remote sensing image feature map;

2. The multi-modal remote sensing image registration method according to claim 1, wherein in step S1, pre-trained edge detection networks are used to extract edge features on different scales of the reference image and the remote sensing image.

3. The method of multi-modal remote sensing image registration according to claim 2, wherein the pre-trained edge detection network employs a convolutional neural network.

4. The multi-modal remote sensing image registration method according to any one of claims 1 to 3, wherein in step S1, edge features on three scales of the reference image and the remote sensing image are extracted respectively.

5. The multi-modal remote sensing image registration method according to claim 1, wherein in step S2, the calculation formula of the offset (δ_x, δ_y) is:

wherein (x, y) are the position coordinates of a pixel on the reference image feature map F_{ref_3}, and (x + δ_x, y + δ_y) are the position coordinates of a pixel on the remote sensing image feature map F_{sen_3}.

6. The multi-modal remote sensing image registration method according to any one of claims 1 to 3 and 5, wherein in step S5, the specific implementation process of registering the remote sensing image by the pixel-by-pixel matching method comprises: determining, on the reference image feature map, a matching window centered on each corner point of the reference image; sliding each matching window as a template within the search area; and searching the remote sensing image feature map for the point with the highest similarity to the corner point in each matching window, wherein the point with the highest similarity is the point registered with the corner point in the corresponding matching window.

7. The method of claim 6, wherein before the matching window is determined, the reference image feature map and the remote sensing image feature map are converted into frequency-domain feature maps by a three-dimensional fast Fourier transform, and the remote sensing image is then registered by the pixel-by-pixel matching method based on the frequency-domain feature maps.

8. A multi-modality remote sensing image registration system, comprising:

and a registration module, configured to select a plurality of corner points on the search areas of the reference image feature map and the remote sensing image feature map respectively, and to register the remote sensing image by a pixel-by-pixel matching method.

9. A terminal device comprising a memory, a processor and a computer program stored on the memory; characterized in that said processor executes said computer program to implement the steps of the method according to one of claims 1 to 7.

10. A computer readable storage medium having stored thereon a computer program/instructions; characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method according to one of claims 1 to 7.