CN115546268A - Multi-mode remote sensing image registration method, system, terminal device and storage medium - Google Patents

Multi-mode remote sensing image registration method, system, terminal device and storage medium

Info

Publication number
CN115546268A
Authority
CN
China
Prior art keywords
remote sensing
sensing image
image
matching
registration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211162936.5A
Other languages
Chinese (zh)
Inventor
汪璞
安玮
石添鑫
邓新蒲
盛卫东
林再平
曾瑶源
李振
李骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202211162936.5A
Publication of CN115546268A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/88 Image or video recognition using optical means, e.g. reference filters, holographic masks, frequency domain filters or spatial domain filters
    • G06V10/89 Image or video recognition using optical means using frequency domain filters, e.g. Fourier masks implemented on spatial light modulators

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-modal remote sensing image registration method, system, terminal device and storage medium. Based on the structural similarity between multi-modal remote sensing image pairs, a pre-trained edge detection network is used to extract local feature descriptors (edge features), and a traditional template matching method is used for feature matching, thereby achieving accurate and efficient registration across images of various modalities. The method achieves higher matching accuracy and better robustness in multi-source image matching.

Description

Multi-mode remote sensing image registration method, system, terminal device and storage medium
Technical Field
The invention relates to the technical field of image registration, and in particular to a multi-modal remote sensing image registration method, system, terminal device and storage medium.
Background
With the rapid development of remote sensing technology, earth observation images from various sensors, such as visible light and synthetic aperture radar, are increasingly abundant. Owing to factors such as image size, cloud occlusion and imaging quality, it is often difficult to obtain global information about a target area from a single type of image in a complex environment, whereas the multi-modal image information acquired by different platforms and sensors is complementary. Multi-modal image registration enables the joint use of multiple sources of information, yielding greater data volume and shorter revisit times. Multi-modal remote sensing image registration has therefore attracted wide attention in recent years. It is the process of identifying corresponding points in two or more images obtained from different sensors, from different viewing angles or at different times; it is a low-level task and a preprocessing step for many remote sensing image analysis tasks such as image super-resolution and image fusion. However, because of the large nonlinear radiation distortion and geometric deformation between multi-modal images, high-precision registration between them remains a challenging problem.
Similar to conventional image matching, multi-modal image registration is implemented by conventional methods and by deep learning methods. Conventional methods can be broadly divided into feature-based and region-based methods, where the central challenge is how to design and use an appropriate similarity measure to drive the iterative process to accurately estimate the geometric transformation. One straightforward solution is to adopt or modify common information-theoretic metrics; another is to use similarity measures indirectly in a simplified, unified domain, e.g., the fast Fourier transform (FFT), structural information extraction, or mapping image intensities to a high-dimensional space using descriptors. In the prior art, based on the structural attributes of images, a phase congruency descriptor with illumination and contrast invariance has been proposed, the model has been extended to a new image registration method, and the Euclidean distance between multi-scale phase congruency (MS-PC) descriptors is used as the similarity measure to establish correspondences. Experimental results show that MS-PC is strongly robust to radiation differences between images and outperforms two common methods (SIFT and SAR-SIFT) in quantitative precision and number of tie points. However, using a similarity measure to guide the optimization can be computationally burdensome because of the high resolution of remote sensing images and the strong noise caused by the atmosphere. In addition, remote sensing images usually exhibit significant geometric variations, such as large rotation, scaling, deformation and small overlap areas, so the solution space is complex and difficult to optimize.
Feature-based methods attempt to estimate the geometric transformation between images by identifying matching features, which may be points, lines, surfaces, etc., but these features must be salient and stable, e.g., Harris corners and the Scale-Invariant Feature Transform (SIFT). Conventional SIFT algorithms, however, are problematic under radiometric differences, both in the distribution of the extracted features and in the sensitivity of the descriptors to saliency, especially in multi-source remote sensing imaging. Considering that the structural similarity between images is well preserved and can be used for registering images of different modalities, the prior art proposes a fast, robust multi-modal matching framework. Specifically, a dense description image is first generated based on existing local descriptors, such as HOG and LSS. A frequency-domain similarity measure is then determined using the 3D FFT and oriented gradients, and feature points are extracted from Harris corners for a template matching scheme. The final performance on whole large-size image pairs is verified under a piecewise linear transformation model, with an iterative mismatch-removal process and local affine estimation in cubic polynomial model estimation and consistency checking. Experimental results show that the matching performance of this framework is superior to existing matching methods. However, the method is not ideal when a large geometric offset exists between the two images.
With the development of artificial intelligence, deep convolutional neural networks have been introduced into the field of image registration as advanced feature extractors. By exploiting the nonlinear operations and hierarchical structure of a CNN, image information is learned progressively from low level to high level, complex high-level image features are obtained, and matching is performed on high-level features carrying more abstract semantic information. The prior art proposes a multi-temporal remote sensing image registration method based on CNN features, which improves matching performance by learning multi-scale feature descriptors and progressively refining the selection of initial matches. The multi-scale feature descriptors are generated by a pre-trained VGG network, and a TPS model is integrated to account for non-rigid transformations under a GMM and EM framework. However, deep learning methods for multi-modal remote sensing image registration are not as rich as in the general computer vision field, mainly because, on the one hand, no matched multi-modal remote sensing image dataset has yet been published for training and testing and, on the other hand, remote sensing images have high resolution, mixed noise and large geometric variations, which make designing an effective deep registration framework difficult; hence the prior art struggles to achieve accurate and efficient image registration.
Disclosure of Invention
The technical problem to be solved by the invention is, in view of the defects of the prior art, to provide a multi-modal remote sensing image registration method, system, terminal device and storage medium that achieve accurate and efficient image registration.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: a multi-modal remote sensing image registration method comprises the following steps:
s1, extracting edge features of a reference image and a remote sensing image at different scales;
s2, respectively selecting the largest-scale edge feature of the reference image and the largest-scale edge feature of the remote sensing image, and calculating the offset between the selected edge features;
s3, respectively concatenating the edge features of the different scales for the reference image and the remote sensing image to obtain a reference image feature map and a remote sensing image feature map;
s4, determining a search area of the remote sensing image feature map by using the offset;
and S5, selecting a plurality of corner points on the reference image feature map and within the search area of the remote sensing image feature map respectively, and registering the remote sensing image by a pixel-by-pixel matching method.
The invention calculates the offset between the reference image and the remote sensing image based on the largest-scale edge features, selects the search area of the remote sensing image feature map based on this offset, and performs image registration within the search area in combination with corner detection, which greatly reduces the registration computation and improves registration efficiency. Meanwhile, the method involves no complex deep registration framework and is simple to implement; experiments show that it achieves higher matching accuracy and better robustness in multi-source image matching.
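For orientation, a compact, self-contained sketch of steps S1 to S5 follows. It is an illustration under stated assumptions rather than the patented implementation: Sobel gradient magnitudes on a three-level image pyramid stand in for the pre-trained edge detection network described below, OpenCV's normalized cross-correlation stands in for the similarity measure, the two inputs are assumed to be equally sized single-channel images, and all function names and parameter values are ours.

```python
import cv2
import numpy as np

def edge_pyramid(img, levels=3):
    """S1 stand-in: Sobel edge magnitude at `levels` pyramid scales (fine to coarse)."""
    maps, cur = [], np.float32(img)
    for _ in range(levels):
        gx = cv2.Sobel(cur, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(cur, cv2.CV_32F, 0, 1)
        maps.append(cv2.magnitude(gx, gy))
        cur = cv2.pyrDown(cur)
    return maps

def register(ref, sen, win=16, search=24):
    h, w = ref.shape
    ref_e, sen_e = edge_pyramid(ref), edge_pyramid(sen)

    # S2: coarse offset -- match a central crop of the reference's coarsest
    # edge map against the sensed image's coarsest edge map.
    rc, sc = ref_e[-1], sen_e[-1]
    ty, tx = rc.shape[0] // 4, rc.shape[1] // 4
    tpl = rc[ty:ty + rc.shape[0] // 2, tx:tx + rc.shape[1] // 2]
    _, _, _, (bx, by) = cv2.minMaxLoc(cv2.matchTemplate(sc, tpl, cv2.TM_CCOEFF_NORMED))
    dx, dy = (bx - tx) * 4, (by - ty) * 4   # coarsest maps are at 1/4 resolution

    # S3: stack all scales (resized to full size) into per-pixel descriptors.
    f_ref = np.dstack([cv2.resize(m, (w, h)) for m in ref_e])
    f_sen = np.dstack([cv2.resize(m, (w, h)) for m in sen_e])

    # S4 + S5: for each Harris corner of the reference image, match its
    # window inside a search area of f_sen displaced by (dx, dy).
    pts = cv2.goodFeaturesToTrack(np.float32(ref), maxCorners=100,
                                  qualityLevel=0.01, minDistance=10,
                                  useHarrisDetector=True)
    matches = []
    for x, y in ([] if pts is None else pts.reshape(-1, 2).astype(int)):
        t = f_ref[y - win:y + win, x - win:x + win]
        xs, ys = x + dx, y + dy
        area = f_sen[ys - search:ys + search, xs - search:xs + search]
        if t.shape[:2] != (2 * win, 2 * win) or area.shape[:2] != (2 * search, 2 * search):
            continue  # window or search area falls outside the image
        r = cv2.matchTemplate(area, t, cv2.TM_CCOEFF_NORMED)
        _, _, _, (mx, my) = cv2.minMaxLoc(r)
        matches.append(((x, y), (xs - search + mx + win, ys - search + my + win)))
    return matches
```

The coarse offset recovered in S2 confines each corner's search in S5 to a small area, which is where the computational savings described above come from.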
In the invention, in step S1, the edge features of the reference image and the remote sensing image at different scales are extracted with a pre-trained edge detection network. The invention thus needs no specially designed, complex deep registration framework, which further simplifies the image registration process and improves registration efficiency. Meanwhile, the output of the edge detection network is first coarsely matched (to determine the offset) and then finely matched, which further improves the registration accuracy.
In the invention, the pre-trained edge detection network adopts a convolutional neural network.
In the invention, to ensure registration accuracy while maintaining registration efficiency, in step S1 edge features at three scales are extracted from the reference image and the remote sensing image respectively.
In step S2 of the present invention, the offset (δx, δy) is computed as:

$$(\delta_x,\delta_y)=\underset{(\delta_x,\delta_y)}{\arg\max}\;\sum_{(x,y)}F_{ref\_3}(x,y)\,F_{sen\_3}(x+\delta_x,\;y+\delta_y)$$

wherein (x, y) are the position coordinates of a pixel in the reference image feature map F_ref_3, and (x + δx, y + δy) are the position coordinates of the corresponding pixel in the remote sensing image feature map F_sen_3.
The offset calculation is simple, and coarse registration of the reference image and the remote sensing image (i.e., the image to be registered) is easy to achieve.
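To make the search explicit, a brute-force NumPy sketch of the offset computation follows; it is a direct, unoptimized transcription of the formula above, and the function name and the shift bound are ours.

```python
import numpy as np

def coarse_offset(f_ref3: np.ndarray, f_sen3: np.ndarray, max_shift: int = 32):
    """Brute-force search for the (dx, dy) maximizing the correlation between
    the two largest-scale edge maps, as in the formula above. Assumes the two
    maps have the same shape; raw inner products serve as the similarity
    (a normalized measure such as NCC would avoid favoring large overlaps)."""
    h, w = f_ref3.shape
    best, best_dxy = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # overlapping region of the two maps under the shift (dx, dy)
            ref = f_ref3[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
            sen = f_sen3[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            score = float(np.sum(ref * sen))
            if score > best:
                best, best_dxy = score, (dx, dy)
    return best_dxy
```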
In step S5, the pixel-by-pixel matching for registering the remote sensing image is implemented as follows: a matching window centered on each corner point of the reference image is determined on the reference image feature map; each matching window is slid as a template within the search area; and the point on the remote sensing image feature map with the highest similarity to the corner point of each matching window is sought, the point with the highest similarity being the point registered to the corner point of the corresponding matching window.
The invention achieves image registration by a pixel-by-pixel template matching method, and the fine registration is performed within the search area, so registration accuracy is ensured while registration efficiency is maintained, realizing efficient, high-precision image registration.
In the invention, to further improve registration efficiency and accelerate the whole computation, before the matching windows are determined, the reference image feature map and the remote sensing image feature map are converted into frequency-domain feature maps by a three-dimensional fast Fourier transform; the pixel-by-pixel matching for registering the remote sensing image is then performed on the frequency-domain feature maps.
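A minimal NumPy sketch of this frequency-domain acceleration follows (the names are ours; the patent publishes no code). Treating the concatenated feature map as an (H, W, C) stack, a single 3D FFT evaluates the window correlation at every spatial shift at once: by the correlation theorem, the channel-shift-zero slice of the circular 3D correlation is exactly the channel-summed similarity map.

```python
import numpy as np

def fft_similarity(region: np.ndarray, window: np.ndarray) -> np.ndarray:
    """Cross-correlate an (H, W, C) search area with a smaller (h, w, C)
    matching window via a 3D FFT (correlation theorem). Returns the
    similarity map over all valid placements of the window."""
    H, W, C = region.shape
    h, w, _ = window.shape
    padded = np.zeros_like(region)
    padded[:h, :w, :] = window                     # zero-pad window to region size
    spec = np.fft.fftn(region) * np.conj(np.fft.fftn(padded))
    corr = np.real(np.fft.ifftn(spec))             # circular 3D correlation
    return corr[: H - h + 1, : W - w + 1, 0]       # channel-shift 0, valid shifts only

# usage: the peak of the similarity map is the matching position
# sim = fft_similarity(search_area, matching_window)
# dy, dx = np.unravel_index(np.argmax(sim), sim.shape)
```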
As an inventive concept, the present invention also provides a multi-modal remote sensing image registration system, which includes:
the edge feature extraction module is used for extracting edge features of the reference image and the remote sensing image at different scales;
the offset calculation module is used for respectively selecting the largest-scale edge feature of the reference image and the largest-scale edge feature of the remote sensing image and calculating the offset between the selected edge features;
the first feature map generation module is used for concatenating the edge features of the reference image at different scales to obtain a reference image feature map;
the second feature map generation module is used for concatenating the edge features of the remote sensing image at different scales to obtain a remote sensing image feature map;
the search area determination module is used for determining a search area of the remote sensing image feature map by using the offset;
and the registration module is used for selecting a plurality of corner points on the reference image feature map and within the search area of the remote sensing image feature map respectively and registering the remote sensing image by a pixel-by-pixel matching method.
As an inventive concept, the present invention also provides a terminal device comprising a memory, a processor, and a computer program stored on the memory; characterized in that the processor executes the computer program to implement the steps of the above-mentioned multi-modal image registration method of the present invention.
As an inventive concept, the present invention also provides a computer-readable storage medium having stored thereon a computer program/instructions; the computer program/instructions, when executed by a processor, implement the steps of the above-described multi-modality image registration method of the present invention.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention performs both coarse registration and fine registration of the images, greatly improving the accuracy of image registration.
2. The method is based on the structural similarity between multi-modal remote sensing image pairs, extracts local feature descriptors (edge features) with a pre-trained edge detection network, and performs feature matching with a traditional template matching method, thereby achieving accurate and efficient registration across images of various modalities.
3. The method achieves higher matching accuracy and better robustness in multi-source image matching. Experiments show that, compared with classical traditional methods and deep learning methods, the proposed method attains higher registration precision and faster registration speed, providing a good basis for high-precision, robust matching of multi-source remote sensing images.
Drawings
FIG. 1 is a schematic diagram of a method according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of an edge detection result in embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of a pixel-by-pixel matching process according to embodiment 1 of the present invention;
4 (a) -4 (d) are multi-modal image samples in the MMD dataset constructed in embodiment 1 of the present invention; FIG. 4 (a) GF (23°30'34"N, 113°09'E), LDS8 (23°33'23"N, 113°06'46"E); FIG. 4 (b) GF (23°30'N, 113°23'E), LDS8 (23°31'50"N, 113°25'02"E); FIG. 4 (c) GF (22°16'07"N, 112°36'46"E), LDS8 (22°18'05"N, 112°36'06"E); FIG. 4 (d) GF (22°46'15"N, 114°18'E), LDS8 (22°48'54"N, 114°19'31"E); wherein GF denotes Gaofen (high-resolution) satellite data, LDS8 denotes Landsat8 satellite data, and the data in brackets are longitude and latitude;
fig. 5 (a) is a LiDAR depth-optical image of embodiment 1 of the present invention; FIG. 5 (b) is a LiDAR depth-optical image for the DFM method; FIG. 5 (c) is a visible-infrared image of embodiment 1 of the present invention; fig. 5 (d) is a visible-SAR image of embodiment 1 of the present invention; FIG. 5 (e) is an infrared-optical image of embodiment 1 of the present invention; fig. 5 (f) is an infrared-optical image for the SURF method; FIG. 5 (g) is an infrared-optical image for the ORB method; fig. 5 (h) is an infrared-optical image for the SIFT method.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In describing embodiments of the present invention, the terms "first," "second," and the like are not intended to imply any order, quantity, or importance, but rather are used to distinguish one element from another. Likewise, the terms "a," "an," and similar terms are not intended to mean that there is only one of the referenced item; the description merely refers to one of the referenced items, of which there may be one or more. In the description of the embodiments, the terms "comprise," "include," and similar terms are intended to represent logical interrelationships, not spatial structural relationships; for example, "A includes B" means that logically B belongs to A, not that B is spatially located inside A. Furthermore, "comprising," "including," and similar words are to be construed as open-ended rather than closed-ended; for example, "A includes B" means that B belongs to A, but B does not necessarily constitute all of A, and A may also include other elements such as C, D, E, and the like.
Example 1
The principle of the method of embodiment 1 of the present invention is shown in fig. 1. The embodiment is based on the observation that although a modal gap exists at the edges of multi-modal images, this gap is generally uniform, so the edge structures remain consistent (fig. 2). Furthermore, after edge extraction, coarse-scale features typically contain only abstract information, while fine-scale edge features provide finer-grained edge details. Inspired by this, the embodiment uses edge features as modality-robust descriptors for image matching and performs progressive matching from coarse to fine.
Specifically, a pre-trained edge detection network, namely RCF (Liu Y, Cheng M M, Hu X, et al. Richer convolutional features for edge detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3000-3009), is used as the descriptor network to extract edge features at three different scales from the reference image and the image to be registered. In this embodiment, two pre-trained edge detection networks are used: one extracts the edge features of the reference image and the other extracts the edge features of the image to be registered. In the fine matching phase, the edge features of the three scales (i.e., F_ref_1 to F_ref_3 and F_sen_1 to F_sen_3) are concatenated into local feature descriptors, which retain both detailed information and overall structural information. After extraction, the edge features are fed to the subsequent coarse matching stage and feature matching (fine matching) stage. The pre-trained network used in this embodiment is a convolutional neural network (CNN), whose convolutional features become progressively coarser as the receptive field increases (Liu et al. 2017, cited above).
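A PyTorch-style sketch of this descriptor extraction is given below, assuming `edge_net` is a pre-trained RCF-like network whose forward pass returns a list of side-output edge maps from fine to coarse; the loading mechanics and exact output format depend on the RCF implementation used, and all names here are ours.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def edge_descriptors(edge_net: torch.nn.Module, image: torch.Tensor):
    """image: (1, 3, H, W) float tensor. Returns the largest-scale edge map
    (used for coarse matching) and the concatenated multi-scale descriptor
    map (used for fine matching)."""
    side_outputs = edge_net(image)          # assumed: list of (1, 1, h_i, w_i), fine to coarse
    _, _, H, W = image.shape
    upsampled = [F.interpolate(s, size=(H, W), mode="bilinear", align_corners=False)
                 for s in side_outputs]
    coarse = upsampled[-1].squeeze()        # coarsest (largest-scale) edge feature
    descriptor = torch.cat(upsampled, dim=1)  # (1, n_scales, H, W) local descriptors
    return coarse, descriptor
```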
Fig. 2 depicts edge features extracted from two images of different modalities. As the first column of fig. 2 shows, although their appearance differs greatly, the multi-modal images share consistent edges at different scales (for the consistency of the edges, see the second and third columns of fig. 2).
Given the coarsest-scale edge feature F_ref_3 extracted from the reference image (the coarsest-scale edge feature is the largest-scale edge feature) and the largest-scale edge feature F_sen_3 extracted from the image to be registered, these two largest-scale edge features are used for coarse registration. Template matching is performed between the largest-scale edge feature of the reference image and that of the image to be registered to calculate the offset (δx, δy) between the two images:

$$(\delta_x,\delta_y)=\underset{(\delta_x,\delta_y)}{\arg\max}\;\sum_{(x,y)}F_{ref\_3}(x,y)\,F_{sen\_3}(x+\delta_x,\;y+\delta_y)$$

wherein (x, y) are the position coordinates of a pixel in the reference image feature map F_ref_3, and (x + δx, y + δy) are the position coordinates of the corresponding pixel in the remote sensing image feature map F_sen_3.
Compared with global matching, this approach is more robust when a large displacement exists between the multi-modal remote sensing images. The advantages of the embodiment in matching accuracy and computation time are discussed later.
After the coarse matching stage, the edge features extracted at both coarse and fine scales, i.e., all of the edge features extracted previously, are used for finer-grained feature matching. That is, the reference image feature map and the remote sensing image feature map are obtained by concatenating the edge features of the different scales for the reference image and the remote sensing image respectively. First, the search area of the remote sensing image feature map is determined using the computed offset (δx, δy). Then, corner points on the reference image feature map are obtained by the Harris corner detection method (Harris, Chris, and Mike Stephens. 1988. "A Combined Corner and Edge Detector." Alvey Vision Conference 15 (50): 10-5244), and unstable corners are filtered out by non-maximum suppression (Neubeck, A., and L. Van Gool. 2006. "Efficient Non-Maximum Suppression." In 18th International Conference on Pattern Recognition (ICPR'06), Vol. 3, 850-855). Based on the local consistency of features, for each stable feature point P(x_p, y_p), a block W_p of size r × r centered at P is extracted (i.e., a matching window; see the solid and dashed boxes in fig. 3), and the similarity between the block W_p and the candidate corner points on the target image (i.e., the search area of the feature map of the image to be registered) is computed. That is, a matching window centered on each corner point of the reference image is determined on the reference image feature map; each matching window slides as a template within the search area, and the point on the feature map of the image to be registered with the highest similarity to the corner point of each matching window is sought, the point with the highest similarity being the point registered to the corner point of the corresponding window. To accelerate the whole computation, this embodiment converts the pixel-wise feature maps (i.e., the reference image feature map and the feature map of the image to be registered) into the frequency domain using a three-dimensional fast Fourier transform (FFT) (De Castro, E., and C. Morandi. 1987. "Registration of Translated and Rotated Images Using Finite Fourier Transforms." IEEE Transactions on Pattern Analysis and Machine Intelligence (5): 700-703) and performs the pixel-by-pixel matching described above to obtain a similarity map; the position of the maximum of the similarity map is the matching position in the image to be registered.
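A sketch of the corner selection described above follows, combining OpenCV's Harris response with a maximum-filter non-maximum suppression; the thresholds and the neighborhood radius are illustrative assumptions, not values from the patent.

```python
import cv2
import numpy as np
from scipy.ndimage import maximum_filter

def stable_corners(gray: np.ndarray, nms_radius: int = 5,
                   rel_thresh: float = 0.01, max_pts: int = 200):
    """Harris corners filtered by non-maximum suppression: keep a pixel only
    if it is the maximum of its (2r+1) x (2r+1) neighborhood and its response
    exceeds rel_thresh times the global maximum response."""
    resp = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    local_max = resp == maximum_filter(resp, size=2 * nms_radius + 1)
    strong = resp > rel_thresh * resp.max()
    ys, xs = np.nonzero(local_max & strong)
    order = np.argsort(resp[ys, xs])[::-1][:max_pts]   # strongest responses first
    return list(zip(xs[order], ys[order]))             # (x, y) corner coordinates
```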
As shown in figs. 4 (a) to 4 (d), this embodiment introduces a multi-modal remote sensing image matching dataset, the MMD dataset. The MMD dataset comprises 40 pairs of satellite multi-modal remote sensing images acquired from the Landsat-8 satellite (infrared data) and the GF1-WFV satellite (optical data). The data are divided into four sub-datasets according to season, corresponding to spring, summer, autumn and winter respectively. For each image pair, the GF1-WFV image is 801 × 801 and the Landsat8 image is 512 × 512. Using keypoints with the same geographic coordinates, a transformation matrix can be computed and treated as the label of the image pair in the image matching task.
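The label construction can be sketched as follows, assuming each image pair ships with pixel keypoints whose geographic coordinates coincide; the keypoint values below are illustrative, and the MMD file format itself is not specified in the text.

```python
import cv2
import numpy as np

# pixel locations of the same geographic points in each image of a pair
# (values are illustrative; real keypoints come from the images' geo-metadata)
gf_pts  = np.float32([[100, 120], [700, 150], [400, 650], [90, 700]])   # GF1-WFV, 801 x 801
ls8_pts = np.float32([[64, 76], [448, 96], [256, 416], [58, 448]])      # Landsat8, 512 x 512

# least-squares affine transform mapping GF1-WFV pixels to Landsat8 pixels;
# this 2x3 matrix serves as the ground-truth label of the image pair
M, inliers = cv2.estimateAffine2D(gf_pts, ls8_pts, method=cv2.RANSAC)
print(M)
```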
Ablation experiments are performed below to demonstrate the effectiveness of the method of this embodiment. The method is then compared with prior-art methods in terms of matching performance, computational complexity and inference time. Finally, the performance of the method under different modalities is compared, where the modalities include Visible-SAR, Visible-Infrared and Optical-LiDAR. The number of correct matching points (CMN), the matching accuracy (MA) and the running time (Time) are used as the performance indices. All experiments were performed on a PC equipped with an AMD Ryzen 7 5800H CPU and an NVIDIA GeForce RTX 3060 laptop GPU.
The present embodiment first studies the effectiveness of the coarse-to-fine matching strategy. Specifically, the scheme is validated by removing the coarse-to-fine strategy (model 2): the coarse matching stage is removed and the edge features are fed directly into the feature matching (fine matching) stage. As shown in table 1, without the coarse-to-fine strategy, well-matched points cannot be obtained; owing to the large offset between the two images, it is difficult to obtain accurate matching results directly in the feature matching stage. With the coarse-to-fine strategy of embodiment 1 of the present invention, the influence of a large offset is overcome in the coarse matching stage, yielding better matching results.
Table 1 reports the ablation results of the different model variants. C2F denotes the coarse-to-fine matching strategy of the embodiment of the invention, FFT denotes the three-dimensional fast Fourier transform, MSF denotes the multi-scale feature descriptors, and Features denotes the different edge descriptors in MSF. As table 1 shows, the registration results of the method of embodiment 1 of the present invention are the best in terms of the number of correct matching points (CMN), matching accuracy (MA) and running time (Time).
TABLE 1 matching results under different methods
[Table 1 is reproduced as an image in the original publication.]
In this embodiment, a 3D FFT (three-dimensional fast Fourier transform) is used for acceleration in the feature matching stage. To verify its effectiveness, a variant of the method (model 3) is introduced by removing the 3D FFT. As shown in table 1, the baseline of this embodiment (model 1) benefits from the 3D FFT, and its registration speed is more than 200 times that of model 3. This fully demonstrates the effectiveness of the 3D FFT in this embodiment.
In this embodiment, multi-scale edge features are adopted in the feature matching stage. To demonstrate the importance of multi-scale features to matching, a variant of the method (model 4) using edge features at only a single scale is introduced. As table 1 shows, using features at only one scale in the feature matching stage significantly degrades the performance of model 4; in contrast, the multi-scale feature descriptors help the baseline of this embodiment (model 1) achieve better performance.
To further investigate the effectiveness of the edge features in the method of this embodiment, two further variants (models 5 and 6) are introduced by replacing the extracted features with CHOG features (Chandrasekhar, V., G. Takacs, D. M. Chen, S. S. Tsai, Y. Reznik, R. Grzeszczuk, and B. Girod. 2012. "Compressed Histogram of Gradients: A Low-Bitrate Descriptor." International Journal of Computer Vision 96 (3): 384-399) and VGG features (Simonyan, Karen, and Andrew Zisserman. 2014. "Very Deep Convolutional Networks for Large-Scale Image Recognition." arXiv preprint arXiv:1409.1556). The CHOG feature is a widely used hand-crafted feature, while the VGG feature is a widely used CNN-based feature. As table 1 shows, models 5 and 6 give inferior performance and longer processing times than the model of this embodiment (model 1). This clearly demonstrates that the multi-scale edge features of this embodiment are simple and effective, contributing to efficient, high-accuracy registration.
The method of the present embodiment is compared experimentally with conventional image matching methods. The compared methods are SIFT (Lowe, David G. 2004. "Distinctive Image Features from Scale-Invariant Keypoints." International Journal of Computer Vision 60 (2): 91-110), SURF (Bay, Herbert, Tinne Tuytelaars, and Luc Van Gool. 2006. "SURF: Speeded Up Robust Features." In European Conference on Computer Vision, 404-417), ORB (Rublee, Ethan, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. "ORB: An Efficient Alternative to SIFT or SURF." In 2011 International Conference on Computer Vision, 2564-2571), RIFT (Li, Jiayuan, Qingwu Hu, and Mingyao Ai. 2019. "RIFT: Multi-Modal Image Matching Based on Radiation-Variation Insensitive Feature Transform." IEEE Transactions on Image Processing 29: 3296-3310), DFM (Efe, Ufuk, Kutalmis Gokalp Ince, and Aydin Alatan. 2021. "DFM: A Performance Baseline for Deep Feature Matching." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4284-4293), Patch2Pix (Zhou, Qunjie, Torsten Sattler, and Laura Leal-Taixe. 2021. "Patch2Pix: Epipolar-Guided Pixel-Level Correspondences." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4669-4678), and SuperGlue (Sarlin, Paul-Edouard, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. 2020. "SuperGlue: Learning Feature Matching with Graph Neural Networks." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4938-4947). Among them, RIFT, SURF, SIFT and ORB are traditional image registration methods, while DFM, Patch2Pix and SuperGlue are learning-based methods.
The quantitative results are given in table 2. As table 2 shows, in most cases the method of the present embodiment achieves the best matching accuracy in terms of the MA index. Although conventional methods such as SIFT, SURF and ORB are computationally efficient, their hand-crafted descriptors limit their performance. The RIFT method finds many correct points and works well but takes too long. In contrast, learning-based descriptors help the DFM, Patch2Pix and SuperGlue approaches achieve higher registration accuracy in most cases; however, learning-based approaches have limited generalization. For example, SuperGlue scores 0.504 in MA on the spring sub-dataset but only 0.096 on the autumn sub-dataset. By integrating a CNN descriptor into a traditional framework, the present method achieves higher matching precision while retaining excellent generalization across scenes.
Table 2 registration results of the method of the present embodiment and the conventional method
[Table 2 is reproduced as an image in the original publication.]
FIGS. 5 (a)-5 (h) show several qualitative results produced by the method of this embodiment. Clearly, the method achieves multi-modal image registration well and overcomes large offsets to produce more accurate matching results.
Generalization is crucial to the practicality of an image registration method in real applications. To further verify the generalization ability of the method of this embodiment to unseen modalities, three additional sets of images with different modalities were collected, as shown in table 3. As table 3 shows, the method of this embodiment clearly outperforms the other methods on these modalities. Although the method was only trained on Visible-Infrared scenes, it achieves an MA above 0.8 in the Visible-SAR and LiDAR depth-Visible modalities. This fully demonstrates the superior generalization of the method of this embodiment.
TABLE 3 three sets of multimodal image datasets for generalization experiments
[Table 3 is reproduced as an image in the original publication.]
In table 3, GSD denotes the ground sample distance.
This embodiment provides a hybrid multi-modal remote sensing image matching method that integrates a CNN-based descriptor with a traditional template matching framework. Furthermore, this embodiment introduces a new dataset for multi-modal remote sensing image matching, MMD. It is shown that the method benefits from the CNN-based descriptor and achieves high image matching precision while maintaining excellent generalization. Extensive experiments show that the method outperforms both the traditional methods and the learning-based methods of the prior art in accuracy and registration efficiency.
Example 2
An embodiment 2 of the present invention provides a multi-modal remote sensing image registration system corresponding to the embodiment 1, including:
the edge feature extraction module is used for extracting edge features of the reference image and the remote sensing image at different scales;
the offset calculation module is used for respectively selecting the largest-scale edge feature of the reference image and the largest-scale edge feature of the remote sensing image and calculating the offset between the selected edge features;
the first feature map generation module is used for concatenating the edge features of the reference image at different scales to obtain a reference image feature map;
the second feature map generation module is used for concatenating the edge features of the remote sensing image at different scales to obtain a remote sensing image feature map;
the search area determination module is used for determining a search area of the remote sensing image feature map by using the offset;
and the registration module is used for selecting a plurality of corner points on the reference image feature map and within the search area of the remote sensing image feature map respectively and registering the remote sensing image by a pixel-by-pixel matching method.
Example 3
Embodiment 3 of the present invention provides a terminal device corresponding to embodiment 1, where the terminal device may be a processing device for a client, such as a mobile phone, a notebook computer, a tablet computer, a desktop computer, and the like, so as to execute the method of the above embodiment.
The terminal device of the embodiment comprises a memory, a processor and a computer program stored on the memory; the processor executes the computer program on the memory to implement the steps of the method of embodiment 1 described above.
In some implementations, the Memory may be a high-speed Random Access Memory (RAM), and may also include a non-volatile Memory, such as at least one disk Memory.
In other implementations, the processor may be various general-purpose processors such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), and the like, and is not limited herein.
Example 4
Embodiment 4 of the present invention provides a computer-readable storage medium corresponding to the above embodiments, on which a computer program/instruction is stored. The computer program/instructions, when executed by the processor, implement the steps of the method of embodiment 1 described above.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any combination of the foregoing.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiment of the application can be implemented by adopting various computer languages, such as object-oriented programming language Java and transliterated scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A multi-modal remote sensing image registration method is characterized by comprising the following steps:
s1, extracting edge features of a reference image and a remote sensing image on different scales;
s2, respectively selecting the edge feature with the maximum scale of the reference image and the edge feature with the maximum scale of the remote sensing image, and calculating the offset between the selected edge features;
s3, respectively concatenating the edge features of the reference image and the remote sensing image at different scales to obtain a reference image feature map and a remote sensing image feature map;
s4, determining a search area of the remote sensing image characteristic diagram by using the offset;
and S5, selecting a plurality of corner points on the reference image feature map and within the search area of the remote sensing image feature map respectively, and registering the remote sensing image by a pixel-by-pixel matching method.
2. The multi-modal remote sensing image registration method according to claim 1, wherein in step S1, pre-trained edge detection networks are used to extract edge features on different scales of the reference image and the remote sensing image.
3. The method of multi-modal remote sensing image registration according to claim 2, wherein the pre-trained edge detection network employs a convolutional neural network.
4. The multi-modal remote sensing image registration method according to any one of claims 1 to 3, wherein in step S1, edge features at three scales are extracted from the reference image and the remote sensing image respectively.
5. The multi-modal remote sensing image registration method according to claim 1, wherein in step S2, the offset (δx, δy) is calculated as:

$$(\delta_x,\delta_y)=\underset{(\delta_x,\delta_y)}{\arg\max}\;\sum_{(x,y)}F_{ref\_3}(x,y)\,F_{sen\_3}(x+\delta_x,\;y+\delta_y)$$

wherein (x, y) are the position coordinates of a pixel in the reference image feature map F_ref_3, and (x + δx, y + δy) are the position coordinates of the corresponding pixel in the remote sensing image feature map F_sen_3.
6. The multi-modal remote sensing image registration method according to any one of claims 1 to 3 and 5, wherein in step S5, the pixel-by-pixel matching for registering the remote sensing image is implemented as follows: a matching window centered on each corner point of the reference image is determined on the reference image feature map; each matching window is slid as a template within the search area; and the point on the remote sensing image feature map with the highest similarity to the corner point of each matching window is sought, the point with the highest similarity being the point registered to the corner point of the corresponding matching window.
7. The method of claim 6, wherein before the matching windows are determined, the reference image feature map and the remote sensing image feature map are converted into frequency-domain feature maps by a three-dimensional fast Fourier transform, and the pixel-by-pixel matching for registering the remote sensing image is then performed on the frequency-domain feature maps.
8. A multi-modality remote sensing image registration system, comprising:
the edge feature extraction module is used for extracting edge features of the reference image and the remote sensing image at different scales;
the offset calculation module is used for respectively selecting the largest-scale edge feature of the reference image and the largest-scale edge feature of the remote sensing image and calculating the offset between the selected edge features;
the first feature map generation module is used for concatenating the edge features of the reference image at different scales to obtain a reference image feature map;
the second feature map generation module is used for concatenating the edge features of the remote sensing image at different scales to obtain a remote sensing image feature map;
the search area determination module is used for determining a search area of the remote sensing image feature map by using the offset;
and the registration module is used for selecting a plurality of corner points on the reference image feature map and within the search area of the remote sensing image feature map respectively and registering the remote sensing image by a pixel-by-pixel matching method.
9. A terminal device comprising a memory, a processor and a computer program stored on the memory; characterized in that said processor executes said computer program to implement the steps of the method according to one of claims 1 to 7.
10. A computer readable storage medium having stored thereon a computer program/instructions; characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method according to one of claims 1 to 7.
CN202211162936.5A 2022-09-23 2022-09-23 Multi-mode remote sensing image registration method, system, terminal device and storage medium Pending CN115546268A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211162936.5A CN115546268A (en) 2022-09-23 2022-09-23 Multi-mode remote sensing image registration method, system, terminal device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211162936.5A CN115546268A (en) 2022-09-23 2022-09-23 Multi-mode remote sensing image registration method, system, terminal device and storage medium

Publications (1)

Publication Number Publication Date
CN115546268A true CN115546268A (en) 2022-12-30

Family

ID=84730029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211162936.5A Pending CN115546268A (en) 2022-09-23 2022-09-23 Multi-mode remote sensing image registration method, system, terminal device and storage medium

Country Status (1)

Country Link
CN (1) CN115546268A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118071804A (en) * 2024-02-21 2024-05-24 安徽大学 Multi-mode remote sensing image registration method based on mixed deformation and storage medium


Similar Documents

Publication Publication Date Title
Li et al. LNIFT: Locally normalized image for rotation invariant multimodal feature matching
Kuppala et al. An overview of deep learning methods for image registration with focus on feature-based approaches
El Amin et al. Convolutional neural network features based change detection in satellite images
EP2833293B1 (en) Automated graph local constellation (GLC) method of correspondence search for registration of 2-D and 3-D data
Cheng et al. Accurate urban road centerline extraction from VHR imagery via multiscale segmentation and tensor voting
EP3440428A1 (en) Remote determination of quantity stored in containers in geographical region
US20190205693A1 (en) Scale-Invariant Feature Point Extraction in Edge Map
Li et al. Place recognition based on deep feature and adaptive weighting of similarity matrix
Direkoğlu et al. Shape classification via image-based multiscale description
Han et al. Research on remote sensing image target recognition based on deep convolution neural network
Cao et al. Multi angle rotation object detection for remote sensing image based on modified feature pyramid networks
Wang et al. A deep deformable residual learning network for SAR image segmentation
Wei et al. SARNet: Spatial Attention Residual Network for pedestrian and vehicle detection in large scenes
CN115546268A (en) Multi-mode remote sensing image registration method, system, terminal device and storage medium
Choi et al. Regression with residual neural network for vanishing point detection
Ye et al. Fast and Robust Optical-to-SAR Remote Sensing Image Registration Using Region Aware Phase Descriptor
Salehpour et al. Hierarchical approach for synthetic aperture radar and optical image coregistration using local and global geometric relationship of invariant features
Zhong et al. Online learning 3D context for robust visual tracking
Ye et al. Fast and robust structure-based multimodal geospatial image matching
Ye et al. Rotation invariant feature lines transform for image matching
Quach Convolutional networks for vehicle track segmentation
Kai et al. Multi-source remote sensing image registration based on normalized SURF algorithm
Liu et al. Superpixel segmentation of high-resolution remote sensing image based on feature reconstruction method by salient edges
Hou et al. Robust point correspondence with gabor scale-invariant feature transform for optical satellite image registration
Zhu et al. A lightweight deep convolutional network with inverted residuals for matching optical and SAR images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination