CN111476251A - Remote sensing image matching method and device - Google Patents

Remote sensing image matching method and device

Info

Publication number
CN111476251A
Authority
CN
China
Prior art keywords
matching
remote sensing
sensing image
image
screened
Prior art date
Legal status
Pending
Application number
CN202010224164.8A
Other languages
Chinese (zh)
Inventor
蓝朝桢
施群山
张永显
卢万杰
吕亮
崔志祥
侯慧太
秦剑琪
Current Assignee
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN202010224164.8A
Publication of CN111476251A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a remote sensing image matching method and device, belonging to the technical field of remote sensing image data processing. First, an improved VGG16 model is adopted for feature extraction, so that the extracted feature points are sufficiently abstract while retaining high positioning precision. Keypoint screening is then carried out on the extracted feature map to avoid interference from less salient features, which improves both the matching accuracy and the data processing efficiency. Meanwhile, a dynamic adaptive distance method, configured automatically according to the characteristics of the data, is adopted to screen the candidate matching pairs, so that high-quality matching pairs can be selected and the matching performance for heterogeneous remote sensing images is improved. The invention is tested with multiple groups of heterogeneous remote sensing image pairs, and the results show that it has strong adaptability and robustness and is superior to the DELF algorithm in adaptability, number of matching points, distribution, efficiency, and other aspects, thereby improving the robust matching of heterogeneous remote sensing images with large differences.

Description

Remote sensing image matching method and device
Technical Field
The invention relates to a remote sensing image matching method and device, and belongs to the technical field of remote sensing image data processing.
Background
With the rapid development of remote sensing technology, earth observation images from various sensors such as visible light, infrared, and synthetic aperture radar (SAR) are becoming abundant. Images acquired from different platforms and different sources are complementary to a certain extent, providing a massive data source for the deep mining and big-data analysis of remote sensing information. Matching between images is a core problem for the further processing and analysis of heterogeneous images. Owing to differences in imaging mechanisms, bands, time phases, and so on, heterogeneous images differ greatly in radiometric and geometric characteristics, and matching between them has long been a difficult research problem in image matching.
Scholars at home and abroad have proposed various matching methods for this problem. Image feature matching generally extracts a descriptor of local feature information in a certain neighborhood around a keypoint and determines matching points by comparing descriptors; the most notable descriptor is the scale-invariant feature transform (SIFT) descriptor. The SIFT descriptor resists rotation and scale differences between images well, but because it is based on gradient distributions in local image neighborhoods, its matching performance on heterogeneous images is poor. Many scholars have therefore attempted heterogeneous image matching by improving the SIFT algorithm or incorporating other constraint information, for example, adopting large-scale adaptive anisotropic Gaussian SIFT as the matching feature, using a progressive SIFT matching method, combining SIFT features with edge information for feature extraction, or using a SIFT multi-source image matching algorithm based on a virtual matching window. Algorithms based on traditional feature extraction operators are suitable for heterogeneous images whose gray levels and structures are relatively similar. However, when the radiometric and geometric differences between heterogeneous images are large, good results are difficult to obtain, because gradient information in small local neighborhoods does not provide stable features.
Because neighborhood gradient information can hardly overcome the radiometric differences between images, some scholars introduced phase congruency features, which are invariant to illumination and contrast, into remote sensing image matching and then into heterogeneous remote sensing image matching. Compared with SIFT features, phase congruency features are more resistant to interference than gradient features, so they achieve a certain effect in heterogeneous image matching. However, they share the same weakness as SIFT features: they depend on low-level gray-scale information in a very small neighborhood around the keypoint, and their performance is still not robust enough when imaging mechanisms and conditions differ.
In the past years, deep learning methods, particularly convolutional neural networks (CNNs), have made tremendous progress and performance improvements in computer vision tasks such as image classification, object detection, and segmentation. Through the successive layers of a CNN, increasingly complex image features can be acquired, and specific high-level features can be learned. Since their first introduction in 2014, scholars have applied CNNs to the image feature extraction process and gradually shifted from SIFT features to CNN features. Traditional hand-crafted descriptors can extract and represent only low-level image features, whereas CNNs are generally considered able to extract higher-level abstract features. Matching with high-level, more abstract semantic information has strong generalization and is closer to the principle of human visual observation; in theory it can resist the interference caused by band, imaging mode, seasonal change, and so on, and is expected to greatly improve matching adaptability.
Early CNN feature extraction methods solved the feature description problem well, but, owing to the contradiction with generalization capability, the features lacked pixel-accurate localization. In 2017, DEep Local Features (DELF), proposed for large-scale landmark image retrieval, introduced an attention mechanism for keypoint selection; however, DELF was not specifically trained for conditions of drastic environmental change. D2-Net, a simultaneous feature extraction and description method proposed in 2019 by Mihai Dusmanu and colleagues at ETH Zurich, was trained with more than 300,000 pre-matched stereo pairs; it made important progress in landmark recognition for ground-vehicle visual navigation under changing scenes and showed great potential. However, most of the images processed by D2-Net are near-range ground visible-light images, so it is not suitable for heterogeneous remote sensing images with huge differences in all respects.
Disclosure of Invention
The invention aims to provide a remote sensing image matching method and device, so as to solve the problem that existing heterogeneous remote sensing image matching methods cannot be applied to scenes with large environmental changes.
To solve this technical problem, the invention provides a remote sensing image matching method comprising the following steps:
1) extracting a depth feature map of the remote sensing images to be matched by using a trained improved VGG16 network model, selecting keypoints from the extracted depth feature map, and generating descriptors of the keypoints, wherein the improved VGG16 network model adopts the last convolutional layer of the fourth block in the middle of the VGG16 model as its output;
2) searching the keypoint descriptors of step 1) with a nearest neighbor search algorithm to obtain matching pairs to be screened, and recording, for each matching pair to be screened, the first matching distance (to the nearest matching point) and the second matching distance (to the second-nearest matching point);
3) screening the matching pairs to be screened with a dynamic adaptive distance method, according to the first and second matching distances recorded for each pair, and retaining the matching pairs whose first matching distance is smaller than the difference between the second matching distance and the mean distance difference, so as to realize the matching of the remote sensing images to be matched.
The invention also provides a remote sensing image matching device which comprises a processor and a memory, wherein the processor executes the computer program stored by the memory so as to realize the remote sensing image matching method.
First, the improved VGG16 model is adopted for feature extraction, with the last convolutional layer of the fourth block in the middle of the VGG16 model as output, so that the extracted feature points are sufficiently abstract while retaining high positioning accuracy. Then, keypoint screening is performed on the extracted feature map to avoid interference from less salient features, which improves both the matching accuracy and the data processing efficiency. Finally, the matching pairs to be screened are filtered with the dynamic adaptive distance method, which is configured automatically according to the characteristics of the data and avoids manual tuning, ensuring that high-quality matching pairs are selected and improving the matching performance for heterogeneous remote sensing images.
Further, in order to adapt to the geometric transformation of the remote sensing image, the step 3) further comprises re-screening the screened matching pairs according to the geometric position constraint of the key points.
Further, in order to improve the accuracy of the geometric constraint, the RANSAC algorithm is used for screening again, and an affine transformation model is used as the geometric model.
Further, in order to ensure that a high-quality matching pair can be automatically screened out, the principle of screening in the dynamic adaptive distance method in step 3) is as follows:
dis_j < dis'_j - avgdis

where dis_j is the first matching distance of the j-th matching pair to be screened, dis'_j is the second matching distance of the j-th matching pair to be screened, and avgdis is the mean difference between the second and first matching distances over all the matching pairs.
Further, in order to ensure the resolution of the extracted feature map, the window sliding step of the last pooling layer in the improved VGG16 network model is set to 1 pixel.
Further, the last pooling layer in the improved VGG16 network model adopts an average pooling method.
Further, in order to reduce mismatching and enhance the distinctiveness of the descriptors, the improved VGG16 network model adopts a triplet margin ranking loss function during training, with a detection term added to the function.
Further, in order to adapt to scale differences between the images to be matched, feature extraction is performed on each level of an image pyramid, and low-resolution image features are added to the high-resolution image features.
Further, in order to ensure that the screened keypoints have salient features, the keypoints in step 1) are selected according to the channel-maximum and local-maximum criteria.
Drawings
Fig. 1 is a schematic diagram of a CNN network structure adopted in the present invention;
FIG. 2 is a flow chart of a remote sensing image matching method of the present invention;
FIG. 3-a is the DS1 Google image (2009) in the first set of image data of the experimental example of the present invention;
FIG. 3-b is the DS2 Google image (2018) in the first set of image data of the experimental example of the present invention;
FIG. 3-c is the DS3 UAV visible light image in the first set of image data of the experimental example of the present invention;
FIG. 3-d is the DS4 UAV SAR image in the first set of image data of the experimental example of the present invention;
FIG. 4-a is the DS5 Ziyuan-3 panchromatic image in the second set of image data of the experimental example of the present invention;
FIG. 4-b is the DS6 Gaofen-3 SAR image in the second set of image data of the experimental example of the present invention;
FIG. 5-a is the DS7 UAV thermal infrared image in the third set of image data of the experimental example of the present invention;
FIG. 5-b is the DS8 visible light image in the third set of image data of the experimental example of the present invention;
FIG. 6 is a schematic diagram of the SIFT matching results of DS1-DS2 images in the experimental example of the present invention;
FIG. 7-a is a schematic diagram showing the matching results of DS1-DS2 images using DELF in the experimental example of the present invention;
FIG. 7-b is a diagram showing the matching results of DS1-DS2 images obtained by the matching method of the present invention in the experimental examples of the present invention;
FIG. 7-c is a partial enlarged view of the matching result of DS1-DS2 images using DELF in the experimental example of the present invention;
FIG. 7-d is a partial enlarged view of the matching results of DS1-DS2 images obtained by the matching method of the present invention in the experimental examples of the present invention;
FIG. 8 is a diagram illustrating the relationship between the number of matching points and the error in the matching method of the present invention;
FIG. 9 is a matching point location error map for the matching method of the present invention;
FIG. 10-a is a matching point location error direction map for the matching method of the present invention;
FIG. 10-b is an enlarged view of a portion of the matching point location error direction of the matching method of the present invention;
FIG. 11-a is a diagram showing the matching results of Google image DS2 and unmanned aerial vehicle image DS3 using the DELF algorithm in the test example of the present invention;
FIG. 11-b is a diagram showing the matching results of Google image DS2 and unmanned aerial vehicle image DS3 using the method of the present invention in the test example of the present invention;
FIG. 12-a is a diagram showing the matching result of Google image DS2 and UAV SAR image DS4 using the DELF algorithm in the test example of the present invention;
FIG. 12-b is a diagram showing the matching results of Google images DS2 and UAV SAR images DS4 using the method of the present invention in the test example of the present invention;
FIG. 13-a is a diagram showing the matching result of the Gaofen-3 SAR image DS6 and the Ziyuan-3 panchromatic image DS5 using the DELF algorithm in the experimental example of the present invention;
FIG. 13-b is a diagram showing the matching result of the Gaofen-3 SAR image DS6 and the Ziyuan-3 panchromatic image DS5 using the method of the present invention in the experimental example of the present invention;
FIG. 14-a is a graph showing the matching results of an optical image DS8 and a thermal infrared image DS7 using the DELF algorithm in the test example of the present invention;
FIG. 14-b is a graph showing the matching results of an optical image DS8 and a thermal infrared image DS7 according to the method of the present invention in the test example of the present invention;
fig. 15 is a schematic structural diagram of the remote sensing image matching device according to the present invention.
Detailed Description
Method embodiment
First, the image to be matched is input into the trained improved VGG16 network, and the conv4_3 convolutional layer of the improved VGG16 network is extracted to generate a 512-channel feature map. Keypoints are selected with two strategies, channel maximum and local maximum, and a 512-dimensional descriptor is extracted at each keypoint position on the feature map, completing feature extraction and description. Then, after feature matching with fast approximate nearest neighbor search (FLANN), a rejection algorithm jointly constrained by a dynamic adaptive Euclidean distance and RANSAC is adopted to deal with the large number of mismatched points, so that mismatches are effectively rejected while correct matching points are retained to the maximum extent.
1. Construct and train the depth feature extraction model.
For remote sensing images, a convolutional neural network (CNN) is generally adopted for depth feature extraction. To handle heterogeneous remote sensing images with great differences in all respects, the VGG16 network model is selected and adaptively modified. The VGG16 model has 5 block modules, each of which can be regarded as a convolutional network, i.e., the model comprises 5 convolutional networks in total. The receptive fields of the first layers of the network are very small, so the obtained features are relatively low-level local features; most of the extracted features are edges, corner points, and the like, with high positioning accuracy. The higher the layer, the more abstract the extracted features, the more global the information, and the more resistant they are to the interference caused by heterogeneous images, but the worse the positioning accuracy. Therefore, to give the feature points sufficient abstraction while obtaining high positioning accuracy, the invention modifies the VGG16 network model and takes the last (third) convolutional layer Conv4_3 of the 4th block module, in the middle of the network, as the feature map, as shown in FIG. 1.
The resolution of each layer of the VGG16 network model is reduced after pooling. To maintain the resolution of the feature map, the window sliding step of the last (third) pooling layer is changed from 2 pixels to 1 pixel, and the pooling method is changed from maximum pooling to average pooling. The three convolutions of the fourth block (Conv4_1 to Conv4_3) are dilated (atrous) convolutions with a dilation rate of 2, which enlarges the receptive field and improves the generalization ability of the feature representation.
After this modification, compared with the classic VGG16 network, the feature map output by the new network constructed by the invention is enlarged from 1/8 to 1/4 of the original image, and the positioning accuracy is doubled.
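As an illustrative sketch of the modified trunk described above (a sketch only: layer indices follow torchvision's VGG16, and the loading of pretrained or fine-tuned weights is omitted):

```python
import torch.nn as nn
import torchvision.models as models

class ModifiedVGG16(nn.Module):
    """Sketch of the modified VGG16 trunk: truncated after conv4_3, pool3
    changed to stride-1 average pooling, conv4 block replaced by dilated
    (rate-2) convolutions. Weight loading from D2-Net is omitted here."""

    def __init__(self):
        super().__init__()
        vgg = models.vgg16().features
        # conv1_1 ... conv3_3 (+ReLU) with pool1/pool2 unchanged: 1/4 resolution.
        self.blocks1to3 = nn.Sequential(*list(vgg.children())[:16])
        # pool3: window now slides 1 pixel instead of 2, and max pooling is
        # replaced by average pooling, so the map stays at ~1/4 resolution.
        self.pool3 = nn.AvgPool2d(kernel_size=2, stride=1)
        # conv4_1 ... conv4_3 as dilation-2 convolutions to enlarge the
        # receptive field lost by removing the downsampling step.
        self.conv4 = nn.Sequential(
            nn.Conv2d(256, 512, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=2, dilation=2),  # conv4_3: the output
        )

    def forward(self, x):
        return self.conv4(self.pool3(self.blocks1to3(x)))  # 512-ch feature map
```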
In feature detection, the feature points are expected to have a certain universality so as to adapt to different environmental illumination; meanwhile, in feature description, the feature vectors are expected to be as unique as possible to avoid mismatching. The design of the loss function is therefore particularly critical during training. The invention adopts a triplet margin ranking loss (TMRL) function, which tries to enhance the distinctiveness of the relevant descriptors by penalizing any confounding descriptor that could cause a mismatch. In addition, to seek repeatability of the detected features, a detection term is added to the triplet margin ranking loss. The loss function is:
L(I_1, I_2) = Σ_{c∈C} [ s_c^(1) s_c^(2) / Σ_{q∈C} s_q^(1) s_q^(2) ] · m(p(c), n(c))

where p(c) is the positive descriptor distance between corresponding (same-name) image points, n(c) is the negative descriptor distance for non-corresponding points, s_A^(1) and s_B^(2) are the feature detection scores obtained at the two points A and B in images I_1 and I_2, and C is the set of all corresponding image point pairs between I_1 and I_2.
The loss function computes a weighted average of the margin term m, weighted by the detection scores of all matches. Thus, to minimize the loss, the most distinctive correspondences (those with lower margin terms) receive higher relative scores, and correspondences with higher relative scores obtain descriptors distinct from the remaining features, which favors robust matching.
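A minimal sketch of this detection-weighted triplet margin ranking loss is given below; the tensor layout, the exact margin form, and the default margin value are assumptions for illustration, not the trained model's exact settings:

```python
import torch

def detection_weighted_triplet_loss(p, n, s1, s2, margin=1.0):
    """p, n: positive/negative descriptor distances p(c), n(c), one entry per
    correspondence c; s1, s2: detection scores of the two corresponding
    points. Returns the detection-score-weighted margin ranking loss."""
    m = torch.clamp(margin + p - n, min=0.0)   # margin term m(p(c), n(c))
    w = s1 * s2                                # joint detection score per match
    return torch.sum(w / w.sum() * m)          # detection-weighted average
```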
To generate training data with pixel-level correspondences, the invention uses the MegaDepth dataset, which consists of images of 196 different scenes reconstructed from 1,070,468 Internet photos using the open-source structure-from-motion (SfM) software COLMAP.
The VGG16 model after this training is called the D2-Net model. This embodiment uses the model pre-trained by D2-Net, which starts from a VGG16 pre-trained on ImageNet and fine-tunes the final dense feature extractor conv4_3. For each image pair, all points of the second image with depth information are projected into the first image; training uses 327,036 image pairs in total.
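The depth-based projection that generates such pixel-level correspondences can be sketched as follows; the variable names and the availability of camera intrinsics and relative pose are illustrative assumptions, not the actual D2-Net training code:

```python
import numpy as np

def project_into_first_image(uv2, depth2, K2, K1, R, t):
    """Back-project pixels (u, v) of the second image with known depth to 3D,
    then project them into the first image. K1/K2: camera intrinsics; (R, t):
    relative pose mapping camera-2 coordinates to camera-1 coordinates."""
    ones = np.ones((uv2.shape[0], 1))
    rays = (np.linalg.inv(K2) @ np.hstack([uv2, ones]).T).T  # normalized rays
    X2 = rays * depth2[:, None]              # 3D points in camera-2 frame
    X1 = (R @ X2.T).T + t                    # transform to camera-1 frame
    proj = (K1 @ X1.T).T                     # homogeneous pixel coordinates
    return proj[:, :2] / proj[:, 2:3]        # (u, v) locations in image 1
```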
Although the data are augmented during training and CNN features have a certain adaptability to scale change, large scale changes remain difficult to handle. Therefore, the invention adopts an image pyramid to deal with scale change.
For an input image I, pyramid images I_ρ are constructed. In this embodiment, four levels at 0.25/0.5/1/2 times the original resolution (i.e., ρ = 0.25, 0.5, 1, 2) are adopted to accommodate resolution differences between the two images. The CNN features F_ρ are extracted at each pyramid level and then superposed using the following formula:

F̃_ρ = F_ρ + Σ_{γ<ρ} F_γ

That is, the high-resolution feature map is overlaid with the lower-resolution features. Because the pyramid levels have different resolutions, each low-resolution feature map is first linearly interpolated to the size of the high-resolution feature map, and the corresponding features are then accumulated.
2. Acquire the remote sensing image to be matched, extract its depth features, and perform keypoint selection and descriptor extraction.
The acquired remote sensing images to be matched are input into the trained model established in step 1 for feature extraction; specifically, the feature map output by Conv4_3 is the extracted depth feature. However, if every pixel of the Conv4_3 feature map were used directly as a feature, the features would be too dense and mostly insignificant, so keypoints with salient features must be selected from the feature map.
Using the network model of step 1, let the input original image be I with size w × h, and let the network output feature map be a 3D tensor F ∈ R^(w′ × h′ × n) with n = 512 channels; feature keypoint positions and descriptors are extracted from F. In this embodiment, feature point screening is implemented by finding positions whose feature value is simultaneously the maximum along the channel direction and a local maximum in the spatial plane. The screening criterion is:

point (i, j) is selected  ⇔  k = argmax_t D_t(i, j)  and  D_k(i, j) is a local maximum of D_k

where D_k is the feature map of the k-th channel and D_k(i, j) is the feature value at pixel (i, j) of that channel. For a candidate point P(i, j), channel selection is performed first, i.e., the channel k with the largest response among the n channel feature maps is selected, giving the feature map D_k on that channel; it is then verified whether D_k(i, j) is a local maximum of D_k. If so, the candidate point P is a salient point and is selected as a feature point, and the L2-normalized feature vector over the 512 channels at position (i, j) is extracted as the descriptor:

d̂_ij = d_ij / ||d_ij||_2,  with d_ij = F_ij

To obtain more accurate keypoint positions, the SIFT algorithm is taken as a reference: positions are locally refined on the feature map, and descriptors are obtained by linear interpolation. d̂_ij is an n-dimensional vector, and matching can be performed according to the Euclidean distance between such vectors.
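The channel-maximum + local-maximum selection and descriptor normalization can be sketched as follows (the 3 × 3 local-maximum neighborhood is an assumption):

```python
import torch
import torch.nn.functional as F

def select_keypoints(feat):
    """feat: feature tensor of shape (n, h, w), e.g. the 512-channel conv4_3
    output. Returns keypoint coordinates and L2-normalized descriptors."""
    best_k = feat.argmax(dim=0)                                # (h, w) winning channel
    # A value is a spatial local maximum if 3x3 max pooling leaves it unchanged.
    local_max = F.max_pool2d(feat.unsqueeze(0), 3, stride=1, padding=1)[0]
    is_local_max = feat == local_max                           # (n, h, w)
    # Keep (i, j) only if the winning channel's value is also a local maximum.
    is_peak = is_local_max.gather(0, best_k.unsqueeze(0))[0]   # (h, w)
    ys, xs = torch.nonzero(is_peak, as_tuple=True)
    desc = feat[:, ys, xs].t()                                 # (num_kpts, n)
    desc = F.normalize(desc, p=2, dim=1)                       # L2 normalization
    return ys, xs, desc
```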
3. Match the images to be matched according to the extracted keypoints and descriptors.
Because heterogeneous images differ greatly and many mismatches occur, this embodiment adopts a mismatch removal method that combines a dynamic adaptive distance constraint with a random sample consensus (RANSAC) constraint. The specific process is as follows.
The keypoints and feature descriptors selected in step 2 are matched with the FLANN algorithm to determine the matching pairs to be screened. Each matching pair to be screened comprises a first matching point with the closest Euclidean distance dis_j and a second matching point with the second-closest Euclidean distance dis'_j; the smaller dis_j is relative to dis'_j, the better the quality of the match.
Traditional algorithms adopt a fixed scale factor t as the threshold, i.e., a matching pair is retained when dis_j < t · dis'_j is satisfied; a fixed threshold, however, adapts poorly to the differences between heterogeneous image pairs. To solve this problem and improve the adaptability of the algorithm, the invention designs a dynamic adaptive Euclidean distance constraint method, which gathers statistics on the matching pairs to be purified and automatically configures the corresponding parameter according to the characteristics of the data. The N matching pairs returned by the FLANN search contain a large number of mismatches; the mean difference between the second and first matching distances is computed as:

avgdis = (1/N) Σ_{j=1..N} (dis'_j - dis_j)
Then, each matching pair to be screened is retained only if its first matching distance is smaller than the difference between its second matching distance and the mean distance difference avgdis, i.e.:

dis_j < dis'_j - avgdis
Because the mean distance difference used as the comparison standard is computed from the data themselves, the method adapts well to the differences between image pairs from different sources; it serves well for the first round of screening and retains high-quality matching points.
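Assuming OpenCV's FLANN matcher with a KD-tree index and L2 distances (the index and search parameters here are illustrative), the 2-NN search and the adaptive screening rule can be sketched as:

```python
import numpy as np
import cv2

def adaptive_distance_filter(desc1, desc2):
    """2-NN FLANN matching followed by the dynamic adaptive screen
    dis_j < dis'_j - avgdis; assumes two neighbours per query descriptor."""
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=4),  # KD-tree index
                                  dict(checks=64))
    matches = flann.knnMatch(np.float32(desc1), np.float32(desc2), k=2)
    d1 = np.array([m.distance for m, _ in matches])  # nearest distance dis_j
    d2 = np.array([n.distance for _, n in matches])  # second-nearest dis'_j
    avgdis = float(np.mean(d2 - d1))                 # data-driven threshold
    return [m for (m, _), keep in zip(matches, d1 < d2 - avgdis) if keep]
```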
The dynamic adaptive Euclidean distance constraint is a screening method at the level of the feature vector descriptors; the geometric position constraint of the feature points must also be used. RANSAC is a common geometric constraint method for matching points: based on the idea of the least squares method, the RANSAC (random sample consensus) algorithm iteratively computes the optimal mathematical model parameters from a data set containing abnormal data so as to obtain valid sample data, and thereby constrains the feature matching between images. The invention also uses the RANSAC algorithm, with an affine transformation model as the geometric model, to accommodate scaling, translation, rotation, shear, and other transformations between image pairs from different imaging models.
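The second screening round can be sketched with OpenCV's affine RANSAC estimator; the 15-pixel threshold follows the experimental setting reported below:

```python
import numpy as np
import cv2

def ransac_affine_filter(pts1, pts2, matches, thresh=15.0):
    """RANSAC re-screening under an affine model. pts1/pts2: (x, y) keypoint
    arrays indexed by the match indices; returns the model and the inliers."""
    src = np.float32([pts1[m.queryIdx] for m in matches])
    dst = np.float32([pts2[m.trainIdx] for m in matches])
    A, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                      ransacReprojThreshold=thresh)
    return A, [m for m, ok in zip(matches, inliers.ravel()) if ok]
```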
The method first provides a CNN model for the transformation between heterogeneous images and trains it with image matching points having large illumination differences as samples, so that it better resists the influence of radiometric changes. Keypoints are then selected under the two simultaneous conditions of channel maximum and local maximum, and 512-dimensional descriptors are extracted at the corresponding positions on the feature map. Finally, in the matching stage, after fast nearest neighbor search feature matching is completed, a purification algorithm jointly constrained by the dynamic adaptive Euclidean distance and RANSAC is applied to the problem of numerous mismatched points, so that correct matching points are retained to the maximum extent while mismatches are effectively eliminated.
Device embodiment
The apparatus proposed in this embodiment, as shown in fig. 15, includes a processor and a memory, where a computer program operable on the processor is stored in the memory, and the processor implements the method of the above method embodiment when executing the computer program.
That is, the method in the above method embodiment should be understood as a flow of the remote sensing image matching method implemented by the computer program instructions. These computer program instructions may be provided to a processor such that execution of the instructions by the processor results in the implementation of the functions specified in the method flow described above.
The processor referred to in this embodiment is a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA).
The memory referred to in this embodiment includes a physical device for storing information; generally, information is digitized and then stored in media using electric, magnetic, or optical means. For example: memories that store information electrically, such as RAM and ROM; memories that store information magnetically, such as hard disks, floppy disks, magnetic tapes, magnetic core memories, bubble memories, and USB drives; and memories that store information optically, such as CDs or DVDs. Of course, there are other types of memory, such as quantum memory and graphene memory.
The apparatus comprising the memory, the processor, and the computer program is realized by the processor executing the corresponding program instructions in the computer, and the processor can run various operating systems, such as Windows, Linux, Android, or iOS.
As another embodiment, the device may further comprise a display for showing the matching results for reference by operators.
In order to comprehensively evaluate the performance and adaptability of the heterogeneous remote sensing image matching method, a test data set containing multi-source remote sensing images is established below. Tests are carried out with SIFT, DELF, and the present method to examine the accuracy and adaptability on heterogeneous remote sensing images, evaluated by indexes such as matching accuracy, uniformity, and matching speed. The deep learning model is implemented under the PyTorch framework; the test computer is an ROG notebook with an i7-9750H CPU, a GeForce RTX 2060 graphics card (6 GB video memory), and 32 GB of memory; the implementation language is Python, and the operating system is Ubuntu 16.04.
The experimental data are shown in Table 1. For convenience of description, the data sets are named DS1-DS8; the image data source, band type, resolution, cropped pixel size, imaging time, etc. are listed in detail in Table 1.
TABLE 1
(Table 1 is an image in the original publication; it details, for DS1-DS8, the image data source, band type, resolution, cropped pixel size, and imaging time.)
The test data sources cover spaceborne sensors, UAV sensors, and Google Earth imagery; the bands include visible light, SAR, and thermal infrared; the resolutions differ; and the time and season spans are large, so the data are well suited to testing the adaptability of the algorithms.
The coverage area of all the data lies near the Chinese remote sensing satellite calibration field in Henan Province, and the data are divided into three groups. The first group (DS1 to DS4) covers the calibration field region and includes Google images spanning nearly 10 years (DS1 and DS2), in which the ground-object differences are quite obvious, as shown in FIGS. 3-a and 3-b, as well as a UAV image (DS3) and a SAR image (DS4) of the same area, as shown in FIGS. 3-c and 3-d. For convenient statistics of matching accuracy, the two orthoimages DS1 and DS2 are registered. The images of this group are uniformly cropped, with a resolution of 0.5 m. The first group mainly examines the adaptability of the algorithms to seasonal, imaging-mode, and ground-feature changes; the well-registered DS1 and DS2 further allow the number of correct matching points to be assessed.
The second group comprises DS5 and DS6: a Ziyuan-3 nadir panchromatic image (resolution about 2 m) and a Gaofen-3 SL-mode SAR image (resolution about 0.5 m), shown in FIG. 4-a and FIG. 4-b respectively; the location is Changchang Tangzhuanlu Village.
The third group comprises DS7 and DS8. DS7 is a thermal infrared image captured by a UAV-mounted DJI XT2 sensor in the complete absence of visible light, as shown in FIG. 5-a; the surface temperature is quantized to 0-255 gray levels, with higher temperatures mapped to larger gray values. DS8 is the visible light image of the corresponding area, as shown in FIG. 5-b; the thermal infrared image is rotated by a certain angle relative to it. This group of data is intended to test whether the matching algorithm is effective in matching thermal images to visible light, and to test the matching ability under a certain rotation.
Test results and analysis
First, the DS1 and DS2 images, which have a large time span, large seasonal variation, and obvious ground-feature changes, are selected for a matching experiment. SIFT, a representative algorithm with the best traditional feature matching performance, and the CNN feature matching method DELF are adopted for comparison. For clear display of the results, DS1 and DS2 are uniformly resampled to 500 × 500 pixels in the experiment.
FIG. 6 shows the results of feature extraction and matching with the SIFT algorithm. For images such as DS1 and DS2 with different time phases and large seasonal differences, although a large number of feature points can be extracted, no correct matches are obtained; experiments also verify that SIFT matching likewise fails between the optical image and the SAR and infrared images. The features extracted by the traditional method are therefore not suitable for matching heterogeneous images.
In the test, the RANSAC purification threshold is set to 15 pixels. The DELF algorithm is preset to extract at most 5000 features per image, with a distance threshold of 0.95, a feature dimension of 256, and a score threshold of 50. The matching results of the two methods are shown in FIG. 7-a and FIG. 7-b, and the statistics in Table 2; since DS1 and DS2 are registered, the coordinate difference of a matching point on the two images is used as the standard for judging whether a match is correct.
FIG. 7-a shows the matching result of the DELF algorithm and FIG. 7-b that of the algorithm of the present invention; FIG. 7-c and FIG. 7-d are partial enlarged views of the two results. Table 2 reports the number of initial matches, the number of inliers after the dynamic adaptive distance and RANSAC constraint screening, and the number of correct matches determined from the strictly registered image coordinates (a matching point is counted as correct if its error is within 3 pixels).
TABLE 2
(Table 2 is an image in the original publication; it reports the matching statistics of DELF and the present method on the DS1-DS2 pair.)
Comparing DELF with the algorithm of the invention, the present method is superior to DELF in the number of correct matches and in the uniformity of their distribution, and its advantage in matching efficiency is also relatively obvious.
From the above test results, although the invention obtains a large number of matching points on the heterogeneous images, the errors are on the order of several pixels, and the matching accuracy does not reach 1 pixel, as shown in FIG. 8. There are two reasons. First, the Google images used in the test are not true orthoimages, the time span is nearly 10 years, and the differences in ground objects such as houses and vegetation are obvious (the number of floors, the number of buildings, and so on changed greatly), so high-precision registration is difficult and the reference standard itself contains certain errors. Second, because CNN depth features require several pooling operations, resolution is lost while abstract features are extracted, which also affects the positioning accuracy of the feature points. In addition, as can be seen from FIGS. 9, 10-a, and 10-b, the errors are uniformly distributed around the origin and their mean is very close to the origin, so the algorithm introduces no systematic error.
The invention is further tested on the matching of heterogeneous images in various modes, including Google images with UAV optical and SAR images, the Gaofen-3 SAR image with the Ziyuan-3 panchromatic image, and the thermal image with the visible light image, and is compared with DELF under the same conditions. Each image pair is matched with both methods; the numbers of matching points and the time consumption are shown in Table 3.
TABLE 3
(Table 3 is an image in the original publication; it lists the number of matching points and the time consumption of both methods for each image pair.)
FIGS. 11-a and 11-b show the matching of the Google Earth image and the UAV image on the original-resolution (1500-pixel) images; the number of points matched by the invention is twice that of DELF, and the time consumption of the DELF method increases sharply as the image size grows. FIGS. 12-a and 12-b show the matching results of the Google Earth image and the UAV SAR image; FIGS. 13-a and 13-b show the matching results of the SAR image and the panchromatic image; FIGS. 14-a and 14-b show the matching results of the thermal infrared image and the optical image.
The above experiments show that the heterogeneous image matching method and device provided by the invention have strong adaptability and robustness and are superior to the DELF algorithm in adaptability, number of matching points, distribution, efficiency, and other aspects; they provide a good algorithm for the robust matching of heterogeneous remote sensing images with large differences, and also offer a useful reference for other fields such as intelligent image retrieval.

Claims (10)

1. A remote sensing image matching method is characterized by comprising the following steps:
1) extracting a depth feature map of the remote sensing images to be matched by using a trained improved VGG16 network model, selecting keypoints from the extracted depth feature map, and generating descriptors of the keypoints, wherein the improved VGG16 network model adopts the last convolutional layer of the fourth block in the middle of the VGG16 model as output;
2) searching the keypoint descriptors of step 1) with a nearest neighbor search algorithm to obtain matching pairs to be screened, and recording, for each matching pair to be screened, the first matching distance (to the nearest matching point) and the second matching distance (to the second-nearest matching point);
3) screening the matching pairs to be screened with a dynamic adaptive distance method, according to the first and second matching distances recorded for each pair, and retaining the matching pairs whose first matching distance is smaller than the difference between the second matching distance and the mean distance difference, so as to realize the matching of the remote sensing images to be matched.
2. The remote sensing image matching method according to claim 1, wherein the step 3) further comprises re-screening the screened matching pairs according to geometric position constraints of key points.
3. The remote sensing image matching method according to claim 2, wherein the rescreening uses a RANSAC algorithm, and the geometric model uses an affine transformation model.
4. A remote sensing image matching method according to claim 1 or 3, wherein the principle of the dynamic adaptive distance method in the step 3) is as follows:
dis_j < dis'_j - avgdis

where dis_j is the first matching distance of the j-th matching pair to be screened, dis'_j is the second matching distance of the j-th matching pair to be screened, and avgdis is the mean difference between the second and first matching distances over all the matching pairs.
5. The remote sensing image matching method according to claim 1, wherein the window sliding step of the last pooling layer in the improved VGG16 network model is 1 pixel.
6. The remote sensing image matching method according to claim 1 or 5, wherein the last pooling layer in the improved VGG16 network model adopts an average pooling method.
7. The remote sensing image matching method according to claim 6, wherein the improved VGG16 network model adopts a triplet margin ranking loss function during training, with a detection term added to the triplet margin ranking loss function.
8. The remote sensing image matching method according to claim 1, wherein, in the feature extraction, features are extracted on each level of an image pyramid, and low-resolution image features are added to the high-resolution image features.
9. The remote sensing image matching method according to claim 1, wherein the selection principle of the key points in the step 1) is channel maximization and local maximization.
10. A remote sensing image matching apparatus, characterized in that the matching apparatus comprises a processor and a memory, the processor executes a computer program stored by the memory to realize the remote sensing image matching method according to any one of claims 1 to 9.
CN202010224164.8A 2020-03-26 2020-03-26 Remote sensing image matching method and device Pending CN111476251A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010224164.8A CN111476251A (en) 2020-03-26 2020-03-26 Remote sensing image matching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010224164.8A CN111476251A (en) 2020-03-26 2020-03-26 Remote sensing image matching method and device

Publications (1)

Publication Number Publication Date
CN111476251A true CN111476251A (en) 2020-07-31

Family

ID=71747870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010224164.8A Pending CN111476251A (en) 2020-03-26 2020-03-26 Remote sensing image matching method and device

Country Status (1)

Country Link
CN (1) CN111476251A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163588A (en) * 2020-10-09 2021-01-01 西安微电子技术研究所 Intelligent evolution-based heterogeneous image target detection method, storage medium and equipment
CN112241765A (en) * 2020-10-26 2021-01-19 三亚中科遥感研究所 Image classification model and method based on multi-scale convolution and attention mechanism
CN112465065A (en) * 2020-12-11 2021-03-09 中国第一汽车股份有限公司 Sensor data association method, device, equipment and storage medium
CN112734904A (en) * 2020-12-29 2021-04-30 中国船舶重工集团公司第七0九研究所 Portable rapid image splicing processing system for police
CN112861672A (en) * 2021-01-27 2021-05-28 电子科技大学 Heterogeneous remote sensing image matching method based on optical-SAR
CN112883850A (en) * 2021-02-03 2021-06-01 湖北工业大学 Multi-view aerospace remote sensing image matching method based on convolutional neural network
CN113221805A (en) * 2021-05-25 2021-08-06 云南电网有限责任公司电力科学研究院 Method and device for acquiring image position of power equipment
CN113256653A (en) * 2021-05-25 2021-08-13 南京信息工程大学 High-rise ground object-oriented heterogeneous high-resolution remote sensing image registration method
CN113537351A (en) * 2021-07-16 2021-10-22 重庆邮电大学 Remote sensing image coordinate matching method for mobile equipment shooting
CN114694040A (en) * 2022-05-31 2022-07-01 潍坊绘圆地理信息有限公司 Data identification method for optical remote sensing data block registration based on dynamic threshold
JP2023502819A (en) * 2020-10-16 2023-01-26 浙江商▲湯▼科技▲開▼▲発▼有限公司 Visual positioning method, related model training method and related device and equipment
CN117876723A (en) * 2024-03-11 2024-04-12 湖南大学 Unmanned aerial vehicle aerial image global retrieval positioning method under refusing environment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784223A (en) * 2018-12-28 2019-05-21 珠海大横琴科技发展有限公司 A kind of multi-temporal remote sensing image matching process and system based on convolutional neural networks
US20190220693A1 (en) * 2018-01-12 2019-07-18 Arizona Board Of Regents On Behalf Of Arizona State University Systems and methods for image matching
CN110427966A (en) * 2019-06-17 2019-11-08 青岛星科瑞升信息科技有限公司 One kind rejecting error hiding feature point methods based on characteristic point local feature

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190220693A1 (en) * 2018-01-12 2019-07-18 Arizona Board Of Regents On Behalf Of Arizona State University Systems and methods for image matching
CN109784223A (en) * 2018-12-28 2019-05-21 珠海大横琴科技发展有限公司 A kind of multi-temporal remote sensing image matching process and system based on convolutional neural networks
CN110427966A (en) * 2019-06-17 2019-11-08 青岛星科瑞升信息科技有限公司 One kind rejecting error hiding feature point methods based on characteristic point local feature

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LAN-CZ: "cnnmatching.py", 《HTTPS://GITHUB.COM/LAN-CZ/CNN-MATCHING/BLOB/MASTER/CNNMATCHING.PY》 *
MIHAI DUSMANU: "D2-Net: A Trainable CNN for Joint Description and Detection of Local Features", 《ARXIV:1905.03561V1 [CS.CV]》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163588A (en) * 2020-10-09 2021-01-01 西安微电子技术研究所 Intelligent evolution-based heterogeneous image target detection method, storage medium and equipment
JP7280393B2 (en) 2020-10-16 2023-05-23 浙江商▲湯▼科技▲開▼▲発▼有限公司 Visual positioning method, related model training method and related device and equipment
JP2023502819A (en) * 2020-10-16 2023-01-26 浙江商▲湯▼科技▲開▼▲発▼有限公司 Visual positioning method, related model training method and related device and equipment
CN112241765A (en) * 2020-10-26 2021-01-19 三亚中科遥感研究所 Image classification model and method based on multi-scale convolution and attention mechanism
CN112241765B (en) * 2020-10-26 2024-04-26 三亚中科遥感研究所 Image classification model and method based on multi-scale convolution and attention mechanism
CN112465065B (en) * 2020-12-11 2022-10-14 中国第一汽车股份有限公司 Sensor data association method, device, equipment and storage medium
CN112465065A (en) * 2020-12-11 2021-03-09 中国第一汽车股份有限公司 Sensor data association method, device, equipment and storage medium
CN112734904A (en) * 2020-12-29 2021-04-30 中国船舶重工集团公司第七0九研究所 Portable rapid image splicing processing system for police
CN112861672A (en) * 2021-01-27 2021-05-28 电子科技大学 Heterogeneous remote sensing image matching method based on optical-SAR
CN112883850B (en) * 2021-02-03 2023-06-09 湖北工业大学 Multi-view space remote sensing image matching method based on convolutional neural network
CN112883850A (en) * 2021-02-03 2021-06-01 湖北工业大学 Multi-view aerospace remote sensing image matching method based on convolutional neural network
CN113221805B (en) * 2021-05-25 2022-08-02 云南电网有限责任公司电力科学研究院 Method and device for acquiring image position of power equipment
CN113256653A (en) * 2021-05-25 2021-08-13 南京信息工程大学 High-rise ground object-oriented heterogeneous high-resolution remote sensing image registration method
CN113256653B (en) * 2021-05-25 2023-05-09 南京信息工程大学 Heterogeneous high-resolution remote sensing image registration method for high-rise ground object
CN113221805A (en) * 2021-05-25 2021-08-06 云南电网有限责任公司电力科学研究院 Method and device for acquiring image position of power equipment
CN113537351A (en) * 2021-07-16 2021-10-22 重庆邮电大学 Remote sensing image coordinate matching method for mobile equipment shooting
CN114694040A (en) * 2022-05-31 2022-07-01 潍坊绘圆地理信息有限公司 Data identification method for optical remote sensing data block registration based on dynamic threshold
CN117876723A (en) * 2024-03-11 2024-04-12 湖南大学 Unmanned aerial vehicle aerial image global retrieval positioning method under refusing environment
CN117876723B (en) * 2024-03-11 2024-05-31 湖南大学 Unmanned aerial vehicle aerial image global retrieval positioning method under refusing environment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200731