CN115082533B - Near space remote sensing image registration method based on self-supervision - Google Patents

Near space remote sensing image registration method based on self-supervision

Info

Publication number
CN115082533B
CN115082533B CN202210748938.6A CN202210748938A
Authority
CN
China
Prior art keywords
registration
image
matching
feature
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210748938.6A
Other languages
Chinese (zh)
Other versions
CN115082533A (en)
Inventor
张浩鹏
李晓涵
姜志国
谢凤英
赵丹培
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202210748938.6A priority Critical patent/CN115082533B/en
Publication of CN115082533A publication Critical patent/CN115082533A/en
Application granted granted Critical
Publication of CN115082533B publication Critical patent/CN115082533B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30181Earth observation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Remote Sensing (AREA)
  • Astronomy & Astrophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a self-supervision-based near space remote sensing image registration method. In the first step, the images to be registered are input into a convolutional neural network to obtain depth features for feature matching, RANSAC is used to screen the matched feature points, and the homography matrix between the image pair is estimated to obtain a coarse registration image pair. In the second step, the coarse registered image pair is input into a registration flow estimation network; through pyramid feature extraction and a cascaded optical flow reasoning process, a multi-scale registration flow estimation result and a pixel-level matching mask are finally obtained. The mask is used to delete, from the matched feature points obtained in the first step, the points whose matching degree is already high, and the homography matrix is then recalculated, so that the iterative registration algorithm can find more homography matrix candidates in the first step. In each iteration, the feature correspondences of the inliers of the previous homography and the positions lying in the previously predicted matchable mask are deleted, and RANSAC is then recomputed, thereby obtaining a better registration result.

Description

Near space remote sensing image registration method based on self-supervision
Technical Field
The invention relates to the technical field of digital image processing, in particular to a near space remote sensing image registration method based on self supervision.
Background
Near space refers to the airspace 20-100 km above the ground, which has attracted wide attention because of its important development and application value. Research on near space platforms mostly builds on the mature technologies of aircraft such as satellites or aviation aircraft, and such platforms are mostly used in combination with other platforms. Compared with satellite images, images acquired by a near space platform have higher spatial resolution, a wider range of imaging inclination angles and greater freedom in time and space, and can effectively supplement satellite and aerial images.
Remote sensing image registration refers to the process of bringing an image pair with a certain overlapping area into the same coordinate system, so that pixels in the two images that refer to the same spatial position are optimally aligned in image coordinates. It is a key technology of image processing and analysis and provides technical support for fields such as image analysis, image detection and remote sensing image processing. Because of changes in image sensor type, differences in imaging angle, occlusion by obstacles and long imaging time spans, high-precision image registration cannot be achieved by relying solely on the geographic reference information provided with the images. A near space platform covers a large area, but it may be restricted by airspace rights and therefore limited to shooting at a large inclination angle during reconnaissance, which causes target imaging distortion; at the same time, unstable platform attitude and large shaking introduce geometric and atmospheric distortions. When near space remote sensing images are registered with traditional methods, the traditional feature descriptors are not robust and are easily affected by factors such as brightness, so the registration effect is often unsatisfactory. Deep-learning-based approaches are therefore better suited to such diverse features; however, the larger the tilt and time span of the images, the more costly it is to obtain manually annotated data.
When matching remote sensing image data sets with large scale, high resolution and complex scenes, traditional algorithms such as SIFT (Scale-Invariant Feature Transform) struggle to extract complete feature information and are strongly affected by noise, so the matching accuracy is poor; in addition, hand-crafted feature design involves a complex computation process, a large amount of calculation and low efficiency. Remote sensing image registration techniques based on deep learning mostly extract features with a pre-trained feature extractor and then match key points to obtain a transformation matrix, but they are strongly limited by the matrix itself: the commonly used homography matrix has only 8 degrees of freedom, whereas natural image distortions have many causes and are difficult to express well with a single matrix. A few methods attempt to express the displacement between two remote sensing images with a flow field, but this only applies to small displacements; for near space remote sensing images with large dip angles the effect is often unsatisfactory, and since the flow field is extremely sensitive to changes of ground objects, the computed flow field and registered image are often abnormal, placing high demands on the algorithm. As for data sets, supervised methods require large amounts of accurately labeled data, which is too costly to acquire and whose accuracy is strongly affected by human factors; unsupervised or self-supervised methods are therefore an important research direction for the future.
Therefore, how to effectively improve the efficiency and effect of near space remote sensing image registration is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a self-supervision-based near-space remote sensing image registration method, which improves the near-space remote sensing image registration effect and registration efficiency.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a self-supervision-based near space remote sensing image registration method comprises the following steps:
step 1: acquiring a near space remote sensing image to obtain an image pair to be registered; the image pair to be registered comprises an image to be matched and a target image;
Step 2: feature matching is carried out on the image pair to be registered by adopting a pre-trained deep convolutional neural network to obtain matching points, a homography matrix between the image pair to be registered is calculated from the matching points, and rough registration is carried out according to the homography matrix to obtain a rough registration image pair, wherein the rough registration image pair comprises a rough matching result image C_I1 and a target image C_I2;
step 21: extracting image features of the image pair to be registered by adopting a pre-trained deep convolutional neural network, and obtaining matched feature point pairs between the image pair to be registered by means of cosine similarity, wherein the deep convolutional neural network may adopt a ResNet-50 backbone network;
Step 22: screening out the mismatched feature point pairs by adopting a RANSAC algorithm, and constructing a homography matrix from the remaining correctly matched points to obtain a candidate homography transformation;
Step 23: performing coarse registration on the image pair to be registered according to the homography matrix to generate a coarse matching result graph;
Step 3: acquiring a plurality of groups of image pairs to be registered to form a training data set, preprocessing them through the rough matching process of step 2, and inputting the preprocessed rough registration image pairs into a fine matching optical flow reasoning network for self-supervised training optimization, so as to obtain an optimized fine matching optical flow reasoning network that outputs a fine registration flow field and a pixel-level point matching degree mask; the fine registration flow field comprises the point correspondences between the rough registration image pair and the optical flow between corresponding points;
step 31: inputting the rough matching result graph C_I1 and the target image C_I2 into a pyramid feature extraction network to obtain a multi-level feature pyramid {F_k(C_Ii)};
the pyramid feature extraction network is a dual-stream network whose convolutional layer weights are shared between the two streams; each stream acts as a feature descriptor extractor and converts the coarse matching result graph C_Ii into a pyramid of multi-scale high-dimensional features {F_k(C_Ii)}, from the highest spatial resolution (k=1) to the lowest spatial resolution (k=L), where k denotes the pyramid level;
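As a concrete illustration of this dual-stream design, the following minimal PyTorch sketch builds a shared-weight pyramid feature extractor; the channel widths and the three-level depth are illustrative assumptions and are not taken from this description:

```python
import torch
import torch.nn as nn

class PyramidFeatureExtractor(nn.Module):
    """Turns an image into feature maps F_1..F_L, from the highest spatial
    resolution (k=1) to the lowest (k=L)."""
    def __init__(self, channels=(3, 32, 64, 128)):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels[i], channels[i + 1], 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels[i + 1], channels[i + 1], 3, padding=1),
                nn.ReLU(inplace=True),
            )
            for i in range(len(channels) - 1)
        ])

    def forward(self, image):
        feats, x = [], image
        for block in self.blocks:
            x = block(x)
            feats.append(x)
        return feats  # [F_1, F_2, F_3] with decreasing resolution

# Weight sharing between the two streams is obtained simply by applying the
# same module to both images of the coarse registration pair.
extractor = PyramidFeatureExtractor()
pyr1 = extractor(torch.randn(1, 3, 256, 256))  # features of C_I1
pyr2 = extractor(torch.randn(1, 3, 256, 256))  # features of C_I2
```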
step 32: inputting the multi-level feature pyramids into a registration flow estimation network to estimate the fine registration flow field, and inputting the last-level feature pyramid F_3(C_Ii) into a mask estimation network to obtain a mask;
The registration flow estimation network comprises a plurality of cascaded registration flow reasoning networks and a regularization process. The cascaded registration flow reasoning network performs flow reasoning: an optical flow field is randomly initialized as the initial optical flow field (prior information may be added here, e.g. if the camera is known to move leftwards, the initial optical flow field can carry a leftward motion vector); the final-level cascade reasoning module (the third-level cascade reasoning module) then matches the feature map of the image to be registered against the target image pyramid feature map F_3(C_I2) to obtain the current-level registration flow field, which is input into the registration flow reasoning module of the next level, thereby realizing cascaded reasoning;
The specific process of each level is as follows:
Step 321: each level k applies the registration flow field obtained from the previous level to the image feature map F_k(C_I1) to be registered of the feature pyramid level corresponding to that level, obtaining a further-registered feature map F̃_k(C_I1); the pixel-level point matching rate with the current-level target image feature map F_k(C_I2) is then computed and input into a convolution layer to obtain a pixel-level residual optical flow field Δw_k. The point matching rate is
C(a, b; m, n) = ⟨F̃_k(C_I1)(a, b), F_k(C_I2)(m, n)⟩ / ( ‖F̃_k(C_I1)(a, b)‖ · ‖F_k(C_I2)(m, n)‖ ),
i.e. the point matching rate C is obtained by computing the cosine distance between point (a, b) of the further-registered feature map of the k-th layer and point (m, n) of the target image feature map F_k(C_I2);
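For illustration, the pixel-level point matching rate can be computed as a local cosine-similarity cost volume between the further-registered feature map and the target feature map, as in the sketch below; the local search radius is an assumption introduced only for this example:

```python
import torch
import torch.nn.functional as F

def cosine_matching_rate(feat_warp, feat_tgt, radius=3):
    """feat_warp, feat_tgt: (B, C, H, W) tensors of equal size. Returns a cost
    volume of shape (B, (2*radius+1)**2, H, W) whose entries are the cosine
    similarities C between point (a, b) of feat_warp and the displaced points
    (m, n) of feat_tgt inside the search window."""
    f1 = F.normalize(feat_warp, dim=1)           # unit-norm descriptors
    f2 = F.normalize(feat_tgt, dim=1)
    f2_pad = F.pad(f2, [radius] * 4)             # zero-pad all four borders
    b, c, h, w = f1.shape
    costs = []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = f2_pad[:, :, dy:dy + h, dx:dx + w]
            costs.append((f1 * shifted).sum(dim=1, keepdim=True))  # dot product of unit vectors
    return torch.cat(costs, dim=1)
```

This cost volume is the kind of quantity a small convolution layer can consume to regress the pixel-level residual optical flow field of step 321.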
step 322: the pixel-level residual optical flow field is superimposed on the registration flow field obtained from the previous level, and the result is re-applied to the image feature map to be registered to obtain a further-registered feature map;
step 323: the further-registered feature map and the target image feature map are input into a convolution layer, and a sub-pixel-level residual flow field Δw'_k is obtained by minimizing the feature-space distance between them. After regularization, the registration flow field obtained at the k-th layer is input into the next layer for cascading, and the fine registration flow field is obtained after all layers have been cascaded;
the registration flow field obtained at each level is expressed as
w_k = up(w_prev) + Δw_k + Δw'_k,
wherein w_k is the registration flow field obtained at the current layer, up(w_prev) is the up-sampled registration flow field of the previous layer, Δw_k is the pixel-level residual optical flow field obtained in step 321, and Δw'_k is the sub-pixel-level residual flow field obtained in step 323;
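A per-level sketch of this composition is given below; it reuses the cosine_matching_rate helper from the sketch after step 321, and the single convolution heads that regress the pixel-level and sub-pixel-level residual flows are deliberate simplifications (the description does not fix the exact layer configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(feat, flow):
    """Warp a (B, C, H, W) map with a dense flow field (B, 2, H, W) given in pixels."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)      # (2, H, W), x then y
    coords = grid.unsqueeze(0) + flow                                # absolute sampling positions
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0                    # normalise to [-1, 1]
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(feat, torch.stack((gx, gy), dim=-1), align_corners=True)

class FlowLevel(nn.Module):
    """One level k of the cascade: w_k = up(w_prev) + pixel residual + sub-pixel residual."""
    def __init__(self, feat_ch, cost_ch=49):                         # 49 = (2*3+1)**2 search window
        super().__init__()
        self.pixel_head = nn.Conv2d(cost_ch, 2, 3, padding=1)        # regresses the pixel-level residual
        self.subpixel_head = nn.Conv2d(2 * feat_ch, 2, 3, padding=1) # regresses the sub-pixel residual

    def forward(self, f_src, f_tgt, flow_prev):
        flow_up = 2.0 * F.interpolate(flow_prev, scale_factor=2,
                                      mode="bilinear", align_corners=True)  # up(w_prev)
        cost = cosine_matching_rate(warp(f_src, flow_up), f_tgt)     # point matching rate (step 321)
        d_pix = self.pixel_head(cost)                                # pixel-level residual flow
        f_warp2 = warp(f_src, flow_up + d_pix)                       # re-warp (step 322)
        d_sub = self.subpixel_head(torch.cat([f_warp2, f_tgt], dim=1))  # sub-pixel residual (step 323)
        return flow_up + d_pix + d_sub                               # w_k
```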
step 33: registering the rough matching result graph with the target image according to the fine registration flow field to obtain a registration result;
step 34: performing self-supervised training, based on the loss function, of the fine matching optical flow reasoning network formed by the pyramid feature extraction network and the cascaded registration flow estimation network, so as to obtain an optimized fine matching optical flow reasoning network;
The specific process for self-supervision training optimization of the fine matching optical flow reasoning network is as follows:
Step 341: the loss function comprises three parts. The first reduces, through feature warping, the feature-space distance between the image feature map F_1(C_I1) to be registered and the target image feature map F_1(C_I2) at each level; the optimization is thus forced to act on the feature space rather than directly on the input image pair to be matched, which effectively weakens the influence of transformations such as changes in image photometry;
Based on the fine registration flow field output by the fine matching optical flow reasoning network, the image feature map F_1(C_I1) to be registered is warped to F_1(C_I1)', and a triplet loss weighted by the mask M̂ of formula (5) is formulated between F_1(C_I1)' and F_1(C_I2);
Step 342: the feature-dependent loss function is used to preserve the warp-transformation equivariance of the learned features, i.e. extracting features first and then applying the optical-flow warp should give the same result as warping first and then extracting features, so that the order of the two operations has no effect and the feature extraction process is not biased by the input. For the coarse matching result map C_I1 and the target image C_I2, the feature-dependent loss is expressed as
L_f^{12} = ‖ f(W_12(C_I1)) − W_12(f(C_I1)) ‖,
where f(·) denotes the pyramid feature extraction process and W_12(·) is the process of applying the fine registration flow field to the map;
Step 343: the mask M̂ serves as the matching degree. Since the matches between the two input images should be consistent, the optical flow computed from C_I1 to C_I2 and the optical flow computed from C_I2 to C_I1 should be inverse transforms of each other, and the matching processes in both directions should agree. The mask M̂ therefore expresses the cyclically consistent matchability of position (x, y) in the rough matching result map C_I1 and position (x', y') in the target image C_I2 (formula (5)):
M̂_12(x, y) = M_{1→2}(x, y) · M_{2→1}(x', y'),
where M_{2→1} is the predicted point matchability from C_I2 to C_I1 and M_{1→2} is the predicted point matchability from C_I1 to C_I2; M̂ is high only when the matchability of the corresponding pixels in both the rough matching result map and the target image is high. This matching degree can be regarded as a pixel-level weight in formulas (3) and (4);
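A small illustration of formula (5): the cyclically consistent mask is obtained by sampling the reverse-direction matchability map at the positions reached through the forward flow (the warp helper from the flow-level sketch above is reused):

```python
def cycle_consistent_mask(m_1to2, m_2to1, flow_1to2):
    """m_1to2, m_2to1: (B, 1, H, W) matchability maps; flow_1to2: (B, 2, H, W).
    Returns M-hat(x, y) = M_{1->2}(x, y) * M_{2->1}(x', y'), where (x', y') is the
    position in C_I2 reached from (x, y) through the estimated flow."""
    m_2to1_at_xprime = warp(m_2to1, flow_1to2)   # pull M_{2->1}(x', y') back onto (x, y)
    return m_1to2 * m_2to1_at_xprime
```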
step 344: a matching loss function is added to push the mask M̂ towards 1. In the optimization process, terms (3) and (4) on their own tend to drive the matching degree M̂ towards zero, since that trivially reduces the loss; adding the loss L_m therefore biases the matching degree towards 1 during network optimization. The matching loss of the rough matching result map C_I1 and the target image C_I2 (formula (6)) penalizes the deviation of M̂_12 from 1 at every pixel, and the overall loss function is decreased by gradient descent to optimize the fine matching optical flow reasoning network;
step 345: the triplet loss, the feature-dependent loss function and the matching loss function are added with weights to obtain the total loss function
L = L_t + λ·L_f + μ·L_m,
where λ and μ are hyper-parameters set during training. The loss function is decreased by gradient descent to optimize the fine matching optical flow reasoning network; during training, the fine matching optical flow reasoning network, i.e. the whole fine registration network, is learned from random initialization with an Adam optimizer, the learning rate is 2e-4, and the momentum terms β1 and β2 are set to 0.5 and 0.999;
The superscript 12 in the above formulas, for example in L_t^{12} and L_f^{12}, indicates that the rough matching result map C_I1 and the target image C_I2 are taken as the inputs of the corresponding loss term.
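For concreteness, a hedged sketch of the three loss terms and their weighted combination is shown below. The exact triplet construction of step 341 is not reproduced in this text, so a mask-weighted margin loss with negatives drawn from another image pair of the batch stands in for it; the margin and the default λ/μ values are assumptions of this example only.

```python
import torch

def triplet_term(f1_warped, f2, mask_hat, margin=1.0):
    """Stand-in for the mask-weighted triplet loss of step 341 (assumed form):
    the positive distance between warped source and target features is weighted
    by the matchability mask and contrasted against a shuffled negative."""
    pos = ((f1_warped - f2) ** 2).sum(dim=1)
    neg = ((f1_warped - f2.roll(1, dims=0)) ** 2).sum(dim=1)   # features of another pair
    return (mask_hat.squeeze(1) * torch.relu(pos - neg + margin)).mean()

def feature_equivariance_term(feat_then_warp, warp_then_feat):
    """L_f: extract-then-warp should agree with warp-then-extract (step 342)."""
    return (feat_then_warp - warp_then_feat).abs().mean()

def matching_term(mask_hat):
    """L_m: push the cyclically consistent matchability towards 1 (step 344)."""
    return (1.0 - mask_hat).abs().mean()

def total_loss(l_t, l_f, l_m, lam=1.0, mu=0.01):
    """Weighted sum of step 345; lam and mu correspond to the hyper-parameters λ and μ."""
    return l_t + lam * l_f + mu * l_m

# Optimiser settings quoted in the description:
# optimizer = torch.optim.Adam(net.parameters(), lr=2e-4, betas=(0.5, 0.999))
```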
Step 4: reasoning process; inputting the rough registration image pair of step 2 into the optimized fine matching optical flow reasoning network to obtain a fine registration flow field, and registering the rough matching result graph in the rough registration image pair with the target image according to the fine registration flow field to obtain a registration result and a mask;
An image pair to be registered is passed through the rough matching process (i.e. step 2) to obtain a rough matching result image and a target image, which are then passed through the fine matching optical flow reasoning network (i.e. the network after training optimization in step 3) to obtain a fine registration flow field; the rough matching result image and the target image are registered according to the fine registration flow field to obtain a registration result and a mask;
Step 5: screening the matching points of step 2 according to the mask, deleting the feature correspondences of the inliers of the corresponding homography matrix, returning to step 2 and recalculating the homography matrix until the cycle has been repeated n times, and merging the registration results obtained in each cycle to obtain a final result graph. The number of cycles n ranges from 1 to 3.
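The iterative loop of step 5 can be sketched as follows. The helpers coarse_match (step 2 feature matching), fine_register (the trained fine matching optical flow reasoning network) and merge_results are hypothetical placeholders introduced only for this example, as are the matchability threshold and the RANSAC reprojection threshold:

```python
import cv2
import numpy as np

def iterative_registration(img_src, img_tgt, n_iters=3, mask_thresh=0.5):
    pts_src, pts_tgt = coarse_match(img_src, img_tgt)            # step 2 matching points, (N, 2) arrays
    results = []
    for _ in range(n_iters):                                     # n is typically 1 to 3
        H, inliers = cv2.findHomography(pts_src, pts_tgt, cv2.RANSAC, 3.0)
        if H is None:
            break
        coarse = cv2.warpPerspective(img_src, H, (img_tgt.shape[1], img_tgt.shape[0]))
        registered, match_mask = fine_register(coarse, img_tgt)  # fine flow result + mask (step 4)
        results.append(registered)
        # Delete correspondences already explained: inliers of this homography and
        # points lying in regions the mask already predicts as well matched.
        keep = inliers.ravel() == 0
        xy = np.round(pts_tgt).astype(int)
        ys = xy[:, 1].clip(0, match_mask.shape[0] - 1)
        xs = xy[:, 0].clip(0, match_mask.shape[1] - 1)
        keep &= match_mask[ys, xs] < mask_thresh
        if keep.sum() < 4:                                       # a homography needs >= 4 points
            break
        pts_src, pts_tgt = pts_src[keep], pts_tgt[keep]
    return merge_results(results)                                # fuse the per-iteration results
```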
Compared with the prior art, the invention discloses a near space remote sensing image registration method based on self-supervision, which is applied to the acquired near space remote sensing images. The complementary strengths and weaknesses of the two stages effectively avoid the shortcomings of either step alone. Overall, a two-stage network framework is designed. In the first stage, the images to be registered are input into a convolutional neural network to obtain depth features for feature matching, RANSAC is then used to screen the matched feature points, and the homography matrix between the image pair is estimated to obtain a coarse registration image pair; the convolutional neural network may be a residual network used in many previous methods, such as ResNet. In the second stage, the coarse registered image pair is input into a registration flow estimation network, and a multi-scale registration flow estimation result is finally obtained through pyramid feature extraction and a cascaded optical flow reasoning process. The registration algorithm is then iterated so that it finds more homography matrix candidates in the first stage: in each iteration, the feature correspondences of the inliers of the previous homography, together with the positions lying in the previously predicted matchable mask, are deleted, and RANSAC is run again, thereby obtaining a better registration result.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of the structure provided by the present invention;
FIG. 2 is a schematic diagram of the ResNet-50 backbone structure provided by the present invention;
FIG. 3 is a schematic diagram of a cascade registration flow reasoning module structure provided by the invention;
FIG. 4 is a schematic diagram of an image pair according to an embodiment of the present invention;
FIG. 5 is a schematic view of the rough registration effect provided by the present invention;
Fig. 6 is a schematic diagram of a fine registration effect according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses a self-supervision-based near space remote sensing image registration method, which comprises the following specific steps:
s1: acquiring a near space remote sensing image to obtain an image pair to be registered, wherein the image pair to be registered comprises an image to be matched and a target image;
S2: performing feature matching on the image pair to be registered by using a depth convolution neural network, establishing a homography matrix between the image pair to be registered, and performing rough registration according to the homography matrix to obtain a rough registration image pair, wherein the rough registration image pair comprises a rough matching result image C I1 and a target image C I2;
s3: in the training stage, the data set is preprocessed through the coarse matching process (i.e. S2), the image pairs are then input into the fine matching optical flow reasoning network, the loss function is computed, and self-supervised training optimization is performed to obtain an optimized deep convolutional neural network that outputs a fine registration flow field and a pixel-level point matching degree mask, the registration flow field comprising the point correspondences between the image pair and the optical flow between corresponding points;
s4: in the reasoning stage, the image pair to be registered obtains a rough matching result image and a target image through a rough matching process (S2), then a fine matching optical flow reasoning network (namely a pyramid feature extraction network and a cascade registration flow estimation network) is used for obtaining a fine registration flow field, and the rough matching result image and the target image are registered according to the fine registration flow field to obtain a registration result and a mask;
S5: screening the matching points with the mask output in S4, deleting the feature correspondences of the inliers of the previous homography, returning to S2 and recalculating the homography matrix; after 1 to 3 such cycles, the resulting maps are merged to obtain a final result graph.
Examples
(1) Collecting data;
the characteristics of images acquired in near space depend strongly on the operating characteristics of the aircraft employed. The near space aircraft currently in operation are mainly aerostat balloons, and the United States is developing airships capable of staying aloft for long periods. In general they operate at fixed points, i.e. the aircraft hovers at a specific position in near space and observes the region of interest over a long time span, so the target may undergo large appearance changes; during this period imaging suffers from large tilt problems due to possible airspace constraints. Table 1 shows the relationship between the field of view and the pitch angle of near space remote sensing platforms at different heights when shooting.
TABLE 1 Relationship between near space platform pitch angle and field of view

Platform height (km)    Pitch angle (°)    Field of view (°)
20                      78.69              30
30                      73.3               30
40                      68.2               30
50                      63.43              30
60                      60                 30
100                     45                 30
The resolution of a typical visible-light sensor depends directly on the focal length of the camera and on the distance from the shooting position to the target.
(2) The obtained set of image pairs is input into the network as the image to be matched I1 and the target image I2, and the registration flow field between the two images is obtained; for a point X = (x, y) on the image to be matched I1 and the corresponding point X_s = (x_s, y_s) on the target image I2, the optical flow between the two points is F(X) = X_s − X, i.e. x_s = x + u(x, y) and y_s = y + v(x, y), where u and v denote the horizontal and vertical components of the flow field.
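In code, this correspondence reads as below; the (H, W, 2) layout of the flow array is an assumption made for the example only:

```python
import numpy as np

def corresponding_point(flow, x, y):
    """flow: array of shape (H, W, 2) holding the (u, v) displacement at each pixel.
    Returns the matching point X_s = (x_s, y_s) on the target image I2."""
    u, v = flow[y, x]
    return x + u, y + v   # x_s = x + u(x, y), y_s = y + v(x, y)
```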
(3) Rough registration;
In the coarse registration process, image features are extracted, matched feature point pairs between the two images are obtained, the mismatched pairs are screened out with RANSAC to obtain a candidate homography transformation, and a rough matching result graph is generated. This is done first because of limitations of the optical-flow estimation principle itself: if the large-tilt, large-view-change image were input directly into the subsequent fine registration network, it would be difficult to obtain an accurately matched result map.
Since residual networks exhibit excellent accuracy and speed in detection tasks, a pre-trained ResNet backbone network is adopted as the feature extractor to extract the features of the image to be matched and of the target image. The overall architecture is shown in fig. 2.
These correspondences are obtained using pre-trained deep features (the conv4 layer of a ResNet-50 network). Feature matching at different scales is important, especially for the fine matching of high-resolution remote sensing images; in this process the aspect ratio of each image is kept fixed and features are extracted at seven scales: 0.5, 0.6, 0.88, 1, 1.33, 1.66 and 2. Matches that are not mutually (symmetrically) consistent are discarded. The estimated homography matrix is applied to the source image, and the result together with the target image is given as the inputs C_I1 and C_I2 for fine registration.
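A hedged sketch of this coarse stage follows, using torchvision and OpenCV: pre-trained ResNet-50 features up to the conv4 stage, mutual cosine-similarity matching and a RANSAC homography. The single-scale extraction, the RANSAC threshold and the approximate mapping from feature-grid coordinates back to pixel coordinates are simplifications of the procedure described above (the seven extraction ratios are omitted):

```python
import cv2
import torch
import torchvision

backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
conv4 = torch.nn.Sequential(*list(backbone.children())[:-3]).eval()   # conv1 .. layer3 ("conv4")

def dense_features(img_bgr):
    x = torch.from_numpy(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)).float()
    x = x.permute(2, 0, 1).unsqueeze(0) / 255.0
    with torch.no_grad():
        f = conv4(x)[0]                                 # (C, h, w) dense descriptors
    return torch.nn.functional.normalize(f, dim=0)      # unit-norm along channels

def coarse_homography(img1, img2):
    f1, f2 = dense_features(img1), dense_features(img2)
    c, h1, w1 = f1.shape
    _, h2, w2 = f2.shape
    sim = f1.reshape(c, -1).t() @ f2.reshape(c, -1)     # cosine similarity of all descriptor pairs
    nn12, nn21 = sim.argmax(dim=1), sim.argmax(dim=0)
    idx1 = torch.arange(sim.shape[0])
    mutual = nn21[nn12] == idx1                         # discard asymmetric (non-mutual) matches
    src = torch.stack((idx1 % w1, idx1 // w1), dim=1)[mutual].float()
    dst = torch.stack((nn12 % w2, nn12 // w2), dim=1)[mutual].float()
    stride1 = img1.shape[1] / w1                        # approx. feature-grid -> pixel scale
    stride2 = img2.shape[1] / w2
    H, inliers = cv2.findHomography((src * stride1).numpy(), (dst * stride2).numpy(),
                                    cv2.RANSAC, 5.0)
    return H, inliers                                   # H warps img1 towards img2 (input C_I1)
```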
(4) Fine registration
The fine registration network is composed of two sub-networks dedicated to pyramid feature extraction and registration flow estimation, as shown in fig. 1. The spatial dimension of the feature maps decreases during feature extraction, while the spatial dimension of the registration flow increases during registration flow estimation. The pyramid feature extraction network converts any given image pair into two feature pyramids of multi-scale high-dimensional features, making it easy to use context information. The registration flow estimation network consists of several cascaded registration flow reasoning blocks and regularization blocks, which estimate the flow field from coarse to fine. At the cost of a small additional amount of computation, the network becomes more robust to scale transformations, which benefits the information fusion of images of different scales such as near space images and satellite remote sensing images.
As shown in fig. 1, the pyramid feature extraction network is a dual-stream network whose convolutional layer weights are shared between the two streams; each stream acts as a feature descriptor extractor and converts the coarse matching result map C_Ii into a pyramid of multi-scale high-dimensional features {F_k(C_Ii)}, from the highest spatial resolution (k=1) to the lowest spatial resolution (k=L), k denoting the pyramid level. Illustrated in fig. 1 is a three-layer feature pyramid example F_1(C_Ii)-F_3(C_Ii);
The overall architecture of the cascade registration flow reasoning module is shown in fig. 3, and a three-level cascade schematic diagram is shown.
The optical flow reasoning module has two steps overall. Each level k applies the registration flow obtained from the previous level to the to-be-registered image feature map F_k(C_I1) of the corresponding pyramid level to obtain a further-registered feature map F̃_k(C_I1); the pixel-level point matching rate with the target image feature map F_k(C_I2) is then computed and fed into a convolution layer to obtain a pixel-level residual optical flow field Δw_k. The residual optical flow field, superimposed on the registration flow from the previous level, is re-applied to F_k(C_I1) to obtain F̃'_k(C_I1), which is concatenated with the feature map F_k(C_I2) and input into a convolution layer; minimizing the feature-space distance between F̃'_k(C_I1) and F_k(C_I2) yields the sub-pixel-level registration flow field Δw'_k. The final flow field of the level is
w_k = up(w_prev) + Δw_k + Δw'_k,
where w_k is the registration flow obtained at the current layer, up(w_prev) is the up-sampled registration flow of the previous layer, Δw_k is the pixel-level residual optical flow field obtained at the k-th layer and Δw'_k is the sub-pixel-level residual flow field obtained at the k-th layer. After regularization, the registration flow obtained at the k-th layer is input into the next layer for cascading, which constitutes the whole fine matching network.
(5) Training optimization fine matching network
The process is realized by self-supervision training of a fine matching deep convolutional neural network.
The gradient of the loss function is computed and back-propagated for training optimization. The loss function comprises three parts: first, the feature-space distance between F_1(C_I1) and F_1(C_I2) is reduced through feature warping, forcing the optimization to act on the feature space rather than directly on the input image pair to be matched, which effectively weakens the influence of transformations such as changes in image photometry;
Based on the fine registration flow field output by the network, the feature map F_1(C_I1) is warped to F_1(C_I1)', and a triplet loss weighted by the mask M̂ of formula (5) is formulated between F_1(C_I1)' and F_1(C_I2);
The feature-dependent loss function is then used to preserve the warp-transformation invariance of the learned features, which means that if the order of the feature warping and feature extraction operations is exchanged, the features should be approximately the same:
L_f^{12} = ‖ f(W_12(C_I1)) − W_12(f(C_I1)) ‖,
where f(·) denotes the pyramid feature extraction process and W_12(·) is the process of applying the fine registration flow field to the coarse matching result map C_I1 towards the target image C_I2;
A cycle-consistency constraint is also defined so that the matching processes from C_I1 to C_I2 and from C_I2 to C_I1 are consistent wherever the matching degree is high; this matching degree can be regarded as a pixel-level weight in formulas (3) and (4) and is defined as the cyclically consistent matchability
M̂_12(x, y) = M_{1→2}(x, y) · M_{2→1}(x', y').
In the optimization process, the terms (3) and (4) tend to drive the matching degree towards zero in order to reduce the loss, so the added loss L_m biases the matching degree towards 1 during network optimization, as expressed by formula (6), which penalizes the deviation of M̂ from 1 at every pixel.
The total loss function is therefore L = L_t + λ·L_f + μ·L_m.
(6) Iterative acquisition of more candidate homography matrices
In the reasoning stage, the image pair to be registered is subjected to a rough matching process to obtain a rough matching result image and a target image, then a fine matching optical flow reasoning network (namely a pyramid feature extraction network and a cascade registration flow estimation network) is used for obtaining a fine registration flow field, and the rough matching result image and the target image are registered according to the fine registration flow field to obtain a registration result and a mask;
The matching points are screened with the mask output by the fine matching network, the feature correspondences of the inliers of the previous homography are deleted, the procedure returns to the coarse matching process and the homography matrix is recalculated; after 1 to 3 such cycles, the resulting maps are merged to obtain a final result graph.
(7) Test verification effect
The self-supervision-based near-space remote sensing image registration method is applied to the obtained near space remote sensing images. The images used in the experimental part are near space remote sensing images and some remote sensing satellite images, so as to verify that the algorithm can effectively process images with different resolutions and distortions; there are 1000 pairs of test images in total.
The evaluation indexes are RMSE, PSNR and SSIM; higher PSNR and SSIM indicate a smaller difference between the images, and a lower RMSE indicates a higher image similarity. Comparative data of the method of the present invention against other conventional methods are shown in Table 2 below.
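The three indexes can be computed, for instance, with numpy and scikit-image as in the sketch below; grayscale images with values in [0, 255] are assumed:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(registered, target):
    diff = registered.astype(np.float64) - target.astype(np.float64)
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    psnr = peak_signal_noise_ratio(target, registered, data_range=255)
    ssim = structural_similarity(target, registered, data_range=255)
    return rmse, psnr, ssim   # lower RMSE and higher PSNR/SSIM indicate better registration
```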
TABLE 2 comparison of the inventive process with other processes
As can be seen from the table, compared with other methods, the method provided by the invention achieves better indexes and higher registration accuracy. The image registration effect of the invention is shown in figs. 4-6: fig. 4 (a) is the target image and fig. 4 (b) is the image to be matched; fig. 5 (a) is the transformed image after coarse registration of the image to be matched, and fig. 5 (b) shows the superimposed display of the coarse-registered image and the target image; fig. 6 (a) is the transformed image after fine registration of the image to be matched, and fig. 6 (b) shows the superimposed display of the fine-registered image and the target image.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. The self-supervision-based near space remote sensing image registration method is characterized by comprising the following steps of:
Step 1: acquiring a near space remote sensing image to obtain an image pair to be registered; the image pair to be registered comprises an image to be matched and a target image;
step 2: feature matching is carried out on the image pair to be registered by adopting a pre-trained deep convolutional neural network, so as to obtain matching points, a homography matrix between the image pair to be registered is calculated according to the matching points, rough registration is carried out according to the homography matrix, and a rough registration image pair comprising a rough matching result image and a target image is obtained;
Step 3: acquiring a training data set, and training the fine matching optical flow reasoning network after preprocessing in the step 2 to obtain an optimized fine matching optical flow reasoning network;
Step 31: inputting the rough matching result graph C I1 and the target image C I2 into a pyramid feature extraction network to obtain a plurality of layers of feature pyramids;
Step 32: inputting a plurality of layers of feature pyramids into a cascade registration flow estimation network, estimating to obtain a fine registration flow field, and inputting the last layer of feature pyramids into a mask estimation network to obtain a mask;
the cascade registration flow estimation network comprises a plurality of cascade registration flow reasoning networks and a regularization process; the cascade registration flow reasoning network performs optical flow reasoning, and the specific process is as follows:
Step 321: each level k up-samples the registration flow field obtained from the previous level and applies it to the image feature map F_k(C_I1) to be registered of the feature pyramid level corresponding to level k, obtaining a further-registered feature map F̃_k(C_I1); the pixel-level point matching rate with the current-level target image feature map F_k(C_I2) is then computed and input into a convolution layer, which evaluates the correspondence of the feature points, screens points with a high matching degree as corresponding points and yields a pixel-level residual optical flow field Δw_k. The point matching rate is
C(a, b; m, n) = ⟨F̃_k(C_I1)(a, b), F_k(C_I2)(m, n)⟩ / ( ‖F̃_k(C_I1)(a, b)‖ · ‖F_k(C_I2)(m, n)‖ ),
i.e. the point matching rate C is obtained by computing the cosine distance between point (a, b) of the further-registered feature map of the k-th layer and point (m, n) of the target image feature map F_k(C_I2);
step 322: the pixel-level residual optical flow field is superimposed on the registration flow field obtained from the previous level, and the result is re-applied to the image feature map to be registered to obtain a further-registered feature map;
step 323: the further-registered feature map and the target image feature map are input into a convolution layer, and a sub-pixel-level residual flow field Δw'_k is obtained by minimizing the feature-space distance between them; after regularization, the k-th layer registration flow field is input into the next layer for cascading, and the fine registration flow field is obtained after all layers have been cascaded;
the registration flow field obtained at each level is expressed as
w_k = up(w_prev) + Δw_k + Δw'_k,
wherein w_k is the registration flow field obtained at the current layer; up(w_prev) is the up-sampled registration flow field of the previous layer; Δw_k is the pixel-level residual optical flow field; Δw'_k is the sub-pixel-level residual flow field;
step 33: registering the rough matching result graph with the target image according to the fine registration flow field to obtain a registration result;
step 34: performing self-supervised training, based on the loss function, of the fine matching optical flow reasoning network formed by the pyramid feature extraction network and the cascaded registration flow estimation network, so as to obtain an optimized fine matching optical flow reasoning network;
Step 4: inputting the rough registration image pair in the step 2 into an optimized fine matching optical flow reasoning network to obtain a fine registration flow field, and registering a rough matching result graph in the rough registration image pair with a target image according to the fine registration flow field to obtain a registration result and a mask;
step 5: screening the matching points of step 2 according to the mask, deleting the feature correspondences of the inliers of the corresponding homography matrix, returning to step 2 and recalculating the homography matrix until the cycle has been repeated n times, and merging the registration results obtained in each cycle to obtain a final result graph.
2. The method for registering the near space remote sensing image based on the self-supervision according to claim 1, wherein the specific implementation process of the step2 is as follows:
Step 21: extracting image features of the image pairs to be registered by adopting a pre-trained deep convolutional neural network, and obtaining matched feature point pairs between the image pairs to be registered by utilizing cosine similarity;
Step 22: screening out the mismatched feature point pairs by adopting a RANSAC algorithm, and constructing a homography matrix from the remaining correctly matched points to obtain a candidate homography transformation;
step 23: and performing coarse registration on the image pair to be registered according to the homography matrix to generate a coarse matching result graph.
3. The method for registering the near space remote sensing image based on the self-supervision as set forth in claim 1, wherein the specific implementation process of the self-supervision training optimization by the fine matching optical flow reasoning network in the step 3 is as follows:
Step 341: the loss function comprises three parts; first, the feature-space distance between the image feature map to be registered and the target image feature map is reduced through feature warping;
based on the fine registration flow field output by the fine matching optical flow reasoning network, the image feature map F_1(C_I1) to be registered is warped to F_1(C_I1)', and a triplet loss between the rough matching result graph C_I1 and the target image C_I2, weighted by the mask M̂, is formulated on F_1(C_I1)' and F_1(C_I2);
Step 342: the feature-dependent loss function used to preserve the warp-transformation invariance of the learned features is expressed, for the rough matching result graph C_I1 and the target image C_I2, as
L_f^{12} = ‖ f(W_12(C_I1)) − W_12(f(C_I1)) ‖,
where f(·) denotes the pyramid feature extraction process and W_12(·) is the process of applying the fine registration flow field to the map;
Step 343: the mask M̂ serves as the matching degree; since the matches between the two input images are consistent, the optical flows between them are inverse transforms of each other, and the mask M̂ expresses the cyclically consistent matchability of position (x, y) in the rough matching result graph C_I1 and position (x', y') in the target image C_I2:
M̂_12(x, y) = M_{1→2}(x, y) · M_{2→1}(x', y'),
where M_{2→1} is the predicted point matchability from C_I2 to C_I1 and M_{1→2} is the predicted point matchability from C_I1 to C_I2;
step 344: a matching loss function is added to push the mask M̂ towards 1; the matching loss of the rough matching result graph C_I1 and the target image C_I2 penalizes the deviation of M̂ from 1 at every pixel;
step 345: the triplet loss, the feature-dependent loss function and the matching loss function are added with weights to obtain the total loss function
L = L_t + λ·L_f + μ·L_m,
where λ and μ are hyper-parameters set during training; the loss function is decreased by gradient descent to optimize the fine matching optical flow reasoning network, which is trained by learning from random initialization with an Adam optimizer to obtain the optimized fine matching optical flow reasoning network.
CN202210748938.6A 2022-06-28 2022-06-28 Near space remote sensing image registration method based on self-supervision Active CN115082533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210748938.6A CN115082533B (en) 2022-06-28 2022-06-28 Near space remote sensing image registration method based on self-supervision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210748938.6A CN115082533B (en) 2022-06-28 2022-06-28 Near space remote sensing image registration method based on self-supervision

Publications (2)

Publication Number Publication Date
CN115082533A CN115082533A (en) 2022-09-20
CN115082533B true CN115082533B (en) 2024-05-28

Family

ID=83255688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210748938.6A Active CN115082533B (en) 2022-06-28 2022-06-28 Near space remote sensing image registration method based on self-supervision

Country Status (1)

Country Link
CN (1) CN115082533B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172255B (en) * 2023-11-02 2024-02-02 中国科学院空天信息创新研究院 Geographic entity alignment method and device considering spatial semantic relation and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017121058A1 (en) * 2016-01-13 2017-07-20 南京大学 All-optical information acquisition system
CN112561973A (en) * 2020-12-23 2021-03-26 维沃移动通信有限公司 Method and device for training image registration model and electronic equipment
CN114049335A (en) * 2021-11-18 2022-02-15 感知天下(北京)信息科技有限公司 Remote sensing image change detection method based on space-time attention
CN114429555A (en) * 2022-01-20 2022-05-03 中国科学技术大学 Image density matching method, system, equipment and storage medium from coarse to fine


Also Published As

Publication number Publication date
CN115082533A (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN110020651B (en) License plate detection and positioning method based on deep learning network
CN108665496B (en) End-to-end semantic instant positioning and mapping method based on deep learning
Chen et al. Target classification using the deep convolutional networks for SAR images
CN108108746B (en) License plate character recognition method based on Caffe deep learning framework
CN107025440A (en) A kind of remote sensing images method for extracting roads based on new convolutional neural networks
CN108564025A (en) A kind of infrared image object identification method based on deformable convolutional neural networks
CN110287826B (en) Video target detection method based on attention mechanism
CN108416292B (en) Unmanned aerial vehicle aerial image road extraction method based on deep learning
CN109767454B (en) Unmanned aerial vehicle aerial video moving target detection method based on time-space-frequency significance
CN108961286B (en) Unmanned aerial vehicle image segmentation method considering three-dimensional and edge shape characteristics of building
CN109376641B (en) Moving vehicle detection method based on unmanned aerial vehicle aerial video
CN110414616B (en) Remote sensing image dictionary learning and classifying method utilizing spatial relationship
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN114972748B (en) Infrared semantic segmentation method capable of explaining edge attention and gray scale quantization network
CN113838064B (en) Cloud removal method based on branch GAN using multi-temporal remote sensing data
CN113344103B (en) Hyperspectral remote sensing image ground object classification method based on hypergraph convolution neural network
Kwan et al. Compressive vehicle tracking using deep learning
CN115082533B (en) Near space remote sensing image registration method based on self-supervision
CN112560717A (en) Deep learning-based lane line detection method
CN115240072A (en) Hyperspectral multi-class change detection method based on multidirectional multi-scale spectrum-space residual convolution neural network
US20220335572A1 (en) Semantically accurate super-resolution generative adversarial networks
Wei et al. A new semantic segmentation model for remote sensing images
Zhao et al. An aircraft detection method based on improved mask R-CNN in remotely sensed imagery
CN115272153A (en) Image matching enhancement method based on feature sparse area detection
CN112785583B (en) Hyperspectral remote sensing image reflectivity recovery method based on superpixel segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant