CN111028277B - SAR and optical remote sensing image registration method based on pseudo-twin convolution neural network - Google Patents

SAR and optical remote sensing image registration method based on pseudo-twin convolution neural network

Info

Publication number
CN111028277B
CN111028277B CN201911256966.0A
Authority
CN
China
Prior art keywords
pseudo
sar
twin
neural network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911256966.0A
Other languages
Chinese (zh)
Other versions
CN111028277A (en)
Inventor
帅通
董喆
孙建国
田左
关键
林尤添
田野
袁野
刘加贝
肖飞扬
尹晗琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
CETC 54 Research Institute
Original Assignee
Harbin Engineering University
CETC 54 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University, CETC 54 Research Institute filed Critical Harbin Engineering University
Priority to CN201911256966.0A priority Critical patent/CN111028277B/en
Publication of CN111028277A publication Critical patent/CN111028277A/en
Application granted granted Critical
Publication of CN111028277B publication Critical patent/CN111028277B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • G06T2207/10044Radar image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20164Salient point detection; Corner detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pseudo-twin convolutional neural network-based SAR and optical remote sensing image registration method in the technical field of remote sensing image registration. The method comprises the steps of collecting and matching feature image blocks, then removing abnormal points, and finally registering. It adopts a strategy of maximizing the feature distance between positive samples and hard negative samples, defines a new loss function to train the network, and connects the two branches of the pseudo-twin network through a convolution operation to obtain a similarity score between the two input image blocks. By providing the pseudo-twin convolutional neural network architecture, the left and right branches of the pseudo-twin network can respectively take optical and SAR remote sensing images of different sizes as input, solving the task of identifying corresponding image blocks in optical and SAR remote sensing images at extremely high resolution.

Description

SAR and optical remote sensing image registration method based on pseudo-twin convolutional neural network
Technical Field
The invention relates to the technical field of remote sensing image registration, in particular to an SAR and optical remote sensing image registration method based on a pseudo-twin convolutional neural network.
Background
Earth monitoring by Remote Sensing (RS) has been widely used in both military and civilian applications in recent decades. Multimodal remote sensing images contain much complementary information, which benefits many remote sensing applications; for this reason, image registration is a common prerequisite for exploiting multimodal images. However, due to differences in imaging mechanisms, multimodal image registration is more challenging than general image registration, especially for optical and Synthetic Aperture Radar (SAR) images, and finding common features in optical and SAR images is very difficult. Moreover, as the spatial resolution increases, the existing geometric and radiometric differences between the two sensors widen further.
There are two key steps in remote sensing image registration. The first is to construct a set of Corresponding Points (CPs) distributed as evenly as possible across the images. The second is to estimate a spatial transformation (e.g., an affine or projective transformation) based on the corresponding points. According to the way in which hypothetical corresponding points are constructed, image registration methods can be divided into two categories: feature-based methods and region-based methods.
For feature-based methods, a pair of points is assumed to correspond if the structures around them are the most similar. First, candidate feature points with salient surrounding structure are detected on the master and slave images. Feature descriptors are then generated from local structural information (mainly gradient information), and the point pair with the smallest descriptor distance is regarded as a corresponding point pair. Many hand-crafted features (e.g., SIFT, SURF, ORB) are designed for point matching, but most cannot be applied to multimodal image registration.
Region-based methods first generate a set of candidate feature points from the master image. For each feature point, a correspondence is sought on the slave image within a local search window, with the metric for locating correspondences defined by the similarity between local intensities. The definition of the similarity measure is therefore essential for region-based methods. Normalized Cross-Correlation (NCC) and Mutual Information (MI) are two baseline similarity measures. The NCC metric is mainly used for optical image registration but generally fails in multimodal image registration; in contrast, MI is more robust to complex radiometric variations and is widely used for multimodal image registration.
The ideal feature for multimodal image registration should be distinctive and robust to the various nonlinear radiometric changes caused by different imaging conditions. Hand-crafted features or MI are not sufficient to describe such highly nonlinear relationships. It is well known that the advent of convolutional neural networks has revolutionized almost all computer vision problems, and in the field of image registration, learning-based features have attracted a great deal of attention. Learned features perform better than hand-crafted descriptors on some visual tasks, but they have not yet demonstrated an overwhelming advantage, and classical local image block detectors and descriptors still provide very competitive registration results. One reason is that restating the image registration task as a differentiable end-to-end process is very challenging. In addition, for remote sensing image registration, the local image block data sets currently available for training are not large and diverse enough to allow learning high-quality, widely applicable descriptors.
Deep features are also increasingly used in the remote sensing field. However, to date, most deep-learning-related research has focused on classification and detection tasks in different remote sensing domains; for remote sensing image registration, deep learning has only recently begun to succeed. Currently, almost all remote sensing images are geocoded with latitude and longitude. In theory, correctly geocoded remote sensing images are already registered, but inaccurate measurement of the spatial attitude angle often causes geolocation errors that require further refinement. On this basis, the SAR and optical remote sensing image registration method based on a pseudo-twin convolutional neural network is designed to solve these problems.
Disclosure of Invention
The invention aims to provide a pseudo-twin convolutional neural network-based SAR and optical remote sensing image registration method, so as to solve the problems of poor registration effect and low precision for SAR and optical images raised in the background art.
In order to achieve the purpose, the invention provides the following technical scheme: a pseudo-twin convolutional neural network-based SAR and optical remote sensing image registration method, which registers an SAR and an optical remote sensing image, the registration comprising the following specific steps:
first step, feature image block acquisition and matching
Step1, carrying out block-based Harris feature point detection on the main image, and extracting local candidate image blocks around the Harris feature points from the main image;
step2, extracting a larger local search image block from the slave image around the same geographic position as each master Harris feature point;
step3, the local candidate image blocks and the corresponding local search image blocks are transmitted as the input of a pseudo-twin convolution network;
second, outlier removal and final registration
Step4, using the similarity score output by the pseudo-twin convolutional neural network as the index of matching confidence, and using the score value as the sole index to judge whether a hypothetical corresponding point (CP) is an outlier, wherein the higher the score of a CP, the higher the probability that it is correct;
step5, outlier removal is performed by removing all hypothetical CPs whose score value is less than the given threshold Tscore; a compact sketch of this pipeline is given below.
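The following Python sketch strings steps 1 through 5 together. detect_harris_blocks, extract_patch and psnn_score are hypothetical helpers standing in for the block-based Harris detector, patch cropping, and the trained pseudo-twin network, and the defaults M = 101, s = 20 and t_score = 0.5 are illustrative assumptions rather than values fixed by the invention.

```python
# Compact sketch of the five registration steps, assuming hypothetical helpers.
import numpy as np

def register_pair(master, slave, detect_harris_blocks, extract_patch,
                  psnn_score, M=101, s=20, t_score=0.5):
    cps = []
    for (x, y) in detect_harris_blocks(master):           # step 1
        cand = extract_patch(master, x, y, M)             # M x M candidate block
        search = extract_patch(slave, x, y, M + s)        # (M+s) x (M+s) search block, step 2
        y_map = psnn_score(cand, search)                  # step 3: similarity score map
        dy, dx = np.unravel_index(y_map.argmax(), y_map.shape)
        c = y_map.shape[0] // 2                           # centre position u_0
        if y_map.max() >= t_score:                        # steps 4-5: keep confident CPs
            cps.append(((x, y), (x + dx - c, y + dy - c), y_map.max()))
    return cps                                            # CPs used to fit the final transform
```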
Preferably, a pseudo-twin convolutional neural network model is constructed; the pseudo-twin convolutional neural network adopts a strategy of maximizing the feature distance between positive samples and hard negative samples, with the following specific steps:
firstly, the features of the SAR and optical image blocks are extracted through convolutional layers; the pseudo-twin convolutional neural network uses convolution filters with a 3 × 3 receptive field, since the 3 × 3 convolution filter is the smallest kernel capable of capturing patterns in different directions, and the use of small convolution filters increases the nonlinearity inside the network, making the network more discriminative;
secondly, the pseudo-twin convolutional neural network has two separate but identical convolutional streams for handling the very different geometric and radiometric appearances of SAR and optical images; SAR image blocks and optical image blocks are processed in parallel, and the result information is fused only at a later decision stage; features of the SAR and optical image blocks are extracted by convolutional layers, each of the two separate convolutional streams comprising 8 convolutional layers and 3 maximum pooling layers; the convolutional layer inputs are spatially padded so that the spatial resolution is preserved after convolution, convolution filters with a 3 × 3 receptive field are used, the convolution stride is fixed to one pixel, and all 3 × 3 convolutional layers are padded by 1 pixel; spatial pooling is carried out by 7 maximum pooling layers, the pooling layers following convolutional layers to reduce the dimension of the feature maps, and maximum pooling is performed over a 2 × 2 pixel window with a stride of 2.
Preferably, the fusion stage of the pseudo-twin convolutional neural network comprises two successive convolutional layers followed by two fully connected layers; the convolutional layers consist of 3 × 3 filters operating on the concatenated SAR and optical feature maps in order to learn the fusion rule that minimizes the final loss function; in the fusion stage, maximum pooling is omitted after the first convolutional layer, whose stride is instead set to 2 so as to downsample the feature map while preserving spatial information; the last stage of the fusion network consists of two fully connected layers, the first containing 512 channels and the second performing one-hot binary classification with 2 channels.
Preferably, in order to improve the discriminability of the pseudo-twin convolutional neural network model, the loss function is defined so as to maximize the feature distance between positive samples and hard negative samples, i.e. to make the difference between correctly matched features and nearly matched but incorrect features as large as possible. A score map is obtained by a convolution operation between the conv-8 output feature pair. On the score map, the responses of correctly matched positions are denoted y_ps, the responses of mismatched positions are denoted y_ns, and the hard negative responses are taken as the k largest mismatched responses,

y_hns = max_k(y_ns)

The goal of training is to maximize the distance between the average response values of the positive and hard negative matches, i.e.

max( mean(y_ps) - mean(y_hns) )

But directly taking

L = mean(y_hns) - mean(y_ps)

as the loss function leads to unstable training results, and the optimization process is very sensitive to the learning rate on different training data sets; a logistic operation is therefore adopted to obtain a smoother loss function,

f_logi(y) = log(1 + exp(-y))

and the loss function L is defined as

L = (1/N_ps) Σ_{u ∈ PS} f_logi(y(u)) + (1/N_hns) Σ_{u ∈ HNS} f_logi(-y(u))

wherein N_ps and N_hns are the numbers of positive samples and hard negative samples, respectively.

For network training, the ground truth map is defined as

gt(u) = 1 if ||u - u_0||_1 < r, and -1 otherwise

wherein u = (x, y) is the two-dimensional coordinate of an arbitrary position on the score map, u_0 = (x_0, y_0) is the coordinate of the center position, and r is the effective L1 distance; if the distance from position u to u_0 is less than r, position u is taken as positive. The loss function can therefore be rewritten as

L = (1/N_ps) Σ_{gt(u)=1} f_logi(y(u)) + (1/N_hns) Σ_{gt(u)=-1} f_logi(-y(u))

which can also be written as

L = Σ_u (1/N_gt(u)) f_logi( gt(u) · y(u) )
Preferably, a semi-manual training data set curation method is established to obtain a data set with higher matching precision; collecting co-registered small image blocks is essential for training the pseudo-twin network, and the semi-manual training set curation method is as follows:
a first step of roughly registering the image pair by manually selecting four corresponding points (CPs) at the four corners of the image;
secondly, detecting candidate feature points of the master image by adopting a Harris corner detector;
thirdly, searching for high-confidence corresponding points on the slave image;
fourthly, registering the large Geotiff image pair;
and fifthly, acquiring more corresponding points CP from the registered large Geotiff image pair.
Preferably, the local candidate image blocks are M × M pixels, and the local search image blocks are (M + s) × (M + s) pixels.
Compared with the prior art, the invention has the following beneficial effects: by providing the pseudo-twin convolutional neural network architecture, the left and right branches of the pseudo-twin network can respectively take optical and SAR remote sensing images of different sizes as input, solving the task of identifying corresponding image blocks in optical and SAR remote sensing images at extremely high resolution. Unlike the twin structure traditionally used in CNNs for image registration, the pseudo-twin structure has no interconnection between the two data streams for the SAR and optical images; the network is trained using automatically generated training data and does not resort to any hand-crafted features. Secondly, a new loss function is defined: the similarity measure is computed as a convolution between the two outputs of the pseudo-twin network rather than as an L2 distance or dot product, and a new hard example mining strategy significantly improves the performance of the designed network.
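To make the convolution-defined similarity concrete, a minimal sketch follows, assuming single-channel branch outputs for brevity: the score map is the valid-mode 2D cross-correlation of the search-patch features with the candidate-patch features.

```python
# Score map as cross-correlation (convolution without kernel flipping) of the
# two branch outputs, rather than an L2 distance or dot product.
import numpy as np
from scipy.signal import correlate2d

def score_map(feat_master, feat_search):
    # 'valid' mode yields an (Ns - Nm + 1) x (Ns - Nm + 1) map of similarity scores
    return correlate2d(feat_search, feat_master, mode="valid")

y = score_map(np.random.randn(32, 32), np.random.randn(40, 40))  # 9 x 9 map
```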
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a diagram of a pseudo-twin convolutional neural network framework of the present invention.
FIG. 2 is a diagram of a pseudo-twin convolutional neural network layer configuration in accordance with the present invention.
FIG. 3 is a loss function construction diagram of the present invention.
FIG. 4 is an overall flow chart of SAR and optical image registration according to the present invention.
FIG. 5 is the RMSE-i plot for SAR-optical registration according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-5, the present invention provides a technical solution: an SAR and optical remote sensing image registration method based on a pseudo-twin convolution neural network,
(1) A pseudo-twin convolutional network model was constructed as shown in figure 1.
In order to deal with the very different geometric and radiometric appearances of SAR and optical remote sensing images, a pseudo-twin network architecture with two separate but identical convolution streams is proposed, which processes SAR image blocks and optical image blocks in parallel and fuses the result information only at a later decision stage.
Feature extraction is performed on the SAR and optical image blocks through convolutional layers. In the network, convolution filters with a 3 × 3 receptive field are used, because the 3 × 3 convolution filter is the smallest kernel that can capture patterns in different directions; moreover, the use of small convolution filters increases the nonlinearity inside the network, making the network more discriminative.
The convolution stride in the network is fixed to one pixel; the convolutional layer inputs are spatially padded so that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for all 3 × 3 convolutional layers in the network. Spatial pooling is achieved by 7 maximum pooling layers, which follow convolutional layers and reduce the dimensionality of the feature maps. Maximum pooling is performed over a 2 × 2 pixel window with a stride of 2. The layer configuration of the specific network model is shown in FIG. 2.
The fusion stage of the network consists of two successive convolutional layers followed by two fully connected layers. The convolutional layers consist of 3 × 3 filters that operate on the concatenated SAR and optical feature maps to learn the fusion rule that minimizes the final loss function. In the fusion stage, maximum pooling is omitted after the first convolutional layer, whose stride is set to 2 to downsample the feature map while preserving spatial information. Using a 3 × 3 filter without max pooling after the first convolution enables the fusion layers to learn a fusion rule with some invariance to the spatial mismatch caused by differences in imaging modalities: the fusion layers learn the relationships between features using 3 × 3 convolutions while preserving nearby spatial information, and the absence of max pooling means these learned spatial relationships are preserved rather than only the maximum response being kept, with the stride of 2 still reducing the feature size. The final stage of the fusion network consists of two fully connected layers: the first contains 512 channels, and the second performs one-hot binary classification with 2 channels.
In summary, the convolutional layers in the network, except those of the fusion stage, consist of 3 × 3 filters and follow two rules:
1) Layers with the same feature map size have the same number of filters.
2) The number of feature maps increases in deeper layers, approximately doubling after each maximum pooling layer (except for the last convolution stack in each stream). Apart from the last fully connected layer, which uses Softmax as its activation function, all layers use ReLU as the activation function.
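The following PyTorch sketch assembles the network as just described: two separate, structurally identical streams of 8 convolutional layers and 3 max-pooling layers each (all 3 × 3 kernels, 1-pixel padding), a fusion stage of two convolutional layers with the first downsampling by stride 2 and no max pooling, and fully connected layers of 512 and 2 channels. The channel widths and the 112 × 112 input size are illustrative assumptions; the text does not fix them.

```python
import torch
import torch.nn as nn

def conv3x3(cin, cout, stride=1):
    # 3 x 3 convolution, 1-pixel padding (size preserved when stride=1), ReLU
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
                         nn.ReLU(inplace=True))

def make_stream():
    # one branch: 8 conv layers with 3 max-pooling layers (2 x 2, stride 2);
    # feature maps roughly double after each pooling, except the last stack
    return nn.Sequential(
        conv3x3(1, 32), conv3x3(32, 32), nn.MaxPool2d(2, 2),
        conv3x3(32, 64), conv3x3(64, 64), nn.MaxPool2d(2, 2),
        conv3x3(64, 128), conv3x3(128, 128), nn.MaxPool2d(2, 2),
        conv3x3(128, 128), conv3x3(128, 128))

class PseudoTwin(nn.Module):
    def __init__(self):
        super().__init__()
        self.sar_stream = make_stream()   # separate weights: pseudo-twin, not twin
        self.opt_stream = make_stream()
        self.fusion = nn.Sequential(
            conv3x3(256, 256, stride=2),  # stride 2 replaces max pooling here
            conv3x3(256, 256))
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(512), nn.ReLU(inplace=True),  # FC-512
            nn.Linear(512, 2))                          # FC-2: one-hot binary output

    def forward(self, sar, opt):
        f = torch.cat([self.sar_stream(sar), self.opt_stream(opt)], dim=1)
        return self.head(self.fusion(f))  # logits; Softmax is applied in the loss

model = PseudoTwin()
logits = model(torch.randn(1, 1, 112, 112), torch.randn(1, 1, 112, 112))
```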
(2) A new loss function is proposed for training the neural network, improving the discriminability of the model.
The designed loss function is intended to make the difference between correctly matched features and nearly matched but incorrect features as large as possible; a graphical representation of the loss function construction is shown in fig. 3. The score map is obtained by a convolution operation between the conv-8 output feature pairs. On the score map, the responses of correctly matched positions are denoted y_ps, the responses of mismatched positions are denoted y_ns, and the hard negative responses are defined as the k largest mismatched responses:

y_hns = max_k(y_ns)

The goal of training is to maximize the distance between the mean response values of the positive and hard negative matches, i.e.

max( mean(y_ps) - mean(y_hns) )

But directly taking

L = mean(y_hns) - mean(y_ps)

as the loss function leads to unstable training results, and the optimization process is very sensitive to the learning rate on different training data sets. A logistic operation is therefore adopted to obtain a smoother loss function,

f_logi(y) = log(1 + exp(-y))

and the loss function L is defined as

L = (1/N_ps) Σ_{u ∈ PS} f_logi(y(u)) + (1/N_hns) Σ_{u ∈ HNS} f_logi(-y(u))

where N_ps and N_hns are the numbers of positive samples and hard negative samples, respectively.

For network training, the ground truth map is defined as

gt(u) = 1 if ||u - u_0||_1 < r, and -1 otherwise

where u = (x, y) is the two-dimensional coordinate of an arbitrary position on the score map, u_0 = (x_0, y_0) is the coordinate of the center position, and r is the effective L1 distance; if the distance from position u to u_0 is less than r, position u is taken as positive. The loss function can therefore be rewritten as

L = (1/N_ps) Σ_{gt(u)=1} f_logi(y(u)) + (1/N_hns) Σ_{gt(u)=-1} f_logi(-y(u))

which can also be written as

L = Σ_u (1/N_gt(u)) f_logi( gt(u) · y(u) )
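A minimal NumPy sketch of the loss above, for a single score map: positive positions lie within L1 distance r of the centre, and hard negatives are the k largest responses among the remaining positions; k is a free parameter here, since the text does not fix it.

```python
import numpy as np

def f_logi(y):
    # f_logi(y) = log(1 + exp(-y)), computed stably
    return np.logaddexp(0.0, -y)

def psnn_loss(score_map, r=2, k=32):
    h, w = score_map.shape
    yy, xx = np.mgrid[0:h, 0:w]
    y0, x0 = h // 2, w // 2                          # u_0: centre of the score map
    pos = (np.abs(yy - y0) + np.abs(xx - x0)) < r    # positive: L1 distance below r
    y_ps = score_map[pos]                            # positive responses
    neg = np.sort(score_map[~pos].ravel())[::-1]     # mismatched responses, descending
    y_hns = neg[:k]                                  # hard negatives: k largest
    # push positive responses up and hard negative responses down
    return f_logi(y_ps).mean() + f_logi(-y_hns).mean()

loss = psnn_loss(np.random.randn(21, 21))
```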
During the training process, master and slave image patches are drawn around each pair of corresponding points on the optical-SAR image pair. That is, for a pair of feature points {p_m, p_s}, a main image block of size N_m × N_m and a search image block of size N_s × N_s are extracted centered at p_m and p_s, respectively. To allow the correspondence search, the search range I_s is larger than I_m, i.e. N_s > N_m. By convolving the conv-8 output of the main image block with that of the search image block, the score map y has size (N_s - N_m + 1) × (N_s - N_m + 1). Theoretically there is only one correctly matched location, centered in the final score map y, but experiments show that better results are obtained when the network tolerates small displacements during training. Thus, as shown in fig. 3, the effective distance r is set to 2, where the red, brown and yellow positions are all considered positive.
During testing, the master block and the larger slave search block are fed to the network to obtain a score map y. The maximum response position on y is located as u_max, and the offset u_max - u_0 is exactly the offset between the master and slave corresponding feature points. Therefore, for each feature point on the master image, its correspondence can be located in the slave image.
(3) A semi-manual training data set sorting method is established to obtain a data set with higher matching precision.
The network was trained and tested using Geotiff image pairs; the optical images were extracted from Google Earth historical imagery, and the SAR images come from TerraSAR-X.
Collecting co-registered small image blocks is essential for training the pseudo-twin network. The semi-manual training set curation method mainly comprises the following five steps:
(1) Each pair of large Geotiff images is roughly registered: the image pair is roughly registered by manually selecting four corresponding points (CPs) at the four corners of the image, and the slave image is warped under an affine transformation. This coarse registration ensures that each CP on the master and slave images has a limited offset, thereby reducing the time overhead of the CP search step.
(2) Detecting candidate feature points of the master image: candidate feature points are detected with a Harris corner detector. Specifically, block-based feature points uniformly distributed over the large Geotiff master image are extracted; the optical image is taken as the master image of the optical-SAR image pair. Considering that the texture of the map image is very low, Harris candidate feature points are extracted from the map image (a block-based Harris sketch is given after step (5)).
(3) Searching for high-confidence corresponding points on the slave image: the MI and HOPC methods are combined to obtain high-confidence matching results. First, a local image block of size 101 × 101 pixels is drawn around each candidate master feature point, and then a corresponding search block of size 121 × 121 pixels is drawn from the roughly matched slave image around the same position. The MI and HOPC methods are applied to the master and slave image block pairs respectively, yielding the matching results CP_MI and CP_HOPC.
For each corresponding point (CP), if ||CP_MI - CP_HOPC||_2 is below 1.5 pixels, it is considered a high-confidence match, and such matches are collected into the agreement set. Then, within this set, matching points that are actually mismatched are manually eliminated by visual inspection. If the number of high-confidence matches is insufficient for fine image registration, a new set of valid matches is manually selected from the remaining CP_MI and CP_HOPC results. Thus, a high-confidence matching feature point set CPconfi is obtained (see the agreement-test sketch after step (5)).
(4) Registration of the large Geotiff: the large Geotiff image pair is registered with a piecewise linear (PL) transform based on the CPconfi set obtained in step (3); the transform uses triangulation to divide the image into triangular regions, and an affine transformation maps each triangular region to the corresponding region on the master image. Since the topographic variation of each image pair to be registered is nonlinear and unpredictable, PL is more robust than other higher-order geometric transformation models. Thus, large Geotiff image pairs are registered with high accuracy.
(5) More corresponding points (CPs) are acquired from the registered large Geotiff image pair: the CPconfi data set collected in step (3) is not sufficient for network training. Since the large Geotiffs have been accurately registered in step (4), any pair of feature points extracted from the same positions of the master image and the warped slave image can be regarded as CPs. In this step, more Harris feature points are collected on the master image, and their correspondences are located at the same positions on the registered slave image, yielding the final CP set. Training patch pairs are drawn around the final CPs, which serve not only for training patch collection but also as checkpoints for network validation.
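As referenced in steps (2) and (3), the two sketches below illustrate those operations under stated assumptions. First, a block-based Harris detector, assuming OpenCV; keeping the single strongest response per block is one way to obtain the uniform point distribution described above, and the 256-pixel block size is an illustrative assumption:

```python
# Block-based Harris detection sketch: compute the Harris response once, then
# keep the strongest corner in each block so points spread uniformly.
import cv2
import numpy as np

def block_harris(gray, block=256):
    resp = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    pts = []
    h, w = gray.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = resp[y:y + block, x:x + block]
            dy, dx = np.unravel_index(np.argmax(tile), tile.shape)
            if tile[dy, dx] > 0:          # keep only genuine corner responses
                pts.append((x + dx, y + dy))
    return pts
```

Second, the agreement test of step (3): hypothetical arrays cp_mi and cp_hopc hold, for each candidate point, the position matched by MI and by HOPC respectively, and a pair is kept as high-confidence when the two land within 1.5 pixels of each other:

```python
# High-confidence CP filtering sketch: ||CP_MI - CP_HOPC||_2 < 1.5 pixels.
import numpy as np

def high_confidence(cp_mi, cp_hopc, tol=1.5):
    cp_mi = np.asarray(cp_mi, dtype=float)      # shape (n, 2): MI match positions
    cp_hopc = np.asarray(cp_hopc, dtype=float)  # shape (n, 2): HOPC match positions
    keep = np.linalg.norm(cp_mi - cp_hopc, axis=1) < tol
    return cp_mi[keep], keep                    # accepted positions and the mask
```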
Performing SAR and optical image registration:
SAR and optical remote sensing image registration comprises the following two main steps; the registration flow chart is shown in FIG. 4.
(1) Collecting and matching feature image blocks: due to local deformations of remote sensing images, a uniform distribution of CPs is an important factor for computing an accurate transformation. Thus, similar to the training data set curation process, a block-based Harris feature point detector is run on the master image, and local candidate image blocks (M × M pixels) around the Harris feature points are extracted from it. Since the offset between geocoded image pairs is not large, a larger local search block ((M + s) × (M + s) pixels) is extracted from the slave image around the same geographic position as each master Harris feature point. The candidate blocks and the corresponding search blocks are taken as inputs to the pseudo-twin network. On the output score map, the offset between the maximum-score position and the center position is taken as the offset between the geographic positions of the corresponding master and slave feature points.
(2) Outlier removal and final matching: in the method, the similarity score output by the network is used as the index of matching confidence, and the score value is used as the sole index to determine whether a CP is an outlier. Clearly, a higher CP score indicates a greater likelihood that the CP is correct. Outlier removal is performed by removing all hypothetical CPs whose score values are less than a given threshold Tscore.
A method for automatically determining the Tscore value is devised. Assume that there are N hypothetical CPs {cp_1, ..., cp_i, ..., cp_N}, sorted in descending order of score. For each i, the first i hypothetical CPs are used to estimate the projective transformation between the image pair, and then the root mean square error (RMSE) of these i CPs is calculated, giving a plot of RMSE versus i. A third-order polynomial curve R(i) is fitted to the RMSE-i plot, as shown in FIG. 5. The index minimizing |R'(i)| is denoted iP, where R'(i) denotes the first derivative of R(i). The score of cp_iP is taken as the threshold Tscore, and the first iP CPs are considered inliers. A minimal |R'(i)| indicates that adding or removing nearby CPs has the least impact on the RMSE, which means the transformation determined by the first iP CPs is relatively stable.
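A NumPy sketch of this threshold selection, with fit_projective and rmse as hypothetical helpers (for example, a DLT homography fit and its reprojection error); the starting index i_min = 4 is an assumption reflecting the minimum number of point pairs a projective transform needs:

```python
# Automatic Tscore sketch: sort hypothetical CPs by score, compute the RMSE of
# the first i CPs under a projective fit, fit a cubic R(i) to the RMSE-i curve,
# and take iP where |R'(i)| is smallest; the score of cp_iP becomes Tscore.
import numpy as np

def auto_tscore(cps, scores, fit_projective, rmse, i_min=4):
    order = np.argsort(scores)[::-1]               # descending score order
    cps = [cps[j] for j in order]
    scores = [scores[j] for j in order]
    idx = np.arange(i_min, len(cps) + 1)
    errs = [rmse(fit_projective(cps[:i]), cps[:i]) for i in idx]
    coeff = np.polyfit(idx, errs, 3)               # third-order polynomial R(i)
    deriv = np.polyval(np.polyder(coeff), idx)     # R'(i) on the same grid
    iP = idx[np.argmin(np.abs(deriv))]             # position of min |R'(i)|
    return scores[iP - 1], iP                      # Tscore and the inlier count
```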
The final registration is performed by warping the slave image onto the master image using the filtered CPs. The choice of transformation model depends on the specific conditions of the image pair to be registered: for low- or medium-resolution images an affine or projective transformation is usually sufficient, while for high-resolution images the PL transform achieves better registration results.
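For the PL transform, a minimal sketch follows, assuming scikit-image and that the filtered CPs are given as matched point arrays; PiecewiseAffineTransform implements exactly the triangulation-plus-per-triangle-affine idea described in step (4) above.

```python
# Hedged sketch of the final piecewise linear (PL) warp: the CPs define a
# triangulation and each triangle gets its own affine map; the slave image is
# resampled onto the master grid.
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def pl_register(slave, master_pts, slave_pts):
    tform = PiecewiseAffineTransform()
    # warp() expects the inverse map: master (output) coords -> slave (input) coords
    tform.estimate(np.asarray(master_pts, float), np.asarray(slave_pts, float))
    return warp(slave, tform)  # slave image warped onto the master frame
```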
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (6)

1. A pseudo-twin convolutional neural network-based SAR and optical remote sensing image registration method, comprising an SAR and an optical remote sensing image, characterized in that the SAR and the optical remote sensing image are registered, the registration comprising the following specific steps:
first step, feature image block acquisition and matching
Step1, carrying out block-based Harris feature point detection on the main image, and extracting local candidate image blocks around the Harris feature points from the main image;
step2, extracting a larger local search image block from the slave image around the same geographic position as each master Harris feature point;
step3, the local candidate image blocks and the corresponding local search image blocks are transmitted as the input of a pseudo-twin convolution network;
second, outlier removal and final registration
Step4, using the similarity score output by the pseudo-twin convolutional neural network as the index of matching confidence, and using the score value as the sole index to judge whether a hypothetical corresponding point (CP) is an outlier, wherein the higher the score of a CP, the higher the probability that it is correct;
step5, outlier removal is performed by removing all hypothetical CPs whose score value is less than the given threshold Tscore.
2. The SAR and optical remote sensing image registration method based on the pseudo-twin convolutional neural network as claimed in claim 1, characterized in that a pseudo-twin convolutional neural network model is constructed, the pseudo-twin convolutional neural network adopting a strategy of maximizing the feature distance between positive samples and hard negative samples, with the following specific steps:
firstly, extracting the features of the SAR and optical image blocks through convolutional layers, the pseudo-twin convolutional neural network using convolution filters with a 3 × 3 receptive field, since the 3 × 3 convolution filter is the smallest kernel capable of capturing patterns in different directions, and the use of small convolution filters increases the nonlinearity inside the network, making the network more discriminative;
secondly, the pseudo-twin convolutional neural network has two separate but identical convolutional streams for handling the very different geometric and radiometric appearances of SAR and optical images, processing SAR image blocks and optical image blocks in parallel and fusing the result information only at a later decision stage; features of the SAR and optical image blocks are extracted by convolutional layers, each of the two separate convolutional streams comprising 8 convolutional layers and 3 maximum pooling layers; the convolutional layer inputs are spatially padded so that the spatial resolution is preserved after convolution, convolution filters with a 3 × 3 receptive field are used, the convolution stride in the pseudo-twin convolutional neural network is fixed to one pixel, and all 3 × 3 convolutional layers in the pseudo-twin convolutional neural network are padded by 1 pixel; spatial pooling is carried out by 7 maximum pooling layers, the pooling layers following convolutional layers to reduce the dimension of the feature maps, and maximum pooling is performed over a 2 × 2 pixel window with a stride of 2.
3. The pseudo-twin convolutional neural network based SAR and optical remote sensing image registration method of claim 2, wherein the fusion stage of the pseudo-twin convolutional neural network comprises two successive convolutional layers followed by two fully connected layers; the convolutional layers consist of 3 × 3 filters operating on the concatenated SAR and optical feature maps in order to learn the fusion rule that minimizes the final loss function; in the fusion stage, maximum pooling is omitted after the first convolutional layer, whose stride is instead set to 2 so as to downsample the feature map while preserving spatial information; the last stage of the fusion network consists of two fully connected layers, the first containing 512 channels and the second performing one-hot binary classification with 2 channels.
4. The SAR and optical remote sensing image registration method based on the pseudo-twin convolutional neural network as claimed in claim 1, characterized in that: in order to improve the discriminability of the pseudo-twin convolutional neural network model, the loss function is defined so as to maximize the feature distance between positive samples and hard negative samples, i.e. to make the difference between correctly matched features and nearly matched but incorrect features as large as possible; a score map is obtained by a convolution operation between the conv-8 output feature pair; on the score map, the responses of correctly matched positions are denoted y_ps, the responses of mismatched positions are denoted y_ns, and the hard negative responses are taken as the k largest mismatched responses,

y_hns = max_k(y_ns)

the goal of training is to maximize the distance between the average response values of the positive and hard negative matches, i.e.

max( mean(y_ps) - mean(y_hns) )

but directly taking

L = mean(y_hns) - mean(y_ps)

as the loss function leads to unstable training results, and the optimization process is very sensitive to the learning rate on different training data sets, so a logistic operation is adopted to obtain a smoother loss function,

f_logi(y) = log(1 + exp(-y))

and the loss function L is defined as

L = (1/N_ps) Σ_{u ∈ PS} f_logi(y(u)) + (1/N_hns) Σ_{u ∈ HNS} f_logi(-y(u))

wherein N_ps and N_hns are the numbers of positive samples and hard negative samples, respectively;

for network training, the ground truth map is defined as

gt(u) = 1 if ||u - u_0||_1 < r, and -1 otherwise

wherein u = (x, y) is the two-dimensional coordinate of an arbitrary position on the score map, u_0 = (x_0, y_0) is the coordinate of the center position, and r is the effective L1 distance; if the distance from position u to u_0 is less than r, position u is taken as positive; the loss function can therefore be rewritten as

L = (1/N_ps) Σ_{gt(u)=1} f_logi(y(u)) + (1/N_hns) Σ_{gt(u)=-1} f_logi(-y(u))

which can also be written as

L = Σ_u (1/N_gt(u)) f_logi( gt(u) · y(u) )
5. The SAR and optical remote sensing image registration method based on the pseudo-twin convolutional neural network as claimed in claim 1, characterized in that a semi-manual training data set curation method is established to obtain a data set with higher matching precision, collecting co-registered small image blocks being essential for training the pseudo-twin network; the semi-manual training set curation method is as follows:
a first step of rough registration of the image pair by manually selecting four corresponding points CP at the four corners of the image;
secondly, detecting candidate characteristic points of the main image by adopting a Harris angular point detector;
thirdly, searching high-confidence corresponding points on the slave images;
fourthly, registering the large Geotiff;
and fifthly, acquiring more corresponding points CP from the registered large Geotiff image pair.
6. The SAR and optical remote sensing image registration method based on the pseudo-twin convolutional neural network according to claim 1, characterized in that: the local candidate image blocks are M × M pixels, and the local search image blocks are (M + s) × (M + s) pixels.
CN201911256966.0A 2019-12-10 2019-12-10 SAR and optical remote sensing image registration method based on pseudo-twin convolution neural network Active CN111028277B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911256966.0A CN111028277B (en) 2019-12-10 2019-12-10 SAR and optical remote sensing image registration method based on pseudo-twin convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911256966.0A CN111028277B (en) 2019-12-10 2019-12-10 SAR and optical remote sensing image registration method based on pseudo-twin convolution neural network

Publications (2)

Publication Number Publication Date
CN111028277A CN111028277A (en) 2020-04-17
CN111028277B true CN111028277B (en) 2023-01-10

Family

ID=70208394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911256966.0A Active CN111028277B (en) 2019-12-10 2019-12-10 SAR and optical remote sensing image registration method based on pseudo-twin convolution neural network

Country Status (1)

Country Link
CN (1) CN111028277B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801110B (en) * 2021-02-01 2022-11-01 中车青岛四方车辆研究所有限公司 Target detection method and device for image distortion correction of linear array camera of rail train
CN113066023B (en) * 2021-03-19 2022-12-13 哈尔滨工程大学 SAR image speckle removing method based on self-calibration convolutional neural network
CN113223065B (en) * 2021-03-30 2023-02-03 西南电子技术研究所(中国电子科技集团公司第十研究所) Automatic matching method for SAR satellite image and optical image
CN112989792B (en) * 2021-04-25 2024-04-16 中国人民解放军国防科技大学 Case detection method and electronic equipment
CN113223068B (en) * 2021-05-31 2024-02-02 西安电子科技大学 Multi-mode image registration method and system based on depth global features
CN113538534B (en) * 2021-06-23 2022-05-20 复旦大学 Image registration method based on depth reinforcement learning nano imaging
CN113537379B (en) * 2021-07-27 2024-04-16 沈阳工业大学 Three-dimensional matching method based on CGANs
CN113743515B (en) * 2021-09-08 2022-03-11 感知天下(北京)信息科技有限公司 Remote sensing image feature matching method based on self-supervision and self-learning feature points
CN113838107B (en) * 2021-09-23 2023-12-22 哈尔滨工程大学 Automatic heterogeneous image registration method based on dense connection
CN114387439B (en) * 2022-01-13 2023-09-12 中国电子科技集团公司第五十四研究所 Semantic segmentation network based on optical and PolSAR feature fusion
CN114565653B (en) * 2022-03-02 2023-07-21 哈尔滨工业大学 Heterologous remote sensing image matching method with rotation change and scale difference
CN114359359B (en) * 2022-03-11 2022-07-01 北京化工大学 Multitask optical and SAR remote sensing image registration method, equipment and medium
CN115129917B (en) * 2022-06-06 2024-04-09 武汉大学 optical-SAR remote sensing image cross-modal retrieval method based on modal common characteristics
CN115170979B (en) * 2022-06-30 2023-02-24 国家能源投资集团有限责任公司 Mining area fine land classification method based on multi-source data fusion
CN115574831A (en) * 2022-09-28 2023-01-06 曾丽红 Unmanned aerial vehicle navigation method based on map fusion
CN116701695B (en) * 2023-06-01 2024-01-30 中国石油大学(华东) Image retrieval method and system for cascading corner features and twin network
CN117315433B (en) * 2023-11-30 2024-02-13 中国科学院空天信息创新研究院 Remote sensing multi-mode multi-space functional mapping method based on distribution consistency constraint
CN117649613B (en) * 2024-01-30 2024-04-26 之江实验室 Optical remote sensing image optimization method and device, storage medium and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510532B (en) * 2018-03-30 2022-07-15 西安电子科技大学 Optical and SAR image registration method based on deep convolution GAN
CN109902192B (en) * 2019-01-15 2020-10-23 华南师范大学 Remote sensing image retrieval method, system, equipment and medium based on unsupervised depth regression
CN110414349A (en) * 2019-06-26 2019-11-05 长安大学 Introduce the twin convolutional neural networks face recognition algorithms of sensor model

Also Published As

Publication number Publication date
CN111028277A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN111028277B (en) SAR and optical remote sensing image registration method based on pseudo-twin convolution neural network
CN110119438B (en) Airborne LiDAR point cloud filtering method based on active learning
Jiang et al. Multiscale locality and rank preservation for robust feature matching of remote sensing images
CN103337052B (en) Automatic geometric correcting method towards wide cut remote sensing image
CN108376408B (en) Three-dimensional point cloud data rapid weighting registration method based on curvature features
Matkan et al. Road extraction from lidar data using support vector machine classification
CN104077760A (en) Rapid splicing system for aerial photogrammetry and implementing method thereof
CN107909018B (en) Stable multi-mode remote sensing image matching method and system
CN109993800A (en) A kind of detection method of workpiece size, device and storage medium
CN103727930A (en) Edge-matching-based relative pose calibration method of laser range finder and camera
CN105160649A (en) Multi-target tracking method and system based on kernel function unsupervised clustering
CN107862319B (en) Heterogeneous high-light optical image matching error eliminating method based on neighborhood voting
CN110084743B (en) Image splicing and positioning method based on multi-flight-zone initial flight path constraint
CN110569861A (en) Image matching positioning method based on point feature and contour feature fusion
CN105354841A (en) Fast matching method and system for remote sensing images
CN111242000A (en) Road edge detection method combining laser point cloud steering
JP2023530449A (en) Systems and methods for air and ground alignment
Gao et al. Multi-scale PIIFD for registration of multi-source remote sensing images
Jiang et al. Leveraging vocabulary tree for simultaneous match pair selection and guided feature matching of UAV images
Huang et al. SAR and optical images registration using shape context
CN114332172A (en) Improved laser point cloud registration method based on covariance matrix
Lu et al. A lightweight real-time 3D LiDAR SLAM for autonomous vehicles in large-scale urban environment
CN110765993B (en) SEM graph measuring method based on AI algorithm
Jin et al. Registration of UAV images using improved structural shape similarity based on mathematical morphology and phase congruency
CN104700359A (en) Super-resolution reconstruction method of image sequence in different polar axis directions of image plane

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant