CN116363175A - Polarized SAR image registration method based on attention mechanism - Google Patents


Info

Publication number
CN116363175A
Authority
CN
China
Prior art keywords
polarized sar
feature
image
attention
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211653564.6A
Other languages
Chinese (zh)
Inventor
项德良
丁怀跃
程建达
胡粲彬
孙晓坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Chemical Technology
Priority to CN202211653564.6A
Publication of CN116363175A
Legal status: Pending

Classifications

    • G06T 7/32: Determination of transform parameters for the alignment of images (image registration) using correlation-based methods
    • G06N 3/08: Neural networks; learning methods
    • G06T 5/20: Image enhancement or restoration using local operators
    • G06T 5/30: Erosion or dilatation, e.g. thinning
    • G06T 5/70: Denoising; smoothing
    • G06T 7/13: Edge detection
    • G06T 7/136: Segmentation; edge detection involving thresholding
    • G06T 7/337: Image registration using feature-based methods involving reference images or patches
    • G06T 7/35: Image registration using statistical methods
    • G06T 2207/10032: Satellite or aerial image; remote sensing
    • G06T 2207/10044: Radar image
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The embodiment of the invention discloses a polarized SAR image registration method based on an attention mechanism, comprising the following steps: applying polarimetric whitening filtering to two different polarized SAR images to obtain a filtered image for each; performing adaptive threshold constraint, morphological erosion and non-maximum suppression on the two filtered images to determine key points; feeding data blocks of a set size around each key point of the two polarized SAR images into the two branches of a trained Siamese attention network for deep feature extraction, yielding a depth feature descriptor for each key point; matching the depth feature descriptors of the key points in the two polarized SAR images with brute-force matching and the random sample consensus (RANSAC) algorithm to obtain a perspective transformation matrix; and transforming the polarized SAR image into a registered image with the perspective transformation matrix. The embodiment of the invention improves both the registration accuracy and the registration efficiency of polarized SAR images.

Description

Polarized SAR image registration method based on attention mechanism
Technical Field
The embodiment of the invention relates to the technical field of polarized SAR image registration, in particular to a polarized SAR image registration method based on an attention mechanism.
Background
Synthetic aperture radar (SAR) is an active microwave sensor with a certain penetration capability; it can acquire ground information day and night and in all weather, without being disturbed by weather conditions. Polarimetric synthetic aperture radar (polarized SAR), thanks to its multiple polarization working modes, can use the complex SAR images of different polarization channels to analyze target properties such as shape, structure and orientation.
However, during polarized SAR imaging, interference from various internal and external factors, such as system and environmental factors, limits the acquisition of multi-temporal polarized SAR images of the same area and introduces differences in resolution, viewing angle and so on, which affect subsequent processing and applications. Two or more polarized SAR images exhibiting such differences therefore need to be registered, so that the differences caused by viewing angle, resolution, sensor and other factors are eliminated as a preprocessing step for subsequent applications.
Existing polarized SAR image registration methods generally fall into three categories: region-based methods, feature-based methods and deep-learning-based methods. At present, most feature-based methods determine the image key points and construct the feature descriptors from the amplitude map alone and do not make full use of the polarization information in the data, which leads to inconspicuous and unevenly distributed key points, poor registration timeliness, low registration accuracy and other problems. Other related prior art includes CN110458876A, "Multi-temporal PolSAR image registration method based on SAR-SIFT features", and CN102520407A, "Polarimetric interferometric SAR image registration method based on optimal linearity of the coherence coefficient".
Disclosure of Invention
The embodiment of the invention provides a polarized SAR image registration method, device and medium based on an attention mechanism, which use adaptive threshold constraint, morphological erosion and non-maximum suppression to quickly select points on the edge data produced by polarimetric whitening filtering, and match the points with depth feature descriptors, improving both the registration efficiency and the registration accuracy of polarized SAR images.
In a first aspect, an embodiment of the present invention provides a polarized SAR image registration method based on an attention mechanism, including:
applying polarimetric whitening filtering to two different polarized SAR images of the same scene to obtain a filtered image for each, wherein the two polarized SAR images are C3 data and the filtered images contain the edge and texture information of the polarized SAR images;
performing adaptive threshold constraint, morphological erosion and non-maximum suppression on the two filtered images to determine key points with salient features in the two polarized SAR images;
feeding data blocks of a set size around each key point of the two polarized SAR images into the two branches of a trained Siamese attention network for deep feature extraction to obtain a depth feature descriptor for each key point, wherein the trained Siamese attention network drives the depth feature descriptors of matching key points to be consistent;
matching the depth feature descriptors of the key points in the two polarized SAR images with brute-force matching and the random sample consensus (RANSAC) algorithm to obtain a perspective transformation matrix from one polarized SAR image to the other; and
transforming the one polarized SAR image with the perspective transformation matrix into a registered image whose coordinates are consistent with the other polarized SAR image.
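As an illustrative sketch only, not the patented implementation, the brute-force descriptor matching named in the claim can be written in a few lines; the function name `brute_force_match` and the Lowe-style ratio test are assumptions added here:

```python
import numpy as np

def brute_force_match(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b,
    keeping only matches that pass a ratio test (hypothetical helper, not
    quoted from the patent)."""
    # pairwise Euclidean distances, shape (len(desc_a), len(desc_b))
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(d):
        order = np.argsort(row)
        best, second = order[0], order[1]
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches

desc_a = np.array([[0.0, 0.0], [10.0, 10.0]])
desc_b = np.array([[10.1, 10.0], [0.1, 0.0]])
pairs = brute_force_match(desc_a, desc_b)
```

The surviving pairs would then be passed to a RANSAC homography estimator (for example OpenCV's `cv2.findHomography` with the `cv2.RANSAC` flag) to obtain the 3×3 perspective transformation matrix.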
In a second aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors; and
a memory storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the above attention-mechanism-based polarized SAR image registration methods.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium storing one or more computer programs;
when a stored program is executed by a processor, any of the above attention-mechanism-based polarized SAR image registration methods is implemented.
The technical effects of the embodiment of the invention are as follows:
1. In this embodiment, the polarized SAR image is processed with polarimetric whitening filtering to obtain denoised edge data; a threshold constraint whose threshold is computed adaptively is applied to the edge data; and the constrained edge data then undergo morphological erosion and non-maximum suppression to yield the final key points. This point-selection scheme reduces the interference of speckle noise, so the acquired edge data are more reliable; it is also very fast, most of the selected key points lie in the edge and texture regions of the image, and the number of selected points can be controlled flexibly through non-maximum suppression.
2. In this embodiment, the Siamese attention network performs the deep feature extraction on the data blocks and the construction of the depth feature descriptors; training parameters are shared between the two branches of the network, which improves training efficiency and training quality. The network input is the C3 data of the polarized SAR image, which makes full use of the scattering information of the ground targets; compared with training on image amplitude information alone, the network can fully learn the scattering characteristics of the ground targets, is better suited to learning the depth features of polarized SAR images, extracts those depth features better, and yields more robust depth feature descriptors. In addition, 32×32 image blocks centered on each key point are selected to construct that point's feature descriptor, so registration performance can be improved further without requiring fixed-size input images.
3. Combining the above measures effectively improves both the registration efficiency and the registration accuracy of polarized SAR images.
Drawings
Fig. 1 is a flowchart of a polarized SAR image registration method based on an attention mechanism according to an embodiment of the present invention;
fig. 2 is a flowchart of another polarized SAR image registration method based on an attention mechanism according to an embodiment of the present invention;
FIG. 3 is a flow chart of key point selection provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a channel attention module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a spatial attention module according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the invention, are within the scope of the invention.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should also be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Fig. 1 is a flowchart of a polarized SAR image registration method based on an attention mechanism according to an embodiment of the present invention. The method is built on a deep-learning Siamese attention network, so the network model that supports the method is described first. As shown in fig. 1, the structure of the Siamese attention network appears inside the large dashed box in the middle. The network comprises two branches of identical structure, with weights shared between the branches. Each branch comprises a first convolution module, a first mixed attention module, a second convolution module, a second mixed attention module and a third convolution module. The first convolution module extracts features from the input data block to obtain a feature map; the first mixed attention module performs feature aggregation and attention weighting on that feature map in the channel and spatial dimensions to obtain a shallow feature map. The second convolution module extracts features from the shallow feature map and reduces its size to obtain another feature map; the second mixed attention module performs further feature aggregation and attention weighting on this feature map in the channel and spatial dimensions to obtain a depth feature map. Finally, the third convolution module extracts features from the depth feature map and reduces its size to obtain the depth feature descriptor. For ease of distinction and description, the feature map output by the first convolution module is called the first feature map, and the feature map output by the second convolution module is called the second feature map.
Each mixed attention module comprises a channel attention module and a spatial attention module connected in sequence: the input feature map of the mixed attention module goes directly into the channel attention module, the output feature map of the channel attention module is the input feature map of the spatial attention module, and the output feature map of the spatial attention module is the output feature map of the whole mixed attention module. The channel attention module captures the relationships among the channels of its input feature map and, based on the attention mechanism, recalibrates the input feature map in the channel dimension, strengthening its salient features and suppressing its non-salient ones to obtain its output feature map. The spatial attention module captures the spatial information of its input and, based on the attention mechanism, recalibrates the output feature map of the channel attention module in the spatial dimension in the same way, strengthening salient features and suppressing non-salient ones, to produce the output feature map of the spatial attention module. The specific structure and function of each module is described in detail in the embodiments below.
Based on this Siamese attention network, fig. 2 is a flowchart of another polarized SAR image registration method based on an attention mechanism according to an embodiment of the present invention; the method is applicable to registering different polarized SAR images of the same scene. The method is executed by an electronic device and comprises the following steps:
S110, applying polarimetric whitening filtering to two different polarized SAR images of the same scene to obtain a filtered image for each, wherein the two polarized SAR images are C3 data and the filtered images contain the edge and texture information of the polarized SAR images.
The scene here refers to the content presented by the polarized SAR image: a certain area, the same building, and so on. The two different polarized SAR images are the data source of the whole registration method; they may be acquired by different sensors, or be two polarized SAR images of different wave bands or different illumination angles. Either of the two polarized SAR images can serve as the reference image, with the other as the image to be registered; the registration process transforms the image to be registered into a registered image whose coordinates are consistent with the reference image.
Unlike prior art that performs image registration on the SAR amplitude map, the polarized SAR images used as the data source in this embodiment are C3 data. The original polarized SAR data is generally the polarization scattering matrix S, which contains all the polarization information of the ground targets. The C3 data is the polarization covariance matrix, computed from the polarization scattering matrix S. The C3 data comprises 9 channels corresponding to the 9 entries of the polarization covariance matrix, each with physical significance, such as intensity, phase, energy and backscattering; the phase and scattering information are specific to the C3 data and play an important role in the accuracy of subsequent image registration. In addition, because the C3 data is computed from the scattering matrix S in double-precision, it describes the ground-object information more precisely.
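For concreteness, the conventional construction of C3 from the scattering matrix S can be sketched as follows; the lexicographic target vector k = [S_HH, sqrt(2)·S_HV, S_VV]^T under reciprocity is standard PolSAR practice and is assumed here rather than quoted from the patent:

```python
import numpy as np

def c3_from_s(s_hh, s_hv, s_vv):
    """Per-pixel 3x3 polarimetric covariance matrix C3 = k k^H, with the
    lexicographic target vector k = [S_HH, sqrt(2)*S_HV, S_VV]^T (standard
    PolSAR convention under reciprocity; an assumption, not quoted from
    the patent text)."""
    k = np.stack([np.asarray(s_hh), np.sqrt(2) * np.asarray(s_hv),
                  np.asarray(s_vv)], axis=-1)            # (..., 3)
    return k[..., :, None] * np.conj(k)[..., None, :]    # (..., 3, 3)

C = c3_from_s(1 + 1j, 2.0, 3j)   # single pixel; 9 complex entries = 9 channels
```

The 9 complex entries of each 3×3 matrix correspond to the 9 channels of the C3 data; the matrix is Hermitian, so its diagonal carries the channel intensities.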
After the two polarized SAR images in C3 form are obtained, polarimetric whitening filtering is applied to each of them to reduce the influence of speckle noise and to obtain the edge and texture data of the polarized SAR images for the subsequent key-point selection. The filter optimally combines all the elements of the polarization covariance matrix so that the ratio of the standard deviation s to the mean m of the output image is minimized; this reduces speckle noise in the image, determines its edge and texture regions, and produces its edge data.
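The classical closed form of the polarimetric whitening filter (Novak and Burl) that minimises this standard-deviation-to-mean ratio is y = trace(Sigma^-1 C) per pixel, with Sigma the scene-average covariance; the sketch below assumes that form, since the patent states only the optimisation criterion:

```python
import numpy as np

def pwf(c3_stack):
    """Polarimetric whitening filter in the classical Novak-Burl form:
    the scalar output at each pixel is trace(Sigma^-1 @ C), where Sigma is
    the scene-average covariance matrix. This closed form is an assumption;
    the patent only states that the covariance elements are combined to
    minimise the standard-deviation/mean ratio of the output."""
    sigma = c3_stack.reshape(-1, 3, 3).mean(axis=0)   # scene-average covariance
    sigma_inv = np.linalg.inv(sigma)
    # per-pixel trace(sigma_inv @ C), computed without an explicit loop
    return np.einsum('ij,...ji->...', sigma_inv, c3_stack).real

stack = np.tile(np.diag([1.0, 2.0, 3.0]).astype(complex), (4, 1, 1))
out = pwf(stack)   # 4 identical "pixels" -> trace(I) = 3 everywhere
```

For a homogeneous scene every whitened pixel collapses to the same value, which is exactly the speckle-suppression effect the filter is after.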
S120, performing adaptive threshold constraint, morphological erosion and non-maximum suppression on the two filtered images to determine the key points with salient features in the two polarized SAR images.
The filtered images contain the edge and texture information of the polarized SAR images. This step applies adaptive threshold constraint, morphological erosion and non-maximum suppression to each filtered image in sequence; under the combined action of these operations, key points with salient features are obtained in each filtered image and serve as the key points of the corresponding polarized SAR image.
In an alternative embodiment, for any filtered image, the adaptive threshold of that image is computed first. Specifically, the mean intensity of all pixels in the filtered image is computed and divided by 2 to obtain the adaptive constraint threshold, which expresses how strongly the image brightness is constrained.
The filtered image is then morphologically eroded subject to the adaptive threshold. Specifically, data above the adaptive threshold is eroded with a flat, disk-shaped structuring element of radius 1; data at or below the adaptive threshold is not eroded. The eroded image contains more refined edge and texture information.
Finally, non-maximum suppression is applied to the eroded image to obtain the key points with salient features in the filtered image, which serve as the key points with salient features in the corresponding polarized SAR image. Specifically, after non-maximum suppression, the key points to be retained are chosen according to the actual registration requirement: if M key points are to be retained, the M points of strongest response intensity in the eroded image are used as the key points, with M a natural number. The whole key-point selection flow is shown in fig. 3.
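The three operations above can be sketched with NumPy/SciPy as follows; the 3×3 non-maximum-suppression window and the plus-shaped discretisation of the radius-1 disk are assumptions where the text leaves the details open:

```python
import numpy as np
from scipy import ndimage

def select_keypoints(pwf_img, num_points):
    """Sketch of the point-selection pipeline: threshold = half the mean
    intensity, erosion with a radius-1 flat disk, then non-maximum
    suppression keeping the strongest responses. Window sizes are
    assumptions, not quoted from the patent."""
    thresh = pwf_img.mean() / 2.0                      # adaptive threshold
    mask = pwf_img > thresh                            # brightness constraint
    disk = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], bool)  # radius-1 disk
    mask = ndimage.binary_erosion(mask, structure=disk)  # morphological erosion
    resp = np.where(mask, pwf_img, 0.0)
    # non-maximum suppression: keep only local maxima in a 3x3 window
    local_max = resp == ndimage.maximum_filter(resp, size=3)
    resp = np.where(local_max, resp, 0.0)
    ys, xs = np.unravel_index(np.argsort(resp, axis=None)[::-1][:num_points],
                              resp.shape)
    return list(zip(ys.tolist(), xs.tolist()))

img = np.zeros((12, 12))
img[4:7, 4:7] = 10.0
img[5, 5] = 20.0     # bright structure with a dominant peak
pts = select_keypoints(img, 1)
```

Varying `num_points` is what gives the flexible control over the number of selected key points that the effects section mentions.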
S130, feeding data blocks of a set size around each key point of the two polarized SAR images into the two branches of a trained Siamese attention network for deep feature extraction to obtain the depth feature descriptor of each key point, wherein the trained Siamese attention network drives the depth feature descriptors of matching key points to be consistent.
The Siamese attention network is shown in fig. 1. The data blocks of the set size around the key points of one polarized SAR image are fed into one branch, and those of the other polarized SAR image into the other branch; each branch extracts the depth features of its input data blocks to obtain their depth feature descriptions. A depth feature descriptor is a feature vector of preset dimension that characterizes the depth features of a data block, so each key point yields one depth feature descriptor. The trained Siamese attention network drives matching key points in the two SAR images to obtain depth feature descriptors that are as consistent as possible, for use in key-point matching.
In an alternative embodiment, the depth feature descriptor of any key point in either polarized SAR image is generated as follows: with the key point as center, a data block of the set surrounding size is extracted and fed into either branch of the Siamese attention network. Optionally, the set size is 32×32. Within the branch, the data block is processed as follows:
Step one: the first convolution module extracts the features of the data block to obtain the first feature map. This feature map contains information such as the texture, edges and structure of the image. As shown in fig. 1, the first convolution module comprises 3 sequentially connected 3×3 convolution layers of stride 1, which do not change the image size; the three layers extract features of the data block progressively, and the extracted features change as the network depth increases.
Step two: the first mixed attention module performs feature aggregation and attention weighting on the first feature map in the channel and spatial dimensions to obtain the shallow feature map. The first mixed attention module can extract the more salient information in the polarized SAR image, such as building contours and road boundaries; to distinguish it from the information extracted by the second mixed attention module, this is called shallow information. Compared with the texture, edge and structure information of the first feature map in step one, it sits at a deeper network level and is therefore relatively deeper information.
In an alternative embodiment, the channel attention module comprises a global average pooling layer and 2 1×1 convolution layers, as shown in fig. 4. The first feature map entering the channel attention module first serves as the input feature map U of the global average pooling layer, which compresses the spatial dimensions into a one-dimensional feature vector containing the global features along the channel direction. The global average pooling layer aggregates the feature map, and the resulting feature vector can be understood as a channel descriptor. Because the original data is the C3 data of the polarized SAR image and contains the polarization information of the ground objects, this channel descriptor embeds the global distribution of the feature responses along the channel direction, so lower layers can also exploit the information of the network's global receptive field, and the polarization scattering characteristics are effectively combined in the channel dimension. The one-dimensional feature vector is then passed through the 2 sequentially connected 1×1 convolution layers, which learn the attention weight of each channel, giving the model a stronger ability to discriminate the features of each channel.
Specifically, this part can learn the nonlinear, non-mutually-exclusive relationships among the channels. To reduce model complexity and improve generalization, the first 1×1 convolution layer reduces the channel dimension and its result is activated with a ReLU function; the second 1×1 convolution layer restores the original channel dimension; and finally the learned weight of each channel (activated with a Sigmoid function) is multiplied with the input feature map U to obtain a new, channel-attention-weighted feature of the same dimension and size as U, which realizes the recalibration of U in the channel dimension. The data of each channel represents a different feature, and each feature helps network learning to a different degree; the attention weight of a channel grows with the saliency of its features, so weighting the channels by channel attention strengthens the channels of strong saliency, making the effective features more prominent, while suppressing the channels of weak saliency, making the ineffective features weaker. For ease of distinction and description, the feature map output by the channel attention module is called the third feature map.
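A minimal NumPy stand-in for this channel-attention computation (global average pooling, a dimension-reducing 1×1 convolution with ReLU, a dimension-restoring 1×1 convolution with Sigmoid, channel-wise reweighting) looks as follows; on a 1×1 spatial grid a 1×1 convolution is just a matrix product, so `w1` and `w2` stand in for the two convolutions, and the reduction ratio r = 2 is an assumption:

```python
import numpy as np

def channel_attention(U, w1, w2):
    """Channel attention sketch. U has shape (C, H, W); w1 (C//r, C) and
    w2 (C, C//r) play the role of the two 1x1 convolutions (the reduction
    ratio r and the weights are illustrative assumptions)."""
    z = U.mean(axis=(1, 2))                  # GAP -> channel descriptor, (C,)
    h = np.maximum(w1 @ z, 0.0)              # 1x1 conv + ReLU (dim reduce)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))      # 1x1 conv + Sigmoid, weights in (0,1)
    return U * s[:, None, None]              # recalibrate each channel of U

U = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)
out = channel_attention(U, np.zeros((2, 4)), np.zeros((4, 2)))
```

With zero weights every channel weight is sigmoid(0) = 0.5, so the output is simply 0.5·U; trained weights would instead spread the weights across (0, 1) according to channel saliency.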
In another alternative embodiment, the spatial attention module comprises a max pooling layer, an average pooling layer and a 7×7 convolution layer, as shown in fig. 5. The output feature map F of the channel attention module enters the spatial attention module and is fed to both pooling layers: max pooling and average pooling are each performed along the channel dimension, and the two pooling results are concatenated into a 2-channel spatial feature map, which is again a feature descriptor. This descriptor is fed into the 7×7 convolution layer (with Sigmoid activation), which learns the attention weight of each spatial position to form a 1-channel spatial attention map. The spatial attention map indicates which positions of the image should be emphasized and which suppressed, with the attention weight of each spatial position growing with the saliency of that position's features. The output feature map F of the channel attention module is weighted in the spatial dimension by these attention weights to obtain a new, spatial-attention-weighted feature, namely the shallow feature descriptor, which has the same dimension and size as F and realizes the spatial recalibration of F. In an image, some spatial positions have salient features and much effective information, while other positions may contain no effective features at all and contribute nothing to network learning and feature extraction.
Therefore, the attention weight of each spatial position increases with the saliency of that position's features. The spatial attention module can thus identify the positions in the image that need to be emphasized or suppressed: it increases the weight of effective positions, enhancing the extraction of feature information, and decreases the weight of ineffective positions, suppressing the extraction of information. In addition, because the C3 data are of Double type, the texture description of spatial details is better and the details are richer.
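The spatial-attention weighting above can likewise be sketched in NumPy. The naive 7×7 convolution loop and the random kernel K are illustrative assumptions standing in for the learned 7×7 convolution layer:

```python
import numpy as np

def spatial_attention(F, K):
    """Spatial-attention weighting of a feature map F of shape (C, H, W).

    Channel-wise max pooling and average pooling give a 2-channel
    descriptor; K of shape (2, 7, 7) stands in for the learned 7x7
    convolution kernel, followed by a Sigmoid activation.
    """
    C, H, W = F.shape
    desc = np.stack([F.max(axis=0), F.mean(axis=0)])    # (2, H, W) descriptor
    pad = 3
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    att = np.zeros((H, W))
    for i in range(H):                                  # naive 7x7 convolution
        for j in range(W):
            att[i, j] = np.sum(padded[:, i:i + 7, j:j + 7] * K)
    att = 1.0 / (1.0 + np.exp(-att))                    # Sigmoid -> 1-channel map
    return F * att[None, :, :]                          # recalibrate F spatially

rng = np.random.default_rng(1)
F = rng.standard_normal((8, 6, 6))
K = rng.standard_normal((2, 7, 7)) * 0.05
G = spatial_attention(F, K)
assert G.shape == F.shape
```

As with the channel branch, the Sigmoid weights are strictly between 0 and 1, so each spatial position of the output is a damped copy of the input at that position.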
And thirdly, after the shallow feature descriptor is obtained, features are extracted from the shallow feature map through the second convolution module and the size is reduced, yielding the second feature map. As shown in fig. 1, the second convolution module includes 3 sequentially connected 3×3 convolution layers, where the first has a stride of 2 and the last two have a stride of 1. The stride-2 convolution layer halves the image size while extracting features, whereas the stride-1 convolution layers extract features without changing the image size. As the network goes deeper, the features extracted by the convolution layers become correspondingly deeper.
And step four, carrying out feature aggregation and attention weighting on the second feature map in the channel and spatial dimensions through the second mixed attention module to obtain the depth feature map. The second mixed attention module can extract more abstract information from the polarized SAR image, such as semantic information, which is called depth information to distinguish it from the information extracted by the first mixed attention module. The depth feature descriptors can characterize not only the texture, geometry, and scattering information of the polarized SAR image but also its more abstract high-dimensional information. The specific structure and operation of the second mixed attention module are the same as those of the first mixed attention module and will not be repeated.
And fifthly, features are extracted from the depth feature map through the third convolution module and the size is reduced, yielding the depth feature descriptor. As shown in fig. 1, the third convolution module includes 2 3×3 convolution layers and 1 convolution layer whose kernel size equals the feature map size, connected in sequence; the first convolution layer has a stride of 2 and the last two have a stride of 1. The image size of the depth feature map is halved after the first convolution layer, the reduced feature then passes through the second convolution layer to extract deeper features, and finally the third convolution layer, with an 8×8 kernel matching the feature map size, convolves the input feature to produce a feature vector of the preset dimension, namely the depth feature descriptor.
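The size arithmetic through the stride pattern of steps three and five can be traced explicitly. Assuming padding of 1 for the 3×3 layers (a common choice; the patent does not state the padding), a 32×32 input block ends as an 8×8 map, which the final 8×8 convolution collapses to a 1×1×D descriptor:

```python
def feature_map_size(n, stride, kernel=3, padding=1):
    """Output side length of a square convolution layer."""
    return (n + 2 * padding - kernel) // stride + 1

# Illustrative trace for a 32x32 input block through the stride layout
# described above (padding=1 is an assumption):
n = 32
n = feature_map_size(n, 1)   # first convolution module keeps 32
n = feature_map_size(n, 2)   # second module, stride-2 layer -> 16
n = feature_map_size(n, 1)
n = feature_map_size(n, 1)
n = feature_map_size(n, 2)   # third module, stride-2 layer -> 8
n = feature_map_size(n, 1)
assert n == 8                # an 8x8 kernel then yields a 1x1xD descriptor
```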
S140, matching the depth feature descriptors of the key points in the two polarized SAR images using a brute-force matching method and a random sample consensus algorithm to obtain a perspective transformation matrix from one polarized SAR image to the other polarized SAR image.
The "one polarized SAR image" is the image to be registered in S110, and the "other polarized SAR image" is the reference image in S110. The perspective transformation matrix transforms the coordinates of the image to be registered into the coordinates of the reference image. Based on the depth feature descriptors, rough key point matching is performed with a brute-force matching method, fine matching is performed with the random sample consensus (RANSAC) algorithm, and finally the perspective transformation matrix is computed by the least squares method.
In an alternative embodiment, rough key point matching is first performed using a brute-force matching method. Specifically, the depth feature descriptors of key points in the reference image are matched with those generated for key points in the image to be registered using brute-force matching and the K-nearest-neighbor method: the L_2 distance between the feature descriptor of each key point in the reference image and the feature descriptors of all key points in the image to be registered is calculated; for each key point in the reference image, the K key points in the image to be registered whose L_2 distance is smaller than a given threshold are determined, forming matching point pairs, where K is a non-negative integer. That is, the L_2 distances between the feature descriptors of these K key points in the image to be registered and the feature descriptor of the key point in the reference image are all smaller than the given threshold.
For example, with a threshold of 2 and taking one key point in the reference image as an example, the L_2 distances between its feature descriptor and the feature descriptors of all key points in the image to be matched are first calculated; the resulting L_2 distances are then screened against the threshold 2, yielding the K key points that meet the requirement, which form the matching point pairs.
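The brute-force K-nearest-neighbor matching with a distance threshold can be sketched directly in NumPy (the function name and the near-duplicate test data are illustrative assumptions; a library matcher such as OpenCV's BFMatcher would serve the same role):

```python
import numpy as np

def brute_force_match(desc_ref, desc_mov, k=2, thresh=2.0):
    """Brute-force matching of descriptor sets of shape (N, D) and (M, D).

    For each reference keypoint, keep at most the k nearest descriptors
    in the image to be registered whose L2 distance is below `thresh`
    (the threshold value 2 from the example above).
    """
    matches = []
    for i, d in enumerate(desc_ref):
        dists = np.linalg.norm(desc_mov - d, axis=1)   # L2 distances to all
        order = np.argsort(dists)[:k]                  # k nearest neighbours
        for j in order:
            if dists[j] < thresh:                      # threshold screening
                matches.append((i, int(j), float(dists[j])))
    return matches

rng = np.random.default_rng(2)
desc_ref = rng.standard_normal((5, 128))
desc_mov = desc_ref + 0.01 * rng.standard_normal((5, 128))  # near-duplicates
m = brute_force_match(desc_ref, desc_mov)
assert all(d < 2.0 for _, _, d in m)
```

With random 128-dimensional descriptors, non-matching distances concentrate far above 2, so only the true near-duplicates survive the threshold.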
After rough matching, mismatched pairs are removed from the coarse matching result using the RANSAC algorithm, realizing fine registration. Specifically, RANSAC is an iterative algorithm: it estimates a mathematical model of the matching point pair data over multiple iterations, returns the set of matching point pairs that best fits the model, and removes mismatched points, yielding more accurate matching point pairs for fine registration. After the accurate matching point pairs are obtained, the perspective transformation matrix is calculated by the least squares method.
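The least-squares computation of the perspective transformation matrix from (already RANSAC-filtered) matching point pairs can be sketched with the standard direct linear transform; this is an illustrative reconstruction under that assumption, not the patent's exact procedure:

```python
import numpy as np

def perspective_from_pairs(src, dst):
    """Least-squares estimate of the 3x3 perspective (homography) matrix
    mapping src points to dst points (each of shape (N, 2), N >= 4),
    via the standard DLT construction; RANSAC inlier selection is
    assumed to have been done beforehand."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)          # least-squares null vector of A
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                   # fix the scale ambiguity

# Recover a known transform from exact correspondences:
H_true = np.array([[1.0, 0.1, 5.0], [0.0, 1.2, -3.0], [0.001, 0.0, 1.0]])
src = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 7]], float)
proj = np.hstack([src, np.ones((5, 1))]) @ H_true.T
dst = proj[:, :2] / proj[:, 2:3]         # perspective division
H_est = perspective_from_pairs(src, dst)
assert np.allclose(H_est, H_true, atol=1e-6)
```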
S150, transforming the one polarized SAR image, using the perspective transformation matrix, into a registered image consistent with the coordinates of the other polarized SAR image. After the transformation, polarized SAR image registration is complete.
On the basis of the above embodiments, the training process of the twin attention network is detailed as follows. The purpose of training is to give the twin attention network the ability to effectively extract polarized SAR image depth features and effectively construct depth feature descriptors. Optionally, before the data blocks of a set size around the key points in the two polarized SAR images are input into the two branches of the trained twin attention network for depth feature extraction, the method further includes the following steps:
Step one, constructing a polarized SAR image set as a training data set of a twin attention network, wherein the image set comprises a plurality of matched polarized SAR image pairs. Each matched polarized SAR image pair consists of two polarized SAR images in the same shooting scene, and each polarized SAR image is the C3 data with the set size. Alternatively, when the C3 data block size in the above embodiment is 32×32, the polarized SAR image for training is also a C3 data block of 32×32.
Step two, any N matched polarized SAR image pairs are taken from the image set, and the two images in each matched polarized SAR image pair are respectively input into the two branches of the twin attention network to obtain the corresponding depth feature descriptors.
And thirdly, the twin attention network is trained by minimizing the depth feature descriptor difference of matched polarized SAR image pairs and maximizing the depth feature descriptor difference of unmatched polarized SAR image pairs. Specifically, the loss is calculated from the feature descriptors of the two matched polarized SAR images, the weight parameters of the twin attention network are updated accordingly, and the network's depth feature extraction and depth feature descriptor construction capabilities are continuously improved.
In an alternative embodiment, a loss function is calculated; by minimizing this loss function, the depth feature descriptor differences of matched polarized SAR image pairs are minimized and the depth feature descriptor differences of unmatched polarized SAR image pairs are maximized, completing the training of the twin attention network:

L_T = L_FOS + R_SOS    (1)

where L_T denotes the total loss, L_FOS the first-order similarity loss, and R_SOS the second-order similarity regularization; the two terms carry equal weight.
More specifically, L_FOS is the first-order similarity loss between depth feature descriptors of matched polarized SAR image pairs. The first-order similarity loss is used to learn local descriptors such that the distance between depth feature descriptors of matching polarized SAR images becomes smaller while the distance between those of non-matching polarized SAR images becomes larger. A positive descriptor pair is denoted (x_i, x_i^+), i.e., the i-th matched polarized SAR image pair, where x_i denotes either polarized SAR image in the pair and x_i^+ denotes the other. The first-order similarity loss is calculated as:

L_FOS = (1/N) Σ_{i=1}^{N} max(0, t + d_i^pos − d_i^neg)

d_i^pos = d(x_i, x_i^+)

d_i^neg = min_{j≠i} { d(x_i, x_j^+), d(x_i^+, x_j) }

where t denotes the margin; d(u, v) = ‖u − v‖_2 denotes the L_2 distance between the feature descriptors u and v of any two polarized SAR images; (x_j, x_j^+) denotes the j-th matched polarized SAR image pair, where x_j denotes either polarized SAR image in the pair and x_j^+ the other; d_i^pos denotes the L_2 distance between the depth feature descriptors of the two images in the i-th matched polarized SAR image pair; and d_i^neg denotes the minimum L_2 distance between the depth feature descriptors of the i-th matched polarized SAR image pair and those of the non-matched polarized SAR image pairs.
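A minimal NumPy sketch of the first-order similarity loss follows. The in-batch hard-negative definition (the minimum over both cross terms) is an SOSNet-style assumption consistent with the description above, and the toy descriptors are for demonstration only:

```python
import numpy as np

def first_order_similarity_loss(X, Xp, t=1.0):
    """First-order similarity loss over N positive descriptor pairs.

    X, Xp: (N, D) arrays where row i holds the matched pair (x_i, x_i^+).
    The margin t and the hard-negative mining follow the formulas above.
    """
    N = X.shape[0]
    D1 = np.linalg.norm(X[:, None, :] - Xp[None, :, :], axis=2)  # d(x_i, x_j^+)
    off = ~np.eye(N, dtype=bool)
    d_pos = np.diag(D1)                                          # d(x_i, x_i^+)
    d_neg = np.minimum(
        np.where(off, D1, np.inf).min(axis=1),     # min over j of d(x_i, x_j^+)
        np.where(off, D1.T, np.inf).min(axis=1),   # min over j of d(x_i^+, x_j)
    )
    return float(np.mean(np.maximum(0.0, t + d_pos - d_neg)))

# Well-separated identical pairs: d_pos = 0 and d_neg = 10*sqrt(2) > t,
# so the hinge is inactive and the loss is zero.
X = np.eye(4) * 10.0
assert first_order_similarity_loss(X, X.copy()) == 0.0
```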
Meanwhile, R_SOS is the second-order similarity regularization between depth feature descriptors of matched polarized SAR image pairs. In addition to the first-order constraint imposed by L_FOS, combining information from higher-order similarities can improve the performance of clustering and graph matching, so the depth descriptor learning process is further supervised by applying a second-order constraint on top of the first-order similarity loss.
Training in mini-batches can be viewed as operating on two sets of descriptors with a one-to-one correspondence, {x_i}_{i=1...N} and {x_i^+}_{i=1...N}. In this case, the second-order similarity between x_i and x_i^+ is defined as:

d''(x_i, x_i^+) = sqrt( Σ_{j≠i} ( d(x_i, x_j) − d(x_i^+, x_j^+) )² )

The second-order similarity regularization is defined as:

R_SOS = (1/N) Σ_{i=1}^{N} d''(x_i, x_i^+)

where d(x_i, x_j) denotes the L_2 distance between the feature descriptor of one image in the i-th matched polarized SAR image pair and that of one image in the j-th matched pair; d(x_i^+, x_j^+) denotes the L_2 distance between the feature descriptors of the other images in the i-th and j-th matched pairs; and d''(x_i, x_i^+) measures the similarity between x_i and x_i^+ from the perspective of {x_j}_{j≠i} and {x_j^+}_{j≠i}.
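A matching NumPy sketch of the second-order regularization, under the same illustrative assumptions as the first-order loss sketch:

```python
import numpy as np

def second_order_similarity_reg(X, Xp):
    """R_SOS over N positive pairs: penalizes differences between the
    intra-set distance structures of {x_i} and {x_i^+}."""
    Dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)    # d(x_i, x_j)
    Dp = np.linalg.norm(Xp[:, None, :] - Xp[None, :, :], axis=2)  # d(x_i^+, x_j^+)
    diff2 = (Dx - Dp) ** 2
    np.fill_diagonal(diff2, 0.0)           # the sum runs over j != i
    return float(np.mean(np.sqrt(diff2.sum(axis=1))))

# Identical distance geometry on both sides gives zero regularization:
X = np.eye(4) * 10.0
assert second_order_similarity_reg(X, X.copy()) == 0.0
```

As the surrounding text notes, this term only compares distance structures; it does not by itself pull matching descriptors together, which is why it accompanies L_FOS rather than replacing it.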
R_SOS does not force the distance between matching descriptors to decrease or the distance between non-matching descriptors to increase, so it cannot be used alone without L_FOS; it serves only as a regularization term.
The loss in the training process is calculated through this loss function, and the network weight parameters are updated by back-propagation, continuously training the twin attention network so that it can effectively extract the depth features of the input data and effectively construct depth feature descriptors for subsequent registration.
The technical effects of the embodiment of the invention are as follows:
1. In this embodiment, the polarized SAR images are processed by polarization whitening filtering (PWF) to obtain denoised edge data; threshold constraints are applied to the edge data with an adaptively computed threshold; and morphological erosion and non-maximum suppression are further applied to the constrained edge data to obtain the final key points. This point selection scheme reduces the interference of speckle noise, making the acquired edge data more reliable; it is also very fast, most of the selected key points lie in edge and texture regions of the image, and the number of selected points can be flexibly controlled through non-maximum suppression.
2. In this embodiment, depth feature extraction and depth feature descriptor construction for the data blocks are performed with the twin attention network, whose two branches share training parameters during training, improving training efficiency and quality. The training data are the C3 data of the polarized SAR images, which makes full use of the ground-object scattering information of the polarized SAR data; compared with training on image amplitude information alone, this allows the scattering characteristics of ground targets to be fully learned, is better suited to learning the depth features of polarized SAR images, extracts those depth features more effectively, and yields more robust depth feature descriptors. Meanwhile, 32×32 image blocks centered on each key point are selected to construct the feature descriptors of those points, so registration performance can be further improved without requiring fixed-size input images.
3. By combining the measures, the registration efficiency and registration accuracy of the polarized SAR image can be effectively improved.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the device includes a processor 50, a memory 51, an input device 52, and an output device 53; the number of processors 50 in the device may be one or more, and one processor 50 is taken as an example in fig. 6; the processor 50, memory 51, input device 52, and output device 53 in the device may be connected by a bus or in other ways, with a bus connection taken as an example in fig. 6.
The memory 51 is used as a computer readable storage medium for storing a software program, a computer executable program and a module, such as program instructions/modules corresponding to a polarized SAR image registration method based on an attention mechanism in an embodiment of the present invention. The processor 50 performs various functional applications of the device and data processing, i.e. implements a polarized SAR image registration method based on an attention mechanism as described above, by running software programs, instructions and modules stored in the memory 51.
The memory 51 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for functions; the storage data area may store data created according to the use of the terminal, etc. In addition, memory 51 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, memory 51 may further include memory located remotely from processor 50, which may be connected to the device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input means 52 may be used to receive entered numeric or character information and to generate key signal inputs related to user settings and function control of the device. The output means 53 may comprise a display device such as a display screen.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor implements a polarized SAR image registration method based on the attention mechanism of any of the embodiments.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the essence of the corresponding technical solutions from the technical solutions of the embodiments of the present invention.

Claims (10)

1. A polarized SAR image registration method based on an attention mechanism, comprising:
respectively carrying out polarization whitening filtering on two different polarized SAR images in the same scene to obtain respective filtered images, wherein the two polarized SAR images are C3 data, and the filtered images comprise edges and texture information of the polarized SAR images;
performing adaptive threshold constraint, morphological erosion, and non-maximum suppression operations on the two filtered images to determine key points with salient features in the two polarized SAR images;
respectively inputting data blocks of a set size around each key point in the two polarized SAR images into two branches of a trained twin attention network for depth feature extraction to obtain a depth feature descriptor of each key point, wherein the trained twin attention network enables the depth feature descriptors of matched key points to be consistent;
matching the depth feature descriptors of key points in the two polarized SAR images using a brute-force matching method and a random sample consensus algorithm to obtain a perspective transformation matrix from one polarized SAR image to the other polarized SAR image;
and transforming the one polarized SAR image into a registered image consistent with the coordinates of the other polarized SAR image by utilizing the perspective transformation matrix.
2. The method according to claim 1, wherein the performing the polarization whitening filtering on the two different polarized SAR images in the same scene to obtain respective filtered images includes:
for any one polarized SAR image, the ratio of the standard deviation s to the mean value m of the image is minimized by combining elements in the polarization covariance matrix, and a filtered image is obtained.
3. The method of claim 1, wherein performing adaptive threshold constraint, morphological erosion and non-maxima suppression operations on the two filtered images, determining keypoints of feature saliency in the two filtered images comprises:
calculating an adaptive threshold of any one of the filtered images;
performing morphological erosion on the filtered image using the adaptive threshold;
And performing non-maximum suppression on the corroded image to obtain key points with obvious features in the filtered image, wherein the key points are used as the key points with obvious features in the corresponding polarized SAR image.
4. The method of claim 1, wherein the twin attention network comprises two branches of identical structure, each branch comprising a first convolution module, a first mixed attention module, a second convolution module, a second mixed attention module, and a third convolution module, each mixed attention module comprising a channel attention module and a spatial attention module connected in sequence;
the step of respectively inputting the data blocks with set sizes around each key point in the two polarized SAR images into two branches of a trained twin attention network for depth feature extraction to obtain depth feature descriptors of each key point, comprising:
taking any key point in any polarized SAR image as a center, extracting data blocks with set surrounding sizes and inputting the data blocks into any branch of the twin attention network, wherein the branch comprises the steps of:
extracting the characteristics of the data block through a first convolution module to obtain a first characteristic diagram;
performing feature aggregation and attention weighting on the first feature map in the channel and space dimensions through a first mixed attention module to obtain a shallow feature map;
Extracting features from the shallow feature map and reducing the size of the shallow feature map through a second convolution module to obtain a second feature map;
performing feature aggregation and attention weighting on the second feature map in the channel and space dimensions through a second mixed attention module to obtain a depth feature map;
and extracting the features from the depth feature map through a third convolution module, and reducing the size to obtain a depth feature descriptor.
5. The method of claim 4, wherein the channel attention module comprises a global averaging pooling layer and 2 1 x 1 convolutional layers;
the feature aggregation and the attention weighting are carried out on the first feature map in the channel and space dimensions through a first mixed attention module to obtain a shallow feature map, which comprises the following steps:
inputting the first feature map into the global average pooling layer, and compressing space dimensions to obtain a one-dimensional feature vector;
inputting the one-dimensional feature vector into the 2 sequentially connected 1×1 convolution layers, and learning the attention weight of each channel, wherein the attention weight of each channel increases with the feature saliency of that channel;
and weighting the first feature map in the channel dimension according to the attention weight of each channel to obtain a third feature map.
6. The method of claim 5, wherein the spatial attention module comprises a max pooling layer, an average pooling layer, and a 7 x 7 convolution layer;
the feature aggregation and the attention weighting are carried out on the first feature map in the channel and space dimension through a first mixed attention module to obtain a shallow feature map, and the method further comprises the following steps:
inputting the third feature map into the maximum pooling layer and the average pooling layer respectively, carrying out maximum pooling and average pooling along the channel dimension respectively, and connecting pooling results to generate feature descriptors;
inputting the feature descriptors into the 7 x 7 convolution layer, and learning the attention weight of each spatial position, wherein the attention weight of each spatial position increases with the increase of the feature significance of each spatial position;
and weighting the third feature map in the space dimension according to the attention weight of each space position to obtain a shallow feature map.
7. The method of claim 1, further comprising, before the inputting the data blocks of the set size around each key point in the two polarized SAR images into the two branches of the trained twin attention network for depth feature extraction:
Acquiring a polarized SAR image set, wherein the image set comprises a plurality of matched polarized SAR image pairs, each matched polarized SAR image pair consists of two polarized SAR images in the same shooting scene, and each polarized SAR image is C3 data with the set size;
taking N matched polarized SAR image pairs from the image set, and respectively inputting the two images in each matched polarized SAR image pair into the two branches of the twin attention network to obtain the corresponding depth feature descriptors;
training of the twinning attention network is accomplished by minimizing depth feature descriptor differences for matched polarized SAR image pairs and maximizing depth feature descriptor differences for unmatched polarized SAR image pairs.
8. The method of claim 7, wherein the training of the twin attention network is accomplished by minimizing depth feature descriptor differences for matched polarized SAR image pairs and maximizing depth feature descriptor differences for unmatched polarized SAR image pairs, comprising:
calculating the first-order similarity loss L_FOS between depth feature descriptors of matched polarized SAR image pairs using the formulas:

L_FOS = (1/N) Σ_{i=1}^{N} max(0, t + d_i^pos − d_i^neg)

d_i^pos = d(x_i, x_i^+)

d_i^neg = min_{j≠i} { d(x_i, x_j^+), d(x_i^+, x_j) }

wherein t denotes the margin; d(u, v) = ‖u − v‖_2 denotes the L_2 distance between the feature descriptors u and v of any two polarized SAR images; (x_i, x_i^+) denotes the feature descriptors of the i-th matched polarized SAR image pair, where x_i denotes the feature descriptor of either polarized SAR image in the pair and x_i^+ denotes that of the other; (x_j, x_j^+) denotes the feature descriptors of the j-th matched polarized SAR image pair, where x_j denotes the feature descriptor of either polarized SAR image in the pair and x_j^+ denotes that of the other; d_i^pos denotes the L_2 distance between the depth feature descriptors of the i-th matched polarized SAR image pair; d_i^neg denotes the minimum L_2 distance between the depth feature descriptors of the i-th matched polarized SAR image pair and those of the non-matched polarized SAR image pairs;

calculating the second-order similarity regularization R_SOS between depth feature descriptors of the matched polarized SAR image pairs using the formulas:

d''(x_i, x_i^+) = sqrt( Σ_{j≠i} ( d(x_i, x_j) − d(x_i^+, x_j^+) )² )

R_SOS = (1/N) Σ_{i=1}^{N} d''(x_i, x_i^+)

wherein d(x_i, x_j) denotes the L_2 distance between the feature descriptor of one image in the i-th matched polarized SAR image pair and that of one image in the j-th matched pair; d(x_i^+, x_j^+) denotes the L_2 distance between the feature descriptors of the other images in the i-th and j-th matched pairs; and d''(x_i, x_i^+) measures the similarity between x_i and x_i^+ from the perspective of {x_j}_{j≠i} and {x_j^+}_{j≠i};

calculating the loss function L_T = L_FOS + R_SOS from L_FOS and R_SOS; and

minimizing the loss function, thereby minimizing the depth feature descriptor differences of the matched polarized SAR image pairs and maximizing the depth feature descriptor differences of the unmatched polarized SAR image pairs, to complete the training of the twin attention network.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the attention-based polarized SAR image registration method of any one of claims 1-8.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the attention-based polarized SAR image registration method according to any one of claims 1-8.
Publications (1)

Publication Number Publication Date
CN116363175A true CN116363175A (en) 2023-06-30

Family

ID=86915679


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629343A (en) * 2018-04-28 2018-10-09 湖北民族学院 License plate locating method and system based on edge detection and improved Harris corner detection
CN111127532A (en) * 2019-12-31 2020-05-08 成都信息工程大学 Medical image deformation registration method and system based on deep learning characteristic optical flow
CN114399480A (en) * 2021-12-30 2022-04-26 中国农业大学 Method and device for detecting severity of vegetable leaf disease
CN114445468A (en) * 2022-01-27 2022-05-06 西安电子科技大学 Heterogeneous remote sensing image registration method and system
WO2022105655A1 (en) * 2020-11-23 2022-05-27 中兴通讯股份有限公司 Image processing method, image processing apparatus, electronic device, and computer readable storage medium
CN114882244A (en) * 2022-02-24 2022-08-09 西安电子科技大学 PolSAR image ship detection method based on PolSAR-SIFT key points
CN117671374A (en) * 2023-12-09 2024-03-08 北京无线电测量研究所 Method and device for identifying image target of inverse synthetic aperture radar

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Li Daoliang, "Unmanned Farms", Beijing: China Machine Press, 31 August 2020, p. 70 *
Xiang Deliang et al., "A SAR Image Registration Algorithm Based on Feature-Intersection Keypoint Detection and Sim-CSPNet", Journal of Radars, 3 August 2022 (2022-08-03), pp. 1081-1097 *
Xiang Deliang et al., "PolSAR Image Registration Combining Polarimetric Whitening Filter and SimSD-CapsuleNet", Acta Geodaetica et Cartographica Sinica, vol. 53, no. 3, 20 March 2024 (2024-03-20), pp. 450-462 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710711A (en) * 2024-02-06 2024-03-15 东华理工大学南昌校区 Optical and SAR image matching method based on lightweight depth convolution network
CN117710711B (en) * 2024-02-06 2024-05-10 东华理工大学南昌校区 Optical and SAR image matching method based on lightweight depth convolution network

Similar Documents

Publication Publication Date Title
US11551333B2 (en) Image reconstruction method and device
Tang et al. Sonar image mosaic based on a new feature matching method
CN116363175A (en) Polarized SAR image registration method based on attention mechanism
CN114332633B (en) Radar image target detection and identification method and equipment and storage medium
CN111739071A (en) Rapid iterative registration method, medium, terminal and device based on initial value
CN116664892A (en) Multi-temporal remote sensing image registration method based on cross attention and deformable convolution
Liu et al. Registration of infrared and visible light image based on visual saliency and scale invariant feature transform
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN111798453A (en) Point cloud registration method and system for unmanned auxiliary positioning
Li et al. BCNN: Binary complex neural network
Liu et al. Circle-Net: An unsupervised lightweight-attention cyclic network for hyperspectral and multispectral image fusion
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
Li et al. Salient object detection based on meanshift filtering and fusion of colour information
Gao et al. Multiscale dynamic curvelet scattering network
CN113763274A (en) Multi-source image matching method combining local phase sharpness orientation description
CN112329818B (en) Hyperspectral image non-supervision classification method based on graph convolution network embedded characterization
CN116758419A (en) Multi-scale target detection method, device and equipment for remote sensing image
CN116703992A (en) Accurate registration method, device and equipment for three-dimensional point cloud data and storage medium
CN115601653A (en) High-resolution satellite image matching method
Zhou Superresolution reconstruction of remote sensing image based on generative adversarial network
Huang et al. FAST and FLANN for feature matching based on SURF
CN116468761A (en) Registration method, equipment and storage medium based on probability distribution distance feature description
CN115631341A (en) Point cloud registration method and system based on multi-scale feature voting
CN113066015B (en) Multi-mode remote sensing image rotation difference correction method based on neural network
CN110135474A (en) A kind of oblique aerial image matching method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination