CN113221923A

CN113221923A - Feature decomposition method and system for multi-mode image block matching

Info

Publication number: CN113221923A
Application number: CN202110605524.3A
Authority: CN
Inventors: 王爽; 魏慧媛; 李毅; 段宝瑞; 权豆; 雷睿琪; 杨博武; 焦李成
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2021-08-06
Anticipated expiration: 2041-05-31
Also published as: CN113221923B

Abstract

The invention discloses a characteristic decomposition method and a system for multi-modal image block matching, which are characterized in that a data set of a heterogeneous image block is manufactured; preprocessing an image; performing feature extraction by using an encoder; characteristic decomposition; reconstructing the image block with a decoder; a discriminator; optimizing a network; predicting the matching probability; finally, evaluating network performance shows that the features of the image blocks are decomposed into public features and private features, a countertraining optimization encoder is introduced, reconstruction loss is utilized to ensure that the encoder can extract the information features, and the original image is reconstructed based on the public features and the private features to obtain a final image block matching result. The method utilizes the combined optimization of the four loss functions, not only greatly improves the accuracy of the matching of the heterogeneous images, but also shortens the training period of the network.

Description

Feature decomposition method and system for multi-mode image block matching

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to a feature decomposition method and a feature decomposition system for multi-mode image block matching, which can be used for target tracking, heterologous image registration, image retrieval and the like, and can effectively improve the matching precision of heterologous images.

Background

The purpose of image block matching is to establish local correspondence between image blocks. The method has important application in various remote sensing image processing such as remote sensing image registration, change detection, image splicing, image fusion and the like. In recent years, it has become a trend to acquire more abundant information using images of different sensors. However, the multimode image such as the optical image and the sar image has great difference in appearance and texture, which brings great difficulty to image mosaic matching.

The traditional image block matching method relies on a manually constructed descriptor, and obtains the corresponding relation between image blocks, such as SIFT, according to the characteristic distance. However, the accuracy and robustness of these methods still remain to be improved. In recent years, deep learning has achieved remarkable results in the field of image processing. Descriptors learned from deep networks are more adaptive and robust than artificially designed descriptors, such as MatchNet, L2Net, HardNet. These methods achieve better results in homogeneous patch matching tasks, but still have certain limitations in multi-modal patch matching tasks, especially in optical and sar patch matching tasks. Since the optical image and the sar image have a relatively large difference in appearance and texture, it is difficult to achieve a more accurate matching.

The existing method adopts a progressive sampling strategy to ensure that a network can obtain a large number of training samples within a few rounds, emphasizes the relative distance between descriptors, additionally supervises an intermediate characteristic diagram, considers the compactness of the descriptors, and outputs the descriptors in Euclidean space through L₂The distances are matched and very significant performance is achieved. And a batch-based sampling strategy is used for mining the negative samples, so that the problem of sample imbalance in the L2Net is solved, namely the distance between the nearest positive sample and the nearest negative sample in one batch is maximized, and the performance is better. Both methods do not take into account the decomposition of the features.

Disclosure of Invention

The technical problem to be solved by the present invention is to provide a method and a system for feature decomposition for multi-modal image block matching, which significantly improve the performance of multi-modal image matching.

The invention adopts the following technical scheme:

a feature decomposition method for multi-modal patch matching, comprising the steps of:

s1, making a data set;

s2, normalizing the pixel values of all image blocks in the data set manufactured in the step S1 to [ -1, 1 ];

s3, inputting a pair of visible light optical image blocks and sar image blocks which are matched after normalization in the step S2, and respectively extracting the image block characteristics of the optical image blocks and the sar image blocks by using an encoder;

s4, performing feature decomposition on the features of the optical image blocks and the features of the sar image blocks obtained in the step S3 to obtain the common features O of the optical image blocks_cAnd private characteristics O_pCommon features S of sar image blocks_cAnd private features S_p；

S5, using decoder to process the common characteristic O of the optical image block obtained in the step S4_cAnd private characteristics O_pReconstruction of common features S of optical, sar, image blocks_cAnd private features S_pReconstructing an sar image;

s6, sending the two public features obtained in the step S4 to a discriminator, and distinguishing the two public features from optical image blocks or sar image blocks through the discriminator;

s7, optimizing the encoder by using the triple loss; calculating reconstruction loss of the reconstructed optical image and the sar image obtained in the step S5 and the corresponding original images respectively; introducing countermeasures to optimize the encoder and the discriminator in step S6, so that the encoder spoofs the discriminator; constraining public and private features using differential losses; optimizing the feature decomposition network through triple loss, countermeasure loss, reconstruction loss and difference loss to obtain a training weight for subsequent testing of common features;

and S8, loading the weights trained in the step S7 into the feature decomposition network model, reading all the test set data in sequence, predicting the matching probability of each pair of image blocks in the test set, and obtaining the final image block matching result.

Specifically, in step S3, the encoder includes five convolution layers, the size of the convolution kernel is 3 × 3, and the number of convolution kernels is 32, 32, 64, and 128, respectively.

Specifically, in step S4, the feature decomposition module includes two convolutional layers, the convolutional kernel size of the first convolutional layer is 3 × 3, and the number of channels is 128; the convolution kernel size of the second layer is 8 × 8 and the number of channels is 128.

Specifically, in step S4, the optical common module and the sar common module are respectively used to extract the common features from the optical image block features and the sar image block features, and the optical special module F is used_poAnd sar specific module F_psRespectively extracting the private features of the optical image block features and the sar image block features to obtain the public features and the private features of the optical image and the sar image:

O_c,i,Q_p,i＝F_c(x_opt,i),F_po(x_opt,i)

S_c,i,S_p,i＝F_c(x_sar,i),F_pS(x_sar,i)

wherein, O_c,iAnd O_p,iPublic and private features, S, representing an optical image block_c,iAnd S_p,iRepresenting the public and private features of the sar image block.

Specifically, in step S5, the decoder includes two fully-connected layers and four convolutional layers, and the number of neurons in the input layer, the hidden layer, and the output layer is 256, 512, and 1024, respectively; the sizes of the convolution kernels are 3 × 3, 5 × 5, and 7 × 7, and the numbers of the convolution kernels are 32, 64, 128, and 1, respectively.

Specifically, in step S5, reconstructing the input image block based on the public and private features of the input image is as follows:

where De is the number of bits in the decoder,

is a reconstructed block of the optical image,

is the reconstructed sar image block.

Specifically, in step S6, the discriminator includes five full-link layers, and the number of neurons in each full-link layer is 128, 512, and 2.

Specifically, in step S7, the final loss function obtained by jointly optimizing the four losses is as follows:

L＝L_tri+L_adv+λL_diff+L_rec

wherein L is_triIs a triplet loss, L_advFor encoder and discriminator countermeasures against losses, L_diffFor differential losses, L_recTo reconstruct the loss, λ is the weight in the loss function.

Further, the triplet loss L_triComprises the following steps:

wherein d is_posAnd d_negEuclidean distances of the positive image block pair and the difficult negative block pair respectively;

the differential losses are as follows:

wherein,

for transposition of common features of optical image blocks, O_p,iFor the private features of the optical image block,

for transposition of common features of sar image blocks, S_p,iThe private characteristics of the sar image block are that i is 1, …, n is the number of samples;

loss of reconstruction L_recFrom the reconstructed image and the real image, the following is calculated:

where k is the number of pixels in each image block,

for the reconstructed optical image block, x_opt,iFor an input block of an optical image,

for reconstructed sar image block, x_sar,iAn input sar image block;

the penalty function of the arbiter is:

wherein D is a discriminator, E is an entropy calculation, F_c(x_sar，i) Features extracted by the encoder for the input sar image block, F_c(x_opt，i) Being opti of inputFeatures of the cal image block extracted by an encoder;

the encoder's penalty function is as follows:

wherein E is an entropy calculation, F_c(x_sar,i) Features extracted by the encoder for the input sar image block.

Another technical solution of the present invention is a system for feature decomposition for multi-modal patch matching, comprising:

a data module for making a data set;

the preprocessing module normalizes the pixel values of all image blocks in a data set manufactured by the data module to [ -1, 1 ];

the characteristic module is used for inputting the visible light optical image block and the sar image block which are matched after normalization of the pair of preprocessing modules, and extracting the image block characteristics of the optical image block and the sar image block respectively by using an encoder;

the decomposition module is used for performing characteristic decomposition on the characteristics of the optical image block and the characteristics of the sar image block obtained by the characteristic module to obtain the common characteristics O of the optical image block_cAnd private characteristics O_pCommon features S of sar image blocks_cAnd private features S_p；

A reconstruction module using the common characteristic O of the optical image blocks obtained by the decoder pair decomposition module_cAnd private characteristics O_pReconstruction of common features S of optical, sar, image blocks_cAnd private features S_pReconstructing an sar image;

the distinguishing module is used for sending the two public features obtained by the decomposition module to the discriminator and distinguishing the two public features from optical image blocks or sar image blocks through the discriminator;

an optimization module to optimize the encoder using the triplet losses; respectively calculating reconstruction loss of the reconstructed optical image and the sar image obtained by the reconstruction module and the corresponding original image; introducing countermeasures to optimize the encoder and a discriminator in the distinguishing module, so that the encoder cheats the discriminator; constraining public and private features using differential losses; optimizing the feature decomposition network through triple loss, countermeasure loss, reconstruction loss and difference loss to obtain a training weight for subsequent testing of common features;

and the output module loads the trained weight in the optimization module into the feature decomposition network model, sequentially reads all test set data, predicts the matching probability of each pair of image blocks in the test set and obtains the final image block matching result.

Compared with the prior art, the invention has at least the following beneficial effects:

the invention relates to a feature decomposition method for multi-modal image block matching, which decomposes the features of image blocks into public features and private features through a feature decomposition module, uses the public features for image block matching, and eliminates the influence of larger difference among the multi-modal image blocks, thereby obtaining better matching effect.

Further, an encoder is used for extracting image block features, common characteristics of the bottom layers of the heterogeneous images are deeply mined, and the obtained features are used for subsequent feature decomposition.

Further, in order to eliminate the influence of large differences between the multi-modal image blocks, the feature decomposition module decomposes the features of the image blocks into public features and private features.

Furthermore, compared with the method that all the features of the image are used for matching, the private features are discarded, and only the public features are used for image block matching, modal differences can be eliminated, and a better matching result is obtained.

Further, to ensure that the learned features contain valid information, a decoder is used to reconstruct the image.

Furthermore, the original image is reconstructed based on the public characteristic and the private characteristic, and the encoder can extract the information characteristic.

Further, in order to ensure that the common features learned by the optical image and the sar image are similar, a discriminator is introduced to identify the corresponding modalities and perform counterstudy, and the purpose of the discriminator is to distinguish the common features of the optical image from the common features of the sar image.

Further, the network is optimized through the final loss, and the weight is continuously adjusted to obtain a high matching result.

Further, the encoder is optimized using the triplet losses to make the distance between matched pairs as close as possible and the distance between difficult mismatched pairs as far as possible. In order to extract consistent common features, countermeasures are introduced to optimize the encoder and the discriminator, and the common features learned by the optical image and the sar image are similar. In addition, the reconstruction loss is utilized to ensure that the encoder can extract the informative features in order to reconstruct the original image based on the public and private features. The public and private features are constrained to be different using a disparity penalty.

In conclusion, the method utilizes the four loss functions to jointly optimize, so that the accuracy of the heterogeneous image matching is greatly improved, and the training period of the network is shortened.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a diagram of a network framework of the present invention;

FIG. 3 is some exemplary diagrams of the heterogeneous source data sets used in simulation experiments in accordance with the present invention;

FIG. 4 is a diagram illustrating image block matching results according to the present invention;

FIG. 5 shows the distribution visualization of extracted descriptors, wherein (a) is HardNet and (b) is FDNet.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers and their relative sizes and positional relationships shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, according to actual needs.

The invention provides a characteristic decomposition method for multi-modal image block matching, which is characterized in that a data set of a heterogeneous image block is manufactured; preprocessing an image; performing feature extraction by using an encoder; characteristic decomposition; reconstructing the image block with a decoder; a discriminator; optimizing a network; predicting the matching probability; finally, evaluating the network performance shows that the characteristics of the image block are decomposed into public characteristics and private characteristics, and a confrontation training optimization encoder and a discriminator are introduced. In addition, reconstruction loss is utilized to ensure that the encoder can extract informative features in order to reconstruct the original image based on public and private features. The method of the invention achieves good effect in multi-modal image matching.

Referring to fig. 1 and fig. 2, a feature decomposition method for multi-modal tile matching according to the present invention includes the following steps:

s1, making a data set

Cutting 323464 pairs of patch blocks from the visible light and sar images with 512 × 512 pixel level alignment of 1500 pairs; of these 246121 are used for training on the patch block and the rest are used for testing. The size of the Patch block is 32 × 32;

s2, image preprocessing

Normalizing the pixel values of all image blocks to be between [ -1, 1 ];

s3, extracting image block characteristics by coder

The encoder is composed of five convolutional layers, the size of the convolutional kernel is 3 × 3, and the number of convolutional kernels is 32, 32, 64, and 128, respectively.

Inputting a pair of matched visible optical and sar image blocks P ═ x_opt，x_sar) Respectively extracting the features of the optical image block and the sar image block by using two encoders sharing weight;

s4, feature decomposition

The feature decomposition module is composed of two convolutional layers, the convolutional kernel size of the first convolutional layer is 3 x 3, and the number of channels is 128. The convolution kernel size of the second layer is 8 × 8 and the number of channels is 128.

The features of the optical image block and the features of the sar image block obtained in step S3 are subjected to feature decomposition to obtain the common feature O of the optical image block_cAnd private characteristics O_pCommon features S of sar image blocks_cAnd private features S_p；

The optical image block feature and the sar image block feature obtained in step S3 are respectively subjected to an optical common module and an sar common module F_cExtracting public features from the optical image block features and the sar image block features; using optical specific modules F_poAnd sar specific module F_psAre respectively pairedAnd carrying out private feature extraction on the optical image block features and the sar image block features. In practical applications, the feature decomposition module is implemented by two convolutional layers. And obtaining public characteristics and private characteristics of the optical image and the sar image through characteristic decomposition:

O_c,i,Q_p,i＝F_c(x_opt，i)，F_po(x_opt，i)

S_c，i，S_p,i＝F_c(x_sar,i),F_ps(x_sar,i)

S5, reconstructing the image block by the decoder

The decoder is composed of two full-connection layers and four convolution layers, and the number of neurons of the input layer, the hidden layer and the output layer is 256, 512 and 1024 respectively. The sizes of the convolution kernels are 3 × 3, 5 × 5, and 7 × 7, and the numbers of the convolution kernels are 32, 64, 128, and 1, respectively.

To ensure that the learned features contain valid information, the common features O of the optical image blocks obtained by step S4 are used with a decoder_cAnd private characteristics O_pReconstruction of common features S of optical, sar, image blocks_cAnd private features S_pReconstructing an sar image;

reconstructing an input image block based on public and private features of the input image:

where De is the number of bits in the decoder,

is heavyThe image block of the created optical image is,

is the reconstructed sar image block. In the matching task, it is desirable that the common features of the corresponding optical and sar image blocks are as similar as possible.

S6, discriminator

The discriminator is composed of five fully-connected layers. The number of each full connectivity layer neuron was 128, 512 and 2, respectively.

Distinguishing, with a discriminator, that the two common features obtained from step S4 are from either an optical image block or an sar image block, the common features of the corresponding optical and sar image blocks being expected to be as similar as possible;

s7, network optimization

Optimizing the network through a plurality of loss functions, including triple loss, countermeasure loss, reconstruction loss and difference loss;

the triplet losses are used to optimize the encoder so that the distance between matched pairs is as close as possible and the distance between difficult mismatched pairs is as far as possible. In order to extract consistent common features, countermeasures are introduced to optimize the encoder and the discriminator, and the common features learned by the optical image and the sar image are similar. In addition, the reconstruction loss is utilized to ensure that the encoder can extract the informative features in order to reconstruct the original image based on the public and private features. And uses differential losses to constrain the public and private features to be different. 100 epochs are trained; the learning rate of the encoder and the feature decomposition module is 1.0; the learning rate of the discriminator is 1.0; the learning rate of the decoder is 0.0001; the weight λ in the loss function is 0.001 and the Batchsize is 321.

S701, optimizing an encoder by adopting a difficult sample mining strategy and triple loss;

the distance between matched pairs is made as close as possible and the distance between difficult mismatched pairs is made as far as possible. The triad loss is as follows:

d_pos＝d(O_c,i,S_c,i)

d_neg＝min(d(O_c,j,S_c,i),d(O_c,i,S_c,j)),i≠j

wherein d is_posAnd d_negThe euclidean distances of the pair of positive image blocks and the pair of difficult negative blocks, respectively.

S702, a discriminator is introduced to identify the mode corresponding to the input public characteristic and carry out counterstudy;

the purpose of the discriminator is to distinguish the common features of the optical image from those of the sar image. Thus, the penalty function of the arbiter is:

wherein D is a discriminator, E is an entropy calculation, F_c(x_sar,i) Features extracted by the encoder for the input sar image block, F_c(x_opt,i) Features extracted by the encoder for the input optical image block.

The encoder expects to fool the discriminator into being unable to separate common features of different modalities.

The encoder's penalty function is as follows:

wherein E is an entropy calculation, F_c(x_sar,i) The bits extracted by the encoder are the input sar image block.

S703, in the training process, alternately optimizing the encoder and the discriminator;

furthermore, public and private features should be different.

The differential losses are as follows:

wherein,

s704, cascading the public characteristic and the private characteristic and inputting the public characteristic and the private characteristic into a decoder to obtain a reconstructed image;

the reconstruction loss is calculated from the reconstructed image and the real image as follows:

where k is the number of pixels in each image block,

for reconstructed sar image block, x_sar,iIs the input sar image block.

S705, jointly optimizing four losses, wherein the final loss function is as follows:

L＝L_tri+L_adv+λL_diff+L_rec

s8, predicting matching probability

Loading the weights trained in the step S7 into the model, reading all the test set data in sequence, and predicting the matching probability of each pair of image blocks in the test set;

s9, evaluating network performance

The performance of the network on the disparate source data sets is evaluated by the FPR 95.

In another embodiment of the present invention, a feature decomposition system for multi-modal tile matching is provided, where the system can be used to implement the above-mentioned feature decomposition method for multi-modal tile matching, and specifically, the feature decomposition system for multi-modal tile matching includes a data module, a preprocessing module, a feature module, a decomposition module, a reconstruction module, a distinguishing module, an optimization module, and an output module.

The data module is used for making a data set;

In yet another embodiment of the present invention, a terminal device is provided that includes a processor and a memory for storing a computer program comprising program instructions, the processor being configured to execute the program instructions stored by the computer storage medium. The Processor may be a Central Processing Unit (CPU), or may be other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable gate array (FPGA) or other Programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc., which is a computing core and a control core of the terminal, and is adapted to implement one or more instructions, and is specifically adapted to load and execute one or more instructions to implement a corresponding method flow or a corresponding function; the processor according to the embodiment of the present invention may be used for feature decomposition operation of multi-modal image block matching, including:

making a data set; normalizing pixel values of all image blocks in a dataset to [ -1, 1](ii) a Inputting a pair of visible light optical image blocks and sar image blocks which are matched after normalization, and respectively extracting the image block characteristics of the optical image blocks and the sar image blocks by using an encoder; performing characteristic decomposition on the characteristics of the optical image block and the characteristics of the sar image block to obtain the common characteristic O of the optical image block_cAnd private characteristics O_pCommon features S of sar image blocks_cAnd private features S_p(ii) a Common features O for optical image blocks using a decoder_cAnd private characteristics O_pReconstruction of common features S of optical, sar, image blocks_cAnd private features S_pReconstructing an sar image; sending the two public features to a discriminator, and distinguishing the two public features from optical image blocks or sar image blocks through the discriminator; using a triplet loss optimization encoder; respectively calculating reconstruction loss of the reconstructed optical image and the sar image and the corresponding original image; introducing countermeasures to optimize the encoder and the discriminator so that the encoder spoofs the discriminator; constraining public and private features using differential losses; optimizing the feature decomposition network through triple loss, countermeasure loss, reconstruction loss and difference loss to obtain a training weight for subsequent testing of common features; and loading the trained weight into the feature decomposition network model, sequentially reading all the test set data, predicting the matching probability of each pair of image blocks in the test set, and obtaining the final image block matching result.

In still another embodiment of the present invention, the present invention further provides a storage medium, specifically a computer-readable storage medium (Memory), which is a Memory device in a terminal device and is used for storing programs and data. It is understood that the computer readable storage medium herein may include a built-in storage medium in the terminal device, and may also include an extended storage medium supported by the terminal device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also, one or more instructions, which may be one or more computer programs (including program code), are stored in the memory space and are adapted to be loaded and executed by the processor. It should be noted that the computer-readable storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory.

One or more instructions stored in a computer-readable storage medium may be loaded and executed by a processor to perform the corresponding steps for feature decomposition for multi-modal tile matching in the above embodiments; one or more instructions in the computer-readable storage medium are loaded by the processor and perform the steps of:

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Simulation experiment conditions are as follows:

the hardware platform of the simulation experiment of the invention:

intel (r) Core5 processor of dell computer, main frequency 3.20GHz, memory 64 GB; the simulation software platform is as follows: spyder software (python3.5) version.

Simulation experiment content and result analysis:

the simulation experiment of the invention is divided into two simulation experiments.

Referring to FIG. 3, the present invention uses the disclosed data set. 323464 pairs of patch blocks are cut out from the 512 x 512 sized visible light and sar images aligned at the 1500 pair pixel level. Of these 246121 are used for training on the patch block and the rest are used for testing. The size of the Patch block is 32 × 32. The invention uses the trained network weight to predict the matching probability of each group of data in the test set, and the obtained matching result is shown in figure 4. The first line in the figure is an optical image block and the second line is an sar image block. The first column is real matching patches, the second column is real non-matching patches, the third column is fake matching patches, and the fourth column is fake non-matching patches. It can be seen that there is very similar appearance and semantic information between the mismatched image blocks. The appearances of fake non-matching patches are quite dissimilar. Therefore, it is very difficult to match these confusable image blocks according to appearance.

Simulation experiment 1

The performance of the present invention is compared to the prior art. As shown in the table 1 below, the following examples,

TABLE 1

The image block matching results of the method of the present invention and some existing methods, such as Match-Net, L2-Net and HardNet, are shown. The evaluation indexes used are FPR95 and accuracy, and a smaller FPR95 represents a better matching effect. Where FDNet indicates the results of the present invention, and bold indicates the best results.

It can be seen that the FDNet proposed by the present invention achieves the best matching performance in terms of FPR95 and matching accuracy. Specifically, FPR95 was reduced by 23.1%, 15.7% and 3.1% compared to Match-Net, L2-Net and HardNet, respectively. Meanwhile, the accuracy of the method is respectively improved by 12.4%, 6.7% and 1.4%. Furthermore, the performance of HardNet depends on the size of the minilot. The larger the batchsize, the better the performance. When the blocksize of HardNet is increased, the performance is remarkably improved. However, as the batchsize increases, the amount of memory and computation increases. FDNet achieves the best results even with small batch sizes.

Simulation experiment 2

By adopting the method, the result of the distribution visualization of the descriptors extracted by HardNet and FDNet is shown in FIG. 5. Fig. 5(a) is a HardNet visualization, where opt represents a descriptor of a visible light image and sar represents a descriptor of an sar image. Fig. 5(b) shows an FDNet visualization result, where opt _ com represents a public feature of the visible light image, sar _ com represents a public feature of the sar image, opt _ pri represents a private feature of the visible light image, and sar _ pri represents a private feature of the sar image. Keypoint detection is performed on the same pair of visible-sar images using SIFT, and 32 × 32 sized image blocks are cropped around keypoints. The image blocks are input into HardNet and FDNet extraction descriptors, respectively. The descriptors extracted by HardNet and FDNet were finally processed and visualized separately.

It can be seen that HardNet utilizes a way of sharing weights to constrain the descriptor distribution of visible light images and sar images with certain effect, but has an improved space. FDNet utilizes the mode of feature decomposition and antagonism training to retrain descriptor distribution, makes the private characteristic and the public characteristic separation of light image and sar image, makes the public characteristic distribution of visible light image and sar image more approximate simultaneously, has obtained better matching effect.

In conclusion, the characteristic decomposition method for multi-modal image block matching disclosed by the invention has the advantages that experimental results show that the method achieves a good effect in a multi-modal image matching task.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims

1. A method of feature decomposition for multi-modal patch matching, comprising the steps of:

s1, making a data set;

2. The method of claim 1, wherein in step S3, the encoder includes five convolutional layers, the size of the convolutional kernel is 3 x 3, and the number of convolutional kernels is 32, 32, 64, 64 and 128, respectively.

3. The method of claim 1, wherein in step S4, the feature decomposition module includes two convolutional layers, the convolutional kernel size of the first convolutional layer is 3 × 3, and the number of channels is 128; the convolution kernel size of the second layer is 8 × 8 and the number of channels is 128.

4. The method according to claim 1, wherein in step S4, the optical common module and the sar common module are respectively used to extract common features from the optical image block features and the sar image block features, and the optical special module F is used_poAnd sar specific module F_psRespectively extracting the private features of the optical image block features and the sar image block features to obtain the public features and the private features of the optical image and the sar image:

O_c，i，O_p，i＝F_c(x_opt，i)，F_po(x_opt，i)

S_c，i，S_p，i＝F_c(x_sar，i)，F_ps(x_sar，i)

wherein O is_c，iAnd O_p，iPublic and private features, S, representing an optical image block_c，iAnd S_p，iPublic and private features representing sar image blocks。

5. The method according to claim 1, wherein in step S5, the decoder comprises two fully-connected layers and four convolutional layers, and the number of neurons in the input layer, the hidden layer and the output layer is 256, 512 and 1024 respectively; the sizes of the convolution kernels are 3 × 3, 5 × 5, and 7 × 7, and the numbers of the convolution kernels are 32, 64, 128, and 1, respectively.

6. The method according to claim 1, wherein in step S5, reconstructing the input image block based on the public and private features of the input image is as follows:

where De is the number of bits in the decoder,

is a reconstructed block of the optical image,

is the reconstructed sar image block.

7. The method of claim 1, wherein in step S6, the arbiter comprises five full-connection layers, and the number of each full-connection layer neuron is 128, 512 and 2.

8. The method according to claim 1, wherein in step S7, jointly optimizing the four losses results in a final loss function as follows:

L＝L_tri+L_adv+λL_diff+L_rec

9. The method of claim 8, wherein the triplet loss L is_triComprises the following steps:

the differential losses are as follows:

wherein,

for transposition of common features of optical image blocks, O_p，iFor the private features of the optical image block,

for transposition of common features of sar image blocks, S_p，iThe private characteristics of the sar image block are that i is 1, …, n is the number of samples;

where k is the number of pixels in each image block,

for the reconstructed optical image block, x_opt，iFor an input block of an optical image,

for reconstructed sar image block, x_sar，iAn input sar image block;

the penalty function of the arbiter is:

wherein D is a discriminator, E is an entropy calculation, F_c(x_sar，i) Features extracted by the encoder for the input sar image block, F_c(x_opt，i) Features extracted by an encoder for an input optical image block;

the encoder's penalty function is as follows:

wherein E is an entropy calculation, F_c(x_sar，i) Features extracted by the encoder for the input sar image block.

10. A feature decomposition system for multi-modal patch matching, comprising:

a data module for making a data set;

a decomposition module for performing decomposition on the features of the optical image block and the sar image block obtained by the feature moduleDecomposing the line characteristics to obtain the common characteristics O of the optical image blocks_cAnd private characteristics O_pCommon features S of sar image blocks_cAnd private features S_p；