CN114491135A - Cross-view angle geographic image retrieval method based on variation information bottleneck - Google Patents

Cross-view angle geographic image retrieval method based on variation information bottleneck

Info

Publication number
CN114491135A
Authority
CN
China
Prior art keywords
image
cross
view
information bottleneck
variation information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210352920.4A
Other languages
Chinese (zh)
Inventor
徐行
胡谦
李宛思
沈复民
申恒涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Koala Youran Technology Co ltd
Original Assignee
Chengdu Koala Youran Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Koala Youran Technology Co ltd
Priority to CN202210352920.4A
Publication of CN114491135A
Legal status: Pending (Current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/587 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using geographical or spatial information, e.g. location
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The invention discloses a cross-view geographic image retrieval method based on a variational information bottleneck, relating to the technical field of cross-view geographic image retrieval in computer vision. In a conventional retrieval model the classifier converges quickly during training, so the gradients it produces carry too little information to train the feature extraction module effectively; the retrieval model therefore tends to overfit and performs poorly on the test data set. The disclosed method uses a variational information bottleneck module to add Gaussian noise to the classifier during training, forcing the feature extraction module to extract view-invariant and discriminative image representations so as to improve the generalization ability and robustness of the retrieval model, and uses the features compressed by the variational information bottleneck module as the retrieval features, thereby improving the accuracy of the retrieval results.

Description

Cross-view angle geographic image retrieval method based on variation information bottleneck
Technical Field
The invention relates to the technical field of cross-view geographic image retrieval in computer vision, and in particular to a cross-view geographic image retrieval method based on a variational information bottleneck.
Background
Cross-view geographic image retrieval matches images of the same geographic target captured from different views, such as a ground view and a satellite view; for example, given a ground-view query image, the satellite image of the same geographic target is searched for among the satellite-view candidate images. The task has wide applications, for instance autonomous driving, where precise geographic target localization must be achieved, and therefore has great application value and economic benefit.
Cross-view geographic image retrieval is a challenging task because extreme viewpoint changes cause large changes in visual appearance; in recent years, however, the task has advanced considerably.
Traditional methods focus on mining the feature representation of the geographic target at the image center but ignore the contextual information of its neighboring regions. The present method therefore uses the regions adjacent to the central geographic target as auxiliary information to enrich the discriminative cues, which clearly improves the retrieval performance. The method is built on a variational information bottleneck module: Gaussian noise is added to the output of the feature extraction module so that the classifier becomes robust to noise, the feature extraction module is forced to extract view-invariant and discriminative image representations, and the generalization ability of the cross-view geographic image retrieval model based on the variational information bottleneck is improved.
Disclosure of Invention
The invention aims to provide a cross-view geographic image retrieval method based on a variational information bottleneck that improves the generalization ability and robustness of the retrieval model, uses the features compressed by the variational information bottleneck module as retrieval features, and obtains view-invariant and discriminative image representations as the retrieval features, thereby improving the accuracy of the retrieval results.
The invention specifically adopts the following technical scheme for realizing the purpose:
a cross-view angle geographic image retrieval method based on variation information bottleneck comprises the following steps:
Step S1: selecting a commonly used cross-view geographic image data set, which comprises a train split and a val split and contains images of two views, namely ground-view images and satellite-view images;
step S2: training a cross-view angle geographic image retrieval model based on variation information bottleneck;
Step S3: testing the cross-view geographic image retrieval model based on the variational information bottleneck; selecting any ground-view image, inputting it into the cross-view geographic image retrieval model based on the variational information bottleneck obtained in step S2 to obtain the mean U_i^j of the output feature Z_i^j, concatenating the U_i^j row-wise into a single feature that serves as the retrieval feature, and thereby retrieving the satellite-view image containing the same target as the ground-view image.
As a preferred technical scheme, the cross-view angle geographic image retrieval model based on the variation information bottleneck comprises a feature extraction module, a variation information bottleneck module and a classifier module;
a feature extraction module: is a ResNet-50 model pre-trained on an ImageNet dataset to extract features of an input image;
a variational information bottleneck module: consists of an encoder whose input is V_i^j; the encoder has two linear layers, of dimension 512, as output layers, and the two output feature vectors serve respectively as the mean and the variance learned by the variational information bottleneck module;
the classifier module sequentially comprises a full connection layer, a batch processing normalization layer, a Dropout layer and a linear classification layer, wherein the dimension of the linear classification layer is the number of classes of a classification target.
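For illustration only, a minimal PyTorch-style sketch of these three modules is given below; it is one reading of the description, not the patent's own code. The class and argument names (VIBEncoder, Classifier, CrossViewVIBModel, num_parts, hidden_dim) are introduced here, and the hidden dimension and dropout probability of the classifier are assumptions, since only the 512-dimensional bottleneck and the class-count output dimension are stated.

```python
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

class VIBEncoder(nn.Module):
    """Variational information bottleneck encoder: two linear heads of dimension 512
    produce the mean and the variance learned by the module."""
    def __init__(self, in_dim=2048, out_dim=512):
        super().__init__()
        self.fc_mu = nn.Linear(in_dim, out_dim)    # mean head
        self.fc_var = nn.Linear(in_dim, out_dim)   # variance head (treated as a log-variance in the later sketches)

    def forward(self, v):
        return self.fc_mu(v), self.fc_var(v)

class Classifier(nn.Module):
    """Fully connected layer -> batch normalization -> Dropout -> linear classification layer."""
    def __init__(self, in_dim, hidden_dim, num_classes, p=0.5):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.Dropout(p),
            nn.Linear(hidden_dim, num_classes),    # output dimension = number of classification targets
        )

    def forward(self, z):
        return self.block(z)

class CrossViewVIBModel(nn.Module):
    """Feature extractor (ImageNet-pretrained ResNet-50) + VIB encoder + classifier."""
    def __init__(self, num_classes, num_parts=8):
        super().__init__()
        backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # keep the spatial feature map
        self.num_parts = num_parts
        self.vib = VIBEncoder(2048, 512)
        self.classifier = Classifier(512, 512, num_classes)
```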
As a preferable technical scheme, the feature extraction module adopts a square-ring feature partition strategy to extract image features and assigns attention according to the distance from the peripheral regions of the image to the image center, thereby enriching the discriminative cues of the image features.
As a preferred technical solution, the feature extraction module is specifically operative to:
the input image x^j is resized to 256 × 256 and fed into the feature extraction module to obtain the image feature map R^j, where x^j ∈ {x^d, x^s}, with x^d and x^s denoting the two different views: x^d represents the ground view and x^s represents the satellite view;
the feature map is then divided into i square-ring parts using the square-ring feature partition design, denoted R_i^j = P_slice(R^j, i); each part is average-pooled to obtain a 2048-dimensional feature, denoted V_i^j = Avgpool(R_i^j), where P_slice is the square-ring feature partition operation and Avgpool is the average pooling operation.
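A possible implementation of the square-ring partition and average pooling is sketched below, assuming equal-width concentric square rings around the spatial center of the feature map; the exact ring boundaries are not specified in the patent, and the function name square_ring_partition is introduced here.

```python
import torch

def square_ring_partition(feat_map: torch.Tensor, num_parts: int = 8):
    """Split a feature map of shape (B, C, H, W) into `num_parts` concentric square rings
    around the spatial center (P_slice) and average-pool each ring into a (B, C) vector (Avgpool)."""
    b, c, h, w = feat_map.shape
    ys = torch.arange(h, dtype=torch.float32, device=feat_map.device) - (h - 1) / 2
    xs = torch.arange(w, dtype=torch.float32, device=feat_map.device) - (w - 1) / 2
    # Chebyshev (square) distance of every spatial cell to the center, normalized to [0, 1)
    dist = torch.max(ys.abs().unsqueeze(1) / (h / 2), xs.abs().unsqueeze(0) / (w / 2))
    ring_idx = torch.clamp((dist * num_parts).long(), max=num_parts - 1)   # ring index per cell
    parts = []
    for i in range(num_parts):
        mask = (ring_idx == i).to(feat_map.dtype)
        area = mask.sum().clamp(min=1.0)
        parts.append((feat_map * mask).sum(dim=(2, 3)) / area)   # V_i^j, shape (B, C)
    return parts
```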
As a preferred technical solution, the step S2 specifically includes:
Step S2.1: extracting the image features of the train split with the feature extraction module, whose input is a pair of images of different views, denoted the ground-view image x^d and the satellite-view image x^s;
Step S2.2: feeding the ground-view image x^d into the feature extraction module to obtain the image feature map R^d; applying the square-ring feature partition strategy and average pooling to obtain the part features V_i^d;
Step S2.3: processing the satellite-view image x^s with the same operations as the ground-view image x^d to obtain the satellite-view part features V_i^s;
Step S2.4: feeding the features V_i^d and V_i^s of the two views obtained in steps S2.2 and S2.3 into the variational information bottleneck module to obtain their respective means and variances, then applying the reparameterization to obtain the output features Z_i^d and Z_i^s;
Step S2.5: the reparameterization samples an ε from the standard normal distribution N(0, I) and combines it with the mean and variance learned by the variational information bottleneck module according to the following formula, where the dimension of I is the same as that of the output features Z_i^d and Z_i^s (an illustrative sketch of this step and of the loss computation is given after step S2.9);
Z = μ + σ * ε
where μ denotes the mean, σ the variance, and ε a value sampled at random from the normal distribution N(0, I) serving as the added Gaussian noise, which injects a perturbation into the training of the classifier;
Step S2.6: feeding the two resampled image features Z_i^d and Z_i^s obtained in step S2.5 into the classifier module and computing the classification loss;
Step S2.7: in order to enhance the generalization ability and robustness of the cross-view geographic image retrieval model based on the variational information bottleneck and to prevent the variance output by the variational information bottleneck module from collapsing to zero during training, the KL distance between Z_i^d and the standard normal distribution is computed from the mean and variance of Z_i^d, and the same computation is performed for the output feature Z_i^s; the weight of the resulting KL-distance loss in the total loss function is controlled by the parameter β;
specifically: the KL distances of the output features Z_i^d and Z_i^s from the standard normal distribution are computed, and the total loss function L_VIB of the whole cross-view geographic image retrieval model based on the variational information bottleneck is:
L_VIB = L_cls + β * D_KL[p(Z|x) || r(Z)]
where D_KL denotes the KL distance, r(Z) denotes the prior distribution, here the standard normal distribution, p(Z|x) denotes the predicted distribution of the feature Z of the input image x, whose parameters are the mean and variance learned by the model, β is a weight hyperparameter whose value is set in the specific embodiments, and L_cls is the cross-entropy classification loss function;
Step S2.8: optimizing the total loss function L_VIB of the cross-view geographic image retrieval model based on the variational information bottleneck by stochastic gradient descent and recording the computed total loss value;
Step S2.9: repeating steps S2.1 to S2.8 to train the cross-view geographic image retrieval model based on the variational information bottleneck on the train split of the cross-view geographic image data set; training stops once the total loss value no longer decreases, which indicates that the model has converged, and the model is saved as the final cross-view geographic image retrieval model based on the variational information bottleneck for testing.
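A minimal sketch of the reparameterization in step S2.5 and of the loss computation in steps S2.6 and S2.7 is given below. It assumes the variance head is trained as a log-variance and that the KL term against the standard normal prior is evaluated in closed form; both are common conventions rather than statements of the patent, and the function names reparameterize, kl_to_standard_normal and vib_loss are introduced here.

```python
import torch
import torch.nn.functional as F

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Z = mu + sigma * eps, with eps drawn from N(0, I) of the same shape as Z."""
    sigma = torch.exp(0.5 * logvar)
    eps = torch.randn_like(sigma)        # Gaussian noise that perturbs the classifier input
    return mu + sigma * eps

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Closed-form D_KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch."""
    return 0.5 * torch.mean(torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0, dim=-1))

def vib_loss(logits, labels, mu, logvar, beta):
    """Total loss L_VIB = L_cls + beta * D_KL[p(Z|x) || r(Z)] for one view and one part."""
    return F.cross_entropy(logits, labels) + beta * kl_to_standard_normal(mu, logvar)
```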
As a preferred technical solution, in step S2.6 the classification loss function is the cross-entropy function; the cross-entropy classification loss L_cls is as follows:
L_cls = -Σ_j Σ_i log( exp(g_i^j(y)) / Σ_c exp(g_i^j(c)) )
where the superscript j denotes the view, taking d for the ground view and s for the satellite view, i denotes the i-th divided part, c ranges over the classification targets, g_i^j(y) is the predicted score for the true class label y, and g_i^j(c) are the predicted scores for the other classification targets.
The invention has the following beneficial effects:
1. By adding the variational information bottleneck module, the invention improves the robustness and generalization ability of the cross-view geographic image retrieval model based on the variational information bottleneck, obtains view-invariant and discriminative feature representations as the retrieval features, and improves the accuracy of cross-view geographic image retrieval.
Drawings
FIG. 1 is a flow chart of a retrieval method of the present invention;
FIG. 2 is a network framework diagram of the search model of the present invention;
FIG. 3 is a diagram of retrieval results on CVACT_val according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings in the embodiments of the present invention; it is apparent that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, as presented in FIGS. 1-3, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
A cross-view angle geographic image retrieval method based on variation information bottleneck comprises the following steps:
Step S1: selecting a commonly used cross-view geographic image data set, which comprises a train split and a val split and contains images of two views, namely ground-view images and satellite-view images;
step S2: training a cross-view angle geographic image retrieval model based on variation information bottleneck;
Step S3: testing the cross-view geographic image retrieval model based on the variational information bottleneck; selecting any ground-view image, inputting it into the cross-view geographic image retrieval model based on the variational information bottleneck obtained in step S2 to obtain the mean U_i^j of the output feature Z_i^j, concatenating the U_i^j row-wise into a single feature that serves as the retrieval feature, and thereby retrieving the satellite-view image containing the same target as the ground-view image.
In specific operation: the generalization ability and robustness of the retrieval model are improved, the features compressed by the variational information bottleneck module are used as retrieval features, and view-invariant and discriminative image representations are obtained, thereby improving the accuracy of the retrieval results.
Example 2
As shown in fig. 1, the invention relates to a cross-view geographic image retrieval method based on variation information bottleneck, comprising the following steps:
Step A1: selecting the commonly used public cross-view geographic image data set CVACT; CVACT is a large-scale public cross-view data set that provides 35,532 pairs of ground-view and satellite-view images for training and an additional 8,884 image pairs as a test set, namely CVACT_val; in the CVACT_val test set each query image has exactly one correct match, i.e. a given ground-view query image is matched by exactly one correct satellite-view image.
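For illustration, a paired loader for such a data set could look as follows; the directory layout (parallel "ground" and "satellite" folders with matching file names) and the use of the location index as the class label are assumptions introduced here, not details of the CVACT release.

```python
import os
from PIL import Image
from torch.utils.data import Dataset

class CVACTPairs(Dataset):
    """Yields (ground-view image, satellite-view image, location label) triples."""
    def __init__(self, root, transform=None):
        self.ground_dir = os.path.join(root, "ground")        # assumed layout
        self.satellite_dir = os.path.join(root, "satellite")  # assumed layout
        self.names = sorted(os.listdir(self.ground_dir))
        self.transform = transform

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        ground = Image.open(os.path.join(self.ground_dir, name)).convert("RGB")
        satellite = Image.open(os.path.join(self.satellite_dir, name)).convert("RGB")
        if self.transform is not None:
            ground, satellite = self.transform(ground), self.transform(satellite)
        return ground, satellite, idx   # location index used as the class label
```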
Step A2 data preprocessing
Data preprocessing adjusts the images of the input train split to a fixed size of 256 × 256 and then randomly flips them to increase the diversity of the training samples.
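A torchvision sketch of this preprocessing is shown below; the flip axis and probability are assumptions, since the patent only states that the images are randomly flipped.

```python
from torchvision import transforms

# Resize to a fixed 256 x 256, then randomly flip to diversify the training samples.
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomHorizontalFlip(p=0.5),   # flip axis/probability assumed here
    transforms.ToTensor(),
])
```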
Step A3: cross-view angle geographic image retrieval model based on variation information bottleneck in training
In this embodiment, the network framework of the cross-view geographic image retrieval model based on the variational information bottleneck is shown in FIG. 2.
Step A3.1: the images preprocessed in step A2 are fed into the feature extraction module to extract image features, and the square-ring feature partition strategy is then applied to obtain the input features V_i^s (satellite-view image representation) and V_i^d (ground-view image representation), as detailed below:
for the two views of the CVACT data set there are two processing branches, a satellite-view branch and a ground-view branch; because the ground-view images of the CVACT data set have a wide field of view of the ground, the CVACT image features adopt a square-ring partition design with 8 blocks, i.e. the extracted image features are divided into 8 square rings according to the distance from the neighboring regions to the image center, which then yields the input features V_i^s and V_i^d.
Step A3.2: the input features V_i^d and V_i^s obtained in step A3.1 are fed into the variational information bottleneck module to obtain their respective means and variances, after which the reparameterization yields the output features Z_i^d and Z_i^s.
The reparameterization samples an ε from the standard normal distribution N(0, I) and combines it with the mean and variance learned by the variational information bottleneck module according to the following formula, where the dimension of I is the same as that of the output features Z_i^d and Z_i^s;
Z = μ + σ * ε
where μ denotes the mean, σ the variance, and ε a value sampled at random from the normal distribution N(0, I) serving as the added Gaussian noise, which injects a perturbation into the training of the classifier;
Step A3.3: the output features Z_i^d and Z_i^s obtained by the resampling in step A3.2 are fed into the classifier module and the classification loss is computed, with the cross-entropy function adopted as the classification loss function:
L_cls = -Σ_j Σ_i log( exp(g_i^j(y)) / Σ_c exp(g_i^j(c)) )
where the superscript j denotes the view, taking d for the ground view and s for the satellite view, i denotes the i-th divided part, c ranges over the classification targets, g_i^j(y) is the predicted score for the true class label y, and g_i^j(c) are the predicted scores for the other classification targets.
Step A3.4: aiming at enhancing generalization capability and robustness of cross-perspective geographic image retrieval model based on variation information bottleneckThe variance output by the variational information bottleneck module training is prevented from being zero, and the final output characteristic Z is enabled to bei dAnd Zi sApproximate standard distribution, calculating output characteristic Zi dAnd Zi sKL distance (Kullback-Leibler Divergence) from standard normal distribution, and finally, a specific formula L of a loss function of the whole cross-view angle geographic image retrieval model based on variation information bottleneckVIBThe following were used:
LVIB =L cls +β*DKL[[p(Z|x), r(z)]]
wherein DKLRepresenting calculation of KL distance (Kullback-Leibler Divergence), r (Z) representing prior distribution, here normal distribution, p (Z | x) representing prediction distribution of characteristic Z of input image x, specific values including mean and variance of cross-view geographic image retrieval model learning based on variation information bottleneck, beta being weight hyper-parameter, and being initialized to 10-6And increases as the number of iterations of the training increases, by multiplying the initialization value by the number of iterations,L cls classifying a loss function for cross entropy;
Step A3.5: the total loss function L_VIB of the cross-view geographic image retrieval model based on the variational information bottleneck is optimized by stochastic gradient descent, and the computed total loss value is recorded;
Step A3.6: steps A2 to A3.5 are repeated to train the cross-view geographic image retrieval model based on the variational information bottleneck on the train split of the CVACT data set; training stops once the total loss value no longer decreases, which indicates that the model has converged, and the model is saved as the cross-view geographic image retrieval model based on the variational information bottleneck for the final test;
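A condensed sketch of the training loop in steps A3.1 to A3.6 is given below, reusing the components from the earlier sketches (CrossViewVIBModel, square_ring_partition, reparameterize, vib_loss). The per-part losses are summed over the 8 square-ring parts and both views, and β is increased by multiplying its initial value 10^-6 by the iteration count as described above; the epoch count, learning rate and momentum are assumptions.

```python
from torch.optim import SGD

def train(model, train_loader, num_epochs=120, beta_init=1e-6, lr=0.01):
    """Optimize L_VIB = L_cls + beta * KL with stochastic gradient descent."""
    optimizer = SGD(model.parameters(), lr=lr, momentum=0.9)
    iteration = 0
    for _ in range(num_epochs):
        for ground_img, satellite_img, labels in train_loader:
            iteration += 1
            beta = beta_init * iteration                      # beta grows with the iteration count
            loss = 0.0
            for img in (ground_img, satellite_img):           # ground-view and satellite-view branches
                feat_map = model.backbone(img)
                for v in square_ring_partition(feat_map, model.num_parts):   # part features V_i^j
                    mu, logvar = model.vib(v)
                    z = reparameterize(mu, logvar)            # Z = mu + sigma * eps
                    logits = model.classifier(z)
                    loss = loss + vib_loss(logits, labels, mu, logvar, beta)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```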
step A4: cross-view angle geographic image retrieval model test based on variation information bottleneck
Selecting any ground-view image, inputting it into the cross-view geographic image retrieval model based on the variational information bottleneck obtained in step A3.6 to obtain the mean U_i^j of the output feature Z_i^j, concatenating the U_i^j row-wise into a single feature that serves as the retrieval feature, and thereby retrieving the satellite-view image containing the same target as the ground-view image.
The test results on the CVACT test set are shown in Table 1; the evaluation metric Recall@K (R@K, K = 1, 5, 10) is reported to assess the retrieval performance of the model. (R@K is the proportion of queries whose correctly matched image appears in the top K of the ranking list; the higher the R@K value, the better the model. In Table 1, ground -> satellite means that a given ground-view image is used to retrieve a satellite-view image.)
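The test-time feature extraction of step A4 and the Recall@K metric can be sketched as follows, reusing the earlier components; L2 normalization and Euclidean distance are assumptions, since the patent does not name the similarity measure, and the function names are introduced here.

```python
import torch
import torch.nn.functional as F

def extract_retrieval_feature(model, img):
    """Concatenate the per-part means U_i^j row-wise into a single retrieval feature."""
    feat_map = model.backbone(img)
    means = [model.vib(v)[0] for v in square_ring_partition(feat_map, model.num_parts)]
    return F.normalize(torch.cat(means, dim=1), dim=1)        # shape (B, num_parts * 512)

def recall_at_k(query_feats, gallery_feats, gt_indices, ks=(1, 5, 10)):
    """R@K: fraction of queries whose single correct gallery image ranks in the top K."""
    dist = torch.cdist(query_feats, gallery_feats)            # pairwise Euclidean distances
    ranking = dist.argsort(dim=1)                             # closest gallery images first
    hits = ranking == gt_indices.unsqueeze(1)                 # (num_queries, num_gallery)
    return {k: hits[:, :k].any(dim=1).float().mean().item() for k in ks}
```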
Table 1: Comparison of model performance on CVACT_val
On CVACT_val, the present invention is compared with other state-of-the-art methods. The results are shown in Table 1, where bold numbers indicate that the present invention improves the retrieval metric relative to the other methods; it can be observed that the invention achieves 81.04% Recall@1 on ground -> satellite retrieval, clearly outperforming the other methods.
This demonstrates the effectiveness of the cross-view geographic image retrieval method based on the variational information bottleneck.
As shown in FIG. 3, the retrieval results on CVACT_val are visualized, ranked from the highest to the lowest similarity. FIG. 3 shows the top-3 ground -> satellite retrieval results on CVACT_val, where √ marks a correctly retrieved image; it can be seen from the figure that the invention accurately retrieves the most relevant and correct image, which further and intuitively illustrates the effectiveness of the invention in the practical cross-view geographic image retrieval task.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. A cross-view angle geographic image retrieval method based on variation information bottleneck is characterized by comprising the following steps:
Step S1: selecting a commonly used cross-view geographic image data set, which comprises a train split and a val split and contains images of two views, namely ground-view images and satellite-view images;
step S2: training a cross-view angle geographic image retrieval model based on variation information bottleneck;
Step S3: testing the cross-view geographic image retrieval model based on the variational information bottleneck; selecting any ground-view image, inputting it into the cross-view geographic image retrieval model based on the variational information bottleneck obtained in step S2 to obtain the mean U_i^j of the output feature Z_i^j, concatenating the U_i^j row-wise into a single feature that serves as the retrieval feature, and thereby retrieving the satellite-view image containing the same target as the ground-view image.
2. The cross-perspective geographic image retrieval method based on the variation information bottleneck, according to claim 1, is characterized in that the cross-perspective geographic image retrieval model based on the variation information bottleneck comprises a feature extraction module, a variation information bottleneck module and a classifier module;
a feature extraction module: is a ResNet-50 model pre-trained on an ImageNet dataset to extract features of an input image;
a variational information bottleneck module: consists of an encoder whose input is V_i^j; the encoder has two linear layers, of dimension 512, as output layers, and the two output feature vectors serve respectively as the mean and the variance learned by the variational information bottleneck module;
the classifier module sequentially comprises a full connection layer, a batch processing normalization layer, a Dropout layer and a linear classification layer, wherein the dimension of the linear classification layer is the number of classes of a classification target.
3. The cross-view geographic image retrieval method based on the variational information bottleneck as claimed in claim 2, wherein the feature extraction module adopts a square ring feature partitioning strategy to extract image features, and provides attention according to the distance from the peripheral region of the image to the center of the image, thereby enriching the distinguishing clues of the image features.
4. The cross-perspective geographic image retrieval method based on variational information bottlenecks of claim 3, wherein the feature extraction module is specifically operative to:
the input image x^j is resized to 256 × 256 and fed into the feature extraction module to obtain the image feature map R^j, where x^j ∈ {x^d, x^s}, with x^d and x^s denoting the two different views: x^d represents the ground view and x^s represents the satellite view;
the feature map is then divided into i square-ring parts using the square-ring feature partition design, denoted R_i^j = P_slice(R^j, i); each part is average-pooled to obtain a 2048-dimensional feature, denoted V_i^j = Avgpool(R_i^j), where P_slice is the square-ring feature partition operation and Avgpool is the average pooling operation.
5. The method for retrieving the cross-perspective geographic image based on the variation information bottleneck as claimed in claim 1, wherein the step S2 specifically comprises:
Step S2.1: extracting the image features of the train split with the feature extraction module, whose input is a pair of images of different views, denoted the ground-view image x^d and the satellite-view image x^s;
Step S2.2: feeding the ground-view image x^d into the feature extraction module to obtain the image feature map R^d; applying the square-ring feature partition strategy and average pooling to obtain the part features V_i^d;
Step S2.3: processing the satellite-view image x^s with the same operations as the ground-view image x^d to obtain the satellite-view part features V_i^s;
Step S2.4: feeding the features V_i^d and V_i^s of the two views obtained in steps S2.2 and S2.3 into the variational information bottleneck module to obtain their respective means and variances, then applying the reparameterization to obtain the output features Z_i^d and Z_i^s;
Step S2.5: the reparameterization samples an ε from the standard normal distribution N(0, I) and combines it with the mean and variance learned by the variational information bottleneck module according to the following formula, where the dimension of I is the same as that of the output features Z_i^d and Z_i^s;
Z = μ + σ * ε
where μ denotes the mean, σ the variance, and ε a value sampled at random from the normal distribution N(0, I) serving as the added Gaussian noise, which injects a perturbation into the training of the classifier;
Step S2.6: feeding the two resampled image features Z_i^d and Z_i^s obtained in step S2.5 into the classifier module and computing the classification loss;
Step S2.7: computing the KL distances of the output features Z_i^d and Z_i^s from the standard normal distribution, the total loss function L_VIB of the whole cross-view geographic image retrieval model based on the variational information bottleneck being:
L_VIB = L_cls + β * D_KL[p(Z|x) || r(Z)]
where D_KL denotes the KL distance, r(Z) denotes the prior distribution, here the standard normal distribution, p(Z|x) denotes the predicted distribution of the feature Z of the input image x, whose parameters are the mean and variance learned by the model, β is a weight hyperparameter whose value is set in the specific embodiments, and L_cls is the cross-entropy classification loss;
Step S2.8: optimizing the total loss function L_VIB of the cross-view geographic image retrieval model based on the variational information bottleneck by stochastic gradient descent and recording the computed total loss value;
Step S2.9: repeating steps S2.1 to S2.8 to train the cross-view geographic image retrieval model based on the variational information bottleneck on the train split of the cross-view geographic image data set; training stops once the total loss value no longer decreases, which indicates that the model has converged, and the model is saved as the final cross-view geographic image retrieval model based on the variational information bottleneck for testing.
6. The cross-view geographic image retrieval method based on a variational information bottleneck as claimed in claim 5, wherein in step S2.6 the classification loss function is the cross-entropy function, and the cross-entropy classification loss L_cls is as follows:
L_cls = -Σ_j Σ_i log( exp(g_i^j(y)) / Σ_c exp(g_i^j(c)) )
where the superscript j denotes the view, taking d for the ground view and s for the satellite view, i denotes the i-th divided part, c ranges over the classification targets, g_i^j(y) is the predicted score for the true class label y, and g_i^j(c) are the predicted scores for the other classification targets.
CN202210352920.4A 2022-04-06 2022-04-06 Cross-view angle geographic image retrieval method based on variation information bottleneck Pending CN114491135A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210352920.4A CN114491135A (en) 2022-04-06 2022-04-06 Cross-view angle geographic image retrieval method based on variation information bottleneck

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210352920.4A CN114491135A (en) 2022-04-06 2022-04-06 Cross-view angle geographic image retrieval method based on variation information bottleneck

Publications (1)

Publication Number Publication Date
CN114491135A true CN114491135A (en) 2022-05-13

Family

ID=81489174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210352920.4A Pending CN114491135A (en) 2022-04-06 2022-04-06 Cross-view angle geographic image retrieval method based on variation information bottleneck

Country Status (1)

Country Link
CN (1) CN114491135A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784249A (en) * 2019-01-04 2019-05-21 华南理工大学 A kind of scramble face identification method based on variation cascaded message bottleneck
US20200402223A1 (en) * 2019-06-24 2020-12-24 Insurance Services Office, Inc. Machine Learning Systems and Methods for Improved Localization of Image Forgery
CN113361508A (en) * 2021-08-11 2021-09-07 四川省人工智能研究院(宜宾) Cross-view-angle geographic positioning method based on unmanned aerial vehicle-satellite
WO2021250850A1 (en) * 2020-06-11 2021-12-16 Nec Corporation Training apparatus, control method, and non-transitory computer-readable storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784249A (en) * 2019-01-04 2019-05-21 华南理工大学 A kind of scramble face identification method based on variation cascaded message bottleneck
US20200402223A1 (en) * 2019-06-24 2020-12-24 Insurance Services Office, Inc. Machine Learning Systems and Methods for Improved Localization of Image Forgery
WO2021250850A1 (en) * 2020-06-11 2021-12-16 Nec Corporation Training apparatus, control method, and non-transitory computer-readable storage medium
CN113361508A (en) * 2021-08-11 2021-09-07 四川省人工智能研究院(宜宾) Cross-view-angle geographic positioning method based on unmanned aerial vehicle-satellite

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
M. LIU et al.: "Iterative Local-Global Collaboration Learning Towards One-Shot Video Person Re-Identification", IEEE Transactions on Image Processing *
T. WANG et al.: "Each Part Matters: Local Patterns Facilitate Cross-View Geo-Localization", IEEE Transactions on Circuits and Systems for Video Technology *
YOUNGSIK EOM et al.: "Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck", https://arxiv.org/abs/2204.01387 *
周金坤 et al.: "UAV image localization method based on multi-view and multi-supervision network", Journal of Computer Applications *

Similar Documents

Publication Publication Date Title
US11816888B2 (en) Accurate tag relevance prediction for image search
Nech et al. Level playing field for million scale face recognition
CN110851645B (en) Image retrieval method based on similarity maintenance under deep metric learning
US10235623B2 (en) Accurate tag relevance prediction for image search
CN102521366B (en) Image retrieval method integrating classification with hash partitioning and image retrieval system utilizing same
Wang et al. A three-layered graph-based learning approach for remote sensing image retrieval
Bai et al. VHR object detection based on structural feature extraction and query expansion
CN102105901B (en) Annotating images
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
CN106951551B (en) Multi-index image retrieval method combining GIST characteristics
CN110866134B (en) Image retrieval-oriented distribution consistency keeping metric learning method
CN110458175B (en) Unmanned aerial vehicle image matching pair selection method and system based on vocabulary tree retrieval
CN109710804B (en) Teaching video image knowledge point dimension reduction analysis method
Li et al. On the integration of topic modeling and dictionary learning
CN103440508A (en) Remote sensing image target recognition method based on visual word bag model
Sadique et al. Content-based image retrieval using color layout descriptor, gray-level co-occurrence matrix and k-nearest neighbors
Oneata et al. Axes at trecvid 2012: Kis, ins, and med
CN114691911B (en) Cross-view angle geographic image retrieval method based on information bottleneck variational distillation
CN113032613A (en) Three-dimensional model retrieval method based on interactive attention convolution neural network
CN114491135A (en) Cross-view angle geographic image retrieval method based on variation information bottleneck
CN113409351B (en) Unsupervised field self-adaptive remote sensing image segmentation method based on optimal transmission
Shekar et al. Video clip retrieval based on LBP variance
CN110941994B (en) Pedestrian re-identification integration method based on meta-class-based learner
CN114610941A (en) Cultural relic image retrieval system based on comparison learning
Böttcher et al. BTU DBIS'Plant Identification Runs at ImageCLEF 2012.

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20220513)