CN114691911B - Cross-view angle geographic image retrieval method based on information bottleneck variational distillation - Google Patents


Info

Publication number
CN114691911B
Authority
CN
China
Prior art keywords
image
geographic
cross
view
distillation
Prior art date
Legal status
Active
Application number
CN202210285790.7A
Other languages
Chinese (zh)
Other versions
CN114691911A (en)
Inventor
徐行
胡谦
李宛思
沈复民
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202210285790.7A priority Critical patent/CN114691911B/en
Publication of CN114691911A publication Critical patent/CN114691911A/en
Application granted granted Critical
Publication of CN114691911B publication Critical patent/CN114691911B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a cross-view geographic image retrieval method based on information-bottleneck variational distillation, which performs cross-view geographic image retrieval using discriminative representations from which redundant information has been removed. Using the variational distillation technique, the features extracted by the feature extraction module are compressed by the information bottleneck module into low-dimensional image representations; these representations are constrained by a variational distillation loss and a cross-entropy classification loss so that they retain as much predictive information as possible, thereby achieving the goal of removing redundant information. The resulting discriminative low-dimensional image representations serve as retrieval features, improving the accuracy of the retrieval results while accelerating retrieval.

Description

Cross-view angle geographic image retrieval method based on information bottleneck variational distillation
Technical Field
The invention belongs to the technical field of cross-view image retrieval in computer vision, and particularly relates to a cross-view geographic image retrieval method based on information bottleneck variational distillation.
Background
Cross-view geographic image retrieval matches the same geographic target across images taken from different viewpoints, such as a satellite view or an unmanned aerial vehicle (UAV) view; for example, given a UAV-view query image, the task is to find the image of the same geographic target among satellite-view candidate images. It has wide applications, such as precise UAV parcel delivery, UAV reconnaissance and UAV navigation, all of which require the UAV to locate geographic targets accurately, and which carry great practical and economic value.
Cross-view geographic image retrieval is a challenging task because extreme viewpoint changes cause enormous variation in visual appearance. With the development of deep learning, the task has advanced considerably; the main methods can be divided into the following two categories:
(1) Learning discriminative features with deep neural networks via metric learning: the network learns a feature space that pulls matched image pairs closer together and pushes unmatched pairs apart; attention mechanisms have also found widespread use in the network designs of such methods.
(2) Enriching discriminative cues with information from the regions neighbouring the image centre: this is inspired by the human visual system, which improves judgment accuracy through hierarchical processing. The human visual system first checks whether scenes from different viewpoints contain the same geographic target, and then examines the context information around that target to verify the correctness of the match. Such methods use the regions adjacent to the central geographic target as auxiliary information, exploiting the geographic image's context to enrich the discriminative cues.
Traditional methods generally focus on mining fine-grained features of the central geographic target while underestimating the importance of the context information in neighbouring regions. Recently proposed methods use the regions adjacent to the central target as auxiliary information, enriching the discriminative cues and clearly improving results. However, attending to the image's context inevitably introduces redundant information, which reduces retrieval accuracy to some extent and inflates the retrieval feature dimension, slowing down retrieval.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a cross-view geographic image retrieval method based on information-bottleneck variational distillation, which performs retrieval using discriminative representations from which redundant information has been removed. Using the variational distillation technique, the features extracted by the feature extraction module are compressed by the information bottleneck module to remove redundant information, yielding a more discriminative, lower-dimensional image representation, thereby improving the accuracy of the retrieval results and accelerating retrieval.
The invention is realized with a cross-view geographic image retrieval model based on information-bottleneck variational distillation, which comprises a feature extraction module, an information bottleneck module, and two classifiers: classifier 1, corresponding to the feature extraction module, and classifier 2, corresponding to the information bottleneck module. The modules are explained in detail below.
The feature extraction module extracts the global features of the input image with a residual neural network, ResNet-50, whose weights are pre-trained on ImageNet. ResNet-50 comprises five blocks named conv1 through conv5, an average pooling layer and a fully connected layer; the invention removes the average pooling and fully connected layers, so that an input image yields a global feature map for subsequent processing.
To fully exploit the context information of the image, a square-ring feature partition strategy is applied to the extracted global features: neighbouring regions serve as auxiliary information, attended to according to their distance from the image centre, enriching the discriminative cues of the geographic image. Concretely, the global feature map extracted by the feature extraction module is divided into several concentric square-ring parts, and each part is average-pooled into a feature of dimension 2048. The process can be expressed as:

f_j = F_resnet50(x_j)
{v_j^i}_{i=1..n} = F_slice(f_j)
g_j^i = AvgPool(v_j^i)

where the subscript j denotes the viewing angle, x_j is the input image, f_j is the extracted global feature map, v_j^i is the i-th square-ring part divided from f_j, and g_j^i is the average-pooled feature of the i-th part; F_slice denotes the square-ring partition operation and AvgPool the average pooling operation. The resulting initial features g_j^i are the input to classifier 1 and to the information bottleneck module.
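As an illustration, the square-ring partition F_slice and average pooling described above can be sketched in NumPy. The function name `square_ring_partition` and the border-distance bucketing rule are assumptions of this sketch, not the patent's exact implementation, which operates on 2048-channel ResNet-50 feature maps:

```python
import numpy as np

def square_ring_partition(f, n_rings=4):
    """Split a global feature map f of shape (C, H, W) into n_rings concentric
    square rings (F_slice) and average-pool each ring into a C-dim vector
    (AvgPool).  Assumes H, W >= 2 * n_rings so every ring is non-empty."""
    c, h, w = f.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # distance of each spatial cell from the nearest image border
    border = np.minimum.reduce([ys, xs, h - 1 - ys, w - 1 - xs])
    # bucket border distances into n_rings rings; ring n_rings - 1 is the centre
    ring = np.minimum(border * 2 * n_rings // min(h, w), n_rings - 1)
    return np.stack([f[:, ring == r].mean(axis=1) for r in range(n_rings)])
```

On an 8 x 8 map with 4 rings, the outermost cells form ring 0 and the central 2 x 2 block forms ring 3, mirroring the "distance to the image centre" design described above.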
Classifier 1 consists of a fully connected layer, a batch normalization layer, a Dropout layer and a classification layer; the classification layer is itself a fully connected layer whose output vector dimension equals the number of geographic target categories.
The information bottleneck module is realized as an encoder that compresses and reduces the dimensionality of the initial features, outputting features of dimension 400, smaller than the commonly used feature dimension of 512. After the cross-view geographic image retrieval model based on information-bottleneck variational distillation is trained, the information bottleneck module yields low-dimensional, more discriminative image representations to serve as retrieval features, which both accelerates retrieval and improves retrieval performance.
The input of classifier 2 is the output of the information bottleneck module, so its input feature dimension is 400; its output vector dimension is the number of geographic target categories, and, like classifier 1, it contains a batch normalization layer and a Dropout layer in between.
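A minimal NumPy sketch of the information bottleneck encoder and the classifier 2 head, under the dimensions stated above (2048-d initial feature, 400-d representation, C categories, with C = 701 matching the University-1652 training set described later). The single ReLU linear layer is an assumed stand-in for the learned encoder, and the batch normalization and Dropout layers are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
C = 701  # geographic target categories (University-1652 training set)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# information bottleneck encoder: one ReLU linear layer compressing 2048 -> 400
# (a stand-in; the real encoder is learned end-to-end during training)
W_enc, b_enc = rng.standard_normal((2048, 400)) * 0.01, np.zeros(400)
# classifier 2 head: 400-d representation -> C-way prediction distribution
W_cls, b_cls = rng.standard_normal((400, C)) * 0.01, np.zeros(C)

g = rng.standard_normal((1, 2048))        # initial feature g_j^i of one ring
e = np.maximum(g @ W_enc + b_enc, 0.0)    # low-dimensional representation e_j^i
q = softmax(e @ W_cls + b_cls)            # prediction distribution q_j^i
```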
The cross-view geographic image retrieval method based on information-bottleneck variational distillation of the invention comprises the following steps:
Step S1: select a commonly used cross-view geographic image training dataset;
Step S2: train the cross-view geographic image retrieval model based on information-bottleneck variational distillation;
Step S2.1: extract image features of the training dataset with the feature extraction module; its input is a pair of images from different viewpoints, recorded as the view-1 image and the view-2 image;
Step S2.2: the view-1 image x_1 is fed to the feature extraction module to obtain the global feature map f_1; the square-ring feature partition strategy is applied to obtain the per-part features v_1^i, which after average pooling give the initial features g_1^i of the view-1 image;
Step S2.3: the view-2 image x_2 undergoes the same operations as the view-1 image x_1, giving the initial features g_2^i of the view-2 image;
Step S2.4: the initial features g_1^i and g_2^i of the two viewpoints obtained in steps S2.2 and S2.3 are input into classifier 1, and the cross-entropy classification loss is calculated. The classification loss function L_cls1 is:

z_j^i = F_classifier1(g_j^i)
p_j^i(c) = exp(z_j^i(c)) / Σ_{c'=1..C} exp(z_j^i(c'))
L_cls1 = -Σ_{j∈{1,2}} Σ_i log p_j^i(y)

where j ∈ {1,2} denotes the viewpoint, i denotes the i-th divided part, and F_classifier1(·) denotes the operations performed by classifier 1; z_j^i is the vector output by classifier 1, whose dimension is the number of classification targets, and z_j^i(c) is its value at position c; p_j^i(c) is the predicted probability of geographic target c, C is the number of geographic target categories, and p_j^i(y) is the predicted probability of the true geographic target label y, read off directly at position y.
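The classification loss above, summed over viewpoints j and parts i, can be sketched as follows; the array shapes `(n_views, n_parts, C)` are an assumed layout for this illustration:

```python
import numpy as np

def cross_entropy_loss(logits, y):
    """Sum over viewpoints j and parts i of -log p_j^i(y).
    logits: classifier outputs of shape (n_views, n_parts, C); y: true class index."""
    # numerically stable log-softmax over the class axis
    z = logits - logits.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_p[..., y].sum()
```

With uniform logits over C = 3 classes and 2 views x 4 parts, each term contributes log 3, so the loss is 8 log 3.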
Step S2.5: the initial features g_1^i and g_2^i of the two viewpoints obtained in steps S2.2 and S2.3 are input into the information bottleneck module, which compresses the features and removes redundant information, giving the low-dimensional image representations, recorded as e_1^i and e_2^i respectively;
Step S2.6: the low-dimensional image representations e_1^i and e_2^i of the two viewpoints obtained in step S2.5 are input into classifier 2, and the cross-entropy classification loss is calculated. The classification loss function L_cls2 is:

z'_j^i = F_classifier2(e_j^i)
q_j^i(c) = exp(z'_j^i(c)) / Σ_{c'=1..C} exp(z'_j^i(c'))
L_cls2 = -Σ_{j∈{1,2}} Σ_i log q_j^i(y)

where j ∈ {1,2} denotes the viewpoint, i denotes the i-th divided part, and F_classifier2(·) denotes the operations performed by classifier 2; z'_j^i is the vector output by classifier 2, whose dimension is the number of classification targets, and z'_j^i(c) is its value at position c; q_j^i(c) is the predicted probability of geographic target c, C is the number of geographic target categories, and q_j^i(y) is the predicted probability of the true label y, read off directly at position y.
Step S2.7: for the low-dimensional image representations e_1^i and e_2^i, constraining the retained predictive information by the cross-entropy classification loss alone is not sufficient. The invention therefore uses the prediction distributions p_j^i and q_j^i obtained by classifier 1 and classifier 2 to calculate a variational distillation loss, which forces the low-dimensional representations e_1^i and e_2^i to discard redundant information while retaining more predictive information, yielding more discriminative image representations. The variational distillation loss function is:

L_d = Σ_{j∈{1,2}} Σ_i D_KL(p_j^i ‖ q_j^i)

where D_KL computes the KL distance (Kullback-Leibler divergence), and p_j^i and q_j^i are the prediction distributions of the label y obtained from classifier 1 and classifier 2 respectively. Minimizing the KL distance between p_j^i and q_j^i ensures that the low-dimensional representations e_1^i and e_2^i remain sufficient for the label y while, compared with the initial features g_1^i and g_2^i, task-irrelevant redundant information is discarded as the feature dimension is compressed, making them more discriminative;
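The variational distillation loss L_d can be sketched directly from its definition; `eps` is an assumed numerical guard added for this sketch:

```python
import numpy as np

def variational_distillation_loss(p, q, eps=1e-12):
    """L_d = sum over views j and parts i of D_KL(p_j^i || q_j^i), where p comes
    from classifier 1 (initial features) and q from classifier 2 (compressed
    representations).  p, q: distributions of shape (n_views, n_parts, C)."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)))
```

The loss is zero exactly when the two prediction distributions agree, and positive otherwise, which is what pushes the compressed representation to keep classifier 1's predictive information.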
Step S2.8: the total loss function L of the cross-view geographic image retrieval model is as follows, where λ is a weight hyperparameter:

L = L_cls1 + L_cls2 + λ · L_d
Step S2.9: optimize the total loss function L with stochastic gradient descent and record the optimized total loss value;
Step S2.10: repeat steps S2.1 to S2.9 over the cross-view geographic image training dataset; stop training when the total loss value no longer decreases, indicating that the cross-view geographic image retrieval model based on information-bottleneck variational distillation is trained, and save it as the final cross-view geographic image retrieval model based on information-bottleneck variational distillation;
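The stopping rule of steps S2.9 and S2.10 (repeat epochs until the total loss stops decreasing) can be sketched as follows; the names `run_epoch`, `patience` and `max_epochs` are illustrative, and the patience/epoch-cap safeguards are assumptions rather than part of the patent's procedure:

```python
def train_until_converged(run_epoch, patience=1, max_epochs=100):
    """Repeat run_epoch() (one pass of steps S2.1-S2.9 over the training set,
    returning the total loss L) until L stops decreasing, then return the best
    loss seen."""
    best, stalled = float("inf"), 0
    for _ in range(max_epochs):
        loss = run_epoch()
        if loss < best:
            best, stalled = loss, 0       # loss still improving: keep going
        else:
            stalled += 1                   # loss no longer decreasing
            if stalled >= patience:
                break
    return best
```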
Step S3: retrieve the cross-view geographic image;
select a view-1 or view-2 image of any geographic target and input it into the final trained cross-view geographic image retrieval model based on information-bottleneck variational distillation obtained in step S2.10, obtaining the low-dimensional image representations e^i with redundant information removed; concatenate the e^i into a feature z, which serves as the retrieval feature, and thereby retrieve the most relevant image of the same geographic target from the other viewpoint.
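The retrieval step, concatenating the part representations into z and ranking the gallery by similarity, can be sketched as follows; cosine similarity is an assumed matching score, as the patent does not spell out the ranking metric here:

```python
import numpy as np

def retrieval_feature(parts):
    """Concatenate the low-dimensional part representations e^i into z."""
    return np.concatenate(parts)

def rank_gallery(z, gallery):
    """Indices of gallery rows sorted by cosine similarity to query feature z
    (most similar first)."""
    q = z / np.linalg.norm(z)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))
```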
On the basis of the square-ring feature partition strategy, the context information of neighbouring regions is fully mined and the feature information enriched; the information bottleneck network then removes task-irrelevant redundant information to obtain more discriminative features. In addition, the invention uses the variational distillation technique: constrained by the variational self-distillation loss, the knowledge learned by the complex model ResNet-50 is taught to the information bottleneck network, yielding redundancy-free, more discriminative features as image retrieval features.
Drawings
FIG. 1 is a flow chart of a cross-view geographic image retrieval method based on information bottleneck variational distillation according to the invention;
FIG. 2 is a network framework diagram of a cross-perspective geographic image retrieval model based on information bottleneck variational distillation according to the present invention;
FIG. 3 is a visual display diagram of the search results of the cross-perspective geographic image search model in the University-1652 test set based on information bottleneck variational distillation.
Detailed Description
In order to make the objects, technical solutions and advantages of the invention more apparent, embodiments of the invention are described below in conjunction with the accompanying drawings so that those skilled in the art can better understand the invention. It should be particularly noted that the described embodiments are only some, not all, embodiments of the invention and are not intended to limit the claimed scope; all other embodiments obtained by a person skilled in the art without inventive effort fall within the protection scope of the invention.
Examples
FIG. 1 is a flow chart of a cross-view geographic image retrieval method based on information bottleneck variational distillation.
In this embodiment, as shown in fig. 1, the cross-view geographic image retrieval method based on information bottleneck variational distillation of the present invention includes the following steps:
step S1: selecting a common published cross-perspective geographic image dataset University-1652:
the invention will describe in detail the operation and flow on the University-1652 data set, the University-1652 is a multi-view data set, including satellite view data, unmanned aerial vehicle view data and ground view data, it includes 1652 buildings of 72 universities in the world, the training set includes 701 teaching buildings of 33 colleges, the test set is 951 teaching buildings of the other 39 colleges, the training set and the test set have no overlapping part. The data set is used for researching two tasks, wherein the first task is unmanned aerial vehicle visual angle target positioning (unmanned aerial vehicle- > satellite), namely the unmanned aerial vehicle visual angle image of a given geographic target queries the most similar satellite visual angle image of the same geographic target; the second task drone navigates (satellite- > drone) and vice versa. In the unmanned aerial vehicle visual angle target positioning task, 37855 unmanned aerial vehicle visual angle images are in a query set, and 951 satellite visual angle images to be matched are in a gallery. In the unmanned aerial vehicle navigation task, 701 satellite view images are collected in the query set, and 51355 unmanned aerial vehicle view images to be matched are contained in the gallery.
Step S2: data pre-processing
Data preprocessing consists of resizing the input training images to a fixed size of 256 × 256 and then flipping them randomly, which helps the cross-view geographic image retrieval model based on information-bottleneck variational distillation learn viewpoint-invariant characteristics and improves its generalization ability.
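The preprocessing step can be sketched as below; nearest-neighbour resizing keeps the sketch dependency-free and is an assumption, as a real pipeline would typically use bilinear interpolation (e.g. via torchvision transforms):

```python
import numpy as np

def preprocess(img, size=256, rng=None):
    """Resize an (H, W, 3) image to size x size (nearest neighbour) and flip it
    horizontally with probability 0.5, per the training preprocessing above."""
    h, w, _ = img.shape
    out = img[np.arange(size) * h // size][:, np.arange(size) * w // size]
    rng = rng or np.random.default_rng()
    return out[:, ::-1] if rng.random() < 0.5 else out
```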
And step S3: cross-visual angle geographic image retrieval model based on information bottleneck variational distillation training
In this example, the network framework of the cross-view geographic image retrieval model based on information-bottleneck variational distillation is shown in FIG. 2.
step S3.1: inputting the data preprocessed in the step S2 into a feature extraction module to extract global features of the image, dividing the extracted global features of the image into a plurality of square ring parts by adopting a square ring feature partition strategy, and obtaining features with the dimension of 2048 by each part through average pooling to obtain initial features
Figure BDA0003558172310000061
(initial satellite View feature) and>
Figure BDA0003558172310000062
(view angle initial feature of drone), this step is detailed as follows;
for two visual angles of the University-1652 data set, each of the two visual angles is provided with a processing branch which is a satellite view branch and an unmanned aerial vehicle view branch, and because input images of the two branches are from aerial visual angles, the feature extraction modules share weight; adopting square ring partition design with the block number of 4 for the global features of the University-1652 image, and dividing the extracted global features of the image into 4 square rings according to the distance from the adjacent region to the center of the image;
Step S3.2: the initial features g_1^i and g_2^i obtained in step S3.1 are input into classifier 1, and the cross-entropy classification loss is calculated, with the classification loss function:

L_cls1 = -Σ_{j∈{1,2}} Σ_i log p_j^i(y)

where j ∈ {1,2} denotes the viewpoint, i denotes the i-th divided part, p_j^i(y) is the predicted probability of the true geographic target label y, p_j^i(c) is the predicted probability of each geographic target c, and C is the number of geographic target categories;
Step S3.3: the initial features g_1^i and g_2^i from step S3.1 are input into the information bottleneck module to compress the features, giving the low-dimensional image representations, recorded as e_1^i and e_2^i respectively;
Step S3.4: the low-dimensional image representations e_1^i and e_2^i of the two viewpoints obtained in step S3.3 are input into classifier 2, and the cross-entropy classification loss is calculated, with the classification loss function:

L_cls2 = -Σ_{j∈{1,2}} Σ_i log q_j^i(y)

where j ∈ {1,2} denotes the viewpoint, i denotes the i-th divided part, q_j^i(y) is the predicted probability of the true geographic target label y, q_j^i(c) is the predicted probability of each geographic target c, and C is the number of geographic target categories;
Step S3.5: using the prediction distributions p_j^i and q_j^i of classifier 1 and classifier 2, the variational distillation loss is calculated to ensure that the low-dimensional image representations e_1^i and e_2^i remain sufficient for the label y while, compared with the initial features g_1^i and g_2^i, task-irrelevant redundant information is discarded. The variational distillation loss function is:

L_d = Σ_{j∈{1,2}} Σ_i D_KL(p_j^i ‖ q_j^i)

where D_KL computes the KL distance (Kullback-Leibler divergence), and p_j^i and q_j^i are the prediction distributions of the label y obtained from classifier 1 and classifier 2 respectively;
Step S3.6: the total loss function L of the cross-view geographic image retrieval model is as follows; in this example λ is set to 10, L is optimized with stochastic gradient descent, and the optimized total loss value is recorded:

L = L_cls1 + L_cls2 + λ · L_d
Step S3.7: repeat steps S2 to S3.6 over the University-1652 training dataset; stop training when the total loss value no longer decreases, indicating that the cross-view geographic image retrieval model based on information-bottleneck variational distillation is trained, and save it as the final cross-view geographic image retrieval model based on information-bottleneck variational distillation;
Step S4: cross-view geographic image retrieval
Input the UAV-view and satellite-view images of the University-1652 test set into the final trained cross-view geographic image retrieval model based on information-bottleneck variational distillation obtained in step S3.7, obtaining the low-dimensional image representations e^i with redundant information removed; concatenate the e^i into a feature z as the retrieval feature, and retrieve the most relevant image of the same geographic target from the other viewpoint.
The test results on the University-1652 test set are shown in Table 1; the evaluation indices Recall@K (K = 1) and average precision (AP) are reported to evaluate the retrieval performance of the model. R@K is the proportion of queries for which a correctly matched image appears in the top k of the ranking list; the higher the R@K, the better the performance of the cross-view geographic image retrieval model based on information-bottleneck variational distillation. AP reflects the overall accuracy of the retrieval; the higher, the better. In Table 1, drone -> satellite means a UAV-view image is given to retrieve the satellite-view image, and satellite -> drone the reverse.
TABLE 1 comparison of model Performance on the University-1652 test set
The invention was compared with other state-of-the-art methods on the University-1652 test set. The results are shown in Table 1, where bold numbers indicate improvements over the retrieval indices of the latest methods. The invention achieves 77.42% Recall@1 accuracy and 80.43% AP on drone -> satellite, and 86.88% Recall@1 accuracy and 76.61% AP on satellite -> drone; its accuracy indices clearly outperform existing methods and reach the current advanced level, while its compressed feature dimension of 400 is smaller than the commonly used 512, making retrieval faster.
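The evaluation indices used above, Recall@K and average precision, can be sketched per query as follows (a generic implementation consistent with the definitions given with Table 1, not code from the patent):

```python
def recall_at_k(ranked_labels, true_label, k=1):
    """1 if a correct match appears among the top-k retrieved images, else 0."""
    return int(true_label in ranked_labels[:k])

def average_precision(ranked_labels, true_label):
    """AP of one query over its ranking list (several correct matches allowed):
    mean of precision-at-hit over all correct matches."""
    hits, ap = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == true_label:
            hits += 1
            ap += hits / rank
    return ap / max(hits, 1)
```

Dataset-level R@K and AP are then averages of these per-query values over the query set.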
This demonstrates the effectiveness of the cross-view geographic image retrieval method based on information-bottleneck variational distillation provided by the invention: the context information of the image enriches the discriminative cues of the geographic image, the information bottleneck module compresses the features, and an image representation of lower dimension than the commonly used retrieval feature dimension is obtained as the retrieval feature, eliminating redundant information in the extracted image features, improving cross-view geographic image retrieval accuracy and accelerating retrieval.
As shown in fig. 3, the retrieval results on the University-1652 test set are visualized, sorted by similarity from high to low. The top-five drone -> satellite and satellite -> drone retrieval results on the University-1652 test set are shown in fig. 3, with a square marking a correctly retrieved image and an x marking an incorrectly retrieved one. As can be seen from fig. 3, the invention accurately retrieves the most relevant, correct images; this example further and intuitively illustrates the effectiveness of the invention in the practical cross-view geographic image retrieval task.
Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are permissible as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventive matter utilizing the inventive concept is protected.

Claims (3)

1. A cross-view geographic image retrieval method based on information bottleneck variational distillation, characterized in that it is realized by a cross-view geographic image retrieval model based on information bottleneck variational distillation, the model comprising a feature extraction module, an information bottleneck module, a classifier 1 corresponding to the feature extraction module, and a classifier 2 corresponding to the information bottleneck module; the cross-view geographic image retrieval method based on information bottleneck variational distillation specifically comprises the following steps:
step S1), selecting a public cross-view geographic image training data set;
step S2) preprocessing the cross-view geographic image training data set to obtain a preprocessed training data set, wherein the preprocessing comprises resizing the images in the input cross-view geographic image training data set to a fixed size of 256 × 256 and then randomly flipping them;
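The preprocessing of step S2) — resizing to a fixed 256 × 256 and random flipping — can be sketched in plain Python on nested-list images. This is an illustrative sketch only: in practice an image library would be used, and the nearest-neighbour resize and the flip probability p = 0.5 are assumptions, not details fixed by the claim.

```python
import random

def resize_nearest(img, size=256):
    # nearest-neighbour resize of an H x W nested-list image to size x size
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def random_flip(img, p=0.5, rng=random):
    # horizontally flip the image with probability p (assumed value)
    if rng.random() < p:
        return [row[::-1] for row in img]
    return img

img = [[0, 1], [2, 3]]            # tiny 2 x 2 stand-in image
fixed = resize_nearest(img, 256)  # 256 x 256 after preprocessing
```

A real pipeline would apply the flip before batching so each epoch sees different augmentations of the same training pair.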
step S3) training a cross-view angle geographic image retrieval model based on information bottleneck variational distillation by adopting the preprocessed training data set, and specifically comprising the following steps:
step S31) extracting image features of the cross-view geographic image training data set using the feature extraction module, whose input is two images of different views, denoted the view 1 image and the view 2 image;
step S32) inputting the view 1 image $x_1$ into the feature extraction module to obtain the global image feature $f_1$; applying the square-ring feature partition strategy, so that through the square-ring partition design the feature $f_1^i$ of each divided part is obtained, and obtaining the initial features $\bar{f}_1^i$ of the view 1 image through average pooling;
step S33) performing the same operations on the view 2 image $x_2$ as on the view 1 image $x_1$ to obtain the initial features $\bar{f}_2^i$ of the view 2 image;
Step S34) and initial characteristics of the two perspective images obtained in the step S32) and the step S33)
Figure FDA0003558172300000013
And &>
Figure FDA0003558172300000014
Inputting into a classifier 1, calculating cross entropy classification loss and a classification loss function L cls1 As follows:
Figure FDA0003558172300000015
Figure FDA0003558172300000016
j ∈ {1,2} represents a different viewing angle, j =1 represents viewing angle 1, j =2 represents viewing angle 2; f classifier1 () represents the operations performed by classifier 1; i denotes the i-th part of the division,
Figure FDA0003558172300000017
a predicted probability, representing a geotarget real tag y, in +>
Figure FDA0003558172300000018
Denotes the cThe prediction probability of each geographic target, C is the category number of the geographic target;
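Assuming classifier 1 outputs a C-dimensional logit vector for each view j and each part i, the cross-entropy classification loss of step S34) can be sketched in pure Python. This is a minimal illustration of the loss shape, not the patented implementation; the logit inputs are hypothetical.

```python
import math

def softmax(logits):
    # numerically stable softmax over a C-dimensional logit vector
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classification_loss(logits, y):
    # logits[j][i] is the C-dim classifier output for view j, part i;
    # sum the cross-entropy -log p(y) over both views and all parts
    return sum(-math.log(softmax(part)[y])
               for view in logits for part in view)

# two views, one part each, C = 4 classes, true label y = 2
logits = [[[0.0, 0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0, 0.0]]]
loss = classification_loss(logits, y=2)  # uniform logits -> 2 * log(4)
```

Because the loss sums over both views and every square-ring part, each partition contributes its own supervision signal.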
step S35) inputting the initial features $\bar{f}_1^i$ and $\bar{f}_2^i$ obtained in steps S32) and S33) into the information bottleneck module to compress the features, obtaining low-dimensional image representations denoted $z_1^i$ and $z_2^i$ respectively;
Step S36) represents the two low-dimensional images with the view angles obtained in the step S35)
Figure FDA00035581723000000113
And &>
Figure FDA00035581723000000114
Inputting the data into a classifier 2, calculating cross entropy classification loss and a classification loss function L cls2 As follows:
Figure FDA0003558172300000021
Figure FDA0003558172300000022
Figure FDA0003558172300000023
represents the predicted probability of the geographical target real tag y at that time, and>
Figure FDA0003558172300000024
representing the predicted probability of the c-th geographic object at the moment; f classifier2 Represents the operations performed by the classifier 2;
step S37) using the predicted distributions of the label $y$ obtained from classifier 1 and classifier 2, namely $\hat{p}_1^i$, $\hat{p}_2^i$ and $\tilde{p}_1^i$, $\tilde{p}_2^i$, calculating the variational distillation loss as follows:

$L_d = \sum_{j=1}^{2}\sum_{i} D_{KL}\left(\hat{p}_j^i \,\|\, \tilde{p}_j^i\right)$

where $D_{KL}$ denotes the KL distance; the above formula computes the KL distance between the predicted distributions $\hat{p}_j^i$ and $\tilde{p}_j^i$, ensuring that the low-dimensional image representations $z_1^i$ and $z_2^i$ remain sufficient for the label $y$, while, compared with the initial features $\bar{f}_1^i$ and $\bar{f}_2^i$, task-irrelevant redundant information is discarded during the compression of the feature dimension, making them more discriminative;
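The variational distillation loss of step S37) is a sum of KL distances between the predicted distributions of classifier 1 and classifier 2, taken over both views and all parts. A minimal pure-Python sketch follows; the epsilon smoothing is an added assumption for numerical safety and is not part of the claim.

```python
import math

def kl(p, q, eps=1e-12):
    # KL distance D_KL(p || q) between two discrete distributions
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(p_cls1, p_cls2):
    # p_cls1[j][i], p_cls2[j][i]: predicted distributions for view j, part i
    return sum(kl(p, q)
               for view1, view2 in zip(p_cls1, p_cls2)
               for p, q in zip(view1, view2))

same = [[[0.25, 0.25, 0.25, 0.25]]]   # one view, one part
shifted = [[[0.7, 0.1, 0.1, 0.1]]]
zero = distillation_loss(same, same)  # identical distributions -> 0
gap = distillation_loss(shifted, same)  # mismatch -> positive
```

The loss is zero exactly when the compressed representation predicts the label the same way the full feature does, which is the sufficiency condition the step describes.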
step S38) the total loss function $L$ of the cross-view geographic image retrieval model is as follows:

$L = L_{cls1} + L_{cls2} + \lambda L_d$

where $\lambda$ is a weighting hyperparameter;
step S39) optimizing the total loss function $L$ by stochastic gradient descent, and recording the optimized total loss function value;
step S310) repeating steps S31) to S39) on the preprocessed training data set until the total loss function value no longer decreases, then stopping training; the cross-view geographic image retrieval model based on information bottleneck variational distillation is then trained, and the trained model is saved as the final cross-view geographic image retrieval model based on information bottleneck variational distillation;
step S4) cross-view geographic image retrieval
selecting a view 1 or view 2 image of any geographic target to be retrieved, inputting it into the final cross-view geographic image retrieval model based on information bottleneck variational distillation obtained in step S310), and obtaining the low-dimensional image representations $z^i$ with redundant information removed; concatenating the $z^i$ to obtain the feature $z'$ as the retrieval feature, thereby retrieving the image of the other view most relevant to the same geographic target as the input view image.
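Step S4) ranks the gallery of the other view by similarity to the concatenated retrieval feature z'. A sketch using cosine similarity follows; the similarity measure is an assumption, since the claim only specifies concatenating the compressed part representations.

```python
import math

def concat(parts):
    # splice the per-part low-dimensional representations z^i into z'
    return [v for part in parts for v in part]

def cosine(a, b):
    # cosine similarity between two feature vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_parts, gallery):
    # gallery: {image_id: z' feature}; return ids ranked most-relevant first
    zq = concat(query_parts)
    return sorted(gallery, key=lambda k: cosine(zq, gallery[k]), reverse=True)

gallery = {"target": [1.0, 0.0, 1.0, 0.0], "other": [0.0, 1.0, 0.0, 1.0]}
ranking = retrieve([[1.0, 0.0], [1.0, 0.0]], gallery)
```

With 400-dimensional compressed features instead of the common 512, each similarity computation touches fewer values, which is the source of the faster retrieval claimed in the description.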
2. The cross-view geographic image retrieval method based on information bottleneck variational distillation according to claim 1, characterized in that the specific structure of the cross-view geographic image retrieval model based on information bottleneck variational distillation is as follows:
the feature extraction module extracts the global feature of the input image using a residual neural network ResNet-50 with weights pre-trained on ImageNet; ResNet-50 comprises five blocks named conv1, conv2, conv3, conv4 and conv5, an average pooling layer, and a fully connected layer; the average pooling layer and the fully connected layer of ResNet-50 are removed, and the input image yields the global image feature for subsequent processing;
in order to make full use of the contextual information of the image, a square-ring feature partition strategy is applied to the extracted global image feature: neighboring regions serve as auxiliary information, weighted by the attention given by their distance to the image center, enriching the discriminative cues of the geographic image; specifically, using the square-ring partition design, the global image feature extracted by the feature extraction module is divided into several square-ring parts, and each part is then average-pooled to obtain a feature of dimension 2048; this process is expressed as follows:
$f_j = F_{resnet\text{-}50}(x_j)$

$f_j^i = F_{slice}(f_j)$

$\bar{f}_j^i = \mathrm{Avgpool}(f_j^i)$

where the subscript $j$ indexes the different views, $x_j$ denotes the input image, $f_j$ denotes the extracted global image feature, $f_j^i$ denotes the feature of the $i$-th part divided from the global image feature $f_j$, $\bar{f}_j^i$ denotes the feature of the $i$-th divided part after average pooling, $F_{slice}$ denotes the square-ring feature partition operation, and Avgpool denotes the average pooling operation; the resulting initial features $\bar{f}_j^i$ are the input to classifier 1 and the information bottleneck module;
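The square-ring partition F_slice can be illustrated on a single-channel feature map: cells are grouped by their distance to the border into concentric square rings, and each ring is average-pooled. The ring-assignment rule below is an illustrative assumption; the patent does not fix the exact indexing.

```python
def square_ring_partition(fmap, n_rings):
    # fmap: H x H nested list (one channel); group cells into n_rings
    # concentric square rings by distance to the border, outermost first
    h = len(fmap)
    rings = [[] for _ in range(n_rings)]
    for r in range(h):
        for c in range(h):
            d = min(r, c, h - 1 - r, h - 1 - c)  # distance to nearest border
            idx = min(d * 2 * n_rings // h, n_rings - 1)
            rings[idx].append(fmap[r][c])
    # average pooling within each ring, as in Avgpool(f_j^i)
    return [sum(v) / len(v) for v in rings]

# 4 x 4 map: border cells are 0, inner 2 x 2 cells are 1 -> two clean rings
fmap = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pooled = square_ring_partition(fmap, n_rings=2)
```

On a real 2048-channel ResNet-50 map, the same grouping would be applied per channel, giving one 2048-dimensional pooled feature per ring.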
classifier 1 consists of a fully connected layer, a batch normalization layer, a Dropout layer and a classification layer; the classification layer is a fully connected layer whose output vector dimension equals the number of geographic-target categories;
the information bottleneck module is implemented by an encoder that compresses and reduces the dimension of the obtained initial features $\bar{f}_j^i$, outputting features of dimension 400, smaller than the commonly used feature dimension 512;
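The encoder of the information bottleneck module maps each 2048-dimensional initial feature to 400 dimensions. Below is a stand-in sketch using a single random linear projection; the real encoder is learned end-to-end during training, so the weight initialization and the single-layer form here are purely illustrative assumptions.

```python
import random

def make_encoder(in_dim=2048, out_dim=400, seed=0):
    # linear projection W (out_dim x in_dim) as a stand-in for the IB encoder
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)]
         for _ in range(out_dim)]
    def encode(x):
        # compress an in_dim feature to an out_dim representation z
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return encode

encode = make_encoder(in_dim=8, out_dim=4)  # small dims for illustration
z = encode([1.0] * 8)                       # compressed 4-d representation
```

The output dimension (400 in the claim) directly sets the size of the retrieval feature, trading representational capacity for retrieval speed.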
the input of classifier 2 is the output of the information bottleneck module; its input feature dimension is 400 and the dimension of its output vector is the number of geographic-target categories, with a batch normalization layer and a Dropout layer in between as well.
3. The cross-view geographic image retrieval method based on information bottleneck variational distillation according to claim 2, characterized in that view 1 is the satellite view and view 2 is the unmanned aerial vehicle view.
CN202210285790.7A 2022-03-22 2022-03-22 Cross-view angle geographic image retrieval method based on information bottleneck variational distillation Active CN114691911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210285790.7A CN114691911B (en) 2022-03-22 2022-03-22 Cross-view angle geographic image retrieval method based on information bottleneck variational distillation


Publications (2)

Publication Number Publication Date
CN114691911A CN114691911A (en) 2022-07-01
CN114691911B true CN114691911B (en) 2023-04-07

Family

ID=82139786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210285790.7A Active CN114691911B (en) 2022-03-22 2022-03-22 Cross-view angle geographic image retrieval method based on information bottleneck variational distillation

Country Status (1)

Country Link
CN (1) CN114691911B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036910B (en) * 2023-09-28 2024-01-12 合肥千手医疗科技有限责任公司 Medical image training method based on multi-view and information bottleneck

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271465A (en) * 2007-12-12 2008-09-24 北京航空航天大学 Lens clustering method based on information bottleneck theory
CN109740013A (en) * 2018-12-29 2019-05-10 深圳英飞拓科技股份有限公司 Image processing method and image search method
CA3060914A1 (en) * 2018-11-05 2020-05-05 Royal Bank Of Canada Opponent modeling with asynchronous methods in deep rl
US10664722B1 (en) * 2016-10-05 2020-05-26 Digimarc Corporation Image processing arrangements
CN113836330A (en) * 2021-09-13 2021-12-24 清华大学深圳国际研究生院 Image retrieval method and device based on generation antagonism automatic enhanced network
CN114022727A (en) * 2021-10-20 2022-02-08 之江实验室 Deep convolution neural network self-distillation method based on image knowledge review

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7035467B2 (en) * 2002-01-09 2006-04-25 Eastman Kodak Company Method and system for processing images for themed imaging services
US8929877B2 (en) * 2008-09-12 2015-01-06 Digimarc Corporation Methods and systems for content processing
US11074495B2 (en) * 2013-02-28 2021-07-27 Z Advanced Computing, Inc. (Zac) System and method for extremely efficient image and pattern recognition and artificial intelligence platform


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Heterogeneous relational complement for vehicle re-identification; Jiajian Zhao et al.; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 205-214 *
Learning discriminative representations via variational self-distillation for cross-view geo-localization; Qian Hu et al.; Computers and Electrical Engineering; 1-11 *
UAV-Satellite View Synthesis for Cross-View Geo-Localization; Xiaoyang Tian; IEEE Transactions on Circuits and Systems for Video Technology; Vol. 32, No. 7; 4804-4815 *
Multilingual text clustering algorithm based on parallel information bottleneck; Yan Xiaoqiang et al.; Pattern Recognition and Artificial Intelligence; No. 06; 81-90 *
A survey of image object detection algorithms based on deep learning; Zhang Tingting et al.; Telecommunications Science; No. 07; 96-110 *
Research on drivable-area segmentation algorithms for vehicles based on knowledge distillation; Zhou Su et al.; Automobile Technology; No. 01; 5-9 *


Similar Documents

Publication Publication Date Title
CN111177446B (en) Method for searching footprint image
CN110728263A (en) Pedestrian re-identification method based on strong discrimination feature learning of distance selection
CN103324677B (en) Hierarchical fast image global positioning system (GPS) position estimation method
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
CN103020265B (en) The method and system of image retrieval
CN111652293B (en) Vehicle weight recognition method for multi-task joint discrimination learning
CN114005096A (en) Vehicle weight recognition method based on feature enhancement
CN103810299A (en) Image retrieval method on basis of multi-feature fusion
CN110598543A (en) Model training method based on attribute mining and reasoning and pedestrian re-identification method
CN107958067A (en) It is a kind of based on without mark Automatic Feature Extraction extensive electric business picture retrieval system
CN109308324A (en) A kind of image search method and system based on hand drawing style recommendation
CN115171165A (en) Pedestrian re-identification method and device with global features and step-type local features fused
CN114691911B (en) Cross-view angle geographic image retrieval method based on information bottleneck variational distillation
CN116775922A (en) Remote sensing image cross-modal retrieval method based on fusion of language and visual detail characteristics
CN114972506A (en) Image positioning method based on deep learning and street view image
Aly et al. Axes at trecvid 2013
CN112232885A (en) Multi-mode information fusion-based warehouse rental price prediction method
CN114860974A (en) Remote sensing image retrieval positioning method
CN117011883A (en) Pedestrian re-recognition method based on pyramid convolution and transducer double branches
CN114067356B (en) Pedestrian re-recognition method based on combined local guidance and attribute clustering
CN115719455A (en) Ground-to-air geographic positioning method
CN115410102A (en) SAR image airplane target detection method based on combined attention mechanism
CN114610941A (en) Cultural relic image retrieval system based on comparison learning
CN110941994A (en) Pedestrian re-identification integration method based on meta-class-based learner
CN114491135A (en) Cross-view angle geographic image retrieval method based on variation information bottleneck

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant