CN111368109A - Remote sensing image retrieval method and device, computer readable storage medium and equipment - Google Patents


Info

Publication number
CN111368109A
CN111368109A (application CN201811600675.4A; granted publication CN111368109B)
Authority
CN
China
Prior art keywords: feature, layer, image, retrieval, features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811600675.4A
Other languages: Chinese (zh)
Other versions: CN111368109B (en)
Inventors: 周军 (Zhou Jun), 江武明 (Jiang Wuming), 王洋 (Wang Yang), 王姣娟 (Wang Jiaojuan)
Current Assignee
Beijing Techshino Technology Co Ltd
Beijing Eyecool Technology Co Ltd
Original Assignee
Beijing Techshino Technology Co Ltd
Beijing Eyecool Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Techshino Technology Co Ltd and Beijing Eyecool Technology Co Ltd
Priority to CN201811600675.4A
Publication of CN111368109A
Application granted
Publication of CN111368109B
Legal status: Active

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a remote sensing image retrieval method and device, a computer readable storage medium, and a computer device, belonging to the field of image processing and pattern recognition. The method comprises the following steps: acquiring K features of a retrieval image and of all target images in an image set to be retrieved, the K features comprising a VGG16 feature and at least one improved VGG16 feature; for each target image, calculating a similarity measure between the retrieval image and the target image for each of the K features; acquiring a weight for each of the K features; for each target image, computing the weighted sum of the feature weights and similarity measures to obtain a comprehensive similarity measure. The target images whose comprehensive similarity measures fall within a preset threshold range are the retrieval result; alternatively, all comprehensive similarity measures are ranked, and the target images corresponding to the top-ranked measures are the retrieval result. The invention improves the retrieval precision of remote sensing image retrieval systems.

Description

Remote sensing image retrieval method and device, computer readable storage medium and equipment
Technical Field
The invention relates to the field of image processing and pattern recognition, and in particular to a remote sensing image retrieval method and device, a computer readable storage medium, and a device.
Background
Remote sensing image retrieval extends retrieval systems developed for everyday scene images to remote sensing imagery. With the rapid development of space remote sensing technology, the number of remote sensing images is growing rapidly, and building large-scale remote sensing image retrieval systems is increasingly important. A good image representation helps improve the accuracy of a remote sensing image retrieval system. Traditional image representations are mainly low-level visual features of the remote sensing image, including color, texture, shape, and spatial information.
The VGG16 network is a model proposed by Oxford University in 2014. Owing to its simplicity and practicality, it has become a popular convolutional neural network model, showing very good results in both image classification and object detection tasks. However, under the influence of factors such as resolution, remote sensing images have high diversity and complexity; the VGG16 network learns small-scale target features in remote sensing images poorly, so using it directly for remote sensing feature extraction leaves detail features under-learned and retrieval accuracy low.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a remote sensing image retrieval method and device, a computer readable storage medium, and a device, which improve the retrieval precision of a remote sensing image retrieval system.
The technical scheme provided by the invention is as follows:
in a first aspect, the present invention provides a remote sensing image retrieval method, including:
acquiring K features of a retrieval image and all target images in an image set to be retrieved, wherein the K features comprise a VGG16 feature and at least one improved VGG16 feature;
for each target image, calculating a similarity measure between the retrieval image and the target image for each of the K features;
acquiring a weight for each of the K features;
for each target image, computing the weighted sum of the feature weights and similarity measures to obtain a comprehensive similarity measure;
taking as the retrieval result the target images whose comprehensive similarity measures fall within a preset threshold range; or ranking all the comprehensive similarity measures, the target images corresponding to the several top-ranked measures being the retrieval result;
wherein: the VGG16 feature is obtained through a VGG16 network, and the improved VGG16 feature is obtained through an improved VGG16 network;
the modified VGG16 network adds several convolutional layers after the second fully-connected layer of the VGG16 network, as well as a residual layer, a deconvolution layer, and an Eltwise layer, and modifies the first fully-connected layer and the second fully-connected layer of the VGG16 network into convolutional layers.
Further, the several convolutional layers comprise seven convolutional layers in four groups: the first group comprises a first and a second convolutional layer, the second group a third and a fourth, the third group a fifth and a sixth, and the fourth group a seventh convolutional layer;
the sixth convolutional layer is connected to the residual layer, and the seventh convolutional layer is connected to the deconvolution layer; the residual layer and the deconvolution layer are both connected to the Eltwise layer, and the Eltwise layer is connected in sequence to an activation layer and a pooling layer.
Further, the VGG16 feature is obtained at the second fully-connected layer of the VGG16 network, and at least one improved VGG16 feature is obtained at the pooling layer by using at least one of the three Eltwise modes Prod, Sum, and Max at the Eltwise layer of the improved VGG16 network.
Further, calculating, for each target image, the similarity measures between the retrieval image and the target image according to the K features comprises:
binarizing the K features of the retrieval image and of all target images to obtain binarized features;
calculating distances between the retrieval image and each target image from their binarized features, each target image yielding K distances;
normalizing the K distances of each target image to obtain K similarity measures corresponding to its K features.
Further, obtaining the weight of each of the K features comprises:
obtaining the precision of each feature through a relevance feedback method, according to the similarity measures corresponding to each feature over all target images;
constructing, from the precision of each feature, a transfer matrix H representing the degree of preference for each feature by the following formula:
[Formula shown only as an image in the original; not reproduced here.]
where H(x, y) is an element of the transfer matrix H, pre_x and pre_y are the precisions of features F_x and F_y respectively, and α is a set bias coefficient;
initializing the weight of each feature and iterating the following formula several times to obtain the final weight of each feature:
w_d = γ·w_{d-1} + (1 - γ)·H·w_{d-1}, γ ∈ [0, 1]
where w_d is the weight vector of the features after the d-th iteration, w_{d-1} is the weight vector after the (d-1)-th iteration, and γ is a set iteration parameter.
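The iteration just described can be sketched numerically. This is a minimal illustration under stated assumptions, not the patent's implementation: the transfer matrix H below is a made-up column-stochastic example (the patent's formula for H survives only as an image), the function name is hypothetical, and renormalizing the weights to sum to 1 is an added convention.

```python
import numpy as np

def iterate_weights(H, w0, gamma=0.5, iters=50):
    """Repeat w_d = gamma * w_{d-1} + (1 - gamma) * H @ w_{d-1},
    renormalizing so the K feature weights stay a distribution."""
    w = np.asarray(w0, dtype=float)
    for _ in range(iters):
        w = gamma * w + (1 - gamma) * H @ w
        w = w / w.sum()
    return w

# Toy transfer matrix that prefers feature 1 over features 0 and 2.
H = np.array([[0.2, 0.1, 0.2],
              [0.6, 0.8, 0.6],
              [0.2, 0.1, 0.2]])
w = iterate_weights(H, w0=np.ones(3) / 3)
```

With a column-stochastic H the update behaves like a damped power iteration, so the weights drift toward the feature the transfer matrix prefers.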
Further, computing, for each target image, the weighted sum of the feature weights and similarity measures to obtain the comprehensive similarity measure comprises:
dividing the features into good features and bad features according to the precision of each feature;
calculating the comprehensive similarity measure by the following formula:
[Formula shown only as an image in the original; not reproduced here.]
where sim(q) is the comprehensive similarity measure of target image q in the image set to be retrieved, w_q^(i) is the weight of a good feature, the weight of a bad feature is given by a further image-only expression, K1 is the number of bad features, and D_i(q) is the similarity measure of the i-th feature of target image q.
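Since the exact good/bad-feature combination formula survives only as an image, the step can only be sketched as a plain weighted sum of the K per-feature similarity measures; the function name and the toy numbers are illustrative.

```python
import numpy as np

def comprehensive_similarity(D_q, weights):
    """Weighted sum of the K similarity measures D_i(q) of one target
    image q; the patent's separate good/bad-feature weighting is not
    reproduced here, every feature simply contributes weight * similarity."""
    return float(np.dot(np.asarray(weights, dtype=float),
                        np.asarray(D_q, dtype=float)))

# Three features with toy similarity measures and weights summing to 1.
sim_q = comprehensive_similarity([0.9, 0.4, 0.7], [0.5, 0.2, 0.3])
```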
In a second aspect, the present invention provides a remote sensing image retrieval apparatus, comprising:
a feature extraction module, configured to acquire K features of a retrieval image and of all target images in an image set to be retrieved, the K features comprising a VGG16 feature and at least one improved VGG16 feature;
a similarity measure calculation module, configured to calculate, for each target image, the similarity measures between the retrieval image and the target image according to the K features;
a weight obtaining module, configured to obtain the weight of each of the K features;
a comprehensive similarity measure module, configured to compute, for each target image, the weighted sum of the feature weights and similarity measures to obtain the comprehensive similarity measure;
a result output module, configured to take as the retrieval result the target images whose comprehensive similarity measures fall within a preset threshold range; or to rank all the comprehensive similarity measures, the target images corresponding to the several top-ranked measures being the retrieval result;
wherein: the VGG16 feature is obtained through a VGG16 network, and the improved VGG16 feature is obtained through an improved VGG16 network;
the modified VGG16 network adds several convolutional layers after the second fully-connected layer of the VGG16 network, as well as a residual layer, a deconvolution layer, and an Eltwise layer, and modifies the first fully-connected layer and the second fully-connected layer of the VGG16 network into convolutional layers.
Further, the several convolutional layers comprise seven convolutional layers in four groups: the first group comprises a first and a second convolutional layer, the second group a third and a fourth, the third group a fifth and a sixth, and the fourth group a seventh convolutional layer;
the sixth convolutional layer is connected to the residual layer, and the seventh convolutional layer is connected to the deconvolution layer; the residual layer and the deconvolution layer are both connected to the Eltwise layer, and the Eltwise layer is connected in sequence to an activation layer and a pooling layer;
the VGG16 feature is obtained at the second fully-connected layer of the VGG16 network, and at least one improved VGG16 feature is obtained at the pooling layer by using at least one of the three Eltwise modes Prod, Sum, and Max at the Eltwise layer of the improved VGG16 network.
Further, the similarity measure calculation module comprises:
a binarization unit, configured to binarize the K features of the retrieval image and of all target images to obtain binarized features;
a distance calculation unit, configured to calculate distances between the retrieval image and each target image from their binarized features, each target image yielding K distances;
a normalization unit, configured to normalize the K distances of each target image to obtain K similarity measures corresponding to its K features.
Further, the weight obtaining module comprises:
a precision calculation unit, configured to obtain the precision of each feature through a relevance feedback method, according to the similarity measures corresponding to each feature over all target images;
a transfer matrix construction unit, configured to construct, from the precision of each feature, a transfer matrix H representing the degree of preference for each feature by the following formula:
[Formula shown only as an image in the original; not reproduced here.]
where H(x, y) is an element of the transfer matrix H, pre_x and pre_y are the precisions of features F_x and F_y respectively, and α is a set bias coefficient;
an iteration unit, configured to initialize the weight of each feature and iterate the following formula several times to obtain the final weight of each feature:
w_d = γ·w_{d-1} + (1 - γ)·H·w_{d-1}, γ ∈ [0, 1]
where w_d is the weight vector of the features after the d-th iteration, w_{d-1} is the weight vector after the (d-1)-th iteration, and γ is a set iteration parameter.
Further, the comprehensive similarity measure module comprises:
a feature classification unit, configured to divide the features into good features and bad features according to the precision of each feature;
a calculation unit, configured to calculate the comprehensive similarity measure by the following formula:
[Formula shown only as an image in the original; not reproduced here.]
where sim(q) is the comprehensive similarity measure of target image q in the image set to be retrieved, w_q^(i) is the weight of a good feature, the weight of a bad feature is given by a further image-only expression, K1 is the number of bad features, and D_i(q) is the similarity measure of the i-th feature of target image q.
In a third aspect, the present invention provides a computer readable storage medium for remote sensing image retrieval, storing processor-executable instructions which, when executed by a processor, implement the steps of the remote sensing image retrieval method of the first aspect.
In a fourth aspect, the present invention provides an apparatus for remote sensing image retrieval, comprising at least one processor and a memory storing computer executable instructions, the processor implementing the steps of the remote sensing image retrieval method according to the first aspect when executing the instructions.
The invention has the following beneficial effects:
according to the invention, the capability of learning the small-scale target features of the remote sensing image is improved by improving the VGG16 features, the retrieval accuracy of a retrieval system is improved, the VGG16 features are improved, the feature dimension of image representation is reduced, and the calculated amount is reduced. The invention fuses the VGG16 characteristic and the improved VGG16 characteristic, reserves the learning capability of the original VGG16 network to the image characteristic, can learn the detail characteristic in the remote sensing image by utilizing the improved VGG16 network, and further improves the retrieval accuracy of the retrieval system through multi-characteristic fusion.
Drawings
FIG. 1 is a flow chart of a remote sensing image retrieval method of the present invention;
FIG. 2 is a schematic diagram of a VGG16 network;
FIG. 3 is a schematic diagram of an improved VGG16 network of the present invention;
FIG. 4 is an exemplary illustration of various types of images in a remote sensing image dataset collected in accordance with the present invention;
FIG. 5 is an exemplary graph comparing the retrieval precision of the improved VGG16 network and the VGG16 network based on the Euclidean metric;
FIG. 6 is an exemplary graph comparing the retrieval time of the improved VGG16 network and the VGG16 network based on the Euclidean metric;
FIG. 7 is an exemplary graph comparing the retrieval precision of the improved VGG16 network and the VGG16 network based on the Hamming metric;
FIG. 8 is an exemplary graph comparing the retrieval time of the improved VGG16 network and the VGG16 network based on the Hamming metric;
FIG. 9 is a graph comparing the precision of retrieval with each feature for each image class, using the Euclidean metric with 100 images returned;
FIG. 10 is a graph comparing the precision of retrieval with each feature for each image class, using the Hamming metric with 100 images returned;
FIG. 11 is an exemplary graph comparing retrieval performance based on multi-feature fusion and on a single feature using the Euclidean metric;
FIG. 12 is an exemplary graph comparing retrieval performance based on multi-feature fusion and on a single feature using the Hamming metric;
FIG. 13 is a schematic diagram of the remote sensing image retrieval device of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
An embodiment of the invention provides a remote sensing image retrieval method for retrieving, from an image set to be retrieved, images similar to a retrieval image. As shown in FIG. 1, the method comprises the following steps:
step 100: k features of the retrieval image and all target images in the image set to be retrieved are obtained, and the K features comprise a VGG16 feature and at least one improved VGG16 feature.
The embodiment of the invention retrieves over all target images in the image set to be retrieved, finding among them the images similar to the retrieval image; each image in the image set to be retrieved is a target image.
Step 200: for each target image, calculate the similarity measure between the retrieval image and the target image according to the K features.
In this step, a similarity measure is calculated for each feature of each target image; one target image has K similarity measures corresponding to its K features, so with n target images there are n × K similarity measures in total.
Step 300: a weight is obtained for each of the K features.
In this step, the weight may simply be a preset value, may come from various weighting methods in the prior art, or may be obtained by the weighting method provided by the present invention (see below).
Step 400: for each target image, compute the weighted sum of the feature weights and similarity measures to obtain a comprehensive similarity measure.
Owing to the complexity of remote sensing images, retrieval precision based on a single feature still leaves room for improvement. The invention fuses the VGG16 feature with the improved VGG16 feature; multi-feature fusion helps improve the retrieval accuracy of the retrieval system.
Step 500: the target images whose comprehensive similarity measures fall within a preset threshold range are the retrieval result; alternatively, all comprehensive similarity measures are ranked, and the target images corresponding to the several top-ranked measures are the retrieval result, the number kept being settable as needed.
The threshold range may be set as needed; for example, it may be all values greater than a certain setting.
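Step 500's two selection rules, a threshold range or keeping the top-ranked measures, can be sketched as follows; the function name and the toy scores are illustrative, not from the patent:

```python
def select_results(sims, threshold=None, top_m=None):
    """Return target-image indices either whose comprehensive similarity
    measure meets the threshold, or of the top_m highest measures."""
    indexed = list(enumerate(sims))
    if threshold is not None:
        return [i for i, s in indexed if s >= threshold]
    ranked = sorted(indexed, key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in ranked[:top_m]]

sims = [0.31, 0.92, 0.57, 0.88, 0.12]          # one measure per target image
by_threshold = select_results(sims, threshold=0.5)
by_rank = select_results(sims, top_m=2)
```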
The VGG16 feature of the invention is obtained through the VGG16 network. As shown in FIG. 2, the VGG16 network comprises 5 sequentially connected convolutional blocks (ConvNets), each comprising several convolutional layers (conv) and a pooling layer (pool); after them, features are extracted through 3 fully-connected layers and input to a Softmax classifier to complete the classification task. The outputs of the convolutional and fully-connected layers are ReLU-activated, which shortens network training time, and the VGG16 network includes a Dropout layer to avoid overfitting. In FIG. 2, data is the input image, conv1 + pool1 is the first convolutional block (conv1 comprising two convolutional layers), conv2 + pool2 is the second convolutional block, and so on; fc6, fc7, and fc8 are the three fully-connected layers. The image is input to the VGG16 network and the output is taken from the appropriate position (any layer) as required, yielding the desired VGG16 feature.
The improved VGG16 feature of the invention is obtained through the improved VGG16 network, which is the invention's modification of the VGG16 network: several convolutional layers (conv), together with a residual layer (BN), a deconvolution layer (Deconv), and an Eltwise layer, are added after the second fully-connected layer fc7 of the VGG16 network, and the first and second fully-connected layers fc6 and fc7 are modified into convolutional layers fc6′ and fc7′. The outputs of the added convolutional layers and of fc6′ and fc7′ are all ReLU-activated.
In the embodiment of the invention, the retrieval image and all target images in the image set to be retrieved are input to the improved VGG16 network, one of the Eltwise modes Prod, Sum, or Max is selected as required, and the output is taken from the appropriate position of the improved VGG16 network, yielding the improved VGG16 feature.
Modifying the fully-connected layers fc6 and fc7 of the VGG16 network into convolutional layers fc6′ and fc7′ reduces the feature dimension of the image representation; the deconvolution layer Deconv improves the VGG16 network's ability to learn small-scale target features in remote sensing images; and the residual layer BN makes the network converge faster, saving computing resources and improving computational efficiency.
In addition, to retain the original VGG16 network's ability to learn image features, remote sensing image features are extracted with the original VGG16 network (the VGG16 feature) at the same time as with the improved VGG16 network (the improved VGG16 feature), and the two kinds of features are fused according to different weights.
The improved VGG16 feature strengthens the learning of small-scale target features in remote sensing images and improves the retrieval precision of the retrieval system, while also reducing the feature dimension of the image representation, speeding up network convergence, reducing the amount of computation, and improving computational efficiency. The invention fuses the VGG16 feature with the improved VGG16 feature, retaining the original VGG16 network's ability to learn image features while using the improved VGG16 network to learn detail features in remote sensing images; multi-feature fusion further improves the retrieval accuracy of the retrieval system.
In the invention, the several convolutional layers added after the second fully-connected layer fc7 of the VGG16 network preferably comprise seven convolutional layers in four groups conv6, conv7, conv8, and conv9: the first group conv6 comprises the first and second convolutional layers, the second group conv7 the third and fourth, the third group conv8 the fifth and sixth, and the fourth group conv9 the seventh convolutional layer. The structure of the improved VGG16 network is shown in FIG. 3.
The five sequentially connected convolutional blocks before fc6′ are the same as in the VGG16 network; the convolutional layers fc6′ and fc7′ are modified from the fully-connected layers fc6 and fc7 of the VGG16 network, giving nine convolutional layers in total across fc6′, fc7′, conv6, conv7, conv8, and conv9. The outputs of the convolutional layers of the five convolutional blocks and of the nine convolutional layers are all ReLU-activated.
The sixth convolutional layer (i.e., the second convolutional layer of conv8) is connected to the residual layer BN, and the seventh convolutional layer (i.e., conv9) is connected to the deconvolution layer Deconv; the residual layer and the deconvolution layer are connected to the Eltwise layer, which is connected in sequence to an activation layer (ReLU) and a pooling layer (pool6).
In the embodiment of the invention, the retrieval image and all target images in the image set to be retrieved are input to the improved VGG16 network, different Eltwise modes such as Prod, Sum, and Max are selected as needed, and the output is taken from the pool6 position of the improved VGG16 network, yielding the improved VGG16 feature. Compared with the VGG16 network, the improved VGG16 network of this embodiment increases the ability to learn features of small-scale targets while reducing the feature dimension of the image representation: compared with the 4096-dimensional feature of the VGG16 network's fc7 layer, the pool6 layer of the improved VGG16 network has a feature dimension of 256, a 16-fold reduction. A comparison of the key-layer feature dimensions of the VGG16 network and the improved VGG16 network is shown in Table 1.
TABLE 1 Comparison of the parameters of the key layers of the VGG16 network and the improved VGG16 network
[Table 1 appears only as an image in the original and is not reproduced here.]
In Table 1, conv1-2 is the second convolutional layer of conv1, conv4-3 is the third convolutional layer of conv4, and so on.
The VGG16 feature of the embodiment can be taken from the appropriate position (any layer) of the VGG16 network as required; the improved VGG16 feature can use different Eltwise modes as required and is taken from the pool6 position of the improved VGG16 network. The numbers of VGG16 features and improved VGG16 features can be set as needed. A specific example:
the VGG16 feature, denoted F_1, is obtained at the second fully-connected layer fc7 of the VGG16 network; at least one of the three Eltwise modes Prod, Sum, and Max is used at the Eltwise layer of the improved VGG16 network, and at least one improved VGG16 feature is obtained at the pooling layer pool6 after the Eltwise layer. For example, using all three modes Prod, Sum, and Max yields three improved VGG16 features F_2, F_3, F_4, and the K features can be written F_i ∈ {F_1, F_2, F_3, F_4}; alternatively, using one of the modes Prod, Sum, or Max yields a single improved VGG16 feature F_2, and the K features can be written F_i ∈ {F_1, F_2}.
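The three Eltwise modes named above are standard element-wise operations on two equally shaped inputs; a minimal numpy sketch (the two vectors below are made up and merely stand in for the residual-layer and deconvolution-layer outputs):

```python
import numpy as np

def eltwise(a, b, mode="Sum"):
    """Element-wise fusion of two equally shaped feature maps using the
    three Eltwise modes named in the text: Prod, Sum, or Max."""
    ops = {"Prod": np.multiply, "Sum": np.add, "Max": np.maximum}
    return ops[mode](np.asarray(a, dtype=float), np.asarray(b, dtype=float))

bn_out = np.array([1.0, -2.0, 3.0])      # stand-in for the residual (BN) branch
deconv_out = np.array([0.5, 4.0, -1.0])  # stand-in for the deconvolution branch
f_sum = eltwise(bn_out, deconv_out, "Sum")  # each mode yields one fused feature
f_max = eltwise(bn_out, deconv_out, "Max")
```

Using one mode yields one improved VGG16 feature; using all three yields three, matching the F_2, F_3, F_4 example.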
In the invention, the similarity measure represents the similarity between the retrieval image and a target image. Various methods of measuring the similarity of two images may be adopted in the embodiment, including methods in the prior art, or the following method provided by the invention:
step 210: and performing binarization processing on the K features of the retrieval image and all the target images to obtain binarization features.
Binarization reduces the time taken to compute the similarity measures of the several features. Assuming the retrieval image and each target image have K features F_i ∈ {F_1, F_2, …, F_K}, each feature can be binarized by the following formulas:
[Formulas shown only as images in the original; not reproduced here.]
where ave(F_i) is the mean value of feature F_i, F_i(c_j) is the j-th dimension of F_i, and m is the dimension of F_i; the binarization formula for each feature of the retrieval image is the same as for the target images.
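The binarization formulas themselves appear only as images, but the surrounding definitions (a per-feature mean ave(F_i) compared against each dimension F_i(c_j)) suggest mean-thresholding. The sketch below assumes that reading, including the >= convention:

```python
import numpy as np

def binarize(feature):
    """Threshold a feature vector at its own mean: dimensions at or above
    ave(F_i) map to 1, the rest to 0 (the >= convention is an assumption)."""
    feature = np.asarray(feature, dtype=float)
    return (feature >= feature.mean()).astype(np.uint8)

b = binarize([0.1, 0.9, 0.5, 0.3])   # mean is 0.45
```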
Step 220: calculate the distances between the retrieval image and each target image from their binarized features, each target image yielding K distances.
In this step, the distance between the retrieval image and a target image may be calculated by various methods in the prior art, or by the following method provided by the invention:
Let q be the retrieval image, pk ∈ {p1, p2, …, pn} be the k-th target image in the image set to be retrieved Ω = {p1, p2, …, pn}, n be the number of target images in the image set to be retrieved, and di(k) be the distance between q and pk:
di(k) = Σ(j=1..m) wj · |Fqi(j) − Fpki(j)|
Fqi(j) and Fpki(j) are the j-th dimensions of the i-th features of the retrieval image q and the target image pk respectively, wj is the weight of the j-th dimension of the feature, and m is the dimension of the features Fqi and Fpki.
wj = Hj / Σ(j=1..m) Hj

Hj = −pj · log2(pj) − (1 − pj) · log2(1 − pj), where pj = (1/n) · Σ(i=1..n) fij
n is the number of target images of the image set to be retrieved, m is the dimension of the features, fij (i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m}) is the j-th dimension of the binarized feature of the i-th image, and Hj is the entropy of feature dimension j.
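Assuming Hj is the binary entropy of dimension j over the binarized features of the image set and wj is Hj normalized to sum to one (a reconstruction; the patent's weight and entropy formulas are only available as images), the weighted distance step could be sketched as:

```python
import numpy as np

def entropy_weights(F):
    """F: n x m matrix of binarized features (one row per image).
    Each dimension is weighted by the entropy of its 0/1 distribution,
    so near-50/50 (more discriminative) bits count more.
    Assumption: the patent's exact formulas are not reproduced here."""
    p = F.mean(axis=0)                      # fraction of 1s per dimension
    p = np.clip(p, 1e-12, 1 - 1e-12)        # avoid log(0)
    H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
    return H / H.sum()                      # normalize weights to sum to 1

def weighted_distance(fq, fp, w):
    """Weighted Hamming distance between two binary feature vectors."""
    return float(np.sum(w * np.abs(fq - fp)))

F = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 1, 1]])
w = entropy_weights(F)
print(round(weighted_distance(F[0], F[1], w), 4))
```

Dimension 3 is constant over the sample set, so its entropy (and weight) is essentially zero; the two informative bits share the weight equally.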
Step 230: and normalizing the K distances of each target image to obtain K similarity measurements corresponding to the K features of each target image.
In this step, the normalization method is as follows:
Di(k) = (di(k) − di,min) / (di,max − di,min)

where di,max and di,min are the maximum and minimum of the distances di(1), …, di(n) of the i-th feature.
Di(q) ∈ {D1(q), D2(q), …, DK(q)} is the similarity measure between the retrieval image q and the target images of the image set to be retrieved Ω = {p1, p2, …, pn}, calculated based on the feature Fi ∈ {F1, F2, …, FK}, where n is the number of target images of the image set to be retrieved.
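Assuming the normalization is a min-max mapping of the n distances of each feature into [0, 1] (an assumption, since the patent's formula is only available as an image), a sketch is:

```python
import numpy as np

def normalize_distances(d):
    """Min-max normalize the n distances obtained for one feature so
    that measures computed from different features are comparable.
    Assumption: the exact normalization in the patent is an image."""
    d = np.asarray(d, dtype=float)
    span = d.max() - d.min()
    if span == 0:
        return np.zeros_like(d)   # all distances equal: no ordering info
    return (d - d.min()) / span

print(normalize_distances([2.0, 4.0, 6.0]).tolist())  # [0.0, 0.5, 1.0]
```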
A method of image retrieval that fuses multiple features necessarily involves selecting a weight for each feature. The currently widely used approach is to perform image retrieval with globally weighted multi-feature fusion: the weights are determined empirically by experienced experts, and once the weight of a single feature is determined, it does not change during the whole retrieval process. Because the weights are fixed by expert experience, the retrieval accuracy of this approach is unstable.
The method for acquiring the weight of each feature in the K features comprises the following steps:
step 310: and according to the similarity measurement corresponding to each feature of all the target images, obtaining the trust of each feature through a relevant feedback method.
The related feedback method refers to the following: first, the retrieval system provides a preliminary retrieval result according to the retrieval keyword provided by the user; then, the user judges the current retrieval results, indicating which results are relevant to the retrieval requirement and which are irrelevant; the retrieval system then provides new retrieval results based on the user feedback. This process is repeated until the retrieval results meet the user requirement.
The invention sorts the target images by the similarity measures Di(q) ∈ {D1(q), D2(q), …, DK(q)} respectively and outputs the retrieval results accordingly. The method of the invention for obtaining the trust of each feature by the relevant feedback method is as follows:
prei = Snumi / numi
prei ∈ {pre1, pre2, …, preK} is the trust of feature Fi ∈ {F1, F2, …, FK}, i ∈ {1, 2, …, K}. Snumi is the number of similar images in the retrieval results returned according to feature Fi ∈ {F1, F2, …, FK}. numi is the total number of images returned according to feature Fi ∈ {F1, F2, …, FK}.
Step 320: according to the trust of each feature, a transition matrix H representing the preference degree of each feature is constructed by the following formula:
H(x, y) = α · prey / prex, if prey > prex; H(x, y) = prey / prex, otherwise
where H(x, y) is an element of the transfer matrix H, prex and prey are the trusts of feature Fx and feature Fy respectively, α is a preset bias coefficient, and H(x, y) is the bias weight with which feature Fx ∈ {F1, F2, …, FK} is replaced by feature Fy ∈ {F1, F2, …, FK}.
When the trust prey of feature Fy is greater than the trust prex of feature Fx, to obtain better retrieval results we consider that feature Fx can be replaced by feature Fy. A larger replacement bias coefficient α indicates that the retrieval system depends more on feature Fy. Because feature Fy has better trust than feature Fx, setting α ≥ 1 guarantees

α · prey / prex ≥ 1

so that feature Fy receives a larger weight and the retrieval system depends more on feature Fy. When the trust of feature Fy equals that of feature Fx, we still consider that feature Fx can be replaced by feature Fy, and the replacement bias is H(x, y) = 1. When the trust of feature Fy is less than that of feature Fx, we consider that feature Fx can still be replaced by feature Fy, but the replacement bias H(x, y) is relatively small. One benefit of this is that although some features have low trust, they are still considered helpful to the retrieval task.
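Taking the trust as prei = Snumi/numi and assuming the bias rule H(x, y) = α · prey/prex when prey > prex and prey/prex otherwise (a reconstruction, since the original formula is only available as an image), the transfer matrix could be built as:

```python
import numpy as np

def transfer_matrix(pre, alpha=1.5):
    """Build H where H[x, y] is the bias for replacing feature x by
    feature y: amplified by alpha when y is more trusted than x,
    otherwise just the trust ratio (<= 1). alpha >= 1 is assumed."""
    K = len(pre)
    H = np.empty((K, K))
    for x in range(K):
        for y in range(K):
            ratio = pre[y] / pre[x]
            H[x, y] = alpha * ratio if pre[y] > pre[x] else ratio
    return H

pre = [0.8, 0.4]          # per-feature trusts, e.g. Snum_i / num_i
print(transfer_matrix(pre))
```

With these sample trusts, H[1, 0] = 1.5 · 0.8/0.4 = 3.0 (feature 1 is biased toward being replaced by the better-trusted feature 0), while H[0, 1] = 0.4/0.8 = 0.5.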
Step 330: initializing the weight of each feature, and performing a plurality of iterations through the following formula to obtain the final weight of each feature:
wd=γwd-1+(1-γ)Hwd-1(γ∈[0,1])
wherein wd is the weight of each feature after the d-th iteration, wd−1 is the weight of each feature after the (d−1)-th iteration, and γ is a preset iteration parameter.
First, we initialize the weights as

w0 = (1/K, 1/K, …, 1/K)T

where w0 is the initial weight of each feature, wd is the weight obtained in the current iteration, and wd−1 is the weight obtained in the previous iteration. Based on H = {H(x, y)}, the invention iterates with the following formula to obtain wd:

wd = γwd−1 + (1 − γ)Hwd−1 (γ ∈ [0, 1])
wd depends not only on the choice of the transfer matrix but also on the result wd−1 of the previous iteration. One benefit of this is that the final iteration result is not affected by a single poor decision.
The optimization process for obtaining the single feature weight of the present invention is summarized as follows:
Initialize: d ← 1, w0 = (1/K, 1/K, …, 1/K)T
repeat
wd=γwd-1+(1-γ)Hwd-1(γ∈[0,1])
wd=wd/sum(wd)
d←d+1
until ||wd − wd−1|| < ε (ε ≥ 0)
return wd
wherein epsilon is a set numerical value and can be adjusted as required.
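The optimization loop above can be sketched directly; γ, ε and the sample matrix H below are illustrative values, not taken from the patent:

```python
import numpy as np

def iterate_weights(H, gamma=0.5, eps=1e-8, max_iter=1000):
    """Iterative weight update from the pseudocode:
    w_d = gamma*w_{d-1} + (1-gamma)*H @ w_{d-1}, renormalized each
    round, until successive weights differ by less than eps."""
    K = H.shape[0]
    w = np.full(K, 1.0 / K)              # w_0 = (1/K, ..., 1/K)
    for _ in range(max_iter):
        w_new = gamma * w + (1 - gamma) * H @ w
        w_new = w_new / w_new.sum()      # keep weights summing to 1
        if np.linalg.norm(w_new - w) < eps:
            return w_new
        w = w_new
    return w

H = np.array([[1.0, 0.5],
              [3.0, 1.0]])              # example transfer matrix
w = iterate_weights(H)
print(w)                                # converged, normalized weights
```

For a positive matrix this is a damped power iteration, so the weights converge to a fixed point regardless of the (uniform) initialization.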
The invention adopts an adaptive weight mode in which the weights are obtained based on the relevant feedback method, which is a great improvement over the global weight determination mode.
In the invention, for each target image, the weight of each feature and the corresponding similarity measure are weighted and summed to obtain the comprehensive similarity measure, which comprises the following steps:
step 410: the features are classified into good features and bad features according to the trust of each feature.
Although the adaptive weight method is used, good features and bad features are not distinguished clearly, and the retrieval performance still has room for improvement. To improve the retrieval accuracy, the invention expects that features with good trust (retrieval performance) receive larger weights than those with poor trust. To this end, the invention divides the features into good features and bad features based on trust. For example, if image retrieval is carried out based on feature Fx ∈ {F1, F2, …, FK} and feature Fy ∈ {F1, F2, …, FK} respectively, and the trust of feature Fy is better than that of feature Fx, the invention considers feature Fy ∈ {F1, F2, …, FK} a good feature and feature Fx ∈ {F1, F2, …, FK} a bad feature. Good and bad features are defined as follows:
if prey>=prex
prey∈{good_feature}
else
prex∈{bad_feature}
prey is the trust of feature Fy ∈ {F1, F2, …, FK}, and prex is the trust of feature Fx ∈ {F1, F2, …, FK}.
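The patent defines good and bad features pairwise; as an illustrative generalization to K features, one could compare each trust to the mean trust (an assumption for this sketch, not the patent's exact rule):

```python
def split_features(pre):
    """Classify features as good or bad by their trust. The patent
    compares features pairwise (pre_y >= pre_x); here, as an
    illustrative generalization, each trust is compared to the mean."""
    avg = sum(pre) / len(pre)
    good = [i for i, p in enumerate(pre) if p >= avg]
    bad = [i for i, p in enumerate(pre) if p < avg]
    return good, bad

good, bad = split_features([0.9, 0.5, 0.7, 0.3])
print(good, bad)  # [0, 2] [1, 3]
```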
Step 420: and calculating the comprehensive similarity measure through the following formula, sorting the comprehensive similarity measure sim (q) and outputting a retrieval result according to the sorting.
sim(q) = Σ(i ∈ good features) wq(i) · Di(q) + Σ(i ∈ bad features) (wq(i) / K1) · Di(q)

wherein sim(q) is the comprehensive similarity measure of the target image q in the image set to be retrieved, wq(i) is the weight of a good feature, wq(i)/K1 is the weight of a bad feature, K1 is the number of bad features, and Di(q) is the similarity measure of the i-th feature.
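Under the reading that each bad feature's term is damped by the number of bad features K1 (an assumption, since the original fusion formula is only available as an image), the weighted sum could be sketched as:

```python
def comprehensive_similarity(D, w, good):
    """D: per-feature similarity measures D_i(q); w: per-feature
    weights; good: booleans marking good features. Bad-feature
    terms are damped by the number of bad features K1 (assumed
    reading of the patent's formula image)."""
    K1 = sum(1 for g in good if not g) or 1   # avoid division by zero
    sim = 0.0
    for Di, wi, gi in zip(D, w, good):
        sim += wi * Di if gi else (wi / K1) * Di
    return sim

D = [0.9, 0.7, 0.2]
w = [0.5, 0.3, 0.2]
good = [True, True, False]
print(comprehensive_similarity(D, w, good))  # 0.5*0.9 + 0.3*0.7 + (0.2/1)*0.2
```

Sorting all target images by sim(q) then yields the fused retrieval result.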
The effects of the present invention are described below in specific examples:
The invention collects 22 types of images from Google Earth, as given in fig. 4 (arranged from left to right and top to bottom in the sequence shown in table 2). Table 2 gives details of the data set collected by the present invention.
TABLE 2. remote sensing image data set collected by the present invention
(The contents of Table 2 appear only as an image in the original publication.)
Training of the VGG16 network and the improved VGG16 network is based on a classification model trained on daily scene images. The training parameters are shown in table 3 below.
TABLE 3 training parameters for the model
(The contents of Table 3 appear only as an image in the original publication.)
Fig. 5 is a comparison example diagram of the retrieval accuracy of the improved VGG16 network and the VGG16 network based on the Euclidean metric, where the abscissa is the number of returned images, i.e., the top-ranked retrieval results, and the ordinate is the precision. Fig. 6 shows an example comparison of the retrieval times of the improved VGG16 network and the VGG16 network based on the Euclidean metric, where the abscissa is the number of returned images and the ordinate is the average retrieval time. Fig. 7 is a comparison example diagram of the retrieval accuracy of the improved VGG16 network and the VGG16 network based on the Hamming metric, where the abscissa is the number of returned images and the ordinate is the precision. Fig. 8 shows an example comparison of the retrieval times of the improved VGG16 network and the VGG16 network based on the Hamming metric, where the abscissa is the number of returned images and the ordinate is the average retrieval time. As can be seen from Figs. 5, 6, 7 and 8, although the retrieval accuracy of the improved VGG16 network is slightly lower than that of the VGG16 network, by about 0.02-0.04, the average retrieval time is improved by about 0.025-0.03.
FIG. 9 shows a comparison of precision for each type of image based on each single-feature retrieval using the Euclidean metric, returning 100 images, where the abscissa is the image class and the ordinate is the precision. The returned images are the top-ranked images, that is, the retrieval results, of which there are 100 here. FIG. 10 shows a comparison of precision for each type of image based on each single-feature retrieval using the Hamming metric, returning 100 images, where the abscissa is the image class and the ordinate is the precision. As can be seen from Figs. 9 and 10, the VGG16 network and the improved VGG16 network have slightly different abilities to express the image features of different classes of images; although the 4096-dimensional features of the VGG16 network perform better than the 256-dimensional features of the improved VGG16 network over the entire data set, the improved VGG16 network performs better for certain specific classes, such as class 5, class 7, class 9, class 10, class 11, class 12, class 17, class 18 and class 19. Therefore, fusing the multi-level VGG16 network features for image retrieval is effective for improving the retrieval accuracy.
Fig. 11 shows a comparison of multi-feature fusion based and single-feature based search performance using the euclidean metric, where the abscissa is the number of returned images and the ordinate is the precision ratio. Fig. 12 shows a comparison of multi-feature fusion based and single feature based search performance using hamming metric, where the abscissa is the number of returned images and the ordinate is the precision ratio. As can be seen from the graph, the image retrieval is carried out based on the European measurement fusion multi-feature, compared with the improved VGG16 feature, the retrieval accuracy is improved by about 0.0612-0.0980 on average, compared with the VGG16 feature, the retrieval accuracy is improved by about 0.0247-0.0757 on average; the image retrieval is carried out based on Hamming measurement fusion multi-feature, compared with the improved VGG16 feature, the retrieval accuracy is improved by about 0.0594-0.0961 on average, and compared with the VGG16 feature, the retrieval accuracy is improved by about 0.0238-0.0897 on average. Therefore, the embodiment of the invention can effectively improve the retrieval accuracy.
Example 2:
an embodiment of the present invention provides a remote sensing image retrieval device, and as shown in fig. 13, the image retrieval device includes:
the feature extraction module 10 is configured to obtain K features of the retrieval image and all target images in the to-be-retrieved image set, where the K features include a VGG16 feature and at least one improved VGG16 feature.
And a similarity measure calculating module 20, configured to calculate, for each target image, a similarity measure between the search image and the target image according to the K features.
And a weight obtaining module 30, configured to obtain a weight of each of the K features.
And the comprehensive similarity measurement module 40 is used for weighting and summing the weight of each feature and the similarity measurement of each target image to obtain the comprehensive similarity measurement.
A result output module 50, configured to obtain a retrieval result as a target image corresponding to the comprehensive similarity measure meeting the preset threshold range; or sequencing all the comprehensive similarity measurements, wherein the target images corresponding to a plurality of the comprehensive similarity measurements which are sequenced at the top are retrieval results.
Wherein: the VGG16 feature is obtained through a VGG16 network, and the improved VGG16 feature is obtained through an improved VGG16 network;
the modified VGG16 network adds several convolutional layers after the second fully-connected layer of the VGG16 network, as well as a residual layer, a deconvolution layer, and an Eltwise layer, and modifies the first fully-connected layer and the second fully-connected layer of the VGG16 network into convolutional layers.
According to the invention, the capability of learning the small-scale target features of the remote sensing image is improved by improving the VGG16 features, the retrieval precision of a retrieval system is improved, the VGG16 features are improved, the feature dimension of image representation is reduced, the network convergence speed is improved, the calculated amount is reduced, and the calculation efficiency is improved. The invention fuses the VGG16 characteristic and the improved VGG16 characteristic, reserves the learning capability of the original VGG16 network to the image characteristic, can learn the detail characteristic in the remote sensing image by utilizing the improved VGG16 network, and further improves the retrieval accuracy of the retrieval system through multi-characteristic fusion.
In the present invention, adding several convolutional layers conv after the second fully connected layer fc7 of the VGG16 network preferably includes seven convolutional layers of four groups of conv6, conv7, conv8 and conv9, the first convolutional layer conv6 includes a first convolutional layer and a second convolutional layer, the second convolutional layer conv7 includes a third convolutional layer and a fourth convolutional layer, the third convolutional layer conv8 includes a fifth convolutional layer and a sixth convolutional layer, and the fourth convolutional layer conv9 includes a seventh convolutional layer; the structure of the improved VGG16 network is shown in fig. 3.
The sixth convolutional layer (i.e., the second convolutional layer of conv 8) is connected with the residual layer BN, and the seventh convolutional layer (i.e., conv9) is connected with the deconvolution layer Deconv; the residual layer and the deconvolution layer are connected with an Eltwise layer, and the Eltwise layer is sequentially connected with an activation layer ReLU and a pooling layer pool 6.
According to the embodiment of the invention, the retrieval image and all target images in the to-be-retrieved image set are input into the improved VGG16 network, different Eltwise modes such as Prod, Sum and Max are selected according to needs, and output from the pool6 position of the improved VGG16 network, so that the improved VGG16 characteristic can be obtained. Compared with the VGG16 network, the improved VGG16 network of the embodiment increases the feature learning capability of a medium and small-scale target, and simultaneously reduces the feature dimension of image representation, compared with the fc7 layer 4096-dimensional feature of the VGG16 network, the feature dimension of the pool6 layer in the improved VGG16 network is 256, and is reduced by 16 times.
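The three Eltwise modes combine two equally shaped inputs element by element, mirroring Caffe's Eltwise layer semantics (PROD, SUM, MAX); a minimal sketch on 1-D arrays, with illustrative sample values:

```python
import numpy as np

def eltwise(a, b, mode="SUM"):
    """Element-wise fusion of two equally shaped feature maps,
    as in Caffe's Eltwise layer: PROD, SUM, or MAX."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if mode == "PROD":
        return a * b
    if mode == "SUM":
        return a + b
    if mode == "MAX":
        return np.maximum(a, b)
    raise ValueError("mode must be PROD, SUM, or MAX")

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 1.0, 3.0])
print(eltwise(a, b, "PROD").tolist())  # [4.0, 2.0, 9.0]
print(eltwise(a, b, "MAX").tolist())   # [4.0, 2.0, 3.0]
```

In the improved network the two inputs are the residual-layer (BN) output and the deconvolution (Deconv) output, and the fused map is passed through ReLU and pool6.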
The VGG16 feature of the embodiment of the invention can be output from a proper position (each layer) of the VGG16 network according to requirements, the improved VGG16 feature can be output from a pool6 position of the improved VGG16 network by selecting different Eltwise modes according to requirements, and the number of the VGG16 feature and the improved VGG16 feature can be set according to requirements. A specific example is given here:
The VGG16 feature, denoted F1, is obtained at the second fully connected layer fc7 of the VGG16 network; at least one of the three Eltwise modes Prod, Sum and Max is used in the Eltwise layer of the improved VGG16 network, and at least one improved VGG16 feature is obtained from the pooling layer pool6 behind the Eltwise layer. For example, if all three Eltwise modes Prod, Sum and Max are used, three improved VGG16 features F2, F3 and F4 are obtained, and the K features can be noted as Fi ∈ {F1, F2, F3, F4}; for another example, if one Eltwise mode among Prod, Sum and Max is used, one improved VGG16 feature F2 is obtained, and the K features can be noted as Fi ∈ {F1, F2}.
In the present invention, the similarity measure represents the similarity between the retrieval image and the target image. Various methods of representing the similarity between two images may be adopted in the embodiment of the present invention; they may be methods in the prior art or the method of the present invention, in which case the similarity measure calculating module includes:
and the binarization unit is used for carrying out binarization processing on the K features of the search image and all the target images to obtain binarization features.
And the distance calculation unit is used for calculating the distance between the retrieval image and the target image according to the binarization features of the retrieval image and the K binarization features of each target image, and each target image obtains the K distances.
And the normalization unit is used for normalizing the K distances of each target image to obtain K similarity measurements corresponding to the K features of each target image.
In the present invention, the weight of each feature in the K features may be obtained by various methods; the embodiment of the present invention may adopt methods in the prior art, or the following method provided by the present invention, in which case the weight obtaining module includes:
and the trust calculation unit is used for acquiring the trust of each feature through a related feedback method according to the similarity measurement corresponding to each feature of all the target images.
A transfer matrix constructing unit, configured to construct a transfer matrix H representing a preference degree for each feature according to the trust of each feature by the following formula:
Figure BDA0001922418210000201
where H(x, y) is an element of the transfer matrix H, prex and prey are the trusts of feature Fx and feature Fy respectively, and α is a preset bias coefficient.
The iteration unit is used for initializing the weight of each feature, and performing a plurality of iterations through the following formula to obtain the final weight of each feature:
wd=γwd-1+(1-γ)Hwd-1(γ∈[0,1])
wherein wd is the weight of each feature after the d-th iteration, wd−1 is the weight of each feature after the (d−1)-th iteration, and γ is a preset iteration parameter.
In the present invention, the comprehensive similarity measurement module comprises:
and the characteristic classification unit is used for classifying the characteristics into good characteristics and bad characteristics according to the trust of each characteristic.
A calculating unit for calculating the comprehensive similarity measure by the following formula:
sim(q) = Σ(i ∈ good features) wq(i) · Di(q) + Σ(i ∈ bad features) (wq(i) / K1) · Di(q)

wherein sim(q) is the comprehensive similarity measure of the target image q in the image set to be retrieved, wq(i) is the weight of a good feature, wq(i)/K1 is the weight of a bad feature, K1 is the number of bad features, and Di(q) is the similarity measure of the i-th feature of the target image q.
The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the method embodiments, and for the sake of brief description, reference may be made to the corresponding contents in the method embodiments without reference to the device embodiments. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatus and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Example 3:
The method or apparatus described in the foregoing embodiments of the present specification may implement service logic through a computer program and record it on a storage medium, and the storage medium may be read and executed by a computer to achieve the effects of the solutions described in the embodiments of the present specification. Accordingly, the present invention also provides a computer-readable storage medium for remote sensing image retrieval, comprising a processor and a memory for storing processor-executable instructions which, when executed by the processor, implement steps including those of embodiment 1 described above, for example:
acquiring K features of the retrieval image and all target images in the image set to be retrieved, wherein the K features comprise a VGG16 feature and at least one improved VGG16 feature;
for each target image, calculating similarity measurement between the retrieval image and the target image according to the K features;
acquiring the weight of each feature in the K features;
for each target image, weighting and summing the weight of each feature and the similarity measurement to obtain a comprehensive similarity measurement;
the target image corresponding to the comprehensive similarity measurement meeting the preset threshold range is the retrieval result; or sequencing all the comprehensive similarity measurements, wherein the target images corresponding to a plurality of comprehensive similarity measurements in the front sequence are retrieval results;
wherein: the VGG16 feature is obtained through a VGG16 network, and the improved VGG16 feature is obtained through an improved VGG16 network;
the modified VGG16 network adds several convolutional layers after the second fully-connected layer of the VGG16 network, as well as a residual layer, a deconvolution layer, and an Eltwise layer, and modifies the first fully-connected layer and the second fully-connected layer of the VGG16 network into convolutional layers.
The storage medium may include a physical device for storing information, and typically, the information is digitized and then stored using an electrical, magnetic, or optical media. The storage medium may include: devices that store information using electrical energy, such as various types of memory, e.g., RAM, ROM, etc.; devices that store information using magnetic energy, such as hard disks, floppy disks, tapes, core memories, bubble memories, and usb disks; devices that store information optically, such as CDs or DVDs. Of course, there are other ways of storing media that can be read, such as quantum memory, graphene memory, and so forth.
The above description of the apparatus according to the method embodiment may also include other embodiments. The specific implementation manner may refer to the description of the related method embodiment, and is not described in detail herein.
According to the embodiment, the capability of learning the small-scale target features of the remote sensing images is improved by improving the VGG16 features, the retrieval accuracy of a retrieval system is improved, the feature dimension of image representation is reduced by improving the VGG16 features, and the calculated amount is reduced. The invention fuses the VGG16 characteristic and the improved VGG16 characteristic, reserves the learning capability of the original VGG16 network to the image characteristic, can learn the detail characteristic in the remote sensing image by utilizing the improved VGG16 network, and further improves the retrieval accuracy of the retrieval system through multi-characteristic fusion.
Example 4:
the invention also provides a device for remote sensing image retrieval, which can be a single computer, and can also comprise an actual operation device and the like using one or more methods or one or more embodiment devices in the specification. The apparatus for remote sensing image retrieval may include at least one processor and a memory storing computer executable instructions that when executed by the processor perform the steps of the method described in any one or more of the embodiments above.
The above description of the device according to the method or apparatus embodiment may also include other embodiments, and specific implementation may refer to the description of the related method embodiment, which is not described herein in detail.
According to the embodiment, the capability of learning the small-scale target features of the remote sensing images is improved by improving the VGG16 features, the retrieval accuracy of a retrieval system is improved, the feature dimension of image representation is reduced by improving the VGG16 features, and the calculated amount is reduced. The invention fuses the VGG16 characteristic and the improved VGG16 characteristic, reserves the learning capability of the original VGG16 network to the image characteristic, can learn the detail characteristic in the remote sensing image by utilizing the improved VGG16 network, and further improves the retrieval accuracy of the retrieval system through multi-characteristic fusion.
It should be noted that, the above-mentioned apparatus or system in this specification may also include other implementation manners according to the description of the related method embodiment, and a specific implementation manner may refer to the description of the method embodiment, which is not described herein in detail. The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the hardware + program class, storage medium + program embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for the relevant points, refer to the partial description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a vehicle-mounted human-computer interaction device, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, when implementing one or more of the present description, the functions of each module may be implemented in one or more software and/or hardware, or a module implementing the same function may be implemented by a combination of multiple sub-modules or sub-units, etc. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
One or more embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in this specification are described in a progressive manner; the same or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is brief, and reference may be made to the corresponding parts of the method embodiment for the relevant points. In this specification, reference to "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. The schematic representations of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and features of different embodiments or examples described in this specification may be combined by those skilled in the art without contradiction.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, or readily conceive of changes, or make equivalent substitutions for some of their technical features, within the technical scope of the present disclosure; such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (13)

1. A remote sensing image retrieval method is characterized by comprising the following steps:
acquiring K features of a retrieval image and all target images in an image set to be retrieved, wherein the K features comprise a VGG16 feature and at least one improved VGG16 feature;
for each target image, calculating similarity measurement between the retrieval image and the target image according to the K features;
acquiring the weight of each feature in the K features;
for each target image, carrying out weighted summation on the similarity measurements using the weight of each feature to obtain a comprehensive similarity measurement;
taking, as the retrieval result, the target image corresponding to a comprehensive similarity measurement that meets a preset threshold range; or sorting all the comprehensive similarity measurements, wherein the target images corresponding to several top-ranked comprehensive similarity measurements are the retrieval result;
wherein: the VGG16 feature is obtained through a VGG16 network, and the improved VGG16 feature is obtained through an improved VGG16 network;
the modified VGG16 network adds several convolutional layers after the second fully-connected layer of the VGG16 network, as well as a residual layer, a deconvolution layer, and an Eltwise layer, and modifies the first fully-connected layer and the second fully-connected layer of the VGG16 network into convolutional layers.
2. The remote sensing image retrieval method of claim 1, wherein the plurality of convolutional layers comprises seven convolutional layers in four groups: the first group of convolutional layers comprises a first convolutional layer and a second convolutional layer, the second group comprises a third convolutional layer and a fourth convolutional layer, the third group comprises a fifth convolutional layer and a sixth convolutional layer, and the fourth group comprises a seventh convolutional layer;
the sixth convolution layer is connected with the residual error layer, and the seventh convolution layer is connected with the deconvolution layer; the residual layer and the deconvolution layer are connected with the Eltwise layer, and the Eltwise layer is sequentially connected with an activation layer and a pooling layer.
3. The remote sensing image retrieval method of claim 2, wherein the VGG16 feature is obtained at the second fully-connected layer of the VGG16 network, and at least one improved VGG16 feature is obtained at the pooling layer by using at least one of the three Eltwise modes Prod, Sum and Max at the Eltwise layer of an improved VGG16 network.
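The three Eltwise modes named in claim 3 combine two equally shaped feature maps element-wise (in the patent, the residual-layer and deconvolution-layer outputs inside the improved VGG16 network). A minimal NumPy sketch of the three modes, outside any network and for illustration only:

```python
import numpy as np

def eltwise(a, b, mode="Sum"):
    """Element-wise combination of two equally shaped feature maps,
    mirroring the Prod / Sum / Max modes of a Caffe-style Eltwise layer."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("Eltwise inputs must have identical shapes")
    if mode == "Prod":
        return a * b
    if mode == "Sum":
        return a + b
    if mode == "Max":
        return np.maximum(a, b)
    raise ValueError("unknown Eltwise mode: %s" % mode)
```

Because the three modes mix the two branches differently, each yields a distinct improved VGG16 feature at the subsequent pooling layer, which is how claim 3 obtains more than one improved feature from a single architecture.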
4. A remote sensing image retrieval method according to any one of claims 1-3, wherein said calculating a similarity measure between the retrieved image and each target image based on said K features, respectively, comprises:
carrying out binarization processing on the K features of the retrieval image and all target images to obtain binarization features;
calculating the distance between the retrieval image and the target image according to the binarization features of the retrieval image and the K binarization features of each target image, wherein each target image obtains K distances;
and normalizing the K distances of each target image to obtain K similarity measurements corresponding to the K features of each target image.
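The binarization, distance, and normalization steps of claim 4 might look as follows. This is a sketch under two assumptions the claim leaves open: that each feature is binarized by thresholding at its mean, and that the distance between binary features is the Hamming distance.

```python
import numpy as np

def binarize(feature):
    """Threshold a real-valued feature vector at its mean (one of several
    plausible binarization schemes; the claim does not fix the threshold)."""
    feature = np.asarray(feature, dtype=float)
    return (feature >= feature.mean()).astype(np.uint8)

def hamming(a, b):
    """Distance between two binary feature vectors of equal length."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))

def normalize_distances(dists):
    """Map the K distances of one target image to K similarity measures in
    [0, 1]: min-max normalized, then inverted so larger means more similar."""
    d = np.asarray(dists, dtype=float)
    span = d.max() - d.min()
    if span == 0:
        return np.ones_like(d)
    return 1.0 - (d - d.min()) / span
```

Each target image thus yields K distances against the retrieval image, which `normalize_distances` converts into the K similarity measurements used by the later claims.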
5. The remote sensing image retrieval method according to claim 4, wherein the obtaining the weight of each of the K features comprises:
according to the similarity measurement corresponding to each feature of all target images, acquiring the trust of each feature through a relevance feedback method;
according to the trust of each feature, a transition matrix H representing the preference degree of each feature is constructed by the following formula:
[Formula image FDA0001922418200000021: definition of the transfer matrix element H(x, y) in terms of pre_x, pre_y and α]
where H(x, y) is an element of the transfer matrix H, pre_x and pre_y are respectively the trusts of feature F_x and feature F_y, and α is a set bias coefficient;
initializing the weight of each feature, and performing a plurality of iterations through the following formula to obtain the final weight of each feature:
w_d = γ·w_{d-1} + (1 - γ)·H·w_{d-1}   (γ ∈ [0, 1])
wherein w_d is the weight of each feature after the d-th iteration, w_{d-1} is the weight of each feature after the (d-1)-th iteration, and γ is a set iteration parameter.
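The weight iteration of claim 5, w_d = γ·w_{d-1} + (1 - γ)·H·w_{d-1}, can be sketched in NumPy as below. The transfer matrix H is taken as given, since its exact construction from the trusts and the bias coefficient α survives only as a formula image in the source; the renormalization after each step is an added assumption to keep the weights on a comparable scale.

```python
import numpy as np

def iterate_weights(H, w0, gamma=0.5, iterations=20):
    """Iterate w_d = gamma * w_{d-1} + (1 - gamma) * H @ w_{d-1},
    gamma in [0, 1], starting from initial weights w0.

    Renormalizing to unit sum each step is an illustrative choice,
    not stated in the claim."""
    assert 0.0 <= gamma <= 1.0
    w = np.asarray(w0, dtype=float)
    H = np.asarray(H, dtype=float)
    for _ in range(iterations):
        w = gamma * w + (1.0 - gamma) * (H @ w)
        w /= w.sum()  # keep the K weights summing to 1
    return w
```

Intuitively, γ controls how much each iteration retains the previous weights versus redistributing them through the feature-preference matrix H; the iteration converges toward a blend of the initial weights and H's dominant direction.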
6. A remote sensing image retrieval method according to claim 5, wherein the weighted summation of the weight and similarity measure of each feature for each target image to obtain a composite similarity measure comprises:
dividing the features into good features and bad features according to the trust of each feature;
the integrated similarity measure is calculated by the following formula:
[Formula image FDA0001922418200000031: definition of the comprehensive similarity measure sim(q)]
wherein sim(q) is the comprehensive similarity measure of the target image q in the image set to be retrieved, w_q^(i) is the weight of a good feature, [formula image FDA0001922418200000032] is the weight of a bad feature, K_1 is the number of bad features, and D_i(q) is the similarity measure of the i-th feature of the target image q.
7. A remote sensing image retrieval apparatus, characterized in that the image retrieval apparatus comprises:
the system comprises a feature extraction module, a feature extraction module and a feature extraction module, wherein the feature extraction module is used for acquiring K features of a retrieval image and all target images in an image set to be retrieved, and the K features comprise a VGG16 feature and at least one improved VGG16 feature;
the similarity measurement calculation module is used for calculating the similarity measurement between the retrieval image and each target image according to the K features;
the weight obtaining module is used for obtaining the weight of each feature in the K features;
the comprehensive similarity measurement module is used for carrying out, for each target image, weighted summation on the similarity measurements using the weight of each feature to obtain a comprehensive similarity measurement;
the result output module is used for taking, as the retrieval result, the target image corresponding to a comprehensive similarity measurement that meets a preset threshold range; or sorting all the comprehensive similarity measurements, wherein the target images corresponding to several top-ranked comprehensive similarity measurements are the retrieval result;
wherein: the VGG16 feature is obtained through a VGG16 network, and the improved VGG16 feature is obtained through an improved VGG16 network;
the modified VGG16 network adds several convolutional layers after the second fully-connected layer of the VGG16 network, as well as a residual layer, a deconvolution layer, and an Eltwise layer, and modifies the first fully-connected layer and the second fully-connected layer of the VGG16 network into convolutional layers.
8. The remote sensing image retrieval device according to claim 7, wherein the plurality of convolutional layers includes seven convolutional layers in four groups, a first group of convolutional layers includes a first convolutional layer and a second convolutional layer, a second group of convolutional layers includes a third convolutional layer and a fourth convolutional layer, a third group of convolutional layers includes a fifth convolutional layer and a sixth convolutional layer, and a fourth group of convolutional layers includes a seventh convolutional layer;
the sixth convolution layer is connected with the residual error layer, and the seventh convolution layer is connected with the deconvolution layer; the residual error layer and the deconvolution layer are connected with the Eltwise layer, and the Eltwise layer is sequentially connected with an activation layer and a pooling layer;
and obtaining the VGG16 feature at the second fully-connected layer of the VGG16 network, and obtaining at least one improved VGG16 feature at the pooling layer by using at least one of the three Eltwise modes Prod, Sum and Max at the Eltwise layer of the improved VGG16 network.
9. A remote sensing image retrieval apparatus according to claim 7 or 8, wherein the similarity measure calculation module includes:
the binarization unit is used for carrying out binarization processing on the K features of the retrieval image and all the target images to obtain binarization features;
the distance calculation unit is used for calculating the distance between the retrieval image and the target image according to the binarization features of the retrieval image and the K binarization features of each target image, and each target image obtains K distances;
and the normalization unit is used for normalizing the K distances of each target image to obtain K similarity measurements corresponding to the K features of each target image.
10. The remote sensing image retrieval device according to claim 9, wherein the weight acquisition module includes:
the trust calculation unit is used for acquiring the trust of each feature through a relevance feedback method according to the similarity measurement corresponding to each feature of all the target images;
a transfer matrix constructing unit, configured to construct a transfer matrix H representing a preference degree for each feature according to the trust of each feature by the following formula:
[Formula image FDA0001922418200000041: definition of the transfer matrix element H(x, y) in terms of pre_x, pre_y and α]
where H(x, y) is an element of the transfer matrix H, pre_x and pre_y are respectively the trusts of feature F_x and feature F_y, and α is a set bias coefficient;
the iteration unit is used for initializing the weight of each feature and performing a plurality of iterations through the following formula to obtain the final weight of each feature:
w_d = γ·w_{d-1} + (1 - γ)·H·w_{d-1}   (γ ∈ [0, 1])
wherein w_d is the weight of each feature after the d-th iteration, w_{d-1} is the weight of each feature after the (d-1)-th iteration, and γ is a set iteration parameter.
11. A remote sensing image retrieval apparatus according to claim 10, wherein the comprehensive similarity metric module comprises:
the characteristic classification unit is used for classifying the characteristics into good characteristics and bad characteristics according to the trust of each characteristic;
a calculating unit for calculating the comprehensive similarity measure by the following formula:
[Formula image FDA0001922418200000051: definition of the comprehensive similarity measure sim(q)]
wherein sim(q) is the comprehensive similarity measure of the target image q in the image set to be retrieved, w_q^(i) is the weight of a good feature, [formula image FDA0001922418200000052] is the weight of a bad feature, K_1 is the number of bad features, and D_i(q) is the similarity measure of the i-th feature of the target image q.
12. A computer-readable storage medium for remote sensing image retrieval, comprising a processor and a memory for storing processor-executable instructions, which when executed by the processor, implement the steps of the remote sensing image retrieval method of any one of claims 1-6.
13. An apparatus for remote sensing image retrieval, comprising at least one processor and a memory storing computer executable instructions which when executed by the processor implement the steps of the remote sensing image retrieval method of any one of claims 1-6.
CN201811600675.4A 2018-12-26 2018-12-26 Remote sensing image retrieval method, remote sensing image retrieval device, computer readable storage medium and computer readable storage device Active CN111368109B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811600675.4A CN111368109B (en) 2018-12-26 2018-12-26 Remote sensing image retrieval method, remote sensing image retrieval device, computer readable storage medium and computer readable storage device


Publications (2)

Publication Number Publication Date
CN111368109A true CN111368109A (en) 2020-07-03
CN111368109B CN111368109B (en) 2023-04-28

Family

ID=71208561


Country Status (1)

Country Link
CN (1) CN111368109B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170083792A1 (en) * 2015-09-22 2017-03-23 Xerox Corporation Similarity-based detection of prominent objects using deep cnn pooling layers as features
CN106909924A (en) * 2017-02-18 2017-06-30 北京工业大学 A kind of remote sensing image method for quickly retrieving based on depth conspicuousness
CN109086405A (en) * 2018-08-01 2018-12-25 武汉大学 Remote sensing image retrieval method and system based on conspicuousness and convolutional neural networks


Non-Patent Citations (2)

Title
Peng Yanfei; Song Xiaonan; Zi Lingling; Wang Wei: "Remote Sensing Image Retrieval Based on Convolutional Neural Networks and Improved Fuzzy C-Means"
Wang Lixin; Jiang Jiahe: "Research on Image Retrieval of Salient Regions Based on Deep Learning"


Similar Documents

Publication Publication Date Title
CN108920720B (en) Large-scale image retrieval method based on depth hash and GPU acceleration
CN109241317B (en) Pedestrian Hash retrieval method based on measurement loss in deep learning network
Rao et al. Runtime network routing for efficient image classification
CN107683469A (en) A kind of product classification method and device based on deep learning
Liu et al. Learning multifunctional binary codes for both category and attribute oriented retrieval tasks
Xia et al. An evaluation of deep learning in loop closure detection for visual SLAM
CN113628294A (en) Image reconstruction method and device for cross-modal communication system
CN107545276A (en) The various visual angles learning method of joint low-rank representation and sparse regression
CN112801059B (en) Graph convolution network system and 3D object detection method based on graph convolution network system
Chen et al. SS-HCNN: Semi-supervised hierarchical convolutional neural network for image classification
CN113255714A (en) Image clustering method and device, electronic equipment and computer readable storage medium
CN112541532A (en) Target detection method based on dense connection structure
CN107451617B (en) Graph transduction semi-supervised classification method
JP2015036939A (en) Feature extraction program and information processing apparatus
CN113554156A (en) Multi-task learning model construction method based on attention mechanism and deformable convolution
CN113255604B (en) Pedestrian re-identification method, device, equipment and medium based on deep learning network
CN110490234A (en) The construction method and classification method of classifier based on Cluster Classification associative mechanism
CN113591629A (en) Finger three-mode fusion recognition method, system, device and storage medium
Liu et al. A weight-incorporated similarity-based clustering ensemble method
CN117058235A (en) Visual positioning method crossing various indoor scenes
CN111368109B (en) Remote sensing image retrieval method, remote sensing image retrieval device, computer readable storage medium and computer readable storage device
Mai et al. Efficient large-scale multi-class image classification by learning balanced trees
CN115116139A (en) Multi-granularity human body action classification method based on graph convolution network
Termritthikun et al. Evolutionary neural architecture search based on efficient CNN models population for image classification
CN113032612B (en) Construction method of multi-target image retrieval model, retrieval method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant