CN114090813A - Variational self-encoder balanced Hash remote sensing image retrieval method based on multi-channel feature fusion - Google Patents

Variational self-encoder balanced Hash remote sensing image retrieval method based on multi-channel feature fusion Download PDF

Info

Publication number
CN114090813A
Authority
CN
China
Prior art keywords
remote sensing
hash
sensing image
network
feature fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111070959.9A
Other languages
Chinese (zh)
Inventor
陈亚雄
王凡
李小玉
汤一博
熊盛武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202111070959.9A priority Critical patent/CN114090813A/en
Publication of CN114090813A publication Critical patent/CN114090813A/en
Pending legal-status Critical Current

Classifications

    • G06F16/583 Information retrieval of still image data; retrieval characterised by metadata automatically derived from the content
    • G06F18/214 Pattern recognition; generating training patterns, e.g. bagging or boosting
    • G06F18/25 Pattern recognition; fusion techniques
    • G06N3/045 Neural networks; combinations of networks


Abstract

The invention relates to a variational self-encoder balanced hash remote sensing image retrieval method based on multi-channel feature fusion. First, the data set is divided into a training set and a test set; next, a multi-channel feature fusion module extracts and fuses the Gist features of the individual channels; the fused Gist features are then passed through a multi-scale dilated convolution module to obtain multi-scale context information of the remote sensing image; an overall network model is then constructed and trained on the training set; finally, retrieval results are obtained with the trained network. The multi-channel feature fusion module learns the multi-channel characteristics of multispectral remote sensing images, while the multi-scale dilated convolution module learns the hash code so as to capture the multi-scale context information of the remote sensing image. An objective function comprising a reconstruction cost, a KL divergence and a balance term preserves both the balance property and the discriminability of the hash codes during hash learning, further improving the precision of remote sensing image retrieval.

Description

Variational self-encoder balanced Hash remote sensing image retrieval method based on multi-channel feature fusion
Technical Field
The invention belongs to the field of remote sensing image retrieval, and particularly relates to a variational self-encoder balanced Hash remote sensing image retrieval method based on multi-channel feature fusion.
Background
With the rapid development of satellite and aviation technologies, remote sensing image retrieval has also advanced considerably, and effective retrieval is attracting increasing attention. Because large volumes of high-resolution remote sensing data are now available, large-scale remote sensing image retrieval has gradually become one of the most important research topics in the remote sensing field. Many content-based remote sensing image retrieval methods have been developed for managing and analyzing remote sensing image data; these methods mainly comprise two parts, image feature extraction and similarity measurement. Content-based methods can automatically compute feature representations of remote sensing images and measure the similarity between them, but they still face many challenges in large-scale retrieval.
Traditional image retrieval methods compare the high-dimensional features of the query image with those of every image in the data set. Such methods have high space complexity, and as the dimensionality of the data and features grows, the search cost increases accordingly. Two approaches are mainly considered to address these problems: (1) improving the feature search strategy; (2) reducing the dimensionality of the image features. The former can be implemented with tree structures, but retrieval performance degrades when the original feature dimensionality is high. To avoid this drawback, researchers have turned to reducing the feature dimensionality of remote sensing images. A hashing algorithm maps images to a low-dimensional space in which each image is represented by a binary code, with similar images producing similar codes. Because supervised hashing suffers from the scale of the data sets and from insufficient label information, unsupervised hashing has attracted the attention of many scholars. Various unsupervised hashing algorithms have been proposed to improve retrieval performance, but three problems remain: (1) the multi-channel character of multispectral remote sensing images is not considered, so multispectral information is lost; (2) some methods perform unsupervised hash-code learning but ignore the multi-scale context information of remote sensing images, which reduces retrieval performance; (3) existing methods do not fully capture the balance property of the hash code, which is therefore under-exploited and ultimately harms retrieval performance.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides a variational self-encoder balanced hash remote sensing image retrieval method based on multi-channel feature fusion. First, the data set is divided into a training set and a test set; next, a multi-channel feature fusion module extracts and fuses the Gist features of the individual channels; the fused Gist features are passed through a multi-scale dilated convolution module to obtain multi-scale context information of the remote sensing image; an overall network model is then constructed and trained on the training set; finally, retrieval results are obtained with the trained network.
To this end, the technical scheme provided by the invention is a variational self-encoder balanced hash remote sensing image retrieval method based on multi-channel feature fusion, comprising the following steps:
step 1, dividing the data into a training set and a test set;
step 2, extracting and fusing the Gist features of different channels with a multi-channel feature fusion module;
step 3, passing the fused Gist features obtained in step 2 through a multi-scale dilated convolution module to obtain multi-scale context information of the remote sensing image;
step 4, constructing the overall network model, using a variational self-encoder comprising an inference network and a generation network as the backbone network;
step 5, training the overall network model: computing its objective function and updating its initial parameters;
and step 6, obtaining retrieval results with the trained network.
Moreover, the multi-channel Gist feature fusion in step 2 is computed as:
G = k(f(X))  (1)
where X denotes the multispectral remote sensing image, n the number of channels of the multispectral remote sensing image, f(·) the function that extracts the n channels from X, k(·) the function that extracts the Gist features of the n channels, and G the final Gist feature of the multispectral remote sensing image.
In step 3, a multi-scale dilated convolution module is used to acquire richer context information of the remote sensing image. The Gist features extracted in step 2 are passed through dilated convolutions of 64 × 11, 64 × 17 and 64 × 19 to obtain context information at different scales; the multi-scale context information is concatenated by a series operator, passed through a 1 × 11 dilated convolution, reshaped so that the output shape matches the input shape, and finally propagated forward through a fully connected (FC) layer for the subsequent training process.
Moreover, the inference network h(x) in step 4 is mainly responsible for mapping the original data x ∈ R^d (the data processed in steps 2 and 3) to a variational probability distribution and sampling the hidden-layer vector z ∈ R^k from that distribution; the generation network g(z) is mainly responsible for mapping the hidden-layer vector z ∈ R^k, i.e. the hash representation obtained from the inference network, back to the original data space R^d, where d is the dimension of the original data, k is the mapped dimension, and d ≠ k.
Furthermore, step 5 assumes that the set X = {x_i}_{i=1}^{N} represents a set of N unlabeled samples, with x_i denoting the i-th sample; the hash function is computed as follows:
b = sign(h(x_i)) = [sign(h_j(x_i))], j = 1, 2, ..., k  (2)
sign(t) = 1 if t ≥ 0, and −1 otherwise  (3)
where b denotes the hash code and h_j(x_i) denotes the j-th value of the intermediate-layer feature vector obtained from the original data x_i through the inference network of the variational self-encoder.
To learn the hash function better and to avoid the difficulty of optimizing a non-smooth discrete function, the output h(x_i) of the inference network is used as the input of the generation network; the reconstruction cost function is:
L_r = Σ_{i=1}^{N} ||x_i − g(h(x_i))||²  (4)
where L_r denotes the reconstruction cost, which keeps similar points as similar as possible during reconstruction;
the variational self-encoder needs to keep the probability distribution close to the normal distribution N (0,1) by minimizing the KL divergence, which is defined as:
Figure BDA0003260417980000034
in the formula (I), the compound is shown in the specification,
Figure BDA0003260417980000035
the KL divergence is expressed, and the distinguishing degree of the hash codes can be kept in the hash learning process; mu.siIs a sample xiThe average value of (a) of (b),
Figure BDA0003260417980000036
is a sample xiThe variance of (a);
to maintain the balanced property of the hash code, a balance term is defined as:
Figure BDA0003260417980000037
in the formula, LbRepresenting balance items, and keeping balance characteristics of the hash codes in the hash learning process; mu.siIs a sample xiThe mean value of (a);
considering the reconstruction cost, KL divergence and balance term, the final objective function is formulated as follows:
Figure BDA0003260417980000041
in the formula, α and β represent hyper-parameters that evaluate the degree of the term.
When the overall network model is trained, the Adam algorithm is used for optimization with learning rate ε = 0.0001 and batch size M = 100; the hash-code length k is set to 16, 24, 36, 48 and 64 in turn; the weight parameters θ and φ of the generation and inference networks are initialized with the Glorot uniform distribution; α is set to 1 and β to 5; training stops after 1000 iterations or when the loss function no longer decreases, yielding the trained network model.
In step 6, the trained network is used to compute the hash codes of the sample images in the test data set; the Hamming distances between the hash code of the query sample and those of the samples in the database are sorted in ascending order, and the precision of the top n entries of the ranking list is computed to obtain the mean average precision (MAP) and the top-n retrieval results.
Compared with the prior art, the invention has the following advantages: a multi-channel feature fusion module performs unsupervised hash-code learning and learns the multi-channel features of multispectral remote sensing images; a multi-scale dilated convolution module learns the hash code so as to capture the multi-scale context information of the remote sensing image; and an objective function comprising the reconstruction cost, the KL divergence and a balance term preserves both the balance property and the discriminability of the hash codes during hash learning, further improving the precision of remote sensing image retrieval.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a block diagram of a multi-channel feature fusion module according to an embodiment of the present invention.
Fig. 3 is a network structure diagram according to an embodiment of the present invention.
FIG. 4 is a diagram of multi-scale context information learning based on multi-scale dilation convolution according to an embodiment of the present invention.
FIG. 5 shows the top 10 results retrieved on the SAT-4 dataset according to an embodiment of the present invention; the first column is the query example, the other columns show the results retrieved with the method of the invention, and incorrect retrieval results are marked with red boxes.
Detailed Description
The invention provides a variational self-encoder balanced Hash remote sensing image retrieval method based on multi-channel feature fusion.
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
As shown in fig. 1, the process of the embodiment of the present invention includes the following steps:
step 1, dividing a training data set and a testing data set.
The SAT-4 image data set is used; it contains 500,000 images, all of size 28 × 28 with 4 channels per image. 1000 images are randomly selected as the test and query set, and the remainder serve as the training set and the retrieval database.
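The split described above can be sketched as follows. This is a minimal illustration: a small synthetic array stands in for the SAT-4 images, and the actual dataset loading is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the SAT-4 images: N images of size 28x28 with 4 channels.
N = 5000  # the real dataset has 500,000 images
images = rng.random((N, 28, 28, 4), dtype=np.float32)

# Randomly select 1000 images as the test/query set; the rest form
# the training set and the retrieval database.
perm = rng.permutation(N)
test_idx, train_idx = perm[:1000], perm[1000:]
test_set, train_set = images[test_idx], images[train_idx]

print(train_set.shape, test_set.shape)
```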
And 2, extracting and fusing Gist characteristics of different channels by using a multi-channel characteristic fusion module.
The Gist feature describes not only the external contour information of an image but also the high-level and statistical information hidden in it. The multi-channel Gist feature fusion is computed as:
G = k(f(X))  (1)
where X denotes the multispectral remote sensing image, n the number of channels of the multispectral remote sensing image, f(·) the function that extracts the n channels from X, k(·) the function that extracts the Gist features of the n channels, and G the final Gist feature of the multispectral remote sensing image.
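The fusion G = k(f(X)) can be sketched as below. Note that `gist_stub` is a simple block-wise gradient-magnitude descriptor standing in for the real Gist descriptor (which uses a Gabor filter bank), and per-channel concatenation is assumed as the fusion operation:

```python
import numpy as np

def extract_channels(X):
    """f(.): split a multispectral image (H, W, n) into its n channel planes."""
    return [X[:, :, c] for c in range(X.shape[2])]

def gist_stub(channel, n_blocks=4):
    """k(.): stand-in for the per-channel Gist descriptor -- here simply
    block-averaged gradient magnitudes on an n_blocks x n_blocks grid."""
    gy, gx = np.gradient(channel.astype(np.float64))
    mag = np.hypot(gx, gy)
    h, w = mag.shape
    bh, bw = h // n_blocks, w // n_blocks
    feats = [mag[i*bh:(i+1)*bh, j*bw:(j+1)*bw].mean()
             for i in range(n_blocks) for j in range(n_blocks)]
    return np.array(feats)

def fuse_gist(X):
    """G = k(f(X)): concatenate the per-channel descriptors into one vector."""
    return np.concatenate([gist_stub(c) for c in extract_channels(X)])

X = np.random.default_rng(1).random((28, 28, 4))
G = fuse_gist(X)
print(G.shape)  # 4 channels x 16 block features = one 64-dim fused vector
```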
And 3, passing the fusion Gist characteristics obtained in the step 2 through a multi-scale expansion convolution module to obtain multi-scale context information of the remote sensing image.
Different convolution kernels learn different feature maps, and all feature maps are concatenated to fuse information at different scales. A multi-scale dilated convolution module is used to acquire richer context information of the remote sensing image: the Gist features extracted in step 2 are passed through dilated convolutions of 64 × 11, 64 × 17 and 64 × 19 to obtain context information at different scales; the multi-scale context information is concatenated by a series operator, passed through a 1 × 11 dilated convolution, reshaped so that the output shape matches the input shape, and finally propagated forward through a fully connected (FC) layer for the subsequent training process.
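The core mechanism of the module, dilation, can be illustrated in one dimension: the same small kernel applied at increasing dilation rates covers increasing receptive fields, and the outputs are concatenated by the series operator. This is a conceptual sketch, not the patent's exact layer configuration:

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Same'-padded 1-D dilated convolution: the kernel taps are spaced
    `dilation` samples apart, enlarging the receptive field without
    adding parameters."""
    k = len(w)
    span = dilation * (k - 1)          # receptive field minus one
    pad = span // 2
    xp = np.pad(x, (pad, span - pad))
    return np.array([sum(w[t] * xp[i + t * dilation] for t in range(k))
                     for i in range(len(x))])

x = np.arange(8, dtype=float)
w = np.array([1.0, 1.0, 1.0])

# The same 3-tap kernel at different dilation rates sees different scales.
y1 = dilated_conv1d(x, w, dilation=1)   # receptive field 3
y2 = dilated_conv1d(x, w, dilation=2)   # receptive field 5
multi_scale = np.concatenate([y1, y2])  # series (concatenation) operator
print(multi_scale.shape)
```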
And 4, constructing an integral network model, and using a variational self-encoder as a backbone network, wherein the backbone network comprises an inference network and a generation network.
The inference network h(x) is mainly responsible for mapping the original data x ∈ R^d (the data processed in steps 2 and 3) to a variational probability distribution and sampling the hidden-layer vector z ∈ R^k from that distribution; the generation network g(z) is mainly responsible for mapping the hidden-layer vector z ∈ R^k, i.e. the hash representation obtained from the inference network, back to the original data space R^d, where d is the dimension of the original data, k is the mapped dimension, and d ≠ k.
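The inference/generation pair can be sketched as follows, assuming single linear layers with random weights in place of the learned networks and the standard VAE reparameterization trick for the sampling step:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 16   # original dimension d and code dimension k, d != k

# Toy weights standing in for the learned network parameters.
W_mu, W_logvar = rng.normal(size=(d, k)), rng.normal(size=(d, k))
W_gen = rng.normal(size=(k, d))

def inference(x):
    """h(x): map x in R^d to the parameters (mu, log sigma^2) of the
    variational distribution and sample z in R^k by reparameterization."""
    mu, logvar = x @ W_mu, x @ W_logvar
    eps = rng.normal(size=k)
    z = mu + np.exp(0.5 * logvar) * eps
    return z, mu, logvar

def generation(z):
    """g(z): map the hidden vector z in R^k back to R^d."""
    return np.tanh(z @ W_gen)

x = rng.normal(size=d)
z, mu, logvar = inference(x)
x_rec = generation(z)
print(z.shape, x_rec.shape)
```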
And 5, training the whole network model, calculating an objective function of the whole network model, and updating initial parameters of the whole network model.
Assume that the set X = {x_i}_{i=1}^{N} represents a set of N unlabeled samples, with x_i denoting the i-th sample; the hash function is computed as follows:
b = sign(h(x_i)) = [sign(h_j(x_i))], j = 1, 2, ..., k  (2)
sign(t) = 1 if t ≥ 0, and −1 otherwise  (3)
where b denotes the hash code and h_j(x_i) denotes the j-th value of the intermediate-layer feature vector obtained from the original data x_i through the inference network of the variational self-encoder.
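The binarization step above can be sketched directly. The {-1, +1} code alphabet is an assumption consistent with the balance term later; the patent's original sign definition is rendered as an image:

```python
import numpy as np

def binarize(h):
    """b_j = sign(h_j(x_i)): threshold the inference-network output at zero
    to obtain a k-bit code over {-1, +1}."""
    return np.where(h >= 0, 1, -1)

h_out = np.array([0.3, -1.2, 0.0, 2.5, -0.1])
b = binarize(h_out)
print(b)  # [ 1 -1  1  1 -1]
```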
To learn the hash function better and to avoid the difficulty of optimizing a non-smooth discrete function, the output h(x_i) of the inference network is used as the input of the generation network; the reconstruction cost function is:
L_r = Σ_{i=1}^{N} ||x_i − g(h(x_i))||²  (4)
where L_r denotes the reconstruction cost, which keeps similar points as similar as possible during reconstruction.
The variational self-encoder keeps the variational probability distribution close to the standard normal distribution N(0, 1) by minimizing the KL divergence, which is defined as:
L_KL = (1/2) Σ_{i=1}^{N} Σ_{j=1}^{k} (μ_{ij}² + σ_{ij}² − log σ_{ij}² − 1)  (5)
where L_KL denotes the KL divergence, which preserves the discriminability of the hash codes during hash learning, and μ_i and σ_i² are the mean and the variance of the variational distribution of sample x_i.
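The KL term in equation (5) is the standard closed-form divergence between a diagonal Gaussian and N(0, 1), and can be checked numerically; when the latent distribution already matches N(0, 1) the cost is zero:

```python
import numpy as np

def kl_divergence(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, 1)) summed over samples and dimensions,
    with logvar = log sigma^2 (the usual VAE parameterization)."""
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1.0)

# A latent distribution that already matches N(0, 1) incurs zero cost.
mu = np.zeros((3, 16))
logvar = np.zeros((3, 16))
print(kl_divergence(mu, logvar))  # 0.0
```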
To preserve the balance property of the hash code, a balance term is defined as:
L_b = Σ_{j=1}^{k} ((1/N) Σ_{i=1}^{N} μ_{ij})²  (6)
where L_b denotes the balance term, which keeps the hash codes balanced during hash learning, and μ_{ij} is the j-th component of the mean μ_i of sample x_i.
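One common formulation of code balance, which is assumed here since the original equation is not recoverable from the text, penalizes bits whose mean activation over the batch drifts away from zero, so that each bit is roughly +1 for half the samples and −1 for the rest:

```python
import numpy as np

def balance_term(mu):
    """L_b: sum over bits of the squared batch-mean activation.
    Zero when every bit's activations average out over the batch."""
    return np.sum(mu.mean(axis=0) ** 2)

balanced = np.array([[1.0, -1.0], [-1.0, 1.0]])   # each bit averages to 0
skewed = np.array([[1.0, 1.0], [1.0, 1.0]])       # every bit always +1
print(balance_term(balanced), balance_term(skewed))  # 0.0 2.0
```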
Combining the reconstruction cost, the KL divergence and the balance term, the final objective function is formulated as:
L = L_r + α·L_KL + β·L_b  (7)
where α and β are hyper-parameters weighting the corresponding terms.
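The combined objective of equation (7) can be sketched in one function; the three component losses are assumed to take the forms discussed above, with α = 1 and β = 5 as in the training configuration:

```python
import numpy as np

def objective(x, x_rec, mu, logvar, alpha=1.0, beta=5.0):
    """L = L_r + alpha * L_KL + beta * L_b."""
    L_r = np.sum((x - x_rec) ** 2)                              # reconstruction cost
    L_kl = 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1.0)  # KL divergence
    L_b = np.sum(mu.mean(axis=0) ** 2)                          # balance term
    return L_r + alpha * L_kl + beta * L_b

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# With a zero reconstruction and a latent matching N(0, 1), only L_r remains.
loss = objective(x, np.zeros_like(x), np.zeros((4, 3)), np.zeros((4, 3)))
print(loss)
```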
When the overall network model is trained, the Adam algorithm is used for optimization with learning rate ε = 0.0001 and batch size M = 100; the hash-code length k is set to 16, 24, 36, 48 and 64 in turn; the weight parameters θ and φ of the generation and inference networks are initialized with the Glorot uniform distribution; α is set to 1 and β to 5; training stops after 1000 iterations or when the loss function no longer decreases, yielding the trained network model.
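The Adam update named above can be sketched as follows, using the method's learning rate of 0.0001 on a toy scalar loss rather than the full network (which is not reproduced here):

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected first and second moment estimates
    scale the raw gradient step."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Minimize the toy loss f(theta) = theta^2 for 1000 iterations.
theta = np.array(1.0)
state = (np.zeros(()), np.zeros(()), 0)
for _ in range(1000):
    grad = 2.0 * theta
    theta, state = adam_step(theta, grad, state)
print(float(theta))  # moved from 1.0 toward the minimum at 0
```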
And 6, obtaining a retrieval result by using the trained network.
The trained network is used to compute the hash codes of the sample images in the test data set; the Hamming distances between the hash code of the query sample and those of all samples in the data set are sorted in ascending order, and the precision of the top n entries of the ranking list is computed to obtain the mean average precision (MAP) and the top-n retrieval results.
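The ranking and precision computation can be sketched on a tiny hand-made database; the codes and labels below are illustrative only:

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two {-1, +1} codes."""
    return int(np.sum(a != b))

def precision_at_n(query_code, db_codes, db_labels, query_label, n):
    """Rank the database by ascending Hamming distance to the query and
    report the fraction of the top n whose label matches the query's."""
    dists = [hamming(query_code, c) for c in db_codes]
    order = np.argsort(dists, kind="stable")
    top = db_labels[order[:n]]
    return float(np.mean(top == query_label))

db_codes = np.array([[1, 1, -1], [1, 1, 1], [-1, -1, 1], [-1, -1, -1]])
db_labels = np.array([0, 0, 1, 1])
query = np.array([1, 1, -1])   # identical to the first database code
print(precision_at_n(query, db_codes, db_labels, query_label=0, n=2))
```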
To evaluate the effectiveness of the method, three factors are considered: multi-channel feature fusion, multi-scale dilated convolution and the balance term. The retrieval performance of the proposed method is first compared with variants that omit the multi-channel feature fusion module (VAEH-M), the multi-scale dilated convolution module (VAEH-S) or the balance term (VAEH-B), and then with state-of-the-art methods such as IMH, IsoHash, ITQ, SpH, KULSH, PRH, OKH, OSH, OPRH and VAEH. The experiments use hash codes of 16, 24, 36, 48 and 64 bits on the SAT-4 image data set; the image data are represented by Gist features via the multi-channel feature fusion method, and IMH, IsoHash, ITQ, SpH, KULSH, PRH, OKH, OSH, OPRH and VAEH are executed as described in their original papers.
TABLE 1
Table 1 compares the proposed method with VAEH-M, VAEH-S and VAEH-B on the SAT-4 data set for hash codes of different lengths, where MAP is the mean average precision. The comparison shows that the proposed method achieves the highest MAP over the top 10 retrieval results at every code length.
TABLE 2
Table 2 compares the mean average precision of the top-10 and top-100 retrieval results on the SAT-4 data set for hash codes of different lengths, where MAP is the mean average precision. The comparison shows that the proposed method achieves the highest MAP for both the top 10 and the top 100 retrieval results at every code length.
In a specific implementation, the above process can be automated using computer software technology.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments, or alternatives may be employed by those skilled in the art, without departing from the spirit or scope of the invention as defined in the appended claims.

Claims (7)

1. A variational self-encoder balanced Hash remote sensing image retrieval method based on multi-channel feature fusion is characterized by comprising the following steps:
step 1, dividing the data into a training set and a test set;
step 2, extracting and fusing the Gist features of different channels with a multi-channel feature fusion module;
step 3, passing the fused Gist features obtained in step 2 through a multi-scale dilated convolution module to obtain multi-scale context information of the remote sensing image;
step 4, constructing the overall network model, using a variational self-encoder comprising an inference network and a generation network as the backbone network;
step 5, training the overall network model: computing its objective function and updating its initial parameters;
and step 6, obtaining retrieval results with the trained network.
2. The method for retrieving the balanced hash remote sensing image of the variational self-encoder based on the multi-channel feature fusion as claimed in claim 1, wherein: the multi-channel Gist feature fusion calculation method in the step 2 is as follows:
G = k(f(X))  (1)
where X denotes the multispectral remote sensing image, n the number of channels of the multispectral remote sensing image, f(·) the function that extracts the n channels from X, k(·) the function that extracts the Gist features of the n channels, and G the final Gist feature of the multispectral remote sensing image.
3. The method for retrieving the balanced hash remote sensing image of the variational self-encoder based on the multi-channel feature fusion as claimed in claim 2, wherein: in step 3, a multi-scale dilated convolution module is used to acquire richer context information of the remote sensing image: the Gist features extracted in step 2 are passed through dilated convolutions of 64 × 11, 64 × 17 and 64 × 19 to obtain context information at different scales; the multi-scale context information is concatenated by a series operator, passed through a 1 × 11 dilated convolution, reshaped so that the output shape matches the input shape, and finally propagated forward through a fully connected (FC) layer for the subsequent training process.
4. The method for retrieving the balanced hash remote sensing image of the variational self-encoder based on the multi-channel feature fusion as claimed in claim 1, wherein: in step 4, the inference network h(x) is mainly responsible for mapping the original data x ∈ R^d (the data processed in steps 2 and 3) to a variational probability distribution and sampling the hidden-layer vector z ∈ R^k from that distribution; the generation network g(z) is mainly responsible for mapping the hidden-layer vector z ∈ R^k, i.e. the hash representation obtained from the inference network, back to the original data space R^d, where d is the dimension of the original data, k is the mapped dimension, and d ≠ k.
5. The method for retrieving the balanced hash remote sensing image of the variational self-encoder based on the multi-channel feature fusion as claimed in claim 4, wherein: step 5 assumes that the set X = {x_i}_{i=1}^{N} represents a set of N unlabeled samples, with x_i denoting the i-th sample; the hash function is computed as follows:
b = sign(h(x_i)) = [sign(h_j(x_i))], j = 1, 2, ..., k  (2)
sign(t) = 1 if t ≥ 0, and −1 otherwise  (3)
where b denotes the hash code and h_j(x_i) denotes the j-th value of the intermediate-layer feature vector obtained from the original data x_i through the inference network of the variational self-encoder;
in order to learn the hash function better and to avoid the difficulty of optimizing a non-smooth discrete function, the output h(x_i) of the inference network is used as the input of the generation network; the reconstruction cost function is:
L_r = Σ_{i=1}^{N} ||x_i − g(h(x_i))||²  (4)
where L_r denotes the reconstruction cost;
the variational self-encoder needs to keep the probability distribution close to the normal distribution N (0,1) by minimizing the KL divergence, which is defined as:
Figure FDA0003260417970000024
in the formula (I), the compound is shown in the specification,
Figure FDA0003260417970000025
the KL divergence is expressed, and the distinguishing degree of the hash codes can be kept in the hash learning process; mu.siIs a sample xiThe average value of (a) of (b),
Figure FDA0003260417970000026
is a sample xiThe variance of (a);
to maintain the balanced property of the hash code, a balance term is defined as:
Figure FDA0003260417970000027
in the formula, LbRepresenting balance items, and keeping balance characteristics of the hash codes in the hash learning process; mu.siIs a sample xiThe mean value of (a);
considering the reconstruction cost, KL divergence and balance term, the final objective function is formulated as follows:
Figure FDA0003260417970000031
in the formula, α and β represent hyper-parameters that evaluate the degree of the term.
6. The method for retrieving the balanced hash remote sensing image of the variational self-encoder based on the multi-channel feature fusion as claimed in claim 5, wherein: when the overall network model is trained in step 5, the Adam algorithm is used for optimization with learning rate ε = 0.0001 and batch size M = 100; the hash-code length k is set to 16, 24, 36, 48 and 64 in turn; the weight parameters θ and φ of the generation and inference networks are initialized with the Glorot uniform distribution; α is set to 1 and β to 5; training stops after 1000 iterations or when the loss function no longer decreases, yielding the trained network model.
7. The method for retrieving the balanced hash remote sensing image of the variational self-encoder based on the multi-channel feature fusion as claimed in claim 6, wherein: in step 6, the trained network is used to compute the hash codes of the sample images in the test data set; the Hamming distances between the hash code of the query sample and those of all samples in the data set are sorted in ascending order, and the precision of the top n entries of the ranking list is computed to obtain the mean average precision (MAP) and the top-n retrieval results.
CN202111070959.9A 2021-09-13 2021-09-13 Variational self-encoder balanced Hash remote sensing image retrieval method based on multi-channel feature fusion Pending CN114090813A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111070959.9A CN114090813A (en) 2021-09-13 2021-09-13 Variational self-encoder balanced Hash remote sensing image retrieval method based on multi-channel feature fusion

Publications (1)

Publication Number Publication Date
CN114090813A true CN114090813A (en) 2022-02-25

Family

ID=80296615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111070959.9A Pending CN114090813A (en) 2021-09-13 2021-09-13 Variational self-encoder balanced Hash remote sensing image retrieval method based on multi-channel feature fusion

Country Status (1)

Country Link
CN (1) CN114090813A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117493603A (en) * 2023-11-29 2024-02-02 安庆市长三角未来产业研究院 Multi-channel image hash method and system for image retrieval

Similar Documents

Publication Publication Date Title
CN108334574B (en) Cross-modal retrieval method based on collaborative matrix decomposition
CN111125411B (en) Large-scale image retrieval method for deep strong correlation hash learning
CN113657450B (en) Attention mechanism-based land battlefield image-text cross-modal retrieval method and system
CN110222218B (en) Image retrieval method based on multi-scale NetVLAD and depth hash
CN110941734B (en) Depth unsupervised image retrieval method based on sparse graph structure
CN106228185A (en) A kind of general image classifying and identifying system based on neutral net and method
CN112949740B (en) Small sample image classification method based on multilevel measurement
CN110598022B (en) Image retrieval system and method based on robust deep hash network
CN114444600A (en) Small sample image classification method based on memory enhanced prototype network
CN112199532A (en) Zero sample image retrieval method and device based on Hash coding and graph attention machine mechanism
CN115048539B (en) Social media data online retrieval method and system based on dynamic memory
CN113836341B (en) Remote sensing image retrieval method based on unsupervised converter balanced hash
CN114708903A (en) Method for predicting distance between protein residues based on self-attention mechanism
CN114218292A (en) Multi-element time sequence similarity retrieval method
CN111241326B (en) Image visual relationship indication positioning method based on attention pyramid graph network
CN112035689A (en) Zero sample image hash retrieval method based on vision-to-semantic network
CN114579794A (en) Multi-scale fusion landmark image retrieval method and system based on feature consistency suggestion
CN114090813A (en) Variational self-encoder balanced Hash remote sensing image retrieval method based on multi-channel feature fusion
CN113326392B (en) Remote sensing image audio retrieval method based on quadruple hash
CN116128846B (en) Visual transducer hash method for lung X-ray image retrieval
CN117763185A (en) Hash image retrieval method based on thinking space dimension
CN117333887A (en) Deep learning-based oracle font classification method
CN111914108A (en) Discrete supervision cross-modal Hash retrieval method based on semantic preservation
CN111553442A (en) Method and system for optimizing classifier chain label sequence
CN116524301A (en) 3D point cloud scene instance shape searching and positioning method based on contrast learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination