CN110941734A

CN110941734A - Deep unsupervised image retrieval method based on sparse graph structure

Info

Publication number: CN110941734A
Application number: CN201911083223.8A
Authority: CN
Inventors: 张浩峰; 王伟伟
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2019-11-07
Filing date: 2019-11-07
Publication date: 2020-03-31
Anticipated expiration: 2039-11-07
Also published as: CN110941734B

Abstract

The invention provides a deep unsupervised image retrieval method based on a sparse graph structure, comprising the steps of: preprocessing a training data set and extracting its image features; constructing a weighted sparse graph and determining a network model according to the sparse graph; training the network model using the image features of the training data set and the weighted sparse graph; extracting image features of a query image, inputting them into the network model, taking the output of the encoder network as the low-dimensional features of the query image, and binarizing the low-dimensional features to obtain the hash code of the query image; and calculating the Hamming distances between the hash code of the query image and the hash codes of all images in the database to be queried, and obtaining similar images according to the result. Because the similarity information is stored in a sparse graph structure, the invention saves graph storage space and removes the dependence of retrieval performance on the number of categories.

Description

Deep unsupervised image retrieval method based on sparse graph structure

Technical Field

The invention belongs to the field of computer vision and pattern recognition, and particularly relates to a deep unsupervised image retrieval method based on a sparse graph structure.

Background

In recent years, research on image retrieval has advanced considerably, but the methods that currently achieve the best results require class labels. For large-scale real-world data sets, manual labeling is time-consuming and often lacks the necessary expertise, so unsupervised image retrieval algorithms have emerged and become increasingly popular. Unsupervised algorithms avoid the need for labels, but inevitably at the cost of lower retrieval performance.

Current unsupervised image retrieval methods fall into three main categories: graph structure-based retrieval, pseudo label-based retrieval, and deep unsupervised retrieval. Graph structure-based methods build a graph before training to store the neighbor information of the original space, and then use the neighbor relations in the graph during training so that the generated hash codes retain that neighbor information. Traditional graph structures include the Laplacian graph and the anchor graph; both store too much information, are easily disturbed by redundant information, and place certain demands on storage space. Pseudo label-based methods address these two defects by supervising the training process mainly with artificially generated labels; however, their retrieval capability is still limited by the shortcomings of such labels, and some traditional pseudo-label algorithms are very sensitive to the preset number of categories.

Disclosure of Invention

The invention aims to provide a deep unsupervised image retrieval method based on a sparse graph structure.

The technical solution for realizing the purpose of the invention is as follows: a deep unsupervised image retrieval method based on a sparse graph structure comprises the following specific steps:

Step 1, preprocessing a training data set, and extracting image features of the training data set;

Step 2, constructing a weighted sparse graph, and determining a network model according to the sparse graph, wherein the network model uses a symmetric autoencoder structure comprising an encoder and a decoder;

Step 3, training the network model using the image features of the training data set and the weighted sparse graph;

Step 4, extracting image features of the query image as in step 1, taking these image features as the input of the network model, extracting the output of the encoder network as the low-dimensional features of the query image, and binarizing the low-dimensional features to obtain the hash code of the query image;

Step 5, calculating the Hamming distances between the hash code of the query image and the hash codes of all images in the database to be queried, either comparing each distance with a preset threshold or sorting all the distances, and outputting the corresponding images in the database as similar images of the query image according to the comparison or sorting result.

Preferably, the specific method for preprocessing the training data set is as follows:

the training data set images are input into a VGG16 network pre-trained for classification on the ImageNet data set, and the 4096-dimensional image features of the fc7 layer are extracted.

Preferably, the specific steps of constructing the weighted sparse graph and determining the network model according to the sparse graph are as follows:

Step 2-1, taking the image features of the training data set as samples, and calculating the similarity between all samples;

Step 2-2, for each sample, sorting its similarity values with the other samples from high to low;

Step 2-3, for each sample, selecting the top k neighboring samples and connecting them to the sample to form a sparse graph;

Step 2-4, calculating the weight of each edge according to the degrees of the node pair connected by the edge in the sparse graph to form a weighted sparse graph;

Step 2-5, determining a network model according to the weighted sparse graph, wherein the network model uses a symmetric autoencoder structure, and the training formula of the autoencoder network is as follows:

where $\Omega_B$ denotes the set of edges in the sparse graph structure of the invention, each edge being represented by its pair of connected nodes; $w_{ij}$ denotes the weight of edge $(i, j)$; $\beta$ denotes the relative contribution of the third term to the overall formula; $z_n$ denotes the encoder output, i.e. $z_n = f(x_n, \Theta)$, where $\Theta$ denotes the encoder network weights; $u_n$ denotes the decoder output, i.e. $u_n = g(x_n, \Theta, \Lambda)$, where $\Lambda$ denotes the decoder network weights; $b_n$ is obtained by binary quantization of $z_n$; $\alpha$ denotes the relative importance of the second term; the index $n$ denotes the n-th image; $x_n$ denotes the feature of the n-th image ($n = 1, 2, \dots, N$); and $z_i - z_j$ is the difference between the encoder outputs of the two nodes of edge $(i, j)$.

Preferably, the similarity between samples is calculated by the following formula:

where $x_i$ and $x_j$ denote the image features of two images, the subscripts $i$ and $j$ denote the serial numbers of the images, $\|\cdot\|_2$ denotes the vector 2-norm, and $s_{ij}$ denotes the similarity between the two.

Preferably, the weight of each edge is calculated in the following manner:

where $d_m$ denotes the degree of the m-th node, and $w_{ij}$ denotes the weight of the edge connecting the i-th sample and the j-th sample in the weighted sparse graph.
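The degree-based weighting of step 2-4 can be sketched as follows. The weight formula itself is not reproduced in this text, so the sketch only computes node degrees and takes the weighting function as a parameter. The names `node_degrees`, `weight_edges` and `placeholder_weight` are hypothetical, and the placeholder $1/\sqrt{d_i d_j}$ is an assumption chosen for illustration, not the formula of the invention.

```python
import math
from collections import Counter

def node_degrees(edges):
    """Count how many edges touch each node of an undirected graph."""
    degrees = Counter()
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees

def weight_edges(edges, weight_fn):
    """Assign each edge (i, j) a weight computed from the degrees of its node pair."""
    degrees = node_degrees(edges)
    return {(i, j): weight_fn(degrees[i], degrees[j]) for i, j in edges}

def placeholder_weight(d_i, d_j):
    # ASSUMPTION: illustrative placeholder only; the actual degree-based
    # formula of the patent is not available in this text.
    return 1.0 / math.sqrt(d_i * d_j)

edges = {(0, 1), (0, 2), (2, 3)}
weights = weight_edges(edges, placeholder_weight)
for edge in sorted(weights):
    print(edge, round(weights[edge], 4))
```

Passing the weighting function as an argument keeps the degree bookkeeping separate from the formula, so the real formula can be substituted without touching the rest of the sketch.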

Preferably, the specific method for training the network model using the image features of the training data set and the weighted sparse graph is as follows:

Step 3-1, updating the encoder weights according to the gradient of the training formula of the autoencoder network with respect to the encoder output $z_n$ and the gradient of $z_n$ with respect to the encoder weights, wherein:

the gradient of the training formula of the autoencoder network with respect to $z_n$ is:

according to the chain rule, the update of $\Theta$ is:

where $\eta$ is the learning rate;

Step 3-2, updating the decoder network weight parameter $\Lambda$ by gradient descent, with the update formula:

Step 3-3, updating the corresponding binary hash codes, with the update formula:

$b_n^{new} = \mathrm{sign}(z_n^{new}) = \mathrm{sign}(f(x_n, \Theta^{new}))$

Step 3-4, repeating steps 3-1 to 3-3 until the difference between the values of the training formula of the autoencoder network in the last two iterations is smaller than a preset value.

Compared with the prior art, the invention has the following remarkable advantages:

(1) the invention stores the similarity information in a sparse graph structure, which saves graph storage space and removes the dependence of retrieval performance on the number of categories;

(2) the invention introduces the sparse graph into the field of image retrieval for the first time to store effective neighbor information; the sparse graph is sparser than the traditional Laplacian graph and anchor graph and requires less storage space, so the method can be used to retrieve larger-scale image sets;

(3) the sparse graph structure stores sufficient and effective information, which reduces the dependence on the number of categories; moreover, because redundant edges are removed from the sparse graph, interference of redundant information with the final performance is avoided;

(4) the algorithm is simple: although a network structure is used, it has few layers, places low demands on GPU performance and runs fast, and the sparse graph can be built using only the CPU and main memory, so the method can be widely applied where the sample size is large.
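The storage argument in advantage (2) can be made concrete with a small calculation. This is only a rough sketch: it compares a dense N x N pairwise similarity matrix with a k-nearest-neighbor graph that keeps at most N * k edges. The values N = 100,000 and k = 10 are illustrative assumptions and do not come from the patent.

```python
def dense_graph_entries(num_samples):
    """Number of entries in a dense N x N pairwise similarity matrix."""
    return num_samples * num_samples

def knn_graph_max_edges(num_samples, k):
    """Upper bound on edges when every sample links to its k nearest neighbors."""
    return num_samples * k

# ASSUMPTION: illustrative sizes only.
n, k = 100_000, 10
dense = dense_graph_entries(n)
sparse = knn_graph_max_edges(n, k)
print(dense, sparse, dense // sparse)
```

Under these assumed sizes the sparse graph stores at most one ten-thousandth of the entries of the dense matrix, and the gap widens as N grows because the ratio is N / k.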

The present invention is described in further detail below with reference to the attached drawings.

Drawings

Fig. 1 is a flowchart of the training stage of the deep unsupervised image retrieval method based on a sparse graph structure.

Fig. 2 is a flowchart of the testing stage of the deep unsupervised image retrieval method based on a sparse graph structure.

Detailed Description

A deep unsupervised image retrieval method based on a sparse graph structure comprises the following specific steps:

Step 1, preprocessing the training data set and extracting its image features. The training data set comprises N images, which are input into a VGG16 network pre-trained for classification on the ImageNet data set to extract the 4096-dimensional fc7-layer image features. The image features are combined into a matrix $X$ whose elements are the feature vectors $x_n \in \mathbb{R}^d$ (where $d = 4096$), and $x_n$ denotes the feature of the n-th image ($n = 1, 2, \dots, N$). The class labels of the training set are not known.

Step 2, constructing a weighted sparse graph, and determining a network model according to the sparse graph, wherein the specific steps are as follows:

Step 2-1, taking the image features of the training data set as samples and calculating the similarity between all samples; the similarity can be calculated as the cosine similarity, with the formula:

$s_{ij} = \dfrac{x_i^{\top} x_j}{\|x_i\|_2 \, \|x_j\|_2}$ (1)

where $x_i$ and $x_j$ denote the image features of the two images, the indices $i$ and $j$ denote the image numbers, $\|\cdot\|_2$ denotes the vector 2-norm, and $s_{ij}$ denotes the similarity between the two.
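Steps 2-1 to 2-3 (cosine similarity, sorting from high to low, and linking each sample to its top k neighbors) can be sketched in plain Python as below. The function names `cosine_similarity` and `build_knn_edges` are hypothetical, the toy 2-dimensional features stand in for the 4096-dimensional fc7 features, and storing each connection once as an undirected edge is an assumption about how the graph is represented.

```python
import math

def cosine_similarity(x_i, x_j):
    """Cosine similarity: inner product divided by the product of the 2-norms."""
    dot = sum(a * b for a, b in zip(x_i, x_j))
    norm_i = math.sqrt(sum(a * a for a in x_i))
    norm_j = math.sqrt(sum(b * b for b in x_j))
    return dot / (norm_i * norm_j)

def build_knn_edges(features, k):
    """Link every sample to its k most similar samples and return the edge set."""
    n = len(features)
    edges = set()
    for i in range(n):
        # Rank all other samples by similarity to sample i, from high to low.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine_similarity(features[i], features[j]),
            reverse=True,
        )
        for j in ranked[:k]:
            # ASSUMPTION: edges are undirected, so each pair is stored once.
            edges.add((min(i, j), max(i, j)))
    return edges

# Toy features: samples 0 and 1 point one way, samples 2 and 3 another.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
edges = build_knn_edges(features, k=1)
print(sorted(edges))  # [(0, 1), (2, 3)]
```

This brute-force version compares every pair of samples, which is fine for a demonstration; a large data set would normally use a vectorized or approximate nearest-neighbor search instead.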

Step 2-4, calculating the weight of each edge according to the degrees of the node pair connected by the edge in the sparse graph to form a weighted sparse graph, wherein the weight is calculated as follows:

Step 2-5, determining a network model according to the weighted sparse graph, wherein the network model uses a symmetric autoencoder structure (comprising an encoder and a decoder) to perform low-dimensional feature mapping. The encoder and the decoder each use a three-layer fully connected network: the output dimensions of the encoder layers are 1000, 2000 and L in sequence, where L denotes the length of the binary code output by the encoder, and the output dimensions of the decoder layers are 2000, 1000 and 4096 in sequence.
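The layer sizes stated above can be checked with a short sketch that lists the (input, output) shape of each fully connected layer and confirms that the decoder mirrors the encoder. The function names are hypothetical, the code length 64 is an illustrative value, and the parameter count assumes each layer has a bias vector, which the text does not state.

```python
def autoencoder_layer_shapes(code_length, input_dim=4096):
    """Return (fan_in, fan_out) pairs for the encoder and decoder layers."""
    encoder_dims = [input_dim, 1000, 2000, code_length]
    decoder_dims = [code_length, 2000, 1000, input_dim]
    encoder = list(zip(encoder_dims[:-1], encoder_dims[1:]))
    decoder = list(zip(decoder_dims[:-1], decoder_dims[1:]))
    return encoder, decoder

def parameter_count(shapes):
    # ASSUMPTION: every fully connected layer has a weight matrix plus a bias.
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)

# ASSUMPTION: 64-bit codes, chosen only for illustration.
encoder, decoder = autoencoder_layer_shapes(code_length=64)
print(encoder)  # [(4096, 1000), (1000, 2000), (2000, 64)]
print(decoder)  # [(64, 2000), (2000, 1000), (1000, 4096)]

# The decoder is the encoder reversed, with each layer's input and output swapped.
is_symmetric = decoder == [(out, inp) for inp, out in reversed(encoder)]
print(is_symmetric, parameter_count(encoder))
```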

After adding the weighted sparse graph structure, the training formula of the autoencoder network is as follows:

where $\Omega_B$ denotes the set of edges in the sparse graph structure of the invention, each edge being represented by its pair of connected nodes, e.g. $(i, j)$; $w_{ij}$ denotes the weight of edge $(i, j)$; $\beta$ denotes the relative contribution of the third term to the overall formula; $z_n$ denotes the encoder output; $u_n$ denotes the decoder output; $b_n$ is obtained by binary quantization of $z_n$; $\alpha$ denotes the relative importance of the second term; and the index $n$ denotes the n-th image.
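Since the training formula is not reproduced in this text, the following is only a sketch of one plausible three-term objective consistent with the symbol definitions: a quantization term between $z_n$ and $b_n$, a reconstruction term weighted by $\alpha$, and a neighbor-preserving term over the edges of $\Omega_B$ weighted by $\beta$ and $w_{ij}$. The squared Euclidean distances, the function names, and the convention that sign(0) is +1 are all assumptions, not the formula of the invention.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def sign(vector):
    # ASSUMPTION: zero is mapped to +1 so every code entry is +1 or -1.
    return [1.0 if t >= 0 else -1.0 for t in vector]

def training_objective(x, z, u, weighted_edges, alpha, beta):
    """ASSUMED form: quantization + alpha * reconstruction + beta * neighbor term."""
    b = [sign(z_n) for z_n in z]
    quantization = sum(squared_distance(z_n, b_n) for z_n, b_n in zip(z, b))
    reconstruction = sum(squared_distance(x_n, u_n) for x_n, u_n in zip(x, u))
    neighbor = sum(
        w_ij * squared_distance(z[i], z[j])
        for (i, j), w_ij in weighted_edges.items()
    )
    return quantization + alpha * reconstruction + beta * neighbor

# Toy example: two samples, perfect reconstruction, one weighted edge.
x = [[1.0, 0.0], [0.0, 1.0]]   # input features
u = [[1.0, 0.0], [0.0, 1.0]]   # decoder outputs
z = [[0.5, -0.5], [1.0, -1.0]] # encoder outputs
value = training_objective(x, z, u, {(0, 1): 2.0}, alpha=1.0, beta=0.5)
print(value)  # 1.0
```

In the toy example the quantization term contributes 0.5, the reconstruction term 0, and the neighbor term 0.5 * 2.0 * 0.5 = 0.5, giving 1.0 in total.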

Step 3, training the network model using the image features of the training data set and the weighted sparse graph.

The model of the invention needs to train the weight parameters of the encoder and decoder networks, denoted by $\Theta$ and $\Lambda$ respectively. Let $z_n = f(x_n, \Theta)$ and $u_n = g(x_n, \Theta, \Lambda)$ ($n = 1, 2, \dots, N$) be the outputs of the encoder and decoder, respectively, when the n-th image feature is input, and let $b_n$ be the binary hash code corresponding to the n-th image. Many optimization methods can be used for deep neural networks; the gradient method is used here as an example. The specific updating steps are as follows:

Step 3-1, updating the encoder weights according to the gradient of the training formula of the autoencoder network with respect to the encoder output $z_n$ and the gradient of $z_n$ with respect to the encoder weights. The gradient of training formula (3) with respect to $z_n$ can be written as:

Then, according to the chain rule, the update of $\Theta$ is:

Thus, the weight parameters of the encoder network can be updated by back-propagation, where $\eta$ is the learning rate, which mainly controls the step size of each parameter update.

Step 3-2, updating the decoder network weight parameter $\Lambda$ by gradient descent, with the update formula:

It is assumed here that the encoder and decoder use the same learning rate.

Step 3-3, updating the corresponding binary hash codes, with the update formula:

$b_n^{new} = \mathrm{sign}(z_n^{new}) = \mathrm{sign}(f(x_n, \Theta^{new}))$ (7)

Step 3-4, returning to step 3-1 and repeating steps 3-1 to 3-3 until the difference between the values of training formula (3) in the last two iterations is smaller than a preset value.
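The stopping rule of step 3-4 (repeat the updates until the training formula changes by less than a preset value between two consecutive iterations) can be sketched independently of the network. In this sketch `train_until_converged`, `step_fn` and `objective_fn` are hypothetical names, the `max_iters` safety cap is an added assumption, and a one-variable quadratic stands in for the real objective purely to make the loop runnable.

```python
def train_until_converged(step_fn, objective_fn, tolerance, max_iters=1000):
    """Repeat step_fn until the objective changes by less than tolerance."""
    previous = objective_fn()
    for iteration in range(1, max_iters + 1):
        step_fn()  # stands in for steps 3-1 to 3-3
        current = objective_fn()
        if abs(previous - current) < tolerance:
            return iteration, current
        previous = current
    return max_iters, previous  # ASSUMPTION: stop at a cap if never converged

# Toy stand-in objective: (t - 3)^2, minimized by plain gradient descent.
state = {"t": 0.0}
learning_rate = 0.1

def objective():
    return (state["t"] - 3.0) ** 2

def gradient_step():
    gradient = 2.0 * (state["t"] - 3.0)
    state["t"] -= learning_rate * gradient

iterations, final_value = train_until_converged(gradient_step, objective, 1e-6)
print(iterations, state["t"], final_value)
```

Comparing consecutive objective values avoids having to know the minimum in advance, which matches the criterion described in the text.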

Step 4, extracting the image features of the query image as in step 1, taking the image feature $x_c$ of the query image as the input of the network model, and extracting the output of the encoder network as the low-dimensional feature of the query image, whose dimension equals the length of the required binary hash code; the low-dimensional feature is binarized with the sign function to obtain the hash code $b_c$ of the query image, i.e. $b_c = \mathrm{sign}(z_c)$, where $z_c = f(x_c, \Theta)$ is the encoder output when the query image feature $x_c$ is input. At query time, the query image is used as the query input, the Hamming distances between its hash code and the hash codes of all images in the database to be queried are calculated, the distances are either compared with a preset threshold or sorted, and the corresponding database images are output as similar images of the query image according to the comparison or sorting result. In the testing stage, the query images are the test set images.
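The query procedure described above (sign quantization, Hamming distance, then thresholding or ranking) can be sketched as follows. The function names and the toy 4-bit codes are hypothetical, and mapping an exact zero to +1 is an assumption about how the sign function treats zero.

```python
def to_hash_code(z):
    """Binarize an encoder output with the sign function."""
    # ASSUMPTION: an exact zero is mapped to +1.
    return tuple(1 if t >= 0 else -1 for t in z)

def hamming_distance(code_a, code_b):
    """Number of positions at which two equal-length codes differ."""
    return sum(1 for a, b in zip(code_a, code_b) if a != b)

def retrieve(query_code, database_codes, threshold=None, top_k=None):
    """Return database indices selected by threshold or ranked by distance."""
    distances = [
        (hamming_distance(query_code, code), index)
        for index, code in enumerate(database_codes)
    ]
    if threshold is not None:
        return [index for distance, index in distances if distance < threshold]
    ranked = [index for distance, index in sorted(distances)]
    return ranked if top_k is None else ranked[:top_k]

# Toy encoder output for a query image and a three-image database.
query_code = to_hash_code([0.3, -1.2, 0.8, -0.1])
database_codes = [(1, -1, 1, 1), (-1, 1, -1, 1), (1, -1, 1, -1)]

print(query_code)                                         # (1, -1, 1, -1)
print(retrieve(query_code, database_codes, threshold=2))  # [0, 2]
print(retrieve(query_code, database_codes))               # [2, 0, 1]
```

Thresholding returns every image within a fixed Hamming radius, while ranking always returns an ordered list; which of the two fits better depends on whether the application needs a fixed number of results.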

As shown in Fig. 1, the training images are passed through a VGG16 network pre-trained for classification on the ImageNet data set to obtain high-dimensional fc7-layer features. A sparse graph is constructed on these features with the kNN algorithm, and the weight of each edge is calculated from the degrees of its node pair to form a weighted sparse graph; this structure allows the hash codes generated by training to retain as much of the original spatial information as possible. The high-dimensional image features are input into the autoencoder network; the encoder output is binarized to give the hash codes generated during training, and the decoder output serves as the reconstructed features, so that the encoder output retains more information of the original high-dimensional features. The training formula consists of three parts, namely the quantization loss, the reconstruction loss and the neighbor-preserving loss; during training, the quantization loss and the neighbor-preserving loss are used to train the encoder network, and the reconstruction loss is used to train the decoder network.

As shown in Fig. 2, in the retrieval process, the high-dimensional features of the query image are input into the trained encoder network, and the encoder output is binarized to obtain the hash code of the input image. The Hamming distances between the hash code of the query image and the hash codes of the images in the database to be queried are calculated, the distances are either compared with a preset threshold or sorted, and the corresponding images are output as similar images of the query image according to the comparison or sorting result.

Compared with traditional graphs, the sparse graph structure used by the invention is sparser, which reduces both the interference of redundant information and the storage requirement. At the same time, the information stored in the graph structure is richer than artificially generated pseudo-label information, so retrieval performance can be improved beyond that of traditional pseudo-label algorithms. In addition, the use of a deep network structure further improves the ability of the invention to learn hash codes, which increases its practicality.

Claims

1. A deep unsupervised image retrieval method based on a sparse graph structure, characterized by comprising the following specific steps:

2. The deep unsupervised image retrieval method based on a sparse graph structure as claimed in claim 1, wherein the specific method for preprocessing the training data set is as follows:

3. The deep unsupervised image retrieval method based on a sparse graph structure as claimed in claim 1, wherein the specific steps of constructing the weighted sparse graph and determining the network model according to the sparse graph are as follows:

$x_n$ denotes the feature of the n-th image ($n = 1, 2, \dots, N$).

4. The deep unsupervised image retrieval method based on a sparse graph structure as claimed in claim 3, wherein the similarity between samples is calculated by the following formula:

5. The deep unsupervised image retrieval method based on a sparse graph structure as claimed in claim 3, wherein the weight of each edge is calculated as follows:

6. The deep unsupervised image retrieval method based on a sparse graph structure as claimed in claim 1, wherein the specific method for training the network model using the image features of the training data set and the weighted sparse graph is as follows:

the gradient of the training formula of the autoencoder network with respect to $z_n$ is:

according to the chain rule, the update of $\Theta$ is:

where $\eta$ is the learning rate;

$b_n^{new} = \mathrm{sign}(z_n^{new}) = \mathrm{sign}(f(x_n, \Theta^{new}))$