CN114170333A - Image hash coding method based on transductive semi-supervised deep learning - Google Patents
- Publication number
- CN114170333A (application CN202111427674.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- training
- network model
- sample
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/001—Model-based coding, e.g. wire frame
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Image Processing (AREA)
Abstract
The invention discloses an image hash coding method based on transductive semi-supervised deep learning, which comprises the following steps: divide the data set into a training sample set and a test sample set; further divide the training set into a labeled training sample set and an unlabeled training sample set; build a deep convolutional neural network model; randomly initialize the class-label vectors of the unlabeled samples, then train the network model built in step 3) on the whole training set until it converges; infer a class-label vector for every training sample; compute the confidence of each training sample; train the network model built in step 3) from random initialization on the whole training sample set until it converges; repeat steps 5)-7) until the number of training rounds reaches the preset maximum; finally, compute the hash codes of the images in the test sample set with the trained network model. The method significantly reduces the cost of labeling data.
Description
Technical Field
The invention belongs to the technical field of computer-vision image hash coding, and in particular relates to an image hash coding method based on transductive semi-supervised deep learning.
Background
With the rapid development of the internet, an urgent need has emerged: how to quickly retrieve, from a large-scale image database, images that are identical or similar to a query image, or images that meet specified requirements. Nearest-neighbor search is an obvious solution: return the images closest to the query image in feature space as the retrieval result. However, for a large-scale image database the dimensionality of the image feature vectors is usually high, so nearest-neighbor search is very time-consuming and storage-hungry. As an approximation of nearest-neighbor search, hash retrieval has the advantages of low computation cost, high storage efficiency, high search speed and high accuracy, and is currently the most popular image retrieval method. Hash retrieval performs nearest-neighbor search on the hash codes of images. In general, image hash coding maps the high-dimensional feature vector of an image to a lower-dimensional binary vector through a set of hash functions while preserving the similarity relationships between images; this binary vector is called the hash code of the image.
Hash coding methods are divided into traditional hash coding methods and deep hash coding methods. Traditional methods first extract hand-crafted features from the image, then learn a set of hash functions on the extracted features, and finally map the feature vector of each image to its hash code with the learned functions. Deep hash coding methods learn the feature vector and the hash functions of an image simultaneously from the raw image with a deep convolutional neural network; once training is complete, feeding an image to the network directly yields its hash code. Depending on whether class-label information is used during training, hash coding methods are further divided into unsupervised, supervised and semi-supervised methods.
Because deep convolutional neural networks have strong feature-learning and non-linear mapping capabilities, deep hash coding methods have significant advantages over traditional ones. However, most current deep hash coding methods are supervised, and to achieve good hash-code quality and retrieval accuracy they usually require a large number of labeled training samples. In practical applications, constructing a large-scale, high-quality labeled training data set is very time-consuming and expensive, and for some special tasks even impractical. At the same time, there are vast numbers of free images on the internet that are easily downloaded with search engines or web crawlers. A semi-supervised deep hash coding method can learn better image hash codes from a small number of labeled samples together with a large number of unlabeled samples, thereby reducing the amount of labeling required while exploiting the abundant unlabeled data.
Traditional hash coding methods based on hand-crafted features perform poorly and cannot meet practical needs. At present, most semi-supervised deep hash coding methods use a graph model to model the data distribution of the unlabeled samples; such methods have very high computational complexity, need very large memory to run, and do not scale to large image data sets.
Transductive semi-supervised learning is a semi-supervised learning method whose core idea is to treat the labels of the unlabeled training samples as variables to be learned and optimized; during training they are updated and optimized iteratively together with the model parameters until the model converges.
Traditional transductive semi-supervised learning methods have two problems. First, they require high-quality feature vectors at the initial stage of training in order to infer reasonable label vectors for the unlabeled samples. Because the feature vectors produced by a deep convolutional neural network early in training are poor, traditional transductive semi-supervised learning cannot be combined directly with the training of a deep convolutional neural network. Second, traditional transductive semi-supervised learning treats every unlabeled sample equally and cannot handle outlier or uncertain samples, which harms the convergence and stability of model training.
Disclosure of Invention
The invention aims to provide an image hash coding method based on transductive semi-supervised deep learning which is independent of the network structure, can be applied to any deep convolutional neural network, and significantly reduces the cost of labeling data.
The technical scheme adopted by the invention is an image hash coding method based on transductive semi-supervised deep learning, comprising the following steps:
1) prepare an image data set and divide it into a training sample set and a test sample set;
2) further divide the training set into a labeled training sample set and an unlabeled training sample set;
3) build the deep convolutional neural network model of the image hash coding method based on transductive semi-supervised deep learning;
4) set the confidence of every labeled sample to 1 and of every unlabeled sample to 0, randomly initialize the class-label vectors of the unlabeled samples, then train the network model built in step 3) on the whole training set until it converges;
5) infer a class-label vector for every training sample based on the parameters of the currently learned network model;
6) compute the confidence of each training sample based on the parameters of the currently learned network model and the inferred class-label vectors;
7) based on the current class-label vectors and confidences of all training samples, train the network model built in step 3) from random initialization on the whole training sample set until it converges;
8) repeat steps 5), 6) and 7) until the number of training rounds reaches the preset maximum number of rounds;
9) compute the hash codes of the images in the test sample set with the trained network model.
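The alternating procedure of steps 4)-8) can be sketched in Python. This is a minimal skeleton under assumed interfaces (`train_fn`, `infer_labels_fn` and `confidence_fn` are hypothetical placeholders for the actual network training, label inference and confidence computation), not the patented implementation itself:

```python
def transductive_train(train_fn, infer_labels_fn, confidence_fn,
                       num_labeled, num_unlabeled, max_rounds=4):
    """Alternate between network training and label/confidence updates.

    train_fn(labels, conf)       -> trained model (steps 4 and 7)
    infer_labels_fn(model)       -> class-label vectors for all samples (step 5)
    confidence_fn(model, labels) -> per-sample confidences (step 6)
    """
    n = num_labeled + num_unlabeled
    # Step 4: labeled samples get confidence 1, unlabeled samples confidence 0;
    # the class-label vectors of the unlabeled samples start out random
    # (represented here by None placeholders).
    conf = [1.0] * num_labeled + [0.0] * num_unlabeled
    labels = [None] * n
    model = train_fn(labels, conf)           # first training to convergence
    for _ in range(max_rounds):              # step 8: repeat steps 5)-7)
        labels = infer_labels_fn(model)      # step 5
        conf = confidence_fn(model, labels)  # step 6
        model = train_fn(labels, conf)       # step 7: retrain from scratch
    return model, labels, conf
```

The network is deliberately retrained from random initialization each round (step 7) rather than fine-tuned, matching the description above.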
The present invention is also characterized in that,
the specific implementation method of the step 2) is as follows:
Given a training sample set $\mathcal{X} = \mathcal{X}_L \cup \mathcal{X}_U$, where $\mathcal{X}_L$ and $\mathcal{X}_U$ denote the labeled and unlabeled training sample sets respectively, $L$ denotes the number of labeled training samples, $U$ the number of unlabeled training samples ($L$ is generally much smaller than $U$), and $X_i$ denotes the $i$th training sample image. If $X_i \in \mathcal{X}_L$, let $y_i = (y_{i1}, \dots, y_{iC}) \in \{0,1\}^C$ denote the class-label vector of $X_i$: $y_{ij} = 1$ if image $X_i$ contains the $j$th category label, otherwise $y_{ij} = 0$, where $C$ denotes the number of category labels in the data set. An image $X_i$ may contain several category labels, i.e. several components of $y_i$ may be 1. Let $N = L + U$ denote the total number of training sample images.
The specific implementation method of the step 3) is as follows:
Given a deep convolutional neural network model, replace its last layer with two new fully connected layers, used for image hash coding and image classification respectively and called the hash coding layer hcl and the image classification layer cls. In the newly constructed network the hash coding layer hcl comes first and the image classification layer cls comes after it, i.e. cls is the last layer of the network. The number of neurons in the hash coding layer hcl equals the number of bits of the hash code, and the number of neurons in the image classification layer cls equals the number of class labels in the image data set.
The specific implementation method of the step 4) is as follows:
401) constructing a classification loss function;
402) constructing a Hash coding learning function;
403) constructing a Min-Max characteristic regular term with confidence coefficient;
404) constructing a total target function by combining a classification loss function, a Hash coding learning function and a Min-Max characteristic regular term with confidence coefficient;
405) based on the total objective function, a mini-batch-based stochastic gradient descent method is used for training a deep convolutional neural network model.
The specific implementation method of step 401) is as follows:
Different classification loss functions are adopted for single-label and multi-label image data sets.
For a single-label image data set, a softmax activation function is applied at the classification layer, and the classification loss function is:

$$\mathcal{L}_{cls}(\theta) = -\sum_{X_i \in \mathcal{X}} v_i \sum_{j=1}^{C} I(\hat{y}_{ij}=1) \log \frac{\exp(z_{ij})}{\sum_{k=1}^{C} \exp(z_{ik})}$$

where $\mathcal{X}$ denotes the set of training sample images; $\hat{y}_i$ denotes the class-label vector inferred for sample image $X_i$ by the network model learned in the previous round, kept fixed during the current round of training, and its $j$th component $\hat{y}_{ij}$ is the $j$th category label inferred for $X_i$; $v_i$ denotes the confidence of sample image $X_i$, i.e. the degree of certainty of the inferred class-label vector $\hat{y}_i$; $\theta$ denotes the set of network model parameters; $z_i = (z_{i1}, \dots, z_{iC})$ denotes the output vector of the last (classification) layer when $X_i$ is fed into the currently trained network; and $I(\mathrm{cond})$ is the indicator function, equal to 1 if the condition cond is true and 0 otherwise.
For a multi-label image data set, no activation function is applied at the classification layer, and the classification loss function used is:

$$\mathcal{L}_{cls}(\theta) = \sum_{X_i \in \mathcal{X}} v_i \, \| \hat{y}_i - z_i \|_2^2$$

where $\hat{y}_i$ denotes the class-label vector inferred for sample image $X_i$ by the network model learned in the previous round, kept fixed during the current round of training, with $j$th component $\hat{y}_{ij}$ the $j$th category label inferred for $X_i$; and $z_i$ denotes the output vector of the last (classification) layer when $X_i$ is fed into the currently trained network.
The specific implementation method of step 402) is as follows:
Define the similarity between images $X_i$ and $X_j$ as $\omega_{ij}$: if $X_i$ and $X_j$ are semantically similar, i.e. share at least one common class label, then $\omega_{ij} = 1$; otherwise $\omega_{ij} = 0$. Assume the hash codes of the $N$ training sample images are $B = [b_1, \dots, b_N]$, i.e. $b_i$ is the hash code of image $X_i$. The likelihood function of the similarities $\Omega = \{\omega_{ij}\}$ between the $N$ training samples is:

$$p(\Omega \mid B) = \prod_{\omega_{ij} \in \Omega} p(\omega_{ij} \mid B)^{\,v_{ij}}, \qquad p(\omega_{ij} \mid B) = \sigma(\Theta_{ij})^{\,\omega_{ij}} \, \left(1-\sigma(\Theta_{ij})\right)^{\,1-\omega_{ij}}$$

where $\prod$ is the successive-multiplication (product) symbol, $v_{ij} = v_i \cdot v_j$ denotes the confidence of the similarity $\omega_{ij}$, $\Theta_{ij} = \tfrac{1}{2} b_i^{\top} b_j$ is half the inner product of the hash codes $b_i$ and $b_j$, $\sigma(x) = 1/(1+\exp(-x))$, and $\exp(\cdot)$ denotes the exponential function.
Taking the negative log-likelihood, the proposed hash-code learning function is:

$$\mathcal{L}_{hash}(\theta) = -\sum_{\omega_{ij} \in \Omega} v_{ij} \left( \omega_{ij} \Theta_{ij} - \log\left(1 + \exp(\Theta_{ij})\right) \right)$$
the specific implementation method of step 403) is as follows:
The Min-Max feature regularization term with confidence $\mathcal{R}$ is constructed as:

$$\mathcal{R}(\theta) = \sum_{i,j} v_{ij} \left[ \omega_{ij} \, \|x_i - x_j\|_2^2 + (1-\omega_{ij}) \, \max\!\left(0,\; m - \|x_i - x_j\|_2\right)^2 \right]$$

where $v_{ij} = v_i \cdot v_j$, $x_i$ denotes the feature vector of sample image $X_i$, and $m$ is a margin hyperparameter.
The total objective function is:

$$\min_{\theta} \; \mathcal{L}(\theta) = \mathcal{L}_{cls}(\theta) + \mathcal{L}_{hash}(\theta) + \lambda \, \mathcal{R}(\theta)$$

The three terms on the right are the classification loss function, the hash-code learning function, and the Min-Max feature regularization term with confidence; the classification loss is applied to the classification layer, the hash-code learning function to the hash coding layer, and the Min-Max regularization term to the feature layer. $\lambda$ is a hyperparameter that balances the three terms on the right. $\mathcal{X}$ denotes the set of training sample images, $\hat{y}_i$ the class-label vector inferred for sample image $X_i$, $\theta$ the set of network model parameters, and $v_i$ the confidence of sample image $X_i$, i.e. the degree of certainty of the inferred class-label vector $\hat{y}_i$. If $X_i \in \mathcal{X}_L$, then $\hat{y}_i = y_i$ throughout the entire training process; if $X_i \in \mathcal{X}_U$, $\hat{y}_i$ is the class-label vector inferred for $X_i$ by the network model learned in the previous round.
The specific implementation method of the step 5) is as follows:
Fix $\theta$ and update $\{\hat{y}_i\}$; that is, based on the network model learned in the current round, infer class-label vectors for all training samples.
For a single-label image data set, the classification loss over the unlabeled sample set is:

$$\min_{\{\hat{y}_i\}} \; -\sum_{X_i \in \mathcal{X}_U} v_i \sum_{j=1}^{C} I(\hat{y}_{ij}=1) \log \frac{\exp(z_{ij})}{\sum_{k=1}^{C} \exp(z_{ik})}$$

Because the classification losses of different samples are independent of one another and the $v_i$ are non-negative constants, the problem decomposes into $U$ mutually independent sub-problems, one per unlabeled sample. The optimal solution of each sub-problem is $\hat{y}_{ij} = I\!\left(j = \arg\max_k z_{ik}\right)$, which yields $\hat{y}_i$.
For a multi-label image data set, the classification loss over the unlabeled sample set is:

$$\min_{\{\hat{y}_i\}} \; \sum_{X_i \in \mathcal{X}_U} v_i \, \|\hat{y}_i - z_i\|_2^2$$

which likewise decomposes into $U$ mutually independent sub-problems $\min_{\hat{y}_i \in \{0,1\}^C} \|\hat{y}_i - z_i\|_2^2$, whose optimal solution is $\hat{y}_{ij} = I(z_{ij} \ge 0.5)$, yielding $\hat{y}_i$.
The specific implementation method of the step 6) is as follows:
For a single-label image data set, the average distance in feature space from a sample image $X_i$ to the other samples of the same class can be expressed as:

$$d_i = \frac{\sum_{j \ne i} \omega_{ij} \, \|x_i - x_j\|_2}{\sum_{j \ne i} \omega_{ij}}$$

For a multi-label image data set, the average distance in feature space from a sample image $X_i$ to its similar samples is defined by the same expression, where $\|\cdot\|_2$ denotes the norm (length) of a vector and $\omega_{ij}$ indicates whether images $X_i$ and $X_j$ are similar: $\omega_{ij} = 1$ if $X_i$ and $X_j$ share at least one common class label, otherwise $\omega_{ij} = 0$.
Thus the confidence $r_i$ of image $X_i$ is computed as:

$$r_i = \frac{z_i}{z_{\max}}, \qquad z_i = \exp(-d_i), \quad z_{\max} = \max\{z_1, z_2, \dots, z_N\}$$

where $\exp(\cdot)$ denotes the exponential function.
The invention has the beneficial effects that:
(1) The method extends the traditional transductive semi-supervised learning approach to hash coding based on deep learning, yielding an image hash coding method based on transductive semi-supervised deep learning.
(2) The method introduces a confidence for each unlabeled training sample, greatly reducing the adverse effect of uncertain samples on the training process and making the convergence of the network model more stable.
(3) The method proposes a hash-code learning function that gives the hash codes of similar samples a smaller Hamming distance.
(4) The method proposes a Min-Max feature regularization term with confidence, so that in the feature space the distance between similar samples is as small as possible and the distance between dissimilar samples is as large as possible.
(5) The image hash coding method based on transductive semi-supervised deep learning does not depend on the network structure and can be applied to any deep convolutional neural network. At the same time, it significantly reduces the cost of labeling data.
Drawings
Fig. 1 shows the deep convolutional network model of the image hash coding method based on transductive semi-supervised deep learning according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides an image hash coding method based on transductive semi-supervised deep learning which, as shown in Fig. 1, comprises the following steps:
1) preparing an image data set, and dividing the data set into a training sample set and a testing sample set;
2) the training set is further divided into a marked training sample set and a non-marked training sample set;
the specific implementation method of the step 2) is as follows:
Given a training sample set $\mathcal{X} = \mathcal{X}_L \cup \mathcal{X}_U$, where $\mathcal{X}_L$ and $\mathcal{X}_U$ denote the labeled and unlabeled training sample sets respectively, $L$ denotes the number of labeled training samples, $U$ the number of unlabeled training samples ($L$ is generally much smaller than $U$), and $X_i$ denotes the $i$th training sample image. If $X_i \in \mathcal{X}_L$, let $y_i = (y_{i1}, \dots, y_{iC}) \in \{0,1\}^C$ denote the class-label vector of $X_i$: $y_{ij} = 1$ if image $X_i$ contains the $j$th category label, otherwise $y_{ij} = 0$, where $C$ denotes the number of category labels in the data set. An image $X_i$ may contain several category labels, i.e. several components of $y_i$ may be 1. Let $N = L + U$ denote the total number of training sample images.
The purpose of the deep image hash coding method is as follows: based on a deep convolutional neural network, learn a non-linear hash mapping $h: \mathcal{X} \to \{-1,1\}^K$ from the image space to the Hamming space $\{-1,1\}^K$. The mapping maps an image $X$ to a $K$-bit hash code $b = h(X)$ such that the similarity between image pairs is still preserved in the Hamming space, where $K$ is the number of bits of the image hash code. For convenience of description, the hash code of image $X_i$ is written $b_i = h(X_i)$.
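As a small illustration of the mapping into Hamming space, the sign function turns real-valued hash-layer outputs into $\{-1,1\}^K$ codes, and the Hamming distance between two codes follows from their inner product. This is a sketch, not the patent's own code:

```python
import numpy as np

def binarize(h):
    """Map real-valued hash-layer outputs to a {-1, 1}^K code."""
    return np.where(np.asarray(h) >= 0, 1, -1)

def hamming_distance(b1, b2):
    """For b1, b2 in {-1, 1}^K: d_H = (K - <b1, b2>) / 2."""
    b1, b2 = np.asarray(b1), np.asarray(b2)
    return int((b1.shape[-1] - b1 @ b2) // 2)
```

Because the distance reduces to an inner product, ranking a database by Hamming distance is far cheaper than nearest-neighbor search on high-dimensional real features.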
3) Building the deep convolutional neural network model of the image hash coding method based on transductive semi-supervised deep learning;
the specific implementation method of the step 3) is as follows:
Given a deep convolutional neural network model, replace its last layer with two new fully connected layers, used for image hash coding and image classification respectively and called the hash coding layer hcl and the image classification layer cls. In the newly constructed network the hash coding layer hcl comes first and the image classification layer cls comes after it, i.e. cls is the last layer of the network. The number of neurons in the hash coding layer hcl equals the number of bits of the hash code, and the number of neurons in the image classification layer cls equals the number of class labels in the image data set.
For a more intuitive description, the deep convolutional neural network model of the image hash coding method based on transductive semi-supervised deep learning is illustrated with a concrete deep convolutional neural network. Referring to Fig. 1, the given network consists of several convolutional layers, a feature layer and a classification layer, where the feature layer is the penultimate layer of the network and the classification layer is the last layer. As described above, the classification layer is replaced by two new fully connected layers, the hash coding layer hcl and the classification layer cls, and the rest of the network structure is unchanged. The numbers of neurons in hcl and cls are K and C respectively. During network training, the hash-code learning function is applied to the hash coding layer and the classification loss function to the classification layer. Further details of the method are described below.
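A toy numpy sketch of the modified head described above: the original classification layer is replaced by a hash coding layer hcl with K units followed by a classification layer cls with C units. The layer sizes, weight scale and the tanh on hcl are illustrative assumptions, not taken from the patent:

```python
import numpy as np

class HashClsHead:
    """Feature vector -> hcl (K units, tanh) -> cls (C units, no activation)."""

    def __init__(self, feat_dim, K, C, seed=0):
        rng = np.random.default_rng(seed)
        self.W_hcl = rng.standard_normal((feat_dim, K)) * 0.01
        self.W_cls = rng.standard_normal((K, C)) * 0.01

    def forward(self, x):
        h = np.tanh(x @ self.W_hcl)  # hash coding layer output, in (-1, 1)
        z = h @ self.W_cls           # classification layer output
        return h, z
```

At test time the relaxed hash output h would be binarized with the sign function to obtain the final code.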
4) Set the confidence of every labeled sample to 1 and of every unlabeled sample to 0, randomly initialize the class-label vectors of the unlabeled samples, then train the network model built in step 3) on the whole training set until it converges;
the specific implementation method of the step 4) is as follows:
401) constructing a classification loss function;
the specific implementation method of step 401) is as follows:
The method provided by the invention can handle both single-label and multi-label image data sets. The classification loss function is applied at the classification layer; for the two kinds of data set, different classification loss functions are adopted:
For a single-label image data set, a softmax activation function is applied at the classification layer, and the classification loss function is:

$$\mathcal{L}_{cls}(\theta) = -\sum_{X_i \in \mathcal{X}} v_i \sum_{j=1}^{C} I(\hat{y}_{ij}=1) \log \frac{\exp(z_{ij})}{\sum_{k=1}^{C} \exp(z_{ik})}$$

where $\mathcal{X}$ denotes the set of training sample images; $\hat{y}_i$ denotes the class-label vector inferred for sample image $X_i$ by the network model learned in the previous round, kept fixed during the current round of training, and its $j$th component $\hat{y}_{ij}$ is the $j$th category label inferred for $X_i$; $v_i$ denotes the confidence of sample image $X_i$, i.e. the degree of certainty of the inferred class-label vector $\hat{y}_i$; $\theta$ denotes the set of network model parameters; $z_i = (z_{i1}, \dots, z_{iC})$ denotes the output vector of the last (classification) layer when $X_i$ is fed into the currently trained network; and $I(\mathrm{cond})$ is the indicator function, equal to 1 if the condition cond is true and 0 otherwise.
For a multi-label image data set, no activation function is applied at the classification layer, and the classification loss function used is:

$$\mathcal{L}_{cls}(\theta) = \sum_{X_i \in \mathcal{X}} v_i \, \| \hat{y}_i - z_i \|_2^2$$

where $\hat{y}_i$ denotes the class-label vector inferred for sample image $X_i$ by the network model learned in the previous round, kept fixed during the current round of training, with $j$th component $\hat{y}_{ij}$ the $j$th category label inferred for $X_i$; and $z_i$ denotes the output vector of the last (classification) layer when $X_i$ is fed into the currently trained network.
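The two confidence-weighted classification losses can be sketched with numpy. The single-label form is the confidence-weighted softmax cross-entropy described above; the multi-label squared-error form is an assumption made here for illustration, since the original formula is not reproduced in this text:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def single_label_loss(Z, Y_hat, v):
    """Confidence-weighted cross-entropy with a softmax on the logits Z.

    Z: (N, C) classification-layer outputs; Y_hat: (N, C) one-hot inferred
    labels; v: (N,) per-sample confidences.
    """
    P = softmax(np.asarray(Z, dtype=float))
    return float(-np.sum(np.asarray(v)[:, None] * Y_hat * np.log(P + 1e-12)))

def multi_label_loss(Z, Y_hat, v):
    """Confidence-weighted squared error (assumed form, no activation)."""
    d = np.asarray(Z, dtype=float) - np.asarray(Y_hat, dtype=float)
    return float(np.sum(np.asarray(v) * np.sum(d * d, axis=1)))
```

Note that a sample with confidence 0 contributes nothing, which is exactly how the unlabeled samples are neutralized in the first training round (step 4).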
402) Constructing a Hash coding learning function;
the specific implementation method of step 402) is as follows:
Define the similarity between images $X_i$ and $X_j$ as $\omega_{ij}$: if $X_i$ and $X_j$ are semantically similar (i.e. share at least one common class label), then $\omega_{ij} = 1$; otherwise $\omega_{ij} = 0$. Assume the hash codes of the $N$ training sample images are $B = [b_1, \dots, b_N]$, i.e. $b_i$ is the hash code of image $X_i$. The likelihood function of the similarities $\Omega = \{\omega_{ij}\}$ between the $N$ training samples is:

$$p(\Omega \mid B) = \prod_{\omega_{ij} \in \Omega} p(\omega_{ij} \mid B)^{\,v_{ij}}, \qquad p(\omega_{ij} \mid B) = \sigma(\Theta_{ij})^{\,\omega_{ij}} \, \left(1-\sigma(\Theta_{ij})\right)^{\,1-\omega_{ij}}$$

where $\prod$ is the successive-multiplication (product) symbol, $v_{ij} = v_i \cdot v_j$ denotes the confidence of the similarity $\omega_{ij}$, $\Theta_{ij} = \tfrac{1}{2} b_i^{\top} b_j$ is half the inner product of the hash codes $b_i$ and $b_j$, $\sigma(x) = 1/(1+\exp(-x))$, and $\exp(\cdot)$ denotes the exponential function.
Taking the negative log-likelihood, the hash-code learning function proposed by the invention is:

$$\mathcal{L}_{hash}(\theta) = -\sum_{\omega_{ij} \in \Omega} v_{ij} \left( \omega_{ij} \Theta_{ij} - \log\left(1 + \exp(\Theta_{ij})\right) \right)$$
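A numpy sketch of the pairwise negative log-likelihood, using relaxed (real-valued) codes so the quantity stays differentiable; computing it over all pairs within a batch is an assumption for illustration:

```python
import numpy as np

def hash_pairwise_loss(B, Omega, v):
    """-sum_{i<j} v_i*v_j * (omega_ij * Theta_ij - log(1 + exp(Theta_ij))).

    B: (N, K) relaxed hash codes; Omega: (N, N) 0/1 similarity matrix;
    v: (N,) per-sample confidences.
    """
    B = np.asarray(B, dtype=float)
    Theta = 0.5 * (B @ B.T)                        # Theta_ij = b_i . b_j / 2
    Vij = np.outer(v, v)                           # v_ij = v_i * v_j
    ll = Omega * Theta - np.logaddexp(0.0, Theta)  # stable log(1 + exp(.))
    iu = np.triu_indices(B.shape[0], k=1)          # count each pair i < j once
    return float(-np.sum(Vij[iu] * ll[iu]))
```

Minimizing this loss pushes $\Theta_{ij}$ up (codes aligned) for similar pairs and down for dissimilar pairs, which is what makes similar samples end up with small Hamming distances.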
403) constructing a Min-Max characteristic regular term with confidence coefficient;
the specific implementation method of step 403) is as follows:
In order to learn better image hash codes, the invention proposes a Min-Max feature regularization term with confidence, applied to the feature layer during network training. The regularization term explicitly gives the features learned by the network the following property: if two images share at least one common class label, the distance between their feature vectors should be as small as possible; otherwise the distance between their feature vectors should be as large as possible. The Min-Max feature regularization term with confidence proposed by the invention is:

$$\mathcal{R}(\theta) = \sum_{i,j} v_{ij} \left[ \omega_{ij} \, \|x_i - x_j\|_2^2 + (1-\omega_{ij}) \, \max\!\left(0,\; m - \|x_i - x_j\|_2\right)^2 \right]$$

where $v_{ij} = v_i \cdot v_j$, $x_i$ denotes the feature vector of sample image $X_i$ (the output of the network feature layer), and $m$ is a margin hyperparameter.
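A numpy sketch of a regularizer with this pull/push behavior; the hinge form and the margin value are assumptions made for illustration, as the exact formula is not reproduced in this text:

```python
import numpy as np

def minmax_regularizer(X, Omega, v, margin=2.0):
    """Pull similar pairs together, push dissimilar pairs at least `margin` apart.

    X: (N, D) feature-layer outputs; Omega: (N, N) 0/1 similarity matrix;
    v: (N,) per-sample confidences; margin: assumed hyperparameter m.
    """
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise dists
    Vij = np.outer(v, v)
    pull = Omega * D**2                                  # similar: minimize
    push = (1 - Omega) * np.maximum(0.0, margin - D)**2  # dissimilar: separate
    iu = np.triu_indices(X.shape[0], k=1)                # each pair i < j once
    return float(np.sum(Vij[iu] * (pull + push)[iu]))
```

The confidence product $v_i v_j$ means a pair involving an uncertain sample contributes little, which keeps unreliable pseudo-labels from distorting the feature space.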
404) Constructing a total target function by combining a classification loss function, a Hash coding learning function and a Min-Max characteristic regular term with confidence coefficient;
The total objective function is:

$$\min_{\theta} \; \mathcal{L}(\theta) = \mathcal{L}_{cls}(\theta) + \mathcal{L}_{hash}(\theta) + \lambda \, \mathcal{R}(\theta)$$

The three terms on the right are the classification loss function, the hash-code learning function, and the Min-Max feature regularization term with confidence; the classification loss is applied to the classification layer, the hash-code learning function to the hash coding layer, and the Min-Max regularization term to the feature layer. $\lambda$ is a hyperparameter that balances the three terms on the right. $\mathcal{X}$ denotes the set of training sample images, $\hat{y}_i$ the class-label vector inferred for sample image $X_i$, $\theta$ the set of network model parameters, and $v_i$ the confidence of sample image $X_i$, i.e. the degree of certainty of the inferred class-label vector $\hat{y}_i$. If $X_i \in \mathcal{X}_L$, then $\hat{y}_i = y_i$ throughout the entire training process; if $X_i \in \mathcal{X}_U$, $\hat{y}_i$ is the class-label vector inferred for $X_i$ by the network model learned in the previous round.
405) Based on the total objective function, a mini-batch-based stochastic gradient descent method is used for training a deep convolutional neural network model.
5) Inferring a class-label vector for every training sample based on the parameters of the currently learned network model;
the specific implementation method of the step 5) is as follows:
Fix $\theta$ and update $\{\hat{y}_i\}$; that is, based on the network model learned in the current round, infer class-label vectors for all training samples. In fact, class-label vectors only need to be inferred for the unlabeled training sample images.
For a single-label image data set, the classification loss over the unlabeled sample set is:

$$\min_{\{\hat{y}_i\}} \; -\sum_{X_i \in \mathcal{X}_U} v_i \sum_{j=1}^{C} I(\hat{y}_{ij}=1) \log \frac{\exp(z_{ij})}{\sum_{k=1}^{C} \exp(z_{ik})}$$

Because the classification losses of different samples are independent of one another and the $v_i$ are non-negative constants, the problem decomposes into $U$ mutually independent sub-problems, one per unlabeled sample. The optimal solution of each sub-problem is $\hat{y}_{ij} = I\!\left(j = \arg\max_k z_{ik}\right)$, which yields $\hat{y}_i$.
For a multi-label image data set, the classification loss over the unlabeled sample set is:

$$\min_{\{\hat{y}_i\}} \; \sum_{X_i \in \mathcal{X}_U} v_i \, \|\hat{y}_i - z_i\|_2^2$$

which likewise decomposes into $U$ mutually independent sub-problems $\min_{\hat{y}_i \in \{0,1\}^C} \|\hat{y}_i - z_i\|_2^2$, whose optimal solution is $\hat{y}_{ij} = I(z_{ij} \ge 0.5)$, yielding $\hat{y}_i$.
6) Computing the confidence of each training sample based on the parameters of the currently learned network model and the inferred class-label vectors;
the specific implementation method of the step 6) is as follows:
If $X_i \in \mathcal{X}_L$, its confidence is always set to $v_i = 1$ throughout the training process. If $X_i \in \mathcal{X}_U$, the invention computes $r_i$ based on two intuitive assumptions: (1) in the feature space, if the average distance from a sample to the other samples of its class (or to its similar samples) is small, the sample is close to the center of the corresponding class and is given a higher confidence; (2) if that average distance is large, the sample is far from the class center and is given a lower confidence. Define the feature vector of sample image $X_i$ (the output of the network feature layer) as $x_i$.
For a single-label image dataset, in the feature space, the average distance d_i from a sample image X_i to the other samples of the same class can be expressed as: d_i = (Σ_{j≠i} ω_ij ‖x_i − x_j‖₂) / (Σ_{j≠i} ω_ij).
For a multi-label image dataset, in the feature space, the average distance d_i from a sample image X_i to its similar samples can be expressed in the same form, with ω_ij determined by whether the two images share a class label.
where ‖·‖₂ denotes the modulus length (Euclidean norm) of a vector, and ω_ij indicates whether images X_i and X_j are similar: if X_i and X_j have at least one common class label, then ω_ij = 1, otherwise ω_ij = 0;
Thus, based on the above analysis, for both single-label and multi-label image datasets, the smaller d_i is, the more easily X_i is assigned the correct label vector, and the higher the confidence that should be assigned to it; therefore, the confidence r_i of image X_i proposed by the invention is calculated as: r_i = z_i / z_max,
where z_i = exp(−d_i), z_max = max{z_1, z_2, …, z_N}, and exp(·) denotes the exponential function.
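The confidence computation can be sketched as follows; the handling of a sample that has no similar samples (d_i kept at 0) is an assumed convention not specified in the text.

```python
import numpy as np

def confidences(features, labels):
    """Confidence r_i = exp(-d_i) / max_k exp(-d_k), where d_i is the
    average feature-space distance from sample i to its similar samples
    (those sharing at least one class label).  A sample with no similar
    samples keeps d_i = 0 (an assumed convention)."""
    n = features.shape[0]
    d = np.zeros(n)
    for i in range(n):
        # omega_ij = 1 exactly when samples i and j share a class label
        similar = [j for j in range(n)
                   if j != i and (labels[i] & labels[j]).any()]
        if similar:
            d[i] = np.mean([np.linalg.norm(features[i] - features[j])
                            for j in similar])
    z = np.exp(-d)
    return z / z.max()

feats = np.array([[0., 0.], [0., 1.], [0., 3.]])
labs = np.ones((3, 1), dtype=int)   # all three samples share one label
r = confidences(feats, labs)        # middle sample is closest to the rest
```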
7) Training the network model built in step 3) from random initialization on the whole training sample set, based on the current class label vectors and confidences of all training samples, until the network model converges; the overall objective function is adopted, and a mini-batch-based stochastic gradient descent method is used to train the deep convolutional neural network model.
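A minimal numpy sketch of the confidence-weighted classification term and a mini-batch gradient step, with a linear classifier standing in for the deep network; the hash coding learning function and the Min-Max regularization term of the overall objective are omitted here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def weighted_ce(W, X, Y, v):
    """Confidence-weighted cross-entropy: each sample's loss is scaled
    by its confidence v_i, so uncertain pseudo-labels contribute less."""
    P = softmax(X @ W)
    return float(-np.mean(v * np.sum(Y * np.log(P + 1e-12), axis=1)))

def sgd_step(W, X, Y, v, lr=0.1):
    """One gradient step on the weighted loss for a linear classifier."""
    P = softmax(X @ W)
    return W - lr * X.T @ ((P - Y) * v[:, None]) / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))           # stand-in for feature-layer outputs
Y = np.eye(4)[rng.integers(0, 4, 32)]  # one-hot (pseudo-)labels
v = rng.uniform(0.2, 1.0, 32)          # per-sample confidences
W = np.zeros((8, 4))
losses = [weighted_ce(W, X, Y, v)]
for _ in range(50):
    W = sgd_step(W, X, Y, v)
    losses.append(weighted_ce(W, X, Y, v))
```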
8) Repeating steps 5), 6) and 7) until the number of completed training rounds reaches the preset maximum number of training rounds; according to experiments, the preset maximum number of training rounds is generally set to 4.
9) Calculating the hash codes of the images in the test sample set using the trained network model.
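The alternating procedure of steps 4)–9) can be summarized as a driver loop; `train_fn`, `infer_fn` and `conf_fn` are hypothetical stand-ins for the training, label-inference and confidence steps described above.

```python
import numpy as np

def transductive_training(train_fn, infer_fn, conf_fn, init_labels,
                          is_labeled, max_rounds=4):
    """Driver for steps 4)-8): alternately retrain from scratch,
    re-infer pseudo-labels, and recompute confidences; labeled samples
    keep confidence 1 throughout."""
    v = np.where(is_labeled, 1.0, 0.0)   # step 4): trust labeled samples only
    y = init_labels.copy()               # unlabeled label vectors start random
    model = train_fn(y, v)               # step 4): initial training round
    for _ in range(max_rounds):          # step 8): fixed number of rounds
        y = infer_fn(model, y, is_labeled)   # step 5)
        v = conf_fn(model, y, is_labeled)    # step 6)
        v[is_labeled] = 1.0                  # labeled confidence stays 1
        model = train_fn(y, v)               # step 7): retrain from scratch
    return model, y, v

# Hypothetical toy stand-ins, just to exercise the loop structure.
calls = []
def train(y, v):
    calls.append(1)          # pretend to train; return the round count
    return len(calls)
infer = lambda m, y, lab: y
conf = lambda m, y, lab: np.full(len(y), 0.5)

mask = np.array([1, 1, 0, 0, 0, 0], dtype=bool)
model, y, v = transductive_training(train, infer, conf,
                                    np.zeros((6, 3)), mask)
```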
The experimental results are as follows:
The invention was evaluated on three commonly used image datasets: CIFAR10, NUS-WIDE and MIR-Flickr25K.
The CIFAR10 dataset is a single label image dataset with 60000 images in total, 10 classes. According to the common semi-supervised learning setting of the data set, 1000 images (100 images are randomly selected for each class) are used as a query set, and the rest 59000 images are used as a retrieval database; and taking all the images in the retrieval database as a training sample set, wherein 5000 images are taken as labeled training sample sets, and the rest 54000 images are taken as unlabeled training sample sets.
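The CIFAR10 protocol above can be sketched as an index split; the random seed and toy labels below are illustrative only.

```python
import numpy as np

def split_cifar10(labels, rng):
    """Sketch of the stated CIFAR10 protocol: 100 query images per class
    (1000 total); the remaining 59000 images form the retrieval database
    and training set, of which 5000 are kept labeled and 54000 are
    treated as unlabeled."""
    query, rest = [], []
    for c in range(10):
        idx = rng.permutation(np.where(labels == c)[0])
        query.extend(idx[:100])
        rest.extend(idx[100:])
    rest = rng.permutation(np.array(rest))
    return np.array(query), rest[:5000], rest[5000:]

toy_labels = np.repeat(np.arange(10), 6000)  # 60000 illustrative labels
q, lab, unlab = split_cifar10(toy_labels, np.random.default_rng(0))
```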
NUS-WIDE is a multi-label image dataset, with a total of about 270000 images. According to the semi-supervised learning setting commonly used by the data set, 2100 images are used as a query set, 10500 images are used as a labeled training sample set, 149733 images are used as an unlabeled training sample set, and the images of the whole training set are used as a retrieval database.
The MIR-Flickr25K dataset is a multi-label image dataset with a total of 25000 images. According to the semi-supervised learning setting commonly used for this dataset, 1000 images are used as the query set, 5000 images as the labeled training sample set, and 19000 images as the unlabeled training sample set; the images of the whole training set are used as the retrieval database.
Representative hash coding methods based on semi-supervised learning include DSH-GANs, SSDH and BGDH. For a fair comparison, the invention was evaluated with the same deep convolutional neural network model used by these methods.
In the experiments on these three data sets, the parameter λ in the overall objective function of the present invention was set to 0.01. The maximum number of rounds of training is set to 4.
The experimental test method is as follows: after the network model is trained, the hash codes of the images in the query set and the retrieval database of each dataset are computed; retrieval performance is then calculated based on these hash codes, using the MAP score commonly used in academia as the evaluation index (the larger the MAP value, the better the method).
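A sketch of MAP evaluation over Hamming ranking; full-database ranking is assumed, as the text does not specify a top-K cutoff, and two images are taken as relevant when they share at least one class label.

```python
import numpy as np

def mean_average_precision(q_codes, db_codes, q_labels, db_labels):
    """MAP over Hamming ranking: rank database items by Hamming distance
    to each query and average precision at every relevant position; an
    item is relevant if it shares at least one class label with the query."""
    aps = []
    for qc, ql in zip(q_codes, q_labels):
        dist = (db_codes != qc).sum(axis=1)       # Hamming distance
        order = np.argsort(dist, kind="stable")   # rank the database
        rel = (db_labels[order] & ql).any(axis=1).astype(float)
        if rel.sum() == 0:
            continue
        prec = np.cumsum(rel) / np.arange(1, len(rel) + 1)
        aps.append((prec * rel).sum() / rel.sum())
    return float(np.mean(aps))

q_codes = np.array([[0, 1]])
q_labels = np.array([[1, 0]])
db_codes = np.array([[0, 1], [1, 1], [0, 0]])
db_labels = np.array([[1, 0], [0, 1], [1, 0]])
score = mean_average_precision(q_codes, db_codes, q_labels, db_labels)
```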
The results of the different methods on the corresponding data sets are given in tables 1, 2 and 3, respectively. The experimental results show that the method of the invention is obviously superior to other comparison methods in the table, and the superiority of the method of the invention is fully proved.
In addition, ablation experiments were performed on the three datasets to verify the effect of the confidence; the results are shown in Tables 4, 5 and 6. They fully verify that introducing confidences for the unlabeled training samples greatly reduces the adverse effect of uncertain samples on the training process and makes the convergence of the network model more stable.
TABLE 1 MAP scores on CIFAR10 data set by different methods
TABLE 2 MAP scores on NUS-WIDE data set by different methods
TABLE 3 MAP scores on MIR-Flickr25K data set by different methods
TABLE 4 Effect of whether confidence is used on the MAP score on the CIFAR10 dataset
TABLE 5 Effect of whether confidence is used on MAP scores on NUS-WIDE datasets
TABLE 6 influence of whether confidence was used on MAP scores on the MIR-Flickr25K dataset
Claims (10)
1. The image hash coding method based on the direct-push semi-supervised deep learning is characterized by comprising the following steps of:
1) preparing an image data set, and dividing the data set into a training sample set and a testing sample set;
2) the training set is further divided into a marked training sample set and a non-marked training sample set;
3) building a deep convolutional neural network model of an image Hash coding method based on direct-push type semi-supervised deep learning;
4) setting the confidence degrees of all the labeled samples to be 1, setting the confidence degrees of all the unlabeled samples to be 0, randomly initializing the class label vectors of the unlabeled samples, and then training the network model built in the step 3) on the whole training set until the network model is converged;
5) deducing corresponding class label vectors for all training samples based on the layer parameters of the currently learned network model;
6) calculating a confidence corresponding to each training sample based on the layer parameters of the currently learned network model and the inferred class label vector of the training sample;
7) training the network model set up in the step 3) from random initialization by using the whole training sample set based on the class label vectors and the confidence degrees of all the current training samples until the network model converges;
8) repeatedly executing the steps 5), 6) and 7) until the current round training number reaches the preset maximum round training number;
9) and calculating the Hash codes of the images in the test sample set by using the trained network model.
2. The image hash coding method based on the direct-push semi-supervised deep learning of claim 1, wherein the specific implementation method of step 2) is as follows:
given a training sample set D = D_L ∪ D_U, where D_L and D_U respectively represent the labeled training sample set and the unlabeled training sample set, L represents the number of labeled training samples, U represents the number of unlabeled training samples, L is generally much smaller than U, and X_i represents the i-th training sample image; if image X_i ∈ D_L, y_i represents the class label vector corresponding to X_i: if image X_i contains the label of the j-th category, then y_ij = 1, otherwise y_ij = 0, where C represents the number of category labels of the dataset; image X_i may contain multiple category labels, i.e., y_i may have multiple components equal to 1; let N = L + U denote the total number of training sample images.
3. The image hash coding method based on the direct-push semi-supervised deep learning of claim 2, wherein the specific implementation method of step 3) is as follows:
giving a deep convolutional neural network model, replacing the last layer of the deep convolutional neural network model with two new fully-connected layers which are respectively used for image hash coding and image classification and are respectively called a hash coding layer hcl and an image classification layer cls; in the newly constructed network, the hash coding layer hcl is in front of the network, and the image classification layer cls is behind the network, namely the image classification layer cls is at the last layer of the network; the number of neurons in the hash encoding layer hcl is the same as the number of bits in the hash encoding, and the number of neurons in the image classification layer cls is the same as the number of class labels in the image data set.
4. The image hash coding method based on the direct-push semi-supervised deep learning of claim 3, wherein the specific implementation method of step 4) is as follows:
401) constructing a classification loss function;
402) constructing a Hash coding learning function;
403) constructing a Min-Max characteristic regular term with confidence coefficient;
404) constructing a total target function by combining a classification loss function, a Hash coding learning function and a Min-Max characteristic regular term with confidence coefficient;
405) based on the total objective function, a mini-batch-based stochastic gradient descent method is used for training a deep convolutional neural network model.
5. The image hash coding method based on the direct-push semi-supervised deep learning of claim 4, wherein the specific implementation method of step 401) is as follows:
for a single-label image dataset and a multi-label image dataset, different classification loss functions are respectively adopted:
for a single-label image dataset: a softmax activation function is applied at the classification layer, and the classification loss function is:
L_cls = −Σ_{X_i∈D} v_i Σ_{j=1}^{C} I(ŷ_ij = 1) · log( exp(o_ij) / Σ_{k=1}^{C} exp(o_ik) ),
where D represents the training sample image set; ŷ_i represents the class label vector inferred for sample image X_i by the network model learned in the previous round of training, kept unchanged in the current round of training, and its j-th component ŷ_ij represents the j-th category label inferred for sample X_i; v_i represents the confidence of sample image X_i, i.e., the degree of certainty of the class label vector ŷ_i inferred for X_i; W represents the set of network model layer parameters; o_i represents the output vector at the last classification layer when sample image X_i is input to the currently trained network model; I(cond) represents an indicator function, whose value is 1 if the condition cond is true and 0 otherwise;
for a multi-label image dataset: no activation function is applied at the classification layer, and the classification loss function used is defined over the same quantities, where ŷ_i represents the class label vector inferred for sample image X_i by the network model learned in the previous round of training, kept unchanged in the current round of training, its j-th component ŷ_ij represents the j-th category label inferred for sample image X_i, and o_i represents the output vector at the last classification layer when sample image X_i is input to the currently trained network model.
6. The image hash coding method based on the direct-push semi-supervised deep learning of claim 4, wherein the specific implementation method of step 402) is as follows:
define the similarity between images X_i and X_j as ω_ij: if X_i and X_j are semantically similar, i.e., have at least one common class label, ω_ij = 1; otherwise ω_ij = 0; assume the hash codes corresponding to the N training sample images are B = [b_1, …, b_N], i.e., b_i is the hash code of image X_i; the likelihood function of the similarities Ω = {ω_ij} between the N training samples is a product, over sample pairs, of per-pair likelihood terms,
where each pair's term is weighted by v_ij = v_i · v_j, the confidence of the similarity ω_ij; Π denotes the continued-product symbol, b_i denotes the hash code of image X_i, and exp(·) denotes the exponential function;
based on the above description, the proposed hash coding learning function is obtained from this likelihood function.
7. the image hash coding method based on the direct-push semi-supervised deep learning of claim 4, wherein the specific implementation method of step 403) is as follows:
constructing the Min-Max feature regularization term with confidence: the term is defined over pairs of training samples,
where v_ij = v_i · v_j, and x_i represents the feature vector of sample image X_i (the output of the network feature layer).
8. The image hash coding method based on direct-push semi-supervised deep learning according to claim 4, wherein the overall objective function constructed in step 404) is the sum of three terms: a classification loss function, a hash coding learning function, and a Min-Max feature regularization term with confidence.
In this objective, the classification loss function acts on the classification layer, the hash coding learning function acts on the hash coding layer, and the Min-Max feature regularization term with confidence acts on the feature layer; λ is a hyper-parameter adjusting the balance between the three terms; D represents the training sample image set; ŷ_i represents the class label vector inferred for sample image X_i; W represents the set of network model layer parameters; v_i represents the confidence of sample image X_i, i.e., the degree of certainty of the inferred class label vector ŷ_i. If X_i belongs to the labeled sample set, then throughout the entire training process ŷ_i = y_i; if X_i belongs to the unlabeled sample set, ŷ_i represents the class label vector inferred for X_i by the network model learned in the previous round.
9. The image hash coding method based on the direct-push semi-supervised deep learning of claim 4, wherein the specific implementation method of step 5) is as follows:
fixing the network layer parameters W and updating the inferred class label vectors ŷ_i, that is, based on the network model learned in the current round, corresponding class label vectors are inferred for all training samples:
for a single-label image dataset: the classification loss function restricted to the unlabeled sample set is the confidence-weighted softmax cross-entropy summed over the U unlabeled training samples;
because the classification losses of different samples are independent of each other, and the confidences v_i are non-negative constants, this loss can be rewritten as U mutually independent sub-optimization problems, one per unlabeled sample;
the optimal solution of each sub-optimization problem is the one-hot class label vector whose single nonzero component lies at the class with the largest classification-layer output for that sample; based on these optimal solutions, the inferred class label vectors ŷ_i of all unlabeled samples are obtained;
for a multi-label image dataset: the classification loss function restricted to the unlabeled sample set is defined analogously;
because the classification losses of different samples are independent of each other, and the confidences v_i are non-negative constants, it can likewise be rewritten as U mutually independent sub-optimization problems whose per-sample optimal solutions yield the inferred class label vectors.
10. The image hash coding method based on the direct-push semi-supervised deep learning of claim 9, wherein the specific implementation method of step 6) is as follows:
for a single-label image dataset, in the feature space, the average distance d_i from a sample image X_i to the other samples of the same class can be expressed as: d_i = (Σ_{j≠i} ω_ij ‖x_i − x_j‖₂) / (Σ_{j≠i} ω_ij);
for a multi-label image dataset, in the feature space, the average distance d_i from a sample image X_i to its similar samples can be expressed in the same form, with ω_ij determined by whether the two images share a class label;
where ‖·‖₂ denotes the modulus length (Euclidean norm) of a vector, and ω_ij indicates whether images X_i and X_j are similar: if X_i and X_j have at least one common class label, then ω_ij = 1, otherwise ω_ij = 0;
thus, the confidence r_i of image X_i is calculated as: r_i = z_i / z_max,
where z_i = exp(−d_i), z_max = max{z_1, z_2, …, z_N}, and exp(·) denotes the exponential function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111427674.6A CN114170333B (en) | 2021-11-24 | 2021-11-24 | Image hash coding method based on direct-push type semi-supervised deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114170333A true CN114170333A (en) | 2022-03-11 |
CN114170333B CN114170333B (en) | 2023-02-03 |
Family
ID=80481230
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111427674.6A Active CN114170333B (en) | 2021-11-24 | 2021-11-24 | Image hash coding method based on direct-push type semi-supervised deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114170333B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034205A (en) * | 2018-06-29 | 2018-12-18 | 西安交通大学 | Image classification method based on the semi-supervised deep learning of direct-push |
CN109165306A (en) * | 2018-08-09 | 2019-01-08 | 长沙理工大学 | Image search method based on the study of multitask Hash |
CN109783682A (en) * | 2019-01-19 | 2019-05-21 | 北京工业大学 | It is a kind of based on putting non-to the depth of similarity loose hashing image search method |
CN109918528A (en) * | 2019-01-14 | 2019-06-21 | 北京工商大学 | A kind of compact Hash code learning method based on semanteme protection |
CN109960737A (en) * | 2019-03-15 | 2019-07-02 | 西安电子科技大学 | Remote Sensing Images search method of the semi-supervised depth confrontation from coding Hash study |
CN112861976A (en) * | 2021-02-11 | 2021-05-28 | 温州大学 | Sensitive image identification method based on twin graph convolution hash network |
Non-Patent Citations (3)
Title |
---|
WEIWEI SHI et al.: "Transductive Semisupervised Deep Hashing", IEEE Transactions on Neural Networks and Learning Systems * |
WEIWEI SHI et al.: "Transductive Semi-Supervised Deep Learning using Min-Max Features", ECCV 2018 * |
LIU Mengdi et al.: "Research progress on automatic image annotation techniques", Journal of Computer Applications * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114379416A (en) * | 2022-03-23 | 2022-04-22 | 蔚来汽车科技(安徽)有限公司 | Method and system for controlling battery replacement operation based on vehicle chassis detection |
CN115294396A (en) * | 2022-08-12 | 2022-11-04 | 北京百度网讯科技有限公司 | Backbone network training method and image classification method |
CN115294396B (en) * | 2022-08-12 | 2024-04-23 | 北京百度网讯科技有限公司 | Backbone network training method and image classification method |
CN115905926A (en) * | 2022-12-09 | 2023-04-04 | 华中科技大学 | Code classification deep learning model interpretation method and system based on sample difference |
CN115905926B (en) * | 2022-12-09 | 2024-05-28 | 华中科技大学 | Code classification deep learning model interpretation method and system based on sample difference |
Also Published As
Publication number | Publication date |
---|---|
CN114170333B (en) | 2023-02-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111523047B (en) | Multi-relation collaborative filtering algorithm based on graph neural network | |
CN108710894B (en) | Active learning labeling method and device based on clustering representative points | |
CN114170333B (en) | Image hash coding method based on direct-push type semi-supervised deep learning | |
CN110941734B (en) | Depth unsupervised image retrieval method based on sparse graph structure | |
CN108399185B (en) | Multi-label image binary vector generation method and image semantic similarity query method | |
CN110598022B (en) | Image retrieval system and method based on robust deep hash network | |
WO2021227091A1 (en) | Multi-modal classification method based on graph convolutional neural network | |
CN110264372B (en) | Topic community discovery method based on node representation | |
CN113377981B (en) | Large-scale logistics commodity image retrieval method based on multitask deep hash learning | |
Li et al. | DAHP: Deep attention-guided hashing with pairwise labels | |
CN112000689A (en) | Multi-knowledge graph fusion method based on text analysis | |
CN114299362A (en) | Small sample image classification method based on k-means clustering | |
CN115048539A (en) | Social media data online retrieval method and system based on dynamic memory | |
CN116258990A (en) | Cross-modal affinity-based small sample reference video target segmentation method | |
CN115618096A (en) | Inner product retrieval method and electronic equipment | |
CN114860973A (en) | Depth image retrieval method for small sample scene | |
Li et al. | Deep learning for approximate nearest neighbour search: A survey and future directions | |
CN116383422B (en) | Non-supervision cross-modal hash retrieval method based on anchor points | |
Xie et al. | Deep online cross-modal hashing by a co-training mechanism | |
CN116524282B (en) | Discrete similarity matching classification method based on feature vectors | |
CN114882279B (en) | Multi-label image classification method based on direct-push semi-supervised deep learning | |
Ahmed et al. | Clustering research papers using genetic algorithm optimized self-organizing maps | |
CN114299336A (en) | Photographic image aesthetic style classification method based on self-supervision learning and deep forest | |
Lin et al. | High-order structure preserving graph neural network for few-shot learning | |
Qin et al. | A novel deep hashing method with top similarity for image retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||