CN114170333B - Image hash coding method based on direct-push type semi-supervised deep learning - Google Patents
- Publication number
- CN114170333B (application number CN202111427674.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- training
- network model
- sample
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T9/002: Image coding using neural networks
- G06N3/045: Combinations of networks
- G06N3/047: Probabilistic or stochastic networks
- G06N3/048: Activation functions
- G06N3/08: Learning methods
- G06T9/001: Model-based coding, e.g. wire frame
Abstract
The invention discloses an image hash coding method based on direct-push semi-supervised deep learning, comprising the following steps: 1) dividing a data set into a training sample set and a test sample set; 2) further dividing the training set into a labeled training sample set and an unlabeled training sample set; 3) building a deep convolutional neural network model; 4) randomly initializing the class label vectors of the unlabeled samples, and then training the network model built in step 3) on the whole training set until it converges; 5) inferring the class label vectors of all training samples; 6) calculating the confidence of each training sample; 7) retraining the network model built in step 3) from random initialization on the whole training sample set until it converges; 8) repeating steps 5) to 7) until the number of training rounds reaches the preset maximum; and 9) calculating the hash codes of the images in the test sample set with the trained network model. The method can significantly reduce the cost of labeling data.
Description
Technical Field
The invention belongs to the technical field of computer vision image hash coding, and particularly relates to an image hash coding method based on direct-push type semi-supervised deep learning.
Background
With the rapid development of the internet, there is an urgent need to quickly retrieve, from a large-scale image database, images that are identical or similar to a query image or that satisfy a specified requirement. Nearest neighbor search is one solution: the images closest to the query image in the feature space are returned as the retrieval result. For a large-scale image database, however, image feature vectors usually have a high dimension, so nearest neighbor search is very time-consuming and requires a large amount of storage. As an approximation of nearest neighbor search, hash retrieval offers low computation cost, high storage efficiency, high search speed and high search accuracy, and is currently the most popular image retrieval method. Hash retrieval performs nearest neighbor search on the hash codes of images. In general, image hash coding maps the high-dimensional feature vector of an image to a lower-dimensional binary vector through a set of hash functions while preserving the similarity relationships between images; this binary vector is called the hash code of the image.
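As a minimal sketch of the retrieval step described above, the following ranks database images by the Hamming distance between their hash codes and the query code. The function names and the 4-bit example codes are illustrative assumptions, not part of the patent.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def retrieve(query_code, database_codes, top_k=3):
    """Return the indices of the top_k database codes closest to the query."""
    order = sorted(
        range(len(database_codes)),
        key=lambda idx: hamming_distance(query_code, database_codes[idx]),
    )
    return order[:top_k]


# Hypothetical 4-bit codes with entries in {-1, 1}.
database = [
    (1, 1, -1, -1),
    (1, -1, -1, -1),
    (-1, -1, 1, 1),
]
query = (1, 1, -1, 1)
ranking = retrieve(query, database, top_k=2)
```

Because the codes are short binary vectors, the comparison touches only K positions per image, which is where the storage and speed advantages over high-dimensional feature search come from.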
Hash coding methods are divided into traditional hash coding methods and deep hash coding methods. A traditional hash coding method first extracts hand-crafted features from the image, then learns a set of hash functions from the extracted features, and finally maps the feature vector of the image to the corresponding hash code with the learned hash functions. A deep hash coding method learns the feature vectors and the hash functions simultaneously from the original images through a deep convolutional neural network; once training is complete, the network directly outputs the hash code of an input image. Hash coding methods can also be divided into unsupervised, supervised and semi-supervised methods according to whether class label information of the images is used during training.
Because deep convolutional neural networks have strong feature learning and nonlinear mapping capabilities, deep hash coding methods have significant advantages over traditional hash coding methods. However, most current deep hash coding methods are supervised, and they usually require a large number of labeled training samples to achieve good hash coding quality and retrieval accuracy. In practice, constructing a large-scale, high-quality labeled training data set is very time-consuming and expensive, and for some special tasks it is even impractical. At the same time, a large number of free images can easily be downloaded from the internet with search engines or web crawlers. A semi-supervised deep hash coding method can learn good image hash codes from a small number of labeled samples and a large number of unlabeled samples, which reduces the amount of labeling required while exploiting the abundant unlabeled samples.
Traditional hash coding methods based on hand-crafted features perform poorly and cannot meet the needs of practical applications. At present, most semi-supervised deep hash coding methods use a graph model to model the data distribution of the unlabeled samples; these methods have very high computational complexity and require a very large amount of memory, which hinders their use on large-scale image data sets.
Direct-push (transductive) semi-supervised learning is a semi-supervised learning method whose core idea is to treat the labels of the unlabeled training samples as variables to be learned and optimized; during training these variables are iteratively updated and optimized together with the model parameters until the model converges.
The traditional direct-push semi-supervised learning method has two problems. First, it requires high-quality feature vectors at the initial stage of training in order to infer reasonable label vectors for the unlabeled samples. Because the feature vectors generated by a deep convolutional neural network early in training are of poor quality, the traditional method cannot be combined directly with the training of a deep convolutional neural network. Second, the traditional method treats every unlabeled sample equally and cannot handle outlier samples and uncertain samples, which harms the convergence and stability of model training.
Disclosure of Invention
The invention aims to provide an image hash coding method based on direct-push semi-supervised deep learning, which is independent of a network structure, can be applied to any deep convolutional neural network and can obviously reduce the labeling cost of data.
The technical scheme adopted by the invention is that the image hash coding method based on the direct-push type semi-supervised deep learning comprises the following steps:
1) Preparing an image data set, and dividing the data set into a training sample set and a testing sample set;
2) Further dividing the training set into a labeled training sample set and an unlabeled training sample set;
3) Building a deep convolutional neural network model for the image hash coding method based on direct-push semi-supervised deep learning;
4) Setting the confidence of every labeled sample to 1 and the confidence of every unlabeled sample to 0, randomly initializing the class label vectors of the unlabeled samples, and then training the network model built in step 3) on the whole training set until it converges;
5) Inferring the class label vectors of all training samples based on the layer parameters of the currently learned network model;
6) Calculating the confidence of each training sample based on the layer parameters of the currently learned network model and the inferred class label vectors of the training samples;
7) Retraining the network model built in step 3) from random initialization on the whole training sample set, using the current class label vectors and confidences of all training samples, until it converges;
8) Repeating steps 5), 6) and 7) until the number of training rounds reaches the preset maximum;
9) Calculating the hash codes of the images in the test sample set with the trained network model.
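The control flow of steps 4) to 8) can be sketched as a short outer loop. This is only an illustration under stated assumptions: `train_from_scratch`, `infer_labels` and `compute_confidence` are hypothetical callables standing in for steps 4)/7), 5) and 6), and the sketch assumes the round counter does not include the initial training of step 4), which the text does not specify.

```python
def train_transductive(train_from_scratch, infer_labels, compute_confidence,
                       labels, confidence, max_rounds):
    """Alternate between label/confidence inference and full retraining."""
    # Step 4: initial training with the initial labels and confidences.
    model = train_from_scratch(labels, confidence)
    for _ in range(max_rounds):
        labels = infer_labels(model, labels)            # step 5
        confidence = compute_confidence(model, labels)  # step 6
        model = train_from_scratch(labels, confidence)  # step 7
    return model, labels, confidence


# Stub callables that only record the order in which they are called.
calls = []

def fake_train(labels, confidence):
    calls.append("train")
    return {"round": calls.count("train")}

def fake_infer(model, labels):
    calls.append("infer")
    return labels

def fake_confidence(model, labels):
    calls.append("confidence")
    return [1.0 for _ in labels]

model, final_labels, final_confidence = train_transductive(
    fake_train, fake_infer, fake_confidence,
    labels=[0, 1, 0], confidence=[1.0, 1.0, 0.0], max_rounds=2,
)
```

Note that the model is retrained from random initialization in every round, so each call to the training function is independent of the previous model apart from the labels and confidences it receives.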
The present invention is also characterized in that,
the specific implementation method of the step 2) is as follows:
given a training sample set $\mathcal{X} = \mathcal{X}_L \cup \mathcal{X}_U$, where $\mathcal{X}_L$ and $\mathcal{X}_U$ denote the labeled training sample set and the unlabeled training sample set respectively, L denotes the number of labeled training samples, U denotes the number of unlabeled training samples, L is generally much smaller than U, and $X_i$ denotes the ith training sample image; if the image $X_i \in \mathcal{X}_L$, $y_i$ denotes the class label vector of $X_i$: if image $X_i$ has a label for the jth category, $y_{ij} = 1$, otherwise $y_{ij} = 0$; C denotes the number of category labels of the data set; image $X_i$ may contain several category labels, i.e., $y_i$ may have several components equal to 1; let N = L + U denote the total number of training sample images.
The specific implementation method of the step 3) is as follows:
given a deep convolutional neural network model, its last layer is replaced with two new fully-connected layers, used for image hash coding and image classification respectively, and called the hash coding layer hcl and the image classification layer cls; in the new network, the hash coding layer hcl comes first and the image classification layer cls follows it, i.e., the image classification layer cls is the last layer of the network; the number of neurons in the hash coding layer hcl equals the number of bits of the hash code, and the number of neurons in the image classification layer cls equals the number of class labels of the image data set.
The specific implementation method of the step 4) is as follows:
401) Constructing a classification loss function;
402) Constructing a hash coding learning function;
403) Constructing a Min-Max feature regularization term with confidence;
404) Combining the classification loss function, the hash coding learning function and the Min-Max feature regularization term with confidence to construct a total objective function;
405) Training the deep convolutional neural network model with mini-batch stochastic gradient descent based on the total objective function.
The specific implementation method of step 401) is as follows:
for a single-label image dataset and a multi-label image dataset, different classification loss functions are respectively adopted:
for a single-label image dataset: a softmax activation function is applied at the classification layer, and the classification loss function is:
where $\mathcal{X}$ denotes the training sample image set; $\hat{y}_i$ denotes the class label vector inferred for sample image $X_i$ by the network model learned in the previous training round, which is kept unchanged during the current round; the jth component $\hat{y}_{ij}$ of $\hat{y}_i$ denotes the jth class label inferred for sample $X_i$; $v_i$ denotes the confidence of sample image $X_i$, i.e., the degree of certainty of the class label vector $\hat{y}_i$ inferred for $X_i$; $\theta$ denotes the set of network model layer parameters; $f(X_i; \theta)$ denotes the output vector at the final classification layer when sample image $X_i$ is input to the network model currently being trained; $I(\mathrm{cond})$ denotes the indicator function, whose value is 1 if the condition cond is true and 0 otherwise;
for a multi-label image dataset: no activation function is applied at the classification layer, and the classification loss function is:
where $\hat{y}_i$ denotes the class label vector inferred for sample image $X_i$ by the network model learned in the previous training round, which is kept unchanged during the current round; the jth component $\hat{y}_{ij}$ of $\hat{y}_i$ denotes the jth class label inferred for sample image $X_i$; $f(X_i; \theta)$ denotes the output vector at the final classification layer when sample image $X_i$ is input to the network model currently being trained.
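For the single-label case, one plausible reading of the definitions above is a softmax cross-entropy in which each sample's term is scaled by its confidence $v_i$. The sketch below assumes exactly that form (no averaging, integer class indices instead of one-hot vectors); it is an illustration of confidence weighting, not the patent's exact formula.

```python
import math


def log_softmax(logits):
    """Numerically stable log of the softmax of a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def weighted_softmax_loss(logits_batch, labels, confidences):
    """Sum over samples of -v_i * log softmax(f(X_i))[inferred label of X_i]."""
    total = 0.0
    for logits, label, v in zip(logits_batch, labels, confidences):
        total += -v * log_softmax(logits)[label]
    return total


# Two samples, two classes. The second sample has confidence 0,
# so its (wrong-looking) inferred label does not affect the loss.
logits_batch = [[0.0, 0.0], [2.0, 0.0]]
inferred_labels = [0, 1]
confidences = [1.0, 0.0]
loss = weighted_softmax_loss(logits_batch, inferred_labels, confidences)
```

With this weighting, setting the confidence of every unlabeled sample to 0 in the first round means the initial training is driven by the labeled samples only, whatever the random initial labels are.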
The specific implementation method of step 402) is as follows:
the similarity between images $X_i$ and $X_j$ is defined as $\omega_{ij}$: if $X_i$ and $X_j$ are semantically similar, i.e., they share at least one class label, $\omega_{ij} = 1$; otherwise $\omega_{ij} = 0$; assume the hash codes corresponding to the N training sample images are $B = [b_1, \ldots, b_N]$, i.e., $b_i$ is the hash code of image $X_i$; the likelihood function of the similarities $\Omega = \{\omega_{ij}\}$ among the N training samples is:
where
$\prod$ is the product symbol, $v_{ij} = v_i \cdot v_j$ denotes the confidence of the similarity $\omega_{ij}$ between samples, $b_i^{\top}$ denotes the transpose of the hash code $b_i$, and $\exp(\cdot)$ denotes the exponential function;
based on the above description, the proposed hash coding learning function is:
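A common way to write such a pairwise likelihood in deep hashing is $p(\omega_{ij}=1 \mid B) = \sigma(\Theta_{ij})$ with $\Theta_{ij} = \tfrac{1}{2} b_i^{\top} b_j$ and $\sigma(x) = 1/(1+\exp(-x))$. The sketch below assumes that form and weights each pair's negative log-likelihood by $v_{ij} = v_i \cdot v_j$. Both the factor 1/2 and the sigmoid form are assumptions borrowed from common practice, since the formula itself is not reproduced in the text.

```python
import math


def softplus(x):
    """log(1 + exp(x)), computed stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def hash_pair_loss(codes, omega, conf):
    """Confidence-weighted negative log-likelihood over ordered pairs i != j."""
    n = len(codes)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            theta = 0.5 * sum(a * b for a, b in zip(codes[i], codes[j]))
            v_ij = conf[i] * conf[j]
            # -log p(omega_ij | B) = softplus(theta) - omega_ij * theta
            total += -v_ij * (omega[i][j] * theta - softplus(theta))
    return total


# Samples 0 and 1 are similar; sample 2 is dissimilar to both.
omega = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
conf = [1.0, 1.0, 1.0]
good_codes = [(1, 1), (1, 1), (-1, -1)]   # similar pair shares a code
bad_codes = [(1, 1), (-1, -1), (1, 1)]    # similar pair has opposite codes
good_loss = hash_pair_loss(good_codes, omega, conf)
bad_loss = hash_pair_loss(bad_codes, omega, conf)
```

Under this assumed form, codes that agree on similar pairs and disagree on dissimilar pairs give the lower loss, which matches the stated goal that similar samples receive hash codes with a small Hamming distance.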
the specific implementation method of step 403) is as follows:
the Min-Max feature regularization term with confidence is constructed as:
where $v_{ij} = v_i \cdot v_j$, and $x_i$ denotes the feature vector of sample image $X_i$.
The specific implementation method of step 404) is as follows:
in the total objective function, the three terms on the right-hand side are the classification loss function, the hash coding learning function and the Min-Max feature regularization term with confidence; the classification loss function is applied to the classification layer, the hash coding learning function is applied to the hash coding layer, and the Min-Max feature regularization term with confidence is applied to the feature layer; $\lambda$ is a hyper-parameter that adjusts the balance among the three terms; $\mathcal{X}$ denotes the training sample image set; $\hat{y}_i$ denotes the class label vector inferred for sample image $X_i$; $\theta$ denotes the set of network model layer parameters; $v_i$ denotes the confidence of sample image $X_i$, i.e., the degree of certainty of the class label vector $\hat{y}_i$ inferred for $X_i$; if $X_i \in \mathcal{X}_L$, then throughout the training process $\hat{y}_i$ is always equal to $y_i$, i.e., $\hat{y}_i = y_i$; if $X_i \in \mathcal{X}_U$, $\hat{y}_i$ denotes the class label vector inferred for sample image $X_i$ by the network model learned in the previous round.
The specific implementation method of the step 5) is as follows:
fix the network parameters $\theta$ and update the label vectors $\hat{y}_i$; that is, based on the network model learned in the current round, infer the corresponding class label vectors for all training samples:
for a single-label image dataset: the classification loss function on the unlabeled sample set is:
because the classification losses of different samples are independent of each other and $v_i$ is a non-negative constant, the above formula can be rewritten as U mutually independent sub-optimization problems:
the optimal solution of each sub-optimization problem is obtained in closed form, and the inferred class label vector $\hat{y}_i$ follows from this optimal solution.
For a multi-label image dataset: the classification loss function on the unlabeled sample set is:
because the classification losses of different samples are independent of each other and $v_i$ is a non-negative constant, the above formula can be rewritten as U mutually independent sub-optimization problems:
the optimal solution of each sub-optimization problem is obtained in closed form, and the inferred class label vector $\hat{y}_i$ follows from this optimal solution.
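For the single-label case, if the per-sample loss is a softmax cross-entropy scaled by a non-negative $v_i$ (an assumption, since the formula is not reproduced here), then minimizing it over one-hot label vectors with the network fixed simply selects the class with the largest classification-layer output. A minimal sketch of that inference step follows; the function name is hypothetical.

```python
def infer_single_label(logits):
    """Return the one-hot label vector that minimizes cross-entropy for fixed logits."""
    best = max(range(len(logits)), key=lambda j: logits[j])
    return [1 if j == best else 0 for j in range(len(logits))]


# Classification-layer outputs for two hypothetical unlabeled images.
outputs = [[0.1, 2.0, -1.0], [3.5, 0.2, 0.3]]
inferred = [infer_single_label(f) for f in outputs]
```

Because each sample's problem is independent, the U sub-problems can be solved one image at a time (or in parallel) with a single forward pass per image. The multi-label case depends on the specific loss used and is not sketched here.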
The specific implementation method of the step 6) is as follows:
for a single-label image dataset, the average distance $d_i$ in feature space from sample image $X_i$ to the samples of the same class is defined as:
For a multi-label image dataset, the average distance $d_i$ in feature space from sample image $X_i$ to its similar samples is defined as:
where $\|\cdot\|_2$ denotes the Euclidean norm (length) of a vector, and $\omega_{ij}$ indicates whether images $X_i$ and $X_j$ are similar: if $X_i$ and $X_j$ share at least one class label, $\omega_{ij} = 1$; otherwise $\omega_{ij} = 0$;
Thus, the confidence $r_i$ of image $X_i$ is calculated as:
where $z_i = \exp(-d_i)$, $z_{\max} = \max\{z_1, z_2, \ldots, z_N\}$, and $\exp(\cdot)$ denotes the exponential function.
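The confidence computation can be sketched for the single-label case as follows. The sketch assumes that $d_i$ is the mean Euclidean distance to the other samples with the same inferred label, and that the confidence is the normalized value $r_i = z_i / z_{\max}$; the normalization is an assumption suggested by the definitions of $z_i$ and $z_{\max}$, not a formula quoted from the text. Giving confidence 0 to a sample with no same-class neighbour is also a choice made only for this illustration.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def confidences(features, labels):
    """Confidence per sample from mean distance to same-label samples."""
    n = len(features)
    d = []
    for i in range(n):
        same = [
            euclidean(features[i], features[j])
            for j in range(n)
            if j != i and labels[j] == labels[i]
        ]
        # Assumption: a sample with no same-class neighbour gets d_i = inf.
        d.append(sum(same) / len(same) if same else float("inf"))
    z = [math.exp(-d_i) for d_i in d]
    z_max = max(z)
    if z_max == 0.0:
        return [0.0] * n
    return [z_i / z_max for z_i in z]


# Three samples of one class; the third lies far from the other two.
features = [(0.0, 0.0), (0.0, 1.0), (0.0, 5.0)]
labels = [0, 0, 0]
r = confidences(features, labels)
```

Here the mean distances are 3, 2.5 and 4.5, so the middle sample receives confidence 1 and the distant sample the lowest confidence, in line with the idea that samples far from their class center are less reliable.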
The beneficial effects of the invention are:
(1) The method expands and applies the traditional direct-push type semi-supervised learning method to the Hash coding method based on deep learning, and provides the image Hash coding method based on the direct-push type semi-supervised deep learning.
(2) The method introduces confidence into the label-free training sample, thereby greatly reducing the adverse effect of the uncertain sample on the training process and enabling the convergence process of the network model to be more stable.
(3) The method provides a hash code learning function, so that the hash codes of similar samples have smaller Hamming distance.
(4) The method of the invention provides a Min-Max characteristic regular term with confidence coefficient, so that in a characteristic space: the distance between similar samples is as small as possible and the distance between non-similar samples is as large as possible.
(5) The image hash coding method based on the direct-push semi-supervised deep learning provided by the invention does not depend on a network structure, and can be applied to any deep convolutional neural network. Meanwhile, the method can obviously reduce the labeling cost of the data.
Drawings
Fig. 1 is a deep convolutional network model of an image hash coding method based on direct-push semi-supervised deep learning according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides an image hash coding method based on direct-push type semi-supervised deep learning, which comprises the following steps as shown in figure 1:
1) Preparing an image data set, and dividing the data set into a training sample set and a testing sample set;
2) Further dividing the training set into a labeled training sample set and an unlabeled training sample set;
the specific implementation method of the step 2) is as follows:
given a training sample set $\mathcal{X} = \mathcal{X}_L \cup \mathcal{X}_U$, where $\mathcal{X}_L$ and $\mathcal{X}_U$ denote the labeled training sample set and the unlabeled training sample set respectively, L denotes the number of labeled training samples, U denotes the number of unlabeled training samples, L is generally much smaller than U, and $X_i$ denotes the ith training sample image. If the image $X_i \in \mathcal{X}_L$, $y_i$ denotes the class label vector of $X_i$: if image $X_i$ contains the label of the jth category, $y_{ij} = 1$, otherwise $y_{ij} = 0$. C denotes the number of category labels of the data set. Image $X_i$ may contain several category labels, i.e., $y_i$ may have several components equal to 1. Let N = L + U denote the total number of training sample images.
The purpose of the deep image hash coding method is to learn, based on a deep convolutional neural network, a nonlinear hash mapping $h$ from the image space to the Hamming space $\{-1, 1\}^K$. This mapping maps an image X to a K-bit hash code b = h(X) such that the similarity between image pairs is preserved in the Hamming space, where K is the number of bits of the image hash code. For convenience of description, the hash code of image $X_i$ is denoted $b_i = h(X_i)$.
3) Building a deep convolutional neural network model of an image Hash coding method based on direct-push type semi-supervised deep learning;
the specific implementation method of the step 3) is as follows:
given a deep convolutional neural network model, its last layer is replaced with two new fully-connected layers, used for image hash coding and image classification respectively, and called the hash coding layer hcl and the image classification layer cls; in the new network, the hash coding layer hcl comes first and the image classification layer cls follows it, i.e., the image classification layer cls is the last layer of the network; the number of neurons in the hash coding layer hcl equals the number of bits of the hash code, and the number of neurons in the image classification layer cls equals the number of class labels of the image data set.
For a more intuitive description, the network model of the image hash coding method based on direct-push semi-supervised deep learning is illustrated below with an example deep convolutional neural network. Referring to fig. 1, the given network consists of several convolutional layers, a feature layer and a classification layer, where the feature layer is the penultimate layer of the network and the classification layer is the last layer. As described above, the classification layer is replaced with two new fully-connected layers, namely the hash coding layer hcl and the classification layer cls, and the rest of the network structure is unchanged. The hash coding layer hcl and the classification layer cls have K and C neurons respectively. During network training, the hash coding learning function is applied to the hash coding layer and the classification loss function is applied to the classification layer. Further details of the method are described below.
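The two replacement layers can be illustrated with a small forward pass that starts from a feature-layer output. This is a sketch under assumptions: the layer sizes, the random weights, the absence of an activation between hcl and cls, and the sign-based binarization into $\{-1, 1\}^K$ are illustrative choices, not details stated in the text.

```python
import random


def make_layer(n_out, n_in, rng):
    """Fully-connected layer as (weights, bias) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def forward(feature, hcl, cls):
    """feature -> hash coding layer (K units) -> classification layer (C units)."""
    h = linear(*hcl, feature)                  # real-valued hash layer output
    logits = linear(*cls, h)                   # classification layer output
    code = [1 if v >= 0 else -1 for v in h]    # assumed sign binarization
    return h, logits, code


rng = random.Random(0)
D, K, C = 8, 4, 3                              # hypothetical sizes
hcl = make_layer(K, D, rng)
cls = make_layer(C, K, rng)
feature = [rng.uniform(-1.0, 1.0) for _ in range(D)]
h, logits, code = forward(feature, hcl, cls)
```

Because cls takes the K-dimensional hash layer output as its input, the classification loss also shapes the hash codes, while the rest of the backbone is left unchanged.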
4) Setting the confidence degrees of all the labeled samples to be 1, setting the confidence degrees of all the unlabeled samples to be 0, randomly initializing the class label vectors of the unlabeled samples, and then training the network model built in the step 3) on the whole training set until the network model is converged;
the specific implementation method of the step 4) is as follows:
401) Constructing a classification loss function;
the specific implementation method of step 401) is as follows:
the method provided by the invention can handle both single-label and multi-label image data sets. The classification loss function is applied at the classification layer. Different classification loss functions are used for single-label and multi-label image data sets:
for a single-label image dataset: a softmax activation function is applied at the classification layer, and the classification loss function is:
where $\mathcal{X}$ denotes the training sample image set; $\hat{y}_i$ denotes the class label vector inferred for sample image $X_i$ by the network model learned in the previous training round, which is kept unchanged during the current round; the jth component $\hat{y}_{ij}$ of $\hat{y}_i$ denotes the jth class label inferred for sample $X_i$. $v_i$ denotes the confidence of sample image $X_i$, i.e., the degree of certainty of the class label vector $\hat{y}_i$ inferred for $X_i$. $\theta$ denotes the set of network model layer parameters, and $f(X_i; \theta)$ denotes the output vector at the final classification layer when sample image $X_i$ is input to the network model currently being trained. $I(\mathrm{cond})$ denotes the indicator function, whose value is 1 if the condition cond is true and 0 otherwise;
for a multi-label image dataset: no activation function is applied at the classification layer, and the classification loss function is:
where $\hat{y}_i$ denotes the class label vector inferred for sample image $X_i$ by the network model learned in the previous training round, which is kept unchanged during the current round; the jth component $\hat{y}_{ij}$ of $\hat{y}_i$ denotes the jth class label inferred for sample image $X_i$; $f(X_i; \theta)$ denotes the output vector at the final classification layer when sample image $X_i$ is input to the network model currently being trained.
402) Constructing a hash coding learning function;
the specific implementation method of step 402) is as follows:
the similarity between images $X_i$ and $X_j$ is defined as $\omega_{ij}$: if $X_i$ and $X_j$ are semantically similar (i.e., they share at least one class label), $\omega_{ij} = 1$; otherwise $\omega_{ij} = 0$; assume the hash codes corresponding to the N training sample images are $B = [b_1, \ldots, b_N]$, i.e., $b_i$ is the hash code of image $X_i$; the likelihood function of the similarities $\Omega = \{\omega_{ij}\}$ among the N training samples is:
where
$\prod$ is the product symbol, $v_{ij} = v_i \cdot v_j$ denotes the confidence of the similarity $\omega_{ij}$ between samples, $b_i^{\top}$ denotes the transpose of the hash code $b_i$, and $\exp(\cdot)$ denotes the exponential function;
based on the above description, the hash coding learning function proposed by the present invention is:
403) Constructing a Min-Max feature regularization term with confidence;
the specific implementation method of step 403) is as follows:
in order to learn better image hash codes, the invention provides a Min-Max feature regularization term with confidence, which is applied to the feature layer during network training. The regularization term explicitly requires the features learned by the network to have the following property: if two images share at least one class label, the distance between their feature vectors should be as small as possible; otherwise the distance between their feature vectors should be as large as possible. The Min-Max feature regularization term with confidence provided by the invention is:
where $v_{ij} = v_i \cdot v_j$, and $x_i$ denotes the feature vector of sample image $X_i$ (the output of the network feature layer).
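The property described above (pull similar pairs together, push dissimilar pairs apart, each pair weighted by $v_{ij}$) can be illustrated with a contrastive-style stand-in. This is explicitly not the patent's Min-Max formula, which is not reproduced in the text: the squared-distance and hinge form and the `margin` parameter are assumptions chosen to keep the example bounded.

```python
import math


def min_max_regularizer(features, omega, conf, margin=1.0):
    """Confidence-weighted contrastive-style penalty over unordered pairs."""
    total = 0.0
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.sqrt(
                sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            )
            v_ij = conf[i] * conf[j]
            if omega[i][j] == 1:
                total += v_ij * dist ** 2                     # minimize distance
            else:
                total += v_ij * max(0.0, margin - dist) ** 2  # maximize up to margin
    return total


# Samples 0 and 1 are similar; sample 2 is dissimilar and already far away.
features = [(0.0, 0.0), (0.0, 0.5), (3.0, 0.0)]
omega = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
conf = [1.0, 1.0, 0.5]
penalty = min_max_regularizer(features, omega, conf)
```

In this example only the similar pair contributes (0.5 squared), because both dissimilar pairs are already farther apart than the margin; pairs involving low-confidence samples are down-weighted through $v_{ij}$.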
404) Combining the classification loss function, the hash coding learning function and the Min-Max feature regularization term with confidence to construct a total objective function;
in the total objective function, the three terms on the right-hand side are the classification loss function, the hash coding learning function and the Min-Max feature regularization term with confidence; the classification loss function is applied to the classification layer, the hash coding learning function is applied to the hash coding layer, and the Min-Max feature regularization term with confidence is applied to the feature layer; $\lambda$ is a hyper-parameter that adjusts the balance among the three terms; $\mathcal{X}$ denotes the training sample image set; $\hat{y}_i$ denotes the class label vector inferred for sample image $X_i$; $\theta$ denotes the set of network model layer parameters; $v_i$ denotes the confidence of sample image $X_i$, i.e., the degree of certainty of the class label vector $\hat{y}_i$ inferred for $X_i$; if $X_i \in \mathcal{X}_L$, then throughout the training process $\hat{y}_i$ is always equal to $y_i$, i.e., $\hat{y}_i = y_i$; if $X_i \in \mathcal{X}_U$, $\hat{y}_i$ denotes the class label vector inferred for sample image $X_i$ by the network model learned in the previous round.
405) Training the deep convolutional neural network model with mini-batch stochastic gradient descent based on the total objective function.
5) Inferring the class label vectors of all training samples based on the layer parameters of the currently learned network model;
the specific implementation method of the step 5) is as follows:
fix the network parameters $\theta$ and update the label vectors $\hat{y}_i$; that is, based on the network model learned in the current round, infer the corresponding class label vectors for all training samples. In fact, class label vectors only need to be inferred for the unlabeled training sample images.
For a single-label image dataset: the classification loss function on the unlabeled sample set is:
because the classification losses of different samples are independent of each other and $v_i$ is a non-negative constant, the above formula can be rewritten as U mutually independent sub-optimization problems:
the optimal solution of each sub-optimization problem is obtained in closed form, and the inferred class label vector $\hat{y}_i$ follows from this optimal solution.
For a multi-label image dataset: the classification loss function on the unlabeled sample set is:
because the classification losses of different samples are independent of each other and $v_i$ is a non-negative constant, the above formula can be rewritten as U mutually independent sub-optimization problems:
the optimal solution of each sub-optimization problem is obtained in closed form, and the inferred class label vector $\hat{y}_i$ follows from this optimal solution.
6) Calculating a confidence corresponding to each training sample based on the layer parameters of the currently learned network model and the inferred class label vector of the training sample;
the specific implementation method of the step 6) is as follows:
If the image X_i belongs to the labeled training sample set, its confidence is fixed at v_i = 1 throughout the training process. If it belongs to the unlabeled training sample set, the present invention calculates its confidence r_i based on two intuitive assumptions: (1) in the feature space, if the average distance from a sample to the other samples of the same class (single-label case) or to the other similar samples (multi-label case) is smaller, the sample is closer to the center of its class and is given a higher confidence; (2) in the feature space, if that average distance is larger, the sample is farther from the center of its class and is given a lower confidence. The feature vector of a sample image X_i (the output of the network feature layer) is denoted x_i.
For a single-label image dataset, the average distance d_i in feature space from a sample image X_i to the other samples of the same class can be expressed as:
For a multi-label image dataset, the average distance d_i in feature space from a sample image X_i to its similar samples can be expressed as:
where ||·||_2 denotes the Euclidean norm (length) of a vector, and ω_ij indicates whether images X_i and X_j are similar: if X_i and X_j share at least one class label, then ω_ij = 1; otherwise ω_ij = 0.
Thus, based on the above analysis, for both single-label and multi-label image datasets, the smaller d_i is, the more likely X_i is to have been assigned the correct label vector, and the higher the confidence it should receive. The confidence r_i of image X_i proposed by the present invention is therefore calculated as:
where z_i = exp(-d_i), z_max = max{z_1, z_2, …, z_N}, and exp(·) denotes the exponential function.
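A rough sketch of the confidence computation follows. The formulas for d_i and r_i are not reproduced in this text, so the sketch makes two labeled assumptions: d_i is the mean Euclidean distance from x_i to every other sample that shares at least one label with it, and r_i = z_i / z_max. The handling of a sample with no similar peers (infinite distance, hence zero confidence) is also a choice made here, and all function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def mean_similar_distance(features, labels, i):
    """Assumed d_i: mean distance from sample i to samples sharing a label."""
    dists = [
        euclidean(features[i], features[j])
        for j in range(len(features))
        if j != i and any(a and b for a, b in zip(labels[i], labels[j]))
    ]
    # Assumption: a sample with no similar peers gets infinite distance.
    return sum(dists) / len(dists) if dists else float("inf")

def confidences(features, labels, is_labeled):
    """Confidence per sample: 1 for labeled samples, assumed z_i / z_max otherwise."""
    d = [mean_similar_distance(features, labels, i) for i in range(len(features))]
    z = [math.exp(-di) for di in d]
    z_max = max(z)
    return [1.0 if is_labeled[i] else z[i] / z_max for i in range(len(z))]
```

Dividing by z_max keeps every confidence in the interval [0, 1], with the sample closest to its peers receiving confidence 1.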
7) Training the network model built in step 3) from random initialization on the whole training sample set, using the current class label vectors and confidences of all training samples, until the network model converges; the total objective function is used, and the deep convolutional neural network model is trained with mini-batch stochastic gradient descent.
8) Repeating steps 5), 6) and 7) until the number of training rounds reaches the preset maximum; based on experiments, the maximum number of training rounds is generally set to 4.
9) Computing the hash codes of the images in the test sample set using the trained network model.
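The alternation between steps 4) to 8) can be outlined as a small driver loop. This is only a skeleton: `train`, `infer_labels` and `compute_confidence` are hypothetical callables standing in for network training, label inference and confidence calculation. The text does not state whether the initial training of step 4) counts toward the maximum number of rounds, so the sketch assumes it does not.

```python
def run_rounds(train, infer_labels, compute_confidence,
               labeled_labels, initial_unlabeled_labels, max_rounds=4):
    """Alternate between training and label/confidence updates.

    train(labels, conf) -> model, trained from random initialization.
    infer_labels(model) -> label vectors for all samples (labeled first).
    compute_confidence(model, labels) -> confidences for all samples.
    """
    num_labeled = len(labeled_labels)
    labels = list(labeled_labels) + list(initial_unlabeled_labels)
    # Step 4: labeled samples have confidence 1, unlabeled samples 0.
    conf = [1.0] * num_labeled + [0.0] * len(initial_unlabeled_labels)
    model = train(labels, conf)

    for _ in range(max_rounds):
        # Step 5: labeled samples keep their given labels.
        inferred = infer_labels(model)
        labels = list(labeled_labels) + list(inferred[num_labeled:])
        # Step 6: labeled samples keep confidence 1.
        r = compute_confidence(model, labels)
        conf = [1.0] * num_labeled + list(r[num_labeled:])
        # Step 7: retrain with the updated labels and confidences.
        model = train(labels, conf)
    return model, labels, conf
```

Setting the initial confidence of unlabeled samples to 0 means their random labels do not influence the first training pass.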
The experimental results are as follows:
Experiments were performed on three commonly used image datasets: CIFAR10, NUS-WIDE and MIR-Flickr25K.
The CIFAR10 dataset is a single label image dataset, with 60000 images in total, 10 classes. According to the common semi-supervised learning setting of the data set, 1000 images (100 images are randomly selected for each class) are used as a query set, and the rest 59000 images are used as a retrieval database; and taking all the images in the retrieval database as a training sample set, wherein 5000 images are taken as labeled training sample sets, and the rest 54000 images are taken as unlabeled training sample sets.
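The CIFAR10 partition described above can be sketched as an index split. The text does not say whether the 5000 labeled images are balanced across classes, so this sketch assumes a plain random draw from the retrieval database; the function name, its parameters and the seed are hypothetical.

```python
import random

def split_single_label(labels, query_per_class=100, num_labeled=5000, seed=0):
    """Return (query, labeled, unlabeled) index lists for a single-label dataset."""
    rng = random.Random(seed)
    by_class = {}
    for idx, c in enumerate(labels):
        by_class.setdefault(c, []).append(idx)

    # Query set: a fixed number of randomly chosen images per class.
    query = []
    for c in sorted(by_class):
        query.extend(rng.sample(by_class[c], query_per_class))
    query_set = set(query)

    # Everything else forms the retrieval database and the training set.
    database = [i for i in range(len(labels)) if i not in query_set]

    # Assumption: labeled samples are drawn uniformly from the database.
    labeled = rng.sample(database, num_labeled)
    labeled_set = set(labeled)
    unlabeled = [i for i in database if i not in labeled_set]
    return query, labeled, unlabeled
```

With 60000 labels over 10 classes and the default arguments, the three lists have 1000, 5000 and 54000 entries, matching the sizes stated above.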
NUS-WIDE is a multi-label image dataset, with a total of about 270000 images. According to the common semi-supervised learning setting of the data set, 2100 images are used as a query set, 10500 images are used as a labeled training sample set, 149733 images are used as a label-free training sample set, and the images of the whole training set are used as a retrieval database.
The MIR-Flickr25K dataset is a multi-label image dataset, with a total of 25000 images. According to the common semi-supervised learning setting of the data set, 1000 images serve as a query set, 5000 images serve as a labeled training sample set, 19000 images serve as an unlabeled training sample set, and the images of the whole training set serve as a retrieval database.
Representative semi-supervised hash coding methods currently include DSH-GANs, SSDH and BGDH. For a fair comparison, the experiments with the present invention use the same deep convolutional neural network model as these methods.
In experiments on these three data sets, the parameter λ in the overall objective function of the present invention was set to 0.01. The maximum number of rounds of training is set to 4.
The experimental test method is as follows: after the network model is trained, the hash codes of the images in the query set and in the retrieval database of each dataset are computed, and retrieval performance is then evaluated from these hash codes. The evaluation index is the MAP (mean average precision) score commonly used in academia; a larger MAP value indicates a better method.
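For reference, a MAP score over hash codes is commonly computed by ranking the database by Hamming distance to each query. The sketch below follows that common practice and is not taken from the patent: it assumes a database image is relevant when it shares at least one label with the query, ranks the full database (no top-k cut-off), and breaks distance ties by database order. Function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length hash codes differ."""
    return sum(x != y for x, y in zip(a, b))

def average_precision(query_code, query_label, db_codes, db_labels):
    """Average precision of one query under Hamming ranking."""
    order = sorted(range(len(db_codes)),
                   key=lambda j: hamming(query_code, db_codes[j]))
    hits, precisions = 0, []
    for rank, j in enumerate(order, start=1):
        # Relevant if query and database image share at least one label.
        if any(a and b for a, b in zip(query_label, db_labels[j])):
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(query_codes, query_labels, db_codes, db_labels):
    """Mean of the per-query average precisions."""
    aps = [average_precision(c, l, db_codes, db_labels)
           for c, l in zip(query_codes, query_labels)]
    return sum(aps) / len(aps)
```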
The results of the different methods on the corresponding datasets are given in Tables 1, 2 and 3, respectively. The experimental results show that the method of the invention outperforms the other compared methods in the tables.
In addition, ablation experiments were performed on the three datasets to verify the effect of the confidence; the results are shown in Tables 4, 5 and 6. They confirm that introducing confidence for the unlabeled training samples greatly reduces the adverse effect of uncertain samples on the training process and makes the convergence of the network model more stable.
TABLE 1 MAP scores of different methods on the CIFAR10 dataset
TABLE 2 MAP scores of different methods on the NUS-WIDE dataset
TABLE 3 MAP scores of different methods on the MIR-Flickr25K dataset
TABLE 4 Effect of whether confidence is used on MAP scores on the CIFAR10 dataset
TABLE 5 Effect of whether confidence is used on MAP scores on the NUS-WIDE dataset
TABLE 6 Effect of whether confidence is used on MAP scores on the MIR-Flickr25K dataset
Claims (6)
1. The image hash coding method based on the direct-push semi-supervised deep learning is characterized by comprising the following steps of:
1) Preparing an image data set, and dividing the data set into a training sample set and a testing sample set;
2) The training set is further divided into a marked training sample set and a non-marked training sample set;
the specific implementation method of the step 2) is as follows:
given a training sample set consisting of a labeled training sample set and an unlabeled training sample set, where L denotes the number of labeled training samples, U denotes the number of unlabeled training samples, L is much smaller than U, and X_i denotes the i-th training sample image; if the image X_i is a labeled sample, y_i denotes its class label vector: if image X_i contains the label of the j-th category, the j-th component of y_i is 1, and otherwise it is 0; C denotes the number of category labels of the dataset; image X_i may contain several category labels, i.e. y_i may have several components equal to 1; N = L + U denotes the total number of training sample images;
3) Building a deep convolutional neural network model of an image Hash coding method based on direct-push type semi-supervised deep learning;
the specific implementation method of the step 3) is as follows:
given a deep convolutional neural network model, its last layer is replaced by two new fully connected layers, one used for image hash coding and one for image classification, called the hash coding layer hcl and the image classification layer cls, respectively; in the newly constructed network, the hash coding layer hcl comes first and the image classification layer cls comes after it, i.e. the image classification layer cls is the last layer of the network; the number of neurons in the hash coding layer hcl equals the number of bits of the hash code, and the number of neurons in the image classification layer cls equals the number of class labels of the image dataset;
4) Setting the confidence degrees of all the labeled samples to be 1, setting the confidence degrees of all the unlabeled samples to be 0, randomly initializing the class label vectors of the unlabeled samples, and then training the network model built in the step 3) on the whole training set until the network model is converged;
the specific implementation method of the step 4) is as follows:
401) Constructing a classification loss function;
the specific implementation method of step 401) is as follows:
for a single-label image dataset and a multi-label image dataset, different classification loss functions are respectively adopted:
for a single-label image dataset: a softmax activation function is applied at the classification layer, and the classification loss function is:
where the symbols denote, in order: the training sample image set; the class label vector inferred for sample image X_i by the network model learned in the previous training round, which is kept unchanged during the current round of training and whose j-th component is the j-th category label inferred for sample X_i; the confidence v_i of sample image X_i, i.e. the degree of certainty of the class label vector inferred for X_i; the set of parameters of the network model layers; and the output vector at the final classification layer when sample image X_i is input to the currently trained network model; I(cond) denotes an indicator function, whose value is 1 if the condition cond is true and 0 otherwise;
for a multi-label image dataset: no activation function is applied at the classification layer, and the classification loss function used is:
where the symbols denote, in order: the class label vector inferred for sample image X_i by the network model learned in the previous training round, which is kept unchanged during the current round of training and whose j-th component is the j-th category label inferred for sample image X_i; and the output vector at the final classification layer when sample image X_i is input to the currently trained network model;
402) Constructing a hash coding learning function;
403) Constructing a Min-Max feature regularization term with confidence;
404) Combining the classification loss function, the hash coding learning function and the Min-Max feature regularization term with confidence to construct a total objective function;
405) Based on the total objective function, training the deep convolutional neural network model by mini-batch stochastic gradient descent;
5) Deducing corresponding class label vectors for all training samples based on the layer parameters of the currently learned network model;
6) Calculating a confidence corresponding to each training sample based on the layer parameters of the currently learned network model and the inferred class label vector of the training sample;
7) Training the network model set up in the step 3) from random initialization by using the whole training sample set based on the class label vectors and the confidence degrees of all the current training samples until the network model converges;
8) Repeatedly executing the steps 5), 6) and 7) until the current round training number reaches the preset maximum round training number;
9) And calculating the Hash codes of the images in the test sample set by using the trained network model.
2. The image hash coding method based on the direct-push semi-supervised deep learning of claim 1, wherein the specific implementation method of step 402) is as follows:
the similarity between images X_i and X_j is defined as ω_ij: if X_i and X_j are semantically similar, i.e. share at least one class label, ω_ij = 1; otherwise ω_ij = 0; let the hash codes corresponding to the N training sample images be B = [b_1, …, b_N], i.e. b_i is the hash code of image X_i; the likelihood function of the similarity Ω = {ω_ij} between the N training samples is:
where Π is the product symbol, v_ij = v_i · v_j denotes the confidence of the similarity ω_ij between samples, the remaining term is expressed through the hash code b_i, and exp(·) denotes the exponential function;
based on the above description, the proposed hash coding learning function is:
3. The image hash coding method based on the direct-push semi-supervised deep learning of claim 1, wherein the specific implementation method of step 403) is as follows:
the Min-Max feature regularization term with confidence is constructed as:
where v_ij = v_i · v_j, and x_i denotes the feature vector of sample image X_i.
4. The image hash coding method based on direct-push semi-supervised deep learning of claim 1, wherein the total objective function constructed in step 404) is:
in the above formula, the three terms on the right side are, respectively, the classification loss function, the hash coding learning function and the Min-Max feature regularization term with confidence; the classification loss function is applied to the classification layer, the hash coding learning function to the hash coding layer, and the Min-Max feature regularization term with confidence to the feature layer; λ is a hyper-parameter that balances the three terms; the remaining symbols denote the training sample image set, the class label vector inferred for sample image X_i, and the set of parameters of the network model layers; v_i denotes the confidence of sample image X_i, i.e. the degree of certainty of the class label vector inferred for X_i; if X_i is a labeled sample, its class label vector equals y_i throughout the entire training process; if X_i is an unlabeled sample, its class label vector is the one inferred for X_i by the network model learned in the previous round.
5. The image hash coding method based on the direct-push semi-supervised deep learning of claim 1, wherein the specific implementation method of step 5) is as follows:
with the network parameters fixed, the class label vectors are updated; that is, based on the network model learned in the current round, the corresponding class label vectors are inferred for all training samples:
for a single-label image dataset: the classification loss function for the unlabeled sample set is:
since the classification losses of different samples are independent of each other, and v_i is a non-negative constant, the above formula can be rewritten as U mutually independent sub-optimization problems:
solving each sub-optimization problem gives its optimal solution, from which the inferred class label vector of the corresponding sample is obtained;
for a multi-label image dataset: the classification loss function for the unlabeled sample set is:
since the classification losses of different samples are independent of each other, and v_i is a non-negative constant, the above formula can be rewritten as U mutually independent sub-optimization problems:
6. The image hash coding method based on direct-push semi-supervised deep learning according to claim 5, wherein the specific implementation method of step 6) is as follows:
for a single-label image dataset, the average distance d_i in feature space from a sample image X_i to the other samples of the same class can be expressed as:
for a multi-label image dataset, the average distance d_i in feature space from a sample image X_i to its similar samples can be expressed as:
where ||·||_2 denotes the Euclidean norm (length) of a vector, and ω_ij indicates whether images X_i and X_j are similar: if X_i and X_j share at least one class label, then ω_ij = 1; otherwise ω_ij = 0;
thus, the confidence r_i of image X_i is calculated as:
where z_i = exp(-d_i), z_max = max{z_1, z_2, …, z_N}, and exp(·) denotes the exponential function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111427674.6A CN114170333B (en) | 2021-11-24 | 2021-11-24 | Image hash coding method based on direct-push type semi-supervised deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114170333A CN114170333A (en) | 2022-03-11 |
CN114170333B true CN114170333B (en) | 2023-02-03 |
Family
ID=80481230
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111427674.6A Active CN114170333B (en) | 2021-11-24 | 2021-11-24 | Image hash coding method based on direct-push type semi-supervised deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114170333B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114379416B (en) * | 2022-03-23 | 2022-06-17 | 蔚来汽车科技(安徽)有限公司 | Method and system for controlling battery replacement operation based on vehicle chassis detection |
CN115294396B (en) * | 2022-08-12 | 2024-04-23 | 北京百度网讯科技有限公司 | Backbone network training method and image classification method |
CN115905926A (en) * | 2022-12-09 | 2023-04-04 | 华中科技大学 | Code classification deep learning model interpretation method and system based on sample difference |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165306A (en) * | 2018-08-09 | 2019-01-08 | 长沙理工大学 | Image search method based on the study of multitask Hash |
CN109783682A (en) * | 2019-01-19 | 2019-05-21 | 北京工业大学 | It is a kind of based on putting non-to the depth of similarity loose hashing image search method |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034205B (en) * | 2018-06-29 | 2021-02-02 | 西安交通大学 | Image classification method based on direct-push type semi-supervised deep learning |
CN109918528A (en) * | 2019-01-14 | 2019-06-21 | 北京工商大学 | A kind of compact Hash code learning method based on semanteme protection |
CN109960737B (en) * | 2019-03-15 | 2020-12-08 | 西安电子科技大学 | Remote sensing image content retrieval method for semi-supervised depth confrontation self-coding Hash learning |
CN112861976B (en) * | 2021-02-11 | 2024-01-12 | 温州大学 | Sensitive image identification method based on twin graph convolution hash network |
Also Published As
Publication number | Publication date |
---|---|
CN114170333A (en) | 2022-03-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||