CN114170333A - Image hash coding method based on transductive semi-supervised deep learning - Google Patents
- Publication number
- CN114170333A (application CN202111427674.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- training
- network model
- sample
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/001—Model-based coding, e.g. wire frame
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Image Processing (AREA)
Abstract
The invention discloses an image hash coding method based on transductive semi-supervised deep learning, which comprises the following steps: divide the data set into a training sample set and a test sample set; further divide the training set into a labeled training sample set and an unlabeled training sample set; build a deep convolutional neural network model; randomly initialize the class-label vectors of the unlabeled samples, then train the network model built in step 3) on the whole training set until it converges; infer a class-label vector for every training sample; compute the confidence of each training sample; train the network model built in step 3) from random initialization on the whole training sample set until it converges; repeat steps 5)-7) until the number of training rounds reaches the preset maximum; finally, compute the hash codes of the images in the test sample set with the trained network model. The method significantly reduces the cost of labeling data.
Description
Technical Field
The invention belongs to the technical field of computer-vision image hash coding, and in particular relates to an image hash coding method based on transductive semi-supervised deep learning.
Background
With the rapid development of the internet, an urgent need has emerged: how to quickly retrieve, from a large-scale image database, images that are identical or similar to a query image, or images that meet specified requirements. Nearest-neighbor search is an obvious solution: return the images closest to the query image in feature space as the retrieval result. However, for a large-scale image database the dimensionality of the image feature vectors is usually high, so nearest-neighbor search is very time-consuming and storage-hungry. As an approximation of nearest-neighbor search, hash retrieval has the advantages of low computation cost, high storage efficiency, high search speed and high accuracy, and is currently the most popular image retrieval method. Hash retrieval performs nearest-neighbor search on the hash codes of images. In general, image hash coding maps the high-dimensional feature vector of an image to a lower-dimensional binary vector through a set of hash functions while preserving the similarity relationships between images; this binary vector is called the hash code of the image.
Hash coding methods are divided into traditional hash coding methods and deep hash coding methods. Traditional methods first extract hand-crafted features from the image, then learn a set of hash functions on the extracted features, and finally map the feature vector of each image to its hash code with the learned functions. Deep hash coding methods learn the feature vector and the hash functions of an image simultaneously from the raw image with a deep convolutional neural network; once training is complete, feeding an image to the network directly yields its hash code. Depending on whether class-label information is used during training, hash coding methods are further divided into unsupervised, supervised and semi-supervised methods.
Because deep convolutional neural networks have strong feature-learning and non-linear mapping capabilities, deep hash coding methods have significant advantages over traditional ones. However, most current deep hash coding methods are supervised, and to achieve good hash-code quality and retrieval accuracy they usually require a large number of labeled training samples. In practical applications, constructing a large-scale, high-quality labeled training data set is very time-consuming and expensive, and for some special tasks even impractical. At the same time, there are vast numbers of free images on the internet that are easily downloaded with search engines or web crawlers. A semi-supervised deep hash coding method can learn better image hash codes from a small number of labeled samples together with a large number of unlabeled samples, thereby reducing the amount of labeling required while exploiting the abundant unlabeled data.
Traditional hash coding methods based on hand-crafted features perform poorly and cannot meet practical needs. At present, most semi-supervised deep hash coding methods use a graph model to model the data distribution of the unlabeled samples; such methods have very high computational complexity, need very large memory to run, and do not scale to large image data sets.
Transductive semi-supervised learning is a semi-supervised learning method whose core idea is to treat the labels of the unlabeled training samples as variables to be learned and optimized; during training they are updated and optimized iteratively together with the model parameters until the model converges.
Traditional transductive semi-supervised learning methods have two problems. First, they require high-quality feature vectors at the initial stage of training in order to infer reasonable label vectors for the unlabeled samples. Because the feature vectors produced by a deep convolutional neural network early in training are poor, traditional transductive semi-supervised learning cannot be combined directly with the training of a deep convolutional neural network. Second, traditional transductive semi-supervised learning treats every unlabeled sample equally and cannot handle outlier or uncertain samples, which harms the convergence and stability of model training.
Disclosure of Invention
The invention aims to provide an image hash coding method based on transductive semi-supervised deep learning which is independent of the network structure, can be applied to any deep convolutional neural network, and significantly reduces the cost of labeling data.
The technical scheme adopted by the invention is an image hash coding method based on transductive semi-supervised deep learning, comprising the following steps:
1) prepare an image data set and divide it into a training sample set and a test sample set;
2) further divide the training set into a labeled training sample set and an unlabeled training sample set;
3) build the deep convolutional neural network model of the image hash coding method based on transductive semi-supervised deep learning;
4) set the confidence of every labeled sample to 1 and of every unlabeled sample to 0, randomly initialize the class-label vectors of the unlabeled samples, then train the network model built in step 3) on the whole training set until it converges;
5) infer a class-label vector for every training sample based on the parameters of the currently learned network model;
6) compute the confidence of each training sample based on the parameters of the currently learned network model and the inferred class-label vectors;
7) based on the current class-label vectors and confidences of all training samples, train the network model built in step 3) from random initialization on the whole training sample set until it converges;
8) repeat steps 5), 6) and 7) until the number of training rounds reaches the preset maximum number of rounds;
9) compute the hash codes of the images in the test sample set with the trained network model.
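The alternating procedure of steps 4)-8) can be sketched in Python. This is a minimal skeleton under assumed interfaces (`train_fn`, `infer_labels_fn` and `confidence_fn` are hypothetical placeholders for the actual network training, label inference and confidence computation), not the patented implementation itself:

```python
def transductive_train(train_fn, infer_labels_fn, confidence_fn,
                       num_labeled, num_unlabeled, max_rounds=4):
    """Alternate between network training and label/confidence updates.

    train_fn(labels, conf)       -> trained model (steps 4 and 7)
    infer_labels_fn(model)       -> class-label vectors for all samples (step 5)
    confidence_fn(model, labels) -> per-sample confidences (step 6)
    """
    n = num_labeled + num_unlabeled
    # Step 4: labeled samples get confidence 1, unlabeled samples confidence 0;
    # the class-label vectors of the unlabeled samples start out random
    # (represented here by None placeholders).
    conf = [1.0] * num_labeled + [0.0] * num_unlabeled
    labels = [None] * n
    model = train_fn(labels, conf)           # first training to convergence
    for _ in range(max_rounds):              # step 8: repeat steps 5)-7)
        labels = infer_labels_fn(model)      # step 5
        conf = confidence_fn(model, labels)  # step 6
        model = train_fn(labels, conf)       # step 7: retrain from scratch
    return model, labels, conf
```

The network is deliberately retrained from random initialization each round (step 7) rather than fine-tuned, matching the description above.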
The present invention is also characterized in that,
the specific implementation method of the step 2) is as follows:
Given a training sample set $\mathcal{X} = \mathcal{X}_L \cup \mathcal{X}_U$, where $\mathcal{X}_L$ and $\mathcal{X}_U$ denote the labeled and unlabeled training sample sets respectively, $L$ denotes the number of labeled training samples, $U$ the number of unlabeled training samples ($L$ is generally much smaller than $U$), and $X_i$ denotes the $i$th training sample image. If $X_i \in \mathcal{X}_L$, let $y_i = (y_{i1}, \dots, y_{iC}) \in \{0,1\}^C$ denote the class-label vector of $X_i$: $y_{ij} = 1$ if image $X_i$ contains the $j$th category label, otherwise $y_{ij} = 0$, where $C$ denotes the number of category labels in the data set. An image $X_i$ may contain several category labels, i.e. several components of $y_i$ may be 1. Let $N = L + U$ denote the total number of training sample images.
The specific implementation method of the step 3) is as follows:
Given a deep convolutional neural network model, replace its last layer with two new fully connected layers, used for image hash coding and image classification respectively and called the hash coding layer hcl and the image classification layer cls. In the newly constructed network the hash coding layer hcl comes first and the image classification layer cls comes after it, i.e. cls is the last layer of the network. The number of neurons in the hash coding layer hcl equals the number of bits of the hash code, and the number of neurons in the image classification layer cls equals the number of class labels in the image data set.
The specific implementation method of the step 4) is as follows:
401) constructing a classification loss function;
402) constructing a Hash coding learning function;
403) constructing a Min-Max characteristic regular term with confidence coefficient;
404) constructing a total target function by combining a classification loss function, a Hash coding learning function and a Min-Max characteristic regular term with confidence coefficient;
405) based on the total objective function, a mini-batch-based stochastic gradient descent method is used for training a deep convolutional neural network model.
The specific implementation method of step 401) is as follows:
Different classification loss functions are adopted for single-label and multi-label image data sets.
For a single-label image data set, a softmax activation function is applied at the classification layer, and the classification loss function is:

$$\mathcal{L}_{cls}(\theta) = -\sum_{X_i \in \mathcal{X}} v_i \sum_{j=1}^{C} I(\hat{y}_{ij}=1) \log \frac{\exp(z_{ij})}{\sum_{k=1}^{C} \exp(z_{ik})}$$

where $\mathcal{X}$ denotes the set of training sample images; $\hat{y}_i$ denotes the class-label vector inferred for sample image $X_i$ by the network model learned in the previous round, kept fixed during the current round of training, and its $j$th component $\hat{y}_{ij}$ is the $j$th category label inferred for $X_i$; $v_i$ denotes the confidence of sample image $X_i$, i.e. the degree of certainty of the inferred class-label vector $\hat{y}_i$; $\theta$ denotes the set of network model parameters; $z_i = (z_{i1}, \dots, z_{iC})$ denotes the output vector of the last (classification) layer when $X_i$ is fed into the currently trained network; and $I(\mathrm{cond})$ is the indicator function, equal to 1 if the condition cond is true and 0 otherwise.
For a multi-label image data set, no activation function is applied at the classification layer, and the classification loss function used is:

$$\mathcal{L}_{cls}(\theta) = \sum_{X_i \in \mathcal{X}} v_i \, \| \hat{y}_i - z_i \|_2^2$$

where $\hat{y}_i$ denotes the class-label vector inferred for sample image $X_i$ by the network model learned in the previous round, kept fixed during the current round of training, with $j$th component $\hat{y}_{ij}$ the $j$th category label inferred for $X_i$; and $z_i$ denotes the output vector of the last (classification) layer when $X_i$ is fed into the currently trained network.
The specific implementation method of step 402) is as follows:
Define the similarity between images $X_i$ and $X_j$ as $\omega_{ij}$: if $X_i$ and $X_j$ are semantically similar, i.e. share at least one common class label, then $\omega_{ij} = 1$; otherwise $\omega_{ij} = 0$. Assume the hash codes of the $N$ training sample images are $B = [b_1, \dots, b_N]$, i.e. $b_i$ is the hash code of image $X_i$. The likelihood function of the similarities $\Omega = \{\omega_{ij}\}$ between the $N$ training samples is:

$$p(\Omega \mid B) = \prod_{\omega_{ij} \in \Omega} p(\omega_{ij} \mid B)^{\,v_{ij}}, \qquad p(\omega_{ij} \mid B) = \sigma(\Theta_{ij})^{\,\omega_{ij}} \, \left(1-\sigma(\Theta_{ij})\right)^{\,1-\omega_{ij}}$$

where $\prod$ is the successive-multiplication (product) symbol, $v_{ij} = v_i \cdot v_j$ denotes the confidence of the similarity $\omega_{ij}$, $\Theta_{ij} = \tfrac{1}{2} b_i^{\top} b_j$ is half the inner product of the hash codes $b_i$ and $b_j$, $\sigma(x) = 1/(1+\exp(-x))$, and $\exp(\cdot)$ denotes the exponential function.
Taking the negative log-likelihood, the proposed hash-code learning function is:

$$\mathcal{L}_{hash}(\theta) = -\sum_{\omega_{ij} \in \Omega} v_{ij} \left( \omega_{ij} \Theta_{ij} - \log\left(1 + \exp(\Theta_{ij})\right) \right)$$
the specific implementation method of step 403) is as follows:
The Min-Max feature regularization term with confidence $\mathcal{R}$ is constructed as:

$$\mathcal{R}(\theta) = \sum_{i,j} v_{ij} \left[ \omega_{ij} \, \|x_i - x_j\|_2^2 + (1-\omega_{ij}) \, \max\!\left(0,\; m - \|x_i - x_j\|_2\right)^2 \right]$$

where $v_{ij} = v_i \cdot v_j$, $x_i$ denotes the feature vector of sample image $X_i$, and $m$ is a margin hyperparameter.
The total objective function is:

$$\min_{\theta} \; \mathcal{L}(\theta) = \mathcal{L}_{cls}(\theta) + \mathcal{L}_{hash}(\theta) + \lambda \, \mathcal{R}(\theta)$$

The three terms on the right are the classification loss function, the hash-code learning function, and the Min-Max feature regularization term with confidence; the classification loss is applied to the classification layer, the hash-code learning function to the hash coding layer, and the Min-Max regularization term to the feature layer. $\lambda$ is a hyperparameter that balances the three terms on the right. $\mathcal{X}$ denotes the set of training sample images, $\hat{y}_i$ the class-label vector inferred for sample image $X_i$, $\theta$ the set of network model parameters, and $v_i$ the confidence of sample image $X_i$, i.e. the degree of certainty of the inferred class-label vector $\hat{y}_i$. If $X_i \in \mathcal{X}_L$, then $\hat{y}_i = y_i$ throughout the entire training process; if $X_i \in \mathcal{X}_U$, $\hat{y}_i$ is the class-label vector inferred for $X_i$ by the network model learned in the previous round.
The specific implementation method of the step 5) is as follows:
Fix $\theta$ and update $\{\hat{y}_i\}$; that is, based on the network model learned in the current round, infer class-label vectors for all training samples.
For a single-label image data set, the classification loss over the unlabeled sample set is:

$$\min_{\{\hat{y}_i\}} \; -\sum_{X_i \in \mathcal{X}_U} v_i \sum_{j=1}^{C} I(\hat{y}_{ij}=1) \log \frac{\exp(z_{ij})}{\sum_{k=1}^{C} \exp(z_{ik})}$$

Because the classification losses of different samples are independent of one another and the $v_i$ are non-negative constants, the problem decomposes into $U$ mutually independent sub-problems, one per unlabeled sample. The optimal solution of each sub-problem is $\hat{y}_{ij} = I\!\left(j = \arg\max_k z_{ik}\right)$, which yields $\hat{y}_i$.
For a multi-label image data set, the classification loss over the unlabeled sample set is:

$$\min_{\{\hat{y}_i\}} \; \sum_{X_i \in \mathcal{X}_U} v_i \, \|\hat{y}_i - z_i\|_2^2$$

which likewise decomposes into $U$ mutually independent sub-problems $\min_{\hat{y}_i \in \{0,1\}^C} \|\hat{y}_i - z_i\|_2^2$, whose optimal solution is $\hat{y}_{ij} = I(z_{ij} \ge 0.5)$, yielding $\hat{y}_i$.
The specific implementation method of the step 6) is as follows:
For a single-label image data set, the average distance in feature space from a sample image $X_i$ to the other samples of the same class can be expressed as:

$$d_i = \frac{\sum_{j \ne i} \omega_{ij} \, \|x_i - x_j\|_2}{\sum_{j \ne i} \omega_{ij}}$$

For a multi-label image data set, the average distance in feature space from a sample image $X_i$ to its similar samples is defined by the same expression, where $\|\cdot\|_2$ denotes the norm (length) of a vector and $\omega_{ij}$ indicates whether images $X_i$ and $X_j$ are similar: $\omega_{ij} = 1$ if $X_i$ and $X_j$ share at least one common class label, otherwise $\omega_{ij} = 0$.
Thus the confidence $r_i$ of image $X_i$ is computed as:

$$r_i = \frac{z_i}{z_{\max}}, \qquad z_i = \exp(-d_i), \quad z_{\max} = \max\{z_1, z_2, \dots, z_N\}$$

where $\exp(\cdot)$ denotes the exponential function.
The invention has the beneficial effects that:
(1) The method extends the traditional transductive semi-supervised learning approach to hash coding based on deep learning, yielding an image hash coding method based on transductive semi-supervised deep learning.
(2) The method introduces a confidence for each unlabeled training sample, greatly reducing the adverse effect of uncertain samples on the training process and making the convergence of the network model more stable.
(3) The method proposes a hash-code learning function that gives the hash codes of similar samples a smaller Hamming distance.
(4) The method proposes a Min-Max feature regularization term with confidence, so that in the feature space the distance between similar samples is as small as possible and the distance between dissimilar samples is as large as possible.
(5) The image hash coding method based on transductive semi-supervised deep learning does not depend on the network structure and can be applied to any deep convolutional neural network. At the same time, it significantly reduces the cost of labeling data.
Drawings
Fig. 1 shows the deep convolutional network model of the image hash coding method based on transductive semi-supervised deep learning according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides an image hash coding method based on transductive semi-supervised deep learning which, as shown in Fig. 1, comprises the following steps:
1) preparing an image data set, and dividing the data set into a training sample set and a testing sample set;
2) the training set is further divided into a marked training sample set and a non-marked training sample set;
the specific implementation method of the step 2) is as follows:
Given a training sample set $\mathcal{X} = \mathcal{X}_L \cup \mathcal{X}_U$, where $\mathcal{X}_L$ and $\mathcal{X}_U$ denote the labeled and unlabeled training sample sets respectively, $L$ denotes the number of labeled training samples, $U$ the number of unlabeled training samples ($L$ is generally much smaller than $U$), and $X_i$ denotes the $i$th training sample image. If $X_i \in \mathcal{X}_L$, let $y_i = (y_{i1}, \dots, y_{iC}) \in \{0,1\}^C$ denote the class-label vector of $X_i$: $y_{ij} = 1$ if image $X_i$ contains the $j$th category label, otherwise $y_{ij} = 0$, where $C$ denotes the number of category labels in the data set. An image $X_i$ may contain several category labels, i.e. several components of $y_i$ may be 1. Let $N = L + U$ denote the total number of training sample images.
The purpose of the deep image hash coding method is as follows: based on a deep convolutional neural network, learn a non-linear hash mapping $h: \mathcal{X} \to \{-1,1\}^K$ from the image space to the Hamming space $\{-1,1\}^K$. The mapping maps an image $X$ to a $K$-bit hash code $b = h(X)$ such that the similarity between image pairs is still preserved in the Hamming space, where $K$ is the number of bits of the image hash code. For convenience of description, the hash code of image $X_i$ is written $b_i = h(X_i)$.
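As a small illustration of the mapping into Hamming space, the sign function turns real-valued hash-layer outputs into $\{-1,1\}^K$ codes, and the Hamming distance between two codes follows from their inner product. This is a sketch, not the patent's own code:

```python
import numpy as np

def binarize(h):
    """Map real-valued hash-layer outputs to a {-1, 1}^K code."""
    return np.where(np.asarray(h) >= 0, 1, -1)

def hamming_distance(b1, b2):
    """For b1, b2 in {-1, 1}^K: d_H = (K - <b1, b2>) / 2."""
    b1, b2 = np.asarray(b1), np.asarray(b2)
    return int((b1.shape[-1] - b1 @ b2) // 2)
```

Because the distance reduces to an inner product, ranking a database by Hamming distance is far cheaper than nearest-neighbor search on high-dimensional real features.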
3) Building the deep convolutional neural network model of the image hash coding method based on transductive semi-supervised deep learning;
the specific implementation method of the step 3) is as follows:
Given a deep convolutional neural network model, replace its last layer with two new fully connected layers, used for image hash coding and image classification respectively and called the hash coding layer hcl and the image classification layer cls. In the newly constructed network the hash coding layer hcl comes first and the image classification layer cls comes after it, i.e. cls is the last layer of the network. The number of neurons in the hash coding layer hcl equals the number of bits of the hash code, and the number of neurons in the image classification layer cls equals the number of class labels in the image data set.
For a more intuitive description, the deep convolutional neural network model of the image hash coding method based on transductive semi-supervised deep learning is illustrated with a concrete deep convolutional neural network. Referring to Fig. 1, the given network consists of several convolutional layers, a feature layer and a classification layer, where the feature layer is the penultimate layer of the network and the classification layer is the last layer. As described above, the classification layer is replaced by two new fully connected layers, the hash coding layer hcl and the classification layer cls, and the rest of the network structure is unchanged. The numbers of neurons in hcl and cls are K and C respectively. During network training, the hash-code learning function is applied to the hash coding layer and the classification loss function to the classification layer. Further details of the method are described below.
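A toy numpy sketch of the modified head described above: the original classification layer is replaced by a hash coding layer hcl with K units followed by a classification layer cls with C units. The layer sizes, weight scale and the tanh on hcl are illustrative assumptions, not taken from the patent:

```python
import numpy as np

class HashClsHead:
    """Feature vector -> hcl (K units, tanh) -> cls (C units, no activation)."""

    def __init__(self, feat_dim, K, C, seed=0):
        rng = np.random.default_rng(seed)
        self.W_hcl = rng.standard_normal((feat_dim, K)) * 0.01
        self.W_cls = rng.standard_normal((K, C)) * 0.01

    def forward(self, x):
        h = np.tanh(x @ self.W_hcl)  # hash coding layer output, in (-1, 1)
        z = h @ self.W_cls           # classification layer output
        return h, z
```

At test time the relaxed hash output h would be binarized with the sign function to obtain the final code.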
4) Set the confidence of every labeled sample to 1 and of every unlabeled sample to 0, randomly initialize the class-label vectors of the unlabeled samples, then train the network model built in step 3) on the whole training set until it converges;
the specific implementation method of the step 4) is as follows:
401) constructing a classification loss function;
the specific implementation method of step 401) is as follows:
The method provided by the invention can handle both single-label and multi-label image data sets. The classification loss function is applied at the classification layer; for the two kinds of data set, different classification loss functions are adopted:
For a single-label image data set, a softmax activation function is applied at the classification layer, and the classification loss function is:

$$\mathcal{L}_{cls}(\theta) = -\sum_{X_i \in \mathcal{X}} v_i \sum_{j=1}^{C} I(\hat{y}_{ij}=1) \log \frac{\exp(z_{ij})}{\sum_{k=1}^{C} \exp(z_{ik})}$$

where $\mathcal{X}$ denotes the set of training sample images; $\hat{y}_i$ denotes the class-label vector inferred for sample image $X_i$ by the network model learned in the previous round, kept fixed during the current round of training, and its $j$th component $\hat{y}_{ij}$ is the $j$th category label inferred for $X_i$; $v_i$ denotes the confidence of sample image $X_i$, i.e. the degree of certainty of the inferred class-label vector $\hat{y}_i$; $\theta$ denotes the set of network model parameters; $z_i = (z_{i1}, \dots, z_{iC})$ denotes the output vector of the last (classification) layer when $X_i$ is fed into the currently trained network; and $I(\mathrm{cond})$ is the indicator function, equal to 1 if the condition cond is true and 0 otherwise.
For a multi-label image data set, no activation function is applied at the classification layer, and the classification loss function used is:

$$\mathcal{L}_{cls}(\theta) = \sum_{X_i \in \mathcal{X}} v_i \, \| \hat{y}_i - z_i \|_2^2$$

where $\hat{y}_i$ denotes the class-label vector inferred for sample image $X_i$ by the network model learned in the previous round, kept fixed during the current round of training, with $j$th component $\hat{y}_{ij}$ the $j$th category label inferred for $X_i$; and $z_i$ denotes the output vector of the last (classification) layer when $X_i$ is fed into the currently trained network.
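The two confidence-weighted classification losses can be sketched with numpy. The single-label form is the confidence-weighted softmax cross-entropy described above; the multi-label squared-error form is an assumption made here for illustration, since the original formula is not reproduced in this text:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def single_label_loss(Z, Y_hat, v):
    """Confidence-weighted cross-entropy with a softmax on the logits Z.

    Z: (N, C) classification-layer outputs; Y_hat: (N, C) one-hot inferred
    labels; v: (N,) per-sample confidences.
    """
    P = softmax(np.asarray(Z, dtype=float))
    return float(-np.sum(np.asarray(v)[:, None] * Y_hat * np.log(P + 1e-12)))

def multi_label_loss(Z, Y_hat, v):
    """Confidence-weighted squared error (assumed form, no activation)."""
    d = np.asarray(Z, dtype=float) - np.asarray(Y_hat, dtype=float)
    return float(np.sum(np.asarray(v) * np.sum(d * d, axis=1)))
```

Note that a sample with confidence 0 contributes nothing, which is exactly how the unlabeled samples are neutralized in the first training round (step 4).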
402) Constructing a Hash coding learning function;
the specific implementation method of step 402) is as follows:
Define the similarity between images $X_i$ and $X_j$ as $\omega_{ij}$: if $X_i$ and $X_j$ are semantically similar (i.e. share at least one common class label), then $\omega_{ij} = 1$; otherwise $\omega_{ij} = 0$. Assume the hash codes of the $N$ training sample images are $B = [b_1, \dots, b_N]$, i.e. $b_i$ is the hash code of image $X_i$. The likelihood function of the similarities $\Omega = \{\omega_{ij}\}$ between the $N$ training samples is:

$$p(\Omega \mid B) = \prod_{\omega_{ij} \in \Omega} p(\omega_{ij} \mid B)^{\,v_{ij}}, \qquad p(\omega_{ij} \mid B) = \sigma(\Theta_{ij})^{\,\omega_{ij}} \, \left(1-\sigma(\Theta_{ij})\right)^{\,1-\omega_{ij}}$$

where $\prod$ is the successive-multiplication (product) symbol, $v_{ij} = v_i \cdot v_j$ denotes the confidence of the similarity $\omega_{ij}$, $\Theta_{ij} = \tfrac{1}{2} b_i^{\top} b_j$ is half the inner product of the hash codes $b_i$ and $b_j$, $\sigma(x) = 1/(1+\exp(-x))$, and $\exp(\cdot)$ denotes the exponential function.
Taking the negative log-likelihood, the hash-code learning function proposed by the invention is:

$$\mathcal{L}_{hash}(\theta) = -\sum_{\omega_{ij} \in \Omega} v_{ij} \left( \omega_{ij} \Theta_{ij} - \log\left(1 + \exp(\Theta_{ij})\right) \right)$$
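A numpy sketch of the pairwise negative log-likelihood, using relaxed (real-valued) codes so the quantity stays differentiable; computing it over all pairs within a batch is an assumption for illustration:

```python
import numpy as np

def hash_pairwise_loss(B, Omega, v):
    """-sum_{i<j} v_i*v_j * (omega_ij * Theta_ij - log(1 + exp(Theta_ij))).

    B: (N, K) relaxed hash codes; Omega: (N, N) 0/1 similarity matrix;
    v: (N,) per-sample confidences.
    """
    B = np.asarray(B, dtype=float)
    Theta = 0.5 * (B @ B.T)                        # Theta_ij = b_i . b_j / 2
    Vij = np.outer(v, v)                           # v_ij = v_i * v_j
    ll = Omega * Theta - np.logaddexp(0.0, Theta)  # stable log(1 + exp(.))
    iu = np.triu_indices(B.shape[0], k=1)          # count each pair i < j once
    return float(-np.sum(Vij[iu] * ll[iu]))
```

Minimizing this loss pushes $\Theta_{ij}$ up (codes aligned) for similar pairs and down for dissimilar pairs, which is what makes similar samples end up with small Hamming distances.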
403) constructing a Min-Max characteristic regular term with confidence coefficient;
the specific implementation method of step 403) is as follows:
In order to learn better image hash codes, the invention proposes a Min-Max feature regularization term with confidence, applied to the feature layer during network training. The regularization term explicitly gives the features learned by the network the following property: if two images share at least one common class label, the distance between their feature vectors should be as small as possible; otherwise the distance between their feature vectors should be as large as possible. The Min-Max feature regularization term with confidence proposed by the invention is:

$$\mathcal{R}(\theta) = \sum_{i,j} v_{ij} \left[ \omega_{ij} \, \|x_i - x_j\|_2^2 + (1-\omega_{ij}) \, \max\!\left(0,\; m - \|x_i - x_j\|_2\right)^2 \right]$$

where $v_{ij} = v_i \cdot v_j$, $x_i$ denotes the feature vector of sample image $X_i$ (the output of the network feature layer), and $m$ is a margin hyperparameter.
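A numpy sketch of a regularizer with this pull/push behavior; the hinge form and the margin value are assumptions made for illustration, as the exact formula is not reproduced in this text:

```python
import numpy as np

def minmax_regularizer(X, Omega, v, margin=2.0):
    """Pull similar pairs together, push dissimilar pairs at least `margin` apart.

    X: (N, D) feature-layer outputs; Omega: (N, N) 0/1 similarity matrix;
    v: (N,) per-sample confidences; margin: assumed hyperparameter m.
    """
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise dists
    Vij = np.outer(v, v)
    pull = Omega * D**2                                  # similar: minimize
    push = (1 - Omega) * np.maximum(0.0, margin - D)**2  # dissimilar: separate
    iu = np.triu_indices(X.shape[0], k=1)                # each pair i < j once
    return float(np.sum(Vij[iu] * (pull + push)[iu]))
```

The confidence product $v_i v_j$ means a pair involving an uncertain sample contributes little, which keeps unreliable pseudo-labels from distorting the feature space.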
404) Constructing a total target function by combining a classification loss function, a Hash coding learning function and a Min-Max characteristic regular term with confidence coefficient;
The total objective function is:

$$\min_{\theta} \; \mathcal{L}(\theta) = \mathcal{L}_{cls}(\theta) + \mathcal{L}_{hash}(\theta) + \lambda \, \mathcal{R}(\theta)$$

The three terms on the right are the classification loss function, the hash-code learning function, and the Min-Max feature regularization term with confidence; the classification loss is applied to the classification layer, the hash-code learning function to the hash coding layer, and the Min-Max regularization term to the feature layer. $\lambda$ is a hyperparameter that balances the three terms on the right. $\mathcal{X}$ denotes the set of training sample images, $\hat{y}_i$ the class-label vector inferred for sample image $X_i$, $\theta$ the set of network model parameters, and $v_i$ the confidence of sample image $X_i$, i.e. the degree of certainty of the inferred class-label vector $\hat{y}_i$. If $X_i \in \mathcal{X}_L$, then $\hat{y}_i = y_i$ throughout the entire training process; if $X_i \in \mathcal{X}_U$, $\hat{y}_i$ is the class-label vector inferred for $X_i$ by the network model learned in the previous round.
405) Based on the total objective function, a mini-batch-based stochastic gradient descent method is used for training a deep convolutional neural network model.
5) Inferring a class-label vector for every training sample based on the parameters of the currently learned network model;
the specific implementation method of the step 5) is as follows:
Fix $\theta$ and update $\{\hat{y}_i\}$; that is, based on the network model learned in the current round, infer class-label vectors for all training samples. In fact, class-label vectors only need to be inferred for the unlabeled training sample images.
For a single-label image data set, the classification loss over the unlabeled sample set is:

$$\min_{\{\hat{y}_i\}} \; -\sum_{X_i \in \mathcal{X}_U} v_i \sum_{j=1}^{C} I(\hat{y}_{ij}=1) \log \frac{\exp(z_{ij})}{\sum_{k=1}^{C} \exp(z_{ik})}$$

Because the classification losses of different samples are independent of one another and the $v_i$ are non-negative constants, the problem decomposes into $U$ mutually independent sub-problems, one per unlabeled sample. The optimal solution of each sub-problem is $\hat{y}_{ij} = I\!\left(j = \arg\max_k z_{ik}\right)$, which yields $\hat{y}_i$.
For a multi-label image data set, the classification loss over the unlabeled sample set is:

$$\min_{\{\hat{y}_i\}} \; \sum_{X_i \in \mathcal{X}_U} v_i \, \|\hat{y}_i - z_i\|_2^2$$

which likewise decomposes into $U$ mutually independent sub-problems $\min_{\hat{y}_i \in \{0,1\}^C} \|\hat{y}_i - z_i\|_2^2$, whose optimal solution is $\hat{y}_{ij} = I(z_{ij} \ge 0.5)$, yielding $\hat{y}_i$.
6) Computing the confidence of each training sample based on the parameters of the currently learned network model and the inferred class-label vectors;
the specific implementation method of the step 6) is as follows:
If $X_i \in \mathcal{X}_L$, its confidence is always set to $v_i = 1$ throughout the training process. If $X_i \in \mathcal{X}_U$, the invention computes $r_i$ based on two intuitive assumptions: (1) in the feature space, if the average distance from a sample to the other samples of its class (or to its similar samples) is small, the sample is close to the center of the corresponding class and is given a higher confidence; (2) if that average distance is large, the sample is far from the class center and is given a lower confidence. Define the feature vector of sample image $X_i$ (the output of the network feature layer) as $x_i$.
For a single-label image dataset, in the feature space, the average distance d_i from a sample image X_i to the other samples of the same class can be expressed as: d_i = (Σ_{j≠i} ω_ij ‖x_i − x_j‖₂) / (Σ_{j≠i} ω_ij).
For a multi-label image dataset, in the feature space, the average distance d_i from a sample image X_i to its similar samples can be expressed in the same form, with ω_ij determined by whether the two images share a class label.
where ‖·‖₂ denotes the modulus length (Euclidean norm) of a vector, and ω_ij indicates whether images X_i and X_j are similar: if X_i and X_j have at least one common class label, then ω_ij = 1, otherwise ω_ij = 0;
Thus, based on the above analysis, for both single-label and multi-label image datasets, the smaller d_i is, the more easily X_i is assigned the correct label vector, and the higher the confidence that should be assigned to it; therefore, the confidence r_i of image X_i proposed by the invention is calculated as: r_i = z_i / z_max,
where z_i = exp(−d_i), z_max = max{z_1, z_2, …, z_N}, and exp(·) denotes the exponential function.
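The confidence computation can be sketched as follows; the handling of a sample that has no similar samples (d_i kept at 0) is an assumed convention not specified in the text.

```python
import numpy as np

def confidences(features, labels):
    """Confidence r_i = exp(-d_i) / max_k exp(-d_k), where d_i is the
    average feature-space distance from sample i to its similar samples
    (those sharing at least one class label).  A sample with no similar
    samples keeps d_i = 0 (an assumed convention)."""
    n = features.shape[0]
    d = np.zeros(n)
    for i in range(n):
        # omega_ij = 1 exactly when samples i and j share a class label
        similar = [j for j in range(n)
                   if j != i and (labels[i] & labels[j]).any()]
        if similar:
            d[i] = np.mean([np.linalg.norm(features[i] - features[j])
                            for j in similar])
    z = np.exp(-d)
    return z / z.max()

feats = np.array([[0., 0.], [0., 1.], [0., 3.]])
labs = np.ones((3, 1), dtype=int)   # all three samples share one label
r = confidences(feats, labs)        # middle sample is closest to the rest
```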
7) Training the network model built in step 3) from random initialization on the whole training sample set, based on the current class label vectors and confidences of all training samples, until the network model converges; the overall objective function is adopted, and a mini-batch-based stochastic gradient descent method is used to train the deep convolutional neural network model.
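A minimal numpy sketch of the confidence-weighted classification term and a mini-batch gradient step, with a linear classifier standing in for the deep network; the hash coding learning function and the Min-Max regularization term of the overall objective are omitted here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def weighted_ce(W, X, Y, v):
    """Confidence-weighted cross-entropy: each sample's loss is scaled
    by its confidence v_i, so uncertain pseudo-labels contribute less."""
    P = softmax(X @ W)
    return float(-np.mean(v * np.sum(Y * np.log(P + 1e-12), axis=1)))

def sgd_step(W, X, Y, v, lr=0.1):
    """One gradient step on the weighted loss for a linear classifier."""
    P = softmax(X @ W)
    return W - lr * X.T @ ((P - Y) * v[:, None]) / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))           # stand-in for feature-layer outputs
Y = np.eye(4)[rng.integers(0, 4, 32)]  # one-hot (pseudo-)labels
v = rng.uniform(0.2, 1.0, 32)          # per-sample confidences
W = np.zeros((8, 4))
losses = [weighted_ce(W, X, Y, v)]
for _ in range(50):
    W = sgd_step(W, X, Y, v)
    losses.append(weighted_ce(W, X, Y, v))
```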
8) Repeating steps 5), 6) and 7) until the number of completed training rounds reaches the preset maximum number of training rounds; according to experiments, the preset maximum number of training rounds is generally set to 4.
9) Calculating the hash codes of the images in the test sample set using the trained network model.
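The alternating procedure of steps 4)–9) can be summarized as a driver loop; `train_fn`, `infer_fn` and `conf_fn` are hypothetical stand-ins for the training, label-inference and confidence steps described above.

```python
import numpy as np

def transductive_training(train_fn, infer_fn, conf_fn, init_labels,
                          is_labeled, max_rounds=4):
    """Driver for steps 4)-8): alternately retrain from scratch,
    re-infer pseudo-labels, and recompute confidences; labeled samples
    keep confidence 1 throughout."""
    v = np.where(is_labeled, 1.0, 0.0)   # step 4): trust labeled samples only
    y = init_labels.copy()               # unlabeled label vectors start random
    model = train_fn(y, v)               # step 4): initial training round
    for _ in range(max_rounds):          # step 8): fixed number of rounds
        y = infer_fn(model, y, is_labeled)   # step 5)
        v = conf_fn(model, y, is_labeled)    # step 6)
        v[is_labeled] = 1.0                  # labeled confidence stays 1
        model = train_fn(y, v)               # step 7): retrain from scratch
    return model, y, v

# Hypothetical toy stand-ins, just to exercise the loop structure.
calls = []
def train(y, v):
    calls.append(1)          # pretend to train; return the round count
    return len(calls)
infer = lambda m, y, lab: y
conf = lambda m, y, lab: np.full(len(y), 0.5)

mask = np.array([1, 1, 0, 0, 0, 0], dtype=bool)
model, y, v = transductive_training(train, infer, conf,
                                    np.zeros((6, 3)), mask)
```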
The experimental results are as follows:
The invention was evaluated on three commonly used image datasets: CIFAR10, NUS-WIDE and MIR-Flickr25K.
The CIFAR10 dataset is a single label image dataset with 60000 images in total, 10 classes. According to the common semi-supervised learning setting of the data set, 1000 images (100 images are randomly selected for each class) are used as a query set, and the rest 59000 images are used as a retrieval database; and taking all the images in the retrieval database as a training sample set, wherein 5000 images are taken as labeled training sample sets, and the rest 54000 images are taken as unlabeled training sample sets.
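The CIFAR10 protocol above can be sketched as an index split; the random seed and toy labels below are illustrative only.

```python
import numpy as np

def split_cifar10(labels, rng):
    """Sketch of the stated CIFAR10 protocol: 100 query images per class
    (1000 total); the remaining 59000 images form the retrieval database
    and training set, of which 5000 are kept labeled and 54000 are
    treated as unlabeled."""
    query, rest = [], []
    for c in range(10):
        idx = rng.permutation(np.where(labels == c)[0])
        query.extend(idx[:100])
        rest.extend(idx[100:])
    rest = rng.permutation(np.array(rest))
    return np.array(query), rest[:5000], rest[5000:]

toy_labels = np.repeat(np.arange(10), 6000)  # 60000 illustrative labels
q, lab, unlab = split_cifar10(toy_labels, np.random.default_rng(0))
```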
NUS-WIDE is a multi-label image dataset, with a total of about 270000 images. According to the semi-supervised learning setting commonly used by the data set, 2100 images are used as a query set, 10500 images are used as a labeled training sample set, 149733 images are used as an unlabeled training sample set, and the images of the whole training set are used as a retrieval database.
The MIR-Flickr25K dataset is a multi-label image dataset with a total of 25000 images. According to the semi-supervised learning setting commonly used for this dataset, 1000 images are used as the query set, 5000 images as the labeled training sample set, and 19000 images as the unlabeled training sample set; the images of the whole training set are used as the retrieval database.
Representative hash coding methods based on semi-supervised learning include DSH-GANs, SSDH and BGDH. For a fair comparison, the invention was evaluated with the same deep convolutional neural network model used by these methods.
In the experiments on these three data sets, the parameter λ in the overall objective function of the present invention was set to 0.01. The maximum number of rounds of training is set to 4.
The experimental test method is as follows: after the network model is trained, the hash codes of the images in the query set and the retrieval database of each dataset are computed; retrieval performance is then calculated based on these hash codes, using the MAP score commonly used in academia as the evaluation index (the larger the MAP value, the better the method).
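A sketch of MAP evaluation over Hamming ranking; full-database ranking is assumed, as the text does not specify a top-K cutoff, and two images are taken as relevant when they share at least one class label.

```python
import numpy as np

def mean_average_precision(q_codes, db_codes, q_labels, db_labels):
    """MAP over Hamming ranking: rank database items by Hamming distance
    to each query and average precision at every relevant position; an
    item is relevant if it shares at least one class label with the query."""
    aps = []
    for qc, ql in zip(q_codes, q_labels):
        dist = (db_codes != qc).sum(axis=1)       # Hamming distance
        order = np.argsort(dist, kind="stable")   # rank the database
        rel = (db_labels[order] & ql).any(axis=1).astype(float)
        if rel.sum() == 0:
            continue
        prec = np.cumsum(rel) / np.arange(1, len(rel) + 1)
        aps.append((prec * rel).sum() / rel.sum())
    return float(np.mean(aps))

q_codes = np.array([[0, 1]])
q_labels = np.array([[1, 0]])
db_codes = np.array([[0, 1], [1, 1], [0, 0]])
db_labels = np.array([[1, 0], [0, 1], [1, 0]])
score = mean_average_precision(q_codes, db_codes, q_labels, db_labels)
```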
The results of the different methods on the corresponding data sets are given in tables 1, 2 and 3, respectively. The experimental results show that the method of the invention is obviously superior to other comparison methods in the table, and the superiority of the method of the invention is fully proved.
In addition, ablation experiments were performed on the three datasets to verify the effect of the confidence; the results are shown in Tables 4, 5 and 6. They fully verify that introducing confidences for the unlabeled training samples greatly reduces the adverse effect of uncertain samples on the training process and makes the convergence of the network model more stable.
TABLE 1 MAP scores on CIFAR10 data set by different methods
TABLE 2 MAP scores on NUS-WIDE data set by different methods
TABLE 3 MAP scores on MIR-Flickr25K data set by different methods
TABLE 4 Effect of whether confidence is used on the MAP score on the CIFAR10 dataset
TABLE 5 Effect of whether confidence is used on MAP scores on NUS-WIDE datasets
TABLE 6 influence of whether confidence was used on MAP scores on the MIR-Flickr25K dataset
Claims (10)
1. The image hash coding method based on the direct-push semi-supervised deep learning is characterized by comprising the following steps of:
1) preparing an image data set, and dividing the data set into a training sample set and a testing sample set;
2) the training set is further divided into a marked training sample set and a non-marked training sample set;
3) building a deep convolutional neural network model of an image Hash coding method based on direct-push type semi-supervised deep learning;
4) setting the confidence degrees of all the labeled samples to be 1, setting the confidence degrees of all the unlabeled samples to be 0, randomly initializing the class label vectors of the unlabeled samples, and then training the network model built in the step 3) on the whole training set until the network model is converged;
5) deducing corresponding class label vectors for all training samples based on the layer parameters of the currently learned network model;
6) calculating a confidence corresponding to each training sample based on the layer parameters of the currently learned network model and the inferred class label vector of the training sample;
7) training the network model set up in the step 3) from random initialization by using the whole training sample set based on the class label vectors and the confidence degrees of all the current training samples until the network model converges;
8) repeatedly executing the steps 5), 6) and 7) until the current round training number reaches the preset maximum round training number;
9) and calculating the Hash codes of the images in the test sample set by using the trained network model.
2. The image hash coding method based on the direct-push semi-supervised deep learning of claim 1, wherein the specific implementation method of step 2) is as follows:
given a training sample set D = D_L ∪ D_U, where D_L and D_U respectively represent the labeled training sample set and the unlabeled training sample set, L represents the number of labeled training samples, U represents the number of unlabeled training samples, L is generally much smaller than U, and X_i represents the i-th training sample image; if image X_i ∈ D_L, y_i represents the class label vector corresponding to X_i: if image X_i contains the label of the j-th category, then y_ij = 1, otherwise y_ij = 0, where C represents the number of category labels of the dataset; image X_i may contain multiple category labels, i.e., y_i may have multiple components equal to 1; let N = L + U denote the total number of training sample images.
3. The image hash coding method based on the direct-push semi-supervised deep learning of claim 2, wherein the specific implementation method of step 3) is as follows:
giving a deep convolutional neural network model, replacing the last layer of the deep convolutional neural network model with two new fully-connected layers which are respectively used for image hash coding and image classification and are respectively called a hash coding layer hcl and an image classification layer cls; in the newly constructed network, the hash coding layer hcl is in front of the network, and the image classification layer cls is behind the network, namely the image classification layer cls is at the last layer of the network; the number of neurons in the hash encoding layer hcl is the same as the number of bits in the hash encoding, and the number of neurons in the image classification layer cls is the same as the number of class labels in the image data set.
4. The image hash coding method based on the direct-push semi-supervised deep learning of claim 3, wherein the specific implementation method of step 4) is as follows:
401) constructing a classification loss function;
402) constructing a Hash coding learning function;
403) constructing a Min-Max characteristic regular term with confidence coefficient;
404) constructing a total target function by combining a classification loss function, a Hash coding learning function and a Min-Max characteristic regular term with confidence coefficient;
405) based on the total objective function, a mini-batch-based stochastic gradient descent method is used for training a deep convolutional neural network model.
5. The image hash coding method based on the direct-push semi-supervised deep learning of claim 4, wherein the specific implementation method of step 401) is as follows:
for a single-label image dataset and a multi-label image dataset, different classification loss functions are respectively adopted:
for a single-label image dataset: a softmax activation function is applied at the classification layer, and the classification loss function is:
L_cls = −Σ_{X_i∈D} v_i Σ_{j=1}^{C} I(ŷ_ij = 1) · log( exp(o_ij) / Σ_{k=1}^{C} exp(o_ik) ),
where D represents the training sample image set; ŷ_i represents the class label vector inferred for sample image X_i by the network model learned in the previous round of training, kept unchanged in the current round of training, and its j-th component ŷ_ij represents the j-th category label inferred for sample X_i; v_i represents the confidence of sample image X_i, i.e., the degree of certainty of the class label vector ŷ_i inferred for X_i; W represents the set of network model layer parameters; o_i represents the output vector at the last classification layer when sample image X_i is input to the currently trained network model; I(cond) represents an indicator function, whose value is 1 if the condition cond is true and 0 otherwise;
for a multi-label image dataset: no activation function is applied at the classification layer, and the classification loss function used is defined over the same quantities, where ŷ_i represents the class label vector inferred for sample image X_i by the network model learned in the previous round of training, kept unchanged in the current round of training, its j-th component ŷ_ij represents the j-th category label inferred for sample image X_i, and o_i represents the output vector at the last classification layer when sample image X_i is input to the currently trained network model.
6. The image hash coding method based on the direct-push semi-supervised deep learning of claim 4, wherein the specific implementation method of step 402) is as follows:
define the similarity between images X_i and X_j as ω_ij: if X_i and X_j are semantically similar, i.e., have at least one common class label, ω_ij = 1; otherwise ω_ij = 0; assume the hash codes corresponding to the N training sample images are B = [b_1, …, b_N], i.e., b_i is the hash code of image X_i; the likelihood function of the similarities Ω = {ω_ij} between the N training samples is a product, over sample pairs, of per-pair likelihood terms,
where each pair's term is weighted by v_ij = v_i · v_j, the confidence of the similarity ω_ij; Π denotes the continued-product symbol, b_i denotes the hash code of image X_i, and exp(·) denotes the exponential function;
based on the above description, the proposed hash coding learning function is obtained from this likelihood function.
7. the image hash coding method based on the direct-push semi-supervised deep learning of claim 4, wherein the specific implementation method of step 403) is as follows:
constructing the Min-Max feature regularization term with confidence: the term is defined over pairs of training samples,
where v_ij = v_i · v_j, and x_i represents the feature vector of sample image X_i (the output of the network feature layer).
8. The image hash coding method based on direct-push semi-supervised deep learning according to claim 4, wherein the overall objective function constructed in step 404) is the sum of three terms: a classification loss function, a hash coding learning function, and a Min-Max feature regularization term with confidence.
In this objective, the classification loss function acts on the classification layer, the hash coding learning function acts on the hash coding layer, and the Min-Max feature regularization term with confidence acts on the feature layer; λ is a hyper-parameter adjusting the balance between the three terms; D represents the training sample image set; ŷ_i represents the class label vector inferred for sample image X_i; W represents the set of network model layer parameters; v_i represents the confidence of sample image X_i, i.e., the degree of certainty of the inferred class label vector ŷ_i. If X_i belongs to the labeled sample set, then throughout the entire training process ŷ_i = y_i; if X_i belongs to the unlabeled sample set, ŷ_i represents the class label vector inferred for X_i by the network model learned in the previous round.
9. The image hash coding method based on the direct-push semi-supervised deep learning of claim 4, wherein the specific implementation method of step 5) is as follows:
fixing the network layer parameters W and updating the inferred class label vectors ŷ_i, that is, based on the network model learned in the current round, corresponding class label vectors are inferred for all training samples:
for a single-label image dataset: the classification loss function restricted to the unlabeled sample set is the confidence-weighted softmax cross-entropy summed over the U unlabeled training samples;
because the classification losses of different samples are independent of each other, and the confidences v_i are non-negative constants, this loss can be rewritten as U mutually independent sub-optimization problems, one per unlabeled sample;
the optimal solution of each sub-optimization problem is the one-hot class label vector whose single nonzero component lies at the class with the largest classification-layer output for that sample; based on these optimal solutions, the inferred class label vectors ŷ_i of all unlabeled samples are obtained;
for a multi-label image dataset: the classification loss function restricted to the unlabeled sample set is defined analogously;
because the classification losses of different samples are independent of each other, and the confidences v_i are non-negative constants, it can likewise be rewritten as U mutually independent sub-optimization problems whose per-sample optimal solutions yield the inferred class label vectors.
10. The image hash coding method based on the direct-push semi-supervised deep learning of claim 9, wherein the specific implementation method of step 6) is as follows:
for a single-label image dataset, in the feature space, the average distance d_i from a sample image X_i to the other samples of the same class can be expressed as: d_i = (Σ_{j≠i} ω_ij ‖x_i − x_j‖₂) / (Σ_{j≠i} ω_ij);
for a multi-label image dataset, in the feature space, the average distance d_i from a sample image X_i to its similar samples can be expressed in the same form, with ω_ij determined by whether the two images share a class label;
where ‖·‖₂ denotes the modulus length (Euclidean norm) of a vector, and ω_ij indicates whether images X_i and X_j are similar: if X_i and X_j have at least one common class label, then ω_ij = 1, otherwise ω_ij = 0;
thus, the confidence r_i of image X_i is calculated as: r_i = z_i / z_max,
where z_i = exp(−d_i), z_max = max{z_1, z_2, …, z_N}, and exp(·) denotes the exponential function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111427674.6A CN114170333B (en) | 2021-11-24 | 2021-11-24 | Image hash coding method based on direct-push type semi-supervised deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114170333A true CN114170333A (en) | 2022-03-11 |
CN114170333B CN114170333B (en) | 2023-02-03 |
Family
ID=80481230
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111427674.6A Active CN114170333B (en) | 2021-11-24 | 2021-11-24 | Image hash coding method based on direct-push type semi-supervised deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114170333B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034205A (en) * | 2018-06-29 | 2018-12-18 | 西安交通大学 | Image classification method based on the semi-supervised deep learning of direct-push |
CN109165306A (en) * | 2018-08-09 | 2019-01-08 | 长沙理工大学 | Image search method based on the study of multitask Hash |
CN109783682A (en) * | 2019-01-19 | 2019-05-21 | 北京工业大学 | It is a kind of based on putting non-to the depth of similarity loose hashing image search method |
CN109918528A (en) * | 2019-01-14 | 2019-06-21 | 北京工商大学 | A kind of compact Hash code learning method based on semanteme protection |
CN109960737A (en) * | 2019-03-15 | 2019-07-02 | 西安电子科技大学 | Remote Sensing Images search method of the semi-supervised depth confrontation from coding Hash study |
CN112861976A (en) * | 2021-02-11 | 2021-05-28 | 温州大学 | Sensitive image identification method based on twin graph convolution hash network |
Non-Patent Citations (3)
Title |
---|
WEIWEI SHI et al.: "Transductive Semisupervised Deep Hashing", IEEE Transactions on Neural Networks and Learning Systems * |
WEIWEI SHI et al.: "Transductive Semi-Supervised Deep Learning using Min-Max Features", ECCV 2018 * |
LIU Mengdi et al.: "Research progress on automatic image annotation techniques", Journal of Computer Applications * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114379416A (en) * | 2022-03-23 | 2022-04-22 | 蔚来汽车科技(安徽)有限公司 | Method and system for controlling battery replacement operation based on vehicle chassis detection |
CN115294396A (en) * | 2022-08-12 | 2022-11-04 | 北京百度网讯科技有限公司 | Backbone network training method and image classification method |
CN115294396B (en) * | 2022-08-12 | 2024-04-23 | 北京百度网讯科技有限公司 | Backbone network training method and image classification method |
CN115905926A (en) * | 2022-12-09 | 2023-04-04 | 华中科技大学 | Code classification deep learning model interpretation method and system based on sample difference |
CN115905926B (en) * | 2022-12-09 | 2024-05-28 | 华中科技大学 | Code classification deep learning model interpretation method and system based on sample difference |
Also Published As
Publication number | Publication date |
---|---|
CN114170333B (en) | 2023-02-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111523047B (en) | Multi-relation collaborative filtering algorithm based on graph neural network | |
CN108710894B (en) | Active learning labeling method and device based on clustering representative points | |
CN114170333B (en) | Image hash coding method based on direct-push type semi-supervised deep learning | |
CN110941734B (en) | Depth unsupervised image retrieval method based on sparse graph structure | |
CN108399185B (en) | Multi-label image binary vector generation method and image semantic similarity query method | |
CN110598022B (en) | Image retrieval system and method based on robust deep hash network | |
WO2021227091A1 (en) | Multi-modal classification method based on graph convolutional neural network | |
CN110264372B (en) | Topic community discovery method based on node representation | |
CN113377981B (en) | Large-scale logistics commodity image retrieval method based on multitask deep hash learning | |
Li et al. | DAHP: Deep attention-guided hashing with pairwise labels | |
CN112000689A (en) | Multi-knowledge graph fusion method based on text analysis | |
CN114299362A (en) | Small sample image classification method based on k-means clustering | |
CN115048539A (en) | Social media data online retrieval method and system based on dynamic memory | |
CN116258990A (en) | Cross-modal affinity-based small sample reference video target segmentation method | |
CN115618096A (en) | Inner product retrieval method and electronic equipment | |
CN114860973A (en) | Depth image retrieval method for small sample scene | |
Li et al. | Deep learning for approximate nearest neighbour search: A survey and future directions | |
CN116383422B (en) | Non-supervision cross-modal hash retrieval method based on anchor points | |
Xie et al. | Deep online cross-modal hashing by a co-training mechanism | |
CN116524282B (en) | Discrete similarity matching classification method based on feature vectors | |
CN114882279B (en) | Multi-label image classification method based on direct-push semi-supervised deep learning | |
Ahmed et al. | Clustering research papers using genetic algorithm optimized self-organizing maps | |
CN114299336A (en) | Photographic image aesthetic style classification method based on self-supervision learning and deep forest | |
Lin et al. | High-order structure preserving graph neural network for few-shot learning | |
Qin et al. | A novel deep hashing method with top similarity for image retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||