CN114170333B - Image hash coding method based on direct-push type semi-supervised deep learning - Google Patents

Image hash coding method based on direct-push type semi-supervised deep learning Download PDF

Info

Publication number
CN114170333B
CN114170333B CN202111427674.6A CN202111427674A CN114170333B CN 114170333 B CN114170333 B CN 114170333B CN 202111427674 A CN202111427674 A CN 202111427674A CN 114170333 B CN114170333 B CN 114170333B
Authority
CN
China
Prior art keywords
image
training
network model
sample
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111427674.6A
Other languages
Chinese (zh)
Other versions
CN114170333A (en
Inventor
石伟伟
黑新宏
王晓帆
鲁晓锋
贾萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology filed Critical Xian University of Technology
Priority to CN202111427674.6A priority Critical patent/CN114170333B/en
Publication of CN114170333A publication Critical patent/CN114170333A/en
Application granted granted Critical
Publication of CN114170333B publication Critical patent/CN114170333B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/001Model-based coding, e.g. wire frame

Abstract

The invention discloses an image hash coding method based on direct-push semi-supervised deep learning, which comprises the following steps: dividing a data set into a training sample set and a testing sample set; the training set is further divided into a marked training sample set and a non-marked training sample set; building a deep convolutional neural network model; randomly initializing class label vectors of the unlabeled samples, and then training the network model set up in the step 3) on the whole training set until the network model is converged; deducing corresponding class label vectors for all training samples; calculating the confidence corresponding to each training sample; training the network model set up in the step 3) from random initialization by using the whole training sample set until the network model converges; repeatedly executing the steps 5) -7) until the training number of the current round reaches the preset maximum training number of the round; and calculating the Hash codes of the images in the test sample set by using the trained network model. The method can obviously reduce the labeling cost of the data.

Description

Image hash coding method based on direct-push type semi-supervised deep learning
Technical Field
The invention belongs to the technical field of computer vision image hash coding, and particularly relates to an image hash coding method based on direct-push type semi-supervised deep learning.
Background
With the rapid development of the internet, an urgent need is: how to quickly retrieve an image identical or similar to a query image or an image satisfying a specified requirement from a large-scale image database. It is obvious that the nearest neighbor search method is a solution, that is, some images closest to the query image in the feature space are returned as the retrieval result. However, for a large-scale image database, the dimension of the image feature vector is usually large, and the nearest neighbor search method is very time-consuming and consumes a large storage space. As an approximation of nearest neighbor search, hash search has the advantages of low computation cost, high storage efficiency, high search speed, high search accuracy and the like, and is the most popular image search method at present. The hash retrieval is to perform nearest neighbor search according to hash coding of an image, and in general, image hash coding is to map a high-dimensional feature vector of an image into a binary vector with a smaller dimension through a set of hash functions while keeping a similarity relationship between the images unchanged, and the vector is called hash coding of the image.
The hash coding method is classified into a conventional hash coding method and a depth hash coding method. The traditional Hash coding method firstly extracts the manually designed features of the image, then learns a group of Hash functions according to the extracted features, and then maps the feature vectors of the image into corresponding Hash codes according to the learned group of Hash functions. The deep hash coding method is characterized in that a feature vector and a hash function of an image are learned simultaneously from an original image through a deep convolutional neural network, and after network training is completed, input image networks can directly output hash codes of the image. The hash-coding method may be further classified into unsupervised, supervised and semi-supervised hash-coding methods based on whether class label information of an image is used in a training process.
Because the deep convolutional neural network has strong feature learning and nonlinear mapping capabilities, compared with the traditional hash coding method, the deep hash coding method has remarkable advantages. However, most of the current deep hash coding methods are supervised hash coding methods, and in order to achieve good hash coding quality and retrieval accuracy, a large number of labeled training samples are usually required for the methods. In practical applications, constructing a large-scale high-quality labeled training sample data set is very time-consuming and expensive, and even impractical for some special tasks. However, there are a large number of free images on the internet that are easily downloaded by search engines or web crawlers. The semi-supervised deep hash coding method can learn better image hash coding by using a small amount of marked samples and a large amount of unmarked samples, thereby reducing the marked amount of training samples and using a large amount of unmarked samples.
The traditional hash coding method based on the manual design features has poor performance and cannot meet the practical application. At present, most of semi-supervised deep hash coding methods adopt a graph model to model data distribution of unlabelled samples, and the methods have very high computational complexity, need a very large memory space for operation, and are not beneficial to popularization on a large-scale image data set.
The direct-push type semi-supervised learning is a semi-supervised learning method, and the core idea of the method is to consider labels of unlabelled training samples as variables needing learning and optimization, and the variables are updated and optimized in an iterative manner together with model parameters in the training process until the model converges.
The traditional direct-push semi-supervised learning method has the following two problems: first, it requires high quality feature vectors at the initial stage of training to infer reasonable label vectors for unlabeled samples. Because the quality of the feature vectors generated by the deep convolutional neural network in the initial training stage is poor, the traditional direct-push semi-supervised learning method cannot be directly combined with the training of the deep convolutional neural network. Secondly, the traditional direct-push semi-supervised learning method treats each unlabelled sample equally, can not process singular samples and uncertain samples, and influences the convergence and stability of model training.
Disclosure of Invention
The invention aims to provide an image hash coding method based on direct-push semi-supervised deep learning, which is independent of a network structure, can be applied to any deep convolutional neural network and can obviously reduce the labeling cost of data.
The technical scheme adopted by the invention is that the image hash coding method based on the direct-push type semi-supervised deep learning comprises the following steps:
1) Preparing an image data set, and dividing the data set into a training sample set and a testing sample set;
2) The training set is further divided into a marked training sample set and a non-marked training sample set;
3) Building a deep convolutional neural network model of an image Hash coding method based on direct-push type semi-supervised deep learning;
4) Setting the confidence degrees of all the labeled samples to be 1, setting the confidence degrees of all the unlabeled samples to be 0, randomly initializing the class label vectors of the unlabeled samples, and then training the network model built in the step 3) on the whole training set until the network model is converged;
5) Deducing corresponding class label vectors for all training samples based on the layer parameters of the currently learned network model;
6) Calculating a confidence corresponding to each training sample based on the layer parameters of the currently learned network model and the inferred class label vector of the training sample;
7) Training the network model set up in the step 3) from random initialization by using the whole training sample set based on the class label vectors and the confidence degrees of all the current training samples until the network model converges;
8) Repeatedly executing the steps 5), 6) and 7) until the current round training number reaches the preset maximum round training number;
9) And calculating the Hash codes of the images in the test sample set by using the trained network model.
The present invention is also characterized in that,
the specific implementation method of the step 2) is as follows:
given training sample set
Figure BDA0003371749730000041
Wherein
Figure BDA0003371749730000042
And
Figure BDA0003371749730000043
respectively representing a marked training sample set and a non-marked training sample set, L representing the number of marked training samples, U representing the number of non-marked training samples, L being generally much smaller than U, X i Show the ith trainingA sample image; if the image
Figure BDA0003371749730000044
Figure BDA0003371749730000045
Represents X i Corresponding class label vector if image X i There is a tag for the jth category,
Figure BDA0003371749730000046
otherwise
Figure BDA0003371749730000047
C represents a data set
Figure BDA0003371749730000048
The number of category labels of (1); image X i Possibly containing a plurality of category labels, i.e. y i There may be a plurality of components of 1; let N = L + U denote the total number of training sample images.
The specific implementation method of the step 3) is as follows:
giving a deep convolutional neural network model, replacing the last layer of the deep convolutional neural network model with two new fully-connected layers which are respectively used for image hash coding and image classification and are respectively called a hash coding layer hcl and an image classification layer cls; in the newly built network, a Hash coding layer hcl is in front of the network, and an image classification layer cls is behind the network, namely the image classification layer cls is at the last layer of the network; the number of the hash coding layer hcl neurons is the same as the number of bits of hash coding, and the number of the image classification layer cls neurons is the same as the number of class labels of the image data set.
The specific implementation method of the step 4) is as follows:
401 Constructing a classification loss function;
402 Construct a hash-coding learning function;
403 Constructing a Min-Max feature regular term with confidence coefficient;
404 Combining a classification loss function, a Hash coding learning function and a Min-Max characteristic regular term with confidence coefficient to construct a total target function;
405 Based on the overall objective function, a mini-batch based stochastic gradient descent method is used to train the deep convolutional neural network model.
The specific implementation method of step 401) is as follows:
for a single-label image dataset and a multi-label image dataset, different classification loss functions are respectively adopted:
for a single label image dataset: applying softmax activation function at classification level, and classification loss function
Figure BDA0003371749730000051
Comprises the following steps:
Figure BDA0003371749730000052
wherein the content of the first and second substances,
Figure BDA0003371749730000053
a set of images representing a training sample is shown,
Figure BDA0003371749730000054
representing the network model learned by the previous training as sample image X i The broken class label vector is kept unchanged in the training of the current round,
Figure BDA0003371749730000055
the jth component of
Figure BDA0003371749730000056
Is expressed as sample X i Inferred jth category label;
Figure BDA0003371749730000057
v i representing a sample image X i Of (c), i.e. v i Represented as sample image X i Inferred class label vector
Figure BDA0003371749730000058
The degree of certainty;
Figure BDA0003371749730000059
a set of parameters representing a network model layer,
Figure BDA00033717497300000510
representing a sample image X i Inputting an output vector of the currently trained network model at the last classification layer; i (cond) represents an indicator function, which has a value of 1 if the condition cond is true, and 0 otherwise;
for a multi-label image dataset: there is no activation function at the classification level, and the classification loss function used is:
Figure BDA00033717497300000511
Figure BDA00033717497300000512
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003371749730000061
representing the network model learned from the previous training round as sample image X i The broken class label vector is kept unchanged in the training of the current round,
Figure BDA0003371749730000062
the jth component of
Figure BDA0003371749730000063
Represented as sample image X i Inferred jth category label;
Figure BDA0003371749730000064
representing a sample image X i And inputting an output vector of the currently trained network model at the last classification layer.
The specific implementation method of step 402) is as follows:
definition image X i And X j The similarity between them is ω ij : if X is i And X j Are semantically similar, i.e. there is at least one common class label, ω ij =1; otherwise ω is ij =0; assume N training sample images
Figure BDA0003371749730000065
The corresponding hash code is B = [ B = [ ] 1 ,…,b N ]I.e. b i As an image X i Hash coding of (1); similarity Ω = { ω = between N training samples ij The likelihood function of is:
Figure BDA0003371749730000066
wherein the content of the first and second substances,
Figure BDA0003371749730000067
Figure BDA0003371749730000068
n is a successive multiplication symbol, v ij =v i ·v j Representing the similarity ω between samples ij The degree of confidence of (a) is,
Figure BDA0003371749730000069
representing a hash code b i Exp (-) represents an exponential function;
based on the above description, the proposed hash coding learning function
Figure BDA00033717497300000610
Comprises the following steps:
Figure BDA00033717497300000611
the specific implementation method of step 403) is as follows:
constructing a Min-Max feature regularization term with confidence coefficient, and constructing a Min-Max feature regularization term with confidence coefficient
Figure BDA0003371749730000071
Comprises the following steps:
Figure BDA0003371749730000072
wherein v is ij =v i ·v j ,x i Representing a sample image X i The feature vector of (2).
The overall objective function constructed in step 404)
Figure BDA0003371749730000073
Comprises the following steps:
Figure BDA0003371749730000074
in the above formula, the three items on the right side are respectively a classification loss function, a hash coding learning function and a Min-Max feature regular item with confidence coefficient, the classification loss function is applied to the classification layer, the hash coding learning function is applied to the hash coding layer, the Min-Max feature regular item with confidence coefficient is applied to the feature layer, lambda is a hyper-parameter for adjusting the balance between the three items on the right side,
Figure BDA0003371749730000075
a set of images representing a training sample is shown,
Figure BDA0003371749730000076
Figure BDA0003371749730000077
represented as sample image X i The vector of the class labels that are inferred,
Figure BDA0003371749730000078
a set of parameters representing a network model layer,
Figure BDA0003371749730000079
v i representing a sample image X i Confidence of, i.e. v i Represented as sample image X i Inferred class label vector
Figure BDA00033717497300000710
The degree of certainty; if it is not
Figure BDA00033717497300000711
Then during the entire training process,
Figure BDA00033717497300000712
are all equal to y i I.e. by
Figure BDA00033717497300000713
If it is used
Figure BDA00033717497300000714
Figure BDA00033717497300000715
Representing the network model learned from the previous round as a sample image X i The inferred category label vector.
The specific implementation method of the step 5) is as follows:
fixing the device
Figure BDA00033717497300000716
Updating
Figure BDA00033717497300000717
That is, based on the network model learned in the current round, corresponding class label vectors are deduced for all training samples:
for a single label image dataset: the classification loss function for the label-free sample set is:
Figure BDA00033717497300000718
because the classification losses of different samples are independent of each other, and v i Non-negative constants, so the above formula can be rewritten as U mutually independent sub-optimization problems:
Figure BDA0003371749730000081
the optimal solution of the above sub-optimization problem is:
Figure BDA0003371749730000082
based on the optimal solution
Figure BDA0003371749730000083
Thereby obtaining
Figure BDA0003371749730000084
For a multi-label image dataset: the classification loss function for the label-free sample set is:
Figure BDA0003371749730000085
Figure BDA0003371749730000086
because the classification losses of different samples are independent of each other, and v i Non-negative constants, so the above formula can be rewritten as U mutually independent sub-optimization problems:
Figure BDA0003371749730000087
the optimal solution of the above sub-optimization problem is:
Figure BDA0003371749730000088
based on the optimal solution
Figure BDA0003371749730000089
Thereby obtaining
Figure BDA00033717497300000810
The specific implementation method of the step 6) is as follows:
for a single label image dataset, in feature space, a sample image X is defined i To homogeneous samples d i Can be expressed as:
Figure BDA00033717497300000811
wherein | 2 Represents the modulo length of the vector if
Figure BDA00033717497300000812
Otherwise
Figure BDA00033717497300000813
For a multi-label image dataset, in feature space, a sample image X is defined i To similar samples i Can be expressed as:
Figure BDA0003371749730000091
wherein | 2 Representing the modular length, ω, of the vector ij Representation image X i And X j If X is similar to each other i And X j There is at least one common class label, then ω ij =1, otherwise ω ij =0;
Thus, image X i Confidence of r i The calculation formula of (2) is as follows:
Figure BDA0003371749730000092
wherein z is i =exp(-d i ),z max =max{z 1 ,z 2 ,…,z N }, exp (·) denotes an exponential function.
The beneficial effects of the invention are:
(1) The method expands and applies the traditional direct-push type semi-supervised learning method to the Hash coding method based on deep learning, and provides the image Hash coding method based on the direct-push type semi-supervised deep learning.
(2) The method introduces confidence into the label-free training sample, thereby greatly reducing the adverse effect of the uncertain sample on the training process and enabling the convergence process of the network model to be more stable.
(3) The method provides a hash code learning function, so that the hash codes of similar samples have smaller Hamming distance.
(4) The method of the invention provides a Min-Max characteristic regular term with confidence coefficient, so that in a characteristic space: the distance between similar samples is as small as possible and the distance between non-similar samples is as large as possible.
(5) The image hash coding method based on the direct-push semi-supervised deep learning provided by the invention does not depend on a network structure, and can be applied to any deep convolutional neural network. Meanwhile, the method can obviously reduce the labeling cost of the data.
Drawings
Fig. 1 is a deep convolutional network model of an image hash coding method based on direct-push semi-supervised deep learning according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides an image hash coding method based on direct-push type semi-supervised deep learning, which comprises the following steps as shown in figure 1:
1) Preparing an image data set, and dividing the data set into a training sample set and a testing sample set;
2) The training set is further divided into a marked training sample set and a non-marked training sample set;
the specific implementation method of the step 2) is as follows:
given training sample set
Figure BDA0003371749730000101
Wherein
Figure BDA0003371749730000102
And
Figure BDA0003371749730000103
respectively representing a marked training sample set and a non-marked training sample set, L representing the number of marked training samples, U representing the number of non-marked training samples, L being generally much smaller than U, X i Representing the ith training sample image. If the image is
Figure BDA0003371749730000104
Figure BDA0003371749730000105
Represents X i Corresponding class label vector if image X i Contains the label of the jth category and,
Figure BDA0003371749730000106
otherwise
Figure BDA0003371749730000107
C represents a data set
Figure BDA0003371749730000108
The number of category labels of (1). Image X i Possibly containing a plurality of category labels, i.e. y i There may be multiple components of 1. Let N = L + U denote the total number of training sample images.
The purpose of the depth image hash coding method is as follows: learning a Hamming space from an image space to a Hamming space { -1,1 }based on a deep convolutional neural network K Non-linear hashing ofMapping
Figure BDA0003371749730000109
This mapping can map image X into a K-bit hash code b = h (X), so that in hamming space the similarity between different image pairs can still be maintained, K being the number of bits of the image hash code; for convenience of description, image X is illustrated i The hash code of (b) i =h(X i )。
3) Building a deep convolutional neural network model of an image Hash coding method based on direct-push type semi-supervised deep learning;
the specific implementation method of the step 3) is as follows:
giving a deep convolutional neural network model, replacing the last layer of the deep convolutional neural network model with two new fully-connected layers which are respectively used for image hash coding and image classification and are respectively called a hash coding layer hcl and an image classification layer cls; in the newly built network, a Hash coding layer hcl is in front of the network, and an image classification layer cls is behind the network, namely the image classification layer cls is at the last layer of the network; the number of the hash coding layer hcl neurons is the same as the number of bits of hash coding, and the number of the image classification layer cls neurons is the same as the number of class labels of the image data set.
For a more intuitive description, a deep convolutional neural network model of the image hash coding method based on the direct-push semi-supervised deep learning is illustrated below by taking a deep convolutional neural network as an example. Referring to fig. 1, given a deep convolutional neural network, the network is composed of several convolutions, feature layers and classification layers, wherein the feature layer is the penultimate layer of the network and the classification layer is the last layer of the network. As described above, the classification layer is replaced with two new fully-connected layers, namely the hash coding layer hcl and the classification layer cls, and the rest of the network structure is unchanged. The number of neurons in the hash encoding layer hcl and the classification layer cls are K and C, respectively. In the network training process, a Hash coding learning function is applied to a Hash coding layer, and a classification loss function is applied to a classification layer. The following describes more detailed details of the image hash coding method based on the direct-push semi-supervised deep learning.
4) Setting the confidence degrees of all the labeled samples to be 1, setting the confidence degrees of all the unlabeled samples to be 0, randomly initializing the class label vectors of the unlabeled samples, and then training the network model built in the step 3) on the whole training set until the network model is converged;
the specific implementation method of the step 4) is as follows:
401 Constructing a classification loss function;
the specific implementation method of step 401) is as follows:
the method provided by the invention can process a single-label image data set and can also process a multi-label image data set. The classification loss function is applied at the classification level. For a single-label image dataset and a multi-label image dataset, the invention respectively adopts different classification loss functions:
for a single label image dataset: applying softmax activation function at classification level, and classification loss function
Figure BDA0003371749730000121
Comprises the following steps:
Figure BDA0003371749730000122
wherein the content of the first and second substances,
Figure BDA0003371749730000123
a set of images representing a training sample is shown,
Figure BDA0003371749730000124
representing the network model learned by the previous training as sample image X i The broken class label vector is kept unchanged in the training of the current round,
Figure BDA0003371749730000125
the jth component of
Figure BDA0003371749730000126
Is expressed as sample X i Is pushed byBroken jth category label.
Figure BDA0003371749730000127
v i Representing a sample image X i Confidence of, i.e. v i Represented as sample image X i Inferred category label vector
Figure BDA0003371749730000128
The degree of certainty.
Figure BDA0003371749730000129
A set of parameters representing a network model layer,
Figure BDA00033717497300001210
representing a sample image X i And inputting the output vector of the currently trained network model at the last classification layer. I (cond) represents an indicator function, which has a value of 1 if the condition cond is true, and 0 otherwise;
for a multi-label image dataset: there is no activation function at the classification level, and the classification loss function used is:
Figure BDA00033717497300001211
Figure BDA00033717497300001212
wherein the content of the first and second substances,
Figure BDA00033717497300001213
representing the network model learned by the previous training as sample image X i The broken class label vector is kept unchanged in the training of the current round,
Figure BDA00033717497300001214
the jth component of (1)
Figure BDA00033717497300001215
Represented as sample image X i Inferred jth category label;
Figure BDA0003371749730000131
representing a sample image X i And inputting the output vector of the currently trained network model at the last classification layer.
402 Construct a hash-coding learning function;
the specific implementation method of step 402) is as follows:
definition image X i And X j The similarity between them is ω ij : if X is i And X j Are semantically similar (i.e., there is at least one common class label), ω ij =1; otherwise ω is ij =0; assume N training sample images
Figure BDA0003371749730000132
The corresponding hash code is B = [ B = [ ] 1 ,…,b N ]I.e. b i Is an image X i Hash coding of (1); similarity Ω = { ω = between N training samples ij The likelihood function of is:
Figure BDA0003371749730000133
wherein the content of the first and second substances,
Figure BDA0003371749730000134
Figure BDA0003371749730000135
n is a successive multiplication symbol, v ij =v i ·v j Representing the similarity ω between samples ij The degree of confidence of (a) is,
Figure BDA0003371749730000136
representing a hash code b i Exp (-) represents an exponential function;
based on the above description, the hash coding learning function proposed by the present invention
Figure BDA0003371749730000137
Comprises the following steps:
Figure BDA0003371749730000138
403 Constructing a Min-Max feature regular term with confidence coefficient;
the specific implementation method of step 403) is as follows:
in order to learn better image hash coding, the invention provides a Min-Max characteristic regular term with confidence coefficient, wherein the Min-Max characteristic regular term with the confidence coefficient is applied to a characteristic layer during network training, and the regular term explicitly enables the characteristics learned by the network to have the following attributes: if there is at least one common class label for both images, the distance between their feature vectors should be as small as possible, otherwise the distance between their feature vectors should be as large as possible. Constructing a Min-Max feature regularization term with confidence coefficient, the Min-Max feature regularization term with confidence coefficient provided by the invention
Figure BDA0003371749730000141
Comprises the following steps:
Figure BDA0003371749730000142
wherein v is ij =v i ·v j ,x i Representing a sample image X i The feature vector (output of the network feature layer).
404 Combining a classification loss function, a Hash coding learning function and a Min-Max characteristic regular term with confidence coefficient to construct a total target function;
the overall objective function constructed in step 404)
Figure BDA0003371749730000143
Comprises the following steps:
Figure BDA0003371749730000144
in the above formula, the three items on the right side are respectively a classification loss function, a hash coding learning function and a Min-Max feature regular item with confidence coefficient, the classification loss function is applied to the classification layer, the hash coding learning function is applied to the hash coding layer, the Min-Max feature regular item with confidence coefficient is applied to the feature layer, lambda is a hyper-parameter for adjusting the balance between the three items on the right side,
Figure BDA0003371749730000145
a set of images representing a training sample is shown,
Figure BDA0003371749730000146
Figure BDA0003371749730000147
represented as sample image X i The vector of the class labels that are inferred,
Figure BDA0003371749730000148
a set of parameters representing a network model layer,
Figure BDA0003371749730000149
v i representing a sample image X i Confidence of, i.e. v i Represented as sample image X i Inferred class label vector
Figure BDA00033717497300001410
The degree of certainty; if it is not
Figure BDA00033717497300001411
Then during the entire training process,
Figure BDA00033717497300001412
are all equal to y i I.e. by
Figure BDA00033717497300001413
If it is not
Figure BDA00033717497300001414
Figure BDA00033717497300001415
Representing the network model learned from the previous round as a sample image X i The inferred category label vector.
405 Based on the overall objective function, a mini-batch based stochastic gradient descent method is used to train the deep convolutional neural network model.
5) Deducing corresponding class label vectors for all training samples based on the layer parameters of the currently learned network model;
the specific implementation method of the step 5) is as follows:
fixing
Figure BDA0003371749730000151
Updating
Figure BDA0003371749730000152
That is, based on the network model learned in the current round, corresponding class label vectors are deduced for all training samples: in fact, it is only necessary to infer the corresponding class label vectors for all unlabeled training sample images.
For a single label image dataset: the classification loss function for the unlabeled sample set is:
Figure BDA0003371749730000153
since the classification penalties of different samples are independent of each other, and v i Non-negative constants, so the above formula can be rewritten as U mutually independent sub-optimization problems:
Figure BDA0003371749730000154
the optimal solution of the above sub-optimization problem is:
Figure BDA0003371749730000155
based on the optimal solution
Figure BDA0003371749730000156
Thereby obtaining
Figure BDA0003371749730000157
For a multi-label image dataset: the classification loss function for the label-free sample set is:
Figure BDA0003371749730000158
Figure BDA0003371749730000159
because the classification losses of different samples are independent of each other, and v i Non-negative constants, so the above formula can be rewritten as U mutually independent sub-optimization problems:
Figure BDA00033717497300001510
the optimal solution of the above sub-optimization problem is:
Figure BDA0003371749730000161
based on the optimal solution
Figure BDA0003371749730000162
Thereby obtaining
Figure BDA0003371749730000163
6) Calculating a confidence corresponding to each training sample based on the layer parameters of the currently learned network model and the inferred class label vector of the training sample;
the specific implementation method of the step 6) is as follows:
if the image is
Figure BDA0003371749730000164
Then its confidence v is always set throughout the training process i =1; if it is not
Figure BDA0003371749730000165
The present invention calculates r based on two intuitive assumptions i : (1) In the feature space, if the average distance from one sample to other similar samples or similar samples is smaller, the sample is closer to the class center of the corresponding class, and higher confidence is given; (2) In the feature space, if the average distance from one sample to other similar samples or similar samples is larger, the sample is farther away from the center of the corresponding class, and a lower confidence degree is given; defining a sample image X i Is x (the output of the network feature layer) i
For a single label image dataset, in feature space, a sample image X is defined i To homogeneous samples d i Can be expressed as:
Figure BDA0003371749730000166
wherein | 2 Represents the modulo length of the vector if
Figure BDA0003371749730000167
Otherwise
Figure BDA0003371749730000168
For a multi-label image dataset, in feature space, a sample image X is defined i To similar samples i Can be expressed as:
Figure BDA0003371749730000169
wherein | 2 The length of the modulus, ω, representing the vector ij Representation image X i And X j If X is similar to each other i And X j There is at least one common class label, then ω ij =1, otherwise ω ij =0;
Thus, based on the above analysis, whether a single-label image dataset or a multi-label image dataset, d i The smaller, X i The easier it is to be assigned the correct tag vector, the higher confidence should be assigned; thus, the image X proposed by the present invention i Confidence of r i The calculation formula of (c) is:
Figure BDA0003371749730000171
wherein z is i =exp(-d i ),z max =max{z 1 ,z 2 ,…,z N }, exp (·) denotes an exponential function.
7) Training the network model set up in the step 3) from random initialization by using the whole training sample set based on the class label vectors and the confidence degrees of all the current training samples until the network model converges; wherein, a total objective function is adopted, and a mini-batch-based stochastic gradient descent method is used for training a deep convolution neural network model.
8) Repeatedly executing the steps 5), 6) and 7) until the current round training number reaches the preset maximum round training number; according to experiments, the preset maximum number of training rounds is generally set to be 4.
9) And calculating the Hash codes of the images in the test sample set by using the trained network model.
The experimental results are as follows:
the invention respectively performs experiments on three common image data sets of CIFAR10, NUS-WIDE and MIR-Flickr 25K.
The CIFAR10 dataset is a single label image dataset, with 60000 images in total, 10 classes. According to the common semi-supervised learning setting of the data set, 1000 images (100 images are randomly selected for each class) are used as a query set, and the rest 59000 images are used as a retrieval database; and taking all the images in the retrieval database as a training sample set, wherein 5000 images are taken as labeled training sample sets, and the rest 54000 images are taken as unlabeled training sample sets.
NUS-WIDE is a multi-label image dataset, with a total of about 270000 images. According to the common semi-supervised learning setting of the data set, 2100 images are used as a query set, 10500 images are used as a labeled training sample set, 149733 images are used as a label-free training sample set, and the images of the whole training set are used as a retrieval database.
The MIR-Flickr25K dataset is a multi-label image dataset, with a total of 25000 images. According to the common semi-supervised learning setting of the data set, 1000 images serve as a query set, 5000 images serve as a labeled training sample set, 19000 images serve as an unlabeled training sample set, and the images of the whole training set serve as a retrieval database.
Currently, very representative Hash coding methods based on semi-supervised learning are DSH-GANs, SSDH and BGDH. For fair comparison, the present invention was experimented with the deep convolutional neural network model used by these methods.
In experiments on these three data sets, the parameter λ in the overall objective function of the present invention was set to 0.01. The maximum number of rounds of training is set to 4.
The experimental test method comprises the following steps: after the network model is trained, respectively calculating a query set corresponding to the data set and hash codes of images in a retrieval database, and then calculating retrieval performance based on the hash codes of the images, wherein the evaluation index adopts MAP scores commonly used in academia (the larger the MAP value is, the better the description method is).
The results of the different methods on the corresponding data sets are given in tables 1, 2 and 3, respectively. The experimental results show that the method of the invention is obviously superior to other comparison methods in the table, and the superiority of the method of the invention is fully proved.
In addition, the method of the invention also performs ablation experiments on the three data sets to verify the confidence coefficient, and the experimental results are shown in tables 4, 5 and 6. The experimental results fully verify that confidence is introduced into the label-free training samples, so that the adverse effect of uncertain samples on the training process can be greatly reduced, and the convergence process of the network model is more stable.
TABLE 1 MAP scores on CIFAR10 data set by different methods
Figure BDA0003371749730000191
TABLE 2 MAP scores on NUS-WIDE data set by different methods
Figure BDA0003371749730000192
TABLE 3 MAP scores on MIR-Flickr25K data set by different methods
Figure BDA0003371749730000193
TABLE 4 Effect of whether confidence is used on MAP score on CIFAR10 dataset
Figure BDA0003371749730000201
TABLE 5 Effect of whether confidence is used on MAP scores on NUS-WIDE datasets
Figure BDA0003371749730000202
TABLE 6 influence of whether confidence is used on MAP scores on the MIR-Flickr25K dataset
Figure BDA0003371749730000203

Claims (6)

1. The image hash coding method based on the direct-push semi-supervised deep learning is characterized by comprising the following steps of:
1) Preparing an image data set, and dividing the data set into a training sample set and a testing sample set;
2) The training set is further divided into a marked training sample set and a non-marked training sample set;
the specific implementation method of the step 2) is as follows:
given training sample set
Figure FDA0004008896680000011
Wherein
Figure FDA0004008896680000012
And
Figure FDA0004008896680000013
respectively representing a marked training sample set and a non-marked training sample set, L representing the number of marked training samples, U representing the number of non-marked training samples, L being much smaller than U and X i Representing the ith training sample image; if the image
Figure FDA0004008896680000014
Represents X i Corresponding class label vector if image X i Contains the label of the jth category and,
Figure FDA0004008896680000015
otherwise
Figure FDA0004008896680000016
C represents a data set
Figure FDA0004008896680000017
The number of category labels of (1); image X i Possibly containing a plurality of category labels, i.e. y i There may be a plurality of components of 1; order toN = L + U represents the total number of training sample images;
3) Building a deep convolutional neural network model of an image Hash coding method based on direct-push type semi-supervised deep learning;
the specific implementation method of the step 3) is as follows:
a deep convolutional neural network model is given, the last layer of the deep convolutional neural network model is replaced by two new full connection layers which are respectively used for image hash coding and image classification and are respectively called as a hash coding layer hcl and an image classification layer cls; in the newly constructed network, the Hash coding layer hcl is in front, and the image classification layer cls is behind, namely the image classification layer cls is at the last layer of the network; the number of the hash coding layer hcl neurons is the same as the number of bits of hash coding, and the number of the image classification layer cls neurons is the same as the number of class labels of the image data set;
4) Setting the confidence degrees of all the labeled samples to be 1, setting the confidence degrees of all the unlabeled samples to be 0, randomly initializing the class label vectors of the unlabeled samples, and then training the network model built in the step 3) on the whole training set until the network model is converged;
the specific implementation method of the step 4) is as follows:
401 Constructing a classification loss function;
the specific implementation method of step 401) is as follows:
for a single-label image dataset and a multi-label image dataset, different classification loss functions are respectively adopted:
for a single label image dataset: applying softmax activation function at classification level, and classification loss function
Figure FDA0004008896680000021
Comprises the following steps:
Figure FDA0004008896680000022
wherein the content of the first and second substances,
Figure FDA0004008896680000023
a set of images representing a training sample is shown,
Figure FDA0004008896680000024
representing the network model learned from the previous training round as sample image X i The broken class label vector is kept unchanged in the training of the current round,
Figure FDA0004008896680000025
the jth component of
Figure FDA0004008896680000026
Is expressed as sample X i Inferred jth category label;
Figure FDA0004008896680000027
v i representing a sample image X i Confidence of, i.e. v i Represented as sample image X i Inferred class label vector
Figure FDA0004008896680000028
The degree of certainty;
Figure FDA0004008896680000029
a set of parameters representing a network model layer,
Figure FDA00040088966800000210
representing a sample image X i Inputting an output vector of the currently trained network model at the last classification layer; i (cond) represents an indicator function, which has a value of 1 if the condition cond is true, and 0 otherwise;
for a multi-label image dataset: there is no activation function at the classification level, and the classification loss function used is:
Figure FDA00040088966800000211
Figure FDA0004008896680000031
wherein the content of the first and second substances,
Figure FDA0004008896680000032
representing the network model learned by the previous training as sample image X i The broken class label vector is kept unchanged in the training of the current round,
Figure FDA0004008896680000033
the jth component of
Figure FDA0004008896680000034
Represented as sample image X i Inferred jth category label;
Figure FDA0004008896680000035
representing a sample image X i Inputting an output vector of the currently trained network model at the last classification layer;
402 Constructing a hash coding learning function;
403 Constructing a Min-Max feature regular term with confidence coefficient;
404 Combining a classification loss function, a Hash coding learning function and a Min-Max characteristic regular term with confidence coefficient to construct a total target function;
405 Based on the total objective function, training a deep convolutional neural network model by using a mini-batch-based stochastic gradient descent method;
5) Deducing corresponding class label vectors for all training samples based on the layer parameters of the currently learned network model;
6) Calculating a confidence corresponding to each training sample based on the layer parameters of the currently learned network model and the inferred class label vector of the training sample;
7) Training the network model set up in the step 3) from random initialization by using the whole training sample set based on the class label vectors and the confidence degrees of all the current training samples until the network model converges;
8) Repeatedly executing the steps 5), 6) and 7) until the current round training number reaches the preset maximum round training number;
9) And calculating the Hash codes of the images in the test sample set by using the trained network model.
2. The image hash coding method based on the direct-push semi-supervised deep learning of claim 1, wherein the specific implementation method of step 402) is as follows:
definition image X i And X j The similarity between them is ω ij : if X i And X j Are semantically similar, i.e. there is at least one common class label, ω ij =1; otherwise ω is ij =0; assume N training sample images
Figure FDA0004008896680000041
The corresponding hash code is B = [ B = [ ] 1 ,…,b N ]I.e. b i As an image X i The hash coding of (2); similarity Ω = { ω = between N training samples ij The likelihood function of is:
Figure FDA0004008896680000042
wherein the content of the first and second substances,
Figure FDA0004008896680000043
Figure FDA0004008896680000044
n is a successive multiplication symbol, v ij =v i ·v j Representing the similarity ω between samples ij The degree of confidence of (a) is,
Figure FDA0004008896680000045
representing a hash code b i Exp (-) represents an exponential function;
based on the above description, the proposed hash coding learning function
Figure FDA0004008896680000046
Comprises the following steps:
Figure FDA0004008896680000047
3. the image hash coding method based on the direct-push semi-supervised deep learning of claim 1, wherein the specific implementation method of step 403) is as follows:
constructing a Min-Max feature regular term with confidence coefficient, and constructing a Min-Max feature regular term with confidence coefficient
Figure FDA0004008896680000048
Comprises the following steps:
Figure FDA0004008896680000049
wherein v is ij =v i ·v j ,x i Representing a sample image X i The feature vector of (2).
4. The image hash coding method based on direct-push semi-supervised deep learning of claim 1, wherein the overall objective function constructed in step 404) is
Figure FDA0004008896680000051
Comprises the following steps:
Figure FDA0004008896680000052
in the above formula, the three items on the right side are respectively a classification loss function, a hash coding learning function and a Min-Max feature regular item with confidence coefficient, the classification loss function is applied to the classification layer, the hash coding learning function is applied to the hash coding layer, the Min-Max feature regular item with confidence coefficient is applied to the feature layer, lambda is a hyper-parameter for adjusting the balance between the three items on the right side,
Figure FDA0004008896680000053
a set of images representing a training sample is shown,
Figure FDA0004008896680000054
Figure FDA0004008896680000055
represented as sample image X i The vector of the class labels that are inferred,
Figure FDA0004008896680000056
a set of parameters representing a network model layer,
Figure FDA0004008896680000057
v i representing a sample image X i Confidence of, i.e. v i Represented as sample image X i Inferred class label vector
Figure FDA0004008896680000058
The degree of certainty; if it is not
Figure FDA0004008896680000059
Then during the entire training process,
Figure FDA00040088966800000510
are all equal toy i I.e. by
Figure FDA00040088966800000511
If it is not
Figure FDA00040088966800000512
Figure FDA00040088966800000513
Representing the network model learned from the previous round as a sample image X i The inferred category label vector.
5. The image hash coding method based on the direct-push semi-supervised deep learning of claim 1, wherein the specific implementation method of step 5) is as follows:
fixing
Figure FDA00040088966800000514
Updating
Figure FDA00040088966800000515
That is, based on the network model learned in the current round, corresponding class label vectors are deduced for all training samples:
for a single label image dataset: the classification loss function for the label-free sample set is:
Figure FDA00040088966800000516
because the classification losses of different samples are independent of each other, and v i Non-negative constants, so the above formula can be rewritten as U mutually independent sub-optimization problems:
Figure FDA00040088966800000517
the optimal solution of the above sub-optimization problem is:
Figure FDA0004008896680000061
based on the optimal solution
Figure FDA0004008896680000062
Thereby obtaining
Figure FDA0004008896680000063
For a multi-label image dataset: the classification loss function for the label-free sample set is:
Figure FDA0004008896680000064
Figure FDA0004008896680000065
since the classification penalties of different samples are independent of each other, and v i Non-negative constants, so the above formula can be rewritten as U mutually independent sub-optimization problems:
Figure FDA0004008896680000066
the optimal solution of the above sub-optimization problem is:
Figure FDA0004008896680000067
based on the optimal solution to obtain
Figure FDA0004008896680000068
Thereby obtaining
Figure FDA0004008896680000069
6. The image hash coding method based on direct-push semi-supervised deep learning according to claim 5, wherein the specific implementation method of step 6) is as follows:
for a single label image dataset, in feature space, a sample image X is defined i To the same kind of sample i Can be expressed as:
Figure FDA00040088966800000610
wherein | 2 Represents the modulo length of the vector if
Figure FDA00040088966800000611
Otherwise
Figure FDA00040088966800000612
For a multi-label image dataset, in feature space, a sample image X is defined i To similar samples i Can be expressed as:
Figure FDA0004008896680000071
wherein | 2 The length of the modulus, ω, representing the vector ij Representation image X i And X j If X is similar to each other i And X j There is at least one common class label, then ω ij =1, otherwise ω ij =0;
Thus, image X i Confidence of r i The calculation formula of (2) is as follows:
Figure FDA0004008896680000072
wherein z is i =exp(-d i ),z max =max{z 1 ,z 2 ,…,z N }, exp (·) denotes an exponential function.
CN202111427674.6A 2021-11-24 2021-11-24 Image hash coding method based on direct-push type semi-supervised deep learning Active CN114170333B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111427674.6A CN114170333B (en) 2021-11-24 2021-11-24 Image hash coding method based on direct-push type semi-supervised deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111427674.6A CN114170333B (en) 2021-11-24 2021-11-24 Image hash coding method based on direct-push type semi-supervised deep learning

Publications (2)

Publication Number Publication Date
CN114170333A CN114170333A (en) 2022-03-11
CN114170333B true CN114170333B (en) 2023-02-03

Family

ID=80481230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111427674.6A Active CN114170333B (en) 2021-11-24 2021-11-24 Image hash coding method based on direct-push type semi-supervised deep learning

Country Status (1)

Country Link
CN (1) CN114170333B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114379416B (en) * 2022-03-23 2022-06-17 蔚来汽车科技(安徽)有限公司 Method and system for controlling battery replacement operation based on vehicle chassis detection
CN115294396B (en) * 2022-08-12 2024-04-23 北京百度网讯科技有限公司 Backbone network training method and image classification method
CN115905926A (en) * 2022-12-09 2023-04-04 华中科技大学 Code classification deep learning model interpretation method and system based on sample difference

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165306A (en) * 2018-08-09 2019-01-08 长沙理工大学 Image search method based on the study of multitask Hash
CN109783682A (en) * 2019-01-19 2019-05-21 北京工业大学 It is a kind of based on putting non-to the depth of similarity loose hashing image search method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034205B (en) * 2018-06-29 2021-02-02 西安交通大学 Image classification method based on direct-push type semi-supervised deep learning
CN109918528A (en) * 2019-01-14 2019-06-21 北京工商大学 A kind of compact Hash code learning method based on semanteme protection
CN109960737B (en) * 2019-03-15 2020-12-08 西安电子科技大学 Remote sensing image content retrieval method for semi-supervised depth confrontation self-coding Hash learning
CN112861976B (en) * 2021-02-11 2024-01-12 温州大学 Sensitive image identification method based on twin graph convolution hash network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165306A (en) * 2018-08-09 2019-01-08 长沙理工大学 Image search method based on the study of multitask Hash
CN109783682A (en) * 2019-01-19 2019-05-21 北京工业大学 It is a kind of based on putting non-to the depth of similarity loose hashing image search method

Also Published As

Publication number Publication date
CN114170333A (en) 2022-03-11

Similar Documents

Publication Publication Date Title
CN114170333B (en) Image hash coding method based on direct-push type semi-supervised deep learning
CN111523047B (en) Multi-relation collaborative filtering algorithm based on graph neural network
CN108710894B (en) Active learning labeling method and device based on clustering representative points
CN107944410B (en) Cross-domain facial feature analysis method based on convolutional neural network
CN109063113B (en) Rapid image retrieval method, retrieval model and model construction method based on asymmetric depth discrete hash
CN110941734B (en) Depth unsupervised image retrieval method based on sparse graph structure
CN112699247A (en) Knowledge representation learning framework based on multi-class cross entropy contrast completion coding
CN108399185B (en) Multi-label image binary vector generation method and image semantic similarity query method
WO2021227091A1 (en) Multi-modal classification method based on graph convolutional neural network
CN110598022B (en) Image retrieval system and method based on robust deep hash network
CN110264372B (en) Topic community discovery method based on node representation
Li et al. DAHP: Deep attention-guided hashing with pairwise labels
CN112000689A (en) Multi-knowledge graph fusion method based on text analysis
CN116258990A (en) Cross-modal affinity-based small sample reference video target segmentation method
CN115048539A (en) Social media data online retrieval method and system based on dynamic memory
CN115828143A (en) Node classification method for realizing heterogeneous primitive path aggregation based on graph convolution and self-attention mechanism
CN114860973A (en) Depth image retrieval method for small sample scene
CN108647295B (en) Image labeling method based on depth collaborative hash
CN116383422B (en) Non-supervision cross-modal hash retrieval method based on anchor points
CN117173702A (en) Multi-view multi-mark learning method based on depth feature map fusion
Mudiyanselage et al. Feature selection with graph mining technology
Ahmed et al. Clustering research papers using genetic algorithm optimized self-organizing maps
CN114882279A (en) Multi-label image classification method based on direct-push type semi-supervised deep learning
CN114299336A (en) Photographic image aesthetic style classification method based on self-supervision learning and deep forest
CN114564594A (en) Knowledge graph user preference entity recall method based on double-tower model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant