CN114168782B - Deep hash image retrieval method based on triplet network - Google Patents
Deep hash image retrieval method based on triplet network
- Publication number
- CN114168782B (application CN202111448102.6A)
- Authority
- CN
- China
- Prior art keywords: image, images, hash, triplet, query
- Prior art date: 2021-11-30
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/583—Retrieval characterised by using metadata automatically derived from the content
- G06F16/51—Indexing; Data structures therefor; Storage structures
- G06F16/53—Querying
- G06F16/55—Clustering; Classification
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a deep hash image retrieval method based on a triplet network. The method adopts a triplet network as the CNN for learning image features and optimizes a triplet loss function according to the feature-learning results, so that the distance between sample image pairs of the same class is smaller than the distance between pairs of different classes, thereby optimizing the network model. The query image and the test-set images are input into the triplet network to obtain, for each image, a binary hash code and the fc7-layer deep feature vector. A coarse search is first performed with the binary hash codes: a candidate set of similar images is selected by comparing Hamming distances against a set threshold. A fine search is then performed with the fc7-layer deep features: the Euclidean distances between the fc7 feature vectors of the query image and the candidate images are computed, and the final n retrieval results are returned in ascending order of Euclidean distance. The invention solves the problem of low image retrieval precision in the prior art.
Description
Technical Field
The invention belongs to the technical field of computer image retrieval methods, and relates to a deep hash image retrieval method based on a triplet network.
Background
In recent years, hashing has been widely used in large-scale image retrieval: the high-dimensional features of an image are converted into binary hash codes in a low-dimensional space, and fast retrieval is performed by computing Hamming distances. This approach offers low storage cost and high computational efficiency. Traditional hashing methods first extract high-dimensional features, learn a hash function from those features, and then map the high-dimensional image features into low-dimensional hash codes. With the development of CNNs, deep hashing algorithms have emerged that combine CNN-extracted deep features with hash learning, so that hash codes representing deeper image information can be obtained directly. However, most existing deep hashing methods make insufficient use of supervision information and cannot obtain sufficiently discriminative hash codes, so the trained model does not reach the expected performance, which directly affects retrieval accuracy.
Providing an image retrieval method that improves the retrieval accuracy of hashing techniques therefore has substantial value.
Disclosure of Invention
The invention aims to provide a deep hash image retrieval method based on a triplet network, which solves the problem of low image retrieval precision in the prior art.
The technical scheme adopted by the invention is that the deep hash image retrieval method based on the triplet network is implemented according to the following steps:
step 1, adopting three parallel AlexNet models as the CNN network model, adding a hidden layer, namely a hash layer, after the last fully connected layer fc7 of each AlexNet model, with weight parameters shared among the three AlexNet models;
step 2, selecting the CIFAR-10 image dataset and dividing it into a training set and a test set, the training-set images serving as input to the CNN network model constructed in step 1; each input is a triplet comprising an anchor image, a positive sample image of the same class as the anchor, and a negative sample image of a different class from the anchor;
step 3, inputting the anchor, positive, and negative images of each triplet into their respective AlexNet models, each image yielding an fc7-layer deep feature and, from the hash layer, a preliminary hash code representing the image features;
step 4, introducing a triplet loss function into the CNN network model and minimizing it over the preliminary hash codes generated in step 3, so that the distance between the anchor and positive images in feature space is smaller than the distance between the anchor and negative images, thereby optimizing the CNN network model;
and step 5, inputting the test-set images and the query image, treated as anchor images, into the anchor branch of the CNN network model optimized in step 4 to obtain the corresponding hash codes and fc7-layer deep features; computing the Hamming distance between the binary hash codes of the query image and each test-set image, regarding images whose Hamming distance is below a set threshold as similar, and taking the m most similar test-set images as the candidate set; then computing the Euclidean distance between the deep feature vectors of the query image and the candidate images, and returning the final retrieval result, namely the n images most similar to the query, in ascending order of Euclidean distance.
The present invention is also characterized in that,
the CIFAR-10 image data set is 60000, the training set is classified into 10 types, 5000 types are classified into each type, and the test set is classified into 10 types, and 1000 types are classified into each type.
In step 4, requiring the distance between the anchor image and the positive sample image in feature space to be smaller than the distance between the anchor image and the negative sample image means:
$$S(x_i, x_i^+) < S(x_i, x_i^-) \quad (2)$$
which is equivalent to
$$S^2(x_i, x_i^+) < S^2(x_i, x_i^-) \quad (3)$$
where $x_i$ is the anchor image, $x_i^+$ is a positive sample image, i.e. a sample image of the same class as $x_i$, $x_i^-$ is a negative sample image, i.e. a sample image of a different class from $x_i$, and $S(x, x')$ denotes the similarity of two images $x$ and $x'$; thus $S(x_i, x_i^+)$ is the similarity of the anchor and positive images and $S(x_i, x_i^-)$ the similarity of the anchor and negative images. Accordingly, the triplet loss function is defined as:
$$loss(x_i, x_i^+, x_i^-) = \max\{0,\ g + S^2(x_i, x_i^+) - S^2(x_i, x_i^-)\} \quad (4)$$
where $loss(x_i, x_i^+, x_i^-)$ is the triplet loss and $g$ is a threshold parameter measuring the separation between same-class and different-class image pairs.
$g = -1$.
The optimization of the CNN network model, namely the optimization of the triplet loss function, proceeds as follows:
Denote the training set $X = \{X_i\}$ with $X_i = (x_i, x_i^+, x_i^-)$. The optimization of the triplet loss is then translated into the minimization of the following objective function $L$:
$$L(W) = \sum_{i=1}^{n} loss(x_i, x_i^+, x_i^-) + \lambda \|W\|_2^2 \quad (5)$$
where $W$ denotes the neural network parameters, $\lambda$ is a regularization coefficient introduced to improve the generalization ability of the network model, and $n$ is the number of triplets in the training set;
a function $d(W, X_i)$ is introduced to represent the difference between the anchor-positive and anchor-negative pair distances within a triplet, i.e.
$$d(W, X_i) = S^2(x_i, x_i^+) - S^2(x_i, x_i^-) \quad (6)$$
The objective function may be further written as:
$$L(W) = \sum_{i=1}^{n} \max\{0,\ g + d(W, X_i)\} + \lambda \|W\|_2^2 \quad (7)$$
To minimize the objective function, its partial derivative is taken:
$$\frac{\partial L}{\partial W} = \sum_{i:\ g + d(W, X_i) > 0} \frac{\partial d(W, X_i)}{\partial W} + 2\lambda W \quad (8)$$
and from the definition of $d(W, X_i)$, its gradient is:
$$\frac{\partial d(W, X_i)}{\partial W} = 2\Big[\big(F(x_i^-) - F(x_i^+)\big)^{\top}\frac{\partial F(x_i)}{\partial W} + \big(F(x_i^+) - F(x_i)\big)^{\top}\frac{\partial F(x_i^+)}{\partial W} + \big(F(x_i) - F(x_i^-)\big)^{\top}\frac{\partial F(x_i^-)}{\partial W}\Big] \quad (9)$$
It follows that by inputting the triplet images into the network and performing forward and backward computation, the gradient of the triplet loss function is obtained; a gradient descent algorithm then minimizes the objective function, thereby optimizing the triplet loss and obtaining the optimal CNN network model.
In step 5, the Hamming distance between the binary hash codes of the query image and the test-set images is computed as follows:
Given an image $I$, it is input into the CNN network model optimized in step 4, and the output of the hidden layer, i.e. the hash layer, is extracted as the image feature, denoted $Out(H)$. Binarization with a set threshold $t$ yields the corresponding binary hash code: for each bit $j = 1, \ldots, h$, where $h$ is the number of nodes of the hidden layer, the output binary code is
$$H^j = \begin{cases} 1, & Out^j(H) \ge t \\ 0, & \text{otherwise} \end{cases} \quad (10)$$
Let $\Gamma = \{I_1, I_2, \ldots, I_n\}$ be the test set of images to be searched, with $I_n$ denoting the $n$-th test-set image; the corresponding binary codes are $\Gamma_H = \{H_1, H_2, \ldots, H_n\}$, where $H_i \in \{0,1\}^h$;
For the query image $I_q$ and its corresponding binary code $H_q$, the Hamming distance between $H_q$ and each $H_i \in \Gamma_H$ is computed; if the Hamming distance is smaller than the threshold, the images are regarded as similar, and according to the threshold setting the $m$ most similar images are finally obtained, giving the corresponding candidate image set
$$P = \{I_1^P, I_2^P, \ldots, I_m^P\} \quad (11)$$
In step 5, the Euclidean distance between the deep feature vectors of the query image and the candidate-set images is computed as follows:
For the query image $I_q$ and candidate image set $P$, let $V_q$ denote the feature vector of the query image and $V_i^P$ the feature vector of the $i$-th image in the candidate set; the Euclidean distance between their corresponding feature vectors is
$$s_i = \|V_q - V_i^P\| \quad (12)$$
The smaller the Euclidean distance, the more similar the two images; the $n$ most similar result images are finally obtained from the candidate set in ascending order of Euclidean distance, where $n$ is smaller than $m$.
The beneficial effects of the invention are as follows:
the invention selects the triplet network sharing weight to learn the image characteristics, can fully utilize the supervision information of the images to obtain the depth characteristics of the representative images, and further optimizes the network model by utilizing the triplet loss function, so that the distance between images of the same category is smaller, the distance between images of different categories is larger, and the binary hash code with more differentiation and higher query precision is obtained. When searching the image, firstly, rough searching is carried out by utilizing hash coding to obtain a candidate image set, then fine searching is carried out by utilizing fc7 layer depth characteristics of the image, and finally, n most similar images are obtained in the candidate image set, thereby effectively improving the searching precision.
Drawings
FIG. 1 is a schematic diagram of image feature learning in a deep hash image retrieval method based on a triplet network;
FIG. 2 is a flow chart of a deep hash image retrieval method based on a triplet network in accordance with the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
The invention discloses a deep hash image retrieval method based on a triplet network, and the flow of the method is shown in figure 2. The method is implemented according to the following steps:
step 1, as shown in fig. 1, three parallel AlexNet models are adopted as the CNN network model, and a hidden layer, namely a hash layer, is added after the last fully connected layer fc7 of each AlexNet model, with weight parameters shared among the three AlexNet models, as sketched below;
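For illustration, the following is a minimal PyTorch sketch of this architecture. The class name, the 48-bit hash length, the sigmoid activation, and the pretrained initialization are illustrative assumptions, not specified by the patent; weight sharing among the three parallel branches is realized by applying one module to all three triplet images.

```python
import torch
import torch.nn as nn
from torchvision import models

class TripletHashNet(nn.Module):
    """One AlexNet branch with a hash layer after fc7. Applying this single
    module to the anchor, positive, and negative images realizes the three
    'parallel' branches with fully shared weights."""
    def __init__(self, hash_bits=48):  # 48-bit code is an illustrative choice
        super().__init__()
        alexnet = models.alexnet(weights="IMAGENET1K_V1")  # pretrained init (assumption)
        self.features = alexnet.features
        self.avgpool = alexnet.avgpool
        # keep AlexNet up to and including fc7, dropping the fc8 classifier
        self.fc_to_fc7 = nn.Sequential(*list(alexnet.classifier.children())[:-1])
        # hidden (hash) layer added after fc7; sigmoid keeps outputs in (0, 1)
        self.hash_layer = nn.Sequential(nn.Linear(4096, hash_bits), nn.Sigmoid())

    def forward(self, x):
        x = self.avgpool(self.features(x)).flatten(1)
        fc7 = self.fc_to_fc7(x)       # 4096-d deep feature, used for fine search
        h = self.hash_layer(fc7)      # relaxed hash code, used for coarse search
        return fc7, h

net = TripletHashNet()
fc7_a, h_a = net(torch.randn(8, 3, 224, 224))  # anchor batch; same net serves pos/neg
```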
step 2, the CIFAR-10 image dataset is selected and divided into a training set and a test set, the training-set images serving as input to the CNN network model constructed in step 1. CIFAR-10 has 10 classes, with 5,000 training images and 1,000 test images per class. Each input is a triplet comprising an anchor image, a positive sample image of the same class as the anchor, and a negative sample image of a different class from the anchor (see the sampling sketch below);
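A triplet-sampling sketch on CIFAR-10 might look as follows; random per-anchor selection of the positive and negative images is an assumption, since the patent does not specify a sampling strategy.

```python
import random
from collections import defaultdict
from torch.utils.data import Dataset
from torchvision import datasets, transforms

class CifarTripletDataset(Dataset):
    """Yields (anchor, positive, negative) CIFAR-10 triplets: the positive
    shares the anchor's class, the negative comes from a different class."""
    def __init__(self, root="./data", train=True):
        tfm = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
        self.base = datasets.CIFAR10(root, train=train, download=True, transform=tfm)
        self.by_class = defaultdict(list)          # class label -> image indices
        for idx, label in enumerate(self.base.targets):
            self.by_class[label].append(idx)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, i):
        anchor, label = self.base[i]
        pos_idx = random.choice(self.by_class[label])
        neg_label = random.choice([c for c in self.by_class if c != label])
        neg_idx = random.choice(self.by_class[neg_label])
        return anchor, self.base[pos_idx][0], self.base[neg_idx][0]
```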
step 3, the anchor, positive, and negative images of each triplet are input into their respective AlexNet models, and each image yields an fc7-layer deep feature and, from the hash layer, a preliminary hash code representing the image features;
step 4, a triplet loss function is introduced into the CNN network model and minimized over the preliminary hash codes generated in step 3, so that the distance between the anchor and positive images in feature space is smaller than the distance between the anchor and negative images; that is, hash codes generated for images of the same class become more similar while those of different classes differ more, completing the optimization of the CNN network model;
The optimization process is as follows:
Assume that the relative distance relationships between sample images are represented by a series of image triplets $X_i = (x, x^+, x^-)$, where $x$ is the anchor image, $x^+$ is a positive sample, i.e. a sample image of the same class as $x$, and $x^-$ is a negative sample, i.e. a sample image of a different class from $x$. Based on distance in Euclidean space, the similarity of two images $x$ and $x'$ is defined as:
$$S(x, x') = \|F(x) - F(x')\|_2 \quad (1)$$
where $S(x, x')$ is the similarity of the two images and $F(\cdot)$ is the function mapping image data into Euclidean space; for a feature mapping implemented by a deep neural network, $F(x)$ is the network output for image $x$, i.e. its feature representation.
For a triplet during training, the distance between the anchor and positive images in feature space should be smaller than the distance between the anchor and negative images; that is, the output feature representations are expected to satisfy, under the $L_2$ norm:
$$S(x_i, x_i^+) < S(x_i, x_i^-) \quad (2)$$
which is equivalent to
$$S^2(x_i, x_i^+) < S^2(x_i, x_i^-) \quad (3)$$
where $x_i$ is the anchor image, $x_i^+$ is a positive sample image, i.e. a sample image of the same class as $x_i$, $x_i^-$ is a negative sample image, i.e. a sample image of a different class from $x_i$, and $S(x, x')$ denotes the similarity of two images $x$ and $x'$; thus $S(x_i, x_i^+)$ is the similarity of the anchor and positive images and $S(x_i, x_i^-)$ the similarity of the anchor and negative images. The squared form is used because it simplifies the derivative computation in the subsequent optimization. For a given triplet $\{X_i\}$ and a classification problem, it is desirable to maximize the separation between the distances of same-class and different-class image pairs, so the triplet loss function is defined as:
$$loss(x_i, x_i^+, x_i^-) = \max\{0,\ g + S^2(x_i, x_i^+) - S^2(x_i, x_i^-)\} \quad (4)$$
where $loss(x_i, x_i^+, x_i^-)$ is the triplet loss and $g$ is a threshold parameter measuring the separation between same-class and different-class image pairs, with $g = -1$.
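A direct PyTorch transcription of equation (4), applied to the hash-layer outputs of the three branches; averaging over the batch rather than summing is an implementation assumption.

```python
import torch

def triplet_hash_loss(h_a, h_p, h_n, g=-1.0):
    """Equation (4): max{0, g + S^2(x, x+) - S^2(x, x-)} per triplet, where
    S is the Euclidean distance between network outputs (equation (1)).
    g = -1 follows the patent; averaging over the batch is an assumption."""
    d_pos = (h_a - h_p).pow(2).sum(dim=1)   # S^2(x_i, x_i^+)
    d_neg = (h_a - h_n).pow(2).sum(dim=1)   # S^2(x_i, x_i^-)
    return torch.clamp(g + d_pos - d_neg, min=0).mean()
```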
The optimization of the CNN network model, namely the optimization of the triplet loss function, proceeds as follows:
Denote the training set $X = \{X_i\}$ with $X_i = (x_i, x_i^+, x_i^-)$. The optimization of the triplet loss is then translated into the minimization of the following objective function $L$:
$$L(W) = \sum_{i=1}^{n} loss(x_i, x_i^+, x_i^-) + \lambda \|W\|_2^2 \quad (5)$$
where $W$ denotes the neural network parameters, $\lambda$ is a regularization coefficient introduced to improve the generalization ability of the network model, and $n$ is the number of triplets in the training set;
a function $d(W, X_i)$ is introduced to represent the difference between the anchor-positive and anchor-negative pair distances within a triplet, i.e.
$$d(W, X_i) = S^2(x_i, x_i^+) - S^2(x_i, x_i^-) \quad (6)$$
The objective function may be further written as:
$$L(W) = \sum_{i=1}^{n} \max\{0,\ g + d(W, X_i)\} + \lambda \|W\|_2^2 \quad (7)$$
To minimize the objective function, its partial derivative is taken:
$$\frac{\partial L}{\partial W} = \sum_{i:\ g + d(W, X_i) > 0} \frac{\partial d(W, X_i)}{\partial W} + 2\lambda W \quad (8)$$
and from the definition of $d(W, X_i)$, its gradient is:
$$\frac{\partial d(W, X_i)}{\partial W} = 2\Big[\big(F(x_i^-) - F(x_i^+)\big)^{\top}\frac{\partial F(x_i)}{\partial W} + \big(F(x_i^+) - F(x_i)\big)^{\top}\frac{\partial F(x_i^+)}{\partial W} + \big(F(x_i) - F(x_i^-)\big)^{\top}\frac{\partial F(x_i^-)}{\partial W}\Big] \quad (9)$$
It follows that by inputting the triplet images into the network and performing forward and backward computation, the gradient of the triplet loss function is obtained; a gradient descent algorithm then minimizes the objective function, thereby optimizing the triplet loss, obtaining the optimal CNN network model, extracting the most essential deep features of the images, and yielding hash codes with higher discriminability and query precision. A minimal training sketch follows below.
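A minimal training-loop sketch under these equations, using the `triplet_hash_loss` sketch above; autograd supplies the gradients of equations (8)-(9), and the $\lambda\|W\|_2^2$ term is realized as SGD weight decay. All hyperparameter values here are illustrative assumptions.

```python
import torch

def train(net, loader, epochs=10, lr=1e-4, lam=1e-4):
    """Minimal loop for minimizing objective (7): autograd supplies the
    gradients of equations (8)-(9), and the lambda*||W||^2 term is realized
    as SGD weight decay. Hyperparameter values are illustrative."""
    opt = torch.optim.SGD(net.parameters(), lr=lr, weight_decay=lam)
    for _ in range(epochs):
        for a, p, n in loader:
            _, h_a = net(a)               # hash-layer outputs of the triplet
            _, h_p = net(p)
            _, h_n = net(n)
            loss = triplet_hash_loss(h_a, h_p, h_n, g=-1.0)
            opt.zero_grad()
            loss.backward()               # gradient of the triplet loss
            opt.step()                    # gradient descent update
```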
Step 5: the test-set images and the query image, treated as anchor images, are input into the anchor branch of the CNN network model optimized in step 4 to obtain the corresponding hash codes and fc7-layer deep features. The Hamming distance between the binary hash codes of the query image and each test-set image is computed; images whose Hamming distance is below a set threshold are regarded as similar, and the m most similar test-set images form the candidate set. The Euclidean distance between the deep feature vectors of the query image and the candidate images is then computed, and the final retrieval result, namely the n images most similar to the query, is returned in ascending order of Euclidean distance;
the Hamming distance of binary hash codes corresponding to the query image and the test set image is calculated specifically as follows:
firstly, giving an image I in a test set, inputting a query image I into a CNN network model optimized in the step 5, extracting a hidden layer, namely, taking the output of a hash layer as an image characteristic, marking as Out (H), carrying Out binarization by setting a threshold value to obtain a corresponding binary hash code, namely, for each bit j=1..h, wherein H is the node number of the hidden layer, and the output binary code is as follows
Setting Γ= { I 1 ,I 2 ,...,I n Test set of images for retrieval, I n Representing the nth test set image, each corresponding binary code being represented as Γ H ={H 1 ,H 2 ,...,H n }, wherein H i ∈{0,1} h ;
For query image I q And its corresponding binary code H q Calculate H q And H i If H is the Hamming distance of q And H i ∈Γ H If the Hamming distance is smaller than the threshold value, the Hamming distance is considered as a similar image, and a candidate image set with m similar images is finally obtained according to the threshold value setting, so that a corresponding candidate image set is obtained
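A NumPy sketch of the coarse search, assuming the hash-layer outputs of the query and test set have already been collected into arrays; the binarization threshold t = 0.5, the Hamming threshold, and m are illustrative assumptions.

```python
import numpy as np

def coarse_retrieval(out_q, out_db, t=0.5, ham_thresh=8, m=100):
    """Coarse search: binarize hash-layer outputs with threshold t (eq. (10)),
    keep test images whose Hamming distance to the query is below ham_thresh,
    and return the m closest as the candidate set P (eq. (11))."""
    h_q = (out_q >= t).astype(np.uint8)            # query code H_q
    h_db = (out_db >= t).astype(np.uint8)          # test-set codes Gamma_H
    ham = np.count_nonzero(h_db != h_q, axis=1)    # Hamming distances
    keep = np.flatnonzero(ham < ham_thresh)        # similar by threshold
    return keep[np.argsort(ham[keep])][:m]         # indices of candidate set P
```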
The Euclidean distance between the deep feature vectors of the query image and the candidate-set images is computed as follows (a fine-search sketch is given below):
For the query image $I_q$ and candidate image set $P$, let $V_q$ denote the feature vector of the query image and $V_i^P$ the feature vector of the $i$-th image in the candidate set; the Euclidean distance between their corresponding feature vectors is
$$s_i = \|V_q - V_i^P\| \quad (12)$$
The smaller the Euclidean distance, the more similar the two images; the $n$ most similar result images are finally obtained from the candidate set in ascending order of Euclidean distance, where $n$ is smaller than $m$.
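A matching NumPy sketch of the fine search over the candidate set, assuming the fc7 feature vectors have been collected into an array.

```python
import numpy as np

def fine_retrieval(v_q, v_db, candidates, n=10):
    """Fine search: rank the candidate set by Euclidean distance between fc7
    feature vectors (eq. (12)) and return the n most similar images (n < m)."""
    dists = np.linalg.norm(v_db[candidates] - v_q, axis=1)  # s_i = ||V_q - V_i^P||
    return candidates[np.argsort(dists)][:n]                # ascending distance
```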
Claims (7)
1. The deep hash image retrieval method based on the triplet network is characterized by comprising the following steps:
step 1, adopting three parallel AlexNet models as the CNN network model, adding a hidden layer, namely a hash layer, after the last fully connected layer fc7 of each AlexNet model, with weight parameters shared among the three AlexNet models;
step 2, selecting the CIFAR-10 image dataset and dividing it into a training set and a test set, the training-set images serving as input to the CNN network model constructed in step 1; each input is a triplet comprising an anchor image, a positive sample image of the same class as the anchor, and a negative sample image of a different class from the anchor;
step 3, inputting the anchor, positive, and negative images of each triplet into their respective AlexNet models, each image yielding an fc7-layer deep feature and, from the hash layer, a preliminary hash code representing the image features;
step 4, introducing a triplet loss function into the CNN network model and minimizing it over the preliminary hash codes generated in step 3, so that the distance between the anchor and positive images in feature space is smaller than the distance between the anchor and negative images, thereby optimizing the CNN network model;
and step 5, inputting the test-set images and the query image, treated as anchor images, into the anchor branch of the CNN network model optimized in step 4 to obtain the corresponding hash codes and fc7-layer deep features; computing the Hamming distance between the binary hash codes of the query image and each test-set image, regarding images whose Hamming distance is below a set threshold as similar, and taking the m most similar test-set images as the candidate set; then computing the Euclidean distance between the deep feature vectors of the query image and the candidate images, and returning the final retrieval result, namely the n images most similar to the query, in ascending order of Euclidean distance.
2. The deep hash image retrieval method based on the triplet network according to claim 1, wherein the CIFAR-10 image dataset contains 60,000 images in 10 classes, the training set having 5,000 images per class and the test set 1,000 images per class.
3. The deep hash image retrieval method based on the triplet network according to claim 1, wherein in step 4 the distance between the anchor image and the positive sample image in feature space being smaller than the distance between the anchor image and the negative sample image means:
$$S(x_i, x_i^+) < S(x_i, x_i^-) \quad (2)$$
which is equivalent to
$$S^2(x_i, x_i^+) < S^2(x_i, x_i^-) \quad (3)$$
where $x_i$ is the anchor image, $x_i^+$ is a positive sample image, i.e. a sample image of the same class as $x_i$, $x_i^-$ is a negative sample image, i.e. a sample image of a different class from $x_i$, and $S(x, x')$ denotes the similarity of two images $x$ and $x'$, so that $S(x_i, x_i^+)$ is the similarity of the anchor and positive images and $S(x_i, x_i^-)$ the similarity of the anchor and negative images; accordingly, the triplet loss function is defined as:
$$loss(x_i, x_i^+, x_i^-) = \max\{0,\ g + S^2(x_i, x_i^+) - S^2(x_i, x_i^-)\} \quad (4)$$
where $loss(x_i, x_i^+, x_i^-)$ is the triplet loss and $g$ is a threshold parameter measuring the separation between same-class and different-class image pairs.
4. A deep hash image retrieval method based on a triplet network according to claim 3, characterized in that g= -1.
5. The deep hash image retrieval method based on the triplet network according to claim 3, wherein the optimization of the CNN network model is the optimization of the triplet loss function, the specific process being:
denote the training set $X = \{X_i\}$ with $X_i = (x_i, x_i^+, x_i^-)$; the optimization of the triplet loss function is then translated into the minimization of the following objective function $L$:
$$L(W) = \sum_{i=1}^{n} loss(x_i, x_i^+, x_i^-) + \lambda \|W\|_2^2 \quad (5)$$
wherein $W$ denotes the neural network parameters, $\lambda$ is a regularization coefficient introduced to improve the generalization ability of the network model, and $n$ is the number of triplets in the training set;
a function $d(W, X_i)$ is introduced to represent the difference between the anchor-positive and anchor-negative pair distances within a triplet, i.e.
$$d(W, X_i) = S^2(x_i, x_i^+) - S^2(x_i, x_i^-) \quad (6)$$
the objective function is further written as:
$$L(W) = \sum_{i=1}^{n} \max\{0,\ g + d(W, X_i)\} + \lambda \|W\|_2^2 \quad (7)$$
to minimize the objective function, its partial derivative is:
$$\frac{\partial L}{\partial W} = \sum_{i:\ g + d(W, X_i) > 0} \frac{\partial d(W, X_i)}{\partial W} + 2\lambda W \quad (8)$$
and from the definition of $d(W, X_i)$, its gradient is:
$$\frac{\partial d(W, X_i)}{\partial W} = 2\Big[\big(F(x_i^-) - F(x_i^+)\big)^{\top}\frac{\partial F(x_i)}{\partial W} + \big(F(x_i^+) - F(x_i)\big)^{\top}\frac{\partial F(x_i^+)}{\partial W} + \big(F(x_i) - F(x_i^-)\big)^{\top}\frac{\partial F(x_i^-)}{\partial W}\Big] \quad (9)$$
whereby the triplet images are input into the network, forward and backward computation yields the gradient of the triplet loss function, and a gradient descent algorithm minimizes the objective function, thereby optimizing the triplet loss and obtaining the optimal CNN network model.
6. The deep hash image retrieval method based on the triplet network according to claim 4, wherein computing the Hamming distance between the binary hash codes of the query image and the test-set images in step 5 is specifically:
given an image $I$, it is input into the CNN network model optimized in step 4, and the output of the hidden layer, i.e. the hash layer, is extracted as the image feature, denoted $Out(H)$; binarization with a set threshold $t$ yields the corresponding binary hash code, i.e. for each bit $j = 1, \ldots, h$, where $h$ is the number of nodes of the hidden layer, the output binary code is
$$H^j = \begin{cases} 1, & Out^j(H) \ge t \\ 0, & \text{otherwise} \end{cases} \quad (10)$$
let $\Gamma = \{I_1, I_2, \ldots, I_n\}$ be the test set of images to be searched, with $I_n$ denoting the $n$-th test-set image; each corresponding binary code is represented as $\Gamma_H = \{H_1, H_2, \ldots, H_n\}$, wherein $H_i \in \{0,1\}^h$;
for the query image $I_q$ and its corresponding binary code $H_q$, the Hamming distance between $H_q$ and each $H_i \in \Gamma_H$ is computed; if the Hamming distance is smaller than the threshold, the images are regarded as similar, and according to the threshold setting the $m$ most similar images are finally obtained, giving the corresponding candidate image set
$$P = \{I_1^P, I_2^P, \ldots, I_m^P\} \quad (11)$$
7. The deep hash image retrieval method based on the triplet network according to claim 6, wherein computing the Euclidean distance between the deep feature vectors of the query image and the candidate-set images in step 5 is specifically:
for the query image $I_q$ and candidate image set $P$, let $V_q$ denote the feature vector of the query image and $V_i^P$ the feature vector of the $i$-th image in the candidate set; the Euclidean distance between their corresponding feature vectors is computed as
$$s_i = \|V_q - V_i^P\| \quad (12)$$
the smaller the Euclidean distance, the more similar the two images; the $n$ most similar result images are finally obtained from the candidate set in ascending order of Euclidean distance, wherein $n$ is smaller than $m$.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111448102.6A | 2021-11-30 | 2021-11-30 | Deep hash image retrieval method based on triplet network |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111448102.6A | 2021-11-30 | 2021-11-30 | Deep hash image retrieval method based on triplet network |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN114168782A | 2022-03-11 |
| CN114168782B | 2024-03-15 |
Family
ID=80482302

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111448102.6A (Active) | Deep hash image retrieval method based on triplet network | 2021-11-30 | 2021-11-30 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114168782B (en) |
Families Citing this family (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114417046B * | 2022-03-31 | 2022-07-12 | 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.) | Training method of feature extraction model, image retrieval method, device and equipment |
Patent Citations (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2017092183A1 * | 2015-12-03 | 2017-06-08 | 中山大学 (Sun Yat-sen University) | Image retrieval method based on variable-length deep hash learning |
| CN113190699A * | 2021-05-14 | 2021-07-30 | 华中科技大学 (Huazhong University of Science and Technology) | Remote sensing image retrieval method and device based on category-level semantic hash |
Non-Patent Citations (1)

| Title |
|---|
| Feng Xingjie; Cheng Yiwei. Image retrieval based on deep convolutional neural networks and hashing. Computer Engineering and Design, 2020, (03). * |
Also Published As

| Publication number | Publication date |
|---|---|
| CN114168782A | 2022-03-11 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |