CN113191445B - Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm - Google Patents

Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm Download PDF

Info

Publication number
CN113191445B
CN113191445B CN202110531130.8A CN202110531130A CN113191445B CN 113191445 B CN113191445 B CN 113191445B CN 202110531130 A CN202110531130 A CN 202110531130A CN 113191445 B CN113191445 B CN 113191445B
Authority
CN
China
Prior art keywords
image
hash
generator
hash code
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110531130.8A
Other languages
Chinese (zh)
Other versions
CN113191445A (en
Inventor
曹媛
刘峻玮
桂杰
许晓伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN202110531130.8A priority Critical patent/CN113191445B/en
Publication of CN113191445A publication Critical patent/CN113191445A/en
Application granted granted Critical
Publication of CN113191445B publication Critical patent/CN113191445B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

The invention provides a large-scale image retrieval method based on an automatic supervision countercheck hash algorithm. The invention provides a new Hash learning framework, which is called self-supervision counterattack Hash; the framework primarily learns discriminative hash codes using image rotation based self-supervised similarity metrics and generation countermeasure networks. The neural network model mainly comprises an encoder for acquiring the hash code, a generator for generating a pseudo image and a discriminator for distinguishing a true image from a false image; a loss function consisting of approximate semantic similarity loss, feature loss and antagonism loss is designed to maintain the similarity between the image and the hash code. Adding self-supervision characteristics into the whole model, neglecting bottom-layer semantic information and keeping high-layer semantic information; particularly for short hash codes, the high-level semantic information of the image can be better maintained. Experimental results show that compared with the conventional retrieval method, the image retrieval method provided by the invention has better image retrieval performance.

Description

Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a large-scale image data retrieval method based on an automatic supervision counterattack hash algorithm.
Background
The hash algorithm is concerned more and more in solving the problem of large-scale image retrieval due to low storage requirement and high search efficiency; and can be divided into supervised hashing and unsupervised hashing according to whether an image label is used, and the supervised hashing method generally has better performance than the unsupervised hashing method. However, in most cases, there is no label information in the dataset that is useful for the image, and manual labeling requires a lot of manpower. To address this problem, many researchers have attempted to improve the process. For example, Gidaris et al propose an image rotation based self-supervision method; however, this may result in different feature representations of the images before and after rotation. Although Misra et al solved this problem, they did not map the similarity matrix of similar images in the original space to the feature space.
With the rise of deep learning, deep learning algorithms can be divided into two categories: supervised learning and unsupervised learning. Supervised learning algorithms are favored by people with their high accuracy. However, manually marked labels are not readily available and require significant human resources. Therefore, in recent years, unsupervised learning algorithms have received increasing attention. The self-supervised learning is a popular choice in the unsupervised learning, and the popularity thereof is inevitable. After all mainstream supervised learning tasks mature, data becomes the most important bottleneck. Learning effective information from unlabeled data is a very important research subject, and self-supervision learning provides very rich imagination.
Disclosure of Invention
The invention aims to provide a large-scale image retrieval method based on an automatic supervision countermeasure hash algorithm to make up for the defects of the prior art.
In order to achieve the purpose, the invention adopts the following specific technical scheme:
a large-scale image retrieval method based on an automatic supervision countermeasure hash algorithm comprises the following steps:
s1: acquiring image data comprising a training set and a test set;
s2: optimizing the encoder by utilizing the training set;
s3: rotating the test set image data, and inputting the test set image data into an optimized S2 encoder to obtain a hash code;
s4: and calculating the Hamming distance between the hash code obtained in the step S3 and the hash code of the training set of the step S2, sorting the Hamming distances from small to large, outputting the first k search results, and completing the search.
Further, in S2: the encoder uses a structure similar to VGG19, including five convolutional layers, two fully-connected layers, and one hash layer; for feature comparison, a full connection layer is added at the end; by utilizing the relation between image neighborhood structures, namely the relation between the Hash code B and the semantic similarity matrix, the following objective function l is providedsTo learn hash codes to approximate as closely as possible the raw data distribution in projection space:
Figure BDA0003067929210000021
wherein, L is the length of the hash code, S is a similarity matrix, E represents that the objective function optimizes an encoder which is used for generating the hash code
Figure BDA0003067929210000022
Optimization of lsSimilar images in the original space can be made to have similar hash codes when mapped to the hash space.
Further, the encoder optimization in S2 specifically includes:
s2-1: obtaining the feature vectors of the images of the training set, calculating the cosine distances between the images and sequencing to obtain a similarity ranking;
s2-2: analyzing the similarity ranking and setting a threshold value to obtain a similarity matrix;
s2-3: rotating the training set image and inputting the image into the encoder to obtain a hash code;
s2-4: inputting the hash code into a generator to obtain a pseudo image;
s2-5: inputting the pseudo image and the real image into a discriminator at the same time for confrontation training;
s2-6: optimizing the encoder, the generator and the discriminator according to an objective function; the optimized encoder, generator and discriminator form an auto-supervised countermeasure hash algorithm.
Further, the S2-1 is specifically: for database points
Figure BDA0003067929210000023
Feature vectors are extracted from pool5 layer of VGG model by using k-nearest neighbor (KNN) method
Figure BDA0003067929210000024
And calculating cosine distances between the two groups, and sequencing the two groups in the order from small to large to obtain similarity ranking.
Further, in the S2-2: setting the K1 range as the neighborhood according to the cosine similarity of each image, and obtaining the initial matrix S1,S1The calculation is as follows:
Figure BDA0003067929210000025
wherein x isiAnd xjFor the feature vector of the image, K1-NN is xiK1 nearest neighbors of S1On the basis of (1), comparing S1And corresponding column of (1) and use
Figure BDA0003067929210000031
Structure S2As follows:
Figure BDA0003067929210000032
wherein K2-NN is xiK2 nearest neighbors and finally, combining these two matrices into S, is calculated as follows:
Figure BDA0003067929210000033
further, in the S2-4: the generator consists of a full-link layer and four deconvolution layers, and is generatedIn the device, a hash code is input as 'noise' to generate a new image; specifically, the hash code is input into a fully concatenated layer of size 8 × 8 × 256, and then 3 deconvolution layers of 5 × 5 and 1 × 1 are used, the number of kernels being 256, 128, 32, and 3, respectively; for the image I generated by the generatorGAnd an original image I, and an objective function l is provided between the feature vectorsf1(ii) a The objective function is defined as follows:
Figure BDA0003067929210000034
where Ψ (-) denotes the convolution-activated feature vector, w and h denote the sizes of the corresponding features, and D denotes the adjustment of the parameters in the arbiter; the generator generates a new image by using the hash code of the rotated image, so that a larger difference exists between the feature vector of the new image and the original image in consideration of low-level semantic information in the image; based on this problem, the image I after rotationRAn objective function is arranged between the feature vectors of the original image I and the feature vectors of the original image I, the objective function being to ensure that the feature vectors of the same image are as similar as possible irrespective of rotation, thereby reducing the new image IRAnd a rotated image I obtained from the encoderGThe feature vector of the original image is obtained from the discriminator. Therefore, we use this loss function to optimize the encoder and the discriminator; so the finally set objective function lf2The following were used:
Figure BDA0003067929210000035
wherein, IRThe image is rotated, and I is an original image; the method aims to ignore semantic information of a lower layer, so that the loss of characteristics of the lower layer is not considered. Wherein lf=lf1+γlf2And gamma is a weight parameter.
Further, in the S2-5: according to the structure of counterstudy, a discriminator D is arranged to judge whether the image is true or false, so as to optimize the generator G; the optimization of the generation of the countermeasure network is a very small value game problem; the discriminator D is composed of four convolution layers, the number of the cores is 32, 128, 256 and 256, a full connection layer is arranged next to the cores, the size of the full connection layer is 1024, and eLU is used as an activation function; the generative model is essentially a maximum likelihood estimate, which is used to generate a model of the particular distribution data; the function of the generated model is to capture the distribution of sample data, and convert the distribution of the original input information into a sample with designated distribution through the transformation of parameters in the maximum likelihood estimation; using a standard objective function for generating a countermeasure network, the formula is as follows:
Figure BDA0003067929210000041
wherein G represents adjusting parameters in the generator, D (-) represents the output of the last layer of the discriminator, and G (-) represents the output of the last layer of the generator; in order to make the effect of generating the antagonistic network more obvious, random noise is input into the generator, and the generator and the discriminator are optimized by using the following loss function:
Figure BDA0003067929210000042
where z is random noise, ld=ld′+ld”
Further, in the S2-6: the sum of the four parts of the loss function is set, and the two weights α and β of the loss function are set, the overall loss function, i.e. the objective function, is as follows:
L=ls+αlf+βld (9);
using the above (9) to perform network optimization, and adding a hash layer after the last complete connection layer to obtain hash code, learning parameters is required
Figure BDA0003067929210000043
θ、ηAnd ξ, the calculation process is as follows;
Figure BDA0003067929210000044
wherein
Figure BDA0003067929210000045
Indicates the input rotated picture, beta,
Figure BDA0003067929210000046
And θ represents a parameter in the encoder network;
IG=G(bi;η) (11)
where η represents a parameter in the generator;
Figure BDA0003067929210000047
where ξ represents a parameter in the discriminator; the objective function is optimized using back-propagation and random gradient descent algorithms.
The invention has the advantages and technical effects that:
the invention provides a new Hash learning framework, which is called self-supervision counterattack Hash; the framework primarily learns discriminative hash codes using image rotation based self-supervised similarity metrics and generation countermeasure networks (GANs). The neural network model mainly comprises an encoder for acquiring a hash code, a generator for generating a pseudo image and a discriminator for distinguishing a true image from a false image; a loss function consisting of approximate semantic similarity loss, feature loss and antagonism loss is designed to maintain the similarity between the image and the hash code. Adding self-supervision characteristics in the whole model, neglecting bottom semantic information and keeping high-level semantic information; particularly for short hash codes, the high-level semantic information of the image can be better maintained.
And the experimental result shows that compared with the conventional retrieval method, the image retrieval method provided by the invention has better image retrieval performance.
Drawings
FIG. 1 is a diagram illustrating the process of the present invention for self-supervised countermeasure hashing.
Fig. 2 is an effect diagram of the similarity matrix S generated by the present invention.
FIG. 3 is a graph comparing the effect of the weight parameter γ on the mean of accuracy (mAP) of different bits in the present invention.
FIG. 4 is a comparison graph of the results of whether pixel loss is considered in the loss function of the present invention.
Detailed Description
The invention will be further explained and illustrated by means of specific embodiments and with reference to the drawings.
Example 1:
a large-scale image retrieval method based on an auto-supervision countermeasure hash algorithm comprises the following steps (as shown in figure 1):
step 1: firstly, extracting a Semantic similarity matrix S (such as a Semantic similarity matrix part in FIG. 1 and FIG. 2) from a data set;
step 2: rotating the image to a certain angle, inputting the image into an Encoder to obtain a hash code (such as an Encoder part in fig. 1);
and step 3: inputting the hash code into the Generator to obtain a new image (as in the Generator section of FIG. 1);
and 4, step 4: the original image and the new image are input to a Discriminator for confrontation recognition (see the Discriminator section of fig. 1).
And 5: and optimizing the network according to the objective function. The performance (table 1) and training time (table 2) of the self-supervised counterhash algorithm (SHGan) and several hash algorithms (iterative quantization (ITQ), Locality Sensitive Hash (LSH), Spectral Hash (SH), Spherical hash (Spherical hash), deep binary hash (DeepBit), Deep Hash (DH), binary counterhash (BGAN)) on the cifar-10 dataset are shown in the following table:
TABLE 1 mean of average precision results on cifar-10 data set after 90 degree rotation of the image
Figure BDA0003067929210000061
TABLE 2 training and testing time for deep Hash, binary countermeasure Hash, and self-supervision countermeasure Hash
Figure BDA0003067929210000062
TABLE 3 average precision mean of 12-bit hash codes obtained by 90 and 180 degree rotation in self-supervised countermeasure hashing
Rotation angle mAP
90 0.495
180 0.510
Example 2:
the embodiment 1 comprises the following concrete steps:
step 1: for database points
Figure BDA0003067929210000063
Feature vectors are extracted from pool5 layer of VGG model by using k-nearest neighbor (KNN) method
Figure BDA0003067929210000064
And calculating cosine distances between them, and sorting in order from small to large. Setting the K1 range as its neighborhood according to the cosine similarity of each image, and obtaining the initial matrix S1,S1The calculation is as follows:
Figure BDA0003067929210000071
wherein x isiAnd xjFor the feature vector of the image, K1-NN is xiK1 nearest neighbors of (S)1Based on (2), compare S1And corresponding column of (1) and use
Figure BDA0003067929210000072
Structure S2As follows:
Figure BDA0003067929210000073
wherein K2-NN is xiK2 nearest neighbors and finally, combining these two matrices into S, is calculated as follows:
Figure BDA0003067929210000074
step 2: the picture is rotated to a certain angle and then input into the encoder E, which uses a structure similar to VGG19, including five convolutional layers, two fully-connected layers, and one hash layer. For feature comparison, a full link layer is added at the end. By utilizing the relationship between the image neighborhood structures, namely the relationship between the hash code B and the semantic similarity matrix S, the following objective function is proposed to learn the hash code so as to be as close to the original data distribution in the projection space as possible:
Figure BDA0003067929210000075
wherein L is the length of the hash code, S is a similarity matrix, E represents that the objective function optimizes the encoder, and the encoder E is used for generating the hash code
Figure BDA0003067929210000076
Optimization ofsSimilar images in the original space can be made to have similar hash codes when mapped to the hash space.
And step 3: in generator G, we input hash code B as "noise" to generate a new image. Specifically, the hash code B is input into a fully connected layer having a size of 8 × 8 × 256. Then 3 5 × 5 and 1 × 1 deconvolution layers were used. The number of kernels is 256, 128, 32 and 3, respectively. For the image I generated by the generatorGAnd an original image I, an objective function is proposed between the feature vectors. The objective function is defined as follows:
Figure BDA0003067929210000077
where Ψ (-) represents the convolution-activated feature vector, w and h represent the sizes of the corresponding features, and D represents the adjustment of the parameters in the arbiter. However, the generator generates a new image by using the hash code of the rotated image, so that a large gap exists between the feature vector of the new image and the original image in consideration of low-level semantic information in the image. Based on this problem, the image I after rotationRAn objective function is arranged between the feature vectors of the original image I and the feature vectors of the original image I, the objective function being to ensure that the feature vectors of the same image are as similar as possible irrespective of rotation, thereby reducing the new image IRAnd a rotated image I obtained from the encoderGThe feature vector of the original image is obtained from the discriminator. Therefore, this loss function is used to optimize the encoder and the discriminator. The set objective function is as follows:
Figure BDA0003067929210000081
wherein, IRThe image is rotated, and I is an original image; the method aims to ignore semantic information of a lower layer, so that feature loss of the lower layer is not consideredAnd (6) losing. lf=lf1+γlf2And gamma is a weight parameter.
And 4, step 4: the original picture I and the pseudo picture I are combinedGThe input to the discriminator is optimized by setting a discriminator D to judge the truth of the picture according to the structure of the counterstudy. Optimization to generate a competing network is a very small value gaming problem. Arbiter D consists of four convolutional layers with core numbers 32, 128, 256, followed by a fully-connected layer with size 1024, with eLU as the activation function. The generative model is essentially a maximum likelihood estimate that is used to generate a model of the particular distribution data. The function of the generated model is to capture the distribution of sample data and convert the distribution of the original input information into a sample with specified distribution through the transformation of parameters in the maximum likelihood estimation. Using a standard objective function for generating a countermeasure network, the formula is as follows:
Figure BDA0003067929210000082
wherein G represents adjusting parameters in the generator, D (-) represents the output of the last layer of the discriminator, and G (-) represents the output of the last layer of the generator; in order to make the effect of generating the antagonistic network more obvious, random noise is input into the generator, and the generator and the discriminator are optimized by using the following loss function:
Figure BDA0003067929210000083
where z is random noise. l. thed=ld′+ld”
The sum of the four parts of the loss function is set, and the two weights α and β of the loss function are set, and the overall loss function is as follows:
L=ls+αlf+βld (9)
and 5: we use (9) above to optimize our network and connect completely at the lastAdding a hash layer after layer connection to obtain our hash code requires learning parameter We Tθ, η, and ξ, the calculation is as follows:
Figure BDA0003067929210000091
wherein
Figure BDA0003067929210000092
Indicates the input rotated picture, beta,
Figure BDA0003067929210000093
And θ represents a parameter in the encoder network.
IG=G(bi;η) (11)
Where η represents a parameter in the generator G.
Figure BDA0003067929210000094
Where ξ represents a parameter in discriminator D; the objective function is optimized using back-propagation and random gradient descent algorithms.
Example 3:
experiments were performed on Cifar-10. Cifar-10 is a dataset compiled by Alex krizzesky and Ilya sutskver. It contains 60000 images (32 × 32)10 categories of 6000 pictures each. In Cifar-10, 1000 pictures are randomly drawn at each class as a training set and 100 pictures are drawn as a test set.
Evaluation indices mean of precision (mAP) and mean precision (AP) were used to evaluate our method. For each query, the average precision is the average of the top k results, the average precision is the average of all queries, and the calculation formula of the average precision is as follows:
Figure BDA0003067929210000095
where N represents the number of instances in the database used for the query that are associated with the ground truth. P (k) represents the precision of the first k instances. When the kth instance is relevant to the query, δ (k) is 1, otherwise δ (k) is 0.
Results on the cifar-10 dataset:
first, let K1 be 20 and K2 be 30 to obtain the semantic similarity matrix S. Fig. 2 shows a part of data in the matrix. Then, the minimum batch size was set to 256, and the learning rate was set to 0.0001. Setting alpha-1, beta-1 and gamma-3. The rotation angle is 90 degrees.
Table 1 shows the average precision mean results for 12, 24, 32 and 48 bits. The results show that the average precision mean results of 12 bits, 24 bits and 48 bits are respectively improved by 9.4%, 9.8% and 5.2%. This shows that the present invention has better performance on fewer bits, and the hash code provided by the present invention can better represent high-level semantic information of an image, thereby verifying the above inference. To further verify this idea, the image was rotated by 180 degrees again for the experiment, and the results are shown in table 3. As shown in table 3, the result of the 12-bit hash code is improved by 10.9%.
Training and testing times for Deep Hash (DH), binary countermeasure hash (BGAN), and self-supervised countermeasure hash (SHGan) were further compared. As shown in table 2. Binary hash (BGAN) and self-supervision countermeasure hash parameters are more, and training time and testing time are both longer than DH. However, binary hashing (BGAN) and self-supervised counterhashing (hch) generate hash codes very quickly due to the advantages of the hashing algorithm.
The influence of the parameter gamma on the experimental results was investigated. As shown in fig. 3, it was found that γ had little influence on the experimental results.
According to the pixel loss function in the binary countermeasure hash, a loss function at a pixel level is added to the self-supervised countermeasure hash. The formula is as follows:
Figure BDA0003067929210000101
Iijand
Figure BDA0003067929210000102
respectively representing the original image and the generated pseudo-image. Since the high number of bits is more representative of the pixel information in the image, the experimental results of the 32-bit and 48-bit hash codes are compared. The results are shown in FIG. 4. It can be seen that the mean of average accuracy (mAP) of this method decreases significantly with the loss of pixels, which further illustrates that our invention can enable neural networks to learn the high level semantic information of the image.
In summary, the present invention provides an auto-supervised hash algorithm based on a generative countermeasure network, which is called as an auto-supervised countermeasure hash algorithm. The self-supervision countermeasure hash is composed of an encoder, a generator and a discriminator. A loss function consisting of approximate semantic similarity loss, feature loss and antagonism loss is designed to maintain the similarity between the image and the hash code. The learned hash code can better represent high-level semantic information of the image, so that the accuracy of image retrieval is improved. Experimental results on the Cifar-10 dataset show that the proposed self-supervised counterhash has a higher performance. The invention provides a self-supervision learning method, which utilizes self-supervision information based on rotation or other transformation to design a target function; the generation of an antagonistic network is one of the most promising self-supervision learning methods; generating the countermeasure network can effectively generate synthetic data from the underlying space that is similar to the training data.

Claims (6)

1. A large-scale image retrieval method based on an automatic supervision countermeasure hash algorithm is characterized by comprising the following steps:
s1: acquiring image data comprising a training set and a test set;
s2: optimizing the encoder by utilizing the training set; the encoder optimization specifically includes:
s2-1: obtaining the feature vectors of the images of the training set, calculating the cosine distances between the images and sequencing to obtain a similarity ranking;
s2-2: analyzing the similarity ranking and setting a threshold value to obtain a similarity matrix;
s2-3: rotating the training set image and inputting the image into the encoder to obtain a hash code;
s2-4: inputting the hash code into a generator to obtain a pseudo image;
s2-5: inputting the pseudo image and the real image into a discriminator simultaneously for countermeasure training;
s2-6: optimizing the encoder, the generator and the discriminator according to an objective function; the optimized encoder, generator and discriminator form an automatic supervision countermeasure Hash algorithm;
s3: rotating the image data of the test set, and inputting the image data of the test set into an optimized S2 encoder to obtain a hash code;
s4: calculating the Hamming distance between the hash code obtained in the step S3 and the hash code of the training set S2, sorting the Hamming distances from small to large, outputting the first k retrieval results, and completing retrieval;
in the S2: the encoder uses a structure similar to VGG19, including five convolutional layers, two fully-connected layers, and one hash layer; adding a full connection layer; by utilizing the relationship between image neighborhood structures, namely the relationship between the hash code and the semantic similarity matrix, the following objective function is proposed to learn the hash code so as to approximate the original data distribution in the projection space:
Figure FDA0003683465100000011
where L is the length of the hash code, and the encoder is configured to generate the hash code
Figure FDA0003683465100000012
Optimization of lsSimilar images in the original space are made to have similar hash codes when mapped to the hash space.
2. The large-scale image retrieval method according to claim 1, wherein the S2-1 is specifically: for database points
Figure FDA0003683465100000013
Extracting feature vectors from pool5 layer of VGG model by using k-nearest neighbor KNN method
Figure FDA0003683465100000014
And calculating cosine distances between the two groups, and sequencing the two groups in the order from small to large to obtain similarity ranking.
3. The large-scale image retrieval method according to claim 1, wherein in S2-2: setting the K1 range as its neighborhood according to the cosine similarity of each image, and obtaining the initial matrix S1,S1The calculation is as follows:
Figure FDA0003683465100000015
wherein x isiAnd xjFor the feature vector of the image, K1-NN is xiK1 nearest neighbors of (S)1On the basis of (1), comparing S1And corresponding column of (1) and use
Figure FDA0003683465100000016
Structure S2As follows:
Figure FDA0003683465100000021
wherein K2-NN is xiK2 nearest neighbors and finally, combining these two matrices into S, is calculated as follows:
Figure FDA0003683465100000022
4. the large-scale image retrieval method according to claim 1, wherein in S2-4: the generator consists of a fully connected layerAnd four deconvolution layers, in the generator, inputting hash code as 'noise' to generate new image; specifically, the hash code is input into a fully concatenated layer of size 8 × 8 × 256, and then 3 deconvolution layers of 5 × 5 and 1 × 1 are used, the number of kernels being 256, 128, 32, and 3, respectively; for the image I generated by the generatorGAnd an original image I, and an objective function is provided among the characteristic vectors; the objective function is defined as follows:
Figure FDA0003683465100000023
where Ψ (-) denotes the convolution-activated feature vector, w and h represent the sizes of the corresponding features; the generator generates a new image by using the hash code of the rotated image, and the rotated image IRAn objective function is set between the feature vector of the original image I and the feature vector of the original image I, and the feature vector of the original image is obtained from a discriminator; using this loss function to optimize the encoder and the arbiter; the final set objective function is therefore as follows:
Figure FDA0003683465100000024
wherein lf=lf1+γlf2And gamma is a weight parameter.
5. The large-scale image retrieval method according to claim 1, wherein in S2-5: according to the structure of the counterstudy, a discriminator D is arranged to judge whether the image is true or false, so that the generator G is optimized; the discriminator D is composed of four convolution layers, the number of the cores is 32, 128, 256 and 256, a full connection layer is arranged next to the cores, the size of the full connection layer is 1024, and eLU is used as an activation function; using a standard objective function for generating a countermeasure network, the formula is as follows:
Figure FDA0003683465100000025
in order to make the effect of generating the antagonistic network more obvious, random noise is input into the generator, and the generator and the arbiter are optimized by using the following loss function:
Figure FDA0003683465100000026
where z is random noise, ld=ld'+ld”
6. The large-scale image retrieval method according to claim 1, wherein in S2-6: the sum of the three parts of the loss function is set, and the two weights α and β of the loss function are set, the overall loss function, i.e. the objective function, is as follows:
L=ls+αlf+βld (9);
using the above (9) to perform network optimization, and adding a hash layer after the last complete connection layer to obtain hash code, learning parameters is required
Figure FDA0003683465100000031
Theta, eta and xi, and the calculation process is as follows;
Figure FDA0003683465100000032
wherein
Figure FDA0003683465100000033
A rotated picture representing the input is shown,
Figure FDA0003683465100000034
and θ represents a parameter in the encoder network;
IG=G(bi;η) (11)
where η represents a parameter in the generator;
Figure FDA0003683465100000035
where ξ represents a parameter in the discriminator; the objective function is optimized using back-propagation and random gradient descent algorithms.
CN202110531130.8A 2021-05-16 2021-05-16 Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm Active CN113191445B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110531130.8A CN113191445B (en) 2021-05-16 2021-05-16 Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110531130.8A CN113191445B (en) 2021-05-16 2021-05-16 Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm

Publications (2)

Publication Number Publication Date
CN113191445A CN113191445A (en) 2021-07-30
CN113191445B true CN113191445B (en) 2022-07-19

Family

ID=76981846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110531130.8A Active CN113191445B (en) 2021-05-16 2021-05-16 Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm

Country Status (1)

Country Link
CN (1) CN113191445B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326390B (en) * 2021-08-03 2021-11-02 中国海洋大学 Image retrieval method based on depth feature consistent Hash algorithm
CN113946710A (en) * 2021-10-12 2022-01-18 浙江大学 Video retrieval method based on multi-mode and self-supervision characterization learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063112A (en) * 2018-07-30 2018-12-21 成都快眼科技有限公司 A kind of fast image retrieval method based on multi-task learning deep semantic Hash, model and model building method
CN109918528A (en) * 2019-01-14 2019-06-21 北京工商大学 A kind of compact Hash code learning method based on semanteme protection
CN109960737A (en) * 2019-03-15 2019-07-02 西安电子科技大学 Remote Sensing Images search method of the semi-supervised depth confrontation from coding Hash study
CN110110128A (en) * 2019-05-06 2019-08-09 西南大学 The discrete hashing image searching system of quickly supervision for distributed structure/architecture
CN110516095A (en) * 2019-08-12 2019-11-29 山东师范大学 Weakly supervised depth Hash social activity image search method and system based on semanteme migration
CN112214623A (en) * 2020-09-09 2021-01-12 鲁东大学 Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597298A (en) * 2020-03-26 2020-08-28 浙江工业大学 Cross-modal retrieval method and device based on deep confrontation discrete hash learning
CN112199520B (en) * 2020-09-19 2022-07-22 复旦大学 Cross-modal Hash retrieval algorithm based on fine-grained similarity matrix
CN112214570A (en) * 2020-09-23 2021-01-12 浙江工业大学 Cross-modal retrieval method and device based on counterprojection learning hash

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063112A (en) * 2018-07-30 2018-12-21 成都快眼科技有限公司 A kind of fast image retrieval method based on multi-task learning deep semantic Hash, model and model building method
CN109918528A (en) * 2019-01-14 2019-06-21 北京工商大学 A kind of compact Hash code learning method based on semanteme protection
CN109960737A (en) * 2019-03-15 2019-07-02 西安电子科技大学 Remote Sensing Images search method of the semi-supervised depth confrontation from coding Hash study
CN110110128A (en) * 2019-05-06 2019-08-09 西南大学 The discrete hashing image searching system of quickly supervision for distributed structure/architecture
CN110516095A (en) * 2019-08-12 2019-11-29 山东师范大学 Weakly supervised depth Hash social activity image search method and system based on semanteme migration
CN112214623A (en) * 2020-09-09 2021-01-12 鲁东大学 Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Learning to Hash with Dimension Analysis based Quantizer for Image Retrieval";Yuan Cao et al.;《IEEE》;20201231;第1-12页 *
"适用于图像检索的强化对抗生成哈希方法";施鸿源 等;《小型微型计算机系统》;20210507;第42卷(第5期);第1039-1043页 *

Also Published As

Publication number Publication date
CN113191445A (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN107480261B (en) Fine-grained face image fast retrieval method based on deep learning
CN113190699B (en) Remote sensing image retrieval method and device based on category-level semantic hash
CN111291836B (en) Method for generating student network model
CN107122809B (en) Neural network feature learning method based on image self-coding
CN113191445B (en) Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm
CN113657561B (en) Semi-supervised night image classification method based on multi-task decoupling learning
CN111898689A (en) Image classification method based on neural network architecture search
CN108984642A (en) A kind of PRINTED FABRIC image search method based on Hash coding
CN109960732B (en) Deep discrete hash cross-modal retrieval method and system based on robust supervision
CN112686376A (en) Node representation method based on timing diagram neural network and incremental learning method
CN114092747A (en) Small sample image classification method based on depth element metric model mutual learning
CN111008224A (en) Time sequence classification and retrieval method based on deep multitask representation learning
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
Bai et al. Learning high-level image representation for image retrieval via multi-task dnn using clickthrough data
CN111079840B (en) Complete image semantic annotation method based on convolutional neural network and concept lattice
CN112699782A (en) Radar HRRP target identification method based on N2N and Bert
CN116977725A (en) Abnormal behavior identification method and device based on improved convolutional neural network
CN111507472A (en) Precision estimation parameter searching method based on importance pruning
CN116543250A (en) Model compression method based on class attention transmission
CN114168782B (en) Deep hash image retrieval method based on triplet network
CN112446432B (en) Handwriting picture classification method based on quantum self-learning self-training network
CN115131605A (en) Structure perception graph comparison learning method based on self-adaptive sub-graph
CN114387524A (en) Image identification method and system for small sample learning based on multilevel second-order representation
CN113887653A (en) Positioning method and system for tightly-coupled weak supervised learning based on ternary network
CN114170426A (en) Algorithm model for classifying rare tumor category small samples based on cost sensitivity

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant