CN112925940A - Similar image retrieval method and device, computer equipment and storage medium - Google Patents

Similar image retrieval method and device, computer equipment and storage medium Download PDF

Info

Publication number
CN112925940A
CN112925940A
Authority
CN
China
Prior art keywords
hash
image
clustering
similar
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110241887.3A
Other languages
Chinese (zh)
Other versions
CN112925940B (en)
Inventor
陈墨
郭唐仪
练智超
张德龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Zhongshe Tianhe Technology Co ltd
Original Assignee
Zhejiang Zhongshe Tianhe Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Zhongshe Tianhe Technology Co ltd filed Critical Zhejiang Zhongshe Tianhe Technology Co ltd
Priority to CN202110241887.3A priority Critical patent/CN112925940B/en
Publication of CN112925940A publication Critical patent/CN112925940A/en
Application granted granted Critical
Publication of CN112925940B publication Critical patent/CN112925940B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of computer technology and provides a similar image retrieval method, a similar image retrieval apparatus, a computer device, and a storage medium. The method comprises: acquiring an image to be retrieved; processing the image to be retrieved with a pre-trained unsupervised deep hash model to determine its hash code, wherein the unsupervised deep hash model is generated in advance through iterative optimization training based on a clustering algorithm and a deep hash algorithm, and the pseudo labels determined by the clustering algorithm serve as the optimization target of the deep hash algorithm; and determining, from the hash code of the image to be retrieved, similar hash codes that satisfy a preset similarity relation with it, and determining the corresponding images. With the provided method, when the unsupervised deep hash model faces massive numbers of images during training, the images do not need to be labeled one by one in advance; instead, pseudo labels are assigned to the images directly by the clustering algorithm, which gives the method better applicability.

Description

Similar image retrieval method and device, computer equipment and storage medium
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a similar image retrieval method and device, computer equipment and a storage medium.
Background
Hash-based image retrieval generally comprises two steps. In the first step, feature extraction is performed on a training data set to obtain high-dimensional feature vectors of the images, and these feature vectors are then learned by a designed algorithm to obtain a learned hash function. In the second step, each image in the database is passed through feature extraction and the hash function to obtain a hash code, which is stored in the database. When an image is used as a query, the same feature extraction is applied and the hash function produces the hash code of the query image; similarity is then computed between this hash code and the hash codes in the database to find similar samples, and the images with the highest similarity are returned to the user.
However, in the above process, learning the hash function from the high-dimensional feature vectors requires the images to be labeled in advance. With massive numbers of images, labeling them one by one is clearly impractical, and if the hash function is learned without such prior labeling, there is no guarantee that the computed hash codes fit the actual similarity between images.
Therefore, the hash function used in conventional similar image retrieval can only be learned by labeling the images in advance, which makes it poorly applicable to scenarios with massive numbers of images.
Disclosure of Invention
The embodiments of the invention aim to provide a similar image retrieval method that solves the technical problem that the hash function used in existing similar image retrieval can only be learned by labeling images in advance, resulting in poor applicability to scenarios with massive numbers of images.
The embodiment of the invention is realized in such a way that a similar image retrieval method comprises the following steps:
acquiring an image to be retrieved;
processing the image to be retrieved according to a pre-trained unsupervised deep hash model, and determining a hash code of the image to be retrieved; the unsupervised deep hash model is generated in advance through iterative optimization training based on a clustering algorithm and a deep hash algorithm; the deep hash algorithm uses a pseudo label determined by the clustering algorithm as an optimization target;
determining a similar hash code meeting a preset similar relation with the hash code according to the hash code of the image to be retrieved, and determining an image corresponding to the similar hash code; and the image corresponding to the similar hash code is a similar image of the image to be retrieved.
Another object of an embodiment of the present invention is to provide a similar image retrieving apparatus, including:
the image to be retrieved acquiring unit is used for acquiring an image to be retrieved;
the hash code determining unit is used for processing the image to be retrieved according to a pre-trained unsupervised deep hash model and determining the hash code of the image to be retrieved; the unsupervised deep hash model is generated in advance through iterative optimization training based on a clustering algorithm and a deep hash algorithm; the deep hash algorithm uses a pseudo label determined by the clustering algorithm as an optimization target;
the similar image determining unit is used for determining a similar hash code which meets a preset similar relation with the hash code according to the hash code of the image to be retrieved and determining an image corresponding to the similar hash code; and the image corresponding to the similar hash code is a similar image of the image to be retrieved.
It is a further object of an embodiment of the present invention to provide a computer device, including a memory and a processor, wherein the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to execute the steps of the similar image retrieval method as described above.
It is another object of an embodiment of the present invention to provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, causes the processor to perform the steps of the similar image retrieval method as described above.
The invention provides a similar image retrieval method in which, after an image to be retrieved is obtained, the image is processed according to a pre-trained unsupervised deep hash model to determine its hash code, similar hash codes are determined from that hash code, and the corresponding similar images are obtained. The unsupervised deep hash model is generated in advance through iterative optimization training based on a clustering algorithm and a deep hash algorithm, and during this training the pseudo labels determined by the clustering algorithm, rather than manual annotations, are used as the optimization target. When the unsupervised deep hash model used in the provided method faces massive numbers of images during training, the images do not need to be labeled one by one in advance; instead, pseudo labels are assigned to the images directly by the clustering algorithm, so the method retains good applicability to massive numbers of images while preserving the effect of the unsupervised deep hash model.
Drawings
Fig. 1 is a flowchart illustrating steps of a similar image retrieval method according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating steps of a method for training a generation unsupervised deep hash model according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating steps for determining clustering pseudo-labels of sample images according to a K-Means clustering algorithm according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating steps for calculating a model total loss value according to a hash code of a sample image and a clustering pseudo label according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating steps of another method for training a generative unsupervised deep hash model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a visual comparison of clustering results of three algorithms provided by the present invention;
fig. 7 is a schematic structural diagram of a similar image retrieval apparatus according to an embodiment of the present invention;
fig. 8 is an internal structural diagram of a computer device for performing a similar image retrieval method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a method for obtaining an unsupervised deep hash model by jointly training a deep hash network and a clustering algorithm, in order to solve the prior-art problem that the hash function used in similar image retrieval can only be obtained by training with images whose labels have been annotated in advance, and it realizes similar image retrieval with the unsupervised deep hash model obtained by this training method. After the feature vectors of the images are extracted by the feature extraction layers of the deep hash network, on the one hand the pseudo labels of all images are determined by clustering with the clustering algorithm, and on the other hand the hash values of the images are further determined by the hash layer. The relation between the pseudo labels and the hash values is used as the optimization index for iteratively optimizing the whole deep hash network, and each round of optimization updates both the deep hash network and the corresponding pseudo labels. After multiple rounds of iterative optimization, the hash values determined by the deep hash network and the pseudo labels obtained by the clustering algorithm are highly consistent with the similarity of the images themselves: the more similar two images are, the more likely their pseudo labels are the same and the closer their hash values are. The deep hash network obtained in this way, that is, the trained unsupervised deep hash model, can then be used for the subsequent image retrieval processing.
As shown in fig. 1, a flowchart of steps of a similar image retrieval method provided in an embodiment of the present invention specifically includes the following steps:
and S102, acquiring an image to be retrieved.
In the embodiment of the present invention, the provided similar image retrieval method may be understood as a program, in which an image to be retrieved is input into the retrieval framework of the program by uploading, downloading, or any other feasible means, so that the subsequent retrieval process can be performed.
And step S104, processing the image to be retrieved according to a pre-trained unsupervised deep hash model, and determining the hash code of the image to be retrieved.
In the embodiment of the invention, the unsupervised deep hash model used to process the image to be retrieved differs from the prior art: it is generated in advance through iterative optimization training based on a clustering algorithm and a deep hash algorithm, and specifically, the pseudo labels determined by the clustering algorithm are used as the optimization target of the deep hash algorithm.
In the embodiment of the present invention, the unsupervised deep hash model satisfies the following condition: the hash codes obtained by processing similar images through the unsupervised deep hash model are also similar.
In the embodiment of the present invention, please refer to fig. 2 and the description thereof for the specific steps of training the unsupervised deep hash model.
And S106, determining a similar hash code meeting a preset similar relation with the hash code according to the hash code of the image to be retrieved, and determining an image corresponding to the similar hash code.
In the embodiment of the invention, the image corresponding to the similar hash code is the similar image of the image to be retrieved.
In the embodiment of the invention, the database contains a number of images and their hash codes under the unsupervised deep hash model, with a correspondence maintained between images and hash codes, so similar images can be determined by retrieving hash codes similar to the hash code of the image to be retrieved.
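As an illustration (not taken from the patent text), this retrieval step can be sketched as follows, assuming the database hash codes are stored as a NumPy array of ±1 values; the function name retrieve_similar and the threshold max_hamming, which stands in for the preset similarity relation, are assumptions of this sketch.

import numpy as np

def retrieve_similar(query_code, db_codes, max_hamming):
    # query_code: shape (r,) with values in {-1, +1}
    # db_codes:   shape (N, r) with values in {-1, +1}
    r = query_code.shape[0]
    # for +/-1 codes, Hamming distance = (r - inner product) / 2
    hamming = (r - db_codes @ query_code) // 2
    hits = np.where(hamming <= max_hamming)[0]
    # return matching database indices ordered from most to least similar
    return hits[np.argsort(hamming[hits])]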
In the embodiment of the invention, the idea of retrieving similar images by hash codes belongs to the prior art; however, different hash functions, that is, different unsupervised deep hash models in the present invention, differ in retrieval effect and in training process. The training of hash functions in the prior art largely depends on labeling the images in advance and is therefore poorly applicable to massive numbers of images.
The invention provides a similar image retrieval method in which, after an image to be retrieved is obtained, the image is processed according to a pre-trained unsupervised deep hash model to determine its hash code, similar hash codes are determined from that hash code, and the corresponding similar images are obtained. The unsupervised deep hash model is generated in advance through iterative optimization training based on a clustering algorithm and a deep hash algorithm, and during this training the pseudo labels determined by the clustering algorithm, rather than manual annotations, are used as the optimization target. When the unsupervised deep hash model used in the provided method faces massive numbers of images during training, the images do not need to be labeled one by one in advance; instead, pseudo labels are assigned to the images directly by the clustering algorithm, so the method retains good applicability to massive numbers of images while preserving the effect of the unsupervised deep hash model.
As shown in fig. 2, a flowchart of steps of a method for generating an unsupervised deep hash model for training provided in an embodiment of the present invention specifically includes the following steps:
and step S202, constructing an initialized unsupervised deep hash model.
In the embodiment of the invention, the unsupervised deep hash model sequentially comprises a feature extraction submodel and a hash calculation submodel.
In the embodiment of the invention, the feature extraction submodel generally consists of several groups of convolution and pooling layers; after repeated convolution and pooling, a high-dimensional feature vector is output through several fully connected layers. Typically, a 4096-dimensional vector is output.
In the embodiment of the present invention, the hash computation submodel adds a fully connected layer after the output of the feature extraction submodel and outputs a hash code of a specific length r (which can equivalently be understood as a feature vector in which each bit is 1 or -1).
In the embodiment of the present invention, as a feasible unsupervised deep hash model, the following is specifically constructed:
the first group consists of two convolution layers and a max pooling layer, with 64 convolution kernels of size 3 × 3;
the second group consists of two convolution layers and a max pooling layer, with 128 convolution kernels of size 3 × 3;
the third group consists of three convolution layers and a max pooling layer, with 256 convolution kernels of size 3 × 3;
the fourth group consists of three convolution layers and a max pooling layer, with 512 convolution kernels of size 3 × 3;
the fifth group consists of three convolution layers and a max pooling layer, with 512 convolution kernels of size 3 × 3;
the sixth group is a first fully connected layer, connected to the output of the fifth group, with 4096 outputs;
the seventh group is a second fully connected layer, connected to the fully connected layer of the sixth group, with 4096 outputs;
the eighth group is a third fully connected layer, connected to the fully connected layer of the seventh group, with r outputs, where r is the length of the hash code.
The input is a 224 × 224 × 3 color image. The first seven groups form the feature extraction submodel, whose seventh group outputs a 4096-dimensional feature vector, and the fully connected layer of the eighth group can be understood as the hash function, which maps the 4096-dimensional feature vector into a hash code of length r, that is, an r-dimensional feature vector in which each dimension takes the value 1 or -1.
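Purely for illustration, the eight groups described above can be sketched in PyTorch as follows; the class and function names, the tanh relaxation applied to the hash layer output, and the returned pair (feature vector, relaxed hash code) are assumptions of this sketch rather than the patent's exact implementation.

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    # n_convs 3x3 convolutions followed by one 2x2 max pooling, as in groups 1-5
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class UnsupervisedDeepHashNet(nn.Module):
    def __init__(self, hash_length):
        super().__init__()
        # groups 1-5: convolution + pooling
        self.features = nn.Sequential(
            conv_block(3, 64, 2),     # group 1
            conv_block(64, 128, 2),   # group 2
            conv_block(128, 256, 3),  # group 3
            conv_block(256, 512, 3),  # group 4
            conv_block(512, 512, 3),  # group 5
        )
        # groups 6-7: two fully connected layers with 4096 outputs each
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
        )
        # group 8: hash layer mapping 4096 -> r
        self.hash_layer = nn.Linear(4096, hash_length)

    def forward(self, x):
        # x: (batch, 3, 224, 224) color images
        feat = self.fc(self.features(x))       # 4096-dimensional response feature vector
        h = torch.tanh(self.hash_layer(feat))  # relaxed hash code in (-1, 1)
        return feat, h

At retrieval time, the relaxed hash code would be binarized with the sign function to obtain the final ±1 hash code, consistent with the truncation step described later in this specification.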
In step S204, a plurality of sample images are acquired.
In the embodiment of the invention, the sample images are used to train the unsupervised deep hash model, which is a conventional technique for deep network models and is not described in detail here. Obviously, the more sample images there are, the better the trained unsupervised deep hash model performs, but the longer the training takes; in particular, labeling the sample images with pseudo labels by the clustering algorithm consumes considerable time.
As a preferred embodiment of the invention, after each iterative optimization of the unsupervised deep hash model, a small batch of images is randomly selected from the massive sample image pool as the sample images for the next round of iterative optimization. This small-batch, repeated-sampling strategy saves training time, especially the time spent labeling the sample images with pseudo labels by the clustering algorithm, while preserving the effect of the trained unsupervised deep hash model.
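A minimal sketch of this sampling strategy, assuming the sample image pool is held as an in-memory list and using Python's standard library; the names are illustrative only.

import random

def sample_minibatch(image_pool, batch_size):
    # randomly draw a small batch from the massive sample image pool
    # for the next round of iterative optimization
    return random.sample(image_pool, k=min(batch_size, len(image_pool)))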
And step S206, processing the sample image according to the current feature extraction submodel, and determining the response feature vector of the sample image.
In the embodiment of the present invention, as can be known from the description in step S202, after the sample image of 224 × 224 × 3 is input into the unsupervised deep hash model, the value obtained by the seventh layer output is the response feature vector of the sample image.
And S208, clustering the response characteristic vectors of the sample images according to a K-Means clustering algorithm to determine a plurality of clustering centers, and determining clustering pseudo labels of the sample images.
In the embodiment of the invention, the response characteristic vectors of the sample images are clustered through a K-Means clustering algorithm to obtain a plurality of clustering centers, each clustering center corresponds to one clustering pseudo label, and the sample images corresponding to the response characteristic vectors belonging to the same clustering center are labeled with the same clustering pseudo labels.
In the embodiment of the present invention, please refer to fig. 3 and the contents of the explanation thereof for the specific steps of clustering the response feature vectors of the sample images according to the K-Means clustering algorithm to determine a plurality of clustering centers and determining the clustering pseudo labels of each sample image.
Step S210, processing the response characteristic vector of the sample image according to the current Hash calculation sub-model, and determining the Hash code of the sample image.
In the embodiment of the present invention, similarly, in combination with the step S202, after the response feature vector of the sample image (i.e., the seventh layer output) is processed and output again by the eighth layer, the hash code with the length r can be obtained, that is, the hash code of the sample image.
Step S212, determining the total loss value of the current unsupervised depth hash model according to the hash code of the sample image and the clustering pseudo label of the sample image.
In the embodiment of the invention, the hash codes of the sample images and the clustering pseudo labels of the sample images are fitted to each other: hash codes of sample images with the same clustering pseudo label should be as similar as possible, hash codes of sample images with different clustering pseudo labels should be as different as possible, and the output of the hash computation submodel should be as close as possible to a true hash code. The total loss value of the current unsupervised deep hash model can therefore be determined from the hash codes of the sample images and their clustering pseudo labels, and consists of two parts: a similarity loss, which describes how well hash codes with the same clustering pseudo label agree and hash codes with different clustering pseudo labels differ, and a quantization loss, which describes the distance between the hash codes of the sample images and the values -1/1. Reference may be made to fig. 4 and its description below.
Step S214, determining whether a preset iterative optimization condition is satisfied. When it is determined that the preset iterative optimization condition is not satisfied, performing step S216; when it is determined that the preset iterative optimization condition is satisfied, step S218 is performed.
In the embodiment of the present invention, whether the preset iterative optimization condition is satisfied is generally determined by the number of iterations or the total loss value. For example, it may be determined whether the number of iterations exceeds a preset number, or whether the total loss value falls below a preset threshold.
Step S216, updating the current feature extraction submodel and the hash calculation submodel based on the stochastic gradient descent algorithm, and returning to the step S206.
In the embodiment of the present invention, the current feature extraction submodel and the hash calculation submodel may be updated based on the stochastic gradient descent algorithm and by using the total loss value, so as to further reduce the total loss value, and then after the model parameters are updated, the process returns to step S206 again to perform the next iteration, that is, labeling the clustering pseudo labels and calculating the loss value again until the preset optimization condition is satisfied.
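One round of this iterative optimization (steps S206 to S216) can be sketched as follows, assuming PyTorch; compute_pseudo_labels and total_loss are the hypothetical helper functions sketched elsewhere in this description, and the optimizer settings in the final comment are illustrative values only.

import torch

def train_one_round(model, images, optimizer, num_clusters, lam):
    # steps S206/S208: response feature vectors and K-Means clustering pseudo labels
    with torch.no_grad():
        feats, _ = model(images)
    pseudo_labels = torch.as_tensor(
        compute_pseudo_labels(feats.cpu().numpy(), num_clusters))  # hypothetical helper

    # steps S210/S212: hash codes of the sample images and the total loss value
    _, hash_codes = model(images)
    loss = total_loss(hash_codes, pseudo_labels, lam)               # hypothetical helper

    # step S216: update the feature extraction and hash computation submodels
    # by stochastic gradient descent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# example setup (illustrative values):
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)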
Step S218, determining an unsupervised deep Hash model according to the current feature extraction submodel and the Hash calculation submodel.
In the embodiment of the invention, after the iterative training is finished, the feature extraction submodel and the Hash calculation submodel are combined to form the unsupervised deep Hash model, and the unsupervised deep Hash model is generated based on the clustering algorithm and the deep Hash algorithm in advance through iterative optimization training. The unsupervised deep hash model meets the requirement of determining the similarity by using the hash value.
As shown in fig. 3, a flowchart of the step of determining the clustering pseudo label of the sample image according to the K-Means clustering algorithm provided in the embodiment of the present invention specifically includes the following steps:
in step S302, a plurality of initialized cluster centers are randomly generated.
In the embodiment of the invention, d clustering centers are randomly generated firstly, and then all the response characteristic vectors are sequentially distributed to the corresponding clustering centers.
Step S304, obtaining the response characteristic vector to be distributed, and determining the distance between the response characteristic vector and each current clustering center.
In the embodiment of the present invention, each response feature vector is assigned to a cluster center in turn, so the cluster centers are gradually adjusted as vectors are assigned. Each time a response feature vector is obtained for assignment, its distance to every current cluster center therefore needs to be calculated; the distance used is the Euclidean distance. Calculating this distance is a routine technical means for those skilled in the art and is not described in detail here.
And S306, sequentially distributing the response characteristic vectors to the nearest cluster centers according to the distance from the response characteristic vectors to each current cluster center, and updating the corresponding cluster centers.
In the embodiment of the invention, the response feature vector is assigned to the nearest cluster center, and as each vector is assigned, the new cluster center is determined by recomputing the mean of the response feature vectors belonging to that cluster center, thereby updating the cluster center.
Step S308, determine whether there is an unassigned response feature vector. When the judgment is yes, returning to the step S304; when it is judged that there is no, step S310 is performed.
In an embodiment of the present invention, the foregoing steps are repeated until all response feature vectors have been assigned.
Step S310, determining a clustering pseudo label of the sample image corresponding to the response characteristic vector according to the clustering center to which the response characteristic vector belongs.
In the embodiment of the present invention, at this time, the clustering pseudo label of the sample image corresponding to the response feature vector is determined according to the clustering center to which the response feature vector belongs, that is, the sample image-response feature vector, the response feature vector-clustering center and the clustering center-clustering pseudo label all have a corresponding relationship, so as to obtain a corresponding relationship between the sample image and the clustering pseudo label.
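A sketch of this sequential assignment, assuming NumPy; the random initialization and the single pass over the vectors follow the description of steps S302 to S310, and the helper name compute_pseudo_labels is an assumption of this sketch.

import numpy as np

def compute_pseudo_labels(features, d, seed=0):
    # features: (n, dim) response feature vectors; d: number of cluster centers
    rng = np.random.default_rng(seed)
    n, dim = features.shape
    centers = rng.standard_normal((d, dim))   # step S302: random initial cluster centers
    members = [[] for _ in range(d)]
    labels = np.empty(n, dtype=int)
    for i in range(n):                        # steps S304-S308
        dists = np.linalg.norm(centers - features[i], axis=1)  # Euclidean distances
        k = int(np.argmin(dists))             # nearest current cluster center
        members[k].append(i)
        labels[i] = k                         # step S310: cluster index serves as pseudo label
        # update that center as the mean of the vectors assigned to it so far
        centers[k] = features[members[k]].mean(axis=0)
    return labels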
As shown in fig. 4, a flowchart of a step of calculating a total model loss value according to a hash code of a sample image and a clustering pseudo label provided in an embodiment of the present invention specifically includes the following steps:
and step S402, determining the similarity loss of the current unsupervised depth hash model according to the hash code of the sample image and the clustering pseudo label of the sample image.
In the embodiment of the present invention, the similarity loss describes a correlation between the similarity between the hash codes of the sample images and the consistency of the clustering pseudo labels of the sample images, and specifically, the correlation can be solved by the following formula:
J_s = \sum_{i,j=1}^{n} \left| S_{i,j} - \frac{1}{r} h(x_i)^{\mathrm{T}} h(x_j) \right|^2
wherein J_s is the similarity loss, h(x_i) and h(x_j) are the hash codes of the i-th and j-th of the n sample images, r is the length of the hash codes, and S_{i,j} depends on whether the clustering pseudo labels of the i-th and j-th sample images are the same: S_{i,j} takes the value 1 when they are the same and -1 when they are different.
In the embodiment of the present invention, since each value in a hash code is either 1 or -1, the product of the values in the same bit of two hash codes is 1 when they are equal and -1 when they differ, so the sum of the products over corresponding bits, h(x_i)^T h(x_j), describes how many bits of the two hash codes agree. When the two hash codes are identical, every bit contributes 1 and h(x_i)^T h(x_j) reaches its upper limit, the number of bits r; when they are completely different, every bit contributes -1 and the lower limit -r is reached. The normalized value (1/r) h(x_i)^T h(x_j) therefore lies in [-1, 1] and describes the similarity of h(x_i) and h(x_j): the more similar they are, the closer it is to 1, and conversely the closer it is to -1. Consequently, the more similar h(x_i) and h(x_j) are, the closer S_{i,j} - (1/r) h(x_i)^T h(x_j) is to 0 when their clustering pseudo labels are the same, and the closer it is to -2 when their clustering pseudo labels differ; likewise, the more dissimilar h(x_i) and h(x_j) are, the closer the value is to 0 when their clustering pseudo labels differ, and the closer it is to 2 when they are the same. The sum of the squared absolute values of these terms can therefore describe, as a whole, how well the similarity between hash codes agrees with the consistency of the clustering pseudo labels: the smaller J_s is, the higher this agreement, that is, similar hash codes share the same clustering pseudo label and dissimilar hash codes have different clustering pseudo labels.
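A sketch of the similarity loss under the assumption of PyTorch and relaxed (real-valued) hash codes; the vectorized form below is only one possible way of evaluating the sum over all pairs (i, j).

import torch

def similarity_loss(hash_codes, pseudo_labels):
    # hash_codes: (n, r) relaxed hash codes; pseudo_labels: (n,) clustering pseudo labels
    n, r = hash_codes.shape
    # S_ij = +1 if pseudo labels i and j are the same, -1 otherwise
    same = (pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)).float()
    S = 2.0 * same - 1.0
    # normalized pairwise inner products (1/r) h(x_i)^T h(x_j), each in [-1, 1]
    inner = hash_codes @ hash_codes.t() / r
    # J_s = sum_{i,j} | S_ij - (1/r) h(x_i)^T h(x_j) |^2
    return ((S - inner) ** 2).sum()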
And step S404, determining the quantization loss of the current unsupervised depth hash model according to the hash code of the sample image.
In the embodiment of the present invention, the quantization loss describes the distance between the hash codes of the sample images and the values -1/1; its specific calculation formula is as follows:
J_h = \sum_{i=1}^{n} \left\| h(x_i) - \operatorname{sgn}\left( h(x_i) \right) \right\|^2
wherein J_h is the quantization loss, h(x_i) is the hash code of the i-th of the n sample images, and sgn(·) is the sign function.
In the embodiment of the present invention, it should be noted that it is obviously difficult to directly train a hash computation submodel that outputs a standard hash code in which every dimension is exactly 1 or -1. In practice, the hash computation submodel is therefore trained so that each dimension of its output is as close to 1 or -1 as possible, and a truncation step is added afterwards to guarantee that the final output is a standard hash code with every dimension equal to 1 or -1; specifically, this truncation is implemented with the sign function. The closer each dimension of h(x_i) is to 1 or -1, the smaller J_h becomes.
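A corresponding sketch of the quantization loss, under the same PyTorch assumption as above:

import torch

def quantization_loss(hash_codes):
    # J_h = sum_i || h(x_i) - sgn(h(x_i)) ||^2: pushes every dimension of the
    # relaxed hash code towards -1 or +1
    return ((hash_codes - torch.sign(hash_codes)) ** 2).sum()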
And step S406, determining the total loss value of the current unsupervised depth hash model according to the similarity loss of the unsupervised depth hash model and the quantization loss of the unsupervised depth hash model.
In an embodiment of the invention, the total loss function is expressed as follows:
J = J_s + \lambda J_h
where λ is a preset weight describing the degree to which the quantization loss value is considered.
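Combining the two terms, again as an illustrative sketch that reuses the similarity_loss and quantization_loss functions sketched above; the default value of lam (λ) is an assumption of this sketch, not a value given in the patent.

def total_loss(hash_codes, pseudo_labels, lam=0.1):
    # J = J_s + lambda * J_h
    return similarity_loss(hash_codes, pseudo_labels) + lam * quantization_loss(hash_codes)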
Fig. 5 is a flowchart illustrating steps of another method for generating an unsupervised deep hash model for training according to an embodiment of the present invention, which is described in detail below.
In the embodiment of the present invention, a difference from the step flowchart of the method for training and generating an unsupervised deep hash model shown in fig. 2 is that the step S216 specifically includes:
step S502, updating the current feature extraction submodel and the Hash calculation submodel based on the stochastic gradient descent algorithm, and returning to the step S204.
In the embodiment of the invention, after each iterative optimization of the unsupervised deep hash model, a small batch of images is randomly selected from the massive sample image pool as the sample images for the next round of iterative optimization. This small-batch, repeated-sampling strategy saves training time, especially the time spent labeling the sample images with pseudo labels by the clustering algorithm, while preserving the effect of the trained unsupervised deep hash model.
To illustrate the difference in image similarity retrieval between the unsupervised deep hash model provided by the invention and image hash models trained with existing algorithms, image hash models trained with various algorithms and the unsupervised deep hash model of the invention were tested on two data sets, CIFAR-10 and MIRFLICKR. The test results are shown in Table 1 below:
[Table 1: MAP results of the compared hash methods (GIST and VGG16 features, PCAH, SH, CUDH, IOPH_N, IOPH) on the CIFAR-10 and MIRFLICKR data sets at different hash code lengths]
the invention does not describe the essence of each algorithm specifically, but the algorithm of the invention is abbreviated as IOPH, namely the last line, and the penultimate line, namely the IOPH _ N algorithm, is a process without iterative optimization on the basis of the algorithm, namely an algorithm which directly uses the pre-trained features to cluster to generate pseudo labels to train, and the effect of the cyclic optimization process can be determined through the IOPH _ N and the IOPH.
It can be seen from the table that the conventional hash algorithms obtain better MAP results with VGG16 features than with GIST features: on the CIFAR-10 data set, PCAH improves by 21.1%, 19.4% and 19% at the three code lengths and SH improves by 20.1%, 20.1% and 21.1%, while on the MIRFLICKR data set PCAH improves by 12.6% on average and SH by 9.97% on average. Compared with the conventional hash algorithms, IOPH obtains a larger improvement, exceeding the best conventional hash results on the two data sets by 9.77% and 11%, respectively. CUDH, which optimizes the cluster centers with KL divergence to generate hash codes, also performs well, but it trains on extracted features rather than starting from the images, does not achieve true end-to-end training, and its MAP results are slightly worse. As the hash code length of IOPH increases, its MAP results improve correspondingly, showing that longer hash codes can carry more discriminative information.
As can be seen from the results of IOPH_N and IOPH, the effect of iteration is more obvious on the single-label data set CIFAR-10, with improvements of 1.8%, 4% and 5.5% at the different code lengths, whereas on the multi-label data set MIRFLICKR the improvements at the three code lengths are 0.7%, 0.4% and 0.8%. This may be because each image in the MIRFLICKR data set carries multiple labels, so updating the pseudo labels contributes less to the improvement, while on the single-label data set the effect is more pronounced.
Further, fig. 6 provides a schematic visual comparison of the clustering results of the three algorithms, detailed as follows.
The three images in fig. 6 correspond, respectively, to the clustering results of the scheme IOPH of the invention, the comparison scheme IOPH_N, and the conventional scheme DDH. Comparing IOPH_N with IOPH shows that the iterative IOPH algorithm reduces the intra-class distance and increases the inter-class distance, while the points of IOPH_N within the same class are relatively dispersed. Comparing IOPH with DDH shows that IOPH distinguishes the different classes better, so the points of the same class are more compact. These two observations indicate that the iteration has a significant effect on optimizing the result.
As shown in fig. 7, a schematic structural diagram of a similar image retrieving apparatus according to an embodiment of the present invention specifically includes the following units.
And an image to be retrieved obtaining unit 710, configured to obtain an image to be retrieved.
In the embodiment of the present invention, the provided similar image retrieval method may be understood as a program, in which an image to be retrieved is input into the retrieval framework of the program by uploading, downloading, or any other feasible means, so that the subsequent retrieval process can be performed.
And the hash code determining unit 720 is configured to process the image to be retrieved according to a pre-trained unsupervised deep hash model, and determine a hash code of the image to be retrieved.
In the embodiment of the invention, the unsupervised deep hash model used to process the image to be retrieved differs from the prior art: it is generated in advance through iterative optimization training based on a clustering algorithm and a deep hash algorithm, and specifically, the pseudo labels determined by the clustering algorithm are used as the optimization target of the deep hash algorithm.
In the embodiment of the present invention, the unsupervised deep hash model satisfies the following condition: the hash codes obtained by processing similar images through the unsupervised deep hash model are also similar.
In the embodiment of the present invention, please refer to fig. 2 and the description thereof for the specific steps of training the unsupervised deep hash model.
And the similar image determining unit 730 is configured to determine, according to the hash code of the image to be retrieved, a similar hash code that satisfies a preset similar relationship with the hash code, and determine an image corresponding to the similar hash code.
In the embodiment of the invention, the image corresponding to the similar hash code is the similar image of the image to be retrieved.
In the embodiment of the invention, the database contains a number of images and their hash codes under the unsupervised deep hash model, with a correspondence maintained between images and hash codes, so similar images can be determined by retrieving hash codes similar to the hash code of the image to be retrieved.
In the embodiment of the invention, the idea of retrieving similar images by hash codes belongs to the prior art; however, different hash functions, that is, different unsupervised deep hash models in the present invention, differ in retrieval effect and in training process. The training of hash functions in the prior art largely depends on labeling the images in advance and is therefore poorly applicable to massive numbers of images.
According to the similar image retrieval apparatus provided by the invention, after an image to be retrieved is obtained, the image is processed according to a pre-trained unsupervised deep hash model to determine its hash code, similar hash codes are determined from that hash code, and the corresponding similar images are obtained. The unsupervised deep hash model is generated in advance through iterative optimization training based on a clustering algorithm and a deep hash algorithm, and during this training the pseudo labels determined by the clustering algorithm, rather than manual annotations, are used as the optimization target. When the unsupervised deep hash model used in the provided apparatus faces massive numbers of images during training, the images do not need to be labeled one by one in advance; instead, pseudo labels are assigned to the images directly by the clustering algorithm, so the apparatus retains good applicability to massive numbers of images while preserving the effect of the unsupervised deep hash model.
FIG. 8 is a diagram illustrating an internal structure of a computer device in one embodiment. As shown in fig. 8, the computer apparatus includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the similar image retrieval method. The internal memory may also have a computer program stored therein, which when executed by the processor, causes the processor to perform a similar image retrieval method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, the similar image retrieval apparatus provided in the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 8. The memory of the computer device may store therein various program modules constituting the similar image retrieval apparatus, such as the image-to-be-retrieved acquisition unit 710, the hash code determination unit 720, and the similar image determination unit 730 shown in fig. 7. The computer program constituted by the respective program modules causes the processor to execute the steps in the similar image retrieval method of the respective embodiments of the present application described in the present specification.
For example, the computer device shown in fig. 8 may execute step S102 through the image-to-be-retrieved acquiring unit 710 in the similar image retrieval apparatus shown in fig. 7. The computer device may execute step S104 through the hash code determining unit 720. The computer device may execute step S106 through the similar image determining unit 730.
In one embodiment, a computer device is proposed, the computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring an image to be retrieved;
processing the image to be retrieved according to a pre-trained unsupervised deep hash model, and determining a hash code of the image to be retrieved; the unsupervised deep hash model is generated in advance based on a clustering algorithm and a deep hash algorithm iterative optimization training; the deep hash algorithm uses a pseudo label determined by a clustering algorithm as an optimization target;
determining a similar hash code meeting a preset similar relation with the hash code according to the hash code of the image to be retrieved, and determining an image corresponding to the similar hash code; and the image corresponding to the similar hash code is a similar image of the image to be retrieved.
In one embodiment, a computer readable storage medium is provided, having a computer program stored thereon, which, when executed by a processor, causes the processor to perform the steps of:
acquiring an image to be retrieved;
processing the image to be retrieved according to a pre-trained unsupervised deep hash model, and determining a hash code of the image to be retrieved; the unsupervised deep hash model is generated in advance based on a clustering algorithm and a deep hash algorithm iterative optimization training; the deep hash algorithm uses a pseudo label determined by a clustering algorithm as an optimization target;
determining a similar hash code meeting a preset similar relation with the hash code according to the hash code of the image to be retrieved, and determining an image corresponding to the similar hash code; and the image corresponding to the similar hash code is a similar image of the image to be retrieved.
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in sequence as indicated by the arrows, the steps are not necessarily performed in sequence as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least a portion of the steps in various embodiments may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternately with other steps or at least a portion of the sub-steps or stages of other steps.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A method for retrieving similar images, the method comprising:
acquiring an image to be retrieved;
processing the image to be retrieved according to a pre-trained unsupervised deep hash model, and determining a hash code of the image to be retrieved; the unsupervised deep hash model is generated in advance based on a clustering algorithm and a deep hash algorithm iterative optimization training; the deep hash algorithm uses a pseudo label determined by a clustering algorithm as an optimization target;
determining a similar hash code meeting a preset similar relation with the hash code according to the hash code of the image to be retrieved, and determining an image corresponding to the similar hash code; and the image corresponding to the similar hash code is a similar image of the image to be retrieved.
2. The similar image retrieval method according to claim 1, wherein the step of generating the unsupervised deep hash model based on a clustering algorithm and a deep hash algorithm iterative optimization training in advance specifically comprises:
constructing an initialized unsupervised deep hash model; the unsupervised deep Hash model consists of a feature extraction submodel and a Hash calculation submodel in sequence;
acquiring a plurality of sample images;
processing the sample image according to the current feature extraction submodel, and determining a response feature vector of the sample image;
clustering the response characteristic vectors of the sample images according to a K-Means clustering algorithm to determine a plurality of clustering centers and determine clustering pseudo labels of the sample images;
processing the response characteristic vector of the sample image according to the current Hash calculation submodel to determine a Hash code of the sample image;
determining the total loss value of the current unsupervised depth hash model according to the hash code of the sample image and the clustering pseudo label of the sample image;
judging whether a preset iterative optimization condition is met;
when the preset iterative optimization condition is judged not to be met, updating the current feature extraction submodel and the Hash calculation submodel based on a random gradient descent algorithm, and returning to the step of processing the sample image according to the current feature extraction submodel to determine the response feature vector of the sample image;
when the judgment result meets the preset iterative optimization condition, determining an unsupervised deep Hash model according to the current feature extraction submodel and the Hash calculation submodel; the unsupervised deep hash model is generated based on a clustering algorithm and a deep hash algorithm in advance through iterative optimization training.
3. The similar image retrieval method according to claim 2, wherein the step of clustering the response feature vectors of the sample images according to a K-Means clustering algorithm to determine a plurality of clustering centers and determining the clustering pseudo labels of each sample image specifically comprises:
randomly generating a plurality of initialized clustering centers;
acquiring a response feature vector to be assigned, and determining the distance between the response feature vector and each current clustering center;
assigning the response feature vector, according to its distances to the current clustering centers, to the nearest clustering center, and updating that clustering center;
judging whether any unassigned response feature vectors remain;
if so, returning to the step of acquiring a response feature vector to be assigned and determining the distance between the response feature vector and each current clustering center;
and if not, determining the clustering pseudo label of each sample image according to the clustering center to which its response feature vector belongs.
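Claim 3 reads as an online variant of K-Means in which each response feature vector is assigned in turn and the receiving clustering center is updated immediately. A minimal numpy sketch under that reading follows; the running-mean update rule and the random initialization scale are assumptions, as the claim does not fix them.

```python
import numpy as np

def online_kmeans_pseudo_labels(features: np.ndarray, k: int, seed: int = 0):
    """Assign each response feature vector to its nearest clustering center,
    updating that center after every assignment, and return the cluster
    index of each vector as its clustering pseudo label."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    centers = rng.normal(size=(k, d))      # randomly generated initial centers
    counts = np.zeros(k, dtype=int)
    labels = np.empty(len(features), dtype=int)
    for i, f in enumerate(features):
        # Distance from this vector to each current clustering center.
        nearest = int(np.argmin(np.linalg.norm(centers - f, axis=1)))
        labels[i] = nearest
        # Update the receiving center as a running mean of its members.
        counts[nearest] += 1
        centers[nearest] += (f - centers[nearest]) / counts[nearest]
    return labels, centers
```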
4. The similar image retrieval method according to claim 2, wherein the step of determining the total loss value of the current unsupervised deep hash model according to the hash code of the sample image and the clustering pseudo label of the sample image specifically comprises:
determining the similarity loss of the current unsupervised deep hash model according to the hash codes of the sample images and the clustering pseudo labels of the sample images; the similarity loss describes how well the similarity between the hash codes of the sample images agrees with the consistency of their clustering pseudo labels;
determining the quantization loss of the current unsupervised deep hash model according to the hash codes of the sample images; the quantization loss describes the distance between the hash code of each sample image and the binary values -1/+1;
and determining the total loss value of the current unsupervised deep hash model according to the similarity loss and the quantization loss of the unsupervised deep hash model.
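Claim 4 leaves open how the two loss terms are combined; a common choice is a weighted sum. A sketch under that assumption, reusing the `similarity_loss` and `quantization_loss` sketches given after claims 5 and 6 below; the trade-off weight `eta` is hypothetical.

```python
import torch

def total_loss(codes: torch.Tensor, pseudo_labels: torch.Tensor,
               eta: float = 0.1) -> torch.Tensor:
    """Total loss = similarity loss (claim 5) + eta * quantization loss
    (claim 6); the weighted-sum form and eta are assumptions."""
    return similarity_loss(codes, pseudo_labels) + eta * quantization_loss(codes)
```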
5. The similar image retrieval method according to claim 4, wherein in the step of determining the similarity loss of the current unsupervised deep hash model according to the hash code of the sample image and the clustering pseudo label of the sample image, a specific calculation formula of the similarity loss is as follows:
$$J_s=\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\frac{1}{r}\,h(x_i)^{\mathrm{T}}h(x_j)-S_{i,j}\right)^{2}$$
wherein J_s is the similarity loss; h(x_i) and h(x_j) are the hash codes of the i-th and the j-th of the n sample images, respectively; r is the length of the hash codes; and S_{i,j} depends on whether the clustering pseudo labels of the i-th and j-th sample images are the same.
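A direct transcription of the reconstructed claim-5 formula into PyTorch, assuming S_{i,j} = +1 when the pseudo labels of images i and j agree and -1 otherwise (the claim only states that S_{i,j} is related to their agreement):

```python
import torch

def similarity_loss(codes: torch.Tensor, pseudo_labels: torch.Tensor) -> torch.Tensor:
    """Sum over all pairs of ((1/r) h(x_i)^T h(x_j) - S_ij)^2.

    codes:         (n, r) relaxed hash codes h(x_i).
    pseudo_labels: (n,)   clustering pseudo labels.
    """
    n, r = codes.shape
    same = pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)  # (n, n)
    S = same.float() * 2.0 - 1.0          # +1 if labels match, else -1 (assumed)
    inner = codes @ codes.t() / r         # (1/r) h(x_i)^T h(x_j)
    return ((inner - S) ** 2).sum()
```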
6. The similar image retrieval method according to claim 4, wherein in the step of determining the quantization loss of the current unsupervised deep hash model according to the hash code of the sample image, a specific calculation formula of the quantization loss is as follows:
$$J_h=\sum_{i=1}^{n}\left\|h(x_i)-\operatorname{sgn}\!\left(h(x_i)\right)\right\|^{2}$$
wherein J_h is the quantization loss, h(x_i) is the hash code of the i-th of the n sample images, and sgn(·) is the sign function.
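The quantization loss of claim 6 penalizes the gap between each relaxed code and its sign; a one-line PyTorch transcription of the reconstructed formula:

```python
import torch

def quantization_loss(codes: torch.Tensor) -> torch.Tensor:
    """Sum of squared distances between each hash code h(x_i) and sgn(h(x_i))."""
    return ((codes - torch.sign(codes)) ** 2).sum()
```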
7. The similar image retrieval method according to claim 2, wherein the step of updating the current feature extraction submodel and the hash calculation submodel based on a stochastic gradient descent algorithm and returning to the step of processing the sample image according to the current feature extraction submodel to determine the response feature vector of the sample image specifically comprises:
updating the current feature extraction submodel and the hash calculation submodel based on a stochastic gradient descent algorithm, and returning to the step of acquiring a plurality of sample images, wherein the acquired plurality of sample images is a mini-batch of sample images randomly drawn from a sample image set.
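Claim 7 replaces the full-set pass with randomly drawn mini-batches. A trivial sketch of the drawing step, assuming the sample image set is an indexable tensor; the batch size is an assumption.

```python
import torch

def draw_minibatch(image_set: torch.Tensor, batch_size: int = 64):
    """Randomly draw a mini-batch of sample images (and their indices)
    from the sample image set."""
    idx = torch.randperm(len(image_set))[:batch_size]
    return image_set[idx], idx
```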
8. A similar image retrieval apparatus, comprising:
an image acquiring unit, used for acquiring an image to be retrieved;
a hash code determining unit, used for processing the image to be retrieved according to a pre-trained unsupervised deep hash model and determining a hash code of the image to be retrieved; the unsupervised deep hash model is generated in advance through iterative optimization training based on a clustering algorithm and a deep hash algorithm; the deep hash algorithm uses pseudo labels determined by the clustering algorithm as its optimization target;
and a similar image determining unit, used for determining, according to the hash code of the image to be retrieved, a similar hash code satisfying a preset similarity relation with that hash code and determining an image corresponding to the similar hash code; the image corresponding to the similar hash code is a similar image of the image to be retrieved.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to carry out the steps of the similar image retrieval method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, causes the processor to carry out the steps of the similar image retrieval method as claimed in any one of claims 1 to 7.
CN202110241887.3A 2021-03-04 2021-03-04 Similar image retrieval method and device, computer equipment and storage medium Active CN112925940B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110241887.3A CN112925940B (en) 2021-03-04 2021-03-04 Similar image retrieval method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112925940A true CN112925940A (en) 2021-06-08
CN112925940B CN112925940B (en) 2022-07-01

Family

ID=76173363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110241887.3A Active CN112925940B (en) 2021-03-04 2021-03-04 Similar image retrieval method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112925940B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160110356A1 (en) * 2014-03-31 2016-04-21 Empire Technology Development Llc Hash table construction for utilization in recognition of target object in image
CN107423376A (en) * 2017-07-10 2017-12-01 上海交通大学 One kind has the quick picture retrieval method of supervision depth Hash and system
CN109918532A (en) * 2019-03-08 2019-06-21 苏州大学 Image search method, device, equipment and computer readable storage medium
CN110209867A (en) * 2019-06-05 2019-09-06 腾讯科技(深圳)有限公司 Training method, device, equipment and the storage medium of image encrypting algorithm
CN110516095A (en) * 2019-08-12 2019-11-29 山东师范大学 Weakly supervised depth Hash social activity image search method and system based on semanteme migration

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255573A (en) * 2021-06-17 2021-08-13 成都东方天呈智能科技有限公司 Pedestrian re-identification method based on mixed cluster center label learning and storage medium
CN113255573B (en) * 2021-06-17 2021-10-01 成都东方天呈智能科技有限公司 Pedestrian re-identification method based on mixed cluster center label learning and storage medium
CN114048344A (en) * 2021-11-25 2022-02-15 天翼数字生活科技有限公司 Similar face searching method, device, equipment and readable storage medium
CN115329118A (en) * 2022-10-14 2022-11-11 山东省凯麟环保设备股份有限公司 Image similarity retrieval method and system for garbage image
CN115329118B (en) * 2022-10-14 2023-02-28 山东省凯麟环保设备股份有限公司 Image similarity retrieval method and system for garbage image
CN115686868A (en) * 2022-12-28 2023-02-03 中南大学 Cross-node-oriented multi-mode retrieval method based on federated hash learning
CN115964527A (en) * 2023-01-05 2023-04-14 北京东方通网信科技有限公司 Label representation construction method for single label image retrieval
CN115964527B (en) * 2023-01-05 2023-09-26 北京东方通网信科技有限公司 Label characterization construction method for single-label image retrieval
CN116188805A (en) * 2023-04-26 2023-05-30 青岛尘元科技信息有限公司 Image content analysis method and device for massive images and image information network
CN116188805B (en) * 2023-04-26 2023-08-04 青岛尘元科技信息有限公司 Image content analysis method and device for massive images and image information network

Also Published As

Publication number Publication date
CN112925940B (en) 2022-07-01

Similar Documents

Publication Publication Date Title
CN112925940B (en) Similar image retrieval method and device, computer equipment and storage medium
CN110929206B (en) Click rate estimation method and device, computer readable storage medium and equipment
CN107145977B (en) Method for carrying out structured attribute inference on online social network user
CN109918532B (en) Image retrieval method, device, equipment and computer readable storage medium
CN106777318B (en) Matrix decomposition cross-modal Hash retrieval method based on collaborative training
CN107016708B (en) Image hash coding method based on deep learning
CN110929080B (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
CN108334574A (en) A kind of cross-module state search method decomposed based on Harmonious Matrix
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
CN109886330B (en) Text detection method and device, computer readable storage medium and computer equipment
CN110728295B (en) Semi-supervised landform classification model training and landform graph construction method
US11714921B2 (en) Image processing method with ash code on local feature vectors, image processing device and storage medium
CN110825968A (en) Information pushing method and device, storage medium and computer equipment
CN110659667A (en) Picture classification model training method and system and computer equipment
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN115618008A (en) Account state model construction method and device, computer equipment and storage medium
CN112749723A (en) Sample labeling method and device, computer equipment and storage medium
CN113345564B (en) Early prediction method and device for patient hospitalization duration based on graph neural network
CN113094533B (en) Image-text cross-modal retrieval method based on mixed granularity matching
CN116363372B (en) Weak supervision semantic segmentation method, device, equipment and storage medium
CN114638823B (en) Full-slice image classification method and device based on attention mechanism sequence model
CN115881211B (en) Protein sequence alignment method, protein sequence alignment device, computer equipment and storage medium
CN113254638B (en) Product image determining method, computer equipment and storage medium
CN110415006B (en) Advertisement click rate estimation method and device
CN110807130A (en) Method, apparatus and computer device for determining vector representation of group in network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant