CN109829065B - Image retrieval method, device, equipment and computer readable storage medium - Google Patents

Image retrieval method, device, equipment and computer readable storage medium

Publication number: CN109829065B (other version: CN109829065A)
Application number: CN201910174727.4A
Authority: CN (China)
Inventors: 张莉, 陆鋆, 王邦军, 周伟达
Assignee (original and current): Suzhou University
Application filed by Suzhou University
Legal status: Active
Prior art keywords: image, hash, double, convolution, mapping
Classification: Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract

The embodiment of the application discloses an image retrieval method, device, equipment and computer readable storage medium. First, a double-row convolutional hash mapping model is constructed by connecting in parallel two convolutional neural networks with different numbers of convolutional layers, where the first and second convolutional neural networks have the same number of pooling layers, the same pooling window size and the same stride. The model comprises a first fully connected layer formed by connecting the outputs of the first and second convolutional neural networks in parallel, and a second fully connected layer serving as the hash coding layer. The image to be retrieved is mapped into a hash code to be retrieved by the double-row convolutional hash mapping model; target images whose Hamming distance to the hash code to be retrieved meets a preset condition are then searched in a hash code library, serving as the retrieval result of the image to be retrieved from the image database. The hash code library is obtained by mapping each image in the image database through the double-row convolutional hash mapping model. The application improves the accuracy of image retrieval.

Description

Image retrieval method, device, equipment and computer readable storage medium
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to an image retrieval method, an image retrieval device, image retrieval equipment and a computer readable storage medium.
Background
In recent years, with the further popularization of the internet and the widespread application of big data technology, hundreds of millions of images are generated every day. The centralization and growing scale of image data resources make it increasingly difficult for the prior art to meet users' image retrieval needs. Therefore, how to effectively describe the feature information of an image, and what data structure to adopt for efficient indexing and fast similarity retrieval, have become research hotspots in this field.
Because binary codes are easy to compare and store, which greatly increases the speed of similarity retrieval and saves computing resources, images are generally mapped into binary codes when image retrieval is performed.
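As a minimal illustration of why binary codes compare quickly (a generic sketch, not taken from the patent): the Hamming distance between two codes stored as integers reduces to a single XOR followed by a popcount.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

# 0b1011 and 0b0010 differ in bits 0 and 3
print(hamming_distance(0b1011, 0b0010))  # -> 2
```

This constant-time comparison is what makes hash-code retrieval over millions of images practical.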
The advent of deep learning has driven the development of computer vision and has also provided a more efficient tool for learning hash mappings. In the related art, a neural network model is used to map images into hash codes: a deep learning model is trained on images, and the parameters of the model are constrained through a loss function to obtain better results.
However, the accuracy of image retrieval by such methods is not high; in particular, they cannot meet users' high accuracy requirements when retrieving visually similar images. In view of this, how to improve the accuracy of image retrieval is a problem to be solved by those skilled in the art.
Disclosure of Invention
The embodiment of the disclosure provides an image retrieval method, device, equipment and computer readable storage medium, which improve the accuracy of image retrieval.
In order to solve the technical problems, the embodiment of the application provides the following technical scheme:
in one aspect, an embodiment of the present application provides an image retrieval method, including:
mapping the image to be retrieved into a hash code to be retrieved by using a pre-constructed double-row convolutional hash mapping model;
searching, in a pre-constructed hash code library, for a target image whose Hamming distance to the hash code to be retrieved meets a preset condition, as the retrieval result of the image to be retrieved in an image database;
the double-row convolutional hash mapping model is formed by combining two convolutional neural networks with different numbers of convolutional layers, and comprises a first fully connected layer formed by connecting the outputs of the first convolutional neural network and the second convolutional neural network in parallel, and a second fully connected layer serving as the hash coding layer; the number of pooling layers, the pooling window size and the stride of the first and second convolutional neural networks are the same; the hash code library is obtained by mapping each image in the image database through the double-row convolutional hash mapping model.
Optionally, the training process of the double-row convolution hash mapping model includes:
taking an image pair in the image database as input;
if the label categories of the image pairs are similar, the distance between hash code pairs obtained by mapping the image pairs is taken as a loss value; if the label categories of the image pairs are dissimilar, taking the distance and the interval between hash code pairs obtained by mapping the image pairs as loss values; the label category is used for identifying the similarity of two images in the image pair;
and optimizing the loss value by adopting a machine learning optimization algorithm to train the double-row convolution hash mapping model.
Optionally, the loss value of the double-row convolutional hash mapping model is:

Loss = Σ_{i=1..n} Σ_{j=1..n} [ y_{i,j} · ||o_i − o_j||_2 + (1 − y_{i,j}) · max(m − ||o_i − o_j||_2, 0) + α · ( || |o_i| − 1 ||_1 + || |o_j| − 1 ||_1 ) ]

wherein Loss is the loss value; the i-th image and the j-th image form the image pair; n is the total number of images in the image database; o_i is the hash coding layer output for the i-th image and o_j for the j-th image; ||o_i − o_j||_2 is the distance between the hash code pair; m is the interval of hash code pairs obtained by mapping the image pair; α is a hyperparameter; y_{i,j} is the label category, where y_{i,j} = 1 indicates that the i-th image is similar to the j-th image and y_{i,j} = 0 indicates that they are dissimilar.
Optionally, optimizing the loss value using a machine learning optimization algorithm to train the double-row convolutional hash mapping model comprises optimizing the loss value using stochastic gradient descent to train the double-row convolutional hash mapping model.
Optionally, the generating process of the hash coding library is as follows:
inputting each image in the image database into the double-row convolution hash mapping model, and mapping the output of a hash coding layer of the double-row convolution hash mapping model into hash codes by setting a threshold value;
generating the hash code library according to the hash code of each image;
wherein the m-th bit code h_i^m of the i-th image of the image database can be obtained by the formula:

h_i^m = 1 if o_i^m ≥ θ, otherwise h_i^m = 0,

where o_i^m is the m-th bit of the output of the i-th image at the hash coding layer, and θ is the threshold.
Optionally, searching the pre-constructed hash code library for target images whose Hamming distance to the hash code to be retrieved meets the preset condition comprises:
searching the hash code library for the first T images with the smallest Hamming distance to the hash code to be retrieved;
sorting the T images from small to large according to their Hamming distance to the hash code to be retrieved;
and outputting the sorted T images.
Optionally, the first convolutional neural network is a VGG-16 network model with a convolutional layer of 14 layers, the second convolutional neural network is a VGG-16 network model with a convolutional layer of 5 layers, and pooling windows of the first convolutional neural network and the second convolutional neural network are 2×2 and the step size is 1.
Another aspect of the embodiment of the present application provides an image retrieval apparatus, including:
the hash code generation module, configured to map the image to be retrieved into a hash code to be retrieved by using a pre-constructed double-row convolutional hash mapping model; the double-row convolutional hash mapping model is formed by combining two convolutional neural networks with different numbers of convolutional layers, and comprises a first fully connected layer formed by connecting the outputs of the first convolutional neural network and the second convolutional neural network in parallel, and a second fully connected layer serving as the hash coding layer; the number of pooling layers, the pooling window size and the stride of the first and second convolutional neural networks are the same;
the image retrieval module, configured to search, in a pre-constructed hash code library, for a target image whose Hamming distance to the hash code to be retrieved meets the preset condition, as the retrieval result of the image to be retrieved in the image database; the hash code library is obtained by mapping each image in the image database through the double-row convolutional hash mapping model.
The embodiment of the application also provides an image retrieval device comprising a processor for implementing the steps of the image retrieval method according to any one of the preceding claims when executing a computer program stored in a memory.
Finally, an embodiment of the present application provides a computer-readable storage medium, on which an image retrieval program is stored, which when executed by a processor implements the steps of the image retrieval method according to any one of the preceding claims.
The technical scheme provided by the application has the following advantages: the first and second convolutional neural network models are connected in parallel and a hash coding layer is added to construct the double-row convolutional hash mapping model. The convolutional neural network with more convolutional layers identifies high-level semantic features, while the network with fewer convolutional layers distinguishes low-level features such as shapes and textures; superimposing and connecting the two models enhances the expressive power of the image features, improves the discriminative power of the binary codes, and strengthens the expressive power of the hash codes generated by image mapping, thereby improving the accuracy of large-scale image retrieval.
In addition, the embodiment of the application also provides a corresponding implementation device, equipment and a computer readable storage medium for the image retrieval method, so that the method is more practical, and the device, equipment and computer readable storage medium have corresponding advantages.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions of the related art, the drawings that are required to be used in the embodiments or the description of the related art will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.
Fig. 1 is a schematic flow chart of an image retrieval method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a convolutional neural network model according to an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram of another convolutional neural network model shown in accordance with an exemplary embodiment of the present disclosure;
FIG. 4 is a diagram of a training process for a double-row convolutional hash map model, according to an exemplary embodiment of the present disclosure;
FIG. 5 is a block diagram of an embodiment of an image retrieval device according to the present application;
fig. 6 is a block diagram of another embodiment of an image retrieval device according to an embodiment of the present application.
Detailed Description
In order to better understand the aspects of the present application, the present application will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims and drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may include other steps or elements not expressly listed.
Having described the technical solutions of embodiments of the present application, various non-limiting embodiments of the present application are described in detail below.
Referring first to fig. 1, fig. 1 is a flowchart of an image retrieval method according to an embodiment of the present application, where the embodiment of the present application may include the following:
s101: and mapping the image to be searched into hash codes to be searched by using a pre-constructed double-row convolution hash mapping model.
A deep convolutional neural network model can identify high-level semantic features, while a shallow neural network can distinguish low-level features such as shapes and textures; if the two models are superimposed and connected, the expressive power of the image features can be enhanced, thereby improving the discriminative power of the binary codes. In view of this, a double-row convolutional neural network model can be constructed by connecting in parallel a deep convolutional neural network model (the first convolutional neural network) and a shallow convolutional neural network model (the second convolutional neural network) with different numbers of convolutional layers. The parameter weights of the double-row convolutional structure are then learned on the image database, the model is trained using image pairs to obtain the hash mapping model, and images are mapped into hash codes, completing the construction of the double-row convolutional hash mapping model.
The two convolutional neural networks are combined in parallel to form the double-row convolutional hash mapping model, which comprises a first fully connected layer formed by connecting the outputs of the first and second convolutional neural networks in parallel, and a second fully connected layer serving as the hash coding layer. The hash coding layer can be generated by adding a fully connected layer before the last layer of the double-row convolutional neural network, with the number of nodes equal to the hash code length; this can follow any implementation process recorded in the related art and is not repeated here.
The first and second convolutional neural networks may adopt any convolutional neural network structure, for example the VGG-16 network model, which improves the final performance of the whole convolutional neural network by increasing network depth. Of course, other convolutional network structures are possible, none of which affects the implementation of the present application.
The first and second convolutional neural networks can differ in model depth by adding or deleting convolutional layers: the more convolutional layers, the deeper the model. The convolutional neural network model with more convolutional layers is used to identify high-level semantic features, while the network with fewer convolutional layers distinguishes low-level features such as shapes and textures. For example, as shown in fig. 3 and fig. 2, the first convolutional neural network may be the network structure generated by taking the first 14 layers of the VGG-16 network model, i.e., with 14 convolutional layers, and the second convolutional neural network may be the network structure generated by removing convolutional layers from the VGG-16 network model, i.e., a VGG-16 network model with 5 convolutional layers.
In order to ensure that the first convolutional neural network and the second convolutional neural network can be connected in parallel at the output layer, the number of pooling layers of the first convolutional neural network and the second convolutional neural network, the size of a pooling window and the step length are the same. For example, the pooling windows of the first convolutional neural network and the second convolutional neural network may be set to 2×2, and the step size may be set to 1.
Let do_i denote the output of an image on the first convolutional neural network; the dimensions of do_i may be expressed as (dw, dh, dd). Let so_i denote the output of the image on the second convolutional neural network; the dimensions of so_i may be expressed as (sw, sh, sd). For the same image, the outputs of the two convolutional neural networks are do_i and so_i respectively, and dw = sw, dh = sh; thus do_i and so_i can be connected together in the last dimension to form a vector of dimensions (dw, dh, dd + sd), denoted mo_i. The double-row convolutional hash mapping model contains two fully connected layers: for an input image, mo_i is the input of the first fully connected layer, and the output of the second fully connected layer (the hash coding layer) is o_i. o_i is used to generate the hash code, so the length k of the hash code equals the dimension of o_i, which equals the number of hash coding layer nodes.
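The parallel concatenation and the two fully connected layers can be expressed as the following shape-level sketch. This is illustrative only, not the patent's implementation: the intermediate layer width, the ReLU between the two fully connected layers, and the random weights are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed feature-map shapes after the two pooling pipelines (dw=sw, dh=sh).
do_i = rng.standard_normal((1, 1, 512))   # deep network output, (dw, dh, dd)
so_i = rng.standard_normal((1, 1, 512))   # shallow network output, (sw, sh, sd)

# Concatenate along the last dimension: (dw, dh, dd + sd).
mo_i = np.concatenate([do_i, so_i], axis=-1)
assert mo_i.shape == (1, 1, 1024)

k = 12                                 # hash code length = hash-layer node count
x = mo_i.reshape(-1)                   # flatten before the fully connected layers
W1 = rng.standard_normal((1024, 256))  # first fully connected layer (width assumed)
W2 = rng.standard_normal((256, k))     # second FC layer = hash coding layer
o_i = np.maximum(x @ W1, 0.0) @ W2     # ReLU between the layers (assumption)
print(o_i.shape)                       # -> (12,)
```

The only structural requirement from the description is that dw = sw and dh = sh, so the concatenation along the channel dimension is well defined.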
The training process of the double-row convolutional hash map model can be described as follows, please refer to fig. 4:
the image pairs in the image database may be input. A random image pair in the image database can be used as the input of a network structure, any two images in the image database randomly form an image pair, each image pair is provided with a label category, the label category is used for identifying the similarity of the two images forming the image pair, and if the two images are images of the same category, the label categories of the image pair are similar; if the two images are not images of the same class, the label classes of the image pairs are dissimilar. For example, images in an image database are represented as the set x= { X 1 ,x 2 ,…,x n The image pairs of n images in the image database, i.e., the ith image and the jth image, can be expressed as (x) i ,x j ) By the symbol y i,j To represent the label of the image, y i,j =1 indicates that the two images are similar, y i,j =0 indicates that the two images are dissimilar.
If the label category of the image pair is similar, the distance between the hash code pair obtained by mapping the image pair is taken as the loss value; if the label category is dissimilar, the distance and the interval between the hash code pair obtained by mapping the image pair are taken as the loss value. That is, the loss function of the double-row convolutional hash mapping model is determined by the class of the image pair and the binarization constraint. For an image database containing n images, the loss value of the double-row convolutional hash mapping model can be expressed as:

Loss = Σ_{i=1..n} Σ_{j=1..n} [ y_{i,j} · ||o_i − o_j||_2 + (1 − y_{i,j}) · max(m − ||o_i − o_j||_2, 0) + α · ( || |o_i| − 1 ||_1 + || |o_j| − 1 ||_1 ) ]

In the formula, Loss is the loss value; the i-th image and the j-th image form an image pair; n is the total number of images in the image database; o_i and o_j are the hash coding layer outputs of the i-th and j-th images; ||o_i − o_j||_2 is the distance between the hash code pair; m is the interval of hash code pairs obtained by mapping the image pair; α is a hyperparameter weighting the binarization constraint; y_{i,j} is the label category, with y_{i,j} = 1 indicating that the i-th and j-th images are similar and y_{i,j} = 0 indicating that they are dissimilar.
The network structure is then trained by optimizing the loss value with stochastic gradient descent to obtain the model weights, completing the training of the double-row convolutional hash mapping model. Of course, other optimization algorithms may be employed; the application is not limited in this regard.
The output of the double-row convolutional hash mapping model at the hash coding layer, o_i, is mapped into a hash code by setting a threshold.
The image to be retrieved is input into the double-row convolutional hash mapping model, and the hash coding layer output of the model is mapped, based on the threshold, into the hash code of the image to be retrieved, i.e., the hash code to be retrieved.
S102: search, in a pre-constructed hash code library, for target images whose Hamming distance to the hash code to be retrieved meets the preset condition, as the retrieval result of the image to be retrieved in the image database.
The image database is a database for retrieving images similar to or identical to the image to be retrieved, and the database contains a large number of images.
The hash code library corresponds to the image database: each hash code it contains corresponds uniquely to an image in the image database. Each image in the image database is input into the double-row convolutional hash mapping model, the output of the hash coding layer is mapped into the corresponding hash code by setting a threshold, and the hash code library is then generated from the hash code of each image. The m-th bit code h_i^m of the i-th image of the image database can be obtained by the formula:

h_i^m = 1 if o_i^m ≥ θ, otherwise h_i^m = 0,

where o_i^m is the m-th bit of the output of the i-th image at the hash coding layer, and θ is the threshold.
Images in the image database similar to the image to be retrieved, i.e., the target images, are determined according to the Hamming distances between the hash code of the image to be retrieved and the hash codes of all images in the image database. The preset condition can be set according to the Hamming distance values, the number of target images to be output, and the total number of images in the image database. For example, in a specific embodiment, the first T images with the smallest Hamming distance to the hash code to be retrieved are searched in the hash code library; the value of T can be determined according to the total number of images in the image database and the actual needs of the user, which does not affect the implementation of the method. For example, when T is 2 and the image database contains 10 images, the Hamming distances between the hash code to be retrieved and the hash codes of the images in the image database are calculated, and the images corresponding to the smallest and the second-smallest of the 10 Hamming distances are selected.
To facilitate outputting similar images, the T images may be sorted from small to large according to their Hamming distance to the hash code to be retrieved, and the sorted T images are then output. Of course, the T images may also be sorted from large to small, which does not affect the implementation of the present application.
In the technical scheme provided by the embodiment of the application, the first and second convolutional neural network models are connected in parallel and a hash coding layer is added to construct the double-row convolutional hash mapping model. The convolutional neural network model with more convolutional layers identifies high-level semantic features, while the network with fewer convolutional layers distinguishes low-level features such as shapes and textures; superimposing and connecting the two models enhances the expressive power of the image features, improves the discriminative power of the binary codes, and strengthens the expressive power of the hash codes generated by image mapping, thereby improving the accuracy of large-scale image retrieval.
In order to make the technical solution of the present application clearer to those skilled in the art, the present application also provides an illustrative example, using the CIFAR-10 dataset as the image database for testing. The CIFAR-10 dataset contains 60000 color images of size 32 × 32 × 3 in 10 categories. The example may include the following:
building a double-row convolution hash mapping model:
the images in the image database may be represented as x= { X 1 ,x 2 ,…,x n Where n is the number of categories of the image database. The first 50000 CIFAR-10 images are taken as training set, so n=50000 and the rest 10000 images form X test
A double-row convolutional hash mapping model is constructed, consisting of a deep convolutional neural network and a shallow convolutional neural network. The deep and shallow networks are connected in parallel as the input of the fully connected layer, completing the end-to-end construction of the double-row convolutional hash mapping model, denoted CNN_M.
The deep neural network is taken as the first 14 layers of the VGG-16 model; a deeper network can better learn high-level semantic features. The shallow neural network reduces the model depth by deleting some convolutional layers of VGG-16 and can be used to learn low-level features such as shapes and textures.
Let do_i represent the output of an input image x_i on the deep neural network; for CIFAR-10 the picture size is (32, 32, 3), so the dimension of do_i is (1, 1, 512).

Let so_i represent the output of x_i on the shallow neural network; for CIFAR-10 the picture size is (32, 32, 3), so the dimension of so_i is (1, 1, 512).

For an image x_i in dataset X, the outputs on the two neural networks are do_i and so_i respectively, with dw = sw = 1 and dh = sh = 1; thus do_i and so_i can be joined together in the last dimension to form mo_i, of dimension (1, 1, 1024).

CNN_M has 2 fully connected layers in total. For image x_i, mo_i is the input of the first fully connected layer, and the output of the second fully connected layer (the hash coding layer) is o_i. o_i is used to generate the hash code, so the length k of the hash code equals the dimension of o_i, which equals the number of hash coding layer nodes; here k = 12.
Training a double-row convolution hash mapping model:
for the image database X, the pair of images is formed by randomly forming two images, and the image pair formed by the ith image and the jth image is represented as (X) i ,x j ) By the symbol y i,j To represent the labels of image pairs, y i,j =1 indicates that the two images are similar, y i,j =0 indicates that the two images are dissimilar.
In random image pairs (x i ,x j ) As CNN M The output of the hash coding layer is (f) i ,f j ). The loss function is determined by 2 factors: (1) Class y i,j The method comprises the steps of carrying out a first treatment on the surface of the (2) binarizing the constraint. Thus, for database CIFAR-10 with a total of 50000, its total loss value is:
here, m=12.
On the image database X, the model weights w of CNN_M can be obtained by minimizing Loss through gradient descent; the trained CNN_M is denoted w-CNN_M. The learning rate of the gradient descent here is 0.00001.
Image x_i is taken as the input of the w-CNN_M model of the present application, yielding output o_i. A threshold θ = 0 is set, and o_i is mapped to a hash code denoted h_i. Let h_i^m represent the m-th bit encoding of the i-th image, where:

h_i^m = 1 if o_i^m ≥ 0, otherwise h_i^m = 0.
thus, the image database X is accessed via w-CNN M And mapping to obtain a hash code library, which is denoted as H.
And (3) image retrieval:
for the image x to be retrieved query The top T images that are most similar are retrieved from the image database X.
The image to be retrieved x_query is mapped into the hash code h_query by the w-CNN_M of the application.

The first T images with the smallest Hamming distance to h_query are found in the hash code library H, where T = 5000.

The 5000 images are reordered from small to large by their Hamming distance to h_query.
And returning the 5000 reordered images as a retrieval result.
From the above, the embodiment of the application enhances the expression capability of generating hash codes by image mapping, and improves the accuracy of large-scale image retrieval.
Furthermore, in order to prove that the technical scheme provided by the application can realize accurate retrieval of images of the same category, the retrieval effect of an image retrieval algorithm on a retrieval image can be measured by taking the retrieval accuracy Precison as an evaluation standard. The calculation method of Precison comprises the following steps:
rel(i) indicates whether the image to be retrieved and the i-th image in the image database are similar: its value is 1 if they are similar and 0 otherwise. For the retrieval image set (test set) X_test, the mean retrieval precision (MRP) over all retrieved images is used to measure the retrieval performance of the different methods.
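Since the Precision formula image is not reproduced above, the following is a minimal sketch of the metric as described: the fraction of returned images sharing the query's label, averaged over all test queries.

```python
import numpy as np

def precision(retrieved_labels, query_label):
    # rel(i) is 1 when retrieved image i shares the query image's label.
    rel = [int(lab == query_label) for lab in retrieved_labels]
    return sum(rel) / len(rel)

def mean_retrieval_precision(per_query_precisions):
    # MRP: average precision over every image in the test set.
    return float(np.mean(per_query_precisions))
```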
Through experiments, the MRP of the double-row convolution hash mapping model is 83.52%, that of the shallow convolution hash mapping model is 76.54%, and that of the deep convolution hash mapping model is 81.78%. Thus, by connecting the deep and the shallow convolutional neural network models in parallel, the double-row convolutional neural network model enhances the expressive capacity of the features on the image and improves the discrimination of the binary encoding.
The embodiment of the application also provides a corresponding implementation device for the image retrieval method, so that the method has higher practicability. The image retrieval device provided by the embodiment of the present application will be described below, and the image retrieval device described below and the image retrieval method described above may be referred to correspondingly.
Referring to fig. 5, fig. 5 is a block diagram of an image retrieval device according to an embodiment of the present application, where the device may include:
The hash code generation module 501 is configured to map an image to be retrieved into a hash code to be retrieved by using a pre-constructed double-row convolution hash mapping model. The double-row convolution hash mapping model is formed by combining two convolutional neural networks with different numbers of convolution layers; it comprises a first fully connected layer formed by connecting the outputs of the first convolutional neural network and the second convolutional neural network in parallel, and a second fully connected layer serving as the hash coding layer. The number of pooling layers, the pooling window size and the pooling window step size of the first convolutional neural network and the second convolutional neural network are the same.
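The dual-stream combination can be sketched as a minimal numpy forward pass; the branch functions and all layer sizes below are illustrative stand-ins, not the VGG-based networks or dimensions of the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def dual_stream_forward(x, deep_branch, shallow_branch, W_fc, W_hash):
    # The two branches stand in for the deep and shallow CNNs; their
    # outputs are connected in parallel (concatenated) to form the input
    # of the first fully connected layer, and a second fully connected
    # layer serves as the hash coding layer.
    f = np.concatenate([deep_branch(x), shallow_branch(x)])
    fc = np.tanh(W_fc @ f)        # first fully connected layer
    return np.tanh(W_hash @ fc)   # hash coding layer output

# Stand-in branches and hypothetical sizes: 64- and 32-dim features,
# a 128-unit fully connected layer, and a 12-bit hash layer.
deep = lambda x: x[:64]
shallow = lambda x: x[:32]
W_fc = rng.standard_normal((128, 96)) * 0.1
W_hash = rng.standard_normal((12, 128)) * 0.1
o = dual_stream_forward(rng.standard_normal(64), deep, shallow, W_fc, W_hash)
```

Because the hash layer uses tanh, each output component lies in (-1, 1) and can then be thresholded into a binary code.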
The image retrieval module 502 is configured to search a hash code library constructed in advance for a target image whose hamming distance difference value with the hash code to be retrieved meets a preset condition, so as to serve as a retrieval result of the image to be retrieved in the image database; the hash coding library is obtained by mapping each image in the image database through a double-row convolution hash mapping model.
Optionally, in some implementations of the present embodiment, referring to fig. 6, the apparatus may further include a model training module 503, for example, where the model training module 503 is configured to take an image pair in the image database as an input; if the label types of the image pairs are similar, the distance between hash code pairs obtained by mapping the image pairs is taken as a loss value; if the label categories of the image pairs are dissimilar, the distance and the interval between hash code pairs obtained by mapping the image pairs are taken as loss values; the label category is used for identifying the similarity of the two images in the pair; and optimizing the loss value by adopting a machine learning optimization algorithm to train a double-row convolution hash mapping model.
In another embodiment, the model training module 503 may further use the following formula as the loss value of the double-row convolutional hash map model:
where Loss is the loss value, the i-th image and the j-th image form the image pair, n is the total number of images in the image database, o_i is the hash encoding of the i-th image, o_j is the hash encoding of the j-th image, ||o_i - o_j||_2 is the distance between the hash code pair, m is the interval between hash code pairs obtained by mapping the image pairs, α is a hyper-parameter, y_{i,j} is the label category, y_{i,j} = 1 indicates that the i-th image is similar to the j-th image, and y_{i,j} = 0 indicates that the i-th image is dissimilar to the j-th image.
In other embodiments, the model training module 503 may optimize the loss value using stochastic gradient descent as the machine learning optimization algorithm to train the double-row convolutional hash mapping model.
Optionally, the image retrieval module 502 may further include a hash code library generating sub-module, where the hash code library generating sub-module is configured to input each image in the image database into a double-row convolution hash mapping model, and map an output of a hash code layer of the double-row convolution hash mapping model into a hash code by setting a threshold; generating a hash code library according to the hash codes of each image;
wherein the m-th bit code h_i^m of the i-th image of the image database may be given by the formula: h_i^m = 1 if o_i^m > θ, and h_i^m = 0 otherwise, where o_i^m is the output of the m-th bit of the i-th image at the hash coding layer, and θ is a threshold.
In some other embodiments, the image retrieval module 502 is further configured to search the hash code library for the top T images with the smallest Hamming distance value to the hash code to be retrieved; sort the T images from small to large according to their Hamming distance to the hash code to be retrieved; and output the sorted T images.
The functions of each functional module of the image retrieval device according to the embodiment of the present application may be specifically implemented according to the method in the embodiment of the method, and the specific implementation process may refer to the related description of the embodiment of the method, which is not repeated herein.
From the above, the embodiment of the application enhances the expression capability of generating hash codes by image mapping, and improves the accuracy of large-scale image retrieval.
The embodiment of the application also provides an image retrieval device, which specifically comprises:
a memory for storing a computer program;
a processor for executing a computer program to implement the steps of the image retrieval method as described in any of the embodiments above.
The functions of each functional module of the image retrieval device according to the embodiment of the present application may be specifically implemented according to the method in the embodiment of the method, and the specific implementation process may refer to the related description of the embodiment of the method, which is not repeated herein.
From the above, the embodiment of the application enhances the expression capability of generating hash codes by image mapping, and improves the accuracy of large-scale image retrieval.
The embodiment of the application also provides a computer readable storage medium storing an image retrieval program which, when executed by a processor, performs the steps of the image retrieval method according to any one of the embodiments above.
The functions of each functional module of the computer readable storage medium according to the embodiments of the present application may be specifically implemented according to the method in the embodiments of the method, and the specific implementation process may refer to the relevant description of the embodiments of the method, which is not repeated herein.
From the above, the embodiment of the application enhances the expression capability of generating hash codes by image mapping, and improves the accuracy of large-scale image retrieval.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The image retrieval method, apparatus, device and computer readable storage medium provided by the present application are described in detail above. The principles and embodiments of the present application have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present application and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the application can be made without departing from the principles of the application and these modifications and adaptations are intended to be within the scope of the application as defined in the following claims.

Claims (9)

1. An image retrieval method, comprising:
mapping the image to be searched into hash codes to be searched by utilizing a pre-constructed double-row convolution hash mapping model;
searching a target image which meets the preset condition with the Hamming distance difference value of the hash code to be searched in a pre-constructed hash code library to be used as a search result of the image to be searched in an image database;
the double-row convolution hash mapping model is formed by combining two convolutional neural networks with different convolution layers, and comprises a first fully connected layer formed by connecting the outputs of the first convolutional neural network and the second convolutional neural network in parallel, and a second fully connected layer serving as a hash coding layer; the number of pooling layers, the size of the pooling window and the step size of the pooling window of the first convolutional neural network and the second convolutional neural network are the same; the hash coding library is obtained by mapping each image in the image database through the double-row convolution hash mapping model;
the hash code library is generated by the following steps:
inputting each image in the image database into the double-row convolution hash mapping model, and mapping the output of a hash coding layer of the double-row convolution hash mapping model into hash codes by setting a threshold value;
generating the hash code library according to the hash code of each image;
wherein the Q-th bit code h_i^Q of the i-th image of the image database is given by the formula: h_i^Q = 1 if o_i^Q > θ, and h_i^Q = 0 otherwise,
where o_i^Q is the output of the Q-th bit of the i-th image at the hash coding layer, and θ is the threshold.
2. The image retrieval method of claim 1, wherein the training process of the double-row convolutional hash map model comprises:
taking an image pair in the image database as input;
if the label categories of the image pairs are similar, the distance between hash code pairs obtained by mapping the image pairs is taken as a loss value; if the label categories of the image pairs are dissimilar, taking the distance and the interval between hash code pairs obtained by mapping the image pairs as loss values; the label category is used for identifying the similarity of two images in the image pair;
and optimizing the loss value by adopting a machine learning optimization algorithm to train the double-row convolution hash mapping model.
3. The image retrieval method according to claim 2, wherein the loss value of the double-row convolution hash map model is:
wherein Loss is the loss value, the i-th image and the j-th image form the image pair, n is the total number of images in the image database, o_i is the hash encoding of the i-th image, o_j is the hash encoding of the j-th image, ||o_i - o_j||_2 is the Euclidean distance between the hash encoding of the i-th image and the hash encoding of the j-th image, m is a preset interval parameter, ||·||_2 denotes the 2-norm, ||·||_1 denotes the 1-norm, α is a hyper-parameter, y_{i,j} is the label category, y_{i,j} = 1 indicates that the i-th image is similar to the j-th image, and y_{i,j} = 0 indicates that the i-th image is dissimilar to the j-th image.
4. The image retrieval method of claim 2, wherein optimizing the loss value using a machine learning optimization algorithm to train the double-row convolutional hash mapping model is optimizing the loss value using stochastic gradient descent to train the double-row convolutional hash mapping model.
5. The image retrieval method according to any one of claims 1 to 4, wherein searching a pre-constructed hash code library for a target image whose Hamming distance difference value with the hash code to be retrieved meets a preset condition comprises:
searching the hash code library for the top T images with the smallest Hamming distance value to the hash code to be retrieved;
sorting the T images from small to large according to the difference value of their Hamming distance to the hash code to be retrieved;
and outputting the ordered T images.
6. The image retrieval method according to claim 1, wherein the first convolutional neural network is a VGG-16 network model with 14 layers of convolutional layers, the second convolutional neural network is a VGG-16 network model with 5 layers of convolutional layers, and the pooling windows of the first convolutional neural network and the second convolutional neural network are 2 x 2 and the step size is 1.
7. An image retrieval apparatus, comprising:
the hash code generation module is used for mapping the image to be retrieved into a hash code to be retrieved by using a pre-constructed double-row convolution hash mapping model; the double-row convolution hash mapping model is formed by combining two convolutional neural networks with different convolution layers, and comprises a first fully connected layer formed by connecting the outputs of the first convolutional neural network and the second convolutional neural network in parallel, and a second fully connected layer serving as a hash coding layer; the number of pooling layers, the size of the pooling window and the step size of the pooling window of the first convolutional neural network and the second convolutional neural network are the same;
the image retrieval module is used for searching a pre-constructed hash code library for a target image whose Hamming distance difference value with the hash code to be retrieved meets a preset condition, to serve as a retrieval result of the image to be retrieved in an image database; the hash code library is obtained by mapping each image in the image database through the double-row convolution hash mapping model;
the hash code library is generated by the following steps:
inputting each image in the image database into the double-row convolution hash mapping model, and mapping the output of a hash coding layer of the double-row convolution hash mapping model into hash codes by setting a threshold value;
generating the hash code library according to the hash code of each image;
wherein the Q-th bit code h_i^Q of the i-th image of the image database is given by the formula: h_i^Q = 1 if o_i^Q > θ, and h_i^Q = 0 otherwise,
where o_i^Q is the output of the Q-th bit of the i-th image at the hash coding layer, and θ is the threshold.
8. An image retrieval apparatus comprising a processor for implementing the steps of the image retrieval method according to any one of claims 1 to 6 when executing a computer program stored in a memory.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon an image retrieval program which, when executed by a processor, implements the steps of the image retrieval method according to any one of claims 1 to 6.
CN201910174727.4A 2019-03-08 2019-03-08 Image retrieval method, device, equipment and computer readable storage medium Active CN109829065B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910174727.4A CN109829065B (en) 2019-03-08 2019-03-08 Image retrieval method, device, equipment and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN109829065A CN109829065A (en) 2019-05-31
CN109829065B true CN109829065B (en) 2023-08-18

Family

ID=66865643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910174727.4A Active CN109829065B (en) 2019-03-08 2019-03-08 Image retrieval method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109829065B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111880558B (en) * 2020-07-06 2021-05-11 广东技术师范大学 Plant protection unmanned aerial vehicle obstacle avoidance spraying method and device, computer equipment and storage medium
CN111813975A (en) * 2020-07-09 2020-10-23 国网电子商务有限公司 Image retrieval method and device and electronic equipment
CN111988614B (en) * 2020-08-14 2022-09-13 深圳前海微众银行股份有限公司 Hash coding optimization method and device and readable storage medium
CN111815631B (en) * 2020-09-02 2020-12-11 北京易真学思教育科技有限公司 Model generation method, device, equipment and readable storage medium
CN112632314A (en) * 2020-12-25 2021-04-09 苏州浪潮智能科技有限公司 Image retrieval method, system, device and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512289B (en) * 2015-12-07 2018-08-14 郑州金惠计算机系统工程有限公司 Image search method based on deep learning and Hash
CN108427738B (en) * 2018-03-01 2022-03-25 中山大学 Rapid image retrieval method based on deep learning
CN109165306B (en) * 2018-08-09 2021-11-23 长沙理工大学 Image retrieval method based on multitask Hash learning

Also Published As

Publication number Publication date
CN109829065A (en) 2019-05-31


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant