CN113220916A

CN113220916A - Image retrieval method and device

Info

Publication number: CN113220916A
Application number: CN202110605065.9A
Authority: CN
Inventors: 刘伟煜; 刘义; 王磊; 徐竹胜; 王雪; 夏甜; 许佑连
Original assignee: Postal Savings Bank of China Ltd
Current assignee: Postal Savings Bank of China Ltd
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2021-08-06

Abstract

The invention discloses an image retrieval method and device. Wherein, the method comprises the following steps: acquiring a first feature vector of a target image; determining similarity scores of the first feature vector and second feature vectors of a plurality of reference images in a feature index library, wherein the feature index library records the reference images and the feature vectors of the reference images; determining a part of reference images with similarity scores larger than a preset value in the plurality of reference images; and determining part of the reference image as a similar image of the target image. The invention solves the technical problem of lower reliability of the image retrieval mode used in the related technology.

Description

Image retrieval method and device

Technical Field

The invention relates to the technical field of image retrieval, in particular to an image retrieval method and device.

Background

In the related art, the depth features are extracted based on a depth convolution network, the deep features of a volume base layer are used, but the extracted features of the volume layer are local features which are not enough to represent the global feature information of the image when used for classification and retrieval tasks; in addition, in the image retrieval technology in the related technology, a linear dimension reduction mode is adopted for reducing dimensions of the features, but along with the increase of data volume, particularly for the nonlinear relation between high-dimensional data such as images and texts, the linear dimension reduction mode is not suitable, and the high-dimensional features cause the problems of complex calculation and slow retrieval speed; in addition, a cosine similarity measurement method is applied to calculate the similarity relation between the image feature vectors, and a cosine similarity measurement algorithm distinguishes differences from directions and is insensitive to absolute numerical values, so that errors can be caused.

In order to solve the above technical problems, some proposals have been proposed, for example, a vector search plug-in is designed based on the elastic search, then the vector search plug-in is used to expand the image field type and the semantic field type in the elastic search, the image feature vectors of a plurality of images are extracted according to the image network model trained by the neural network model and stored in the image field, the semantic feature vectors of the text data are extracted according to the semantic network model trained by the neural network model and stored in the semantic field, the image field, the semantic field and the original search field provided by the elastic search together form a search database structure to create a search database, when a user searches, different search conditions can be set, and the vector search plug-in searches the search database in at least one of the boolean search mode, the image search mode and the semantic search mode, the method realizes the combination of the Boolean retrieval mode, the image retrieval mode and the semantic retrieval mode to support the mixed full stack retrieval of mass data. However, part of the existing retrieval technology of the scheme is to adopt non-dimensionality reduction on the features or to use a traditional linear dimensionality reduction method to carry out dimensionality reduction; however, as the amount of data increases, a complex nonlinear relationship exists particularly for high-dimensional data such as images and texts.

In view of the problem of low reliability of the image retrieval method used in the related art, no effective solution has been proposed at present.

Disclosure of Invention

The embodiment of the invention provides an image retrieval method and an image retrieval device, which at least solve the technical problem of low reliability of an image retrieval mode used in the related technology.

According to an aspect of the embodiments of the present invention, there is provided an image retrieval method, including: acquiring a first feature vector of a target image; determining similarity scores of the first feature vector and second feature vectors of a plurality of reference images in a feature index library, wherein the reference images and the feature vectors of the reference images are recorded in the feature index library; determining a part of the reference images with similarity scores larger than a preset value; determining the partial reference image as a similar image of the target image.

Optionally, obtaining a first feature vector of the target image includes: acquiring a Uniform Resource Locator (URL) of the target image through a distributed file system; extracting a first feature value of the target image based on the URL; and obtaining the first feature vector based on the first feature value.

Optionally, before obtaining the first feature vector of the target image, the image retrieval method further includes: preprocessing the target image; wherein the target image is preprocessed, including; and modifying the image size of the target image into the input size of a depth convolution network VGG, and carrying out averaging operation on the target image with the modified size.

Optionally, after obtaining the first feature vector of the target image, the image retrieval method further includes: carrying out nonlinear dimensionality reduction on the first feature vector; and carrying out normalization processing on the first feature vector after the dimension reduction processing.

Optionally, before determining similarity scores between the first feature vector and second feature vectors of multiple reference images in the feature index library, the image retrieval method further includes: generating the feature index library; wherein generating the feature index library comprises: acquiring a training image; preprocessing the training image, wherein the preprocessing comprises: modifying the image size of the target image into the input size of a depth convolution network VGG, and carrying out averaging operation on the target image with the modified size; extracting image features of the training image by using a deep convolutional network (VGG); and storing the training images and the image characteristics of the training images into a distributed search and analysis engine ElasticSearch to obtain the characteristic index library.

Optionally, storing the training image and the image features of the training image in an elastic search engine, including: uploading the training image to a distributed file system to obtain a Uniform Resource Locator (URL) of the training image through the distributed file system; converting the image characteristics of the training image into a binary array and encoding the binary array into a character string; storing the URL and the character string into the ElasticSearch.

Optionally, determining similarity scores between the first feature vector and second feature vectors of multiple reference images in a feature index library includes: determining similarity scores of the first feature vector and second feature vectors of the multiple reference images by a mean cosine similarity measurement scoring method, wherein the mean cosine similarity measurement scoring method is represented by a predetermined formula, and the predetermined formula is as follows:

score represents the similarity score, x_iRepresenting the i-th element, y, in the first feature vector_iTo representThe ith element in the second feature vector, z represents the average of all elements of the first and second feature vectors.

Optionally, after determining the partial reference image as a similar image of the target image, the image retrieval method further includes: displaying the partial reference image.

According to another aspect of the embodiments of the present invention, there is provided an image retrieval apparatus, including: the acquisition module is used for acquiring a first feature vector of a target image; a first determining module, configured to determine similarity scores between the first feature vector and second feature vectors of multiple reference images in a feature index library, where the feature index library records the reference images and the feature vectors of the reference images; the second determination module is used for determining a part of the reference images with similarity scores larger than a preset value; a third determining module, configured to determine the partial reference image as a similar image to the target image.

Optionally, the obtaining module includes: the first acquisition unit is used for acquiring a Uniform Resource Locator (URL) of the target image through a distributed file system; a first extraction unit configured to extract a first feature value of the target image based on the URL; and the second acquisition unit is used for obtaining the first feature vector based on the first feature value.

Optionally, the image retrieval apparatus further includes: the first preprocessing module is used for preprocessing the target image before acquiring a first feature vector of the target image; wherein the preprocessing module comprises; and the processing unit is used for modifying the image size of the target image into the input size of the deep convolutional network VGG and carrying out averaging operation on the target image after size modification.

Optionally, the image retrieval apparatus further includes: the dimension reduction module is used for carrying out nonlinear dimension reduction processing on a first feature vector of a target image after the first feature vector is obtained; and the normalization processing module is used for performing normalization processing on the first feature vector after the dimension reduction processing.

Optionally, the image retrieval apparatus further includes: the generating module is used for generating the feature index library before determining the similarity scores of the first feature vector and second feature vectors of a plurality of reference images in the feature index library; wherein the generating module comprises: a third acquisition unit configured to acquire a training image; a second preprocessing unit, configured to preprocess the training image, where the preprocessing includes: modifying the image size of the target image into the input size of a depth convolution network VGG, and carrying out averaging operation on the target image with the modified size; the second extraction unit is used for extracting the image characteristics of the training image by using a deep convolutional network VGG; and the storage unit is used for storing the training images and the image characteristics of the training images into a distributed search and analysis engine ElasticSearch to obtain the characteristic index library.

Optionally, the storage unit includes: the acquisition subunit is used for uploading the training image to a distributed file system so as to obtain a Uniform Resource Locator (URL) of the training image through the distributed file system; the conversion subunit is used for converting the image characteristics of the training image into a binary array and encoding the binary array into a character string; a storage subunit, configured to store the URL and the character string in the ElasticSearch.

Optionally, the first determining module includes: a determining unit, configured to determine similarity scores between the first feature vector and second feature vectors of the multiple reference images by a mean cosine similarity metric scoring method, where the mean cosine similarity metric scoring method is expressed by a predetermined formula, and the predetermined formula is:

score represents the similarity score, x_iRepresenting the i-th element, y, in the first feature vector_iRepresents the ith element in the second feature vector, z represents all elements of the first and second feature vectorsAverage value of (a).

Optionally, the image retrieval apparatus further includes: a display module for displaying the partial reference image after determining the partial reference image as a similar image to the target image.

According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, which includes a stored computer program, wherein when the computer program is executed by a processor, the computer-readable storage medium is controlled by a device to execute any one of the above image retrieval methods.

According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a computer program, where the computer program executes to execute the image retrieval method according to any one of the above.

In the embodiment of the invention, a first feature vector of a target image is obtained; determining similarity scores of the first feature vector and second feature vectors of a plurality of reference images in a feature index library, wherein the feature index library records the reference images and the feature vectors of the reference images; determining a part of reference images with similarity scores larger than a preset value in the plurality of reference images; and determining part of the reference image as a similar image of the target image. By the image retrieval method provided by the embodiment of the invention, the purpose of taking the part of the reference image with the similarity score larger than a preset value as the similar image of the image to be retrieved is achieved by calculating the similarity scores of the feature vectors of the image to be retrieved and the feature vectors of a plurality of groups of reference images in the feature index library, the technical effect of the reliability of the image retrieval is achieved, and the technical problem of low reliability of the image retrieval mode used in the related technology is solved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

fig. 1 is a flowchart of a retrieval method of an image according to an embodiment of the present invention;

FIG. 2(a) is a first diagram illustrating various dimension reduction results according to an embodiment of the present invention;

FIG. 2(b) is a second diagram of various dimension reduction results according to an embodiment of the invention;

FIG. 3 is a schematic diagram of a method of retrieving an image according to an embodiment of the present invention;

fig. 4 is a schematic diagram of an image retrieval apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

For convenience of description, some nouns or terms appearing in the embodiments of the present invention are explained below.

Deep learning: the essence of deep learning is to learn more useful features by constructing a machine learning model with many hidden layers and massive training data, thereby finally improving the accuracy of classification or prediction.

VGG19 network model: the deep convolutional neural network and the VGG mainly explore the relationship between the depth and the performance of the convolutional neural network, and can greatly reduce the error rate. Meanwhile, the generalization capability of the VGG is good, and the VGG has good performance on different picture data sets. VGG is so far often used as a classical model to extract features of an image.

And (3) reducing the dimensionality: the method overcomes the dimension disaster, obtains essential characteristics, saves storage space, removes useless noise and realizes data visualization.

UMAP (unified modified application and project, UMAP for short): an innovative nonlinear dimension-reduction manifold learning algorithm is based on the principle that the manifold and projection technology are utilized to achieve the purpose of dimension reduction. First the distances between points in the high dimensional space are calculated, they are projected into the low dimensional space, and the distances between points in the low dimensional space are calculated. It then uses a random gradient descent to minimize the difference between these distances.

Elastic search: the Lucene-based open-source distributed search and analysis engine has the advantages of high search speed, distributed processing, support of horizontal expansion, support of multiple development languages and convenience in integrated deployment.

FastDFS: an open source lightweight distributed file system that manages files, the functions comprising: file storage, file synchronization, file access (file upload, file download), etc., solving the problems of large-capacity storage and load balancing. The method is particularly suitable for online services taking files as carriers, such as photo album websites, video websites and the like.

Uniform Resource Locator (URL): is a representation method for specifying the location of information on a web service program of the internet.

The development of deep learning has achieved widespread use for many computer vision tasks. The invention uses a VGG19 network framework based on deep learning to extract the depth features, and the VGG19 network mainly proves that the depth of the network is increased, the final performance of the network can be influenced to a certain extent, and the feature information of the image can be better represented by using the depth features. The extracted depth feature vector needs to be subjected to dimensionality reduction, and dimensionality reduction operation has the advantages of removing redundant features, reducing required storage space and accelerating calculation speed. And a non-linear dimension reduction mode UMAP is adopted to carry out feature dimension reduction, so that UMAP successfully reflects most of large-scale global structures, and local fine structures are also reserved. It has short run time, is less constrained by sample size and still performs well in the tens of thousands of dimensions. The feature vector after dimension reduction can still well reflect the information of the picture, and redundant features are removed, so that the required storage space is reduced, and the calculation speed of retrieval is accelerated.

In the image retrieval method provided by the embodiment of the invention, the data set image is uploaded to the FastDFS to obtain the access path URL of the image, the URL of the image and the feature vector after dimension reduction are stored in the Elasticissearch as a retrieval library, a cosine similarity measurement method is used for measuring the similarity between the feature vector of the image to be retrieved and the feature vector in the Elasticissearch retrieval library, and a plurality of images with the similarity ranking at the front end and the similarity scoring condition are returned.

Therefore, the image retrieval method provided by the embodiment of the invention has the advantages that the image retrieval method utilizes a massive image search system formed by combining the reduced-dimension depth feature vector with the elastic search and the FastDFS, so that the high accuracy and the real-time performance of image search are promoted, and the image retrieval method still has good performance in high concurrency.

The following describes an image retrieval method and apparatus according to an embodiment of the present invention with reference to specific embodiments.

Example 1

In accordance with an embodiment of the present invention, there is provided a method embodiment of a method for retrieving an image, it being noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.

Fig. 1 is a flowchart of an image retrieval method according to an embodiment of the present invention, as shown in fig. 1, the image retrieval method including the steps of:

step S102, a first feature vector of the target image is obtained.

Optionally, the target image may be an image that needs to be detected.

As an alternative embodiment, obtaining the first feature vector of the target image may include: acquiring a Uniform Resource Locator (URL) of a target image through a distributed file system; extracting a first feature value of the target image based on the URL; a first feature vector is obtained based on the first feature value.

Optionally, the distributed file system may be an open-source lightweight distributed file system FastDFS.

In this embodiment, when a user uploads an image, the image is uploaded to FastDFS, and a URL that can access the image is obtained in the FastDFS system, so that a first feature value of the image can be extracted based on the URL of the image, and a first feature vector of the image can be obtained based on the first feature value.

Step S104, determining similarity scores of the first feature vector and second feature vectors of a plurality of reference images in a feature index library, wherein the feature index library records the reference images and the feature vectors of the reference images.

Step S106, determining the partial reference images with the similarity scores larger than the preset value in the multiple reference images.

In step S108, a part of the reference image is determined as a similar image of the target image.

As can be seen from the above, in the embodiment of the present invention, after the first feature vector of the target image is obtained, the similarity scores of the first feature vector and the second feature vectors of the multiple reference images in the feature index library are determined, then the partial reference images with the similarity scores larger than the predetermined value in the multiple reference images are determined, and the partial reference images are determined as the similar images of the target image, so that the purpose of calculating the similarity scores of the feature vectors of the image to be retrieved and the feature vectors of the multiple groups of reference images in the feature index library and using the partial reference images with the similarity scores larger than the predetermined value as the similar images of the image to be retrieved is achieved, and the technical effect of reliability of image retrieval is achieved.

Therefore, the image retrieval method provided by the embodiment of the invention solves the technical problem of low reliability of the image retrieval mode used in the related technology.

As an alternative embodiment, before obtaining the first feature vector of the target image, the image retrieval method may further include: preprocessing a target image; preprocessing a target image, wherein the preprocessing comprises the following steps; and modifying the image size of the target image into the input size of the deep convolutional network VGG, and carrying out averaging operation on the target image with the modified size.

In this embodiment, after the user uploads the target image, the target image is preprocessed, for example, the image size of the target image may be modified to the input size of the deep neural network VGG, and then the averaging operation is performed on the target image.

It should be noted that, in the embodiment of the present invention, the image retrieval method may support inputting a picture of any size.

As an alternative embodiment, after obtaining the first feature vector of the target image, the image retrieval method may further include: carrying out nonlinear dimensionality reduction on the first feature vector; and carrying out normalization processing on the first feature vector after the dimension reduction processing.

In this embodiment, after obtaining the first feature vector, the UMAP nonlinear dimension reduction algorithm may be invoked to output a 128-dimensional feature vector F₁₂₈. Due to the fact that the UMAP nonlinear dimension reduction algorithm is high in dimension reduction response speed and small in sample size constraint, the dimension reduction speed of the first feature vector is improved, and therefore the image retrieval efficiency is improved.

Fig. 2(a) is a schematic diagram of a plurality of dimension reduction results according to an embodiment of the present invention, and fig. 2(b) is a schematic diagram of a plurality of dimension reduction results according to an embodiment of the present invention, and it can be seen from fig. 2(a) and fig. 2(b) that the dimension reduction speed can be increased by using the dimension reduction method of UMAP.

After the nonlinear dimensionality reduction processing is performed on the first feature vector in the above manner, normalization operation may be performed on the first feature vector after the dimensionality reduction processing.

This can be achieved, for example, by:

wherein the content of the first and second substances,

denotes the normalized first feature vector, max: maximum value of sample data, min: is the minimum value of the sample data.

As an alternative embodiment, before determining similarity scores between the first feature vector and second feature vectors of multiple reference images in the feature index library, the image retrieval method further includes: generating a characteristic index library; wherein, generating the feature index library comprises: acquiring a training image; preprocessing the training image, wherein the preprocessing comprises: modifying the image size of the target image into the input size of a depth convolution network VGG, and carrying out averaging operation on the target image with the modified size; extracting image features of the training image by using a deep convolutional network (VGG); and storing the training images and the image characteristics of the training images into a distributed search and analysis engine ElasticSearch to obtain a characteristic index library.

In this embodiment, the image dataset pictures may be collected first, and specifically, training dataset pictures for image retrieval of various categories may be downloaded; then changing the input size of the deep neural network VGG network into the input size of the deep neural network, and then carrying out averaging operation on the input size; training a deep convolution network VGG19 network by using the image after the equalization processing, extracting image features in batches, obtaining feature vectors of the image based on the extracted image features, inputting the feature vectors F, calling a UMAP nonlinear dimension reduction algorithm, and outputting the feature vectors F with 128 dimensions₁₂₈U of pictureAnd storing the RL and the feature vector into an ElasticSearch to obtain a feature index library.

The training of the deep convolution network VGG19 network by using the image after the averaging processing can be realized by the following steps: 1) after the picture is input, the operation of using a convolution kernel of 3 × 3 and having a step size of 2 pixels is performed on the first convolution layer. The size of the picture is 224 × 224 × 064; 2) after the first pooling layer, the sizes of the used max-posing convolution kernels are 2 × 12, the step length is 2 pixels max-posing, the size of the picture after pooling is changed into 112 × 2112 × 3128, the width and the height of the picture are reduced by half, and the number of channels is doubled; 3) after the second convolutional layer, using a convolution kernel of 3 × 43 with a step size of 2 pixels, and after the second pooling layer, using convolution kernels of max-pooling with a size of 2 × 52 with a step size of max-pooling of 2 pixels, after the size of the picture after pooling is changed, the size of the picture is 56 × 656 × 7256; 4) after the operation of using a convolution kernel of 3 × 83 and the step size of 2 pixels through the third convolution layer, the size of the used convolution kernel of max-posing is 2 × 92 and the step size of max-posing is 2 pixels through the third pooling layer, and the size of the picture after pooling is 28 × 28 × 0512; 5) after the operation of using a convolution kernel of 3 × 13 and a step size of 2 pixels in the fourth convolution layer, the size of the used convolution kernel of max-posing is 2 × 22 and the step size of max-posing is 2 pixels in the fourth pooling layer, and the size of the picture after pooling is changed to 14 × 314 × 512; 6) after the fifth convolution layer, using a convolution kernel of 3 × 3 and the operation of 2 pixels in step length, and then passing through a fifth pooling layer, the size of the used max-pooling convolution kernel is 2 × 2, the step length is 2 pixels in max-pooling, and the size of the picture after pooling is changed into 7 × 7 × 512; 7) after a max-firing, the convolution kernels used by the method are all 2 x 2, the step length is 2 pixels, and three full connection layers, FC4096, FC4096 and FC1000 are connected; the fully-connected layer considers global information more than the convolutional layer, maps all feature maps (width, height and channel) with local information originally to 4096 dimensions, and is advantageous in classification; 8) extracted 4096 × 1 feature vectors using full connection layers FC4096-RELU6 and FC4096-RELU7 asThe two layers of feature vectors are subjected to fusion calculation to extract feature vectors capable of better representing the intrinsic information of the image as the two layers of feature vectors are rich in information expressing the feature vectors; 9) inputting a feature vector F, calling a UMAP nonlinear dimension reduction algorithm, and outputting a 128-dimensional feature vector F₁₂₈The UMAP dimension reduction response speed is high, the constraint by the sample size is small, and then the feature vector is subjected to normalization operation.

FC4096-RELU6-drop0.5, FC initializes gaussian distribution (std ═ 0.005), and bias constant initializes (0.1); FC4096-RELU7-drop0.5, FC initializes for gaussian distribution (std ═ 0.005), bias constants initialize (0.1); FC1000 (last after SoftMax1000 classification), FC initializes gaussian distribution (std is 0.005) and bias constants (0.1).

FC4096-RELU6 described above: with F₁＝[f₁,f₂.....f₄₀₉₆]To represent the feature vector of the current fully-connected layer, similarly FC4096-RELU 7: with F₂＝[f₁,f₂.....f₄₀₉₆]To represent the feature vector of the current fully-connected layer;

and a final feature vector F is obtained, where

The element product is expressed, i.e. the two vectors are multiplied each element separately.

As an alternative embodiment, storing the training images and the image features of the training images in an elastic search engine, includes: uploading the training image to a distributed file system to obtain a Uniform Resource Locator (URL) of the training image through the distributed file system; converting the image characteristics of the training image into a binary array and encoding the binary array into a character string; the URL and string are stored into the ElasticSearch.

In this embodiment, the training image set may be uploaded to a distributed file system (for example, an open-source lightweight distributed file system FastDFS), to obtain URLs which can access the picture, then convert the feature values of the picture into binary groups and encode the binary groups into character strings, and store the encoded character strings and the corresponding URLs into an Elasticsearch, where the feature values and the URL storage types in the Elasticsearch are binary and text types.

As an alternative embodiment, determining similarity scores between the first feature vector and second feature vectors of multiple reference images in the feature index library includes: determining similarity scores of the first feature vectors and second feature vectors of the multiple reference images by a mean cosine similarity measurement scoring method, wherein the mean cosine similarity measurement scoring method is represented by a preset formula, and the preset formula is as follows:

score denotes the similarity score, x_iRepresenting the i-th element, y, in the first feature vector_iDenotes the ith element in the second feature vector, and z denotes the average of all elements of the first and second feature vectors.

In this embodiment, the similarity score may be calculated using an improved version of a cosine similarity metric algorithm according to the feature value of the uploaded picture and the feature values of the pictures in the feature index library. For example, the feature value vector of the input query picture (i.e., target image): x ═ X₀ x₁ x₂…x_i…x₁₂₇]Feature value vector of picture in Elasticsearch: y ═ Y₀ y₁y₂…y_i…y₁₂₇]The cosine similarity algorithm is:

because the cosine similarity is more in terms of distinguishing differences in directions, and absolute numerical insensitivity can cause errors, a calculation model of a cosine similarity algorithm is improved in the embodiment of the invention, and a mean value is subtracted based on each element, so that the error can be reduced.

The above average value is obtained as follows:

herein, the

Represents the average of the accumulated sums of each element value of the feature vector X,

herein, the

Represents the average of the accumulated sums of each element value of the feature vector Y,

where z represents

And

so we get the average of all the element values of X and Y.

The above-mentioned mean cosine similarity measure scoring method is:

when X and Y are all zero vectors, score is 1.

In the embodiment of the present invention, in order to prove the effectiveness of the mean cosine similarity metric scoring method, it may be assumed that two eigenvectors X ═ {0.1, 0.2}, Y ═ 0.4, 0.5}, and the cosine similarity algorithm is used as follows:

using a mean cosine similarity metric scoring method, score was 0.9. Very muchObviously, the result error is caused by the insensitivity of cosine similarity to numerical value and the sensitivity to direction, and X ═ {0.1, 0.2}, Y ═ {0.4, 0.5} are different, and their similarity is not so high, so the error can be properly reduced by the mean cosine similarity scoring method.

As an alternative embodiment, after determining the partial reference image as a similar image of the target image, the image retrieval method further includes: a partial reference image is displayed.

For example, after the user uploads the images to be retrieved, the images may be sorted in descending order according to the above scoring condition, and the background may return the top 10 or a predetermined number of images most similar to the uploaded images for presentation.

Fig. 3 is a schematic diagram of an image retrieval method according to an embodiment of the present invention, and as shown in fig. 3, after a picture data set is obtained, a deep convolutional network VGG19 network is trained by using the picture data set, image features are extracted in batch, and a UMAP nonlinear dimension reduction algorithm is invoked, and a 128-dimensional feature vector is output; meanwhile, the URL of the access path of the picture data set is obtained by using a distributed file system (for example, FastDFS), and then the URL of the picture and the feature vector are stored in an ElasticSearch. In addition, when a user uploads a picture, a distributed file system (for example, FastDFS) can be used to obtain an access path URL of the picture data set, then feature extraction is performed on the picture uploaded by the user, a feature vector is obtained based on the proposed features, dimension reduction processing is performed on the feature vector, a similarity score between the feature vector of the picture uploaded by the user and a feature vector of a reference picture in an ElasticSearch is calculated, the pictures corresponding to the similarity score are sorted in a descending order, and a plurality of pictures sorted in the front are returned to the front end of the value for presentation.

Compared with the prior art that the features are extracted based on a deep learning network, the features of a convolutional layer are generally used, the local features are obtained by convolution, the previous local features are assembled into a complete graph through a weight matrix again by full connection, in the embodiment of the invention, the depth features are extracted by using a trained network VGG19 network model, the feature vectors of full connection layers are used for carrying out classified retrieval on images, the feature vectors of the full connection layers are improved, the feature vectors of the two full connection layers are calculated and fused, and the feature vectors representing image information are obtained and then are used for efficient image retrieval; in addition, compared with most image feature dimension reduction modes adopted in image retrieval in the related art, such as PCA (principal component analysis), LDA (linear discriminant analysis), singular value decomposition and the like, the dimension reduction modes are not suitable along with the increase of data volume, especially for high-dimensional data such as images and texts, extremely complex nonlinear relations exist among the features. The advantages of UMAP over other comparative algorithms at run-time and subject to sample size constraints can be seen in fig. 2(a) and 2 (b). In addition, compared with the algorithm for measuring the similarity of images in the related art, the cosine similarity measurement algorithm is used, the cosine similarity is more different in direction and is insensitive to absolute numerical values, and in order to reduce the error caused by the insensitivity to numerical values and correct the result to keep the rationality, the embodiment of the invention provides the mean value cosine similarity measurement scoring algorithm, namely, the mean value is subtracted from the numerical values in all dimensions, the calculation mode is changed, the scoring condition is recalculated, and the error is further reduced.

As can be seen from the above, compared with the related art, the image retrieval method provided in the embodiment of the present invention has the following advantages: 1) according to the category of the data set, model training parameters are modified for a VGG19 network model to retrain, full-connection layer feature vector information is fully utilized, the full-connection layer mainly has the function of realizing classified retrieval, local features of convolutional layers are integrated, and outputting a line of feature vector values greatly reduces the influence of feature positions on retrieval; 2) a nonlinear dimension reduction method-UMAP is introduced to carry out dimension reduction operation on the feature vectors, and compared with the prior art, part of the prior retrieval technology adopts the steps of not reducing the dimension of the features or reducing the dimension by using a traditional linear dimension reduction method. However, as the amount of data increases, especially for high-dimensional data such as images and texts, complex nonlinear relations exist, which highlights the necessity of nonlinear dimension reduction. The UMAP enables the feature vector of the dimension reduction to reflect the intrinsic information of the image, and the complexity of calculation can be reduced when similarity measurement calculation is carried out, so that the response speed of retrieval is improved; 3) measuring the similarity relation between the characteristic vectors of the two pictures by adopting a cosine similarity measurement method, and providing a mean value cosine similarity measurement scoring algorithm on the basis, namely subtracting a mean value from the numerical values of all dimensions, and modifying a calculation formula, so that the similarity relation between the two pictures is measured better, and the retrieval accuracy is improved; 4) the massive image retrieval system formed by combining the reduced-dimension depth feature vector, the Elasticisearch and the FastDFS facilitates high accuracy and real-time performance of image retrieval, and still performs well in high concurrency.

Therefore, the image retrieval method provided by the embodiment of the invention has the following beneficial effects: 1) the advantages of a deep learning network VGG network model on an image retrieval task are fully utilized, the deep feature extraction is carried out on the image, the optimized feature vector of the full connection layer is used, the local features of the convolution layer are integrated, the influence of the feature position on the retrieval is greatly reduced, the feature information of the image is well represented, and the retrieval result is more accurate; 2) obviously, the extracted high-dimensional depth feature vector is subjected to dimension reduction by using a nonlinear dimension reduction method-UMAP (unified modeling application), the dimension of the feature vector after dimension reduction is low, the calculation cost is low, the feature information of the image is also abundantly expressed, and the performance of an image retrieval task is improved; 3) the similarity relation between the characteristic vectors is calculated by using a mean cosine similarity scoring method, so that the error caused by insensitivity of a cosine similarity measurement algorithm to numerical values is reduced, and the retrieval accuracy can be improved to a certain extent.

Example 2

According to another aspect of the embodiment of the present invention, there is also provided an image retrieval apparatus, and fig. 4 is a schematic diagram of the image retrieval apparatus according to the embodiment of the present invention, as shown in fig. 4, the image retrieval apparatus includes: an acquisition module 41, a first determination module 43, a second determination module 45, and a third determination module 47. The following describes an image search device.

An obtaining module 41, configured to obtain a first feature vector of the target image.

The first determining module 43 is configured to determine similarity scores between the first feature vector and second feature vectors of multiple reference images in a feature index library, where the feature index library records the reference images and the feature vectors of the reference images.

And a second determining module 45, configured to determine a part of the reference images with similarity scores greater than a predetermined value.

A third determining module 47, configured to determine the partial reference image as a similar image of the target image.

It should be noted here that the above-mentioned obtaining module 41, the first determining module 43, the second determining module 45 and the third determining module 47 correspond to steps S102 to S108 in embodiment 1, and the above-mentioned modules are the same as the examples and application scenarios realized by the corresponding steps, but are not limited to what is disclosed in embodiment 1. It should be noted that the modules described above as part of an apparatus may be implemented in a computer system such as a set of computer-executable instructions.

As can be seen from the above, in the embodiment of the present invention, the obtaining module may be used to obtain the first feature vector of the target image; then, a first determining module is used for determining similarity scores of the first feature vector and second feature vectors of a plurality of reference images in a feature index library, wherein the feature index library records the reference images and the feature vectors of the reference images; then, a second determination module is used for determining a part of reference images of which the similarity scores are larger than a preset value in the multiple reference images; and determining the part of the reference image as a similar image of the target image by using a third determination module. By the image retrieval device provided by the embodiment of the invention, the purpose of taking the part of the reference image with the similarity score larger than a preset numerical value as the similar image of the image to be retrieved is realized by calculating the similarity score between the feature vector of the image to be retrieved and the feature vectors of a plurality of groups of reference images in the feature index library, the technical effect of the reliability of the image retrieval is achieved, and the technical problem of low reliability of an image retrieval mode used in the related technology is solved.

Optionally, the obtaining module includes: the system comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring a Uniform Resource Locator (URL) of a target image through a distributed file system; a first extraction unit configured to extract a first feature value of a target image based on a URL; and the second acquisition unit is used for obtaining a first feature vector based on the first feature value.

Optionally, the image retrieval apparatus further includes: the first preprocessing module is used for preprocessing the target image before acquiring the first feature vector of the target image; the pretreatment module comprises a pretreatment module and a pretreatment module, wherein the pretreatment module comprises; and the processing unit is used for modifying the image size of the target image into the input size of the deep convolution network VGG and carrying out averaging operation on the target image with the modified size.

Optionally, the image retrieval apparatus further includes: the dimension reduction module is used for carrying out nonlinear dimension reduction processing on the first characteristic vector after the first characteristic vector of the target image is obtained; and the normalization processing module is used for performing normalization processing on the first feature vector after the dimension reduction processing.

Optionally, the image retrieval apparatus further includes: the generating module is used for generating a feature index library before determining similarity scores of the first feature vector and second feature vectors of a plurality of reference images in the feature index library; wherein, the generation module comprises: a third acquisition unit configured to acquire a training image; the second preprocessing unit is used for preprocessing the training image, wherein the preprocessing comprises the following steps: modifying the image size of the target image into the input size of a depth convolution network VGG, and carrying out averaging operation on the target image with the modified size; the second extraction unit is used for extracting the image characteristics of the training image by using a deep convolutional network VGG; and the storage unit is used for storing the training images and the image characteristics of the training images into a distributed search and analysis engine ElasticSearch to obtain a characteristic index library.

Optionally, the memory cell comprises: the acquisition subunit is used for uploading the training image to a distributed file system so as to obtain a Uniform Resource Locator (URL) of the training image through the distributed file system; the conversion subunit is used for converting the image characteristics of the training image into a binary array and encoding the binary array into a character string; and the storage subunit is used for storing the URL and the character string into the ElasticSearch.

Optionally, the first determining module includes: the determining unit is used for determining similarity scores of the first feature vectors and second feature vectors of the multiple reference images through a mean cosine similarity measurement scoring method, wherein the mean cosine similarity measurement scoring method is expressed through a preset formula, and the preset formula is as follows:

Optionally, the image retrieval apparatus further includes: and the display module is used for displaying the partial reference image after determining the partial reference image as the similar image of the target image.

Example 3

According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium including a stored computer program, wherein when the computer program is executed by a processor, the apparatus where the computer-readable storage medium is located is controlled to execute the method for retrieving an image according to any one of the above.

Example 4

According to another aspect of the embodiments of the present invention, there is also provided a processor for executing a computer program, where the computer program executes to execute the image retrieval method of any one of the above.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims

1. An image retrieval method, comprising:

acquiring a first feature vector of a target image;

determining similarity scores of the first feature vector and second feature vectors of a plurality of reference images in a feature index library, wherein the reference images and the feature vectors of the reference images are recorded in the feature index library;

determining a part of the reference images with similarity scores larger than a preset value;

determining the partial reference image as a similar image of the target image.

2. The method of claim 1, wherein obtaining a first feature vector for a target image comprises:

acquiring a Uniform Resource Locator (URL) of the target image through a distributed file system;

extracting a first feature value of the target image based on the URL;

and obtaining the first feature vector based on the first feature value.

3. The method of claim 1, wherein prior to obtaining the first feature vector of the target image, the method further comprises:

preprocessing the target image;

wherein the target image is preprocessed, including;

and modifying the image size of the target image into the input size of a depth convolution network VGG, and carrying out averaging operation on the target image with the modified size.

4. The method of claim 2, wherein after obtaining the first feature vector of the target image, the method further comprises:

carrying out nonlinear dimensionality reduction on the first feature vector;

and carrying out normalization processing on the first feature vector after the dimension reduction processing.

5. The method of claim 1, wherein prior to determining the similarity scores of the first feature vector and the second feature vectors of the plurality of reference images in the feature index library, the method further comprises: generating the feature index library;

wherein generating the feature index library comprises:

acquiring a training image;

preprocessing the training image, wherein the preprocessing comprises: modifying the image size of the target image into the input size of a depth convolution network VGG, and carrying out averaging operation on the target image with the modified size;

extracting image features of the training image by using a deep convolutional network (VGG);

and storing the training images and the image characteristics of the training images into a distributed search and analysis engine ElasticSearch to obtain the characteristic index library.

6. The method of claim 5, wherein storing the training images and image features of the training images into a distributed search and analysis engine ElasticSearch comprises:

uploading the training image to a distributed file system to obtain a Uniform Resource Locator (URL) of the training image through the distributed file system;

converting the image characteristics of the training image into a binary array and encoding the binary array into a character string;

storing the URL and the character string into the ElasticSearch.

7. The method of claim 1, wherein determining similarity scores between the first feature vector and second feature vectors of a plurality of reference images in a feature index library comprises:

determining similarity scores of the first feature vector and second feature vectors of the multiple reference images by a mean cosine similarity measurement scoring method, wherein the mean cosine similarity measurement scoring method is represented by a predetermined formula, and the predetermined formula is as follows:

score represents the similarity score, x_iRepresenting the i-th element, y, in the first feature vector_iRepresents the ith element in the second feature vector, and z represents the average of all elements of the first and second feature vectors.

8. The method of any of claims 1 to 7, wherein after determining the partial reference image as a similar image to the target image, the method further comprises: displaying the partial reference image.

9. An image retrieval apparatus, comprising:

the acquisition module is used for acquiring a first feature vector of a target image;

a first determining module, configured to determine similarity scores between the first feature vector and second feature vectors of multiple reference images in a feature index library, where the feature index library records the reference images and the feature vectors of the reference images;

the second determination module is used for determining a part of the reference images with similarity scores larger than a preset value;

a third determining module, configured to determine the partial reference image as a similar image to the target image.

10. A computer-readable storage medium, comprising a stored computer program, wherein when the computer program is executed by a processor, the computer-readable storage medium controls an apparatus to execute the image retrieval method according to any one of claims 1 to 8.