CN116796021A - Image retrieval method, system, electronic device and medium - Google Patents


Info

Publication number: CN116796021A (granted as CN116796021B)
Application number: CN202311091625.9A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 丁顺意, 张浩然, 张璐, 陶明
Assignee (original and current): Shanghai Renyimen Technology Co ltd
Legal status: Active (granted)


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to image retrieval methods, systems, electronic devices, and media. The method comprises the following steps: detecting N stable key points of each sample image and extracting features of each key point to obtain N×D1 local feature descriptors; performing product quantization on the N×D1 feature descriptors, including: dividing the dimension D1 into m1 sub-segments to obtain the sub-dimension D1/m1 of each sub-segment and N×D1/m1 local feature descriptors per sub-segment; clustering the D1/m1 feature descriptors in each sub-segment with the cluster number set to K1, obtaining K1 D1/m1-dimensional cluster centers for each sub-segment; quantizing the feature vector of each of the N key points into a D1/m1-dimensional short vector, in which each element is the cluster-center ID to which the corresponding sub-segment belongs; and recording, for each cluster-center ID, the image identifier of every sample image containing that cluster-center ID, thereby establishing a mapping from cluster-center IDs to sample images and building a local feature inverted index table keyed by cluster-center ID.

Description

Image retrieval method, system, electronic device and medium
Technical Field
The present disclosure relates to the field of data image processing, and more particularly to image retrieval methods, systems, electronic devices, and media.
Background
The development of image retrieval technology has gone through an evolution from manually designed features to deep-learning-based ones, and from simple color and texture features to more discriminative learned feature representations. Researchers began focusing on extracting semantic information by introducing bag-of-words models and content-based methods. The rise of deep learning further advanced image retrieval technology: convolutional neural networks extract high-level semantic features of images, and end-to-end learning enables more accurate and intelligent retrieval. Going forward, image retrieval will continue to develop toward multi-modal retrieval, incremental learning, reinforcement learning, and large-scale retrieval, making image retrieval systems more efficient, adaptive, and intelligent.
The image retrieval method commonly used in the prior art is the Scale-Invariant Feature Transform (SIFT) image retrieval method. The SIFT method has many advantages. First, the SIFT algorithm can extract feature points that are invariant across different scales and rotations, giving the image good robustness under scale and rotation transformations. Second, SIFT feature descriptors describe the local image area accurately and richly, capturing key detail information in the image. In addition, because SIFT features are local, they can effectively handle problems such as partial occlusion, deformation, and illumination changes. The SIFT algorithm is also computationally efficient and suitable for processing small and medium-scale image databases. Therefore, SIFT image retrieval remains widely used in many scenarios, especially where scale and rotation invariance must be considered and complex scenes handled; it can provide reliable and efficient image matching and retrieval.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
According to one aspect of the present disclosure, there is provided an image retrieval method including: detecting N stable key points of each sample image and extracting features of each stable key point to obtain N×D1 local feature descriptors, wherein N is a natural number and D1 is a dimension; performing product quantization on the N×D1 local feature descriptors, including: dividing the dimension D1 into m1 sub-segments to obtain the sub-dimension D1/m1 of each sub-segment and N×D1/m1 local feature descriptors per sub-segment; clustering the D1/m1 local feature descriptors in each sub-segment respectively, with the cluster number set to K1, to obtain K1 D1/m1-dimensional cluster centers for each sub-segment, wherein each cluster center is represented by a cluster-center ID ranging from 0 to K1-1 and K1 is a natural number between five thousand and one hundred thousand; quantizing the feature vector of each of the N key points into a D1/m1-dimensional short vector, wherein each element in the short vector is the cluster-center ID to which the corresponding sub-segment belongs; and recording, for each cluster-center ID, the image identifier of every sample image containing that cluster-center ID to establish a mapping relation between cluster-center IDs and sample images, thereby building a local feature inverted index table keyed by cluster-center ID.
According to some embodiments of the present disclosure, the image retrieval method may further include: extracting global features of each sample image to obtain D2 global feature descriptors, wherein D2 is a dimension; performing product quantization on the D2 global feature descriptors, including: dividing the dimension D2 into m2 sub-segments to obtain the sub-dimension D2/m2 of each sub-segment and D2/m2 global feature descriptors per sub-segment; clustering the D2/m2 global feature descriptors in each sub-segment respectively, with the cluster number set to K2, to obtain K2 D2/m2-dimensional cluster centers for each sub-segment, wherein each cluster center is represented by a cluster-center ID ranging from 0 to K2-1; quantizing the global feature vector into a D2/m2-dimensional short vector, wherein each element in the short vector is the cluster-center ID to which the corresponding sub-segment belongs; and recording, for each cluster-center ID, the image identifier of every sample image containing that cluster-center ID to establish a mapping relation between cluster-center IDs and sample images, thereby building a global feature inverted index table keyed by cluster-center ID.
According to some embodiments of the present disclosure, the image retrieval method may further include: in response to receiving an image to be queried input by a user, detecting M stable key points of the image to be queried and extracting features of each stable key point to obtain M×D1 local feature descriptors, wherein M is a natural number; performing product quantization on the M×D1 local feature descriptors, including: dividing the dimension D1 into m1 sub-segments to obtain the sub-dimension D1/m1 of each sub-segment and M×D1/m1 local feature descriptors per sub-segment; clustering the D1/m1 local feature descriptors in each sub-segment respectively, with the cluster number set to K1, to obtain K1 D1/m1-dimensional cluster centers for each sub-segment, wherein each cluster center is represented by a cluster-center ID ranging from 0 to K1-1; quantizing the features of each of the M key points into a D1/m1-dimensional short vector, wherein each element in the short vector is the cluster-center ID to which the corresponding sub-segment belongs; retrieving, with each element in the short vector as a keyword, the image identifiers containing that element from the local feature inverted index table, recalling M×m1 cluster centers; and sorting the M×m1 cluster centers from high to low by the repetition rate of the recalled cluster centers.
According to some embodiments of the present disclosure, the image retrieval method may further include: calculating the Euclidean distance between each element in the short vector quantized from the features of each key point and the corresponding recalled cluster center, and removing, from the M×m1 recalled cluster centers, those whose Euclidean distance exceeds a first preset threshold; and after removing the mismatched cluster centers from the M×m1 recalled cluster centers, sending the images mapped by the remaining cluster centers to the user.
According to some embodiments of the present disclosure, the image retrieval method may further include: in response to receiving an image to be queried input by a user, performing global feature extraction on the image to be queried to obtain D2 global feature descriptors; performing product quantization on the D2 global feature descriptors, including: dividing the dimension D2 into m2 sub-segments to obtain the sub-dimension D2/m2 of each sub-segment and D2/m2 global feature descriptors per sub-segment; clustering the D2/m2 global feature descriptors in each sub-segment respectively, with the cluster number set to K2, to obtain K2 D2/m2-dimensional cluster centers for each sub-segment, wherein each cluster center is represented by a cluster-center ID ranging from 0 to K2-1; quantizing the global feature vector into a D2/m2-dimensional short vector, wherein each element in the short vector is the cluster-center ID to which the corresponding sub-segment belongs; retrieving, with each element in the short vector as an index, the image identifiers containing that element from the global feature inverted index table, recalling m2 cluster centers; and sorting the m2 cluster centers from high to low by the repetition rate of the recalled cluster centers.
According to some embodiments of the present disclosure, the image retrieval method may further include: calculating the Euclidean distance between each element in the short vector quantized from the global feature vector and the corresponding recalled cluster center, and removing, from the m2 recalled cluster centers, those whose Euclidean distance exceeds a second preset threshold; and after removing the mismatched cluster centers from the m2 recalled cluster centers, sending the images mapped by the remaining cluster centers to the user.
According to some embodiments of the present disclosure, the image retrieval method may further include: retrieving, with each element in the short vector as an index, the image identifiers containing that element and elements similar to it from the global feature inverted index table by approximate expansion, recalling m3 cluster centers, where m3 > m2; calculating the Euclidean distance between each element in the short vector quantized from the global feature vector and the corresponding recalled cluster center, and removing, from the m3 recalled cluster centers, those whose Euclidean distance exceeds the second preset threshold; and after removing the mismatched cluster centers from the m3 recalled cluster centers, sending the images mapped by the remaining cluster centers to the user.
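The distance-threshold filtering recited in the embodiments above might be sketched as follows. This is a minimal illustration, not the patented implementation: the sub-vectors, cluster centers, and threshold value are made-up demo data.

```python
import numpy as np

def filter_by_distance(query_subvectors, recalled_centers, threshold):
    """Keep only recalled cluster centers whose Euclidean distance to the
    corresponding query sub-vector is within the threshold; centers beyond
    it are treated as mismatches and dropped (threshold is illustrative)."""
    kept = []
    for sub, center in zip(query_subvectors, recalled_centers):
        if np.linalg.norm(np.asarray(sub) - np.asarray(center)) <= threshold:
            kept.append(center)
    return kept

subs = [[0.0, 0.0], [1.0, 1.0]]
centers = [[0.1, 0.0], [5.0, 5.0]]   # second center is a deliberate mismatch
print(len(filter_by_distance(subs, centers, threshold=1.0)))  # 1
```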
According to another aspect of the present disclosure, there is provided an image retrieval system comprising a unit configured to perform the image retrieval method of any of the embodiments described in the present disclosure.
According to some embodiments of the present disclosure, there is provided an electronic device including: a memory; and a processor coupled to the memory, the processor configured to perform the image retrieval method of any of the embodiments described in the present disclosure based on instructions stored in the memory.
According to yet another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the image retrieval method of any of the embodiments described in the present disclosure.
According to yet another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, performs the image retrieval method of any of the embodiments described in the present disclosure.
Other features, aspects, and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments of the disclosure, which is to be read in connection with the accompanying drawings.
Drawings
Preferred embodiments of the present disclosure are described below with reference to the accompanying drawings. The accompanying drawings, which are included to provide a further understanding of the disclosure, are incorporated in and constitute a part of this specification. It is to be understood that the drawings in the following description relate only to some embodiments of the present disclosure and are not intended to limit it. In the drawings:
Fig. 1 illustrates a search library creation flowchart of an image search method according to an exemplary embodiment of the present disclosure.
Fig. 2 shows a retrieval flow diagram of an image retrieval method according to an exemplary embodiment of the present disclosure.
Fig. 3 shows a schematic block diagram of an image retrieval system according to an exemplary embodiment of the present disclosure.
Fig. 4 shows a schematic block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
FIG. 5 illustrates a block diagram of an example structure of a computer system that may be employed in accordance with an example embodiment of the present disclosure.
It should be appreciated that for ease of description, the dimensions of the various parts shown in the figures are not necessarily drawn to actual scale. The same or similar reference numbers are used in the drawings to refer to the same or like parts. Thus, once an item is defined in one drawing, it may not be further discussed in subsequent drawings.
Detailed Description
Technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, but it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. The following description of the embodiments is merely exemplary in nature and is in no way intended to limit the disclosure, its application, or uses. It should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect. The relative arrangement of parts and steps, numerical expressions and numerical values set forth in these embodiments should be construed as exemplary only, and not limiting the scope of the present disclosure unless specifically stated otherwise.
The term "comprising" and variations thereof as used in this disclosure mean encompassing at least the stated elements/features without excluding others, i.e., "including but not limited to"; "comprising" is thus synonymous with "including". The term "based on" means "based at least in part on".
Reference throughout this specification to "one embodiment," "some embodiments," or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. For example, the term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments." Moreover, appearances of the phrases "in one embodiment," "in some embodiments," or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment, but they may be.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units. Unless specified otherwise, the concepts of "first," "second," etc. are not intended to imply that the objects so described must be in a given order, either temporally, spatially, in ranking, or in any other manner.
It should be noted that singular references in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Embodiments of the present disclosure will be described in detail below with reference to the attached drawings, but the present disclosure is not limited to these specific embodiments. The following embodiments may be combined with each other and some embodiments may not be repeated for the same or similar concepts or processes. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner as would be apparent to one of ordinary skill in the art from this disclosure in one or more embodiments.
The dimensionality of SIFT features is fixed, typically 128. Such a fixed-dimension feature representation may not adequately capture all of the details and features in an image, limiting its expressive power on complex images. Second, the SIFT algorithm has high computational complexity; processing a large-scale image database in particular consumes a large amount of time and computing resources, which limits its feasibility in real-time and large-scale applications. In addition, SIFT features are sensitive to illumination changes and partial occlusion: when the illumination changes or the target object is occluded, the extraction and matching performance of SIFT features degrades, affecting the accuracy of retrieval results. The SIFT algorithm is also limited in handling extreme scale changes and large rotations. Meanwhile, SIFT features consume a large amount of memory when storing large-scale image databases: for 100 million images, about 2.8 TB of memory is needed to store the features. Such enormous memory requirements make storing and processing large-scale data impractical and expensive. Finally, local SIFT feature extraction ignores the global feature expression of the image, so this retrieval mode loses the ability to retrieve similar images.
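The 2.8 TB figure above can be reproduced with simple arithmetic. Note that the average keypoint count per image used below (55) is an assumption chosen for illustration, not a value stated in this document:

```python
# Rough memory estimate for storing raw SIFT descriptors of 100 million images.
num_images = 100_000_000
keypoints_per_image = 55          # assumed average, for illustration only
dims = 128                        # SIFT descriptor dimension
bytes_per_dim = 4                 # float32

total_bytes = num_images * keypoints_per_image * dims * bytes_per_dim
print(f"{total_bytes / 1e12:.1f} TB")  # 2.8 TB
```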
Large-scale image retrieval systems are used in many businesses, and a few open-source solutions exist. However, many businesses require large-scale vector retrieval with both high accuracy and high recall; the open-source solutions achieve high recall but insufficient accuracy, because they can only retrieve similar images based on fixed-length global image features. To improve accuracy, identical-image search can be performed using variable-length local features. To guarantee both high accuracy and high recall, the invention provides a large-scale image retrieval system for joint identical- and similar-image retrieval.
In the embodiments of the disclosure, when the index library is constructed, the feature vectors of the images are extracted first. Unlike the traditional SIFT feature extraction method, the invention provides a feature extraction mode that supports both fixed-length global features and variable-length local features, so that identical and similar images can be retrieved jointly. The invention thus overcomes the technical defect that traditional open-source schemes can only retrieve similar images. On Internet platforms and in social scenarios, identical-image retrieval can provide higher accuracy and meet the platform's image retrieval requirements.
Meanwhile, when the image base is large, retrieval consumes a large amount of system memory: by calculation, 2.8 TB of memory would be required to retrieve over a library of 100 million images. The invention therefore uses Optimized Product Quantization (OPQ) for feature quantization, which can reduce memory use by a factor of 128 to 256. To support fast retrieval over the quantized variable-length local features, the invention also provides a retrieval scheme based on an inverted index. The fixed-length global feature can also be directly spliced onto the variable-length local features, so that the system directly supports a joint index of fixed- and variable-length features, achieving joint identical- and similar-image retrieval.
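One illustrative accounting for the stated 128x to 256x memory reduction is shown below. The document does not spell out the exact code layout, so the 16-bit ID size and segment counts here are assumptions:

```python
# A raw 128-dim float32 SIFT descriptor occupies 512 bytes; a PQ code of
# m1 sub-segment cluster-center IDs at 2 bytes each (assumed 16-bit IDs)
# occupies 2*m1 bytes. m1 = 1 and m1 = 2 are illustrative segment counts.
raw_bytes = 128 * 4                     # 512 bytes per raw descriptor
for m1 in (1, 2):
    code_bytes = 2 * m1                 # assumed 2 bytes per cluster-center ID
    print(f"m1={m1}: {raw_bytes // code_bytes}x reduction")
```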
The disclosed embodiments first perform scale-space extremum detection in an image by constructing a Gaussian pyramid and using difference-of-Gaussian operators to detect stable keypoints that represent local feature regions in the image. For each keypoint, a local area is defined, for example a window of fixed size, for computing the feature descriptor. The image is Gaussian-smoothed in the local region, and gradient magnitudes and directions are calculated. The local region is then divided into a number of sub-regions, for example a 4×4 or 8×8 grid, and a gradient histogram of each sub-region is computed. Finally, the gradient histograms of all sub-regions are concatenated to form a variable-length local feature descriptor capable of capturing subtle changes in different local feature regions. In addition to extracting local features, embodiments of the present disclosure perform feature extraction on the entire image to obtain a fixed-length global feature: the whole image is treated as a special local area, and its feature descriptor is computed in the same way, yielding a fixed-length global feature descriptor. By supporting both fixed-length global features and variable-length local features, this SIFT feature extraction approach provides a more comprehensive and diversified feature representation; the variable-length local features capture detailed information of different areas in the image, enhancing the expressiveness and discriminability of the features.
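The grid-of-gradient-histograms idea described above can be sketched in a few lines. This is a deliberate simplification, not the actual SIFT implementation: Gaussian weighting, trilinear interpolation, orientation assignment, and descriptor normalization are all omitted.

```python
import numpy as np

def grid_descriptor(patch, grid=4, bins=8):
    """Simplified SIFT-style descriptor: split a local patch into a grid of
    cells and concatenate per-cell gradient-orientation histograms weighted
    by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    h, w = patch.shape
    desc = []
    for i in range(grid):
        for j in range(grid):
            cell = (slice(i * h // grid, (i + 1) * h // grid),
                    slice(j * w // grid, (j + 1) * w // grid))
            hist, _ = np.histogram(ang[cell], bins=bins,
                                   range=(0, 2 * np.pi), weights=mag[cell])
            desc.append(hist)
    return np.concatenate(desc)  # 4*4 cells x 8 bins = 128 dims, like SIFT

patch = np.random.default_rng(0).random((16, 16))
print(grid_descriptor(patch).shape)  # (128,)
```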
The main idea of OPQ quantization is to partition the data, quantize it in blocks, and take the Cartesian product of the per-block quantization results, thereby realizing quantized coding of the whole data. Specifically, the original vector space is first decomposed into a number of low-dimensional vector spaces. Then, for each low-dimensional vector space, an appropriate quantization method is adopted to quantize it independently, so that each low-dimensional vector space obtains its own quantized code. In Cartesian space, the product of the quantization results of the low-dimensional vector spaces then yields the quantization result of the whole data. Multiplying the quantization results of all blocks reduces both dimensionality and storage requirements. This decomposition-and-quantization approach provides efficient retrieval performance and resource utilization.
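The block-wise quantization described above can be sketched as follows. The tiny cluster count and dimensions are demo values (the document uses cluster counts in the thousands), and the k-means here is a minimal stand-in for a tuned library implementation; the OPQ rotation step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Minimal k-means, for illustration only."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers

def pq_train_encode(X, m, k):
    """Split D-dim vectors into m sub-segments, cluster each sub-segment
    independently, and encode every vector as m cluster-center IDs."""
    n, d = X.shape
    sub = d // m
    codebooks, codes = [], []
    for s in range(m):
        block = X[:, s * sub:(s + 1) * sub]
        centers = kmeans(block, k)
        codebooks.append(centers)
        codes.append(np.argmin(((block[:, None] - centers) ** 2).sum(-1), 1))
    return codebooks, np.stack(codes, axis=1)  # codes: (n, m) integer IDs

X = rng.random((200, 8))
codebooks, codes = pq_train_encode(X, m=2, k=4)
print(codes.shape)   # (200, 2): one cluster-center ID per sub-segment
```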
Through the above steps, an image can be represented as a sequence of numbers between 0 and K-1 (K is the number of cluster centers, typically 5,000 to 100,000), similar to the words of an article. By analogy with text search, the idea of an inverted index can be used to build an index of the images. An inverted index is a commonly used index structure for quickly finding documents or data records containing a particular term. In the image index, each sequence number can be regarded as a term and each image as a document. By mapping each term to the corresponding images, an inverted index table keyed by term can be constructed. Specifically, for each sequence number (term), the image identifiers containing that sequence number are recorded, establishing a term-to-image mapping. When searching, only the image identifiers containing the target sequence numbers need to be looked up, without traversing the whole image library, which greatly improves retrieval efficiency. The advantage of the inverted index is that it can rapidly locate the images containing a target sequence number, enabling efficient retrieval: related images can be quickly found from the sequence numbers of the user's query and returned to the user. This inverted-index-based image retrieval method effectively combines the ideas of term retrieval and image indexing and provides a feasible scheme for image retrieval.
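The text-search analogy above can be illustrated directly. The image identifiers and code sequences below are invented demo data; real IDs would come from the quantization step.

```python
from collections import Counter, defaultdict

# Each cluster-center ID acts as a "term", each image as a "document".
image_codes = {
    "img_a": [3, 17, 42, 3],
    "img_b": [17, 99, 42],
    "img_c": [5, 8, 13],
}

inverted = defaultdict(set)          # cluster-center ID -> image identifiers
for image_id, code in image_codes.items():
    for center_id in code:
        inverted[center_id].add(image_id)

def search(query_code):
    """Recall images sharing cluster-center IDs with the query and rank
    them by how many IDs they share (the repetition rate)."""
    hits = Counter()
    for center_id in query_code:
        for image_id in inverted.get(center_id, ()):
            hits[image_id] += 1
    return [img for img, _ in hits.most_common()]

print(search([17, 42, 5]))  # img_a and img_b share 2 IDs, img_c shares 1
```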
An image retrieval method and system according to an exemplary embodiment of the present invention will be described in detail with reference to fig. 1 to 5.
According to the embodiment of the invention, the image retrieval method first establishes an image retrieval library and then retrieves images based on it. Establishing the image retrieval library comprises establishing a local feature retrieval library and a global feature retrieval library.
Fig. 1 shows a search pool establishment flowchart 100 of an image search method according to an exemplary embodiment of the present invention.
As shown in fig. 1, at step S101, N stable key points of each sample image are detected, and feature extraction is performed on each stable key point to obtain n×d1 local feature descriptors.
According to the embodiment of the invention, the SIFT algorithm can be adopted to extract the features of each stable key point, where N is a natural number and D1 is a dimension. The choice of N depends on the number of stable keypoints detected in the image. According to a preferred embodiment of the invention, D1 preferably takes 128 dimensions when extracting local features.
For example, when the image has 8 stable key points, n=8, and D1 takes 128 dimensions, then 8×128=1024 local feature descriptors can be obtained by performing feature extraction.
At step S102, product quantization is performed on n×d1 local feature descriptors. According to an embodiment of the present invention, step S102 may be split into three sub-steps S1021, S1022 and S1023.
As shown in fig. 1, at sub-step S1021, the dimension D1 is divided into m1 sub-segments, obtaining the sub-dimension D1/m1 of each sub-segment and N×D1/m1 local feature descriptors per sub-segment.
According to an embodiment of the present invention, optimized product quantization (OPQ) is used to quantize the local feature descriptors. The dimension D1 is divided into m1 segments, the sub-dimension of each segment is D1/m1, and the number of local feature descriptors per segment is N×D1/m1. It should be understood that m1 is a natural number that divides the dimension D1 evenly. For example, when D1 takes 128 dimensions and m1 takes 2, the sub-dimension of each segment is 64.
At sub-step S1022, the D1/m1 local feature descriptors in each sub-segment are clustered respectively, and the cluster number K1 is set, obtaining K1 D1/m1-dimensional cluster centers for each sub-segment. Each cluster center (also referred to as a centroid) is represented by a cluster-center ID ranging from 0 to K1-1, where K1 is typically a natural number between five thousand and one hundred thousand.
At sub-step S1023, the feature vector of each of the N keypoints is quantized into a D1/m 1-dimensional short vector. It should be appreciated that each element in the short vector is the cluster center ID to which the sub-segment belongs.
After the optimized product quantization, the N key points are quantized into N D1/m 1-dimensional short vectors.
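The quantization of N keypoints into N short vectors, given already-trained per-segment cluster centers, can be sketched as follows. The shapes and cluster count are illustrative demo values (K1 = 16 here, versus thousands in the document), and the codebooks are random stand-ins for trained ones.

```python
import numpy as np

def encode_keypoints(descriptors, codebooks):
    """Quantize each keypoint descriptor into a short vector of
    cluster-center IDs, one ID per sub-segment, using pre-trained
    per-segment cluster centers (codebooks)."""
    n, d = descriptors.shape
    m = len(codebooks)
    sub = d // m
    codes = np.empty((n, m), dtype=np.int64)
    for s, centers in enumerate(codebooks):
        block = descriptors[:, s * sub:(s + 1) * sub]
        dist = ((block[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        codes[:, s] = dist.argmin(axis=1)    # nearest cluster-center ID
    return codes

rng = np.random.default_rng(1)
codebooks = [rng.random((16, 64)) for _ in range(2)]  # K1=16, D1/m1=64 (demo)
descriptors = rng.random((8, 128))                    # N=8 keypoints, D1=128
codes = encode_keypoints(descriptors, codebooks)
print(codes.shape)  # (8, 2)
```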
At step S103, by recording, for each cluster center ID, the image identifier of the sample image containing the cluster center ID, a mapping relationship of the cluster center ID to the sample image is established, thereby establishing a local feature inverted index table with the cluster center ID as a key. It should be understood that the image identifier is information for identifying an image.
The global feature inverted index table is built similarly to the local feature inverted index table.
At step S104, global feature extraction is performed on each sample image, and D2 global feature descriptors are obtained.
According to an embodiment of the invention, a SIFT algorithm may be employed to perform the extraction of global features. Wherein D2 is the dimension. In accordance with a preferred embodiment of the present invention, D2 preferably takes 768 dimensions when extracting global features, and feature extraction may be performed to obtain 768 global feature descriptors.
At step S105, product quantization is performed on the D2-dimensional global feature descriptor. According to an embodiment of the present invention, step S105 may likewise be split into three sub-steps S1051, S1052 and S1053.
As shown in fig. 1, at sub-step S1051, the D2 dimensions are divided into m2 sub-segments, so that the sub-dimension of each sub-segment is D2/m2 and each sub-segment holds a D2/m2-dimensional portion of the global feature descriptor.
According to an embodiment of the present invention, the global feature descriptors are quantized using optimized product quantization (OPQ). The D2 dimensions are divided into m2 segments, so that the sub-dimension of each segment is D2/m2. It should be appreciated that m2 is a natural number that divides the dimension D2 evenly. For example, when D2 is 768 and m2 is 4, the sub-dimension of each segment is 192.
At sub-step S1052, the D2/m2-dimensional sub-descriptors in each sub-segment are clustered separately, and the number of clusters K2 is set, to obtain K2 cluster centers of dimension D2/m2 for each sub-segment. Each cluster center is represented by a cluster center ID ranging from 0 to K2-1, where K2 is typically a natural number between five thousand and one hundred thousand.
It should be appreciated that K2 may equal K1, but is not limited thereto. It should also be understood that the cluster centers obtained by clustering the global feature descriptors may either be numbered from 0 (i.e. 0 to K2-1) or be spliced after the cluster centers obtained from the local feature descriptors (i.e. K1 to K1+K2-1); this disclosure takes numbering from 0 as an example.
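The two numbering conventions can be sketched as follows (K1 = 5000 is an illustrative value, not mandated by the patent):

```python
K1 = 5000  # number of local-feature cluster centers (illustrative value)

def global_center_id(raw_id, spliced=True):
    """Map a global-feature cluster index (0..K2-1) into the key space:
    either keep it in its own 0..K2-1 range, or splice it after the
    local-feature IDs as K1..K1+K2-1 for one shared key space."""
    return K1 + raw_id if spliced else raw_id
```

Splicing avoids key collisions when local and global features share one inverted index; separate ranges suffice when two tables are kept, as in this disclosure.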
At sub-step S1053, the global feature vector is quantized into an m2-dimensional short vector. It should be appreciated that each element of the short vector is the ID of the cluster center to which the corresponding sub-segment is assigned.
After optimized product quantization, the global feature vector is thus quantized into one m2-dimensional short vector.
At step S106, by recording, for each cluster center ID, the image identifier of the sample image containing the cluster center ID, a mapping relationship of the cluster center ID to the sample image is established, thereby establishing a global feature inverted index table with the cluster center ID as a key.
The image retrieval library is built by establishing the local feature inverted index table and the global feature inverted index table with cluster center IDs as keys.
Fig. 2 shows a retrieval flow diagram 200 of an image retrieval method according to an exemplary embodiment of the invention.
As shown in fig. 2, when an image to be queried input by a user is received, at step S201, M stable keypoints of the image to be queried are detected, and feature extraction is performed on each stable keypoint to obtain M x D1 local feature descriptors. It should be understood that M is a natural number that may or may not equal N; the value of M likewise depends on the number of stable keypoints in the image to be queried. The feature-extraction dimension D1 at query time is identical to the dimension used at library-construction time.
At step S202, product quantization is performed on the M x D1 local feature descriptors. As when creating the image retrieval library, step S202 may also be split into three sub-steps S2021, S2022 and S2023 according to an embodiment of the present invention.
As shown in fig. 2, at sub-step S2021, the D1 dimensions are divided into m1 sub-segments, obtaining the sub-dimension D1/m1 of each sub-segment and M sub-descriptors of dimension D1/m1 per sub-segment.
According to the embodiment of the invention, the dimension and sub-dimension used at retrieval time are identical to those used at library-construction time, so the number of segments is also identical, and the optimized product quantization works as in step S1021: the D1 dimensions are divided into m1 segments, the sub-dimension of each segment is D1/m1, and each segment holds M sub-descriptors of dimension D1/m1.
At sub-step S2022, the D1/m1-dimensional sub-descriptors in each sub-segment are clustered separately, and the number of clusters K1 is set, to obtain K1 cluster centers of dimension D1/m1 for each sub-segment. Each cluster center is represented by a cluster center ID ranging from 0 to K1-1, where K1 is typically a natural number between five thousand and one hundred thousand. It should be understood that the number of clusters set at retrieval time is identical to that set at library-construction time, namely K1.
At sub-step S2023, the feature vector of each of the M keypoints is quantized into an m1-dimensional short vector. It should be appreciated that each element of the short vector is the ID of the cluster center to which the corresponding sub-segment is assigned.
After optimized product quantization, the M keypoints are thus quantized into M m1-dimensional short vectors.
As shown in fig. 2, at step S203, with each element of the short vectors as a key, the image identifiers containing that element are retrieved from the local feature inverted index table, recalling M x m1 cluster centers.
At step S204, the M x m1 cluster centers are sorted from high to low by the repetition rate of the recalled cluster centers. It will be appreciated that the higher the repetition rate of a recalled cluster center, the higher the matching rate of the corresponding images, and the user may set, as desired, what fraction of the top-ranked cluster centers to return.
According to an embodiment of the present invention, for example, when M is 10 stable keypoints, D1 is 128 dimensions and m1 is 2, 2 x 10 = 20 cluster centers are recalled. If the repetition rate of recalled cluster center ID 12 is 5, that of ID 32 is 4, that of ID 89 is 3, that of ID 811 is 2 and that of ID 1111 is 1, then these 20 cluster centers are ordered from high to low repetition rate as 12, 32, 89, 811, 1111, and so on.
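The repetition-rate ranking of step S204 can be sketched with `collections.Counter` (the function name and the `top_fraction` parameter are illustrative, not taken from the patent):

```python
from collections import Counter

def rank_by_repetition(recalled_ids, top_fraction=1.0):
    """Order recalled cluster-center IDs by how often they repeat;
    a higher repetition rate suggests a stronger image match.
    top_fraction controls what share of the ranked list is returned."""
    counts = Counter(recalled_ids)
    ranked = [cid for cid, _ in counts.most_common()]
    keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:keep]

# Reproducing the example above: repetition rates 5, 4, 3, 2, 1
# yield the order 12, 32, 89, 811, 1111.
recalled = [12] * 5 + [32] * 4 + [89] * 3 + [811] * 2 + [1111]
print(rank_by_repetition(recalled))  # [12, 32, 89, 811, 1111]
```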
At step S205, the Euclidean distance between each element of the short vector quantized from each keypoint's feature vector and the corresponding recalled cluster center is calculated, and the cluster centers among the M x m1 recalled cluster centers whose Euclidean distance exceeds a first predetermined threshold are removed.
According to the embodiment of the invention, the similarity between each element of the short vector and the corresponding recalled cluster center is measured by the Euclidean distance, and cluster centers whose similarity falls below the predetermined threshold are removed. It should be understood that the predetermined threshold may be set as desired and is not limited herein.
At step S206, the mismatched cluster centers among the M x m1 recalled cluster centers are removed, and the images mapped by the remaining cluster centers are sent to the user. According to the embodiment of the invention, whether a cluster center is mismatched can be judged from the position of the recalled cluster center and the positional information of each element of the short vector quantized from each keypoint's features; cluster centers whose positions clearly do not match are removed. The images corresponding to the remaining recalled cluster centers are the local-feature retrieval results sent to the user.
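The distance-based pruning of steps S205 and S206 can be sketched as follows, under the assumption that each recalled cluster center is tracked as a (segment, ID) pair so it can be compared with the matching query sub-vector; the pairing and names are illustrative, and the positional mismatch check of step S206 is omitted:

```python
import numpy as np

def filter_by_distance(query_subvecs, recalled, codebooks, threshold):
    """Drop recalled cluster centers whose Euclidean distance to the
    corresponding query sub-vector exceeds the threshold.

    query_subvecs: one sub-vector per segment of the query descriptor.
    recalled: iterable of (segment, cluster-center ID) pairs.
    codebooks: per-segment arrays of cluster-center vectors.
    """
    kept = []
    for seg, center_id in recalled:
        center = codebooks[seg][center_id]
        dist = np.linalg.norm(query_subvecs[seg] - center)
        if dist <= threshold:
            kept.append((seg, center_id))
    return kept
```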
The retrieval flow for global features is similar to the retrieval flow for local features.
As shown in fig. 2, when an image to be queried input by a user is received, global feature extraction is performed on the image at step S207 to obtain a D2-dimensional global feature descriptor. It should be appreciated that the feature-extraction dimension D2 at query time is identical to the dimension used at library-construction time.
At step S208, product quantization is performed on the D2-dimensional global feature descriptor. As when creating the image retrieval library, step S208 may also be split into three sub-steps S2081, S2082 and S2083 according to an embodiment of the present invention.
As shown in fig. 2, at sub-step S2081, the D2 dimensions are divided into m2 sub-segments, so that the sub-dimension of each sub-segment is D2/m2 and each sub-segment holds a D2/m2-dimensional portion of the global feature descriptor.
According to the embodiment of the invention, the dimension and sub-dimension used at retrieval time are identical to those used at library-construction time, so the number of segments is also identical, and the optimized product quantization works as in step S1051: the D2 dimensions are divided into m2 segments and the sub-dimension of each segment is D2/m2.
At sub-step S2082, the D2/m2-dimensional sub-descriptors in each sub-segment are clustered separately, and the number of clusters K2 is set, to obtain K2 cluster centers of dimension D2/m2 for each sub-segment. Each cluster center is represented by a cluster center ID ranging from 0 to K2-1, where K2 is typically a natural number between five thousand and one hundred thousand. It should be understood that the number of clusters set at retrieval time is identical to that set at library-construction time, namely K2.
At sub-step S2083, the global feature vector is quantized into an m2-dimensional short vector. It should be appreciated that each element of the short vector is the ID of the cluster center to which the corresponding sub-segment is assigned.
After optimized product quantization, the global feature vector is thus quantized into one m2-dimensional short vector.
As shown in fig. 2, at step S209, with each element in the short vector as a key, the image identifier containing the element is retrieved from the global feature inverted index table, and m2 cluster centers are recalled.
At step S210, the m2 cluster centers are sorted from high to low by the repetition rate of the recalled cluster centers. It will be appreciated that the higher the repetition rate of a recalled cluster center, the higher the matching rate of the corresponding images, and the user may set, as desired, what fraction of the top-ranked cluster centers to return.
According to an embodiment of the invention, for example, when D2 is 768 dimensions and m2 is 4, 4 cluster centers are recalled. Assuming cluster center ID 35 is recalled twice (i.e. its repetition rate is 2) and cluster center IDs 92 and 111 are each recalled once, the recalled cluster centers are ordered from high to low repetition rate as 35, 92, 111.
At step S211, the Euclidean distance between each element of the short vector quantized from the global feature vector and the corresponding recalled cluster center is calculated, and the cluster centers among the m2 recalled cluster centers whose Euclidean distance exceeds a second predetermined threshold are removed.
According to the embodiment of the invention, the similarity between each element of the short vector and the corresponding recalled cluster center is measured by the Euclidean distance, and cluster centers whose similarity falls below the predetermined threshold are removed. It should be understood that the predetermined threshold may be set as desired and is not limited herein.
At step S212, the mismatched cluster centers among the m2 recalled cluster centers are removed, and the images mapped by the remaining cluster centers are sent to the user. According to the embodiment of the invention, whether a cluster center is mismatched can be judged from the position of the recalled cluster center and the positional information of each element of the short vector quantized from the global feature; cluster centers whose positions clearly do not match are removed. The images corresponding to the remaining m2 recalled cluster centers are the global-feature retrieval results sent to the user.
It should be understood that, when searching for an image, only local feature retrieval may be performed, only global feature retrieval may be performed, or both may be performed simultaneously. When local feature retrieval and global feature retrieval are performed simultaneously, they run in parallel.
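The choice between local-only, global-only and combined parallel retrieval can be sketched as follows; the function names and thread-based parallelism are assumptions, and any parallel scheme would serve equally well:

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve(query_image, local_search, global_search,
             use_local=True, use_global=True):
    """Run local and global retrieval; when both are enabled, run them
    in parallel and return a (local_results, global_results) pair."""
    if use_local and use_global:
        with ThreadPoolExecutor(max_workers=2) as pool:
            f_local = pool.submit(local_search, query_image)
            f_global = pool.submit(global_search, query_image)
            return f_local.result(), f_global.result()
    if use_local:
        return local_search(query_image), None
    if use_global:
        return None, global_search(query_image)
    return None, None
```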
Although not shown in the figures, the image retrieval method further includes applying approximate expansion to the global features. Approximate expansion is described below.
In quantization and inverted indexing, the approximate expansion of Global keys (Global keys) is a technique for improving query efficiency. This technique is typically used to speed up global key matching during the query process to reduce computational and storage overhead. Two concepts are explained first below.
Global Key (Global Key): in an inverted index, a global key refers to a key that uniquely identifies a particular data item throughout the index. It may be a word, phrase, or other identifier. The global key is the key for creating an inverted index by which documents or data items containing this global key can be quickly found.
Approximation extension (Approximate Expansion): approximation expansion is a technique for fuzzy matching or similarity matching, rather than strict matching, during a query. By approximate expansion, more candidates related to the query term can be found in the index, thereby increasing the recall of the query (the ability to find related terms), but some error may be introduced.
In quantization and inverted indexing, the approximate expansion of the global key can be understood as performing fuzzy matching on the query term, and finding other keys similar to the query term, thereby obtaining more inverted lists. The method can improve the query performance under certain scenes, especially when the query terms are misspelled or synonyms exist, the recall rate can be increased, and the search accuracy is improved.
Approximate expansion typically involves measuring the similarity between global keys using similarity measures (e.g. edit distance or cosine similarity) and then deciding, based on a similarity threshold, whether to add similar global keys to the query's expansion set. At query time, not only the strictly matching global keys but also those similar global keys are matched, yielding more relevant inverted lists.
For example, among 8192 cluster-center IDs some are similar to each other. If an image contains cluster center ID = 1, then not only cluster center 1 but also cluster center 101 can be found through the inverted index, provided that cluster centers 1 and 101 are known to be similar, which can be established by computing the cosine similarity between the two cluster-center vectors (e.g. two 64-dimensional centers when D1 is 128 and m1 is 2).
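A sketch of cosine-similarity-based approximate expansion over cluster centers; the threshold value and function names are illustrative, not taken from the patent:

```python
import numpy as np

def expansion_table(centers, sim_threshold=0.9):
    """Precompute, for every cluster center, the IDs of other centers
    whose cosine similarity exceeds the threshold."""
    normed = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    sims = normed @ normed.T  # pairwise cosine similarities
    table = {}
    for cid in range(len(centers)):
        similar = np.where(sims[cid] >= sim_threshold)[0]
        table[cid] = [int(j) for j in similar if j != cid]
    return table

def expand_keys(query_ids, table):
    """Expand query keys with their similar keys before probing the index."""
    expanded = set(query_ids)
    for cid in query_ids:
        expanded.update(table.get(cid, ()))
    return expanded
```

Probing the inverted index with the expanded key set recalls more candidates, trading a little precision for higher recall, as the text above describes.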
Specifically, the image retrieval method 200 further includes retrieving, with each element of the short vector as a key, the image identifiers containing that element and elements similar to it from the global feature inverted index table by approximate expansion, so as to recall more than m2 cluster centers.
According to the image retrieval method described above, the image retrieval library is built by establishing inverted index tables for the local and global feature vectors respectively. When searching for an image, only the image identifiers containing the target cluster centers need to be looked up, without traversing the whole image library, which greatly improves search efficiency.
The inverted index has the advantage of being capable of rapidly positioning the image containing the target clustering center and realizing efficient image retrieval. Through the inverted index, the related images can be quickly found according to the clustering center obtained by feature extraction and quantization of the images queried by the user, and returned to the user. The image retrieval method based on the inverted index effectively combines the ideas of term retrieval and image indexing, and provides a feasible scheme for image retrieval.
Fig. 3 shows a schematic block diagram of an image retrieval system 300 according to an exemplary embodiment of the present disclosure. The image retrieval system 300 according to the present invention can implement the image retrieval methods 100 and 200 shown in fig. 1 and 2.
As shown in fig. 3, the image retrieval system 300 includes a keypoint detection unit 31 configured to detect stable keypoints of an image. According to the embodiment of the invention, the key point detection unit can detect N stable key points of each sample image when an image retrieval library (namely an inverted index table) is established, and can detect M stable key points of an image to be queried when image retrieval is carried out. Here, M and N are both natural numbers.
As shown in fig. 3, the image retrieval system 300 includes a feature extraction unit 32 configured to perform feature extraction on an image to obtain feature descriptors. According to the embodiment of the invention, when local features of an image are extracted, feature extraction is performed on each stable keypoint. Referring to the description above with respect to figs. 1 and 2, it should be appreciated that the dimension set for local feature extraction generally differs from the dimension set for global feature extraction; for example, the local feature dimension is preferably 128 and the global feature dimension is preferably 768.
As shown in fig. 3, the image retrieval system 300 includes a product quantization unit 33 configured to product-quantize the feature descriptors.
Referring to the description above with respect to fig. 1, where N stable keypoints of the sample image are detected and feature extraction is performed on them to obtain N x D1 local feature descriptors during establishment of the inverted index table, the product quantization unit 33 is configured to product-quantize the feature descriptors by: dividing the D1 dimensions into m1 sub-segments, obtaining the sub-dimension D1/m1 of each sub-segment and N sub-descriptors of dimension D1/m1 per sub-segment; clustering the D1/m1-dimensional sub-descriptors in each sub-segment separately and setting the number of clusters K1, to obtain K1 cluster centers of dimension D1/m1 for each sub-segment, each cluster center being represented by a cluster center ID ranging from 0 to K1-1, where K1 is a natural number between five thousand and one hundred thousand; and quantizing the feature vector of each of the N keypoints into an m1-dimensional short vector, each element of which is the ID of the cluster center to which the corresponding sub-segment is assigned.
The process of product-quantizing the global feature descriptors at the creation stage of the inverted index table refers to steps S1051 to S1053. The process of product quantization for the local feature descriptors in the image retrieval stage refers to steps S2021-S2023, and the process of product quantization for the global feature descriptors refers to steps S2081-S2083, which are not described herein.
As shown in fig. 3, the image retrieval system 300 includes a mapping unit 34 configured to establish a local or global feature inverted index table using the cluster center ID as a key by recording, for each cluster center ID, an image identifier of a sample image containing the cluster center ID to establish a mapping relationship of the cluster center ID to the sample image. According to an embodiment of the present invention, the process of creating the inverted index table by the mapping unit may refer to steps S103 and S106 in fig. 1, which are not described herein.
As shown in fig. 3, the image retrieval system 300 includes a recall unit 35 configured to retrieve, with each element of the short vectors produced by the product quantization unit as a key, the image identifiers containing that element from the local or global feature inverted index table, thereby recalling the qualifying cluster centers. According to an embodiment of the present invention, this recall procedure may refer to steps S203 and S209 in fig. 2 and is not repeated here.
As shown in fig. 3, the image retrieval system 300 includes a ranking unit 36 configured to sort the recalled cluster centers from high to low by their repetition rate. It will be appreciated that the higher the repetition rate of a recalled cluster center, the higher the matching rate of the corresponding images, and the user may set, as desired, what fraction of the top-ranked cluster centers to return. According to an embodiment of the present invention, this sorting procedure may refer to steps S204 and S210 in fig. 2 and is not repeated here.
As shown in fig. 3, the image retrieval system 300 includes a calculation unit 37 configured to calculate the Euclidean distance between each element of the short vector and the corresponding recalled cluster center, remove the recalled cluster centers whose Euclidean distance exceeds a predetermined threshold, also remove the mismatched cluster centers among those recalled, and send the images mapped by the remaining cluster centers to the user. According to an embodiment of the present invention, the procedure of computing the Euclidean distances, pruning part of the recalled cluster centers based on distance and mismatch information, and sending the images mapped by the remaining cluster centers to the user may refer to steps S205-S206 and S211-S212 in fig. 2 and is not repeated here.
According to an embodiment of the present invention, when processing the global features of an image, the recall unit 35 may be further configured to recall more cluster centers by retrieving, through approximate expansion with each element of the short vector as a key, the image identifiers containing that element and elements similar to it from the global feature inverted index table.
The image retrieval system provided by the embodiments of the present disclosure may implement the image retrieval method provided by any of the embodiments herein.
The application provides a SIFT-style feature extraction scheme that supports fixed-length global features and variable-length local features simultaneously and combines them with inverted indexes for fast retrieval. Quantizing the features with OPQ greatly reduces the storage requirement. Meanwhile, the image retrieval system provided by the application can directly support a joint index of fixed-length and variable-length features, realizing joint retrieval of identical and similar images and remedying a technical shortcoming of traditional open-source schemes. In the search scenario, precision can reach 99% and recall 90%; retrieval over hundreds of millions of images is supported with memory reduced by a factor of 128, so that searching a hundred-million-image library needs only about 95 GB of memory, and a single image search takes only 200 ms.
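One plausible accounting for the claimed 128-fold memory reduction, under the assumption of float32 raw descriptors and 2-byte cluster-center IDs (uint16 IDs cover K1 values up to 65,536, i.e. most of the stated five-thousand-to-one-hundred-thousand range); the patent does not spell out this arithmetic:

```python
# Back-of-the-envelope check of the claimed 128x memory reduction.
raw_bytes = 128 * 4   # a 128-dimensional float32 local descriptor: 512 bytes
code_bytes = 2 * 2    # an m1 = 2 short vector of 2-byte cluster IDs: 4 bytes
print(raw_bytes // code_bytes)  # 128
```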
Some embodiments of the present disclosure also provide an electronic device. Fig. 4 illustrates a block diagram of some embodiments of the electronic device 4 of the present disclosure. The electronic device may be used to implement a method according to any of the embodiments of the present disclosure.
For example, in some embodiments, the electronic device 4 may be various types of devices, and may include, for example, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), vehicle terminals (e.g., car navigation terminals), and the like, as well as stationary terminals such as digital TVs, desktop computers, and the like. For example, the electronic device 4 may comprise a display panel for displaying data and/or execution results utilized in the scheme according to the present disclosure. For example, the display panel may be various shapes such as a rectangular panel, an oval panel, a polygonal panel, or the like. In addition, the display panel may be not only a planar panel but also a curved panel or even a spherical panel.
As shown in fig. 4, the electronic apparatus 4 of this embodiment includes: a memory 41 and a processor 42 coupled to the memory 41. It should be noted that the components of the electronic device 4 shown in fig. 4 are only exemplary and not limiting, and that the electronic device 4 may also have other components according to the actual application needs. The processor 42 may control other components in the electronic device 4 to perform the desired functions.
In some embodiments, memory 41 is used to store one or more computer-readable instructions. The processor 42 is configured to execute computer readable instructions which, when executed by the processor 42, implement a method according to any of the embodiments described above. The specific implementation of the steps of the method and the related explanation can be referred to the above embodiments, and the details are not repeated here.
For example, the processor 42 and the memory 41 may communicate with each other directly or indirectly. For example, the processor 42 and the memory 41 may communicate via a network. The network may include a wireless network, a wired network, and/or any combination of wireless and wired networks. The processor 42 and the memory 41 may also communicate with each other via a system bus, which is not limited by the present disclosure.
For example, the processor 42 may be embodied as various suitable processors, processing means, etc., such as a Central Processing Unit (CPU), a graphics processor (Graphics Processing Unit, GPU), a Network Processor (NP), etc.; but also Digital Signal Processors (DSPs), application Specific Integrated Circuits (ASICs), field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. The Central Processing Unit (CPU) can be an X86 or ARM architecture, etc. For example, the memory 41 may include any combination of various forms of computer-readable storage media, such as volatile memory and/or nonvolatile memory. The memory 41 may include, for example, a system memory storing, for example, an operating system, application programs, boot Loader (Boot Loader), database, and other programs. Various applications and various data, etc. may also be stored in the storage medium.
In addition, according to some embodiments of the present disclosure, various operations/processes according to the present disclosure, when implemented by software and/or firmware, may be installed from a storage medium or a network to a computer system having a dedicated hardware structure, such as the computer system 500 shown in fig. 5, which is capable of performing various functions including functions such as those described previously, and the like, when various programs are installed. FIG. 5 illustrates a block diagram of an example structure of a computer system that may be employed in accordance with embodiments of the present disclosure.
In fig. 5, a Central Processing Unit (CPU) 501 executes various processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 to a Random Access Memory (RAM) 503. In the RAM 503, data required when the CPU 501 executes various processes and the like is also stored as needed. The central processing unit is merely exemplary, and it may also be other types of processors, such as the various processors described previously. The ROM 502, RAM 503, and storage section 508 may be various forms of computer-readable storage media, as described below. It should be noted that although ROM 502, RAM 503, and storage 508 are shown separately in FIG. 5, one or more of them may be combined or located in the same or different memories or storage modules.
The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output interface 505 is also connected to the bus 504.
The following components are connected to the input/output interface 505: an input section 506 such as a touch screen, a touch panel, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, a gyroscope, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, a vibrator, etc.; a storage section 508 including a hard disk, a magnetic tape, and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, and the like. The communication section 509 allows communication processing to be performed via a network such as the internet. It will be readily appreciated that while the various devices or modules in computer system 500 shown in FIG. 5 are shown as communicating via bus 504, they may also communicate via a network or other means, wherein the network may include a wireless network, a wired network, and/or any combination of wireless and wired networks.
The drive 510 is also connected to the input/output interface 505 as needed. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as needed, so that a computer program read out therefrom is mounted in the storage section 508 as needed.
In the case of implementing the above-described series of processes by software, a program constituting the software may be installed from a network such as the internet or a storage medium such as the removable medium 511.
The processes described above with reference to flowcharts may be implemented as computer software programs according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or from the storage means 508, or from the ROM 502. When executed by the CPU 501, the computer program performs the functions defined above in the methods of the embodiments of the present disclosure.
It should be noted that in the context of this disclosure, a computer-readable medium can be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable medium may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
In some embodiments, there is also provided a computer program comprising: instructions that, when executed by a processor, cause the processor to perform the method of any of the embodiments described above. For example, the instructions may be embodied as computer program code.
In embodiments of the present disclosure, computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules, components, or units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a module, component, or unit does not constitute a limitation of the module, component, or unit itself.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The above description is merely illustrative of some embodiments of the present disclosure and of the principles of the technology applied. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of features described above, but also covers other embodiments formed by any combination of the features described above, or their equivalents, without departing from the spirit of the disclosure, for example, embodiments formed by substituting the features described above with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the disclosure may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (10)

1. An image retrieval method comprising:
detecting N stable key points of each sample image, and extracting features of each stable key point to obtain N x D1 local feature descriptors, wherein N is a natural number and D1 is a dimension;
performing product quantization on the N x D1 local feature descriptors, including:
dividing the dimension D1 into m1 sub-segments, and obtaining the sub-segment dimension D1/m1 of each sub-segment and N x D1/m1 local feature descriptors of each sub-segment;
clustering the D1/m1-dimensional local feature descriptors in each sub-segment respectively, and setting the clustering number K1 to obtain K1 D1/m1-dimensional cluster centers for each sub-segment, wherein each cluster center is represented by a cluster center ID, the cluster center ID ranges from 0 to K1-1, and K1 is a natural number between five thousand and one hundred thousand;
quantizing the feature vector of each of the N key points into a D1/m1-dimensional short vector, wherein each element in the short vector is the ID of the cluster center to which the corresponding sub-segment belongs; and
recording, for each cluster center ID, an image identifier of a sample image containing the cluster center ID to establish a mapping relation between the cluster center ID and the sample image, thereby establishing a local feature inverted index table with the cluster center ID as the key.
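By way of non-limiting illustration, the indexing steps recited above may be sketched as follows. The function names, the small parameter values, and the use of randomly chosen training points in place of a full k-means run are assumptions made purely for illustration; note also that the sketch emits one cluster center ID per sub-segment, i.e. a short vector with m1 entries, which is consistent with the M x m1 recall described in claim 3.

```python
# Illustrative sketch of product-quantization indexing: split each
# D1-dimensional descriptor into m1 sub-segments, quantize each
# sub-segment to one of K1 centers, and build an inverted index
# mapping cluster center IDs to image identifiers.
import numpy as np

def build_pq_index(descriptors, image_ids, m1, K1, rng=None):
    """descriptors: (N, D1) local feature descriptors;
    image_ids: length-N list, image_ids[i] is the image that the
    i-th key point came from. Returns (codes, centers, inverted)."""
    rng = rng or np.random.default_rng(0)
    N, D1 = descriptors.shape
    d = D1 // m1                          # sub-segment dimension D1/m1
    segments = descriptors.reshape(N, m1, d)

    # One codebook of K1 centers per sub-segment. A real system would
    # run k-means here; we simply pick K1 training points as centers.
    centers = np.stack([
        segments[rng.choice(N, K1, replace=False), s, :]
        for s in range(m1)
    ])                                     # shape (m1, K1, d)

    # Quantize: each key point becomes a short vector of m1 center IDs.
    codes = np.empty((N, m1), dtype=np.int64)
    for s in range(m1):
        dists = np.linalg.norm(
            segments[:, s, None, :] - centers[s][None, :, :], axis=-1)
        codes[:, s] = dists.argmin(axis=1)

    # Inverted index: (sub-segment, center ID) -> set of image ids.
    inverted = {}
    for i in range(N):
        for s in range(m1):
            inverted.setdefault((s, int(codes[i, s])), set()).add(image_ids[i])
    return codes, centers, inverted
```

The patent suggests K1 on the order of five thousand to one hundred thousand; the toy sizes used here keep the sketch runnable.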
2. The image retrieval method as recited in claim 1, further comprising:
extracting global features of each sample image to obtain a D2-dimensional global feature descriptor, wherein D2 is a dimension;
performing product quantization on the D2-dimensional global feature descriptor, including:
dividing the dimension D2 into m2 sub-segments, and obtaining the sub-segment dimension D2/m2 of each sub-segment and the D2/m2-dimensional global feature descriptor of each sub-segment;
clustering the D2/m2-dimensional global feature descriptors in each sub-segment respectively, and setting the clustering number K2 to obtain K2 D2/m2-dimensional cluster centers for each sub-segment, wherein each cluster center is represented by a cluster center ID and the cluster center ID ranges from 0 to K2-1;
quantizing the global feature vector into a D2/m2-dimensional short vector, wherein each element in the short vector is the ID of the cluster center to which the corresponding sub-segment belongs; and
recording, for each cluster center ID, an image identifier of the sample image containing the cluster center ID to establish a mapping relation between the cluster center ID and the sample image, thereby establishing a global feature inverted index table with the cluster center ID as the key.
3. The image retrieval method as recited in claim 1, further comprising:
in response to receiving an image to be queried input by a user, detecting M stable key points of the image to be queried, and extracting features of each stable key point to obtain M x D1 local feature descriptors, wherein M is a natural number;
performing product quantization on the M x D1 local feature descriptors, including:
dividing the dimension D1 into m1 sub-segments, and obtaining the sub-segment dimension D1/m1 of each sub-segment and M x D1/m1 local feature descriptors of each sub-segment;
clustering the D1/m1-dimensional local feature descriptors in each sub-segment respectively, and setting the clustering number K1 to obtain K1 D1/m1-dimensional cluster centers for each sub-segment, wherein each cluster center is represented by a cluster center ID and the cluster center ID ranges from 0 to K1-1;
quantizing the feature vector of each of the M key points into a D1/m1-dimensional short vector, wherein each element in the short vector is the ID of the cluster center to which the corresponding sub-segment belongs;
retrieving, with each element in the short vector as a key, image identifiers containing the element from the local feature inverted index table, and recalling M x m1 cluster centers; and
ranking the M x m1 recalled cluster centers from high to low by repetition rate.
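By way of non-limiting illustration, the query-time recall and ranking step may be sketched as follows, assuming an inverted index of the shape built for claim 1 (a mapping from (sub-segment, cluster center ID) pairs to sets of image identifiers, an assumption of this sketch) and reading "repetition rate" as the number of query elements that recall a given image.

```python
# Illustrative sketch of inverted-index recall: each element of each
# query key point's short vector is used as a key, and candidate
# images are ranked by how many query elements recalled them.
from collections import Counter

def recall_and_rank(query_codes, inverted_index):
    """query_codes: (M, m1) iterable of cluster center IDs for the
    query image's key points; inverted_index maps
    (sub_segment, center_id) -> set of image ids.
    Returns image ids ranked from most-recalled to least-recalled."""
    counts = Counter()
    for short_vec in query_codes:
        for s, center_id in enumerate(short_vec):
            for image_id in inverted_index.get((s, int(center_id)), ()):
                counts[image_id] += 1
    return [img for img, _ in counts.most_common()]
```

With two query key points and a toy index, an image matched by three query elements ranks ahead of one matched by two.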
4. The image retrieval method as recited in claim 3, further comprising:
calculating the Euclidean distance between each element in the short vector quantized from the features of each key point and the corresponding recalled cluster center, and removing, from the M x m1 recalled cluster centers, those whose Euclidean distance is larger than a first preset threshold; and
removing mismatched cluster centers from the M x m1 recalled cluster centers, and sending the images mapped to the remaining cluster centers to the user.
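By way of non-limiting illustration, the distance-threshold filtering step may be sketched as follows; the codebook layout (an (m1, K1, d) array of centers) and the representation of recalled centers as (sub-segment, center ID) pairs are assumptions carried over from the earlier sketches.

```python
# Illustrative sketch of the first filtering stage: a recalled cluster
# center is kept only if its Euclidean distance to the query key
# point's sub-vector is within the preset threshold.
import numpy as np

def filter_by_distance(query_segments, recalled, centers, threshold):
    """query_segments: (m1, d) sub-vectors of one query key point;
    recalled: list of (sub_segment, center_id) pairs recalled for it;
    centers: (m1, K1, d) codebooks. Returns the surviving pairs."""
    kept = []
    for s, cid in recalled:
        dist = np.linalg.norm(query_segments[s] - centers[s][cid])
        if dist <= threshold:
            kept.append((s, cid))
    return kept
```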
5. The image retrieval method as recited in claim 2, further comprising:
in response to receiving an image to be queried input by a user, performing global feature extraction on the image to be queried to obtain a D2-dimensional global feature descriptor;
performing product quantization on the D2-dimensional global feature descriptor, including:
dividing the dimension D2 into m2 sub-segments, and obtaining the sub-segment dimension D2/m2 of each sub-segment and the D2/m2-dimensional global feature descriptor of each sub-segment;
clustering the D2/m2-dimensional global feature descriptors in each sub-segment respectively, and setting the clustering number K2 to obtain K2 D2/m2-dimensional cluster centers for each sub-segment, wherein each cluster center is represented by a cluster center ID and the cluster center ID ranges from 0 to K2-1;
quantizing the global feature vector into a D2/m2-dimensional short vector, wherein each element in the short vector is the ID of the cluster center to which the corresponding sub-segment belongs;
retrieving, with each element in the short vector as an index, image identifiers containing the element from the global feature inverted index table, and recalling m2 cluster centers; and
ranking the m2 recalled cluster centers from high to low by repetition rate.
6. The image retrieval method as recited in claim 5, further comprising:
calculating the Euclidean distance between each element in the short vector quantized from the global feature vector and the corresponding recalled cluster center, and removing, from the m2 recalled cluster centers, those whose Euclidean distance is larger than a second preset threshold; and
removing mismatched cluster centers from the m2 recalled cluster centers, and sending the images mapped to the remaining cluster centers to the user.
7. The image retrieval method as recited in claim 5, further comprising:
retrieving, with each element in the short vector as an index, image identifiers containing the element and elements similar to the element from the global feature inverted index table by approximate expansion, and recalling m3 cluster centers, where m3 > m2;
calculating the Euclidean distance between each element in the short vector quantized from the global feature vector and the corresponding recalled cluster center, and removing, from the m3 recalled cluster centers, those whose Euclidean distance is larger than the second preset threshold; and
removing mismatched cluster centers from the m3 recalled cluster centers, and sending the images mapped to the remaining cluster centers to the user.
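By way of non-limiting illustration, the approximate expansion of claim 7 may be sketched as follows, reading "elements similar to the element" as probing the n_probe nearest cluster centers per sub-segment rather than only the single nearest (a common multi-assignment reading; n_probe and the index layout are assumptions of this sketch, with m3 = m2 * n_probe).

```python
# Illustrative sketch of approximate expansion: for each sub-segment
# of the query's global feature vector, recall the n_probe nearest
# cluster centers and union the image ids they map to.
import numpy as np

def expanded_recall(query_vec, centers, inverted_index, n_probe):
    """query_vec: (D2,) global feature vector; centers: (m2, K2, d)
    codebooks; inverted_index maps (sub_segment, center_id) -> set of
    image ids. Returns (recalled center pairs, recalled image ids)."""
    m2, K2, d = centers.shape
    segments = query_vec.reshape(m2, d)
    images = set()
    recalled = []
    for s in range(m2):
        dists = np.linalg.norm(centers[s] - segments[s], axis=1)
        for cid in np.argsort(dists)[:n_probe]:     # nearest n_probe
            recalled.append((s, int(cid)))
            images |= inverted_index.get((s, int(cid)), set())
    return recalled, images
```

Probing more than one center per sub-segment trades extra candidates (and thus extra distance computations in the later filtering stage) for higher recall.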
8. An image retrieval system comprising a unit configured to perform the method of any of claims 1-7.
9. An electronic device, comprising:
a memory; and
a processor coupled to the memory, the memory having instructions stored therein that, when executed by the processor, cause the electronic device to perform the method of any of claims 1-7.
10. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any of claims 1-7.
CN202311091625.9A 2023-08-28 2023-08-28 Image retrieval method, system, electronic device and medium Active CN116796021B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311091625.9A CN116796021B (en) 2023-08-28 2023-08-28 Image retrieval method, system, electronic device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311091625.9A CN116796021B (en) 2023-08-28 2023-08-28 Image retrieval method, system, electronic device and medium

Publications (2)

Publication Number Publication Date
CN116796021A true CN116796021A (en) 2023-09-22
CN116796021B CN116796021B (en) 2023-12-05

Family

ID=88040156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311091625.9A Active CN116796021B (en) 2023-08-28 2023-08-28 Image retrieval method, system, electronic device and medium

Country Status (1)

Country Link
CN (1) CN116796021B (en)

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090324026A1 (en) * 2008-06-27 2009-12-31 Palo Alto Research Center Incorporated System and method for finding a picture image in an image collection using localized two-dimensional visual fingerprints
CN102254015A (en) * 2011-07-21 2011-11-23 上海交通大学 Image retrieval method based on visual phrases
WO2012006579A1 (en) * 2010-07-08 2012-01-12 Qualcomm Incorporated Object recognition system with database pruning and querying
US20130272548A1 (en) * 2012-04-13 2013-10-17 Qualcomm Incorporated Object recognition using multi-modal matching scheme
US9280560B1 (en) * 2013-12-18 2016-03-08 A9.Com, Inc. Scalable image matching
US20160259815A1 (en) * 2015-03-05 2016-09-08 Nant Holdings Ip, Llc Large scale image recognition using global signatures and local feature information
CN106951551A (en) * 2017-03-28 2017-07-14 西安理工大学 The cumulative index image search method of joint GIST features
US20180330198A1 (en) * 2017-05-14 2018-11-15 International Business Machines Corporation Systems and methods for identifying a target object in an image
US10635917B1 (en) * 2019-01-30 2020-04-28 StradVision, Inc. Method and device for detecting vehicle occupancy using passenger's keypoint detected through image analysis for humans' status recognition
WO2020098110A1 (en) * 2018-11-12 2020-05-22 深圳云天励飞技术有限公司 Image feature value searching method and apparatus, electronic device, and storage medium
CN111400528A (en) * 2020-03-16 2020-07-10 南方科技大学 Image compression method, device, server and storage medium
CN111522986A (en) * 2020-04-23 2020-08-11 北京百度网讯科技有限公司 Image retrieval method, apparatus, device and medium
CN112765381A (en) * 2021-01-18 2021-05-07 深圳市华尊科技股份有限公司 Image retrieval method, electronic equipment and related product
CA3128562A1 (en) * 2020-08-17 2022-02-17 10353744 Canada Ltd. Image retrieval method, device, computer equipment, and storage medium
US20220101556A1 (en) * 2020-09-29 2022-03-31 International Business Machines Corporation Computer automated interactive activity recognition based on keypoint detection
WO2022111069A1 (en) * 2020-11-26 2022-06-02 Oppo广东移动通信有限公司 Image processing method and apparatus, electronic device and storage medium
CN114780780A (en) * 2022-04-28 2022-07-22 北京奇艺世纪科技有限公司 Image retrieval method, image retrieval device, electronic equipment and storage medium
CN114882305A (en) * 2022-03-24 2022-08-09 上海任意门科技有限公司 Image key point detection method, computing device and computer-readable storage medium
CN116522012A (en) * 2023-05-16 2023-08-01 上海任意门科技有限公司 User interest mining method, system, electronic equipment and medium

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090324026A1 (en) * 2008-06-27 2009-12-31 Palo Alto Research Center Incorporated System and method for finding a picture image in an image collection using localized two-dimensional visual fingerprints
WO2012006579A1 (en) * 2010-07-08 2012-01-12 Qualcomm Incorporated Object recognition system with database pruning and querying
CN102254015A (en) * 2011-07-21 2011-11-23 上海交通大学 Image retrieval method based on visual phrases
US20130272548A1 (en) * 2012-04-13 2013-10-17 Qualcomm Incorporated Object recognition using multi-modal matching scheme
US9280560B1 (en) * 2013-12-18 2016-03-08 A9.Com, Inc. Scalable image matching
US20160259815A1 (en) * 2015-03-05 2016-09-08 Nant Holdings Ip, Llc Large scale image recognition using global signatures and local feature information
CN106951551A (en) * 2017-03-28 2017-07-14 西安理工大学 The cumulative index image search method of joint GIST features
US20180330198A1 (en) * 2017-05-14 2018-11-15 International Business Machines Corporation Systems and methods for identifying a target object in an image
WO2020098110A1 (en) * 2018-11-12 2020-05-22 深圳云天励飞技术有限公司 Image feature value searching method and apparatus, electronic device, and storage medium
US10635917B1 (en) * 2019-01-30 2020-04-28 StradVision, Inc. Method and device for detecting vehicle occupancy using passenger's keypoint detected through image analysis for humans' status recognition
CN111400528A (en) * 2020-03-16 2020-07-10 南方科技大学 Image compression method, device, server and storage medium
CN111522986A (en) * 2020-04-23 2020-08-11 北京百度网讯科技有限公司 Image retrieval method, apparatus, device and medium
KR20210040307A (en) * 2020-04-23 2021-04-13 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. Method, apparatus, device, and medium for retrieving image
CA3128562A1 (en) * 2020-08-17 2022-02-17 10353744 Canada Ltd. Image retrieval method, device, computer equipment, and storage medium
CN114077685A (en) * 2020-08-17 2022-02-22 苏宁云计算有限公司 Image retrieval method and device, computer equipment and storage medium
US20220101556A1 (en) * 2020-09-29 2022-03-31 International Business Machines Corporation Computer automated interactive activity recognition based on keypoint detection
WO2022111069A1 (en) * 2020-11-26 2022-06-02 Oppo广东移动通信有限公司 Image processing method and apparatus, electronic device and storage medium
CN112765381A (en) * 2021-01-18 2021-05-07 深圳市华尊科技股份有限公司 Image retrieval method, electronic equipment and related product
CN114882305A (en) * 2022-03-24 2022-08-09 上海任意门科技有限公司 Image key point detection method, computing device and computer-readable storage medium
CN114780780A (en) * 2022-04-28 2022-07-22 北京奇艺世纪科技有限公司 Image retrieval method, image retrieval device, electronic equipment and storage medium
CN116522012A (en) * 2023-05-16 2023-08-01 上海任意门科技有限公司 User interest mining method, system, electronic equipment and medium

Also Published As

Publication number Publication date
CN116796021B (en) 2023-12-05

Similar Documents

Publication Publication Date Title
CN111274811A (en) Address text similarity determining method and address searching method
CN110532347B (en) Log data processing method, device, equipment and storage medium
CN110134965B (en) Method, apparatus, device and computer readable storage medium for information processing
CN104484671A (en) Target retrieval system applied to moving platform
CN111291715B (en) Vehicle type identification method based on multi-scale convolutional neural network, electronic device and storage medium
CN110198473B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN111368020A (en) Feature vector comparison method and device and storage medium
CN117251641A (en) Vector database retrieval method, system, electronic device and storage medium
CN116628049B (en) Information system maintenance management system and method based on big data
CN116796021B (en) Image retrieval method, system, electronic device and medium
CN110852261B (en) Target detection method and device, electronic equipment and readable storage medium
CN110321858B (en) Video similarity determination method and device, electronic equipment and storage medium
CN114595741B (en) High-dimensional data rapid dimension reduction method and system based on neighborhood relation
He et al. MaskSearch: Querying Image Masks at Scale
CN116010628A (en) Data processing, data retrieval and retrieval model training method and device
CN112836077B (en) Specific person retrieval method, system, device and storage medium
Sultana et al. Lossy Compression Effect on Color and Texture Based Image Retrieval Performance
CN111444319B (en) Text matching method and device and electronic equipment
CN110941730B (en) Retrieval method and device based on human face feature data migration
Lv et al. Pf-face: A parallel framework for face classification and search from massive videos based on spark
Taniguchi et al. Efficient retrieval of top-k weighted spatial triangles
CN111382233A (en) Similar text detection method and device, electronic equipment and storage medium
Li et al. Research and implementation of a fabric printing detection system based on a field programmable gate array and deep neural network
CN116910296B (en) Method, system, electronic device and medium for identifying transport content
US20220342939A1 (en) Method for processing data, an electronic device, and a computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant