CN111177435B - CBIR method based on improved PQ algorithm - Google Patents

CBIR method based on improved PQ algorithm

Info

Publication number: CN111177435B
Authority: CN (China)
Prior art keywords: vector, query, sample, retrieval, index
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201911417377.6A
Other languages: Chinese (zh)
Other versions: CN111177435A
Inventors: 曾浩, 高凡
Current Assignee: Chongqing University of Post and Telecommunications (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Chongqing University of Post and Telecommunications
Application filed by: Chongqing University of Post and Telecommunications
Priority to: CN201911417377.6A
Publications: CN111177435A (application), CN111177435B (grant)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G06F16/53 Querying
    • G06F16/55 Clustering; Classification
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES (ICT)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a CBIR method based on an improved PQ algorithm, belonging to the technical field of image processing. The method extracts image depth features through an improved deep convolutional network; encodes and compresses the image feature data through an index retrieval module that adopts an approximate nearest neighbor (ANN) search strategy and is based on an inverted-index product quantization (IVPQ) algorithm; generates the indexes of a dynamic index database based on the Faiss framework; partitions the data space of the full index database through feature-vector coding; and, when a query picture is retrieved, rapidly locks onto a subspace through Hamming-distance rearrangement, traverses it, and outputs the retrieved images. The invention makes the retrieval index library dynamic on the basis of the Faiss framework, avoiding the high operation-and-maintenance cost of rebuilding the index library in practical applications.

Description

CBIR method based on improved PQ algorithm
Technical Field
The invention belongs to the field of image processing, and relates to a CBIR method based on an improved PQ algorithm.
Background
In practical application scenarios, users need to search and screen a key sensitive-image library against massive, unlabeled, and complex unknown images, i.e. to realize a search-by-image function. At present, the most effective way to represent indexed image information is based on image content, so a content-based image retrieval (CBIR) method is chosen for the design of a large-scale image retrieval system.
The traditional CBIR method employs a brute-force similarity-measurement strategy, whose memory consumption grows with the picture-feature index data. In particular, when the data set of a practical application reaches hundreds of millions of entries, the growing index no longer fits in operating memory (RAM); retrieval performance drops sharply, the system cannot reach its expected targets, and hardware cost rises steeply. The mainstream solution is therefore an approximate nearest neighbor (ANN) search strategy, which essentially partitions the full space of the search data set into subspaces, locks onto one or several subspace sets in some way, and traverses them quickly.
ANN methods fall mainly into KD-tree methods, graph-index quantization methods, hashing methods, and vector-quantization methods. For the conventional KD-tree algorithm, retrieval performance degrades as the tree grows deeper. Among methods that introduce graphs into ANN search, taking the mature HNSW (Hierarchical Navigable Small World graphs) algorithm as an example, the recall rate is high but the index occupies a large amount of memory, and its particular index structure is unfavorable to dynamic addition and deletion of data. Among hashing methods, multi-table locality-sensitive hashing (MLSH) is an improved hash-coding algorithm that constructs multiple hash tables, generating several hash functions that divide the spatial domain so as to improve search accuracy on large-scale high-dimensional data sets, but it does not eliminate the large memory footprint of the index data. For vector quantization, the representative algorithm is product quantization (PQ), already practical and popular in industry; its index-data compression is better and it effectively reduces memory occupation, but its recall rate is lower.
Furthermore, index databases for CBIR methods based on ANN retrieval strategies have in recent years mostly adopted the FALCONN or NMSLIB frameworks, which currently do not support dynamic addition and deletion of data. This is acceptable for search algorithms and systems over small and medium-scale data sets. For large-scale retrieval, however, a CBIR system must index newly arriving sensitive picture data while it is running in order to meet the needs of practical applications; otherwise, rebuilding the index database each time incurs high operation-and-maintenance cost and time. The index library of a CBIR system in practical applications should therefore be dynamically scalable.
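To make the memory and compute pressure concrete, here is a minimal NumPy sketch of the brute-force baseline described above. The sizes N and dim are hypothetical and far below the hundreds-of-millions scale the text describes; the point is that both work and raw index memory grow linearly in N.

```python
import numpy as np

N, dim = 10_000, 128                        # assumed database size and feature dimension
rng = np.random.default_rng(0)
db = rng.standard_normal((N, dim)).astype(np.float32)
query = rng.standard_normal(dim).astype(np.float32)

# Brute-force search: one L2 distance per database vector, O(N * dim) work per query.
dists = np.linalg.norm(db - query, axis=1)
top5 = np.argsort(dists)[:5]                # IDs of the 5 nearest vectors

# Uncompressed index memory grows linearly: 4 bytes * N * dim.
raw_mib = db.nbytes / 2**20
```

At N = 5 * 10^8 and dim = 128, the same float32 index would need roughly 238 GiB of RAM, which is why the ANN strategies below partition the space instead of scanning it.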
Disclosure of Invention
In view of the above, the present invention aims to provide a CBIR method based on an improved PQ algorithm.
To achieve this object, the invention provides the following technical scheme:
a content-based image retrieval (CBIR) method based on an improved product quantization (PQ) algorithm, which extracts image depth features through an improved deep convolutional network; then encodes and compresses the image feature data through an index retrieval module that adopts an approximate nearest neighbor (ANN) search strategy and is based on an inverted-index product quantization (IVPQ) algorithm; generates the indexes of a dynamic index database based on the Faiss framework; divides the data space of the full index database through feature-vector coding; and, when a query picture is retrieved, quickly locks onto a subspace through Hamming-distance rearrangement, traverses it, and outputs the retrieved images.
Optionally, the IVPQ algorithm is divided into index construction and non-exhaustive retrieval query. Let $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{N \times \Omega}$ be the feature-vector data-set matrix of the training sample set, where $\Omega$ is the dimension of the training sample data, $N$ is the number of samples of the training sample set, and the query sample is $x_q$.
The index construction specifically comprises the following steps:
Encoding preprocessing: run the K-Means clustering algorithm on the training-sample feature data set $X$ to obtain $M$ sample cluster centers $C = [c_1, c_2, \ldots, c_M] \in \mathbb{R}^{M \times \Omega}$. Let $c_i = \mathrm{NNC}(x_i)$ denote the sample cluster center nearest to the training feature vector $x_i$; subtracting each feature vector from its nearest sample cluster center yields the residual vector group $R$:

$$R = [r_1, r_2, \ldots, r_i, \ldots, r_N] \in \mathbb{R}^{N \times \Omega}, \qquad r_i = |x_i - c_i| \quad (2)$$

Divide the dimension space $\Omega$ of each residual vector $r_i$ into $P$ equal parts, written $r_i = [r_{i,1}, r_{i,2}, \ldots, r_{i,j}, \ldots, r_{i,P}] \in \mathbb{R}^{1 \times \Omega}$ with $\omega_1 + \cdots + \omega_j + \cdots + \omega_P = \Omega$, and run K-Means separately on the residual sub-vectors of all training samples in each subspace, generating a codebook set $C^{\Omega}$ with a consistent number of cluster centers:

$$C^{\Omega} = \{C^{\omega_1}, \ldots, C^{\omega_j}, \ldots, C^{\omega_P}\}, \qquad C^{\omega_j} = [c_1^{\omega_j}, \ldots, c_k^{\omega_j}, \ldots, c_{M'}^{\omega_j}] \quad (3)$$

where $C^{\omega_j}$ is the codebook (cluster set) of the $j$-th dimension subspace formed by equally dividing the dimension space $\Omega$ of the training residual vector group $R$; $P$ is the number of dimension subspaces into which $\Omega$ is divided; $c_k^{\omega_j}$ is the $k$-th cluster center of $C^{\omega_j}$; and $M'$ is the number of cluster centers per subspace, satisfying $M' = 2^{p}$, where $p$ is the number of IVPQ binary coding bits.
Encode each $r_i$ with $C^{\Omega}$: every sample residual vector $r_i$ is expressed by the ID numbers of the cluster centers corresponding to its $P$ residual sub-vectors, generating the training-sample IVPQ code set $S$:

$$S = \{S(1), S(2), \ldots, S(i), \ldots, S(N)\}, \qquad S(i) = \{c_i : n(i,1), n(i,2), \ldots, n(i,P)\} \quad (4)$$

where $S(i)$ is the IVPQ code group generated from the training-sample residual vector $r_i$, tagged with its corresponding training-sample cluster center $c_i$; and $n(i,j)$ is the number of the cluster center in the dimension subspace $\omega_j$ nearest to the sample residual sub-vector $r_{i,j}$.
the nonlinear retrieval query specifically comprises:
for query sample vector x q Performing the similar coding pretreatment to generate a query residual vector r q =|x q -c q L, likewise will r q Dividing the vector into P identical sub-vectors, recording r q =[r q,1 ,r q,2 ,...,r q,j ,...,r q,P ]∈R 1*Ω And respectively calculating the distance between each subspace and M' clustering centers in the subspace to generate a query vector distance pool D with the size of P multiplied by M Ω ,D Ω The expression is as follows:
Figure GDA0002421631340000034
/>
Figure GDA0002421631340000035
wherein, c q To query the sample cluster centers of the sample vectors,
Figure GDA0002421631340000036
for querying residual subvectors r q (j) And subspace omega j Distance sets of M' cluster centers; />
Figure GDA0002421631340000037
Is r q (j) Corresponding to omega j The distance value of the kth cluster center in (c),
Figure GDA0002421631340000038
is r q (j) Corresponding to omega j The kth cluster center;
when searching, only the training sample coding set S and the query sample vector x q Sample cluster center c of q Ucpq code set S with consistent subscripts q Namely, ROI, traversing and inquiring; let the number of code groups consistent with the query vector be N', and obtain S from equation (4) q Expression:
S q ={S q (1),S q (2),...,S q (i)...,S q (N')}
Figure GDA0002421631340000039
Figure GDA00024216313400000310
in the query vector distance pool D Ω Respectively calculate and S q Generating a query retrieval distance set D by the sum of P Hamming distance values corresponding to each coding group q Then D is q The expression is as follows:
D q =[D q (1),D q (2),...,D q (i),...,D q (N')]
Figure GDA0002421631340000041
wherein D is q (i) Denotes S q The ith training sample vector x i And query sample vector x q IVPQ coding distance of; if the sum of the distances D q (i) Exceeding a threshold distance t, t e [30,100 ] set according to actual training needs]If yes, discarding; and finally, sorting the distance between each training sample and the query sample as a result of the nonlinear retrieval and returning.
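The query side can be sketched the same way. This hedged NumPy illustration uses randomly generated stand-in structures (in practice the coarse centers, codebooks, and codes come from the index-construction step) and Euclidean sub-distances in the pool where the text speaks of Hamming distance values; the sizes and the threshold default are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, P, Mprime, M, N = 16, 4, 8, 4, 100
sub_dim = dim // P

# Stand-in index structures; real ones come from the training step above.
coarse = rng.standard_normal((M, dim)).astype(np.float32)        # c_1 .. c_M
codebooks = rng.standard_normal((P, Mprime, sub_dim)).astype(np.float32)
coarse_id = rng.integers(0, M, size=N)                           # coarse center per sample
codes = rng.integers(0, Mprime, size=(N, P))                     # n(i, j) code matrix

def ivpq_query(xq, topk=5, t=np.inf):
    cq = int(np.argmin(np.linalg.norm(coarse - xq, axis=1)))     # nearest coarse center c_q
    rq = (xq - coarse[cq]).reshape(P, sub_dim)                   # query residual sub-vectors
    # Distance pool D^Omega: P x M' table of sub-vector-to-center distances.
    pool = np.linalg.norm(codebooks - rq[:, None, :], axis=2)
    roi = np.flatnonzero(coarse_id == cq)                        # S_q: codes sharing c_q
    # D_q(i): sum of the P looked-up distances for each candidate code group.
    dq = pool[np.arange(P), codes[roi]].sum(axis=1)
    keep = dq <= t                                               # discard beyond threshold t
    order = np.argsort(dq[keep])[:topk]
    return roi[keep][order], dq[keep][order]

ids, scores = ivpq_query(rng.standard_normal(dim).astype(np.float32))
```

Only the ROI is traversed, so the per-query cost scales with the matching cluster's population rather than with the full data set.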
The invention has the beneficial effects that:
in terms of requirements, the CBIR method based on the IVPQ algorithm provided by the invention is suitable for being used in an actual application scene, and a user needs to search and judge a key-sensitive image library based on massive, label-free and complex unknown images, so that the function of searching images by images is realized; the dynamic index database retrieval method based on the Faiss framework realizes the dynamic index database retrieval and avoids the high operation and maintenance cost generated for reconstructing the index database in practical application occasions.
In the algorithm effect, the Product Quantization (PQ) algorithm of index coding is improved and optimized, and the product quantization (IVPQ) algorithm optimization process based on the inverted index is provided, so that the high-efficiency compression of data characteristics and good nonlinear retrieval are realized; the distance calculation times of the original PQ coding algorithm are times, while the distance calculation times of the IVPQ coding algorithm are only needed during query and retrieval, so that the calculation amount is greatly reduced, and the time consumption of the algorithm is optimized; because the retrieved test samples are clustered in advance, the retrieval recall rate can be improved to a certain extent in precision.
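A back-of-the-envelope comparison of the per-query distance-computation counts, using the symbols defined above ($N$ samples, $M$ coarse clusters, $P$ subspaces, $M'$ sub-centers per subspace, $N'$ ROI candidates) and assuming roughly balanced clusters, runs as follows; the exact counts are an assumption, not figures from the patent:

```latex
\begin{align*}
\text{PQ (exhaustive ADC):}\quad
  & \underbrace{P \cdot M'}_{\text{distance pool}}
    \;+\; \underbrace{N \cdot P}_{\text{look-ups over all codes}} \\
\text{IVPQ (this method):}\quad
  & \underbrace{M}_{\text{coarse assignment}}
    \;+\; P \cdot M'
    \;+\; \underbrace{N' \cdot P}_{\text{look-ups over the ROI only}},
  \qquad N' \approx N/M \ll N
\end{align*}
```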
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of the CBIR method based on the improved IVPQ algorithm;
FIG. 2 shows the IVPQ algorithm process.
Detailed Description
The following embodiments of the invention are described by way of specific examples; those skilled in the art will readily appreciate other advantages and effects of the invention from the disclosure herein. The invention may also be implemented or applied through other, different embodiments, and the details herein may be modified in various respects without departing from the spirit and scope of the invention. The drawings provided with the following embodiments illustrate the basic idea of the invention only schematically, and the features of the following embodiments and examples may be combined with one another where no conflict arises.
The drawings are for illustration only and are not to be construed as limiting the invention; they are schematic representations rather than actual renderings. For better explanation of the embodiments, some parts of the drawings may be omitted, enlarged, or reduced, and do not represent the size of an actual product; certain well-known structures and their descriptions may be omitted.
The same or similar reference numerals in the drawings of the embodiments denote the same or similar components. In the description of the invention, terms indicating orientation or positional relationship, such as "upper", "lower", "left", "right", "front", and "rear", are based on the orientation or positional relationship shown in the drawings, are used only for convenience and simplification of description, and do not indicate or suggest that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be construed as limiting the invention, and their specific meaning can be understood by those skilled in the art according to the specific situation.
The invention provides a CBIR method based on an inverted-index product quantization (IVPQ) algorithm, realizing a large-scale image retrieval system built on feature-vector coded indexes. The overall flow is shown in FIG. 1: image depth features are extracted by an improved deep convolutional network; the image features are compressed by inverted product-quantization coding; the indexes of a dynamic index database are generated on the Faiss framework; the data space of the full index database is divided by feature-vector coding; when a query picture is searched, a subspace is quickly locked onto through Hamming-distance rearrangement and traversed, and the retrieved images are output.
The IVPQ algorithm comprises two steps, index construction and non-exhaustive retrieval query; the specific flow is shown in FIG. 2.
Let $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{N \times \Omega}$ be the feature-vector data-set matrix of the training sample set, where $\Omega$ is the dimension of the training sample data, $N$ is the number of samples of the training sample set, and the query sample is $x_q$.
The index construction steps of IVPQ are as follows:
Encoding preprocessing: run the K-Means clustering algorithm on the training-sample feature data set $X$ to obtain $M$ sample cluster centers $C = [c_1, c_2, \ldots, c_M] \in \mathbb{R}^{M \times \Omega}$. Let $c_i = \mathrm{NNC}(x_i)$ denote the sample cluster center nearest to the training feature vector $x_i$; subtracting each feature vector from its nearest sample cluster center yields the residual vector group $R$:

$$R = [r_1, r_2, \ldots, r_i, \ldots, r_N] \in \mathbb{R}^{N \times \Omega}, \qquad r_i = |x_i - c_i| \quad (2)$$

Divide the dimension space $\Omega$ of each residual vector $r_i$ into $P$ equal parts, written $r_i = [r_{i,1}, r_{i,2}, \ldots, r_{i,j}, \ldots, r_{i,P}] \in \mathbb{R}^{1 \times \Omega}$ with $\omega_1 + \cdots + \omega_j + \cdots + \omega_P = \Omega$, and run K-Means separately on the residual sub-vectors of all training samples in each subspace, generating a codebook set $C^{\Omega}$ with a consistent number of cluster centers:

$$C^{\Omega} = \{C^{\omega_1}, \ldots, C^{\omega_j}, \ldots, C^{\omega_P}\}, \qquad C^{\omega_j} = [c_1^{\omega_j}, \ldots, c_k^{\omega_j}, \ldots, c_{M'}^{\omega_j}] \quad (3)$$

where $C^{\omega_j}$ is the codebook (cluster set) of the $j$-th dimension subspace formed by equally dividing the dimension space $\Omega$ of the training residual vector group $R$; $P$ is the number of dimension subspaces into which $\Omega$ is divided; $c_k^{\omega_j}$ is the $k$-th cluster center of $C^{\omega_j}$; and $M'$ is the number of cluster centers per subspace, satisfying $M' = 2^{p}$, where $p$ is the number of IVPQ binary coding bits.
Encode each $r_i$ with $C^{\Omega}$: every sample residual vector $r_i$ is expressed by the ID numbers of the cluster centers corresponding to its $P$ residual sub-vectors, generating the training-sample IVPQ code set $S$:

$$S = \{S(1), S(2), \ldots, S(i), \ldots, S(N)\}, \qquad S(i) = \{c_i : n(i,1), n(i,2), \ldots, n(i,P)\} \quad (4)$$

where $S(i)$ is the IVPQ code group generated from the training-sample residual vector $r_i$, tagged with its corresponding training-sample cluster center $c_i$; and $n(i,j)$ is the number of the cluster center in the dimension subspace $\omega_j$ nearest to the sample residual sub-vector $r_{i,j}$.
The non-exhaustive retrieval steps of IVPQ are as follows:
Apply the same encoding preprocessing to the query sample vector $x_q$, generating the query residual vector $r_q = |x_q - c_q|$; likewise divide $r_q$ into $P$ equal sub-vectors, written $r_q = [r_{q,1}, r_{q,2}, \ldots, r_{q,j}, \ldots, r_{q,P}] \in \mathbb{R}^{1 \times \Omega}$, and compute the distances between each sub-vector and the $M'$ cluster centers of its subspace, generating a query-vector distance pool $D^{\Omega}$ of size $P \times M'$:

$$D^{\Omega} = \{D^{\omega_1}, \ldots, D^{\omega_j}, \ldots, D^{\omega_P}\}, \qquad D^{\omega_j} = [d_1^{\omega_j}, \ldots, d_k^{\omega_j}, \ldots, d_{M'}^{\omega_j}]$$

where $c_q$ is the sample cluster center of the query sample vector; $D^{\omega_j}$ is the set of distances between the query residual sub-vector $r_{q,j}$ and the $M'$ cluster centers of subspace $\omega_j$; and $d_k^{\omega_j}$ is the distance between $r_{q,j}$ and $c_k^{\omega_j}$, the $k$-th cluster center of $\omega_j$.
When searching, only the IVPQ code groups $S_q$ in the training-sample code set $S$ whose sample cluster center matches $c_q$ of the query sample vector $x_q$ (the region of interest, ROI) are traversed. Let the number of code groups consistent with the query vector be $N'$; from equation (4), $S_q$ is expressed as:

$$S_q = \{S_q(1), S_q(2), \ldots, S_q(i), \ldots, S_q(N')\}, \qquad S_q(i) = \{c_q : n(i,1), \ldots, n(i,P)\}$$

In the query-vector distance pool $D^{\Omega}$, compute for each code group in $S_q$ the sum of the $P$ corresponding Hamming distance values, generating the query retrieval distance set $D_q$:

$$D_q = [D_q(1), D_q(2), \ldots, D_q(i), \ldots, D_q(N')], \qquad D_q(i) = \sum_{j=1}^{P} d_{n(i,j)}^{\omega_j}$$

where $D_q(i)$ denotes the IVPQ coding distance between the $i$-th training sample vector $x_i$ in $S_q$ and the query sample vector $x_q$. If the distance sum $D_q(i)$ exceeds a threshold distance $t$ set according to actual training needs, $t \in [30, 100]$, it can be discarded. Finally, the distance between each training sample and the query sample is sorted and returned as the result of the non-exhaustive retrieval.
The invention provides a CBIR method with improved product-quantization coding. The specific process is as follows: an improved residual network is adopted as the image feature extractor; the image feature data are then encoded and compressed by an index retrieval module using the IVPQ coding algorithm under an approximate nearest neighbor (ANN) search strategy; indexes are generated in a dynamic index database based on the Faiss framework; finally, retrieval judgment is performed using Hamming-distance measurement.
Two embodiments of the invention are given:
1. Index algorithm test:
SIFT1M is used as the test data set. The IVPQ coding retrieval algorithm improved and optimized by this method is compared with existing image retrieval algorithms based on ANN retrieval strategies reported in the literature: the unmodified product quantization method PQ, the multi-table locality-sensitive hashing method MLSH, and the graph-based index quantization method HNSW. The merit of each retrieval algorithm is measured by three indexes: recall rate, retrieval time, and index file size. The experimental parameters are as follows: nlist: the number of sample clusters; m: the number of divided subspaces; nbit: the binary coding length of each vector subspace; nprobe: the number of most-similar classes searched during a query; R@n: the recall rate when the n most similar IDs are returned; time: the time required for a single query vector to complete retrieval; file size: the memory space occupied by the generated index file.
TABLE 1 IVPQ performance test [table figure not reproduced]
TABLE 2 PQ performance test [table figure not reproduced]
TABLE 3 MLSH performance test [table figure not reproduced]
TABLE 4 HNSW performance test [table figure not reproduced]
TABLE 5 Brute-force retrieval test (ground truth) [table figure not reproduced]
The test results are shown in Tables 1 through 5. Brute-force search under cosine distance (normalized cosine similarity is equivalent to Euclidean L2 distance) serves as the reference ground truth. The results show that under the same number of coding bits and with growing m, IVPQ retrieval needs far less time than HNSW, PQ, and MLSH, while in recall rate IVPQ coding is only slightly below PQ coding but better than LSH coding and HNSW coding. Because IVPQ coding must store the sample cluster centers, the index file generated by the optimized and improved algorithm is necessarily slightly larger than that of the prototype PQ algorithm. With little loss in recall and the same number of coding bits, the index file generated by IVPQ coding is only slightly larger than PQ's but far smaller than those of LSH and HNSW, with compression efficiency improved by 62.24% and 76.52% over LSH and HNSW respectively; in retrieval time, IVPQ takes only 4.03%, 13.21%, and 27.05% of the time of HNSW, LSH, and PQ respectively, a speed-up of 23.21, 6.56, and 2.69 times. By comprehensive evaluation, the index module adopting the IVPQ retrieval algorithm performs best when facing large-scale data sets.
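For readers reproducing such tests, the R@n metric quoted above can be computed as sketched below. The IDs here are made up for illustration and are not the SIFT1M results from the tables.

```python
import numpy as np

def recall_at_n(retrieved_ids, true_nn, n):
    """Fraction of queries whose ground-truth nearest ID appears in the top-n results."""
    hits = sum(true_nn[q] in retrieved_ids[q][:n] for q in range(len(true_nn)))
    return hits / len(true_nn)

# Toy illustration: per-query ranked result IDs and ground-truth nearest IDs.
retrieved = [[3, 7, 1], [5, 2, 9], [4, 8, 0]]
truth = [7, 9, 6]
r1 = recall_at_n(retrieved, truth, 1)   # no query has its truth at rank 1
r3 = recall_at_n(retrieved, truth, 3)   # two of three queries hit within the top 3
```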
2. Retrieval-effect demonstration test of the CBIR method: the Caltech256 image data set is used as the test data set; the feature extractor of the CBIR method adopts the improved residual network model ResNet152v2_AEPL, and the index retrieval module adopts the IVPQ algorithm proposed by this method as well as the comparison index retrieval algorithms HNSW, MLSH, and PQ from other documents.
The proposed CBIR method extracts depth feature descriptors of the Caltech256 image data set with the feature extractor. The index retrieval module encodes and compresses the feature data to generate a sample index library of 28780 indexes and query indexes for 1000 test samples, on which the performance of the CBIR retrieval system is tested. The experiment shows the retrieval results for elk (Caltech256-065.elk) separately. The test results further verify the retrieval precision of the CBIR method based on the IVPQ algorithm.
Finally, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (1)

1. A content-based image retrieval CBIR method based on an improved product quantization PQ algorithm is characterized in that: extracting image depth features through an improved depth convolution network, coding and compressing image feature data through an index retrieval module which adopts a nonlinear retrieval ANN search strategy and is based on an inverted index product quantization IVPQ algorithm, generating an index of a dynamic index database based on a Faiss frame, partitioning a data space of a full index database through feature vector coding, quickly locking a certain subspace through Hamming distance rearrangement for traversal when retrieving a query picture, and outputting a retrieval image;
the IVPQ algorithm is divided into index construction and nonlinear retrieval query, and the index construction and the nonlinear retrieval query are recorded with X = [ X = 1 ,x 2 ,...,x N ]∈R N×Ω A characteristic vector data set matrix of a training sample set, wherein omega is the dimension of training sample data, N is the number of samples of the training sample set, and a query sample is x q
The index construction specifically comprises the following steps:
and (3) carrying out encoding preprocessing: performing a K-Means clustering algorithm on the training sample characteristic vector data set X to obtain M sample clustering centers C = [ C = 1 ,c 2 ,...,c M ]∈R M×Ω Is provided with c i =NNC(x i ) Representing a training sample data feature vector x i And (3) subtracting the nearest sample clustering center by two to obtain a residual vector group R, wherein the R is expressed by the formula:
R=[r 1 ,r 2 ,...,r i ,...,r N ]∈R N×Ω
r i =|x i -c i | (2)
for residual vector r i The dimension omega of the space is divided into two parts by P, and r is recorded i =[r i,1 ,r i,2 ,...,r i,j ,...,r i,P ]∈R 1×Ω And omega 1 +...+ω j +...+ω P = omega, and carry out K-Means clustering on residual sub-vectors of all training samples in different subspaces respectively to generate codebook sets C with consistent clustering center numbers Ω ,C Ω The expression is as follows:
Figure FDA0004067750850000011
Figure FDA0004067750850000012
wherein the content of the first and second substances,
Figure FDA0004067750850000013
a codebook of the jth dimensionality subspace formed after the dimensionality space omega of the training sample residual vector group R is divided equally, namely a cluster set, wherein P is the number of the dimensionality subspaces divided equally by omega; />
Figure FDA0004067750850000014
Is->
Figure FDA0004067750850000015
M 'is the number of cluster centers per subspace, and satisfies M' =2 p ,2 p Binary encoding the number of bits for IVPQ;
Each sample residual vector r_i is IVPQ-encoded with C_Ω: r_i is represented by the ID numbers of the cluster centers corresponding to its P residual sub-vectors, generating the training-sample IVPQ code set S, where S is expressed as:
S = {S(1), S(2), ..., S(i), ..., S(N)}   (4)
S(i) = {c_i : n(i,1), n(i,2), ..., n(i,j), ..., n(i,P)}
wherein S(i) is the IVPQ code group generated from the residual vector r_i of the i-th training sample; c_i marks the cluster center of the corresponding training sample; n(i,j) is the number, within the dimension subspace ω_j, of the cluster center nearest to the sample residual sub-vector r_{i,j}, i.e. n(i,j) = NNC_{ω_j}(r_{i,j}).
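The IVPQ encoding of formula (4) then reduces to a nearest-center search per subspace. In this hedged NumPy sketch (illustrative sizes and random stand-in data, not the patented implementation) each residual collapses to P small integer IDs plus its coarse-center mark:

```python
import numpy as np

rng = np.random.default_rng(0)
N, Omega, P, M_prime = 200, 8, 4, 16
R = np.abs(rng.standard_normal((N, Omega)))            # residual group (stand-in)
nnc = rng.integers(0, 4, size=N)                       # coarse-center marks c_i
subs = np.split(R, P, axis=1)
codebooks = [sub[rng.choice(N, M_prime, replace=False)] for sub in subs]

# n(i, j): ID of the center in subspace omega_j nearest to sub-vector r_{i,j}
codes = np.stack(
    [np.argmin(((sub[:, None, :] - cb[None]) ** 2).sum(-1), axis=1)
     for sub, cb in zip(subs, codebooks)],
    axis=1)                                            # shape (N, P)

# S(i) = {c_i : n(i,1), ..., n(i,P)}
S = [(int(nnc[i]), codes[i].tolist()) for i in range(N)]
```

Each sample is thereby stored in P·p bits of sub-codes plus one coarse-center ID, instead of Ω floating-point values.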
The nonlinear retrieval query is specifically as follows:
The query sample vector x_q undergoes the same encoding preprocessing, generating the query residual vector r_q = |x_q − c_q|. Likewise r_q is divided into P identical sub-vectors, written r_q = [r_{q,1}, r_{q,2}, ..., r_{q,j}, ..., r_{q,P}] ∈ R^{1×Ω}, and the distance between each sub-vector and the M' cluster centers of its subspace is computed, generating a query-vector distance pool D_Ω of size P × M'. D_Ω is expressed as:
D_Ω = {D_{ω_1}, D_{ω_2}, ..., D_{ω_j}, ..., D_{ω_P}}   (5)
D_{ω_j} = {d(r_{q,j}, c_{j,1}), ..., d(r_{q,j}, c_{j,k}), ..., d(r_{q,j}, c_{j,M'})}
wherein c_q is the sample cluster center of the query sample vector; D_{ω_j} is the set of distances between the query residual sub-vector r_{q,j} and the M' cluster centers of subspace ω_j; d(r_{q,j}, c_{j,k}) is the distance value between r_{q,j} and c_{j,k}, the k-th cluster center of ω_j.
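On the query side, the distance pool D_Ω of formula (5) is a small P × M' table computed once per query. A minimal NumPy sketch follows; the squared Euclidean distance and all sizes are assumptions, since the claim does not fix the metric:

```python
import numpy as np

rng = np.random.default_rng(0)
Omega, P, M_prime = 8, 4, 16
codebooks = [rng.standard_normal((M_prime, Omega // P)) for _ in range(P)]
r_q = np.abs(rng.standard_normal(Omega))       # query residual r_q = |x_q - c_q|
q_subs = np.split(r_q, P)                      # [r_{q,1}, ..., r_{q,P}]

# D_Omega[j, k] = d(r_{q,j}, c_{j,k}); one row of the pool per subspace
D_Omega = np.stack([((cb - qs) ** 2).sum(-1)
                    for cb, qs in zip(codebooks, q_subs)])
```

The pool costs P·M' sub-vector distance evaluations per query, after which every candidate can be scored by table lookups alone.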
During retrieval, only the IVPQ code groups S_q within the training-sample code set S whose cluster-center mark is consistent with the sample cluster center c_q of the query sample vector x_q, namely the region of interest (ROI), are traversed. Let N' be the number of code groups consistent with the query vector; from equation (4), S_q is expressed as:
S_q = {S_q(1), S_q(2), ..., S_q(i), ..., S_q(N')}   (6)
S_q(i) = {c_q : n(i,1), n(i,2), ..., n(i,j), ..., n(i,P)}
In the query-vector distance pool D_Ω, the P Hamming-distance values corresponding to each code group of S_q are looked up and summed, generating the query retrieval distance set D_q, where D_q is expressed as:
D_q = [D_q(1), D_q(2), ..., D_q(i), ..., D_q(N')]   (7)
D_q(i) = Σ_{j=1}^{P} D_{ω_j}(n(i,j))
wherein D_q(i) denotes the IVPQ coding distance between the i-th training sample vector x_i in S_q and the query sample vector x_q. If the distance sum D_q(i) exceeds a threshold distance t, with t ∈ [30, 100] set according to actual training needs, the sample is discarded. Finally, the distances between the remaining training samples and the query sample are sorted and returned as the result of the nonlinear retrieval.
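The lookup step above can be sketched as pure table indexing: each candidate code group in S_q costs P lookups into D_Ω plus a sum, followed by the threshold test and sort. All sizes and the threshold value below are illustrative stand-ins, not values from the patent (which prescribes t ∈ [30, 100] on real features):

```python
import numpy as np

rng = np.random.default_rng(0)
P, M_prime, N_prime = 4, 16, 50
codes_q = rng.integers(0, M_prime, size=(N_prime, P))  # S_q in the query's cell
D_Omega = rng.random((P, M_prime))                     # precomputed distance pool

# D_q(i) = sum_j D_Omega[j, n(i, j)]  -- no vector arithmetic at query time
D_q = D_Omega[np.arange(P), codes_q].sum(axis=1)

t = 2.0                                # illustrative threshold only
keep = np.flatnonzero(D_q <= t)        # discard candidates beyond the threshold
ranked = keep[np.argsort(D_q[keep])]   # sorted result of the nonlinear retrieval
```

The fancy-indexing line broadcasts the row indices 0..P−1 against the (N', P) code matrix, so element [i, j] of the lookup is D_Ω[j, n(i, j)], exactly the summand of formula (7).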
CN201911417377.6A 2019-12-31 2019-12-31 CBIR method based on improved PQ algorithm Active CN111177435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911417377.6A CN111177435B (en) 2019-12-31 2019-12-31 CBIR method based on improved PQ algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911417377.6A CN111177435B (en) 2019-12-31 2019-12-31 CBIR method based on improved PQ algorithm

Publications (2)

Publication Number Publication Date
CN111177435A CN111177435A (en) 2020-05-19
CN111177435B true CN111177435B (en) 2023-03-31

Family

ID=70650651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911417377.6A Active CN111177435B (en) 2019-12-31 2019-12-31 CBIR method based on improved PQ algorithm

Country Status (1)

Country Link
CN (1) CN111177435B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797259A (en) * 2020-07-10 2020-10-20 杭州慕锐科技有限公司 Rapid image searching method for wrinkled deformation
CN115599791B (en) * 2022-11-15 2023-03-10 以萨技术股份有限公司 Milvus database parameter determination method, device and storage medium
CN116595233A (en) * 2023-06-02 2023-08-15 上海爱可生信息技术股份有限公司 Vector database retrieval processing acceleration method and system based on NPU

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844524A * 2016-12-29 2017-06-13 北京工业大学 Medical image retrieval method based on deep learning and the Radon transform
CN110472091A * 2019-08-22 2019-11-19 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150169644A1 (en) * 2013-01-03 2015-06-18 Google Inc. Shape-Gain Sketches for Fast Image Similarity Search
EP3371712A1 (en) * 2015-11-06 2018-09-12 Thomson Licensing Method and apparatus for generating codebooks for efficient search

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844524A * 2016-12-29 2017-06-13 北京工业大学 Medical image retrieval method based on deep learning and the Radon transform
CN110472091A * 2019-08-22 2019-11-19 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Benjamin Klein et al. End-to-End Supervised Product Quantization for Image Search and Retrieval. arXiv. 2017. Full text. *
Giuseppe Amato et al. Large-scale instance-level image retrieval. ScienceDirect. 2019. Full text. *
Meng Zhao. An angle structure description for image retrieval. IEEE. 2016. Full text. *
Zhenluan Xue et al. Non-invasive through-skull brain vascular imaging and small tumor diagnosis based on NIR-II emissive lanthanide nanoprobes beyond 1500 nm. ScienceDirect. 2018. Full text. *
Du Danlei. Research on accumulative product quantization method for image retrieval. Computer Engineering. 2015. Full text. *

Also Published As

Publication number Publication date
CN111177435A (en) 2020-05-19

Similar Documents

Publication Publication Date Title
Li et al. Recent developments of content-based image retrieval (CBIR)
CN105912611B (en) A kind of fast image retrieval method based on CNN
Zhou et al. Scalar quantization for large scale image search
CN108280187B (en) Hierarchical image retrieval method based on depth features of convolutional neural network
Norouzi et al. Fast exact search in hamming space with multi-index hashing
Zheng et al. Coupled binary embedding for large-scale image retrieval
CN111177435B (en) CBIR method based on improved PQ algorithm
Zhou et al. Towards codebook-free: Scalable cascaded hashing for mobile image search
CN106033426B (en) Image retrieval method based on latent semantic minimum hash
JP2014505313A (en) Method and apparatus for identifying similar images
CN102693299A (en) System and method for parallel video copy detection
CN106095951B (en) Data space multi-dimensional indexing method based on load balancing and inquiry log
Negrel et al. Web-scale image retrieval using compact tensor aggregation of visual descriptors
Liu et al. An image-based near-duplicate video retrieval and localization using improved edit distance
Wang et al. Statistical quantization for similarity search
CN110046660A (en) A kind of product quantization method based on semi-supervised learning
Liu et al. TOP-SIFT: the selected SIFT descriptor based on dictionary learning
CN105760875A (en) Binary image feature similarity discrimination method based on random forest algorithm
Majhi et al. An image retrieval scheme based on block level hybrid dct-svd fused features
Zhou et al. Visual word expansion and BSIFT verification for large-scale image search
Kulkarni et al. An effective content based video analysis and retrieval using pattern indexing techniques
Zhang et al. Modeling spatial and semantic cues for large-scale near-duplicated image retrieval
Cai et al. Constrained keypoint quantization: towards better bag-of-words model for large-scale multimedia retrieval
Malode et al. A review paper on content based image retrieval
Hinami et al. Large-scale r-cnn with classifier adaptive quantization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant