CN111177435A - CBIR method based on improved PQ algorithm - Google Patents

Info

Publication number: CN111177435A (application CN201911417377.6A; granted as CN111177435B)
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 曾浩, 高凡
Applicant and current assignee: Chongqing University of Posts and Telecommunications
Legal status: Granted; Active

Classifications

    • G06F16/51 — Indexing; Data structures therefor; Storage structures (information retrieval of still image data)
    • G06F16/53 — Querying (information retrieval of still image data)
    • G06F16/55 — Clustering; Classification (information retrieval of still image data)
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to a CBIR method based on an improved PQ algorithm, belonging to the technical field of image processing. The method extracts image depth features with an improved deep convolutional network; encodes and compresses the image feature data with an index retrieval module based on an inverted-index product quantization (IVPQ) algorithm that adopts an approximate nearest neighbor (ANN) search strategy; generates the indexes of a dynamic index database based on the Faiss framework; partitions the data space of the full index database through feature-vector coding; and, when a query picture is retrieved, quickly locks a subspace through Hamming-distance rearrangement, traverses it, and outputs the retrieved images. The invention realizes a dynamic retrieval index library based on the Faiss framework and avoids the high operation and maintenance cost of rebuilding the index library in practical applications.

Description

CBIR method based on improved PQ algorithm
Technical Field
The invention belongs to the field of image processing, and relates to a CBIR method based on an improved PQ algorithm.
Background
In practical application scenarios, a user needs to retrieve from and judge against a key sensitive-image library on the basis of massive, unlabeled, and complex unknown images, realizing a search-by-image function. At present, the most effective way to represent and index image information is based on the image content itself, so a Content-Based Image Retrieval (CBIR) method is chosen for the design of a large-scale image retrieval system.
The traditional CBIR method employs a brute-force strategy for similarity measurement, which aggravates the consumption of memory resources as the picture feature index data grows. In particular, when the data set in a practical application reaches a scale of hundreds of millions, the growing index can no longer fit in operating memory (RAM); retrieval performance then drops sharply, the system cannot reach its expected targets, and hardware cost rises steeply. For this reason, the mainstream solution is the Approximate Nearest Neighbor (ANN) search strategy, which in essence partitions the full space of the search data set into subspaces, locks a (small) set of subspaces by some rule, and traverses them quickly.
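The memory and compute cost of the brute-force strategy can be made concrete with a short sketch. This is an illustration only; the sizes and the NumPy toy search are not part of the patent:

```python
import numpy as np

# Hypothetical scale: N database vectors of dimension d stored as float32.
N, d = 1_000_000, 512
bytes_per_vec = d * 4                       # float32 = 4 bytes per component
ram_gb = N * bytes_per_vec / 2**30          # RAM needed just to hold the raw index
# At N = 1e8 (the "hundreds of millions" case) this grows to roughly 190 GB,
# which is why exhaustive search stops being viable without compression.

# Brute-force (exhaustive) L2 search over a small toy set:
rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 16)).astype(np.float32)
q = rng.standard_normal(16).astype(np.float32)
dists = np.linalg.norm(db - q, axis=1)      # one distance per database vector
nearest = int(np.argmin(dists))             # exact nearest neighbour
```

The ANN strategies discussed next trade this exactness for traversing only a fraction of the database.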
ANN methods are mainly classified into KD-tree methods, graph-based index methods, hashing methods, and vector quantization methods. For the conventional KD-tree algorithm, as the tree grows deeper, the performance of the KD-tree retrieval model degrades. For graph-based indexing, taking the mature HNSW (Hierarchical Navigable Small World) algorithm as the example: its recall rate is high, but the index occupies a large amount of memory, and its special index structure is unfavorable to dynamic addition and deletion of data. For hashing, Multi-table Locality-Sensitive Hashing (MLSH) is an improved hash coding algorithm that constructs multiple hash tables, generating several hash functions that divide the space domain, so as to improve search accuracy on large-scale high-dimensional data sets; it does not, however, eliminate the large memory consumption of the index data. For vector quantization, the representative algorithm is the Product Quantization (PQ) method, already very practical and popular in industry: its index data compression is good and memory occupation is effectively reduced, but its recall rate is low.
In addition, many recent implementations of CBIR methods based on the ANN search strategy employ the FALCONN or NMSLIB frameworks, which currently do not support dynamic addition and deletion of data in the index database. This is acceptable for retrieval algorithms and systems over small and medium-scale data sets. For large-scale retrieval, however, a CBIR system must be able to store indexes for specific sensitive picture data while it is running in order to meet practical needs; otherwise, rebuilding the index library each time incurs high operation and maintenance cost and time consumption. The index library of a practical CBIR system should therefore be dynamically scalable.
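The dynamic-scalability requirement can be illustrated with a minimal inverted-index sketch that supports online insertion and deletion of vector IDs without rebuilding the library. This is a simplified stand-in for a Faiss-style dynamic index; the class and method names are illustrative, not from the patent or any framework:

```python
from collections import defaultdict

class DynamicInvertedIndex:
    """Toy inverted index: coarse cluster id -> {vector id: quantized code}.
    Supports online add/remove without rebuilding the index library."""

    def __init__(self):
        self.lists = defaultdict(dict)   # cluster_id -> {vec_id: code}
        self.assignment = {}             # vec_id -> cluster_id

    def add(self, vec_id, cluster_id, code):
        self.lists[cluster_id][vec_id] = code
        self.assignment[vec_id] = cluster_id

    def remove(self, vec_id):
        cluster_id = self.assignment.pop(vec_id)
        del self.lists[cluster_id][vec_id]

    def candidates(self, cluster_id):
        # At query time only the matching inverted list is traversed.
        return dict(self.lists[cluster_id])

idx = DynamicInvertedIndex()
idx.add(1, cluster_id=0, code=(3, 7))
idx.add(2, cluster_id=0, code=(1, 4))
idx.remove(1)                            # delete without re-indexing anything
```

A framework lacking this capability forces a full index rebuild on every change, which is the operation-and-maintenance cost the text describes.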
Disclosure of Invention
In view of the above, the present invention aims to provide a CBIR method based on the modified PQ algorithm.
In order to achieve the purpose, the invention provides the following technical scheme:
a content-based image retrieval CBIR method based on an improved product quantization PQ algorithm extracts image depth features through an improved depth convolution network, then codes compressed image feature data through an index retrieval module based on an inverted index product quantization IVPQ algorithm and adopting a nonlinear retrieval ANN search strategy, generates indexes of a dynamic index database based on a Faiss frame, segments a data space of a full index database through feature vector coding, rapidly locks a certain subspace through Hamming distance rearrangement for traversal when retrieval of a query image is carried out, and outputs a retrieval image.
Optionally, the IVPQ algorithm is divided into index construction and nonlinear retrieval query. Let $X = [x_1, x_2, \ldots, x_N] \in R^{N \times \Omega}$ be the feature-vector data set matrix of the training sample set, where $\Omega$ is the dimensionality of the training sample data, $N$ is the number of training samples, and the query sample is $x_q$.
The index construction specifically comprises the following steps:
Encoding preprocessing: a K-Means clustering algorithm is performed on the training feature-vector data set $X$ to obtain $M$ sample cluster centers $C = [c_1, c_2, \ldots, c_M] \in R^{M \times \Omega}$. Let $c_i = NN_C(x_i)$ denote the sample cluster center nearest to the training feature vector $x_i$; subtracting pairwise yields the residual vector group $R$:

$$R = [r_1, r_2, \ldots, r_i, \ldots, r_N] \in R^{N \times \Omega} \qquad (1)$$
$$r_i = |x_i - c_i| \qquad (2)$$
The dimension space $\Omega$ of each residual vector $r_i$ is divided into $P$ equal parts, written $r_i = [r_{i,1}, r_{i,2}, \ldots, r_{i,j}, \ldots, r_{i,P}] \in R^{1 \times \Omega}$ with $\Omega = \omega_1 + \cdots + \omega_j + \cdots + \omega_P$. K-Means clustering is then applied separately to the residual sub-vectors of all training samples in each subspace, generating a codebook set $C_\Omega$ with a consistent number of cluster centers:

$$C_\Omega = \{C^{(1)}, \ldots, C^{(j)}, \ldots, C^{(P)}\}$$
$$C^{(j)} = \{c^{(j)}_1, c^{(j)}_2, \ldots, c^{(j)}_{M'}\} \qquad (3)$$

where $C^{(j)}$ is the codebook (cluster set) of the $j$-th dimension subspace formed after the dimension space $\Omega$ of the training residual vector group $R$ is divided equally; $P$ is the number of dimension subspaces into which $\Omega$ is divided; $c^{(j)}_k$ is the $k$-th cluster center of $C^{(j)}$; and $M'$ is the number of cluster centers per subspace, satisfying $M' = 2^p$, with $p$ the number of IVPQ binary coding bits.
$C_\Omega$ is used to IVPQ-encode each $r_i$: every sample residual vector $r_i$ is represented by the ID numbers of the cluster centers corresponding to its $P$ residual sub-vectors, generating the training sample IVPQ code set $S$:

$$S = \{S(1), S(2), \ldots, S(i), \ldots, S(N)\}$$
$$S(i) = (c_i;\; n(i,1), n(i,2), \ldots, n(i,j), \ldots, n(i,P)) \qquad (4)$$

where $S(i)$ is the IVPQ code generated from the training residual vector $r_i$; $c_i$ marks the corresponding training sample cluster center; and $n(i,j)$ is the number of the cluster center in the dimension subspace $\omega_j$ nearest to the sample residual sub-vector $r_{i,j}$.
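The index-construction steps above (coarse K-Means, residual computation, per-subspace codebooks, nearest-center ID encoding) can be sketched compactly in NumPy. This is a toy illustration under assumed sizes, with K-Means reduced to a few Lloyd iterations; function names and parameters are the author of this sketch's, not the patent's:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal Lloyd's algorithm (stand-in for a real K-Means implementation)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def build_ivpq(X, M=4, P=2, M_prime=8, seed=0):
    N, dim = X.shape
    sub = dim // P
    # Eqs. (1)-(2): coarse clustering, then residuals r_i = x_i - c_{nn(i)}
    C, coarse = kmeans(X, M, seed=seed)
    R = X - C[coarse]
    # Eq. (3): one codebook of M' = 2^p centers per subspace (here M' = 8, p = 3)
    codebooks = []
    codes = np.empty((N, P), dtype=np.int64)
    for j in range(P):
        Rj = R[:, j * sub:(j + 1) * sub]
        Cj, labels = kmeans(Rj, M_prime, seed=seed + j)
        codebooks.append(Cj)
        codes[:, j] = labels              # Eq. (4): n(i, j) = nearest-center ID
    return C, coarse, codebooks, codes

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 16)).astype(np.float32)
C, coarse, codebooks, codes = build_ivpq(X)
```

Each database vector is thus stored as one coarse-cluster ID plus P small integers, which is where the compression of the index comes from.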
The nonlinear retrieval query specifically comprises:

The same encoding preprocessing is performed on the query sample vector $x_q$, generating the query residual vector $r_q = |x_q - c_q|$. Likewise, $r_q$ is divided into $P$ sub-vectors in the same way, written $r_q = [r_{q,1}, r_{q,2}, \ldots, r_{q,j}, \ldots, r_{q,P}] \in R^{1 \times \Omega}$, and the distances between each sub-vector and the $M'$ cluster centers of its subspace are computed respectively, generating a query vector distance pool $D_\Omega$ of size $P \times M'$:

$$D_\Omega = \{D^{(1)}, \ldots, D^{(j)}, \ldots, D^{(P)}\}$$
$$D^{(j)} = \{d^{(j)}_1, d^{(j)}_2, \ldots, d^{(j)}_{M'}\} \qquad (5)$$

where $c_q$ is the sample cluster center of the query sample vector, $D^{(j)}$ is the set of distances between the query residual sub-vector $r_{q,j}$ and the $M'$ cluster centers of subspace $\omega_j$, and $d^{(j)}_k$ is the distance value between $r_{q,j}$ and $c^{(j)}_k$, the $k$-th cluster center of $\omega_j$.
During retrieval, only the IVPQ code set $S_q$, i.e. the codes in the training code set $S$ whose sample cluster center subscript is consistent with $c_q$, the sample cluster center of the query sample vector $x_q$, is traversed and queried; this is the region of interest (ROI). Let the number of code groups consistent with the query vector be $N'$; from equation (4) the expression of $S_q$ is obtained:

$$S_q = \{S_q(1), S_q(2), \ldots, S_q(i), \ldots, S_q(N')\}$$
$$S_q(i) = (c_q;\; n(i,1), n(i,2), \ldots, n(i,P))$$
In the query vector distance pool $D_\Omega$, the sum of the $P$ Hamming distance values corresponding to each code group of $S_q$ is computed respectively, generating a query retrieval distance set $D_q$:

$$D_q = [D_q(1), D_q(2), \ldots, D_q(i), \ldots, D_q(N')]$$
$$D_q(i) = \sum_{j=1}^{P} d^{(j)}_{n(i,j)} \qquad (6)$$

where $D_q(i)$ denotes the IVPQ coding distance between the $i$-th training sample vector $x_i$ in $S_q$ and the query sample vector $x_q$. If the distance sum $D_q(i)$ exceeds a threshold distance $t$ set according to actual training needs, $t \in [30, 100]$, the sample is discarded. Finally, the distance of each remaining training sample to the query sample is sorted and returned as the result of the nonlinear retrieval.
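The retrieval steps just described (distance pool, inverted-list traversal, thresholding, sorting) can be sketched as follows. This is a self-contained toy illustration in NumPy: the data are random, squared-L2 distances stand in for the distance measure, and all names are this sketch's own assumptions:

```python
import numpy as np

def ivpq_query(xq, C, codebooks, coarse, codes, t=None):
    """Score a query against IVPQ codes: build the distance pool, then
    traverse only the codes whose coarse cluster matches the query's."""
    # Coarse assignment and query residual r_q = x_q - c_q (cf. eq. (2))
    q_cluster = int(np.argmin(((C - xq) ** 2).sum(axis=1)))
    rq = xq - C[q_cluster]
    P = len(codebooks)
    sub = len(xq) // P
    # Eq. (5): distance pool of size P x M'
    D = np.stack([((cb - rq[j * sub:(j + 1) * sub]) ** 2).sum(axis=1)
                  for j, cb in enumerate(codebooks)])
    # The ROI S_q: only database vectors in the same coarse cluster
    roi = np.flatnonzero(coarse == q_cluster)
    # Eq. (6): D_q(i) = sum over subspaces of looked-up distances
    dq = D[np.arange(P)[None, :], codes[roi]].sum(axis=1)
    if t is not None:                     # optional threshold, t in [30, 100]
        keep = dq <= t
        roi, dq = roi[keep], dq[keep]
    order = np.argsort(dq)                # sorted results are returned
    return roi[order], dq[order]

# Toy data: 2 coarse centers, P = 2 subspaces, M' = 4 centers each, dim = 4
rng = np.random.default_rng(2)
C = rng.standard_normal((2, 4))
codebooks = [rng.standard_normal((4, 2)) for _ in range(2)]
coarse = np.array([0, 0, 1, 1])           # coarse cluster of each database vector
codes = rng.integers(0, 4, size=(4, 2))   # n(i, j) for each vector
ids, dists = ivpq_query(rng.standard_normal(4), C, codebooks, coarse, codes)
```

Note that only the table lookup and the sum over `roi` depend on the database size within the matched cluster, which is the source of the speedup over exhaustive scoring.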
The invention has the beneficial effects that:
In terms of requirements, the CBIR method based on the IVPQ algorithm provided by the invention suits practical application scenarios in which a user needs to retrieve from and judge against a key sensitive-image library on the basis of massive, unlabeled, and complex unknown images, realizing the search-by-image function. The dynamic index database is realized on the Faiss framework, avoiding the high operation and maintenance cost of rebuilding the index library in practical applications.
In terms of algorithm effect, the product quantization (PQ) algorithm used for index coding is improved and optimized into the inverted-index product quantization (IVPQ) algorithm, realizing efficient compression of the data features and good nonlinear retrieval. At query and retrieval time, the IVPQ coding algorithm requires far fewer distance calculations than the original PQ coding algorithm, greatly reducing the amount of computation and optimizing the time consumption. Because the retrieved samples are clustered in advance, the retrieval recall rate can also be improved to a certain extent.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of the CBIR method based on the improved IVPQ algorithm;
FIG. 2 is the IVPQ algorithm process.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are for the purpose of illustrating the invention only and are not intended to limit it; to better illustrate the embodiments, some parts of the drawings may be omitted, enlarged, or reduced, and do not represent the size of an actual product; certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numerals in the drawings of the embodiments correspond to the same or similar components. In the description of the invention, terms indicating orientation or positional relationship, such as "upper", "lower", "left", "right", "front", and "rear", are based on the orientations shown in the drawings, are used only for convenience and simplification of description, and do not indicate or imply that the referred device or element must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be construed as limiting the invention, and their specific meaning can be understood by those skilled in the art according to the specific situation.
The invention provides a CBIR method based on an inverted-index product quantization (IVPQ) algorithm, used to realize a large-scale image retrieval system based on feature-vector coding indexes. The specific process is shown in Fig. 1: image depth features are extracted by an improved deep convolutional network; inverted product quantization coding compresses the image features; indexes of a dynamic index database based on the Faiss framework are generated; the data space of the full index database is divided through feature-vector coding; when a query picture is retrieved, a subspace is quickly locked through Hamming-distance rearrangement and then traversed, and the retrieved images are output.
The IVPQ algorithm is divided into two steps of index construction and nonlinear retrieval query, and the specific flow is shown in FIG. 2.
Let $X = [x_1, x_2, \ldots, x_N] \in R^{N \times \Omega}$ be the feature-vector data set matrix of the training sample set, with $\Omega$ the dimensionality of the training sample data, $N$ the number of training samples, and $x_q$ the query sample. The index construction of IVPQ (encoding preprocessing, subspace division, codebook generation, and IVPQ encoding, equations (1)-(4)) and the nonlinear retrieval of IVPQ (query residual encoding, construction of the $P \times M'$ distance pool, traversal of the matching code subset $S_q$, distance summation, thresholding with $t \in [30, 100]$, and sorting, equations (5)-(6)) proceed exactly as described in the disclosure above.
The invention provides a CBIR method with improved product quantization coding. The specific process is as follows: an improved residual network is adopted as the image feature extractor; the compressed image feature data is then encoded by an index retrieval module using the IVPQ coding algorithm with an approximate nearest neighbor (ANN) search strategy, an index is generated in a dynamic index database based on the Faiss framework, and finally retrieval judgment is performed using the Hamming distance measure.
Two embodiments of the invention are given:
1. Index algorithm test:
SIFT1M is used as the test data set. The IVPQ coding retrieval algorithm improved and optimized herein is compared with existing image retrieval algorithms based on ANN retrieval strategies, namely the graph-based HNSW method introduced in the literature, MLSH, and PQ. The superiority of the retrieval algorithm is measured by three kinds of indexes: recall rate, retrieval time, and index file size. The experimental parameters are as follows:
    • nlist: the number of sample clusters;
    • m: the number of divided subspaces;
    • nbit: the number of binary coding bits of each vector quantization subspace;
    • nprobe: the number of most similar classes searched during a query;
    • R@n: the recall rate when the n most similar IDs are returned;
    • time: the time required for a single query vector to complete retrieval;
    • file size: the memory space occupied by the generated index file.
TABLE 1: IVPQ performance test [table image not reproduced]

TABLE 2: PQ performance test [table image not reproduced]

TABLE 3: MLSH performance test [table image not reproduced]

TABLE 4: HNSW performance test [table image not reproduced]

TABLE 5: Brute-force retrieval test (ground truth) [table image not reproduced]
The test results are shown in Tables 1, 2, 3, 4, and 5, with brute-force retrieval by cosine distance (normalized cosine similarity is equivalent to the Euclidean L2 distance) used as the ground-truth reference. The results show that, under the same number of coding bits and as the value of m increases, IVPQ retrieval requires far less time than PQ, MLSH, and HNSW, and the recall of IVPQ coding is only slightly lower than PQ coding while better than LSH and HNSW coding. Because IVPQ coding must store the sample cluster centers, the index file generated by the optimized algorithm is necessarily slightly larger than that of the prototype PQ algorithm. With comparable recall and the same number of coding bits, the index file generated by IVPQ coding is only slightly larger than PQ's but far smaller than those of LSH and HNSW, with compression efficiency improved by 62.24% and 76.52% relative to LSH and HNSW respectively. IVPQ retrieval time is only 4.03%, 13.21%, and 27.05% of that of HNSW, LSH, and PQ respectively, i.e. speedups of 23.21, 6.56, and 2.69 times. By comprehensive evaluation, the index module adopting the IVPQ retrieval algorithm performs best when facing large-scale data sets.
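The R@n metric reported in the tables can be computed as follows. This is a generic sketch of the metric only (the SIFT1M results themselves are table images and cannot be reproduced here); the function name and toy data are illustrative:

```python
import numpy as np

def recall_at_n(retrieved_ids, ground_truth_ids, n):
    """R@n: fraction of queries whose true nearest neighbour appears among
    the top-n returned IDs. `retrieved_ids` has shape (num_queries, >= n);
    `ground_truth_ids` holds the true nearest-neighbour ID of each query."""
    hits = [gt in row[:n] for row, gt in zip(retrieved_ids, ground_truth_ids)]
    return float(np.mean(hits))

# Toy check: 3 queries; the true NN is found in the top 2 for two of them.
retrieved = np.array([[5, 9, 1],
                      [4, 2, 7],
                      [8, 3, 6]])
truth = np.array([9, 7, 3])
r_at_2 = recall_at_n(retrieved, truth, 2)   # hits for queries 0 and 2 -> 2/3
```

The ground-truth IDs come from the brute-force cosine-distance retrieval described above.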
2. Retrieval effect demonstration test of the CBIR method:
The Caltech256 image data set is used as the test data set. The feature extractor of the CBIR method adopts the improved residual network model ResNet152v2_AEPL, and the index retrieval module adopts the IVPQ algorithm proposed herein, compared against the HNSW, MLSH, and PQ index retrieval algorithms from the literature. The proposed CBIR method extracts the depth feature descriptors of the Caltech256 image data set with the feature extractor; the index retrieval module then encodes and compresses the feature data, generating an index library of 28780 sample indexes and 1000 test-sample query indexes, and the performance of the CBIR retrieval system is tested. The experimental results are shown for the elk (Caltech256-065.elk) retrieval. The test results further verify the retrieval precision of the CBIR method based on the IVPQ algorithm.
Finally, the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all that should be covered by the claims of the present invention.

Claims (2)

1. A content-based image retrieval (CBIR) method based on an improved product quantization (PQ) algorithm, characterized in that: the method extracts image depth features with an improved deep convolutional network; encodes and compresses the image feature data with an index retrieval module based on an inverted-index product quantization (IVPQ) algorithm adopting an approximate nearest neighbor (ANN) search strategy; generates the indexes of a dynamic index database based on the Faiss framework; partitions the data space of the full index database through feature-vector coding; and, when a query picture is retrieved, quickly locks a subspace through Hamming-distance rearrangement, traverses it, and outputs the retrieved images.
2. The CBIR method based on modified PQ algorithm of claim 1, wherein: the IVPQ algorithm is divided into index construction and nonlinear retrieval query, and the notation of X is [ X ]1,x2,...,xN]∈RN×ΩA characteristic vector data set matrix of a training sample set, wherein omega is the dimension of training sample data, N is the number of samples of the training sample set, and a query sample is xq
The index construction specifically comprises the following steps:
and (3) carrying out encoding preprocessing: performing a K-Means clustering algorithm on the training sample characteristic vector data set X to obtain M sample clustering centers C ═ C1,c2,...,cM]∈RM×ΩIs provided with ci=NNC(xi) Representing a training sample data feature vector xiAnd (3) subtracting the nearest sample clustering center by two to obtain a residual vector group R, wherein the R is expressed by the formula:
R=[r1,r2,...,ri,...,rN]∈RN×Ω
ri=|xi-ci| (2)
for residual vector riThe dimension of omega is divided into P halves,note ri=[ri,1,ri,2,...,ri,j,...,ri,P]∈R1×ΩAnd omega1+...+ωj+...+ωPAnd respectively carrying out K-Means clustering on residual sub-vectors of all training samples in different subspaces to generate a codebook set C with consistent clustering center numberΩ,CΩThe expression is as follows:
Figure FDA0002351547940000011
Figure FDA0002351547940000012
wherein the content of the first and second substances,
Figure FDA0002351547940000013
a codebook (cluster set) of the jth dimensionality subspace formed by halving the dimensionality space omega of the training sample residual vector group R is provided, and P is the number of the dimensionality subspaces after the omega is halved;
Figure FDA0002351547940000014
is composed of
Figure FDA0002351547940000015
M 'is the number of cluster centers per subspace, and satisfies M' 2p,2pBinary encoding the number of bits for IVPQ;
Encode r_i with C_Ω by IVPQ: each sample residual vector r_i is represented by the ID numbers of the cluster centers corresponding to its P residual sub-vectors, generating the training-sample IVPQ code set S, expressed as:

S = {S(1), S(2), ..., S(i), ..., S(N)}    (4)
S(i) = {c_i; n(i,1), n(i,2), ..., n(i,j), ..., n(i,P)}
n(i,j) = NN_{C_ωj}(r_{i,j})

where S(i) is the IVPQ code set generated from the residual vector r_i of the i-th training sample; c_i marks the corresponding training-sample cluster center; and n(i,j) = NN_{C_ωj}(r_{i,j}) is the number of the cluster center in the dimension subspace ω_j nearest to the sample residual sub-vector r_{i,j}.
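The IVPQ encoding of the training residuals, one coarse mark plus P sub-codeword IDs per sample, might be sketched as below (function names are illustrative; `codebooks` is assumed to have shape (P, M', Ω/P) as produced by a per-subspace clustering step):

```python
import numpy as np

def ivpq_encode(R, coarse_labels, codebooks):
    """IVPQ-encode each residual: S(i) = (coarse mark c_i, IDs n(i,1)..n(i,P)),
    where n(i,j) is the nearest codebook center of sub-vector r_{i,j} in subspace j."""
    P, M_prime, w = codebooks.shape
    codes = np.empty((len(R), P), dtype=np.int64)
    for j in range(P):
        sub = R[:, j * w:(j + 1) * w]                         # all r_{i,j}
        d = np.linalg.norm(sub[:, None, :] - codebooks[j][None], axis=2)
        codes[:, j] = d.argmin(axis=1)                        # n(i,j)
    return list(zip(coarse_labels.tolist(), codes.tolist()))
```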
The nonlinear retrieval query specifically comprises the following steps:
Apply the same encoding preprocessing to the query sample vector x_q to generate the query residual vector r_q = |x_q - c_q|; likewise divide r_q into P sub-vectors in the same way, written r_q = [r_{q,1}, r_{q,2}, ..., r_{q,j}, ..., r_{q,P}] ∈ R^(1×Ω), and compute the distance from each sub-vector to the M' cluster centers of its subspace, generating a query-vector distance pool D_Ω of size P×M'. D_Ω is expressed as:

D_Ω = {D_ω1, D_ω2, ..., D_ωj, ..., D_ωP}
D_ωj = [d_{j,1}, d_{j,2}, ..., d_{j,k}, ..., d_{j,M'}]

where c_q is the sample cluster center of the query sample vector; D_ωj is the set of distances between the query residual sub-vector r_{q,j} and the M' cluster centers of subspace ω_j; and d_{j,k} is the distance between r_{q,j} and c_{j,k}, the k-th cluster center of ω_j.
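The query-side distance pool D_Ω is a P × M' table of sub-vector-to-center distances. A minimal sketch (Euclidean distance and the function name are assumptions):

```python
import numpy as np

def query_distance_pool(r_q, codebooks):
    """Build the P x M' pool D_Omega for a query residual r_q:
    D[j, k] = distance between sub-vector r_{q,j} and center c_{j,k}."""
    P, M_prime, w = codebooks.shape
    D = np.empty((P, M_prime))
    for j in range(P):
        # distances from r_{q,j} to all M' centers of subspace omega_j
        D[j] = np.linalg.norm(codebooks[j] - r_q[j * w:(j + 1) * w], axis=1)
    return D
```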
When searching, only the IVPQ code subset S_q of the training-sample code set S whose cluster-center mark is consistent with the sample cluster center c_q of the query sample vector x_q, i.e., the region of interest (ROI), is traversed. Let the number of code groups consistent with the query vector be N'; from equation (4), the expression for S_q is:

S_q = {S_q(1), S_q(2), ..., S_q(i), ..., S_q(N')}
S_q(i) = {c_q; n(i,1), n(i,2), ..., n(i,j), ..., n(i,P)}
In the query-vector distance pool D_Ω, the sum of the P distance values corresponding to each code group of S_q is computed, generating the query retrieval distance set D_q, expressed as:

D_q = [D_q(1), D_q(2), ..., D_q(i), ..., D_q(N')]
D_q(i) = Σ_{j=1}^{P} d_{j, n(i,j)}

where D_q(i) denotes the IVPQ coding distance between the i-th training sample vector x_i in S_q and the query sample vector x_q. If the distance sum D_q(i) exceeds a threshold distance t set according to actual training needs, t ∈ [30, 100], the sample is discarded. Finally, the training samples are sorted by their distance to the query sample and returned as the result of the nonlinear retrieval.
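The ROI traversal, distance-pool lookup, thresholding, and final sort might be sketched as below. This is an illustrative asymmetric-distance reconstruction under stated assumptions: `S` holds (coarse mark, code-ID list) pairs as in the encoding sketch, and the optional-threshold handling is a design choice, not the patent's.

```python
import numpy as np

def ivpq_search(D, S, q_center, t=None):
    """Traverse only codes whose coarse mark equals the query's center c_q (the ROI),
    look up and sum the P pooled distances D_q(i) = sum_j D[j, n(i,j)], drop any
    sum above the threshold t, and return (sample index, distance) sorted ascending."""
    P = D.shape[0]
    hits = []
    for i, (c_i, code) in enumerate(S):
        if c_i != q_center:                   # outside the ROI: skip entirely
            continue
        dist = sum(D[j, code[j]] for j in range(P))   # table lookups only
        if t is None or dist <= t:
            hits.append((i, float(dist)))
    hits.sort(key=lambda h: h[1])             # nearest training samples first
    return hits
```

Because each candidate costs only P table lookups instead of a full Ω-dimensional distance, the traversal stays cheap even for large N'.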
CN201911417377.6A 2019-12-31 2019-12-31 CBIR method based on improved PQ algorithm Active CN111177435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911417377.6A CN111177435B (en) 2019-12-31 2019-12-31 CBIR method based on improved PQ algorithm


Publications (2)

Publication Number Publication Date
CN111177435A true CN111177435A (en) 2020-05-19
CN111177435B CN111177435B (en) 2023-03-31

Family

ID=70650651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911417377.6A Active CN111177435B (en) 2019-12-31 2019-12-31 CBIR method based on improved PQ algorithm

Country Status (1)

Country Link
CN (1) CN111177435B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150169644A1 (en) * 2013-01-03 2015-06-18 Google Inc. Shape-Gain Sketches for Fast Image Similarity Search
CN106844524A (en) * 2016-12-29 2017-06-13 北京工业大学 A kind of medical image search method converted based on deep learning and Radon
US20180341805A1 (en) * 2015-11-06 2018-11-29 Thomson Licensing Method and Apparatus for Generating Codebooks for Efficient Search
CN110472091A (en) * 2019-08-22 2019-11-19 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BENJAMIN KLEIN ET AL.: "End-to-End Supervised Product Quantization for Image Search and Retrieval" *
GIUSEPPE AMATO ET AL.: "Large-scale instance-level image retrieval" *
MENG ZHAO: "An angle structure description for image retrieval" *
ZHENLUAN XUE ET AL.: "Non-invasive through-skull brain vascular imaging and small tumor diagnosis based on NIR-II emissive lanthanide nanoprobes beyond 1500 nm" *
杜丹蕾: "Research on accumulative product quantization methods for image retrieval" (面向图像检索的累加乘积量化方法研究) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797259A (en) * 2020-07-10 2020-10-20 杭州慕锐科技有限公司 Rapid image searching method for wrinkled deformation
CN114595350A (en) * 2021-12-08 2022-06-07 拓尔思信息技术股份有限公司 Method for quickly searching billion-level images
CN114595350B (en) * 2021-12-08 2024-04-26 拓尔思信息技术股份有限公司 Billion-level image quick searching method
CN115599791A (en) * 2022-11-15 2023-01-13 以萨技术股份有限公司(Cn) Milvus database parameter determination method, device and storage medium
CN115599791B (en) * 2022-11-15 2023-03-10 以萨技术股份有限公司 Milvus database parameter determination method, device and storage medium
CN116595233A (en) * 2023-06-02 2023-08-15 上海爱可生信息技术股份有限公司 Vector database retrieval processing acceleration method and system based on NPU

Also Published As

Publication number Publication date
CN111177435B (en) 2023-03-31

Similar Documents

Publication Publication Date Title
CN111177435B (en) CBIR method based on improved PQ algorithm
Wu et al. Multiscale quantization for fast similarity search
Norouzi et al. Fast exact search in hamming space with multi-index hashing
Zhou et al. Scalar quantization for large scale image search
US7725484B2 (en) Scalable object recognition using hierarchical quantization with a vocabulary tree
Song et al. Robust hashing with local models for approximate similarity search
CN108280187B (en) Hierarchical image retrieval method based on depth features of convolutional neural network
Liao et al. A sample-based hierarchical adaptive K-means clustering method for large-scale video retrieval
CN106033426A (en) A latent semantic min-Hash-based image retrieval method
CN108491430A (en) It is a kind of based on the unsupervised Hash search method clustered to characteristic direction
Ferdowsi et al. Fast content identification in high-dimensional feature spaces using sparse ternary codes
Liu et al. An image-based near-duplicate video retrieval and localization using improved edit distance
Negrel et al. Web-scale image retrieval using compact tensor aggregation of visual descriptors
Wang et al. Statistical quantization for similarity search
CN110046660A (en) A kind of product quantization method based on semi-supervised learning
Zhang et al. Effective image retrieval via multilinear multi-index fusion
Yu et al. Scalable video event retrieval by visual state binary embedding
Liu et al. TOP-SIFT: the selected SIFT descriptor based on dictionary learning
Dong et al. Holons visual representation for image retrieval
Wang et al. Lider: An efficient high-dimensional learned index for large-scale dense passage retrieval
Cheng et al. Comparative study on dimensionality reduction in large-scale image retrieval
Mu et al. Randomized locality sensitive vocabularies for bag-of-features model
Sun et al. Search by detection: Object-level feature for image retrieval
Kulkarni et al. An effective content based video analysis and retrieval using pattern indexing techniques
Zhang et al. Modeling spatial and semantic cues for large-scale near-duplicated image retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant