WO2018103179A1

WO2018103179A1 - Near-duplicate image detection method based on sparse representation

Info

Publication number: WO2018103179A1
Application number: PCT/CN2017/070197
Authority: WO
Inventors: 赵万青; 罗迒哉; 范建平; 彭进业
Original assignee: 西北大学
Priority date: 2016-12-09
Filing date: 2017-01-05
Publication date: 2018-06-14
Also published as: CN106599917A

Abstract

A near-duplicate image detection method based on sparse representation. The method is proposed on the basis of a Hadoop distributed computing framework, and comprises the following steps: acquiring an image set I, wherein sparse encoding results of all images are g'; extracting non-zero elements in g', and hashing the sparse encoding result gi' of the image Ii to groups corresponding to subscripts of the non-zero elements; computing, for each reduce function, a similarity level Y of the sparse encoding results for each pair of images <Iw, Iz>, and if Y is greater than 0.7, then outputting said near-duplicate image pair <Iw, Iz>; and combining near-duplicate image pairs having the image Iw, and generating a near-duplicate image subset. The technical solution of the present invention employs parallel computing to greatly improve computational efficiency of a K-Means clustering algorithm for a large-scale data set, and introduces the sparse representation concept to increase the speed of the method and eliminate excessive computation for solution finding and optimization.

Description

An approximate repeated image detection method based on sparse representation

Technical field

The invention belongs to the field of image approximate repeat detection, and relates to a parallel approximation image repeating detection method based on sparse representation, which can efficiently and accurately extract an approximate repeated image set for a massive image set.

Background technique

With the development of mobile Internet and digital cameras, more and more people are sharing the captured multimedia data on the Internet. Due to the location of the photographer, the objects and angles of the shooting, a large number of similarities appear on the Internet. picture of. By extracting these similar image sets, it is not only possible to de-filter the image retrieval results, but also an important step in many image processing fields such as image clustering, image recognition, image classification and the like.

Usually the approximate repeated image is obtained by some approximate original image transformation by some approximate repeated image. Generally, the transformation of the approximate repeated image can be generated, including translation, scaling, selection, image color change, adding text, format change, resolution change, etc. Wait. Near-repeated image detection refers to finding a near-repeated image of the image in the data set or extracting a subset of all near-repeated images in the data set for a given query image. At present, most of the near-repetitive image detection uses the Bag-of-words and LSH methods to build the system. The Bag-of-words model maps the local features of each image to a visual word frequency histogram vector using the trained dictionary. The image representation model method based on Bag-of-words generally consists of three parts: 1) extracting local features of the image; 2) constructing a visual dictionary by clustering the local features of the image set; 3) mapping the local vector of each image to one Word frequency histogram. LSH (Locality-Sensitive Hashing) is a stochastic method for indexing high-dimensional data. It approximates linear search in high-dimensional data space and returns approximate nearest neighbor data of query data at a certain accuracy. Its basic The idea is to map the input data points to each bucket through a set of hash functions, and ensure that the data points of the neighbors are mapped to the same bucket with a large probability, and the data points that are far apart are mapped to the same with a small probability. In a bucket, such other data points in the bucket where the query data points are located can be regarded as the neighbors of the query data. However, the BOW model is too strict for local features and the efficiency of LSH to change efficiency is often unsatisfactory. In addition, the existing near-repetition image detection system generally uses a single node for calculation. With the explosive growth of data volume, the single-node system is far from meeting the current application needs. Therefore, multi-node parallel computing has become an inevitable choice. Among many distributed frameworks, the HADOOP system is the most stable and efficient.

Summary of the invention

In view of the above problems in the prior art, an object of the present invention is to provide a distributed image approximate repeat set extraction system and method based on sparse representation for a large-scale image set. This method can not only improve the efficiency of processing large-scale image sets, but also can effectively improve the accuracy of detection results compared with traditional methods.

In order to achieve the above tasks, the present invention adopts the following technical solutions:

An approximate repeated image detection method based on sparse representation, which is proposed based on the Hadoop distributed computing framework, the detection method comprising the following steps: acquiring IDF weighted sparse coding g' of all images in image set I, where I=(I ₁ , I ₂ , ..., I _i , ..., I _w , ..., I _z , ..., I _R ), IDF weighted sparse coding of I _i is g _i ', g _i '∈g', i is A natural number greater than or equal to 1, w is a natural number greater than i, z is a natural number greater than w, and R is a natural number greater than z, and the method further includes:

(1) extracting the non-zero element in the IDF-weighted sparse coding g _i ' of the image I _i ; g _ik '∈g _i ', k is a natural number greater than or equal to 1, and the non-zero element in g _i ' is (g _iu ' ,...,g _iv ′), let m be non-zero elements, m be a natural number greater than or equal to 1, m ≤ k, g _iu ′′0, g _iv ′′0, u is a natural number greater than or equal to 1, v is greater than or equal to 1 natural number, k>v>u;

(2) Create k groups, named as:

among them,

Empty matrix

(3) using the matrix transformation of (Formula 1), hashing the IDF weighted sparse coding g _i ' of the image I _i into the m groups corresponding to the subscripts (u, ..., v) of the non-zero elements;

(4) Utilization

Calculating the similarity Y of each pair of images <I _i , I _j > IDF weighted sparse coding in each of the m groups obtained in the step (3), if the Y is greater than 0.7, the images <I _i , I _j > are similar images Correct;

Where j is a natural number greater than or equal to 1, and i ≠ j; g' _i and g' _j respectively represent IDF weighted sparse coding of images I _i and I _j ;

(5) Combining similar image pairs having the same image in the results obtained in the step (4) to generate a similar image subset.

Further, the acquiring the IDF weighted sparse coding g' of all the images in the image set I includes the following steps:

Parallelizing the local features of each image to obtain local features S of all images in image set I;

Extracting the image clustering center to obtain the feature dictionary E;

Calculate the weight of each cluster center in E;

The IDF weighted sparse coding g' of the image is extracted according to each cluster center weight in E.

Further, the parallelizing the extracted image local features is to extract SIFT features of all images of the image set I.

Further, the extracting SIFT features of all images, the specific steps are:

Normalizing and grading the size of each image I _i in the image set I to obtain a standard-sized gray image set; where I=(I ₁ , I ₂ , . . . , I _i , . . . , I _w ,...,I _z ,...,I _R );

The standard size gray image set is segmented to each cluster node, and the SIFT features of each image are extracted in parallel. The SIFT features of all images are represented as S, where Sa _a ∈S, S _a represents a vector, and a is greater than or equal to 1 The natural number, the SIFT feature of the I- _th image is F _i , where F _ib ∈F _i , F _ib represents a vector, and b is a natural number greater than or equal to 1.

Further, the extracting the image clustering center, the specific steps are:

(11) S _a calculated Euclidean distance to the cluster center of A ^d, as the value S _a, S _a to the Euclidean nearest cluster center as a key value; A _dk ∈A ^{d, d} is greater than or equal to 0 Non-negative integer, k is a natural number greater than or equal to 1, A _dk represents a vector; k SIFT features randomly selected from S are used as the initial k cluster centers to form an initial cluster center set A ⁰ , A _0k ∈A ⁰ ;

(21) Find the average value of S _a with the same key value, d=d+1, and take each average value as the new cluster center A ^d ;

(31) Calculate the Euclidean distance mean of the new cluster center A ^d and the cluster center A ^d-1 of the previous cycle, and if the Euclidean distance average value is >0.05, jump back to step (11) to execute; The Euclidean distance mean <0.05, the new cluster center A ^{d is} output as the feature dictionary E of the image set I, where E _k ∈E,k is a natural number greater than or equal to 1, and E _k is a vector, and the feature dictionary E is Cluster the center for images.

Further, the calculating the weight of each cluster center in E, the specific step is: adopting

Calculate the weight of each cluster center in E, where: D is the total number of all SIFT features in S, and D is a natural number greater than or equal to 1.

Represents the total number of all SIFT features attributed to the E _k center.

Further, the extracting the IDF weighted image sparse coding, the specific steps are:

Calculating the Euclidean distance h of each feature vector F _ib and the feature dictionary E in the image I _i , where E _k ∈E, h _k ∈h=(h ₁ , h ₂ ,...,), k is greater than or equal to 1 is a natural number, E is selected from the smallest of the m h _k E _K, wherein the composition dictionary E ', E' = (E f, ..., E g), where E 'has m vectors, f is greater than a natural number equal to 1, g is a natural number greater than or equal to 1, g>f;

Calculating the squared difference matrix C _{ib of} F _ib and the feature dictionary E′ by using C _ib =(E'-1F _ib ^T )(E'-1F _ib ^T ) ^T ;

Calculate the sparse coding c _{ib of} F _ib in the feature dictionary E′ ,

Where h _m ′′h′=(h ₁ ', h ₂ ',...,) is the Euclidean distance of the visual words of the feature vector F _ib to E′ respectively, and diag(h′) represents the element of the vector h′ as The main diagonal of the matrix;

Extracting the maximum value of k c _ib is, sparse coding obtain an image of the image I _i g _i, where g _ik ∈g _i;

according to

The image sparse coding g _{i is} subjected to inverse visual word frequency weighted normalization to obtain IDF weighted sparse coding g'.

The beneficial effects of the invention are:

(1) The present invention runs the KMeans algorithm in the MapReduce parallel mode to learn the complete dictionary features online, so as to extract the global feature structure as much as possible, so that the atomic combination with the most representative ability can be found from the overcomplete dictionary to represent the image features. The online dictionary learning terminates the MapReduce Jop every time and judges the termination according to the set threshold. This parallelization calculation method greatly improves the computational efficiency of the KMeans clustering algorithm for large-scale data sets;

(2) The present invention introduces a sparse representation theory and reconstructs by searching a dictionary set and K codewords that are nearest to the feature, which is reconstructed by using multiple codewords compared to the vector quantization of the original BOW. It can have smaller reconstruction error, and it can achieve local smooth sparsity by selecting local neighbors. At the same time, this method has faster implementation method with traditional sparse representation method and does not require excessive solution optimization process.

(3) The invention weights the image sparse representation vector by statistical global reverse word frequency IDF, so that the representation is stronger, and the weight of the less representative codeword in the sparse coding is reduced, and the representation is higher. The weight of the codeword makes it more sparse, so that the high-characterization features of similar images have higher probability of being the same or similar;

(4) The present invention hashes to the candidate similar set by extracting the high-characteristic features in the sparse representation, and simultaneously performs matching by calculating the similarity of the two Jaccard indexes in each hash bucket, which can obtain similar image pairs simply and efficiently. At the same time, this method can make full use of the characteristics of sparse representation and the Mapreduce model for parallel computing, greatly improving the matching efficiency and accuracy of large-scale image data sets;

(5) The HADOOP-based distributed storage system HDFS and the parallel computing model MapReduce distributed image near-repetitive detection system, the existing image near-repetition detection system generally supports only a single-node detection framework, but with the mobile Internet Development, image data grows exponentially, in the past The system simply cannot satisfy the storage and operations on such a large amount of data. The invention not only has high scalability for massive data, but also greatly improves the calculation efficiency of near-repetition detection.

DRAWINGS

1 is a schematic structural diagram of an approximate repeated image detecting method based on a sparse representation according to the present invention;

2 is a flowchart of implementing a MapReduce-based parallel KMeans algorithm according to the present invention;

3 is a flowchart of implementing an image sparse coding algorithm based on local priority search according to the present invention;

4 is a flowchart of implementing a parallel similar set detection algorithm based on MapReduce and significant dimensional features according to the present invention;

FIG. 5 is a clustering result obtained by different methods for five keywords in Embodiment 2; FIG.

Fig. 6 is a clustering result of the second embodiment.

detailed description

The following is a specific embodiment of the present invention. It should be noted that the present invention is not limited to the following specific embodiments, and equivalent transformations made on the basis of the technical solutions of the present application fall within the protection scope of the present invention.

Example 1

The method is based on the Hadoop distributed computing framework. In Fig. 1, an approximate repeated image detection method based on sparse representation includes a feature extraction module, a feature dictionary construction module, an image sparse coding representation module, a similar image matching filtering module, and a near Repeat the image pair module. The feature extraction module is configured to parallelize all original local image features in the image collection; the feature dictionary construction module is configured to construct the original feature dictionary set by using the parallel K-Means online dictionary learning algorithm for the image features extracted by the feature extraction module. The image sparse coding representation module is used to map the original local features of each image into a sparse vector to represent each image by using the constructed dictionary set and the sparse coding method; the similarity map matching filtering module is used for parallel computing. The similarity of the filtered image pairs and the output similarity is greater than An image pair of a certain threshold; the near-repeated image pairing module is used to merge all similar image pairs output by the similarity map matching filter module into a near-repeated image set.

The detection method includes the following steps:

Step 1, the image set I, where I = (I ₁ , I ₂ , ..., I _i , ..., I _w , ..., I _z , ..., I _R ) the size of each image I _i Standardization, for example, set to standardization (128*128, 256*256 or 512*512) size, the image selected in the present invention is uniformly standardized to a size of 256*256, and then grayscale processing is performed to obtain a standard-sized grayscale image set. Use the custom imageWritable to serialize the image, output the binary data stream of the image, and compress all the images in the form of key-value pairs <key: image ID, value: imageWritable> in a custom ImageBundle, and finally upload to the distributed Storage framework HDFS;

Step 2: Download the ImageBundle file from HDFS. Hadoop will split the ImageBundle file into the Map function of different nodes according to the number of nodes in the cluster. The Map function obtains the key-value pair form of the image, and extracts each image set based on the public SIFT algorithm. A plurality of SIFT feature vectors of an image, a sequence of SIFT feature vectors constituting an image, and SIFT features of all images are represented as S, where S _a ∈S, S _a represents a vector, a is a natural number greater than or equal to 1, and the _{i i} image The SIFT feature is F _i , where F _ib ∈F _i , F _ib represents a vector, b is a natural number greater than or equal to 1, and each image in the image set is stored as a <key:image ID,value:Fi> key-value pair On HDFS;

Step 3, to calculate Euclidean distance S _a cluster center of A ^d, as the value S _a, S _a to the Euclidean nearest cluster center as a key value A _q; A _dk ∈A ^{d, d} is greater than or equal A non-negative integer of 0, k is a natural number greater than or equal to 1, and A _dk represents a vector; k SIFT features randomly selected from S are used as the initial k cluster centers to form an initial cluster center set A ⁰ , A _0k ∈ A ⁰ ; output as a key value pair <key:A _q , value:S _a >.

In step 4, the average value of S _a with the same key value is obtained, d=d+1, and each average value is taken as a new cluster center A ^d ;

Step 5: Calculate the Euclidean distance mean of the new cluster center A ^d and the cluster center A ^d-1 of the previous cycle. If the Euclidean distance average is >0.05, jump back to step 3; if the Euclidean The distance average value <0.05, the new cluster center A ^{d is} output as the feature dictionary E of the image set I, where E _k ∈E, k is a natural number greater than or equal to 1, E _k is a vector, and the feature dictionary E is an image. Cluster center.

Step 6, adopt

Computation of the E of each cluster center weighting, wherein: D is the total number of SIFT features of the S, D is a natural number 1, if S _a Euclidean distance to E _K minimum, i.e. S _a attributable to E _K Center,

Represents the total number of all SIFT features attributed to the E _k center.

Step 8, respectively calculating the Euclidean distance h of each feature vector F _ib and the feature dictionary E in the image I _i , where E _k ∈E, h _k ∈h=(h ₁ , h ₂ ,...,), k For a natural number greater than or equal to 1, the m E _k with the smallest h _k are selected from E to form a feature dictionary E′, E′=(E _f ,..., E _g ), where there are m vectors in E′. f is a natural number greater than or equal to 1, g is a natural number greater than or equal to 1, g>f;

Step 9, calculating the sparse coding c _{ib of} F _ib in the feature dictionary E′,

according to

The image sparse coding g _{i is} subjected to inverse visual word frequency weighted normalization to obtain IDF weighted sparse coding g', where g _i '∈g'.

Step 10, extracting a non-zero element in the IDF weighted sparse coding g _i ' of the image I _i ; g _{ik ∈ ∈} g _i ', k is a natural number greater than or equal to 1, and a non-zero element in g _i ' is (g _iu ′ ,...,g _iv '), let m be a non-zero element, m be a natural number greater than or equal to 1, m ≤ k, g _iu ′ ≠ 0, g _iv ′ ≠ 0, u is a natural number greater than or equal to 1, v is greater than or equal to 1 natural number, k>v>u;

Step 11, create k groups, named as:

among them,

Empty matrix

Step 12, using the matrix transformation of (Formula 1), hashing the IDF weighted sparse coding g _i ' of the image I _i into the m groups corresponding to the subscripts (u, . . . , v) of the non-zero elements;

Step 13, using

Step 14, combining the similar image pairs having the same image in the results obtained in step 13 to generate a similarity A subset of images.

Example 2

In this embodiment, a Baidu image search engine is used to retrieve images of 30 famous scenic spots or buildings (for example, Egypt Eiffel Tower, White House, Big Wild Goose Pagoda, etc.), and manually select clear and accurate images from each type. 100 images, consisting of 3,000 near-repeated images, and randomly selecting 17,000 photos from the public dataset Flickr-100M (source: http://webscope.sandbox.yahoo.com) as interference items, together with near-repetition image composition experiments Image dataset with a total of 20,000 images.

The Hadoop 2.4.0 version was selected as the experimental platform, and a total of 10 node computers constitute the Hadoop cluster of this embodiment. Since Hadoop does not support the reading and processing of image data itself, we have customized two types based on Hadoop's Java open source framework: ImageBundle and ImageWritable. Similar to Hadoop's own SequenceFile, ImageBundle combines a large number of image format files into one large file and serializes them in HDFS in the form of a fixed key-value pair <key:imageID,value:imageWritable>. Custom ImageWritable inherits from Hadoop's Writable for encoding and decoding ImageBundles. Inheritance and Writable's two key functions, encoder() and decoder(), are used to decode the image's binary file into a key-value pair form and encode the key-value pair into a binary format. The following are the experimental steps:

1. Normally scale and perform gradation processing on 20,000 images. In this embodiment, the image is standardized to have a size of 256*256.

2. Use ImageBundle for 20,000 images to encode the image as a key-value pair and store it in the ImageBundle file and upload it to HDFS.

3. Perform parallel feature extraction on the ImageBundle file uploaded in step 2 based on Mapreduce, and store it in the SequenceFile file in the form of key value pair <key: image ID, value: image feature>. Pass to HDFS. The image extracted in this embodiment is a SIFT feature, and the SIFT feature is a local feature of the image, and each feature includes an indefinite number of features, and each SIFT feature is 128-dimensional.

4. Based on the MapReduce-Kmeans algorithm described in Embodiment 1, the image feature SequenceFile file generated in step 3 is parallelized by Kmeasn clustering. In this embodiment, the clustering center K=512, the loop termination condition is 0.01, and the loop ends when the Euclidean distance mean of the cluster center generated by the two loops is less than 0.01. This step will generate 512 cluster centers as visual dictionaries, each cluster center being a 128-dimensional vector as a visual word.

5. Perform an IDF weight evaluation for each visual word in the visual dictionary generated in step 4 based on step 7 of embodiment 1.

6. Perform imageReduce parallelization sparse coding and similarity calculation from the image feature SequenceFile file generated in step 3 of the HDFS download, wherein the Map function uses the steps 8 and 9 described in Embodiment 1 to sparsely encode the image set and follow the steps of Embodiment 1. 10 is hashed to the Reduce function, and the Reduce function receives the sparse coding sent from the Map function, performs the similarity calculation according to the method described in Step 11 of Embodiment 1, and outputs a similar pair greater than the threshold. In this embodiment, the sparsity degree L=10 is used, and the sparse coding of each image is 512 dimensions, and the similarity threshold is 0.7.

7. Combine the similar image pairs generated in step 6 of this embodiment, and finally output a near-repetitive similar image set.

The experimental results on the test data show that our algorithm can achieve a precision of 0.9 when the recall is 0.86, and the total time is 3.24kilosecond. Among them, Fig. 5 shows nine images randomly extracted from clustering results obtained from different methods of five keywords (Flower, Iphone, Colosseum, Elephant, Cosmopolitan), and F value is F1-measure index. The comparison methods are Partition min-Hash algorithm (PmH), Geometric min-Hash algorithm (GmH), min-hash method (mH), standard LSH algorithm (st.LSH) and Bag-of-Visual-Words-based tree. Search algorithm (baseline). Experimental Results Figure 6 shows the results of clustering 17,000 Flickr photos by this method. Cluster size is the number of photos of the corresponding cluster set.

Claims

An approximate repeated image detection method based on sparse representation, which is proposed based on the Hadoop distributed computing framework, the detection method comprising the following steps: acquiring IDF weighted sparse coding g' of all images in image set I, where I=(I 1 , I 2 , ..., I i , ..., I w , ..., I z , ..., I R ), IDF weighted sparse coding of I i is g i ', g i '∈g', i is A natural number greater than or equal to 1, w is a natural number greater than i, z is a natural number greater than w, and R is a natural number greater than z, and the method further includes:

(1) extracting the non-zero element in the IDF-weighted sparse coding g i ' of the image I i ; g ik '∈g i ', k is a natural number greater than or equal to 1, and the non-zero element in g i ' is (g iu ' ,...,g iv ′), let m be non-zero elements, m be a natural number greater than or equal to 1, m ≤ k, g iu ′′0, g iv ′′0, u is a natural number greater than or equal to 1, v is greater than or equal to 1 natural number, k>v>u;

(2) Create k groups, named as:
among them,
Empty matrix

(3) using the matrix transformation of (Formula 1), hashing the IDF weighted sparse coding g i ' of the image I i into the m groups corresponding to the subscripts (u, ..., v) of the non-zero elements;

(4) Utilization
Calculating the similarity Y of each pair of images <I i , I j > IDF weighted sparse coding in each of the m groups obtained in the step (3), if the Y is greater than 0.7, the images <I i , I j > are similar images Correct;

Where j is a natural number greater than or equal to 1, and i ≠ j; g' i and g' j respectively represent IDF weighted sparse coding of images I i and I j ;

(5) Combining similar image pairs having the same image in the results obtained in the step (4) to generate a similar image subset.
The method for detecting an approximate repeated image based on a sparse representation according to claim 1, wherein the acquiring the IDF weighted sparse code g' of all images in the image set I comprises the following steps:

Parallelizing the local features of each image to obtain local features S of all images in image set I;

Extracting the image clustering center to obtain the feature dictionary E;

Calculate the weight of each cluster center in E;

The IDF weighted sparse coding g' of the image is extracted according to each cluster center weight in E.
The approximate repeated image detecting method based on sparse representation according to claim 2, wherein the parallelizing the extracted image local features is to extract SIFT features of all images of the image set I.
The method for detecting an approximate repeated image based on a sparse representation according to claim 3, wherein the extracting SIFT features of all images is as follows:

Normalizing and grading the size of each image I i in the image set I to obtain a standard-sized gray image set; where I=(I 1 , I 2 , . . . , I i , . . . , I w ,...,I z ,...,I R );

The standard-sized grayscale image set is segmented to each cluster node, and the SIFT features of each image are extracted in parallel. The SIFT features of all images are represented as S, where Sa a ∈S, S a represents a vector, and a is greater than or equal to 1 The natural number, the SIFT feature of the I- th image is F i , where F ib ∈F i , F ib represents a vector, and b is a natural number greater than or equal to 1.
The method for detecting an approximate repeated image based on a sparse representation according to claim 2, wherein the extracting the image clustering center is as follows:

(11) S a calculated Euclidean distance to the cluster center of A d, as the value S a, S a to the Euclidean nearest cluster center as a key value; A dk ∈A d, d is greater than or equal to 0 Non-negative integer, k is a natural number greater than or equal to 1, A dk represents a vector; k SIFT features randomly selected from S are used as the initial k cluster centers to form an initial cluster center set A 0 , A 0k ∈A 0 ;

(21) Find the average value of S a with the same key value, d=d+1, and take each average value as the new cluster center A d ;

(31) Calculate the Euclidean distance mean of the new cluster center A d and the cluster center A d-1 of the previous cycle, and if the Euclidean distance average value is >0.05, jump back to step (11) to execute; The Euclidean distance mean <0.05, the new cluster center A d is output as the feature dictionary E of the image set I, where E k ∈E,k is a natural number greater than or equal to 1, and E k is a vector, and the feature dictionary E is Cluster the center for images.
The method for detecting an approximate repeated image based on a sparse representation according to claim 2, wherein the calculating the weight of each cluster center in E is as follows:
Calculate the weight of each cluster center in E, where: D is the total number of all SIFT features in S, and D is a natural number greater than or equal to 1.
Represents the total number of all SIFT features attributed to the E k center.
The method for detecting an approximate repeated image based on a sparse representation according to claim 2, wherein the extracting the IDF-weighted image is sparsely encoded, and the specific steps are:

Calculating the Euclidean distance h of each feature vector F ib and the feature dictionary E in the image I i , where E k ∈E, h k ∈h=(h 1 , h 2 ,...,), k is greater than or equal to 1 is a natural number, E is selected from the smallest of the m h k E K, wherein the composition dictionary E ', E' = (E f, ..., E g), where E 'has m vectors, f is greater than a natural number equal to 1, g is a natural number greater than or equal to 1, g>f;

Calculating the squared difference matrix C ib of F ib and the feature dictionary E′ by using C ib =(E'-1F ib T )(E'-1F ib T ) T ;

Calculate the sparse coding c ib of F ib in the feature dictionary E′ ,

Where h m ′′h′=(h 1 ', h 2 ',...,) is the Euclidean distance of the visual words of the feature vector F ib to E′ respectively, and diag(h′) represents the element of the vector h′ as The main diagonal of the matrix;

Extracting the maximum value of k c ib is, sparse coding obtain an image of the image I i g i, where g ik ∈g i;

according to
The image sparse coding g i is subjected to inverse visual word frequency weighted normalization to obtain IDF weighted sparse coding g'.