CN102402621A - Image retrieval method based on image classification - Google Patents


Info

Publication number
CN102402621A
CN102402621A CN2011104434345A CN201110443434A
Authority
CN
China
Prior art keywords
image
retrieval
classifier
feature
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011104434345A
Other languages
Chinese (zh)
Inventor
潘志庚
张明敏
张辉
李文庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN2011104434345A priority Critical patent/CN102402621A/en
Publication of CN102402621A publication Critical patent/CN102402621A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an image retrieval method based on image classification and aims to solve the problem of low retrieval speed in conventional methods. The method comprises the following steps: firstly, determining the number of image categories in the image classification and a training image set; secondly, extracting the content features of the training image set and training a classifier to obtain the classifier; thirdly, inputting an image to be retrieved, extracting its content features as the input of the classifier to obtain the retrieval image set corresponding to its category, and extracting the content features of each image in that retrieval image set; and finally, according to the obtained content features, computing the similarity distance between the image to be retrieved and each image in the retrieval image set with a similarity calculation algorithm, sorting the distances, obtaining the N images closest to the image to be retrieved, and outputting them. The method has the advantage that, by combining image classification technology with the conventional content-based image retrieval method, the image retrieval speed is greatly improved.

Description

Image retrieval method based on image classification
Technical Field
The invention belongs to the field of multimedia information retrieval and particularly relates to an image retrieval method based on image classification. It touches on image processing, computer vision, machine learning and image retrieval, and can be used directly for content-based image retrieval in a Web environment or on a single computer.
Background
With the development of the internet, multimedia information, especially image information, is being generated and spread at an explosive speed. Faced with such huge image collections, mining the images that meet a user's needs becomes a very difficult problem, so image retrieval has become a hotspot of research and engineering practice in recent years. At present, there are two main approaches to image retrieval: text-based image retrieval and content-based image retrieval.
Current mainstream image search engines adopt text-based image retrieval, which is text retrieval in the traditional sense: indexes are built over the text information associated with network images, such as image titles, link texts and content descriptions, to label the images, and retrieval is finally carried out by keyword matching. However, because of semantic ambiguity and the ambiguity of descriptions, the accuracy of text-based image retrieval is rather limited, and the returned results often do not match the user's expectations.
Content-based image retrieval introduces techniques from computer vision to perform mathematical modeling and solving at the image pixel level, so that the content features of an image serve as its identification. Regarding content feature extraction, image retrieval has undergone a transition from feature extraction based on the whole image to feature extraction based on regions (or objects), which makes the extracted features more robust: the features remain unchanged when the image is rotated or translated, and remain relatively invariant under changes of illumination. Content-based image classification is the result of fusing computer vision with pattern classification: after image content features are extracted, a pattern classification method is used to learn and classify the images automatically. In the last decade classification technology has improved greatly, producing support vector machine methods based on statistical probability theory and AdaBoost methods built on simple base classifiers, whose classification accuracy is much higher than that of earlier methods such as decision trees and neural networks. However, the original content-based image retrieval is difficult to apply to large-scale image library systems because of its heavy computation, so it is rarely used on the internet, while image classification by itself only determines which class an image belongs to and performs no retrieval within that class.
Disclosure of Invention
The invention aims to provide an image retrieval method based on image classification, which integrates image classification technology into image retrieval and solves the problem of slow retrieval speed in current content-based image retrieval.
The invention provides an image retrieval method based on image classification, which comprises the following steps:
A, determining the number Cm of image categories in the image classification, each detailed category (C1, C2, …, Cm) and the training image sets (P1, P2, …, Pm) corresponding to each category;
B, extracting the content features of the training image set and training a classifier to obtain a classifier C;
C, inputting an image M to be retrieved and extracting its content features as the input of the classifier C, the output result Co of the classifier C being the category of M;
D, obtaining the retrieval image set IPo corresponding to the category Co, and extracting the content features of each image in the retrieval image set IPo;
E, obtaining the similarity distance between the image M to be retrieved and each image in the retrieval image set IPo with a similarity calculation algorithm, according to the content features obtained in steps C and D; sorting the distances; and finally obtaining and outputting the N images closest to the image M to be retrieved.
Preferably, the image content features extracted in step B are SIFT features, and the specific steps include:
B11, extracting SIFT feature points of each image in the training image set to form a feature point library, wherein each SIFT feature point is represented by a 128-dimensional feature point descriptor vector;
B12, clustering the feature point library by adopting a K-means clustering method to obtain K classes;
B13, for a single image in the training image library, mapping all SIFT feature points of the single image to K classes respectively according to a minimum distance principle to obtain a K-dimensional feature vector V1, and dividing each value in the feature vector V1 by the total number of the feature points of the single image to obtain a frequency feature vector V2, wherein V2 is the SIFT feature of the single image.
Preferably, the classifier used in the step B is a support vector machine, and may be developed by using an open-source libsvm library, and the specific step of training the classifier includes:
B21, selecting RBF as the kernel function of the support vector machine;
B22, combining the feature vectors in the training image library with the category information of each image as the input of the support vector machine;
B23, determining the RBF kernel function parameters that give the highest classification accuracy by a cross-validation method, and obtaining the support vector machine classifier C after the parameters are determined.
Preferably, the specific step of extracting the content feature of the image M to be retrieved in step C includes:
C1, extracting the SIFT feature points of the image M to be retrieved, wherein each SIFT feature point is represented by a 128-dimensional feature point descriptor vector;
C2, mapping all SIFT feature points of the image M to be retrieved to the K classes of step B12 according to the minimum-distance principle to obtain a K-dimensional feature vector V1, and dividing each value in V1 by the total number of feature points of the image to obtain a frequency feature vector Vm, wherein Vm is the SIFT feature of M.
Preferably, the similarity calculation algorithm adopted in step E is the Euclidean distance method, and the similarity measurement criterion is that the smaller the Euclidean distance, the more similar the images.
The invention has the following beneficial effects: image classification technology is integrated into the traditional content-based image retrieval method, so the speed of image retrieval is greatly improved. For the content features of images, SIFT features are adopted, which remain invariant to image scaling, brightness change and rotation, making the algorithm more robust.
Drawings
In order to more clearly illustrate the embodiment of the present invention or the technical solutions in the prior art, the drawings used in the embodiment or the technical solutions in the prior art description will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a flow diagram of the overall architecture of the present invention.
Fig. 2 is a process of constructing a gaussian difference scale space.
FIG. 3 is a schematic diagram of finding extreme points in scale space.
FIG. 4 is a diagram of feature vector generation by keypoint neighborhood gradient information.
Detailed Description
The present invention aims to provide an image retrieval method based on image classification, which integrates image classification technology into image retrieval and overcomes the limitation of slow retrieval speed in current content-based image retrieval. To make the purpose, technical scheme and advantages of the present invention clearer, the implementation of the present invention is explained in detail below with reference to the accompanying drawings.
Before describing the detailed flow of the present invention, the basic knowledge it requires is briefly introduced:
A: SIFT content feature extraction of images
The SIFT content feature extraction of the image comprises two basic algorithms: extracting SIFT feature point descriptors and constructing image SIFT content features.
A1 extraction of SIFT feature point descriptor
The SIFT algorithm was proposed by D. G. Lowe in 1999 and refined in 2004. It is an algorithm for extracting local features of an image and has the following characteristics: a) SIFT features are local features of the image; they remain invariant to image scaling, brightness change and rotation, and also keep a certain degree of stability under affine transformation, viewpoint change and noise. b) The features are rich in information and highly distinctive, making them suitable for fast and accurate matching in massive feature databases. c) They are numerous: even a small number of images can produce a large number of distinct SIFT feature point descriptors. d) They are extensible and can conveniently be fused with feature vectors of other forms.
The SIFT feature point descriptor extraction algorithm comprises the following steps: 1) detecting extreme points in scale space; 2) accurately locating the extreme points; 3) assigning direction parameters to each keypoint; 4) generating the keypoint descriptors. Each step is explained in detail below.
A11 Generation of a Scale space
The scale-space theory aims to model the multi-scale properties of image data.
The only linear kernel that can realize scale transformation is the Gaussian convolution kernel. The scale space of a two-dimensional image I(x, y) is defined as:

L(x, y, σ) = G(x, y, σ) * I(x, y)

where G(x, y, σ) is a scale-variable Gaussian function,

G(x, y, σ) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²)),

(x, y) are the spatial coordinates and σ is the scale coordinate.
In order to efficiently detect stable keypoints in scale space, the concept of the difference-of-Gaussians scale space (DOG scale space) is introduced. The DOG scale space is generated by convolving the image with difference-of-Gaussians kernels at different scales:

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ)
As shown in fig. 2, the original scale-space model of the image is shown on the left side, and the gaussian difference space-scale model is shown on the right side.
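As a brief illustration of the D(x, y, σ) formula above, the following Python sketch builds one octave of a difference-of-Gaussians stack with OpenCV. It is not the patented implementation; the initial scale sigma0, the factor k and the number of scales are assumed values chosen only for illustration.

import cv2
import numpy as np

def dog_stack(gray_image, sigma0=1.6, k=2 ** 0.5, num_scales=5):
    """Build one octave of a difference-of-Gaussians stack:
    D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    img = gray_image.astype(np.float32)
    # Gaussian-blurred images L(x, y, sigma) at increasing scales sigma0 * k^i.
    blurred = [cv2.GaussianBlur(img, (0, 0), sigma0 * (k ** i)) for i in range(num_scales)]
    # Subtracting adjacent blurred images gives the DOG layers.
    return [blurred[i + 1] - blurred[i] for i in range(num_scales - 1)]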
A12 detection of spatial extreme points
In order to find the extreme point in the scale space, the sampling point needs to be compared with all its neighboring points to determine whether it is larger or smaller than all its neighboring points in the image domain and the scale domain. As shown in fig. 3, the middle detection point is compared with 26 points, which are 8 adjacent points of the same scale and 9 x 2 points of the upper and lower adjacent scales, to ensure that the extreme points are detected in both scale space and two-dimensional image space.
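A sketch of this 26-point comparison is given below, assuming the dog stack from the previous sketch and a sample point that is not on the image or scale border.

import numpy as np

def is_scale_space_extremum(dog, s, y, x):
    """Return True if the value at (x, y) in DOG layer s is larger or smaller than
    all 26 neighbours: 8 in the same layer and 9 in each adjacent layer."""
    value = dog[s][y, x]
    cube = np.stack([layer[y - 1:y + 2, x - 1:x + 2]
                     for layer in (dog[s - 1], dog[s], dog[s + 1])])
    return value >= cube.max() or value <= cube.min()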
In order to accurately determine the position of the extreme point, the position and the scale of the key point are accurately determined by fitting a three-dimensional quadratic function, and meanwhile, the key point with low contrast and the unstable edge response point are removed, so that the matching stability is enhanced, and the anti-noise capability is improved.
A13 Key Point Direction Allocation
In order to enable the operator to have rotation invariance, a direction parameter is assigned to each key point by using the gradient direction distribution characteristic of the pixels in the neighborhood of the key point.
The gradient magnitude and direction at (x, y) are computed as:

m(x, y) = sqrt((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²)

θ(x, y) = arctan((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))

where the scale used for L is the scale at which each keypoint is located. In actual calculation, sampling is carried out in a neighborhood window centered on the keypoint, and the gradient directions of the neighborhood pixels are accumulated in a histogram. The gradient histogram ranges from 0 to 360 degrees, with one bin every 10 degrees, for a total of 36 bins. The peak of the histogram represents the main direction of the neighborhood gradients at the keypoint and is taken as the direction of the keypoint. When another peak in the gradient direction histogram reaches 80% of the energy of the main peak, that direction is regarded as a secondary direction of the keypoint. A keypoint may therefore be assigned multiple directions (one main direction and possibly several secondary directions), which enhances the robustness of matching. At this point the detection of the image keypoints is complete, and a SIFT feature region can be determined for each keypoint.
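A minimal sketch of this 36-bin orientation histogram follows; the Gaussian weighting of the window is omitted for brevity, and the 80% threshold follows the description above.

import numpy as np

def keypoint_orientations(dx, dy, peak_ratio=0.8):
    """dx, dy: gradient components sampled in the neighbourhood window of a keypoint.
    Build the 36-bin (10 degrees per bin) gradient direction histogram and return the
    main direction plus any secondary direction reaching peak_ratio of the main peak."""
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    hist, _ = np.histogram(angle, bins=36, range=(0.0, 360.0), weights=magnitude)
    main_peak = hist.max()
    return [(b + 0.5) * 10.0 for b, v in enumerate(hist) if v >= peak_ratio * main_peak]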
A14 feature Point descriptor Generation
To ensure rotational invariance, the coordinate axes are first rotated to the direction of the keypoint. Next, an 8 × 8 window is taken centered on the keypoint. The central black point in the left part of FIG. 4 is the position of the current keypoint; each square in the window represents a pixel in the scale space containing the keypoint neighborhood; the length of each arrow represents the gradient magnitude and its direction the gradient direction of that pixel; and the blue circle represents the range of Gaussian weighting (pixels closer to the keypoint contribute more to the direction information). Then, a histogram of gradient directions over 8 directions is computed on each 4 × 4 sub-block to obtain the accumulated value of each gradient direction, forming one seed point, as shown in the right part of the figure. In this figure, a keypoint is described by 2 × 2 = 4 seed points, each carrying 8 direction vector values. This idea of combining neighborhood directional information enhances the noise resistance of the algorithm and also provides better fault tolerance for feature matching in the presence of localization errors.
In the actual calculation process, in order to enhance the robustness of matching, Lowe proposed using 4 × 4 = 16 seed points for each keypoint, so that 128 values are generated for one keypoint, finally forming a 128-dimensional SIFT feature vector. At this point the SIFT feature vector has removed the influence of geometric deformation factors such as scale change and rotation; normalizing the length of the feature vector then further removes the influence of illumination change.
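In practice, the whole descriptor pipeline described in sections A11 to A14 is available off the shelf. The following sketch merely illustrates obtaining the 128-dimensional descriptors with OpenCV, assuming an OpenCV build that provides SIFT; the file name is hypothetical.

import cv2

image = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)
# descriptors has shape (number_of_keypoints, 128): one 128-dimensional
# feature point descriptor vector per detected keypoint.
print(descriptors.shape)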
A2 Construction of image SIFT content features
After the SIFT feature point descriptor set of each image is obtained by the above method, a further step is needed, because the number of SIFT feature points differs from image to image and the raw descriptors therefore cannot be fed directly into a general classification method: a method called the bag of words is used to obtain a fixed-length feature quantity representing each image. The bag-of-words idea is as follows: each feature point descriptor is regarded as a word, so that the set of all feature points of the image set forms a word bank; a clustering method is used to divide all words of the word bank into N classes; then, for a single image in the image set, the distribution of all its feature points over the N classes is counted to obtain an N-dimensional frequency distribution vector, and this vector is the SIFT content feature of the image.
B: support vector machine classifier
An important topic in data mining is data classification. Data classification means that a classifier is trained from existing training data according to some classification principle, and the classifier is then used to determine the class of unclassified data. The support vector machine is a classification-boundary-based method. Its basic principle, taking two-dimensional data as an example, is as follows: if the training data are abstracted as points distributed on a two-dimensional plane, they gather in different regions of the plane according to their categories, and the purpose of a classification-boundary-based algorithm is to find the boundaries between these regions through training. Multidimensional data (say, N-dimensional) can be regarded as points in an N-dimensional space, and the classification boundaries are then hypersurfaces in that space: a linear classifier uses a hyperplane as its boundary, while a nonlinear classifier uses a hypersurface.
B1 basic principle of support vector machine classification
The support vector machine is based on linear partitioning, but not all data can be linearly partitioned: two classes of points in a two-dimensional space, for example, may require a curve as their boundary. To make such data linearly separable, the support vector machine maps the points of the low-dimensional space into a high-dimensional space. What is a linear partition in the high-dimensional space is a nonlinear partition in the original data space, and the classification boundary is still found by a linear partitioning method. When discussing the specific algorithms of support vector machines, we therefore consider only the optimization problem (finding the optimal solution of a certain objective), because the mapping from the low-dimensional space to the high-dimensional space is already encapsulated in the kernel function.
B2 optimization problem
If we represent a problem as a function f(x), the optimization problem is to find the minimum or maximum of this function. In calculus, if the function is continuously differentiable, its extreme values can be found by taking the derivative and solving for the points where the derivative is zero. In practice, however, the function is not necessarily continuously differentiable; optimization theory also covers this situation.
The optimization problem can be divided into two categories: (1) an unconstrained optimization problem; (2) there is a constrained optimization problem.
The unconstrained optimization problem can be expressed as:

min f(x), x ∈ R^n.

An approximate optimal solution can be obtained through repeated iterations of numerical methods such as Newton's method or the steepest (gradient) descent method.
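As a tiny illustration of such an iterative numerical method, the sketch below applies steepest descent to the unconstrained problem of minimizing f(x) = (x − 3)², whose minimum is obviously at x = 3; the step size and iteration count are arbitrary choices for illustration only.

def gradient(x):
    # derivative of f(x) = (x - 3)^2
    return 2.0 * (x - 3.0)

x, step = 0.0, 0.1
for _ in range(100):        # repeated cycles move x toward the minimizer
    x -= step * gradient(x)
print(x)                    # approximately 3.0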
The constrained optimization problem is generally expressed as:

min f(x) subject to g_i(x) ≤ 0, i = 1, …, m (and, where present, equality constraints h_j(x) = 0).
B3 Linearly separable binary classification problem
The linearly separable binary classification problem means that the raw data can be divided by a straight line (if the data are two-dimensional) or by a hyperplane. There are three basic approaches to separating data into two classes with a hyperplane in a multidimensional space:
(1) Nearest-point bisection method: the perpendicular bisector of the line connecting the closest points of the two classes is used as the classification line (plane).
(2) Maximum margin method: a classification surface is computed so that the margin between the classification boundaries is maximized. The classification boundaries are obtained by translating the classification surface toward the points of each of the two classes until the first data point is encountered; the distance between the classification boundaries of the two classes is the classification margin.
The classification surface is expressed as:

w · x + b = 0,

where w is a multi-dimensional vector. The classification margin is 2/||w||; minimizing its reciprocal (equivalently, minimizing (1/2)||w||²) maximizes the margin, so the optimization problem is expressed as:

min over (w, b) of (1/2)||w||²,

with the constraint that each data point (x_i, y_i) has a distance to the classification surface greater than or equal to 1, i.e. y_i(w · x_i + b) ≥ 1, where y_i is the class label of the data.
(3) Linear support vector classifier: the classification surface is

w · x + b = 0,

with the requirement

y_i(w · x_i + b) ≥ 1, i = 1, …, l.

On this basis, the optimal solution (w, b) is found by solving:

min over (w, b) of (1/2)||w||², subject to y_i(w · x_i + b) ≥ 1, i = 1, …, l.
description of the drawings: the linear support vector machine is based on the maximum spacing method. The problem is a quadratic programming problem, the lagrangian function is used for combining the optimization problem and the constraint, and then the dual theory is used for obtaining the classification optimization problem. It should be noted that this problem is still a constrained optimization problem.
B4 Linearly inseparable problem
(1) Linear soft-margin classifier
The basic idea is as follows: because the samples are not linearly separable, the original requirement on the margin cannot be satisfied. Slack variables ξ_i ≥ 0 are introduced, and the constraint is relaxed to:

y_i(w · x_i + b) ≥ 1 − ξ_i.

However, we still want the slack variables ξ_i to be as small as possible (when all ξ_i = 0, the model reduces to the original linear hard-margin classifier), so a penalty parameter C is introduced into the objective function to drive ξ_i toward zero. The model of the classifier is then: the classification surface is w · x + b = 0, obtained by solving

min over (w, b, ξ) of (1/2)||w||² + C Σ_i ξ_i, subject to y_i(w · x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, …, l.

Taking this as the primal problem, the dual problem is:

max over α of Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j), subject to Σ_i α_i y_i = 0 and 0 ≤ α_i ≤ C, i = 1, …, l.
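A brief sketch of a linear soft-margin classifier using scikit-learn's SVC (which wraps libsvm) follows: the penalty parameter C controls how strongly the slack variables are penalized. The data below are made up purely for illustration.

import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [1.5, 0.2],      # class -1 (with some overlap)
              [2.0, 2.5], [3.0, 3.0], [0.5, 2.0]])     # class +1
y = np.array([-1, -1, -1, 1, 1, 1])
clf = SVC(kernel="linear", C=10.0)   # larger C punishes slack more heavily; smaller C allows a softer margin
clf.fit(X, y)
print(clf.coef_, clf.intercept_)     # w and b of the classification surface w.x + b = 0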
(2) Nonlinear hard-margin classifier
The basic idea is as follows: a curve (surface) in a low-dimensional space can be mapped to a straight line or plane in a high-dimensional space, so after such a mapping the data become linearly separable in the high-dimensional space. Let the mapping be:

Φ: x → Φ(x).

The linear support vector machine model in the high-dimensional space is then: the classification surface is w · Φ(x) + b = 0, with the requirement

y_i(w · Φ(x_i) + b) ≥ 1.

It should be noted that, because the data are mapped into a high-dimensional space, the amount of computation involving Φ(x) is much larger than that involving x. A so-called "kernel function" is therefore introduced:

K(x_i, x_j) = Φ(x_i) · Φ(x_j).

As can be seen from this formula, the effect of the kernel function is that, while x is implicitly mapped into the high-dimensional space, the inner product of two data points in that space is computed directly, so the amount of computation returns to the order of magnitude of the original x.
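A small numerical illustration of this point (not from the patent): for two-dimensional data, the kernel K(x, z) = (x · z)² equals the inner product of the explicit degree-2 feature map Φ(x) = (x1², √2·x1·x2, x2²), so the high-dimensional inner product is obtained without ever forming Φ(x).

import numpy as np

def phi(v):
    # explicit degree-2 feature map for two-dimensional input
    return np.array([v[0] ** 2, np.sqrt(2.0) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
print(np.dot(phi(x), phi(z)))   # inner product computed in the high-dimensional space: 16.0
print(np.dot(x, z) ** 2)        # same value computed by the kernel in the original space: 16.0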
B5 Multi-class classification problem
(1) One-versus-rest method:
A support vector machine is built to separate each class from all the remaining classes; if the training data contain M classes, M support vector machines need to be built. When classifying a sample x, the class whose decision function value g_i(x) is largest is selected. Alternatively, the difference between the two largest decision function values is computed and used as a confidence: if the confidence is large enough, the class with the largest value is selected; otherwise the classification is rejected.
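A sketch of this decision rule follows; the reject threshold is an assumed value, since the text does not fix one.

import numpy as np

def one_vs_rest_predict(decision_values, reject_threshold=0.1):
    """decision_values: the M outputs g_i(x) of the M one-versus-rest SVMs for a sample x.
    Pick the class with the largest value; if the gap between the two largest values
    (used as a confidence) is below reject_threshold, reject the classification."""
    decision_values = np.asarray(decision_values)
    order = np.argsort(decision_values)[::-1]
    confidence = decision_values[order[0]] - decision_values[order[1]]
    return int(order[0]) if confidence >= reject_threshold else None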
(2) Pairwise classification
A classifier is built between every two of the M classes, giving M(M − 1)/2 classifiers in total. When classifying a sample, a voting method is adopted, and the class that receives the most votes is taken as the final classification.
Having described the basic knowledge required by the present invention, the steps of the present invention will now be described in detail with reference to the flow chart of FIG. 1.
First, the classifier is a key component used in every retrieval in the present invention. Since the classifier used in the present invention is a support vector machine, it is necessary to determine the number Cm of image categories in the image classification, each detailed category (C1, C2, …, Cm), and the training image set (P1, P2, …, Pm) corresponding to each category. The size of Cm is chosen according to the application: for applications with a large retrieval image set, such as image search on the Web, Cm should be larger, since a finer classification reduces the scale of the target retrieval image set. For each detailed category (C1, C2, …, Cm), different description methods are adopted according to the application: for example, in Web retrieval a category Ci may correspond to certain descriptive keywords, while in a small local image retrieval system Ci may directly correspond to the file storage location of a certain type of image.
After the number of categories Cm, each detailed category (C1, C2, …, Cm) and the training image set (P1, P2, …, Pm) corresponding to each category have been determined, SIFT feature points of each image in the training image set are extracted by the method described above to form a feature point library, where each SIFT feature point is represented by a 128-dimensional feature point descriptor vector. The feature point library is clustered into K classes by the K-means clustering method. For a single image in the training image library, all of its SIFT feature points are mapped to the K classes according to the minimum-distance principle to obtain a K-dimensional feature vector V1; each value in V1 is divided by the total number of feature points of that image to obtain a frequency feature vector V2, and V2 is the SIFT content feature of the image.
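A minimal sketch of this visual-vocabulary step using scikit-learn's K-means follows; the vocabulary size K = 500 is only an illustrative choice, since the text leaves K open.

import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptor_arrays, K=500):
    """descriptor_arrays: one (n_i x 128) SIFT descriptor array per training image.
    Cluster all descriptors of the training set into K classes (the visual vocabulary)."""
    return KMeans(n_clusters=K, random_state=0).fit(np.vstack(descriptor_arrays))

def bow_feature(descriptors, vocabulary):
    """Map each descriptor of one image to its nearest cluster and return the
    frequency feature vector V2 (counts divided by the total number of feature points)."""
    labels = vocabulary.predict(descriptors)
    counts = np.bincount(labels, minlength=vocabulary.n_clusters).astype(np.float64)
    return counts / max(len(descriptors), 1)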
After the SIFT content features of the training image library are obtained, the support vector machine classifier is trained by using the content features. For support vector machines, development can be performed using an open-source libsvm library. Firstly, RBF is selected as a kernel function of a support vector machine, then, feature vectors in a training image library are combined with class information of each image to serve as input of the support vector machine, a cross-validation method is adopted to determine RBF kernel function parameters when the classification accuracy is highest, and a support vector machine classifier C is obtained after the parameters are determined.
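A sketch of this training step with scikit-learn's SVC, which wraps libsvm, is given below; the cross-validated grid of C and gamma values is an assumption, and train_features / train_labels are assumed to be the frequency feature vectors and category labels produced by the previous step.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# train_features: (num_images x K) array of frequency feature vectors V2
# train_labels:   the category index of each training image
param_grid = {"C": [1, 10, 100], "gamma": [0.001, 0.01, 0.1]}   # assumed search grid
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)       # 5-fold cross-validation
search.fit(train_features, train_labels)
classifier_C = search.best_estimator_    # support vector machine classifier C with the best parameters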
After the support vector machine classifier C is obtained, the preparation work of the system is complete. For an image M to be retrieved, its SIFT image content features are first extracted as follows: the SIFT feature points of M are extracted, each represented by a 128-dimensional feature point descriptor vector; all SIFT feature points of M are mapped to the K classes according to the minimum-distance principle to obtain a K-dimensional feature vector V1; each value in V1 is divided by the total number of feature points of the image to obtain a frequency feature vector Vm, and Vm is the SIFT content feature of M. The support vector machine C accepts Vm as input and outputs the category Co of M. The retrieval image set IPo corresponding to the category Co is obtained, and the SIFT content features of each image in IPo are extracted to form a content feature query set. The Euclidean distance between Vm and each feature vector in the IPo content query set is computed, with the similarity criterion that the smaller the Euclidean distance, the more similar the images; the distances are sorted, and the N images closest to M are obtained and output.
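Putting the retrieval stage together, a sketch follows; all inputs are assumed to have been precomputed by the preceding steps, and N is the number of results to return.

import numpy as np

def retrieve(query_feature, classifier_C, features_by_class, image_ids_by_class, N=10):
    """Predict the category Co of the query, rank only the images of the retrieval set
    IPo of that category by Euclidean distance to the query feature, and return the
    identifiers of the N nearest images (smaller distance means more similar)."""
    category = classifier_C.predict(query_feature.reshape(1, -1))[0]
    candidate_features = features_by_class[category]          # content features of IPo
    distances = np.linalg.norm(candidate_features - query_feature, axis=1)
    nearest = np.argsort(distances)[:N]
    return [image_ids_by_class[category][i] for i in nearest]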
The content features and the classifier used in the above embodiment are only used to illustrate the present invention and do not limit it. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention: for example, the content feature extraction method can be extended to the texture co-occurrence matrix method, the wavelet transform method and the like, and the classifier can be replaced by other methods such as a neural network or a decision tree. All such equivalent technical solutions also fall within the scope of the present invention, whose patent protection scope should be defined by the claims.

Claims (5)

1. An image retrieval method based on image classification is characterized by comprising the following steps:
A, determining the number Cm of image categories in the image classification, each detailed category (C1, C2, …, Cm) and the training image sets (P1, P2, …, Pm) corresponding to each category;
B, extracting the content features of the training image set and training a classifier to obtain a classifier C;
C, inputting an image M to be retrieved and extracting its content features as the input of the classifier C, the output result Co of the classifier C being the category of M;
D, obtaining the retrieval image set IPo corresponding to the category Co, and extracting the content features of each image in the retrieval image set IPo;
E, obtaining the similarity distance between the image M to be retrieved and each image in the retrieval image set IPo with a similarity calculation algorithm, according to the content features obtained in steps C and D; sorting the distances; and finally obtaining and outputting the N images closest to the image M to be retrieved.
2. The image retrieval method according to claim 1, characterized in that: the content features extracted in the step B are SIFT features, and the method specifically comprises the following steps:
B11, extracting SIFT feature points of each image in the training image set to form a feature point library, wherein each SIFT feature point is represented by a 128-dimensional feature point descriptor vector;
B12, clustering the feature point library by adopting a K-means clustering method to obtain K classes;
B13, for a single image in the training image library, mapping all SIFT feature points of the single image to K classes respectively according to a minimum distance principle to obtain a K-dimensional feature vector V1, and dividing each value in the feature vector V1 by the total number of the feature points of the single image to obtain a frequency feature vector V2, wherein V2 is the SIFT feature of the single image.
3. The image retrieval method according to claim 1, characterized in that: the classifier adopted in the step B is a support vector machine, an open-source libsvm library can be adopted for development, and the specific steps of training the classifier comprise:
B21, selecting RBF as the kernel function of the support vector machine;
B22, combining the feature vectors in the training image library with the category information of each image as the input of the support vector machine;
B23, determining the RBF kernel function parameters that give the highest classification accuracy by a cross-validation method, and obtaining the support vector machine classifier C after the parameters are determined.
4. The image retrieval method according to claim 1, characterized in that: the specific step of extracting the content features of the image M to be retrieved in the step C includes:
C1, extracting the SIFT feature points of the image M to be retrieved, wherein each SIFT feature point is represented by a 128-dimensional feature point descriptor vector;
C2, mapping all SIFT feature points of the image M to be retrieved to the K classes of step B12 according to the minimum-distance principle to obtain a K-dimensional feature vector V1, and dividing each value in V1 by the total number of feature points of the image to obtain a frequency feature vector Vm, wherein Vm is the SIFT feature of M.
5. The image retrieval method according to claim 1, characterized in that: the similarity calculation algorithm adopted in step E is the Euclidean distance method, and the similarity measurement criterion is that the smaller the Euclidean distance, the more similar the images.
CN2011104434345A 2011-12-27 2011-12-27 Image retrieval method based on image classification Pending CN102402621A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011104434345A CN102402621A (en) 2011-12-27 2011-12-27 Image retrieval method based on image classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011104434345A CN102402621A (en) 2011-12-27 2011-12-27 Image retrieval method based on image classification

Publications (1)

Publication Number Publication Date
CN102402621A true CN102402621A (en) 2012-04-04

Family

ID=45884821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011104434345A Pending CN102402621A (en) 2011-12-27 2011-12-27 Image retrieval method based on image classification

Country Status (1)

Country Link
CN (1) CN102402621A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101211341A (en) * 2006-12-29 2008-07-02 上海芯盛电子科技有限公司 Image intelligent mode recognition and searching method
US20100158356A1 (en) * 2008-12-22 2010-06-24 Yahoo! Inc. System and method for improved classification
CN101551809A (en) * 2009-05-13 2009-10-07 西安电子科技大学 Search method of SAR images classified based on Gauss hybrid model
US20110052063A1 (en) * 2009-08-25 2011-03-03 Xerox Corporation Consistent hierarchical labeling of image and image regions
CN101859326A (en) * 2010-06-09 2010-10-13 南京大学 Image searching method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张颖彬 et al.: "Relevance feedback image classification and retrieval scheme based on SVM", Railway Computer Application *
汤杨 et al.: "Image segmentation algorithm based on hierarchical mean shift", Journal of Computer Research and Development *

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103377381A (en) * 2012-04-26 2013-10-30 富士通株式会社 Method and device for identifying content attribute of image
CN103377381B (en) * 2012-04-26 2016-09-28 富士通株式会社 The method and apparatus identifying the contents attribute of image
CN103020111A (en) * 2012-10-29 2013-04-03 苏州大学 Image retrieval method based on vocabulary tree level semantic model
CN103902965A (en) * 2012-12-29 2014-07-02 深圳先进技术研究院 Spatial co-occurrence image representing method and application thereof in image classification and recognition
CN103902965B (en) * 2012-12-29 2017-06-16 深圳先进技术研究院 Spatial domain symbiosis image representing method and its application in image classification, identification
CN103177264A (en) * 2013-03-14 2013-06-26 中国科学院自动化研究所 Image classification method based on visual dictionary global topological representations
CN103177264B (en) * 2013-03-14 2016-09-14 中国科学院自动化研究所 The image classification method that view-based access control model dictionary Global Topological is expressed
CN103177110A (en) * 2013-03-28 2013-06-26 百度在线网络技术(北京)有限公司 Whole set image searching method and whole set image searching equipment
CN103177110B (en) * 2013-03-28 2016-08-24 百度在线网络技术(北京)有限公司 The method and apparatus searching for complete image
CN103207910A (en) * 2013-04-08 2013-07-17 河南大学 Image retrieval method based on hierarchical features and genetic programming relevance feedback
CN103207910B (en) * 2013-04-08 2017-02-08 河南大学 Image retrieval method based on hierarchical features and genetic programming relevance feedback
CN103310235A (en) * 2013-05-31 2013-09-18 中国科学院信息工程研究所 Steganalysis method based on parameter identification and estimation
CN103310235B (en) * 2013-05-31 2016-04-27 中国科学院信息工程研究所 A kind of steganalysis method based on parameter identification and estimation
CN103473545A (en) * 2013-08-01 2013-12-25 西安交通大学 Text-image similarity-degree measurement method based on multiple features
CN103425996B (en) * 2013-08-01 2017-08-25 华南理工大学 A kind of large-scale image recognition methods of parallel distributed
CN103473545B (en) * 2013-08-01 2016-06-29 西安交通大学 A kind of text image method for measuring similarity based on multiple features
CN103425996A (en) * 2013-08-01 2013-12-04 华南理工大学 Parallel distribution type large-scale image recognition method
CN103793466B (en) * 2013-12-20 2018-06-05 深圳先进技术研究院 A kind of image search method and device
CN103793466A (en) * 2013-12-20 2014-05-14 深圳先进技术研究院 Image retrieval method and image retrieval device
CN104392439A (en) * 2014-11-13 2015-03-04 北京智谷睿拓技术服务有限公司 Image similarity confirmation method and device
CN104392439B (en) * 2014-11-13 2019-01-11 北京智谷睿拓技术服务有限公司 The method and apparatus for determining image similarity
CN104504050A (en) * 2014-12-20 2015-04-08 辽宁师范大学 Retrieval method of color images based on quaternion invariants
CN104504049A (en) * 2014-12-20 2015-04-08 辽宁师范大学 Retrieval method of color images based on quaternion Harmonic-Fourier moments
CN105844283A (en) * 2015-01-16 2016-08-10 阿里巴巴集团控股有限公司 Method for identifying category of image, image search method and image search device
CN105844283B (en) * 2015-01-16 2019-06-07 阿里巴巴集团控股有限公司 Method, image search method and the device of image classification ownership for identification
CN106203461A (en) * 2015-05-07 2016-12-07 中国移动通信集团公司 A kind of image processing method and device
CN106557526A (en) * 2015-09-30 2017-04-05 富士通株式会社 The apparatus and method for processing image
CN106557526B (en) * 2015-09-30 2020-06-30 富士通株式会社 Apparatus and method for processing image
CN105589929A (en) * 2015-12-09 2016-05-18 东方网力科技股份有限公司 Image retrieval method and device
CN105589929B (en) * 2015-12-09 2019-05-10 东方网力科技股份有限公司 Image search method and device
CN105631037B (en) * 2015-12-31 2019-02-22 北京恒冠网络数据处理有限公司 A kind of image search method
CN105631037A (en) * 2015-12-31 2016-06-01 北京恒冠网络数据处理有限公司 Image retrieval method
CN105975497A (en) * 2016-04-27 2016-09-28 清华大学 Automatic microblog topic recommendation method and device
CN106056159A (en) * 2016-06-03 2016-10-26 西安电子科技大学 Image fine classification method based on Fisher Vector
CN106056159B (en) * 2016-06-03 2019-03-26 西安电子科技大学 Image sophisticated category method based on Fisher Vector
CN106203537B (en) * 2016-08-31 2019-06-04 成都铁安科技有限责任公司 A kind of current collecting bow lifting condition detection method and device
CN106203537A (en) * 2016-08-31 2016-12-07 成都铁安科技有限责任公司 A kind of current collecting bow lifting condition detection method and device
CN106651935A (en) * 2016-11-29 2017-05-10 河南科技大学 Multi-scale sampling-based texture image representation method
CN106651935B (en) * 2016-11-29 2019-06-14 河南科技大学 A kind of texture image representation method based on multi-scale sampling
CN108805148A (en) * 2017-04-28 2018-11-13 富士通株式会社 Handle the method for image and the device for handling image
CN109033308A (en) * 2018-07-16 2018-12-18 安徽江淮汽车集团股份有限公司 A kind of image search method and device
CN109276243A (en) * 2018-08-31 2019-01-29 易念科技(深圳)有限公司 Brain electricity psychological test method and terminal device
CN109446356A (en) * 2018-09-21 2019-03-08 深圳市九洲电器有限公司 A kind of multimedia document retrieval method and device
CN109271533A (en) * 2018-09-21 2019-01-25 深圳市九洲电器有限公司 A kind of multimedia document retrieval method
WO2020057347A1 (en) * 2018-09-21 2020-03-26 深圳市九洲电器有限公司 Multimedia file retrieval method and apparatus
CN109816025A (en) * 2019-01-29 2019-05-28 宝鸡文理学院 A kind of image search method based on image classification
CN110069654A (en) * 2019-03-15 2019-07-30 平安城市建设科技(深圳)有限公司 Source of houses searching method, device, equipment and computer readable storage medium
CN111339343A (en) * 2020-02-12 2020-06-26 腾讯科技(深圳)有限公司 Image retrieval method, device, storage medium and equipment
CN111353011A (en) * 2020-02-27 2020-06-30 北京市商汤科技开发有限公司 Location data set, building method and device thereof, and data processing method and device
CN111353011B (en) * 2020-02-27 2024-05-17 北京市商汤科技开发有限公司 Site data set, establishing method and device thereof, and data processing method and device
CN113868440A (en) * 2020-06-30 2021-12-31 华为技术有限公司 Method, device, equipment and medium for managing feature library
CN113868440B (en) * 2020-06-30 2023-06-27 华为技术有限公司 Feature library management method, device, equipment and medium
WO2022166604A1 (en) * 2021-02-04 2022-08-11 腾讯科技(深圳)有限公司 Image processing method and apparatus, computer device, storage medium, and program product
CN114611400A (en) * 2022-03-18 2022-06-10 河北金锁安防工程股份有限公司 Early warning information screening method and system
CN114611400B (en) * 2022-03-18 2023-08-29 河北金锁安防工程股份有限公司 Early warning information screening method and system
CN114926471A (en) * 2022-05-24 2022-08-19 北京医准智能科技有限公司 Image segmentation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN102402621A (en) Image retrieval method based on image classification
Mehmood et al. Content-based image retrieval and semantic automatic image annotation based on the weighted average of triangular histograms using support vector machine
Jain et al. Content based image retrieval
Penatti et al. Comparative study of global color and texture descriptors for web image retrieval
Li et al. Free-hand sketch recognition by multi-kernel feature learning
Saavedra et al. Sketch based Image Retrieval using Learned KeyShapes (LKS).
Kapoor et al. Gaussian processes for object categorization
Tabia et al. Covariance-based descriptors for efficient 3D shape matching, retrieval, and classification
Silva et al. Graph-based bag-of-words for classification
Ali et al. A hybrid geometric spatial image representation for scene classification
Leng et al. 3D object understanding with 3D convolutional neural networks
Laga Semantics-driven approach for automatic selection of best views of 3D shapes
Zhou et al. Image retrieval based on effective feature extraction and diffusion process
Wang et al. View-based 3D object retrieval with discriminative views
Liu et al. Upright orientation of 3D shapes with convolutional networks
Cao et al. Hand posture recognition based on heterogeneous features fusion of multiple kernels learning
Raikar et al. Efficiency comparison of supervised and unsupervised classifier on content based classification using shape, color, texture
Sethulekshmi et al. Ayurvedic leaf recognition for plant classification
Villamizar et al. Online learning and detection of faces with low human supervision
Karmakar et al. An enhancement to the spatial pyramid matching for image classification and retrieval
Wang et al. Adaptively feature matching via joint transformational-spatial clustering
Yang et al. mPadal: a joint local-and-global multi-view feature selection method for activity recognition
Jammalamadaka et al. Human pose search using deep poselets
Xiao et al. Fast view-based 3D model retrieval via unsupervised multiple feature fusion and online projection learning
Merino et al. 2D image features detector and descriptor selection expert system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120404