CN106529586A

CN106529586A - Image classification method based on supplemented text characteristic

Info

Publication number: CN106529586A
Application number: CN201610939529.9A
Authority: CN
Inventors: 郭继昌; 王楠; 张帆
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2016-10-25
Filing date: 2016-10-25
Publication date: 2017-03-22

Abstract

The invention relates to the field of digital image processing technology, and provides a method for image classification through supplementing a text characteristic for settling a problem of classification result deterioration caused by image characteristic discriminating property reduction because of text characteristic information loss in a bag of visual words. The image classification method based on the supplemented text characteristic comprises the following steps of (1), respectively extracting an SIFT characteristic, a color characteristic, and a shape-and-boundary characteristic of an image in a data set; (2), randomly selecting 200000 characteristics from the supplemented text characteristics and obtaining a dictionary B through a K-singular value decomposition method (K-SVD); (3), obtaining a sparse characteristic vector C through the studied dictionary B; (4), performing spatial maximal value gathering on the sparse characteristic; and (5), transmitting an image characteristic vector F into a linear SVM classifier, thereby performing training and testing respectively for obtaining an image classification result. The image classification method is mainly applied for digital image processing.

Description

Image classification method based on supplemental text feature

Technical field

The present invention relates to digital image processing techniques field, and in particular to a kind of scene image based on supplemental text feature Sorting technique.

Background technology

Basis of the image classification as image understanding, plays important role in computer vision field.Although people Many effort are done in terms of feature extraction and training, has been had made great progress, but at present still without unified best The sorting technique suitable for all images.

In image classification, it is exactly vision bag of words (Bag-of-Word) that one of them is the most popular, and the model is one The image classification model based on statistical model is planted, its objective is to obtain the image feature representation of more expressive force.Should in the model It is the feature based on image text information with most commonly used characteristics of image, such as SIFT, HOG etc..After extracting the feature of image, Need the local feature that profit is obtained to build visual dictionary, the feature extracted is carried out mapping with the dictionary of construction then and obtains figure The character representation of picture, is used for image classification and retrieval as the output characteristic of bag of words.

In traditional vision bag of words, the description subcharacter of extraction is usually the text features such as SIFT, and this feature is main The specific details of some of target described in collection, tends to ignore the information such as color in image, shape, border.And these Information is also an indispensable part in image classification, if not having image classification of these information based on bag of words can Can ignore that the nuance between image, so as to cause final characteristics of image to cash deficiency, affects the effect of image classification Really.

The content of the invention

To overcome the deficiencies in the prior art, it is contemplated that：As text feature information lacks in solution vision bag of words Lose and the final problem for affecting classification results of caused characteristics of image ability to see things in their true light decline, proposition is a kind of to supplement to text feature Method be used for image classification.The text features such as SIFT are mainly based upon a kind of local feature of image interest point extraction, the spy Levy the information such as color in easy missing image, shape, border.Therefore it is proposed that a kind of specific compensation process, the method While not being significantly increased amount of calculation, using the color of image, shape and boundary characteristic as the supplement of text feature and its melt Close, and the complementary features after fusion are used for into visual dictionary and build and feature coding, so as to improve the ability to see things in their true light of characteristics of image, carry High-class accuracy rate.The technical solution used in the present invention is that, based on the image classification method of supplemental text feature, step is as follows：

(1) SIFT feature of image, color characteristic, shape and boundary characteristic in data set is extracted respectively, and by color, shape Shape and boundary characteristic are dissolved in SIFT feature as side information, obtain supplementary text feature；

(2) randomly select 200000 from supplementary text feature to train by K- singular value decomposition method K-SVD To dictionary B；

(3) by the dictionary B for learning, all of text feature X of single image is entered using orthogonal matching pursuit method Row sparse coding, obtains sparse features vector C；

(4) space maximum convergence is carried out to sparse levying, and feature after convergence is normalized obtains image Characteristic vector F；

(5) image feature vector F is sent into into Linear SVM grader, is trained respectively and test obtains image classification knot Really.

Step 1 is specially：

(1.1) by each image with 8 pixels as interval, 16 × 16 image block is divided into, and obtains each in image block The gradient magnitude and gradient direction of pixel；

(1.2) vector sum of each pixel in 8 directions in the image block per 4 × 4 pixels is calculated, wherein 8 directions point Not Wei 0 degree, 45 degree, 90 degree, 135 degree, 180 degree, 225 degree, 270 degree, 315 degree, obtain 128 dimensions SIFT feature vector, 16 × 16 such characteristic vectors are had in 16 image block, this feature is used as the basic text feature of image；

(1.3) image is transformed into into hsv color space, and original image block is divided into 4 × 4 grid, in each grid The average and standard deviation of three Color Channels are inside calculated respectively, obtain the color characteristic of the feature of 96 dimensions, i.e. image, as text The side information of feature；

(1.4) dimensionality reduction, half of the dimensionality reduction to text feature dimension are carried out to color characteristic using PCA；

(1.5) the compact low-dimensional feature of image block is extracted using the matching kernel function based on local binary patterns

Wherein P represents two different image blocks with Q, and a and a ' represents the pixel in two image blocks respectively, S (a) is the standard deviation of pixel value in 3 × 3 fields centered on a, ∈_sIt is the constant of a numerical value very little, B (a) is a binary column vector for pixel value difference in the local window around binaryzation a, standardized linear kernelFor weighing the contribution of each local binary patterns, gaussian kernel function k_b(b (a), b (a '))=exp (- γ_b‖b (a)-b(a′)‖²) for measuring the shape similarity between two local binary patterns, Gauss position kernel function k_p(a, a ')= exp(-γ_p‖a-a′‖²) for measuring the space length between two pixels；

(1.6) dimensionality reduction is carried out to low-dimensional feature using core principle component analysis method, dimension is equally chosen for 64 dimensions, obtaining can be with The feature of image local shape and boundary information is obtained effectively；

(1.7) merged color, shape and boundary information as supplementary information and SIFT feature, obtaining dimension is 256 supplemental text feature；

Step 2 is specially：

(2.1) 200000 features, each feature are randomly selected in all of supplemental text feature of image from data set It is 256 dimensions, 200000 image blocks may be constructed the matrix of 256 × 200000, each row of matrix represent a benefit The Text eigenvector for filling；

(2.2) 512 are randomly selected as initial dictionary from 200000 image blocks, then initial dictionary can be with one Individual 256 × 512 matrix is represented；

(2.3) with K-SVD method training dictionaries, that is, solve object function：

Wherein K is degree of rarefication, is set to 5, c here_iI-th row of presentation code C.The step of obtaining dictionary is as follows：

(2.3.1) dictionary is fixed, each y is solved by orthogonal matching pursuit method_iCorresponding rarefaction representation c_i, its The span of middle i is 1 200000；

(2.3.2) residual error is calculated, and chooses the residual error item for indexing set that correspondence uses the kth of dictionary to arrange, to residual error item Singular value decomposition is carried out, singular value decomposition matrix is obtained, and is wherein obtained 256 × 256 unitary matrice；

(2.3.3) the kth row of dictionary are updated with 256 × 256 unitary matrice first row；

(2.3.4) iterate, training obtains final dictionary, and iterations is set to 50 here.

Step 3, specially：

(3.1) dictionary B is given, cataloged procedure is just to solve for the process of following formula：

(3.2) first time iteration, selects input data x as current residue, orthogonal matching pursuit algorithm select with it is current The optimal corresponding vision word of residual error；

(3.3) by x orthogonal mappings to the vision word for above selecting, and recalculate residual error；

(3.4) iterate, obtain sparse coding vector.

Step 4 is specially：

(4.1) to all of supplemental text feature in entire image, the maximum of its each dimension is sought, obtains one 512 dimension The characteristic vector of degree；

(4.2) entire image is divided into 4 image blocks, each characteristics of image is in each dimension in seeking each image block respectively Maximum, obtains the characteristic vector of 4 512 dimensions；

(4.3) entire image is divided into 16 image blocks, each characteristics of image is in each dimension in seeking each image block respectively Maximum, obtain the characteristic vector of 16 512 dimensions；

(4.4) the characteristics or every unit vector for producing (4.1) step to (4.3) is normalized respectively：

F_sFor the convergence feature that each unit is obtained, S is unit number, ‖ F_s‖_pFor p norms；

(4.5) normalization characteristic of obtain 21 units is attached, obtains the character representation of image.

Step 5 is specially：

(5.1) it is random that 60 images are chosen from each class image of data set as such training sample, residual graph As such test sample；

(5.2) characteristic vector of 100 images for choosing step (5.1) is trained in sending into Linear SVM grader, Obtain the SVM classifier for training；

(5.3) test sample chosen step (5.1) is sent in the SVM classifier that step (5.2) is trained, and obtains figure The classification results of picture.

The characteristics of of the invention and beneficial effect are：

1. context of methods is incorporated initial using the color in image, shape and boundary information as supplementary image information In text feature, the problem of text feature loss of learning is compensate for, improve the expressive force of characteristics of image.

2. the side information that context of methods is added all is Jing dimension-reduction treatment, therefore the addition of side information substantially will not increase Plus amount of calculation；

3. context of methods extracts effective information in image and by effective fusion method, is not increasing computing complexity The ability to see things in their true light of feature is increased while spending, image classification accuracy rate is improve.

Description of the drawings：

Fig. 1 is the flow chart of image classification method herein based on supplemental text feature.

Fig. 2 is this paper experimental data base parts of images.

Fig. 3 image space pyramid partition structure figures.

Specific embodiment

A kind of image classification method based on supplemental text feature is proposed, the shape in image, border and color are believed Breath merges in initial SIFT text features as side information, then carries out the structure and feature coding of dictionary, is had The graphical representation of effect, improves the accuracy rate of image classification.

The basic ideas of this method are：Basic text feature is entered using the shape in image, border and colouring information Row is supplemented, by by these information fusions solving the problems, such as text feature loss of learning in text feature, so as to follow-up The expressive force of feature is improved during feature learning, the accuracy rate of final image classification is improved.Specific method and step is as follows：

(6) SIFT feature of image, color characteristic, shape and boundary characteristic in data set is extracted respectively, and by color, shape Shape and boundary characteristic are dissolved in SIFT feature as side information, obtain supplementary text feature；

(7) randomly select 200000 from supplementary text feature to train by K- singular value decomposition methods (K-SVD) Obtain dictionary B；

(8) by the dictionary B for learning, all of text feature X of single image is entered using orthogonal matching pursuit method Row sparse coding, obtains sparse features vector C；

(9) space maximum convergence is carried out to sparse levying, and feature after convergence is normalized obtains image Characteristic vector F；

(10) image feature vector F is sent into into Linear SVM grader, is trained respectively and test obtains image classification knot Really.

Compared by experiment and shown, this method is compared existing similar image classification method and has more preferable classifying quality.

This experiment adopt Oxford Flower-17 data sets, data set include 17 classes flower image, each classification 80, Open image and there is larger dimensional variation, shooting angle change and illumination variation.Each class training image takes 60 respectively at random It is trained, used as test image, classification results take the mean value of 10 subseries results to residual image.With this method and its other party Method is compared on Oxford Flower-17 data sets, as a result as shown in table 1.

1 this method of table and additive method comparative result on Oxford Flower-17 data sets

Method	Average classification accuracy
		Gehler^[1]	85.5%
LLC^[2]	88.24%
		Context of methods	89.81%

Can be seen that under same experiment parameter by result in table, this method is in Oxford Flower-17 data sets On classification accuracy be higher than the method in article [1] [2], and be higher by 4.3% and 1.6% respectively.With method ratio in [1] Compared with, the image classification for combining multiple features is, but the accuracy rate of this method is higher, illustrate this method selection supplements letter Breath increased the expressive force of text feature, and the characteristics of image obtained by characteristic study has more ability to see things in their true light；Compare with [2] method, This method increased the side information of text feature, but not be significantly increased amount of calculation, and classification accuracy is improved, Illustrate the validity and feasibility of this method.

[1]Gehler P,Nowozin S.On feature combination for multiclass object classification[C]//Computer Vision,2009IEEE 12th International Conference on.IEEE,2009:221-228。

[2]Wang J,Yang J,Yu K,et al.Locality-constrained linear coding for image classification[C]//Computer Vision and Pattern Recognition(CVPR), 2010IEEE Conference on.IEEE,2010:3360-3367。

In order that the scheme and advantage of this method become more apparent, with reference to accompanying drawing, this method is specifically described：

Step 1, extracts SIFT feature, color characteristic, shape and the boundary characteristic of image in data set respectively.

Wherein P represents two different image blocks with Q, and a and a ' represents the pixel in two image blocks respectively, S (a) is the standard deviation of pixel value in 3 × 3 fields centered on a, ∈_sIt is the constant of a numerical value very little, B (a) is a binary column vector for pixel value difference in the local window around binaryzation a, standardized linear kernelFor weighing the contribution of each local binary patterns, gaussian kernel function k_b(b (a), b (a '))=exp (- γ_b‖b (a)-b(a′)‖²) for measuring the shape similarity between two local binary patterns, Gauss position kernel function k_p(a, a ')= exp(-γ_p‖a-a′‖²) for measuring the space length between two pixels.

(1.7) merged color, shape and boundary information as supplementary information and SIFT feature, obtaining dimension is 256 supplemental text feature.

Step 2, is randomly selected 200000 from supplementary text feature and is instructed by K- singular value decomposition methods (K-SVD) Get dictionary B.

Step 3, by the dictionary B for learning, using all of text feature X of the orthogonal matching pursuit method to single image Sparse coding is carried out, sparse features vector C is obtained.

(3.4) iterate, obtain sparse coding vector.

Step 4, carries out spatial pyramid division as shown in Figure 3, and sparse the levying to obtaining carries out space maximum to image Value is converged, and feature after convergence is normalized characteristic vector F for obtaining image.

F_sFor the convergence feature that each unit is obtained, S is unit number, here S=21, ‖ F_s‖_pFor p norms, p takes 2 here；

Image feature vector F is sent into Linear SVM grader, is trained respectively and test obtains image classification by step 5 As a result.

Claims

1. a kind of image classification method based on supplemental text feature, is characterized in that, step is as follows：

(1) extract the SIFT feature of image, color characteristic, shape and boundary characteristic in data set respectively, and by color, shape and Boundary characteristic is dissolved in SIFT feature as side information, obtains supplementary text feature；

(2) 200000 are randomly selected from supplementary text feature and word is obtained by K- singular value decomposition methods K-SVD training Allusion quotation B；

(3) by the dictionary B for learning, all of text feature X of single image is carried out using orthogonal matching pursuit method dilute Coding is dredged, sparse features vector C is obtained；

(4) space maximum convergence is carried out to sparse levying, and feature after convergence is normalized the feature for obtaining image Vectorial F；

(5) image feature vector F is sent into into Linear SVM grader, is trained respectively and test obtains image classification result.

2. the image classification method based on supplemental text feature as claimed in claim 1, is characterized in that, step 1 is specially：

(1.1) by each image with 8 pixels as interval, 16 × 16 image block is divided into, and obtains each pixel in image block The gradient magnitude and gradient direction of point；

(1.2) vector sum of each pixel in 8 directions in the image block per 4 × 4 pixels is calculated, wherein 8 directions are respectively 0 Degree, 45 degree, 90 degree, 135 degree, 180 degree, 270 degree, 315 degree, obtains the SIFT feature vector of 128 dimensions, 16 × 16 figure by 225 degree The characteristic vector as 16 are had in block, this feature is used as the basic text feature of image；

(1.3) image is transformed into into hsv color space, and original image block is divided into 4 × 4 grid, in each grid point Not Ji Suan three Color Channels average and standard deviation, obtain the color characteristic of the feature of 96 dimensions, i.e. image, as text feature Side information；

K_{s h a p e} (P, Q) = \underset{a &Element; P}{Σ} \underset{a^{'} &Element; Q}{Σ} \tilde{s} (a) \tilde{s} (a^{'}) k_{b} (b (a), b (a^{'})) k_{p} (a, a^{'})

(1.6) dimensionality reduction is carried out to low-dimensional feature using core principle component analysis method, dimension is equally chosen for 64 dimensions, obtaining can be effective Obtain the feature of image local shape and boundary information；

(1.7) merged color, shape and boundary information as supplementary information and SIFT feature, dimension is obtained for 256 Supplemental text feature.

3. the image classification method based on supplemental text feature as claimed in claim 1, is characterized in that, step 2 is specially：

(2.1) 200000 features are randomly selected in all of supplemental text feature of image from data set, each feature is 256 dimensions, 200000 image blocks may be constructed the matrix of 256 × 200000, each row of matrix represent one it is supplementary Text eigenvector；

(2.2) 512 are randomly selected as initial dictionary from 200000 image blocks, then initial dictionary can be with one 256 × 512 matrix is represented；

(2.3.1) dictionary is fixed, each y is solved by orthogonal matching pursuit method_iCorresponding rarefaction representation c_i, wherein i's Span is 1 200000；

(2.3.2) residual error is calculated, and chooses the residual error item for indexing set that correspondence uses the kth of dictionary to arrange, residual error item is carried out Singular value decomposition, obtains singular value decomposition matrix, and wherein obtains 256 × 256 unitary matrice；

4. the image classification method based on supplemental text feature as claimed in claim 1, is characterized in that, step 3 is specially：

\begin{matrix} \arg \underset{c}{m i n} | | x - B c | |_{2}^{2} & s . t . & | | c | |_{0} \leq K \end{matrix}

(3.2) first time iteration, selects input data x as current residue, and orthogonal matching pursuit algorithm is selected and current residue Optimal corresponding vision word；

(3.4) iterate, obtain sparse coding vector.

5. the image classification method based on supplemental text feature as claimed in claim 1, is characterized in that, step 4 is specially：

(4.1) to all of supplemental text feature in entire image, the maximum of its each dimension is sought, 512 dimensions are obtained Characteristic vector；

(4.2) entire image is divided into 4 image blocks, seeks maximum of each characteristics of image in each dimension in each image block respectively Value, obtains the characteristic vector of 4 512 dimensions；

(4.3) entire image is divided into 16 image blocks, ask in each image block respectively each characteristics of image in each dimension most It is big to be worth, obtain the characteristic vector of 16 512 dimensions；

\begin{matrix} \tilde{F_{s}} = \frac{F_{s}}{| | F_{s} | |_{p}} & &ForAll; s = 1, 2, ..., S \end{matrix}

6. the image classification method based on supplemental text feature as claimed in claim 1, is characterized in that, step 5 is specially：

(5.1) random that 60 images are chosen from each class image of data set as such training sample, residual image is made For such test sample；

(5.2) characteristic vector of 100 images for choosing step (5.1) is trained in sending into Linear SVM grader, is obtained The SVM classifier for training；

(5.3) test sample chosen step (5.1) is sent in the SVM classifier that step (5.2) is trained, and obtains image Classification results.