CN110659608A - Scene classification method based on multi-feature fusion


Info

Publication number
CN110659608A
Authority
CN
China
Prior art keywords
features, feature, scene, classification, fusion
Prior art date
Legal status
Pending
Application number
CN201910901697.2A
Other languages
Chinese (zh)
Inventor
轩靖奇
蔡春花
王峰
Current Assignee
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date
Filing date
Publication date
Application filed by Henan University of Technology
Priority claimed from CN201910901697.2A
Publication of CN110659608A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

Addressing the weak discriminative power and limited generalization ability of single image features in the field of scene recognition, the invention investigates a feature fusion method for scene classification. First, the GIST, HOG (histogram of oriented gradients), SIFT (scale-invariant feature transform) and PLBP (pyramid local binary pattern) features of the scene image are extracted, and the SIFT features are encoded with VLAD (vector of locally aggregated descriptors). The extracted features are then analyzed and fused in different combinations using a serial fusion method. Finally, the fused features are fed into a multi-class SVM to classify the scene images, and the average accuracy and classification speed of the final recognition are evaluated through extensive experiments. The experimental results show that the proposed method exploits the advantages of the different features so that their information complements one another, achieving better classification performance while keeping both feature extraction and classification times low.

Description

Scene classification method based on multi-feature fusion
Technical Field
The invention belongs to the field of scene recognition, and particularly relates to a scene classification method based on multi-feature fusion.
Background
The goal of scene recognition is to identify the scene to which an image belongs by extracting and analyzing its features to obtain information about the scene. As an important research direction in computer vision, it is applied in fields such as image and video retrieval, security and surveillance systems, robot vision systems, and intelligent transportation. Because images of the same scene class can differ greatly in background, scale, viewing angle and illumination, while images of different scene classes can be similar, classifying and recognizing scene images is difficult.
Scene recognition is an important and difficult research topic in computer vision. Before 2010, classification and recognition were mainly based on low-level features such as texture, shape and color. However, such simple global features are not sufficient to describe a whole image, and classification performance degrades in complex environments. To overcome this problem, some researchers turned to local low-level features, processing the color and texture of local regions. David Lowe proposed SIFT, a scale-space-based local feature descriptor invariant to image scaling, rotation and affine transformation, in IJCV in 2004. In 2005, Dalal et al. proposed the histogram of oriented gradients (HOG) feature at the CVPR conference, which describes an image by accumulating gradient direction statistics over its local regions. Oliva and Torralba adopted and refined GIST, a global feature that reflects scene properties such as the naturalness and openness of an image, although GIST is less effective on complex indoor scenes. Philbin proposed a bag-of-visual-words (BoVW) model based on SIFT features, which expresses the extracted features as combinations of visual words to form a dictionary and classifies samples by analyzing the frequency of visual words in each sample. The BoVW model is simple and effectively reduces the feature dimensionality of a sample, but it ignores the spatial position of the feature points. To address this drawback, Lazebnik et al. proposed the spatial pyramid matching (SPM) model in 2006, which partitions the sample space at several levels so that the spatial position of features is fully considered, greatly improving the performance of the BoVW model.
Due to the complexity of scene images, a single feature can hardly describe all the information in an image. How to combine the strengths of several features to mine richer information, and thereby achieve classification performance beyond that of any single feature, has therefore become a popular research direction.
Disclosure of Invention
The invention aims to provide a multi-feature fusion method for scene classification. A fusion scheme of VLAD features built on SIFT local descriptors, GIST features, PLBP features and HOG features is proposed. Further encoding the local features mines the correlations among them, enhances discriminability and speeds up classification; fusing the HOG features captures edge and gradient information and thus the local shape of the image; fusing the GIST features improves the global description of the image; and fusing the PLBP features alleviates the insufficient spatial expressiveness of plain texture features. Finally, a support vector machine with an RBF kernel classifies the scene images after feature fusion.
To solve the above technical problems, the invention provides the following technical scheme, which comprises the following steps in order:
(1) scene image preprocessing
In the preprocessing stage of the experiments, the scene images undergo gray-level conversion and related processing. For GIST feature extraction the images are resized to 256 × 256; for all other features they are resized to 300 × 300.
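A minimal preprocessing sketch along these lines is shown below; OpenCV is an assumed library choice (the patent names none), and `preprocess` is a hypothetical helper:

```python
# Hedged sketch: grayscale conversion plus the two resize targets stated above.
import cv2

def preprocess(path, for_gist=False):
    img = cv2.imread(path)                         # load the scene image (BGR)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # gray-level conversion
    size = (256, 256) if for_gist else (300, 300)  # 256x256 for GIST, 300x300 otherwise
    return cv2.resize(gray, size)
```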
(2) Feature extraction
SIFT, GIST, PLBP and HOG features are extracted from the scene image. The local SIFT features are then further encoded with the VLAD algorithm to mine the correlations among them, enhancing discriminability and speeding up classification; the HOG features capture edge and gradient information and thus the local shape of the image; the GIST features improve the global description of the image; and the PLBP features address the insufficient spatial expressiveness of plain texture features. Step (2) is characterized as follows:
1) GIST features: the image is divided into a 4 × 4 grid; each block is processed by a Gabor filter bank with 4 scales and 8 orientations and the responses are averaged, giving a 32-dimensional vector per block; the GIST vectors of all blocks are then concatenated to form the GIST feature of the whole image, with dimensionality 4 × 4 × 32 = 512.
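A rough sketch of this GIST extractor follows; the skimage Gabor kernels and the four frequency values are assumptions standing in for the filter bank, which the patent does not parameterize:

```python
# Hedged GIST sketch: 4 scales x 8 orientations, responses averaged over a 4x4 grid.
import numpy as np
from scipy.ndimage import convolve
from skimage.filters import gabor_kernel

def gist(gray_256):                                    # 256x256 grayscale image
    feats = []
    for freq in (0.05, 0.1, 0.2, 0.4):                 # 4 scales (illustrative values)
        for k in range(8):                             # 8 orientations
            kern = np.real(gabor_kernel(freq, theta=k * np.pi / 8))
            resp = np.abs(convolve(gray_256.astype(float), kern))
            # average the filter response over a 4x4 grid of 64x64 blocks
            feats.append(resp.reshape(4, 64, 4, 64).mean(axis=(1, 3)).ravel())
    # 16 blocks x (4 scales x 8 orientations) = 512 dimensions, block-major order
    return np.stack(feats, axis=1).ravel()
```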
2) HOG features: HOG is formed by computing and accumulating histograms of gradient orientation over local regions of the image; in essence it represents the image by the statistics of its gradients. The grayscale image is first normalized and the gradient at every pixel is computed. Several pixels form a cell, within which a gradient histogram is accumulated; several adjacent cells form a block, whose histogram is obtained by concatenating and normalizing the cell histograms; the block histograms describe an image patch, and the HOG feature of the image is the concatenation of all block features. The invention divides the image into 50 × 50-pixel cells, computes a 40-bin gradient histogram per cell, and groups adjacent 2 × 2 cells into blocks. For a 300 × 300 image this gives 6 cells in each direction and, with 2 × 2 cells per block, 5 blocks in each direction, so the final HOG feature vector has 5 × 5 × 40 × 2 × 2 = 4000 dimensions.
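Since skimage's `hog` implements exactly this cell/block scheme, the stated configuration can be sketched as below (the library choice and block norm are assumptions):

```python
# Hedged HOG sketch: 50x50-pixel cells, 40 bins, 2x2-cell blocks on a 300x300 image
# give 5 x 5 blocks x 2 x 2 cells x 40 bins = 4000 dimensions.
from skimage.feature import hog

def hog_4000(gray_300):
    return hog(gray_300,
               orientations=40,
               pixels_per_cell=(50, 50),
               cells_per_block=(2, 2),
               block_norm='L2')  # cell histograms concatenated and normalized per block
```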
3) SIFT (VLAD) features: SIFT features are first extracted from the scene image, and k-means produces a codebook of k centers; each local feature is then assigned to its nearest center, and finally the residuals between the local features and their assigned centers are accumulated as the final image representation. Concretely, the nearest codebook center is found for every feature in the image, and the differences between the features and their centers are accumulated into a K × D VLAD matrix, where K is the number of cluster centers and D is the feature dimension (128 for SIFT). The matrix is then flattened into a (K × D)-dimensional vector and L2-normalized; the resulting vector is the VLAD representation (K is set to 78 and D to 128). VLAD effectively reduces the amount of computation, making it an algorithm that balances accuracy and efficiency.
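A minimal VLAD encoding sketch with the stated K = 78 and D = 128 follows; sklearn's k-means is an assumed choice for codebook training:

```python
# Hedged VLAD sketch: nearest-center assignment, residual accumulation, L2 normalization.
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(all_sift, k=78):
    return KMeans(n_clusters=k, n_init=10).fit(all_sift)  # all training SIFT descriptors

def vlad(sift_desc, codebook):                  # sift_desc: (n, 128) for one image
    k, d = codebook.n_clusters, sift_desc.shape[1]
    assign = codebook.predict(sift_desc)        # nearest center for each descriptor
    v = np.zeros((k, d))
    for c in range(k):                          # accumulate residuals per center
        members = sift_desc[assign == c]
        if len(members):
            v[c] = (members - codebook.cluster_centers_[c]).sum(axis=0)
    v = v.ravel()                               # flatten to K x D = 78 x 128 = 9984 dims
    return v / (np.linalg.norm(v) + 1e-12)      # L2 normalization
```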
4) PLBP features: the PLBP feature is obtained by concatenating the LBP histograms of every pyramid level and normalizing the concatenated LBP feature vector uniformly, so that it reflects the pixel information of the whole image. First, edge detection and pyramid partitioning are applied: the image is divided into 4 levels, where the first level is the whole image, the second level splits it into 4 sub-regions, and the third and fourth levels each further split the previous sub-regions into 4 smaller blocks. Next, the LBP features of each sub-region are computed, quantizing each sub-region into a K-bin histogram. Finally, all LBP feature vectors are concatenated into the PLBP feature vector of the image. The invention sets the number of bins to 40 and uses 4 pyramid levels, so the final feature dimensionality is (1 + 4 + 16 + 64) × 40 = 3400.
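A possible PLBP sketch is given below; the 8-neighbour LBP operator (P = 8, R = 1) and skimage are assumptions, since the patent does not fix the exact LBP variant:

```python
# Hedged PLBP sketch: 4 pyramid levels, 40-bin LBP histogram per sub-region,
# (1 + 4 + 16 + 64) x 40 = 3400 dimensions in total.
import numpy as np
from skimage.feature import local_binary_pattern

def plbp(gray_300, bins=40, levels=4):
    lbp = local_binary_pattern(gray_300, P=8, R=1)  # LBP codes in [0, 255]
    feats = []
    for lv in range(levels):                        # level lv has 2^lv x 2^lv regions
        n = 2 ** lv
        h, w = lbp.shape[0] // n, lbp.shape[1] // n
        for i in range(n):
            for j in range(n):
                cell = lbp[i*h:(i+1)*h, j*w:(j+1)*w]
                hist, _ = np.histogram(cell, bins=bins, range=(0, 256))
                feats.append(hist / (hist.sum() + 1e-12))  # normalized histogram
    return np.concatenate(feats)
```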
(3) Feature fusion
Assume three feature vectors α, β and γ in three feature spaces A, B and C, where α ∈ A, β ∈ B and γ ∈ C. Serial fusion forms the combined vector δ = (kα, lβ, jγ); if α, β and γ are m-, n- and q-dimensional feature vectors respectively, then δ has dimensionality m + n + q, where k, l and j are the weight coefficients of the corresponding feature vectors. The invention adopts serial fusion with all weight coefficients set to 1, so the final fused dimensionality is the sum of the individual feature dimensionalities (m + n + q + …). SIFT features are extracted first and encoded with the VLAD algorithm to generate the coded features; the invention mainly adopts VLAD feature coding, and the PLBP, GIST and HOG features of the scene image are extracted at the same time, producing a feature matrix file for each picture. The feature matrices are then loaded according to 10 randomly generated training-set and test-set files, serial fusion is performed with the NumPy library as sketched below, and processing continues with step (4).
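With all weight coefficients equal to 1, serial fusion reduces to plain vector concatenation, e.g. with NumPy (the library named above); the feature variables in the usage line are hypothetical:

```python
# Hedged serial-fusion sketch: weighted concatenation with k = l = j = 1 by default.
import numpy as np

def serial_fuse(*features, weights=None):
    weights = weights or [1.0] * len(features)
    return np.concatenate([w * f for w, f in zip(weights, features)])

# e.g. fused = serial_fuse(vlad_vec, gist_vec, hog_vec, plbp_vec)
# dimensionality = 9984 + 512 + 4000 + 3400
```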
(4) Normalization process
After feature extraction, to eliminate possible effects of differing dimensions, extreme values or noisy data, and differing value ranges among the features, and to improve the convergence speed of the model, the output of step (3) is standardized by the standard-deviation (z-score) method, so that the processed feature data has mean 0 and standard deviation 1.
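For instance, with sklearn's StandardScaler (an assumed choice; fitting on the training set and reusing its statistics on the test set is standard practice rather than a detail stated in the patent, and `X_train_fused` / `X_test_fused` are hypothetical variables):

```python
# Hedged z-score standardization sketch.
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()                      # zero mean, unit standard deviation
X_train_std = scaler.fit_transform(X_train_fused)
X_test_std = scaler.transform(X_test_fused)
```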
(5) Classifying the scene images with a support vector machine based on the RBF kernel function.
The model is evaluated by average classification accuracy, recall, feature extraction time and classification time. The higher the average classification accuracy and the lower the feature extraction and classification times, the stronger the predictive ability of the model. Comparing the average prediction accuracies (Fig. 1) shows that scene classification based on a single feature performs poorly, whereas the feature fusion method achieves relatively good recognition (Tables 2-4); in particular, the scene recognition system using serial fusion of SIFT (VLAD), GIST, HOG and PLBP features reaches a recognition accuracy of 87.27%.
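A sketch of this classification and evaluation step with sklearn follows; the C and gamma values are illustrative defaults, not parameters disclosed in the patent:

```python
# Hedged RBF-SVM sketch with the evaluation metrics named above.
import time
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, recall_score, confusion_matrix

clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(X_train_std, y_train)

t0 = time.time()
y_pred = clf.predict(X_test_std)
print('classification time (s):', time.time() - t0)
print('accuracy:', accuracy_score(y_test, y_pred))
print('macro recall:', recall_score(y_test, y_pred, average='macro'))
print(confusion_matrix(y_test, y_pred))
```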
Drawings
Fig. 1 shows the classification accuracy of single features on the OT data set; Fig. 2 shows the confusion matrix of the SIFT (VLAD), GIST, HOG and PLBP fusion on the OT data set; Fig. 3 shows the corresponding confusion matrix on the FP data set; and Fig. 4 shows the corresponding confusion matrix on the LSP data set.
Detailed Description
To verify the performance of the proposed model, experiments were performed on three data sets: Scene-8 (OT-8), Scene-13 (FP) and Scene-15 (LSP). Each category in the data sets contains 200 to 400 pictures, with an average size of 300 × 250 pixels. The composition of the data sets is shown in Table 1.
TABLE 1 Experimental data set
The experiments adopt a strategy of averaging over multiple runs. For each scene, 100 images are randomly selected as the training set and the remaining images form the test set. The experiment is repeated 10 times for each data set and the results are averaged to obtain the final experimental result.
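A sketch of this protocol is shown below; `features`, `labels` and `run_pipeline` are hypothetical placeholders for the fused feature matrix, the class labels and the train/evaluate routine of steps (3)-(5):

```python
# Hedged sketch of the averaging protocol: 10 random splits, 100 training images per class.
import numpy as np

rng = np.random.default_rng()
accs = []
for run in range(10):
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        train_idx.extend(idx[:100])            # 100 training images per scene class
        test_idx.extend(idx[100:])             # remaining images form the test set
    accs.append(run_pipeline(features[train_idx], labels[train_idx],
                             features[test_idx], labels[test_idx]))
print('mean accuracy over 10 runs:', np.mean(accs))
```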
As can be seen from the tables, the serial fusion of SIFT (VLAD), GIST, HOG and PLBP features achieves classification accuracies of 87.27%, 83.50% and 79.30% on the OT, FP and LSP data sets respectively, and the average time for feature extraction and classification on the three data sets is 1.1393 s, 1.3651 s and 1.4529 s, respectively.
The experiments also show that recognition performance declines as the scale of the data set grows: the FP data set has more categories and adds indoor scenes, so its classification accuracy is lower than on the OT data set; the LSP data set further adds the more complex store and industrial scenes, reducing the accuracy further.
TABLE 2 Performance indicators corresponding to different fusion modes in OT data set
[Table 2 is reproduced only as an image in the original publication]
TABLE 3 Performance indicators corresponding to different fusion modes in FP data set
[Table 3 is reproduced only as an image in the original publication]
TABLE 4 Performance indicators corresponding to different fusion modes in LSP dataset
[Table 4 is reproduced only as an image in the original publication]
The confusion matrices of the best-performing method from Tables 2-4 on the three data sets are shown in Figs. 2-4. On the OT data set, the highest recognition accuracy reaches 98%, two further categories, including opencountry, reach 92%, and the worst category, coast, still reaches a 78% classification rate. On the FP data set, opencountry reaches 96%, one newly added indoor category reaches 97% and the kitchen category reaches 95%, while the street category drops markedly, to only 61%. On the LSP data set the best category reaches 96% and mountain reaches 95%, and the newly added store and industrial categories reach 79% and 94%, respectively.

Claims (1)

1. A scene classification method based on multi-feature fusion, used mainly for accurately predicting scene images, comprising the following steps:
(1) scene image preprocessing
Completing preprocessing operations such as resizing and gray-level conversion of the scene image;
(2) feature extraction
Extracting the SIFT, GIST, PLBP and HOG features of the scene image, and then further encoding the local SIFT features with the VLAD algorithm to mine the correlations among them, enhance discriminability and improve classification speed; meanwhile, the HOG features capture edge and gradient information so as to describe local shape; the GIST features improve the global description of the image; and the PLBP features address the insufficient spatial expression of texture features;
(3) feature fusion
Storing the scene image features extracted in step (2) for fusion, then loading the feature matrices according to 10 randomly generated training-set and test-set files, and finally setting the feature fusion weight coefficients to 1 and performing serial fusion, characterized in that for step (3): assuming three feature vectors α, β and γ in three feature spaces A, B and C, where α ∈ A, β ∈ B and γ ∈ C, serial fusion forms
δ = (kα, lβ, jγ)
and if α, β and γ are m-, n- and q-dimensional feature vectors respectively, δ has dimensionality m + n + q, where k, l and j are the weight coefficients of the corresponding feature vectors; the method adopts serial fusion with the weight coefficients set to 1, so the final fused dimensionality is the sum of the individual feature dimensionalities (m + n + q + …);
(4) normalization process
After feature extraction, to eliminate possible effects of differing dimensions, extreme values or noisy data, and differing value ranges among the features, and to improve the convergence speed of the model, standardizing the output of step (3) by the standard-deviation (z-score) method, so that the processed feature data has mean 0 and standard deviation 1;
(5) classifying the scene images by using a support vector machine based on an RBF kernel function;
Splitting the processed features into training and test sets according to the rules above and feeding them into the support vector machine with the RBF kernel, which produces performance indicators such as the confusion matrix, the classification result of each class, the accuracy and recall of each run, the feature extraction time, the classification and feature fusion time, and the average accuracy over all runs.
CN201910901697.2A 2019-09-23 2019-09-23 Scene classification method based on multi-feature fusion Pending CN110659608A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910901697.2A CN110659608A (en) 2019-09-23 2019-09-23 Scene classification method based on multi-feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910901697.2A CN110659608A (en) 2019-09-23 2019-09-23 Scene classification method based on multi-feature fusion

Publications (1)

Publication Number Publication Date
CN110659608A true CN110659608A (en) 2020-01-07

Family

ID=69039179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910901697.2A Pending CN110659608A (en) 2019-09-23 2019-09-23 Scene classification method based on multi-feature fusion

Country Status (1)

Country Link
CN (1) CN110659608A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242223A (en) * 2020-01-15 2020-06-05 中国科学院地理科学与资源研究所 Street space quality evaluation method based on streetscape image multi-feature fusion
CN111553893A (en) * 2020-04-24 2020-08-18 成都飞机工业(集团)有限责任公司 Method for identifying automatic wiring and cutting identifier of airplane wire harness
CN111723763A (en) * 2020-06-29 2020-09-29 深圳市艾为智能有限公司 Scene recognition method based on image information statistics
CN111723763B (en) * 2020-06-29 2024-02-13 深圳市艾为智能有限公司 Scene recognition method based on image information statistics
CN112287769A (en) * 2020-10-09 2021-01-29 江汉大学 Face detection method, device, equipment and storage medium
CN112287769B (en) * 2020-10-09 2024-03-12 江汉大学 Face detection method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110659608A (en) Scene classification method based on multi-feature fusion
Bosch et al. Representing shape with a spatial pyramid kernel
Everingham et al. Pascal visual object classes challenge results
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
CN107085731B (en) Image classification method based on RGB-D fusion features and sparse coding
Marszałek et al. Accurate object recognition with shape masks
CN103679192A (en) Image scene type discrimination method based on covariance features
CN110738672A (en) image segmentation method based on hierarchical high-order conditional random field
Karmakar et al. Improved tamura features for image classification using kernel based descriptors
CN112784722B (en) Behavior identification method based on YOLOv3 and bag-of-words model
Wilber et al. Exemplar codes for facial attributes and tattoo recognition
Mannan et al. Optimized segmentation and multiscale emphasized feature extraction for traffic sign detection and recognition
CN111414958B (en) Multi-feature image classification method and system for visual word bag pyramid
Dunlop Scene classification of images and video via semantic segmentation
Ahmad et al. SSH: Salient structures histogram for content based image retrieval
CN108536772B (en) Image retrieval method based on multi-feature fusion and diffusion process reordering
CN112818779B (en) Human behavior recognition method based on feature optimization and multiple feature fusion
Caputo et al. A performance evaluation of exact and approximate match kernels for object recognition
Tang et al. Rapid forward vehicle detection based on deformable Part Model
Krig et al. Local Feature Design Concepts, Classification, and Learning
Vinoharan et al. An Efficient BoF Representation for Object Classification
Chen et al. Indoor/outdoor classification with multiple experts
Rapantzikos et al. On the use of spatiotemporal visual attention for video classification
Sarkar et al. A meta-algorithm for classification by feature nomination
Xu et al. Integrated patch model: A generative model for image categorization based on feature selection

Legal Events

Code Title/Description
PB01: Publication
WD01: Invention patent application deemed withdrawn after publication (application publication date: 20200107)