CN107085731B - Image classification method based on RGB-D fusion features and sparse coding - Google Patents


Info

Publication number
CN107085731B
Authority
CN
China
Prior art keywords
image
feature
fusion
features
coding
Prior art date
Legal status
Active
Application number
CN201710328468.7A
Other languages
Chinese (zh)
Other versions
CN107085731A (en
Inventor
周彦 (Zhou Yan)
向程谕 (Xiang Chengyu)
王冬丽 (Wang Dongli)
Current Assignee
Xiangtan University
Original Assignee
Xiangtan University
Priority date
Filing date
Publication date
Application filed by Xiangtan University filed Critical Xiangtan University
Priority to CN201710328468.7A
Publication of CN107085731A
Application granted
Publication of CN107085731B
Status: Active

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention discloses an image classification method based on RGB-D fusion features and sparse coding, whose specific implementation steps are: (1) extracting the dense SIFT features and PHOG features of the color image and the depth image; (2) performing feature fusion on the extracted features of the two images in a linear series-connection mode, finally obtaining four different fusion features; (3) clustering the different fusion features with the K-means++ clustering method to obtain four different visual dictionaries; (4) performing locally constrained linear coding on each visual dictionary to obtain different image expression sets; (5) classifying the different image expression sets with a linear SVM (support vector machine) and determining the final classification from the multiple classification results by a voting decision method. The invention achieves high classification accuracy.

Description

Image classification method based on RGB-D fusion features and sparse coding
Technical Field
The invention relates to the technical fields of computer vision, pattern recognition and the like, and in particular to an image classification method based on RGB-D fusion features and sparse coding.
Background
Today's society is in an era of information explosion: besides massive text information, the multimedia information (pictures, videos and the like) that humans contact has also grown explosively. Computers are therefore required to understand image content the way humans do, so that images can be accurately and efficiently utilized, managed and retrieved. Image classification is an important way to solve the image understanding problem and strongly promotes the development of multimedia retrieval technology. Because an acquired image may be influenced by multiple factors such as viewpoint change, illumination, occlusion and background, image classification has long been a challenging problem in the fields of computer vision and artificial intelligence, and many image feature description and classification techniques have consequently developed rapidly.
In current image feature description and classification technology, the mainstream algorithms are based on the Bag-of-Features (BoF) model. S. Lazebnik proposed the Spatial Pyramid Matching (SPM) framework based on BoF in the article "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories"; the framework recovers the spatial information lost by the BoF algorithm and effectively improves the accuracy of image classification. However, BoF-based algorithms all encode features with Vector Quantization (VQ), and this hard-coding mode ignores the interrelation between visual words in the visual dictionary, so the encoded image features carry a large error, which degrades the performance of the whole image classification algorithm.
In recent years, as Sparse Coding (SC) theory has matured, it has become one of the most popular techniques in the field of image classification. Yang proposed Sparse coding Spatial Pyramid Matching (ScSPM) in the article "Linear spatial pyramid matching using sparse coding for image classification"; the model replaces hard assignment with sparse coding and optimizes the weight coefficients of the visual dictionary, thereby quantizing image features better and greatly improving the accuracy and efficiency of image classification. However, because the codebook is over-complete, highly similar original features may be expressed completely differently, so the stability of ScSPM is poor. Wang et al. improved ScSPM in the article "Locality-constrained Linear Coding for image classification", pointing out that locality is more important than sparsity: a feature descriptor is represented by multiple bases in the visual dictionary, and similar feature descriptors obtain similar codes by sharing their local bases, which greatly mitigates the instability of ScSPM.
The above methods classify color images only and ignore the depth information of the object or scene. Depth information is one of the important cues for image classification, because it easily separates foreground from background by distance and directly reflects the three-dimensional information of an object or scene. With the rise of Kinect, depth images have become easier to acquire, and algorithms that combine depth information for image classification have become popular. The article "Kernel descriptors for visual recognition" by Liefeng Bo et al. extracts image features from the perspective of kernel methods and classifies the image; however, the algorithm must first model the object in three dimensions, which is time-consuming and has low real-time performance. In the article "Indoor scene segmentation using a structured light sensor", Silberman first extracts features of the Depth image and the color (RGB) image separately with the Scale-Invariant Feature Transform (SIFT) algorithm, then performs feature fusion, and classifies images with SPM (Spatial Pyramid Matching). In the article "A Category-Level 3D Object Dataset: Putting the Kinect to Work", Janoch extracts features of the depth image and the color image with the Histogram of Oriented Gradients (HOG) algorithm and realizes final image classification after feature fusion. Mirdanies et al., in the article "Object recognition system in remote controlled weapon station using SIFT and SURF methods", fuse the SIFT features extracted from the RGB image with the SURF features of the depth image and use the fused features for object classification. All these algorithms fuse RGB features and depth features at the feature level, which can effectively improve image classification accuracy. However, they share a defect: the features extracted from the RGB image and the depth image are single features. A single feature extracts insufficient information from the image, so the resulting fusion feature cannot fully express the image content. The reason is that the RGB image is easily affected by illumination change, viewing-angle change, geometric deformation of the image, shadow, occlusion and the like, while the depth image is easily affected by the imaging device, producing holes, noise and other problems in the image; extracting a single image feature cannot remain robust to all these factors, and information in the image is inevitably lost.
Therefore, it is necessary to design an image classification method with higher classification accuracy.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the defects of the prior art, an image classification method based on RGB-D fusion features and sparse coding that has high accuracy and good stability.
In order to solve the technical problems, the technical scheme provided by the invention is as follows:
an image classification method based on RGB-D fusion features and sparse coding comprises a training stage and a testing stage:
the training phase comprises the steps of:
step A1, extracting a Scale-invariant feature transform (Scale-invariant feature transform) and a photo (histogram of layered gradient directions) feature of an RGB image and a Depth image (color image and Depth image) of each sample data; the number of sample data is n;
Step A2, performing feature fusion on the extracted features of the two images of each sample datum in a pairwise linear series mode to obtain four different fusion features; the same kind of fusion feature obtained from the n sample data forms a set, yielding four fusion feature sets;
Through the above feature extraction, the dense SIFT and PHOG features of the RGB image and the dense SIFT and PHOG features of the Depth image are obtained; the obtained features are then normalized so that all features have similar scales. To reduce the complexity of feature fusion, the invention fuses features in a pairwise linear series-connection mode, namely:
f = K1·α + K2·β   (1)

where K1 and K2 are the weights of the corresponding features with K1 + K2 = 1; in the present invention, K1 = K2 = 0.5. α denotes the extracted feature of the RGB image and β the extracted feature of the Depth image. Four different fusion features are finally obtained, namely the RGBD-denseSIFT feature, the RGB-denseSIFT + D-PHOG feature, the RGB-PHOG + D-denseSIFT feature and the RGBD-PHOG feature, i.e., the fusion features generated respectively by the dense SIFT features of the RGB image and the Depth image, by the dense SIFT feature of the RGB image and the PHOG feature of the Depth image, by the PHOG feature of the RGB image and the dense SIFT feature of the Depth image, and by the PHOG features of the RGB image and the Depth image.
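As an illustration only (the patent contains no source code), the following Python sketch implements step A2 under the assumption that the "linear series connection" of formula (1) means concatenating the weighted feature vectors, since α and β generally have different dimensions; all function and variable names are invented for this sketch.

```python
import numpy as np

def fuse(alpha: np.ndarray, beta: np.ndarray,
         k1: float = 0.5, k2: float = 0.5) -> np.ndarray:
    """Pairwise linear series fusion of formula (1), read as a weighted
    concatenation f = [K1*alpha, K2*beta]."""
    assert abs(k1 + k2 - 1.0) < 1e-9, "weights must satisfy K1 + K2 = 1"
    return np.concatenate([k1 * alpha, k2 * beta])

# The four fusion features of one sample (dsift_*/phog_* are the
# normalized step-A1 descriptors; the names are illustrative):
# f1 = fuse(dsift_rgb, dsift_depth)  # RGBD-denseSIFT
# f2 = fuse(dsift_rgb, phog_depth)   # RGB-denseSIFT + D-PHOG
# f3 = fuse(phog_rgb, dsift_depth)   # RGB-PHOG + D-denseSIFT
# f4 = fuse(phog_rgb, phog_depth)    # RGBD-PHOG
```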
Step A3, clustering fusion features in the four fusion feature sets respectively to obtain four different visual dictionaries;
Step A4, performing feature coding of the fusion features on each visual dictionary by adopting a locally constrained linear coding model to obtain four different image expression sets;
Step A5, constructing classifiers from the four different fusion feature sets, the image expression sets and the class labels of the corresponding sample data to obtain four different classifiers.
The testing phase comprises the following steps:
Step B1, extracting and fusing the features of the image to be classified according to the method of steps A1-A2 to obtain the four fusion features of the image to be classified;
step B2, respectively carrying out feature coding on the four fusion features obtained in the step B1 on the four visual dictionaries obtained in the step A3 by adopting a local constraint linear coding model to obtain four different image expressions of the image to be classified;
Step B3, classifying the four image expressions obtained in step B2 with the four classifiers obtained in step A5, respectively, to obtain four class labels (the four class labels may include identical labels or may all be different);
Step B4, based on the four obtained class labels, obtaining the final class label of the image to be classified with a voting decision method, namely selecting the class label with the largest number of votes among the four class labels as the final class label.
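A minimal Python sketch of the step-B4 voting decision (an assumption for illustration; the patent gives no code). It includes the random tie-break described later in the text:

```python
from collections import Counter
import random

def vote(labels):
    """Majority vote over the four classifier outputs (step B4).

    Ties (several labels sharing the largest vote count) are broken
    by random choice, as described further below."""
    counts = Counter(labels)
    top = max(counts.values())
    winners = [lab for lab, c in counts.items() if c == top]
    return random.choice(winners)

# vote(["sofa", "sofa", "table", "chair"])  ->  "sofa"
```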
Further, in the step A3, the K-means++ clustering method is used to perform clustering processing on the fusion features in each fusion feature set.
The traditional K-means algorithm for establishing the visual dictionary has the advantages of simplicity, high performance and high efficiency. However, it has a notable limitation: the initial cluster centers are selected at random, so the clustering result depends strongly on the initial center points, and a poor initial selection can trap the algorithm in a local optimum, which is fatal to correct image classification. To address this defect, the invention establishes the visual dictionary with the K-means++ algorithm, replacing the random selection of initial cluster centers with a probability-based selection. The specific method for clustering any fusion feature set into its corresponding visual dictionary is as follows:
3.1) The fusion features obtained from the n sample data are combined into a set, namely the fusion feature set H_I = {h1, h2, h3, …, hn}, and the cluster number is set to m;
3.2) From the fusion feature set H_I, one point is randomly selected as the first initial cluster center S1; the count value is set to t = 1;
3.3) For every point h_i ∈ H_I in the fusion feature set, the distance d(h_i) between it and S_t is calculated;
3.4) The next initial cluster center S_{t+1} is selected: based on the formula

P(h_i′) = d(h_i′)² / Σ_{h∈H_I} d(h)²

the probability that a point h_i′ ∈ H_I is selected as the next initial cluster center is calculated, and the point with the maximum probability is selected as the next initial cluster center S_{t+1};
3.5) Let t = t + 1 and repeat steps 3.3) and 3.4) until t = m, i.e., until m initial cluster centers have been selected;
3.6) Run the K-means algorithm with the selected initial cluster centers, finally generating m cluster centers;
3.7) Each cluster center is defined as a visual word of the visual dictionary; the cluster number m is the size of the visual dictionary.
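For illustration, a Python sketch of the initialization in steps 3.1)-3.6) (an assumption; the patent gives no code). It takes d(h_i) as the distance to the nearest already-selected center, as in standard K-means++, and follows the patent in picking the highest-probability point deterministically rather than sampling:

```python
import numpy as np

def kmeanspp_init(H: np.ndarray, m: int, seed: int = 0) -> np.ndarray:
    """Initial-center selection of steps 3.1)-3.5).

    H: (n, d) array of fusion features; m: dictionary size."""
    rng = np.random.default_rng(seed)
    centers = [H[rng.integers(len(H))]]       # step 3.2): random first center
    while len(centers) < m:                   # steps 3.3)-3.5)
        # distance of every point to its nearest already-chosen center
        d = np.min(np.linalg.norm(H[:, None, :] - np.array(centers)[None, :, :],
                                  axis=2), axis=1)
        p = d ** 2 / np.sum(d ** 2)           # selection probability
        centers.append(H[np.argmax(p)])       # most probable point
    return np.array(centers)

# The m centers then seed a standard K-means run (step 3.6)), e.g.
# sklearn.cluster.KMeans(n_clusters=m, init=kmeanspp_init(H, m), n_init=1).
```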
Further, in the step A4, a locally constrained linear coding (LLC) model is used to perform feature coding on the fusion features; the model expression is:

min_C Σ_{i=1…n} ‖h_i − B·c_i‖² + λ‖d_i ⊙ c_i‖²   (2)
s.t. 1ᵀ·c_i = 1, ∀i

In the formula: h_i is a fused feature in the fusion feature set H_I, i.e., the feature vector to be encoded, h_i ∈ R^d, where d denotes the dimension of the fused feature; B = [b1, b2, b3, …, bm] is the visual dictionary established by the K-means++ algorithm, b1~bm are the m visual words of the visual dictionary, b_j ∈ R^d; C = [c1, c2, c3, …, cn] is the coded image expression set, where c_i ∈ R^m is the sparse-coding representation of an image after coding is finished; λ is the penalty factor of LLC; ⊙ denotes element-wise multiplication; in the constraint 1ᵀ·c_i = 1, 1 denotes a vector whose elements are all 1, and the constraint makes the LLC code translation invariant; d_i is defined as:

d_i = exp(dist(h_i, B) / σ)   (3)

where dist(h_i, B) = [dist(h_i, b1), dist(h_i, b2), …, dist(h_i, bm)]ᵀ, dist(h_i, b_j) denotes the Euclidean distance between h_i and b_j, and σ adjusts the decay speed of the locality constraint weights.
The present invention employs locally constrained linear coding (LLC). Locality is more important than sparsity, because a locality constraint on features necessarily yields sparsity, while sparsity does not necessarily yield locality. By replacing the sparsity constraint with a locality constraint, LLC obtains good performance.
Further, in the step A4, the fusion features are feature-coded by using the approximate locally constrained linear coding model. When the coding model of formula (2) is solved for c_i, the feature vector h_i to be encoded tends to select the visual words in the visual dictionary that are close to it to form a local coordinate system. According to this rule, a simple approximate LLC feature coding mode can be used to accelerate the coding process: instead of solving formula (2), for any feature vector h_i to be encoded, a k-nearest-neighbor search selects the k closest visual words in the visual dictionary B as a local visual word matrix B_i, and the code is obtained by solving a linear system of much smaller scale:

min_C̃ Σ_{i=1…n} ‖h_i − B_i·c̃_i‖²   (4)
s.t. 1ᵀ·c̃_i = 1, ∀i

where C̃ = [c̃1, c̃2, …, c̃n] is the image expression set obtained by approximate coding, and c̃_i is the sparse-coding representation of an image after approximate coding is finished. According to the analytic solution of formula (4), approximate LLC feature coding reduces the computational complexity from O(n²) to O(n + k²) with k << n, while its final performance differs little from full LLC feature coding. Because the approximate LLC coding mode retains locality while still guaranteeing coding sparsity, the invention uses the approximate LLC model for feature coding.
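A Python sketch of the approximate LLC encoder (illustrative only; variable names and the regularization constant are assumptions). It follows the analytic solution given in Wang et al.'s LLC paper:

```python
import numpy as np

def approx_llc(h: np.ndarray, B: np.ndarray, k: int = 50,
               reg: float = 1e-4) -> np.ndarray:
    """Approximate LLC code of formula (4) for a single feature vector h.

    h: (d,) fused feature to encode; B: (m, d) visual dictionary,
    one visual word per row; `reg` is a small numerical stabilizer."""
    m = B.shape[0]
    # k-nearest visual words form the local matrix B_i (k-NN search)
    idx = np.argsort(np.linalg.norm(B - h, axis=1))[:k]
    Bi = B[idx]                      # (k, d)
    z = Bi - h                       # shift words to the feature
    C = z @ z.T                      # local covariance, (k, k)
    C += reg * np.trace(C) * np.eye(k)
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                     # enforce the constraint 1^T c = 1
    code = np.zeros(m)
    code[idx] = w                    # sparse code over the full dictionary
    return code
```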
Further, k is taken to be 50.
Further, in the step A1, the dense SIFT feature divides the image into feature blocks of equal size with a grid, adjacent blocks overlapping; the center of each feature block is taken as a feature point, the SIFT feature descriptor of that feature point (the same gradient histogram as in traditional SIFT) is formed from all pixel points in the block, and the feature points with their SIFT descriptors finally form the dense SIFT feature of the whole image.
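For illustration, a Python/OpenCV sketch of the dense SIFT extraction just described (an assumption, not the patent's code; it requires an OpenCV build that includes SIFT, e.g. OpenCV >= 4.4). The step and block-size defaults match the experimental settings given later:

```python
import cv2
import numpy as np

def dense_sift(gray: np.ndarray, step: int = 8, size: int = 16) -> np.ndarray:
    """Dense SIFT: descriptors at the centers of overlapping grid blocks."""
    sift = cv2.SIFT_create()
    kps = [cv2.KeyPoint(float(x), float(y), float(size))
           for y in range(size // 2, gray.shape[0] - size // 2 + 1, step)
           for x in range(size // 2, gray.shape[1] - size // 2 + 1, step)]
    _, desc = sift.compute(gray, kps)  # one 128-dim SIFT descriptor per block
    return desc                        # (num_blocks, 128)
```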
the specific steps of the PHOG feature extraction are as follows:
1.1) Count the edge information of the image: extract the edge contour of the image with the Canny edge detection operator and use the contour to describe the shape of the image;
1.2) Perform pyramid-level segmentation of the image, the number of image blocks depending on the number of pyramid layers. In the invention the image is divided into 3 layers: layer 1 is the whole image; layer 2 divides the image into 4 sub-regions of equal size; layer 3 divides each of the 4 layer-2 sub-regions into 4 further sub-regions, finally yielding 4 × 4 sub-regions;
1.3) Extract the HOG (Histogram of Oriented Gradients) feature vector of each sub-region in each layer;
1.4) Finally, concatenate the HOG feature vectors of the sub-regions in every layer of the image; after obtaining the concatenated HOG data, perform data normalization, finally obtaining the PHOG feature of the whole image.
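The steps above can be sketched as follows in Python (illustrative; the Canny thresholds and the 9-bin, 3-level defaults are assumptions consistent with the experimental settings):

```python
import cv2
import numpy as np

def phog(gray: np.ndarray, bins: int = 9, levels: int = 3) -> np.ndarray:
    """PHOG per steps 1.1)-1.4): Canny edges, then orientation histograms
    over a 3-level spatial pyramid (1, 2x2, 4x4 cells), concatenated and
    L1-normalized."""
    edges = cv2.Canny(gray, 100, 200)                  # step 1.1)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy, angleInDegrees=True)
    mag = np.where(edges > 0, mag, 0)                  # keep edge pixels only
    feats = []
    for lvl in range(levels):                          # steps 1.2)-1.3)
        cells = 2 ** lvl                               # 1, 2, 4 cells per side
        for ys in np.array_split(range(gray.shape[0]), cells):
            for xs in np.array_split(range(gray.shape[1]), cells):
                a = ang[np.ix_(ys, xs)] % 180.0
                w = mag[np.ix_(ys, xs)]
                hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=w)
                feats.append(hist)
    f = np.concatenate(feats)                          # step 1.4)
    return f / (np.abs(f).sum() + 1e-12)               # L1 normalization
```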
Further, in the step A5, the classifier is a linear SVM classifier.
Further, the voting decision method in step B4 may encounter a tie, in which several different class labels receive the same, largest number of votes; in this case, one of the tied class labels is randomly selected as the final class label.
The beneficial effects of the invention are:
The invention selects multiple fusion features, which compensates for the insufficient information content of any single fusion feature of the image and effectively improves the accuracy of image classification. The K-means++ algorithm is selected to establish the visual dictionary, replacing the random selection of initial cluster centers with a probability-based selection, which effectively prevents the algorithm from falling into a local optimum. Finally, a voting decision method votes on the individual classification results, fusing classification results that differ considerably, and the final classification is determined by the voting decision, which guarantees the stability of the results.
Drawings
FIG. 1 is a flow chart of an image classification method integrating RGB-D fusion features with sparse coding.
FIG. 2 shows the LLC feature coding model in step A4 of the training phase of the present invention.
FIG. 3 illustrates the test image classification decision module in step B4 of the testing phase of the present invention.
FIG. 4 is a recognition confusion matrix on an RGB-D Scenes data set according to the present invention.
Detailed Description
The invention is described in further detail below with reference to specific examples and with reference to the accompanying drawings. The described examples are intended to be illustrative of the invention and are not intended to be limiting in any way.
FIG. 1 is a flow chart of the image classification system integrating RGB-D fusion features with sparse coding; the specific implementation steps are as follows:
step S1: extracting dense SIFT features and PHOG features of the RGB image and the Depth image;
step S2: performing feature fusion on the extracted features of the two images in a serial connection mode to finally obtain four different fusion features;
step S3: clustering different fusion characteristics by using a K-means + + clustering method to obtain four different visual dictionaries;
step S4: performing local constraint linear coding on each visual dictionary to obtain different image expression sets;
step S5: and constructing classifiers for different image expression sets by using a linear SVM (support vector machine), and finally determining the final classification by voting on classification results of the four classifiers.
The image classification method integrating RGB-D fusion features and sparse coding provided by the invention is verified below using experimental data.
The experimental data set adopted by the invention is the RGB-D Scenes data set, a multi-view scene picture data set provided by the University of Washington. It consists of 8 scene categories with 5972 pictures in total; all images were captured by a Kinect camera at a size of 640 × 480.
In the RGB-D Scenes dataset, all images were used for the experiment, and the image size was adjusted to 256 × 256. For feature extraction, the sampling interval of the dense SIFT features extracted from the image was set to 8 pixels with 16 × 16 image blocks. The PHOG feature extraction parameters were set as follows: image block size 16 × 16, sampling interval 8 pixels, and 9 gradient-direction bins. When building the visual dictionary, the dictionary size was set to 200. For SVM classification, the LIBSVM 3.12 toolbox was adopted; 80% of the pictures in the data set were used for training and 20% for testing.
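As a sketch of this training/evaluation protocol (illustrative; the experiments used LIBSVM, for which scikit-learn's LinearSVC is substituted here, and all names are assumptions):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

def train_and_eval(X_f: np.ndarray, y: np.ndarray, seed: int = 0):
    """Train one linear SVM per fusion feature and score it.

    X_f: (n_images, code_dim) LLC image expressions for one fusion
    feature; y: class labels."""
    Xtr, Xte, ytr, yte = train_test_split(
        X_f, y, test_size=0.2, stratify=y, random_state=seed)  # 80/20 split
    clf = LinearSVC().fit(Xtr, ytr)
    return clf, clf.score(Xte, yte)
```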
The experiment considers the method from two aspects: first, comparison with the methods of researchers that achieve high classification accuracy; second, comparison of the classification effect of different RGB-D fusion features with that of the proposed method.
TABLE 1 Comparison of classification results on the RGB-D Scenes dataset

Classification method       Accuracy (%)
Linear SVM                  89.6
Gaussian kernel SVM         90.0
Random forest               90.1
HOG                         77.2
SIFT + SPM                  84.2
Method of the invention     91.7
Table 1 compares the classification accuracy with other methods. Liefeng Bo integrates three features in the article "Kernel descriptors for visual recognition" and trains and classifies with a linear SVM, a Gaussian kernel SVM and a Random Forest, obtaining accuracies of 89.6%, 90.0% and 90.1% respectively in this experiment. Janoch uses the HOG algorithm to extract features of the depth image and the color image separately in the article "A Category-Level 3D Object Dataset: Putting the Kinect to Work" and realizes the final classification with an SVM classifier after feature fusion, obtaining an accuracy of 77.2% in this experiment. In the article "Indoor scene segmentation using a structured light sensor", Silberman first extracts the features of the depth image and the color image separately with the SIFT algorithm, then performs feature fusion, encodes the features with SPM, and finally classifies with an SVM; in this experiment the algorithm obtains a classification accuracy of 84.2%. The algorithm provided by the invention obtains an accuracy of 91.7%, an improvement of 1.6% over the best previous result, showing that the proposed algorithm has good classification performance.
TABLE 2 comparison of classification results of different fusion characteristics of RGB-D Scenes data set
As can be seen from Table 2, when depth information is combined for image classification, classification algorithms based on a single fusion feature achieve lower accuracy than those based on multiple fusion features; an image classification algorithm based on multi-feature fusion obtains better classification accuracy, but is still slightly inferior to the proposed image classification algorithm based on decision fusion of multiple fusion features.
The foregoing description of specific embodiments of the present invention has been presented. It should be understood that the invention is not limited to the particular embodiments described above, but is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the invention.

Claims (7)

1. An image classification method based on RGB-D fusion features and sparse coding is characterized by comprising a training stage and a testing stage:
the training phase comprises the steps of:
A1, extracting the dense SIFT and PHOG features of the RGB image and the Depth image of each sample datum; the number of sample data is n;
A2, performing feature fusion on the extracted features of the two images of each sample datum in a pairwise linear series mode to obtain four different fusion features; combining the same kind of fusion features obtained from the n sample data into a set to obtain four fusion feature sets;
A3, clustering the fusion features in the four fusion feature sets by using the K-means++ clustering method to obtain four different visual dictionaries; the method of clustering the fusion features in a given fusion feature set with the K-means++ clustering method to establish the corresponding visual dictionary comprises the following steps:
3.1) recording the fusion feature set as H_I = {h1, h2, h3, …, hn} and setting the cluster number as m;
3.2) in H_I, randomly selecting one fusion feature as the first initial cluster center S1; setting the count value t = 1;
3.3) for each fusion feature h_i ∈ H_I, calculating the distance d(h_i) between it and S_t;
3.4) selecting the next initial cluster center S_{t+1}: based on the formula

P(h_i′) = d(h_i′)² / Σ_{h∈H_I} d(h)²

calculating the probability that a point h_i′ ∈ H_I is selected as the next initial cluster center; selecting the fusion feature with the maximum probability as the next initial cluster center S_{t+1};
3.5) letting t = t + 1 and repeating steps 3.3) and 3.4) until t = m, i.e., until m initial cluster centers are selected;
3.6) running the K-means algorithm with the selected initial cluster centers, finally generating m cluster centers;
3.7) defining each cluster center as a visual word of the visual dictionary, the cluster number m being the size of the visual dictionary;
A4, performing feature coding of the fusion features on each visual dictionary by adopting a locally constrained linear coding model to obtain four different image expression sets;
A5, constructing classifiers from the four different fusion feature sets, the image expression sets and the class labels of the corresponding sample data to obtain four different classifiers;
the testing phase comprises the following steps:
step B1, extracting and fusing the features of the image to be classified according to the method of steps A1-A2 to obtain the four fusion features of the image to be classified;
step B2, respectively carrying out feature coding on the four fusion features obtained in the step B1 on the four visual dictionaries obtained in the step A3 by adopting a local constraint linear coding model to obtain four different image expressions of the image to be classified;
step B3, classifying the four image expressions obtained in the step B2 by using the four classifiers obtained in the step A5 respectively to obtain four class labels;
and step B4, based on the obtained four class labels, obtaining the final class label of the image to be classified by using a voting decision method, namely selecting the class label with the largest number of votes from the four class labels as the final class label.
2. The method for classifying images based on RGB-D fusion features and sparse coding as claimed in claim 1, wherein in said step A4, a locally constrained linear coding model is used to perform feature coding on the fusion features, the model expression being:

min_C Σ_{i=1…n} ‖h_i − B·c_i‖² + λ‖d_i ⊙ c_i‖²   (2)
s.t. 1ᵀ·c_i = 1, ∀i

in the formula: h_i is a fused feature in the fusion feature set H_I, i.e., the feature vector to be encoded, h_i ∈ R^d, with d denoting the dimension of the fused feature; B = [b1, b2, b3, …, bm] is the visual dictionary established by the K-means++ algorithm, b1~bm being the m visual words of the visual dictionary, b_j ∈ R^d; C = [c1, c2, c3, …, cn] is the coded image representation set, where c_i ∈ R^m is the coding coefficient of the fused feature h_i after encoding is completed; λ is the penalty factor of LLC; ⊙ denotes element-wise multiplication; in the constraint 1ᵀ·c_i = 1, 1 denotes a vector whose elements are all 1, the constraint making the locally constrained linear coding model translation invariant; d_i is defined as:

d_i = exp(dist(h_i, B) / σ)   (3)

wherein dist(h_i, B) = [dist(h_i, b1), dist(h_i, b2), …, dist(h_i, bm)]ᵀ, dist(h_i, b_j) represents the Euclidean distance between h_i and b_j, and σ is used to adjust the decay speed of the locality constraint weights.
3. The method for classifying images based on RGB-D fusion features and sparse coding as claimed in claim 1, wherein in said step A4, the fusion features are feature-coded by using the approximate locally constrained linear coding model, the model expression being:

min_C̃ Σ_{i=1…n} ‖h_i − B_i·c̃_i‖²   (4)
s.t. 1ᵀ·c̃_i = 1, ∀i

wherein B_i is the local visual word matrix of the k visual words in the visual dictionary B closest to the feature vector h_i to be coded, selected by a k-nearest-neighbor search; C̃ = [c̃1, c̃2, …, c̃n] is the image representation set obtained by approximate coding, where c̃_i is the coding coefficient of the fused feature h_i after approximate coding is completed.
4. The RGB-D fusion features and sparse coding based image classification method according to claim 3, wherein k is 50.
5. The image classification method based on RGB-D fusion features and sparse coding according to any one of claims 1 to 4, wherein in the step A1, the specific steps of PHOG feature extraction are as follows:
1.1) counting edge information of the image; extracting an edge contour of the image by using a Canny edge detection operator, and using the contour to describe the shape of the image;
1.2) carrying out pyramid-level segmentation on the image: dividing the image into 3 layers, wherein layer 1 is the whole image; layer 2 divides the image into 4 sub-regions of equal size; layer 3 divides each of the 4 layer-2 sub-regions into 4 further sub-regions, finally obtaining 4 × 4 sub-regions;
1.3) extracting the HOG feature vector of each sub-region in each layer;
1.4) finally, concatenating the HOG feature vectors of the sub-regions in every layer of the image and, after obtaining the concatenated HOG data, carrying out data normalization to finally obtain the PHOG feature of the whole image.
6. The image classification method based on RGB-D fusion features and sparse coding according to any one of claims 1 to 4, wherein in the step A5, a linear SVM classifier is adopted as the classifier.
7. The image classification method based on RGB-D fusion features and sparse coding according to any one of claims 1 to 4, wherein in step B4, if there are a plurality of class labels with the largest number of votes, one of the several class labels is randomly selected as the final class label.
CN201710328468.7A 2017-05-11 2017-05-11 Image classification method based on RGB-D fusion features and sparse coding Active CN107085731B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710328468.7A CN107085731B (en) 2017-05-11 2017-05-11 Image classification method based on RGB-D fusion features and sparse coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710328468.7A CN107085731B (en) 2017-05-11 2017-05-11 Image classification method based on RGB-D fusion features and sparse coding

Publications (2)

Publication Number Publication Date
CN107085731A (en) 2017-08-22
CN107085731B (en) 2020-03-10

Family

ID=59611626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710328468.7A Active CN107085731B (en) 2017-05-11 2017-05-11 Image classification method based on RGB-D fusion features and sparse coding

Country Status (1)

Country Link
CN (1) CN107085731B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090926A (en) * 2017-12-31 2018-05-29 厦门大学 A kind of depth estimation method based on double dictionary study
CN108596256B (en) * 2018-04-26 2022-04-01 北京航空航天大学青岛研究院 Object recognition classifier construction method based on RGB-D
CN108805183B (en) * 2018-05-28 2022-07-26 南京邮电大学 Image classification method fusing local aggregation descriptor and local linear coding
CN108875080B (en) * 2018-07-12 2022-12-13 百度在线网络技术(北京)有限公司 Image searching method, device, server and storage medium
CN109741484A (en) * 2018-12-24 2019-05-10 南京理工大学 Automobile data recorder and its working method with image detection and voice alarm function
CN110443298B (en) * 2019-07-31 2022-02-15 华中科技大学 Cloud-edge collaborative computing-based DDNN and construction method and application thereof
CN111160387B (en) * 2019-11-28 2022-06-03 广东工业大学 Graph model based on multi-view dictionary learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005786A (en) * 2015-06-19 2015-10-28 南京航空航天大学 Texture image classification method based on BoF and multi-feature fusion

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005786A (en) * 2015-06-19 2015-10-28 南京航空航天大学 Texture image classification method based on BoF and multi-feature fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Action recognition algorithm based on Kinect and pyramid features; Shen Xiaoxia et al.; Journal of Optoelectronics·Laser; 2014-02-28; pp. 357-362 *
Image classification based on RGB-D fusion features; Xiang Chengyu; Computer Engineering and Applications; 2017-03-16; pp. 178-182 *

Also Published As

Publication number Publication date
CN107085731A (en) 2017-08-22

Similar Documents

Publication Publication Date Title
CN107085731B (en) Image classification method based on RGB-D fusion features and sparse coding
Liu et al. Recognizing human actions using multiple features
Wang et al. Mmss: Multi-modal sharable and specific feature learning for rgb-d object recognition
US9633282B2 (en) Cross-trained convolutional neural networks using multimodal images
Kim et al. A hierarchical image clustering cosegmentation framework
Reddy Mopuri et al. Object level deep feature pooling for compact image representation
Negrel et al. Evaluation of second-order visual features for land-use classification
CN106126581A (en) Cartographical sketching image search method based on degree of depth study
Tabia et al. Compact vectors of locally aggregated tensors for 3D shape retrieval
Hu et al. RGB-D semantic segmentation: a review
CN109657704B (en) Sparse fusion-based coring scene feature extraction method
CN112163114B (en) Image retrieval method based on feature fusion
CN110659608A (en) Scene classification method based on multi-feature fusion
Chen et al. Multi-view feature combination for ancient paintings chronological classification
Morioka et al. Learning Directional Local Pairwise Bases with Sparse Coding.
Zhu et al. Traffic sign classification using two-layer image representation
CN111414958B (en) Multi-feature image classification method and system for visual word bag pyramid
Jampour et al. Pairwise linear regression: an efficient and fast multi-view facial expression recognition
Montazer et al. Farsi/Arabic handwritten digit recognition using quantum neural networks and bag of visual words method
James et al. Interactive video asset retrieval using sketched queries
Ren et al. A multi-scale UAV image matching method applied to large-scale landslide reconstruction
Guo et al. Integrating shape and color cues for textured 3D object recognition
Kastaniotis et al. HEp-2 cells classification using locally aggregated features mapped in the dissimilarity space
Gupta et al. Video scene categorization by 3D hierarchical histogram matching
Montazer et al. Scene classification based on local binary pattern and improved bag of visual words

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant