CN110598776A - Image classification method based on intra-class visual mode sharing
- Publication number
- CN110598776A CN110598776A CN201910830812.1A CN201910830812A CN110598776A CN 110598776 A CN110598776 A CN 110598776A CN 201910830812 A CN201910830812 A CN 201910830812A CN 110598776 A CN110598776 A CN 110598776A
- Authority
- CN
- China
- Prior art keywords
- image
- visual
- class
- dictionary
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
Abstract
The invention provides an image classification method based on intra-class visual mode sharing, which comprises the following steps: generating image object windows; extracting depth features of the image windows; visual dictionary learning based on the intra-class sharing characteristic, in which a structured visual dictionary with the intra-class sharing characteristic is obtained from the depth features of the candidate object windows of all semantic-category images by optimizing a visual dictionary learning model; generating object windows for the input image and extracting their features; feature coding of the object windows; integrating the visual features to construct the image global feature; and predicting the semantic label with an SVM classifier. By cooperatively mining the visual dictionary words with shared characteristics within each semantic category, the method analyzes and solves the problem from the practically valuable perspective of introducing the sharing characteristic of visual patterns, enhances the semantic expressiveness of the image features, and improves the accuracy of image category recognition.
Description
Technical Field
The invention relates to the technical field of image recognition, in particular to an image classification method based on intra-class visual mode sharing.
Background
With the continuous development of digital multimedia and internet technology, human society has entered a big-data age in which multimedia data grows rapidly. Among the different forms of multimedia data, image data, being intuitive and easy to acquire, plays an important role in many aspects of social life, so effectively analyzing and understanding the content of image data has become increasingly important. In recent years, many image semantic object classification methods have made progress in visual feature generation, object model construction, and strongly supervised learning. However, owing to the semantic gap between bottom-layer visual features and middle- and high-layer information, existing methods still progress slowly on key problems such as discriminative feature construction, collaborative analysis of associated information, and the semantics of visual features.
For the image classification problem, current research focuses mainly on constructing semantic representations of image features. When the image feature representation fully describes the semantic content of the object, even a simple linear classifier can accurately predict the semantic content of the image. Early image features were based on bottom-layer visual cues such as color, shape and texture, producing histogram representations of visual information through manually defined feature construction. However, such bottom-layer representations are only statistical descriptions of visual information and can hardly depict semantic object content effectively, so in practical classification tasks the category of an image cannot be predicted accurately. To address this, subsequent research turned to machine learning to extract image feature representations with stronger semantic discriminability. Among the many image classification models, methods based on visual dictionary learning decompose the construction of image semantic feature representations into four sub-problems: bottom-layer feature extraction, visual dictionary learning, local feature coding, and image global feature generation.
In current image classification methods based on visual dictionary learning, the learned dictionary words are mutually independent; the correlation among dictionary words is not explored, which weakens the discriminative power of the image feature representations constructed from the dictionary. In fact, cooperatively mining correlated dictionary words during visual dictionary learning can effectively enhance the consistency of feature representations within the same semantic category and the difference between feature representations of different semantic categories, and finally improve the performance of semantic object category prediction.
Disclosure of Invention
Aiming at the above defects in the prior art, the technical problem to be solved by the invention is to provide an image classification method based on intra-class visual mode sharing, so as to address the problem that image feature representations carry insufficient semantic information.
The technical scheme adopted by the invention for realizing the purpose is as follows: an image classification method based on intra-class visual mode sharing comprises the following steps:
image object window generation: given an image training set containing multiple semantic class objects, generating a candidate object window for each image in the image training set;
extracting depth features of an image window: extracting the depth feature of the candidate object window;
visual dictionary learning based on in-class sharing characteristics: according to the depth characteristics of the candidate object windows of all semantic category images, a structured visual dictionary with in-class sharing characteristics is obtained by optimizing a visual dictionary learning model;
generating candidate object windows of the image for the input image with unknown semantic category, and extracting the depth characteristics of the candidate object windows;
calculating the characteristic codes of candidate object windows of the input images according to the structured visual dictionary;
combining object window feature codes based on the feature codes of all object windows of the input image to construct an image global feature representation;
and predicting semantic category labels of the input images by utilizing a linear SVM classifier according to the image global feature representation to realize the classification of the images.
The candidate object windows of each image in the image training set are generated by the EdgeBox algorithm.
The depth features of the candidate object windows are extracted with the VGG19 deep network model.
The visual dictionary learning model is optimized according to the following objective:

$$\min_{D,\{A_i\},\{Z_i\}}\ \sum_{i=1}^{C}\Big(\|X_i-D_{\in i}A_i\|_F^2+\|X_i-DZ_i\|_F^2+\alpha\|A_i-Z_i\|_F^2+\beta\sum_{j\neq i}\|D_i^{\top}D_j\|_F^2+\lambda_1\|A_i\|_{2,1}+\lambda_2\|Z_i\|_{2,1}\Big)$$

In the above formula, $X_i$ is the visual feature matrix of all training samples of the $i$-th semantic object class; $D_{\in i}$ denotes the class-specific visual dictionary in which the dictionary words corresponding to the $i$-th class in the structured visual dictionary $D$ are retained and the dictionary words of the remaining classes are set to zero; $A_i$ is the matrix of representation coefficients of the feature matrix $X_i$ on the class-specific dictionary $D_{\in i}$; $D$ is the structured visual dictionary to be optimized, the collection of the dictionary words of all semantic object classes; $Z_i$ is the matrix of representation coefficients of $X_i$ on the structured visual dictionary $D$; $D_i$ and $D_j$ denote the sub-dictionaries of the $i$-th and $j$-th object classes within $D$; the symbol $\|\cdot\|_F$ denotes the Frobenius norm of a matrix; and the parameters $\alpha$, $\beta$, $\lambda_1$, $\lambda_2$ weight the different cost terms of the objective function.
The objective function for computing the feature code of a candidate object window of the input image is:

$$\min_{y}\ \|x-Dy\|_2^2+\eta\|y\|_1$$

where $x$ is the depth visual feature of the object window, $y$ is the object window feature code to be solved, $D$ is the structured visual dictionary, and the parameter $\eta$ controls the number of nonzero elements in the feature code $y$, i.e. its sparsity.
The invention has the following advantages and beneficial effects:
1. Current image classification methods based on visual dictionary learning neglect the correlation constraints among words when learning the dictionary, which leaves the image feature representation short of semantic information. The method of the invention analyzes and solves this problem from the practically valuable perspective of introducing the sharing characteristic of visual patterns: by cooperatively mining the visual dictionary words with shared characteristics within each semantic category, it enhances the semantic expressiveness of the image feature representation and improves the accuracy of image category recognition.
2. The invention requires no manual participation and achieves high classification accuracy. Unlike current dictionary-learning-based image classification methods, which learn each dictionary word in isolation, the method introduces the sharing characteristic of visual patterns within a semantic object class, cooperatively mines the visual dictionary words with shared characteristics in the same semantic class, and establishes association constraints among dictionary words, thereby alleviating the lack of semantic information in current image visual feature representations.
3. The method of the present invention is practical and effective.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a diagram of a set of visual features based on an image object window according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The method is implemented on the Matlab R2016b experimental platform. As shown in FIG. 1, the method comprises two parts. The intra-class shared visual pattern mining part involves image object window generation, window depth feature extraction, and visual dictionary learning based on the intra-class sharing characteristic. The image global feature construction and classification part involves four steps: input image object window generation and feature extraction, object window feature coding, visual feature integration to construct the image global feature, and SVM classifier semantic label prediction. The specific steps are as follows:
in-class shared visual pattern mining:
step one, generating an image object window
Given a training set of images containing multiple semantic class objects, candidate object windows of each image are generated using the EdgeBox algorithm (see C. Lawrence Zitnick and Piotr Dollár. Edge boxes: Locating object proposals from edges. In European Conference on Computer Vision, 2014.).
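The window-generation step can be sketched as follows. This is a toy grid-based stand-in (the function name, scales and stride are illustrative assumptions, not from the patent): the real EdgeBox instead scores candidate boxes by the edge contours they fully enclose.

```python
def candidate_windows(img_h, img_w, scales=(64, 128), stride=32):
    """Toy stand-in for EdgeBox: enumerate square candidate object
    windows (x, y, w, h) on a regular grid at several scales.
    EdgeBox proper would rank such boxes by enclosed edge contours."""
    windows = []
    for s in scales:
        for y in range(0, img_h - s + 1, stride):
            for x in range(0, img_w - s + 1, stride):
                windows.append((x, y, s, s))
    return windows

# 128x128 image: 9 windows at scale 64 plus the full-image window at 128
wins = candidate_windows(128, 128)
```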
Step two, extracting depth features of image windows
The depth features of the image candidate object windows are extracted using the VGG19 deep network model (see K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.).
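The shape of this step is: crop each candidate window, resize it to a fixed size, and map it to a fixed-length descriptor. In the sketch below a random projection stands in for the VGG19 forward pass (in practice one would run a pretrained VGG19 up to a fully connected layer); apart from the 4096-d output size, all names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random projection standing in for the VGG19 forward pass; the 4096-d
# output mimics a VGG fully connected layer, everything else is illustrative.
W = rng.standard_normal((4096, 32 * 32 * 3))

def window_feature(image, box):
    """Crop one candidate window, resize it to a fixed 32x32 shape by
    nearest-neighbour sampling, and project it to a 4096-d descriptor."""
    x, y, w, h = box
    crop = image[y:y + h, x:x + w]
    ys = np.linspace(0, h - 1, 32).astype(int)
    xs = np.linspace(0, w - 1, 32).astype(int)
    patch = crop[np.ix_(ys, xs)]            # (32, 32, 3) resized window
    return W @ patch.reshape(-1)

img = rng.random((128, 128, 3))
f = window_feature(img, (16, 16, 64, 64))
```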
Step three, learning a visual dictionary based on in-class sharing characteristics
In order to mine the visual patterns with the sharing characteristic within each semantic category, the embodiment of the invention designs the following structured visual dictionary, whose mathematical form is:

D = [D_1, D_2, ..., D_C]

where D denotes the constructed visual dictionary, formed by concatenating the sub-dictionaries D_i, i = 1, 2, ..., C, of the individual object classes, and C is the number of object classes contained in the image set.
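The structured dictionary D = [D_1, ..., D_C] and the class-restricted dictionary D_{∈i} can be sketched in a few lines; the sizes and helper names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
C, d, k = 3, 64, 10          # classes, feature dimension, words per class

# Per-class sub-dictionaries D_i with unit-norm columns ("words"),
# concatenated into the structured dictionary D = [D_1, D_2, ..., D_C].
subdicts = []
for _ in range(C):
    Di = rng.standard_normal((d, k))
    subdicts.append(Di / np.linalg.norm(Di, axis=0))
D = np.hstack(subdicts)

def class_restricted(D, i, k):
    """D_{in i}: keep the columns (words) of class i, zero out the rest."""
    Dz = np.zeros_like(D)
    Dz[:, i * k:(i + 1) * k] = D[:, i * k:(i + 1) * k]
    return Dz
```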
According to the depth features of the candidate object windows of all semantic image classes, a structured visual dictionary D with the intra-class sharing characteristic is obtained by optimizing the visual dictionary learning model constructed as follows:

$$\min_{D,\{A_i\},\{Z_i\}}\ \sum_{i=1}^{C}\Big(\|X_i-D_{\in i}A_i\|_F^2+\|X_i-DZ_i\|_F^2+\alpha\|A_i-Z_i\|_F^2+\beta\sum_{j\neq i}\|D_i^{\top}D_j\|_F^2+\lambda_1\|A_i\|_{2,1}+\lambda_2\|Z_i\|_{2,1}\Big)$$

In the above formula, $X_i$ is the visual feature matrix of all training samples of the $i$-th semantic object class; $D_{\in i}$ denotes the class-specific visual dictionary in which the dictionary words corresponding to the $i$-th class in the structured dictionary $D$ are retained and the dictionary words of the remaining classes are set to zero; $A_i$ is the matrix of representation coefficients of the feature matrix $X_i$ on the class-specific dictionary $D_{\in i}$; $D$ is the structured visual dictionary to be optimized, the collection of the dictionary words of all semantic object classes; $Z_i$ is the matrix of representation coefficients of $X_i$ on the structured dictionary $D$; $D_i$ and $D_j$ denote the sub-dictionaries of the $i$-th and $j$-th object classes within $D$; and the symbol $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.

In the constructed visual dictionary learning model, the first two cost terms, $\|X_i-D_{\in i}A_i\|_F^2$ and $\|X_i-DZ_i\|_F^2$, are data reconstruction residuals: they make both the learned class-specific visual dictionary $D_{\in i}$ and the structured visual dictionary $D$ reconstruct the visual features of the $i$-th semantic object class effectively. The consistency term $\|A_i-Z_i\|_F^2$ constrains the two sets of representation coefficients to agree, which selects the dictionary words of the $i$-th class within the structured dictionary $D$ to reconstruct the visual features of that class and keeps the reconstruction coefficients of feature data of the same object class consistent. The orthogonality term $\sum_{j\neq i}\|D_i^{\top}D_j\|_F^2$ between the sub-dictionaries $D_i$, $i = 1, \ldots, C$, enhances the difference between the dictionary words of different classes and preserves the discriminative power of the subsequent image feature coding. The last two terms, $\lambda_1\|A_i\|_{2,1}$ and $\lambda_2\|Z_i\|_{2,1}$, are regularization constraints on the representation coefficients $A_i$ and $Z_i$; the embodiment uses $l_{2,1}$-norm group sparsity, which makes the solved coefficient matrices sparse by rows. Jointly optimizing the $l_{2,1}$ group-sparsity regularizers with the coefficient-consistency constraint both lets the $i$-th sub-dictionary $D_i$ reconstruct the visual features of its semantic class and effectively mines the visual patterns with the sharing characteristic inside the $i$-th class, finally improving the consistency of feature representations within a semantic class and the difference between the features of different semantic classes.
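The l2,1 group-sparsity regularizer in the last two cost terms is simply the sum of the row-wise l2 norms; a minimal sketch:

```python
import numpy as np

def norm_21(A):
    """l2,1 norm: the sum of the l2 norms of the rows of A. Penalising it
    drives entire rows of the coefficient matrix to zero (row sparsity), so
    the same dictionary words are selected across the samples of a class."""
    return float(np.linalg.norm(A, axis=1).sum())

A = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [0.0, 1.0]])
# row l2 norms are 5, 0 and 1, so norm_21(A) = 6
```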
The parameters α, β, λ1, λ2 in the dictionary learning model, which weight the different cost terms of the objective function, are all set empirically to 0.01 through experiments. The dictionary learning objective is a multi-variable optimization problem and is computed iteratively with an alternating optimization strategy: one variable of the objective function is optimized while the remaining variables are held fixed, so the original problem is converted into a sequence of convex sub-problems that are solved in turn.
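The alternating strategy can be illustrated on a stripped-down version of the model. Assumptions in this sketch: the l2,1 and orthogonality terms are dropped, leaving a plain ridge-regularized least-squares problem for each variable; all names and sizes are illustrative.

```python
import numpy as np

def alt_optimize(X, D0, lam=0.1, iters=5):
    """Alternating-optimization sketch on the simplified objective
    min ||X - D Z||_F^2: fix D and solve a ridge system for the codes Z,
    then fix Z and solve a least-squares system for the dictionary D."""
    D = D0.copy()
    for _ in range(iters):
        Z = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ X)
        D = X @ Z.T @ np.linalg.inv(Z @ Z.T + lam * np.eye(Z.shape[0]))
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)  # renormalise words
    return D, Z

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 50))        # 20-d features, 50 samples
D, Z = alt_optimize(X, rng.standard_normal((20, 8)))
```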
Constructing and classifying image global features:
step one, generating an input image object window and extracting characteristics
Given an input image with unknown semantic category, candidate object windows of the image are generated by utilizing an EdgeBox algorithm, and VGG19 deep network visual features of the candidate object windows are further extracted.
Step two, object window characteristic coding
The feature codes of the candidate object windows of the input image are computed with the acquired structured visual dictionary D by minimizing the following objective function:

$$\min_{y}\ \|x-Dy\|_2^2+\eta\|y\|_1$$

where x is the depth visual feature of an object window, y is the object window feature code to be optimized, D is the structured visual dictionary, and the parameter η controls the number of nonzero elements in the feature code y, i.e. its sparsity.
To solve the above objective function, the feature-sign search algorithm (see Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Y. Ng. Efficient sparse coding algorithms. In Conference on Neural Information Processing Systems, pages 801-808, 2007.) is adopted to compute the variable y to be optimized, i.e. the feature code of the image window.
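Feature-sign search itself is somewhat involved; the same L1-regularized least-squares problem can be solved in a few lines with ISTA (proximal gradient), shown here as a hedged stand-in rather than the patent's solver.

```python
import numpy as np

def sparse_code(x, D, eta=0.1, iters=300):
    """ISTA for min_y 0.5*||x - D y||_2^2 + eta*||y||_1 -- a simple
    proximal-gradient stand-in for feature-sign search, which solves
    the same L1-regularised least-squares problem."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12     # Lipschitz constant of the gradient
    y = np.zeros(D.shape[1])
    for _ in range(iters):
        z = y - D.T @ (D @ y - x) / L                          # gradient step
        y = np.sign(z) * np.maximum(np.abs(z) - eta / L, 0.0)  # soft-threshold
    return y

# With D = I the minimiser is plain soft-thresholding of x.
y = sparse_code(np.array([1.0, 0.05, -0.5]), np.eye(3), eta=0.2)
```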
Step three, integrating visual features and constructing image global features
Based on the feature codes of all object windows of the input image, the method borrows the traditional Max-Pooling feature integration scheme (see Jianchao Yang, Kai Yu, Yihong Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1794-1801, 2009.) to combine the window feature codes into the image global feature representation.
The traditional Max-Pooling integration method operates on image local interest points and, in order to embed spatial distribution information into the image global feature, adds a step that partitions the image at different spatial scales. Unlike the traditional method, the present method operates on image object window regions, which effectively introduces image spatial distribution and object semantic information into the construction of the global feature; finally, the image global feature is obtained by coding all window features of the image and taking the maximum value in each feature dimension.
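The max-pooling integration over window codes reduces to an element-wise maximum:

```python
import numpy as np

def image_global_feature(window_codes):
    """Max-pooling integration: stack the feature codes of all candidate
    windows of one image and keep the maximum along each code dimension."""
    return np.max(np.stack(window_codes), axis=0)

codes = [np.array([0.2, 0.0, 0.7]),
         np.array([0.5, 0.1, 0.0])]
g = image_global_feature(codes)   # element-wise maxima: [0.5, 0.1, 0.7]
```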
Step four, forecasting semantic tags of SVM classifier
According to the image global feature representation constructed in step three, the semantic category label of the input image is predicted with a linear SVM classifier (see R.-E. Fan, K.-W. Chang, C.-J. Hsieh, et al. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 2008, 9: 1871-1874.), realizing the classification of the image.
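A linear SVM with hinge loss can be sketched with batch subgradient descent; this toy trainer is a stand-in for the LIBLINEAR solver the text uses, and all names and hyper-parameters are illustrative.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=500):
    """Toy linear SVM: batch subgradient descent on the regularised hinge
    loss, standing in for LIBLINEAR; labels y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                         # margin violations
        gw = lam * w - (X[viol] * y[viol, None]).sum(axis=0) / n
        gb = -y[viol].sum() / n
        w, b = w - lr * gw, b - lr * gb
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)
```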
Table 1. Accuracy evaluation of the method of the present invention and existing image classification methods on the UIUC8 object recognition database
As shown in the above table, the experiments compare the proposed method with existing methods on the UIUC8 object recognition database. The database contains 1972 images of 8 different sports categories. To compute the classification accuracy of the different methods, 70 images are randomly selected from each class as training data and 60 of the remaining images of that class as test data; the final classification accuracy is the average of the per-class accuracies.
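The averaging protocol above, with the final score as the mean of the per-class accuracies, can be expressed directly:

```python
import numpy as np

def mean_class_accuracy(y_true, y_pred, n_classes):
    """Final score as the text describes it: the average of the
    per-class classification accuracies."""
    accs = [(y_pred[y_true == c] == c).mean() for c in range(n_classes)]
    return float(np.mean(accs))

y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2])
# per-class accuracies: 1/2, 2/3, 1  ->  mean = (1/2 + 2/3 + 1) / 3
```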
Note: for the image classification method LLC in the above table, see Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, T. Huang, and Yihong Gong. Locality-constrained linear coding for image classification. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3360-3367, 2010; for the method LSC, see Lingqiao Liu, Lei Wang, and Xinwang Liu. In defense of soft-assignment coding. In IEEE International Conference on Computer Vision, pages 2486-2493, 2011; for the method CNN, see K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
Claims (5)
1. An image classification method based on intra-class visual mode sharing is characterized by comprising the following steps:
image object window generation: given an image training set containing multiple semantic class objects, generating a candidate object window for each image in the image training set;
extracting depth features of an image window: extracting the depth feature of the candidate object window;
visual dictionary learning based on in-class sharing characteristics: according to the depth characteristics of the candidate object windows of all semantic category images, a structured visual dictionary with in-class sharing characteristics is obtained by optimizing a visual dictionary learning model;
generating candidate object windows of the image for the input image with unknown semantic category, and extracting the depth characteristics of the candidate object windows;
calculating the characteristic codes of candidate object windows of the input images according to the structured visual dictionary;
combining object window feature codes based on the feature codes of all object windows of the input image to construct an image global feature representation;
and predicting semantic category labels of the input images by utilizing a linear SVM classifier according to the image global feature representation to realize the classification of the images.
2. The method of claim 1, wherein the generating of the candidate object window for each image in the image training set is implemented by an EdgeBox algorithm.
3. The method of claim 1, wherein the extracting the depth features of the candidate object window is performed by a VGG19 depth network model.
4. The method according to claim 1, wherein the visual dictionary learning model is optimized according to the following formula:

$$\min_{D,\{A_i\},\{Z_i\}}\ \sum_{i=1}^{C}\Big(\|X_i-D_{\in i}A_i\|_F^2+\|X_i-DZ_i\|_F^2+\alpha\|A_i-Z_i\|_F^2+\beta\sum_{j\neq i}\|D_i^{\top}D_j\|_F^2+\lambda_1\|A_i\|_{2,1}+\lambda_2\|Z_i\|_{2,1}\Big)$$

In the above formula, $X_i$ is the visual feature matrix of all training samples of the $i$-th semantic object class; $D_{\in i}$ denotes the class-specific visual dictionary in which the dictionary words corresponding to the $i$-th class in the structured visual dictionary $D$ are retained and the dictionary words of the remaining classes are set to zero; $A_i$ is the matrix of representation coefficients of the feature matrix $X_i$ on the class-specific dictionary $D_{\in i}$; $D$ is the structured visual dictionary to be optimized, the collection of the dictionary words of all semantic object classes; $Z_i$ is the matrix of representation coefficients of $X_i$ on the structured visual dictionary $D$; $D_i$ and $D_j$ denote the sub-dictionaries of the $i$-th and $j$-th object classes within $D$; the symbol $\|\cdot\|_F$ denotes the Frobenius norm of a matrix; and the parameters $\alpha$, $\beta$, $\lambda_1$, $\lambda_2$ weight the different cost terms of the objective function.
5. The method according to claim 1, wherein the objective function for computing the feature code of a candidate object window of the input image is:

$$\min_{y}\ \|x-Dy\|_2^2+\eta\|y\|_1$$

where $x$ is the depth visual feature of the object window, $y$ is the object window feature code to be solved, $D$ is the structured visual dictionary, and the parameter $\eta$ controls the number of nonzero elements in the feature code $y$, i.e. its sparsity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910830812.1A CN110598776A (en) | 2019-09-03 | 2019-09-03 | Image classification method based on intra-class visual mode sharing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110598776A true CN110598776A (en) | 2019-12-20 |
Family
ID=68857276
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910830812.1A Pending CN110598776A (en) | 2019-09-03 | 2019-09-03 | Image classification method based on intra-class visual mode sharing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110598776A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104239897A (en) * | 2014-09-04 | 2014-12-24 | 天津大学 | Visual feature representing method based on autoencoder word bag |
CN104331717A (en) * | 2014-11-26 | 2015-02-04 | 南京大学 | Feature dictionary structure and visual feature coding integrating image classifying method |
CN104537392A (en) * | 2014-12-26 | 2015-04-22 | 电子科技大学 | Object detection method based on distinguishing semantic component learning |
CN107704864A (en) * | 2016-07-11 | 2018-02-16 | 大连海事大学 | Well-marked target detection method based on image object Semantic detection |
Non-Patent Citations (2)
Title |
---|
KAREN SIMONYAN ET AL.: "Very deep convolutional networks for large-scale image recognition", International Conference on Learning Representations *
XIE YURUI: "Research on semantic information extraction and classification methods for images", China Doctoral Dissertations Full-text Database, Information Science and Technology *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112329884A (en) * | 2020-11-25 | 2021-02-05 | 成都信息工程大学 | Zero sample identification method and system based on discriminant visual attributes |
CN112329884B (en) * | 2020-11-25 | 2022-06-07 | 成都信息工程大学 | Zero sample identification method and system based on discriminant visual attributes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191220 |