CN110598776A - Image classification method based on intra-class visual mode sharing - Google Patents

Image classification method based on intra-class visual mode sharing

Info

Publication number
CN110598776A
Authority
CN
China
Prior art keywords
image
visual
class
dictionary
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910830812.1A
Other languages
Chinese (zh)
Inventor
谢昱锐
刘甲甲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology filed Critical Chengdu University of Information Technology
Priority to CN201910830812.1A
Publication of CN110598776A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image classification method based on intra-class visual mode sharing, which comprises the following steps: generating image object windows; extracting depth features of the image windows; visual dictionary learning based on the intra-class sharing characteristic, in which a structured visual dictionary with the intra-class sharing characteristic is obtained from the depth features of the candidate object windows of all semantic category images by optimizing a visual dictionary learning model; generating object windows of the input image and extracting their features; feature coding of the object windows; integrating the visual features to construct the image global feature; and predicting the semantic label with an SVM classifier. The method analyzes and solves the problem by introducing the practically valuable sharing characteristic of visual patterns: by cooperatively mining the visual dictionary words with shared characteristics in each semantic category, it enhances the semantic expression of image features and improves the accuracy of image category recognition.

Description

Image classification method based on intra-class visual mode sharing
Technical Field
The invention relates to the technical field of image recognition, in particular to an image classification method based on intra-class visual mode sharing.
Background
With the continuous development of digital multimedia and internet technology, human society has entered a big data era in which multimedia data grows rapidly. Among the various forms of multimedia data, image data plays an important role in many aspects of social life because it is intuitive and easy to acquire; consequently, how to effectively analyze and understand the content of image data is becoming increasingly important. In past years, many image semantic object classification methods have made progress in visual feature generation, object model construction, and strongly supervised learning. However, owing to the semantic gap between bottom-layer visual features and middle- and high-layer information, existing image semantic object classification methods still progress slowly on key problems such as discriminative feature construction, collaborative analysis of associated information, and the semantics of visual features.
For the image classification problem, current research focuses mainly on constructing semantic representations of image features. When the image feature representation fully describes the semantic content of the object, even a simple linear classifier can accurately predict the semantic content of the image. Early image features were built from bottom-layer visual cues such as color, shape, and texture, producing histogram representations of visual information through manually defined feature construction schemes. However, such bottom-layer visual feature representations are only statistical descriptions of visual information; they can hardly depict semantic object content effectively, which ultimately prevents accurate prediction of the image category in practical classification tasks. To address this, subsequent research has focused on extracting more semantically discriminative image feature representations by means of machine learning. Among the many image classification models, methods based on visual dictionary learning decompose the construction of image semantic feature representations into four sub-problems: bottom-layer feature extraction, visual dictionary learning, local feature coding, and image global feature generation.
In current image classification methods based on visual dictionary learning, the learned dictionary words are mutually independent; the correlation among dictionary words is left unexplored, which weakens the discriminative power of the image feature representations constructed from the visual dictionary. In fact, cooperatively mining correlated visual dictionary words during dictionary learning can effectively enhance the consistency of feature representations of images of the same semantic category and the difference between feature representations of images of different semantic categories, and ultimately improve the performance of image semantic object category prediction.
Disclosure of Invention
In view of the defects of the prior art, the technical problem to be solved by the invention is to provide an image classification method based on intra-class visual mode sharing, so as to address the lack of semantic information in image feature representations.
The technical scheme adopted by the invention for realizing the purpose is as follows: an image classification method based on intra-class visual mode sharing comprises the following steps:
image object window generation: given an image training set containing objects of multiple semantic classes, generating a candidate object window for each image in the image training set;
extracting depth features of an image window: extracting the depth feature of the candidate object window;
visual dictionary learning based on in-class sharing characteristics: according to the depth characteristics of the candidate object windows of all semantic category images, a structured visual dictionary with in-class sharing characteristics is obtained by optimizing a visual dictionary learning model;
generating candidate object windows of the image for the input image with unknown semantic category, and extracting the depth characteristics of the candidate object windows;
calculating the characteristic codes of candidate object windows of the input images according to the structured visual dictionary;
combining object window feature codes based on the feature codes of all object windows of the input image to construct an image global feature representation;
and predicting semantic category labels of the input images by utilizing a linear SVM classifier according to the image global feature representation to realize the classification of the images.
The candidate object windows of each image in the image training set are generated by the EdgeBox algorithm.
The depth features of the candidate object windows are extracted with a VGG19 deep network model.
The visual dictionary learning model is optimized according to the following formula:

min_{D, A_i, Z_i} Σ_{i=1..C} ( ‖X_i − D_{∈i} A_i‖_F^2 + ‖X_i − D Z_i‖_F^2 + α‖A_i − Z_i‖_F^2 + β Σ_{j≠i} ‖D_i^T D_j‖_F^2 + λ_1‖A_i‖_{2,1} + λ_2‖Z_i‖_{2,1} )

In the above formula, X_i is the visual feature matrix of all training samples of the i-th semantic object class; D_{∈i} denotes the class-specific visual dictionary in which the dictionary words corresponding to the i-th class in the structured visual dictionary D are retained and the dictionary words of all other semantic object classes are set to zero; A_i is the representation coefficient matrix of the visual feature matrix X_i on the class-specific dictionary D_{∈i}; D is the structured visual dictionary to be optimized, the set of dictionary words of all semantic object classes; Z_i is the representation coefficient matrix of the feature matrix X_i on the structured visual dictionary D; D_i and D_j denote the sub-dictionaries corresponding to the i-th and j-th object classes in D; the symbol ‖·‖_F denotes the Frobenius norm of a matrix; the parameters α, β, λ_1, λ_2 are weighting coefficients balancing the different cost terms of the objective function.
The objective function for computing the feature codes of the candidate object windows of the input image is:

min_y ‖x − D y‖_F^2 + η‖y‖_1

In the formula, x is the depth visual feature of an object window, y denotes the object window feature code to be optimized and solved, D is the structured visual dictionary, and the parameter η controls the number of nonzero elements in the feature code y, i.e., the sparsity of the feature code; the symbol ‖·‖_F denotes the Frobenius norm of a matrix.
The invention has the following advantages and beneficial effects:
1. Current image classification methods based on visual dictionary learning neglect the relevance constraints among words when learning visual dictionary words, which leads to a lack of semantic information in image feature representations. Aiming at this problem, the method of the invention analyzes and solves it by introducing the practically valuable sharing characteristic of visual patterns: by cooperatively mining the visual dictionary words with shared characteristics within each semantic category, it enhances the semantic expressiveness of image feature representations and improves the accuracy of image category recognition.
2. The invention requires no manual participation and achieves high classification accuracy. Unlike current dictionary-learning-based image classification methods, which learn each dictionary word in isolation, the method introduces the sharing characteristic of visual patterns within a semantic object class, cooperatively mines the visual dictionary words with shared characteristics in the same semantic class, and establishes association constraints among dictionary words, thereby alleviating the lack of semantic information in current image visual feature representations.
3. The method of the present invention is practical and effective.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a diagram of a set of visual features based on an image object window according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The method is implemented on the Matlab R2016b experimental platform. As shown in FIG. 1, it consists of two parts. The intra-class shared visual pattern mining part involves image object window generation, window depth feature extraction, and visual dictionary learning based on the intra-class sharing characteristic. The image global feature construction and classification part involves four steps: input image object window generation and feature extraction, object window feature coding, integration of visual features into the image global feature, and SVM classifier semantic label prediction. The specific steps are as follows:
in-class shared visual pattern mining:
step one, generating an image object window
Given an image training set containing objects of multiple semantic classes, candidate object windows of each image are generated using the EdgeBox algorithm (see C. Lawrence Zitnick and Piotr Dollár. Edge Boxes: Locating Object Proposals from Edges. In European Conference on Computer Vision, 2014.).
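For illustration, a minimal sketch of this window-generation step using OpenCV's EdgeBoxes implementation follows; the patent itself runs on Matlab R2016b, so the OpenCV calls and the structured-edge model path are our assumptions, not the original implementation.

```python
# Sketch: candidate object window generation with OpenCV's EdgeBoxes
# (illustrative only; "model.yml.gz" is an assumed structured-edge model file).
import cv2
import numpy as np

def generate_windows(image_path, model_path="model.yml.gz", max_boxes=100):
    img = cv2.imread(image_path)
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0

    # Structured edge detection supplies the edge and orientation maps
    # that EdgeBoxes uses to score candidate windows.
    detector = cv2.ximgproc.createStructuredEdgeDetection(model_path)
    edges = detector.detectEdges(rgb)
    orientation = detector.computeOrientation(edges)
    edges = detector.edgesNms(edges, orientation)

    edge_boxes = cv2.ximgproc.createEdgeBoxes()
    edge_boxes.setMaxBoxes(max_boxes)
    # Recent OpenCV versions return (boxes, scores); older ones return boxes only.
    boxes, scores = edge_boxes.getBoundingBoxes(edges, orientation)
    return boxes  # each box is (x, y, w, h)
```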
Step two, extracting depth features of image windows
The depth features of the image candidate object windows are extracted using the VGG19 deep network model (see K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. In International Conference on Learning Representations, 2015.).
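A minimal sketch of this feature extraction step with a pretrained VGG19 is given below; the patent does not specify which layer is used, so taking the 4096-dimensional output of the second fully connected layer (fc7) is our assumption.

```python
# Sketch: 4096-d VGG19 features for one cropped candidate window
# (the fc7 layer choice is an assumption; the patent only names the model).
import torch
import torchvision.models as models
import torchvision.transforms as T

vgg19 = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()
# Keep the classifier up to the second fully connected layer (fc7 + ReLU).
fc7 = torch.nn.Sequential(*list(vgg19.classifier.children())[:5])

preprocess = T.Compose([
    T.ToPILImage(), T.Resize((224, 224)), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def window_feature(window_rgb):              # HxWx3 uint8 crop of one window
    x = preprocess(window_rgb).unsqueeze(0)  # 1x3x224x224
    conv = vgg19.avgpool(vgg19.features(x)).flatten(1)
    return fc7(conv).squeeze(0)              # 4096-d depth feature
```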
Step three, learning a visual dictionary based on in-class sharing characteristics
In order to mine the visual patterns with a sharing characteristic within each semantic category, the embodiment of the invention designs a structured visual dictionary of the following mathematical form:

D = [D_1, D_2, ..., D_C]

In the above formula, D denotes the constructed visual dictionary, assembled from the dictionaries D_i of the individual object classes, i = 1, 2, ..., C, where C is the number of object classes contained in the image set.
According to the depth features of the candidate object windows of all semantic image classes, a structured visual dictionary D with the intra-class sharing characteristic is obtained by optimizing the visual dictionary learning model constructed as follows:

min_{D, A_i, Z_i} Σ_{i=1..C} ( ‖X_i − D_{∈i} A_i‖_F^2 + ‖X_i − D Z_i‖_F^2 + α‖A_i − Z_i‖_F^2 + β Σ_{j≠i} ‖D_i^T D_j‖_F^2 + λ_1‖A_i‖_{2,1} + λ_2‖Z_i‖_{2,1} )

In the above formula, X_i is the visual feature matrix of all training samples of the i-th semantic object class; D_{∈i} denotes the class-specific visual dictionary in which the dictionary words corresponding to the i-th class in the structured dictionary D are retained and the dictionary words of all other semantic object classes are set to zero; A_i is the representation coefficient matrix of the visual feature matrix X_i on the class-specific dictionary D_{∈i}; D is the structured visual dictionary to be optimized, the set of dictionary words of all semantic object classes; Z_i is the representation coefficient matrix of the feature matrix X_i on the structured dictionary D; D_i and D_j denote the sub-dictionaries corresponding to the i-th and j-th object classes in D; the symbol ‖·‖_F denotes the Frobenius norm of a matrix.
In the constructed visual dictionary learning model, the first two cost terms ‖X_i − D_{∈i}A_i‖_F^2 and ‖X_i − DZ_i‖_F^2 are data reconstruction residual terms; their role is to use the learned class-specific visual dictionary D_{∈i} and the structured visual dictionary D to effectively reconstruct the visual features of the i-th semantic object class. The cost term α‖A_i − Z_i‖_F^2 is a consistency constraint on the representation coefficients: it drives the selection of the dictionary words of the i-th class within the structured dictionary D when reconstructing the features of that class, ensuring that the reconstruction coefficients of feature data from the same object class are consistent. The cost term β Σ_{j≠i} ‖D_i^T D_j‖_F^2 imposes an orthogonality constraint between the dictionaries D_i, i = 1, ..., C, of the different semantic object classes; it enhances the difference between different dictionary words and secures the discriminative power of the subsequent image feature coding. The last two terms λ_1‖A_i‖_{2,1} and λ_2‖Z_i‖_{2,1} are regularization constraints on the representation coefficients A_i and Z_i; the embodiment adopts ℓ_{2,1}-norm group sparsity so that the solved representation coefficient matrices are row-sparse. Jointly optimizing the ℓ_{2,1} group-sparse regularization terms with the representation-coefficient consistency constraint guarantees, on the one hand, that the i-th visual dictionary D_i can reconstruct the visual features of its semantic category, and on the other hand effectively mines the visual patterns shared within the i-th category, ultimately improving the consistency of feature representations within a semantic category and the difference between feature representations of different categories. The parameters α, β, λ_1, λ_2 weight the different cost terms of the objective function; in the experiments they are all empirically set to 0.01. The dictionary learning objective is a multi-variable optimization problem and is computed iteratively with an alternating optimization strategy: when one variable of the objective function is optimized, the remaining variables are held fixed, so the original problem decomposes into several convex optimization sub-problems.
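To make the objective concrete, the following numpy sketch evaluates its cost terms for one class. The exact algebraic form of each term is reconstructed from the description above (the original equation image is not available), so treat this as an assumption rather than the patent's definitive formula.

```python
# Sketch: evaluating the structured dictionary learning objective for class i.
# Shapes: X_i (d, n_i); D (d, K); A_i and Z_i (K, n_i).
# "blocks" maps each class c to its column indices inside D.
import numpy as np

def class_objective(X_i, D, A_i, Z_i, blocks, i,
                    alpha=0.01, beta=0.01, lam1=0.01, lam2=0.01):
    # Class-specific dictionary D_{in i}: keep class-i words, zero the rest.
    D_in_i = np.zeros_like(D)
    D_in_i[:, blocks[i]] = D[:, blocks[i]]

    recon_specific = np.linalg.norm(X_i - D_in_i @ A_i, "fro") ** 2
    recon_full     = np.linalg.norm(X_i - D @ Z_i, "fro") ** 2
    consistency    = alpha * np.linalg.norm(A_i - Z_i, "fro") ** 2
    # Orthogonality between the i-th sub-dictionary and all other classes.
    ortho = beta * sum(
        np.linalg.norm(D[:, blocks[i]].T @ D[:, blocks[j]], "fro") ** 2
        for j in blocks if j != i)
    # l_{2,1} group sparsity: sum of row l2 norms -> row-sparse coefficients.
    l21 = lambda M: np.sum(np.linalg.norm(M, axis=1))
    return (recon_specific + recon_full + consistency + ortho
            + lam1 * l21(A_i) + lam2 * l21(Z_i))
```

In an alternating-optimization loop, this value can be monitored after each variable update (D, then A_i, then Z_i) to check that the objective decreases monotonically.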
Constructing and classifying image global features:
step one, generating an input image object window and extracting characteristics
Given an input image with unknown semantic category, candidate object windows of the image are generated by utilizing an EdgeBox algorithm, and VGG19 deep network visual features of the candidate object windows are further extracted.
Step two, object window characteristic coding
The feature codes of the candidate object windows of the input image are computed according to the acquired structured visual dictionary D. The specific mathematical form of the objective function is:

min_y ‖x − D y‖_F^2 + η‖y‖_1

In the formula, x is the depth visual feature of an object window, y denotes the object window feature code to be optimized and solved, D is the structured visual dictionary, and the parameter η controls the number of nonzero elements in the feature code y, i.e., the sparsity of the feature code; the symbol ‖·‖_F denotes the Frobenius norm of a matrix.
To solve the above objective function, the feature-sign search algorithm (see Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Y. Ng. Efficient Sparse Coding Algorithms. In Conference on Neural Information Processing Systems, pages 801-808, 2007.) is adopted to compute the variable y to be optimized, i.e., to obtain the feature code of the image window.
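A minimal sketch of this coding step is shown below; it substitutes scikit-learn's LARS-based lasso solver for feature-sign search (both minimize the same ℓ1-regularized least-squares objective, but the substitution is ours, not the patent's).

```python
# Sketch: sparse coding of window features over the learned dictionary D.
# sparse_encode's "lasso_lars" solver stands in for feature-sign search here.
from sklearn.decomposition import sparse_encode

def encode_windows(window_features, D, eta=0.1):
    # window_features: (n_windows, d); D: (K, d), rows are dictionary words.
    # Returns the (n_windows, K) matrix of sparse feature codes y.
    return sparse_encode(window_features, D, algorithm="lasso_lars", alpha=eta)
```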
Step three, integrating visual features and constructing image global features
Based on the feature codes of all object windows of the input image, the method further borrows the traditional Max-Pooling feature integration scheme (see Jianchao Yang, Kai Yu, Yihong Gong, and T. Huang. Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1794-1801, 2009.) to construct the image global feature representation.
The traditional Max-Pooling feature integration method operates on image local interest points and, to embed spatial distribution information in the image global feature, adds a step that partitions the image at different spatial scales. Unlike the traditional method, the present method operates on image object window regions, so image spatial distribution and object semantic information are introduced directly during global feature construction; the final image global feature is obtained by taking, over the feature codes of all windows of the image, the maximum value in each feature dimension.
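As described, the pooling step reduces to a per-dimension maximum over all window codes; a one-line numpy sketch:

```python
# Sketch: image global feature as a per-dimension max over all window codes.
import numpy as np

def global_feature(window_codes):
    # window_codes: (n_windows, K) feature codes of all candidate windows.
    return window_codes.max(axis=0)  # K-dimensional global representation
```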
Step four, forecasting semantic tags of SVM classifier
According to the image global feature representation constructed in step three, the semantic category label of the input image is predicted with a linear SVM classifier (see R.-E. Fan, K.-W. Chang, C.-J. Hsieh, et al. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 2008, 9: 1871-1874.), realizing the classification of the image.
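A minimal sketch of this last step; scikit-learn's LinearSVC wraps the LIBLINEAR library cited above, and the regularization constant C below is our choice, not a value from the patent.

```python
# Sketch: linear SVM training and semantic label prediction on global features.
from sklearn.svm import LinearSVC

def train_and_predict(train_feats, train_labels, test_feats):
    clf = LinearSVC(C=1.0)          # C=1.0 is an assumed hyperparameter
    clf.fit(train_feats, train_labels)
    return clf.predict(test_feats)  # predicted semantic category labels
```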
Table 1. Accuracy evaluation of the method of the present invention and existing image classification methods on the UIUC8 object recognition database
As shown in the table above, the experiments compare the method with existing methods on the UIUC8 object recognition database. The database contains image data of 8 different sports categories, 1972 images in total. To compute the classification accuracy of the different methods, 70 images are randomly selected from each class's image set as training data and 60 images are randomly selected from the remaining images of that class as test data; the final classification accuracy is the average of the per-class classification accuracies, as sketched below.
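The split-and-average protocol just described can be sketched as follows (dataset loading is left abstract; the function names are ours):

```python
# Sketch: 70/60 per-class split and mean per-class accuracy, as described above.
import numpy as np

def split_per_class(labels, n_train=70, n_test=60, seed=0):
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:n_train + n_test])
    return np.array(train_idx), np.array(test_idx)

def mean_per_class_accuracy(y_true, y_pred):
    classes = np.unique(y_true)
    return np.mean([np.mean(y_pred[y_true == c] == c) for c in classes])
```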
Note: for the image classification method LLC in the table above, see Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, T. Huang, and Yihong Gong. Locality-Constrained Linear Coding for Image Classification. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3360-3367, 2010; for the image classification method LSC, see Lingqiao Liu, Lei Wang, and Xinwang Liu. In Defense of Soft-Assignment Coding. In IEEE International Conference on Computer Vision, pages 2486-2493, 2011; for the image classification method CNN, see K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. In International Conference on Learning Representations, 2015.

Claims (5)

1. An image classification method based on intra-class visual mode sharing is characterized by comprising the following steps:
image object window generation: given an image training set containing objects of multiple semantic classes, generating a candidate object window for each image in the image training set;
extracting depth features of an image window: extracting the depth feature of the candidate object window;
visual dictionary learning based on in-class sharing characteristics: according to the depth characteristics of the candidate object windows of all semantic category images, a structured visual dictionary with in-class sharing characteristics is obtained by optimizing a visual dictionary learning model;
generating candidate object windows of the image for the input image with unknown semantic category, and extracting the depth characteristics of the candidate object windows;
calculating the characteristic codes of candidate object windows of the input images according to the structured visual dictionary;
combining object window feature codes based on the feature codes of all object windows of the input image to construct an image global feature representation;
and predicting semantic category labels of the input images by utilizing a linear SVM classifier according to the image global feature representation to realize the classification of the images.
2. The method of claim 1, wherein the generating of the candidate object window for each image in the image training set is implemented by an EdgeBox algorithm.
3. The method of claim 1, wherein the extracting the depth features of the candidate object window is performed by a VGG19 depth network model.
4. The method according to claim 1, wherein the optimized visual dictionary learning model is given by the following formula:

min_{D, A_i, Z_i} Σ_{i=1..C} ( ‖X_i − D_{∈i} A_i‖_F^2 + ‖X_i − D Z_i‖_F^2 + α‖A_i − Z_i‖_F^2 + β Σ_{j≠i} ‖D_i^T D_j‖_F^2 + λ_1‖A_i‖_{2,1} + λ_2‖Z_i‖_{2,1} )

In the above formula, X_i is the visual feature matrix of all training samples of the i-th semantic object class; D_{∈i} denotes the class-specific visual dictionary in which the dictionary words corresponding to the i-th class in the structured visual dictionary D are retained and the dictionary words of all other semantic object classes are set to zero; A_i is the representation coefficient matrix of the visual feature matrix X_i on the class-specific dictionary D_{∈i}; D is the structured visual dictionary to be optimized, the set of dictionary words of all semantic object classes; Z_i is the representation coefficient matrix of the feature matrix X_i on the structured visual dictionary D; D_i and D_j denote the sub-dictionaries corresponding to the i-th and j-th object classes in D; the symbol ‖·‖_F denotes the Frobenius norm of a matrix; the parameters α, β, λ_1, λ_2 are weighting coefficients balancing the different cost terms of the objective function.
5. The method according to claim 1, wherein the objective function for computing the feature codes of the candidate object windows of the input image is:

min_y ‖x − D y‖_F^2 + η‖y‖_1

In the formula, x is the depth visual feature of an object window, y denotes the object window feature code to be optimized and solved, D is the structured visual dictionary, and the parameter η controls the number of nonzero elements in the feature code y, i.e., the sparsity of the feature code; the symbol ‖·‖_F denotes the Frobenius norm of a matrix.
CN201910830812.1A 2019-09-03 2019-09-03 Image classification method based on intra-class visual mode sharing Pending CN110598776A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910830812.1A CN110598776A (en) 2019-09-03 2019-09-03 Image classification method based on intra-class visual mode sharing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910830812.1A CN110598776A (en) 2019-09-03 2019-09-03 Image classification method based on intra-class visual mode sharing

Publications (1)

Publication Number Publication Date
CN110598776A true CN110598776A (en) 2019-12-20

Family

ID=68857276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910830812.1A Pending CN110598776A (en) 2019-09-03 2019-09-03 Image classification method based on intra-class visual mode sharing

Country Status (1)

Country Link
CN (1) CN110598776A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329884A (en) * 2020-11-25 2021-02-05 成都信息工程大学 Zero sample identification method and system based on discriminant visual attributes

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239897A (en) * 2014-09-04 2014-12-24 天津大学 Visual feature representing method based on autoencoder word bag
CN104331717A (en) * 2014-11-26 2015-02-04 南京大学 Feature dictionary structure and visual feature coding integrating image classifying method
CN104537392A (en) * 2014-12-26 2015-04-22 电子科技大学 Object detection method based on distinguishing semantic component learning
CN107704864A (en) * 2016-07-11 2018-02-16 大连海事大学 Well-marked target detection method based on image object Semantic detection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239897A (en) * 2014-09-04 2014-12-24 天津大学 Visual feature representing method based on autoencoder word bag
CN104331717A (en) * 2014-11-26 2015-02-04 南京大学 Feature dictionary structure and visual feature coding integrating image classifying method
CN104537392A (en) * 2014-12-26 2015-04-22 电子科技大学 Object detection method based on distinguishing semantic component learning
CN107704864A (en) * 2016-07-11 2018-02-16 大连海事大学 Well-marked target detection method based on image object Semantic detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAREN SIMONYAN ET AL.: "Very deep convolutional networks for large-scale image recognition", 《INTERNATIONAL CONFERENCE ON LEARNING REPRESENTATIONS》 *
谢昱锐 (Xie Yurui): "图像的语义信息提取与分类方法研究" [Research on semantic information extraction and classification methods for images], 《中国优秀博硕士学位论文全文数据库(博士) 信息科技辑》 [China Doctoral Dissertations Full-text Database, Information Science and Technology] *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329884A (en) * 2020-11-25 2021-02-05 成都信息工程大学 Zero sample identification method and system based on discriminant visual attributes
CN112329884B (en) * 2020-11-25 2022-06-07 成都信息工程大学 Zero sample identification method and system based on discriminant visual attributes

Similar Documents

Publication Publication Date Title
CN107122809B (en) Neural network feature learning method based on image self-coding
CN109376242B (en) Text classification method based on cyclic neural network variant and convolutional neural network
CN110413924B (en) Webpage classification method for semi-supervised multi-view learning
CN111274398B (en) Method and system for analyzing comment emotion of aspect-level user product
CN108595636A (en) The image search method of cartographical sketching based on depth cross-module state correlation study
CN113190699A (en) Remote sensing image retrieval method and device based on category-level semantic hash
CN103186538A (en) Image classification method, image classification device, image retrieval method and image retrieval device
CN112306494A (en) Code classification and clustering method based on convolution and cyclic neural network
CN111125411A (en) Large-scale image retrieval method for deep strong correlation hash learning
CN110413791A (en) File classification method based on CNN-SVM-KNN built-up pattern
CN112712127A (en) Image emotion polarity classification method combined with graph convolution neural network
CN110969023B (en) Text similarity determination method and device
CN112507800A (en) Pedestrian multi-attribute cooperative identification method based on channel attention mechanism and light convolutional neural network
CN110046660A (en) A kind of product quantization method based on semi-supervised learning
CN113537304A (en) Cross-modal semantic clustering method based on bidirectional CNN
Cosovic et al. Classification methods in cultural heritage
CN114048354B (en) Test question retrieval method, device and medium based on multi-element characterization and metric learning
CN110110120B (en) Image retrieval method and device based on deep learning
CN109471930B (en) Emotional board interface design method for user emotion
CN110765781A (en) Man-machine collaborative construction method for domain term semantic knowledge base
CN108388918B (en) Data feature selection method with structure retention characteristics
CN116629258B (en) Structured analysis method and system for judicial document based on complex information item data
CN112418257B (en) Effective zero sample learning method based on potential visual attribute mining
CN110598776A (en) Image classification method based on intra-class visual mode sharing
Vijayaraju Image retrieval using image captioning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191220)