CN108764302A - Bill image classification method based on color features and bag-of-words features - Google Patents

Bill image classification method based on color features and bag-of-words features

Info

Publication number
CN108764302A
Authority
CN
China
Prior art keywords
bill
feature
color
bag
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810434070.6A
Other languages
Chinese (zh)
Other versions
CN108764302B (en)
Inventor
李浚时
李文军
陈龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
National Sun Yat Sen University
Original Assignee
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Sun Yat Sen University
Priority to CN201810434070.6A
Publication of CN108764302A
Application granted
Publication of CN108764302B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the field of image technology, and more particularly to a bill image classification method based on color features and bag-of-words features. The invention draws on the classic Bag of Words approach from computer vision: SIFT feature points are first extracted from every bill in the training sample and 128-dimensional feature descriptors are generated; K-means clustering then yields K visual words; the occurrences of each visual word are counted for every bill class to form that class's visual-word histogram, which serves as a feature; finally, the color feature is incorporated to form the overall feature vector, which is fed to an SVM classifier for training to obtain the bill classifier model. Because the bag-of-words model does not use the color of the bill image, the method adds the image's global dominant color feature to further improve classifier performance. The invention needs only a very small number of training samples and no additional hand-designed features to train the bill classifier model, and the resulting classifier is fast and accurate.

Description

Bill image classification method based on color features and bag-of-words features
Technical field
The present invention relates to the field of image technology, and more particularly to a bill image classification method based on color features and bag-of-words features.
Background
In traditional bill management, bills are usually classified by hand. Because the number of bills to be classified is often huge, this takes a great deal of labor and resources. Automatic bill classification systems emerged in response, using machine vision to handle this simple, repetitive classification work. Current automatic bill classification systems must first collect many images of each bill type as training samples and hand-design features specific to the bills, such as line segments, corner points, shape, and texture; these features are then vectorized and fed to a classifier such as an SVM for training. Such a system needs a large number of training samples and considerable effort spent on hand-designing bill features before the classification performance of the trained model can be guaranteed, so it has certain limitations. In addition, the bills that existing classification systems support are mostly special financial invoices such as VAT invoices and general machine-printed invoices. They cannot classify bills common in reimbursement systems, such as train tickets, high guaranteed votes, taxi tickets, and plane tickets, and so lack generality.
Granted Chinese patent CN106096667 covers an SVM-based bill classification method. That method requires manual feature design beforehand, such as official-seal extraction and line detection, so it applies to only a few bill types. It cannot classify the majority of bills, which have no straight lines or seals, and is therefore too limited.
Chinese patent publication CN107633239 discloses a bill classification and bill field extraction method based on deep learning and OCR. That method must first obtain seal contour features and needs a large number of seal samples as deep-learning training data. It is unsuitable for the majority of bills, which carry no seal, and it also requires collecting a large number of training samples.
Summary of the invention
To overcome at least one of the defects of the prior art described above, the present invention provides a bill image classification method based on color features and bag-of-words features. On a very small training set, and without designing features specific to each bill class, the method uses only the generated global image color feature and the SIFT-based bag-of-words feature, fed to an SVM classifier for training, to obtain a well-performing bill classifier. When bill classes are predicted online, a multi-level classification strategy based on the color feature and the bag-of-words feature is then applied to further improve classification performance and complete the bill classification task quickly and accurately.
The technical solution of the present invention is a bill image classification method based on color features and bag-of-words features, comprising two major parts: offline bill classifier training and online fast bill classification.
The offline bill classifier training part is divided into color feature extraction and bag-of-words training. The color feature module first converts the training-set images to the HSV color space, quantizes the H component to generate a color histogram, and records the dominant color of each bill class, saving it to disk. Bag-of-words training extracts SIFT features from the training-set bills, runs K-means clustering to obtain K cluster centers, and quantizes the features to build the bag of words. It then counts visual-word frequencies for the training samples of each bill class to generate that class's visual-word histogram, builds the corresponding feature vector from the histogram, and feeds it to the SVM classifier as input for training. Finally, the trained model parameter file is saved to disk.
The online fast bill classification part first loads the trained bill classifier model and the color classification parameter file. The image is converted to the HSV color space, its color histogram is generated, and the dominant color feature is extracted. This feature is used to determine whether the bill's dominant color exists and is unique among the existing bill classes: if so, the classification result is output directly; if not, the bag-of-words classification process begins. That process first extracts the image's SIFT feature points and generates the visual words, then feeds the resulting feature vector to the SVM for classification to obtain a classification result. The result is then checked a second time against the color feature: the dominant color feature corresponding to the predicted class is looked up and compared with the dominant color feature obtained earlier. If they match, the classification result is output; if they differ, the classification result is deemed wrong and classification fails.
Further, only a few bill samples per class are needed for training to obtain a well-performing bill classification model; there is no need to collect a large number of training samples.
Further, the classification model takes the bill's dominant color feature as a classification feature of the bill: the image is first converted to the HSV color space, the color histogram of its H component is computed, and the largest component is extracted as the dominant color feature of the image.
Further, the classification model uses a SIFT-based Bag of Words model for training and online classification: SIFT feature points are first extracted from the training-set images and K-means clustering forms K visual words; the visual words of every bill in the training set are then counted to form the bag-of-words feature, which is fed to the SVM classifier for training.
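The visual-word counting described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `bow_histogram` and the toy 2-D vectors are assumptions made for the example, whereas real SIFT descriptors are 128-dimensional and the vocabulary would come from K-means clustering.

```python
from math import dist


def bow_histogram(descriptors, vocabulary):
    """Count how many descriptors fall nearest to each visual word."""
    histogram = [0] * len(vocabulary)
    for descriptor in descriptors:
        # Assign the descriptor to its nearest cluster center (visual word).
        nearest = min(range(len(vocabulary)),
                      key=lambda k: dist(descriptor, vocabulary[k]))
        histogram[nearest] += 1
    return histogram


# Toy 2-D vectors stand in for 128-D SIFT descriptors (assumption).
vocabulary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 9.5), (0.5, 0.2), (1.0, 9.0)]
print(bow_histogram(descriptors, vocabulary))  # [2, 1, 1]
```

The resulting histogram has one entry per visual word, so its length is K regardless of how many feature points an image yields. That fixed length is what lets it serve as an SVM input vector.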
Further, online bill classification adopts a hierarchical classification strategy: a first classification is made online using the color feature. If it succeeds, the result is output directly; if it fails, the color feature of the class predicted by the SVM classifier is compared with the image's color feature to obtain the final classification result.
The offline bill classifier training module proceeds as follows:
(1) Convert the image to the HSV color space.
(2) Generate the H-component image histogram for each class of training samples.
(3) Extract the highest component in each class's image histogram as the dominant color feature of that class and save it to a file.
(4) Convert the image to grayscale.
(5) Extract SIFT feature points from each class of training samples and generate SIFT feature descriptors.
(6) Run K-means clustering to obtain K cluster centers as the bag of words.
(7) Count the visual words of each class's training samples in turn to obtain the visual-word feature vectors.
(8) Feed the feature vectors to the SVM classifier for training.
(9) Save the trained classifier model file.
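Step (6) of the training procedure, clustering descriptors into K visual words, can be sketched with plain Lloyd's K-means. This is only an illustrative sketch: the function name `kmeans`, the fixed iteration count, the seeded random initialization, and the toy 2-D points are all assumptions, and a production system would more likely call a library implementation on 128-D SIFT descriptors.

```python
import random
from math import dist


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize from k distinct points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster went empty
                centers[i] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    return centers


# Two well-separated toy blobs (assumption); the centers play the
# role of the visual words.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
vocabulary = sorted(kmeans(points, k=2))
print(vocabulary)
```

Each returned center is one visual word; counting nearest-center assignments per image then gives the bag-of-words feature vector used in steps (7) and (8).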
The online bill classification prediction module proceeds as follows:
(1) Initialize the bill classification system and load the model parameter file and the color feature file.
(2) Convert the image to be classified to the HSV color space.
(3) Generate the color histogram of the image's H component.
(4) Extract the dominant color feature.
(5) Judge whether the image's dominant color feature exists and is unique in the color feature file. If so, output the classification prediction directly; if not, go to step (6).
(6) Convert the image to grayscale.
(7) Extract SIFT feature points from the image to be classified and generate SIFT feature descriptors.
(8) Count the image's visual words to obtain its visual-word feature vector.
(9) Have the SVM predict the class from the visual-word feature vector.
(10) Perform a secondary check on the classification result: judge whether the dominant color feature of the predicted class is consistent with the dominant color feature obtained in step (4). If so, output the classification result; if not, classification fails.
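The decision logic of steps (5), (9), and (10) can be sketched as a small function. The names here are hypothetical: `classify_bill`, the two lookup dictionaries, the class labels, and the zero-argument `svm_predict` callable (a stand-in for the trained SVM applied to the image's visual-word vector) are assumptions for illustration, not part of the patent.

```python
def classify_bill(dominant_color, unique_color_to_class, class_to_color,
                  svm_predict):
    """Two-stage decision: color shortcut first, then SVM plus color check."""
    # Stage 1: the dominant color belongs to exactly one known class.
    if dominant_color in unique_color_to_class:
        return unique_color_to_class[dominant_color]
    # Stage 2: fall back to the bag-of-words SVM prediction.
    predicted = svm_predict()
    # Secondary check: the predicted class must share the dominant color.
    if class_to_color.get(predicted) == dominant_color:
        return predicted
    return None  # classification failed


# Hypothetical classes; colors are hue-bin indices.
class_to_color = {"train_ticket": 3, "taxi_ticket": 0, "plane_ticket": 0}
unique_color_to_class = {3: "train_ticket"}

print(classify_bill(3, unique_color_to_class, class_to_color,
                    lambda: "taxi_ticket"))   # train_ticket (color shortcut)
print(classify_bill(0, unique_color_to_class, class_to_color,
                    lambda: "plane_ticket"))  # plane_ticket (SVM, verified)
print(classify_bill(0, unique_color_to_class, class_to_color,
                    lambda: "train_ticket"))  # None (color mismatch)
```

Passing the SVM as a callable means the more expensive SIFT and SVM path runs only when the color shortcut does not apply, which matches the ordering of the steps above.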
Compared with the prior art, the beneficial effects are as follows: with only a small number of bill training samples, the present invention automatically extracts color features and bag-of-words features, without hand-designing features for each bill class. It applies to a wide range of bills, with high classification accuracy and fast classification speed.
Description of the drawings
Fig. 1 shows the algorithm framework of the present invention.
Fig. 2 shows the decision process for online classification prediction.
Detailed description of the embodiments
The drawings are for illustration only and shall not be construed as limiting this patent. To better illustrate the embodiment, some components in the drawings may be omitted, enlarged, or reduced, and do not represent the dimensions of an actual product. Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings. The positional relationships depicted in the drawings are for illustration only and shall not be construed as limiting this patent.
As shown in Fig. 1, the procedure is divided into two major modules: offline bill classification model training and online fast bill classification. The specific steps of offline classification model training are:
(1) Collect images of each bill class. Because the method places very low demands on the number of training samples, only a few high-quality images of each bill class need to be selected.
(2) Convert every image to the HSV color space.
(3) Because the hue component best reflects how people directly perceive color, quantize only the hue component H, at 60-degree intervals giving 6 bins, and generate the color histogram of the H component.
(4) Extract the color of the highest bin as the bill's dominant color feature and save it to a file.
(5) Convert the sample images to grayscale.
(6) Extract SIFT feature points from every image in the training sample, generate the 128-dimensional descriptor of each feature point, and save the descriptors.
(7) Run K-means clustering on all descriptor vectors obtained in step (6), with the number of clusters set to 1000, so that clustering yields 1000 visual words.
(8) Count the visual-word frequencies for each bill class, generating a visual-word feature vector per class.
(9) Feed the features obtained in step (8) to the SVM classifier as input for training, obtain the SVM classification model, and save the model parameters to a file.
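The hue quantization and dominant color extraction in the steps above can be sketched using Python's standard `colorsys` module. The function names `hue_histogram` and `dominant_color` and the sample pixel values are assumptions for illustration; the 60-degree bin width is the value given in the embodiment. Note that `colorsys.rgb_to_hsv` reports a hue of 0 for gray pixels, so in this simple sketch achromatic pixels land in bin 0.

```python
import colorsys


def hue_histogram(pixels, bin_width_deg=60):
    """Histogram of hue over (R, G, B) pixels with channels in 0..255."""
    n_bins = 360 // bin_width_deg  # 6 bins for 60-degree intervals
    histogram = [0] * n_bins
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # h lies in [0, 1]; clamp so that h == 1.0 cannot overflow.
        index = min(int(h * n_bins), n_bins - 1)
        histogram[index] += 1
    return histogram


def dominant_color(pixels):
    """Index of the most populated hue bin."""
    histogram = hue_histogram(pixels)
    return max(range(len(histogram)), key=histogram.__getitem__)


# Toy "image": 3 orange, 5 sky-blue, and 2 violet pixels (assumption).
pixels = [(255, 128, 0)] * 3 + [(0, 200, 255)] * 5 + [(128, 0, 255)] * 2
print(hue_histogram(pixels))   # [3, 0, 0, 5, 2, 0]
print(dominant_color(pixels))  # 3
```

Storing only the bin index per class keeps the color feature file tiny, at the cost of treating all hues within the same 60-degree interval as one color.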
The classification decision process of the online fast bill classification module is shown in Fig. 2. The specific steps are:
(1) Initialize the system: load the trained classifier model, read the color feature file, and store its contents in the color_pool array.
(2) Convert the image to be classified to the HSV color space.
(3) Generate the color histogram.
(4) Extract the dominant color component.
(5) Check whether the image's dominant color exists in the color_pool array and is unique. If so, exactly one bill class has this color, so a bill with this color can be judged to belong to that class and the classification result is output directly; if not, go to the next step.
(6) Convert the image to grayscale.
(7) Extract SIFT feature points from the image and generate the 128-dimensional descriptor of each feature point.
(8) Count the image's visual words to obtain its visual-word feature vector.
(9) Use the trained SVM to predict the class from the visual-word feature vector, obtaining a preliminary classification result.
(10) Perform a secondary check on the classification result: judge whether the dominant color of the predicted class is consistent with the dominant color obtained in step (4). If consistent, output the classification result; if not, output a classification failure. Verifying with both the bag-of-words feature and the color feature can greatly improve classification accuracy.
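The "exists and is unique" test on color_pool in step (5) can be precomputed once at initialization. The sketch below assumes the color feature file has already been parsed into a dictionary from class name to dominant hue bin; the function name `build_unique_color_lookup`, the class labels, and the bin values are hypothetical.

```python
from collections import Counter


def build_unique_color_lookup(class_to_color):
    """Map each dominant color owned by exactly one class to that class."""
    color_counts = Counter(class_to_color.values())
    return {color: name
            for name, color in class_to_color.items()
            if color_counts[color] == 1}


# Hypothetical contents of the color feature file.
color_pool = {"train_ticket": 3, "taxi_ticket": 0,
              "plane_ticket": 0, "vat_invoice": 4}
print(build_unique_color_lookup(color_pool))
# {3: 'train_ticket', 4: 'vat_invoice'}
```

Colors shared by several classes (bin 0 here) are left out of the lookup, so images with those colors fall through to the SIFT and SVM path.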
Obviously, the above embodiment is merely an example given to illustrate the present invention clearly and does not limit its embodiments. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (5)

1. A bill image classification method based on color features and bag-of-words features, characterized by comprising two major parts, offline bill classifier training and online fast bill classification, wherein:
The offline bill classifier training part is divided into color feature extraction and bag-of-words training. The color feature module first converts the training-set images to the HSV color space, quantizes the H component to generate a color histogram, and records the dominant color of each bill class, saving it to disk. Bag-of-words training extracts SIFT features from the training-set bills, runs K-means clustering to obtain K cluster centers, and quantizes the features to build the bag of words. It then counts visual-word frequencies for the training samples of each bill class to generate that class's visual-word histogram, builds the corresponding feature vector from the histogram, and feeds it to the SVM classifier as input for training. Finally, the trained model parameter file is saved to disk;
The online fast bill classification part first loads the trained bill classifier model and the color classification parameter file. The image is converted to the HSV color space, its color histogram is generated, and the dominant color feature is extracted. This feature is used to determine whether the bill's dominant color exists and is unique among the existing bill classes: if so, the classification result is output directly; if not, the bag-of-words classification process begins. That process first extracts the image's SIFT feature points and generates the visual words, then feeds the resulting feature vector to the SVM for classification to obtain a classification result. The result is then checked a second time against the color feature: the dominant color feature corresponding to the predicted class is looked up and compared with the dominant color feature obtained earlier. If they match, the classification result is output; if they differ, the classification result is deemed wrong and classification fails.
2. The bill image classification method based on color features and bag-of-words features according to claim 1, characterized in that only a few bill samples per class are needed for training to obtain a well-performing bill classification model, without collecting a large number of training samples.
3. The bill image classification method based on color features and bag-of-words features according to claim 1, characterized in that the classification model takes the bill's dominant color feature as a classification feature of the bill: the image is first converted to the HSV color space, the color histogram of its H component is computed, and the largest component is extracted as the dominant color feature of the image.
4. The bill image classification method based on color features and bag-of-words features according to claim 1, characterized in that the classification model uses a SIFT-based Bag of Words model for training and online classification: SIFT feature points are first extracted from the training-set images and K-means clustering forms K visual words; the visual words of every bill in the training set are then counted to form the bag-of-words feature, which is fed to the SVM classifier for training.
5. The bill image classification method based on color features and bag-of-words features according to claim 1, characterized in that online bill classification adopts a hierarchical classification strategy: a first classification is made online using the color feature; if it succeeds, the result is output directly; if it fails, the color feature of the class predicted by the SVM classifier is compared with the image's color feature to obtain the final classification result.
CN201810434070.6A 2018-05-08 2018-05-08 Bill image classification method based on color features and bag-of-words features Active CN108764302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810434070.6A CN108764302B (en) 2018-05-08 2018-05-08 Bill image classification method based on color features and bag-of-words features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810434070.6A CN108764302B (en) 2018-05-08 2018-05-08 Bill image classification method based on color features and bag-of-words features

Publications (2)

Publication Number Publication Date
CN108764302A (en) 2018-11-06
CN108764302B (en) 2021-09-28

Family

ID=64009283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810434070.6A Active CN108764302B (en) 2018-05-08 2018-05-08 Bill image classification method based on color features and bag-of-words features

Country Status (1)

Country Link
CN (1) CN108764302B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670513A (en) * 2018-11-27 2019-04-23 西安交通大学 A kind of piston attitude detecting method based on bag of words and support vector machines
CN111160373A (en) * 2019-12-30 2020-05-15 重庆邮电大学 Method for extracting, detecting and classifying defect image features of variable speed drum parts
CN111652309A (en) * 2020-05-29 2020-09-11 刘秀萍 Visual word and phrase co-driven bag-of-words model picture classification method
CN112613563A (en) * 2020-12-25 2021-04-06 福建福清核电有限公司 Nuclear power field equipment image classification method based on OpenCV
CN112907534A (en) * 2021-02-18 2021-06-04 哈尔滨市科佳通用机电股份有限公司 Fault detection method and device based on door closing part position image
CN112966715A (en) * 2021-02-02 2021-06-15 哈尔滨商业大学 Commodity image feature description method based on multi-scale visual word bag model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622607A (en) * 2012-02-24 2012-08-01 河海大学 Remote sensing image classification method based on multi-feature fusion
CN106022364A (en) * 2016-05-13 2016-10-12 邓昌顺 Novel note classifying method
CN106203448A (en) * 2016-07-08 2016-12-07 南京信息工程大学 A kind of scene classification method based on Nonlinear Scale Space Theory

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LEI WANG ET AL.: "Area Determination of Diabetic Foot Ulcer Images Using a Cascaded Two-Stage SVM-Based Classification", 《IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING》 *
ZHUOJIA LIANG ET AL.: "Salient object detection based on regions", 《MULTIMED TOOLS APPL》 *
殷绪成 等: "层次型金融票据图像分类方法", 《中文信息学报》 *
王亚如: "基于HSV颜色空间和SIFT特征的近似图像检索", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
邓亚芳: "基于决策树的瓷砖图像分类方法研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670513A (en) * 2018-11-27 2019-04-23 西安交通大学 A kind of piston attitude detecting method based on bag of words and support vector machines
CN111160373A (en) * 2019-12-30 2020-05-15 重庆邮电大学 Method for extracting, detecting and classifying defect image features of variable speed drum parts
CN111652309A (en) * 2020-05-29 2020-09-11 刘秀萍 Visual word and phrase co-driven bag-of-words model picture classification method
CN112613563A (en) * 2020-12-25 2021-04-06 福建福清核电有限公司 Nuclear power field equipment image classification method based on OpenCV
CN112966715A (en) * 2021-02-02 2021-06-15 哈尔滨商业大学 Commodity image feature description method based on multi-scale visual word bag model
CN112966715B (en) * 2021-02-02 2021-09-07 哈尔滨商业大学 Commodity image feature description method based on multi-scale visual word bag model
CN112907534A (en) * 2021-02-18 2021-06-04 哈尔滨市科佳通用机电股份有限公司 Fault detection method and device based on door closing part position image

Also Published As

Publication number Publication date
CN108764302B (en) 2021-09-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant