CN108764302A - Bill image classification method based on color features and bag-of-words features
- Publication number
- CN108764302A (application CN201810434070.6A)
- Authority
- CN
- China
- Prior art keywords
- bill
- feature
- color
- bag
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to the technical field of image processing, and more particularly to a bill image classification method based on color features and bag-of-words features. The invention follows the classic Bag of Words approach from computer vision: SIFT features are first extracted from every bill in the training samples to generate 128-dimensional feature descriptors; K-means clustering is then applied to obtain K visual words; the occurrences of the visual words are counted for each class of bill to form that class's visual-word histogram as a feature; finally, the color feature is incorporated to form the overall feature vector, which is fed into an SVM classifier for training to obtain the bill classifier model. Because the bag-of-words model does not use the color information of the bill image, the method adds the global dominant color feature of the image to further improve the performance of the bill classifier. The invention needs only a very small number of training samples and no additional hand-designed features to train the bill classifier model, and the resulting classifier is fast and accurate.
Description
Technical field
The present invention relates to the technical field of image processing, and more particularly to a bill image classification method based on color features and bag-of-words features.
Background art
In traditional bill management, bills are often classified by hand. Because the number of bills to be sorted is usually very large, this consumes considerable manpower and resources. Automatic bill classification systems, built on machine vision, emerged to take over this simple, repetitive classification work. Existing automatic bill classification systems must first collect many images of each bill type as training samples, hand-design specific features for each bill type (line segments, corner points, shapes, textures and so on), and then vectorize these features and feed them to a classifier such as an SVM for training. Such systems need a large number of training samples and a great deal of effort spent on hand-designing bill features before the trained model can be expected to classify well, which limits them. In addition, most existing bill classification systems support only dedicated financial invoices such as value-added tax invoices and general machine-printed invoices; they cannot classify the bills that commonly appear in reimbursement systems, such as train tickets, highway toll tickets, taxi receipts and plane tickets, and so lack generality.
Granted Chinese patent CN106096667 covers a bill classification method based on SVM. That method requires features to be designed by hand beforehand, for example official seal extraction and line detection. It applies to only a few bill types and cannot classify the majority of bills, which have no straight lines or seals, so the method is too limited.
Chinese Patent Publication No. CN107633239 discloses a bill classification and bill field extraction method based on deep learning and OCR. That method must first obtain seal contour features and needs a large number of seal samples as training data for deep learning. It is therefore unsuitable for the majority of bills, which carry no seal, and it also requires collecting a large number of training samples.
Summary of the invention
To overcome at least one of the defects of the prior art described above, the present invention provides a bill image classification method based on color features and bag-of-words features. The method works with a very small training set and requires no features designed for the characteristics of each bill type: it uses only the global color feature of the image and a bag-of-words feature built on SIFT features, feeds them to an SVM classifier for training, and thereby obtains a bill classifier with excellent performance. When bills are classified online, a multi-level classification strategy based on the color feature and the bag-of-words feature is applied to further improve classification performance, so that the bill classification task is completed quickly and accurately.
The technical solution of the present invention is a bill image classification method based on color features and bag-of-words features, comprising two parts: offline training of the bill classifier and fast online classification of bills.
The offline training part is divided into color feature extraction and bag-of-words model training. The color feature module first converts the pictures in the training set to the HSV color space, quantizes the H component to generate a color histogram, records the dominant color of each class of bill, and saves it to hard disk. The bag-of-words model training extracts SIFT features from the bills in the training set, applies K-means clustering to obtain K cluster centers, and quantizes the features to generate the bag of words. It then counts the visual-word frequencies of the training samples of each class of bill to generate that class's visual-word histogram, derives the corresponding feature vector from the histogram, and uses it as input to train the SVM classifier. Finally, the trained model parameter file is saved to hard disk.
The fast online classification part first loads the trained bill classifier model and the color classification parameter file. The image is converted to the HSV color space, its color histogram is generated, and its dominant color feature is extracted. This feature is used to determine whether the bill's dominant color exists among the known bill classes and is unique to one of them. If so, the classification result is output directly; if not, the bag-of-words classification process is entered. The bag-of-words classification process first extracts the SIFT features of the image and generates its visual words, then feeds the resulting feature vector to the SVM to obtain a classification result. A secondary check based on the color feature is then applied to this result: the dominant color feature associated with the predicted class is looked up and compared with the dominant color feature obtained earlier. If they are the same, the classification result is output; if they differ, the classification result is considered wrong and classification fails.
Further, only a few bill samples per class are needed to train a bill classification model with excellent performance; there is no need to collect a large number of training samples.
Further, the classification model uses the dominant color of the bill as a classification feature: the image is first converted to the HSV color space, the color histogram of its H component is computed, and the largest bin is taken as the dominant color feature of the image.
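The dominant color feature described above can be illustrated with a minimal sketch. It is not the patent's implementation: it assumes the image is already available as a list of 8-bit RGB pixel tuples, uses Python's standard `colorsys` module for the HSV conversion, and defaults to 60-degree hue bins, the quantization used in the embodiment. The function name `dominant_hue_bin` and its parameters are hypothetical.

```python
import colorsys
from collections import Counter


def dominant_hue_bin(pixels, bin_width_deg=60):
    """Return the index of the most populated hue bin.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255
        (an assumed input format, not specified by the patent).
    bin_width_deg: width of each hue bin in degrees; 60 gives 6 bins.
    """
    n_bins = 360 // bin_width_deg
    counts = Counter()
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1] and returns hue in [0, 1).
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Scale the hue to degrees and map it to a bin index.
        bin_index = int(h * 360) // bin_width_deg % n_bins
        counts[bin_index] += 1
    # most_common(1) yields [(bin_index, count)] for the largest bin.
    return counts.most_common(1)[0][0]
```

Note that `colorsys` assigns hue 0 to gray pixels, so a real system would probably need to decide how to treat unsaturated regions; the patent text does not address this.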
Further, the classification model uses a Bag of Words model based on SIFT features for training and for online classification: SIFT features are first extracted from the training set images and K-means clustering forms K visual words; the visual words of each bill in the training set are then counted to form the bag-of-words feature, which is fed to the SVM classifier for training.
Further, online bill classification uses a hierarchical classification strategy: a first classification is made with the color feature, and if it succeeds the result is output directly; if it fails, the color feature of the class predicted by the SVM classifier is compared with the color feature of the image to obtain the final classification result.
The offline bill classifier training module proceeds as follows:
(1) Convert the image to the HSV color space.
(2) Generate the H-component histogram for each class of training sample.
(3) Extract the highest bin of each class's histogram as the dominant color feature of that class and save it to file.
(4) Convert the image to grayscale.
(5) Extract SIFT features from each class of training sample and generate SIFT feature descriptors.
(6) Apply K-means clustering to obtain K cluster centers as the bag of words.
(7) Count the visual words of the training samples of each class in turn to obtain the visual-word feature vectors.
(8) Feed the feature vectors to the SVM classifier for training.
(9) Save the trained classifier model file.
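The visual-word counting step in the list above can be sketched as follows. This is an illustration under assumptions, not the patent's code: descriptors and cluster centers are plain sequences of numbers (real SIFT descriptors would have 128 dimensions), nearest-center assignment uses squared Euclidean distance, and the names `nearest_word` and `bow_histogram` are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to descriptor (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(descriptors, codebook):
    """Count how many descriptors fall on each visual word.

    descriptors: feature descriptors extracted from one image or one class.
    codebook: the K cluster centers obtained from K-means.
    Returns a list of K raw counts, the visual-word feature vector.
    """
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    return counts
```

The returned counts are unnormalized. Whether to normalize them before SVM training is a design choice the patent text leaves open.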
The online bill classification prediction module proceeds as follows:
(1) Initialize the bill classification system and load the model parameter file and the color feature file.
(2) Convert the image to be classified to the HSV color space.
(3) Generate the color histogram of the image's H component.
(4) Extract the dominant color feature.
(5) Determine whether the image's dominant color feature exists in the color feature file and is unique there. If so, output the classification prediction directly; if not, go to step (6).
(6) Convert the image to grayscale.
(7) Extract SIFT features from the image to be classified and generate SIFT feature descriptors.
(8) Count the visual words of the image to obtain the visual-word feature vector.
(9) Have the SVM predict the class from the visual-word feature vector.
(10) Perform a secondary check on the classification result: determine whether the dominant color feature of the predicted class is consistent with the dominant color feature obtained in step (4). If so, output the classification result; if not, classification fails.
Compared with the prior art, the beneficial effects are as follows: with only a small number of bill training samples, the invention automatically extracts color features and bag-of-words features without hand-designing features for each class of bill; it applies to a wide range of bills, and classification is both accurate and fast.
Description of the drawings
Fig. 1 shows the algorithm framework of the present invention.
Fig. 2 shows the decision process of online classification prediction.
Detailed description of the embodiments
The drawings are for illustration only and shall not be construed as limiting this patent. To better illustrate the embodiment, some components in the drawings may be omitted, enlarged or reduced, and do not represent the dimensions of an actual product. Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings. The positional relationships depicted in the drawings are for illustration only and shall not be construed as limiting this patent.
As shown in Fig. 1, the program is divided into two modules: offline bill classification model training and fast online bill classification. The specific steps of offline classification model training are:
(1) Collect images of each class of bill. Because the method needs very few training samples, only a few high-quality images of each class are selected.
(2) Convert every image to the HSV color space.
(3) Because the hue component best reflects how people directly perceive color, quantize only the hue component H: divide it into 6 intervals of 60 degrees each and generate the color histogram of the H component accordingly.
(4) Extract the color of the highest interval as the dominant color feature of the bill and save it to file.
(5) Convert the sample images to grayscale.
(6) Extract SIFT feature points from every image in the training samples, generate a 128-dimensional feature descriptor for each feature point, and save the descriptors.
(7) Apply K-means clustering to all the descriptor vectors obtained in step (6), with the number of clusters set to 1000, so that 1000 visual words are obtained when clustering completes.
(8) Count the visual-word frequencies for each class of bill to generate a visual-word feature vector for each class.
(9) Feed the features obtained in step (8) to the SVM classifier as input for training, obtain the SVM classification model, and save the model parameters to file.
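The clustering step that produces the visual words can be illustrated with a plain Lloyd's K-means sketch. It is a teaching aid under stated assumptions, not the patent's implementation: centroids are seeded with the first k points to keep the run deterministic, the iteration count is fixed, and the function name `kmeans` is hypothetical. With 1000 clusters over 128-dimensional descriptors, as in the embodiment, a library implementation would be the practical choice.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means; returns k centroids as lists of floats.

    points: sequence of equal-length numeric tuples (the descriptors).
    k: number of clusters, i.e. the number of visual words.
    Centroids are seeded with the first k points, a simplification chosen
    to keep this sketch deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids
```

The resulting centroids play the role of the codebook: each new descriptor is later mapped to its nearest centroid when the visual-word frequencies are counted.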
The classification decision process of the fast online bill classification module is shown in Fig. 2. The specific steps are:
(1) Initialize the system: load the trained classifier model, read the color feature file, and store its contents in the color_pool array.
(2) Convert the image to be classified to the HSV color space.
(3) Generate the color histogram.
(4) Extract the dominant color component.
(5) Check whether the image's dominant color exists in the color_pool array and is unique there. If so, exactly one bill class has this color, so a bill with this color can be judged to belong to that class and the classification result is output directly; if not, proceed to the next step.
(6) Convert the image to grayscale.
(7) Extract SIFT feature points from the image to be classified and generate a 128-dimensional feature descriptor for each feature point.
(8) Count the visual words of the image to obtain the visual-word feature vector.
(9) Use the trained SVM to predict the class from the visual-word feature vector, obtaining a preliminary classification result.
(10) Perform a secondary check on the classification result: determine whether the dominant color of the predicted class is consistent with the dominant color obtained in step (4). If consistent, output the classification result; if not, output a classification failure. Verifying with both the bag-of-words feature and the color feature greatly improves classification accuracy.
Obviously, the above embodiments are merely examples given to illustrate the present invention clearly and do not limit its embodiments. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all embodiments exhaustively. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (5)
1. A bill image classification method based on color features and bag-of-words features, characterized in that it comprises two parts: offline training of the bill classifier and fast online classification of bills;
the offline training part is divided into color feature extraction and bag-of-words model training: the color feature module first converts the pictures in the training set to the HSV color space, quantizes the H component to generate a color histogram, records the dominant color of each class of bill, and saves it to hard disk; the bag-of-words model training extracts SIFT features from the bills in the training set, applies K-means clustering to obtain K cluster centers, quantizes the features to generate the bag of words, counts the visual-word frequencies of the training samples of each class of bill to generate that class's visual-word histogram, derives the corresponding feature vector from the histogram, uses it as input to train the SVM classifier, and finally saves the trained model parameter file to hard disk;
the fast online classification part first loads the trained bill classifier model and the color classification parameter file, converts the image to the HSV color space, generates its color histogram and extracts its dominant color feature, and uses this feature to determine whether the bill's dominant color exists among the known bill classes and is unique; if so, the classification result is output directly, and if not, the bag-of-words classification process is entered; the bag-of-words classification process first extracts the SIFT features of the image and generates its visual words, then feeds the resulting feature vector to the SVM to obtain a classification result, after which a secondary check based on the color feature is applied to this result, namely the dominant color feature associated with the predicted class is looked up and compared with the dominant color feature obtained earlier; if they are the same, the classification result is output, and if they differ, the classification result is considered wrong and classification fails.
2. The bill image classification method based on color features and bag-of-words features according to claim 1, characterized in that only a few bill samples per class are needed to train a bill classification model with excellent performance, without collecting a large number of training samples.
3. The bill image classification method based on color features and bag-of-words features according to claim 1, characterized in that the classification model uses the dominant color of the bill as a classification feature, namely the image is first converted to the HSV color space, the color histogram of its H component is computed, and the largest bin is taken as the dominant color feature of the image.
4. The bill image classification method based on color features and bag-of-words features according to claim 1, characterized in that the classification model uses a Bag of Words model based on SIFT features for training and for online classification, namely SIFT features are first extracted from the training set images and K-means clustering forms K visual words, the visual words of each bill in the training set are counted to form the bag-of-words feature, and this feature is fed to the SVM classifier for training.
5. The bill image classification method based on color features and bag-of-words features according to claim 1, characterized in that online bill classification uses a hierarchical classification strategy, namely a first classification is made with the color feature and, if it succeeds, the result is output directly; if it fails, the color feature of the class predicted by the SVM classifier is compared with the color feature of the image to obtain the final classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810434070.6A CN108764302B (en) | 2018-05-08 | 2018-05-08 | Bill image classification method based on color features and bag-of-words features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810434070.6A CN108764302B (en) | 2018-05-08 | 2018-05-08 | Bill image classification method based on color features and bag-of-words features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108764302A true CN108764302A (en) | 2018-11-06 |
CN108764302B CN108764302B (en) | 2021-09-28 |
Family
ID=64009283
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810434070.6A Active CN108764302B (en) | 2018-05-08 | 2018-05-08 | Bill image classification method based on color features and bag-of-words features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108764302B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670513A (en) * | 2018-11-27 | 2019-04-23 | 西安交通大学 | A kind of piston attitude detecting method based on bag of words and support vector machines |
CN111160373A (en) * | 2019-12-30 | 2020-05-15 | 重庆邮电大学 | Method for extracting, detecting and classifying defect image features of variable speed drum parts |
CN111652309A (en) * | 2020-05-29 | 2020-09-11 | 刘秀萍 | Visual word and phrase co-driven bag-of-words model picture classification method |
CN112613563A (en) * | 2020-12-25 | 2021-04-06 | 福建福清核电有限公司 | Nuclear power field equipment image classification method based on OpenCV |
CN112907534A (en) * | 2021-02-18 | 2021-06-04 | 哈尔滨市科佳通用机电股份有限公司 | Fault detection method and device based on door closing part position image |
CN112966715A (en) * | 2021-02-02 | 2021-06-15 | 哈尔滨商业大学 | Commodity image feature description method based on multi-scale visual word bag model |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102622607A (en) * | 2012-02-24 | 2012-08-01 | 河海大学 | Remote sensing image classification method based on multi-feature fusion |
CN106022364A (en) * | 2016-05-13 | 2016-10-12 | 邓昌顺 | Novel note classifying method |
CN106203448A (en) * | 2016-07-08 | 2016-12-07 | 南京信息工程大学 | A kind of scene classification method based on Nonlinear Scale Space Theory |
Non-Patent Citations (5)
Title |
---|
LEI WANG ET AL.: "Area Determination of Diabetic Foot Ulcer Images Using a Cascaded Two-Stage SVM-Based Classification", 《IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING》 * |
ZHUOJIA LIANG ET AL.: "Salient object detection based on regions", 《MULTIMED TOOLS APPL》 * |
殷绪成 等: "层次型金融票据图像分类方法", 《中文信息学报》 * |
王亚如: "基于HSV颜色空间和SIFT特征的近似图像检索", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
邓亚芳: "基于决策树的瓷砖图像分类方法研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670513A (en) * | 2018-11-27 | 2019-04-23 | 西安交通大学 | A kind of piston attitude detecting method based on bag of words and support vector machines |
CN111160373A (en) * | 2019-12-30 | 2020-05-15 | 重庆邮电大学 | Method for extracting, detecting and classifying defect image features of variable speed drum parts |
CN111652309A (en) * | 2020-05-29 | 2020-09-11 | 刘秀萍 | Visual word and phrase co-driven bag-of-words model picture classification method |
CN112613563A (en) * | 2020-12-25 | 2021-04-06 | 福建福清核电有限公司 | Nuclear power field equipment image classification method based on OpenCV |
CN112966715A (en) * | 2021-02-02 | 2021-06-15 | 哈尔滨商业大学 | Commodity image feature description method based on multi-scale visual word bag model |
CN112966715B (en) * | 2021-02-02 | 2021-09-07 | 哈尔滨商业大学 | Commodity image feature description method based on multi-scale visual word bag model |
CN112907534A (en) * | 2021-02-18 | 2021-06-04 | 哈尔滨市科佳通用机电股份有限公司 | Fault detection method and device based on door closing part position image |
Also Published As
Publication number | Publication date |
---|---|
CN108764302B (en) | 2021-09-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109344736B (en) | Static image crowd counting method based on joint learning | |
CN108764302A (en) | A kind of bill images sorting technique based on color characteristic and bag of words feature | |
CN110598800A (en) | Garbage classification and identification method based on artificial intelligence | |
CN104268599B (en) | Intelligent unlicensed vehicle finding method based on vehicle track temporal-spatial characteristic analysis | |
CN102346847B (en) | License plate character recognizing method of support vector machine | |
CN108388927A (en) | Small sample polarization SAR terrain classification method based on the twin network of depth convolution | |
CN109952614A (en) | The categorizing system and method for biomone | |
CN108171184A (en) | Method for distinguishing is known based on Siamese networks again for pedestrian | |
CN105574550A (en) | Vehicle identification method and device | |
CN108197538A (en) | A kind of bayonet vehicle searching system and method based on local feature and deep learning | |
CN106610969A (en) | Multimodal information-based video content auditing system and method | |
CN105426903A (en) | Cloud determination method and system for remote sensing satellite images | |
CN109993201A (en) | A kind of image processing method, device and readable storage medium storing program for executing | |
Raj et al. | Helmet violation processing using deep learning | |
CN109410184B (en) | Live broadcast pornographic image detection method based on dense confrontation network semi-supervised learning | |
CN111046886A (en) | Automatic identification method, device and equipment for number plate and computer readable storage medium | |
CN113963147B (en) | Key information extraction method and system based on semantic segmentation | |
CN106156777A (en) | Textual image detection method and device | |
CN110008899B (en) | Method for extracting and classifying candidate targets of visible light remote sensing image | |
CN112990282B (en) | Classification method and device for fine-granularity small sample images | |
CN106203539A (en) | The method and apparatus identifying container number | |
CN114937179B (en) | Junk image classification method and device, electronic equipment and storage medium | |
Chandran et al. | Missing child identification system using deep learning and multiclass SVM | |
CN102129568A (en) | Method for detecting image-based spam email by utilizing improved gauss hybrid model classifier | |
CN106033443B (en) | A kind of expanding query method and device in vehicle retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||