CN108764302A

CN108764302A - A bill image classification method based on color features and bag-of-words features

Info

Publication number: CN108764302A
Application number: CN201810434070.6A
Authority: CN
Inventors: 李浚时; 李文军; 陈龙
Original assignee: National Sun Yat Sen University
Current assignee: Sun Yat Sen University; National Sun Yat Sen University
Priority date: 2018-05-08
Filing date: 2018-05-08
Publication date: 2018-11-06
Anticipated expiration: 2038-05-08
Also published as: CN108764302B

Abstract

The present invention relates to the field of image technology, and more particularly to a bill image classification method based on color features and bag-of-words features. The invention uses the classic Bag of Words approach from computer vision: SIFT features are first extracted from every bill in the training samples to generate 128-dimensional feature descriptors; K-means clustering is then applied to obtain K visual words; for each class of bill, the occurrences of the visual words are counted to form that class's visual-word histogram as a feature; finally, the color feature is incorporated to form the overall feature vector, which is fed into an SVM classifier for training to obtain the bill classifier model. Because the bag-of-words model does not use the color features of bill images, the method adds the global dominant color feature of the image to further improve the performance of the bill classifier. The invention needs only a very small number of training samples and can train the bill classifier model without manually designing additional features; the classifier is fast and accurate.

Description

A bill image classification method based on color features and bag-of-words features

Technical field

The present invention relates to the field of image technology, and more particularly to a bill image classification method based on color features and bag-of-words features.

Background art

In traditional bill management, bills are usually classified manually. Because the number of bills to be sorted is often huge, this takes a great deal of manpower and material resources. Automatic bill classification systems emerged in response, using machine vision to handle this simple, repetitive classification work. Current automatic bill classification systems must first collect many bill images of every class as training samples and manually design specific features for the bills, such as line segments, corner points, shapes, and textures; these features are then vectorized and fed into a classifier such as an SVM for training. Such a system needs a large number of training samples and considerable effort spent hand-designing bill features to guarantee the classification performance of the trained model, so it has certain limitations. In addition, the bills that existing classification systems support are mostly special financial invoices such as value-added tax invoices and general machine-printed invoices; they cannot classify the bills common in reimbursement systems, such as train tickets, high-speed rail tickets, taxi receipts, and plane tickets, and so lack generality.

Granted Chinese patent CN106096667 describes an SVM-based bill classification method. That method requires manual feature design for the bills in advance, such as official seal extraction and line detection. It applies only to a minority of bills and cannot classify the majority of bills that have no straight lines or seals, so it is too limited.

Chinese patent publication CN107633239 discloses a bill classification and bill field extraction method based on deep learning and OCR. That method must first obtain seal contour features and needs a large number of seal samples as training samples for deep learning. It is unsuitable for the majority of bills that have no seal, and it also requires collecting a large number of training samples.

Summary of the invention

To overcome at least one of the defects of the prior art described above, the present invention provides a bill image classification method based on color features and bag-of-words features. The method works with a very small training set and does not require features to be designed for the characteristics of each class of bill. Using only the global color feature of the image and a bag-of-words feature based on SIFT features, fed into an SVM classifier for training, it obtains a bill classifier with excellent performance. When predicting bill classes online, a multi-level classification strategy based on the color feature and the bag-of-words feature further improves classification performance, so that bill classification is completed quickly and accurately.

The technical solution of the present invention is a bill image classification method based on color features and bag-of-words features, comprising two parts: offline bill classifier training and online fast bill classification.

The offline bill classifier training part is divided into two parts: color feature extraction and bag-of-words model training. The color feature module first converts the images in the training set to the HSV color space, quantizes the H component to generate a color histogram, and records the dominant color of each class of bill and saves it to disk. Bag-of-words model training extracts SIFT features from the bills in the training set and applies K-means clustering to obtain K cluster centers, quantizes the features to generate the bag of words, counts visual-word frequencies over the training samples of each class to generate that class's visual-word histogram, and generates the corresponding feature vector from this histogram. The feature vector is used as input to train the SVM classifier, and finally the trained model parameter file is saved to disk.

The online fast bill classification part first loads the trained bill classifier model and the color classification parameter file. The image is converted to the HSV color space, its color histogram is generated, and its dominant color feature is extracted. This feature is used to determine whether the bill's dominant color exists among the known bill classes and is unique to one class. If so, the classification result is output directly; if not, the bag-of-words classification process begins. That process first extracts the SIFT features of the image and generates its visual-word feature vector, which is fed into the SVM for classification to obtain a classification result. This result then undergoes a second check against the color feature: the dominant color feature associated with the predicted class is looked up and compared with the dominant color feature obtained earlier. If they are the same, the classification result is output; if they differ, the classification result is considered wrong and classification fails.

Further, only a few bill samples per class are needed to train a bill classification model with excellent performance, without collecting a large number of training samples.

Further, the classification model uses the dominant color feature of a bill as a classification feature: the image is first converted to the HSV color space, the color histogram of its H component is computed, and the largest bin is extracted as the dominant color feature of the image.
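As an illustration only, the dominant color extraction described above could be sketched in Python as follows. The sketch assumes pixels are given as RGB triples in the range 0 to 255 and uses the standard library's colorsys module; the function names and the default of 6 bins (60 degrees each, as in the embodiment) are choices made for this example, not part of the claimed method.

```python
import colorsys


def hue_histogram(pixels, bins=6):
    """Count pixels per hue bin; `pixels` is an iterable of (r, g, b) in 0..255."""
    hist = [0] * bins
    for r, g, b in pixels:
        # colorsys returns hue as a fraction of a full turn, in [0, 1).
        h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hist[int(h * bins) % bins] += 1
    return hist


def dominant_color(pixels, bins=6):
    """Index of the fullest hue bin, used here as the dominant color feature."""
    hist = hue_histogram(pixels, bins)
    return max(range(bins), key=hist.__getitem__)
```

Note that colorsys assigns hue 0 to achromatic (gray) pixels, so they fall into the first bin in this sketch; the patent text does not say how such pixels are treated.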

Further, the classification model uses a Bag of Words model based on SIFT features for training and online classification: SIFT features are first extracted from the training set images and K-means clustering forms K visual words; the visual words of every bill in the training set are then counted to form the bag-of-words feature, which is fed into the SVM classifier for training.
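The counting step, in which each descriptor is assigned to its nearest visual word and the occurrences are tallied, might look like the following minimal Python sketch. It assumes the descriptors have already been extracted and that the K cluster centers are available as plain sequences of numbers; Euclidean distance and the names nearest_word and bow_histogram are assumptions made for illustration.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to `descriptor`."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))


def bow_histogram(descriptors, vocabulary):
    """Count how often each visual word occurs among the descriptors."""
    hist = [0] * len(vocabulary)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, vocabulary)] += 1
    return hist
```

The resulting list has one entry per visual word, so its length is K regardless of how many feature points an image produces, which is what allows it to serve as a fixed-length input to a classifier.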

Further, online bill classification uses a hierarchical classification strategy: a first classification is made using the color feature; if it succeeds, the result is output directly; if it fails, the prediction of the SVM classifier is compared against the color feature to obtain the final classification result.

The offline bill classifier training module proceeds as follows:

（1）Convert the image to the HSV color space.

（2）Generate the histogram of the H component for each class of training samples.

（3）Extract the highest bin in each class's histogram as the dominant color feature of that class, and save it to a file.

（4）Convert the image to grayscale.

（5）Extract SIFT feature points from each class of training samples and generate SIFT feature descriptors.

（6）Apply K-means clustering to obtain K cluster centers as the bag of words.

（7）Count the visual words of the training samples of each class in turn to obtain the visual-word feature vectors.

（8）Feed the feature vectors into the SVM classifier for training.

（9）Save the trained classifier model file.
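Step （6） above can be illustrated with a small pure-Python version of Lloyd's K-means algorithm. This is a sketch under stated assumptions (random initialization from the data points, a fixed number of iterations, Euclidean distance); the patent does not specify these details, and a real system clustering 128-dimensional SIFT descriptors would more likely rely on an optimized library implementation.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Return k cluster centers for `points` (a list of equal-length tuples)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(axis) / len(members) for axis in zip(*members)]
    return centers
```

The returned centers play the role of the visual words: every descriptor is later represented by the index of the center nearest to it.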

The online bill classification prediction module proceeds as follows:

（1）Initialize the bill classification system, loading the model parameter file and the color feature file.

（2）Convert the image to be classified to the HSV color space.

（3）Generate the color histogram of the image's H component.

（4）Extract the dominant color feature.

（5）Determine whether the dominant color feature of the image exists in the color feature file and is unique. If so, output the classification prediction directly; if not, go to step （6）.

（6）Convert the image to grayscale.

（7）Extract SIFT feature points from the image to be classified and generate SIFT feature descriptors.

（8）Count the visual words of the image to obtain its visual-word feature vector.

（9）The SVM predicts the class from the visual-word feature vector.

（10）Perform a secondary check on the classification result: determine whether the dominant color feature of the predicted class is consistent with the dominant color feature obtained in step （4）. If so, output the classification result; if not, classification fails.
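The decision flow of steps （5） to （10） can be sketched as below. The sketch assumes the class colors are held in a dictionary and that the grayscale, SIFT, bag-of-words, and SVM steps are wrapped in a caller-supplied function, here given the hypothetical name predict_with_bow, so that they run only when the color lookup is inconclusive.

```python
def classify_bill(color, class_colors, predict_with_bow):
    """Two-stage decision: color lookup first, then SVM prediction plus color check.

    color            -- dominant color of the image to classify
    class_colors     -- dict mapping class label -> dominant color from training
    predict_with_bow -- zero-argument callable (hypothetical) that runs the
                        bag-of-words/SVM branch and returns a class label
    Returns the class label, or None when classification fails.
    """
    matches = [label for label, c in class_colors.items() if c == color]
    if len(matches) == 1:
        # The color exists and belongs to exactly one class: output directly.
        return matches[0]
    predicted = predict_with_bow()
    # Secondary check: the predicted class must have the observed color.
    if class_colors.get(predicted) == color:
        return predicted
    return None
```

Passing the second stage as a callable keeps the comparatively expensive feature extraction out of the fast path, which matches the ordering of the steps above.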

Compared with the prior art, the beneficial effects are as follows: the present invention can automatically extract color features and bag-of-words features from a small number of bill training samples, without manually designing features for each class of bill; it applies to a wide range of bills, with high classification accuracy and fast classification speed.

Description of the drawings

Fig. 1 shows the algorithm framework of the present invention.

Fig. 2 shows the decision process for online classification prediction.

Detailed description of the embodiments

The drawings are for illustrative purposes only and shall not be construed as limiting this patent. To better illustrate the embodiment, some components in the drawings may be omitted, enlarged, or reduced, and do not represent the dimensions of an actual product. Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings. The positional relationships depicted in the drawings are for illustration only and shall not be construed as limiting this patent.

As shown in Fig. 1, the scheme is divided into two modules: offline bill classification model training and online fast bill classification. The specific steps of offline classification model training are:

（1）Collect images of each class of bill. Because the method needs very few training samples, only a few high-quality images of each class need to be selected.

（2）Convert every image to the HSV color space.

（3）Quantize the hue component H. Because hue best reflects a person's direct perception of color, only H is quantized: it is divided into 6 intervals of 60 degrees each, and the color histogram of the H component is generated from these intervals.

（4）Extract the color of the interval with the highest count as the dominant color feature of the bill, and save it to a file.

（5）Convert the sample image to grayscale.

（6）Extract SIFT feature points from every image in the training samples, generate a 128-dimensional feature descriptor for each feature point, and save them.

（7）Apply K-Means clustering to all the feature descriptor vectors obtained in step （6）. The number of clusters is set to 1000, so 1000 visual words are obtained once clustering completes.

（8）Count the visual-word frequencies for each class of bill, generating a visual-word feature vector for each class.

（9）Feed the features obtained in step （8） into the SVM classifier as input for training to obtain the SVM classification model, and save the model parameters to a file.
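For the color feature file, one possible realization is sketched below in Python. It assumes the per-image hue histograms of each class are summed before the fullest bin is taken, and that the table is stored as JSON; neither choice is stated in the patent, and the function names are hypothetical.

```python
import json


def class_color_table(histograms_by_class):
    """Map each class label to the index of its fullest hue bin.

    `histograms_by_class` maps a label to a list of per-image hue histograms.
    Assumption: histograms of the same class are summed bin by bin.
    """
    table = {}
    for label, histograms in histograms_by_class.items():
        summed = [sum(bin_counts) for bin_counts in zip(*histograms)]
        table[label] = max(range(len(summed)), key=summed.__getitem__)
    return table


def save_color_table(table, path):
    """Write the class-to-color table to `path` as JSON (assumed file format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(table, f)


def load_color_table(path):
    """Read back a table written by save_color_table."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

A table loaded this way can be consulted at classification time to check whether a dominant color belongs to exactly one class.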

The classification decision process of the online fast bill classification module is shown in Fig. 2. The specific steps are:

（1）Initialize the system: load the trained classifier model, and read the color feature file into the color_pool array.

（2）Convert the image to be classified to the HSV color space.

（3）Generate the color histogram.

（4）Extract the dominant color component.

（5）Check whether the dominant color of the image exists in the color_pool array and is unique. If so, the known bill classes include invoices of this color in exactly one class, so an invoice with this color can be judged to belong to that class and the classification result is output directly; if not, continue to the next step.

（6）Convert the image to grayscale.

（7）Extract SIFT feature points from the image, generate a 128-dimensional feature descriptor for each feature point, and save them.

（8）Count the visual words of the image to obtain its visual-word feature vector.

（9）Use the trained SVM to predict the class from the visual-word feature vector, obtaining a preliminary classification result.

（10）Perform a secondary check on the classification result: determine whether the dominant color of the predicted class is consistent with the dominant color obtained in step （4）. If consistent, output the classification result; if not, output a classification failure.

Verifying with both the bag-of-words feature and the color feature can greatly improve classification accuracy.

Obviously, the above embodiments are merely examples given to illustrate the present invention clearly and do not limit its implementations. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims

1. A bill image classification method based on color features and bag-of-words features, characterized by comprising two parts, offline bill classifier training and online fast bill classification,

2. The bill image classification method based on color features and bag-of-words features according to claim 1, characterized in that: only a few bill samples per class are needed to train a bill classification model with excellent performance, without collecting a large number of training samples.

3. The bill image classification method based on color features and bag-of-words features according to claim 1, characterized in that: the classification model uses the dominant color feature of a bill as a classification feature, that is, the image is first converted to the HSV color space, the color histogram of its H component is computed, and the largest bin is extracted as the dominant color feature of the image.

4. The bill image classification method based on color features and bag-of-words features according to claim 1, characterized in that: the classification model uses a Bag of Words model based on SIFT features for training and online classification, that is, SIFT features are first extracted from the training set images and K-means clustering forms K visual words, and the visual words of every bill in the training set are counted to form the bag-of-words feature, which is fed into the SVM classifier for training.

5. The bill image classification method based on color features and bag-of-words features according to claim 1, characterized in that: online bill classification uses a hierarchical classification strategy, that is, a first classification is made using the color feature; if it succeeds, the result is output directly; if it fails, the prediction of the SVM classifier is compared against the color feature to obtain the final classification result.