CN107195069A - Automatic recognition method for RMB crown word numbers - Google Patents

Publication number
CN107195069A
Authority
CN
China
Prior art keywords: character, RMB, word number, crown word, image
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710509012.0A
Other languages
Chinese (zh)
Inventor
尹建伟
赵景晨
岑超
邓水光
李莹
吴健
吴朝晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Application filed by Zhejiang University ZJU
Priority to CN201710509012.0A
Publication of CN107195069A
Legal status: Pending

Classifications

    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07D HANDLING OF COINS OR VALUABLE PAPERS, e.g. TESTING, SORTING BY DENOMINATIONS, COUNTING, DISPENSING, CHANGING OR DEPOSITING
    • G07D7/00 Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency
    • G07D7/20 Testing patterns thereon
    • G07D7/2008 Testing patterns thereon using pre-processing, e.g. de-blurring, averaging, normalisation or rotation
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07D HANDLING OF COINS OR VALUABLE PAPERS, e.g. TESTING, SORTING BY DENOMINATIONS, COUNTING, DISPENSING, CHANGING OR DEPOSITING
    • G07D7/00 Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency
    • G07D7/20 Testing patterns thereon
    • G07D7/2016 Testing patterns thereon using feature extraction, e.g. segmentation, edge detection or Hough-transformation

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses an automatic recognition method for RMB crown word numbers. Based on a thorough understanding of the layout of RMB banknote images, it designs and implements a set of algorithms that accurately identify the banknote edges, the banknote orientation and the crown word number region. Through efficient algorithm implementation and carefully designed processing steps, and by building on a mature OCR engine, the invention greatly increases the speed of crown word number recognition. Through statistical analysis of the OCR engines' recognition results and knowledge of the crown word number format, it combines the strengths of multiple engines and successfully applies a mature open-source OCR engine to the specific domain of crown word number recognition, maintaining recognition accuracy while providing high recognition speed.

Description

Automatic recognition method for RMB crown word numbers
Technical field
The invention belongs to the field of financial OCR (Optical Character Recognition) technology, and in particular relates to an automatic recognition method for RMB crown word numbers.
Background
OCR is the process of scanning text and characters with optical technology to obtain their image information, analyzing the morphological features of the text with various pattern recognition algorithms, and obtaining the text and its layout information. As OCR technology has matured, it has been applied in many fields, such as identity document recognition, license plate recognition, bill recognition, bank card recognition and document recognition, and mature OCR products exist in industries including banking, insurance, finance, logistics, auditing, taxation, customs, public security and border inspection. The application of OCR reduces equipment requirements, lowers labor costs and improves working efficiency.
With the development of China's economy, the monitoring and management of RMB faces increasing pressure, and the key to that monitoring and management is the management of crown word numbers. The crown word number (serial number) is a symbol used to count the quantity of banknotes printed and to mark the uniqueness of each banknote; it consists of two parts, a prefix (the crown word) and a number. RMB generally follows a "one note, one number" rule, so the crown word number is the unique identity of each banknote, in effect its identity card. Using this uniqueness, the crown word numbers of banknotes can be collected and recorded when an ATM dispenses them, so that they can later be queried and counted. When a counterfeit note dispute arises, checking the crown word numbers recorded at the time of the transaction on the management platform is enough to determine whether the counterfeit note came from the ATM. With the wide use of OCR, recognizing and tracking RMB crown word numbers has become an important means of preventing economic crime in the financial sector.
According to the requirements of the central bank's operations management department, bank ATMs and cash recycling systems must implement an RMB crown word number recording function. The crown word numbers of 100-yuan RMB notes paid out by bank ATMs, cash deposit and withdrawal recycling machines, and financial institution counters must all be queryable.
In this context, an efficient and highly accurate RMB crown word number recognition method is particularly important. Although the accuracy of traditional crown word number recognition methods is acceptable, their recognition speed is slow, and they do not handle stains and similar-looking characters well.
Summary of the invention
In view of the above, the invention provides an automatic recognition method for RMB crown word numbers that can accurately and efficiently recognize the crown word number from the RMB image captured by a currency detector.
An automatic recognition method for RMB crown word numbers comprises the following steps:
(1) Acquire a grayscale image of the RMB note, then perform banknote edge recognition on the grayscale image to obtain the quadrilateral contour of the note;
(2) Crop and correct the quadrilateral contour of the note based on a perspective transform, to obtain a rectangular image of the note;
(3) Identify and correct the orientation of the rectangular image based on its color distribution, to obtain a front-face, upright rectangular image; rectangular images of the reverse side are discarded directly;
(4) Perform crown word number region boundary detection on the front-face, upright rectangular image, to locate the crown word number region to be recognized;
(5) Preprocess the crown word number region image, including image binarization, character segmentation, histogram stretching and connected component analysis in turn, to obtain an image of each character of the crown word number;
(6) Train on RMB crown word number characters using the training tools provided by the open-source OCR engine Tesseract, to obtain a proprietary engine for recognizing crown word number characters;
(7) Recognize each character image obtained in step (5) using both the proprietary engine and the primary (native) engine for Latin characters that ships with Tesseract, to obtain a candidate character list for each character image and the confidence of each candidate character;
(8) According to the combination rules of the crown word number and the confidences, choose one character from the candidate character list of each character image as its recognition result.
Further, banknote edge recognition is performed on the grayscale image of the RMB note in step (1) as follows:
1.1 Apply median filtering to the grayscale image of the RMB note;
1.2 Apply dilation to the filtered grayscale image;
1.3 Binarize the dilated grayscale image;
1.4 Perform contour recognition on the binary image using the Suzuki85 algorithm [Suzuki, S. and Abe, K., Topological Structural Analysis of Digitized Binary Images by Border Following. CVGIP 30(1), pp. 32-46 (1985)], and obtain the outer contour with the largest area in the image;
1.5 Approximate this outer contour with a quadrilateral using the Douglas-Peucker algorithm, or detect its minimum bounding rectangle, to obtain the quadrilateral contour of the RMB note.
Further, crown word number region boundary detection is performed on the RMB rectangular image in step (4) as follows:
4.1 Resample the RMB rectangular image to a size of 465 × 231 to obtain its thumbnail;
4.2 Select the pixel at column 5, row 168 of the thumbnail as the top-left vertex of the crown word number region, and with this vertex as reference select a region of height 33 and width 116 as the ROI (region of interest);
4.3 Apply histogram stretching and binarization to the selected ROI;
4.4 Based on preset row and column black/white pixel ratio thresholds, perform two rounds of detection to approach the top, bottom, left and right boundaries of the crown word number in the binarized ROI, and remove the white border to obtain the final crown word number region to be recognized.
Further, step (7) is implemented as follows:
7.1 For any character image, recognize it with the proprietary engine to obtain a candidate character list and the confidence of each candidate character;
7.2 Compare the confidences of the candidate characters; if the candidate with the highest confidence is B, Z, 0, 4, 8 or G, go to step 7.3; otherwise, take this candidate character list and these confidences directly as the final output;
7.3 Recognize the character image with the primary engine to obtain its candidate character list and the confidence of each candidate character;
7.4 Take the candidate character list obtained by the proprietary engine as the final output. The confidence of each candidate is determined as follows: for a character that appears in the candidate lists of both engines, the final output is the weighted combination of the confidences computed by the two engines; for a character that appears only in the candidate list of the proprietary engine, the confidence computed by the proprietary engine is the final output.
In step 7.4, the weight coefficients of the two engines for the confidences of different characters are determined by logistic regression.
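Steps 7.1 to 7.4 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the engines are represented by plain dictionaries and a callable, and the linear form `w_p * conf_p + w_n * conf_n` with per-character weights is an assumption, since the text only says the two confidences are weighted. The names `merge_candidates`, `CONFUSABLE` and `weights` are hypothetical.

```python
# Characters that trigger a second opinion from the primary engine (step 7.2).
CONFUSABLE = {"B", "Z", "0", "4", "8", "G"}


def merge_candidates(proprietary, primary_engine, weights):
    """Combine candidate confidences from two engines for one character image.

    proprietary:    dict mapping candidate character -> confidence (step 7.1)
    primary_engine: zero-argument callable returning the same kind of dict;
                    it is only called when step 7.2 requires it
    weights:        dict mapping character -> (w_proprietary, w_primary);
                    hypothetical per-character weights, e.g. fitted offline
    """
    best = max(proprietary, key=proprietary.get)
    if best not in CONFUSABLE:
        # Step 7.2: the proprietary result is output unchanged.
        return dict(proprietary)

    primary = primary_engine()  # step 7.3
    merged = {}
    for char, conf in proprietary.items():  # step 7.4
        if char in primary:
            w_p, w_n = weights.get(char, (0.5, 0.5))
            merged[char] = w_p * conf + w_n * primary[char]
        else:
            merged[char] = conf
    return merged
```

Characters proposed only by the primary engine are dropped, because step 7.4 takes the proprietary engine's candidate list as the output list.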
Further, step (8) is implemented as follows:
8.1 For the 1st character image of the crown word number, select the letter with the highest confidence from its candidate character list as the recognition result;
8.2 For the last 6 character images of the crown word number, select the digit with the highest confidence from each candidate character list as the recognition result;
8.3 For the 2nd to 4th character images of the crown word number, traverse all character combinations drawn from their respective candidate character lists that contain 1 letter and 2 digits, and select the combination with the highest sum of character confidences as the recognition result.
Preferably, for a character image whose recognition result in step (8) is T or J with a confidence below 85%, the lower 70% of the character image is cropped and recognized with the primary engine: if the result is J, the character image is judged to be J; if the result is I or 1, the character image is judged to be T.
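The selection rules 8.1 to 8.3 amount to a small constrained search, sketched below under stated assumptions: each of the 10 character positions is given as a dictionary from candidate character to confidence, and the function names `best_of` and `decode_serial` are hypothetical. The T/J re-check described above is not included.

```python
from itertools import product


def best_of(candidates, predicate):
    """Return the highest-confidence candidate satisfying predicate, or None."""
    valid = {ch: conf for ch, conf in candidates.items() if predicate(ch)}
    return max(valid, key=valid.get) if valid else None


def decode_serial(candidates):
    """candidates: list of 10 dicts, one per character position."""
    if len(candidates) != 10:
        raise ValueError("expected 10 character positions")

    first = best_of(candidates[0], str.isalpha)                # rule 8.1
    tail = [best_of(c, str.isdigit) for c in candidates[4:]]   # rule 8.2

    # Rule 8.3: positions 2-4 must contain exactly 1 letter and 2 digits.
    best_combo, best_score = None, float("-inf")
    for combo in product(*(c.items() for c in candidates[1:4])):
        chars = [ch for ch, _ in combo]
        letters = sum(ch.isalpha() for ch in chars)
        digits = sum(ch.isdigit() for ch in chars)
        if letters == 1 and digits == 2:
            score = sum(conf for _, conf in combo)
            if score > best_score:
                best_combo, best_score = chars, score

    if first is None or best_combo is None or None in tail:
        return None  # no candidate combination satisfies the format
    return first + "".join(best_combo) + "".join(tail)
```

Enumerating all combinations is cheap here because only three positions are involved and each candidate list is short.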
The beneficial technical effects of the invention are as follows:
(1) The combination of image preprocessing methods and the Suzuki85 contour recognition algorithm introduced in the invention can extract the banknote body from the banknote image captured by the currency detector and correct image distortion.
(2) The crown word number region boundary detection method introduced in the invention, based on pixel feature rules, can extract the crown word number region from the banknote image quickly and effectively. The invention also combines a series of efficient image preprocessing methods to clean up the crown word number region and remove, as far as possible, the influence on recognition of noise such as stains and light leakage in the device.
(3) The invention uses a mature open-source OCR engine and adapts it to the RMB crown word number recognition scenario through training. Even though the quality and quantity of training data are limited, weighting the recognition results of multiple engines together with a reasonable and effective error correction scheme raises recognition accuracy to more than 99%.
(4) While ensuring recognition accuracy, the invention on the one hand uses efficient algorithms and an optimized implementation, and on the other hand simplifies the image processing steps, using thumbnails and coarse positioning to reduce computation. Tested on a Raspberry Pi 3 Model B (1.2 GHz quad-core ARMv8 CPU), recognition speed exceeds 1000 notes per minute.
Brief description of the drawings
Fig. 1 is a schematic diagram of a system implementing the method of the invention.
Fig. 2 is a flow diagram of banknote edge recognition and perspective transform in the invention.
Fig. 3 is a flow diagram of crown word number region boundary detection and dual-engine recognition in the invention.
Detailed description
To describe the invention more specifically, its technical solution is described in detail below with reference to the accompanying drawings and an embodiment.
The implementation of the RMB crown word number automatic recognition method of the invention is shown in Fig. 1 and comprises the following:
(1) RMB banknote edge recognition.
After the grayscale image is read from the imaging module of the currency detector, median filtering, dilation and OTSU binarization are applied to it in turn. A contour recognition algorithm is then run to obtain the banknote contours, and the largest contour, approximated by a quadrilateral, is taken as the final banknote contour, as shown in Fig. 2:
1.1 Median filtering. Median filtering is applied to the grayscale image that was read, to reduce the influence on contour recognition of noise introduced by imaging defects. The median filter is an order-statistic filter; it replaces the value of a pixel with the median of the gray levels in that pixel's neighborhood, i.e.:
$$f(x, y) = \underset{(s, t) \in S}{\operatorname{median}} \{ g(s, t) \}$$
where (x, y) is the coordinate of the current pixel, S is its neighborhood, g is the input image and f is the filtered image. In this embodiment, the neighborhood is 5 × 5.
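As a minimal sketch of the median filter described above, the following pure-Python function works on an image stored as a list of rows. The border handling (replicating edge pixels) is an assumption; the text does not say how borders are treated, and the function name `median_filter` is hypothetical.

```python
from statistics import median_low


def median_filter(img, k=5):
    """Median-filter a grayscale image given as a list of rows of ints.

    k is the (odd) neighborhood size. Pixels outside the image are taken
    from the nearest edge pixel, which is an assumption of this sketch.
    """
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            # k * k is odd, so median_low is simply the middle element.
            out[y][x] = median_low(window)
    return out
```

A single bright outlier in a flat region is removed entirely, which is why the filter suits impulse-like imaging noise.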
1.2 Dilation. Dilation is applied to the filtered image to reduce the detail in the picture while retaining the principal features and the continuity of the banknote contour. Dilation is an important morphological image processing method; for sets A and B in the two-dimensional integer space $Z^2$, the dilation of A by B, written $A \oplus B$, is defined as:
$$A \oplus B = \{ z \mid (\hat{B})_z \cap A \neq \emptyset \}$$
where $\hat{B}$ is the reflection of set B, B is a structuring element (kernel), and A is the set being dilated. The formula is based on reflecting B about its origin and translating the reflection by z. In this embodiment, B is a 9 × 9 rectangular kernel.
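For a flat rectangular kernel, dilation of a grayscale image reduces to taking the maximum over each pixel's neighborhood, and a symmetric kernel equals its own reflection. The sketch below illustrates that special case only; clipping the window at the image border is an assumption, and the name `dilate` is hypothetical.

```python
def dilate(img, k=9):
    """Dilate a grayscale image (list of rows) with a flat k x k kernel.

    Each output pixel is the maximum of the input pixels under the kernel.
    The window is clipped at the image border (assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    r = k // 2
    return [
        [
            max(
                img[yy][xx]
                for yy in range(max(y - r, 0), min(y + r + 1, h))
                for xx in range(max(x - r, 0), min(x + r + 1, w))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]
```

Bright regions grow by half the kernel size in every direction, which closes small gaps along the banknote outline before thresholding.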
1.3 Binarization. The dilated image is processed with the optimal global threshold given by the maximum inter-class variance (OTSU) algorithm, producing a binary image containing only black and white, which is the input to the contour detection algorithm. The algorithm uses the idea of clustering: it assumes the image contains two classes of pixels following a bimodal histogram (foreground pixels and background pixels) and computes the optimal threshold separating the two classes, so that their intra-class variance is minimal or, equivalently, their inter-class variance is maximal. The specific steps are as follows:
1.3.1 Compute the normalized histogram of the input image;
1.3.2 Traverse all possible thresholds t = 1...255 and compute the inter-class variance $\sigma_b^2(t) = \omega_1(t)\,\omega_2(t)\,[\mu_1(t) - \mu_2(t)]^2$, where $\omega_i$ is the class probability and $\mu_i$ is the class mean, both computed from the histogram;
1.3.3 The threshold $t_{opt}$ at which the inter-class variance is maximal is the optimal threshold;
1.3.4 With $t_{opt}$ as the global threshold, set pixels with gray value less than $t_{opt}$ to the minimum gray value 0 and pixels with gray value greater than $t_{opt}$ to the maximum gray value 255, which binarizes the image.
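Steps 1.3.1 to 1.3.4 can be sketched in a few lines of pure Python. In this sketch a pixel equal to the threshold is set to 255, which is an assumption because the text only covers values below and above the threshold; the names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the threshold t in 1..255 maximizing the inter-class variance.

    pixels: iterable of ints in 0..255. Class 1 holds values < t,
    class 2 holds values >= t.
    """
    hist = [0] * 256
    n = 0
    for p in pixels:
        hist[p] += 1
        n += 1
    total_sum = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    count1 = 0  # number of pixels with value < t
    sum1 = 0    # sum of those pixel values
    for t in range(1, 256):
        count1 += hist[t - 1]
        sum1 += (t - 1) * hist[t - 1]
        count2 = n - count1
        if count1 == 0 or count2 == 0:
            continue  # one class is empty, variance undefined
        mu1 = sum1 / count1
        mu2 = (total_sum - sum1) / count2
        var = (count1 / n) * (count2 / n) * (mu1 - mu2) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(img, t):
    """Set pixels below t to 0 and all others to 255."""
    return [[0 if p < t else 255 for p in row] for row in img]
```

The running sums make the search a single pass over the 255 thresholds instead of recomputing both class statistics for every t.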
1.4 Edge detection. Edge detection is performed on the binarized banknote image as follows:
1.4.1 Using the outermost-contour extraction method of the Suzuki85 algorithm, find all outermost contours in the binary image;
1.4.2 For each outermost contour found, compute the area inside the contour using Green's theorem: if a closed region D is bounded by a piecewise smooth simple curve L, and the functions P(x, y) and Q(x, y) have continuous first-order partial derivatives on D, then:
$$\iint_D \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\,dy = \oint_L P\,dx + Q\,dy$$
where L is the boundary curve of D taken in the positive direction. Choosing P = -y and Q = x makes the integrand on the left equal to 2, so the area of D is $\frac{1}{2} \oint_L x\,dy - y\,dx$;
1.4.3 Take the contour with the largest interior area and approximate it with a polygon using the Douglas-Peucker algorithm, with the approximation tolerance set to 1% of the contour length;
1.4.4 If the approximation is a quadrilateral, the four vertices of the banknote are obtained directly; if it is not, compute the minimum bounding rectangle of the contour and take the four contour points nearest to the vertices of that rectangle as the four vertices of the banknote.
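Two pieces of step 1.4 are easy to illustrate in isolation: the contour area, which for a polygonal contour reduces Green's theorem to the shoelace formula, and Douglas-Peucker simplification. The sketch below handles an open polyline only; applying it to a closed contour requires first splitting the contour at two points, which is left out here. The function names are hypothetical.

```python
import math


def polygon_area(points):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def _distance_to_chord(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)


def douglas_peucker(points, eps):
    """Simplify an open polyline, dropping points closer than eps to the chord."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_chord(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], eps)
    right = douglas_peucker(points[index:], eps)
    return left[:-1] + right  # the split point appears in both halves
```

With the tolerance set to 1% of the contour length, as in step 1.4.3, small ripples along a banknote edge collapse into a single straight side.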
In this embodiment, because the banknote is tilted and distorted when photographed, it does not appear in the picture as a rectangle but as an irregular quadrilateral. The detected banknote edge is therefore represented by the four vertices of a quadrilateral, and these four vertices serve as the calibration points of the perspective transform in the next step.
(2) Image cropping and correction based on perspective transform.
Because the tilted and distorted banknote appears in the picture as an irregular quadrilateral, which affects both the positioning of the crown word number region and character recognition, the banknote image should be cropped and corrected according to the edge detection result before the crown word number region is detected.
A perspective transform is applied to the image according to the 4 banknote vertices obtained in step 1.4.4. A perspective transform uses the condition that the perspective center, the image point and the target point are collinear to project the picture onto a new view plane. Its general transformation formula is:
$$[x', y', w'] = [u, v, w] \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
where (u, v) are the coordinates in the original image (with w = 1), and the image coordinates (x, y) after the transform are:
$$x = \frac{x'}{w'} = \frac{a_{11} u + a_{21} v + a_{31}}{a_{13} u + a_{23} v + a_{33}}, \qquad y = \frac{y'}{w'} = \frac{a_{12} u + a_{22} v + a_{32}}{a_{13} u + a_{23} v + a_{33}}$$
The longer of the quadrilateral's top and bottom sides is taken as the width of the new image and the longer of its left and right sides as the height; the banknote vertices obtained in step 1.4.4 are mapped in order to the four vertices of the new image, and the transformation matrix is built from these correspondences. The corrected RMB image obtained after the perspective transform is the input to the subsequent steps.
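The transformation can be determined from the four vertex correspondences by solving an 8 × 8 linear system, as in the sketch below. It fixes the last matrix coefficient to 1 and names the remaining eight coefficients a to h; this parameterization, the Gauss-Jordan solver and all function names are choices of this illustration, not details taken from the patent.

```python
import math


def solve_linear(a, b):
    """Solve a square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (degenerate quadrilateral?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def perspective_coeffs(src, dst):
    """Coefficients (a..h) mapping each src point (u, v) to its dst point (x, y).

    x = (a*u + b*v + c) / (g*u + h*v + 1)
    y = (d*u + e*v + f) / (g*u + h*v + 1)
    """
    rows, rhs = [], []
    for (u, v), (x, y) in zip(src, dst):
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    return solve_linear(rows, rhs)


def warp_point(coeffs, u, v):
    """Apply the perspective mapping to a single point."""
    a, b, c, d, e, f, g, h = coeffs
    w = g * u + h * v + 1
    return ((a * u + b * v + c) / w, (d * u + e * v + f) / w)


def target_size(tl, tr, br, bl):
    """Width and height of the corrected image from the quadrilateral sides."""
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return round(width), round(height)
```

To build the corrected image, one would typically solve for the inverse mapping (swap `src` and `dst`) and sample the original picture at the warped position of every output pixel.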
(3) RMB orientation recognition and correction.
When banknotes are fed into the currency detector, an RMB image may be in one of four poses: front upright, back upright, front upside down and back upside down. Crown word number recognition requires the front upright image. For each banknote the currency detector captures two images, front and back; the back image is simply discarded in this embodiment. For the front image, given the color characteristics of RMB banknotes, the left side of the front face is a white area and the color difference between the left and right sides is large, so the banknote orientation can be recognized with a method based on the spatial distribution statistics of gray levels, as follows:
3.1 Apply histogram stretching to the corrected banknote image so that its colors fill the whole gray-level range, avoiding color differences caused by variation in image quality between devices or batches. Specifically, gray levels in the interval [a, b] are mapped to the range [0, 255] with the transform function:
$$T(r) = \frac{255\,(r - a)}{b - a}$$
where r is the gray value before the transform and T(r) is the gray value after the transform;
3.2 Resample the image to obtain a thumbnail of size 310 × 154; computing on the thumbnail greatly reduces the amount of computation and increases processing speed;
3.3 Binarize the thumbnail using the middle gray value 127 as the threshold;
3.4 Crop 10 pixels from each of the four sides of the thumbnail, removing the interference of the white border on the right side with the color statistics;
3.5 Take the regions covering 1/3 of the width at the left and right edges and compute the average color of each; the side with the higher average is the left side;
3.6 Rotate the image according to the recognition result so that all banknotes are oriented front upright.
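Steps 3.1 and 3.3 to 3.6 can be sketched as below for an image that has already been resampled. Taking a and b as the minimum and maximum gray values of the image is an assumption, since the text does not say how the interval [a, b] is chosen; all function names are hypothetical.

```python
def stretch(img):
    """Linearly map the gray range [min, max] of img to [0, 255] (step 3.1)."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def bright_side_is_left(img, margin=10, threshold=127):
    """Compare the outer thirds of the binarized, cropped image (steps 3.3-3.5)."""
    cropped = [
        row[margin:len(row) - margin] for row in img[margin:len(img) - margin]
    ]
    binary = [[255 if p > threshold else 0 for p in row] for row in cropped]
    third = len(binary[0]) // 3
    # Both regions have the same area, so comparing sums compares averages.
    left = sum(sum(row[:third]) for row in binary)
    right = sum(sum(row[len(row) - third:]) for row in binary)
    return left >= right


def rotate_180(img):
    """Rotate an image stored as a list of rows by 180 degrees."""
    return [row[::-1] for row in reversed(img)]


def make_upright(img):
    """Return img rotated so that the bright side is on the left (step 3.6)."""
    return img if bright_side_is_left(stretch(img)) else rotate_180(img)
```

Because only two orientations remain once the reverse side has been discarded, a single left-versus-right comparison is enough to decide whether a 180 degree rotation is needed.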
(4) Crown word number region boundary detection.
Accurate matching of the crown word number region is the basis for recognizing the crown word number accurately. Based on knowledge of the RMB design, the crown word number region used for recognition is located at the lower-left corner of the image. Because this region is small, the image resolution is low, the region is easily disturbed by stains and noise, and the sample size is small, feature matching methods have difficulty achieving good results. This embodiment therefore detects the crown word number region using, in turn, resampling, coarse positioning, histogram stretching, binarization, pixel-level boundary detection and iterative approximation. The specific steps are shown in Fig. 3:
4.1 Resampling. The image is resampled as in step 3.2; to preserve image features as far as possible, the thumbnail size is set to 465 × 231, and all subsequent operations are performed on the thumbnail;
4.2 Coarse positioning. Based on empirical knowledge of the crown word number position and its proportion of the image size, and in order to preserve the feature differences between the crown word number region and other regions, the pixel at column 5, row 168 of the thumbnail is selected as the top-left vertex of the crown word number region, and with this vertex as reference a region of height 33 and width 116 is selected as the ROI; all subsequent operations are performed on this region. Coarse positioning has two main advantages: first, it reduces the area to be examined, which reduces interference from noise, stains and irrelevant content and reduces computation; second, it reduces the complexity of the rules, which improves the running efficiency of the algorithm.
4.3 Preprocessing. The following preprocessing steps are applied to the image in turn:
4.3.1 Apply histogram stretching to the thumbnail image to enhance the contrast of the region;
4.3.2 Process the histogram-stretched image with the optimal global threshold given by the maximum inter-class variance (OTSU) algorithm to obtain a binary image;
4.3.3 Filter the binary image with a specific minimum filter; the filter is given by:
This filter removes "isolated points" in the image; such points are usually noise and can interfere with the recognition of the crown word number region boundary.
4.4 Left and right boundary detection. After preprocessing, the left and right boundaries of the crown word number are detected first, as follows:
4.4.1 Project the binary image by columns to obtain the number of black and white pixels in each column;
4.4.2 Starting from the leftmost column, the first column that is not pure white and whose proportion of black pixels is below 50% is taken as the left boundary (rule 1); this rule excludes the blank area on the left and any black border introduced by cropping and correction errors;
4.4.3 Assuming the crown word number region is at least 85 pixels wide, scan from the rightmost column and stop when three consecutive all-white columns are encountered or when the remaining width would be insufficient; then continue leftward past all pure white columns and take the resulting position as the right boundary (rule 2); this rule excludes decorative patterns, noise and blank area on the right.
4.5 Top and bottom boundary detection. The image is cropped according to the detected left and right boundaries, and the top and bottom boundaries are then detected as follows:
4.5.1 Project the binary image by rows to obtain the number of black and white pixels in each row;
4.5.2 Starting from the top row, the first row whose proportion of white pixels is below 88% is taken as the top boundary (rule 3);
4.5.3 Assuming the crown word number region is at least 12 pixels high, start from the row 12 pixels below the top boundary and stop at the first row whose proportion of white pixels exceeds 88%; this row is taken as the bottom boundary (rule 4).
4.6 Second pass on the left and right boundaries. The image is cropped again according to the detected top and bottom boundaries, giving a preliminary result. On this basis the left and right boundaries are examined a second time to remove further noise, as follows:
4.6.1 Starting from the leftmost column, a column that contains black pixels but no more than 3 of them is regarded as a possible noise column; if the three columns after it are pure white, it is confirmed as a noise column and the left boundary is moved right accordingly; otherwise it is the edge of the crown word number region and detection stops (rule 5);
4.6.2 Starting from the rightmost column, a column that contains black pixels but no more than 3 of them is regarded as a possible noise column; if the three columns after it are pure white, it is confirmed as a noise column and the right boundary is moved left accordingly; otherwise it is the edge of the crown word number region and detection stops (rule 6).
4.7 Removing the white border. The image is cropped according to the new left and right boundaries and the white border is removed, giving the final position of the crown word number region.
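Rules 1, 3 and 4 are simple scans over row and column projections, as the sketch below shows for a binary image stored as a list of rows with 0 for black and 255 for white. Rules 2, 5 and 6 follow the same pattern and are omitted. Returning the bottom boundary as the index of the first mostly white row (exclusive bound) is a choice of this sketch, and the function names are hypothetical.

```python
BLACK, WHITE = 0, 255


def left_boundary(binary):
    """Rule 1: first column that has black pixels, but fewer than 50% of them."""
    height = len(binary)
    for x in range(len(binary[0])):
        black = sum(1 for y in range(height) if binary[y][x] == BLACK)
        if 0 < black < 0.5 * height:
            return x
    return None


def vertical_bounds(binary, white_ratio=0.88, min_height=12):
    """Rules 3 and 4: top row index and (exclusive) bottom row index."""
    width = len(binary[0])
    ratios = [sum(1 for p in row if p == WHITE) / width for row in binary]
    top = next((y for y, r in enumerate(ratios) if r < white_ratio), None)
    if top is None:
        return None
    bottom = next(
        (y for y in range(top + min_height, len(ratios)) if ratios[y] > white_ratio),
        len(ratios),
    )
    return top, bottom
```

Skipping `min_height` rows before looking for the bottom boundary prevents a thin white gap inside the characters from ending the region too early.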
(5) Crown word number image preprocessing.
After the crown word number region has been located, it is extracted from the original image according to the proportional relationship between the original image and the thumbnail, and image preprocessing such as image binarization, character segmentation, histogram stretching and connected component analysis is applied to obtain the final input images for OCR engine recognition. The specific steps are as follows:
5.1 Size normalization. According to the dimensions of RMB notes, all images are stretched to 2170 × 1078 using a ratio of 155:77; the ratio of the original image to the thumbnail is then 7:1. According to this ratio and the position of the crown word number region in the thumbnail, the crown word number region is cut out at the corresponding position of the original image. To compensate for errors caused by resampling, each border is extended outward by 10 pixels when cropping the original image.
5.2 Character segmentation. The characters of the crown word number in an RMB image vary in darkness, and dark streaks caused by the machine may be superimposed on the picture, so noise in one place may be darker than the characters in another. Applying a global image processing method to the whole crown word number region easily damages local character parts, whereas processing with a locally optimal threshold within a small range can remove noise and stains that are lighter than the nearby text. The crown word number characters are therefore segmented first, and histogram stretching is then applied to each segmented character image, which gives better results. The specific steps are as follows:
5.2.1 Apply OTSU binarization to the grayscale image of the whole crown word number region;
5.2.2 Project the binary image by columns and count the black pixels in each column; the counts show alternating peaks and valleys, and each valley is the blank space between characters.
5.2.3 Split the crown word number grayscale image into several parts at the blank spaces, and apply histogram stretching to each part individually.
5.3 Local histogram stretching and connected component analysis. Histogram stretching is applied to the image of each part individually. All crown word number characters are capital letters or digits, which means that each character has no broken parts and is fully connected. Connected component analysis can therefore be used to remove the discrete parts of each character image and reduce the influence of noise on recognition: connected component analysis is applied to each character image, all discrete connected components are removed, and only the connected component containing the most pixels, i.e. the character itself, is kept.
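The column-projection split of step 5.2 and the largest-component filter of step 5.3 can be sketched as follows for a binary image with 0 for black and 255 for white. Two simplifications are assumptions of this sketch: a gap between characters is taken to be a column with no black pixels at all, and components are formed with 8-connectivity. The function names are hypothetical.

```python
def split_columns(binary):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    height, width = len(binary), len(binary[0])
    has_ink = [any(binary[y][x] == 0 for y in range(height)) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink + [False]):  # sentinel closes the last span
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x))
            start = None
    return spans


def keep_largest_component(binary):
    """Keep only the largest 8-connected black component of a character image."""
    height, width = len(binary), len(binary[0])
    seen, best = set(), []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 0 or (y, x) in seen:
                continue
            stack, component = [(y, x)], []
            seen.add((y, x))
            while stack:  # iterative flood fill
                cy, cx = stack.pop()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 0 and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
            if len(component) > len(best):
                best = component
    out = [[255] * width for _ in range(height)]
    for y, x in best:
        out[y][x] = 0
    return out
```

In practice each span returned by `split_columns` would be cut from the grayscale image, stretched and binarized locally, and only then passed to `keep_largest_component`.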
(6) OCR engine training and crown word number character recognition.
OCR in present embodiment recognizes that the 3.x versions based on the OCR engine tesseract increased income are carried out, and utilizes The training tool that tesseract is provided, the application scenarios recognized for RMB crown word number, is carried out to tesseract engines Re -training, as shown in figure 3, training process is broadly divided into the following steps:
6.1 generation training datas;Select sufficient amount of RMB image pattern composition training set, it is ensured that each in crown word number Character occurs, and the frequency occurred is as close as possible;It regard result of the training set through image preprocessing in step (5) as sample Picture;For all samples, the correct recognition result of handmarking is used.
6.2 generation box files;The character occurred in all images of box file records and size and the position of character, Tesseract training tool, which can be recognized and automatically generated according to existing engine, to be needed after box files, generation by related work Tool, according to handmarking's result editing files and the character righted the wrong.
It 6.3 is trained;The automation tools collection provided by tesseract, is carried out according to samples pictures and box files Automation training, generates training data file, for final character recognition.
Using the trained data, the Tesseract engine performs character recognition on the pictures preprocessed in step (5) to obtain a preliminary recognition result: for each character, a list of all candidate characters with their corresponding confidence values, which is used in the subsequent intelligent error correction.
(7) Weighted verification of multi-engine recognition results.
Owing to the quality and quantity of the training data, and to the algorithmic error of the open-source OCR engine itself, the training result does not perform well enough on the verification data. Improving data quality and repeatedly tuning parameters is hard to control and does not effectively raise recognition accuracy. To address this problem, the present embodiment adopts weighted verification of multi-engine recognition results: the proprietary training data (the proprietary engine) is combined with the Latin character recognition data shipped with Tesseract (the primary engine, i.e. Tesseract's native engine), and recognition is performed with two independent engines. According to statistical analysis, the two engines differ in how reliably they recognize different characters.
A character listed as a candidate by both engines has its final confidence determined by weighting; a character identified by only one engine directly takes that engine's confidence as its final confidence.
To determine the weight coefficients, the present embodiment uses logistic regression to compute, for each character, the reliability weight coefficients of the recognition results of the two engines. The specific method is as follows:
7.1 For each character candidate, the hypothesis function is assumed to be:
$h_\theta(x) = \mathrm{sigmoid}(\theta_1 x_1 + \theta_2 x_2 + \theta_3)$
where $x_1$ is the confidence of the proprietary engine and $x_2$ is the confidence of the primary engine.
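7.2 Accordingly, the logistic regression cost function, in its standard form, is:
$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]$
where $m$ is the number of character candidates in the training data, and $y^{(i)}=1$ if character candidate $i$ is the correct recognition result, otherwise $y^{(i)}=0$.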
7.3 Gradient descent is used to compute the θ at which the cost function attains its global minimum; this θ serves as the weighting parameters for the candidate character.
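Steps 7.1 to 7.3 can be sketched as a small batch gradient descent. The sketch assumes confidences scaled to the range 0 to 1 and the usual cross-entropy cost of logistic regression; the learning rate, the iteration count and the function names `fit_weights` and `combined_confidence` are illustrative assumptions, not values from the source.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weights(samples, labels, lr=0.5, epochs=5000):
    """Fit [theta1, theta2, theta3] for one character by gradient descent.

    samples: list of (x1, x2) = (proprietary confidence, primary confidence)
    labels:  1 if the candidate was the correct character, else 0
    """
    theta = [0.0, 0.0, 0.0]
    m = len(samples)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in zip(samples, labels):
            # Gradient of the cross-entropy cost: (h - y) * feature.
            err = sigmoid(theta[0] * x1 + theta[1] * x2 + theta[2]) - y
            grad[0] += err * x1
            grad[1] += err * x2
            grad[2] += err
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta


def combined_confidence(theta, x1, x2):
    """Weighted confidence h_theta(x) for one candidate character."""
    return sigmoid(theta[0] * x1 + theta[1] * x2 + theta[2])
```

Because the cost is convex in θ, gradient descent with a suitable step size approaches the global minimum referred to in step 7.3; one θ would be fitted per character.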
Because the present embodiment places high demands on algorithm performance, recognizing every character with both engines would greatly reduce recognition speed. Therefore, in practice and based on statistics, multi-engine weighted verification is applied only to the six characters B, Z, 0, 4, 8 and G, for which the proprietary engine still has a relatively high error rate after the subsequent error correction step. When the recognition result is one of these six characters, the character is recognized again with the primary engine; candidate characters present in the lists of both engines are weighted with the weighting parameters to obtain the final confidence.
At this point, a candidate character list has been obtained for each character in the picture, and each candidate character has a corresponding confidence.
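The selective use of the second engine can be sketched as below. The sketch assumes confidences in the range 0 to 1, a per-character table of fitted parameters (θ1, θ2, θ3), and a hypothetical callable `primary_recognize` that runs the primary engine only when it is actually needed; characters found only by the proprietary engine keep their proprietary confidence.

```python
import math

# Characters for which the proprietary engine is re-verified (from the text).
REVERIFY = {"B", "Z", "0", "4", "8", "G"}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def merge_candidates(proprietary, primary_recognize, weights):
    """Return the final candidate list (char -> confidence) for one image.

    proprietary:       dict char -> confidence from the proprietary engine
    primary_recognize: zero-argument callable returning the primary engine's
                       dict char -> confidence (hypothetical wrapper)
    weights:           dict char -> (theta1, theta2, theta3)
    """
    top = max(proprietary, key=proprietary.get)
    if top not in REVERIFY:
        # Fast path: the second engine is never invoked.
        return dict(proprietary)
    primary = primary_recognize()
    merged = {}
    for ch, conf in proprietary.items():
        if ch in primary and ch in weights:
            t1, t2, t3 = weights[ch]
            merged[ch] = sigmoid(t1 * conf + t2 * primary[ch] + t3)
        else:
            merged[ch] = conf
    return merged
```

Passing the primary engine as a callable keeps the common case cheap, which is the stated reason for restricting the verification to six characters.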
(8) Intelligent error correction based on the crown word number pattern.
RMB crown word numbers have a specific format: the positions and counts of letters and digits are fixed. This property can be used to correct the recognition result by rejecting from the candidate character list any candidate that does not conform to the crown word number pattern.
The number on an RMB banknote consists of two parts, a "prefix" and a "serial number". The RMB prefixes currently in circulation are of three kinds, the 2-prefix, 3-prefix and 4-prefix, in which the letters are at positions 1 and 2, positions 1 and 3, and positions 1 and 4 respectively. In summary, the first character of a crown word number must be a letter, exactly one of characters 2 to 4 is a letter, and all remaining characters are digits.
According to this rule, the present embodiment selects the final recognition result with the following strategy:
8.1 For the 1st character, the letter with the highest confidence is selected as the recognition result;
8.2 For the last 6 characters, the digit with the highest confidence is selected as the recognition result;
8.3 For the 2nd to 4th characters, all combinations of 1 letter and 2 digits are traversed, and the combination with the highest confidence is selected as the recognition result.
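The selection strategy of steps 8.1 to 8.3 can be sketched as follows, assuming a 10-character crown word number (1 + 3 + 6 positions) with one candidate dictionary per position, and scoring a combination in step 8.3 by the sum of its character confidences. The function names are hypothetical.

```python
from itertools import product


def best_of(cands, pred):
    """Highest-confidence candidate satisfying pred, or None if there is none."""
    ok = [(conf, ch) for ch, conf in cands.items() if pred(ch)]
    return max(ok)[1] if ok else None


def correct_serial(candidates):
    """Pick the final string from 10 per-position dicts (char -> confidence)."""
    if len(candidates) != 10:
        raise ValueError("expected candidates for 10 character positions")
    # 8.1: the first character must be a letter.
    first = best_of(candidates[0], str.isalpha)
    # 8.2: the last six characters must be digits.
    last6 = [best_of(c, str.isdigit) for c in candidates[4:]]
    # 8.3: positions 2-4 hold exactly one letter and two digits.
    best_combo, best_score = None, float("-inf")
    for combo in product(*(c.items() for c in candidates[1:4])):
        chars = [ch for ch, _ in combo]
        if sum(ch.isalpha() for ch in chars) != 1:
            continue
        if sum(ch.isdigit() for ch in chars) != 2:
            continue
        score = sum(conf for _, conf in combo)
        if score > best_score:
            best_combo, best_score = chars, score
    if first is None or best_combo is None or None in last6:
        return None  # no candidate fits the pattern
    return first + "".join(best_combo) + "".join(last6)
```

A candidate such as "8" in the first position or "O" in the last six positions is rejected even when it has the highest raw confidence, because it cannot occur there under the pattern.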
(9) Special processing of highly similar characters.
For most characters, a very high recognition rate is already obtained through step (8). However, owing to the limitations of the engines' recognition method and the insufficient number of data samples, both the proprietary engine and the primary engine frequently confuse the two letters T and J, which greatly affects the final recognition rate. The present embodiment solves this problem by local recognition, with the following specific steps:
9.1 If the recognition result is T or J and its confidence is lower than 85%, perform step 9.2;
9.2 Crop the lower 70% of the character image and recognize it with the primary engine set to a mode that recognizes only the three characters J, I and 1;
9.3 If the recognition result is J, the character is judged to be J;
9.4 If the recognition result is I or 1, the character is judged to be T.
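Steps 9.1 to 9.4 can be sketched as below. The image is assumed to be a list of pixel rows with row 0 at the top and the confidence is assumed to lie in the range 0 to 1; `recognize_restricted` is a hypothetical wrapper around the primary engine that only returns characters from the given whitelist (in Tesseract this kind of restriction is commonly configured through the `tessedit_char_whitelist` variable).

```python
def resolve_t_j(char, confidence, image, recognize_restricted, threshold=0.85):
    """Disambiguate T and J by re-recognizing the lower part of the character.

    image: list of pixel rows (row 0 is the top of the character).
    recognize_restricted(image, whitelist) -> str is a hypothetical callable.
    """
    # 9.1: only low-confidence T/J results are re-examined.
    if char not in ("T", "J") or confidence >= threshold:
        return char
    # 9.2: drop the upper 30% of rows, keeping the lower 70%.
    start = len(image) * 3 // 10
    lower = image[start:]
    result = recognize_restricted(lower, "JI1")
    if result == "J":          # 9.3: the hook of a J survives the crop
        return "J"
    if result in ("I", "1"):   # 9.4: a T without its bar looks like I or 1
        return "T"
    return char                # unexpected output: keep the original result
```

Cropping away the top of the character removes the horizontal bar of a T, so that what remains resembles I or 1, while a J keeps its hook at the bottom.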
At this point, the entire crown word number recognition process is complete.
The above description of the embodiments is intended to help those of ordinary skill in the art understand and apply the present invention. Those skilled in the art can obviously make various modifications to the above embodiments and apply the general principles described herein to other embodiments without creative effort. Therefore, the present invention is not limited to the above embodiments; improvements and modifications made by those skilled in the art on the basis of the disclosure of the present invention shall all fall within the protection scope of the present invention.

Claims (7)

1. An RMB crown word number automatic identification method, comprising the following steps:
(1) first acquiring a grayscale image of the RMB, and then performing banknote edge recognition on the grayscale image to obtain the quadrilateral contour of the RMB;
(2) cropping and rectifying the quadrilateral contour of the RMB based on a perspective transform to obtain a rectangular image of the RMB;
(3) identifying and correcting the orientation of the rectangular RMB image based on its color distribution to obtain a face-up rectangular RMB image, rectangular RMB images showing the reverse side being discarded directly;
(4) performing crown word number region boundary detection on the face-up rectangular RMB image to locate the crown word number region to be recognized;
(5) preprocessing the crown word number region image by image binarization, character segmentation, histogram stretching and connected component analysis in sequence, to obtain each character image of the crown word number;
(6) training for RMB crown word number characters with the training tools provided by the open-source OCR engine Tesseract, to obtain a proprietary engine for recognizing crown word number characters;
(7) recognizing each character image obtained in step (5) with the proprietary engine and with the primary engine for recognizing Latin characters that ships with the OCR engine Tesseract, to obtain the candidate character list corresponding to each character image and the confidence of each candidate character;
(8) selecting, for each character image, one character from its candidate character list as its recognition result according to the combination rules of crown word numbers and the confidences.
2. The RMB crown word number automatic identification method according to claim 1, characterized in that: in step (1), banknote edge recognition is performed on the grayscale image of the RMB as follows:
1.1 applying median filtering to the grayscale image of the RMB;
1.2 applying dilation to the filtered grayscale image;
1.3 binarizing the dilated grayscale image;
1.4 performing contour detection on the binary image using the Suzuki85 algorithm to obtain the external contour with the largest area in the image;
1.5 approximating the external contour by a quadrilateral using the Douglas-Peucker algorithm, or detecting its minimum bounding rectangle, to obtain the quadrilateral contour of the RMB.
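The quadrilateral approximation of step 1.5 can be illustrated with a minimal sketch of the Douglas-Peucker algorithm. It assumes that the contour is an ordered list of (x, y) points forming a closed outline; the closed contour is handled here by splitting it at its first point and at the point farthest from it, which is a simplification chosen for this sketch rather than something stated in the source, and the tolerance `epsilon` is an illustrative parameter.

```python
import math


def _distance_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Simplify an open polyline, keeping points farther than epsilon."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_line(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [a, b]
    left = douglas_peucker(points[:idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # the split point appears once


def approximate_closed_contour(contour, epsilon):
    """Simplify a closed contour by splitting it into two open halves."""
    x0, y0 = contour[0]
    far = max(range(len(contour)),
              key=lambda i: math.hypot(contour[i][0] - x0, contour[i][1] - y0))
    first = douglas_peucker(contour[:far + 1], epsilon)
    second = douglas_peucker(contour[far:] + [contour[0]], epsilon)
    return first[:-1] + second[:-1]
```

If the simplified contour has exactly four vertices it can be used as the quadrilateral contour; otherwise the minimum bounding rectangle mentioned in the claim is the fallback.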
3. The RMB crown word number automatic identification method according to claim 1, characterized in that: in step (4), crown word number region boundary detection is performed on the rectangular RMB image as follows:
4.1 resampling the rectangular RMB image to a size of 465 × 231 to obtain its thumbnail;
4.2 selecting the pixel at column 5, row 168 of the thumbnail as the top-left vertex of the crown word number region, and taking the region of height 33 and width 116 anchored at this vertex as the ROI;
4.3 applying histogram stretching and binarization to the selected ROI;
4.4 based on preset thresholds on the proportion of black and white pixels in each row and column, performing two rounds of detection and approximation on the boundaries of the crown word number in the binarized ROI, and removing the blank white border to finally obtain the crown word number region to be recognized.
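Steps 4.2 to 4.4 can be sketched as follows. The sketch assumes a grayscale thumbnail stored as a list of 231 rows of 465 values, zero-based indexing for column 5 and row 168, a linear min-max histogram stretch, and illustrative values for the binarization threshold and the black-pixel ratio threshold; none of these details is fixed by the source, and the function names are hypothetical.

```python
ROI_X, ROI_Y, ROI_W, ROI_H = 5, 168, 116, 33  # values from step 4.2


def crop_roi(thumb):
    """Cut the fixed crown word number ROI out of the 465 x 231 thumbnail."""
    return [row[ROI_X:ROI_X + ROI_W] for row in thumb[ROI_Y:ROI_Y + ROI_H]]


def stretch_histogram(gray):
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        return [row[:] for row in gray]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in gray]


def binarize(gray, threshold=128):
    """1 = black (ink) pixel, 0 = white pixel; threshold is illustrative."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def _ratio(pixels):
    return sum(pixels) / len(pixels)


def find_text_bounds(binary, min_black_ratio=0.05, rounds=2):
    """Shrink (top, bottom, left, right) to rows/columns with enough ink.

    Each round re-evaluates rows inside the current column range and
    columns inside the current row range, mirroring the two rounds of
    boundary approximation described in step 4.4.
    """
    top, bottom = 0, len(binary) - 1
    left, right = 0, len(binary[0]) - 1
    for _ in range(rounds):
        rows = [i for i in range(top, bottom + 1)
                if _ratio(binary[i][left:right + 1]) >= min_black_ratio]
        if not rows:
            return None
        top, bottom = rows[0], rows[-1]
        cols = [j for j in range(left, right + 1)
                if _ratio([binary[i][j] for i in range(top, bottom + 1)])
                >= min_black_ratio]
        if not cols:
            return None
        left, right = cols[0], cols[-1]
    return top, bottom, left, right
```

The second round matters when noise outside the text inflates the ratios of the first round: once the column range has been narrowed, the row ratios are recomputed on the tighter window.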
4. The RMB crown word number automatic identification method according to claim 1, characterized in that step (7) is implemented as follows:
7.1 for any character image, recognizing it with the proprietary engine to obtain a candidate character list and the confidence of each candidate character;
7.2 comparing the confidences of the candidate characters; if the candidate character with the highest confidence is B, Z, 0, 4, 8 or G, performing step 7.3; otherwise, directly taking this candidate character list and the confidences of its candidate characters as the final output;
7.3 recognizing the character image with the primary engine to obtain a candidate character list and the confidence of each candidate character;
7.4 taking the candidate character list obtained by the proprietary engine as the final output, with the confidence of each candidate character determined as follows: for a character appearing in the candidate character lists of both engines, the final output is the weighted combination of the confidences calculated by the two engines; for a character appearing only in the candidate character list of the proprietary engine, the final output is the confidence calculated by the proprietary engine.
5. The RMB crown word number automatic identification method according to claim 4, characterized in that: in step 7.4, the weight coefficients of the two engines for the confidences of different characters are determined by logistic regression.
6. The RMB crown word number automatic identification method according to claim 1, characterized in that step (8) is implemented as follows:
8.1 for the 1st character image of the crown word number, selecting the letter with the highest confidence from its candidate character list as the recognition result;
8.2 for the last 6 character images of the crown word number, selecting the digit with the highest confidence from each corresponding candidate character list as the recognition result;
8.3 for the 2nd to 4th character images of the crown word number, traversing all character combinations that are drawn from the respective candidate character lists and contain 1 letter and 2 digits, and selecting the combination with the highest sum of character confidences as the recognition result.
7. The RMB crown word number automatic identification method according to claim 1, characterized in that: for a character image whose recognition result in step (8) is T or J with a confidence lower than 85%, the lower 70% of the character image is cropped and recognized with the primary engine: if the recognition result is J, the character image is judged to be J; if the recognition result is I or 1, the character image is judged to be T.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710509012.0A CN107195069A (en) 2017-06-28 2017-06-28 A kind of RMB crown word number automatic identifying method

Publications (1)

Publication Number Publication Date
CN107195069A true CN107195069A (en) 2017-09-22

Family

ID=59881536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710509012.0A Pending CN107195069A (en) 2017-06-28 2017-06-28 A kind of RMB crown word number automatic identifying method

Country Status (1)

Country Link
CN (1) CN107195069A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090128659A (en) * 2008-06-11 2009-12-16 (주)캡소프트 High-density barcode printing apparatus in trade specification at monetary automated machine and method thereof
JP2010055544A (en) * 2008-08-29 2010-03-11 Fujitsu Frontech Ltd Card insertion guidance method and card processor
CN101894266A (en) * 2010-06-30 2010-11-24 北京捷通华声语音技术有限公司 Handwriting recognition method and system
CN103136845A (en) * 2013-01-23 2013-06-05 浙江大学 Renminbi (RMB) counterfeit identifying method based on crown-word image characters
CN103136845B (en) * 2013-01-23 2015-09-16 浙江大学 A kind of Renminbi false distinguishing method based on crown word number characteristics of image
CN104408814A (en) * 2014-12-13 2015-03-11 天津远目科技有限公司 Method for identifying RMB code

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
冯博远 等: ""人民币冠字号码识别预处理算法研究"", 《计算机工程与科学》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392260B (en) * 2017-06-08 2020-03-17 中国民生银行股份有限公司 Error calibration method and device for character recognition result
CN107392260A (en) * 2017-06-08 2017-11-24 中国民生银行股份有限公司 The wrong scaling method and device of a kind of character identification result
CN108091033A (en) * 2017-12-26 2018-05-29 深圳怡化电脑股份有限公司 A kind of recognition methods of bank note, device, terminal device and storage medium
CN108875746B (en) * 2018-05-17 2023-02-17 北京旷视科技有限公司 License plate recognition method, device and system and storage medium
CN108875746A (en) * 2018-05-17 2018-11-23 北京旷视科技有限公司 A kind of licence plate recognition method, device, system and storage medium
CN108830862B (en) * 2018-06-08 2021-11-30 江南大学 Crab orientation identification method based on image segmentation
CN108830862A (en) * 2018-06-08 2018-11-16 江南大学 Based on the crab of image segmentation towards recognition methods
CN110348361A (en) * 2019-07-04 2019-10-18 杭州景联文科技有限公司 Skin texture images verification method, electronic equipment and recording medium
CN110348361B (en) * 2019-07-04 2022-05-03 杭州景联文科技有限公司 Skin texture image verification method, electronic device, and recording medium
CN110634222A (en) * 2019-08-27 2019-12-31 河海大学 Bank bill information identification method
CN110634222B (en) * 2019-08-27 2021-07-09 河海大学 Bank bill information identification method
CN110610575A (en) * 2019-09-20 2019-12-24 北京百度网讯科技有限公司 Coin identification method and device and cash register
US11354887B2 (en) 2019-09-20 2022-06-07 Beijing Baidu Netcom Science And Technology Co., Ltd. Coin identification method, device, and cash register
CN116486418A (en) * 2023-06-19 2023-07-25 恒银金融科技股份有限公司 Method and device for generating banknote crown word number image
CN116486418B (en) * 2023-06-19 2023-10-03 恒银金融科技股份有限公司 Method and device for generating banknote crown word number image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20170922