CN1217292C - Bill image face identification method - Google Patents

Info

Publication number
CN1217292C
Authority
CN
China
Prior art keywords
space
whole page
line
point
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 03148250
Other languages
Chinese (zh)
Other versions
CN1460961A (en
Inventor
蔡亮
陈宇
周昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sinyada Technology Co ltd
Sunyard System Engineering Co ltd
Original Assignee
XINYADA SYSTEM ENGINEERING Co LTD HANGZHOU
Application filed by XINYADA SYSTEM ENGINEERING Co LTD HANGZHOU filed Critical XINYADA SYSTEM ENGINEERING Co LTD HANGZHOU
Priority to CN 03148250 priority Critical patent/CN1217292C/en
Publication of CN1460961A publication Critical patent/CN1460961A/en
Application granted granted Critical
Publication of CN1217292C publication Critical patent/CN1217292C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Abstract

The present invention discloses a layout identification method for bill image processing. The method comprises the following steps: the image of the layout to be recognized is input; the layout image is preprocessed; feature lines or feature characters in the preprocessed image are then matched against the standard lines or standard characters of each standard layout stored in advance; and the layout to be recognized is determined according to the matching confidence. In the matching operation, feature lines can first be searched for in the preprocessed image, and it is judged whether enough feature lines were found. If so, the feature lines are matched against the standard lines of all standard layouts stored in advance; if not, feature characters are searched for in the preprocessed image and matched against the standard characters of all standard layouts stored in advance. The method is simple and easy to implement, and is suitable for bill processing in many application areas.

Description

Method for identifying the layout of a bill image
Technical field
The present invention relates to an image processing method, and in particular to a method for identifying the layout of a bill image.
Background art
As the information society gradually arrives and computer network technology develops, the media through which information is communicated have changed considerably, so that the expression and dissemination of information must also move to an electronic model. With business growth across all industries, the variety and number of bills have increased sharply, and processing and storing bills by computer, using optical image reading devices, has become an important means of handling bill information efficiently. In computer-based bill processing, identifying the layout of the bill is the most important part, and improving the efficiency of layout identification helps improve the efficiency of bill processing as a whole.
Existing bill layout identification methods usually proceed as follows. First, data on the standard layouts relevant to the layout to be identified are stored in a computer system; the standard layout data here are the feature data of the standard layouts needed when layouts are matched according to a given rule. Second, the bill to be identified is scanned, and feature data are extracted from the scanned image. Finally, similarity is judged from the extracted feature data to determine the best-matching layout.
In this process, the usual concerns are how to determine the feature data of the standard bill layouts and how to extract feature data from the image to be identified in a simple way. Both affect the quality of the extracted feature data and therefore the efficiency of bill identification. A simple scheme for identifying bill layout images has thus long been sought.
Summary of the invention
In view of the above, the object of the invention is to provide a layout identification method for bill image processing that is simple and easy to implement.
To achieve this object, the layout identification method for bill image processing provided by the invention comprises:
inputting the image of the layout to be identified and preprocessing this image;
searching for feature lines in the preprocessed image;
judging whether the feature lines found exceed a specified threshold; if so, matching the feature lines against the standard lines of all standard layouts stored in advance; otherwise, searching for feature characters in the preprocessed image and matching the feature characters against the standard characters of all standard layouts stored in advance;
determining the layout to be identified according to the matching confidence.
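The decision between line matching and character matching described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `identify_layout`, the two scoring callbacks, and the use of template names as keys are hypothetical and do not come from the patent, which does not specify how confidences are computed at this level.

```python
from typing import Callable, List, Tuple


def identify_layout(
    feature_lines: List[tuple],
    line_threshold: int,
    score_by_lines: Callable[[str], float],
    score_by_text: Callable[[str], float],
    templates: List[str],
) -> Tuple[str, float]:
    """Return the best-matching template name and its confidence.

    Lines are used when more than `line_threshold` were found;
    otherwise matching falls back to feature characters.
    """
    if len(feature_lines) > line_threshold:
        scorer = score_by_lines   # enough lines: match against standard lines
    else:
        scorer = score_by_text    # too few lines: match against standard characters
    best = max(templates, key=scorer)
    return best, scorer(best)
```

The scoring callbacks stand in for the line matching and character matching procedures that the description details later.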
In the above steps, searching for feature lines includes finding horizontal lines according to the following steps:
11) scan the layout to be tested point by point, row by row; when a black point b_dot is scanned, go to step 12); when a white point w_dot is scanned, go to step 13);
12) judge whether point b_dot is the left end point of a line segment; if so, begin the line-finding operation and go to step 11); otherwise,
judge whether point b_dot is a point within a line segment; if so, add this point to the line length and go to step 11) to continue scanning the next point; otherwise,
judge whether point b_dot is at the end of a row and is not a point on a line; if so, go to step 11) to scan the next row; otherwise, go to step 15) to finish finding the line;
13) judge whether a line has been found before point w_dot; if not, go to step 11) to continue scanning subsequent points; otherwise,
judge whether point w_dot is a white point in a broken part of a line segment; if so, go to step 14) for broken-line processing; otherwise,
judge whether point w_dot is the end point of a line segment; if so, go to step 15) to finish finding the line; otherwise, go to step 16);
14) treat point w_dot as a black point b_dot and go to step 12);
15) save the line that was found;
16) judge whether scanning of the layout to be tested is complete; if not, go to step 11) to continue scanning; otherwise, end the scan.
In step 13), whether point w_dot is a white point in a broken part of a line segment is judged as follows:
within the allowed broken-line length, taking the current row as the baseline and offsetting up and down by a specified number of rows, search for black points starting from the abscissa of this white point; if no black point is found in this range, the white point is judged to be the end of a line segment; otherwise, it is judged to be a broken part within the line.
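The broken-line test above can be illustrated with a small sketch. It assumes a binary image stored as a list of rows in which 1 is a black point and 0 is a white point; the function name `is_gap_in_line` and the parameters `max_gap` and `row_offset` are hypothetical names for the allowed broken-line length and the specified row offset.

```python
def is_gap_in_line(img, row, col, max_gap, row_offset):
    """Return True if the white point at (row, col) is a break inside a line.

    Searches for a black point (1) starting at column `col`, over at most
    `max_gap` further columns, in the rows from `row - row_offset` to
    `row + row_offset`. Finding none means the line has ended.
    """
    height, width = len(img), len(img[0])
    first_row = max(0, row - row_offset)
    last_row = min(height - 1, row + row_offset)
    last_col = min(width - 1, col + max_gap)
    for r in range(first_row, last_row + 1):
        for c in range(col, last_col + 1):
            if img[r][c] == 1:
                return True   # black point nearby: treat as a break in the line
    return False              # nothing nearby: the white point ends the segment
```

Allowing a row offset lets the search tolerate lines that drift by a pixel or two because of residual skew.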
Before step 16) judges whether scanning of the layout to be tested is complete, the method further comprises a step of judging whether the line segment qualifies.
Searching for feature lines further includes finding vertical lines according to the following steps:
21) rotate the layout to be tested by 90 degrees so that vertical lines become horizontal lines;
22) find the horizontal lines;
23) convert the coordinates of these horizontal lines into the coordinates of vertical lines.
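Steps 21) to 23) can be sketched as below, assuming a binary image stored as a list of rows (1 = black, 0 = white) and a clockwise rotation. The helper `find_horizontal_runs` is a deliberately simple stand-in for the horizontal line search (it ignores broken lines), and all names are illustrative.

```python
def find_horizontal_runs(img, min_len):
    """Return (row, start_col, length) for each run of black points >= min_len."""
    runs = []
    for r, row in enumerate(img):
        start = None
        for c, v in enumerate(list(row) + [0]):   # trailing 0 closes a run at row end
            if v == 1 and start is None:
                start = c
            elif v != 1 and start is not None:
                if c - start >= min_len:
                    runs.append((r, start, c - start))
                start = None
    return runs


def find_vertical_lines(img, min_len):
    """Return (top_row, col, length) of vertical lines via a 90-degree rotation."""
    height = len(img)
    # Rotate clockwise: original point (r, c) moves to (c, height - 1 - r).
    rotated = [list(col) for col in zip(*img[::-1])]
    lines = []
    for i, j0, n in find_horizontal_runs(rotated, min_len):
        # Invert the rotation: rotated row i is original column i, and the
        # run's last rotated column maps back to the topmost original row.
        lines.append((height - j0 - n, i, n))
    return lines
```

Reusing one search routine for both directions keeps the line-finding rules in a single place, which is the point of the rotation in the description.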
Before the step of determining the layout to be identified according to the matching result, the method further comprises determining the match point of the layout to be identified.
The match point of the layout to be identified is determined as follows:
determine the intersections of all horizontal lines and vertical lines;
among these intersections, select the one with the smallest difference between abscissa and ordinate as the match point.
On this basis, the feature lines are matched against the standard lines of each standard layout stored in advance according to the following steps:
31) read a pre-stored standard layout;
32) taking the match point as the reference, calculate the horizontal line matching rate between the layout to be tested and the standard layout, and calculate the vertical line matching rate between the layout to be tested and the standard layout;
33) determine the matching confidence for this standard layout from the horizontal line matching rate and the vertical line matching rate.
The horizontal line matching rate between the layout to be tested and the standard layout is determined as follows:
41) calculate the accumulated horizontal line matching rate between the layout to be tested and the standard layout;
42) horizontal line matching rate = accumulated horizontal line matching rate × 2 / (number of horizontal lines in the layout to be tested + number of horizontal lines in the standard layout).
The accumulated horizontal line matching rate is determined according to the following steps:
51) initialize the accumulated horizontal line matching rate bMatchH = 0;
52) select a not-yet-selected horizontal line LineT from the set of horizontal lines of the layout to be tested; if no horizontal line LineT remains, end the operation; otherwise go to step 53);
53) calculate the length LT of horizontal line LineT, and its vertical distance DVT and horizontal distance DHT from the match point OrgT;
54) select a not-yet-selected horizontal line LineS from the set of horizontal lines of the standard layout; if no horizontal line LineS remains, go to step 52); otherwise go to step 55);
55) calculate the length LS of horizontal line LineS, and its vertical distance DVS and horizontal distance DHS from the match point OrgS;
56) judge whether the absolute value a of the difference between DVT and DVS is greater than the corresponding set value V, or the absolute value b of the difference between DHT and DHS is greater than the corresponding set value H, or the absolute value c of the difference between LT and LS is greater than the corresponding set value L; if any of them is greater, go to step 54); otherwise, calculate the matching rate matchL by the following formula:
$\mathrm{matchL} = \left( (a/V)^2 + (b/L)^2 + (c/L)^2 \right) / 3$;
57) add matchL to bMatchH, then go to step 54).
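Steps 51) to 57), together with the normalization in step 42), can be sketched as follows. The sketch reproduces the per-pair formula exactly as printed in the source, including the `b / L` term, and does not try to interpret it. Representing each line as a `(dv, dh, length)` tuple measured from the match point is an assumption made for illustration, as are the function names.

```python
def accumulated_match(test_lines, std_lines, V, H, L):
    """Accumulate matchL over all line pairs that fall within tolerance.

    Each line is (dv, dh, length): vertical distance and horizontal
    distance from the layout's match point, and the line length.
    """
    total = 0.0
    for dvt, dht, lt in test_lines:            # steps 52) and 53)
        for dvs, dhs, ls in std_lines:         # steps 54) and 55)
            a, b, c = abs(dvt - dvs), abs(dht - dhs), abs(lt - ls)
            if a > V or b > H or c > L:        # step 56): outside tolerance
                continue
            # Formula as printed in the source text.
            total += ((a / V) ** 2 + (b / L) ** 2 + (c / L) ** 2) / 3
    return total


def match_rate(total, n_test, n_std):
    """Step 42): normalize by the line counts of both layouts."""
    return total * 2 / (n_test + n_std)
```

The same two functions would serve for vertical lines once the layout has been rotated, since the source reuses the horizontal procedure for that case.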
The vertical line matching rate between the layout to be tested and the standard layout is determined according to the following steps:
61) rotate the layout to be tested by 90 degrees so that vertical lines become horizontal lines;
62) calculate the accumulated horizontal line matching rate between the layout to be tested and the standard layout, and take it as the accumulated vertical line matching rate;
63) vertical line matching rate = accumulated vertical line matching rate × 2 / (number of vertical lines in the layout to be tested + number of vertical lines in the standard layout).
In addition, the invention provides a method for identifying the layout of a bill image, comprising:
inputting the image of the layout to be identified and preprocessing this image;
searching for feature characters in the preprocessed image;
judging whether feature characters were found; if so, matching the feature characters against the standard characters of all standard layouts stored in advance; otherwise, searching for feature lines in the preprocessed image and matching the feature lines against the standard lines of all standard layouts stored in advance;
determining the layout to be identified according to the matching confidence.
The step of searching for feature characters comprises:
71) finding the feature match point of the layout to be tested;
72) reading the pre-stored standard layout information and, according to this information, extracting the specified image block from the preprocessed image;
73) searching for feature characters in the image block.
Because the invention uses the lines or characters in the layout to be identified as identification features, matches them against the standard lines or standard characters of each standard layout stored in advance, and finally determines the layout to be identified according to the matching confidence, the method is simple, easy to implement, and identifies layouts efficiently.
Description of drawings
Fig. 1 is the main flow chart of an embodiment of the method of the invention;
Fig. 2 is a schematic diagram of a 3 × 3 matrix;
Fig. 3 is the flow chart for finding horizontal lines used by the embodiment of Fig. 1;
Fig. 4 is a schematic diagram of the coordinate change before and after rotating the layout image;
Fig. 5 is the flow chart for calculating the accumulated horizontal line matching rate used in Fig. 1;
Fig. 6 is a schematic diagram of a layout to be tested and a standard layout, both containing horizontal lines;
Fig. 7 is a schematic diagram of connected character blocks.
Embodiment
The main task of the method of the invention is to identify a layout from the line information or character information it contains, so as to determine which specific layout it is. The method is particularly suited to identifying bills and provides a basis for computerized bill processing.
Fig. 1 is the main flow chart of an embodiment of the method of the invention. The flow shown in Fig. 1 sets out the main points of the method: the layout image to be identified, obtained by scanning a bill, is searched for lines or characters; the feature lines or feature characters found are compared with the standard lines or standard characters of the standard layouts stored in advance; and the kind of bill is determined from the result of the comparison. As Fig. 1 indicates, to carry out the method, the standard information used for identifying bill layouts, obtained from standard bills, must first be stored, for example in the system database, to serve as the basis for comparison. Depending on the nature of the bills being processed, the standard information can use different feature data, such as feature lines, feature points, and feature character information that represent the bill image layout. The standard data in this embodiment include the template layout name, the coordinates of all standard horizontal lines and vertical lines in the template, the character content and its coordinates, the match point, and so on. They also include some empirical values, for example the minimum line length, the line length tolerance, and the vertical and horizontal distances from a line to the layout match point together with their tolerances.
Given the standard information, after the bill layout image to be identified has been read by an optical image reading device, step 1 first preprocesses the layout image before identification to remove the various kinds of interference noise in the image. Step 2 then reads the layout initialization information from the system database, that is, the information of all stored standard layouts, for comparison during the subsequent layout identification. Steps 1 and 2 are the initialization steps of this embodiment and provide the basis for layout identification. Step 3 can then search for the feature information of the layout to be identified; specifically, this step looks for feature lines in the layout image. After the search, step 4 judges whether enough lines for layout identification have been found in the layout to be identified. If enough lines have been found, the lines alone suffice as feature lines to identify the layout image correctly and to decide which kind of bill the image represents. In that case, step 6 performs layout matching between the standard lines of the standard layouts read in advance and the feature lines found; step 8 then determines the layout to be identified according to the matching result and feeds the result back to the system. Since the bill represented by the layout image must be one of the bills whose standard layouts are stored, step 8 can usually identify which kind of bill the layout image belongs to. If the layout image is misidentified for reasons such as scanning, this step can also feed back other results, for example that the layout is flawed or cannot be identified.
If step 4 judges that not enough lines were found in the layout image to be identified, the bill may have no lines, or some other situation applies, and the layout must be identified by its characters. Therefore, when the lines found are not enough, step 5 searches for the feature match point of the layout and, guided by the character information in the standard layout information read in advance, extracts the specified image block from the preprocessed image and searches for feature characters in that block. Step 7 then performs character matching, that is, matching for a layout without lines, based on the feature characters found. Finally, step 8 determines the layout to be identified according to the matching result and feeds the result back to the system.
The embodiment flow of Fig. 1 describes identifying the layout from feature lines first and, for layouts without lines, identifying it from feature characters afterwards. This order suits cases in which most bills have line features. Another order can also be used: the layout is identified from feature characters first, and layouts with lines or without characters are then identified from feature lines. That is, feature characters are first searched for in the preprocessed image and it is judged whether any were found; if so, the feature characters are matched against the standard characters of all standard layouts stored in advance; otherwise, feature lines are searched for in the preprocessed image and matched against the standard lines of all standard layouts stored in advance. In practice the method is not limited to these orders. For example, the nature of the layout to be identified can be judged first, and it can then be decided whether to identify by feature characters first or by feature lines first.
In the embodiment flow of Fig. 1, the preprocessing performed in step 1 before identification aims to eliminate defects in the layout image that would affect identification. It includes removing the black border of the layout image, correcting the skew of the layout image, and removing noise from the layout image. If the bills to be identified are colored, or the scanned image has gray levels or color, then to improve identification accuracy or efficiency the preprocessing can begin by judging whether the layout image is a color or grayscale image; if so, the image is converted to a black-and-white binary image, that is, binarized. Binarization converts color or grayscale layout image data, after image enhancement and noise cleaning, into layout image data with only the two values black and white. It can be implemented as follows. First, Gaussian smoothing is applied to remove white-point and black-point noise, which prevents isolated white and black points from appearing after binarization. Second, the gray values of the foreground and background of the whole image are determined and the binarization threshold is calculated; two kinds of threshold can be used, a global static threshold or a local dynamic threshold. Finally, the gray values of the whole image are reset according to the threshold: pixels whose gray value is greater than the threshold are set to white points, and the rest are set to black points.
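The final thresholding step can be sketched as below for the global static threshold case. The source does not state how the threshold is derived from the foreground and background gray values, so taking their midpoint in `midpoint_threshold` is purely an assumption for illustration; the function names and the 1 = black, 0 = white encoding are also illustrative.

```python
def midpoint_threshold(foreground_gray, background_gray):
    """Assumed rule: a global threshold halfway between the two gray levels."""
    return (foreground_gray + background_gray) / 2


def binarize(gray, threshold):
    """Convert rows of 0-255 gray values to 0 (white point) / 1 (black point).

    Pixels with a gray value greater than the threshold become white,
    all others become black, as described in the text.
    """
    return [[0 if v > threshold else 1 for v in row] for row in gray]
```

A local dynamic threshold would apply the same comparison with a threshold recomputed per region instead of once for the whole image.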
In the preprocessing of the layout image in this example, black border removal is performed on the binary image. The layout is divided horizontally into an upper and a lower region, and each region is processed starting from the left and from the right, so that the whole area is handled as four zones: upper left, lower left, upper right, and lower right. Each zone is scanned row by row using the same rule. Each scanned row is classified by the black row rule: if the number of consecutive white points in the row is greater than a preset white-noise gap width, the row is not a black row; otherwise it is a black row. From the scan results of the rows, the black border rule then determines which black borders need to be removed, and they are removed. Image correction corrects the skewed image produced by scanning. Skew correction can be performed by the projection method, in two parts: first, calculate the skew angle of the image; second, rotate the image.
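The black row rule described above can be sketched as a single pass over a row. The encoding (1 = black, 0 = white) and the names `is_black_row` and `white_gap` are assumptions for illustration; the rule for deciding which black rows form a removable border is not given in the source and is not sketched here.

```python
def is_black_row(row, white_gap):
    """Return True unless the row has a run of white points longer than white_gap."""
    run = 0
    for v in row:
        run = run + 1 if v == 0 else 0   # length of the current white run
        if run > white_gap:
            return False                 # a wide white gap: not a black row
    return True
```

For example, with `white_gap=1` the row `[1, 0, 0, 1]` is not a black row, while with `white_gap=2` it is.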
The skew angle is calculated in two stages. First, a coarse search is made, usually over the range of -30 to +30 degrees with a step of 2 degrees. Second, a fine search with a step of 0.1 degrees is made within the 2-degree range found by the coarse search. The angle search relies mainly on counting black points along a projection direction: the image is traversed along a given angle, the number of black points in each row along that direction is counted over the whole image, the variance of the resulting black point histogram is calculated, and the direction with the largest variance is taken as the skew angle.
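The coarse-to-fine search can be sketched as below. Several details are assumptions rather than statements of the source: black points are passed as a list of `(x, y)` coordinates, the projection is computed by rotating coordinates instead of resampling the image, positive angles mean that y grows with x, and the fine search covers one coarse step on either side of the coarse result.

```python
import math


def projection_variance(points, angle_deg):
    """Variance of the black-point counts per row when projecting along angle_deg."""
    t = math.radians(angle_deg)
    counts = {}
    for x, y in points:
        row = round(y * math.cos(t) - x * math.sin(t))   # row index after de-skewing
        counts[row] = counts.get(row, 0) + 1
    lo, hi = min(counts), max(counts)
    hist = [counts.get(r, 0) for r in range(lo, hi + 1)]  # include empty rows
    mean = sum(hist) / len(hist)
    return sum((h - mean) ** 2 for h in hist) / len(hist)


def estimate_skew(points, coarse_step=2.0, fine_step=0.1, limit=30.0):
    """Coarse search over [-limit, limit], then a fine search around the best angle."""
    def best(start, stop, step):
        n = int(round((stop - start) / step))
        angles = [start + i * step for i in range(n + 1)]
        return max(angles, key=lambda a: projection_variance(points, a))

    coarse = best(-limit, limit, coarse_step)
    return best(coarse - coarse_step, coarse + coarse_step, fine_step)
```

When the projection direction follows the ruled lines, their black points pile up in a few rows, which is why the histogram variance peaks at the skew angle.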
Removing noise from the layout image means filtering the layout image to remove background noise and to enhance the image. Filtering here means removing unneeded lines and stripes from the image, together with smoothing, thinning filter processing, thickening filter processing, cleaning and smoothing of characters, and broken-line repair. Line removal works on the bitmap image: the image is scanned row by row to find the lines one pixel wide, all such lines are then merged by an adjacency rule into thick lines with their actual pixel width, and each thick line is checked against the removal condition. A line that satisfies the condition is an unneeded line for image identification and is removed; this continues until all lines satisfying the condition have been eliminated. If line removal deletes part of the useful information of a character, the damaged character must also be repaired to keep it complete. To repair it, the distribution of characters near the removed line is scanned, the positions of the characters needing repair are detected, and the damaged part is repaired according to the average length of the line segments above and below it.
Stripe removal scans the image and removes any line within a row whose length is not greater than a given width.
Character smoothing applies two rules to the points of a character, point removal and point filling. The bitmap image is judged point by point, and the rules are summarized on a 3 × 3 matrix (see Fig. 2) whose center is the point being judged. Suppose the coordinates of this point are (I, J). If (I, J) is a white point and most of its 8 surrounding points are black, for example (I-1, J-1), (I, J-1), (I+1, J-1), (I-1, J), and (I+1, J) are black, the point should be filled in as black. Conversely, if (I, J) is a black point and most of its 8 surrounding points are white, the point is considered noise and should be removed.
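The two smoothing rules can be sketched on a binary image stored as a list of rows (1 = black, 0 = white). The source only says that "most" of the 8 neighbours must agree and gives an example with 5 black neighbours, so the default `majority=5` is an assumption, as is leaving the outermost rows and columns unchanged.

```python
def smooth(img, majority=5):
    """Fill white points surrounded mostly by black; remove isolated black points."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            # Count black points among the 8 neighbours (exclude the centre).
            black = sum(
                img[i + di][j + dj] for di in (-1, 0, 1) for dj in (-1, 0, 1)
            ) - img[i][j]
            if img[i][j] == 0 and black >= majority:
                out[i][j] = 1        # point filling
            elif img[i][j] == 1 and 8 - black >= majority:
                out[i][j] = 0        # point removal (noise)
    return out
```

Reading from `img` and writing to `out` keeps each decision based on the original neighbourhood rather than on points already changed in the same pass.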
Thinning filter processing uses erosion to eliminate the boundary points of objects in the image. If the structuring element is a 3 × 3 block of black points, erosion shrinks the boundary of an object by one pixel along its perimeter. If two objects are joined by a thin connection, erosion can separate them when the structuring element is large enough. Erosion is expressed as $X \ominus S = \bigcap \{ X[s] \mid -s \in S \}$, where X is the target image and S is the structuring element. In practice, the pixel value of a point in the original image (the current point) is read and the 3 × 3 matrix centered on it is taken; if the current point is black and its 8 surrounding points are not all black, the point is set to white, that is, eroded.
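The pointwise erosion rule can be sketched as follows on a binary image stored as a list of rows (1 = black, 0 = white). Treating positions outside the image as white is an assumption; the source does not say how image borders are handled.

```python
def erode(img):
    """Set a black point to white unless all 8 neighbours are black."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            if img[i][j] != 1:
                continue
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    inside = 0 <= ni < h and 0 <= nj < w
                    # Out-of-image neighbours count as white (assumption).
                    if not inside or img[ni][nj] != 1:
                        out[i][j] = 0
    return out
```

Applied to a 3 × 3 black block, this leaves only the centre point black, which matches the one-pixel shrink described in the text.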
Thickening filter processing is used to strengthen the brightness of the image and improve its contrast. It is expressed as $X \oplus S = \bigcup \{ X[s] \mid s \in S \}$, where X is the target image and S is the structuring element. The pixel value of a point in the original image (the current point) is read and the 3 × 3 matrix centered on it is taken; if the current point is white and its 8 surrounding points are not all white, the point is set to black, that is, dilated. In practice a light thickening can also be used, in which the shape of the structuring element S is restricted. For example, for dilation toward the upper left, the pixel value of the current point is read and the 3 × 3 matrix centered on it is taken; if the current point is white and its left, upper-left, and upper neighbors are not all white (only these 3 points are checked), the point is set to black, that is, dilated.
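Both the full dilation and the restricted three-neighbour variant can be sketched with one function that takes the neighbour offsets as a parameter. The image encoding (1 = black, 0 = white), the constant names, and ignoring neighbours that fall outside the image are assumptions for illustration.

```python
# Offsets are (row delta, column delta) relative to the current point.
FULL_NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
LEFT_UPPER_NEIGHBOURS = [(0, -1), (-1, -1), (-1, 0)]   # left, upper-left, upper


def dilate(img, offsets=FULL_NEIGHBOURS):
    """Set a white point to black if any checked neighbour is black."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            if img[i][j] != 0:
                continue
            for di, dj in offsets:
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and img[ni][nj] == 1:
                    out[i][j] = 1
                    break
    return out
```

Passing `LEFT_UPPER_NEIGHBOURS` gives the lighter variant from the text, in which only the left, upper-left, and upper neighbours of each white point are examined.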
The rules for cleaning and smoothing and for broken-line repair are also summarized on a 3 × 3 matrix whose center is the point being judged. Smoothing concerns point removal, that is, turning a black point white; broken-line repair concerns point filling, that is, turning a white point black. The broken-line repair rule first counts the black points among the 8 points around the center of the matrix, then examines the position of each black point, and decides from their number and positions whether to turn the center point black. Smoothing first counts the white points among the 8 surrounding points, then considers the positional relationship of the white points, and decides from their number and positions whether to turn the center point white.
The process of finding feature lines in the layout image to be identified, carried out in step 3, is a key step of the invention. For most bill layouts that contain lines, the lines found in this step serve as features that suffice to identify the layout. The process comprises sub-processes for finding horizontal lines and vertical lines. Horizontal and vertical lines differ only in the angle of the layout, and rotating the layout converts one into the other, so both can be found with the same method. Horizontal lines are found by scanning the image. The binary image is scanned row by row, for example from top to bottom, and each scanned image point is either a black point or a white point. When a black point b_dot is found, there are four possibilities, each with its own handling:
1. Point b_dot may be the left endpoint of a line segment; in this case the line-finding operation begins;
2. Point b_dot may be a point inside a line segment; in this case the point is added to the line-length variable and scanning continues with the next point;
3. If point b_dot is at the end of a row and is not a point on a line, scanning continues with the next row;
4. If point b_dot is at the end of a row and is a point on a line, line-end processing is performed.
When a white point w_dot is found, there are likewise three possibilities, each with its own handling:
1. No line was found before point w_dot; in this case nothing is done and scanning continues with the following points;
2. Point w_dot may be a white point in the broken part of a line segment; in this case broken-line processing is performed;
3. Point w_dot may be the end point of a line segment; in this case line-end processing is performed.
Based on these possibilities, horizontal lines are found using the flow shown in Fig. 3. Before the line-finding operation begins, a line-set variable and a current-line variable are normally set up so that the lines found can be saved. First, in step 11, the layout under test is scanned row by row, point by point; when a black point b_dot is scanned, the flow goes to step 17 for black-point processing, and when a white point w_dot is scanned, it goes to step 13 for white-point processing. If a black point b_dot was scanned, step 17 judges whether b_dot is the left endpoint of a line segment (a left endpoint is itself black while the preceding point is white, so this feature can be used to judge whether b_dot is a left endpoint). If so, the line-finding operation begins in step 18, i.e. the point is recorded in the current-line variable, and the flow returns to step 11 to scan subsequent points. Otherwise, step 19 judges whether b_dot is a point inside a line segment (once the line-finding operation has begun, a black point can be taken to be a point inside the segment); if so, step 20 records the point in the current-line variable and the flow returns to step 11 to scan the next point. Otherwise, step 21 judges whether b_dot is at the end of a row and is not a point on a line; if so, the point is probably noise unrelated to the lines being sought, and the flow returns to step 11 to scan the next row. Otherwise, b_dot is at the end of a row and is a point on a line, so the flow goes to step 22 for line-end processing.
If a white point w_dot was scanned, step 13 judges whether w_dot is a white point in the broken part of a line segment. If so, broken-line processing is needed and is performed in step 15; in this example, broken-line processing treats w_dot as a black point b_dot and then goes to step 12 for black-point processing. Otherwise, step 14 judges whether a line had been found before w_dot, i.e. whether the point before it is the tail of a line; if not, the point is an ordinary white point unrelated to any line, and the flow returns to step 11 to continue scanning the following points. Otherwise, step 16 judges whether w_dot is the end point of a line segment, i.e. whether the point before it is a point on a line; if so, the flow goes to step 22 for line-end processing.
Step 22 processes the line found, and step 23 then judges whether scanning of the layout under test has ended. If it has not ended, the flow returns to step 11 and scanning continues; otherwise scanning ends and the subsequent processing of the layout to be identified is carried out.
In this example, whether point w_dot is a white point in the broken part of a line segment is judged as follows: within the range allowed by the break length, taking the current row as the baseline and offsetting up and down by a specified number of rows, search for a black point starting from the abscissa of this white point. If no black point is found within this range, the white point is judged to be the end of a line segment; otherwise it is judged to be part of a break within a line.
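A minimal Python sketch of this break test follows. The function name `is_break`, the parameters `max_gap` (allowed break length, in points) and `max_offset` (number of rows searched above and below the current row), and the image convention (list of rows, 1 = black, 0 = white) are assumptions for illustration; the text gives no concrete values.

```python
def is_break(img, row, col, max_gap, max_offset):
    """Return True if the white point (row, col) lies in a break within a line.

    Starting at the abscissa of the white point, look at up to max_gap further
    columns to the right, in the current row and in rows offset by at most
    max_offset above and below it. Finding any black point means the line
    continues; finding none means the line has ended.
    """
    h, w = len(img), len(img[0])
    last_col = min(col + max_gap, w - 1)
    top = max(row - max_offset, 0)
    bottom = min(row + max_offset, h - 1)
    for c in range(col, last_col + 1):
        for r in range(top, bottom + 1):
            if img[r][c] == 1:
                return True
    return False
```

Searching neighboring rows as well as the current row lets a slightly skewed or jagged line be followed across a break.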
The line processing of step 22 judges whether a line segment qualifies. If the line found qualifies, it is saved in the line-set variable together with further feature information such as its length and coordinates; otherwise the unqualified line is discarded. In this example, qualification is judged by comparison with empirical values: for example, an empirical length for short horizontal lines, such as 9 bits, and an empirical length for long horizontal lines, such as 120 bits, are preset. In the later search for vertical lines, lines are likewise compared with a preset empirical length for vertical lines, such as 70 bits, to judge whether a vertical line qualifies. The main purpose of step 22 is to remove noise lines that come from letters or Chinese characters in the layout image, which simplifies subsequent identification.
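The scan and the qualification test can be sketched together in Python. This is a simplified illustration rather than the flow of Fig. 3: breaks are tolerated only within the current row, a single minimum length `min_len` stands in for the empirical values, and the name `find_horizontal_lines`, its parameters and the image convention (list of rows, 1 = black, 0 = white) are assumptions.

```python
def find_horizontal_lines(img, min_len, max_gap=0):
    """Return qualified horizontal segments as (y, x_left, x_right) tuples.

    A segment may contain runs of at most max_gap white points; segments
    shorter than min_len points are discarded as noise.
    """
    lines = []
    for y, row in enumerate(img):
        start = None       # left endpoint of the current line, if one is open
        last_black = None  # most recent black point of the current line
        for x, value in enumerate(row):
            if value == 1:
                if start is None:
                    start = x          # black after white: left endpoint
                last_black = x
            elif start is not None and x - last_black > max_gap:
                # The break is too long, so the line ended at last_black.
                if last_black - start + 1 >= min_len:
                    lines.append((y, start, last_black))
                start = None
        # End of row: close a line that is still open.
        if start is not None and last_black - start + 1 >= min_len:
            lines.append((y, start, last_black))
    return lines
```

With the example values from the text, a call such as `find_horizontal_lines(img, min_len=9)` would discard the short strokes of characters while keeping ruled lines.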
Once the horizontal-line search is finished, the vertical-line search begins. The layout under test is first rotated by 90 degrees so that vertical lines become horizontal lines; horizontal lines are then found with the method described above; finally, the coordinates of the horizontal lines in the line set produced by this search are converted into vertical-line coordinates, so that every horizontal line in the set becomes a vertical line. See Fig. 4. Suppose the image has width w and height h, and the coordinates of its four corners before and after rotation are as shown in the figure; if the ordinate of a horizontal line found after rotation is a, then the abscissa of the corresponding vertical line in the original bitmap is also a.
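The rotation and the coordinate conversion can be sketched as follows. Fig. 4 is not reproduced here, so the direction of rotation is an assumption: a clockwise rotation is used because it maps column a of the original image to row a of the rotated image, which agrees with the statement above. The names `rotate_cw` and `to_vertical` are hypothetical.

```python
def rotate_cw(img):
    """Rotate a list-of-rows image by 90 degrees clockwise.

    After rotation, rotated[i][j] == img[h - 1 - j][i], where h is the
    height of the original image.
    """
    return [list(col) for col in zip(*img[::-1])]


def to_vertical(segment, h):
    """Convert a horizontal segment found in the rotated image.

    segment is (a, x_left, x_right) in the rotated image; the result is
    (x, y_top, y_bottom) in the original image of height h. The ordinate a
    becomes the abscissa unchanged, and the column range is mirrored.
    """
    a, x_left, x_right = segment
    return (a, h - 1 - x_right, h - 1 - x_left)
```

For a 4-row image with a vertical line in column 1 covering rows 0 to 2, the rotated image contains a horizontal line in row 1 covering columns 1 to 3, and `to_vertical((1, 1, 3), 4)` gives `(1, 0, 2)`.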
In fact, if the bill image being identified contains oblique lines and these are needed for layout identification, it suffices to rotate the layout by a suitable angle to search for the oblique lines, and layout identification by oblique lines can then be completed.
Step 4 judges whether enough lines for layout identification have been found on the layout to be identified, which indicates whether the layout can be identified correctly using the found lines as features. In this example, enough lines means at least 2 horizontal lines and at least 2 vertical lines.
Before the layout matching operation of step 6, the match point of the layout to be identified must also be determined. The match point can be determined by the system when the bill image is scanned, or from the horizontal and vertical lines found; whichever way is used should be consistent with how the match point of the standard template was determined. This example uses the latter way: first determine the intersections of all horizontal and vertical lines, then select, among these intersections, the one with the smallest difference between abscissa and ordinate as the match point. When the lines found are then used as feature lines and matched against the pre-stored standard lines of each standard layout, the standard data of the pre-stored standard layout is read first. Taking the match point as the reference, the horizontal-line matching rate between the layout under test and the standard layout is calculated, as is the vertical-line matching rate between them. Finally, the matching confidence for this standard layout is determined from the horizontal-line and vertical-line matching rates obtained. Matching confidences for every standard layout are obtained in this way, and the confidences determine which layout the layout to be identified is.
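One possible reading of the match-point rule is sketched below in Python. It interprets "smallest difference between abscissa and ordinate" literally as the smallest |x - y|; that interpretation, the segment tuples, and the names `intersections` and `match_point` are assumptions, and ties are resolved simply by taking the first intersection found.

```python
def intersections(h_lines, v_lines):
    """Return the (x, y) points where a horizontal and a vertical segment cross.

    h_lines holds (y, x_left, x_right) tuples; v_lines holds
    (x, y_top, y_bottom) tuples, with y increasing downwards.
    """
    points = []
    for y, x_left, x_right in h_lines:
        for x, y_top, y_bottom in v_lines:
            if x_left <= x <= x_right and y_top <= y <= y_bottom:
                points.append((x, y))
    return points


def match_point(h_lines, v_lines):
    """Pick the intersection with the smallest |x - y|, or None if none exist."""
    points = intersections(h_lines, v_lines)
    if not points:
        return None
    return min(points, key=lambda p: abs(p[0] - p[1]))
```

Whatever rule is used, it has to be applied identically to the standard template, since the distances used in matching are all measured from this point.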
In this example, the horizontal-line matching rate between the layout under test and the standard layout is determined as follows: first calculate the accumulated horizontal-line matching rate of the two layouts, then derive the horizontal-line matching rate from it. Specifically:
Horizontal-line matching rate = accumulated horizontal-line matching rate × 2 / (number of horizontal lines in the layout under test + number of horizontal lines in the standard layout).
The key is first to obtain the accumulated horizontal-line matching rate. This example uses the following method; see Fig. 6, where LineT is a horizontal line of the layout under test, OrgT is the match point of the layout under test, LineS is a horizontal line of the standard template, and OrgS is the match point of the standard template.
The calculation of the accumulated horizontal-line matching rate is shown in Fig. 5. The line position information and the initial match-point positions of the layout under test and of the standard template must first be input, and a horizontal-line matching rate variable bMatchH is set up. According to Fig. 5, step 31 initializes the accumulated horizontal-line matching rate variable bMatchH to 0. Step 32 then selects a not-yet-selected horizontal line LineT from the set of horizontal lines of the layout under test. This selection may fail, i.e. there may be no selectable horizontal line LineT left, so step 33 judges whether the selection succeeded. If it did not, no horizontal line remains to be tested and the operation ends. Otherwise, step 34 calculates the length LT of horizontal line LineT and its vertical distance DVT and horizontal distance DHT from match point OrgT, which serve as the matching parameters of the line under test. Step 35 then selects a not-yet-selected standard horizontal line LineS from the set of horizontal lines of the standard layout, in preparation for horizontal-line matching. The standard horizontal lines may also all have been selected already, so step 36 judges whether the selection succeeded. If it did not, no selectable standard horizontal line LineS remains, i.e. the standard horizontal lines are exhausted, and the next horizontal line under test must be selected to continue matching against the standard lines of the standard layout; the flow therefore goes to step 32. Otherwise, step 37 calculates the length LS of the standard horizontal line LineS and its vertical distance DVS and horizontal distance DHS from match point OrgS, in preparation for matching with the line under test. The matching operation itself starts at step 38, which calculates the absolute value a of the difference between DVT and DVS, the absolute value b of the difference between DHT and DHS, and the absolute value c of the difference between LT and LS. Step 39 then judges whether a is greater than its set value V, or b is greater than its set value H, or c is greater than its set value L. If any of these differences exceeds its threshold, the match fails, i.e. the line under test and the standard line are not similar, and the flow goes to step 35 to select the next standard horizontal line and continue matching. If none of a, b and c exceeds its threshold, the lines match, and step 40 calculates the matching rate matchL by the following formula:
matchL = ((a/V)^2 + (b/L)^2 + (c/L)^2) / 3;
Step 41 adds matchL to the variable bMatchH, and the flow then goes to step 35 to select a not-yet-selected standard horizontal line and continue the matching operation.
To make the subsequent matching more effective, before going to step 35, step 41 may also judge whether matchL is the maximum of all matching rates calculated so far; if so, the midpoint of horizontal line LineT is assigned to OrgT and the midpoint of horizontal line LineS is assigned to OrgS, and only then does the flow go to step 35.
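Steps 31 to 41 can be sketched in Python as a pair of nested loops. Several points are assumptions: the standard lines are iterated afresh for each line under test, the optional re-anchoring of OrgT and OrgS is omitted, segments are (y, x_left, x_right) tuples, and the names `line_features` and `accumulated_match` are hypothetical. The formula is used exactly as printed, including the division of b by L even though b is compared against H.

```python
def line_features(segment, org):
    """Return (length, vertical distance, horizontal distance) of a segment.

    segment is (y, x_left, x_right); org is the match point (x0, y0).
    Distances are measured from the left endpoint of the segment.
    """
    y, x_left, x_right = segment
    x0, y0 = org
    return (x_right - x_left, abs(y - y0), abs(x_left - x0))


def accumulated_match(test_lines, org_t, std_lines, org_s, V, H, L):
    """Accumulate matchL over all pairs of lines that pass the thresholds."""
    b_match_h = 0.0                                        # step 31
    for line_t in test_lines:                              # steps 32-33
        lt, dvt, dht = line_features(line_t, org_t)        # step 34
        for line_s in std_lines:                           # steps 35-36
            ls, dvs, dhs = line_features(line_s, org_s)    # step 37
            a, b, c = abs(dvt - dvs), abs(dht - dhs), abs(lt - ls)  # step 38
            if a > V or b > H or c > L:                    # step 39
                continue  # not similar: try the next standard line
            # Step 40, formula as printed in the text (b is divided by L).
            match_l = ((a / V) ** 2 + (b / L) ** 2 + (c / L) ** 2) / 3
            b_match_h += match_l                           # step 41
    return b_match_h
```

As printed, matchL is 0 for a pair of identical lines and grows as the lines deviate from each other within the thresholds.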
Based on the accumulated horizontal-line matching rate described above, the vertical-line matching rate between the layout under test and the standard layout can be determined as follows: first rotate the layout under test by 90 degrees so that vertical lines become horizontal lines, then calculate the accumulated horizontal-line matching rate of the layout under test and the standard layout and take it as the accumulated vertical-line matching rate. The vertical-line matching rate is then:
Vertical-line matching rate = accumulated vertical-line matching rate × 2 / (number of vertical lines in the layout under test + number of vertical lines in the standard layout).
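The horizontal-line and vertical-line matching rates are normalized in the same way, so one helper can serve both. A short Python sketch follows; the name `matching_rate` and the choice to return 0.0 when neither layout has any lines are assumptions, since the text does not cover that case.

```python
def matching_rate(accumulated, n_test, n_standard):
    """Normalize an accumulated matching rate by the two line counts.

    Implements accumulated * 2 / (n_test + n_standard). Returning 0.0 when
    both counts are zero is an assumption made to avoid division by zero.
    """
    total = n_test + n_standard
    if total == 0:
        return 0.0
    return accumulated * 2 / total
```

For instance, an accumulated rate of 3.0 with 4 lines in the layout under test and 2 lines in the standard layout gives 3.0 × 2 / 6 = 1.0.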
The process of finding feature text in step 5 of Fig. 1 is as follows. First find the feature match point of the layout; this point may be the coordinate origin determined by the system, or a point determined with reference to the standard match point of the standard layout. Taking this point as the reference, read the pre-stored standard layout information, extract the specified image block from the preprocessed image according to that information, and then search for the feature text in that block. In the actual search, because the strokes of text necessarily consist of a series of adjacent black points, adjacent black points can be grouped together, and each group of adjacent black points can be enclosed in a rectangle to form a connected block; see Fig. 7. Based on such connected blocks, the following text-information extraction process can be used:
1. Determine the approximate range of the required text;
2. Extract the image of the specified range;
3. Remove lines and shading;
4. Find all connected blocks;
5. Conditionally merge connected blocks that are arranged close together;
6. Collect the connected blocks of the specified height and length into a set A;
7. Search for the text information in set A.
It should be noted that some text intersects lines and shading, and the operation above includes a step that removes lines and shading; removing them can therefore erase part of the text. To improve recognition accuracy, the text must also be repaired; the repair method follows the related content described for step 1 above.
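Steps 4 and 6 of the list above (finding connected blocks and keeping those of a specified size) can be sketched in Python. The text does not say which points count as adjacent, so 8-connectivity is an assumption here, as are the names `connected_blocks` and `filter_blocks`, the size parameters, and the image convention (list of rows, 1 = black, 0 = white). The conditional merge of step 5 is not shown.

```python
from collections import deque


def connected_blocks(img):
    """Return bounding rectangles (x_min, y_min, x_max, y_max) of groups of
    8-connected black points, in row-major order of their first point."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    blocks = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over the group containing (x, y).
            seen[y][x] = True
            queue = deque([(x, y)])
            x_min = x_max = x
            y_min = y_max = y
            while queue:
                cx, cy = queue.popleft()
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            blocks.append((x_min, y_min, x_max, y_max))
    return blocks


def filter_blocks(blocks, min_w, max_w, min_h, max_h):
    """Keep the blocks whose width and height lie in the given ranges (set A)."""
    return [b for b in blocks
            if min_w <= b[2] - b[0] + 1 <= max_w
            and min_h <= b[3] - b[1] + 1 <= max_h]
```

Filtering by size is what separates character-sized blocks from leftover fragments of lines or shading before the text search of step 7.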
In the embodiment described above, after skew correction of the layout, only the horizontal-line and vertical-line information of the layout is used. When line finding on the layout ends, each horizontal segment is recorded by its ordinate and by the abscissas of its left and right endpoints; the difference between these two abscissas is the length of the horizontal segment. Each vertical segment is recorded by its abscissa and by the ordinates of its upper and lower endpoints; the difference between these two ordinates is the length of the vertical segment.
When the vertical and horizontal distances from a horizontal segment to the match point are calculated, they are the vertical and horizontal distances from the left endpoint of the horizontal segment to the match point; when the vertical and horizontal distances from a vertical segment to the match point are calculated, they are the vertical and horizontal distances from the lower endpoint of the vertical segment to the match point.
For example:
1) The ordinate of a horizontal segment is y, the abscissa of its left endpoint is xl, the abscissa of its right endpoint is xr, and the match point is (x0, y0). The length of the horizontal segment is then xr-xl, the vertical distance from the line to the match point is |y-y0|, and the horizontal distance from the line to the match point is |xl-x0|.
2) The abscissa of a vertical segment is x, the ordinate of its upper endpoint is yt, the ordinate of its lower endpoint is yb, and the match point is (x0, y0). The length of the vertical segment is then |yt-yb|, the vertical distance from the line to the match point is |x-x0|, and the horizontal distance from the line to the match point is |yb-y0|.
It should be noted that the embodiments of the invention use horizontal and vertical lines as the basis for layout comparison. In practice, oblique-line features can also be used for layout matching with the method of the invention; the only difference is the angle by which the layout is rotated during processing. The invention is therefore flexible in practical use.

Claims (16)

1. A layout identification method for bill image processing, characterized by comprising:
inputting an image of the layout to be identified and performing layout preprocessing on the image;
searching for feature lines in the preprocessed image;
judging whether the feature lines found exceed a specified threshold; if so, matching the feature lines against the pre-stored standard lines of all standard layouts; otherwise, searching for feature text in the preprocessed image and matching the feature text against the pre-stored standard text of all standard layouts;
determining the layout to be identified according to the matching confidence.
2. The layout identification method according to claim 1, characterized in that searching for feature lines comprises searching for horizontal lines by the following steps:
11) scan the layout under test row by row, point by point; when a black point b_dot is scanned, go to step 12); when a white point w_dot is scanned, go to step 13);
12) judge whether point b_dot is the left endpoint of a line segment; if so, begin the line-finding operation and go to step 11); otherwise,
judge whether point b_dot is a point inside a line segment; if so, add the point to the line length and go to step 11) to scan the next point; otherwise,
judge whether point b_dot is at the end of a row and is not a point on a line; if so, go to step 11) to scan the next row; otherwise, go to step 15) for line-end processing;
13) judge whether a line was found before point w_dot; if not, go to step 11) to continue scanning the following points; otherwise,
judge whether point w_dot is a white point in the broken part of a line segment; if so, go to step 14) for broken-line processing; otherwise,
judge whether point w_dot is the end point of a line segment; if so, go to step 15) for line-end processing; otherwise go to step 16);
14) treat point w_dot as a black point b_dot and go to step 12);
15) save the line found;
16) judge whether scanning of the layout under test has ended; if not, go to step 11) to continue scanning; otherwise end scanning.
3. The layout identification method according to claim 2, characterized in that in step 13) whether point w_dot is a white point in the broken part of a line segment is judged by the following step:
within the range allowed by the break length, taking the current row as the baseline and offsetting up and down by a specified number of rows, search for a black point starting from the abscissa of the white point; if no black point is found within this range, judge that the white point is the end of a line segment; otherwise judge that the white point is part of a break within a line.
4. The layout identification method according to claim 3, characterized in that, before step 16) judges whether scanning of the layout under test has ended, the method further comprises a step of judging whether the line segment qualifies.
5. The layout identification method according to claim 4, characterized in that searching for feature lines comprises searching for vertical lines by the following steps:
21) rotate the layout under test by 90 degrees so that vertical lines become horizontal lines;
22) search for horizontal lines;
23) convert the coordinates of the horizontal lines into vertical-line coordinates.
6. The layout identification method according to claim 5, characterized in that, before the step of determining the layout to be identified according to the matching result, the method further comprises: determining the match point of the layout to be identified.
7. The layout identification method according to claim 6, characterized in that the match point of the layout to be identified is determined by the following steps:
determine the intersections of all horizontal lines and vertical lines;
among these intersections, select the one with the smallest difference between abscissa and ordinate as the match point.
8. The layout identification method according to claim 7, characterized in that the feature lines are matched against the pre-stored standard lines of each standard layout by the following steps:
31) read the pre-stored standard layout;
32) taking the match point as the reference, calculate the horizontal-line matching rate between the layout under test and the standard layout, and calculate the vertical-line matching rate between the layout under test and the standard layout;
33) determine the matching confidence for this standard layout from the horizontal-line matching rate and the vertical-line matching rate.
9. The layout identification method according to claim 8, characterized in that the horizontal-line matching rate between the layout under test and the standard layout is determined by the following steps:
41) calculate the accumulated horizontal-line matching rate of the layout under test and the standard layout;
42) horizontal-line matching rate = accumulated horizontal-line matching rate × 2 / (number of horizontal lines in the layout under test + number of horizontal lines in the standard layout).
10. The layout identification method according to claim 9, characterized in that the accumulated horizontal-line matching rate is determined by the following steps:
51) initialize the accumulated horizontal-line matching rate bMatchH=0;
52) select a not-yet-selected horizontal line LineT from the set of horizontal lines of the layout under test; when no selectable horizontal line LineT exists, end the operation; otherwise go to step 53);
53) calculate the length LT of horizontal line LineT and its vertical distance DVT and horizontal distance DHT from match point OrgT;
54) select a not-yet-selected horizontal line LineS from the set of horizontal lines of the standard layout; when no selectable horizontal line LineS exists, go to step 52); otherwise go to step 55);
55) calculate the length LS of horizontal line LineS and its vertical distance DVS and horizontal distance DHS from match point OrgS;
56) judge whether the absolute value a of the difference between DVT and DVS is greater than the corresponding set value V, or the absolute value b of the difference between DHT and DHS is greater than the corresponding set value H, or the absolute value c of the difference between LT and LS is greater than the corresponding set value L; if so, go to step 54); otherwise, calculate the matching rate matchL by the following formula:
matchL = ((a/V)^2 + (b/L)^2 + (c/L)^2) / 3;
57) add matchL to bMatchH, then go to step 54).
11. The layout identification method according to claim 10, characterized in that, before step 57) goes to step 54), the method further comprises:
judging whether matchL is the maximum of all matching rates calculated so far; if so, assigning the midpoint of horizontal line LineT to OrgT and the midpoint of horizontal line LineS to OrgS.
12. The layout identification method according to claim 11, characterized in that the vertical-line matching rate between the layout under test and the standard layout is determined by the following steps:
61) rotate the layout under test by 90 degrees so that vertical lines become horizontal lines;
62) calculate the accumulated horizontal-line matching rate of the layout under test and the standard layout, and take it as the accumulated vertical-line matching rate;
63) vertical-line matching rate = accumulated vertical-line matching rate × 2 / (number of vertical lines in the layout under test + number of vertical lines in the standard layout).
13. A layout identification method for bill image processing, characterized by comprising:
inputting an image of the layout to be identified and performing layout preprocessing on the image;
searching for feature text in the preprocessed image;
judging whether feature text has been found; if so, matching the feature text against the pre-stored standard text of all standard layouts; otherwise, searching for feature lines in the preprocessed image and matching the feature lines against the pre-stored standard lines of all standard layouts;
determining the layout to be identified according to the matching confidence.
14. The layout identification method according to claim 13, characterized in that the step of searching for feature text comprises:
71) find the feature match point of the layout under test;
72) read the pre-stored standard layout information, and extract the specified image block from the preprocessed image according to that information;
73) search for the feature text in the image block.
15. The layout identification method according to claim 13, characterized in that the step of performing layout preprocessing on the image of the layout to be identified comprises:
81) remove the black border of the layout image to be identified;
82) perform skew correction on the layout image to be identified;
83) remove noise from the layout image to be identified.
16. The layout identification method according to claim 15, characterized in that step 81) further comprises judging whether the layout image to be identified is a color or grayscale image, and if so, converting the image into a black-and-white binary image.
CN 03148250 2003-06-27 2003-06-27 Bill image face identification method Expired - Fee Related CN1217292C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 03148250 CN1217292C (en) 2003-06-27 2003-06-27 Bill image face identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 03148250 CN1217292C (en) 2003-06-27 2003-06-27 Bill image face identification method

Publications (2)

Publication Number Publication Date
CN1460961A CN1460961A (en) 2003-12-10
CN1217292C true CN1217292C (en) 2005-08-31

Family

ID=29591420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 03148250 Expired - Fee Related CN1217292C (en) 2003-06-27 2003-06-27 Bill image face identification method

Country Status (1)

Country Link
CN (1) CN1217292C (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102023966B (en) * 2009-09-16 2014-03-26 鸿富锦精密工业(深圳)有限公司 Computer system and method for comparing contracts
CN102750541B (en) * 2011-04-22 2015-07-08 北京文通科技有限公司 Document image classifying distinguishing method and device
CN103136544A (en) * 2011-11-30 2013-06-05 夏普株式会社 Image judging device
CN103034848B (en) * 2012-12-19 2016-07-06 方正国际软件有限公司 A kind of recognition methods of form types
CN103544475A (en) * 2013-09-23 2014-01-29 方正国际软件有限公司 Method and system for recognizing layout types
CN108460418B (en) * 2018-03-07 2021-09-28 南京邮电大学 Invoice classification method based on character recognition and semantic analysis
CN108717544B (en) * 2018-05-21 2022-11-25 天津科技大学 Newspaper sample manuscript text automatic detection method based on intelligent image analysis
CN109214385B (en) * 2018-08-15 2021-06-08 腾讯科技(深圳)有限公司 Data acquisition method, data acquisition device and storage medium
CN110533036B (en) * 2019-08-28 2022-06-07 长城信息股份有限公司 Rapid inclination correction method and system for bill scanned image

Also Published As

Publication number Publication date
CN1460961A (en) 2003-12-10

Similar Documents

Publication Publication Date Title
CN108596066B (en) Character recognition method based on convolutional neural network
CN1111818C (en) The Apparatus and method for of 2 d code identification
CN1310187C (en) Apparatus and method for recognizing code
CN1991865A (en) Device, method, program and media for extracting text from document image having complex background
CN1573811A (en) Map generation device, map delivery method, and map generation program
CN1157060C (en) Image interpolation system and image interpolocation method
CN1240021C (en) Bill image processing equipment
CN1258894A (en) Apparatus and method for identifying character
CN1221910C (en) Method and apparatus for character font generation within limitation of character output media and computer readable storage medium storing character font generation program
CN1741039A (en) Face organ's location detecting apparatus, method and program
CN108805126B (en) Method for removing long interference lines of text image
JP2641380B2 (en) Bending point extraction method for optical character recognition system
CN1217292C (en) Bill image face identification method
CN101046888A (en) Rendering apparatus and method, and shape data generation apparatus and method
CN1234565A (en) Identifying method and system for handwritten characters
CN1492377A (en) Form processing system and method
CN1924899A (en) Precise location method of QR code image symbol region at complex background
CN1477852A (en) Image processing equipment, image processing method and storage medium of image processing program
CN1519768A (en) Method and device for correcting image skew
CN109948621B (en) Image processing and character segmentation method based on picture verification code
CN110210440A (en) Form image layout analysis method and system
CN1653810A (en) Image angle detection device and scan line interpolation device having the same
CN1643540A (en) Comparing patterns
CN1173682A (en) Online character recognition system for recognizing input characters using standard strokes
CN1161688C (en) Character processing apparatus and method therefor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Xinyada technology building, 3888 Jiangnan Avenue, Binjiang District, Hangzhou City, Zhejiang Province 310051

Patentee after: Sinyada Technology Co.,Ltd.

Address before: Xinyada technology building, 3888 Jiangnan Avenue, Binjiang District, Hangzhou City, Zhejiang Province 310051

Patentee before: SUNYARD SYSTEM ENGINEERING Co.,Ltd.

CP03 Change of name, title or address

Address after: Xinyada technology building, 3888 Jiangnan Avenue, Binjiang District, Hangzhou City, Zhejiang Province 310051

Patentee after: SUNYARD SYSTEM ENGINEERING Co.,Ltd.

Address before: 310053 Xinyada Science and Technology Building, Hi-Tech Software Park (Phase 2), Hangzhou Hi-Tech Industrial Development Zone (Binjiang), Zhejiang Province

Patentee before: HANGZHOU SUNYARD SYSTEM ENGINEERING Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20050831