Summary of the Invention
In view of the above, the object of the invention is to provide a layout recognition method for bill image processing that is simple and easy to implement.
To achieve this object, the layout recognition method for bill image processing provided by the invention comprises:
Inputting an image of the layout to be recognized and preprocessing the image;
Searching the preprocessed image for feature lines;
Judging whether the number of feature lines found exceeds a specified threshold; if so, matching the feature lines against the standard lines of all pre-stored standard layouts; otherwise, searching the preprocessed image for feature text and matching the feature text against the standard text of all pre-stored standard layouts;
Determining the layout to be recognized according to the matching confidence.
In the above steps, searching for feature lines comprises searching for horizontal lines according to the following steps:
11) Scan the layout under test point by point, row by row; when a black point b_dot is scanned, go to step 12); when a white point w_dot is scanned, go to step 13);
12) Judge whether point b_dot is the left end point of a line segment; if so, begin the line-finding operation and go to step 11); otherwise,
judge whether point b_dot is a point within a line segment; if so, add the point to the line length and go to step 11) to scan the next point; otherwise,
judge whether point b_dot is at the end of a row and is not a point on a line; if so, go to step 11) to scan the next row; otherwise, go to step 15) for end-of-line processing;
13) Judge whether a line was found before point w_dot; if not, go to step 11) to scan the following point; otherwise,
judge whether point w_dot is a white point in a broken part of a line segment; if so, go to step 14) for break processing; otherwise,
judge whether point w_dot is the end point of a line segment; if so, go to step 15) for end-of-line processing; otherwise go to step 16);
14) Treat point w_dot as a black point b_dot and go to step 12);
15) Save the line found;
16) Judge whether scanning of the layout under test is finished; if not, go to step 11) and continue scanning; otherwise end the scan.
In step 13), whether point w_dot is a white point in a broken part of a line segment is judged as follows:
Within the permitted break length, taking the current row as the baseline and offsetting up and down by a specified number of rows, search for a black point starting from the abscissa of the white point; if no black point is found within this range, the white point is judged to be the end of a line segment; otherwise it is judged to be part of a break within a line.
Before step 16) judges whether scanning of the layout under test is finished, the method further comprises a step of judging whether the line segment is qualified.
The operation of searching for feature lines further comprises searching for vertical lines according to the following steps:
21) Rotate the layout under test by 90 degrees so that vertical lines become horizontal lines;
22) Search for horizontal lines;
23) Convert the coordinates of the horizontal lines found into the coordinates of vertical lines.
Before the step of determining the layout to be recognized according to the matching result, the method further comprises determining the match point of the layout to be recognized.
The match point of the layout to be recognized is determined as follows:
Determine the intersection points of all horizontal lines and vertical lines;
Among these intersection points, select the one with the smallest difference between abscissa and ordinate as the match point.
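The match point selection can be sketched as follows. This is only an illustration under stated assumptions: the tuple layouts for lines, the function names `intersections` and `select_match_point`, and the reading of "smallest difference between abscissa and ordinate" as the smallest |x - y| are not taken from the text, and another reading of the selection rule would only change the `key` function.

```python
from typing import List, Optional, Tuple

HLine = Tuple[int, int, int]  # hypothetical layout: (y, x_start, x_end)
VLine = Tuple[int, int, int]  # hypothetical layout: (x, y_start, y_end)
Point = Tuple[int, int]       # (x, y)


def intersections(hlines: List[HLine], vlines: List[VLine]) -> List[Point]:
    """Return every point where a horizontal line crosses a vertical line."""
    points = []
    for y, x1, x2 in hlines:
        for x, y1, y2 in vlines:
            if x1 <= x <= x2 and y1 <= y <= y2:
                points.append((x, y))
    return points


def select_match_point(hlines: List[HLine], vlines: List[VLine]) -> Optional[Point]:
    """Pick the intersection with the smallest |x - y| (assumed reading of the rule)."""
    points = intersections(hlines, vlines)
    if not points:
        return None
    return min(points, key=lambda p: abs(p[0] - p[1]))
```

For two horizontal lines at y = 10 and y = 50 and two vertical lines at x = 12 and x = 80 that all cross, the sketch selects (12, 10).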
Based on the above, the feature lines are matched against the standard lines of each pre-stored standard layout according to the following steps:
31) Read a pre-stored standard layout;
32) Taking the match point as the reference, calculate the horizontal line matching rate between the layout under test and the standard layout, and calculate the vertical line matching rate between the layout under test and the standard layout;
33) Determine the matching confidence for this standard layout from the horizontal line matching rate and the vertical line matching rate.
The horizontal line matching rate between the layout under test and the standard layout is determined as follows:
41) Calculate the cumulative horizontal line matching rate of the layout under test and the standard layout;
42) Horizontal line matching rate = cumulative horizontal line matching rate × 2 / (number of horizontal lines in the layout under test + number of horizontal lines in the standard layout).
The cumulative horizontal line matching rate is determined as follows:
51) Initialize the cumulative horizontal line matching rate bMatchH = 0;
52) From the set of horizontal lines of the layout under test, select a horizontal line LineT that has not yet been selected; if no selectable LineT remains, end the operation; otherwise go to step 53);
53) Calculate the length LT of horizontal line LineT, and its vertical distance DVT and horizontal distance DHT from the match point OrgT;
54) From the set of horizontal lines of the standard layout, select a horizontal line LineS that has not yet been selected; if no selectable LineS remains, go to step 52); otherwise go to step 55);
55) Calculate the length LS of horizontal line LineS, and its vertical distance DVS and horizontal distance DHS from the match point OrgS;
56) Judge whether the absolute value a of the difference between DVT and DVS is greater than the corresponding set value V, or the absolute value b of the difference between DHT and DHS is greater than the corresponding set value H, or the absolute value c of the difference between LT and LS is greater than the corresponding set value L; if any of them is greater, go to step 54); otherwise, calculate the matching rate matchL according to the following formula:
matchL = ((a/V)^2 + (b/L)^2 + (c/L)^2) / 3;
57) Add matchL to bMatchH, then go to step 54).
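Steps 51) to 57) and the formula of step 42) can be sketched as follows. This is a minimal illustration rather than the definitive implementation: the function names `cumulative_match` and `match_rate` and the tuple representation of a line are assumptions, and the matchL formula is transcribed literally as given in step 56), including the division of b by L.

```python
from typing import List, Tuple

# Hypothetical line description: (length, vertical distance to the match point,
# horizontal distance to the match point), i.e. (LT, DVT, DHT) or (LS, DVS, DHS).
LineFeat = Tuple[float, float, float]


def cumulative_match(test_lines: List[LineFeat], std_lines: List[LineFeat],
                     V: float, H: float, L: float) -> float:
    """Accumulate matchL over all line pairs that pass the threshold test."""
    b_match = 0.0                                  # step 51
    for lt, dvt, dht in test_lines:                # steps 52 and 53
        for ls, dvs, dhs in std_lines:             # steps 54 and 55
            a = abs(dvt - dvs)
            b = abs(dht - dhs)
            c = abs(lt - ls)
            if a > V or b > H or c > L:            # step 56: pair rejected
                continue
            # Formula as written in step 56 (b is divided by L there).
            match_l = ((a / V) ** 2 + (b / L) ** 2 + (c / L) ** 2) / 3
            b_match += match_l                     # step 57
    return b_match


def match_rate(cumulative: float, n_test: int, n_std: int) -> float:
    """Step 42: cumulative rate * 2 / (lines in test layout + lines in standard layout)."""
    total = n_test + n_std
    return cumulative * 2 / total if total else 0.0
```

The same two functions would serve for vertical lines once their coordinates have been expressed as horizontal lines, as steps 61) to 63) describe.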
The vertical line matching rate between the layout under test and the standard layout is determined as follows:
61) Rotate the layout under test by 90 degrees so that vertical lines become horizontal lines;
62) Calculate the cumulative horizontal line matching rate of the layout under test and the standard layout, and take this value as the cumulative vertical line matching rate;
63) Vertical line matching rate = cumulative vertical line matching rate × 2 / (number of vertical lines in the layout under test + number of vertical lines in the standard layout).
In addition, the invention also provides a bill image layout recognition method, comprising:
Inputting an image of the layout to be recognized and preprocessing the image;
Searching the preprocessed image for feature text;
Judging whether feature text has been found; if so, matching the feature text against the standard text of all pre-stored standard layouts; otherwise, searching the preprocessed image for feature lines and matching the feature lines against the standard lines of all pre-stored standard layouts;
Determining the layout to be recognized according to the matching confidence.
The step of searching for feature text comprises:
71) Find the feature match point of the layout under test;
72) Read the pre-stored standard layout information and, according to that information, extract a specified image block from the preprocessed image;
73) Search for feature text within the image block.
Because the invention uses the lines or text in the layout to be recognized as recognition features, matches them against the standard lines or standard text of each pre-stored standard layout, and finally determines the layout to be recognized according to the matching confidence, the method is simple, easy to implement, and offers high layout recognition efficiency.
Embodiment
The main task of the method of the invention is to recognize a layout according to the line information or text information it contains, so as to determine which specific layout it is. The method is particularly suitable for recognizing bills and thereby provides a basis for computerized bill processing.
Fig. 1 is the main flow chart of an embodiment of the method of the invention. The flow shown in Fig. 1 captures the main points of the method: the layout image to be recognized, obtained by scanning a bill, is searched for lines or text; the feature lines or feature text found are compared with the standard lines or standard text of the pre-stored standard layouts; and the kind of bill being recognized is determined from the result of the comparison. As Fig. 1 indicates, to implement the method, the standard information used to recognize bill layouts, obtained from standard bills, must first be stored, for example in the system database, to serve as the basis for comparison during recognition. Depending on the nature of the bills being processed, the standard information may use different features capable of representing the layout of a bill image, such as feature lines, feature points and feature text. In this embodiment the standard data comprise the template layout name, the coordinates of all standard horizontal lines and vertical lines in the template, the text content and its coordinates, the match point, and so on. They also include some empirical values, for example the minimum line length, the line length tolerance, and the vertical and horizontal distances of lines from the layout match point together with their tolerances.
Based on the standard information, after the bill layout image to be recognized has been read by an optical image reading device, the layout image is first preprocessed in step 1 to remove the various interference noises in the image. Then, in step 2, the layout initialization information is read from the system database, that is, the information of all stored standard layouts is read for use in comparisons during the subsequent recognition process. Steps 1 and 2 are the initialization steps of this embodiment and provide the basis for layout recognition. The search for feature information of the layout to be recognized can therefore begin in step 3; specifically, this step looks for feature lines in the layout image to be recognized. After the search finishes, step 4 judges whether enough lines for layout recognition have been found on the layout to be recognized. If enough lines have been found, the lines found can by themselves serve as feature lines to recognize the layout image correctly and to determine which kind of bill the image represents. In this case step 6 is performed: layout matching is carried out between the standard lines in the previously read standard layout information and the feature lines found. Step 8 then determines the layout to be recognized according to the matching result and feeds the result back to the system. Because the bill represented by the layout image must be one of the bills whose standard layouts are stored, step 8 can usually identify which kind of bill the layout image belongs to. If the layout image is misrecognized, for example because of scanning problems, this step can also feed back other recognition results, such as that the layout is defective or cannot be recognized.
If step 4 judges that not enough lines have been found in the layout image to be recognized, the bill may be a bill without lines or some other case, and the layout must then be recognized by text. Therefore, when the judgment is that the lines found are not enough, step 5 finds the feature match point of the layout and, guided by the text information in the previously read standard layout information, extracts a specified image block from the preprocessed image and searches for feature text within that block. Text matching, that is, matching of a layout without lines, is then carried out in step 7 according to the feature text found. Finally, step 8 determines the layout to be recognized according to the matching result and feeds the result back to the system.
The embodiment flow of Fig. 1 first performs layout recognition by feature lines and, if the layout has no lines, then performs layout recognition by feature text. This order suits the case where most of the bills to be recognized have line features. In practice the other order may also be used: layout recognition is first attempted by feature text and, if the layout has lines or has no text, recognition is then performed by feature lines. That is, the preprocessed image is first searched for feature text and it is judged whether feature text has been found; if so, the feature text is matched against the standard text of all pre-stored standard layouts; otherwise, the preprocessed image is searched for feature lines and the feature lines are matched against the standard lines of all pre-stored standard layouts. It should be pointed out that practice is not limited to these orders. For example, the nature of the layout to be recognized may be judged first, and it may then be decided whether to perform text-based or line-based recognition first.
In the embodiment flow of Fig. 1, the purpose of the preprocessing performed in step 1 before recognition is to eliminate defects in the layout image that would affect its recognition. It comprises removing the black border of the layout image to be recognized, correcting its skew, and removing its noise. If the bill to be recognized is a color bill, or the scanned image has gray levels or color, then to improve recognition accuracy or efficiency the preprocessing may begin by judging whether the layout image is a color or grayscale image and, if so, converting it into a black-and-white binary image, that is, binarizing it. Binarization converts color or grayscale layout image data that have undergone image enhancement and noise removal into layout image data with only two values, black and white. It may be implemented as follows: first, Gaussian smoothing filtering is applied to remove white-point and black-point noise and to prevent isolated white and black points from appearing after binarization; second, the gray values of the foreground and background of the whole image are determined and the binarization threshold is calculated, where the threshold may be of two kinds, a global static threshold or a local dynamic threshold; finally, the gray values of the whole image are reset according to the threshold, so that pixels whose gray value is greater than the threshold become white points and all others become black points.
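The global static threshold variant can be sketched as follows. The text does not say how the threshold is derived from the foreground and background gray values, so the iterative mean-of-class-means rule used here is an assumption, as are the function names and the convention that 1 denotes a black point and 0 a white point. Gaussian smoothing is omitted for brevity.

```python
from typing import List


def global_threshold(gray: List[List[int]], max_iter: int = 50) -> float:
    """Estimate a global threshold as the midpoint between the mean gray value
    of the darker class (foreground) and the lighter class (background)."""
    pixels = [p for row in gray for p in row]
    t = sum(pixels) / len(pixels)
    for _ in range(max_iter):
        fg = [p for p in pixels if p <= t]   # darker pixels: foreground
        bg = [p for p in pixels if p > t]    # lighter pixels: background
        if not fg or not bg:
            break
        new_t = (sum(fg) / len(fg) + sum(bg) / len(bg)) / 2
        converged = abs(new_t - t) < 0.5
        t = new_t
        if converged:
            break
    return t


def binarize(gray: List[List[int]], threshold: float) -> List[List[int]]:
    """Pixels brighter than the threshold become white (0), all others black (1)."""
    return [[0 if p > threshold else 1 for p in row] for row in gray]
```

A local dynamic threshold would apply the same comparison with a threshold computed per neighbourhood instead of once for the whole image.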
In the preprocessing of the layout image in this example, black border removal is performed on the binary image. Specifically, the layout is divided horizontally into an upper and a lower region, and each region is processed starting from the left and from the right, so that the whole area is handled as four zones: upper left, lower left, upper right and lower right. Each zone is scanned row by row using the same rules. Each scanned row is tested against the black-row rule: if the number of consecutive white points in the row is greater than a predefined white-noise gap width, the row is not a black row; otherwise it is a black row. Based on the scan result for each row and the black border rule, the black border to be removed is determined and removed. Skew correction corrects the tilted image produced by scanning. It may be performed by the projection method, which consists of two steps: first, calculating the skew angle of the image; second, rotating the image.
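The black-row rule can be sketched as follows. The function names and the encoding (1 for a black point, 0 for a white point) are assumptions for illustration; the rule that decides from the black rows which border to cut is not specified in the text and is not modelled here.

```python
from typing import List


def longest_white_run(row: List[int]) -> int:
    """Length of the longest run of consecutive white points (0) in a row."""
    longest = current = 0
    for p in row:
        if p == 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def is_black_row(row: List[int], white_gap: int) -> bool:
    """A row is black unless it contains more than white_gap consecutive white points."""
    return longest_white_run(row) <= white_gap
```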
The skew angle is calculated as follows: first, a coarse search, usually over the range from -30 to +30 degrees with a step of 2 degrees; second, a fine search with a step of 0.1 degree within the 2-degree range obtained by the coarse search. The angle search mainly uses directional projection to count black points: the image is traversed along a given angle, the number of black points in each row in that direction is counted over the whole image, and the variance of the resulting black-point histogram is calculated. The direction with the largest variance is chosen as the skew angle.
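The coarse-to-fine search can be sketched as follows. This is an illustration under assumptions: black points are given as (x, y) pairs with y increasing downward, a positive angle means lines whose y grows with x, the fine search covers 1 degree on either side of the coarse result, and `n_bins` is a fixed histogram size chosen by the caller so that variances are comparable across angles. None of these names or conventions come from the text.

```python
import math
from typing import List, Tuple


def projection_variance(black_points: List[Tuple[int, int]],
                        angle_deg: float, n_bins: int) -> float:
    """Variance of the black-point counts per projected row at a given angle."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    counts = {}
    for x, y in black_points:
        row = round(y * cos_t - x * sin_t)   # row index after undoing the tilt
        counts[row] = counts.get(row, 0) + 1
    mean = len(black_points) / n_bins
    mean_sq = sum(c * c for c in counts.values()) / n_bins
    return mean_sq - mean * mean


def estimate_skew(black_points: List[Tuple[int, int]], n_bins: int) -> float:
    """Coarse search (-30..+30, step 2), then fine search (step 0.1) around the best."""
    coarse = max(range(-30, 31, 2),
                 key=lambda a: projection_variance(black_points, a, n_bins))
    fine_angles = [coarse - 1 + 0.1 * k for k in range(21)]
    best = max(fine_angles,
               key=lambda a: projection_variance(black_points, a, n_bins))
    return round(best, 1)
```

When the projection direction matches the skew, each ruled line falls into very few histogram rows, which is what makes the variance peak.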
Removing the noise of the layout image to be recognized means filtering the layout image to remove background noise and enhance the image. Filtering and denoising here means removing unwanted lines and stripes from the image and performing smoothing, thinning filtering, thickening filtering, character cleaning and smoothing, and broken-line repair. Line removal works on the bitmap image: it scans and judges row by row, searches for the single-pixel-wide lines that exist, merges all single-pixel-wide lines into thick lines with their actual pixel width according to an adjacency rule, and then judges whether each thick line satisfies the removal condition. If it does, the line is an unwanted line for the image recognition process and is removed, until all lines satisfying the condition have been eliminated. If line removal strips part of the valid information from characters, the damaged characters must also be repaired to keep them complete. The repair scans the distribution of the characters near the removed line, detects the specific positions where characters need repair, and then repairs them according to the average length of the lines to the left and right of the damaged part.
Stripe removal scans the image and removes lines within a row whose length is not greater than a given width.
Character smoothing judges the bitmap point by point using two rules for the points within characters, point removal and point filling. The rules are summarized in a 3 × 3 matrix (see Fig. 2) whose center is the point being judged. Suppose the coordinates of this point are (I, J). If (I, J) is a white point and most of its 8 surrounding points are black, for example (I-1, J-1), (I, J-1), (I+1, J-1), (I-1, J) and (I+1, J) are black, the point should be filled in as black. Conversely, if (I, J) is a black point and most of its 8 surrounding points are white, the point is considered a noise point and should be removed.
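The two rules can be sketched as follows. The text only says that "most" of the 8 neighbours must agree; the threshold of 5 used here matches the five-neighbour example above but is otherwise an assumption, as are the function name, the encoding (1 black, 0 white) and the choice to leave border pixels unchanged.

```python
from typing import List


def smooth_characters(img: List[List[int]], majority: int = 5) -> List[List[int]]:
    """Fill white points surrounded mostly by black; remove black points
    surrounded mostly by white. Border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for j in range(1, h - 1):
        for i in range(1, w - 1):
            # Count black points among the 8 neighbours of (i, j).
            black = sum(img[j + dj][i + di]
                        for dj in (-1, 0, 1) for di in (-1, 0, 1)
                        if (di, dj) != (0, 0))
            if img[j][i] == 0 and black >= majority:
                out[j][i] = 1            # point filling
            elif img[j][i] == 1 and 8 - black >= majority:
                out[j][i] = 0            # point removal
    return out
```

The result is written to a copy so that each decision depends only on the original bitmap, not on points changed earlier in the same pass.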
The thinning filter uses erosion to eliminate the boundary points of objects in the image. If the structuring element is a 3 × 3 block of black points, erosion shrinks the boundary of an object by one pixel along its perimeter. If two objects are joined by a thin connection, the erosion operation can separate them when the structuring element is large enough. Erosion is expressed as X ⊖ S = ∩{ X[s] | -s ∈ S }, where X is the target image and S is the structuring element. The procedure first reads the pixel value of a point in the original image (called the current point) and takes the 3 × 3 matrix centered on that point; if the point is black and its 8 surrounding points are not all black, the point is set to white, that is, eroded.
The thickening filter is used to enhance the brightness of the image and improve its contrast. The method is expressed as X ⊕ S = ∪{ X[s] | s ∈ S }, where X is the target image and S is the structuring element. The procedure first reads the pixel value of a point in the original image (called the current point) and takes the 3 × 3 matrix centered on that point; if the point is white and its 8 surrounding points are not all white, the point is set to black, that is, dilated. In practice a light thickening may also be used, in which the shape of the structuring element S is restricted. For example, for dilation toward the upper left, the procedure first reads the pixel value of the current point in the original image and takes the 3 × 3 matrix centered on that point; if the point is white and the left, upper-left and upper points are not all white (only these 3 points are checked), the point is set to black, that is, dilated.
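The thinning (erosion) and thickening (dilation) rules, including the restricted three-neighbour variant, can be sketched as follows. The function names, the encoding (1 black, 0 white) and the handling of the image edge (neighbours outside the image are simply ignored) are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Offsets = Sequence[Tuple[int, int]]  # (dx, dy) pairs

ALL8: Offsets = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dx, dy) != (0, 0)]
UPPER_LEFT: Offsets = [(-1, 0), (-1, -1), (0, -1)]  # left, upper left, upper


def _neighbours(img: List[List[int]], x: int, y: int, offsets: Offsets) -> List[int]:
    """Values of the in-bounds neighbours of (x, y) for the given offsets."""
    h, w = len(img), len(img[0])
    return [img[y + dy][x + dx] for dx, dy in offsets
            if 0 <= x + dx < w and 0 <= y + dy < h]


def erode(img: List[List[int]]) -> List[List[int]]:
    """A black point whose 8 neighbours are not all black becomes white."""
    return [[0 if p == 1 and not all(_neighbours(img, x, y, ALL8)) else p
             for x, p in enumerate(row)] for y, row in enumerate(img)]


def dilate(img: List[List[int]], offsets: Offsets = ALL8) -> List[List[int]]:
    """A white point with at least one black checked neighbour becomes black."""
    return [[1 if p == 0 and any(_neighbours(img, x, y, offsets)) else p
             for x, p in enumerate(row)] for y, row in enumerate(img)]
```

Calling `dilate(img, UPPER_LEFT)` gives the light thickening, since only the left, upper-left and upper neighbours are checked.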
Cleaning smoothing and broken-line repair also reduce their rules to a 3 × 3 matrix whose center is the point being judged. Smoothing concerns point removal, that is, turning black points white; broken-line repair concerns point filling, that is, turning white points black. The broken-line repair rule first counts the black points among the 8 points surrounding the center of the matrix, then examines the position of each black point, and decides from the number and positions of these black points whether to turn the center point black. Smoothing first considers the number of white points among the 8 surrounding points, then the positional relationship of the white points, and decides from the number and positions of these white points whether to turn the center point white.
The process of finding feature lines in the layout image, performed in step 3, is a key step of the invention. For the majority of bills, whose layouts have lines, layout recognition can be completed using the lines found in this step as features. The process comprises sub-processes for finding horizontal lines and vertical lines. Because horizontal and vertical lines differ only in their orientation on the layout, and rotating the layout converts one into the other, horizontal and vertical lines can be found by the same method. Horizontal lines are found by scanning the image. The binary image may be scanned row by row from top to bottom or from bottom to top, and a scanned point may be a black point or a white point. When a black point b_dot is found, there are four possibilities, each with its own handling:
1. Point b_dot may be the left end point of a line segment; the line-finding operation then begins;
2. Point b_dot may be a point within a line segment; the point is then added to the line length variable and scanning continues with the next point;
3. If point b_dot is at the end of a row and is not a point on a line, scanning proceeds to the next row;
4. If point b_dot is at the end of a row and is a point on a line, end-of-line processing is performed.
When a white point w_dot is found, there are three possibilities with corresponding handling:
1. No line was found before point w_dot; nothing is done and the following point is scanned;
2. Point w_dot may be a white point within a break in a line segment; break processing is then performed;
3. Point w_dot may be the end point of a line segment; end-of-line processing is then performed.
Based on these possibilities, horizontal lines are found using the flow shown in Fig. 3. Before line finding begins, a line set variable and a current line variable are usually set up so that the lines found can be saved. First, in step 11, the layout under test is scanned point by point, row by row; when a black point b_dot is scanned, the flow goes to step 17 for black-point processing, and when a white point w_dot is scanned, the flow goes to step 13 for white-point processing. If a black point b_dot is scanned, step 17 judges whether b_dot is the left end point of a line segment (a left end point is characterized by being a black point whose preceding point is white, so this feature can be used to judge whether b_dot is the left end point of a line segment). If so, line finding begins in step 18, that is, the point is recorded in the current line variable, and the flow returns to step 11 to continue scanning the subsequent points. Otherwise, step 19 judges whether b_dot is a point within a line segment (once line finding is in progress, a black point can be taken to be a point within the line segment); if so, step 20 records the point in the current line variable and the flow returns to step 11 to scan the next point. Otherwise, step 21 judges whether b_dot is at the end of a row and is not a point on a line; if so, the point may be a noise point unrelated to the lines being sought, and the flow returns to step 11 to scan the next row. Otherwise, b_dot is at the end of a row and is a point on a line, so the flow goes to step 22 for end-of-line processing.
If a white point w_dot is scanned, step 13 judges whether w_dot is a white point in a broken part of a line segment. If so, break processing is needed and is performed in step 15; in this example, break processing treats w_dot as a black point b_dot and then goes to step 12 for black-point processing. Otherwise, step 14 judges whether a line was found before w_dot, that is, whether the point before it belongs to a line; if not, the point is an ordinary white point unrelated to lines, and the flow returns to step 11 to scan the following point. Otherwise, step 16 judges whether w_dot is the end point of a line segment, that is, whether the point before it is a point on a line; if so, the flow goes to step 22 for end-of-line processing.
Step 22 processes the lines found, and step 23 then judges whether scanning of the layout under test is finished. If it is not finished, the flow returns to step 11 to continue scanning; otherwise scanning ends and the subsequent processing of the layout to be recognized is carried out.
In this example, whether point w_dot is a white point in a broken part of a line segment is judged as follows: within the permitted break length, taking the current row as the baseline and offsetting up and down by a specified number of rows, search for a black point starting from the abscissa of the white point; if no black point is found within this range, the point is judged to be the end of a line segment; otherwise the white point is judged to be part of a break within a line.
The line processing of step 22 judges whether a line segment is qualified. If the line found is qualified, it is saved in the line set variable together with its other feature information, such as its length and coordinates; otherwise the unqualified line is discarded. In this example, qualification is judged by comparison with empirical values, for example a preset length value for short horizontal lines, such as 9 pixels, and a preset length value for long horizontal lines, such as 120 pixels. In the subsequent vertical line search, lines are likewise compared with a preset vertical line length value, such as 70 pixels, to judge whether a vertical line is qualified. The main effect of step 22 is to remove the noise lines inside letters or Chinese characters in the layout image, which facilitates subsequent recognition.
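The horizontal line search with break bridging and the length check of step 22 can be sketched as follows. This is a simplified illustration, not the flow of Fig. 3 step by step: the function names, the default parameter values for the break length and row offset, the single minimum length, and the encoding (1 black, 0 white) are all assumptions.

```python
from typing import List, Tuple

Line = Tuple[int, int, int]  # (row y, x_start, x_end)


def is_break(img: List[List[int]], x: int, y: int,
             max_gap: int, row_offset: int) -> bool:
    """True if a black point lies within max_gap columns starting at x,
    on the current row or within row_offset rows above or below it."""
    h, w = len(img), len(img[0])
    for yy in range(max(0, y - row_offset), min(h, y + row_offset + 1)):
        for xx in range(x, min(w, x + max_gap)):
            if img[yy][xx] == 1:
                return True
    return False


def find_horizontal_lines(img: List[List[int]], min_length: int = 9,
                          max_gap: int = 3, row_offset: int = 1) -> List[Line]:
    """Scan row by row, bridging short breaks and keeping qualified lines."""
    lines: List[Line] = []
    for y, row in enumerate(img):
        start = end = None
        for x, p in enumerate(row):
            black = p == 1
            # A white point inside a line that passes the break test is
            # treated as a black point.
            if not black and start is not None and is_break(img, x, y, max_gap, row_offset):
                black = True
            if black:
                if start is None:
                    start = x            # left end point: begin line finding
                end = x                  # extend the current line
            elif start is not None:
                if end - start + 1 >= min_length:
                    lines.append((y, start, end))   # qualified line is saved
                start = end = None
        if start is not None and end - start + 1 >= min_length:
            lines.append((y, start, end))           # line reaching the row end
    return lines
```

With a permitted break length of 3, a line interrupted by two white points is reported as one segment; with a break length of 1 it splits and the pieces are qualified separately.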
After the horizontal line search is finished, the vertical line search begins. First the layout under test is rotated by 90 degrees so that vertical lines become horizontal lines; horizontal lines are then searched for by the horizontal line search method described above; finally, the coordinates of the horizontal lines in the line set produced by this search are converted into vertical line coordinates, so that all horizontal lines in the set become vertical lines. See Fig. 4. Suppose the width of the image is w and its height is h; the coordinates of the four corners of the image before and after rotation are as shown in the figure. If the ordinate of a horizontal line found after rotation is a, the abscissa of the corresponding vertical line in the bitmap is also a.
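The rotate, search and convert sequence can be sketched as follows. Fig. 4 is not reproduced here, so the clockwise rotation direction is an assumption chosen because it makes the row index a of a line in the rotated image equal to the column index of the vertical line in the original; the function names and the simple run-based horizontal search are also illustrative.

```python
from typing import List, Tuple


def rotate_cw(img: List[List[int]]) -> List[List[int]]:
    """Rotate 90 degrees clockwise: original (x, y) maps to (h - 1 - y, x)."""
    return [list(col) for col in zip(*img[::-1])]


def horizontal_runs(img: List[List[int]], min_length: int) -> List[Tuple[int, int, int]]:
    """Return (y, x_start, x_end) for each unbroken run of black points."""
    runs = []
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                while x < len(row) and row[x] == 1:
                    x += 1
                if x - start >= min_length:
                    runs.append((y, start, x - 1))
            else:
                x += 1
    return runs


def find_vertical_lines(img: List[List[int]], min_length: int) -> List[Tuple[int, int, int]]:
    """Return (x, y_start, y_end) of vertical lines in the original image."""
    h = len(img)
    rotated = rotate_cw(img)
    # Row a in the rotated image is column a in the original; the x range in
    # the rotated image is mirrored back into a y range of the original.
    return [(a, h - 1 - x2, h - 1 - x1)
            for a, x1, x2 in horizontal_runs(rotated, min_length)]
```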
In fact, if the bill image being recognized contains oblique lines and these are needed for layout recognition, the layout only needs to be rotated by a suitable angle for the oblique lines to be searched for, so that layout recognition can be carried out using oblique lines.
In step 4, it is judged whether enough lines for layout recognition have been found on the layout to be identified, which shows whether the layout can be correctly identified using the lines found as features. In this example, enough lines means at least 2 horizontal lines and at least 2 vertical lines.
Before the layout matching operation of step 6, the match point of the layout to be identified must also be determined. This match point can be determined by the system when the bill image is scanned, or it can be determined from the horizontal and vertical lines found; whichever way is used should be consistent with the way the match point of the standard template is determined. The latter way is chosen in this example. Specifically, the intersections of all horizontal and vertical lines are determined first, and then, among these intersections, the one with the smallest difference between its x-coordinate and y-coordinate is selected as the match point. Thus, when the lines found are used as characteristic lines and matched against the standard lines of each pre-stored standard layout, the standard data of the pre-stored standard layout is read first. Then, taking the match point as the reference, the horizontal-line matching rate between the layout to be tested and the standard layout is calculated, and likewise the vertical-line matching rate. Finally, the matching confidence for this standard layout is determined from the horizontal-line and vertical-line matching rates obtained. The matching confidence for each standard layout is obtained in this way, and from these confidences it can be determined which layout the layout to be identified is.
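The match-point selection might be sketched as follows. Two points are assumptions rather than statements of the specification: "the smallest difference between its x-coordinate and y-coordinate" is read literally as the smallest |x - y|, and a horizontal and a vertical line are taken to intersect when each one's fixed coordinate falls within the other's extent. The function name and tuple layouts are hypothetical.

```python
def find_match_point(horizontal_lines, vertical_lines):
    """Pick the match point among the intersections of horizontal and vertical lines.

    horizontal_lines -- list of (y, x_left, x_right)
    vertical_lines   -- list of (x, y_top, y_bottom)
    Returns (x, y), or None if no lines intersect.
    """
    intersections = [
        (x, y)
        for (y, x_left, x_right) in horizontal_lines
        for (x, y_top, y_bottom) in vertical_lines
        if x_left <= x <= x_right and y_top <= y <= y_bottom
    ]
    if not intersections:
        return None
    # Assumed reading: choose the intersection with the smallest |x - y|.
    return min(intersections, key=lambda point: abs(point[0] - point[1]))
```

Whatever rule is used, the same rule has to be applied to the standard template so that both match points refer to the same feature of the form.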
In this example, the horizontal-line matching rate between the layout to be tested and the standard layout is determined as follows: the accumulated horizontal-line matching rate of the two layouts is calculated first, and the horizontal-line matching rate is then obtained from it. Specifically:
Horizontal-line matching rate = accumulated horizontal-line matching rate * 2 / (number of horizontal lines in the test layout + number of horizontal lines in the standard layout).
The key here is to first obtain the accumulated horizontal-line matching rate. The following method is used in this example. Referring to Fig. 6, LineT is a horizontal line to be tested in the layout to be tested, OrgT is the match point of the layout to be tested, LineS is a horizontal line in the standard template, and OrgS is the match point of the standard template.
When the accumulated horizontal-line matching rate is calculated, the process follows Fig. 5. The line position information and the initial match-point positions of the layout to be tested and of the standard template must first be input, and the accumulated horizontal-line matching rate variable bMatchH is set up. According to Fig. 5, step 31 initializes bMatchH to 0. Step 32 then selects an unselected horizontal line LineT from the set of horizontal lines of the layout to be tested. Because this selection may fail, that is, there may be no selectable horizontal line LineT left, step 33 judges whether the selection succeeded. If it did not, there is no selectable horizontal line to be tested, and the operation ends directly. Otherwise, step 34 calculates the length LT of LineT and its vertical distance DVT and horizontal distance DHT from the match point OrgT, which serve as the matching parameters of the line to be tested. Step 35 then selects an unselected standard horizontal line LineS from the set of horizontal lines of the standard layout in preparation for matching. Because the standard horizontal lines may also all have been selected already, step 36 judges whether the selection succeeded. If it did not, there is no selectable standard horizontal line LineS, that is, the standard horizontal lines are exhausted, and the next horizontal line to be tested must be selected to continue matching against the standard lines of the standard layout, so the process returns to step 32. Otherwise, step 37 calculates the length LS of LineS and its vertical distance DVS and horizontal distance DHS from the match point OrgS, in preparation for matching against the line to be tested. The actual matching begins at step 38, which calculates the absolute value a of the difference between DVT and DVS, the absolute value b of the difference between DHT and DHS, and the absolute value c of the difference between LT and LS. Step 39 then judges whether a is greater than its set value V, b is greater than its set value H, or c is greater than its set value L. If any one of these differences exceeds its threshold, the match has failed, that is, the line to be tested and the standard line have no similarity, and the process returns to step 35 to select the next standard horizontal line and continue matching. If none of a, b, and c exceeds its threshold, the lines match, and step 40 calculates the matching rate matchL according to the following formula:
matchL = ((a/V)^2 + (b/L)^2 + (c/L)^2) / 3;
In step 41, matchL is added to the variable bMatchH, and the process then returns to step 35 to select an unselected standard horizontal line and continue the matching operation.
To make the subsequent matching more effective, before returning to step 35, step 41 may also judge whether matchL is the maximum of all matching rates calculated so far; if so, the midpoint of horizontal line LineT is assigned to OrgT and the midpoint of horizontal line LineS is assigned to OrgS, and the process then returns to step 35.
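The loop of steps 31 to 41 can be sketched as two nested loops. This is an illustration rather than the flow of Fig. 5 itself: it omits the optional match-point update, assumes each line is a tuple (y, x_left, x_right), measures distances from the left endpoint, and follows the matchL formula as printed, in which both b and c are divided by L. The function and parameter names are hypothetical.

```python
def accumulate_horizontal_match(test_lines, std_lines, org_t, org_s, V, H, L):
    """Return the accumulated matching rate bMatchH for two sets of horizontal lines.

    test_lines, std_lines -- lists of (y, x_left, x_right)
    org_t, org_s          -- match points (x0, y0) of the test layout and the standard layout
    V, H, L               -- thresholds for the differences a, b and c
    """
    b_match_h = 0.0                                   # step 31
    for (y_t, xl_t, xr_t) in test_lines:              # steps 32-33
        lt = xr_t - xl_t                              # step 34: length and distances of LineT
        dvt = abs(y_t - org_t[1])
        dht = abs(xl_t - org_t[0])
        for (y_s, xl_s, xr_s) in std_lines:           # steps 35-36
            ls = xr_s - xl_s                          # step 37: length and distances of LineS
            dvs = abs(y_s - org_s[1])
            dhs = abs(xl_s - org_s[0])
            a = abs(dvt - dvs)                        # step 38
            b = abs(dht - dhs)
            c = abs(lt - ls)
            if a > V or b > H or c > L:               # step 39: no similarity, try the next LineS
                continue
            # Step 40: formula as printed in the text (b is divided by L there).
            match_l = ((a / V) ** 2 + (b / L) ** 2 + (c / L) ** 2) / 3
            b_match_h += match_l                      # step 41
    return b_match_h
```

Because the inner loop does not stop at the first success, one test line may contribute several terms if more than one standard line falls within the thresholds.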
Based on the above accumulated horizontal-line matching rate, the vertical-line matching rate between the layout to be tested and the standard layout can be determined as follows: the layout to be tested is first rotated by 90 degrees so that vertical lines become horizontal lines; the accumulated horizontal-line matching rate of the layout to be tested and the standard layout is then calculated and taken as the accumulated vertical-line matching rate. The vertical-line matching rate obtained in this way is:
Vertical-line matching rate = accumulated vertical-line matching rate * 2 / (number of vertical lines in the test layout + number of vertical lines in the standard layout).
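The horizontal-line and vertical-line matching rates share the same normalization, which can be written as one small function. The function name and the handling of the case where neither layout has any lines are assumptions; the specification does not state what happens then.

```python
def matching_rate(accumulated_rate, test_line_count, std_line_count):
    """Normalize an accumulated matching rate by the number of lines on both layouts."""
    total = test_line_count + std_line_count
    if total == 0:
        return 0.0  # assumed behaviour when neither layout contains such lines
    return accumulated_rate * 2 / total
```

For example, an accumulated rate of 1.5 with 3 test lines and 3 standard lines gives 1.5 * 2 / 6 = 0.5. How the two rates are combined into the matching confidence is not specified in this passage, so it is left out of the sketch.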
The process of searching for feature text in step 5 of Fig. 1 is as follows: the feature match point of the layout is found first. This feature match point can be the coordinate origin determined by the system, or a point determined with reference to the standard match point of the standard layout. Taking this point as the reference, the pre-stored standard layout information is read, the specified image block is extracted from the preprocessed image according to that information, and the feature text is then searched for in that image block. In the actual search, because the strokes of text necessarily consist of a series of adjacent black points, adjacent black points can be grouped together, and each group of adjacent black points can be enclosed by a rectangle to form a connected block; refer to Fig. 7. Based on these connected blocks, the following text-information extraction process can be used:
1. Determine the approximate range of the required text;
2. Extract the image of the specified range;
3. Remove lines and shading;
4. Find all connected blocks;
5. Conditionally merge connected blocks that are arranged closely together;
6. Group the connected blocks of the specified height and length into a set A;
7. Search for the text information in set A.
It should be noted that some text intersects lines and shading. Because the above operation includes a step that removes lines and shading, part of the text may be erased when they are removed. To improve recognition accuracy, the text also needs to be repaired; for the specific repair method, refer to the related content described in step 1 above.
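Step 4 of the extraction process, finding all connected blocks, can be sketched with a breadth-first search that groups adjacent black points and records the enclosing rectangle of each group. The choice of 8-neighbour adjacency, the image encoding (1 for black) and the function name are assumptions; the specification only says that adjacent black points form one group.

```python
from collections import deque

def find_connected_blocks(image):
    """Return the bounding rectangles (left, top, right, bottom) of groups of adjacent black points."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    blocks = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Start a new group at this unvisited black point.
            left = right = x
            top = bottom = y
            seen[y][x] = True
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                # Visit the 8 neighbours (assumed definition of "adjacent").
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            blocks.append((left, top, right, bottom))
    return blocks
```

The rectangles returned here are the input to steps 5 and 6, where nearby blocks are merged and blocks of the expected character size are collected into set A.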
In the specific embodiment of the invention described above, after the layout skew has been corrected, only the horizontal-line and vertical-line information of the layout is used. When the line search on the layout is finished, for each horizontal line segment its y-coordinate and the x-coordinates of its left and right endpoints must be recorded; the difference between these two x-coordinates is the length of the horizontal line segment. For each vertical line segment, its x-coordinate and the y-coordinates of its upper and lower endpoints must be recorded; the difference between these two y-coordinates is the length of the vertical line segment.
When the vertical and horizontal distances from a horizontal line segment to the match point are calculated, they are the vertical and horizontal distances from the left endpoint of that segment to the match point; when the vertical and horizontal distances from a vertical line segment to the match point are calculated, they are the vertical and horizontal distances from the lower endpoint of that segment to the match point.
For example:
1) The y-coordinate of a horizontal line segment is y, the x-coordinate of its left endpoint is x1, the x-coordinate of its right endpoint is xr, and the match point is (x0, y0). The length of the horizontal line segment is then xr-x1, the vertical distance from the line to the match point is |y-y0|, and the horizontal distance from the line to the match point is |x1-x0|.
2) The x-coordinate of a vertical line segment is x, the y-coordinate of its upper endpoint is yt, the y-coordinate of its lower endpoint is yb, and the match point is (x0, y0). The length of the vertical line segment is then |yt-yb|, the vertical distance from the line to the match point is |yb-y0|, and the horizontal distance from the line to the match point is |x-x0|.
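The two examples can be expressed as small helper functions. The sketch takes the distances literally from the endpoint definitions above (left endpoint for a horizontal segment, lower endpoint for a vertical segment) and assumes image coordinates in which y grows downward, so that the lower endpoint has the larger y-coordinate. The function names are hypothetical.

```python
def horizontal_segment_features(y, x_left, x_right, match_point):
    """Return (length, vertical distance, horizontal distance) of a horizontal segment."""
    x0, y0 = match_point
    return (x_right - x_left, abs(y - y0), abs(x_left - x0))

def vertical_segment_features(x, y_top, y_bottom, match_point):
    """Return (length, vertical distance, horizontal distance) of a vertical segment."""
    x0, y0 = match_point
    # Distances are measured from the lower endpoint (x, y_bottom).
    return (abs(y_top - y_bottom), abs(y_bottom - y0), abs(x - x0))
```

For instance, a horizontal segment at y = 50 from x = 10 to x = 110 with match point (20, 30) has length 100, vertical distance 20 and horizontal distance 10.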
It should be noted that the specific embodiments of the invention use horizontal and vertical lines as the basis for layout comparison. In practice, using oblique-line features for layout matching is also compatible with the method of the invention; the only difference is the angle by which the layout is rotated during processing. The invention therefore offers good flexibility in practical use.