Summary of the invention
In view of the defects in the prior art, the purpose of the present invention is to provide an image binarization method and system that handle well the uneven background brightness that illumination and similar problems cause in images captured by digital cameras and the like. In addition, the invention uses the gray-level jump information between pixels, obtained directly while the image is scanned, to judge the foreground and background attributes of image blocks. Combined with mutual reference among features at three levels, this makes the foreground/background judgment of each pixel fast and accurate, so that a black-and-white image suitable for OCR is finally obtained.
To achieve the above purpose, the technical solution adopted by the present invention is as follows:
An image binarization method, comprising the following steps:
(1) converting the input document into digital image data;
(2) first dividing the image into blocks at three levels, namely the whole-image level, the sub-image level and the pixel-block level, and then scanning the image to collect the feature values of each block at the three levels;
(3) calculating the threshold of the whole-image level and of each block at the sub-image level;
(4) revising the feature values of the sub-image level according to the data obtained in steps (2) and (3);
(5) binarizing the image pixel by pixel according to the revised feature values.
Further, for better effect, in step (2) the whole-image level treats the entire image as one block; the sub-image level divides the image into several sub-images, which are either of fixed size or sized in proportion to the whole image, with each sub-image block no smaller than 128 × 128 pixels; and the pixel-block level takes an n × n pixel matrix as one block, where n is a positive integer not greater than 16.
In step (2), the feature values of each block at the three levels are collected as follows: each pixel is scanned in turn; for each pixel, the block it belongs to at each level is determined from its position in the image, and the features of the pixel are then added to the feature statistics of those blocks.
Further, the feature values of each block in step (2) include the gray-level histogram, the maximum gray value, the minimum gray value, the mean gray-level jump, and the number of pixels with a large gray-level jump.
The gray-level jump among these feature values is obtained as follows: the gray value of the current pixel is compared with that of the pixel one row or one column away (that is, skipping one row or column). If the difference is large enough, the gray-level jump pixel count is incremented by 1 and the gray difference is added to a running sum. After all pixels in the block have been scanned, the quotient of the accumulated gray difference and the gray-level jump pixel count is the mean gray-level jump of the block.
When the gray-level jump in step (2) is computed, not every gray difference between two pixels is accumulated. The invention first sets a basic jump value, and only gray differences greater than the basic jump value take part in the accumulation. The empirical value of the basic jump value is between 5 and 8.
Further, in step (3), Otsu's method is applied to the gray-level histograms collected at each level to obtain the threshold of the whole image and of each block at the sub-image level.
Further, the method in step (4) of revising the feature values of the sub-image level according to the data obtained in steps (2) and (3) comprises the following steps:
A. First, set a reference jump value. Taking the sub-image-level binarization threshold as the abscissa and the sub-image-level jump value as the ordinate, assign the sub-images to different regions;
B. Then apply a different analysis method to the sub-images of each region.
Further, the analysis methods are as follows:
A) For a sub-image in which foreground and background alternate little and background occupies a large proportion, first select the surrounding sub-images with larger jump values and use the average of their binarization thresholds as the binarization threshold of this sub-image. If no surrounding sub-image block satisfies the condition, determine whether this sub-image block is a pure background block; if it is, set its binarization threshold to its minimum gray value minus 1, so that the block contains no foreground pixels after binarization. Otherwise, use the whole-image-level threshold as the binarization threshold of this sub-image block.
B) For a sub-image that may contain locally continuous background or foreground, rescan and recollect statistics with reference to the pixel-block-level jumps.
Further, in the rescanning process of the present invention, a minimum accurate jump value is first determined; all pixel-level blocks contained in the sub-image-level block are then found, and only those pixel-level blocks whose jump exceeds the accurate jump value take part in the rescanning and statistics. If not enough pixel-level blocks satisfy the condition, the sub-image is handled by method A). The difference from method A) is that, if no surrounding sub-image satisfies the condition for calculating the threshold, the threshold is left unchanged.
C) For a sub-image in which foreground and background alternate frequently, no revision is made.
The binary conversion in step (5) comprises the following steps:
(a) First, establish the three rules that pixel binarization mainly follows: 1. the larger the gray value, the lighter the pixel and the more likely it is to be judged as background, and vice versa; 2. the larger the jump within the sub-image-level block that contains the pixel, the stronger the alternation between foreground and background and the more text to be recognized the block contains; 3. the larger the jump within the pixel-level block, the more likely the pixel is an edge pixel.
(b) Using the above rules, judge each pixel as foreground or background.
An image binarization system, comprising the following devices: an image input device, a block scanning device, a threshold calculation device, a data analysis device, a binary conversion device and an output device.
The image input device converts the input document into digital image data. The block scanning device divides the image into blocks at three levels, namely the whole-image level, the sub-image level and the pixel-block level, and scans the image to collect the feature values of each block at the three levels. The threshold calculation device calculates the threshold of the whole-image level and of each block at the sub-image level. The data analysis device revises the feature values of the sub-image level according to the feature values obtained by the block scanning device and the threshold calculation device. The binary conversion device converts the original image into a black-and-white binary image file. The output device outputs the converted black-and-white binary image file.
The effect of the present invention is as follows: with the method of the invention, a black-and-white image suitable for OCR can be obtained quickly and accurately from a gray-scale image. The method is particularly suitable for binarizing images taken with digital cameras, webcams and the like, in which shooting angle, lighting, shadows and similar factors make the background color uneven.
Embodiment
The invention is further described below with reference to the drawings and specific embodiments.
As shown in Fig. 1, an image binarization system comprises an image feature statistical analysis part and a pixel-by-pixel binarization part. Specifically, it comprises the following devices: an image input device, a block scanning device, a threshold calculation device, a data analysis device, a binary conversion device and an output device.
The image feature statistical analysis part comprises the image input device, which may be a scanner, a fax machine, a digital camera or a similar device; the system is particularly suitable for images taken with a digital camera or webcam. This part also comprises the block scanning device, the threshold calculation device and the data analysis device.
The image input device converts the input document into digital image data. The block scanning device divides the image into blocks at three levels, namely the whole-image level, the sub-image level and the pixel-block level, and scans the image to collect the feature values of each block at the three levels. The threshold calculation device calculates the threshold of the whole-image level and of each block at the sub-image level. The data analysis device revises the feature values of the sub-image level according to the feature values obtained by the block scanning device and the threshold calculation device. The binary conversion device converts the original image into a black-and-white binary image file. The output device outputs the converted black-and-white binary image file.
An image binarization method comprises the following steps:
(1) Convert the input document into digital image data. The document may be a printed document, a handwritten document or a document in another form.
(2) First divide the image into blocks at three levels, namely the whole-image level, the sub-image level and the pixel-block level, and then scan the image to collect the feature values of each block at the three levels.
1) First, the image is divided into blocks at three levels. The first level is the whole-image level, in which the entire image is one block. The second level is the sub-image level, in which the image is divided into several sub-images of equal size, for example 10 × 10 sub-images. Because the binarization threshold of each sub-image block is calculated with Otsu's method, a sub-image block must not be too small; when the image is small, the number of sub-image blocks is reduced so that each block is no smaller than 128 × 128 pixels. The third level is the pixel-block level, in which a pixel matrix such as 8 × 8 forms one block.
2) Then each pixel is scanned in turn. As shown in Fig. 2, in step (2) the scanning device collects statistics over the image point by point, and the data needed for the statistical analysis are recorded from the gray value of each point as follows.
For each pixel, the block it belongs to at each level is determined from its position in the image, and the features of the pixel are then added to the feature statistics of those blocks. Because all pixels belong to the same block at the first level, the whole-image-level features need not be accumulated during scanning; after scanning, the whole-image-level statistics are obtained by summing the sub-image-level features. Therefore only the sub-image-level and pixel-level feature values are accumulated during scanning.
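The partition and the pixel-to-block lookup described above can be sketched as follows. This is a minimal illustration only: the function names, the use of integer division to size the grid, and the default of 10 blocks per side are assumptions made for the example, not details fixed by the text.

```python
def subimage_grid(width, height, target_blocks=10, min_side=128):
    """Return (cols, rows) of the sub-image grid.

    Starts from target_blocks per side and reduces the count so that each
    sub-image block stays at least min_side pixels wide and high when the
    image is large enough (hypothetical sizing rule).
    """
    cols = max(1, min(target_blocks, width // min_side))
    rows = max(1, min(target_blocks, height // min_side))
    return cols, rows


def block_indices(x, y, width, height, cols, rows, n=8):
    """Map pixel (x, y) to its sub-image block and its n x n pixel-level block.

    Returns ((sub_row, sub_col), (pix_row, pix_col)). The whole-image level
    needs no index because every pixel belongs to the same single block.
    """
    sub_col = min(x * cols // width, cols - 1)
    sub_row = min(y * rows // height, rows - 1)
    return (sub_row, sub_col), (y // n, x // n)
```

For a 640 × 480 image this sizing rule yields a 5 × 3 grid of 128 × 160 sub-images rather than 10 × 10, which keeps every block at or above the 128-pixel minimum.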
The feature values in the above method include the gray-level histogram, the maximum gray value, the minimum gray value, the mean gray-level jump, the number of pixels with a large gray-level jump, and so on, for each block.
The gray-level jump among these feature values is obtained as follows: the gray value of the current pixel is compared with that of the pixel one row (or one column) away, that is, skipping one row or column. If the difference is large enough, the gray-level jump pixel count is incremented by 1 and the gray difference is added to a running sum. After all pixels in the block have been scanned, the quotient of the accumulated gray difference and the gray-level jump pixel count is the mean gray-level jump of the block. The two pixels compared may also be adjacent, but at a boundary the jump between two pixels separated by one row or column is more pronounced, while away from a boundary the difference remains small, so the separated comparison works better.
When the gray-level jump is computed, not every gray difference between two pixels is accumulated. Two nearby pixels that both belong to the background, or both to the foreground, still differ slightly; if these small differences were accumulated together with the differences between pixels across a boundary, they would blur the distinctiveness of the jump feature. The invention therefore first sets a basic jump value (its empirical value is between 5 and 8), and only gray differences greater than the basic jump value take part in the accumulation.
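A minimal sketch of this jump statistic for one block is given here. It assumes the comparison is made along rows with the pixel two columns away, that the absolute difference is used, and that a basic jump value of 6 is chosen from the stated 5 to 8 range; the function name `jump_stats` is hypothetical.

```python
def jump_stats(block, basic_jump=6, step=2):
    """Return (jump_pixel_count, mean_jump) for a block of gray values.

    block is a list of rows. Each pixel is compared with the pixel `step`
    columns to its right (step=2 skips one column). Only absolute
    differences greater than basic_jump are counted and accumulated.
    """
    total = 0
    count = 0
    for row in block:
        for x in range(len(row) - step):
            diff = abs(row[x] - row[x + step])
            if diff > basic_jump:
                total += diff
                count += 1
    mean = total / count if count else 0.0
    return count, mean
```

For the single row `[200, 200, 203, 50, 52, 50]`, the small differences of 3 and 0 are ignored, and the two boundary differences of 150 and 151 give a count of 2 and a mean jump of 150.5.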
In the scanning process of the present invention, it is not necessary to include every pixel in the statistics. Practice has shown that sampling pixels from every other row and every other column does not affect the feature values of a block as a whole and greatly reduces the scanning time.
(3) Calculate the threshold of the whole-image level and of each block at the sub-image level.
In this embodiment, Otsu's method is applied to the gray-level histograms collected at each level to obtain the threshold of the whole image and of each block at the sub-image level. Other thresholding methods, such as the mean gray value method or the mathematical expectation method, may of course also be used.
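Otsu's method picks the gray level that maximizes the between-class variance of the two classes it separates. The sketch below works directly on a histogram, so it can be applied to each sub-image histogram and to the whole-image histogram obtained by summing them. The function names are illustrative, and the convention that pixels at or below the returned level form the dark class is an assumption of this example.

```python
def otsu_threshold(hist):
    """Return the gray level that maximizes the between-class variance.

    hist[i] is the number of pixels with gray value i. Pixels with a value
    at or below the returned level form the dark class.
    """
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))
    best_level, best_score = 0, -1.0
    below = 0
    weighted_below = 0
    for level, count in enumerate(hist):
        below += count
        weighted_below += level * count
        above = total - below
        if below == 0 or above == 0:
            continue  # one class is empty, so the variance is undefined
        mean_below = weighted_below / below
        mean_above = (weighted_total - weighted_below) / above
        score = below * above * (mean_below - mean_above) ** 2
        if score > best_score:
            best_level, best_score = level, score
    return best_level


def merge_histograms(hists):
    """Sum per-sub-image histograms into the whole-image histogram."""
    return [sum(column) for column in zip(*hists)]
```

Working from histograms rather than pixels means the whole-image threshold costs no additional pass over the image.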
(4) Revise the feature values of the sub-image level according to the data obtained in steps (2) and (3).
The data analysis of the present invention is the process of making the sub-image-level binarization thresholds more accurate. The inputs of the data analysis device are the feature values of each level of the image obtained by the scanning device and the threshold calculation device; its output is a relatively accurate binarization threshold for each sub-image at the sub-image level. First, a reference jump value is set (its empirical value is the mean of the whole-image-level mean jump value and the basic jump value). Taking the sub-image-level binarization threshold as the abscissa and the sub-image-level jump value as the ordinate, the sub-images are assigned to different regions. Fig. 3 shows the feature regions used in the data analysis according to one embodiment of the invention. As can be seen from Fig. 3, the sub-images fall into three different regions according to their binarization thresholds and jump values. The invention applies a different analysis method to the sub-images of each of the three regions.
First, a sub-image in region A has a small jump and a high binarization threshold, which indicates that foreground and background alternate little and that background occupies a large proportion. For an image with this feature, the threshold calculated by Otsu's method is usually too high, and binarizing with it produces many black spots. For a sub-image in region A, the surrounding sub-images with larger jump values are first selected, and the average of their binarization thresholds is used as the binarization threshold of this sub-image. If no surrounding sub-image block satisfies the condition, it is determined whether this sub-image is a pure background block; if it is, the binarization threshold of the block is set to its minimum gray value minus 1, so that the block contains no foreground pixels after binarization.
A pure background block is identified as follows. First, the jump value of the sub-image block is small, that is, less than the average of the basic jump value and the reference jump value. Second, the minimum pixel value of the sub-image is less than the whole-image-level binarization threshold, and the maximum pixel value of the sub-image is greater than the whole-image-level binarization threshold.
If these two conditions are not both satisfied, the whole-image-level threshold is used as the binarization threshold of this sub-image block.
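The region A decision order can be sketched as follows. Several points here are assumptions made for illustration: "larger jump value" is interpreted as a jump above the reference jump value, the pure-background test is passed in as an already computed flag, and the function and parameter names are hypothetical.

```python
def revise_region_a(neighbors, reference_jump, is_pure_background,
                    min_gray, whole_threshold):
    """Return the revised binarization threshold for a region A sub-image.

    neighbors is a list of (mean_jump, threshold) pairs for the surrounding
    sub-images. Neighbors whose jump exceeds reference_jump are treated as
    the "larger jump" sub-images (assumed interpretation).
    """
    strong = [t for jump, t in neighbors if jump > reference_jump]
    if strong:
        # Borrow the average threshold of the text-rich neighbors.
        return sum(strong) / len(strong)
    if is_pure_background:
        # One below the darkest pixel: nothing becomes foreground.
        return min_gray - 1
    # Fall back to the whole-image-level threshold.
    return whole_threshold
```

For example, with neighbors `[(20, 120), (3, 200), (25, 130)]` and a reference jump of 10, only the first and third neighbors qualify and the revised threshold is 125.0.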
A sub-image in region B has a jump lower than the whole-image-level jump, which indicates that the region may contain locally continuous background or foreground. For a sub-image in this region, the statistics are therefore recollected by rescanning with reference to the pixel-block-level jumps.
The rescanning process of the present invention is as follows:
First, a minimum accurate jump value is determined; a pixel-level block whose jump exceeds this value is considered to contain enough alternation between foreground and background. The minimum accurate jump value is chosen between the reference jump value and the whole-image-level mean jump value.
Then all pixel-level blocks contained in the sub-image-level block are found. The pixel-level blocks whose jump exceeds the accurate jump value are rescanned and their statistics collected, and Otsu's method is applied again to the resulting histogram to calculate the binarization threshold used for this sub-image block.
If not enough pixel-level blocks satisfy the condition, the sub-image is handled like a region A sub-image: the surrounding sub-images with larger jump values are selected, and the average of their binarization thresholds is used as the binarization threshold of this sub-image. The difference from the region A handling is that, if no surrounding sub-image satisfies the condition for calculating the threshold, the threshold is left unchanged.
A sub-image in region C has a high jump value, so foreground and background are considered to alternate frequently in it; it is equivalent to a region containing a great deal of text. The threshold calculated by Otsu's method is therefore fairly accurate and is not revised.
(5) The binary conversion device binarizes the image pixel by pixel, that is, it converts the gray-scale image into a black-and-white binary image file.
The pixel binarization flow of the present invention mainly follows these rules:
1. The larger the gray value, the lighter the pixel and the more likely it is to be judged as background, and vice versa.
2. The larger the jump within the sub-image-level block that contains the pixel, the stronger the alternation between foreground and background and the more text to be recognized the block contains.
3. The larger the jump within the pixel-level block, the more likely the pixel is an edge pixel.
As shown in Fig. 4, the binarization decision for one pixel proceeds as follows:
First, an isolation band is marked out, centered on the whole-image-level threshold output by the threshold calculation device. The upper edge of the band is the whole-image-level maximum threshold and the lower edge is the whole-image-level minimum threshold. Let the maximum and minimum thresholds be Max_T and Min_T, and let the whole-image-level threshold be Whole_T; then
Max_T=Whole_T×(1+α),
Min_T=Whole_T×(1-α),
where α is a fixed coefficient whose empirical value is 0.4.
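As a small worked example of the two formulas, the sketch below computes the band edges for a given whole-image-level threshold. The function name `isolation_band` is hypothetical; the default α of 0.4 is the empirical value stated above.

```python
def isolation_band(whole_t, alpha=0.4):
    """Return (min_t, max_t), the lower and upper edges of the isolation band."""
    min_t = whole_t * (1 - alpha)
    max_t = whole_t * (1 + alpha)
    return min_t, max_t
```

With Whole_T = 100 the band runs from about 60 to about 140, so only pixels whose gray value lies far from the whole-image-level threshold can be decided from this band alone.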
From the data available at this point, some pixels can already be binarized. There are the following cases:
1. If the gray value is greater than the whole-image-level maximum threshold, the pixel is judged as background.
2. If the gray value is less than the whole-image-level minimum threshold, the gray-level jump within the sub-image-level block is small (which prevents misjudging the background around characters in shadow regions), and the maximum gray value of the sub-image-level block is greater than the threshold (which prevents misjudging shadow regions that contain no foreground), the pixel is judged as foreground.
3. If the gray value is greater than the whole-image-level threshold, the pixel-level gray-level jump is small, and the difference between the pixel-level maximum and minimum gray values is small, the pixel is judged as background.
If a pixel does not fall into any of the three cases above, its pixel-level threshold is obtained by bilinear interpolation. First, the four sub-image-level blocks nearest to the pixel are found, and the sub-image-level threshold of each block is taken as the binarization threshold of the pixel at the center of that block. Let the thresholds at the four center points be T_TL, T_TR, T_BL and T_BR, and let W_T, W_B, W_L and W_R be the weights expressing their influence on the pixel. The threshold of the pixel is then:
T = W_T × (T_TL × W_L + T_TR × W_R) + W_B × (T_BL × W_L + T_BR × W_R);
where the weights are calculated from the distances between the pixel and the top, bottom, left and right sides of the rectangle (or square) whose vertices are the four center points, each weight being inversely proportional to the corresponding distance.
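The interpolation formula can be sketched as follows. The text states only that each weight decreases with distance to the corresponding side; this example assumes the standard bilinear choice, in which each weight is one minus the normalized distance to that side, so that W_L + W_R = 1 and W_T + W_B = 1. The function and parameter names are hypothetical.

```python
def interpolate_threshold(x, y, x_left, x_right, y_top, y_bottom,
                          t_tl, t_tr, t_bl, t_br):
    """Bilinearly interpolate a per-pixel threshold from four block centers.

    (x_left, y_top) ... (x_right, y_bottom) is the rectangle whose corners
    are the centers of the four nearest sub-image blocks, and t_* are the
    sub-image-level thresholds assigned to those centers.
    """
    w_r = (x - x_left) / (x_right - x_left)   # nearer the right side -> larger
    w_l = 1 - w_r
    w_b = (y - y_top) / (y_bottom - y_top)    # nearer the bottom side -> larger
    w_t = 1 - w_b
    return w_t * (t_tl * w_l + t_tr * w_r) + w_b * (t_bl * w_l + t_br * w_r)
```

At a corner the result equals that corner's threshold, and at the center of the rectangle it is the plain average of the four thresholds, which avoids visible seams between neighboring sub-images.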
From the pixel threshold and the other feature values collected earlier, the binarization result of most pixels can be determined. The main cases are as follows:
1. If the gray value of the pixel differs considerably from the pixel threshold, the pixel is judged as background when its gray value is greater than the threshold and as foreground otherwise.
2. If the gray value of the pixel is close to the pixel threshold, the pixel-level gray-level jump is small, and the difference between the pixel-level maximum and minimum gray values is also very small, the pixel-level region is unlikely to contain a boundary; if the gray value of the pixel is greater than the mean of the maximum and minimum gray values, the pixel is judged as background.
3. If the gray value of the pixel is very close to the pixel threshold and the above conditions are not satisfied, the jump state of the pixel is evaluated to decide whether the pixel should be judged as background or foreground.
To evaluate the jump state of the pixel, let HT, HB, HL and HR be the gray differences between the pixel and its upper, lower, left and right neighbors respectively. If any of these differences is greater than the product of the basic jump value and a coefficient (empirical value 1.5 to 2), the pixel is considered an edge pixel and is decided according to its jump state; otherwise it is decided directly by the pixel-level threshold.
A pixel identified as an edge pixel is judged according to the following cases:
1. If the sum of HT, HB, HL and HR is greater than twice the basic jump value, the pixel is darker than the surrounding pixels and is judged as foreground;
2. If the sum of HT, HB, HL and HR is less than minus twice the basic jump value, the pixel is lighter than the surrounding pixels and is judged as background;
If neither of these two conditions is satisfied, the mean of the pixel-level maximum and minimum gray values is used as the binarization threshold of the pixel.
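The edge-pixel decision can be sketched as follows. The sketch assumes that each difference is computed as neighbor minus pixel (so a positive sum means the pixel is darker than its surroundings), that the edge test uses the absolute value of each difference, and that a gray value above the threshold means background. The basic jump of 6 and coefficient of 1.5 are picked from the stated empirical ranges, and the function name is hypothetical.

```python
def classify_near_threshold(gray, up, down, left, right, pixel_threshold,
                            block_max, block_min, basic_jump=6, coef=1.5):
    """Decide foreground/background for a pixel close to its threshold."""
    # HT, HB, HL, HR: neighbor gray value minus this pixel's gray value.
    diffs = [up - gray, down - gray, left - gray, right - gray]
    is_edge = any(abs(d) > basic_jump * coef for d in diffs)
    if not is_edge:
        # Not an edge pixel: decide directly by the pixel-level threshold.
        return "background" if gray > pixel_threshold else "foreground"
    total = sum(diffs)
    if total > 2 * basic_jump:
        return "foreground"   # darker than its surroundings
    if total < -2 * basic_jump:
        return "background"   # lighter than its surroundings
    # Ambiguous edge: use the midpoint of the pixel-level block's range.
    midpoint = (block_max + block_min) / 2
    return "background" if gray > midpoint else "foreground"
```

A pixel of gray value 100 surrounded by four neighbors of 150 is an edge pixel with a difference sum of 200, so it is classified as foreground regardless of the interpolated threshold.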
Experimental results show that, with the present invention, the binarized image largely avoids the problems caused by the uneven background brightness that results from differences in shooting angle, and that the method is relatively fast compared with other, more complex region-based binarization methods.
The present invention therefore provides a significant improvement.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.