USRE47889E1 - System and method for segmenting text lines in documents - Google Patents


Info

Publication number
USRE47889E1
Authority
US
United States
Prior art keywords
feature
fragment
images
classifier
segmenter
Legal status
Active
Application number
US15/200,351
Inventor
Eric Saund
Current Assignee
III Holdings 6 LLC
Original Assignee
III Holdings 6 LLC
Application filed by III Holdings 6 LLC
Priority to US15/200,351
Assigned to PALO ALTO RESEARCH CENTER INCORPORATED (assignment of assignors interest; assignor: SAUND, ERIC)
Assigned to III HOLDINGS 6, LLC (assignment of assignors interest; assignor: PALO ALTO RESEARCH CENTER INCORPORATED)
Application granted
Publication of USRE47889E1
Active legal status (current)

Classifications

    • G06K9/00456
    • G06K9/00442
    • G06K9/00449
    • G06K9/346
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/155 Removing patterns interfering with the pattern to be recognised, such as ruled lines or underlines
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G06V30/413 Classification of content, e.g. text, photographs or tables

Definitions

  • The present exemplary embodiments relate to systems and methods for segmenting text lines in documents, and to the use of the segmented text in determining the types of markings in documents.
  • An automated electronic system capable of detecting such marking types is useful in a number of environments. For example, in legal document discovery it is valuable for lawyers to be able to quickly narrow down, from millions of pages, those pages that have been marked on. In automated data extraction, the absence of handwritten marks in a signature box can be taken to mean the absence of a signature. Further, being able to tell noise marks apart from machine-printed marks can lead to better segmentation for optical character recognition (OCR). One area where the present system is therefore expected to find use is forms, where printed or handwritten text may overlap machine-printed rules and lines.
  • Identifying granular noise (sometimes called salt-and-pepper noise), line graphics, and machine print text has received the most attention in the document image analysis literature.
  • The dominant approaches have relied on certain predictable characteristics of each of these kinds of markings. For example, connected components of pixels smaller than a certain size are assumed to be noise, large regions of dark pixels are assumed to be shadows, and long straight runs of pixels are assumed to come from line graphics.
  • Identifying machine print text is an even more difficult task. In commercial OCR packages, systems for detecting machine-printed regions have been heavily hand-tuned, especially for Romanic scripts, to work in known contexts of language, script, image resolution, and text size. While these processes have had some success on clean images, they have not been successful on cluttered images.
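The size- and shape-based heuristics described above can be sketched as a few rules over component bounding boxes. This is a minimal illustration only: the `Component` fields, the function name, and every threshold value are hypothetical, chosen for a roughly 300 dpi scan, and are not taken from any particular system.

```python
from dataclasses import dataclass

@dataclass
class Component:
    """Bounding-box summary of one connected component (hypothetical fields)."""
    width: int
    height: int
    black_pixels: int

# Hypothetical thresholds, in pixels, for a ~300 dpi scan.
MAX_NOISE_SIZE = 3        # components this small in both dimensions are "noise"
MIN_SHADOW_AREA = 10_000  # large, mostly dark regions are "shadows"
MIN_SHADOW_FILL = 0.8
MIN_LINE_LENGTH = 200     # long, thin runs are "line graphics"
MAX_LINE_THICKNESS = 5

def heuristic_label(c: Component) -> str:
    """Apply the fixed rules in order; anything unmatched is left unlabeled."""
    area = c.width * c.height
    fill = c.black_pixels / area if area else 0.0
    if c.width <= MAX_NOISE_SIZE and c.height <= MAX_NOISE_SIZE:
        return "noise"
    if area >= MIN_SHADOW_AREA and fill >= MIN_SHADOW_FILL:
        return "shadow"
    if (max(c.width, c.height) >= MIN_LINE_LENGTH
            and min(c.width, c.height) <= MAX_LINE_THICKNESS):
        return "line graphics"
    return "unknown"
```

Fixed rules like these break down on cluttered images, for example when a handwritten stroke touches a rule line and the merged component no longer looks long and thin.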
  • Zheng et al., “Machine Printed Text and Handwriting Identification in Noisy Document Images,” IEEE Trans. Pattern Anal. Mach. Intell., 26(3):337-353, 2004, emphasized classifying regions of pixels (roughly, text words) into one of three categories: machine print text, handwritten text, or noise.
  • Zheng et al. employed a large number of features, selected according to their discriminative ability for classification. The results are post-processed using a Markov Random Field that enforces neighborhood relationships among text words.
  • FIG. 1 shows a portion of a document 100 containing machine graphics 102, machine printed text 104, and handwriting 106.
  • Various applications require separating and labeling these and other kinds of markings.
  • A common intermediate step in the art is to form connected components, but a single connected component may contain more than one kind of marking.
  • One example is a signature that sprawls across the printed text of a form or letter.
  • Another example is seen in FIG. 1, where the handwritten numbers 106 extend over the machine printed text 104.
  • FIG. 2 shows connected components (e.g., a sampling identified as 108a-108n) of FIG. 1 in terms of bounding boxes (e.g., a sampling identified as 110a-110n).
  • These connected components can and do include mixtures of marking types. The problem is to break them into smaller meaningful units suitable for grouping and classifying into the correct types.
  • Methods and systems of the present embodiment provide for segmenting connected components of markings found in document images. Segmenting includes detecting aligned text. From the detected material, an aligned text mask is generated and used in processing the images. The processing includes breaking connected components in the document images into smaller pieces, or fragments, by detecting and segregating the connected components and fragments thereof likely to belong to aligned text.
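Forming connected components, the intermediate step discussed above, can be illustrated with a breadth-first flood fill over a binary image. This sketch assumes 8-connectivity and a list-of-lists image with 1 for dark pixels; `connected_components` is a hypothetical name, and it returns inclusive bounding boxes `(x0, y0, x1, y1)` like those drawn in FIG. 2.

```python
from collections import deque

def connected_components(image):
    """Label 8-connected foreground pixels and return one bounding box per component."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            # Flood-fill a new component starting at (x, y), tracking its extent.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

A handwritten stroke that touches a printed word yields a single box here, which is exactly the kind of mixed component the segmenter must break apart.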
  • FIG. 1 is an example image region with mixed machine print text, handwritten text, and graphics;
  • FIG. 2 is the image of FIG. 1 with connected components shown by their bounding boxes;
  • FIG. 3 is a system diagram of an environment in which the concepts of the present application are implemented;
  • FIG. 4 illustrates a high-level process flow for generating masks and using the masks to break connected components into smaller pieces;
  • FIG. 5 shows the results of the mask processing on an image, returning the horizontal and vertical line markings of the image;
  • FIG. 6 depicts bounding boxes of lines of text found by the process of the present application;
  • FIG. 7 is a portion of FIG. 6 with bounding boxes around fragments after splitting by the process of the present application;
  • FIG. 8 depicts a flow diagram of the processing steps used to find text lines;
  • FIG. 9 depicts a portion of FIG. 6 illustrating the upper and lower extrema of the bounding contours of connected components;
  • FIG. 10 illustrates contour extrema points grouped according to horizontal alignment;
  • FIG. 11 shows line segments formed by extrema point groups;
  • FIG. 12A is an outline of a portion of a two-stage classifier configured in accordance with the present application;
  • FIG. 12B is an outline of a portion of a two-stage classifier configured in accordance with the present application;
  • FIG. 12C is an outline of a portion of a two-stage classifier configured in accordance with the present application;
  • FIG. 12D is an outline of a portion of a two-stage classifier configured in accordance with the present application;
  • FIG. 13 shows one embodiment of the classifier which may be used in configuring the two-stage classifier of FIGS. 12A-12D;
  • FIG. 14 is a schematic of a weighted sum classifier, obtained by Adaboost, that is used for each of the one-vs-all classifiers in one embodiment; and
  • FIG. 15 is a more detailed flow chart of a process incorporating the concepts of the present application.
  • The methods and systems of the present disclosure are trainable based on examples.
  • The systems are configured to input and transform a physical hardcopy document into a binary image and to output a new version of the image in which pixels are color coded according to the automatic classification of the type of marking to which each fragment belongs.
  • A hardcopy document containing images, including at least one of handwritten text, machine printed text, machine printed graphics, unidentified markings (i.e., noise), and form lines or rules, is digitized.
  • The images are segmented into fragments by a segmenter module.
  • Each fragment is classified by an automatically trained multi-stage classifier, and classification labels are provided to the fragments. These labels may be colors, differing gray tones, symbols, or other identifiers.
  • The classifier considers not just properties of the fragment itself but also properties of the fragment's neighborhood. In classification nomenclature these properties or attributes are called features. Features relevant for discrimination are picked out automatically from among a plurality of feature measurements.
  • The classifier is a two-stage classifier trained from labeled example images in which each pixel has a “groundtruth” label, i.e., the label on a base or original image. A held-out set of groundtruth images can be used for evaluation. Thereafter, the labeled document is stored in memory, displayed on an electronic display, printed out, or otherwise processed.
  • A particular aspect of the present methods and systems is the ability to train parameters automatically from examples or groundtruths. This enables the present concepts to be used in high-volume operations by targeting the specific goals and data at hand.
  • The disclosed methods and systems address the comparatively difficult task of classifying small marking fragments at the connected component or sub-connected component level.
  • There are at least two motivations for this. First, it allows touching markings of different types to be identified, which permits the connected components to be split appropriately when necessary.
  • The second motivation is to build a useful basic building block (e.g., a fragment classifier), with the understanding that coarser-level decisions (at the level of words, regions, or pages) can be made with much higher accuracy by aggregating the output of this building block.
  • By contrast, previous approaches target classification of larger aggregate regions only.
  • Depicted in FIG. 3 is a system 300 in which the concepts of the present application may be implemented.
  • System 300 illustrates various channels by which digitized bitmapped images and/or images formed by digital ink techniques are provided to segmenter-classifier systems of the present application.
  • A hardcopy document 302 carrying images is input to a scanner 304, which converts or transforms the images of document 302 into an electronic document 306.
  • The images on hardcopy document 302 may be created by electronic data processing devices; by pens, pencils, or other non-electronic materials; or by stamps, both electronic and manual.
  • The electronic document 306 is displayed on a screen 308 of a computer, personal digital system, or other electronic device 310, which includes a segmenter-classifier system 312 of the present application.
  • The electronic device 310 includes at least one processor and sufficient electronic memory storage to operate the segmenter-classifier system 312, which in one embodiment may be software. It is understood that the electronic device 310 includes input/output devices including, but not limited to, a mouse and/or keyboard.
  • A whiteboard or digital ink device 314 may be coupled to electronic device 310, whereby bitmapped or digital ink images 316 are electronically transmitted to device 310.
  • Another channel by which bitmapped or digital ink images may be provided to the segmenter-classifier system 312 is through use of another electronic device 318.
  • This device can be any of a number of systems, including but not limited to a computer, a computerized CAD system, an electronic tablet, a personal digital assistant (PDA), a server on the Internet which delivers web pages, or any other system which provides bitmapped and/or digital ink images 320 to segmenter-classifier system 312.
  • Further, image generation software loaded on electronic device 310 can be used to generate a bitmapped or digital ink image for use by segmenter-classifier system 312.
  • A finalized version of the electronic document with images processed by the segmenter-classifier system 312 is stored in the memory storage of the computer system 310, sent to another electronic device 318, printed out in hardcopy form by a printer 322, or printed out from printing capabilities associated with converter/scanner 304.
  • Segmenter-classifier system 312 includes segmenter 312a and classifier 312b.
  • The segmenter 312a takes in a document image and partitions the set of pixels into small fragments.
  • The classifier 312b then takes each fragment and assigns a category label to that fragment.
  • The classifier 312b returns scores corresponding to different categories of markings and, in one embodiment, the category with the best score.
  • A downstream application such as an interpreter 324 may further interpret the scores in order to make decisions. For example, fragments whose scores do not satisfy an acceptance criterion may be labeled “reject” or “unknown”, or fragments whose handwriting scores exceed a preset threshold may be highlighted or marked for annotation processing on a processed electronic document displayed on display 308.
  • Classifying or scoring each individual pixel according to its type of marking is accomplished by considering spatial neighborhoods and other forms of document context. Pixels may be classified based on feature measurements made on the neighborhood. This opens interesting possibilities, in particular formulations in which segmentation and recognition proceed in lockstep, each informing the other.
  • An approach of the present application is to fragment the images into chunks of pixels that can be assumed to come from the same source of markings. These fragments are then classified as a whole. Because the segmenter 312a of the segmenter-classifier 312 makes hard decisions, any errors made by the segmenter are likely to cause errors in the end result. Two kinds of errors are counted: (a) creating fragments that are clearly a combination of different marking types, and (b) unnecessarily carving out fragments from regions of the same marking type.
  • FIG. 4 illustrates a process 400 by which the operations of the segmenter take place to accomplish segmenting of the image.
  • The concept of finding alignments that come from, for example, machine-printed text is illustrated in operation with the finding of horizontal and vertical lines, as well as the use of a combined mask to find other fragments. It is to be understood, however, that while the following discusses these concepts together, the finding of aligned text may be used without the other detection concepts.
  • An electronic image 402 is investigated to identify aligned text 404 and, separately and at the same time, to identify horizontal and vertical lines of the image 406. Once the aligned text is identified, this information is used as an aligned text mask 408.
  • The identified horizontal and vertical lines are used as a graphic lines mask 410.
  • The aligned text mask 408 and the graphic lines mask 410 govern how the process breaks the image into pieces corresponding to aligned text fragments 412 and pieces corresponding to graphic line fragments 414.
  • The aligned text mask 408 and graphic lines mask 410 are combined and applied to the image 416, and this combination governs how the process breaks the image into pieces corresponding to remainder fragments of the image 418, where in one embodiment remainder fragments are pieces of the image that fall outside the bounds of lines of machine printed or neatly written text.
  • FIG. 5 shows the graphic lines mask 500 of the image of FIG. 1, containing only horizontal and vertical line graphics. It is understood that the processes using the aligned text mask and the combined mask will similarly yield only those portions of the image corresponding to the mask contents.
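The masking logic of process 400 can be sketched with pixelwise boolean operations. This is a minimal sketch assuming equally sized list-of-lists binary images; `split_by_masks` and `_combine` are hypothetical names. Running connected-component extraction on each of the three outputs would then yield the aligned text fragments 412, graphic line fragments 414, and remainder fragments 418.

```python
def _combine(a, b, op):
    """Apply a pixelwise boolean op to two equally sized binary images."""
    return [[op(p, q) for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def split_by_masks(image, text_mask, lines_mask):
    """Return (aligned-text pixels, graphic-line pixels, remainder pixels)."""
    text_pixels = _combine(image, text_mask, lambda p, m: bool(p and m))
    line_pixels = _combine(image, lines_mask, lambda p, m: bool(p and m))
    combined_mask = _combine(text_mask, lines_mask, lambda a, b: bool(a or b))
    # "AndNot": keep the image pixels that fall outside both masks.
    remainder = _combine(image, combined_mask, lambda p, m: bool(p and not m))
    return text_pixels, line_pixels, remainder
```

Keeping the masks as images makes each split a cheap bitwise operation, while the fragments themselves can then be handled symbolically.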
  • The fragments generated by process 400 are used in further operations designed to classify markings on a document image.
  • Another aspect of the operations of FIG. 4 (whose operations are expanded on by the processes of FIGS. 8 and 15) is that they produce features that are useful in and of themselves for classifying fragments.
  • Specifically, the alignment processes disclosed herein are able to find fragments that align to a high degree, or “very well”; such fragments may be determined more likely to be machine printed markings, while fragments that the processes find align only “moderately well” are more likely to come from handwritten text.
  • Whether a fragment aligns “very well” or only “moderately well” is decided by threshold values that are determined for particular implementations. More specifically, in some implementations the degree to which alignment must be found may be higher or lower, depending on the uses of the present concepts. The threshold values can therefore be built into the system based on those particular requirements, and the system may in fact be tunable for these factors while in operation.
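One way to turn “aligns very well” versus “aligns moderately well” into a measurable quantity is the root-mean-square vertical residual of a group's extrema points about a least-squares line. The metric, the function names, and both thresholds below are assumptions for illustration, not values from the disclosure.

```python
import math

# Hypothetical residual thresholds, in pixels.
VERY_WELL_MAX = 1.0
MODERATELY_MAX = 3.0

def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def rms_residual(points):
    """Root-mean-square vertical distance of the points from their fitted line."""
    slope, intercept = fit_line(points)
    return math.sqrt(sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points))

def alignment_label(points):
    r = rms_residual(points)
    if r <= VERY_WELL_MAX:
        return "very well"        # more likely machine print
    if r <= MODERATELY_MAX:
        return "moderately well"  # more likely handwriting
    return "poorly"
```

Because the thresholds are plain parameters, they can be retuned for a different scan resolution or application without changing the logic.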
  • FIG. 6 shows an image 600 generated by process 400, where bounding boxes 602 surround aligned text. These bounding boxes 602 (not all are numbered) define aligned text and identify connected components that extend outside of aligned text regions.
  • FIG. 7 illustrates a close-up of one portion 700 of the image of FIG. 6.
  • Exemplary bounding boxes 702a-702n are identified which enclose fragments found by process 400. For clarity, not all bounding boxes are numbered.
  • Portion 700 shows a situation where handwritten numbers cross the machine printed text, i.e., the “3” crosses the “PA” of “REPAIR ORDER NUMBER”. Here, attention is directed to the connected component comprising the “3” touching the word “REPAIR”, which is broken into smaller pieces, some of which, such as the “PA” fragment, lie inside the aligned text region.
  • A particular aspect of the present disclosure is the process used to find lines of aligned text, whether highly aligned text (e.g., “REPAIR ORDER NUMBER”) or fairly well aligned text (e.g., the handwritten “370147” in FIG. 7). Identifying this material is based on grouping extrema of contour paths, which may be achieved in accordance with process 800 of FIG. 8.
  • Connected components (CCs) 802 are provided. Upper extrema and lower extrema are detected (804a, 804b). The results of this process are shown in FIG. 9, which identifies contours from the sample image segment 900, with upper contour extrema identified by points 902a, 902b, 902c and lower contour extrema identified by points 904a, 904b, 904c.
  • The upper and lower contour extrema are processed independently to find groups of points (806a, 806b). First, horizontal strips of extrema points are found by clustering by vertical position and selecting points that are within a threshold distance of a selected maximum. This is done iteratively as points are grouped and removed from the pool of points identified in steps 804a and 804b. Each such strip of points is then split at any large horizontal gaps. Finally, extrema points are clustered into horizontal or nearly horizontal groups of extrema points that align with each other, in one embodiment using the well-known RANSAC (Random Sample Consensus) algorithm (808a, 808b).
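The grouping of steps 806a/806b and 808a/808b can be sketched in three parts: peel off horizontal strips around the most populated vertical position, split each strip at large horizontal gaps, and keep the RANSAC consensus set of each group. This is a minimal sketch; the function names, the vertical-distance inlier test, and the parameter defaults are assumptions, and the random generator is seeded only to make the example repeatable.

```python
import random

def peel_strips(points, tolerance):
    """Repeatedly take the vertical position supported by the most points
    (within +/- tolerance) and remove that strip from the pool."""
    remaining = list(points)
    strips = []
    while remaining:
        peak_y = max((y for _, y in remaining),
                     key=lambda y0: sum(abs(y - y0) <= tolerance for _, y in remaining))
        strips.append([p for p in remaining if abs(p[1] - peak_y) <= tolerance])
        remaining = [p for p in remaining if abs(p[1] - peak_y) > tolerance]
    return strips

def split_at_gaps(points, max_gap):
    """Split a strip wherever consecutive x positions differ by more than max_gap."""
    groups, current = [], []
    for p in sorted(points):
        if current and p[0] - current[-1][0] > max_gap:
            groups.append(current)
            current = []
        current.append(p)
    if current:
        groups.append(current)
    return groups

def ransac_line(points, threshold=2.0, iterations=200, seed=0):
    """Largest subset of points within `threshold` (vertically) of a line
    through two randomly sampled points; everything else is an outlier."""
    if len(points) < 2:
        return list(points)
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical pair cannot define a near-horizontal text line
        slope = (y2 - y1) / (x2 - x1)
        inliers = [(x, y) for x, y in points
                   if abs(y - (y1 + slope * (x - x1))) <= threshold]
        if len(inliers) > len(best):
            best = inliers
    return best
```

In this arrangement the cheap strip and gap passes do most of the grouping, so RANSAC only has to reject the residual outliers within each group.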
  • FIG. 10 shows groups of extrema points (902a′, 904a′, 902b′, 904b′, and 904c′) found by this operation.
  • The RANSAC operation removes points determined to be outliers; in this example, all upper extrema points 902c are removed, along with a number of points within the remaining groups.
  • In steps 812a and 812b, line segments are fit to the point groups (e.g., in FIG. 11, line 1100a at the top of “12249Y”, lines 1100b and 1100c at the top and bottom of “REPAIR ORDER NUMBER”, and line 1100d at the bottom of “370147”).
  • Line segments whose orientation differs from horizontal by more than a predetermined threshold are removed (e.g., line segments 1100b and 1100c are maintained, while line segments 1100a and 1100d are removed).
  • In steps 814a and 814b, line segments derived from upper contour extrema points are paired with line segments derived from lower contour extrema points according to overlap and distance.
  • These pairs form the text line boxes (step 818) previously shown in FIG. 6. Note that the final text bounding boxes are expanded by the extent of ascenders and descenders and by the left and right extent of the connected components that contributed extrema points, so in some cases the text line bounding boxes will be somewhat larger than this example produces.
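Steps 812 through 818 can be sketched as follows: discard fitted segments that are too far from horizontal, then pair each upper segment with the nearest lower segment below it that overlaps it horizontally, and take the union of each pair as a text line box. The `Segment` type, the 5-degree tolerance, the maximum line height, and the overlap fraction are all assumptions for illustration; image y coordinates are assumed to grow downward, and the ascender/descender expansion is omitted.

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    """A fitted segment from (x0, y0) to (x1, y1), with x0 < x1 (hypothetical type)."""
    x0: float
    x1: float
    y0: float
    y1: float

    @property
    def angle_deg(self):
        return math.degrees(math.atan2(self.y1 - self.y0, self.x1 - self.x0))

    @property
    def mid_y(self):
        return (self.y0 + self.y1) / 2

def near_horizontal(segments, max_angle_deg=5.0):
    """Drop segments whose orientation deviates too far from horizontal."""
    return [s for s in segments if abs(s.angle_deg) <= max_angle_deg]

def horizontal_overlap(a, b):
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))

def pair_into_boxes(uppers, lowers, max_height=60.0, min_overlap_frac=0.5):
    """Greedily pair each upper segment with the closest overlapping lower segment below it."""
    boxes, used = [], set()
    for u in uppers:
        best, best_gap = None, None
        for i, low in enumerate(lowers):
            if i in used:
                continue
            gap = low.mid_y - u.mid_y  # y grows downward, so a valid partner has gap > 0
            if not 0 < gap <= max_height:
                continue
            shorter = min(u.x1 - u.x0, low.x1 - low.x0)
            if shorter <= 0 or horizontal_overlap(u, low) / shorter < min_overlap_frac:
                continue
            if best is None or gap < best_gap:
                best, best_gap = i, gap
        if best is not None:
            used.add(best)
            low = lowers[best]
            boxes.append((min(u.x0, low.x0), min(u.y0, u.y1),
                          max(u.x1, low.x1), max(low.y0, low.y1)))
    return boxes
```

Greedy nearest-partner pairing is the simplest choice. A global assignment could be substituted if closely stacked lines cause mispairings.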
  • The segmenter 312a generates, from an image, a list of fragments. Each fragment is characterized by a number of feature measurements computed on the fragment and its surrounding context.
  • The classifier of the present application is trained to classify each fragment into one of the categories of the described marking types on the basis of these features.
  • As shown in FIGS. 12A-12D, a two-stage classifier 1200 includes, in its first stage 1202, a plurality of first-stage classifiers 1202a, 1202b, . . . , 1202n.
  • In the first stage, each fragment is classified solely on the basis of the features described above in Section 3.1. This results in each fragment having a per-category score.
  • Image fragments 1204a are supplied to particular feature vectors 1206a (as shown in more detail in FIG. 13). If the classifier 1200 stopped here, the category with the highest score could be assigned to each fragment.
  • In embodiments of the present application, however, the classification is refined by taking into consideration the surrounding context and how the spatial neighbors are classified. Neighborhood fragments 1204b . . . 1204n are provided to corresponding feature vectors 1206b . . . 1206n. The results of these operations in the form of category scores 1208a, and accumulated category scores 1208b . . .
  • Labeling module 650 is understood to comprise the appropriate components of the system described in FIG. 3.
  • The secondary features are named and measured as accumulations of first-stage category scores of all fragments whose bounding boxes are contained in the following spatial neighborhoods of the fragment's bounding box:
  • The neighborhood sizes are fairly arbitrary, except that in certain embodiments they are chosen to be less than one character height (e.g., 16 pixels) and several character heights (e.g., 160 pixels), based on 300 dpi, 12-point font. They can be adjusted according to application context, e.g., scan resolution. Thus the present methods and systems are tunable to particular implementations.
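The secondary features can be sketched as sums of first-stage score vectors over the fragments whose bounding boxes fall inside a margin-expanded copy of the fragment's own box. The 16- and 160-pixel margins echo the example sizes above; the box format `(x0, y0, x1, y1)`, the choice to count the fragment itself, and the function names are assumptions.

```python
def expand(box, margin):
    x0, y0, x1, y1 = box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)

def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def secondary_features(boxes, scores, margins=(16, 160)):
    """For each fragment, sum the first-stage score vectors of all fragments whose
    boxes lie inside the fragment's box grown by each margin (the fragment included).
    Returns one row per fragment: len(margins) * n_categories accumulated scores."""
    n_categories = len(scores[0])
    features = []
    for box in boxes:
        row = []
        for margin in margins:
            region = expand(box, margin)
            acc = [0.0] * n_categories
            for other_box, other_scores in zip(boxes, scores):
                if contains(region, other_box):
                    for k, s in enumerate(other_scores):
                        acc[k] += s
            row.extend(acc)
        features.append(row)
    return features
```

The quadratic loop over fragments is fine for illustration; a spatial index would be the natural replacement for pages with many thousands of fragments.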
  • The secondary features establish a relationship among the category labels of neighborhood fragments, while the first-stage features measure relationships among fragments and their observable properties.
  • For example, the regularity features measure how frequent the fragment's height is in the neighborhood. This takes into account the other fragments in the neighborhood, but not what the likely categories of these fragments are.
  • Zheng et al. constructed a Markov Random Field to address this issue.
  • The present approach is different.
  • A neighborhood for each node (fragment) is defined, and the fragment label is allowed to depend on the neighborhood labels.
  • The pattern of dependence is guided by the choice of neighborhoods, but a preconceived form of dependence is not enforced. Rather, the dependence, if significant, is learned from training data: the neighborhood features are made available to the second-stage classifier learner and are selected if they are found useful for classification.
  • This formulation sidesteps loopy message propagation or iterative sampling inference, which may have compute-time and convergence problems.
  • The two-stage classifier is constructed using the basic classifier explained in FIG. 13.
  • In the first stage, the basic classifier is applied to categorize fragments based on the features described above in Section 3.1.
  • The results of categorization are aggregated for the whole image into the secondary features 1208a . . . 1208n.
  • These secondary features and the initial features (1206a) are together used by another basic classifier (i.e., second-stage classifier 1210) in the second stage to arrive at the final categorization numbers.
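The two-stage arrangement can be sketched independently of the basic classifier's internals by treating each stage as a function from a feature vector to a per-category score vector. This sketch assumes neighborhoods are given as lists of fragment indices (for example, computed from bounding boxes as in the neighborhood definitions above); `two_stage_classify` is a hypothetical name, and `stage1` and `stage2` stand in for trained classifiers.

```python
from typing import Callable, List, Sequence

# A "stage" maps a feature vector to a per-category score vector.
Stage = Callable[[Sequence[float]], List[float]]

def two_stage_classify(features: List[List[float]],
                       neighbors: List[List[int]],
                       stage1: Stage, stage2: Stage) -> List[List[float]]:
    """features[i] is fragment i's first-stage feature vector; neighbors[i]
    lists the indices of the fragments in its neighborhood."""
    first = [stage1(f) for f in features]
    n_categories = len(first[0]) if first else 0
    final = []
    for f, nbrs in zip(features, neighbors):
        # Secondary features: accumulated first-stage scores of the neighbors.
        acc = [0.0] * n_categories
        for j in nbrs:
            for k, s in enumerate(first[j]):
                acc[k] += s
        # The second stage sees the initial features augmented with the secondary ones.
        final.append(stage2(list(f) + acc))
    return final
```

Because the second stage is an ordinary classifier over an augmented vector, a learner can simply ignore the neighborhood inputs when they do not help, which matches the point that dependence is learned rather than imposed.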
  • The basic classifier used in each stage is a collection of one-vs.-all classifiers, one per category.
  • This classifier type takes as input a vector of features and produces an array of scores, one per category. This output array is then used to pick the best-scoring category or to apply various rejection/acceptance thresholds.
  • Classifier 1300 of FIG. 13 may be understood as the type of classifier employed for each of the classifiers of FIGS. 12A-12D.
  • Classifier 1300 is implemented as a one-vs.-all classifier formed as a weighted sum of weak classifiers, where each weak classifier is a single threshold test on one of the scalar features measured on a fragment (e.g., one dimension of the feature vector). More particularly, an image fragment 1302 is supplied to each of the feature vectors 1304a . . . 1304n. Output from these vectors is passed to a multi-dimensional score vector (e.g., a 5-dimensional score vector) 1306. This output is then passed to a score regularizer 1308, which provides its output to a multi-dimensional regularized score vector (e.g., a 5-dimensional regularized score vector) 1310.
  • This design permits extremely fast classification. For example, a classifier combining 50 weak classifiers requires only about 50 comparisons, multiplications, and additions for each fragment.
  • Each weak classifier produces a number that is either +1 or −1, indicating the result of its comparison test. The weighted sum of these is then a number between +1 and −1, nominally indicating positive classification if the result is positive.
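A one-vs.-all scorer of this kind can be sketched as a weighted vote of decision stumps. Dividing by the total weight is one way, assumed here, to keep the result in [−1, +1]; the `Stump` fields and the function names are hypothetical. Each fragment costs one comparison, one multiplication, and one addition per stump, in line with the cost estimate above.

```python
from dataclasses import dataclass

@dataclass
class Stump:
    feature: int      # index of the scalar feature to test
    threshold: float
    polarity: int     # +1: vote +1 above the threshold; -1: vote +1 at or below it
    weight: float     # non-negative weight, e.g. learned by Adaboost

    def vote(self, x):
        return self.polarity if x[self.feature] > self.threshold else -self.polarity

def one_vs_all_score(stumps, x):
    """Weighted vote normalized by total weight, so the result lies in [-1, +1]."""
    total = sum(s.weight for s in stumps)
    return sum(s.weight * s.vote(x) for s in stumps) / total

def score_vector(classifiers, x):
    """One score per category; `classifiers` maps category name -> list of stumps."""
    return {category: one_vs_all_score(stumps, x) for category, stumps in classifiers.items()}
```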
  • The output of the basic classifier is then an array of numbers, one per category. A positive result nominally indicates a good match to the corresponding category. Typically, but not always, only one of these numbers will be positive. When more than one number is positive, the fragment may be rejected as unassignable, or the system may be designed to pick the highest scorer. Similarly, it may be necessary to arbitrate when no category returns a positive score to claim a fragment.
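The arbitration choices just described can be sketched as a small decision function. The label strings and the `reject_ambiguous` switch are hypothetical; a system could instead pass the scores on to a learned second stage.

```python
def decide(scores, categories, reject_ambiguous=False):
    """Turn a category-score array into a label, using hypothetical reject rules."""
    positive = [i for i, s in enumerate(scores) if s > 0]
    if not positive:
        return "unknown"   # no category claims the fragment
    if reject_ambiguous and len(positive) > 1:
        return "reject"    # several categories claim it
    return categories[max(positive, key=lambda i: scores[i])]
```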
  • One strategy is to feed the category-score vector to another classifier, which then produces refined category scores. This is especially useful if this second-stage classifier can also be learned automatically from data.
  • In some embodiments that adopt this approach, the second classifier stage may be thought of as a score regularizer.
  • The basic classifier itself may thus be thought of as a two-stage classifier, with a number of one-vs.-all classifiers feeding into a score regularizer. This is not to be confused with the larger two-stage approach, in which neighborhood information is integrated at the second stage.
  • The two-stage classifier is implemented using this same basic classifier structure, but with different parameters, because the second-stage classifier works on an augmented feature vector. Preliminary category assignments are therefore revised based on statistics of the category assignments made to neighboring fragments.
  • The basic one-vs.-all classifiers and the score regularizer 1400 may, in one embodiment, be trained using the machine learning algorithm Adaptive Boosting (i.e., Adaboost).
  • In FIG. 14, a feature vector 1402 is provided to scalar feature selectors 1404a . . . 1404n, which provide their output to weak scalar classifiers 1406a . . . 1406n.
  • The outputs are summed at summer 1408 and scored 1410 to obtain a binary decision 1412.
  • In operation, the weak learner considers one feature dimension of vector 1402 at a time and finds the threshold test (scalar feature selectors 1404) that minimizes the weighted error on the training data (weak scalar classifiers 1406). The most discriminative of these feature dimensions is then picked as the next weak classifier (1408, 1410, 1412). This process is repeated through Adaboost iterations. By this construction, the parameters of the classifier are obtained by discriminative Adaboost training that selects useful features from among a family of hundreds of measurements and assigns relative weights to the features.
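The training loop just described can be sketched as textbook discrete AdaBoost of the kind discussed by Freund et al., with an exhaustive decision-stump weak learner that tries every feature dimension and keeps the threshold test with the least weighted error. The candidate thresholds (observed feature values), the 1e-10 clamp, and the function names are assumptions; labels are +1/−1 for a single one-vs.-all problem.

```python
import math

def best_stump(X, y, w):
    """Exhaustively find the (feature, threshold, polarity) test with least weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[f] > thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; returns a list of (alpha, feature, threshold, polarity)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        if err >= 0.5:
            break  # no stump beats chance on the current weights
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        # Misclassified examples gain weight; correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * (pol if row[f] > thr else -pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    s = sum(alpha * (pol if x[f] > thr else -pol) for alpha, f, thr, pol in model)
    return 1 if s > 0 else -1
```

The learned alphas play the role of the weak-classifier weights in the weighted sum. Because each round picks a single feature dimension, the same loop also performs the feature selection described above.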
  • Adaboost classifier learners have recently been found to be very effective in categorizing document images on a few Xerox Global Services client application data sets.
  • A discussion of Adaboost is set out in Freund et al., “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” in European Conference on Computational Learning Theory, pages 23-37, 1995, hereby incorporated herein by reference in its entirety.
  • A more detailed description of the process of FIG. 4 is given by process 1500 of FIG. 15.
  • In this flow diagram, there is interaction between image-based representations, which are good for bitwise logical operations such as masking, and symbolic representations, which are good for grouping and applying logic and rules to geometrical properties.
  • As in FIG. 4, one particular aspect of process 1500 is the creation of various image masks. Accordingly, in the following discussion the processing steps on the left-hand side of FIG. 15 are generally directed toward generating such masks, in addition to further processing, while the processing steps on the right-hand side of the figure are directed to image processing operations using the masks.
  • Process 1500 thus provides a mixture of image processing operations and symbolic operations that work on a token, such as the connected component (CC) objects.
  • an original image 1502 is investigated to find and obtain areas of the image which meet a predetermined definition of large areas of dark/black material (also called herein “big black blobs” or BBBs) 1504.
  • An “AndNot” operation 1508 is performed on the original image and the BBBs, whereby the BBB sections are removed from the original image 1510.
  • The original image minus BBBs is then operated on so that the connected components (CCs) in the remaining image are extracted 1512, creating an image with all CCs identified 1514.
  • a filtering operation is performed 1516, wherein small dimensioned CCs (sometimes called dust or speck CCs due to their small size) are removed, leaving the non-dust CCs 1518.
  • A text line finding process is applied to this image of non-dust CCs to find text lines 1520.
  • this text line process may be accomplished by process 800 of FIG. 8.
  • the resulting image of text lines 1522 is then processed to grow bounding boxes 1524 at the locations where the text lines of the image (i.e., of step 1522) are located. This results in bounding box locations that encompass the determined text line locations 1526.
  • the bounding boxes are rendered 1528, thereby generating an alignment text mask 1530.
  • An invert binary pixel color operation 1532 is applied to the alignment text mask, whereby the binary pixels of the mask are inverted (i.e., the colors are inverted) to generate an inverse alignment text mask 1534.
  • the dust or speck connected components are shown at 1536.
  • the “dust CCs” are shown as a double-line box.
  • the double-line box is intended to represent a final complete set of image material from the original image 1502, which at this point is intended to be interpreted as different types of objects, where the system or process believes text images may exist.
  • the “OK CCs” 1540 are shown as a double-line box.
  • the double-line box is intended to represent a final complete set of image material from the original image 1502, where it is believed text images may exist.
  • this image is provided to an extraction process to extract and separate horizontal lines and vertical lines 1542. More particularly, this process identifies a bitmap having just horizontal lines 1544, a bitmap with just vertical lines 1546, and a bitmap having no lines 1548.
  • the horizontal lines bitmap 1544 is operated on to extract the connected components (CCs) of the bitmap 1550, and a horizontal line image of connected components (CCs) is generated.
  • The vertical lines bitmap 1546 is processed differently from the horizontal lines bitmap: an “AndNot” logical operation is performed on the pixels of the vertical lines bitmap and the horizontal lines bitmap 1544. This operation takes the vertical CCs and subtracts out any horizontal CCs.
  • the remaining vertical connected components (CCs) are extracted 1556, resulting in a vertical line image of connected components (CCs).
  • the “Horizontal Line CCs image” 1552 and the “Vertical Line CCs image” 1558 are shown as double-line boxes.
  • the double-line boxes are intended to represent a final complete set of image material from the original image 1502, where it is believed text images may exist.
  • the no-lines bitmap 1548 is “Anded” in a bitwise manner with the inverted alignment text mask (of 1534) 1560.
  • This operation identifies a no-lines bitmap of material outside the alignment of the previously determined text lines 1562.
  • the dust CCs of 1536 are provided to an operation in which the dust CCs are rendered as white (i.e., they are turned into background material) 1564.
  • This cleaned-up bitmap of outside-alignment material 1566 then has its connected components (CCs) extracted 1568, resulting in a finalized image of connected components which lie outside of the predetermined text line alignments 1570.
  • the “Outside Alignments CCs” 1570 is shown as a double-line box.
  • the double-line box is intended to represent a final complete set of image material from the original image 1502, where it is believed text images may exist.
  • In step 1572, the Too Big CCs of 1538 are rendered, forming a Too Big CCs bitmap 1574, which is then “Anded” in a bitwise manner with the alignment text mask (of 1530) 1576.
  • the Anding operation generates a too big bitmap within the area of the image defined as alignment of text material 1578, which then has its CCs extracted 1580, whereby an image of too big CCs within the alignment areas is generated 1582.
  • Thus, by process 1500, final complete sets of image material (e.g., fragments or connected components) are generated from the original image, which are now interpreted as different types of objects: small specks of the image 1536, CCs believed to be text images (OK CCs) 1540, horizontal line CCs 1552, vertical line CCs 1558, CCs determined or believed to be outside of the text line alignments 1570, and too big CCs determined or believed to be within the text line alignment range 1582.
  • the concepts described herein permit annotation detection and marking classification systems to work at a level smaller than connected components, thereby allowing for the methods and systems to separate annotations that touch printed material.
  • This method does not cause spurious over-fragmentation of material within text lines, and the approach is relatively scale-independent and is not overly dependent on deciding whether a connected component is text based on its size.
  • Another aspect of the present methods and systems is that they produce features that are useful in classifying fragments. Specifically, the alignments of contour extrema are themselves useful for later classification: contour extrema that align very well are more likely to be machine printed, while contour extrema that align only moderately well are more likely to come from handwritten text.
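The Adaboost training described above for FIG. 14 (a threshold test on one feature dimension at a time, selection by weighted error, re-weighting over iterations, and a weighted sum thresholded to a binary decision) can be sketched as standard discrete AdaBoost with decision stumps. This is a minimal sketch, not the patented implementation: the function names, the +1/-1 label convention, the exhaustive threshold search and the number of rounds are assumptions, and the score regularizer 1400 is not modeled. A one-vs-all classifier for one marking category would be trained by labeling that category +1 and every other category -1.

```python
import math
from typing import List, Sequence, Tuple

# A decision stump (weak scalar classifier): (feature index, threshold, polarity).
# It predicts +1 when polarity * (x[feature] - threshold) > 0, otherwise -1.
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    feature, threshold, polarity = stump
    return 1 if polarity * (x[feature] - threshold) > 0 else -1


def best_stump(X: List[Sequence[float]], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Try every feature dimension and candidate threshold; keep the stump
    with the lowest weighted training error."""
    best, best_err = (0, 0.0, 1), float("inf")
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidates: one threshold below all values, then midpoints between values.
        candidates = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in candidates:
            for polarity in (1, -1):
                stump = (f, t, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X: List[Sequence[float]], y: List[int], rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost; y holds +1/-1 labels. Returns (weight, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        if err >= 0.5:
            break  # no weak classifier beats chance on the current weights
        err = max(err, 1e-10)  # keep the log finite when a stump is perfect
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Re-weight: misclassified examples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def decide(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    """Weighted sum of weak classifier votes, thresholded at zero."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1


X = [[0.1, 5.0], [0.2, 3.0], [0.8, 4.0], [0.9, 1.0]]
y = [-1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([decide(model, x) for x in X])  # [-1, -1, 1, 1]
```

Here the first feature dimension separates the two classes, so training selects a stump on that dimension; in practice each vector would hold the hundreds of fragment measurements mentioned above.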


Abstract

Methods and systems of the present embodiment provide segmenting of connected components of markings found in document images. Segmenting includes detecting aligned text. From this detected material an aligned text mask is generated and used in processing of the images. The processing includes breaking connected components in the document images into smaller pieces or fragments by detecting and segregating the connected components and fragments thereof likely to belong to aligned text.

Description

INCORPORATION BY REFERENCE
This application claims the priority, as a divisional, of U.S. application Ser. No. 12/500,882, filed Jul. 10, 2009 (now U.S. Application Publication No. US-2011-0007970-A1, published Jan. 13, 2011), the disclosure of which is incorporated herein by reference in its entirety.
CROSS REFERENCE TO RELATED PATENTS AND APPLICATIONS
Commonly assigned applications, U.S. Application Publication No. US-2011-0007366-A1, published Jan. 13, 2011, to Saund et al., entitled, “System And Method For Classifying Connected Groups Of Foreground Pixels In Scanned Document Images According To The Type Of Marking”; and U.S. Application Publication No. US-2011-0007964-A1, published Jan. 13, 2011, to Saund et al., entitled, “System and Method for Machine-Assisted Human Labeling of Pixels in an Image”, are each incorporated herein by reference in their entirety.
BACKGROUND
The present exemplary embodiments relate to systems and methods for segmenting text lines in documents, and the use of the segmented text in the determination of marking types in documents.
An automated electronic based system having the capability for such detection has uses in a number of environments. For example, in legal document discovery it is valuable for lawyers to be able to quickly narrow down, from millions of pages, those pages which have been marked on. Also, in automated data extraction, absence of handwritten marks in a signature box can be translated to mean the absence of a signature. Further, being able to tell noise marks apart from machine printed marks can lead to better segmentation for optical character recognition (OCR). It is therefore envisioned that one area in which the present system will find use is the context of forms, where printed or handwritten text may overlap machine printed rules and lines.
Identifying granular noise (sometimes called salt and pepper noise), line graphics, and machine print text has received the most attention in document image analysis literature. The dominant approaches have relied on certain predictable characteristics of each of these kinds of markings. For example, connected components of pixels that are smaller than a certain size are assumed to be noise; large regions of dark pixels are assumed to be shadows; and long straight runs of pixels are assumed to come from line graphics. Identification of machine print text is an even more difficult task. In commercial OCR packages, systems for the detection of machine printed regions have been heavily hand-tuned, especially for Romanic scripts, in order to work in known contexts of language, script, image resolution and text size. While these processes have had some success with clean images, they have not been successful when dealing with cluttered images.
Zheng et al., “Machine Printed Text And Handwriting Identification In Noisy Document Images,” IEEE Trans. Pattern Anal. Mach. Intell., 26(3):337-353, 2004, emphasized classifying regions of pixels (roughly text words) into one of the following categories: machine print text, handwritten text, noise. Zheng et al. employed a large number of features, selected according to discriminative ability for classification. The results are post-processed using a Markov Random Field that enforces neighborhood relationships of text words.
Chen et al., “Image Objects And Multi-Scale Features For Annotation Detection”, in Proceedings of International Conference on Pattern Recognition, Tampa Bay, Fla., 2008, focused on selecting the right level of segmentation through a multiscale hierarchical segmentation scheme.
Koyama et al., “Local-Spectrum-Based Distinction Between Handwritten And Machine-Printed Characters”, in Proceedings of the 2008 IEEE International Conference On Image Processing, San Diego, Calif., October 2008, used local texture features to classify small regions of an image into machine-printed or handwritten.
FIG. 1 shows a portion of a document 100 containing machine graphics 102, machine printed text 104, and handwriting 106. Various applications require separating and labeling these and other different kinds of markings.
A common intermediate step in the art is to form connected components. A problem arises when connected components contain mixed types of markings, especially when machine printed and handwritten text touch graphics, such as rule lines, or touch handwritten annotations that are not part of a given text line. Then, correct parsing requires breaking connected components into smaller fragments. One example is a signature that sprawls across the printed text of a form or letter. Another example is seen in FIG. 1, where the handwritten numbers 106 extend over the machine printed text 104.
FIG. 2 shows connected components (e.g., a sampling identified as 108a-108n) of FIG. 1 in terms of bounding boxes (e.g., a sampling identified as 110a-110n). Clearly, many of these connected components can and do include mixtures of marking types. The problem is to break these into smaller meaningful units suitable for grouping and classifying into correct types.
One method for breaking connected components into smaller fragments, recursive splitting, is discussed in commonly assigned U.S. Patent Publication No. US-2011-0007366-A1, published Jan. 13, 2011, to Saund et al., entitled, “System And Method For Classifying Connected Groups Of Foreground Pixels In Scanned Document Images According To The Type Of Marking”.
Another approach is described by Thomas Breuel in “Segmentation Of Handprinted Letter Strings Using A Dynamic Programming Algorithm”, in Proceedings of Sixth International Conference on Document Analysis and Recognition, pages 821-6, 2001.
Still another concept for breaking connected components into smaller fragments is disclosed in U.S. Pat. No. 6,411,733, Saund, “Method and apparatus for separating document image object types.” This applies mainly to separating pictures and large objects from text and from line art. It does not focus on separating small text from small line art or graphics.
SUMMARY
Methods and systems of the present embodiment provide segmenting of connected components of markings found in document images. Segmenting includes detecting aligned text. From this detected material an aligned text mask is generated and used in processing of the images. The processing includes breaking connected components in the document images into smaller pieces or fragments by detecting and segregating the connected components and fragments thereof likely to belong to aligned text.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an example image region with mixed machine print text, handwritten text and graphics;
FIG. 2 is the image of FIG. 1 with connected components shown by their boundary boxes;
FIG. 3 is a system diagram of an environment in which the concepts of the present application are implemented;
FIG. 4 illustrates a high level process flow for generating masks and using the masks to break connected components into smaller pieces;
FIG. 5 shows the results of the mask processing on an image returning horizontal and vertical line markings of the image;
FIG. 6 depicts bounding boxes of lines of text found by the process of the present application;
FIG. 7 is a portion of FIG. 6 with bounding boxes around fragments after splitting by the process of the present application;
FIG. 8 depicts a flow diagram for the processing steps used to find text lines;
FIG. 9 depicts a portion of FIG. 6 illustrating the upper and lower extrema of the bounding contours of connected components;
FIG. 10 illustrates contour extrema points grouped according to horizontal alignment;
FIG. 11 shows line segments formed by extrema point groups;
FIG. 12A is an outline of a portion of a two-stage classifier configured in accordance with the present application;
FIG. 12B is an outline of a portion of a two-stage classifier configured in accordance with the present application;
FIG. 12C is an outline of a portion of a two-stage classifier configured in accordance with the present application;
FIG. 12D is an outline of a portion of a two-stage classifier configured in accordance with the present application;
FIG. 13 shows one embodiment of the classifier which may be used in configuring the two-stage classifier of FIGS. 12A-12D;
FIG. 14 is a schematic of a weighted sum classifier that is obtained by Adaboost and used for each one-vs-all classifier in one embodiment; and
FIG. 15 is a more detailed flow chart of a process incorporating the concepts of the present application.
DETAILED DESCRIPTION
Described are methods and systems for finding alignments that come from machine-printed text lines in documents to segment the text lines, for use in larger methods and systems designed to identify various kinds of markings in scanned binary documents. The identification is then used to detect handwriting, machine print and noise in the document images. The methods and systems of the present disclosure are trainable based on examples. In some embodiments the systems are configured to input and transform a physical hardcopy document into a binary image and to output a new image version where image pixels are color coded according to the automatic classification of the type of marking to which each fragment belongs.
In one embodiment a hardcopy document is digitized with images, including at least one of handwritten text, machine printed text, machine printed graphics, unidentified markings (i.e., noise) and form lines or rules. The images are segmented into fragments by a segmenter module. Each fragment is classified by an automatically trained multi-stage classifier and classification labels are provided to the fragments. These labels may be colors, differing gray tones, symbols, or other identifiers. In order to arrive at the classification label, the classifier considers not just properties of the fragment itself, but also properties of the fragment neighborhood. In classification nomenclature these properties or attributes are called features. Features relevant for discrimination are picked out automatically from among a plurality of feature measurements. The classifier is a two-stage classifier trained from labeled example images where each pixel has a “groundtruth” label, i.e., the label on a base or original image. A held-out set of groundtruth images can be used for evaluation. Thereafter, the labeled document is stored in memory, displayed in an electronic display, printed out or otherwise processed.
A particular aspect of the present methods and systems is the ability to automatically train parameters from examples or groundtruths. This enables the present concepts to be used in high-volume operations by targeting specific goals and data at hand.
The disclosed methods and systems address the comparatively difficult task of classifying small marking fragments at the connected component or sub-connected component level. The motivation is twofold. First, this allows for calling out/identifying touching markings of different types, which permits appropriate splitting, when necessary, of the connected components. The second motivation is to build a useful basic building block (e.g., a fragment-classifier) with the understanding that coarser level decisions (at the level of words, regions, or pages) can be made with much higher accuracy by aggregating the output of the described basic building block tool (e.g., the fragment-classifier). In contradistinction, previous concepts target classification of larger aggregate regions only.
It is understood a single foreground (e.g., black) pixel alone does not have sufficient information to be used to decipher its source type (i.e., the type of mark it is). Following are examples of different types of marking on an image. It is to be understood the markings described below are provided to assist in the explanation of the present concepts and are not considered to be limiting of the present description or the claims of this application. Thus, the following assumptions are examples made to assist in providing a representation of groundtruth, and a consistent evaluation metric:
    • i. Pixel labels: Each pixel has a single marking category label. This assumption is purely pragmatic, of course. This allows the groundtruth of an image to be represented as another image with an integer label for each pixel. Thus a groundtruth can be stored and its output processed using known image formats, and image viewers, loaders, and editors may be used to efficiently visualize and manipulate them. This also leads to a simple, general, and consistent evaluation metric that is discussed later.
    • ii. Background pixels: marking categories are assigned only to foreground pixels (e.g., black) and it is assumed that white pixels form the background (e.g., paper). This assumption matches current usage scenarios.
    • iii. Ambiguous pixels: Clearly, multiple markings can overlap in a single black pixel. When both markings are of the same category, there is no ambiguity in pixel labeling. In other situations the pixel has to be assigned to one of the overlapping marking categories. A predefined order of priority will be assumed for the category labels. Any ambiguity can be resolved by assigning the label with higher priority. For example, in one implementation, “Handwritten Text” is the category with highest priority. When handwritten characters overlap machine printed lines, pixels in the intersection are labeled as “Handwritten Text”. Noise labels have the lowest priority.
    • iv. Evaluation metrics: When comparing two groundtruth label files, or an automatic classification output to groundtruth, the labels are compared at each pixel location and one error is counted if the two labels differ. This is useful as an evaluation metric because the definition works consistently irrespective of the solution approach. In one embodiment of this application, a segment-then-classify approach is used. An alternate approach classifies each pixel based on surrounding context. Yet another approach assigns labels to grid-patches of the image. For all approaches, the present methods and systems measure the fraction of foreground pixels correctly labeled. The described concepts classify (and learn to classify) fragments, not pixels. Nevertheless, the pixel error metric is useful because wrongly classifying a large fragment is worse than making a mistake on a small fragment.
    • v. Assumptions about image content: While the setup is quite general, and the present systems and methods apply to situations not in line with the foregoing, several assumptions are made, either explicitly or implicitly, about the images that are used. First, it is assumed the test/application scenarios are well represented in the training images. For example, it is assumed that the images represent everyday scanned document images, nearly upright, in binary, at roughly 300 dpi, and that machine printed text is laid out horizontally.
In implementations, such as a software program operated on a document editing device, the above assumptions are considered to hold. Nevertheless, it is considered the systems and methods of the present application will continue to work if they do not hold.
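The pixel-level evaluation metric of item iv can be computed directly from two label images. The sketch below is a minimal illustration, assuming label images are stored as equal-sized lists of integer rows and that label 0 marks background (white) pixels; that encoding is chosen here for illustration only. It returns the fraction of groundtruth foreground pixels whose predicted label matches.

```python
from typing import List

BACKGROUND = 0  # assumed label for white (paper) pixels

LabelImage = List[List[int]]


def pixel_accuracy(predicted: LabelImage, groundtruth: LabelImage) -> float:
    """Fraction of groundtruth foreground pixels whose predicted label matches."""
    if len(predicted) != len(groundtruth) or any(
        len(p) != len(g) for p, g in zip(predicted, groundtruth)
    ):
        raise ValueError("label images must have the same shape")
    foreground = correct = 0
    for p_row, g_row in zip(predicted, groundtruth):
        for p, g in zip(p_row, g_row):
            if g == BACKGROUND:
                continue  # marking labels apply to foreground pixels only
            foreground += 1
            if p == g:
                correct += 1
            # a mismatching label counts as one error at this pixel
    return correct / foreground if foreground else 1.0


groundtruth = [[0, 1, 1], [2, 2, 0]]
predicted = [[0, 1, 2], [2, 2, 0]]
print(pixel_accuracy(predicted, groundtruth))  # 0.75
```

Because every foreground pixel counts once, misclassifying a large fragment lowers the score more than misclassifying a small one, which is the property item iv relies on.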
The present methods and systems have been designed to be fairly general and extensible; therefore, the target marking categories defined below may be altered depending upon the particular implementation. However, for the present discussion the following target markings and their order of disambiguation priority (higher (i) to lower (v)) are used:
    • i. Handwritten: This consists of HandwrittenText (handwritten paragraphs, words, single letters, or even just punctuation marks), HandwrittenSignatures, and HandwrittenGraphics (underlines, arrows, line markings, strikethroughs, check marks in check boxes). This text may be handprinted or cursive, and in any language or script. Cursive font printed text is considered MachinePrintText.
    • ii. MachinePrintText: Black on white text that is machine printed in any language or script. Shaded text, or black background for white text should be marked as MachinePrintGraphic.
    • iii. MachinePrintGraphic: MachinePrintLineGraphic (underlines, arrows, background rules, lineart), or MachinePrintBlockGraphic (bullets, logos, photos).
    • iv. ScannerNoiseSaltPepper: Small granular noise usually due to paper texture, or faulty binarization.
    • v. ScannerNoiseDarkRegion: This is meant to cover significant black regions not generated by machine or handwriting ink. This includes black pixels generated by darkening of background-material, e.g. paper folds, shadows, holes, etc.
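The disambiguation rule of item iii, combined with the priority order above, amounts to taking the highest-priority label among the markings that cover a pixel. A minimal sketch, assuming the enum names and integer priority values below (they are illustrative, not taken from the source) and collapsing each category into its top-level class:

```python
from enum import IntEnum
from typing import Iterable


class Marking(IntEnum):
    """Top-level marking categories; a larger value means higher priority."""
    SCANNER_NOISE_DARK_REGION = 1   # (v) lowest priority
    SCANNER_NOISE_SALT_PEPPER = 2   # (iv)
    MACHINE_PRINT_GRAPHIC = 3       # (iii)
    MACHINE_PRINT_TEXT = 4          # (ii)
    HANDWRITTEN = 5                 # (i) highest priority


def resolve(overlapping: Iterable[Marking]) -> Marking:
    """Pick the single label for a foreground pixel covered by several markings."""
    labels = list(overlapping)
    if not labels:
        raise ValueError("a foreground pixel must carry at least one marking")
    return max(labels)


# A handwritten stroke crossing a printed rule line: the shared pixel is handwriting.
print(resolve([Marking.MACHINE_PRINT_GRAPHIC, Marking.HANDWRITTEN]).name)  # HANDWRITTEN
```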
1. Solution Architecture
Depicted in FIG. 3 is a system 300 in which the concepts of the present application may be implemented. System 300 illustrates various channels by which digitized bitmapped images and/or images formed by digital ink techniques are provided to segmenter-classifier systems of the present application.
More particularly, a hardcopy of a document carrying images 302 is input to a scanner 304 which converts or transforms the images of document 302 into an electronic document of the images 306. While not being limited thereto, the images on hardcopy document 302 may be created by electronic data processing devices, by pens, pencils, or other non-electronic materials, or by stamps both electronic and manual. The electronic document 306 is displayed on a screen 308 of a computer, personal digital system or other electronic device 310, which includes a segmenter-classifier system 312 of the present application. The electronic device 310 includes at least one processor and sufficient electronic memory storage to operate the segmenter-classifier system 312, which in one embodiment may be software. It is understood the electronic device 310 includes input/output devices including but not limited to a mouse and/or keyboard.
Alternatively, a whiteboard or digital ink device 314 may be coupled to electronic device 310, whereby bitmapped or digital ink images 316 are electronically transmitted to device 310. Another channel by which bitmapped or digital ink images may be provided to the segmenter-classifier system 312 is through use of another electronic device 318. This device can be any of a number of systems, including but not limited to a computer, a computerized CAD system, an electronic tablet, personal digital assistant (PDA), a server on the Internet which delivers web pages, or any other system which provides bitmapped and/or digital ink images 320 to segmenter-classifier system 312. Further, image generation software, loaded on electronic device 310, can be used to generate a bitmapped or digital ink image for use by segmenter-classifier system 312. A finalized version of the electronic document with images processed by the segmenter-classifier system 312 is stored in the memory storage of the computer system 310, sent to another electronic device 318, printed out in hardcopy form by a printer 322 or printed out from printing capabilities associated with converter/scanner 304.
It is to be appreciated that while the foregoing discussion explicitly states a variety of channels to generate the images, concepts of the present application will also work with images on documents obtained through other channels as well.
With continuing attention to FIG. 3, segmenter-classifier system 312 includes segmenter 312a and classifier 312b. The segmenter 312a takes in a document image and partitions the set of pixels into small fragments. The classifier 312b then takes each fragment and assigns a category label to that fragment. The classifier 312b returns scores corresponding to different categories of markings and, in one embodiment, the category with the best score. A downstream application such as an interpreter 324 may further interpret the scores in order to make decisions. For example, fragments whose scores do not satisfy an acceptance criterion may be labeled as “reject” or “unknown”, or fragments that have handwriting scores above a preset threshold may be highlighted or marked for annotation processing on a processed electronic document displayed on display 308.
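A downstream interpreter such as interpreter 324 might apply the two example decisions above to the per-category scores of a single fragment. This sketch assumes that higher scores are better, that scores are keyed by category name (the key "Handwritten" is an assumption), and that both thresholds are hypothetical values to be tuned for a given application.

```python
from typing import Dict, Tuple


def interpret(scores: Dict[str, float],
              accept_threshold: float = 0.5,
              handwriting_threshold: float = 0.8) -> Tuple[str, bool]:
    """Return (label, highlight) for one fragment.

    label is the best-scoring category, or "unknown" when that score fails the
    acceptance criterion; highlight is True when the handwriting score exceeds
    handwriting_threshold, flagging the fragment for annotation processing.
    """
    if not scores:
        raise ValueError("no category scores supplied")
    best = max(scores, key=scores.get)
    label = best if scores[best] >= accept_threshold else "unknown"
    highlight = scores.get("Handwritten", 0.0) > handwriting_threshold
    return label, highlight


print(interpret({"Handwritten": 0.9, "MachinePrintText": 0.1}))  # ('Handwritten', True)
print(interpret({"Handwritten": 0.3, "MachinePrintText": 0.4}))  # ('unknown', False)
```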
2. Segmenter
In the present application, classifying or scoring each individual pixel according to its type of marking, particularly when pixels are either black or white, is accomplished by considering spatial neighborhoods and other forms of context of the document. Pixels may be classified based on feature measurements made on the neighborhood. This opens up interesting possibilities, in particular formulations in which segmentation and recognition proceed in lock-step, informing each other.
An approach of the present application is to fragment the images into chunks of pixels that can be assumed to come from the same source of markings. These fragments are then classified as a whole. Because segmenter 312a of the segmenter-classifier 312 makes hard decisions, any errors made by the segmenter are likely to cause errors in the end result. Two kinds of segmenter errors are considered: (a) creating fragments that are clearly a combination of different marking types, and (b) unnecessarily carving out fragments from regions that are of the same marking type.
While errors of type (a) are bound to result in pixel-level labeling errors, the effect of type (b) errors is more subtle: each unnecessary split deprives the classifier of surrounding context, and the more surrounding context that can be gathered, the better the results. It has been determined herein that distinguishing handwritten regions from machine printed regions is easier than telling handwritten characters from machine printed characters, and it becomes even more difficult at the stroke level. Further problems arise when artificial boundaries introduced by the segmenter 312a mask the true appearance of a marking.
Despite the above concerns, a “segment-then-classify” approach has been adopted. The present approach acts to over-segment rather than under-segment by relying on connected component analysis, but with decision processing to split selected connected components when necessary.
Turning to FIG. 4, illustrated is a process 400 by which the operations of the segmenter take place to accomplish segmenting of the image. In this embodiment, the concept of finding alignments that come from, for example, machine-printed text is illustrated in operation together with the finding of horizontal and vertical lines, as well as the use of a combined mask to find other fragments. It is, however, to be understood that while the following discusses these concepts together, the finding of aligned text may be used without the other detection concepts. In operation, an electronic image 402 is investigated to identify aligned text 404 and, at the same time, separately to identify horizontal and vertical lines of the image 406. Once the aligned text is identified, this information is used as an aligned text mask 408. Similarly, the identified horizontal and vertical lines are used as a graphic lines mask 410. The aligned text mask 408 and the graphic lines mask 410 govern how the process breaks the image into pieces corresponding to aligned text fragments 412 and pieces corresponding to graphic line fragments 414. Also in process 400, the aligned text mask 408 and graphic lines mask 410 are combined and applied to the image 416, and this combination governs how the process breaks the image into pieces corresponding to remainder fragments of the image 418, where in one embodiment remainder fragments are pieces of the image that fall outside the bounds of lines of machine printed or neatly written text.
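The mask logic of process 400 can be illustrated with bitmaps represented as sets of black-pixel coordinates, so that a bitwise AND becomes set intersection and the “AndNot” operation of FIG. 15 becomes set difference. This is a minimal sketch under stated assumptions: the function names are hypothetical, connectivity is taken to be 8-connected, and pixels covered by both masks are assigned to the graphic lines, a precedence chosen here only so that each pixel lands in exactly one fragment.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) of a black pixel


def connected_components(pixels: Set[Pixel]) -> List[FrozenSet[Pixel]]:
    """Group black pixels into 8-connected components (breadth-first search)."""
    remaining = set(pixels)
    components = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in remaining:
                        remaining.remove(nb)
                        component.add(nb)
                        queue.append(nb)
        components.append(frozenset(component))
    return components


def fragment(image: Set[Pixel], text_mask: Set[Pixel],
             lines_mask: Set[Pixel]) -> Dict[str, List[FrozenSet[Pixel]]]:
    """Break the image into graphic line, aligned text and remainder fragments."""
    line_pixels = image & lines_mask                  # AND with graphic lines mask
    text_pixels = (image & text_mask) - lines_mask    # assumption: lines take precedence
    remainder = image - (text_mask | lines_mask)      # AndNot with the combined mask
    return {
        "graphic_lines": connected_components(line_pixels),
        "aligned_text": connected_components(text_pixels),
        "remainder": connected_components(remainder),
    }


# A vertical stroke (column 5, rows 0-9) crossing a text-line band (rows 3-6).
stroke = {(r, 5) for r in range(10)}
text_band = {(r, c) for r in range(3, 7) for c in range(21)}
parts = fragment(stroke, text_band, lines_mask=set())
print({name: len(frags) for name, frags in parts.items()})
# {'graphic_lines': 0, 'aligned_text': 1, 'remainder': 2}
```

The stroke is a single connected component in the original image, but the masks split it into one aligned text fragment and two remainder fragments, which is the kind of splitting that lets handwriting touching printed text be separated.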
As an example of the obtained masks, attention is directed to FIG. 5 which shows the graphic lines mask 500 of the image of FIG. 1, containing only horizontal and vertical line graphics. It is understood the processes including the aligned text mask and the combined mask will similarly have only those portions of the image corresponding to the mask contents.
As will be discussed in more detail below, fragments generated by process 400 are used in further operations designed to classify markings on a document image. However, it is mentioned here that another aspect of the operations of FIG. 4 (whose operations are expanded on by the processes of FIGS. 8 and 15) is that it produces features that are useful in and of itself for classifying fragments. Specifically, the alignment processes disclosed herein, is able to find fragments that align to a high degree or “very well” and may themselves be determined more likely to be machine printed markings, while fragments that determined by the processes found to align only “moderately well” are more likely to come from handwritten text. It is to be understood that when stated that text aligns to a “high degree”, “very well” or only “moderately well”, such terms correspond to threshold values which are determined for particular implementations. More specifically, in some implementations, the degree to which alignment must be found may be higher or lower, depending on the uses of the present concepts. The thresholding values can therefore be built into the system based on those particular requirements and the system may in fact be tunable for these factors while in operation.
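One way to turn “very well” and “moderately well” into threshold values is to fit a line to a group of aligned extrema and measure how tightly the points sit on it. The sketch below assumes a least-squares fit and an RMS vertical residual in pixels; both the measure and the default thresholds (0.75 and 3.0 pixels) are hypothetical placeholders to be tuned per implementation, as the text notes.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position of a contour extremum in pixels


def alignment_residual(points: List[Point]) -> float:
    """RMS vertical distance of the points from their least-squares line."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return math.sqrt(sum((y - (intercept + slope * x)) ** 2 for x, y in points) / n)


def alignment_quality(points: List[Point], tight: float = 0.75, loose: float = 3.0) -> str:
    """Map the residual onto the qualitative alignment levels used in the text."""
    residual = alignment_residual(points)
    if residual <= tight:
        return "very well"        # more likely machine printed
    if residual <= loose:
        return "moderately well"  # more likely handwritten
    return "poorly"


level = [(0, 10), (10, 10), (20, 10), (30, 10)]
wobbly = [(0, 10), (10, 12), (20, 8), (30, 11)]
print(alignment_quality(level), "/", alignment_quality(wobbly))  # very well / moderately well
```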
It is noted that various known processes may be used to detect aligned text and horizontal and vertical graphics. A particular process of the present application for determining aligned text will be discussed in connection with FIG. 8.
Turning to FIG. 6, shown is an image 600 generated by process 400, where bounding boxes 602 surround aligned text. These bounding boxes 602 (not all bounding boxes are numbered) define aligned text and identify connected components that extend outside of aligned text regions.
Turning now to FIG. 7, illustrated is a close-up of one portion 700 of the image of FIG. 6. Exemplary bounding boxes 702a-702n are identified which enclose fragments found by process 400; for clarity, not all of the bounding boxes are numerically identified. Of particular note in figure portion 700 is the situation where handwritten numbers cross the machine-printed text, i.e., the “3” crosses the “PA” of “REPAIR ORDER NUMBER”. In this instance, the connected component comprising the “3” and the touching word “REPAIR” is broken into smaller pieces. Some of these pieces, such as the “PA” fragment, lie inside the aligned text region.
A particular aspect of the present disclosure is the processes used to find lines of aligned text. These include highly aligned text, e.g., “REPAIR ORDER NUMBER”, and moderately well aligned text, e.g., the handwritten “370147” in FIG. 7. Identifying this material is based on grouping extrema of contour paths, which may be achieved in accordance with process 800 of FIG. 8.
Initially, connected components (CCs) 802 are provided. Upper and lower extrema 804a, 804b are detected. The results of this process are shown in FIG. 9, which identifies contours from the sample image segment 900. Upper contour extrema are identified by points 902a, 902b, 902c, and lower contour extrema by points 904a, 904b, 904c.
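Detecting the upper and lower contour extrema (steps 804a, 804b) might be sketched as follows. The sketch assumes each connected component's outer contour has already been traced into an ordered list of (x, y) points in image coordinates, with y increasing downward. The run-collapsing treatment of flat stretches is an added assumption, so that pixel staircases do not produce spurious extrema. The function name is hypothetical.

```python
def contour_extrema(contour):
    """Find upper and lower local extrema of a closed contour.

    contour: (x, y) points in traversal order. Image coordinates are assumed,
    so y grows downward and an *upper* extremum is a local minimum of y.
    Returns (upper, lower) lists of points.
    """
    # Collapse consecutive points with equal y into runs, so that flat
    # stretches and staircase steps are judged by their neighbouring runs.
    runs = []  # list of (y, [points])
    for x, y in contour:
        if runs and runs[-1][0] == y:
            runs[-1][1].append((x, y))
        else:
            runs.append((y, [(x, y)]))
    # On a closed contour the last run may continue into the first one.
    if len(runs) > 1 and runs[0][0] == runs[-1][0]:
        y, pts = runs.pop()
        runs[0] = (y, pts + runs[0][1])

    upper, lower = [], []
    m = len(runs)
    for i, (y, pts) in enumerate(runs):
        y_prev = runs[i - 1][0]
        y_next = runs[(i + 1) % m][0]
        rep = pts[len(pts) // 2]  # the middle point of a run represents it
        if y < y_prev and y < y_next:
            upper.append(rep)
        elif y > y_prev and y > y_next:
            lower.append(rep)
    return upper, lower
```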
Returning to FIG. 8, upper and lower contour extrema are processed independently to find groups of points (806a, 806b). This is accomplished by first finding horizontal strips of extrema points, clustering by vertical position and selecting points that are within a threshold distance of a selected maximum. The process is performed iteratively, as points are grouped and removed from the pool of points identified in steps 804a and 804b. Each such strip of points is segregated according to any large horizontal gaps. Finally, extrema points are clustered into horizontal or nearly horizontal groups of points that align with each other, in one embodiment using the well-known RANSAC (Random Sample Consensus) algorithm (808a, 808b). FIG. 10 shows the groups of extrema points (902a′, 904a′, 902b′, 904b′ and 904c′) found by this operation. As is seen when comparing FIG. 9 to FIG. 10, the RANSAC operation removes points determined to be outliers. In fact, all of the upper extrema points 902c are removed, along with a number of points within the remaining groups.
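The grouping of steps 806a and 806b and the RANSAC clustering of steps 808a and 808b could be sketched as below. Several elements are assumptions:

- "A selected maximum" is read as the peak of a 1-pixel histogram of vertical positions.
- The parameter values (y_tol, max_gap, inlier_tol, max_slope, iters) and the function names are hypothetical.

The RANSAC shown is the generic two-point line-hypothesis form, applied separately to each strip.

```python
import random
from collections import Counter


def horizontal_strips(points, y_tol=3.0, max_gap=40.0):
    """Group extrema points into horizontal strips, splitting at large x gaps.

    Repeatedly takes the most populated vertical position (1-pixel bins),
    removes every point within y_tol of it from the pool, and splits that
    strip wherever neighbouring x positions differ by more than max_gap.
    y_tol must be at least 0.5 so every strip contains its peak point.
    """
    pool = list(points)
    groups = []
    while pool:
        peak_y = Counter(round(y) for _, y in pool).most_common(1)[0][0]
        strip = sorted(p for p in pool if abs(p[1] - peak_y) <= y_tol)
        pool = [p for p in pool if abs(p[1] - peak_y) > y_tol]
        current = [strip[0]]
        for prev, p in zip(strip, strip[1:]):
            if p[0] - prev[0] > max_gap:
                groups.append(current)
                current = []
            current.append(p)
        groups.append(current)
    return groups


def ransac_line(points, iters=200, inlier_tol=1.5, max_slope=0.2, seed=0):
    """Robustly fit a near-horizontal line y = a*x + b with RANSAC.

    Returns (a, b, inliers); points far from the best line are outliers.
    """
    rng = random.Random(seed)
    best = (0.0, 0.0, [])
    if len(points) < 2:
        return best
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue
        a = (y2 - y1) / (x2 - x1)
        if abs(a) > max_slope:  # only near-horizontal hypotheses
            continue
        b = y1 - a * x1
        inliers = [p for p in points if abs(p[1] - (a * p[0] + b)) <= inlier_tol]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best
```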
Returning to FIG. 8, in steps 812a and 812b, line segments are fit to the point groups (e.g., in FIG. 11, lines 1100a and 1100b at the top of “12249Y”, lines 1100b and 1100c at the top and bottom of “REPAIR ORDER NUMBER”, and line 1100d at the bottom of “370147”). Line segments whose orientation differs from horizontal by more than a predetermined threshold are removed (e.g., line segments 1100b and 1100c are maintained, while line segments 1100a and 1100d are removed, as will be shown below).
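Steps 812a and 812b fit line segments to the point groups and discard those too far from horizontal. Below is a minimal sketch. It assumes an ordinary least-squares fit of y on x and a hypothetical 5° orientation threshold, since the patent leaves the threshold unspecified. The function names are hypothetical.

```python
import math


def fit_segment(points):
    """Least-squares fit of y = a*x + b to a point group.

    Returns the fitted segment ((x0, y0), (x1, y1)) spanning the group's
    x-extent, or None when all points share one x (no slope is defined).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    b = my - a * mx
    x0 = min(x for x, _ in points)
    x1 = max(x for x, _ in points)
    return (x0, a * x0 + b), (x1, a * x1 + b)


def keep_near_horizontal(segments, max_deg=5.0):
    """Drop segments (or failed fits) whose angle from horizontal exceeds max_deg."""
    kept = []
    for seg in segments:
        if seg is None:
            continue
        (x0, y0), (x1, y1) = seg
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
        if abs(angle) <= max_deg:
            kept.append(seg)
    return kept
```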
Thereafter, in steps 814a and 814b, line segments derived from upper and lower contour extrema points are paired according to overlap and distance. These pairs form text line boxes (step 818), as previously shown in FIG. 6. It is noted that final text bounding boxes are expanded by the extent of ascenders and descenders and by the left and right extent of the connected components that contributed extrema points. In some cases, therefore, the text line bounding boxes will be somewhat larger than this example produces.
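The pairing of steps 814a and 814b might look like the greedy sketch below. The specific criteria are assumptions for illustration: a minimum horizontal overlap of half the shorter segment, and a maximum line height of 60 pixels. The function names are also hypothetical. The final expansion by ascenders, descenders and contributing connected components is not shown.

```python
def overlap(a0, a1, b0, b1):
    """Length of the overlap of intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def pair_segments(uppers, lowers, min_overlap=0.5, max_height=60.0):
    """Greedily pair upper and lower segments into text-line boxes.

    uppers, lowers: (x0, x1, y) near-horizontal segments, y growing downward.
    A lower segment qualifies if it lies 0 < dy <= max_height below the upper
    one and the two overlap horizontally by at least min_overlap of the
    shorter segment; the nearest qualifying, still unused one is chosen.
    Returns boxes (x0, top, x1, bottom).
    """
    boxes, used = [], set()
    for ux0, ux1, uy in uppers:
        best, best_dist = None, None
        for j, (lx0, lx1, ly) in enumerate(lowers):
            dist = ly - uy
            if j in used or not 0 < dist <= max_height:
                continue
            shorter = min(ux1 - ux0, lx1 - lx0)
            if shorter <= 0 or overlap(ux0, ux1, lx0, lx1) < min_overlap * shorter:
                continue
            if best_dist is None or dist < best_dist:
                best, best_dist = j, dist
        if best is not None:
            used.add(best)
            lx0, lx1, ly = lowers[best]
            boxes.append((min(ux0, lx0), uy, max(ux1, lx1), ly))
    return boxes
```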
3. Fragment Classifier
As discussed above, segmenter 312a generates, from an image, a list of fragments. Each fragment is characterized by a number of feature measurements that are computed on the fragment and surrounding context. The classifier of the present application is trained to classify each fragment into one of the categories of the described marking types, on the basis of these features.
3.1 Features
Various kinds of features, in addition to the described text line feature, are measured on each fragment; a sampling of these other features includes:
    • i. Segmenter features: These are simply two features that are either zero or one, indicating whether the fragment was part of the horizontal lines image and the vertical lines image, respectively.
    • ii. Size features: These include the width, height, and aspect ratio of the bounding box, the size of the perimeter, the number of holes in the connected component, and the number of foreground pixels. Also included are the number of spine fragments resulting from mid-crack thinning, the ratio of foreground count to bounding box area, the ratio of foreground count to perimeter size, and the ratio of perimeter size to bounding box area.
    • iii. Location features: The minimum horizontal and vertical distances of the fragment from the image boundary are measured. The idea is that this can help to tell shadow noise from dark graphic regions in the document.
    • iv. Regularity features: These are mainly targeted at distinguishing machine-printed text from other kinds of markings. Printed text shows a high degree of regularity of alignment and of size. If many other fragments in the document or within a spatial neighborhood share the same height, bounding box top y, and bounding box bottom y, this indicates that the current fragment is likely to be printed text or graphics. Such regularity is more accidental for handwriting or noise. The feature measurements are made as histograms of relative differences. For example, for measuring height regularity, histograms of (h_i − h_0) are computed, where h_i is the height of the ith fragment and h_0 is the height of this fragment. The histogram bins are set to [−32, −16), . . . , [−4, −2), [−2, −1), [−1, 0), [0, 0], (0, 1], (1, 2], (2, 4], . . . , (16, 32].
      • Thus, for printed text, the three middle bins would be expected to have high counts. The height histograms consider all fragments in the image, while the bounding box extremity histograms consider only fragments in an x-neighborhood.
    • v. Edge Curvature features: For each fragment this attempts to characterize the curvature of the outer contour through quick measurements. The curvature index at a contour point is measured as the Euclidean distance of that point from the straight line joining two other contour points that are a fixed contour distance (16 contour points) away from this point. The histogram of all curvature indices measured over a fragment's outer contour is computed and used.
    • vi. Contour features: These consist of two measurements. Traversing counter-clockwise around the outer contour of a fragment, the histogram of the displacement between two contour points separated by four contour positions is recorded. From this are derived the histogram of unsigned edge displacements (where two opposite displacements are added together) and the histogram of symmetry violations (where two opposite displacements cancel each other). Higher histogram strength around the vertical and horizontal directions is expected for printed lines and printed text, and low symmetry-violation values are expected for uniform strokes.
    • vii. Run length features: Spines of fragments are computed by a mid-crack thinning algorithm, such as that disclosed in U.S. Pat. No. 6,377,710, to Saund, entitled “Method And Apparatus For Extracting The Skeleton Of A Binary Figure By Contour-Based Erosion”, 2002, incorporated herein by reference in its entirety. At each point on the spine, the minimum and maximum of the horizontal and vertical run lengths are recorded. The histograms of these two numbers are returned as run-length features. Printed parts are expected to have more concentrated run-length histograms than handwriting or noise, but the concentration need not be unimodal. The raw run-length histograms are used as features, on the assumption that the classifier trainer will be able to pick out differences among the histograms for different categories.
    • viii. Edge-turn histogram features: These were found to be useful, but have been superseded by contour features, and edge curvature features.
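Two of the feature families above can be sketched compactly:

- the 13-bin regularity histogram of (h_i − h_0), using the bin edges listed for the regularity features;
- the edge-curvature index, using the 16-point contour offset given above.

Two choices are assumptions: differences beyond ±32 are ignored, and raw curvature indices are returned rather than a binned histogram, because the bins are not specified. The function names are hypothetical.

```python
import math

# Bin edges from the text: [-32,-16), ..., [-1,0), [0,0], (0,1], ..., (16,32].
NEG_EDGES = [-32, -16, -8, -4, -2, -1, 0]
POS_EDGES = [0, 1, 2, 4, 8, 16, 32]


def regularity_histogram(h0, others):
    """13-bin histogram of d = h_i - h_0 over the other fragments' values h_i.

    Differences outside [-32, 32] are ignored (an assumption). Bins 5, 6, 7
    are [-1, 0), [0, 0] and (0, 1], i.e. the 'three middle bins'.
    """
    hist = [0] * 13
    for hi in others:
        d = hi - h0
        if d == 0:
            hist[6] += 1
        elif -32 <= d < 0:
            k = next(k for k in range(6) if NEG_EDGES[k] <= d < NEG_EDGES[k + 1])
            hist[k] += 1
        elif 0 < d <= 32:
            k = next(k for k in range(6) if POS_EDGES[k] < d <= POS_EDGES[k + 1])
            hist[7 + k] += 1
    return hist


def curvature_indices(contour, k=16):
    """Curvature index at every point of a closed contour.

    The index at point i is the perpendicular distance from that point to
    the straight line through the points k positions before and after it.
    """
    n = len(contour)
    out = []
    for i in range(n):
        (ax, ay), (px, py), (bx, by) = contour[(i - k) % n], contour[i], contour[(i + k) % n]
        chord = math.hypot(bx - ax, by - ay)
        if chord == 0:  # degenerate chord: fall back to point-to-point distance
            out.append(math.hypot(px - ax, py - ay))
        else:
            cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
            out.append(abs(cross) / chord)
    return out
```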
3.2 Classifier
The classification of fragments according to marking type takes place in two stages, as illustrated in FIGS. 12A-12D and 13. As more particularly shown in FIGS. 12A-12D, a two-stage classifier 1200 includes, in its first stage 1202, a plurality of first stage classifiers 1202a, 1202b . . . 1202n. In the first stage, each fragment is classified solely on the basis of the features described above in Section 3.1. This results in each fragment having a per-category score. So, in the case of FIGS. 12A-12D, image fragments 1204a are supplied to particular feature vectors 1206a (as will be shown in more detail in FIG. 13). If the classifier 1200 stopped here, the category with the highest score could be assigned to each fragment.
But, as can be seen by the use of classifiers 1202a and 1202n, in embodiments of the present application the classification is refined by taking into consideration the surrounding context and how the spatial neighbors are classified. Neighborhood fragments 1204b . . . 1204n are provided to corresponding feature vectors 1206b . . . 1206n. The results of these operations, in the form of category scores 1208a and accumulated category scores 1208b . . . 1208n, are supplied along with the feature vector 1206a to an augmented feature vector 1210. This augmented vector is used by the second stage classifier 1212 of the two-stage classifier 1200. The second stage classifier provides the refined output by reclassifying the image fragments 1204a, taking into consideration all the features used in the first stage 1202a and also the likely category (secondary feature) labels of neighborhood fragments 1204b . . . 1204n. The output of the second stage classifier 1212 is the final category scores 1214. The final category score from classifier 1212 is then used by the systems and methods of the present application to apply a label (such as a color, a grey tone, or other marking or indicator) to the segment of the image by a labeling module 650. In one embodiment, the labeling module is understood to be the appropriate components of the system described in FIG. 3.
The discussed secondary features are named and measured as accumulations of first-stage category-scores of all fragments with bounding boxes contained in the following spatial neighborhoods of the fragment's bounding box:
    • i. Horizontal Strip: Within ±16 pixels of fragment in y, and ±160 pixels of the fragment in x.
    • ii. Vertical Strip: Within ±16 pixels of fragment in x, and ±160 pixels of the fragment in y.
    • iii. Rectangular Neighborhood: Within ±160 pixels of fragment in both x and y directions.
The neighborhood sizes are fairly arbitrary, except that in certain embodiments they are chosen to be less than one character height (e.g., 16 pixels) and several character heights (e.g., 160 pixels), respectively, based on 300 dpi, 12 point font. They can be adjusted according to application context, e.g., scan resolution. Thus the present methods and systems are tunable to particular implementations.
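The three neighborhoods and the accumulation of first-stage scores into secondary features can be sketched as follows. The arithmetic behind the chosen sizes: the nominal 12-point body size at 300 dpi spans 12/72 × 300 = 50 pixels. So ±16 pixels is about a third of a character height, and ±160 pixels is about three character heights. Two readings are assumptions: "contained in" means full containment of the neighbor's bounding box in the fragment's box grown by (dx, dy), and the fragment itself is excluded. All names are hypothetical.

```python
# Nominal 12-point body size at 300 dpi: 12/72 inch * 300 px/inch = 50 px.
CHAR_HEIGHT_PX = 12 * 300 / 72

# (dx, dy) half-extents of the three neighbourhoods, in pixels.
NEIGHBORHOODS = {
    "horizontal_strip": (160, 16),
    "vertical_strip": (16, 160),
    "rectangular": (160, 160),
}


def secondary_features(boxes, scores):
    """Accumulate first-stage category scores over each neighbourhood.

    boxes[i] = (x0, y0, x1, y1) bounding box of fragment i;
    scores[i] = its first-stage per-category scores.
    A neighbour counts if its whole box lies inside fragment i's box grown
    by (dx, dy); fragment i itself is excluded (an assumption).
    """
    n_cat = len(scores[0])
    feats = []
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        row = []
        for dx, dy in NEIGHBORHOODS.values():
            acc = [0.0] * n_cat
            for j, (a0, b0, a1, b1) in enumerate(boxes):
                inside = (a0 >= x0 - dx and a1 <= x1 + dx
                          and b0 >= y0 - dy and b1 <= y1 + dy)
                if j != i and inside:
                    acc = [s + t for s, t in zip(acc, scores[j])]
            row.extend(acc)
        feats.append(row)
    return feats


def augment(first_stage_features, secondary):
    """Augmented feature vectors for the second-stage classifier."""
    return [list(f) + s for f, s in zip(first_stage_features, secondary)]
```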
It is mentioned that there is a subtle but important difference of purpose between the secondary features and those first-stage features that also consider neighborhood content (e.g., regularity features). The secondary features establish a relationship among category labels of neighborhood fragments, while the first-stage features measure relationships among fragments and their observable properties. Consider, for example, the regularity features. The height-regularity feature measures how frequent the fragment height is in the neighborhood. This takes into account the other fragments in the neighborhood, but not what the likely categories of these fragments are. Thus, if s_i represents the ith fragment, u_i are the features measured on that fragment, and c_i is that fragment's category, then the classifier trained on the first stage features establishes:
$$p(c_i \mid u_j;\ j \in \mathrm{neighborhood}(i))$$
In contrast, the secondary features enable a dependency of the form:
$$p(c_i \mid c_j;\ j \in \mathrm{neighborhood}(i))$$
Thus the secondary features address the issue of inter-label dependence.
Zheng et al. constructed a Markov Random Field to address this issue. The present approach is different. Here a neighborhood for each node (fragment) is defined, and the fragment label is allowed to depend on the neighborhood labels. The pattern of dependence is guided by the choice of neighborhoods, but a preconceived form of dependence is not enforced. Rather the dependence, if significant, is learned from training data; the neighborhood features are made available to the second stage classifier learner and are selected if they are found useful for classification. Further, this formulation sidesteps loopy message propagation or iterative sampling inference which may have compute-time and convergence problems.
The two-stage classifier is constructed using the basic classifier explained in FIG. 13. In the first stage 1202a . . . 1202n, the basic classifier is applied to categorize fragments based on the features described above in Section 3.1. The results of categorization are aggregated for the whole image into the secondary features 1208a . . . 1208n. These secondary features and the initial features (1206a) are together used by another basic classifier (i.e., second stage classifier 1212) in the second stage to arrive at the final category scores.
3.3 The Basic Classifier
In one embodiment, the basic classifier used in each stage is a collection of one-vs.-all classifiers, one per category. This classifier type takes as input a vector of features and produces an array of scores, one per category. This output array is then used to pick the best scoring category, or to apply various rejection/acceptance thresholds.
With continuing attention to FIG. 13, classifier 1300 may be understood as the type of classifier employed for each of the classifiers of FIGS. 12A-12D. In this embodiment, classifier 1300 is a one-vs.-all classifier implemented as a weighted sum of weak classifiers. Each weak classifier is a single threshold test on one of the scalar features measured on a fragment (e.g., one dimension of the feature vector). More particularly, an image fragment 1302 is supplied to each of the feature vectors 1304a . . . 1304n. Output from these vectors is passed to a multi-dimensional score vector (e.g., a 5-dimensional score vector) 1306. This output is then passed to a score regularizer 1308, which provides its output to a multi-dimensional regularized score vector (e.g., a 5-dimensional regularized score vector) 1310.
This design permits extremely fast classification. For example, a classifier combining 50 weak classifiers requires only about 50 comparisons, multiplications, and additions for each fragment.
Each weak classifier produces a number that is either +1 or −1, indicating the result of the comparison test. The weighted sum of these (with the weights normalized to sum to one) is then a number between +1 and −1, nominally indicating positive classification if the result is positive. The output of the basic classifier is then an array of numbers, one per category. A positive result nominally indicates a good match to the corresponding category. Typically, but not always, only one of these numbers will be positive. When more than one number is positive, the fragment may be rejected as un-assignable, or the system may be designed to pick the highest scorer. Similarly, it may be necessary to arbitrate when no category returns a positive score to claim a fragment. One strategy is to feed the category-score vector to another classifier, which then produces refined category scores. This is especially useful if this second stage classifier can also be learned automatically from data. This second classifier stage, which some embodiments adopt, may be thought of as a score regularizer.
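A basic one-vs.-all classifier built from weighted threshold tests, together with the decision rule described above, could be sketched as follows. Several elements are assumptions for illustration: normalizing by the total weight so that scores fall in [−1, +1], the Stump class, and the function names. In the described system, the no-positive and multiple-positive cases may instead be handled by the learned score regularizer.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """A weak classifier: one threshold test on one scalar feature."""
    feature: int      # index into the feature vector
    threshold: float
    polarity: int     # +1: vote +1 when x[feature] > threshold; -1: reversed
    weight: float     # non-negative weight in the weighted sum

    def vote(self, x):
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


def one_vs_all_score(stumps, x):
    """Weighted vote of +1/-1 stumps divided by the total weight, so in [-1, 1]."""
    total = sum(s.weight for s in stumps)
    return sum(s.weight * s.vote(x) for s in stumps) / total


def classify(per_category, x, pick_highest=True):
    """Score every category; return (label or None, scores).

    Exactly one positive score -> that category. Several positive -> the
    highest if pick_highest, else reject. None positive -> None, leaving
    arbitration to a later stage such as a score regularizer.
    """
    scores = {cat: one_vs_all_score(st, x) for cat, st in per_category.items()}
    positive = [c for c, v in scores.items() if v > 0]
    if len(positive) == 1:
        return positive[0], scores
    if len(positive) > 1 and pick_highest:
        return max(positive, key=scores.get), scores
    return None, scores
```

Evaluating one fragment costs one comparison, one multiplication and one addition per stump, which is the source of the speed noted above.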
Thus the basic classifier itself may be thought of as a two-stage classifier, with a number of one-vs.-all classifiers feeding into a score regularizer. This is not to be confused with the larger two-stage approach, in which neighborhood information is integrated at the second stage. In fact, as previously mentioned, the two-stage classifier is implemented using this same basic classifier structure, but with different parameters, because the second stage classifier works on an augmented feature vector. Therefore, preliminary category assignments are revised based on statistics of category assignments made to neighboring fragments.
As depicted in FIG. 14, the basic one-vs.-all classifiers and the score regularizer 1400 may in one embodiment be trained using the machine learning algorithm Adaptive Boosting (i.e., AdaBoost). In FIG. 14, a feature vector 1402 is provided to scalar feature selectors 1404a . . . 1404n, which provide their output to weak scalar classifiers 1406a . . . 1406n. Their outputs are summed at summer 1408 and scored 1410 to obtain a binary decision 1412. In operation, the weak learner considers one feature dimension of vector 1402 at a time. It finds the threshold test (scalar feature selectors 1404) that minimizes the weighted error on the training data (weak scalar classifiers 1406). The most discriminative of these feature dimensions is then picked as the next weak classifier (1408, 1410, 1412). This process is then repeated through AdaBoost iterations. By this construction, the parameters of the classifier are obtained by discriminative AdaBoost training, which selects useful features from among a family of hundreds of measurements and assigns relative weights to the features.
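Training one such one-vs.-all classifier could follow standard discrete AdaBoost with decision stumps, sketched below for a single category (labels +1 for the category, −1 otherwise). This is the textbook formulation, not code taken from the patent. It assumes that an exhaustive threshold search over the observed feature values is acceptable. The function names are hypothetical.

```python
import math


def stump_vote(x, f, t, pol):
    """+pol if feature f of x exceeds threshold t, else -pol."""
    return pol if x[f] > t else -pol


def best_stump(X, y, w):
    """Exhaustive weak learner: feature, threshold, polarity with least weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_vote(row, f, t, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost with decision stumps; y holds +1/-1 labels.

    Returns a list of (alpha, feature, threshold, polarity) weak classifiers.
    """
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, t, pol = best_stump(X, y, w)
        if err >= 0.5:  # no weak classifier better than chance
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        model.append((alpha, f, t, pol))
        if err == 0:  # training set already separated
            break
        # Raise the weight of misclassified examples, lower the rest, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_vote(row, f, t, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted stump votes."""
    score = sum(a * stump_vote(x, f, t, pol) for a, f, t, pol in model)
    return 1 if score > 0 else -1
```

Training one such model per category yields the one-vs.-all set. The alphas play the role of the relative feature weights, and features never chosen by best_stump receive no weight.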
This particular form of AdaBoost classifier learner has recently been found to be very effective in categorizing document images on a few Xerox Global Services client application data sets. A discussion of AdaBoost is set out in Freund et al., “A Decision-Theoretic Generalization Of On-Line Learning And An Application To Boosting,” in European Conference On Computational Learning Theory, pages 23-37, 1995, hereby incorporated herein by reference in its entirety.
4. Image Processing Flow Diagram
A more detailed description of the process of FIG. 4 is shown by process 1500 of FIG. 15. In this flow diagram there is interaction between image-based representations, which are good for bitwise logical operations such as masking, and symbolic representations, which are good for grouping and for applying logic and rules to geometrical properties. Similar to FIG. 4, one particular aspect of process 1500 is the creation of various image masks. Therefore, in the following discussion, the processing steps on the left-hand side of FIG. 15 are generally directed toward generating such masks, in addition to further processing. The processing steps on the right-hand side of the figure are directed to image processing operations using the masks. Process 1500 thus provides a mixture of image processing operations and symbolic operations that work on tokens, such as connected component (CC) objects.
With a more detailed look at process 1500, an original image 1502 is investigated to find areas of the image which meet a predetermined definition of large areas of dark/black material (also called herein “big black blobs” or BBBs) 1504. An “AndNot” operation 1508 is performed on the original image and the BBBs, whereby the BBB sections are removed from the original image 1510.
The original image minus the BBBs is operated on so that the connected components (CCs) in the remaining image are extracted 1512, creating an image with all CCs identified 1514. A filtering operation is performed 1516, wherein small CCs (sometimes called dust or speck CCs due to their small size) are removed, leaving the non-dust CCs 1518.
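Connected-component extraction (step 1512) and dust filtering (step 1516) might be sketched as below. The choice of 8-connectivity and the dust criterion (a bounding box no more than 2 pixels on a side) are assumptions; the patent states only that small CCs are removed. The function names are hypothetical.

```python
from collections import deque


def connected_components(img):
    """8-connected components of a 0/1 image, each a list of (row, col) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, comp = deque([(r, c)]), []
            while queue:  # breadth-first flood fill from (r, c)
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(comp)
    return comps


def split_dust(comps, max_dust_side=2):
    """Separate dust CCs (bounding box at most max_dust_side on both sides)."""
    kept, dust = [], []
    for comp in comps:
        rows = [y for y, _ in comp]
        cols = [x for _, x in comp]
        small = (max(rows) - min(rows) < max_dust_side
                 and max(cols) - min(cols) < max_dust_side)
        (dust if small else kept).append(comp)
    return kept, dust
```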
A text line determining process is applied to this image of non-dust CCs to find text lines 1520. In one embodiment, this text line process may be accomplished by process 800 of FIG. 8. The resulting image of text lines 1522 is then processed to generate bounding boxes (i.e., bounding boxes are grown) 1524 at the locations where text lines of the image (i.e., step 1522) are located. This results in identification of bounding box locations that encompass the determined text line locations 1526. The bounding boxes are rendered 1528, thereby generating an alignment text mask 1530. An invert binary pixel color operation 1532 is applied to the alignment text mask, whereby the binary pixels of the mask are inverted (i.e., the colors are inverted) to generate an inverse alignment text mask 1534.
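Growing, rendering and inverting the text-line boxes (steps 1524 to 1534) reduce to simple raster operations. The sketch below assumes inclusive integer boxes (x0, y0, x1, y1). It uses a hypothetical fixed padding margin in place of the extent-based growth described earlier. The function names are hypothetical.

```python
def grow(box, margin):
    """Pad an (x0, y0, x1, y1) box by margin pixels on every side."""
    x0, y0, x1, y1 = box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


def render_boxes(boxes, height, width):
    """Rasterise inclusive (x0, y0, x1, y1) boxes into a 0/1 mask, clipped to the image."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for r in range(max(0, y0), min(height, y1 + 1)):
            for c in range(max(0, x0), min(width, x1 + 1)):
                mask[r][c] = 1
    return mask


def invert(mask):
    """Invert binary pixel colours (1 <-> 0)."""
    return [[1 - v for v in row] for row in mask]
```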
The foregoing operations have therefore created an alignment text mask 1530 and an inverse alignment text mask 1534, which are used in later image processing steps.
Returning attention to the filtering operation of 1516, the dust or speck connected components (CCs) are shown at 1536. The “dust CCs” box is shown as a double-line box. The double-line box is intended to represent a final complete set of image material from the original image 1500, which at this point is intended to be interpreted as different types of objects, where the system or process believes text images may exist.
Next, with attention to growing of the bounding boxes 1524, there will be instances when some identified CCs are determined to be too big to be included as a text line with a bounding box 1538, as opposed to the CCs determined to be of an appropriate size 1540. The “OK CCs” 1540 are shown as a double-line box. The double-line box is intended to represent a final complete set of image material from the original image 1500, where it is believed text images may exist.
Turning now to the right-hand side of the process flow, the image processing operations of process 1500, which use the masks and data generated by the previous processing, will now be addressed in more detail.
Returning to the original image minus the BBBs 1510, this image is provided to an extraction process to extract and separate horizontal lines and vertical lines 1542. More particularly, this process identifies a bitmap having just horizontal lines 1544, a bitmap with just vertical lines 1546, and a bitmap having no lines 1548.
The horizontal lines bitmap 1544 is operated on to extract the connected components (CCs) of the bitmap 1550, and a horizontal line image of connected components (CCs) is generated. The vertical lines bitmap 1546 is processed differently from the horizontal lines bitmap: an “AndNot” logical operation is performed on the pixels of the vertical lines bitmap and the horizontal lines bitmap 1544. This operation takes the vertical CCs and subtracts out any horizontal CCs. The remaining vertical connected components (CCs) are extracted 1556, resulting in a vertical line image of connected components (CCs).
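The patent does not specify how step 1542 extracts horizontal and vertical lines. One common approach, assumed here purely for illustration, keeps pixels belonging to straight runs at least min_len long. The sketch then applies the “AndNot” of the vertical lines bitmap against the horizontal lines bitmap described above. All names are hypothetical. The min_len default is only suitable for tiny test images; a real scan would need a much larger value.

```python
def horizontal_runs(img, min_len):
    """Keep only pixels lying in horizontal runs of at least min_len pixels."""
    out = [[0] * len(row) for row in img]
    for r, row in enumerate(img):
        c = 0
        while c < len(row):
            if not row[c]:
                c += 1
                continue
            start = c
            while c < len(row) and row[c]:
                c += 1
            if c - start >= min_len:
                for k in range(start, c):
                    out[r][k] = 1
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def and_not(a, b):
    """Pixels set in a but not in b."""
    return [[pa & (1 - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_lines(img, min_len=5):
    """Return (horizontal lines, vertical lines minus horizontal, no-lines) bitmaps."""
    horiz = horizontal_runs(img, min_len)
    vert_raw = transpose(horizontal_runs(transpose(img), min_len))
    vert = and_not(vert_raw, horiz)  # the AndNot applied to the vertical bitmap
    no_lines = and_not(and_not(img, horiz), vert)
    return horiz, vert, no_lines
```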
The “Horizontal Line CCs image” 1552 and the “Vertical Line CCs image” 1558 are shown as double-line boxes. The double-line boxes are intended to represent a final complete set of image material from the original image 1500, where it is believed text images may exist.
Returning now to the no-lines bitmap 1548, this image is “Anded” in a bitwise manner with the inverted alignment text mask (of 1534) 1560. This operation identifies a no-lines bitmap which is outside of the alignment of the previously determined text lines 1562. To clean up this image, the dust CCs of 1536 are provided to an operation in which they are rendered as white (i.e., turned into background material) 1564. This cleaned-up bitmap of outside-alignment material 1566 then has its connected components (CCs) extracted 1568. The result is a finalized image of connected components which lie outside of the predetermined text line range 1570. The “Outside Alignments CCs” 1570 is shown as a double-line box. The double-line box is intended to represent a final complete set of image material from the original image 1500, where it is believed text images may exist.
Turning to step 1572, the Too Big CCs of 1538 are rendered, forming a Too Big CCs bitmap 1574, which is then “Anded” in a bitwise manner with the alignment text mask (of 1530) 1576. The Anding operation generates a too big bitmap within the area of the image defined as alignment of text material 1578. This bitmap then has its CCs extracted 1580, whereby an image of too big CCs within the alignment areas is generated 1582.
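The masking steps that produce the outside-alignment material and the too-big-in-alignment material can be sketched with equal-sized 0/1 nested lists. The function names are hypothetical. The CC extraction of steps 1568 and 1580 is omitted.

```python
def bit_and(a, b):
    """Pixelwise AND of two same-sized 0/1 images."""
    return [[pa & pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def outside_alignment(no_lines, inv_text_mask, dust_pixels):
    """Steps 1560-1566: no-lines material outside text lines, with dust whitened."""
    out = bit_and(no_lines, inv_text_mask)
    for r, c in dust_pixels:
        out[r][c] = 0  # render dust as background
    return out


def too_big_in_alignment(too_big, text_mask):
    """Steps 1576-1578: the parts of too-big CCs lying inside text-line boxes."""
    return bit_and(too_big, text_mask)
```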
By the operation of process 1500, final complete sets of image material (e.g., fragments or connected components) from the original image are generated, which are now interpreted as different types of objects: small specks of the image 1536, CCs believed to be text images (OK CCs) 1540, horizontal line CCs 1552, vertical line CCs 1558, CCs determined or believed to be outside of the text line alignments 1570, and CCs determined or believed to be within the text line alignment range 1582.
The concepts described herein permit annotation detection and marking classification systems to work at a level smaller than connected components, thereby allowing for the methods and systems to separate annotations that touch printed material.
This method does not cause spurious over-fragmentation of fragments that occur within text lines. The approach is also relatively scale-independent and is not overly dependent on determining whether a connected component is text based on its size.
Another aspect of the present methods and systems is that they produce features that are useful in classifying fragments. Specifically, the alignments of contour extrema are themselves useful for later classification. Contour extrema that align very well are more likely to be machine printed, while contour extrema that align only moderately well are more likely to come from handwritten text.
It will be appreciated that various of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Also that various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.

Claims (25)

What is claimed is:
1. A method of classifying marking types on images of a document, the method comprising:
supplying the document containing the images to a segmenter;
segmenting the images received by the segmenter into fragments, the segmenting including identifying neatly written or printed text by grouping selected feature points along predetermined orientations, the feature points including local extrema of bounding contours of connected components, and subtracting enclosing boundary boxes of text lines from remaining document material to fragment connected components that are part of the text lines and part of extraneous markings;
supplying the fragments to a two-stage classifier, the two-stage classifier providing a plurality of first classifiers generating a first category score and a second classifier generating a second category score to each fragment, wherein the two-stage classifier is trained from groundtruth images whose pixels are labeled according to known marking types, wherein the first category score is generated as an array of scores for the each fragment, and the second category score is provided by reclassifying the each fragment by considering a neighborhood fragment of the each fragment, and wherein the first and second category scores to each of the fragments are generated by:
determining feature measurements for the each fragment, the feature measurements including measuring a segmenter feature, a size feature, a location feature, a regularity feature, an edge curvature feature, a contour feature, a run length feature, and an edge-turn histogram feature, and
determining the first and second category scores for the each fragment by applying the two-stage classifier to the determined feature measurements; and
assigning a same label to all pixels in a fragment when the fragment is classified by the two-stage classifier.
2. The method according to claim 1 wherein the segmenting of the image lines images is directed to segregation of the text lines believed to be aligned text, from other aspects of the images.
3. The method according to claim 1, wherein the document images are electronic images stored in an electronic memory and are segmented by a processor associated with the electronic memory.
4. A method of classifying marking types on images of a document, the method comprising:
supplying the document containing the images to a segmenter;
segmenting the images received by the segmenter into fragments, the segmenting including identifying neatly written or printed text by grouping selected feature points along predetermined orientations, the feature points including local extrema of bounding contours of connected components, and subtracting enclosing boundary boxes of text lines from remaining document material to fragment connected components that are part of the text lines and part of extraneous markings;
supplying the fragments to a two-stage classifier, the two-stage classifier providing a plurality of first classifiers generating a first category score and a second classifier generating a second category score to each fragment, wherein the two-stage classifier is trained from groundtruth images whose pixels are labeled according to known marking types, wherein the first category score is generated as an array of scores for the each fragment, and the second category score is provided by reclassifying the each fragment by considering a neighborhood fragment of the each fragment, and wherein the first and second category scores to the each fragment are generated by:
determining feature measurements for the each fragment, the feature measurements including measuring a segmenter feature, a size feature, a location feature, a regularity feature, an edge curvature feature, a contour feature, a run length feature, and an edge-turn histogram feature, and
determining the first and second category scores for the each fragment by applying the two-stage classifier to the determined feature measurements; and
assigning a same label to all pixels in a fragment when the fragment is classified by the two-stage classifier;
wherein the segmenting includes processing the images to find text lines, the processing comprising:
detecting upper and lower extrema of the connected components;
identifying upper and lower contour extrema of the detected upper and lower extrema of the connected components;
grouping the identified upper and lower contour extrema;
identifying upper contour point groups and lower contour point groups;
fitting the grouped upper and lower point groups;
filtering out the fitted and grouped upper and lower point groups that are outside a predetermined alignment threshold;
forming upper and lower alignment segments for the upper and lower point groups that remain after the filtering operation;
matching as pairs the upper and lower segments that remain after the filtering operation; and
forming text line bounding boxes based on the pairs of matched upper and lower segments that remain after the filtering operation, the bounding boxes identifying connected components believed to be aligned text.
5. A system of classifying marking types on images of a document, the system comprising:
a segmenter operated on a processor and configured to receive the document containing the images, the segmenter segmenting the images into fragments of foreground pixel structures that are identified as being likely to be of the same marking type by finding connected components, and dividing at least some of the connected components to obtain image fragments, the connected components being isolated, continuous regions of foreground pixels, the segmenter segmenting the images by:
identifying neatly written or printed text by grouping selected feature points along predetermined orientations, the feature points being local extrema of bounding contours of the connected components; and
subtracting enclosing boundary boxes of text lines from remaining document material to fragment connected components that are partly part of the text lines and partly part of extraneous markings; and
a two-stage classifier operated on a processor and configured to receive the fragments, the two-stage classifier providing a plurality of first classifiers generating a first category score and a second classifier generating a second category score to each received fragment, wherein the two-stage classifier is trained from ground truth images whose pixels are labeled according to known marking types, wherein the first category score is generated as an array of scores for the each fragment, and the second category score is provided by reclassifying the each fragment by considering a neighborhood fragment of the each fragment, and wherein the first and second category scores to the each fragment are generated by:
determining feature measurements for the each fragment, the feature measurements including measuring a segmenter feature, a size feature, a location feature, a regularity feature, an edge curvature feature, a contour feature, a run length feature, and an edge-turn histogram feature, and
determining the first and second category scores for the each fragment by applying the two-stage classifier to the determined feature measurements,
the two-stage classifier assigning a same label to all pixels in a fragment when the fragment is classified by the two-stage classifier.
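The step of subtracting the enclosing boxes of text lines from the remaining material can be illustrated as follows. In this minimal sketch, foreground pixels inside the boxes are relabeled separately from those outside. A connected component that is partly text and partly an extraneous mark therefore falls apart into separate fragments, for example an underline that runs past the end of a printed line. The sketch assumes axis-aligned boxes given as `(x0, y0, x1, y1)` with inclusive bounds and uses 4-connectivity. Both conventions and the names `fragment_by_text_boxes` and `_components` are assumptions for illustration.

```python
from collections import deque


def _components(mask):
    """4-connected foreground regions of a grid of truthy/falsy values, as lists of (y, x)."""
    h, w = len(mask), len(mask[0])
    seen, regions = set(), []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or (y, x) in seen:
                continue
            seen.add((y, x))
            queue, region = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def fragment_by_text_boxes(img, boxes):
    """Split the foreground into fragments inside and outside the text-line boxes."""
    h, w = len(img), len(img[0])
    inside = [[False] * w for _ in range(h)]
    for x0, y0, x1, y1 in boxes:  # mark every pixel covered by a box (clipped to the page)
        for y in range(max(0, y0), min(h - 1, y1) + 1):
            for x in range(max(0, x0), min(w - 1, x1) + 1):
                inside[y][x] = True
    in_mask = [[img[y][x] and inside[y][x] for x in range(w)] for y in range(h)]
    out_mask = [[img[y][x] and not inside[y][x] for x in range(w)] for y in range(h)]
    return _components(in_mask), _components(out_mask)
```

For instance, a single horizontal stroke that crosses the right edge of a box comes back as one fragment inside the box and one outside. The classifier can then label the two pieces independently.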
6. The system according to claim 5, wherein the segmenter is further configured to find text lines including:
detecting upper and lower extrema of the connected components;
identifying the upper and lower contour extrema of the detected upper and lower extrema of the connected components;
grouping the identified upper and lower contour extrema;
identifying upper contour point groups and lower contour point groups;
fitting the grouped upper and lower point groups;
filtering out the fitted and grouped upper and lower point groups that are outside a predetermined alignment threshold;
forming upper and lower alignment segments for the upper and lower point groups that remain after the filtering operation;
matching as pairs the upper and lower segments that remain after the filtering operation; and
forming text line bounding boxes based on the pairs of matched upper and lower segments that remain after the filtering operation, the bounding boxes identifying connected components believed to be aligned text.
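The later steps of the text-line procedure can be sketched as follows: fitting each point group, filtering groups outside an alignment threshold, pairing upper and lower segments, and forming bounding boxes. This is a hypothetical illustration, not the patented method. The sketch makes several assumptions. Each group is fit by ordinary least squares to a line y = a + b*x. The "alignment threshold" is modeled as a maximum RMS residual (`max_rms`). Pairing is greedy, with each upper segment matched to the nearest horizontally overlapping lower segment below it within `max_height`. All function names and parameter values are illustrative assumptions.

```python
import math


def fit_line(points):
    """Least-squares fit of y = a + b*x to (x, y) points; returns (a, b, rms_residual)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx if sxx else 0.0  # a group confined to one column is treated as horizontal
    a = my - b * mx
    rms = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in points) / n)
    return a, b, rms


def alignment_segments(groups, max_rms=1.5, min_points=3):
    """Fit each group and keep (x_min, x_max, a, b) for groups within the alignment threshold."""
    segments = []
    for pts in groups:
        if len(pts) < min_points:
            continue
        a, b, rms = fit_line(pts)
        if rms > max_rms:  # filter out poorly aligned groups
            continue
        xs = [x for x, _ in pts]
        segments.append((min(xs), max(xs), a, b))
    return segments


def pair_into_boxes(upper, lower, max_height=40):
    """Greedily match upper segments to lower segments and return (x0, top, x1, bottom) boxes."""
    boxes, used = [], set()
    for ux0, ux1, ua, ub in upper:
        best = None
        for j, (lx0, lx1, la, lb) in enumerate(lower):
            lo, hi = max(ux0, lx0), min(ux1, lx1)
            if j in used or hi <= lo:
                continue  # already matched, or no horizontal overlap
            mid = (lo + hi) / 2
            gap = (la + lb * mid) - (ua + ub * mid)  # lower line must lie below (larger y)
            if 0 < gap <= max_height and (best is None or gap < best[0]):
                best = (gap, j)
        if best is None:
            continue
        used.add(best[1])
        lx0, lx1, la, lb = lower[best[1]]
        x0, x1 = min(ux0, lx0), max(ux1, lx1)
        top = min(ua + ub * x0, ua + ub * x1)
        bottom = max(la + lb * x0, la + lb * x1)
        boxes.append((x0, top, x1, bottom))
    return boxes
```

A greedy nearest-below match keeps the sketch short. A real system would likely need to handle multi-column pages, skewed lines, and competing matches more carefully.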
7. The system according to claim 5, wherein the document images are electronic images stored in an electronic memory and are segmented by a processor associated with the electronic memory.
8. The system according to claim 5 further including a scanner to receive a hardcopy document containing images, the scanner converting the hardcopy document into an electronic document, the electronic document being the document supplied to the segmenter.
9. The method according to claim 1, wherein the connected components are each an isolated, continuous region of foreground pixels, and wherein at least one of the fragments is a subset of one of the connected components.
10. The method according to claim 1, wherein the segmenting includes:
grouping the selected feature points into strips;
fitting lines to the strips; and
forming the enclosing bounding boxes from pairs of fitted lines.
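The grouping of feature points into strips can be sketched for a single, horizontal orientation. The sketch makes two assumptions. Points are first chained into horizontal bands whenever consecutive y values (after sorting) differ by at most `y_tol`. Each band is then cut wherever the horizontal gap between neighboring points exceeds `max_x_gap`. The function name and both tolerances are hypothetical. Because bands are chained point to point, a slowly drifting sequence of y values can end up in one band. A fuller treatment would test several predetermined orientations.

```python
def group_into_strips(points, y_tol=2, max_x_gap=30):
    """Group (x, y) feature points into roughly horizontal strips."""
    bands, current = [], []
    for p in sorted(points, key=lambda p: p[1]):  # sweep points top to bottom
        if current and p[1] - current[-1][1] > y_tol:
            bands.append(current)
            current = []
        current.append(p)
    if current:
        bands.append(current)

    strips = []
    for band in bands:
        band.sort()  # left to right within a band
        run = [band[0]]
        for p in band[1:]:
            if p[0] - run[-1][0] > max_x_gap:  # a wide gap starts a new strip
                strips.append(run)
                run = []
            run.append(p)
        strips.append(run)
    return strips
```

Each resulting strip is a candidate group for line fitting. Isolated points, such as the lone point at x = 200 in a band, end up in strips too small to fit reliably.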
11. The method according to claim 1, further including:
training the classifier from the groundtruth images using a machine learning algorithm.
12. The method according to claim 1, further including:
providing the category score to each of the fragments by:
determining feature measurements for the fragment;
determining the category score for the fragment by applying the classifier to the determined feature measurements.
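The claims list size, location and run-length features among the feature measurements but do not define them in this section. The sketch below shows one plausible set of definitions, offered as an assumption rather than the patent's own formulas. The features are bounding-box width and height, pixel count and fill density for size, a centroid normalized by page dimensions for location, and mean horizontal and vertical run lengths for run length. The fragment is assumed to be a list of distinct `(y, x)` pixels. The names `run_lengths` and `fragment_features` are hypothetical.

```python
def run_lengths(pixels, horizontal=True):
    """Lengths of maximal horizontal (or vertical) runs of pixels in a fragment."""
    pix = set(pixels)
    runs = []
    for y, x in pix:
        prev = (y, x - 1) if horizontal else (y - 1, x)
        if prev in pix:
            continue  # not the start of a run
        n, cur = 0, (y, x)
        while cur in pix:
            n += 1
            cur = (cur[0], cur[1] + 1) if horizontal else (cur[0] + 1, cur[1])
        runs.append(n)
    return runs


def fragment_features(pixels, page_h, page_w):
    """A hypothetical feature vector for one fragment of distinct (y, x) pixels."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    h_runs = run_lengths(pixels, horizontal=True)
    v_runs = run_lengths(pixels, horizontal=False)
    return {
        "width": width,
        "height": height,
        "area": len(pixels),
        "density": len(pixels) / (width * height),
        "cx": sum(xs) / len(xs) / page_w,  # normalized horizontal location
        "cy": sum(ys) / len(ys) / page_h,  # normalized vertical location
        "mean_h_run": sum(h_runs) / len(h_runs),
        "mean_v_run": sum(v_runs) / len(v_runs),
    }
```

A thin horizontal rule, for example, yields long horizontal runs and unit-length vertical runs. Features of this kind could help separate ruled lines from text strokes.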
13. The method according to claim 1, wherein the segmenting includes:
determining the enclosing boundary boxes for individual text lines based on the grouping.
14. The system according to claim 5, wherein the segmenting includes:
grouping the selected feature points into strips;
fitting lines to the strips; and
forming the enclosing bounding boxes from pairs of fitted lines.
15. The system according to claim 5, wherein the segmenter segments the images by further:
determining the enclosing boundary boxes for individual text lines based on the grouping.
16. The system according to claim 5, wherein the classifier provides the category score to each received fragment by:
determining feature measurements for the fragment; and
determining the category score for the fragment by applying the classifier to the determined feature measurements.
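The "edge-turn histogram feature" is named in the claims but not defined in this section. One plausible reading, stated here as an assumption, is a normalized histogram of the turning angles along a fragment's boundary. The sketch assumes the boundary has already been traced into a closed list of 8-adjacent `(x, y)` points. It converts the boundary to an 8-direction chain code and counts the signed turn between successive steps, in units of 45 degrees, from -3 to +4. `STEPS`, `chain_code` and `edge_turn_histogram` are hypothetical names.

```python
from collections import Counter

# 8-direction chain code: index k is the (dx, dy) step for code k (y grows downward)
STEPS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]


def chain_code(contour):
    """Chain code of a closed contour whose consecutive points are 8-adjacent."""
    n = len(contour)
    codes = []
    for i in range(n):
        (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
        codes.append(STEPS.index((x1 - x0, y1 - y0)))  # ValueError if not 8-adjacent
    return codes


def edge_turn_histogram(contour):
    """Fraction of boundary steps at each signed turn, for turns -3..+4 (x 45 degrees)."""
    codes = chain_code(contour)
    n = len(codes)
    hist = Counter()
    for i in range(n):
        turn = (codes[(i + 1) % n] - codes[i]) % 8
        if turn > 4:
            turn -= 8  # map 5, 6, 7 to -3, -2, -1
        hist[turn] += 1
    return [hist[t] / n for t in range(-3, 5)]
```

Under this definition, a square boundary puts all of its mass at 0 (straight steps) and at one 90-degree turn direction. Wiggly handwritten strokes would spread the mass over more bins.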
17. A method of classifying marking types of one or more images of an electronic document, comprising:
receiving an electronic document containing one or more images;
segmenting the one or more images by grouping selected feature points of the one or more images along predetermined orientations into a plurality of segments;
supplying the plurality of segments to a two-stage classifier, the two-stage classifier providing a plurality of first classifiers generating a first category score and a second classifier generating a second category score to each segment, wherein the two-stage classifier is trained from groundtruth images whose pixels are labeled according to known marking types, wherein the first category score is generated as an array of scores for the each fragment, and the second category score is provided by reclassifying the each fragment by considering a neighborhood segment of the each segment, and wherein the first and second category scores to the each fragment are generated by:
determining feature measurements for the each fragment, the feature measurements including measuring a segmenter feature, a size feature, a location feature, a regularity feature, an edge curvature feature, a contour feature, a run length feature, and an edge-turn histogram feature, and
determining the first and second category scores for the each fragment by applying the two-stage classifier to the determined feature measurements; and
assigning a same label to all pixels in a segment when the segment is classified by the two-stage classifier.
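The two-stage scheme (a plurality of first classifiers producing an array of category scores, then a second classifier that reclassifies each fragment in light of its neighbors) can be sketched as follows. This is a minimal sketch under stated assumptions, not the trained classifier of the claims. Each first-stage classifier is modeled as a callable that returns one category's score. The second stage adds a weighted mean of the neighbors' first-stage score arrays to the fragment's own array and takes the highest-scoring category. The names `first_stage`, `second_stage` and `classify_fragments`, the neighbor lists and the `weight` value are all hypothetical.

```python
def first_stage(features, classifiers):
    """Array of category scores, one per first-stage classifier."""
    return [clf(features) for clf in classifiers]


def second_stage(own_scores, neighbor_scores, weight=0.5):
    """Reclassify using the fragment's own scores plus its neighbors' mean scores."""
    n = len(own_scores)
    if neighbor_scores:
        context = [sum(s[k] for s in neighbor_scores) / len(neighbor_scores) for k in range(n)]
    else:
        context = [0.0] * n  # no neighbors: fall back to the first-stage scores
    combined = [own_scores[k] + weight * context[k] for k in range(n)]
    return max(range(n), key=combined.__getitem__), combined


def classify_fragments(features, neighbors, classifiers):
    """features[i] describes fragment i; neighbors[i] lists indices of nearby fragments."""
    stage1 = [first_stage(f, classifiers) for f in features]
    labels = []
    for i, scores in enumerate(stage1):
        label, _ = second_stage(scores, [stage1[j] for j in neighbors[i]])
        labels.append(label)
    return labels
```

Usage: with two toy classifiers `lambda f: f[0]` and `lambda f: f[1]`, an ambiguous fragment scored `[0.6, 0.5]` is labeled category 0 on its own. When it sits next to two fragments scored `[0.0, 1.0]`, the second stage flips it to category 1. This is the kind of context effect a neighborhood reclassification is meant to capture.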
18. The method of claim 17, further comprising training the two-stage classifier from the groundtruth images using a machine learning algorithm.
19. The method of claim 17, wherein the segmenting includes: determining enclosing boundary boxes for individual text lines based on the grouping.
20. A method of classifying marking types on images of an electronic document, comprising:
receiving an electronic document containing one or more images;
segmenting the one or more images received by grouping selected feature points along predetermined orientations into a plurality of segments;
supplying the segments to a two-stage classifier, the two-stage classifier providing a plurality of first classifiers generating a first category score and a second classifier generating a second category score to each segment, wherein the two-stage classifier is trained from groundtruth images whose pixels are labeled according to known marking types, wherein the first category score is generated as an array of scores for the each fragment, and the second category score is provided by reclassifying the each fragment by considering a neighborhood segment of the each segment, and wherein the first and second category scores to the each fragment are generated by:
determining feature measurements for the each fragment, the feature measurements including measuring a segmenter feature, a size feature, a location feature, a regularity feature, an edge curvature feature, a contour feature, a run length feature, and an edge-turn histogram feature, and
determining the first and second category scores for the each fragment by applying the two-stage classifier to the determined feature measurements; and
assigning a same label to all pixels in a segment when the segment is classified by the two-stage classifier;
wherein the segmenting includes processing the images to find text lines, the processing comprising:
detecting upper and lower extrema of the connected components;
identifying upper and lower contour extrema of the detected upper and lower extrema of the connected components;
grouping the identified upper and lower contour extrema;
identifying upper contour point groups and lower contour point groups;
fitting the grouped upper and lower point groups;
filtering out the fitted and grouped upper and lower point groups that are outside a predetermined alignment threshold;
forming upper and lower alignment segments for the upper and lower point groups that remain after the filtering operation;
matching as pairs the upper and lower segments that remain after the filtering operation; and
forming text line bounding boxes based on the pairs of matched upper and lower segments that remain after the filtering operation, the bounding boxes identifying connected components believed to be aligned text.
21. A system of classifying marking types of one or more images of an electronic document, comprising:
a segmenter operated on a processor that receives an electronic document containing one or more images, the segmenter segmenting the one or more images by grouping selected feature points along predetermined orientations; and
a two-stage classifier operated on the processor that:
receives the segments, the two-stage classifier providing a plurality of first classifiers generating a first category score and a second classifier generating a second category score to each segment, wherein the two-stage classifier is trained from groundtruth images whose pixels are labeled according to known marking types, wherein the first category score is generated as an array of scores for the each fragment, and the second category score is provided by reclassifying the each fragment by considering a neighborhood segment of the each segment, and wherein the first and second category scores to the each fragment are generated by:
determining feature measurements for the each fragment, the feature measurements including measuring a segmenter feature, a size feature, a location feature, a regularity feature, an edge curvature feature, a contour feature, a run length feature, and an edge-turn histogram feature, and
determining the first and second category scores for the each fragment by applying the two-stage classifier to the determined feature measurements, and
assigns a same label to all pixels in a segment when the segment is classified by the two-stage classifier.
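Assigning the same label to all pixels of a classified segment amounts to painting each fragment's label into a per-pixel map. The sketch below does this under an assumption not stated in the claims: fragments are given as lists of `(y, x)` pixels that do not overlap. Unlabeled background pixels keep a sentinel value of -1. The name `paint_labels` and the sentinel are hypothetical.

```python
def paint_labels(shape, fragments, labels, background=-1):
    """Per-pixel label map in which every pixel of a fragment gets that fragment's label."""
    h, w = shape
    label_map = [[background] * w for _ in range(h)]
    for pixels, label in zip(fragments, labels):
        for y, x in pixels:
            if label_map[y][x] != background:
                raise ValueError("fragments overlap at pixel (%d, %d)" % (y, x))
            label_map[y][x] = label
    return label_map
```

Because labels are assigned per fragment rather than per pixel, the output can never mix categories inside a single fragment. Errors therefore show up at fragment granularity.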
22. The system of claim 21, wherein the one or more images are electronic images stored in an electronic memory and are segmented by a processor associated with the electronic memory.
23. The system of claim 21, further comprising a scanner to:
receive a hardcopy document containing images; and
convert the hardcopy document into an electronic document, wherein the electronic document is supplied to the segmenter.
24. The system of claim 21, wherein the segmenter segments the one or more images by further: determining enclosing boundary boxes for individual text lines based on the grouping.
25. An apparatus to classify marking types of one or more images of an electronic document, comprising:
a two-stage segmenter-classifier that:
receives an electronic document containing one or more images;
segments the one or more images by grouping selected feature points of the one or more images along predetermined orientations into a plurality of segments;
provides a plurality of first classifiers generating a first category score and a second classifier generating a second category score to each segment, wherein the two-stage segmenter-classifier is trained from groundtruth images whose pixels are labeled according to known marking types, wherein the first category score is generated as an array of scores for the each fragment, and the second category score is provided by reclassifying the each fragment by considering a neighborhood segment of the each segment, and wherein the first and second category scores to the each fragment are generated by:
determining feature measurements for the each fragment, the feature measurements including measuring a segmenter feature, a size feature, a location feature, a regularity feature, an edge curvature feature, a contour feature, a run length feature, and an edge-turn histogram feature, and
determining the first and second category scores for the each fragment by applying the two-stage classifier to the determined feature measurements; and
assigns a same label to all pixels in a segment when the segment is classified by the two-stage segmenter-classifier; and
a scanner, communicatively coupled to the two-stage segmenter-classifier, that:
receives a hardcopy document containing one or more images;
converts the hardcopy document into an electronic document; and
supplies the electronic document to the two-stage segmenter-classifier.
US15/200,351 2009-07-10 2016-07-01 System and method for segmenting text lines in documents Active USRE47889E1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/200,351 USRE47889E1 (en) 2009-07-10 2016-07-01 System and method for segmenting text lines in documents

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12/500,882 US8649600B2 (en) 2009-07-10 2009-07-10 System and method for segmenting text lines in documents
US13/677,473 US8768057B2 (en) 2009-07-10 2012-11-15 System and method for segmenting text lines in documents
US15/200,351 USRE47889E1 (en) 2009-07-10 2016-07-01 System and method for segmenting text lines in documents

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/677,473 Reissue US8768057B2 (en) 2009-07-10 2012-11-15 System and method for segmenting text lines in documents

Publications (1)

Publication Number Publication Date
USRE47889E1 (en) 2020-03-03

Family

ID=43034566

Family Applications (3)

Application Number Title Priority Date Filing Date
US12/500,882 Active 2031-11-26 US8649600B2 (en) 2009-07-10 2009-07-10 System and method for segmenting text lines in documents
US13/677,473 Ceased US8768057B2 (en) 2009-07-10 2012-11-15 System and method for segmenting text lines in documents
US15/200,351 Active USRE47889E1 (en) 2009-07-10 2016-07-01 System and method for segmenting text lines in documents

Country Status (3)

Country Link
US (3) US8649600B2 (en)
EP (1) EP2275973B1 (en)
JP (1) JP5729930B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11462037B2 (en) 2019-01-11 2022-10-04 Walmart Apollo, Llc System and method for automated analysis of electronic travel data

Families Citing this family (49)

Publication number Priority date Publication date Assignee Title
US8649600B2 (en) 2009-07-10 2014-02-11 Palo Alto Research Center Incorporated System and method for segmenting text lines in documents
US8442319B2 (en) * 2009-07-10 2013-05-14 Palo Alto Research Center Incorporated System and method for classifying connected groups of foreground pixels in scanned document images according to the type of marking
US8452086B2 (en) * 2009-07-10 2013-05-28 Palo Alto Research Center Incorporated System and user interface for machine-assisted human labeling of pixels in an image
CN102622724A (en) * 2011-01-27 2012-08-01 Hongfujin Precision Industry (Shenzhen) Co., Ltd. Appearance patent image cutting method and system
JP5757259B2 (en) * 2012-02-28 2015-07-29 Brother Industries, Ltd. Image processing apparatus and image processing program
US9536141B2 (en) * 2012-06-29 2017-01-03 Palo Alto Research Center Incorporated System and method for forms recognition by synthesizing corrected localization of data fields
JP2014203393A (en) * 2013-04-09 2014-10-27 Toshiba Corporation Electronic apparatus, handwritten document processing method, and handwritten document processing program
CN103413132B (en) * 2013-06-24 2016-11-09 Xi'an Jiaotong University Progressive level cognitive scene image text detection method
JP6094400B2 (en) * 2013-06-25 2017-03-15 Sony Corporation Information processing apparatus, information processing method, and information processing program
US8831329B1 (en) 2013-06-28 2014-09-09 Google Inc. Extracting card data with card models
US9235755B2 (en) * 2013-08-15 2016-01-12 Konica Minolta Laboratory U.S.A., Inc. Removal of underlines and table lines in document images while preserving intersecting character strokes
US9245205B1 (en) * 2013-10-16 2016-01-26 Xerox Corporation Supervised mid-level features for word image representation
US8965117B1 (en) * 2013-12-17 2015-02-24 Amazon Technologies, Inc. Image pre-processing for reducing consumption of resources
US9325672B2 (en) * 2014-04-25 2016-04-26 Cellco Partnership Digital encryption shredder and document cube rebuilder
US9940511B2 (en) * 2014-05-30 2018-04-10 Kofax, Inc. Machine print, hand print, and signature discrimination
US9842281B2 (en) * 2014-06-05 2017-12-12 Xerox Corporation System for automated text and halftone segmentation
US9904956B2 (en) 2014-07-15 2018-02-27 Google Llc Identifying payment card categories based on optical character recognition of images of the payment cards
US9582727B2 (en) 2015-01-16 2017-02-28 Sony Corporation Text recognition system with feature recognition and method of operation thereof
US9530082B2 (en) * 2015-04-24 2016-12-27 Facebook, Inc. Objectionable content detector
US9785850B2 (en) 2015-07-08 2017-10-10 Sage Software, Inc. Real time object measurement
US9684984B2 (en) * 2015-07-08 2017-06-20 Sage Software, Inc. Nearsighted camera object detection
CN107133622B (en) 2016-02-29 2022-08-26 Alibaba Group Holding Limited Word segmentation method and device
US10037459B2 (en) 2016-08-19 2018-07-31 Sage Software, Inc. Real-time font edge focus measurement for optical character recognition (OCR)
EP3479259A4 (en) * 2016-09-08 2020-06-24 Goh Soo, Siah Video ingestion framework for visual search platform
US10354161B2 (en) * 2017-06-05 2019-07-16 Intuit, Inc. Detecting font size in a digital image
US10163022B1 (en) * 2017-06-22 2018-12-25 StradVision, Inc. Method for learning text recognition, method for recognizing text using the same, and apparatus for learning text recognition, apparatus for recognizing text using the same
US10452952B2 (en) * 2017-06-30 2019-10-22 Konica Minolta Laboratory U.S.A., Inc. Typesetness score for a table
RU2666277C1 (en) * 2017-09-06 2018-09-06 ABBYY Production LLC Text segmentation
US10318803B1 (en) * 2017-11-30 2019-06-11 Konica Minolta Laboratory U.S.A., Inc. Text line segmentation method
US11593552B2 (en) 2018-03-21 2023-02-28 Adobe Inc. Performing semantic segmentation of form images using deep learning
CN108875737B (en) * 2018-06-11 2022-06-21 Sichuan Junyi Fudun Technology Co., Ltd. Method and system for detecting whether check box is checked in paper prescription document
CN109191210A (en) * 2018-09-13 2019-01-11 Xiamen University Tan Kah Kee College Broadband target user recognition method based on the AdaBoost algorithm
US10402673B1 (en) 2018-10-04 2019-09-03 Capital One Services, Llc Systems and methods for digitized document image data spillage recovery
US10331966B1 (en) * 2018-10-19 2019-06-25 Capital One Services, Llc Image processing to detect a rectangular object
CN109902806B (en) * 2019-02-26 2021-03-16 Tsinghua University Method for determining target bounding box of noise image based on convolutional neural network
US10671892B1 (en) * 2019-03-31 2020-06-02 Hyper Labs, Inc. Apparatuses, methods, and systems for 3-channel dynamic contextual script recognition using neural network image analytics and 4-tuple machine learning with enhanced templates and context data
US11042734B2 (en) * 2019-08-13 2021-06-22 Adobe Inc. Electronic document segmentation using deep learning
US11106891B2 (en) 2019-09-09 2021-08-31 Morgan Stanley Services Group Inc. Automated signature extraction and verification
US11074473B1 (en) 2020-01-21 2021-07-27 Capital One Services, Llc Systems and methods for digitized document image text contouring
CN111767787B (en) * 2020-05-12 2023-07-18 Beijing QIYI Century Science & Technology Co., Ltd. Method, device, equipment and storage medium for judging front and back sides of identity card image
CN111832292B (en) * 2020-06-03 2024-02-02 Beijing Baidu Netcom Science and Technology Co., Ltd. Text recognition processing method, device, electronic equipment and storage medium
CN111680628B (en) * 2020-06-09 2023-04-28 Beijing Baidu Netcom Science and Technology Co., Ltd. Text frame fusion method, device, equipment and storage medium
CN111680145B (en) * 2020-06-10 2023-08-15 Beijing Baidu Netcom Science and Technology Co., Ltd. Knowledge representation learning method, apparatus, device and storage medium
CN112989452B (en) * 2021-01-20 2023-12-29 Shanghai Pinlan Zhizao Technology Co., Ltd. Identification method for labeling text on component lead in CAD water supply and drainage professional drawing
US11682220B2 (en) * 2021-03-15 2023-06-20 Optum Technology, Inc. Overlap-aware optical character recognition
JP2022191771A (en) * 2021-06-16 2022-12-28 Canon Inc. Image processing apparatus, image processing method, and program
US11755817B2 (en) * 2021-08-02 2023-09-12 Adobe Inc. Systems for generating snap guides relative to glyphs of editable text
US11830264B2 (en) * 2022-01-31 2023-11-28 Intuit Inc. End to end trainable document extraction
CN116090417B (en) * 2023-04-11 2023-06-27 Foxit Kunpeng (Beijing) Information Technology Co., Ltd. Layout document text selection rendering method and device, electronic equipment and storage medium

Citations (52)

Publication number Priority date Publication date Assignee Title
US5181255A (en) 1990-12-13 1993-01-19 Xerox Corporation Segmentation of handwriting and machine printed text
US5201011A (en) 1991-11-19 1993-04-06 Xerox Corporation Method and apparatus for image hand markup detection using morphological techniques
US5202933A (en) * 1989-12-08 1993-04-13 Xerox Corporation Segmentation of text and graphics
US5369714A (en) 1991-11-19 1994-11-29 Xerox Corporation Method and apparatus for determining the frequency of phrases in a document without document image decoding
US5570435A (en) * 1989-12-08 1996-10-29 Xerox Corporation Segmentation of text styles
US5778092A (en) 1996-12-20 1998-07-07 Xerox Corporation Method and apparatus for compressing color or gray scale documents
US5852676A (en) * 1995-04-11 1998-12-22 Teraform Inc. Method and apparatus for locating and identifying fields within a document
US5892842A (en) * 1995-12-14 1999-04-06 Xerox Corporation Automatic method of identifying sentence boundaries in a document image
US5953451A (en) * 1997-06-19 1999-09-14 Xerox Corporation Method of indexing words in handwritten document images using image hash tables
US5956468A (en) 1996-07-12 1999-09-21 Seiko Epson Corporation Document segmentation system
US6009196A (en) 1995-11-28 1999-12-28 Xerox Corporation Method for classifying non-running text in an image
US6301386B1 (en) 1998-12-09 2001-10-09 Ncr Corporation Methods and apparatus for gray image based text identification
US6377710B1 (en) 1998-11-25 2002-04-23 Xerox Corporation Method and apparatus for extracting the skeleton of a binary figure by contour-based erosion
US6411733B1 (en) 1998-11-25 2002-06-25 Xerox Corporation Method and apparatus for separating document image object types
US20020106128A1 (en) * 2001-02-06 2002-08-08 International Business Machines Corporation Identification, separation and compression of multiple forms with mutants
US20020135786A1 (en) * 2001-02-09 2002-09-26 Yue Ma Printing control interface system and method with handwriting discrimination capability
US6587583B1 (en) 1999-09-17 2003-07-01 Kurzweil Educational Systems, Inc. Compression/decompression algorithm for image documents having text, graphical and color content
US20030215136A1 (en) * 2002-05-17 2003-11-20 Hui Chao Method and system for document segmentation
US6771816B1 (en) 2000-01-19 2004-08-03 Adobe Systems Incorporated Generating a text mask for representing text pixels
US20040165774A1 (en) * 2003-02-26 2004-08-26 Dimitrios Koubaroulis Line extraction in digital ink
US20050100217A1 (en) * 2003-11-07 2005-05-12 Microsoft Corporation Template-based cursive handwriting recognition
US6903751B2 (en) 2002-03-22 2005-06-07 Xerox Corporation System and method for editing electronic images
US20060002623A1 (en) 2004-06-30 2006-01-05 Sharp Laboratories Of America, Inc. Methods and systems for complexity-based segmentation refinement
US7010165B2 (en) * 2002-05-10 2006-03-07 Microsoft Corporation Preprocessing of multi-line rotated electronic ink
US7036077B2 (en) 2002-03-22 2006-04-25 Xerox Corporation Method for gestural interpretation in a system for selecting and arranging visible material in document images
US7050632B2 (en) * 2002-05-14 2006-05-23 Microsoft Corporation Handwriting layout analysis of freeform digital ink input
US7079687B2 (en) 2003-03-06 2006-07-18 Seiko Epson Corporation Method and apparatus for segmentation of compound documents
US20060164682A1 (en) * 2005-01-25 2006-07-27 Dspv, Ltd. System and method of improving the legibility and applicability of document pictures using form based image enhancement
US7086013B2 (en) 2002-03-22 2006-08-01 Xerox Corporation Method and system for overloading loop selection commands in a system for selecting and arranging visible material in document images
US20060193518A1 (en) * 2005-01-28 2006-08-31 Jianxiong Dong Handwritten word recognition based on geometric decomposition
US20060222239A1 (en) 2005-03-31 2006-10-05 Bargeron David M Systems and methods for detecting text
US7136082B2 (en) * 2002-01-25 2006-11-14 Xerox Corporation Method and apparatus to convert digital ink images for use in a structured text/graphics editor
US20070009153A1 (en) 2005-05-26 2007-01-11 Bourbay Limited Segmentation of digital images
US7177483B2 (en) 2002-08-29 2007-02-13 Palo Alto Research Center Incorporated. System and method for enhancement of document images
US20080002887A1 (en) 2006-06-28 2008-01-03 Microsoft Corporation Techniques for filtering handwriting recognition results
US7379594B2 (en) * 2004-01-28 2008-05-27 Sharp Laboratories Of America, Inc. Methods and systems for automatic detection of continuous-tone regions in document images
US20080175507A1 (en) 2007-01-18 2008-07-24 Andrew Lookingbill Synthetic image and video generation from ground truth data
US20080267497A1 (en) 2007-04-27 2008-10-30 Jian Fan Image segmentation and enhancement
US20090185723A1 (en) 2008-01-21 2009-07-23 Andrew Frederick Kurtz Enabling persistent recognition of individuals in images
US20100040285A1 (en) 2008-08-14 2010-02-18 Xerox Corporation System and method for object class localization and semantic class based image segmentation
US7783117B2 (en) 2005-08-12 2010-08-24 Seiko Epson Corporation Systems and methods for generating background and foreground images for document compression
US7792353B2 (en) 2006-10-31 2010-09-07 Hewlett-Packard Development Company, L.P. Retraining a machine-learning classifier using re-labeled training samples
US20110007964A1 (en) 2009-07-10 2011-01-13 Palo Alto Research Center Incorporated System and method for machine-assisted human labeling of pixels in an image
US20110007970A1 (en) 2009-07-10 2011-01-13 Palo Alto Research Center Incorporated System and method for segmenting text lines in documents
US20110007366A1 (en) 2009-07-10 2011-01-13 Palo Alto Research Center Incorporated System and method for classifying connected groups of foreground pixels in scanned document images according to the type of marking
US7899258B2 (en) 2005-08-12 2011-03-01 Seiko Epson Corporation Systems and methods to convert images into high-quality compressed documents
US7907778B2 (en) 2007-08-13 2011-03-15 Seiko Epson Corporation Segmentation-based image labeling
US7936923B2 (en) 2007-08-31 2011-05-03 Seiko Epson Corporation Image background suppression
US7958068B2 (en) 2007-12-12 2011-06-07 International Business Machines Corporation Method and apparatus for model-shared subspace boosting for multi-label classification
US8009928B1 (en) * 2008-01-23 2011-08-30 A9.Com, Inc. Method and system for detecting and recognizing text in images
US8156115B1 (en) * 2007-07-11 2012-04-10 Ricoh Co. Ltd. Document-based networking with mixed media reality
US8171392B2 (en) 2009-04-28 2012-05-01 Lexmark International, Inc. Automatic forms processing systems and methods

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
JPH11238095A (en) * 1998-02-20 1999-08-31 Toshiba Corp Mail address reader
JP2000181993A (en) * 1998-12-16 2000-06-30 Fujitsu Ltd Character recognition method and device
JP4229521B2 (en) * 1999-05-21 2009-02-25 Fujitsu Limited Character recognition method and apparatus
US6909805B2 (en) * 2001-01-31 2005-06-21 Matsushita Electric Industrial Co., Ltd. Detecting and utilizing add-on information from a scanned document image
JP3914119B2 (en) * 2002-09-02 2007-05-16 Toshiba Solutions Corporation Character recognition method and character recognition device

Patent Citations (56)

Publication number Priority date Publication date Assignee Title
US5202933A (en) * 1989-12-08 1993-04-13 Xerox Corporation Segmentation of text and graphics
US5570435A (en) * 1989-12-08 1996-10-29 Xerox Corporation Segmentation of text styles
US5181255A (en) 1990-12-13 1993-01-19 Xerox Corporation Segmentation of handwriting and machine printed text
US5201011A (en) 1991-11-19 1993-04-06 Xerox Corporation Method and apparatus for image hand markup detection using morphological techniques
US5369714A (en) 1991-11-19 1994-11-29 Xerox Corporation Method and apparatus for determining the frequency of phrases in a document without document image decoding
US5852676A (en) * 1995-04-11 1998-12-22 Teraform Inc. Method and apparatus for locating and identifying fields within a document
US6009196A (en) 1995-11-28 1999-12-28 Xerox Corporation Method for classifying non-running text in an image
US5892842A (en) * 1995-12-14 1999-04-06 Xerox Corporation Automatic method of identifying sentence boundaries in a document image
US5956468A (en) 1996-07-12 1999-09-21 Seiko Epson Corporation Document segmentation system
US5778092A (en) 1996-12-20 1998-07-07 Xerox Corporation Method and apparatus for compressing color or gray scale documents
US5953451A (en) * 1997-06-19 1999-09-14 Xerox Corporation Method of indexing words in handwritten document images using image hash tables
US6377710B1 (en) 1998-11-25 2002-04-23 Xerox Corporation Method and apparatus for extracting the skeleton of a binary figure by contour-based erosion
US6411733B1 (en) 1998-11-25 2002-06-25 Xerox Corporation Method and apparatus for separating document image object types
US6301386B1 (en) 1998-12-09 2001-10-09 Ncr Corporation Methods and apparatus for gray image based text identification
US6587583B1 (en) 1999-09-17 2003-07-01 Kurzweil Educational Systems, Inc. Compression/decompression algorithm for image documents having text, graphical and color content
US6771816B1 (en) 2000-01-19 2004-08-03 Adobe Systems Incorporated Generating a text mask for representing text pixels
US20020106128A1 (en) * 2001-02-06 2002-08-08 International Business Machines Corporation Identification, separation and compression of multiple forms with mutants
US20020135786A1 (en) * 2001-02-09 2002-09-26 Yue Ma Printing control interface system and method with handwriting discrimination capability
US7136082B2 (en) * 2002-01-25 2006-11-14 Xerox Corporation Method and apparatus to convert digital ink images for use in a structured text/graphics editor
US6903751B2 (en) 2002-03-22 2005-06-07 Xerox Corporation System and method for editing electronic images
US7086013B2 (en) 2002-03-22 2006-08-01 Xerox Corporation Method and system for overloading loop selection commands in a system for selecting and arranging visible material in document images
US7036077B2 (en) 2002-03-22 2006-04-25 Xerox Corporation Method for gestural interpretation in a system for selecting and arranging visible material in document images
US7010165B2 (en) * 2002-05-10 2006-03-07 Microsoft Corporation Preprocessing of multi-line rotated electronic ink
US7050632B2 (en) * 2002-05-14 2006-05-23 Microsoft Corporation Handwriting layout analysis of freeform digital ink input
US20030215136A1 (en) * 2002-05-17 2003-11-20 Hui Chao Method and system for document segmentation
US7177483B2 (en) 2002-08-29 2007-02-13 Palo Alto Research Center Incorporated. System and method for enhancement of document images
US20040165774A1 (en) * 2003-02-26 2004-08-26 Dimitrios Koubaroulis Line extraction in digital ink
US7079687B2 (en) 2003-03-06 2006-07-18 Seiko Epson Corporation Method and apparatus for segmentation of compound documents
US20050100217A1 (en) * 2003-11-07 2005-05-12 Microsoft Corporation Template-based cursive handwriting recognition
US7379594B2 (en) * 2004-01-28 2008-05-27 Sharp Laboratories Of America, Inc. Methods and systems for automatic detection of continuous-tone regions in document images
US20060002623A1 (en) 2004-06-30 2006-01-05 Sharp Laboratories Of America, Inc. Methods and systems for complexity-based segmentation refinement
US20060164682A1 (en) * 2005-01-25 2006-07-27 Dspv, Ltd. System and method of improving the legibility and applicability of document pictures using form based image enhancement
US20060193518A1 (en) * 2005-01-28 2006-08-31 Jianxiong Dong Handwritten word recognition based on geometric decomposition
US20060222239A1 (en) 2005-03-31 2006-10-05 Bargeron David M Systems and methods for detecting text
US7570816B2 (en) 2005-03-31 2009-08-04 Microsoft Corporation Systems and methods for detecting text
US20070009153A1 (en) 2005-05-26 2007-01-11 Bourbay Limited Segmentation of digital images
US7899258B2 (en) 2005-08-12 2011-03-01 Seiko Epson Corporation Systems and methods to convert images into high-quality compressed documents
US7783117B2 (en) 2005-08-12 2010-08-24 Seiko Epson Corporation Systems and methods for generating background and foreground images for document compression
US7734094B2 (en) 2006-06-28 2010-06-08 Microsoft Corporation Techniques for filtering handwriting recognition results
US20080002887A1 (en) 2006-06-28 2008-01-03 Microsoft Corporation Techniques for filtering handwriting recognition results
US7792353B2 (en) 2006-10-31 2010-09-07 Hewlett-Packard Development Company, L.P. Retraining a machine-learning classifier using re-labeled training samples
US7970171B2 (en) 2007-01-18 2011-06-28 Ricoh Co., Ltd. Synthetic image and video generation from ground truth data
US20080175507A1 (en) 2007-01-18 2008-07-24 Andrew Lookingbill Synthetic image and video generation from ground truth data
US20080267497A1 (en) 2007-04-27 2008-10-30 Jian Fan Image segmentation and enhancement
US8156115B1 (en) * 2007-07-11 2012-04-10 Ricoh Co. Ltd. Document-based networking with mixed media reality
US7907778B2 (en) 2007-08-13 2011-03-15 Seiko Epson Corporation Segmentation-based image labeling
US7936923B2 (en) 2007-08-31 2011-05-03 Seiko Epson Corporation Image background suppression
US7958068B2 (en) 2007-12-12 2011-06-07 International Business Machines Corporation Method and apparatus for model-shared subspace boosting for multi-label classification
US20090185723A1 (en) 2008-01-21 2009-07-23 Andrew Frederick Kurtz Enabling persistent recognition of individuals in images
US8180112B2 (en) 2008-01-21 2012-05-15 Eastman Kodak Company Enabling persistent recognition of individuals in images
US8009928B1 (en) * 2008-01-23 2011-08-30 A9.Com, Inc. Method and system for detecting and recognizing text in images
US20100040285A1 (en) 2008-08-14 2010-02-18 Xerox Corporation System and method for object class localization and semantic class based image segmentation
US8171392B2 (en) 2009-04-28 2012-05-01 Lexmark International, Inc. Automatic forms processing systems and methods
US20110007366A1 (en) 2009-07-10 2011-01-13 Palo Alto Research Center Incorporated System and method for classifying connected groups of foreground pixels in scanned document images according to the type of marking
US20110007970A1 (en) 2009-07-10 2011-01-13 Palo Alto Research Center Incorporated System and method for segmenting text lines in documents
US20110007964A1 (en) 2009-07-10 2011-01-13 Palo Alto Research Center Incorporated System and method for machine-assisted human labeling of pixels in an image

Non-Patent Citations (45)

* Cited by examiner, † Cited by third party
Title
An et al., "Iterated document content classification", in Int'l Conf. Doc. Analysis & Recognition, vol. 1, pp. 252-256, Los Alamitos, CA, 2007.
Bal et al., "Interactive degraded document enhancement and ground truth generation", DAS, 2008.
Breuel, "Segmentation of Handprinted Letter Strings Using a Dynamic Programming Algorithm", in Proceedings of 6th Int'l Conf. on Document Analysis and Recognition, pp. 821-826, 2001.
Chen et al., "Image Objects and Multi-Scale Features for Annotation Detection", in Proceedings of Int'l Conf. on Pattern Recognition, Tampa Bay, FL, 2008.
Evans et al., "Computer Assisted Interactive Recognition (CAVIAR) Technology", IEEE Int'l Conf. Electro-Information Technology, 2005.
Fan et al., "Classification of machine-printed and handwritten texts using character block layout variance", Pattern Recognition, 31(9):1275-1284, 1998.
Ford, et al., "Ground truth data for document image analysis", Symposium on Document Image Understanding and Technology, 2003.
Freund et al., "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting", in European Conf. on Computational Learning Theory, pp. 23-27, 1995.
Gatos et al., "ICDAR2007 handwriting segmentation contest", ICDAR, pp. 1284-1288, 2007.
Guo et al., "Separating handwritten material from machine printed text using hidden Markov models", Proc. ICDAR, pp. 439-443, 2001.
Guyon et al., "Data sets for OCR and document image understanding research", Proc. SPIE-Document Recognition IV, 1997.
Ha et al., "The Architecture of Trueviz: A Groundtruth/Metadata Editing and Visualizing Toolkit", Pattern Recognition, 36(3):811-825, 2003.
Houle et al., "Handwriting stroke extraction using a new xytc transform", Proc. ICDAR, pp. 91-95, 2001.
Huang et al., "User-Assisted Ink-Bleed Correction for Handwritten Documents", Joint Conference on Digital Libraries, 2008.
Kavallieratou et al., "Machine-printed from handwritten text discrimination", IWFHR-9, pp. 312-316, 2004.
Kavallieratou et al., "Handwritten text localization in skewed documents", ICIP, pp. 1102-1105, 2001.
Koyama et al., "Local-Spectrum-Based Distinction Between Handwritten and Machine-Printed Characters", in Proceedings of the 2008 IEEE Int'l Conf. on Image Processing, San Diego, CA, Oct. 2008.
Kuhnke et al., "A system for machine-written and hand-written character distinction", ICDAR, pp. 811-814, 1995.
Li et al., "A new algorithm for detecting text line in handwritten documents", IWFHR, pp. 35-40, 2006.
Liang, et al., "Document image restoration using binary morphological filters", in Proc. SPIE Conf. Document Recognition, pp. 274-285, 1996.
Manmatha et al., "A scale space approach for automatically segmenting words from historical handwritten documents", IEEE, TPAMI, 27(8):1212-1225, Aug. 2005.
Moll et al., "Truthing for pixel-accurate segmentation", DAS 2008, 2008.
Okun et al., "Automatic ground-truth generation for skew-tolerance evaluation of document layout analysis methods", ICPR, pp. 376-379, 2000.
OpenCV, Internet website http://opencv.willowgarage.com/wiki/, last edited Mar. 18, 2009.
Pal et al., "Machine-printed and handwritten text lines identification", Patt. Rec. Lett., 22(3-4):431, 2001.
Roth et al., "Ground Truth Editor and Document Interface", Summit on Arabic and Chinese Handwriting, 2006.
Saund et al., "Perceptually-Supported Image Editing of Text and Graphics", ACM UIST, pp. 183-192, 2003.
Saund et al., "PixLabeler: User Interface for Pixel-Level Labeling of Elements in Document Images", Document Analysis and Recognition, 2009, pp. 646-650, ICDAR '09 10th Int'l Conf., Made Available Jul. 26, 2009.
Shafait et al., "Pixel-accurate representation and evaluation of page segmentation in document images", ICPR, pp. 872-875, 2006.
Shetty, et al., "Segmentation and labeling of documents using conditional random fields", Proc. SPIE, 6500, 2007.
Wenyin et al., "A protocol for performance evaluation of line detection algorithms", Machine Vision and Applications, 9:240-250, 1997.
Yacoub et al., "A ground truthing environment for complex documents", DAS, pp. 452-456, 2005.
Yang et al., "Semi-Automatic Groundtruth Generation for Chart Image Recognition", DAS, pp. 324-335, 2006.
Zheng et al., "Machine Printed Text and Handwriting Identification in Noisy Document Images", IEEE Trans. Pattern Anal. Mach. Intell., 26(3):337-353, 2004.
Zi et al., "Document image ground truth generation from electronic text", ICPR, pp. 663-666, 2004.
Zotkins et al., "Gedi: Groundtruthing environment for document images", http://lampsrv01.umiacs.umd.edu/projdb/project.php?id-53.

Cited By (1)

Publication number Priority date Publication date Assignee Title
US11462037B2 (en) 2019-01-11 2022-10-04 Walmart Apollo, Llc System and method for automated analysis of electronic travel data

Also Published As

Publication number Publication date
JP2011018337A (en) 2011-01-27
EP2275973B1 (en) 2016-05-18
EP2275973A2 (en) 2011-01-19
US8649600B2 (en) 2014-02-11
US20130114890A1 (en) 2013-05-09
JP5729930B2 (en) 2015-06-03
EP2275973A3 (en) 2014-07-30
US8768057B2 (en) 2014-07-01
US20110007970A1 (en) 2011-01-13

Similar Documents

Publication Publication Date Title
USRE47889E1 (en) System and method for segmenting text lines in documents
US8442319B2 (en) System and method for classifying connected groups of foreground pixels in scanned document images according to the type of marking
Gatos et al. ICDAR2009 handwriting segmentation contest
Namboodiri et al. Document structure and layout analysis
Minetto et al. SnooperText: A text detection system for automatic indexing of urban scenes
JP5492205B2 (en) Segment print pages into articles
US6014450A (en) Method and apparatus for address block location
Chaudhuri et al. An approach for detecting and cleaning of struck-out handwritten text
Shafii Optical character recognition of printed persian/arabic documents
da Silva et al. Automatic discrimination between printed and handwritten text in documents
Sarkar et al. Word extraction and character segmentation from text lines of unconstrained handwritten Bangla document images
Brisinello et al. Optical Character Recognition on images with colorful background
Ntzios et al. An old greek handwritten OCR system based on an efficient segmentation-free approach
Alhéritière et al. A document straight line based segmentation for complex layout extraction
Sarkar et al. Suppression of non-text components in handwritten document images
CN114581928A (en) Form identification method and system
Banerjee et al. A system for handwritten and machine-printed text separation in Bangla document images
Lue et al. A novel character segmentation method for text images captured by cameras
Das et al. Hand-written and machine-printed text classification in architecture, engineering & construction documents
Seuret et al. Pixel level handwritten and printed content discrimination in scanned documents
Lin et al. Multilingual corpus construction based on printed and handwritten character separation
Marinai Learning algorithms for document layout analysis
JP3476595B2 (en) Image area division method and image binarization method
Dhandra et al. Classification of Document Image Components
Bouressace et al. A convolutional neural network for Arabic document analysis

Legal Events

Date Code Title Description
AS Assignment

Owner name: PALO ALTO RESEARCH CENTER INCORPORATED, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SAUND, ERIC;REEL/FRAME:044930/0392

Effective date: 20090707

Owner name: III HOLDINGS 6, LLC, DELAWARE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:PALO ALTO RESEARCH CENTER INCORPORATED;REEL/FRAME:044930/0523

Effective date: 20150529

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8