IL98293A - Method of discriminating between text and graphics - Google Patents

Method of discriminating between text and graphics

Info

Publication number
IL98293A
Authority
IL
Israel
Prior art keywords
objects
graphics
text
black
white
Prior art date
Application number
IL9829391A
Other languages
Hebrew (he)
Other versions
IL98293A0 (en)
Original Assignee
Scitex Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Scitex Corp Ltd filed Critical Scitex Corp Ltd
Priority to IL9829391A priority Critical patent/IL98293A/en
Priority to EP92630054A priority patent/EP0516576A2/en
Priority to JP4134588A priority patent/JPH05166002A/en
Publication of IL98293A0 publication Critical patent/IL98293A0/en
Publication of IL98293A publication Critical patent/IL98293A/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10008 Still image; Photographic image from scanner, fax or copier
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30176 Document

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Description

METHOD OF DISCRIMINATING BETWEEN TEXT AND GRAPHICS

BACKGROUND OF THE INVENTION

The present invention relates to a method of analyzing documents or other source images in order to discriminate between text and graphics, and thereby to separate text from graphics.
Discrimination between text and graphics is frequently essential when processing documents. For example, some document processing applications are interested only in graphics (or text) . Other document processing applications apply different processes to text and graphics and therefore have to segment the image into regions of text, graphics and half-tone.
All applications discriminating between text and graphics require a definition distinguishing between the two. Some define text as characters grouped in strings, whereas characters which appear isolated are considered as graphics. Others define text as characters wherever they appear, regardless of font or size. The latter definition appears more appropriate but results in misclassifications; for example, a circle might be misclassified as the character "o". Whichever definition is used, most algorithms proposed in the literature do not perform true character recognition, which is far more expensive, but rather use simple heuristics for classification.
There are two principal approaches by which text is discriminated from graphics: "top-down" and "bottom-up". In the "top-down" approach, the image is first divided into major regions which are further divided into subsequent regions. In the "bottom-up" approach, the image is first processed to determine the individual connected components. These components, when identified as characters, are grouped into words, words into sentences, and so on. The top-down approach is knowledge based. It is suitable only for images which are composed of strictly separated regions of text and graphics. Text words which lie within graphic regions are classified as graphics. The bottom-up approach, on the other hand, is more reliable but time consuming. Therefore, the two approaches should be used in conjunction; first a top-down method will detect the graphics regions, and then a bottom-up method will detect the text within these regions.
The run-length smearing algorithm (RLSA) is an example of a top-down method. This algorithm segments and labels the image into major regions of text lines, graphics and half-tone images. The algorithm replaces 0's by 1's if the number of adjacent 0's is less than a predefined threshold (0's correspond to white pixels and 1's correspond to black pixels).
This one-dimensional operation is applied line-by-line as well as column-by-column to the two-dimensional bitmap image. The two results are then combined by applying a local AND at each pixel location. The resulting image contains black blocks wherever printed material appears on the original image, producing an effect of smearing. The blocks are then labeled as text lines, graphics or half-tone images using statistical pattern classification (for example, the number of black pixels in a block and the number of horizontal white/black transitions).
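The following Python sketch illustrates the run-length smearing idea described above; it is an illustration only, not the patent's implementation, and the threshold values and helper names are assumptions.

    import numpy as np

    def smear(bits, threshold):
        """Replace runs of 0's (white) shorter than `threshold` by 1's (black), row by row."""
        out = bits.copy()
        for row in out:
            run_start = None
            for x, v in enumerate(row):
                if v == 0 and run_start is None:
                    run_start = x
                elif v == 1 and run_start is not None:
                    if x - run_start < threshold:
                        row[run_start:x] = 1  # fill the short white gap
                    run_start = None
            # a trailing white run touching the image border is left untouched
        return out

    def rlsa(bits, h_threshold=30, v_threshold=50):
        """Run-length smearing: smear rows, smear columns, then AND the two results."""
        horizontal = smear(bits, h_threshold)
        vertical = smear(bits.T, v_threshold).T
        return horizontal & vertical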
The RLSA algorithm is fast but is restricted to a certain class of images. No skewed text lines are allowed in these images, and the dimensions of characters must fit the predefined threshold parameters; otherwise, characters will remain isolated (if parameters are too small) or text lines will get combined (if parameters are too big).
After a rough classification is obtained by a "top-down" algorithm, the graphic blocks are further processed by a "bottom-up" algorithm to obtain a detailed classification. Bottom-up algorithms start with a process to determine the individual connected components. Several algorithms are known which perform connected-components detection. These algorithms can be combined with chain-code generation algorithms in order to extract as much information as possible during one raster scan over the image. Such a "combined" algorithm can operate fast on a run-length formatted image (run time is proportional to the number of "runs" in the image, which is roughly proportional to the length of boundaries in the image). At the end of such a process, the following raw information is available for each connected component: (1) area (number of pixels forming the connected component); (2) chain-code description of boundaries (a chain for each boundary); and (3) identification of the enclosing connected component, and of the enclosed connected components.
This raw information can be further processed to resolve other properties: (4) enclosing rectangle; (5) Euler number (Euler number = 1 - number of holes in the shape); (6) perimeter length (total length of boundaries); and (7) hull area.
Shape properties other than those of (4)-(7) can be resolved from the information of (1)-(3), but properties (4)-(7) are the most valuable for discriminating character symbols with minimum effort. The Euler number is available with no additional effort (Euler number = 2 - number of chains). The enclosing rectangle can be computed in one scan over the chains. The perimeter length roughly equals the total number of links in the chain code; a better estimate can be obtained by other methods, but this one is fairly good. The hull area can be computed by first finding the convex-hull polygon and then finding the area of that polygon, which is a simple task.
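As a rough illustration of how cheaply these properties follow from the raw data, the Python sketch below computes the Euler number from the chain count, the enclosing rectangle from boundary points decoded from the chain codes, and a polygon area by the shoelace formula (applied to the convex-hull polygon). It is a sketch only; the data layout is an assumption.

    def euler_number(num_chains):
        # one outer boundary plus one chain per hole: Euler number = 2 - number of chains
        return 2 - num_chains

    def enclosing_rectangle(points):
        # points: iterable of (x, y) pixel coordinates decoded from the chain codes
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        return min(xs), min(ys), max(xs), max(ys)

    def polygon_area(polygon):
        # shoelace formula; `polygon` is a list of (x, y) vertices, e.g. the convex hull
        area = 0.0
        n = len(polygon)
        for i in range(n):
            x1, y1 = polygon[i]
            x2, y2 = polygon[(i + 1) % n]
            area += x1 * y2 - x2 * y1
        return abs(area) / 2.0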
Most algorithms which discriminate text according to local shape features use the properties listed above. The algorithms which are based on local shape features have two major flaws: (1) they may misclassify graphics as text (a circle may be classified as "o"); and (2) they cannot detect abnormal strings (for example, they cannot detect a dashed line as graphics; instead, each minus sign is detected as a character symbol and the whole string is detected as text).
These flaws were fixed in a known Text-String Separation algorithm, but at a high price in processing time. The clustering of characters into strings takes most of the time. The algorithm uses the Hough transform to detect collinear components and then groups them into words and phrases if they conform to some statistical pattern. The algorithm succeeds in classifying abnormal strings as graphics, but is sensitive to parameter settings; a wrong selection may cause connected components which belong to one line to be grouped in different cells (under-grouping), or it may cause several parallel strings to be grouped into a single cell (over-grouping). The Hough transform may also mistakenly detect a group of vertical components as a vertical string although these components are part of horizontal text lines.
Another difficulty is that strings which have an arc orientation (rather than a linear orientation) are not discriminated as text. The same happens with short isolated strings (strings containing fewer than three characters).
All of the algorithms mentioned above fail to discriminate properly in images which contain a large variety of font sizes. Moreover, they cannot handle blocks of reversed text (reversed text is white text over a black background).
OBJECTS AND SUMMARY OF THE INVENTION

An object of the present invention is to provide a novel method, having advantages in one or more of the above respects, for analyzing a source image to separate text from graphics.
According to the present invention, there is provided a method of analyzing a source image to separate text from graphics, comprising: (a) scanning and digitizing the source image to obtain a binary image including black and white objects; (b) filtering out the noise from the binary image to obtain a filtered binary image; (c) extracting the contours of the black objects and the white objects from the filtered binary image; (d) evaluating inclusion relationships between the objects, and generating a tree-like structure of such relationships; (e) utilizing said contours for measuring the objects to obtain the shape properties of each object; (f) effecting classification of the objects as graphics or text according to the measured shape properties and the generated tree-like structure of the inclusion relationships; and (g) utilizing said source image and said classification of the objects for generating outputs representing graphics and text, respectively.
According to further features in the preferred embodiment of the invention described below, in step (b) the noise is filtered out by dilation of black pixels; in step (e) the objects are measured in a top-down sequence, starting with the object at the root of a tree; and in step (c) extracting the contours of the black objects and the white objects from the filtered binary image is effected by a single scan in which a window is convolved with the filtered binary image in a raster fashion. In addition, the window scans the image along a line and returns an indication of the type of pattern seen in the window and an indication of the center of the window; each type of pattern is processed differently to determine whether a new object is started, continued or ended, all objects intersecting the current scan line being processed in parallel.
In the described preferred embodiment, when a maximal point is encountered during the window scan, it is considered to be a starting point of a new object; but if the scan later indicates that it was a maximal point of a previously indicated object, the new object is merged with the previously indicated object.
Further features of the invention will be apparent from the description below.
BRIEF DESCRIPTION OF THE DRAWINGS

The invention is herein described, by way of example only, with reference to the accompanying drawings, wherein: Fig. 1 is an overall pictorial diagram illustrating one application of the method of the present invention; Fig. 1a illustrates a typical document including graphics and text in different sizes, orientations and fonts, and the results of its being processed according to the present invention; Fig. 2 is a flow diagram illustrating the main steps in a method of analyzing a source image to separate text from graphics in accordance with the present invention; Fig. 3 is a diagram illustrating the scanning and digitizing step (a) in the diagram of Fig. 2; Fig. 4 is a diagram illustrating the dilation method for filtering noise in accordance with step (b) of the flow diagram of Fig. 2; Figs. 5a and 5b are flow diagrams illustrating one algorithm for performing step (b) in the flow diagram of Fig. 2, and Figs. 5c and 5d are diagrams helpful in understanding this step; Fig. 6a is a diagram illustrating the contour detection step (c) in the flow diagram of Fig. 2, and Fig. 6b more particularly illustrates one example of performing that step; Figs. 7a and 7b are flow diagrams illustrating an algorithm which may be used for performing step (c); Fig. 8 is a decision table used in the algorithm of Figs. 7a and 7b indicating how the different states are handled; Fig. 9 is a diagram illustrating the tree-generation step (d) in the flow diagram of Fig. 2; Fig. 10 is a flow diagram of one algorithm that may be used for performing a polygonal approximation in the object measurement step (e) of Fig. 2; Figs. 11a and 11b are flow diagrams illustrating one algorithm that may be used in performing the classification step (f) of Fig. 2; and Fig. 12 is a flow diagram illustrating one algorithm for performing the output-generation step (g) in Fig. 2.
DESCRIPTION OF A PREFERRED EMBODIMENT

Overall System

Fig. 1 pictorially illustrates a method of analyzing a source document 2 in accordance with the present invention to separate text from graphics, the text being outputted in document 4 and the graphics being outputted in document 6. For purposes of example, and to show the capability of the method, the source document 2, shown enlarged in Fig. 1a, includes graphics and text of different sizes, orientations and fonts.
Thus, the source document 2 containing the source image of both text and graphics is scanned by an optical scanner 8, and its output is fed to an image processing system, generally designated 10, which includes an image disc 12, a memory 14, and a CPU 16. The image processing system 10 outputs the processed information via a plotter 18 in the form of the two documents 4 and 6: document 4 contains the text of the original document 2, and document 6 contains the graphics of the original document 2.
Fig. 2 is a flow diagram illustrating seven basic steps (a)-(g), generally designated by blocks 21-27, performed by the image processing system 10, as follows: (a) scans and digitizes the source image (document 2) to obtain a binary image including black and white objects (block 21); (b) filters out the noise from the binary image to obtain a filtered binary image (block 22); (c) extracts the contours of the black objects and the white objects from the filtered binary image (block 23); (d) evaluates the inclusion relationships between the objects and generates a tree-like structure of such relationships (block 24); (e) utilizes the contours detected in step (c) for measuring the objects to obtain the shape properties of each object (block 25); (f) classifies the objects as graphics or text according to the measured shape properties and the inclusion relationships obtained in step (d) (block 26); and (g) generates, via the output plotter 18, outputs representing text (document 4) and graphics (document 6), respectively (block 27).
Following is a more detailed description of each of the above steps.

Scanning and Digitizing (Block 21, Fig. 2)

This step is effected to obtain a binary version of the source image. It may be carried out by an optical scanner, a CCD (charge-coupled device) scanner, etc., to produce a binary file on disc or tape (e.g., image disc 12, Fig. 1) containing the bitmap representation of the source image. The bitmap can be a stream of bits with each bit corresponding to a black or a white pixel, or it can be encoded in runs. It will be assumed that run-length coding is used, by which a sequence of black (or white) pixels is encoded by its colour and the length of the sequence, up to the next transition in colour. A typical scanning resolution is 50 pixels/mm.
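For illustration, a minimal run-length encoder for one scan line might look like the Python sketch below; this is an assumption about a possible encoding, as the exact file format produced by the scanner is not specified in the text.

    def encode_runs(line):
        """Encode a list of 0/1 pixels as (colour, length) runs, up to each colour transition."""
        runs = []
        if not line:
            return runs
        colour, length = line[0], 1
        for pixel in line[1:]:
            if pixel == colour:
                length += 1
            else:
                runs.append((colour, length))
                colour, length = pixel, 1
        runs.append((colour, length))
        return runs

    # Example: 0 = white, 1 = black
    print(encode_runs([0, 0, 1, 1, 1, 0]))   # [(0, 2), (1, 3), (0, 1)]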
Fig. 3 diagrammatically illustrates the scanning and digitizing step, wherein it will be seen that the source image, as shown at 31 , is converted to a digitized bitmap representation of the source image, as shown at 32. It will also be seen that the bitmap representation of the source image 32 in Fig. 3 includes image data 32a and noise 32b.
Filtering Noise (Block 22, Fig. 2)

The second step performed by the image processing system 10 of Fig. 1, as shown in the block diagram of Fig. 2, is noise filtration, namely the removal of the noise signals 32b in the bitmap representation illustrated at 32 in Fig. 3. This step is carried out by a dilation operator which changes a white pixel to black if its distance from the nearest black pixel is below a predefined threshold.
This step is more particularly shown in Fig. 4, wherein it will be seen that the image data before dilation, as shown at 41, include a number of isolated black pixels 41a which are very close to a group of black pixels 41b and which are absorbed to form a single group 42a after the dilation step, as shown at 42. This operation, which widens the black pixels and therefore connects isolated pixels together, significantly decreases the number of isolated black pixels in the surroundings of black objects.
A simple dilation algorithm can be: set an output pixel to black if any input pixel in its surrounding window is black.
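A Python sketch of this bitmap dilation is shown below; it is an illustration under assumptions, using a square window whose half-width plays the role of the distance threshold (Appendix A later works on the run-length coded image instead).

    import numpy as np

    def dilate(bits, d=1):
        """Set a pixel to black (1) if any pixel within a (2d+1) x (2d+1) window is black."""
        padded = np.pad(bits, d, mode="constant", constant_values=0)
        out = np.zeros_like(bits)
        h, w = bits.shape
        for dy in range(-d, d + 1):
            for dx in range(-d, d + 1):
                # OR the shifted copies of the image into the output
                out |= padded[d + dy : d + dy + h, d + dx : d + dx + w]
        return out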
The dilated image 42 is intermediate and is used only to partition the image roughly into regions of black and white objects. Later in the process, as will be described below, these regions will be classified, and the pixels of the original image will be coloured properly according to the class in which they reside.
Noise filtration by dilation provides two advantages: (a) it maintains the basic shape properties of the original objects; and (b) it facilitates the later determination as to which class the black pixels in the original image belong.
Dilation can be achieved in many ways. When performed on a bitmap, it can be achieved by simple hardware or software; but when performed on a run-length coded image, it is more complicated.
Preferably, in order to utilize the advantages of the run-length coding, a specific apparatus is used operating according to the following algorithm, as illustrated in the flow diagrams of Figs. 5a and 5b, and also in Appendix A at the end of this specification.
Contour Detection (Block 23)

In this step, the image obtained by the dilation is scanned in order to label the objects and to extract their contours. A contour of an object is defined as the chain of line segments which tracks the boundary of the object, separating black and white pixels. If the object is not solid (i.e., it contains holes), the contours of these holes are extracted as well. Therefore, an object may have more than one contour.
Fig. 6a illustrates the contour extracting step, wherein it will be seen that the black object shown at 61 is converted to the contour 62 constituted of a chain of line segments which track the boundary of the object 61.
Many algorithms are known for such chain generation in order to extract the contour. Some algorithms use a sequential approach, by which one contour is tracked from beginning to end before another contour is tracked. However, this approach may result in many scans over the image, especially when the image contains many large objects, and therefore may take a considerable period of time.
Preferably, a single-scan approach is used in the method of the present invention. In this approach, a 2 x 2 window is convolved with the image in a raster fashion. The raster scan can again benefit from the compact run-length coding, since only locations of colour transitions need be examined instead of the whole image.
The general idea of the one-scan approach is as follows: The window scans the image and returns an indication of the type of pattern seen in the window and an indication of the position of the center of the window. Each type of pattern is processed differently to determine whether a new object is started, continued or ended. All objects intersected by the current scan line are processed in parallel. A new object always starts at a maximal point and ends at a minimal point, but not all maximal points necessarily start new objects, nor do all minimal points always end existing objects. The minimal points pose no problem, because by the time they are reached sufficient information is already at hand to determine whether or not they are true end points. With the maximal points, however, there is a problem of ambiguity. At the time a maximal point is encountered, it cannot always be determined whether this point is a local maximum of an existing object or a global maximum of a new object.
In the described process, a maximal point is always considered to be a starting point of a new object. If later on it is discovered that it was a starting point of an existing object, the two objects, the true and the artificial, are merged and the artificial object is deleted.
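One common way to keep this merge step cheap is a union-find (disjoint-set) structure over provisional object labels. The patent does not name this technique; the Python sketch below is only meant to illustrate the bookkeeping of merging an "artificial" object into the true one.

    class ObjectLabels:
        """Union-find over provisional object labels created at maximal points."""

        def __init__(self):
            self.parent = []

        def new_object(self):
            # a maximal point always opens a provisional object
            label = len(self.parent)
            self.parent.append(label)
            return label

        def find(self, label):
            while self.parent[label] != label:
                self.parent[label] = self.parent[self.parent[label]]  # path halving
                label = self.parent[label]
            return label

        def merge(self, kept, artificial):
            # the scan discovered that `artificial` was a local maximum of `kept`
            self.parent[self.find(artificial)] = self.find(kept)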
At each maximal point two chains are started downwards, and at each minimal point two chains are connected. Therefore a contour is initially composed of more than one chain, and only when the object is ended are the chains connected properly to form one closed-loop contour. With each contour, two pointers are associated, pointing at the two objects on the right-hand and left-hand sides of the contour. These pointers make it possible later to extract the inclusion relationships between the objects.
Fig. 6b illustrates a particular case, in which contour 1 is composed of chains A-F, contour 2 is composed of chains G-H, and contour 3 is composed of chains I-J. It will be seen that object 1 (background) is bounded by contours 1 and 3; object 2 is bounded by contours 1 and 2; object 3 is bounded by contour 2, and object 4 is bounded by contour 3.
Figs. 7a and 7b illustrate an example of an algorithm which may be used for this step; Fig. 8 elaborates on the operations of blocks 71 and 72 in Fig. 7b and illustrates a decision table for the different states. Appendix B at the end of this specification illustrates an example of an algorithm for this purpose.
Tree Generation (Block 24)

In this step, the inclusion relationships between the objects are evaluated, and a tree-like structure of such relationships is generated. This relationship is utilized at the time of classification, since it is sometimes important to have information about the objects included within one object in order to assign it a proper class. This relationship can be extracted easily from the database of objects and contours produced in the previous step. All that is necessary is to set a pointer from each object to the object which includes it, namely to its predecessor. In that way, a tree-like structure is formed. There is one object which has no predecessor, which is usually the white background.
The predecessor of an object may be found as follows: assuming that the contours are always directed counter-clockwise, first find out which of the contours is the outermost (it being recalled that an object has more than one contour if it contains holes), and then set the pointer to point at the object on the right side of this contour. This object is the predecessor.
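A minimal Python sketch of this predecessor assignment is given below; it assumes each object record carries its contours and each contour carries right_object and is_outermost fields (the field names are assumptions, and identifying the outermost contour is assumed to have been done elsewhere).

    def build_inclusion_tree(objects):
        """Point each object at its predecessor: the object to the right of its outermost contour."""
        tree = {}
        for obj in objects:
            outer = next((c for c in obj.contours if c.is_outermost), None)
            if outer is None:
                tree[obj] = None                 # the background object has no predecessor
            else:
                tree[obj] = outer.right_object   # contours are directed counter-clockwise
        return tree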
Fig. 9 diagrammatically illustrates the step of determining the inclusion relationship. Graph 92 in Fig. 9 is the tree-like structure obtained from the image 91.
Object Measurements (Block 25)

This step involves measuring the objects to obtain the shape properties of each object. The following primitives are used: (a) area of the object (measured in pixels), (b) number of contours, and (c) perimeter length of each contour (measured in pixels). From these primitives, the following are determined: (a) elongation, (b) hull area, (c) hull eccentricity, (d) black/white ratio, (e) Euler number, and (f) number of sharp corners.
Elongation measures the ratio between the width of the lines forming the object and the overall dimensions of the object. It is computed from A, the area of the object, and P, the perimeter of the object.
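The precise formula is not reproduced here; a plausible form, offered only as an assumption, is elongation = P^2 / (4A). For a stroke of length L and width w, A is approximately wL and P is approximately 2L, so P^2/(4A) is approximately L/w, which grows as the object becomes more line-like.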
The hull is the convex polygon which bounds the object. There are fast algorithms which compute the convex hull for a given set of points.
Hull eccentricity is the ratio between width and height of the hull.
Black/white ratio is the ratio between the hull area and the area of the object.
The Euler number indicates the number of holes in the object. It is defined as one minus the number of holes.
The number of sharp corners is computed as follows: first a polygonal approximation of the contours is generated. This approximation is generated several times, each time with a bigger error threshold. This is done as long as the number of polygon segments continues to drop linearly with respect to the increase of the error threshold. The last approximation is used for the evaluation of the number of sharp corners. A sharp corner is a corner in the approximating polygon which has an angle of less than 95 degrees.
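A Python sketch of the corner count on an already-approximated polygon is shown below; it assumes the polygon is given as a closed list of (x, y) vertices produced by the iterative approximation of Fig. 10, and measures the angle between the two edges meeting at each vertex.

    import math

    def count_sharp_corners(polygon, max_angle_deg=95.0):
        """Count vertices of the approximating polygon whose corner angle is below the threshold."""
        n = len(polygon)
        sharp = 0
        for i in range(n):
            px, py = polygon[i - 1]
            cx, cy = polygon[i]
            nx, ny = polygon[(i + 1) % n]
            v1 = (px - cx, py - cy)
            v2 = (nx - cx, ny - cy)
            dot = v1[0] * v2[0] + v1[1] * v2[1]
            norm = math.hypot(*v1) * math.hypot(*v2)
            if norm == 0:
                continue  # degenerate vertex
            angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
            if angle < max_angle_deg:
                sharp += 1
        return sharp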
Fig. 10 is a flow chart illustrating one algorithm that may be used for performing a polygonal approximation operation in the object measurement step (e).
Object Classification (Block 26)

This step involves classifying the objects as graphics or text. In this step, the objects are traversed in a bottom-up fashion and are classified according to the measurements taken in the previous step, and according to the classes that were given to the successor objects in the tree. The classification is done according to a set of predefined rules and thresholds. Appendix C is an example of such rules and thresholds, as illustrated in the flow diagrams of Figs. 11a and 11b.
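The rule set of Appendix C can be read as a straightforward cascade; the Python sketch below mirrors its structure using the C1-C7 values listed there. The object field names are assumptions, and the final background/successor pass (rules 10-13 of the appendix) is omitted here.

    THRESHOLDS = dict(C1=0.25, C2=-8, C3=90, C4=20, C5=0.05, C6=25, C7=15)

    def classify(obj, resolution, t=THRESHOLDS):
        """Assign TEXT / NOISE / BACKGROUND / GRAPHICS / VOID following the Appendix C cascade."""
        cls = "TEXT"
        if obj.area < t["C1"] * resolution ** 2:
            cls = "NOISE"
        if obj.predecessor is None:
            cls = "BACKGROUND"
        if obj.euler_number < t["C2"]:
            cls = "BACKGROUND"
        if obj.elongation > t["C3"]:
            cls = "GRAPHICS"
        if obj.sharp_corners > t["C4"]:
            cls = "GRAPHICS"
        if obj.bw_ratio < t["C5"]:
            cls = "GRAPHICS"
        if obj.elongation > t["C6"]:
            cls = "VOID"
        if obj.eccentricity > t["C7"]:
            cls = "VOID"
        return cls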
Output Generation (Block 27)

This step involves generating outputs representing text and graphics, as illustrated by documents 4 and 6, respectively, of Fig. 1.
In this step, the original image is read again and written back in different colours. White pixels remain white, but black pixels change according to the class of the object in which they reside (each class is assigned a different colour). Two adjacent black pixels are never painted in different colours, because the dilation operation prevents them from being associated with different objects; therefore, it prevents them from having different classes, and thus different colours.
After the black pixels are repainted, the whole process can be repeated for the white pixels. That is, if it is necessary to discriminate between the various white objects, the steps of blocks 21-27 of the flow chart of Fig. 2 should be carried out again, but this time step (b) (block 22), namely the dilation step, should be performed on the white pixels and not on the black pixels.
The problem of output generation is actually reduced to the problem of finding for each black pixel the object in which it resides. This object directly defines the class, and the class defines the new colour for that pixel.
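A Python sketch of the repainting pass is given below; it assumes that a label map from the contour-detection step gives, for each pixel, the object in which it resides, and that a colour has been assigned per class (the specific colour values are assumptions).

    CLASS_COLOUR = {"TEXT": 1, "GRAPHICS": 2, "NOISE": 0, "BACKGROUND": 0, "VOID": 0}

    def repaint(bits, label_map, object_class):
        """Rewrite black pixels with the colour of their object's class; white pixels stay white."""
        out = [[0] * len(row) for row in bits]
        for y, row in enumerate(bits):
            for x, pixel in enumerate(row):
                if pixel == 1:                       # black pixel in the original image
                    cls = object_class[label_map[y][x]]
                    out[y][x] = CLASS_COLOUR[cls]
        return out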
One algorithm which may be used for output generation, as illustrated in Fig. 12, is given in the attached Appendix D.
The invention has been described with respect to one preferred embodiment, but it will be appreciated that many variations and other applications of the invention may be made .
While the flow diagram of Fig. 2 illustrates the steps as performed sequentially, such steps can be, and preferably are, performed in pipeline fashion.
Thus, during scanning via the input window, as soon as an end of an object is determined, processing of the output of the object can be started from the highest line of the object.
APPENDIX A

The following is an algorithm for dilation of a run-length coded image.
d - distance threshold.
line_j - input line number j.
line'_j - output line number j.
strip_j - batch of 2d+1 lines (line_{j-d}, ..., line_{j+d}).
1. initialize brush-vector: b[i] <- floor(sqrt((d + 1/2)^2 - i^2)), -d ≤ i ≤ d
2. initialize lines-counter: j <- 0
3. clear line_i, -d ≤ i < 0
4. read first d lines into line_0, line_1, ..., line_{d-1}
5. while not end-of-file do
6.   clear line'_j
7.   read next input line into line_{j+d}
8.   partition strip_j into patterns: P_1, P_2, ..., P_n
9.   for each P_k, 1 ≤ k ≤ n do
10.    if P_k is not totally WHITE (not zero) then
11.      set LEFT, RIGHT to mark the left and right margins of P_k
12.      find the minimal |i| for which P_k[i] is BLACK
13.      insert the black run [LEFT - b[i], RIGHT + b[i]] into line'_j
14.    end
15.  end
16.  output line'_j
17.  j <- j + 1
18. end
A pattern is a slice in a strip that contains 2d+1 line segments which start and end at the same coordinates (see Fig. 5c). A pattern is maximal in the sense that it is the widest slice that contains no color transitions along the line segments which constitute the pattern. P_k[i] is the color of the i-th line segment in P_k (see Fig. 5d).
APPENDIX B

The following is an algorithm for contour detection and chain generation.

Input: image given in run-length format.
Output: list of objects and chains.
An object contains:
a. a color code describing the color of the object;
b. the area of the object (number of pixels);
c. pointers to the chains which partition the contour of the object.
A chain contains:
a. a chain code describing a segment of the contour;
b. the length of the chain (number of links);
c. pointers to the objects on both sides of the chain.
The algorithm uses the following variables:
x, y - pointers to the current scan location.
line0, line1 - hold the contents of two successive input lines.
gchains - list of "growing" chains.
chainp - pointer to a chain in gchains.
1. frame the image (frame width = 1, frame color = WHITE)
2. init objects, chains, gchains
3. y <- 0
4. read first line into line0
5. while y < height-of-image do:
6.   x <- 0
7.   read next line into line1
8.   set chainp to point at the first chain in the gchains list
9.   while x < width-of-image do:
10.    advance x to the coordinate of the next run
11.    identify the state of colors in the 2x2 window centered at (x, y)
12.    handle this state (see Fig. 8)
13.  end
14.  y <- y + 1
15.  line0 <- line1
16. end
At step 1, the framing process can be done concurrently at the time the image is read (steps 4 and 7). At step 10, variable x is advanced to the coordinate of the minimal run offset; thus, no run is skipped. Each run is processed twice - once as a member of line0 and once as a member of line1.
APPENDIX C

The following is an algorithm for object classification.
1. class <- TEXT
2. if area < C1 * resolution^2 then class <- NOISE
3. if the object has no predecessor then class <- BACKGROUND
4. if Euler number < C2 then class <- BACKGROUND
5. if elongation > C3 then class <- GRAPHICS
6. if sharp corners > C4 then class <- GRAPHICS
7. if b/w ratio < C5 then class <- GRAPHICS
8. if elongation > C6 then class <- VOID
9. if eccentricity > C7 then class <- VOID
10. if class is BACKGROUND then
11.   for each successor object sobj do:
12.     if class of sobj is TEXT then
13.       change the class of all successors of sobj to BACKGROUND
14. end

Constant  Value
C1        0.25
C2        -8
C3        90
C4        20
C5        0.05
C6        25
C7        15

APPENDIX D

The following is an algorithm for output generation.
l - length of run.
c - color of run.
x, y - coordinates in the image.
1. y <- 0
2. while not end-of-file do
3.   y <- y + 1
4.   read next line from source file
5.   x ...

This knowledge can even be used to make the process work in a pipeline: the output generation is triggered by signals from the classification module indicating that a new object has been completely discovered and classified.

Claims (6)

WHAT IS CLAIMED IS:

1. A method of analyzing a source image to separate text from graphics, comprising: (a) scanning and digitizing the source image to obtain a binary image including black and white objects; (b) filtering out the noise from the binary image to obtain a filtered binary image; (c) extracting the contours of the black objects and the white objects from the filtered binary image; (d) evaluating inclusion relationships between the objects, and generating a tree-like structure of such relationships; (e) utilizing said contours for measuring the objects to obtain the shape properties of each object; (f) effecting classification of the objects as graphics or text according to the measured shape properties and the generated tree-like structure of the inclusion relationships; (g) and utilizing said source image and said classification of the objects for generating outputs representing graphics and text, respectively.

2. The method according to Claim 1, wherein in step (b), the noise is filtered out by dilation of the black pixels.

3. The method according to either of Claims 1 or 2, wherein in step (e), the objects are measured in a top-down sequence, starting with the object at the root of the tree.

4. The method according to any one of Claims 1-3, wherein in step (c), extracting the contour of the black objects and the white objects from the filtered binary image is effected by a single scan in which a window is convolved with the filtered binary image in a raster fashion.

5. The method according to Claim 4, wherein the window scans the image along a line and returns an indication of the type of pattern seen from the window and an indication of the center of the window, each type of pattern being processed differently to determine whether a new object is started, continued or ended, all objects intersecting the current scan line being processed in parallel.

6. The method according to Claim 5, wherein a maximal point encountered during the window scan is considered to be a starting point of a new object, but if later the scan indicates it was a maximal point of a previously indicated object, the new object is merged with that of the previously indicated object.

7. The method according to any one of Claims 1-6, wherein in step (d), the tree-like structure is generated by setting a pointer from each object to its predecessor, the predecessor of an object being found by determining which of the object contours is the outermost one, and then setting the pointer to point at the object on one side of that contour.

8. The method according to any one of Claims 1-7, wherein in step (e), the objects are measured to obtain the following shape properties of each object: area of the object, number of contours, and perimeter length of each contour.

9. The method according to Claim 8, wherein in step (e), the following additional properties are determined from the measured shape properties: elongation, hull area, hull eccentricity, black/white ratio, Euler number, and number of sharp corners.

10. The method according to Claim 9, wherein the number of sharp corners is determined by: generating several polygonal approximations of the contour, with each generation having a bigger error threshold, as long as the number of polygon segments drops linearly with respect to the increase in the error threshold; and determining that a sharp corner exists when the last polygon approximation has an angle of less than 60°.

11. The method according to any one of Claims 1-10, wherein in step (g), the generated outputs representing graphics and texts are in the form of different images.

12. The method according to any one of Claims 1-10, wherein in step (g), the generated outputs representing graphics and texts are in the form of different colours of the same image.

13. The method according to any one of Claims 1-10, wherein the source image contains text of different sizes, orientations and/or fonts.

14. The method according to Claim 2, wherein steps (a)-(g) are repeated, except that the noise is filtered out in step (b) by dilation of the white pixels, so that white objects of the source image are separated, thereby providing discrimination of white text and graphics over a black background.

15. The method according to any one of Claims 1-10, wherein the source image contains black text, white text, black graphics, white graphics, black and white background, and black and white noise.

16. A method of analyzing a source image to separate text from graphics, substantially as described with reference to and as illustrated in the accompanying drawings.
IL9829391A 1991-05-28 1991-05-28 Method of discriminating between text and graphics IL98293A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
IL9829391A IL98293A (en) 1991-05-28 1991-05-28 Method of discriminating between text and graphics
EP92630054A EP0516576A2 (en) 1991-05-28 1992-05-21 Method of discriminating between text and graphics
JP4134588A JPH05166002A (en) 1991-05-28 1992-05-27 Method for analyzing source image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
IL9829391A IL98293A (en) 1991-05-28 1991-05-28 Method of discriminating between text and graphics

Publications (2)

Publication Number Publication Date
IL98293A0 IL98293A0 (en) 1992-06-21
IL98293A true IL98293A (en) 1994-04-12

Family

ID=11062477

Family Applications (1)

Application Number Title Priority Date Filing Date
IL9829391A IL98293A (en) 1991-05-28 1991-05-28 Method of discriminating between text and graphics

Country Status (3)

Country Link
EP (1) EP0516576A2 (en)
JP (1) JPH05166002A (en)
IL (1) IL98293A (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69523135T2 (en) * 1994-12-28 2002-05-02 Canon Kk Image processing device and method
US6389162B2 (en) 1996-02-15 2002-05-14 Canon Kabushiki Kaisha Image processing apparatus and method and medium
WO2000062243A1 (en) 1999-04-14 2000-10-19 Fujitsu Limited Character string extracting device and method based on basic component in document image
JP4454789B2 (en) 1999-05-13 2010-04-21 キヤノン株式会社 Form classification method and apparatus
JP4547752B2 (en) * 2000-01-14 2010-09-22 ソニー株式会社 Image processing apparatus and method, and recording medium
US6738512B1 (en) * 2000-06-19 2004-05-18 Microsoft Corporation Using shape suppression to identify areas of images that include particular shapes
US6832726B2 (en) 2000-12-19 2004-12-21 Zih Corp. Barcode optical character recognition
US7311256B2 (en) 2000-12-19 2007-12-25 Zih Corp. Barcode optical character recognition
US7596270B2 (en) * 2005-09-23 2009-09-29 Dynacomware Taiwan Inc. Method of shuffling text in an Asian document image
JP5842441B2 (en) 2011-07-29 2016-01-13 ブラザー工業株式会社 Image processing apparatus and program
JP5796392B2 (en) 2011-07-29 2015-10-21 ブラザー工業株式会社 Image processing apparatus and computer program
JP5853470B2 (en) 2011-07-29 2016-02-09 ブラザー工業株式会社 Image processing device, image processing program
JP5776419B2 (en) 2011-07-29 2015-09-09 ブラザー工業株式会社 Image processing device, image processing program
US20160110599A1 (en) * 2014-10-20 2016-04-21 Lexmark International Technology, SA Document Classification with Prominent Objects

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3128794A1 (en) * 1981-07-21 1983-05-05 Siemens AG, 1000 Berlin und 8000 München Method for finding and delimiting letters and letter groups or words in text areas of an original which can also contain graphical and/or image areas apart from text areas

Also Published As

Publication number Publication date
IL98293A0 (en) 1992-06-21
EP0516576A2 (en) 1992-12-02
EP0516576A3 (en) 1994-01-12
JPH05166002A (en) 1993-07-02

Similar Documents

Publication Publication Date Title
JP4323328B2 (en) System and method for identifying and extracting character string from captured image data
Wang et al. Classification of newspaper image blocks using texture analysis
US5410611A (en) Method for identifying word bounding boxes in text
Namboodiri et al. Document structure and layout analysis
US5563403A (en) Method and apparatus for detection of a skew angle of a document image using a regression coefficient
US5050222A (en) Polygon-based technique for the automatic classification of text and graphics components from digitized paper-based forms
US6014450A (en) Method and apparatus for address block location
US4757551A (en) Character recognition method and system capable of recognizing slant characters
US4817171A (en) Pattern recognition system
LeBourgeois Robust multifont OCR system from gray level images
US5805740A (en) Bar-code field detecting apparatus performing differential process and bar-code reading apparatus
US7233697B2 (en) Character recognition device and a method therefor
US5915039A (en) Method and means for extracting fixed-pitch characters on noisy images with complex background prior to character recognition
IL98293A (en) Method of discriminating between text and graphics
JP2000132690A (en) Image processing method and image processor using image division by making token
US20030012438A1 (en) Multiple size reductions for image segmentation
US5974200A (en) Method of locating a machine readable two dimensional barcode within an image
JPH05225378A (en) Area dividing system for document image
Rege et al. Text-image separation in document images using boundary/perimeter detection
US5113453A (en) Character recognition method and apparatus
Lam et al. Reading newspaper text
Shivananda et al. Separation of foreground text from complex background in color document images
Velu et al. Automatic letter sorting for Indian postal address recognition system based on pin codes
Govindaraju et al. Newspaper image understanding
JP2001109887A (en) Area extracting method, method and device for extracting address area, and image processor

Legal Events

Date Code Title Description
RH Patent void