WO2000038100A1 - Improved apparatus and method for deskewing images of symbols having a non-linear baseline - Google Patents

Improved apparatus and method for deskewing images of symbols having a non-linear baseline

Info

Publication number
WO2000038100A1
WO2000038100A1 PCT/US1999/023605 US9923605W WO0038100A1 WO 2000038100 A1 WO2000038100 A1 WO 2000038100A1 US 9923605 W US9923605 W US 9923605W WO 0038100 A1 WO0038100 A1 WO 0038100A1
Authority
WO
WIPO (PCT)
Prior art keywords
baseline
pixel
symbols
minima
local
Prior art date
Application number
PCT/US1999/023605
Other languages
English (en)
Inventor
Adam Altman
Timur Osipov
Mikhail Utkin
Original Assignee
Horizon Marketing Corporation, Aka Wordwand
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Horizon Marketing Corporation, Aka Wordwand filed Critical Horizon Marketing Corporation, Aka Wordwand
Priority to AU11082/00A priority Critical patent/AU1108200A/en
Publication of WO2000038100A1 publication Critical patent/WO2000038100A1/fr


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/146Aligning or centring of the image pick-up or image-field
    • G06V30/1475Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines

Definitions

  • the invention relates to the conversion of images of symbols to an electronic format. More particularly, the invention relates to the deskewing of images of symbols, where the symbols are positioned along a baseline that is not a straight line, but rather is a wavy or a curved line.
  • Text recognition technology provides a useful tool for converting information stored on paper into an electronic format.
  • Optical character recognition (OCR) technology typically converts the electronic output of a scanner into computer usable files through a series of complex computer algorithms.
  • the electronic image produced by the scanner consists of black and white picture elements referred to as pixels, and has a desired resolution, which for text is presently 300 dpi, or 90,000 pixels per square inch.
  • Each pixel is rendered as a digital value of 1 or 0, which represents either a white pixel or a black pixel.
  • Before optically scanned text is actually recognized it may be displayed and manipulated as an image on a computer monitor or in memory. At this point, the electronic information has not yet been recognized as text, but is merely an image or picture of the text.
  • OCR algorithms typically recognize scanned text images in two steps. First, they analyze the image of the page to determine which parts of the image are text and numeric data, and determine the structure of the page layout. For example, tables, columns, and paragraphs are identified and located. Next, the characters are examined and identified to produce a file of character data contained in words, including page formatting information, such as tables, columns, paragraphs, spacing, bold characters, italics, and underlining that are necessary to allow manipulation of the data as a text file.
  • Deskewing is a well understood problem in imaging, especially related to OCR technology.
  • the image that is to be scanned is placed on a flat platen or is fed through a set of rollers.
  • with hand-held scanners such as the Omniscan™ product manufactured by Caere of Los Gatos, California, the user pulls the scanner vertically down the page.
  • with the DataPen™ product manufactured by Primax Electronics of Taiwan, R.O.C., the user pulls the scanner horizontally across the page, typically resulting in a wavy baseline (see, for example, U.S. Pat. Nos. 5,182,450 and 5,301,243).
  • the horizontal baseline of symbols may not be flat across the image, but may slope upwards or downwards at a constant angle. This skew confuses the OCR algorithms, and a deskewing process therefore needs to be implemented to recreate a horizontal baseline. While there are OCR algorithms that are tolerant of skew, the majority and the best of such algorithms are not tolerant of skew.
  • U.S. Patent No. 5,054,098 discloses a process from which samples are taken to measure skew, and from which samples a histogram is created. The most frequent skew angle is considered to be the skew angle of the entire document. The baseline in this technique is assumed to be a straight line.
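  • The histogram approach summarized above can be illustrated with a minimal sketch. It assumes the skew samples have already been measured in degrees; the function name `dominant_skew_angle` and the 0.5 degree bin width are hypothetical choices for illustration and are not taken from U.S. Patent No. 5,054,098.

```python
from collections import Counter


def dominant_skew_angle(samples_deg, bin_width=0.5):
    """Return the centre of the most populated histogram bin.

    samples_deg: skew measurements in degrees (assumed input format).
    bin_width:   hypothetical bin size in degrees.
    """
    if not samples_deg:
        raise ValueError("need at least one skew sample")
    # Assign each sample to the nearest bin index, then count the bins.
    bins = Counter(round(angle / bin_width) for angle in samples_deg)
    best_bin, _count = max(bins.items(), key=lambda item: item[1])
    return best_bin * bin_width
```

  • Because a single angle is returned for the whole document, a sketch of this kind can only describe a straight baseline, which is the limitation the passage above points out.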
  • A. Spitz Determination of Image Skew Angle From Data in Compressed Form
  • U.S. Patent No. 5,245,676 discloses a method of determining an image skew angle. The problem solved is one of having a baseline with a fixed angle. The disclosed method selects certain features on the image on which to base a skew angle, and assigns weights to those features.
  • a hand-held scanner that is able to scan text one line at a time e.g. as a scanning wand or pen.
  • the user would hold such scanner as with a pen, and draw the scanner horizontally across a page to enter information into a computer one line at a time, much like a highlighting pen is used to highlight text.
  • Such scanner scans symbols one line at a time. Because of this manual horizontal action, the image created by the scanner is not distorted or skewed in a conventional manner, where the skew is of a constant angle, but is subject to a new problem, referred to herein as baseline waviness.
  • the baseline waviness is created because there is no flat baseline of symbols, or even a straight but angled baseline, as generated in a page scanner, when hand scanning across a page. Thus, the baseline assumes a wavy non-linear profile.
  • U.S. Patent No. 5,638,466 (10 June 1997) discloses an apparatus and method for deskewing images of symbols having a non-linear baseline, wherein a program determines an overall boundary for each continuous symbol, determines minimum, maximum, and average dimensions for the symbols, and shifts the symbols to a flat baseline. While Rokusek provides some deskewing for many symbols, Rokusek provides only a single minimum point to be determined for each continuous symbol. Therefore, it does not compensate for uneven scanning of a single symbol, and fails to compensate for uneven scanning between connected symbols.
  • Skewed characters can also be connected together within a scanned image, typically as the result of close tracking within entire lines of text, or by tight kerning between character pairs.
  • characters can also be connected together inadvertently, such as within photocopied pages having dark print quality.
  • typefaces or font families
  • much time is spent designing the overall default tracking, or space between adjoining symbols, depending on the font and font size chosen.
  • certain letter pairs when typed consecutively next to one another, are spaced slightly differently than the default tracking for the font, and are referred to as kerning pairs.
  • the disclosed prior art systems and methodologies thus provide basic methods for deskewing images, but fail to provide a system to accurately deskew images of text on a pixel by pixel basis, or on a pixel column by pixel column basis.
  • the development of such a deskewing system would constitute a major technological advance.
  • the invention is used in conjunction with a hand-held scanner that scans generally parallel to the baseline of printed symbols, such as text, across a page.
  • the invention provides a system that deskews pixel-based images of symbols having a non-linear baseline on either a pixel by pixel or pixel column by pixel column basis, to thereby flatten waviness associated with such scanning.
  • the resulting deskewed image improves the legibility of the deskewed pixel-based image, and allows subsequent optical character recognition (OCR) systems to accurately recognize the deskewed symbols.
  • the system operates in accordance with a program that finds separate contiguous pixel groups of symbols in the scanned image, and defines bounding boxes around the groups.
  • Local baseline minima for the symbols are then determined, which are used to create a series of line segments that approximate the non-linear baseline.
  • the series of line segments are then fit to a flat baseline, and the pixels or pixel columns associated with the scanned symbols are then shifted to the flat baseline, in an amount proportional to the distance between the line segments and the flat baseline.
  • Figure 1 is a block diagram of a system that incorporates a scanner and a computer according to the invention
  • Figure 2 is a block schematic diagram of an image to character translation application for use with a hand-held scanner according to the invention
  • Figure 3 shows details of a pixel-based scanned symbol
  • Figure 4 shows a portion of a line of symbols defined upon a local baseline
  • Figure 5 is a simplified flow chart depicting image processing
  • Figure 6 shows an outline form of a pixel-based scanned image along a nonlinear baseline
  • Figure 7 shows bounding boxes defined around contiguous pixel groups of symbols in a pixel-based scanned image along a non-linear baseline
  • Figure 8 shows local minima determined for contiguous pixel groups within a pixel-based scanned image along a non-linear baseline
  • Figure 9 shows the creation of a series of baseline segments that approximate the non-linear baseline
  • Figure 10 shows the calculation of distance from baseline segments to a flat baseline for each pixel column value within the bounding box for each group of symbols
  • Figure 11 shows a deskewed pixel-based image, wherein pixels or pixel columns are shifted towards a flat baseline
  • Figure 12 is a flow chart depicting an alternate method for deskewing images of symbols having a non-linear baseline
  • Figure 13A is a flow chart depicting a method for finding contiguous pixel groups of symbols and determining group boundaries within a scanned image
  • Figure 13B is an example of a processed image at various stages of image processing in accordance with the image processing sequence of Figure 13A;
  • Figure 14A is a flow chart depicting a method for determining contiguous pixel group boundaries
  • Figure 14B is an example of a processed image at various stages of image processing in accordance with the image processing sequence of Figure 14A;
  • Figure 15A is a flow chart depicting a method for finding the next contiguous pixel group to the right of a current contiguous pixel group;
  • Figure 15B is an example of a processed image at various stages of image processing in accordance with the image processing sequence of Figure 15A;
  • Figure 16 is a process for finding local minima within a bounding box that surrounds a contiguous pixel group
  • Figure 17 shows a minima filtering process, which looks at the slope established between subsequent local minima, and filters minima that deviate from a slope threshold value; and
  • Figure 18 shows an example of a preferred line segment creation process, in which line segments are created by a least square linear fit to closely approximate a non-linear baseline.
  • FIG. 1 is a block diagram of a system 10 that incorporates a hand-held scanner 12 and a computer 16 according to the invention.
  • a hand-held scanner 12 is preferably used to scan horizontally across a line of symbols 40 (FIGS. 4,6).
  • the scanner 12 is linked to the input port 14 of a computer 16.
  • the input function is managed by a central processing unit (CPU) 22.
  • Image information 20 obtained by the scanner 12 is stored by the CPU 22 in a memory 18.
  • the invention provides a symbol deskewing and recognition application 24 that processes the image 20 stored in memory 18, and then sends a stream of recognized text-based symbols 27 to an active user application 26, such as a word processor, database, or spreadsheet application.
  • FIG. 2 is a block schematic diagram of an image to character translation application 24 for use with a hand-held scanner 12 according to the invention.
  • the deskewing and translation application 24 consists of a user interface 28 which allows the user to change various settings and links to other parts of the application 24.
  • the application 24 processes a scanned image 20 that is created by a hand-held scanner 12 in an image processing module 32 to create a deskewed image 100 that consists of a single line of symbols 40 (FIG. 3), such as text.
  • the deskewed pixel-based image is then preferably sent to an optical character recognition (OCR) application 34, which recognizes the symbols 40.
  • the recognized symbols 27 are then sent to a character post-processing module 36, where they are optionally modified, such as through spell checking or user editing.
  • the recognized and post-processed text-based symbols 27 are then sent to an active user application 26.
  • Figure 3 shows details of a pixel-based printed symbol 40, in relation to an X-axis 41 and a Y-axis 43.
  • symbols 40 such as text
  • symbols 40 are defined by an outline 42.
  • the symbols 40 are printed, typically by a printer or by hand, they are usually defined by toner or ink.
  • the toner or ink is applied to a page to represent the symbol 40, either as a continuous ink pattern, or as a series of adjoining screened pixels of varying sizes, which approximate the outline 42.
  • adjoining columns 46 of digital pixels 44 are typically created, typically as either black or as white pixel values 44.
  • grey scale and color pixels 44 can also be scanned, analyzed, and manipulated.
  • Text symbols 40 are typically located upon a local baseline 48, and subsequent lines 56 are typically vertically separated by a distance referred to as a leading (not shown). While most symbols 40 include baseline minima 50 that are located along the local baseline 48, portions of letters may include local minima 52, which include colored pixels 44 that are either higher or lower than adjoining colored pixels 44 within the symbols 40, but are located away from the local baseline 48.
  • the symbol "T” is defined by a serif-family font structure, and includes upper local minima outliers 52 near the top of the symbol 40.
  • Figure 4 shows a portion of a line 54 of symbols 40 defined upon a local baseline 48. Groups of symbols 40 located together are considered to be contiguous pixel groups 58.
  • the system breaks the scanned lines 54 of text into collections of symbols 40, referred to as contiguous pixel groups 58. Contiguous pixel groups 58 need not be words as one normally defines them, but merely represent groups of symbols 40 without large spaces between them. Larger spaces, typically exceeding a threshold, are considered to define the spaces between groups 58 of symbols 40.
  • Figure 5 is a flow chart depicting a basic pixel-column based image deskewing process 60a.
  • Image processing begins, at step 64, wherein the system captures groups of one or more symbols 40, typically by defining bounding boxes 86 (FIG. 7) around contiguous pixel groups 58 within the scanned image 20.
  • as bounding boxes 86 are established around contiguous pixel groups 58 in step 64, when the vertical edges of a bounding box 86 are expanded past a pixel group 58 and reach a pixel column 46 composed entirely of white pixels 44 (for black symbols 40), that bounding box 86 ends, and a new bounding box 86 is determined for the subsequent contiguous pixel group 58.
  • a first bounding box 86 is defined around the symbol "m” 40
  • a second bounding box 86 is defined around the symbol "i”
  • third bounding box 86 is defined around the contiguous group of pixels 58 comprising the symbols "ght”. While the pixels
  • the "t" is also included within the bounding box 86, since all the pixel columns 46 between the "h" and the "t" have at least one darkened pixel 44.
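  • The white-column rule for ending one bounding box and starting the next can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a list of rows of 0/1 values with 1 meaning a black pixel, row index 0 is the top of the image, and the function names `column_groups` and `bounding_boxes` are hypothetical.

```python
def column_groups(image):
    """Return (x_start, x_end), inclusive, for each run of adjacent
    columns that contain at least one black pixel."""
    width = len(image[0]) if image else 0
    groups, start = [], None
    for x in range(width):
        has_ink = any(row[x] for row in image)
        if has_ink and start is None:
            start = x                       # a new group begins here
        elif not has_ink and start is not None:
            groups.append((start, x - 1))   # an all-white column ends the group
            start = None
    if start is not None:
        groups.append((start, width - 1))   # group runs to the image edge
    return groups


def bounding_boxes(image):
    """Return (left, top, right, bottom) for each column group."""
    boxes = []
    for x0, x1 in column_groups(image):
        rows = [y for y, row in enumerate(image) if any(row[x0:x1 + 1])]
        boxes.append((x0, min(rows), x1, max(rows)))
    return boxes
```

  • With this rule, touching symbols such as "ght" fall into a single box because no all-white column separates them.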
  • bounding boxes 86 are defined around contiguous pixel groups 58 which do not require that a white pixel column 46 is established between bounding boxes 86.
  • the bounding box 86 around the contiguous pixel group 58 including the "gh” would be separate from a bounding box 86 surrounding the "t”.
  • non-rectangular bounding boxes 86 are defined around contiguous pixel groups 58.
  • the use of bounding boxes 86 in the deskewing system 10 typically serves only to define one or more contiguous pixel groups 58, whereby local minima 50, 52, 56 are found and used to deskew a scanned image 20 having a non-linear baseline 82.
  • bounding box 86 local minima 50,52,56 (FIG. 8) are found for each contiguous pixel group 58 of symbols 40, at step 68. Since each contiguous pixel group 58 is comprised of one or more columns 46 of pixels 44, there is commonly one or more local minima 50, 52, 56 for each contiguous pixel group 58, particularly along the scanned non-linear baseline 82 (FIG.
  • each line segment 92 for a particular x-location which corresponds to each pixel column 46 within the contiguous pixel group 58, is then calculated, at step 76.
  • Each column of pixels 46 is then adjusted by the calculated offset for a location along the X-axis 41 (FIG. 3), at step 78, which shifts each column of pixels 46 vertically along the Y-axis 43 (FIG. 3), in relation to the flat baseline 96.
  • individual pixels 44 within a contiguous pixel group 58 are each adjusted by the calculated offset for a location along the X-axis 41 (FIG. 3), at step 78, which shifts each pixel 44 vertically along the Y-axis 43 (FIG. 3), in relation to the flat baseline 96. While entire columns 46 of pixels 44 are typically moved as a group, the process can be used to shift any number of discrete pixels 44.
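  • The column shift of step 78 can be sketched as below. The sketch assumes a list-of-rows image with 1 for black pixels, a row index that increases downward, and a precomputed list `offsets` holding one whole-pixel vertical shift per column; the function name `shift_columns` is hypothetical.

```python
def shift_columns(image, offsets):
    """Shift column x vertically by offsets[x] rows.

    A positive offset moves pixels down (toward larger row indices).
    Pixels shifted outside the image are dropped and vacated
    positions become white.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    shifted = [[0] * width for _ in range(height)]
    for x in range(width):
        dy = offsets[x]
        for y in range(height):
            if image[y][x]:
                new_y = y + dy
                if 0 <= new_y < height:
                    shifted[new_y][x] = 1
    return shifted
```

  • Moving a whole column at once preserves the vertical spacing of the pixels within it, so the shape of each symbol is kept while the column lands on the flat baseline.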
  • Figure 6 shows an outline form 80 of a pixel-based scanned image 20 along a non-linear baseline 82.
  • Each outline 42 is approximately represented by groups of columns 46 of pixels 44 within the scanned image 20, as shown in Figure 3.
  • Figure 7 is a view 84 showing bounding boxes 86 defined around contiguous pixels groups 58 of symbols 40 in a pixel-based scanned image 20 along a non-linear baseline 82.
  • each of the contiguous pixel groups 58 is typically determined, wherein symbols 40 that are located within a specified distance are grouped together within contiguous pixel groups 58 for the deskewing process 60.
  • Figure 8 shows the determination 88 of local minima 50,52,56 for contiguous pixel groups 58 of symbols 40 in pixel-based scanned image 20 along a non-linear baseline 82.
  • m 40 is constructed of a plurality of columns 46 of pixels 44, the determination of multiple local minima 50, and the ability to adjust each column 46 of pixels 44 individually, allows the accurate deskewing of the symbol 40.
  • Figure 9 shows the creation 90 of a series of baseline segments 92 that approximate the non-linear baseline 82.
  • line segments 92 are established between all the subsequent baseline minima 50.
  • line segments 92 do not necessarily pass through all the subsequent baseline minima 50, but are established between baseline minima 50 which lie further away, such as with a least squares fit, wherein the vertical distance from intermediate baseline minima 50 is less than a threshold value (FIG. 18).
  • a line segment 92 may likely be established between the left-most baseline minima 50 on the "m” and the baseline minima 50 on the "i", since the vertical distance between the other baseline minima 50 on the "m” and the established line segment is relatively small.
  • Figure 10 shows the calculation 94 of distances 98 from baseline segments 92 to a flat baseline 96 for each pixel or pixel column 46 location along the X-axis 41 within each contiguous group 58 of symbols 40.
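  • The distance calculation of Figure 10 can be sketched as a linear interpolation along the baseline segments. In this illustrative sketch, each segment is a pair of `(x, y)` endpoints, `flat_y` is the row of the flat baseline, and the result is rounded to whole pixels; these representations and the names `baseline_y` and `column_offsets` are assumptions made for the example.

```python
def baseline_y(segments, x):
    """Linearly interpolate the baseline position at column x."""
    for (x0, y0), (x1, y1) in segments:
        if x0 <= x <= x1:
            if x1 == x0:
                return float(y0)
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError(f"column {x} is not covered by any segment")


def column_offsets(segments, flat_y, x_start, x_end):
    """Return the whole-pixel shift that moves each column from the
    approximated baseline onto the flat baseline."""
    return [round(flat_y - baseline_y(segments, x))
            for x in range(x_start, x_end + 1)]
```

  • Each offset is the signed distance from the segment to the flat baseline at that column, so columns above and below the flat baseline are shifted in opposite directions.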
  • Figure 11 shows a deskewed pixel-based image 100, wherein the pixel columns 46 for each of the symbols 40 are shifted towards the flat baseline 96, using the calculated distances 98 between the baseline segments 92 and the flat baseline 96 for each pixel column 46.
  • the use of pixel-column based deskewing 60 allows each of the pixel columns 46 within skewed symbols 40, such as the "m”, to be accurately shifted, either upward or downward, to the fixed baseline 96, thereby reproducing an accurate outline 42 of the scanned symbol 40.
  • Connected symbols 40 such as the "g" and the "h" are also accurately shifted toward the fixed baseline 96.
  • the basic pixel-column based deskewing process 60a comprises the following steps:
  • FIG. 12 is a flow chart depicting an alternate method 60b for deskewing an image having a non-linear baseline 82.
  • preliminary step 62 is preferably used to clean the edges of image scans 20, which removes contiguous areas of black that directly abut the edges of the scans 20.
  • the cleaning step 62 is performed by a seed algorithm.
  • the seed algorithm, which is typically used to find any continuous set of contiguous pixels 44, first finds black pixels 44 next to all edges of each image scan 20.
  • Upon finding a black pixel 44, the cleaning process 62 inverts the pixel 44, making it white. The process then makes all adjoining pixels 44 white, and repeats the process, until there are no adjoining black pixels 44. In this manner, all contiguous pixel groups 58 of symbols 40 within the scanned image 20 lie within the cleaned scanned image 20, which has at least one white pixel depth around the edges of the scanned image 20.
  • the cleaned scanned image 20 therefore has at least one white pixel depth added around the edges of the scanned image 20, which is then preferably used by the deskewing process 60b to aid in subsequent analysis of contiguous pixel groups 58, as discussed below.
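  • The edge-cleaning step 62 can be sketched as a seed fill that starts from every black pixel on the image border. The sketch assumes a list-of-rows image with 1 for black pixels and treats diagonal neighbours as adjoining (an 8-neighbour assumption that the text does not specify); the function name `clean_edges` is hypothetical.

```python
from collections import deque


def clean_edges(image):
    """Return a copy of the image in which every black region that
    touches the border has been turned white."""
    height = len(image)
    width = len(image[0]) if image else 0
    cleaned = [row[:] for row in image]
    queue = deque()
    # Seed the fill with every black pixel lying on the border.
    for y in range(height):
        for x in range(width):
            on_border = y in (0, height - 1) or x in (0, width - 1)
            if on_border and cleaned[y][x]:
                cleaned[y][x] = 0
                queue.append((x, y))
    # Whiten adjoining black pixels until none remain.
    while queue:
        x, y = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height and cleaned[ny][nx]:
                    cleaned[ny][nx] = 0
                    queue.append((nx, ny))
    return cleaned
```

  • After this pass the outermost rows and columns are white, which lets later boundary searches stop without special handling at the image edge.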
  • Bounding boxes 86 are established around all the contiguous pixel groups 58, in step 64, first by finding separate contiguous pixel groups 58 of symbols 40 in step 66, as shown in Figure 13A and 13B, and then repeating the process 66, to capture all the contiguous pixel groups 58 along the non-linear baseline 82, in step 67.
  • a seed fill algorithm is used to find the extent of contiguous pixel groups 58, to establish the boundaries 64 of bounding boxes 86, as shown in Figure 14A and 14B, as discussed below.
  • bounding boxes 86 are established around contiguous or nearby pixels 44, by starting at a single point, or at four nearby points which establish the four corners of a rectangular bounding box 86. The rectangular bounding box 86 is then expanded in height and width until each edge is all white. At this step, each edge of the bounding box is one pixel width too wide.
  • the edges of the bounding box 86 are then moved back, or shrunk, by one pixel width or height 46 from the top, bottom, left and right sides of the contiguous group of pixels 58, such that one or more black pixels 44 just touch each of the sides of the bounding box 86.
  • the bounding box 86 of these contiguous or nearly contiguous groups 58 of pixels 44 is established.
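  • The expand-then-shrink procedure can be sketched as follows, starting from a single black seed pixel. The sketch assumes a list-of-rows image with 1 for black pixels and at least one white pixel of margin on every side (as the cleaning step provides); the function name `grow_box` and the `(left, top, right, bottom)` return order are hypothetical.

```python
def grow_box(image, seed_x, seed_y):
    """Grow a rectangle around a seed until all four edges are white,
    then pull each edge back by one pixel."""
    left = right = seed_x
    top = bottom = seed_y

    def column_has_ink(x, y0, y1):
        return any(image[y][x] for y in range(y0, y1 + 1))

    def row_has_ink(y, x0, x1):
        return any(image[y][x0:x1 + 1])

    changed = True
    while changed:
        changed = False
        if left > 0 and column_has_ink(left, top, bottom):
            left -= 1
            changed = True
        if right < len(image[0]) - 1 and column_has_ink(right, top, bottom):
            right += 1
            changed = True
        if top > 0 and row_has_ink(top, left, right):
            top -= 1
            changed = True
        if bottom < len(image) - 1 and row_has_ink(bottom, left, right):
            bottom += 1
            changed = True
    # Every edge is now all white, i.e. one pixel too far out.
    return (left + 1, top + 1, right - 1, bottom - 1)
```

  • The loop repeats until no edge moves, because widening one edge can expose black pixels on another.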
  • the bounding boxes 86 are typically used only to put limits on where to look for local minima 50, 52,56.
  • Individual symbols 40 within the contiguous pixel groups 58 are not necessarily required to be found and analyzed in the pixel-column based deskewing process 60a, 60b, since minima 50, 52, 56 for entire groups of contiguous pixels 58 can be found and analyzed to create a series of baseline segments 92 that approximate the curvature of the non-linear baseline 82.
  • some embodiments find and analyze individual symbols 40, typically to establish threshold constants for letter spacing, word spacing, and font family characteristics, such as typical descender minima and upper minima distances.
  • Figure 13A is a flow chart 110 depicting a method for finding contiguous pixel groups 58 of symbols 40 and symbol boundaries during a scan operation, which can be used within step 64 within the deskewing process 60a, 60b.
  • Figure 13B is an example of a processed image 20 at various stages of image processing in accordance with the image processing sequence of Figure 13A.
  • the process 110 first finds the first contiguous pixel group 58 of symbols 40 in the scan image 20, at step 112, by approaching the image from one side of the image 20 (e.g.
  • the system determines the top, bottom, left, and right boundaries of the contiguous pixel group 58 of symbols 40, at step 114a, of which the pixel 44 is a member (as represented by the symbol "p" in Figure 13B). This step 116 is then repeated, moving in the first direction (e.g. rightward), until all the contiguous pixel groups 58 of symbols 40 have been so processed (as represented by the period "." in Figure 13B).
  • This process of finding contiguous pixel groups 58 of symbols 40 is then preferably repeated from the opposite direction (e.g. moving leftward starting from the right-most contiguous pixel group 58), to verify that all contiguous pixel groups 58 of symbols 40 have been found, at step 118, as represented by the period ".” in Figure 13B.
  • the process again determines the symbol boundaries for each contiguous pixel group 58 of symbols 40 while moving in the second direction, at steps 114b and 120.
  • the system then merges the results of the two foregoing processes, to ensure that all contiguous pixel groups 58 of symbols 40 have been included, at step 122, and that the process is completed.
  • the period ".” symbol 40 is a contiguous pixel group 58, with a separate bounding box 86.
  • Other contiguous pixel groups 58 of smaller symbols 40, such as apostrophes would also typically have a separate bounding box 86.
  • a local minima would typically be determined to be an outlier minima 52, and would not be used in subsequent calculation of baseline segments 92.
  • the pixel columns 46 for the apostrophe would be adjusted in proportion to the neighboring pixel columns 46. Therefore, the apostrophe 40 would not be mistakenly adjusted down to the flat baseline 96, possibly creating a comma, nor would the minima 52 on the apostrophe be used in the determination of the baseline segments 92.
  • Figure 14A is a flow chart depicting a method 114 for determining the boundaries of a contiguous pixel group 58 of symbols 40.
  • Figure 14B is an example of a processed image at different stages of processing in accordance with the process shown in Figure 14A.
  • processing begins at any point on the contiguous pixel group 58, which typically includes one or more symbols 40.
  • the process is initialized at step 126, and the process then moves around the perimeter of the contiguous pixel group 58 in a constant direction, in step 128.
  • the system monitors the X and Y coordinates 41, 43 as the process proceeds around the symbol 40, and updates coordinate minimums and maximums, at step 130.
  • the process is complete when there has been a complete circuit of the perimeter of the symbol 40, and the minimum and maximum values for the symbol 40 are then stored to memory.
  • Figure 15A is a flow chart depicting a method 116,120 for finding subsequent contiguous pixel groups 58, which is used in the process 110 shown in Figure 13A.
  • Figure 15B is an example of a processed image at different stages of processing in accordance with the process shown in Figure 15A.
  • the process starts at the right border of the previous contiguous pixel group 58, at step 134, as indicated by the letter "m” in Figure 15B.
  • the system moves up and down the range of heights in the previous contiguous pixel group 58, shifting one pixel-column 46 to the right at a time, at step 136, and continues until a dark pixel 44 is hit, at step 138, as indicated by the letter "i" in Figure 15B. It is assumed that the hit pixel 44 belongs to the next contiguous pixel group 58.
  • the hit pixel 44 is then used to find the symbol boundaries of the next contiguous pixel group 58 (of the "i"), at step 140.
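  • The search for the next contiguous pixel group (steps 134 to 138) can be sketched as a column-by-column scan. The sketch assumes a list-of-rows image with 1 for black pixels and takes the right border and height range of the previous group as inputs; the function name `find_next_hit` is hypothetical.

```python
def find_next_hit(image, prev_right, prev_top, prev_bottom):
    """Scan rightward from the previous group's right border, checking
    only the rows spanned by that group, and return the first black
    pixel found as (x, y), or None if the image ends first."""
    width = len(image[0]) if image else 0
    for x in range(prev_right + 1, width):
        for y in range(prev_top, prev_bottom + 1):
            if image[y][x]:
                return (x, y)
    return None
```

  • The returned pixel can then seed the boundary determination for the next group. Restricting the scan to the previous group's height range keeps the search on the same line of symbols.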
  • Figure 16 shows a process 68 for finding the local minima 50, 52, 56 within the bounding boxes 86, as used in the deskewing process 60a, 60b.
  • the process moves from one end of the scanned image 20 within the bounding box 86, and repeatedly analyzes pixels 44 belonging to each contiguous pixel group 58 comprising one or more symbols 40, in relation to neighboring pixels 44 (FIG. 3). Through subsequent analysis and comparison of pixels 44 and neighboring pixels 44, local minima 50, 52, 56 throughout each contiguous pixel group 58 are determined.
  • the process is started at one end of the bounding box 86, at step 142.
  • the current pixel 44 is set as the local minimum candidate, at step 144.
  • the relationship with the next neighboring pixels 44 is checked, at step 146.
  • If the neighbors are lower, the lower neighbors 44 are established as local minimum candidates, at step 150. If the neighbors are higher, the current pixel 44 is established as a local minimum, at step 152, and the local minimum candidate is cleared, in step 154. If the current pixel 44 and the neighbors 44 are at the same height 43, the current pixel 44 and the neighboring pixel are all established as local minimum candidates, at step 148. When the opposite end of the bounding box is reached, at step 156, the local minimum candidate, if it exists, is determined to be a local minimum, at step 160, and the process stops, at step 162. Otherwise, the next pixel 44 is established as the current pixel, at step 158, and the process is repeated, until all the local minimums 50, 52, 56 for the contiguous pixel group 58 are found within the bounding box 86.
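  • The candidate-tracking logic of steps 142 to 162 can be sketched on a one-dimensional profile. This is a simplified interpretation: `profile[i]` is assumed to be the height of the lowest black pixel in column `i` of the bounding box (a smaller value means lower on the page), and a candidate is only held after the start of the box or after a descent. The function name `local_minima` is hypothetical.

```python
def local_minima(profile):
    """Return the column indices of local minima in a height profile.

    Columns on a flat stretch at the bottom of a dip are all reported,
    mirroring the rule that equal-height neighbours all become
    candidates.
    """
    minima = []
    candidates = [0] if profile else []
    for i in range(1, len(profile)):
        previous, current = profile[i - 1], profile[i]
        if current < previous:
            candidates = [i]            # lower neighbour replaces the candidate
        elif current == previous:
            if candidates:
                candidates.append(i)    # same height: extend the candidate set
        else:
            minima.extend(candidates)   # neighbour is higher: confirm and clear
            candidates = []
    minima.extend(candidates)           # end of box: pending candidate is kept
    return minima
```

  • For a symbol such as "m", a profile of this kind yields one minimum per stem, which is what allows each stem to be aligned separately.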
  • FIG. 3 at a time (i.e. raising a bar), while detecting and logging the transition between white pixels 44 and black pixels 44, thereby storing the local minima 50,52,56.
  • the process 68 is repeated, until the top horizontal edge of the bounding box 86 is reached, whereby all the local minima 50, 52, 56 for the symbol 40 are found within the bounding box 86.
  • the baseline minima 50 are used to create baseline segments 92 which are subsequently used to deskew the pixels 44 or pixel columns 46 within the image 20.
  • lower minima 56 due primarily to descending symbols
  • upper local minima 52 due to serifed symbol details
  • the lower minima 56 and upper local minima 52 are detected and removed, first by calculating a curve to fit all the relative local minima 50, 52, 56, and then by using high frequency signal filtering techniques, such as fast Fourier transform (FFT) filtering, to remove high frequency noise, thus eliminating the lower minima 56 and upper local minima 52 from the calculated curve.
  • a minima filtering process 164 analyzes each of the minima 50,52,56 sequentially, and looks at the slope (dy/dx) established between subsequent local minima, as shown in Figure 17. From each current local baseline minimum 50, the slope is determined to the next local minimum 50,52,56. If the absolute value of the slope is too great (e.g. having a slope greater than or equal to 0.2), the next local minimum is assumed to deviate too much from the current local minimum, and the process 164 assumes that the next local minimum is either an upper minimum 52 (if the slope is greater than a positive threshold value), or is a lower minimum 56 (if the slope is less than a negative threshold value).
  • next local minimum 52,56 is determined to deviate too much from the expected baseline, and is not used in the creation of baseline segments 92. If a next local minimum is rejected as a deviation 52,56, the current local minimum 50 is used again, to be compared with the next local minimum 50,52,56 in the X-direction 41, and the process is then repeated.
  • If the filtering process 164 finds more than a specified number (e.g. four) of consecutive minima, the process typically assumes that the minima are baseline minima 50 (that they are located on the baseline 82), since non-baseline local minima 52,56, which deviate above or below the baseline, typically occur with a much lower frequency.
  • line segments 92a, 92b are created by a least-square linear fit, to closely approximate the non-linear baseline 82. While a single line segment 92a could be drawn between the leftmost baseline minima 50 on the "m" and the baseline minima 50 on the "t", the vertical offset distance 168 between the intermediate baseline minima 50, such as on the "i", may be greater than an allowable threshold.
  • smaller line segments 92 can be chosen, such as line segment 92b between the leftmost baseline minima 50 on the "m" and the baseline minima 50 on the "i", and line segment 92c between the baseline minima 50 on the "i" and the baseline minima 50 on the "t".
  • For each chosen line segment 92, the process preferably finds the longest perpendicular distance 168 from the line segment to the line 170 defined between all of the local baseline minima 50. If the longest perpendicular distance 168 is greater than a threshold value, the process selects shorter line segments
  • the pixel column deskewing process 60a, 60b determines a large number of local baseline minima 50, which allows the scanned non-linear baseline 82 to be accurately determined, allowing the symbols 40 to be deskewed accurately.
  • each scanned contiguous pixel group 58 can be adjusted either on a pixel basis or on a pixel column basis to a flat baseline 96, to reproduce an accurate pixel-based image 100 on the flat baseline 96.
  • contiguous pixel groups 58 are themselves deskewed, as necessary, to produce more accurate contiguous pixel groups 58, located accurately upon the flat baseline 96.
  • OCR applications 34 are more likely to properly recognize each of the scanned symbols 40, producing a more accurate text-based file 27 to be output to subsequent user applications 26.
  • contiguous pixel groups 58 comprising connected pixel-based symbols 40, such as tightly tracked text or kerning pairs, can be adjusted more accurately in the pixel-column deskewing system 60, since the local minima 50 for each of the connected symbols 40 are found, and are used to deskew the combined symbols 40, producing a more accurate deskewed symbol group.
  • Although the deskewing system 10 and its methods 60 of use are described herein in connection with text scanning and computer input systems, the techniques can be implemented for other image input, image recognition or image enhancement devices, or any combination thereof, as desired.
  • Although the deskewing system 10 and its methods 60 of use are described herein in connection with black and white scanned images, the techniques can be implemented for other images, such as grey scale or colored images, as desired.
  • Although the pixels 44 analyzed are illustrated as discrete square pixels 44 arranged within columns 46 and rows 47, the system 10 and methods 60 of use can be used for any discrete pixel-based images.
  • Although the deskewing system 10 and its methods 60 of use are described herein in connection with text on a non-linear horizontal baseline, the techniques can be implemented for the deskewing of any symbols on any baseline, as desired.
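
The slope-based minima filtering and the splitting of the baseline into shorter segments described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `(x, y)` point representation, and the choice to treat the first minimum as a baseline minimum are all hypothetical. The segment step uses a simple endpoint chord with recursive splitting (in the style of Ramer-Douglas-Peucker) in place of the least-square fit mentioned in the text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y); hypothetical representation of a local minimum


def filter_baseline_minima(minima: List[Point], max_slope: float = 0.2) -> List[Point]:
    """Keep minima whose slope from the last accepted minimum is below max_slope.

    A rejected minimum is skipped, and the same reference minimum is then
    compared with the following one. Assumes the first minimum is on the baseline.
    """
    if not minima:
        return []
    kept = [minima[0]]
    for x, y in minima[1:]:
        ref_x, ref_y = kept[-1]
        dx = x - ref_x
        if dx <= 0:
            continue  # skip points that do not advance in the X-direction
        if abs((y - ref_y) / dx) >= max_slope:
            continue  # treated as an upper or lower (non-baseline) minimum
        kept.append((x, y))
    return kept


def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def split_into_segments(points: List[Point], max_offset: float) -> List[Tuple[Point, Point]]:
    """Split the chord between the end points until no minimum is farther
    than max_offset from its segment."""
    if len(points) < 2:
        return []
    a, b = points[0], points[-1]
    worst_i, worst_d = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], a, b)
        if d > worst_d:
            worst_i, worst_d = i, d
    if worst_d <= max_offset:
        return [(a, b)]
    # Split at the farthest minimum and handle each half separately.
    return (split_into_segments(points[: worst_i + 1], max_offset)
            + split_into_segments(points[worst_i:], max_offset))
```

The default `max_slope` of 0.2 follows the example threshold given in the text; `max_offset` has no stated value in the source and would have to be chosen for the scanner resolution at hand.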

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)

Abstract

The invention relates to a method used in conjunction with a hand-held scanner that is generally scanned parallel to the baseline of printed symbols, such as text, along a page. The invention provides a system that deskews pixel-based images of symbols having a non-linear baseline, either pixel by pixel or pixel column by pixel column, to flatten the waviness associated with such scanning. The resulting deskewed image improves the readability of the corrected pixel-based image and allows subsequent optical character recognition (OCR) systems to recognize the corrected symbols accurately. The system operates in accordance with a program that finds contiguous pixel groups in the scanned image and defines bounding boxes around the contiguous pixel groups. The local baseline minima for the pixel groups are then determined and are used to create a series of line segments that approximate the non-linear baseline. The series of line segments is then adjusted to a flat baseline, and the pixels or pixel columns associated with the scanned symbols are then moved to the flat baseline, by an amount proportional to the calculated distance between the line segments and the flat baseline.
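
The final step of the abstract, moving each pixel column to a flat baseline by the distance between the estimated baseline and the flat one, might be sketched as below. This is an illustrative sketch only: the image representation (a list of rows of 0/1 values, row 0 at the top), the function names, and the piecewise-linear interpolation between baseline minima are assumptions, not details taken from the patent.

```python
from typing import List, Tuple


def interpolate_baseline(points: List[Tuple[int, int]], width: int) -> List[int]:
    """Return one baseline row per column from (x, row) baseline points.

    Assumes points are sorted with strictly increasing x; columns outside
    the covered range reuse the nearest end point.
    """
    rows = []
    for x in range(width):
        if x <= points[0][0]:
            rows.append(round(points[0][1]))
        elif x >= points[-1][0]:
            rows.append(round(points[-1][1]))
        else:
            for (x0, y0), (x1, y1) in zip(points, points[1:]):
                if x0 <= x <= x1:
                    t = (x - x0) / (x1 - x0)
                    rows.append(round(y0 + t * (y1 - y0)))
                    break
    return rows


def shift_columns(image: List[List[int]], baseline: List[int], flat_row: int) -> List[List[int]]:
    """Shift every pixel column vertically so its baseline lands on flat_row."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for x in range(width):
        dy = flat_row - baseline[x]  # per-column vertical offset
        for y in range(height):
            ny = y + dy
            if image[y][x] and 0 <= ny < height:
                out[ny][x] = 1  # pixels shifted outside the image are dropped
    return out
```

Shifting whole columns keeps each symbol's vertical strokes intact, which matches the pixel-column variant described in the text; a per-pixel variant would need a separate offset for each pixel.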
PCT/US1999/023605 1998-12-19 1999-10-11 Improved method and apparatus for deskewing images of symbols having a non-linear baseline WO2000038100A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU11082/00A AU1108200A (en) 1998-12-19 1999-10-11 Improved method and apparatus for deskewing images of symbols having a non-linear baseline

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US21575998A 1998-12-19 1998-12-19
US09/215,759 1998-12-19

Publications (1)

Publication Number Publication Date
WO2000038100A1 (fr) 2000-06-29

Family

ID=22804269

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/023605 WO2000038100A1 (fr) 1998-12-19 1999-10-11 Improved method and apparatus for deskewing images of symbols having a non-linear baseline

Country Status (2)

Country Link
AU (1) AU1108200A (fr)
WO (1) WO2000038100A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004015983A1 (fr) * 2002-08-07 2004-02-19 Hewlett-Packard Development Company, L.P. Portable document scan accessory for use with a wireless handheld communications device
US9621761B1 (en) 2015-10-08 2017-04-11 International Business Machines Corporation Automatic correction of skewing of digital images

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5638466A (en) * 1995-08-30 1997-06-10 Horizon Marketing Corporation Aka Wordwand Method and apparatus for deskewing images of symbols having a non-linear baseline
US5781660A (en) * 1994-07-28 1998-07-14 Seiko Epson Corporation Image processing method and apparatus

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5781660A (en) * 1994-07-28 1998-07-14 Seiko Epson Corporation Image processing method and apparatus
US5638466A (en) * 1995-08-30 1997-06-10 Horizon Marketing Corporation Aka Wordwand Method and apparatus for deskewing images of symbols having a non-linear baseline

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ROSENTHAL A S ET AL: "SIZE AND ORIENTATION NORMALIZATION OF ON-LINE HANDWRITING USING HOUGH TRANSFORM", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP),US,LOS ALAMITOS,CA: IEEE COMP. SOC. PRESS, pages 3077-3080, XP000788042, ISBN: 0-8186-7920-4 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004015983A1 (fr) * 2002-08-07 2004-02-19 Hewlett-Packard Development Company, L.P. Portable document scan accessory for use with a wireless handheld communications device
US7167604B2 (en) 2002-08-07 2007-01-23 Hewlett-Packard Development Company, L.P. Portable document scan accessory for use with a wireless handheld communications device
US9621761B1 (en) 2015-10-08 2017-04-11 International Business Machines Corporation Automatic correction of skewing of digital images
US10176395B2 (en) 2015-10-08 2019-01-08 International Business Machines Corporation Automatic correction of skewing of digital images

Also Published As

Publication number Publication date
AU1108200A (en) 2000-07-12

Similar Documents

Publication Publication Date Title
US7016536B1 (en) Method and apparatus for automatic cleaning and enhancing of scanned documents
JP4065460B2 Image processing method and apparatus
JP3792747B2 Character recognition apparatus and method
EP0567344B1 Method and apparatus for character recognition
JP3696920B2 Document storage apparatus and method
US5625719A (en) OCR image preprocessing method for image enhancement of scanned documents
KR101985612B1 Method for digitizing paper documents
JP3727974B2 Image processing apparatus and method
EP0389988B1 Detection of line segments and predetermined patterns in an optically scanned document
JP2002133426A Ruled line extraction device for extracting ruled lines from a multi-valued image
JP3411472B2 Pattern extraction device
US6813367B1 (en) Method and apparatus for site selection for data embedding
JP5538812B2 Image processing apparatus, image processing method, and program
JP3837193B2 Character line extraction method and apparatus
US5638466A (en) Method and apparatus for deskewing images of symbols having a non-linear baseline
JP3420864B2 Frame extraction device and rectangle extraction device
Rodrigues et al. Cursive character recognition–a character segmentation method using projection profile-based technique
US6175664B1 (en) Optical character reader with tangent detection for detecting tilt of image data
de Elias et al. Alignment, scale and skew correction for optical mark recognition documents based
WO2000038100A1 (fr) Improved method and apparatus for deskewing images of symbols having a non-linear baseline
EP0476873B1 Method and apparatus for segmenting image regions
JP3642615B2 Pattern region segmentation method and pattern extraction device
Li An implementation of ocr system based on skeleton matching
JP2000187705A Document reading apparatus and method, and storage medium
JP4070486B2 Image processing apparatus, image processing method, and program used to execute the method

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref country code: AU

Ref document number: 2000 11082

Kind code of ref document: A

Format of ref document f/p: F

AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase