CA2002544C - Image skeletonization method - Google Patents

Image skeletonization method

Info

Publication number
CA2002544C
Authority
CA
Canada
Prior art keywords
image
templates
pixels
thinning
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CA002002544A
Other languages
French (fr)
Other versions
CA2002544A1 (en
Inventor
John Stewart Denker
Hans Peter Graf
Donnie Henderson
Richard Edwin Howard
Wayne E. Hubbard
Lawrence David Jackel
Lawrence O'gorman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
American Telephone and Telegraph Co Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by American Telephone and Telegraph Co Inc filed Critical American Telephone and Telegraph Co Inc
Publication of CA2002544A1 publication Critical patent/CA2002544A1/en
Application granted granted Critical
Publication of CA2002544C publication Critical patent/CA2002544C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G06V30/168 Smoothing or thinning of the pattern; Skeletonisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
  • Image Processing (AREA)
  • Character Discrimination (AREA)

Abstract

A method for improved thinning or skeletonizing of handwritten characters or other variable-line-width images. The method scans a template set over the image to be thinned. Each template has a specific arrangement of dark and light pixels. At least one of those templates includes either more than three pixels per row or more than three rows of pixels. An odd number is a good choice. Moreover, the templates are chosen so that each template can unconditionally delete image pixels without consideration of the effect of such deletions on the behavior of the other templates. Thus the templates are independent of each other.

Description


Image Skeletonization Method

Background of the Invention

This invention relates to pattern analysis and recognition and, more particularly, to systems for thinning or skeletonizing the strokes of imaged symbols, characters or binary-valued images in general, that can be used in the classification process. This invention is related to an application, filed on even date herewith, entitled "Imaged Symbol Classification".
A wide variety of applications exist in which it is desirable for a machine to automatically recognize, analyze and classify character patterns in a given image. The explosion of computer-based information gathering, handling, manipulation, storage, and transmission systems offers the technology that makes the realization of these desires possible. Elaborate programs have been written for general purpose computers to perform pattern recognition, but they have experienced a limited level of success. That success was achieved mostly in the area of recognizing standard printed fonts.
One character recognition technique that dates back to the early 1960's involves following the curve of the characters to be recognized. It has an intuitive appeal but, unfortunately, it often fails when the characters are misshapen or have extraneous strokes.
Bakis et al. (IBM) reported on an approach for recognizing hand-printed numerals in an article titled "An Experimental Study of Machine Recognition of Hand Printed Numerals," IEEE Transactions on Systems Science and Cybernetics, Vol. SSC-4, No. 2, July 1968. In the described system, the numerals are converted into a 25x32 binary matrix. Features are extracted to reduce the dimensionality of the 800 bit vector (25x32) to about 100, and the 100 bit vector is submitted to several categorizers. Some "normalization" of the characters is also performed. The authors reported a recognition rate of between 86 and 99.7 percent, depending on the handwriting samples employed. Because of the low recognition rate relative to the desired level for commercial applications, the authors concluded that "it would seem that the course to follow is to combine curve-following type measurements ... with automatic feature selection and parallel decision logic."
In what appears to be a follow-up effort, R. G. Casey described an experiment that expanded the "normalization" of Bakis et al. to a process of deskewing of the subject characters. "Moment Normalization of Handprinted Characters", IBM Journal of Research Development, September, 1970, pp. 548-557. Casey used feature recognition in combination with curve following, as suggested by Bakis et al., and decision methodologies which included template matching, clustering, autocorrelation, weighted cross correlation, and zoned n-tuples.
In a subsequent article, Naylor (also of IBM) reported on an OCR (Optical Character Recognition) system that employs a computer, an interactive graphics console, and skew normalization. "Some Studies in the Interactive Design of Character Recognition Systems", IEEE Transactions on Computers, September, 1971, pp. 1075-1086. The objective of his system was to develop the appropriate logic for identifying the features to be extracted.
In U.S. Patent 4,259,661 issued March 31, 1981, another extracted-feature approach was described by Todd. In accordance with the Todd approach, a rectangular area defined by the character's extremities is normalized to a predefined size, and then divided into subareas. The "darkness" of the image within each of the subareas is evaluated, and the collection of the darkness evaluations is formed into a "feature vector." The feature vector is compared to a stored set of feature vectors that represent characters, and the closest match is selected as the recognized character.
In an article entitled "SPTA: A Proposed Algorithm for Thinning Binary Patterns", IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-14, No. 3, May/June 1984, pp. 409-418, Naccache et al. present a different approach to the OCR problem. This approach addresses the fact that patterns are often made up of strokes that are wide, and that it may be of benefit to skeletonize the patterns. As described by Naccache et al., "skeletonization consists of iterative deletions of the dark points (i.e., changing them to white) along edges of a pattern until the pattern is thinned to a line drawing." Ideally, the original pattern is thinned to its medial axis. The article briefly describes fourteen different known skeletonization algorithms, and then proposes its own algorithm (SPTA). All of the described skeletonization algorithms, including SPTA, are based on the concept of passing over the image a square window of three rows and three columns (commonly referred to as a 3x3 window). As the square 3x3 window is passed across the image, the algorithms evaluate the 8 pixel neighborhood surrounding the center pixel and, based on the evaluation, either convert a black center point to white, or leave it unaltered.
Pattern classification received a boost from another direction with recent advances in the field of connectionism. Specifically, highly parallel computation networks ("neural networks") have come to the fore with the work by Hopfield, disclosed in U.S. Patent 4,660,166, issued April 21, 1987. Also, work continued on robust learning algorithms for multi-layered networks in which "hidden" layers of neural elements permit separation of arbitrary regions of the feature space. This work, reported on, inter alia, by Gullichsen et al. in "Pattern Classification by Neural Networks: An Experimental System for Icon Recognition", Proceedings of the IEEE First International Conference on Neural Networks, pp. IV-725-732, Caudill et al., Editors, concentrates on the character classification process. The system they describe uses some image preprocessing but no feature extractions. Instead, they rely entirely on the inherent classification intelligence that the neural networks acquire through the "back propagation" training process. The reported system apparently works, but as suggested by the authors, many questions remained to be investigated. The system's performance is less than acceptable.
There exist many other character classification techniques, approaches, and algorithms. For purposes of this disclosure, however, the above references provide a reasonable description of the most relevant prior art. Suffice it to say that with all the efforts that have gone into solving the character recognition (i.e., classification) problem, the existing systems do not offer the accuracy and speed that is believed needed for a successful commercial system for recognizing hand written symbols.

Summary of the Invention

This invention provides for improved thinning or skeletonizing of hand written characters or other variable-line-width images, to thereby permit recognition with a higher degree of accuracy. Moreover, the increased accuracy is achieved with an inherent increase in speed of processing.
Like Naccache et al., supra, our invention uses templates to scan over the image to be thinned. However, whereas the prior art systems employ 3x3 templates, our invention employs templates that are greater than 3x3. Further, our templates are chosen so that each template can unconditionally delete image pixels without consideration of the effect of such deletions on the behavior of the other templates. Thus the templates are independent of each other.
In accordance with one aspect of our invention the set of templates that is employed includes different templates, or masks, each having a specific arrangement of dark and light pixels. At least one of those templates includes either more than three pixels per row or more than three rows of pixels. An odd number, such as 5, is a good choice.
Line thinning is achieved by passing each template over the image, in steps. The number of steps is dependent on the chosen size of the template and the size of the image. At each step of each template, a decision is made whether to delete one or more pixels from the image. A conclusion to the affirmative is executed independently of the decisions made in connection with other templates. Because of the independence of the templates, our system for skeletonizing operates on all the templates simultaneously. This increases the processing speed substantially, thereby permitting effective OCR systems to be developed.
In accordance with another aspect of our invention, instead of passing a plurality of templates over the image, a single template is passed, but at each step the template is changed in size, starting with a template that is kxk, where k is greater than three, and decrementing the template by one with each substep. At each substep, a test is made whether a deletion of a center core of size (k-2)x(k-2) would cause a discontinuity to be created. When it is determined that a discontinuity would not be created, the core is deleted.
In accordance with one aspect of the invention there is provided a method for thinning an image composed of pixels arranged in an array, comprising: a step of passing a plurality of templates, in parallel, over said image, and a step of unconditionally determining, for each template and based on a comparison between said template and said image, whether a selected portion of said image can be deleted from said image.
In accordance with another aspect of the invention there is provided a method for thinning lines of an image composed of an array of pixels, comprising: a first step of selecting a window of size kxk, where k is an integer, a step of applying thinning criteria to a portion of said image covered by said window to determine whether a core subportion of said image can be deleted, a step of deleting said core subportion when said step of applying thinning criteria indicates that said core subportion should be deleted, a step of reducing the size of said window by one when said step of applying thinning criteria indicates that said core subportion should not be deleted, a step of returning control to said step of applying thinning criteria when said step of reducing size yields a size of k greater than 2, and a step of selecting another window following said step of returning control and following said step of deleting.

Brief Description of the Drawing

FIG. 1 presents a general flow diagram of a classification method;
FIG. 2 presents an example of a problem resulting from use of independent 3x3 templates;
FIG. 3 shows the set of thinning templates used with our invention, which includes templates greater than 3x3;
FIG. 4 depicts a set of feature extraction templates;
FIG. 5 presents a flow chart of a thinning procedure that is different from the procedure used in connection with the templates of FIG. 3 but which employs windows that are greater than 3x3;
FIG. 6 illustrates the structure of a neural network decision circuit used in connection with the templates of FIGS. 3 and 4;
FIG. 7 depicts the structure of a two-layer neural network with analog-valued connection weights; and
FIG. 8 illustrates one realization for an analog-valued connection weights neural network.

Detailed Description

FIG. 1 presents a flow chart of our process for character or symbol classification. In block 10, the character image is captured and, advantageously, stored in a frame buffer such as a semiconductor memory. The image may be obtained through electronic transmission from a remote location, or it may be obtained "locally" with a scanning camera. Regardless of the source, in accordance with conventional practice, the image is represented by an ordered collection (array) of pixels. The value of each pixel corresponds to the light (brightness, color, etc.) emanating from a particular small area of the image. The pixel values are stored in the memory.
Smudges and extraneous strokes are often found in proximity to characters, and their presence cannot help but make the recognition process more difficult. In accordance with our invention, block 20 follows block 10 and its function is to cleanse the image. This is the first step in our effort to remove meaningless variability from the image.
Usually, an image of a symbol or a character, such as a digit, contains one large group of pixels (contiguous) and a small number, possibly zero, of smaller groups. Our cleaning algorithm basically identifies all such groups and deletes all but the largest one. If the deleted groups, together, constitute more than a certain percentage of the original image, this fact is noted for later use, since it indicates that the image is anomalous. In the context of this description, it is assumed that the image symbols are composed of dark strokes on a light background. A "reversed" image can of course be handled with equal facility. The above cleaning algorithm also assumes that the symbol set that is expected in the image does not contain symbols that call for disjoint strokes. The digits 0-9 and the Latin alphabet (save for lower case letters i and j) form such sets, but most other alphabets (Hebrew, Chinese, Japanese, Korean, Arabic, etc.) contain many disjoint strokes. For such other sets a slightly different cleansing algorithm would have to be applied, such as looking at each disjoint area, rather than at the whole collection of such areas.
There are a number of processes that can be applied to detect and identify these extraneous areas. The process we use resembles a brush fire.
In accordance with our process, the image is raster scanned from top to bottom in an effort to find "black" pixel groups. When such a group is found (i.e., when a black pixel is encountered that has not been considered before), the scanning is suspended and a "brush fire" is ignited. That is, the encountered pixel is marked with an identifier, and the marking initiates a spreading process. In the spreading process, each of the eight immediately neighboring pixels is considered. Those neighboring pixels that are black are similarly marked with the identifier, and each marking initiates its own spreading process. In this manner, the first encountered pixel of a "black" group causes the entire group to be quickly identified by the selected identifier. At this point in the process, the scanning of the image resumes so that other groups can be discovered and identified (with a different identifier). When scanning is completed and all of the "black" areas are identified, area calculations can be carried out. As indicated above, all but the largest group are deleted from the image (i.e., turned from dark to light, or turned OFF).
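The brush-fire procedure described above can be sketched as a flood fill over 8-connected neighbors. This is a minimal illustration, not the patented implementation: the function name `keep_largest_group`, the list-of-lists image representation (1 for dark, 0 for light), and the choice to report the deleted share as a fraction of all dark pixels are assumptions made for the example.

```python
from collections import deque

def keep_largest_group(image):
    """Label 8-connected dark groups, keep only the largest one.

    image: list of rows, each a list of 0 (light) or 1 (dark).
    Returns (cleaned_image, fraction_of_dark_pixels_deleted).
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    sizes = {}
    next_label = 0
    # Raster scan from top to bottom, looking for unmarked dark pixels.
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or labels[r][c] != 0:
                continue
            # "Ignite" a fire: mark this pixel and spread to its neighbors.
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            sizes[next_label] = size
    if not sizes:
        return [row[:] for row in image], 0.0
    largest = max(sizes, key=sizes.get)
    total = sum(sizes.values())
    cleaned = [[1 if labels[r][c] == largest else 0 for c in range(cols)]
               for r in range(rows)]
    return cleaned, (total - sizes[largest]) / total
```

A caller could compare the returned fraction against a threshold of its own choosing to flag an image as anomalous.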
It may be noted at this point that in the character recognition art, it is more important to avoid identifying a character incorrectly than to avoid declining to make a decision. For that reason, in a system that is designed to identify numerals or other character sets that do not have disconnected strokes, the area removal threshold should be set to a fairly low level.
Ordinarily it is expected that the pixels comprising the meaningful part of the image will be contiguous in a strict sense (in the aforementioned 0-9 character set and the Latin alphabet). On the other hand, an exception should be made, perhaps, when areas are separated only slightly, and external information leads one to believe that it is possible for a character stroke to be inadvertently broken (such as when writing with a poor pen or on a rough writing surface). To provide for such contingencies, our process for spreading the "fire" includes an option for defining the neighborhood to include eight additional pixels that are somewhat removed from the eight immediate pixels (the eight pixels being corners of a larger window and center pixels of sides of the larger window). In effect, we permit the "fire" to jump over a "fire break".
The process of scaling the image to a given size, in block 25, follows the cleansing process. Scaling, of course, removes a meaningless variability of the image. The sequence of cleansing followed by scaling is imposed by the desire to not scale the image that includes smudges. The scaling process can use any one of a number of different algorithms. For example, in accordance with one algorithm, the image can be scaled in both dimensions by an equal factor, until one of the image dimensions reaches a fixed size. Another algorithm scales independently in the two dimensions, subject to some constraint on the largest difference in the scaling factors of the two dimensions. Both approaches work well and, therefore, choice of the algorithm and its implementation are left to the reader. We scale each of the character images with the first-described algorithm into a convenient number of pixels, such as an 18x30 pixel array.
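As an illustration of the first-described scaling algorithm, the sketch below picks a single factor for both dimensions so that the image just fits the target array. The function name `uniform_scale`, the nearest-neighbor sampling, and the reading of "18x30" as 18 columns by 30 rows are assumptions made for the example; the text leaves the implementation to the reader.

```python
def uniform_scale(image, target_rows=30, target_cols=18):
    """Scale both dimensions by one common factor (nearest-neighbor).

    The factor is the largest one for which neither dimension exceeds
    its target, so one dimension reaches its fixed size exactly.
    """
    rows, cols = len(image), len(image[0])
    factor = min(target_rows / rows, target_cols / cols)
    out_rows = max(1, round(rows * factor))
    out_cols = max(1, round(cols * factor))
    # Map each output pixel back to the nearest source pixel.
    return [[image[min(rows - 1, int(r / factor))]
                  [min(cols - 1, int(c / factor))]
             for c in range(out_cols)]
            for r in range(out_rows)]
```

Because the same factor is applied to both axes, the aspect ratio of the character is preserved and the shorter dimension is simply left smaller than its target.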
People generally write characters at a slant. The slant is different from one person to another. The slant, or skew, of the characters is another meaningless variability of written characters that carries no information, and therefore, we remove it.
Returning to FIG. 1, block 30, which follows block 25, deskews the image. Stated differently, it is the function of block 30 to make all characters more uniformly upright.

Block 30 can use any one of a number of conventional procedures for deskewing an image. One such procedure subjects the image to a transformation of the form
\[
\begin{bmatrix} u \\ v \end{bmatrix} =
\begin{bmatrix} 1 & -m_{xy}/m_{yy} \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix}
\]
where $x$ and $y$ are the original coordinates of the image, $x_0$ and $y_0$ define an origin point, $u$ and $v$ are the coordinates in the transformed image, and $m_{xy}$ and $m_{yy}$ are the image moments calculated by
\[
m_{xy} = \sum_{x,y} (x - x_0)(y - y_0)\,B(x,y)
\]
and
\[
m_{yy} = \sum_{x,y} (y - y_0)^2\,B(x,y).
\]
In the above, $B(x,y)$ assumes the value 1 when the pixel at position x,y is "black", and the value 0 otherwise. The effect of this function is to reduce the xy moment to essentially 0.
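The moment-based shear can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `deskew_points` is hypothetical, the origin is taken to be the centroid of the dark pixels (the text only says that an origin point is defined), and the result is returned as a list of transformed coordinates rather than a resampled pixel array.

```python
def deskew_points(image):
    """Apply u = (x - x0) - (m_xy / m_yy) * (y - y0), v = y - y0
    to every dark pixel and return the (u, v) coordinates."""
    pts = [(x, y) for y, row in enumerate(image)
           for x, value in enumerate(row) if value == 1]
    if not pts:
        return []
    n = len(pts)
    # Assumption: use the centroid of the dark pixels as the origin.
    x0 = sum(x for x, _ in pts) / n
    y0 = sum(y for _, y in pts) / n
    m_xy = sum((x - x0) * (y - y0) for x, y in pts)
    m_yy = sum((y - y0) ** 2 for _, y in pts)
    shear = m_xy / m_yy if m_yy else 0.0
    return [((x - x0) - shear * (y - y0), y - y0) for x, y in pts]
```

After the transformation the mixed moment is the sum of u times v over the dark pixels, which equals m_xy minus the shear times m_yy and therefore vanishes, matching the stated effect.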
Scaling (block 25) and deskewing (block 30) are both linear transformations. Therefore, the composition of the two is also a linear transformation. It may be advantageous to apply the compound transformation to the cleansed image to produce the deskewed image directly. This combined operation allows us to avoid an explicit representation of the scaled image as an array of pixels. This eliminates a source of (computation) noise.
Block 40, which in FIG. 1 follows block 30, thins the image. Thinning of the image also removes meaningless variability of the image. As indicated above, the prior art methods for skeletonization use a 3x3 window that is passed over the image. The center point of the 3x3 window is turned OFF if certain conditions are met; and those conditions, in most of the methods, involve repeated tests with different predefined window conditions. For example, the Ben-Lan and Montoto algorithm states that a dark center point is deleted (i.e., turned OFF or turned light) if it satisfies the following conditions:
1) the pixel has at least one light 4-neighbor, and
2) the neighborhood does not match any of 8 predefined 3x3 windows.
A 4-neighbor is a pixel that is east, north, west, or south of the pixel under consideration.
Algorithms like the one described above are quite acceptable in software implementations because, until recently, processors were able to handle only one task at a time anyway. However, these algorithms are necessarily slow because of their sequential nature. Furthermore, each of these prior art tests zeroes in on a certain characteristic of the pattern, but not on other characteristics. To thin strokes of different character (e.g., vertical lines and horizontal lines) different tests must be applied. Additionally, with prior art tests there is a need to perform at least some of these tests sequentially before one can be sure that a particular pixel may be deleted; and the pixel cannot be turned OFF until these tests are performed. The example of FIG. 2 illustrates the problem.
In FIG. 2, templates 100 and 110 are two 3x3 pixel windows. The three top pixels in template 100 are circle-hatched to designate searching for OFF pixels. The center pixel and the pixel in the center of the bottom row are crosshatched to designate searching for ON pixels. The remaining pixels are blank, to designate a "don't care" condition. Template 100 searches for the edge condition of light space (pixels 101, 102, and 103) above dark space (pixels 104 and 105), with the caveat that the dark space must be at least two pixels thick. When such a condition is encountered, the center pixel (104) is turned from ON to OFF (dark to light). Thus, template 100 provides a mechanism to nibble away from an ON area, from the top, until there is only one ON row left.
Template 110 operates similarly, except that it has the bottom row looking for OFF pixels while the center pixels of the first and second row are looking for ON pixels. Template 110 nibbles ON (dark) areas from the bottom.
The above templates, which thin horizontal lines and do not thin vertical lines, illustrate the desirability of passing a number of different templates over the image, with the different templates being sensitive to different characteristics of the image. It is also desirable (from a speed standpoint) to pass the various templates concurrently. However, in the FIG. 2 image segment 106, templates 100 and 110 cannot be applied concurrently because, if that were done, the depicted 2-pixel wide horizontal line would be completely eliminated. The top row would be deleted by template 100, and the bottom row would be deleted by template 110.
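The conflict between the two 3x3 templates can be reproduced in a few lines. The sketch below encodes templates 100 and 110 as described in the text (a light row next to a dark center pixel backed by a second dark pixel, with the remaining cells "don't care"), taking ON to mean dark. The names `T100`, `T110`, `hits` and `delete`, and the small test image, are assumptions made for the illustration.

```python
ON, OFF, ANY = 1, 0, None

# Template 100: light row above a dark area at least two pixels thick.
T100 = [[OFF, OFF, OFF],
        [ANY, ON,  ANY],
        [ANY, ON,  ANY]]
# Template 110: the mirror image, nibbling from the bottom.
T110 = [[ANY, ON,  ANY],
        [ANY, ON,  ANY],
        [OFF, OFF, OFF]]

def matches(image, r, c, template):
    """True if the 3x3 neighborhood centered on (r, c) fits the template."""
    for dr in range(3):
        for dc in range(3):
            want = template[dr][dc]
            if want is not None and image[r + dr - 1][c + dc - 1] != want:
                return False
    return True

def hits(image, template):
    """Center positions (interior pixels only) where the template matches."""
    rows, cols = len(image), len(image[0])
    return {(r, c) for r in range(1, rows - 1) for c in range(1, cols - 1)
            if matches(image, r, c, template)}

def delete(image, pixels):
    """Return a copy of the image with the given pixels turned OFF."""
    return [[OFF if (r, c) in pixels else v for c, v in enumerate(row)]
            for r, row in enumerate(image)]

# A 2-pixel-thick horizontal line surrounded by light pixels.
line = [[0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0, 0]]

# Concurrent: both templates decide on the same unmodified image.
concurrent = delete(line, hits(line, T100) | hits(line, T110))
# Sequential: template 110 sees the result of template 100.
sequential = delete(line, hits(line, T100))
sequential = delete(sequential, hits(sequential, T110))
```

With the decisions taken concurrently both rows are removed and the line disappears, whereas the sequential pass leaves the bottom row in place.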
If line thinning is to be performed efficiently, this interdependence between different templates must be broken.
We found that, unexpectedly, this interdependence can be broken by employing a window that is greater than 3x3. Hence, we use a template set which contains at least some templates that are greater than 3x3. Some are 3x3, some are 3x4, some are 4x3, and some are 5x5. The characteristic of the collection is that the templates can be passed over the image concurrently. This capability comes about from the particular selection of templates, which allows the image to be altered in response to one template without having a deleterious effect on the ability of another template to independently alter the image. This particular set of templates is shown in FIG. 3.
We discovered that the set of templates depicted in FIG. 3 is a sufficient set. Other sets are possible, of course, but, in accordance with our invention, such sets are characterized by the inclusion of at least one template that is greater than 3x3.
To describe the operation of the depicted templates, we start with templates 120 and 140. These templates correspond to templates 100 and 110 of FIG. 2. Template 120 is shown as a 5x5 array but, in essence, it forms a 3x3 window, since the outer columns and rows are at a "don't care" condition. Template 120 differs from template 100 in that pixels 121 and 122 in template 120 test for ON pixels, whereas the correspondingly positioned pixels in template 100 are set to "don't care". That is, template 120 makes sure that the pixel nibbled away (turned light) is above a line that extends in both directions. Template 140, on the other hand, differs from template 110 in that, effectively, it is a 3x4 template. It includes a 3x3 portion that is similar to the 3x3 template 110 (other than pixels 141 and 142), and it also includes a pixel 143 at the center of the first row. Pixel 143, in effect, requires a horizontal line to be 3 pixels wide before a pixel is permitted to be nibbled away (from the bottom).
Templates 130 and 150 form a template pair like the template pair 120 and 140. Templates 130 and 150 thin vertical lines. Templates 160, 170, 180, and 190 thin "knees" pointing to the right, left, up and down, respectively; templates 200, 210, 220 and 230 thin slanted lines from above and from below; etc. It may be noted that templates 160-230 are all 5x5 templates.
In accordance with another approach to skeletonization, we have discovered that templates of size kxk, where k is greater than 3, can follow a specific algorithm for any value of k. This algorithm can be implemented iteratively or in parallel. The operation of the kxk template is to erase the central (k-2)x(k-2) core of the template whenever certain criteria are met. As can be anticipated, larger values of k result in coarser thinning but require fewer computations.
The thinning criteria can be stated as follows. For a kxk template, if its core, R(x,y,k), is ON (dark), then it may be turned OFF (deleted, or turned light) if:
1. $\chi(\eta) = 1$,
2. $\phi_1(\eta) > k-2$, and
3. $\phi_0(\eta) > k-2$,
where $\phi_0(\eta)$ is the maximum length (in pixels) of chains of 4-connected OFF pixels in the 4(k-1) perimeter of pixels surrounding the core. This is the neighborhood, $\eta$. Also, $\phi_1(\eta)$ is the maximum length of chains of 8-connected ON pixels in the neighborhood and $\chi(\eta)$ is the number of chains of 8-connected ON pixels in the neighborhood. The value of $\chi(\eta)$ can be calculated in accordance with
[equation not legible in this copy; the legible fragments are a sum $\sum_{i=0}^{4(k-1)-1} |\eta(i) - \eta(i-1)|$ over the whole perimeter and sums over the corner indices $m = n(k-1)$ involving $|\eta(m+1) - \eta(m-1)|$ and $\eta(m)\,|\eta(m) - \eta(m-1)|\,|\eta(m) - \eta(m+1)|$]
for n = 0, 1, 2, 3 and $\eta(-1) = \eta(4k-5)$.
In the above, $\eta(i)$ corresponds to the i-th pixel in the neighborhood $\eta$, counted from the top left corner of the neighborhood and moving clockwise; and the value of $\eta(i)$ is 1 when the corresponding pixel is ON and 0 when the corresponding pixel is OFF.
Eight-connectedness is defined in the following manner. Two pixels are in the same 8-connected chain if one is adjacent to the other in any of its 8 neighbors, whereas 4-connected chains contain adjacent neighbors only in horizontal or vertical directions, not diagonal.
Criterion (1) is necessary so that the connectivity of the structure is not altered. If $\chi(\eta) = 1$, then the neighborhood contains a single chain of 8-connected ON pixels, and the erasure of the core does not break connectivity between the core and any ON chains in the neighborhood. If $\chi(\eta) > 1$, then there are two or more chains of 8-connected ON pixels in the neighborhood, and erasure of the core will separate the chains, destroying connectivity. If $\chi(\eta) = 0$, then the core is either isolated with no neighborhood pixels that are ON, or it is enclosed completely by ON pixels. In such a case erasure is not desired.
Criterion (2) maintains endlines (an endline is the end of a line). At level k, an endline is defined as that with width less than or equal to the length of the core side, k-2. For a core which has an 8-connected ON chain of k-2 pixels or fewer, that core is defined as an endline at level k, and maintained. When $\chi(\eta) = 1$, $\phi_1(\eta)$ is equal to the number of ON pixels in the neighborhood.
Criterion (3) can be viewed as the inverse condition of criterion (2). Where criterion (2) prevents endlines from being eroded, criterion (3) prevents inward erosion of OFF regions into ON regions.
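The three criteria can be evaluated directly from their definitions. The sketch below is an illustration under stated assumptions: it counts chains by connectivity on the perimeter instead of using the closed-form expression for the chain count, it treats the "length" of a chain as its number of pixels, and the names `perimeter`, `chain_sizes` and `may_erase_core` are hypothetical.

```python
def perimeter(k):
    """Perimeter coordinates (row, col) of a k x k window, clockwise
    from the top left corner; there are 4 * (k - 1) of them."""
    top = [(0, c) for c in range(k - 1)]
    right = [(r, k - 1) for r in range(k - 1)]
    bottom = [(k - 1, c) for c in range(k - 1, 0, -1)]
    left = [(r, 0) for r in range(k - 1, 0, -1)]
    return top + right + bottom + left

def chain_sizes(window, value, diagonal):
    """Sizes of the connected chains of perimeter pixels equal to `value`.
    diagonal=True uses 8-connectivity, diagonal=False 4-connectivity."""
    k = len(window)
    todo = {p for p in perimeter(k) if window[p[0]][p[1]] == value}
    sizes = []
    while todo:
        stack, size = [todo.pop()], 0
        while stack:
            r, c = stack.pop()
            size += 1
            for p in list(todo):
                dr, dc = abs(p[0] - r), abs(p[1] - c)
                adjacent = max(dr, dc) == 1 if diagonal else dr + dc == 1
                if adjacent:
                    todo.remove(p)
                    stack.append(p)
        sizes.append(size)
    return sizes

def may_erase_core(window):
    """Apply criteria (1)-(3) to a k x k window of 0/1 values."""
    k = len(window)
    core_on = all(window[r][c] == 1
                  for r in range(1, k - 1) for c in range(1, k - 1))
    on_chains = chain_sizes(window, 1, diagonal=True)
    off_chains = chain_sizes(window, 0, diagonal=False)
    chi = len(on_chains)               # number of 8-connected ON chains
    phi1 = max(on_chains, default=0)   # longest ON chain
    phi0 = max(off_chains, default=0)  # longest 4-connected OFF chain
    return core_on and chi == 1 and phi1 > k - 2 and phi0 > k - 2
```

For k = 3 this reproduces the behavior discussed in the text: an edge pixel of a thick stroke may be erased, while the middle of a one-pixel line, an endline, and a fully enclosed pixel are all kept.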
The steps of the sequential, multi-value kxk thinning algorithm are listed below.
1. For each location (x,y) in ascending x,y order:
(i) set k'=k;
(ii) for kernel R(x,y,k'), consider any erased neighborhood values as ON, and test the thinning criteria;
(iii) if the thinning criteria are met in (ii), then for each side and its adjoining corners, set any erased values to OFF -- except for an ERASEDA anchor value in the NW corner that is set to ON -- and set all other erased values to ON; test connectivity with respect to the thinning criteria, and if they are met, set the core to ERASED, or ERASEDA if it is an anchor core; otherwise, set k'=k'-1, and if k'≥3 go to (ii);
2. if no pixels were turned to ERASED or ERASEDA, stop; otherwise, set all ERASED and ERASEDA values to OFF, and repeat (1).
In the above, an anchor is a core that is located at the beginning endline of a diagonal that is oriented in the direction of scanning. When it is erased, its pixels are marked with values ERASEDA, and this marking is used to prevent further erosion of the endline. For the NW-to-SE scanning order, an anchor is a NW endline; that is, a kernel whose north side and two corners, and whose west side, contain only OFF values.
In the parallel algorithm, all the pixels of the image can be operated upon simultaneously because the thinning results on a pass do not affect the thinning operations on that pass. To accomplish this independence, each iteration (application of the criteria to windows throughout the entire image) is separated into four separate sub-cycles and thinning is applied only to kernels which are on N, S, E, and W borders on the four subcycles respectively.
The rules which are used to assign compass directions to a kernel are the following:
a kernel is a north border kernel if the side on the north contains only OFF values (a "side" refers to the perimeter pixels in a row or a column, excluding the corner pixels);
a kernel is a south border kernel if the side on the south contains only OFF values and the kernel is not a north border kernel;
a kernel is an east border kernel if the side on the east contains only OFF values and the kernel is not a north or south border kernel; and
a kernel is a west border kernel if the side on the west contains only OFF values and the kernel is not a north, south, or east border kernel.
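The four border rules above amount to a priority test on the kernel's sides. The following is a minimal sketch, assuming the kernel is held as a square array of 0/1 values with row 0 on the north edge; the names `Kernel`, `side_is_off` and `border_direction` are illustrative and do not come from the patent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A k x k kernel of binary pixels: 1 = ON, 0 = OFF. Row 0 is the north edge.
using Kernel = std::vector<std::vector<int>>;

enum class Side { North, South, East, West };

// True if every perimeter pixel on the given side, excluding the two
// corner pixels, is OFF.
bool side_is_off(const Kernel& r, Side s) {
    const std::size_t k = r.size();
    assert(k >= 3);
    for (std::size_t i = 1; i + 1 < k; ++i) {
        int v = 0;
        switch (s) {
            case Side::North: v = r[0][i]; break;
            case Side::South: v = r[k - 1][i]; break;
            case Side::East:  v = r[i][k - 1]; break;
            case Side::West:  v = r[i][0]; break;
        }
        if (v != 0) return false;
    }
    return true;
}

// Returns 'N', 'S', 'E' or 'W' using the priority order of the rules,
// or '-' when the kernel is not a border kernel at all.
char border_direction(const Kernel& r) {
    if (side_is_off(r, Side::North)) return 'N';
    if (side_is_off(r, Side::South)) return 'S';
    if (side_is_off(r, Side::East))  return 'E';
    if (side_is_off(r, Side::West))  return 'W';
    return '-';
}
```

Because the tests run in a fixed order, a kernel whose north and south sides are both OFF is reported only as a north border kernel, which is what makes the four sub-cycles mutually exclusive.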


The general flowchart of the algorithm is shown in FIG. 5. Depending on whether a sequential or a parallel implementation is desired, the algorithms differ in their specifics, as described below.
The steps of the parallel, binary kxk thinning algorithm are listed below.
Note that for this algorithm there is no need to retain erased values; erasure is to OFF.
1. In a repeating circular sequence in the order {N, S, E, W}, do for all border kernels:
(i) set k'=k;
(ii) for kernel R(x,y,k'), test the thinning criteria, and if they are met, erase the core to OFF; otherwise, set k'=k'-1, and if k'≥3, repeat (ii).
(iii) If no pixels were erased on the last four consecutive subcycles, stop; otherwise repeat (1) for the next border direction in the sequence.
This ends the thinning process description.
Returning to FIG. 1, skeletonization block 40 is followed by feature extraction block 50. Although operationally similar, skeletonization is different from feature extraction from a functional standpoint. In the former, one identifies superfluous pixels and turns them from dark to light. In the latter, one identifies relatively macroscopic characteristics that help classify the character. The macroscopic characteristics identified are the kind of characteristics that are not dependent on the size or thickness of the character, but are the ones that give the character its particular "signature". Hence, it is these characteristics that block 50 seeks to identify.
Operationally, feature extraction is accomplished by passing a collection of windows over the image. Each window in our system is a 7x7 template, and each template detects the presence of a particular feature, such as an end point, diagonal lines, a horizontal line, a vertical line, etc. The detection works by a majority rule in the sense that when the majority of the 49 pixels (7x7) fit the template, it is concluded that the feature is present. In our system we employ 49 different 7x7 templates, as depicted in FIG. 4. For each of the templates we create a "feature map" which basically indicates the coordinates in the image array where the pattern of the template matches the image.
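The majority rule and the resulting feature map can be sketched in a few lines. This is an illustration under stated assumptions, not the patent's implementation: the image and template are plain 0/1 grids, pixels outside the image are read as 0, and the names `Grid`, `agreement` and `feature_map` are hypothetical. The threshold is left as a parameter; a bare majority of a 7x7 template corresponds to `min_agree = 25`.

```cpp
#include <cassert>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // binary pixels: 1 = dark, 0 = light

// Counts how many template pixels agree with the image window centred on
// (cx, cy). Pixels that fall outside the image are read as 0 (an assumption).
int agreement(const Grid& image, const Grid& tmpl, int cx, int cy) {
    const int h = static_cast<int>(image.size());
    const int w = h > 0 ? static_cast<int>(image[0].size()) : 0;
    const int t = static_cast<int>(tmpl.size());
    assert(t % 2 == 1);  // the window needs a centre pixel
    const int half = t / 2;
    int agree = 0;
    for (int dy = 0; dy < t; ++dy) {
        for (int dx = 0; dx < t; ++dx) {
            const int y = cy + dy - half;
            const int x = cx + dx - half;
            const bool inside = y >= 0 && y < h && x >= 0 && x < w;
            const int pixel = inside ? image[y][x] : 0;
            if (pixel == tmpl[dy][dx]) ++agree;
        }
    }
    return agree;
}

// Builds the feature map for one template: a cell is 1 where at least
// min_agree pixels of the window fit the template.
Grid feature_map(const Grid& image, const Grid& tmpl, int min_agree) {
    const int h = static_cast<int>(image.size());
    const int w = h > 0 ? static_cast<int>(image[0].size()) : 0;
    Grid map(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            map[y][x] = agreement(image, tmpl, x, y) >= min_agree ? 1 : 0;
    return map;
}
```

With a vertical-line template, a window centred exactly on a vertical stroke agrees on all 49 pixels, while a window shifted by one column still agrees on 35, so a bare majority is a fairly permissive test and a higher `min_agree` gives a sharper map.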
Having developed the 49 feature maps corresponding to the 49 templates of FIG. 4, we develop a number of super-feature maps in block 60 that are logical combinations (AND and OR) of the feature maps. We thus reduce the set from 49 maps to 18 maps (of 18x30 pixel arrays). The reduced number has been determined heuristically.
We call the arrangements of the detected features "maps" because we structure an array (in the memory where we store them) and we place the feature detections in the appropriate locations in the array. In this manner we record the presence of a feature and its location. Other mechanisms for recording "hit" location designations can be used, but it is still conceptually simpler to think in terms of maps.
It turns out that the 18x30 array is too detailed for classification purposes. The detail can actually mask the character and make the classification task more difficult (as in the saying "you can't see the forest for the trees"). Accordingly, block 70 performs coarse blocking to reduce the 18x30 feature maps to feature maps that are only 3x5. This results in a final map or vector of 270 bits, which corresponds to the 18 3x5 maps.
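The reduction from 18x30 to 3x5 means each coarse cell summarizes a 6x6 block of the fine map, and 18 maps of 15 cells give the 270 bits. The sketch below assumes 30 rows by 18 columns and assumes that a coarse cell is set when any fine cell in its block is set (OR pooling); the text does not state the pooling rule, and the names `coarse_block` and `flatten` are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// Reduces a feature map to out_rows x out_cols cells. A coarse cell is set
// when any fine cell inside its block is set (OR pooling, an assumption).
Grid coarse_block(const Grid& fine, std::size_t out_rows, std::size_t out_cols) {
    const std::size_t rows = fine.size();
    assert(rows > 0 && out_rows > 0 && out_cols > 0);
    const std::size_t cols = fine[0].size();
    assert(rows % out_rows == 0 && cols % out_cols == 0);
    const std::size_t bh = rows / out_rows;  // block height in fine cells
    const std::size_t bw = cols / out_cols;  // block width in fine cells
    Grid coarse(out_rows, std::vector<int>(out_cols, 0));
    for (std::size_t y = 0; y < rows; ++y)
        for (std::size_t x = 0; x < cols; ++x)
            if (fine[y][x]) coarse[y / bh][x / bw] = 1;
    return coarse;
}

// Concatenates the coarse maps, row by row, into one feature vector.
std::vector<int> flatten(const std::vector<Grid>& maps) {
    std::vector<int> v;
    for (const Grid& m : maps)
        for (const std::vector<int>& row : m)
            v.insert(v.end(), row.begin(), row.end());
    return v;
}
```

For 18 maps of 5 rows by 3 columns, `flatten` returns a vector of 270 entries, matching the vector length given in the text.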
Lastly, block 80 performs the classification algorithms to determine, from the given 270 bits, the most likely classification candidate. A simple algorithm, such as determining the lowest Hamming distance, will suffice once it is known what templates most likely correspond to the characters that are to be identified. The key, of course, lies in determining these templates; and that aspect calls for the learning methodologies (such as back propagation) that the art is currently dealing with.
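A lowest-Hamming-distance classifier over 270-bit vectors might look like the following. It is a sketch only: the stored templates are assumed to be given, and the names `FeatureVector`, `hamming` and `nearest_template` are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kBits = 270;  // 18 maps x 15 cells
using FeatureVector = std::bitset<kBits>;

// Number of bit positions in which the two vectors differ.
std::size_t hamming(const FeatureVector& a, const FeatureVector& b) {
    return (a ^ b).count();
}

// Index of the stored template closest to v; ties go to the lowest index.
std::size_t nearest_template(const FeatureVector& v,
                             const std::vector<FeatureVector>& templates) {
    assert(!templates.empty());
    std::size_t best = 0;
    std::size_t best_dist = hamming(v, templates[0]);
    for (std::size_t i = 1; i < templates.size(); ++i) {
        const std::size_t d = hamming(v, templates[i]);
        if (d < best_dist) {
            best_dist = d;
            best = i;
        }
    }
    return best;
}
```

The returned index would then be mapped to a character label; how the templates themselves are obtained is the learning problem the text goes on to discuss.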

Hardware embodiment
Although FIG. 1 depicts the process of our OCR system, it is also quite representative of the hardware realization. The actual details of the signal flow would vary with the particular design, but that is perfectly well within the conventional circuit design arts. For purposes of the following discussion, it may be considered that our system operates in a pipelined fashion and that each electronic circuit block applies the necessary signals and controls to the following circuit block, together with the necessary identification as to which pixel is being considered.
As suggested earlier, block 10 comprises conventional apparatus that is tailored to the particular source of the image to be classified. It can simply be a video camera coupled to a commercial "frame grabber" and a memory. When the classification process begins, the memory is accessed to retrieve the center pixel and the 24 neighboring pixels, and the collection of retrieved signals is applied to block 20.

Blocks 20 and 30 are currently implemented on a SUN workstation with the simple programs presented in the appendix. Local memory is included with the microprocessors to store image signals and temporary computation results, as necessary. Practically any microprocessor can be similarly utilized, but if higher speed is required than is obtainable with a microprocessor, specific hardware can be designed in a conventional manner to carry out the needed calculations. In fact, since the operations required are merely additions, subtractions, comparisons, and rudimentary multiplications, a pipelined architecture can easily be designed that offers very high throughputs.
The output of block 30 is a sequence of signal sets, each having an associated center pixel and its neighboring pixels. Block 40 is implemented with the neural network of FIG. 6 which includes a series connection of a switch 400, a template match network 410, and a threshold network 420. The input signals, which correspond to the 25 pixel values of the image covered at any instant by the 5x5 window, are applied to switch 400 at inputs 410. Switch 400 ensures that these values are applied to network 410 simultaneously. Network 410 includes 25 input leads and a number of output leads that equals the number of templates stored.
Within network 410, all input leads are connected to each output lead through a column of preset connection nodes. Each such column of connection nodes (e.g. the column containing nodes 411-414) corresponds to a stored template. Thus, the signal of each output lead represents the affinity of the input signal to a different template. More specifically, the connection nodes are of three "varieties"; to wit, excitatory (E), inhibitory (I), and "don't care" (D). Response to a match or a mismatch differs with each of the varieties in accordance with the truth table below.
input synapse output E

Nodes 411 that implement this truth table are easily realized with gated amplifiers.
The information of whether a node is an E, I, or D node, can be stored in a two flip-flop set associated with each node (when variability is desired).
Alternatively, the information can be "hardwired" with an array of links associated with the array of nodes. The programming of the templates (i.e., connections) can be achieved through a burn-through of the appropriate links. Of course, if the templates are completely unchanging, one can design the template information directly into the integrated circuit mask of the nodes' array.
The current of the output lines flows into an impedance, and the flow causes the voltage of each output line of network 410 to rise to a level that is proportional to the degree of match between 1's in the set of input signals and excitatory nodes. Of course, the voltage is also diminished by the degree of match between 1's in the set of input signals and the inhibitory nodes.
The output lines of network 410 are applied to threshold network 420, where that impedance can optionally be placed. Network 420 applies a set of thresholds to the output signals of network 410. Specifically, network 420 comprises a set of two-input amplifiers (e.g., 421-424) having one input responsive to the input leads of network 420, and a number of sources (e.g., 425-427) that connect to the second input of amplifiers 421-424. Each of the sources supplies a different current and, correspondingly, each amplifier 421-424 develops a voltage on its second lead that is related to the specific connection that the lead has to sources 425-427. In this manner, different thresholds can be applied to the different amplifiers within network 420. The output leads of network 420 are the outputs of amplifiers 421-424, and they take on the logic value 1 or 0, depending on whether the signal input of an amplifier exceeds the threshold or not.
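The behaviour of the template match network followed by the threshold network can be modelled numerically. The sketch below is a software analogy, not the circuit: it assumes an input of 1 adds one unit at an excitatory node, subtracts one unit at an inhibitory node, and contributes nothing at a "don't care" node, and that an input of 0 contributes nothing at any node. The names `Node`, `match_score` and `threshold_outputs` are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Node { E, I, D };  // excitatory, inhibitory, don't care

// Net signal on one output line: 1's meeting excitatory nodes raise it,
// 1's meeting inhibitory nodes lower it. A 0 input is assumed to
// contribute nothing.
int match_score(const std::vector<int>& inputs, const std::vector<Node>& column) {
    assert(inputs.size() == column.size());
    int score = 0;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        if (inputs[i] == 0) continue;
        if (column[i] == Node::E) ++score;
        else if (column[i] == Node::I) --score;
    }
    return score;
}

// One logic output per stored template: 1 when that template's score
// exceeds its own threshold, otherwise 0.
std::vector<int> threshold_outputs(const std::vector<int>& inputs,
                                   const std::vector<std::vector<Node>>& templates,
                                   const std::vector<int>& thresholds) {
    assert(templates.size() == thresholds.size());
    std::vector<int> out(templates.size(), 0);
    for (std::size_t t = 0; t < templates.size(); ++t)
        out[t] = match_score(inputs, templates[t]) > thresholds[t] ? 1 : 0;
    return out;
}
```

In block 40 each column would have 25 entries, one per pixel of the 5x5 window; giving each template its own threshold mirrors the separate sources feeding the amplifiers of network 420.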
Block 50 is constructed with a neural network such as the one depicted in FIG. 6. However, since the block 50 neural network deals with 7x7 templates as compared to the 5x5 templates of block 40, a memory 55 is interposed between the two neural networks to buffer the data.
Block 60 generates the 18 feature maps. It simply takes the outputs of block 50 and, together with the signal that specifies the identity of the center pixel, stores the appropriate information in a memory. The result is 18 memory segments, with each segment containing information about the features found in the image.
Each such segment is, thus, one of our feature maps.
The coarse blocking of block 70 is achieved by using 18 additional smaller memory segments, perhaps in the same physical memory device. In these smaller memory segments, block 70 stores information about the features that are found in appropriately selected portions of the larger memory segments. When the original image is 18 pixels by 30 pixels in size, the selection can be easily accomplished with a counter that operates in modulus 5, where the full value of the counter is used to access the larger segments, while the whole number after division by the modulus is used to identify the cells in the 18 smaller memory segments.
The 270 memory locations of the smaller memory segments form the output of block 70 and make up, in effect, a vector that describes the character contained in the image.
The last function that needs to be carried out is to apply this vector to some network that would select the most likely candidate character for the given feature vector. This is the function of block 80.
Block 80 can be implemented in many ways. For example, the content-addressable teachings of Hopfield in the aforementioned U.S. Patent No. 4,660,166 can be used to advantage. In accordance with his teachings, one can impart to the feedback network of his circuit the information about the characters in the subject set. With such information in place, the content-addressable memory identifies the feature vector of the character that is closest to the applied feature vector. The Hopfield network is very robust in making the "correct" choice even when the input appears to be quite distorted.
It is a little difficult, however, to design the feedback network for the Hopfield circuit because all of the stored vectors are distributed throughout the feedback network and commingled with one another. This difficulty is compounded by the fact that we do not exactly know how we recognize a "4", or the limits of when we can recognize a "4" and when we are so unsure as to decline to make a decision. Yet, we know a "4" when we see one!
Current research attempts to solve this problem by having the classifier circuit "learn", through trial and error, to reach the correct decisions. One structure that has the potential for such "learning" is depicted in FIG. 7. This technique is commonly referred to in the art as "back propagation". It is described, for example, by D.E. Rumelhart et al. in "Learning Internal Representations by Error Propagation", in D.E. Rumelhart, J.L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, 1986, Chap. 8.
FIG. 7 comprises interconnection networks 81 and 82 that are serially connected. The input signal set is applied at the input of network 81, and the output signal set appears at the output of network 82. Each of the networks has a plurality of input and output leads, and each input lead is connected to all of the output leads.
More specifically, each input lead i is connected to each output lead j through a connection weight wij. In our application, network 81 has 270 input leads and 40 output leads. Network 82 has 40 input leads and 10 output leads. The number of input leads of network 81 is dictated by the length of the feature vector. The number of outputs of network 82 is dictated by the number of characters in the classifying set. The number of intermediate leads (in this case, 40) is determined heuristically.
Training of the FIG. 7 circuit is carried out by applying a developed feature vector of a known character and adjusting the weights in both networks 81 and 82 to maximize the output signal at the designated output lead of network 82 corresponding to the applied known character. All available samples of all the characters in the set to be classified are applied to the network in this fashion, and each time, the weights in the interconnection network are adjusted to maximize the signal at the appropriate output lead. In this manner, a set of weights wij is developed for both networks.
It may be appropriate to explicitly mention that the connection weights wij are analog in nature and that the circuit operates in an analog fashion. That is, the voltage at any output lead of network 81 is a sum of the contributions of the "fired up" weights connected to that output lead. Each weight is "fired up" by a binary signal on the input lead to which the weight is connected. Thus, the output at lead j equals $\sum_i B_i w_{ij}$, where $B_i$ is the value of the i-th input lead (0 or 1).
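The weighted sum at each output lead is straightforward to express in code. The following sketch covers a single interconnection network only, since the text does not say how the intermediate voltages of network 81 are presented to network 82; the names `layer_output` and `strongest_lead`, and the layout `w[i][j]` for the weight from input lead i to output lead j, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Output at lead j is the sum over i of b[i] * w[i][j], where each b[i]
// is a binary input (0 or 1) and w[i][j] is an analog weight.
std::vector<double> layer_output(const std::vector<int>& b,
                                 const std::vector<std::vector<double>>& w) {
    assert(b.size() == w.size() && !w.empty());
    std::vector<double> out(w[0].size(), 0.0);
    for (std::size_t i = 0; i < b.size(); ++i) {
        assert(w[i].size() == out.size());
        if (b[i] == 0) continue;  // a 0 input "fires up" no weight
        for (std::size_t j = 0; j < out.size(); ++j) out[j] += w[i][j];
    }
    return out;
}

// Index of the output lead carrying the largest signal.
std::size_t strongest_lead(const std::vector<double>& out) {
    assert(!out.empty());
    std::size_t best = 0;
    for (std::size_t j = 1; j < out.size(); ++j)
        if (out[j] > out[best]) best = j;
    return best;
}
```

For network 81 the weight array would be 270 by 40, and for network 82 it would be 40 by 10, with `strongest_lead` applied to the ten final outputs to pick the candidate character.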
Though the concept of such a learning network is fairly well understood, the task remains to realize such an analog circuit efficiently and compactly. The requirements on such a circuit are not trivial. For example, the minimum weight change, or modification, must be fairly small if optimization of the network is to be achieved. The iterative improvement methodology described above is based on the heuristic assumption that better weights may be found in the neighborhood of good ones, but that heuristic fails when the granularity is not fine enough. We found that for a small network 81, at least 8 bits of analog depth are necessary. Larger networks may require even finer granularity. The weights must also represent both positive and negative values, and changes must be easily reversible. During the learning and training session the number of changes to the weights can be quite large. Therefore, a practical circuit must allow for quick modification of the weights.
Taking these and other requirements into account, we have created an efficient analog connection weight, or strength, circuit with MOS VLSI technology.


Whereas each connection weight in FIG. 7 is depicted with merely a black dot, FIG. 8 presents a circuit for implementing these dots. More particularly, FIG. 8 shows one connection weight circuit with its connection to input line 83 and output line 84, as well as some common circuitry. Primarily, the interconnection weight portion of the FIG. 8 circuit includes capacitors 801 and 802, small MOS switches 803 and 804, a relatively large MOS transistor 805, a differential amplifier 806, and a multiplier 807. Secondarily, the circuit of FIG. 8 includes a charge-coupling switch 808, a sensing switch 809 and various control leads.
The circuit operates as follows. Capacitors 801 and 802 are charged to different voltage levels, and the difference in voltage levels is reflected in the output voltage of differential amplifier 806. Amplifier 806 has its two inputs connected to capacitors 801 and 802. The output of amplifier 806, which represents the connection weight, is connected to multiplier 807. Multiplier 807 can be any conventional transconductance amplifier. Also connected to multiplier 807 is input lead 83 of the interconnection network. The output of multiplier 807 is connected to an output lead of the interconnection network. Thus, multiplier 807 sends a current to the output lead that is a product of the signal at the input lead and the value of the connection weight. The connection weight is represented by the differential voltage developed by amplifier 806 in response to the difference in voltages between capacitors 801 and 802.
We have found that the difference in voltages on capacitors 801 and 802 is maintained for a long time (relative to the operations involved in OCR systems) and that no refreshing is necessary when the circuit is kept at reasonably low temperatures. For example, at 77 degrees Kelvin no detectable loss has been noted with time. It may be observed that one advantage of our circuit is that the weight is proportional to $V_{C801} - V_{C802}$ and, therefore, even a loss in charge -- when it is the same at both capacitors -- results in no change to the weight.
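The insensitivity to a common charge loss follows directly from the differential form of the weight. As a short worked check, assume (for illustration only) that both capacitors have the same capacitance $C$, hold charges $Q_{801}$ and $Q_{802}$, and each lose the same charge $\Delta Q$:

```latex
\begin{align*}
w &\propto V_{C801} - V_{C802} = \frac{Q_{801} - Q_{802}}{C}, \\
w' &\propto \frac{(Q_{801} - \Delta Q) - (Q_{802} - \Delta Q)}{C}
    = \frac{Q_{801} - Q_{802}}{C}.
\end{align*}
```

The $\Delta Q$ terms cancel, so the weight is unchanged. If the two capacitances differed, an equal charge loss would produce unequal voltage drops and the cancellation would be only approximate.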
Nevertheless, an avenue must clearly be provided for refreshing the information on capacitors 801 and 802. Moreover, an avenue must be provided for setting a voltage (charge) value on capacitors 801 and 802 and for modifying the set values to allow for the above-described "learning" procedure. This is where the remaining switches and controls come in.
To bring a connection weight to a desired level, switch 808 is closed momentarily to allow a fixed voltage level to be applied to capacitor 801 from voltage source 816. That voltage corresponds to a fixed charge. Thereafter, switch 808 is turned off. At this point, the weight of the connection is at a maximum positive level because capacitor 801 is connected to the non-inverting input of amplifier 806 and carries a positive voltage, while capacitor 802 is connected to the inverting input of amplifier 806. A change in the connection weight is accomplished in the following way.
First, transistors 803 and 805 are turned on. Transistor 803 is very small compared to transistor 805 and, for the sake of a better understanding of what happens, transistor 803 can be thought of as being merely a switch. By comparison, transistor 805 is long and narrow and when it is on it can be thought of as a capacitor. When switch 803 is closed and transistor 805 (assuming it is an n-channel device) is turned on, the charge on capacitor 801 is distributed between the capacitor (801) and the inversion charge on the turned-on transistor 805. Transistor 803 is then turned off, thereby trapping the charge in transistor 805. Transistor 804 is then turned on and, if transistor 805 is slowly turned off, the mobile charge in its channel will diffuse through switch 804 into capacitor 802.
The above steps thus move a quantum of charge from capacitor 801 to capacitor 802. That corresponds to a change in the capacitors' voltages and in the interconnection weight.
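The size of one such quantum can be estimated with an idealized charge-sharing model. The sketch below assumes linear capacitances, an initially empty channel in transistor 805, and complete transfer of the trapped charge into capacitor 802; none of these idealizations is stated in the text, and the names `WeightCell`, `differential_voltage` and `transfer_quantum` are hypothetical.

```cpp
#include <cassert>

// Idealized state of one connection-weight cell.
struct WeightCell {
    double c801;  // capacitance of capacitor 801
    double c802;  // capacitance of capacitor 802
    double c805;  // effective channel capacitance of transistor 805 when on
    double q801;  // charge held on capacitor 801
    double q802;  // charge held on capacitor 802
};

// Voltage difference seen by differential amplifier 806.
double differential_voltage(const WeightCell& s) {
    return s.q801 / s.c801 - s.q802 / s.c802;
}

// One transfer cycle: with 803 closed and 805 on, the charge of 801 is
// shared with the channel of 805; 803 then opens, 804 closes, and turning
// 805 off pushes the trapped channel charge into 802.
// Returns the amount of charge moved.
double transfer_quantum(WeightCell& s) {
    assert(s.c801 > 0.0 && s.c802 > 0.0 && s.c805 > 0.0);
    const double shared_voltage = s.q801 / (s.c801 + s.c805);
    const double dq = s.c805 * shared_voltage;  // charge trapped in 805
    s.q801 -= dq;
    s.q802 += dq;
    return dq;
}
```

In this model the quantum is a fixed fraction, c805 / (c801 + c805), of the charge currently on capacitor 801, so successive transfers move progressively smaller amounts, and total charge is conserved.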
The above sequence can be repeated as many times as necessary to bring the connection weights to the desired levels. In this manner, the optimization of the connection weights can proceed during the training period, with the result that each interconnection weight in networks 81 and 82 is set to the correct level.
The above description addresses the training aspect of the circuit. Once the learning process is over, means should be provided for 1) determining the values of the weights and 2) refreshing the weights to compensate for losses with time, etc. This is accomplished with the aid of sensing switch 809, an A/D converter, a D/A converter, and a non-volatile memory.
To determine the value of the weights in an interconnection network, all of the input leads are turned on, one at a time. Each time a lead is turned on, the sensing switches (809) of the weights connected to that input lead are sequentially turned on to allow each amplifier's voltage to appear on sensing bus 810. That voltage is applied to A/D converter 811 and the resulting digital information is stored in memory 812. All of the weights are converted to digital form in this manner and stored in memory 812. During a refresh operation, each connection weight is isolated in the manner described above, but this time the voltage output on sensing bus 810 is compared in amplifier 814 to the analog voltage of D/A converter 813, to which the digital output of memory 812 is applied. Of course, memory 812 is caused to deliver the digital output that corresponds to the refreshed connection weight. Based on the comparison results, the sequence of switching elements 803, 804, and 805 is controlled by the output signal of amplifier 814 to either increase or diminish the voltage of capacitor 801 relative to capacitor 802. The control of directing the output of bus 810 to either A/D converter 811 or to comparator amplifier 814 is effected by switch 815. Should it be necessary to completely discharge both capacitors 801 and 802, the voltage of source 816 can be reduced to zero and switches 803, 804, and 805 can be turned on.
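The refresh operation is, in effect, a comparator-driven feedback loop. The following sketch models it in software under a simplifying assumption that each switching sequence changes the weight by a fixed step `quantum` in the direction chosen by the comparator; in the actual circuit the step would depend on the stored charge. The function name `refresh_weight` and its parameters are illustrative only.

```cpp
#include <cassert>

// Moves `weight` toward `target` (the value recalled from memory through
// the D/A converter) one transfer at a time. The sign of the comparison
// selects the direction of each transfer. Stops when the weight is within
// half a quantum of the target or after max_steps transfers.
// Returns the number of transfers performed.
int refresh_weight(double& weight, double target, double quantum, int max_steps) {
    assert(quantum > 0.0 && max_steps >= 0);
    int steps = 0;
    while (steps < max_steps) {
        const double error = target - weight;  // comparator input difference
        if (error <= quantum / 2 && error >= -quantum / 2) break;
        weight += (error > 0) ? quantum : -quantum;
        ++steps;
    }
    return steps;
}
```

The half-quantum stopping band keeps the loop from oscillating around the target, and the `max_steps` bound guards against a weight that cannot reach it.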


APPENDIX

//
// fire.c 9/7/88 LDJ
//
// check for broken images
// returns -1 if completely blank
// returns 0 if connected
// returns 1 if connected except for small flyspecks
// returns 2 if badly disconnected
// uses recursive brushfire algorithm

// Diagnostic output: prints number of segments,
// and location and code of the largest
// Side effect: sets up Lseg (in rec2com)
// IMPORTANT ASSUMPTION: fire1 assumes img.pix black pixels are POSITIVE
// and white pixels are zero ---
// If you can't guarantee that, call fire() instead of fire1()
// negative pixels will cause trouble
// This routine modifies img.pix !!

#include <stdio.h>
#include "rec2types.h"
#include "errl.h"
#include "fire.h"

inline int imin(int a, int b){return(a<b? a: b);}
inline int imax(int a, int b){return(a>b? a: b);}

static int xd1; // copy of img.x
static int yd1; // and img.y
static char** pix1; // and img.pix
static Pair* p1;
static Pair* p2;
static int list_size = -1;

static Seg myseg; // segment being processed
Seg Lseg; // longest segment there is
static int nseg;
static int totpix; // total pixels in image

int fire1(Image img)
{
    // make sure we have allocated enough room to keep our pair-lists:
    int lsiz = img.x * img.y; // max possible size of list required
    if (lsiz > list_size) {
        if (p1) {
            delete[] p1; // arrays from new[] must be released with delete[]
            delete[] p2;
        }
        p1 = new Pair[lsiz];
        p2 = new Pair[lsiz];
        list_size = lsiz;
    }

    Lseg.list = 0; // no longest segment yet
    Lseg.size = 0; // first seg will beat this for sure
    nseg = 0;
    totpix = 0;

    // find first black pixel, so we can initiate the burn there:
    int xx; int yy;
    for (yy = 0; yy < img.y; yy++) {
        for (xx = 0; xx < img.x; xx++) {
            if (img.pix[yy][xx] > 0) {
                /// fprintf(stderr, "firstx = %d firsty = %d\n", xx, yy);

                // a lot of these things might logically be arguments to burn(),
                // but static variables are faster & simpler
                nseg++; // count this segment
                myseg.ashes = -nseg;
                myseg.list = (Lseg.list != p1) ? p1 : p2;
                myseg.size = 0;
                xd1 = img.x; yd1 = img.y; pix1 = img.pix;
                burn(xx, yy); // burn, baby, burn!
                if (myseg.size > Lseg.size) Lseg = myseg;
            }
        }
    }

#ifdef TEST
    fprintf(stderr, "Saw %d segments\n", nseg);
    if (nseg) fprintf(stderr, "Longest (code %d) starts at %d %d\n",
        Lseg.ashes, Lseg.list[0].x, Lseg.list[0].y);
#endif
    if (nseg == 0) return -1;
    if (nseg == 1) return 0; // comparison, not assignment
    float frac = float(Lseg.size) / float(totpix);
    const float minfrac = .9;
    if (frac >= minfrac) return 1;
    return 2;
}

// the magical recursive burning routine
// turns to ashes all points that are 8-connected to the initial point
void burn(
    int xcent, int ycent // center of 3 x 3 region of interest
){
    // if this point is off-scale:
    if (xcent < 0 || ycent < 0 || xcent >= xd1 || ycent >= yd1) return;

    // NOTE: this is indeed a check for > 0,
    // not just nonzero, so things don't burn twice.
    if (pix1[ycent][xcent] > 0) {
        int top = myseg.size++; // keep track of length of segment
        totpix++; // count total pixels
        pix1[ycent][xcent] = myseg.ashes; // turn this point to ashes
        myseg.list[top].x = xcent;
        myseg.list[top].y = ycent;

        burn(xcent+1, ycent+1); // ignite neighbors
        burn(xcent+1, ycent );
        burn(xcent+1, ycent-1);

        burn(xcent, ycent+1);
        burn(xcent, ycent-1);

        burn(xcent-1, ycent+1);
        burn(xcent-1, ycent );
        burn(xcent-1, ycent-1);

#define jumpbreaks YES
#ifdef jumpbreaks
        int jump = imax(xd1, yd1);
        jump = jump / 20;
        if (jump < 3) jump = 3;
        if (jump > 1) {
            burn(xcent+jump, ycent+jump); // ignite more distant neighbors
            burn(xcent+jump, ycent );
            burn(xcent+jump, ycent-jump);

            burn(xcent, ycent+jump);
            burn(xcent, ycent-jump);

            burn(xcent-jump, ycent+jump);
            burn(xcent-jump, ycent );
            burn(xcent-jump, ycent-jump);
        }
#endif
    }
    // if this point NOT set, or already burned, do nothing
    return;
}

// same as above, but does not assume that black pixels are positive;
// non-zero suffices.

int fire(Image img){
    for (int yy = 0; yy < img.y; yy++) {
        for (int xx = 0; xx < img.x; xx++) {
            img.pix[yy][xx] = img.pix[yy][xx] != 0;
        }
    }
    return ( fire1(img) );
}

***************************************************************

25 111111/11111111lllllllllllllllllllllllllllllllllllllllllllllllllllllll // do most of the work for linear transformation program 2~0~544 rO~ linear transformations on post of fice data // i.e. convert to standard size and aspect ratio // see lin.plan for extended discussion S // Note: xypix[][] will contain small integers 0..9 // for graylevels below threshold, you get zero;
// for graylevels above threshold, you get the graylevel number // This gives you the option of treating it as a boolean // if you don't care about graylevels.
10 // Caller allocates the array; we fill it.

#include <stdio.h>
#include ~m~th h>
#include "new_array.h"
#include "/nets/util/util.h"
15 #include"rec2types.h"
#include "do_lin.h"

inline float fmin(float a, float b){return(a<b? a: b);}
inline float fmax(float a, float b) {return(a>b? a: b); }

inline int imin(int a, int b) {return(a<b? a: b); }
20 inline int imax(int a, int b) {return(a>b? a: b); }

void do_lin( const Image rawJ/ input image const int known_fit,// 1 ==> char already fills 'DOX pdim by qdim.
25 Image des, // result: array of small integers FILE* param_fp,// parameter file filepointer // 0 ==> all parameters take default values const char* sname// filename, for informational messages Z~0~5~4 // provide "" if you can't do better )~

Pair* bl;

bl = new Pair[raw.x*raw.y];// safe; probably overes~im~te ~ of black pixels S int ibl = 0;
int pp, qq;
for (qq = 0; qq < raw.y; qq++) {
for (pp = 0; pp < raw.x; pp++) if (raw.pix[qq][pp]) bl[ibl].x = pp;
10 bl[ibl].y = qq;
ibl++;
) }

do_linl(raw.x, raw.y, bl, ibl, known_fit, des, param_fp, sname);
15 delete(bl);

void do_linl( const int pdim,// size of input array 20 const int qdimJ/ ..
const Pair* blJ/ input: list of black pixels const int nbl, // size of said list const int known_fit,// 1 ==> char already fills box pdim by qdim.
Image des, // result: array of small integers 25 FILE* param_fp,// parameter file filepointer // 0 ==> all parameters take default values const char* sname// filename, for infollllational messages // provide "" if you can't do better ){

2~S44 Fget(kernel, 2);t/ convolution kernel (units of PQ rows/cols) Iget(mingray, 3);// this or larger: return graylevel, else return zero Fget(fcorn, .7);// fatness corner Iget(info, O); // report miscellaneous information S Iget(inflate, 0);//1 ==> small chars can grow, fatness can change float pkern = kernel;
float qkern = kernel;

const int xO = O;
const int yO = O;
10 const float xmid = (des.x - xO) / 2.0;
const float ymid = (des.y - yO) / 2.0;

//1/~/~/1/1//~11111111 // find raw bounding box int pO, qO, p2, q2;
15 int ibl;
int pp, qq;
if (known_fit) {
pO = O; qO = O;
p2 = pdim; q2 = qdim;
20 } else {
pO = bl[O].x; qO = bl[O].y;
p2 = pO; q2 = qO;
for (ibl = l; ibl < nbl; ibl++) {
pp = bl[ibl].x; qq = bl[ibl].y;
25 pO = imin(pO, pp);
p2 = imax(p2, pp);
qO = imin(qO, qq);
q2 = imax(q2, qq);
}

30 p2++; q2++;
}

~/1111111111 2~0:~544 // calculate some moments:
float** xyflt = new_float(des.y, des.x);

// note that we are treating the pixels as BOXES of ink, not points // so the (0,0) pixel extends from (0,0) to (l-eps,l-eps) 5 // and has its center at (.5, .5) float pmid = (p0 + p2) / 2.0;
float qmid = (q0 + q2) / 2.0;

// but if we shift the middle half a bit, // we can pretend the (0,0) pixel is centered at (0,0) 10 float pmx = pmid - .5;
float qmx = qmid - .5;

float mpq = 0.;// PQ moment float mqq = 0.;// QQ moment for (ibl = 0; ibl < nbl; ibl++) {
15 pp = bl[ibl].x; qq = bl[ibl].y;
mpq += (qq - qmx)*(pp - pmx);
mqq += (qq - qmx)*(qq - qmx);

float theta = mpq / mqq;
20 // Note: since pixels are numbered from UPPER left, // positive theta corresponds to "
// negative theta corresponds to "/"

// (pt, qt) is the coordinate where (p,q) will go when the char is deskewed.
25 // This is not quite the same as (x,y) since the latter // has size changes as well.

// Calculate min and max horiz coor lin~tes, // measured relative to a line parallel to the sides of the parallelogram // [the reference line goes through (pmid, qmid) 30 // which is the middle of the raw PO rectangle]
#define pmap(p,q) (p-pmx - (q-qmx)*theta) float pt0 = pmap(bl[0].x, bl[0].y) ;// leftmost black pixel 2~)0~S44 float pt2 = ptO;// rightmost black pixel for (ibl = l; ibl < nbl; ibl++) {
float xxx = pmap(bl[ibl].x, bl[ibl].y);
ptO = fmin(ptO, xxx);
pt2 = fmax(pt2, xxx);

// The points we just calculated are centers of parallelogram boxes // Calculate how much the box sticks out from there.
pt2 += .5*fabs(theta) + .501;//.001 to catch roundoff errors ptO -= .5*fabs(theta) + .501;

float dsw = pt2 - ptO ;// deskewed width float kf = dsw / (p2-pO);// krunch factor // usually (but not always) less than 1 // (not used in further calculations) float conwid = dsw + pkern - 1.;// pretend wider to make room for convolution float conhgt = q2 - qO + qkern - 1;// and taller, also for convolution float qtmid = qmid + (qkern-l.) / 2.;// height of middle of char float ptmid = pmid + ptO + .S*conwid + theta*(qkern-1.)/2.;// p coord of middle of char float fat = conwid / conhgt;// fatness ratio of the parallelogram float dfat = des.x / (float) des.y;// desired fatness ratio float nfat = fat / dfat;// norm~li7ed fatness ratio if (info) fprintf(stdout, "%s w: %d, h: %d, th: %5.2f, dsw: %5.2f, kf: %5.2f, nfat: %5.2fO, sname, p2-pO, q2-qO, theta, dsw, kf, nfat);

111111111111111lllllllllllllllllllllllll // calculate the coefficients of the linear transformation:
float dOO, dOl, dO2, dlO, dl 1, dl2;

if (inflate) { // old "inflationary" scheme
    float fif = nfat > fcorn ? 1./nfat : 1./fcorn; // fatness increasing factor
    d10 = 0.; // pure skew
    d11 = (des.y-y0) / conhgt; // make output char fill its box vertically
    d00 = d11 * fif;
    d01 = - d11 * fif * theta;
    d02 = xmid - d00*ptmid - d01*qtmid; // center point is fixed point
    d12 = ymid - d10*ptmid - d11*qtmid;
}
else { // never inflate scheme
    float ygrow = (des.y-y0) / conhgt;
    float xgrow = (des.x-x0) / conwid;
    float dogrow = fmin(xgrow, ygrow); // grow as little as possible
    // i.e. shrink so BOTH fit
    dogrow = fmin(1.0, dogrow); // but NEVER really inflate
    d10 = 0; // pure skew
    d11 = dogrow; // shrink y
    d00 = dogrow; // and x equally
    d01 = -dogrow * theta;
    d02 = xmid - d00*ptmid - d01*qtmid; // center point is fixed point
    d12 = ymid - d10*ptmid - d11*qtmid;
}

// make the output space all white
int xx, yy;
for (yy = y0; yy < des.y; yy++) {
    for (xx = x0; xx < des.x; xx++) {
        xyflt[yy][xx] = 0.;
    }
}

// and start bequeathing blackness
int spip = imax(1, int(10*d11)); // spip == scans per input pixel
float step = pkern / spip;
float offset = step / 2.;
int ii;
float fy, fq;
int iy;
float fx0, fx2;
int ix0, ix2;
float rx0, rx2;

#define oops(X) {fprintf(stderr, "oops (X) %s %f %f\n", sname, fx0, fx2); continue;}
for (ibl = 0; ibl < nbl; ibl++) { // loop over black input pixels
    pp = bl[ibl].x; qq = bl[ibl].y;
    fx0 = d00*pp + d01*qq + d02; // start of scan line (in x space)
    fx2 = d00*(pkern+pp) + d01*qq + d02; // end of scan line (..)
    ix0 = int(fx0); // integer parts
    ix2 = int(fx2);
    rx0 = fx0 - ix0; // remainders
    rx2 = fx2 - ix2;
    if (ix0 < 0) oops(1); // error checks & defence
    if (ix0 >= des.x) oops(2);
    if (ix2 < 0) oops(3);
    if (ix2 >= des.x) oops(4);
    for (ii = 0; ii < spip; ii++) { // loop over scan lines per input pixel
        fq = qq + offset + step*ii;
        fy = d10*pp + d11*fq + d12;
        iy = (int) fy;
        if (iy < 0) oops(5);
        if (iy >= des.y) oops(6);
        xyflt[iy][ix0] -= rx0;
        xyflt[iy][ix2] += rx2;
        for (int jj = ix0; jj < ix2; jj++) {
            xyflt[iy][jj] += 1.0;
        }
    }
}

// clip the bottom off of the gray-scale
// using questionable threshold scheme
// first, find blackest output pixel
float zmax = 0;
for (yy = y0; yy < des.y; yy++) {
    for (xx = x0; xx < des.x; xx++) {
        zmax = fmax(zmax, xyflt[yy][xx]);
    }
}

// then clip away....
for (yy = y0; yy < des.y; yy++) {
    for (xx = x0; xx < des.x; xx++) {
        float zz = xyflt[yy][xx];
        int gr = int(9.9 * zz / zmax);
        if (gr < minglay) gr = 0;
        des.pix[yy][xx] = gr;
    }
}

// put away all our toys:
delete2(xyflt);
}
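
The "never inflate" branch of the listing above amounts to a uniform shrink combined with a skew correction, with the character center held as the fixed point. The following is a minimal sketch of that coefficient computation in isolation, not the listing's own code. The `Affine` struct and the function names `neverInflate`, `mapX` and `mapY` are hypothetical and do not appear in the listing; `availW` and `availH` are assumed to stand for `des.x-x0` and `des.y-y0`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical container for the six coefficients d00..d12 of the listing.
struct Affine {
    float d00, d01, d02;  // x = d00*p + d01*q + d02
    float d10, d11, d12;  // y = d10*p + d11*q + d12
};

// "Never inflate" scheme: shrink x and y by the same factor so that the
// deskewed box fits the available output area, but never scale above 1.
Affine neverInflate(float conwid, float conhgt,  // deskewed box size
                    float availW, float availH,  // assumed des.x-x0, des.y-y0
                    float theta,                 // skew, as used by pmap
                    float ptmid, float qtmid,    // center of the input char
                    float xmid, float ymid)      // center of the output box
{
    assert(conwid > 0.f && conhgt > 0.f);
    const float grow = std::min({1.0f, availW / conwid, availH / conhgt});
    Affine a;
    a.d10 = 0.f;            // pure skew: y does not depend on p
    a.d11 = grow;           // shrink y
    a.d00 = grow;           // and x equally
    a.d01 = -grow * theta;  // undo the slant
    // Choose the offsets so that (ptmid, qtmid) maps to (xmid, ymid).
    a.d02 = xmid - a.d00 * ptmid - a.d01 * qtmid;
    a.d12 = ymid - a.d10 * ptmid - a.d11 * qtmid;
    return a;
}

// Apply the transformation to an input point (p, q).
float mapX(const Affine& a, float p, float q) { return a.d00 * p + a.d01 * q + a.d02; }
float mapY(const Affine& a, float p, float q) { return a.d10 * p + a.d11 * q + a.d12; }
```

With a 40 x 60 deskewed box and a 20 x 20 output area, for instance, the scale factor is 1/3, and the input center is mapped onto the output center regardless of the skew.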

Claims (13)

Claims:
1. A method for thinning an image composed of pixels arranged in an array, comprising:
a step of passing a plurality of templates, in parallel, over said image, and
a step of unconditionally determining, for each template and based on a comparison between said template and said image, whether a selected portion of said image can be deleted from said image.
2. The method of claim 1 further comprising a step deleting portions of said image in accordance with said step of unconditionally determining.
3. The method of claim 1 wherein said step of passing a plurality of templates passes said templates over the image in steps that successively align said templates with different portions of said image.
4. The method of claim 1 wherein at least some of said templates forms an array of more than three pixels by three pixels.
5. A method for thinning an image composed of pixels arranged in an array, comprising:
a step of concurrently passing a plurality of templates over said image, and
a step of unconditionally determining, for each template and based on a comparison between said template and said image, whether a selected portion of said image can be deleted from said image.
6. The method of claim 5 wherein said step of passing a plurality of templates passes said templates over the image in steps that successively align said templates with different portions of said image.
7. The method of claim 1 wherein at least one of said templates forms an array of five pixels by five pixels.
8. The method of claim 1 where said selected portion encompasses more than one pixel.
9. A method for thinning lines of an image composed of an array of pixels, comprising:
a first step of selecting a window of size k x k, where k is an integer,
a step of applying thinning criteria to a portion of said image covered by said window to determine whether a core subportion of said image can be deleted,
a step of deleting said core subportion when said step of applying thinning criteria indicate that said core subportion should be deleted,
a step of reducing the size of said window by one when said step of applying thinning criteria indicate that said core subportion should not be deleted,
a step of returning control to said step of applying thinning criteria when said step of reducing size yields a size of k greater than 2, and
a step of selecting another window following said step of returning control and following said step of deleting.
10. The method of claim 9 wherein said core subportion is of size (k-2) x (k-2).
11. The method of claim 9 further including a step, following said step of selecting another window, of selecting another portion of said image to interact with said window in said step of applying a thinning criteria.
12. The method of claim 9 wherein said step of applying thinning criteria applies said thinning criteria to different portions of said image in parallel.
13. The method of claim 9 wherein said step of applying thinning criteria sequentially applies said thinning criteria to different portions of said image.
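
The window-shrinking procedure recited in claim 9 can be illustrated with the sketch below. It is an illustration under stated assumptions only: the claims do not spell out the thinning criteria, so `coreDeletable` uses a stand-in test (core entirely ON, and exactly one OFF-to-ON transition around the window border) that is not taken from the patent. The names `Image`, `coreDeletable`, `thinAt` and `sweep`, and the choice to keep the shrinking window anchored at its top-left corner, are likewise hypothetical. The core size of (k-2) x (k-2) follows claim 10.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<int>>;  // 1 = ON (dark) pixel, 0 = OFF

// Placeholder thinning criterion (an assumption, NOT the patent's criteria):
// the (k-2) x (k-2) core must be entirely ON, and a clockwise walk around the
// one-pixel border ring of the window must go from OFF to ON exactly once.
bool coreDeletable(const Image& img, int r, int c, int k) {
    assert(k >= 3);
    for (int i = 1; i < k - 1; ++i)
        for (int j = 1; j < k - 1; ++j)
            if (!img[r + i][c + j]) return false;
    std::vector<int> ring;  // border pixels in clockwise order
    for (int j = 0; j < k - 1; ++j) ring.push_back(img[r][c + j]);          // top
    for (int i = 0; i < k - 1; ++i) ring.push_back(img[r + i][c + k - 1]);  // right
    for (int j = k - 1; j > 0; --j) ring.push_back(img[r + k - 1][c + j]);  // bottom
    for (int i = k - 1; i > 0; --i) ring.push_back(img[r + i][c]);          // left
    int rises = 0;
    for (std::size_t n = 0; n < ring.size(); ++n)
        if (!ring[n] && ring[(n + 1) % ring.size()]) ++rises;
    return rises == 1;
}

// Control flow of claim 9 at one window position (top-left anchoring at
// (r, c) is an assumption). Returns the window size whose core was deleted,
// or 0 if no size from kmax down to 3 qualified.
int thinAt(Image& img, int r, int c, int kmax) {
    const int rows = static_cast<int>(img.size());
    const int cols = rows ? static_cast<int>(img[0].size()) : 0;
    for (int k = kmax; k > 2; --k) {                 // reduce the window by one
        if (r + k > rows || c + k > cols) continue;  // window must fit the image
        if (coreDeletable(img, r, c, k)) {
            for (int i = 1; i < k - 1; ++i)
                for (int j = 1; j < k - 1; ++j)
                    img[r + i][c + j] = 0;           // delete the core
            return k;
        }
    }
    return 0;
}

// Sequential variant (claim 13): visit every position in raster order.
// Returns the number of cores deleted in one sweep.
int sweep(Image& img, int kmax) {
    int deleted = 0;
    for (int r = 0; r + 3 <= static_cast<int>(img.size()); ++r)
        for (int c = 0; c + 3 <= static_cast<int>(img[r].size()); ++c)
            if (thinAt(img, r, c, kmax)) ++deleted;
    return deleted;
}
```

A parallel variant in the sense of claim 12 would presumably evaluate the criterion for all positions against an unmodified copy of the image before deleting anything; the sequential sweep shown here modifies the image as it goes.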
CA002002544A 1988-12-20 1989-11-08 Image skeletonization method Expired - Fee Related CA2002544C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US288,338 1981-07-30
US28833888A 1988-12-20 1988-12-20

Publications (2)

Publication Number Publication Date
CA2002544A1 CA2002544A1 (en) 1990-06-20
CA2002544C true CA2002544C (en) 1996-07-16

Family

ID=23106677

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002002544A Expired - Fee Related CA2002544C (en) 1988-12-20 1989-11-08 Image skeletonization method

Country Status (2)

Country Link
JP (1) JPH02219189A (en)
CA (1) CA2002544C (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5615032A (en) * 1979-07-16 1981-02-13 Mitsubishi Electric Corp Semiconductor device and manufacture thereof
JPS58169681A (en) * 1982-03-31 1983-10-06 Fujitsu Ltd Picture processing circuit
JPS62266678A (en) * 1986-05-14 1987-11-19 Sony Corp Image processor

Also Published As

Publication number Publication date
JPH02219189A (en) 1990-08-31
CA2002544A1 (en) 1990-06-20

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed