CA1256214A - Method and apparatus for vectorizing documents and for symbol recognition - Google Patents

Method and apparatus for vectorizing documents and for symbol recognition

Info

Publication number
CA1256214A
CA1256214A CA000451901A CA451901A CA1256214A CA 1256214 A CA1256214 A CA 1256214A CA 000451901 A CA000451901 A CA 000451901A CA 451901 A CA451901 A CA 451901A CA 1256214 A CA1256214 A CA 1256214A
Authority
CA
Canada
Prior art keywords
vector
data
scan line
coordinates
run
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
CA000451901A
Other languages
French (fr)
Inventor
Eugene A. Kleca
David N. Grover
Curtis A. Lipkie
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ANA Tech CORP
Original Assignee
ANA Tech CORP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ANA Tech CORP filed Critical ANA Tech CORP
Priority to CA000451901A priority Critical patent/CA1256214A/en
Application granted granted Critical
Publication of CA1256214A publication Critical patent/CA1256214A/en
Expired legal-status Critical Current

Links

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Abstract An apparatus codes a scanned document locally representing each graphic element within a prescribed measure of accuracy by a trapezoidal approximation. The invention also includes a method for similarly coding a scanned document. The apparatus determines whether each scanned run is indicative of a .gamma.- or .lambda. -junction, the termination of an old, or the commencement of a new graphic element, and whether a new linear approximation is necessary. The invention in a preferred embodiment recognizes symbols by determining the center of mass and maximum extremity of a symbol candidate, and comparing it to a reference library after normalizing with respect to scale, orientation and center of mass. In a preferred embodiment, an adaptive threshold parameter governs coding so as to reject noise and optimize a pair of linear predicters in a small number of scans. In a further preferred embodiment the accuracy of the linear predicters is refined so that the error is exponentially bounded.

Description

~5~;2~
(1) METHOD AND APPARATUS FOR VE~T~RIZING DOCUMENTS
- __ _ AND SYMBOL RECt)GNITION

S ..
D criPtion This invention relates to device3 and methods for compactly converting planar drawing~ to coded dlgitized informatlon for subse~uent storage, printing and editing, f and to such de~ices used for symbol learning and recogni tion.

sackqround Art Digital coding of graphic lnforma~ion is commonly 15 called for in a wld~ variety of ~ontext~ from facsl- ' mile da~a tran~mi~ion to compute~ized photograph analy- f si~ and pattern recognition, to computer-aided design applications. The fir~t step in ~uch digitizing is to scan the document in a controlled fashion, measuring the 'l~
20 graphic value of the image at each point. Currently i-available scanning devices are capable of substantially ~.
simultaneou~ly delive~ing a binary output signal for each of n lines of resolution cell~ ~ each cell being approx~mately r 01 mm square. Thus a one metee long ~can llne of an engineering drawing for example would contain 105 such resolution cells; a single square centimeter would contain 10~ re301ution cells.
In pr~ctlce there i~ a high degree of repetition of graphical values in any lmage, and accordingly with the ex~reme volume of digitized graphical value3 produced by ~canning even the 8imple8t images ~ it is necessary to employ coding teehnl~ues, or, more colloquially, pattern recognizing techniques, to reduce the required volume of stored or tran~mltted dat~. The ~imple~t such technique is a one-dimen~ional lnformation compre~on, such a~
run-length ~ncoding, in which ~or each ~can line a ~tring length and starting coordina~e are coded only when the value of a string of con~ecutive resolutlon cell~

,.................................................................... ~

. . . , . ..: . ..

- - : ~ , .:, .
' - " . ' ~ , :..

~ 2) change~. Where, ~ indicated above~ the digitiæed infor-mation is ln the ~orm of rAster output data ~rom .Olmm resolution cell~, a typlcal 80 character alphabetic line might then be coded a~ ~pproximately 200 information slgnals for each 20~m long scan line, a reduction of 99 percent compared to the 2x104 bits of raw raster output data. When it is considered that a sheet of A4 paper contains 6X108 such resolution cellsf it can be seen that such a coding i~ still very cumbersome, requiring over a million informatlon signal~ to code a single page of bi-tonal writing, ~can line by scan line. This inefficiency i~ addres~ed in the prlor art by a number of techniques which look for broader patterns by correlating the run length compres~ed dat~ across a second dimension, typi-cally by comparing contiguous adjacent scan line data andcoding the difference.
Among such te~hniques are those shown in U.S.
Patents Nos. 3,937,871; 4~213,154; and 4,lB9,711.
Another technique of two dimensional encoding involve~
de~igning circuitry specifically to efficiently recognize particular kinds of patterns encountered in a :
fixed use. U.S. Patent No. 4,307,377; for example, shows a device which code~ narrow straight lines, and whi~h approximates narrow curved lines by a segmental approxi~
mation. That patent claims a 97 percent reduction in the amount of data required to be stored, although it i~ not clear what the ba~e technique for such a comparison i~.
Each of the above techniques, while offering a significant reduction in the amount of data required to be stored as compared to the raw raster output, ha~ its drawbacks. Typically the coding technlques which corre-late succes~ive scan line intercepts require the codinq of some data for each ~can llne intercept, and do not produce an output indexed to provide simple acces for editing or for addressing portions of the stored image.
: The method shown in U.S. Patent No. 4,307,377 avoid~
the6e problem~ ~or certain graphic elements, but does not -offer as signlficant a ~ata reduction in coding of image~
other than thin line~. As an example of the limitAtions of prioe art, when applied to a c~ocument contain~ng only a vertically oriented black ~sosceles triang].e, cen~er~d on the page, none of the foregoing coding technique would give an output as compact a~ the intuitive mathema-tical desc~iption compri~ing two linear equations and the top and bottom y-coordinates; nor would the coding output of ~uch prior art devices indicate that such a ~imple image was being scanned. What i~ needed is a single device which qulckly recogni2e~ patterns and compactly codes the information contained in general two dimen-sional drawing~ containing both l~ne drawings and ~haded image portions, and which develops an output u~eful for the extraction of higher level information such as writ-ten character~.
Brief ~e~cr~ption of the Invention The presen~ invention overcomes the foregolny limi~-ation~ by real-time processing of ra~ter scan information using data maintained in a ~mall memory describing e~ch signif~cant graphic elemen~, or vector, which ha~ been encountered in the preceeding M scan lines. In a pre-ferred embodlment, the ~vecto~ized" description in memory includes function~ of two coordinates~ su~h as oenter line and wldth, which locally represent the graphic element as a trapezoid. As the document is scanned, these vector descriptions in memory are con-tlnuously updated and the descriptive parameters refined~ When a vector ends, encounters a branch point, or when a new vectorization i~ required to represent the graphic element scanned, a compactly coded vector description is ~ent as an output, together with a code ~ignal ~ndicating the presence o~ a branch point with another vector if applicable, and the corresponding address in memory i~ freed up to accomodate another vec-tor. At each polnt during th~ real time proceQ~ing of raster data a number of parallel arithmetic logic units . -.
:. , - - ' ' :

: . ' - '-- , (4) compare a ~can line intercept or run with a vector in memory, and no additional coding is performed unless new information 1~ encountered. The entire coding process is done a~ a single pa~s operation, at a ~peed comparable to S the scan speed, yet putting out immensely compact data.
In the above cited éxample of a ~olid triangular image centered on the page, for exalmple, the device would, upon pas~ing M scan lines below the bottom of the triangle without encountering any continuing vector, cause to be produced as an output a code indicating the vector had terminated, the addres~ in memory of the vector parame-ters, and its la~t encountered left and right edge coor-dinates. The data in memory would lnclude the coordinates of the top of the trlangle, and two linear functions de~crlbing its center line and width. (The ending scan line number would automatically be registered on a common synchronizing bus for various components of the system.~ Thu~, fewer than ten integers would fully describe a figure which in the case ~iay of a three cen-timeter tall ~riangle wlth an area of 10 s~uare cen-`timeter~ would include 107 resolution cells spanning 3x103 scan lines~ For a more complica~ed figure, e.g., a region bounded by a simple closed curve, the output would contain similarly compres~ed data representing the flgure ~5 by numerical parameters giving a linear geometricapproximation by included trapezoids. ~n one preferred embodiment of the device, the apparatus includes a keyboard-s~lectable means of varying the integer para-meter M which determines the number of scan lines ~hich a nmissed" vector will be held in memory before it is recognized as texmina~ed and caused to be output. The preferred embodiment also includes a keyable thresholding means, in which the u~er select~ an upper and lower threshold limlt (MINTR0 and MAXTR0) for the dimensions of images the machine will recognlze as noise -e.g. line edge irregularlti~, ink ~pot~ or voids. The - . . : . . .
-, . - ~ . ' .' ' ' .

. ~ .

;L~5~21~L
(5) apparatus then varies an intermediate threshold testing parameter between the two limits as a function of the dimensional traits of the vector under consideration near a particular scan line, so as to optimize the parameterization of the vector within a small nurnber of scan lines, while rejecting as noise any information contained in runs smaller than the threshold level. In another embodiment, the apparatus sorts the vectorized images by dimensional and topological criteria to identify symbol candidates, deter-mines the center of mass and maximum extremity of the symbol candidate, and transposes the candidate to a standard orien-tation and scale. The normalized candidate is then recog-nized by comparison to a preselected or learned library of symbols.
According to a further broad aspect of the present invention there is provided an apparatus for coding a graphic image from raster digital signals defining light intensities of the graphic image along sequential scan lines. This apparatus comprises data compression means for extracting coordinates of a run of like-valued signals along a scan line. Correlating means is provided to correlate coordinates for a given run from the data compression means with a new or previously scanned vector. Vector memory means is provided for storing data pertaining to each vector with respect to which the correlating means has correlated run data. Processing means, operative on coordinates of the given run and on data read from the vector memory means related to a correlated vector,is provided for determining whether the given run's coordinates define a continuation of the correlated vector to within a prescribed degree of accuracy. Means is also provided for clearing the data related to the correlated vector from the vector memory means when the processing means determines that the given run's coordinates do not represent a continuation of the vector to within the prescribed degree of accuracy and for causing data corresponding to the correlated vector to be given as an output.

. , . - , .
': ;

-' - 5a -According to a still further broad aspect of the present invention, there is provided an apparatus for coding binary data representing sequential scan line signals indicative of portions of an image field.
This appara-tus comprises data compression means for extracting from such data coordinates descriptive of each run of consecutive like-valued signals in a scan line, such coordinates being indicative of the width and location on such scan line of the runs.
Processing means is operative on the coordinates for representing the scanned :image field invariably and entirely as a piecewise trapezoidal approximation, wherein (i) each of the two parallel sides of each trapezoid in the approximation is coincident with a scan line; (ii) the other two sides of each trape-zoid are not necessarily parallel to one another;
and (iii) the number of scan line lines between the two parallel sides of each trapezoid is determined directly from processing the coordinate data and is not necessarily an integral multiple of a fixed integer greater than one.
According to a still further broad aspect of the present invention, there is provided a method of creating a normalized representation of a symbol candidate appearing in a -two-dimensional image field.
The method comprises scanning the candidate in an ordered manner and assigning to the candidate a rep-resentation by coordinates along axes of a coordinate system. The coordinates correspond to the line and position values relating to the scan signals generated by the candidate. The center of mass of the repre-sentation of the candidate is then determined. A
point of the candidate at a maximal distance from its center of mass is also determined. The candidate is scaled with respect to its center of mass by a ., ~ . . . : .
.
.: : . ..

- 5b ~ J~

factor proportional to the reciprocal of the distance from the center of mass -to the point of maximum ex-tre-mity. The candidate is -then rotated around its center of mass so the segment determined by its cen-ter of mass and the point of maximum extremi-ty lies at a prescribed angular orien-tation with respect to the axes of -the cooxdinate system. The scaled rotated candidate is then compared to a library of stored representation of symbols and the one with which it has the greatest degree of overlap is determined.
According to a still further broad aspect of the present invention, there is provided a system for identifying, from a cluster of vector data obtained from scanning graphic information along a direction of scan, a symbol in a library of candidates for the symbol. The system comprises means for determining the center of mass of the cluster. Means is also provided for determining a point of the cluster at a maximal distance from its center of mass. Means is provided for scaling the cluster with respect to the center of mass by a factor proportional to the reciprocal of the distance from the center of mass to the point of maximum extremity, so as to normalize the cluster. Means is also provided for computing a representation of the cluster equivalent to rotating the cluster around the center of mass so that the segment determined by the center of mass and the point of maximum extremity lies at a prescribed angular orientation with respect to the direction of scan.
Means is also provided for comparing the scaled rotated candidate to a library of stored representation of symbols and determining the one with which it has the greatest degree of overlap.
:~

, - , . .
,: - . . : -,' ' '~ ' . . . .
. .

- 5c -Brief Description of the Drawings These and other features of the invention will be more clearly understood by reference to -the drawings.
Figure 1 is a block diagram of the operation of the present device.
Figure 2 shows the steps involved in the initial-ization processing.
Figure 3 is a block diagram of the processing apparatus showing major functional units.
Figure ~ is a flow diagram of the real time processing steps involved in coding a document.
Figures 5-12 are logic charts and flow diagrams indicating the precise method of coding as well as the method of thresholding, or hole filling, and pro-visional coding or missing vector processing, tech-niques employed in the present device.
Figures 13 and 1~ are tables showing the data - included in output records at the various processing and coding stages.
Figure 15 shows the steps of termination pro-cessing according to the present invention.
Figure 16 shows a center line output record of a vectorized image.

.
-. . .
: , ' ~ . : ' - '' ' : - ~ . ,.
.:
: .

.

~6) Figure 17 show~ symbol candidates i~entiied from the output record of Figure 16.
Figure 18 shows a symbol having center o~ mas~ and extremity in the standard orilentation.
Figure 19 show~ the same image in another oriLentation.
Figure 20 shows a symbol having ~wo extremities and illustrates the two library t:emplate~ against which it would be compared ~or recogniition.
Figure 21 shows the information maintained in the I, 3 and T~ registers in a preferred embodiment of the vectorizer.
These and other features of the present device will now be more fully explained.

Detailed De~ription of the Invention Referring now to Figure 1 there are indicated the three stages involved ln processing or "vectorizing~ a document according to the present invention. Generally, the present invention takes raster output data and concurrent-ly converts it into compactly coded ~ata representative oPraster scanned graphical elements depicted on engineering drawings or the liLke. The coding is performed by ~epre-senting each gxaphical element in an image as an approxi-mation including one or more trap~zoids, and storing the parameters of trapezoids so de~cribiny currently inter-cepted portions of the iLmage in a memory. Each trapezoid in the approximation of the image is referred to in this description and the ollowing claims as a "vector~
should be noted that, contrary to normal usage, the Nvectors~ refer~ed to herein have "width" as well a~
longitudinal orientation.) Input raster data related to a given small region 1~ then compared to memory data for nearby region~ and relevant information relating thereto is encodefl. The memory data, for each vector, iLnclude~
35 an ordered block of information derive~ from processlng prlor scan iLntercept~ of the related graphic element.
~h1B block 0f information is al50 sometimes called below .. :. . , , ~

: ~ ~ ' . . , - ' -~7) a ~vector~. The first ~tage in processing, labeled INITIALIZING PROCESSING ln Figure ly requires defining several pointer vectors, which will depend on the size of document scanned, a~ well as ordering the memory, and undertaking the initial hand-~haking proce-dures with the output proce~sor and the data compresslon unit~ The second major ~tage depicted in Figure 1 is the real time processing stage. During this ~tage, binary raster out-put data is fed, via a data compression interface unit9 to the processor. The proces~or then sequentially com-pares the incomlng segments of raster data with vec-torized information in its memory, developing an output when necessary~ When the processor determines that a given vector re3ulting from raster data is terminated at a given scan line, an ou~put is made; however a graphic element may contlnue as a single vector to the end of the document being scanned. Accordingly in a final st~ge, denoted TE~MINATION PROCESSING in Figure 1 the device forces output~ for tho~e vectors which have not already terminated, and ~ives appropriate termination signals to other units of the apparatus.
In Figure 2 ~re shown the step~ involved in the INI-TIALIZING PROC~SSING according to the present device. I
2 Generally, the use of the symbol l=] in the drawing~ i means "is replaced by~ or "is set equai ton, as is com-monly understood in computer system~ design. As used in the figures, V stands for the vector under consideration.
NV is a variable representing the next vector in the memory. S* denotes the current scan line number. The symbols R, ~ denote the right and left edge coordinates of a run. Where used with an I (or I+l) subscript (e,g.
RI) the edge coordinateA refer to the corresponding right or left edge data in the I ~or I+l) register of current input segment~. (The~e register~ are described ~elow.) A s~bscrlpt R2 refer~ to the edge data from the vector in the J-register, which will normally be the edge data from ....

' ' , ` ' :.
:

2~.~
(8~
a previous scan of the ~raphic element related to the I
or I~l register run. Finally the subscript R0 when used with the quantities C, E, or S denotes the value of the center, extent or scan lin~ number re~pectively of a memory vector at the scan line where that vector record commences; when urqed wlth the threshold function T, TRo denotes the la~t-computed value of the threshold function T, which ls updated each time the center and edge predic~or ~lopes are reiEined, as will be ~urther explained wlth reference to Figure 6. Re~urning now to Figure 2, as indic~ted ln the uppermost left hand box of Figure 2. Vector number 0 i~ defined as a zero width vector with right hand edge has coordinate ~ his is a ~entlnel vector with fictitious or impos~ible coor-dinates, used to designate the left hand side of the document. The next vector, vector number 1, is al~o a fictitious sentinel vector, and is deined in the third box down, a~ one having its left and right edges at 2 scanner resolution element number 65,000. Thls is a vec-tor of zero width located at the right hand edge of the document, the number 6~000 corresponding to the case of a document having a 65,000 resolution cell scan line width. The remainder of the steps shown in ~he left hand 25 column of Figure 2 amount to establishing a numerieal order of the addresses in the random access memoryO In one embodiment of the device currently operating, th~s memory accomodates 2 10 words, each word being lB8 bit~
long. It has been found that a memor~ of thi~ size i~
generally more then adequate for the processing of even very den~e engineering drawings. In the right hand column of Figure 2 are indicated the remaining steps of the lnitiali.zing processing. Namely, after the two ~en-tinel vectors V30 and V31 have been defined and the ran-dom access memory fully orflered, the device is set for~can line 2ero ~t the left hand side, and signals are output to prepare the output processor for receiving data and to initi.ate input of data from the data compression , ', .

*;~

box. ( ~
Turning now to Flgure 3 there i~ shown a block diagram of the vectorlzer according to the present invenW
tion, in which scanner lnput ls fed to a data compression circuit 31 which extracts, fl.om each run of like-valued bit~, ~he coordinate~ of the center point, C, the scalar extent E or h~lf-w~dth of the run, and the coordinates of the left edge L and rlght edge R of the run. Logically this lnformat~on i8 redundant:, inasmuch as the first two values may be ~omputed from t:he ~econd two, but as a practical matter the creation of an ordered 4~tuple of data values for each input run or ~egment can be done in real time at ~he input ~tage, and doing so avoids greater compl~xity in the later proce~sing. The orflered ~C,E,L,R) lnformation i5 fed to an INPUT BUFFER 32 which contains two register~ denoted I, I~l, which hold compressed (C,E,L,R~ data related to two consecutive input ~egments. The device processes the information in the input buffer by sequentially comparing coordinate~ of 2~ input segment~ to ~ corre~ponding nvector record" held in a vector memory 33 and temporarily loaded in~o a single register 34 called the J register.
A c~mmon mul~ibu3 structure 36 connects the variou~
registers to a plurality of arithmetic logic uni~s shown as 3~a 35e so a~ to deliver corresponding center or edge coordinates of ~egment~ being processed to the ALUs 35a-35e. The use of multiple ALUs and a multibus ~truc-ture togethe~ allow numerous arithmetic comparisons on different set~ of coordinate~ to be carr~ed out in
3~ parallel, so that relevant dimensional and topolog~cal data may be quickly a~certa~ned. ~he J register 34 a~
noted 1~ loaded at a given ti~e with a vector record from the memory 33. Each record ls an ordered 6tring of words derived fro~ the proces~ing of previous ~nput ~egments, and contain~ in addition to the edge coordinates inter-cepted by the mo~t re~ent scan lin~ a variable thre~hold .

:

.

~ 2 (103 number T~o, which governs processing parameterizatlon steps, the original ~can llne number SRo with which the record commences, and lineae predictor functions DE and DC from which the center po.int and half-width or extent S of the input may be predicted, The precise content~ of the vector record and the use of each coordinate or other parameter will be clearer w:Lth reference to Figures 7-12 and 21.
Referring now to Figure 4 there are shown the ~tep6 involved in REAL TIME PROCESSING. Initially the scan line is incremented by one, and a 16-bit binary word is output to the output processor indicating this increment.
The next input to the I register, which generally receive~ the center, extent and edge coordinates of the next scan line intercept from the data compression module, is loaded, and inasmuch a~ the I registers will initially contain ~rrelevant material such as the right hand sentinel vector or the left hand sentinel vector for the new scan line, the right edge is arbitrarily set to -~
. This flushes the I registers and prepares ~he pro-cessor for NEXT STAGE processing, shown in Figure 6.
The NEXT STAGE processlng 18 a sorting step, in which the informatlon in the I register is processed by : the parallel arlthmeti~ logic circuitry in an acceptably short time. A~ noted before, the coding process occur3 ln real time and must be accompli~hed at substantially the same rate a~ ~canner data is fed into ~he system. In NEXT STAGE processing, ~ gi~en vector intercept i~ deter-mined to be one of the following: a branch of a lamda junction, a trunk of a Y junction, a new segment, or a ~: simple overlap talso called a continuation) of an existing vector. In order for the coding to proceed without inordinate demands upon qtorage or memory, and without requiring delays or halts of the scanning opera-tion, the cleterlllin~tlon as to which one of the above four topological and geometric properties applies is made by r3 (11) performlng simple arithmetlcal tests upon the left and right edges of a vector from memory ~held in the J
regi~ter~ and an input record from the data c~mpre~or, held in the I or I~l register. These arithmetic com-parisons are done in p~rallel by four of the arithmetic logic units 35~a)-~e) shown as ALUl-ALU5 in Figure 3; the timing ~ignals from a t~min~ control unit causin~ the appropriate edge coordlnates to be automatically deli-vered to the ALU along bu~e~ for such processin~.
Turning now to Figure 5 which indicates the pro-cessing denoted N~XT I, it can be seen that the center, extent, left and right edge coordinates ln the I+l register are tran~ferred to the I regi~ter, and if there i~ an input available from the data compression unit, then the coordinates of the center, extent, and left and rlght edges of the next available input are loaded into the I+l regl~ter. In this manner there are always aYailable the C, E, L and R data for the next two inter~
cepts alon~ a scanned line for proces~in~ by the pro~
cessor. In additlon to these two records being processed, the processo~ has available in the J regi~ter a complete vector record, comprising 188 bit~ more or less of informaton as will be more ~pecifically ouSlined 2S below. Thi~ J regiBter record shown in Figure 21 18 the mo~t recently updated record for a particular vector in the main memory. Thls main memory record ~ontains pre-dictlve parameters for the vector under consideration as well as the scan line number where it first appeared and ~t~ last measured center, extent and left and right edges, all of which will be loaded into the J regi~ter at a given time for processing o~ input data from intercepts below it in She next subRequent scan line. Thi~ vector from main memory i~ referred to generally as a ~superad~acent vector" in relation to graphic intercep~
directly or contiguously below it and for which it may constltute a connected portion of a continuation or a Y

~ . ~
.

t`~ ~

(l2) or lambda member. The actlve vector ~egments in the memory are continuou~ly re-ordered in a forward link-list during processing ~o that the proces~or a~resses them in the order they are encountered, from left to right, ~o that there will be ~t mo~t two Wsuperadjacent~ vector~
for a glven intercepted segmlent.
Also shown in Figure 5 is the N~XT V proce~æing. In the3e proce~sing ~teps the preceding vector is stored~ V
ls set equal to the next vecltor in memory, which i8 loaded into the J-regi~ter. The quantity RSAVE, a tem-porary processing number equal to plu~ the right handcoordinate o~ the vector is 3aved, and the next vector NV
i8 loaded into the K register. (The quantity RSAVE 1 next used in a s~mple coordinate comparison to detec~
whe~her a hranch point h~s been scanne~.) Turning now to F~gure 6 there i~ shown the stage of processing de3ign~ted NEXT STAGE proce~sing, in which edge data for the vector, or vector intercept~ in the J, or I, an~ I+l register~ re~pectively are selectively com-2~ pared by four parallel arithmetic units and, according asthe respective arithmetlc inequali~ies are true or false, is fed into a sub~equent processing ~tage as indicated ~n the six boxe~ on the right hand side of Figure 6.
Inspection of the log~c di2gram of Eigure 6 reveals that for a given geometrical configuration, the arithmetic ~ompari~on te~ts outlined in the flow diagrams of the left hand column will lead to one and only one of the process boxe3 in the righ~ hand column of Figure 60 In partlcular the top left h~nd logic te~t indicate~ that only if the left ~dge of the I+l iQpUt iS not greater than the right edge coordinate RSAVE is the intercept repre~ented in the I~l reg~ter proce~sed and coded a~ an L-junction. The quantity RSAVE is defined to be ini-tially - ~ , and generally i~ e~ual to the right hand 35 edge of the vector in the J register plus ~ . Similarly going to the ~econd logic branch box, only if the left , - . . .. . . .
,., - - :

- , , - -.
.
-hand edge of the next vector to be loaded in the J
reg~ster is not gre~ter than the rlght han~ edge o~ the I
1ntercept vector plu~ ~ ; and it has passed the pre-ceeding test does the I register intercept get proce~ed as the trunk of a Y junction. In the preferred embodi~
ment of the invention, ~inc~e only the current left and right edge coordinate~ of the next vector are needed for branching te~ts~ these coordinates are loaded into ~mall register, the ~ register, to be available for thi~
proces~ing. The contents of the R register are shown in Figure 21.
~ ontinuing down the logic diagram of Figure 6, only when having pa~ed the preceeding two tests, if the left edge o the ~1 intercept is greater than the right edge of the K regi~ter vector plus ~ ,does the apparatu~
recogni2e that an expected intercept below the v+l vector was not encountered at all and accordingly the ~MISSING
: VECTOR~ proce~sing comes lnto play to provisionally main-tain the memory addressing of vectors and predictive ~ata while ascertaining whether the J vector has in fact ter-minated or whether an unintended void in the drawing was ~canned. Continuing down ~he left hand side of the logic diagram of Figure 6, lf a scan line intercept held in the I~l register has pas~ed the first two tests with loqic 1 and the third w1th logic O and al~o satisfies the ine-quality that` the left edge o the R register vector i~
greater than the right hand edge of the intercept ln the I+l register plu5 ~ , the I~l input is determined to be a NEW S~GMENT and is proce~sed accordingly. ~inally, if the next vector equal~ 1, that i~ the sentinel vector of width O centered on the right hand edge of the page as deflned in the initializing processing, then the scan line has ended and signal~ are generated to ini~iate the next scan. Otherwi~e having pas~ed through the logi~al diagram at all five 3tep~, the input under consideration i9 a SIMPLE OVERLAP (or continuat10n) of ~he vector .
:

'. . ' ' - :

(14) corresponding to it on preceeding scan lines, and will be processed a~ a SIMPLE OVERLAP or continuation vector~
The proce~slng ~ppropriate for each one of these six determination~ hown in Figures 7 to 12 only one of whlch processing stages will be activated for a g~ven input.
Turning now to ~igure 7 there are shown the pro-ces3ing step~ for the scan line inter¢epts from regl~ter I+l when the ~rithmetic comparison tests of Figure 6 have determined the input to be a lambda-JUNCTION. In ~his event, the extent of the input i first checked ~o be greater than a pre~et minimum extent.
The term ~e~en~n, repre~ente~ by E with an iden-tifying ~ubscript, is used throughout this disclosure to 15 refer to one-half of the width of a run. A roundlng-off ~1 convention ih arbitrarily e~tablished to assure integral coordinates for the center polnt and the extent of a run. 1.
The scalar quantity MINEXT of Figure 7 i~ a preselected numbex which act~ ~8 a threshold for screening out cer-taln kinds of noi~e that would otherwise be proces~ed as b~anch point indicatlng data. For example certa~ n graph~c reproduction proces~e~ create microscopic ~treamer line~ of dark density along a dark edge. Such nol~e may be ellminated from the further level processing steps. The choice of MINEXT may also be dictated by the knowledge that a drawing contains vnly, e~., llnes of a certain minimal thicknessO A suitable choice on MINEX~
~creens out all les~er-dimensional features, so that pro cessing tlme will not be used to process such noise.
If it does not ~atisfy this threshold requirement it is dlsregarded, the next input is loaded and next ~tage processing i~ again commenced. On the other hand, if the extent of the 1~1 intercept i8 larger than MINEXT then the device determines wh~ther the present s~an line i~
the first ~can line on which thls vect~r appeared. If not, an out]put contlnu~tion vector record is produced, : - : .. .

... .
'' : ~ ' ' ' ' '' ~
. : :
. .

(15) whereby the de~cription of the vector as far down a~ the branch point 1~ senk as an output ~typical output records are shown in detail in Figures 14-15), and the new vector data for the left leg of the lamda replaces the former vector data in the memory. The ~uantities C, E and S in the J register are temporariLly saved as they will form part of the L-junctlon OUtpllt record and the vector in the J register is glven new C, E and S data descriptlve of the branch of ~he L-junct:ion being processed. The proce~sor then takes the next input, gets a free vector address, and uses lt to reorder the vectors in memory, in such way as to ~tart a new vector compri~ing the right leg of the lamda junction a~ the next sequentially addressed vector after the left leg, which has been coded a~ a con~inuatlon of ~ts top part~ An L junctlon recor~, Figure 14 r is generated identifying the top ~ector and it~ relation to the right leg of the junction. During both the entering in memory of the left leg an~ of the right leg, the parametee ~MISSED~ is set to zero and the parameter T~o iS ~et to MIN~RO, as i5 done initially when commencing any new vector parametri~ation. The signifi-cance of this latter step will be discussed later in con-nection with Figures 11 and 12.
Turning now to Figure 8, there is shown a block diagram of the processing undertaken in the event that the arithmetic comparison testing of Figuee 6 indicate~ a given intercept to be a Y-JUNCTION. Initially there is tested whether the present scan line is in fact the fir~t scan line for the vector. If it is not, a continuation vector record is sent a~ an output, as indicated in Figure 13 and the C, E~ S, TRO and MISSED data are put into the J regi~ter a~ a continuation of the upper left branch of the Y, thereby acqulring its memory address.
Thi~ data is put back In main memory, an output Y- -junction record is produced and the vector~ are reshuffled so that the former right branch of the Y-.

, .:
, , .
, , ' , ~,, ':~
, .
~, ~

.

~ L~2~(16) ~unctlon is no longer in memory and exi~ting vector records are again consecutively ~ddressed. The proces~or then returns to NEXT STAGE processing of Figure 6.
Turning now to Flgure 9 there is ~hown in the alter-native the processing which occurs in the event that the arithmetlc testing of Figurle ~ indicates the given inter-cept to be a SIMPLE OVERLAP ~or continuation) of a pre-viously encountered superadjacent vector. As used throughout thi~ disclosure the term "superadjacent"
refers to an unterminated vector encountered in a pre-viou~ scan llne and which i'3 located so that its width to some extent overlaps or is next to the given intercept in x coordinates. A~ indicated ln Figure 9 SIMPLE OVERLAP
processing commence~ by loading the next input, getting the next vector from memory, and undertaking the hole-filling proces~ing (explainefl below in relation to Figure 121. Whereas the lambda- and Y-junction tests each involve extracting information related to the topological connectivity of different segments, the SIMPLE OVER~AP
processing i~ the major informat~on compressing step of vectorization and involves creating and reflning a p1ec~-wi~e l~near approximation to the image represented by the con~ecutive ~can line intercept~. The parame~er DELS 18 set to be the number of ~can line~ betweem the present scan line and the s~an line where the vector was ini-t~ally encountered. The numbers DC and DE are the slopes respectively of linear function~ representing the center ~n~ the extent. A~ noted previou~ly a para~eter TRo~ ~he ~hreshold parameter, is ~et which indicates the degree of error which ~ill be permi~sible in the vectorlzed approx~mation of the lmage scanned. In the SIMPLE
OVERLAP processing step, the proce~sor initlally deter-mines whether the extent o the intercept stored:in ~he I
register dlffers from the extent predicted from memory by more than TRo~2, and if not whether the center of the I
register intercept dlffer3 from the predicted center by .
-- ~ . . . .
:: - . . .. ~ . .
'`~.'. ".. ` ' . ,- . ~ . - , . ` . ' - , ' ' :, ' ` ' ' .
. :

.

(17) more then TRo. In either case, if ~7reater than two scan lines have elapsed ~lnce the vector was f irst encoun-tered, the vector continuatlon record will be sent a~ an output and a new vector approximation will be commenced 5 usi ng the currerlt data, with the new center, extent ~ and scan line a~ the relevant vector data, with the parame-ter~ TRo set to the minimum ~nd MISSED set to 0 a~ will be explained in regard to Figure~ 11 and 12.
Still in reference to Figure 9, in the event that both the cent~r and edge pr~ediction3 are within the thre~hold amounts requlred, the device performs an addi-tional arithmetic check to determine whether the current scan line i8 2n ~n an integer) scan lines down from where the vector wa8 initially encountered. I not~ the current left and right e~ges are simply entered in the J
register, the p~rameter MISSED i~ set to 0 and the apparatu~ proceed~ to the NEXT STAGE processing. On the other hand, lf th~ number of el~p~ed scan line~ since commencement of the vector is a power of 2, then the lower left hand branch of the processing ~iagram is implemented and th~ parameters DC and DE are updated using present scan line data, to give a new estimate of the slope of the center line and the slope o~ the extent function. It wlll be appreciated that since th~Y refine-2S ment of parameters occur~ after a power of 2 number of scan line~, and when the edge and center predicter~ havealready been determined to be within the respective threshold parameters TRo in absolute value, the linear approximation to the vectorized shape of the underlying image is refined ~o that when a vector continues ~o appear for an increa~ing numbçr of ~can llnes an~ the previou~ly vectorlzed approximation still hold~ good, the linear preclictors are exponentially bounded and give a smaller andl smaller error from the actual image. ~lnally 35 note that in additlon to updating the center and extent predictors, the par~meter TRo i~ updated in accordance . - -.
~ " ' ' '; ' ' ' , '' ` ~ :-.

`~ ~

(18) with the formula ~hown in the third box o~ the lower left hand branch of the proce~sing block diagram.
Specifically, when the half width of the I register intercept is less then MAXTRO but greatex than MINTROt then the refinement of parameters will set TRO equal to this half width. Thus, the threshold parameter i~ adap-tive, 50 that when the graphic features being processed are less then twice the width of MAXTRO, a closer ~egree of accuracy will be requlred of the estimationsO Thi~
allows the proc~s~or to very quickly code information encountered in thick images and yet to more delicately catch the detail of fine graphic images, and to more quickly establi~h its predictive parameters when com-mencing to code a new vectorO Finally we note, again referring to Figure 9, that in any case, whether the ~lope parameters are updated at a power of two scan line, whether the existing data feeds s~raight through at a non power of two ~can line, or whether the edge or center predicter has failed the threshola accuracy te~t and a con~inuation VeCtQr iB sent as an output, the left and right edge data in the 3 register have substituted in them ~he coordinates of the current left and right edges, and the value of the parameter MISSED is set to O before processing control shi~ts to the next stage and the J
: 25 register data i~ re-entered in memory.
Turning now to Figure lO, there are shown the pro-cessing step~ undertaken in the event the arithmetic sorting proce~ of Flgure 6 indicates the presence of a : NEW SEGMENT. In that event, a~ indicated in Figure lO, - 30 if the width of the in~ut in the I~l register is les~
than a specified minimum extent, the input is disre-garded~ the next input loaded~ and the processing shown in Figure 6 reco~menced. Otherwise, the next input is loaded, ancl memory control is signaled to provide a free memory addres~ for a new vector. If no memory addre~s i9 available, as ln the other processing steps wherein some . . . , ~ .

.
~ . . ~ . . . -- -: .
, (19) shuffling of data in memory must be accomplished, the device halts. Otherwlse the address pointers of the free v~tor and the next vector are rearranged, and the center, edge, threshold, current scan l~ne number, and right and left edge~ of the new ~egment are placed in the register wlth ~he parame~er MISSED set to 0. An output record for new ~egment is produced as in Figure 13 and the device proceed~ to the nlext stage. It will be recalled that ~nitial steps of the NEXT STAGE proce~sing, Figure 6, wlll retuxn the J regl~ter contents to memory and proceed to ~ort out the next 1nputs according to the basic topological and geometric primitives of Figure~
7-l0 Turning now to Figure ll there are shown the opera-tive steps of MISSING VECTOR processing. A given vector is retained in main memory when a continuation input fails to appear for less then M sca~ lines. The para-meter MISSED, lt will be recalled, comprises one of the 12 words of data constituting the vector description carried in main memory ~s discussed below in eonnection with Figure 21~ ~his parameter i s updated hy proces~ing de~cribed presently. Provided that MISSE~ is less than a preselected number MAXMIS, the main memory vector~ed description 1~ reta~ned and processing continues a~
usual, with the J-regi~ter edge and center predicters updated before returning to the memory as shown in the left branch of ~he processing diagram, Figure ll.
However, when the number of missed lines is incremented by one and become~ greater than the preset parameter MAXMIS ~thereby indica~lng that the given vector segment has lndeed terminated), an output record designating the end segment i~ generated as shown in Figure 13.
Thereupon the addres3 of thi~ vector is returned to the top of the free v~ctor 11st in the memory controller, and ~5 the address pointer~ arran~ed accordingly, whereupon the processing proceeds to the NEXT STAGE. In this manner, ': ' ' -.

(20) when a predicted scan llne intercept of a vector fails to appear, no new information i8 coded beyond the augmen-tation of the parameter MISSED by one unit, and no infor-mati.on is lost in reliance on such faulty data. In the event that after failing to appear for MAXMIS scan lines the vector is determined to have terminate~, all o~ the provisional data entered in the preceeding MAXMIS scan - lines ls deleted and the output record goes back to the last actual scan llne reading, as noted in the output recorA de~cription, Figure 13.
Turning now ~.o Figure 12, there is shown the pr~-cessing operation o HOLE FILLING. This step i8 only required during the proces~ing of a SIMPL~. OVERLAP, and operates on new inputs, I, I~l, to delete runs of length less then TR~ occuring in places where a run of another : image value i~ predicted. Specifically, if the left edge of the I+l input i~ less than the right edge in the J
register of the corresponding vector, and if the left edge of the I+l input minus the right edge in the I input 20 ~s less than the threshold TRo, then the left edge coor-dinate in the I register i~ saved, the C, E and R con-tents of the I+l register 1~ d into the I regiYter, and the next C, ~, L and R coordinates from the data compre~sion module are loaded into the I~l register. The '~ center and ex~ent of the I regi~ter are redefined in accordance with L~ and the new ed~e data RI . In th 13 manner hole3 of dimension less than TRo are automatically filledO A feed~ack loop then returns the device to the sturt of the hole illing operation, and it again undergoes the fir~t ~wo tests. Thi~ will result in filling of additional hole~, in the event that a number of ink spot~ or void~ occur ih close proximity wi~hin a 801e vector.
It will be ~een that the foregoing processing steps result in the direct and str~ightforward coding of geometric ,and dlmen~ional data for graphic images in real :. ~ ' ............. , . . .
: ~ , ~. . .

- f~ ~

(21) time for a ~canner, with output data generated at each point where signiflcant topological or dimensional infor-mation comprising new data is encountered. In Figures 13 and 14 are shown output records generated at these times.
In particular as appears in Figure 14, when a branch point such a~ a Y-(or lambda-) junction is encountered and the output record ~or the terminated right branch of the Y ~or new right leg of l:he lambda) ls generated, the f record includes the addres~ VMI of the top left (or ~op) branch as well as the addre~;s V of the portion scanned.
The output record ln the ca~ie of a Y- junction includes l.
the left and right edge coordinates of the upper portion a~ well as the left and right edge coordin~te~ from the I-register of th~ lower right portion. For a right leg of a lambda-~unction, the record includes the scan jump, center and exent data for the upper portion. Thi~ ~ata, d~noted CJ, EJ and DELS for simpl~city, comprises the quantities SAVCRO, SAVERO and DELS deined in Figure 7.
The particular choice of output records contains suf- l 20 fic~ent data regarding the topological and dimensional l;
changes encountered ~o that by ff~uitably programming an r output proce~sing devlce, it ls possible to quickly print or access and di~play images generated by, e.g., a con-nected sub-unit deplcted within a g~fven drawing, or, for 25 instance, the set of all electrical circuit units con- I
necting to a 91fven component in an electrical schematic f when analyzed according to this method.
A further point which has not been addressed in the foregoing di~cussion ls the operation of the devire in 3D termination processing a~ indicated in Figure 1. Figure 15 show~ the ~tepcff involved in such proces~lnq. It will - be appreciated that because the graphic elements encoun-tered in a glven scan line are vectorized and stored in memory, and there ia no output oE such data until a 35 slgn~ficant change hss occurred requ~ring a new vec-torization of ~ub~equent continuations or branches, it i5 1.

' ' ' ' , .,' ' . ' ' , . . ~' : . :
'. '~, , ~ ' ' ' '' . , - ' ' , .. . .
.. , . , ~

i6;~
t22) possible for a document to reach its last scan without having generated an out.put record indicative of a graphic element which, for instance, has continued a~ predicted, or which qulte simply has not terminated by the last scan line. In thi~ event a termination processing mode i3 provided, which recogn~zes the last scan line by an externally triggered signal related to the document size, and proceed~ to cau~e an output of an end-of-segm,ent record for each of the vectors then left in memory. ~hen l,~a the scan line i8 ended, tha,t is when NV=l, the device then outputs appropriate signal~ to indicate the end of the coding operation and to terminate communication with ~he output processor and halt ~he apparatus.
It will be appre,ciated that the op,eration of the foregoing coding devlce utllizes a relat.ively small memory for 188 bit words which can be addres~ed and fed ~nto the J reg~ster by appropriate signals for proces~ing in relation to the ~can data being fed into the I and I~l register~. The variou~ edge and center tests necess~ry for character~zing the current input and updating ve~tor memory data are conducted in a straightforward way ~y parallel arithmetic logic un~ts. ~hus, although ~he u~e of parallel proceRsin~ in thls context constitutes part of the within invention, the de~ign of arithmetic pro-cessing units i~ known to those versed in the art.
It will now be seen that the coding operations per-formed by the present invention produce a piecewise trapezoidal approxim~tion of each two-dimensional graphic element by extracting the graphic information ~n sequen-tial scans and parameterizing small trapezoids ~vectors)which locally approximate the graphic element.
For each graphic element, the memory is managed in such a way as to contain a block of dat~ descrip~ive of a small trapezoi,d, which approximates the currently scanned region of the graphic element. Coding is performedr ~nd processed dla~a i8 updated or cau~ed to be output, only , ': ' ' ' .
, , . : .

: ;

(~3) when one o~ flve inform~tion containing event~ (Y-or lambda-branch, new vector, continuation requirlng a fresh trapezoid, or end of vector~ i9 encountered. These event~, which are the 1nformation-theoretic primltlve~ of the present invention, ~re mutually dl~tinct and are complete in the ~en~e that any run on a scan 11ne (which : i~ not noise, or which is not already accurately pre-dict~d by the vector data for the region above it~ must constitute one of these events. Moreover the flr~t four event~ may be determined by simple arithmetic com-parisons, and the fifth by default, so that it is a straightforward matter for a person versed in the art to design four arithmetic logic units which collectively and in p~rallel wlll determlne which of the ~events~ charac-te~ize~ a given run.
~ In one embodiment the apparatus also perform~ ~ymbol : recognition on scanned documents in a manner explainedbelow, using the vectorized repre3entation to analyze the scanned symbol c~nd~dates, as will now be described.
2~ After ~ document has been fully coded the coded data may be used to reproduce a ' f~csimile document, which will be identical, except possibly for runs of up to MAXT~0 resolution cell~ which have been filled or s~oothed out. I
Alternatively an output printer may be activated using 25 the center 1~ ne parameters (rather than the full twodimen~ional set of parameters~, to print out a pseudo-center image of the document. Such a pseudocenter image includes the center line (or piecewise linear curve) of the graphic element. In Figure 16 is shown an image of the center llne~ 161 of a vectorized image such as a ~haded topographical m~p.
In thi~ form, whether printed or as raw output, it i8 possible to identify ma~or clustersr which are possible symbols, for subsequent machine analysi~. Such clusters mcly, for instance, be identified in terms of . . , : :

~lze and connectlvity. For example, any i sola ted graph ic element whlch together with all vectors connected thereto i~ between 1 cm. ~nd 3 cm~ in diameter would be a good candidate for recognition as an alphanumeric symbol.
S~milar criteria may be establi~he~ in so~tware to quickly identify ~mall libraries of symbols for a given use- e.g. electronic components in circuit diagram; beam ~ections and flttings in structural engineerlng drawings.
Figure 17 shows the symhol candidates 171 a-d 50 extracted fro~ the center line im~ge of Figure 16.
Once a major cluster or a symbol candidate C ha3 been i~olated, the app~ratu~ according to the present ~nvention calculate~ the center of mass of C from the vectorized representation of C. Specificall~ a local coordinate system i8 e~tabli~hed and for each tr~pezoid Ti included in the vector representation of C there are calculated a mass Mi e~ual to the area of the trapezoid, and coordinates ~Xi, Yi) equal to the center of mas~ of Ti computed as though Ti wer~ compo~ed o~ a ~heet of 20 material of un~orm den~ity. The mass M of symhol C is set equal to ~ Mi and the center of mass CM is ~hen com-puted from the first moment~ o~ the sy~tem of trapezoid~
around the X and Y-axe~.
Ju~t as in the mathema~ical description of a phy~
25 cal system in classi~al mechanics, the first moment around the X axi~ is defined a~ M~ ~ MiYi and the corresponding fir~t moment around the Y-axis is My MiXi. The centee of ma~s CM of C then has coordinates (X~ Y) =~
This center of ma~s point (X, Y), ha~ the property that the sumbol C may be scaled with re3pect to (X,Y) ~nd the re~ultant ~ymbol will havé the same center of mass (X,Y). Also the area (or mass M) of the symbol C i~
invariant with re~pect to rotation of C around the point ~X,Y). [This latt~r property would of cour~e hold for any point (K,Y)~l Figure~ 18, 19 show a symhol in two different angular orientation~ about its center of ma~s.
i : , , . . :
:: - - . : .
. ~ , .
- . .

2~
~25) ~ aving thu~ established a fixed point (X,Y) for the candld~te C under con~lder~tion, the apparat~ next determine~ the polnt of C which i8 at a maximum dlstance from IX,Y). There may be a unique such extrem~t~, as for the symbol "h~, ~everal ~uch points~ as for the ~ymbol ~X~, or even a large number of such points, as for a cir-cular "on. ~enerally, however, there will be a small finite number~ in any case the machine simply choo~es one extreme point. It will be ,appreclated that, a~ a con-sequence of elementary geomletric considerations, the point of extremity of ~ plecewise linear symbol candidatemust be one of the vertices of the candidate, and a~cor-dingly for a vectorized image the point of extremity may be a~certained by slmply examining the existing output data, computing the dlstance of each vertex from the center of mass, and taking the maximum one.
At this point it is nece~ary to dlstinguish between learning a symbol and recogniz~ng a symbol. The machine ~learns" a symbol in a fixed orientation as follow~, For ~- 20 a given ~ymbol C o~ mass M, center of mass CM - (X,Y) and with a principal e~tremity (Xe ~ Ye ) the machine ~ores a vectorized lmage or optical ma~k of the ~ymbol rotated and shifted so that the line segment determlned by the points tX,Y~, (X~ , Ye ) lies in a standard position~ along 25 the x-axi~ with (X,Y) shlfted to the or~gin.
In Figure 1~ i8 ~hown a symbol 180 aligned with its center of mass 182 and lt~ extremity 183 along the X~axis - 181. In Figure 19 i8 ~hown a symbol 190 isomorphic to that of Figure lB in ~ different orientation. In Figure 30 19 the symbol 190 i8 ~hown h~ving center of mass 192 and extremity 193. The center of mass 192 is placed at the orlgin, and th~ line segment 195 defined by endpoints 192 and 193 lies at an angle 194 ~ith respect to the X-axl~
191 .
In a preferred embodiment, for a ~ymbol having ... .

: , - - .

.- , .
. . - . . :
~ ' -:

(26) multiple extremlties, the machine al~o nlearn~" the image of the symbol as determlned by each o the other extreml-ties. Each of these learned ~ymbols is normalized, e.g.
by sc~ling 80 that ~
Figures 20 A~C ahow a ~ymbol 200 having center of mas.~ 202 and two extremities 203, 203'. In this situation the 3tored library of symbols would include two templates 210 and ~20 representative of the symbol when it is shifted lnto the normal position with it3 center-extremity ~egmen~ allgned with the x-axis.
Having thu~ est~blished a library of learned sym-bol~, in order to ~recognize" a newly scanned symbol C', the machine take~ the ~ymbol can~idate, Ci and determine~
its center of ma~s CM' and extremity ~', rotating it degrees into ~tandard posltion. The candidate C' ~ then normalized and 18 compared to the library of learned sym-bols to determine the degree of overlap. This cvmparison which in the preferred embodiment is performed numeri-cally (as i~ the rotation into standard form) amount~ to measuring the degree of coincidence of the ima~es of C
and C' where C i~ a learned ~ymbol fro~ the library lCj~
of ~tored symbol imAges. Thi~ re~ults in a sequence of measure~ ~ X~1: (X~ ~ 1) of the degree of colncidence between the candidate C' anfl e~h learned symbol Cj in the library of stored ima~es. The candidate C9 iS
~recognized" a~ a particular ~ymbol Cm . when Xm ~ Xj and X~ is sufficlently laLge, ~ay Xm ~ T, a recogn~ion threshold. The number ~ may be approximately .85-.9 for recognition of mechanically formed symbols having a defi- 1.
nite type fontO In thi~ case it has also proven useful to requir2 as ~ condition of recognition that the number min ~Xm - Xj~ k, where k is a ~mall number, e.g. k~.2, chosen to as~ure that the coincidence between C' ~nd Cm is ~ignific~ntly better than between C' and 35 each other C ~ ln the llbrary. In certaln .
' .
: . ',: . , -:' (27) in~tances, as where different ~ymbols are distingui~hedonly by their or1entation (such as "p" and "dn, or "6~
and "9~ it may also be nece~sary in a library containing two such symbols to al80 store and match the angle 194 a~
a criterion oP the recognition process.

1, .

... , . : ~ : : : -.
-.

.

Claims (22)

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
1. An apparatus for coding a graphic image from raster digital signals defining light intensities of the graphic image along sequential scan lines, such apparatus comprising:
data compression means for extracting coordinates of a run of like-valued signals along a scan line;
correlating means for correlating coordinates for a given run from the data compression means with a new or previously scanned vector;
vector memory means for storing data pertaining to each vector with respect to which the correlating means has correlated run data;
processing means, operative on coordinates of the given run and on data read from the vector memory means related to a correlated vector, for determining whether the given run's coordinates define a continuation of the correlated vector to within a prescribed degree of accuracy;
and means for clearing the data related to the correlated vector from the vector memory means when the processing means determines that the given run's coordinates do not represent a continuation of the vector to within the prescribed degree of accuracy and for causing data corres-ponding to the correlated vector to be given as an output.
2. An apparatus according to claim 1, wherein each of the vectors is a trapezoid, and wherein (i) each of the two parallel sides of each trapezoid in the approximation is coincident with a scan line, (ii) the other two sides of each trapezoid are not necessarily parallel to one another, and (iii) the number of scan line lines between the two parallel sides of each trapezoid is determined directly from processing the coordinate data and is not necessarily an integral multiple of a fixed integer greater than one.
3. An apparatus according to claim 2 further including means operative on the output data for recognizing symbol candidates among scanned graphic images; and means for determining the center of mass of each symbol candidate.
4. An apparatus according to claim 3 further including means for determining the point of maximum extent of a given symbol candidate and its distance from the center of mass.
5. An apparatus according to claim 4 further including a reference library of symbols; and means for comparing a given symbol candidate with the symbols of the library and determining their degree of similarity.
6. An apparatus according to claim 5, wherein the reference library includes symbols normalized by size and orientation, and wherein the means for comparing includes means for normalizing the size and orientation of the given symbol candidate and means for measuring the degree of overlap with symbols of the library.
7. An apparatus according to claim 6 wherein the means for normalizing includes means for transposing a representation of the symbol candidate so that its center of mass lies at the origin of a coordinate system, and also includes means for scaling so that the extremity of the transposed symbol lies at a fixed point.
8. An apparatus according to cliam 7 wherein the symbol candidate is identified as a given symbol from the library if its measured overlap with such symbol is greater than its overlap with other symbols of the library and is larger than a preselected quantity k.
9. An apparatus according to claim 2 further including means for determining the prescribed degree of accuracy as a variable function of the width of the graphic element being scanned by the apparatus.
10. An apparatus for coding binary data representing sequential scan line signals indicative of portions of an image field, such apparatus comprising:
data compression means for extracting from such data coordinates descriptive of each run of consecutive like-valued signals in a scan line, such coordinates being indicative of the width and location on such scan line of the runs; and processing means operative on the coordinates for representing the scanned image field invariably and entirely as a piecewise trapezoidal approximation, wherein (i) each of the two parallel sides of each trapezoid in the approxima-tion is coincident with a scan line; (ii) the other two sides of each trapezoid are not necessarily parallel to one another; and (iii) the number of scan line lines between the two parallel sides of each trapezoid is determined directly from processing the coordinate data and is not necessarily an integral multiple of a fixed integer greater than one.
11. An apparatus according to claim 10 wherein the piecewise trapezoidal approximation comprises a plurality of blocks of coded information signals, each such block being representative within a prescribed measure of accuracy of values of the coordinates of the runs related to a corres-ponding graphic element over a preceding number of scan lines.
12. An apparatus according to claim 11 wherein the prescribed measure of accuracy lies between two preselected values.
13. An apparatus according to claim 12 wherein the prescribed measure of accuracy is a variable function, and wherein the processing means further includes means for determining the value of the measure of accuracy, at a given scan line for a given graphic element, as a function of the width of the graphic element.
14. An apparatus for coding data representing sequen-tial scan line signals defining light intensities of a graphic image, such apparatus comprising:
data compression means for extracting coordinates of a run of consecutive like-valued signals along a scan line, such coordinates being indicative of the width and location on such scan line of the run;
memory means for storing at a given time blocks of processed data, each block related to a single vector which has been previously scanned, and such data including for each vector, a coordinate pair descriptive of the width and location of the vector at an initial scan line on which the vector first occurs and a coordinate pair descriptive of the width and location of the vector at the scan line on which data pertaining to the vector was last obtained;
slope calculation means, for calculating, on a recurring basis, the average rate of change per scan line of each of the coordinate pairs over the length of the vector prediction means, using data from the slope calculation means, for predicting the coordinates descriptive of the width and location of a run of like-valued signals applic-able to a vector on a current scan line; and connectivity means for determining whether the coordinates descriptive of a given run are predicted, within a prescribed measure of accuracy, by the prediction means using data from the slope calculation means applicable to a given vector in the memory means, and, in the event of a prediction within the prescribed measure of accuracy, for updating the data for the given vector in the memory means.
15. An apparatus according to claim 14, wherein the connectivity means comprises means, for determining whether the run has one of the following properties:
(i) it is the first scan of a vector after a Y-junction with a previously scanned vector;
(ii) it is the first scan of a vector after a .lambda.-junction with a previously scanned vector;
(iii) it is a simple continuation of a previously scanned vector;
(iv) it is a new vector with no connectivity to a previously scanned vector.
16. An apparatus according to claim 15, wherein the connectivity means includes a plurality of parallel arithmetic logic units configured so that the determinations made by the connectivity means are made substantially simultaneously.
17. An apparatus according to claim 14, including means for causing recurrent calculation by the slope calculation means whenever the number of scan lines, over which coordi-nate data of the given vector have been obtained, is an integral power of 2.
18. An apparatus, operative on input signal samples representative of light intensities of a two-dimensional image along a series of successive scan lines, for encoding graphic information therein by generation of a series of output records representative of significant graphic data, the apparatus comprising:
(a) data compression means for extracting, from such input signal samples, two coordinates descriptive of each run of consecutive like-valued signals in a scan line, such coordinates being indicative of the width and location on such scan line of the run;

(b) graphic element memory means, for storing a plurality or records of graphic elements, each graphic element constituting a run of substantially consecutive like-valued signals in each of a plurality of substantially consecutive scan lines, the run satisfying a continuity requirement that the coordinates indicative of the width and location of substantially each run, occurring in any scan after the first in which one of such runs occurs, be within a prescribed measure of accuracy of the coordinates predicted therefor on the basis of a specified algorithm utilizing coordinate data of a run from at least one preceding scan line;
(c) predicting means, for predicting, in accord-ance with the algorithm, the coordinates indicative of the width and location of a run on each of a number M, M ? 1, of consecutive scan lines attributable to a first graphic element, based on data for such graphic element stored in the graphic element memory means;
(d) processing means, in communication with the predicting means and the data compression means, for deter-mining whether a run has a fact occurred within M scan lines having coordinates that have been predicted by the prediction means within the prescribed measure of accuracy, and for updating the graphic element memory means in the event of a prediction of a run within the prescribed measure of accuracy, and in the event that there has not been a prediction within the prescribed measure of accuracy, for providing an output from the graphic element memory means of data pertaining to the first graphic element.
19. An apparatus according to claim 18, wherein the processing means further includes means for purging from the graphic element memory means data pertaining to the first graphic element in the event that such data has been provided as an output, so as to make available in the graphic element memory means the space formerly occupied by such data.
20. An apparatus according to claim 18, wherein (i) the graphic elements are all vectors, (ii) the predicting means includes means for determining the rate of change per scan line of each of two coordinates of the first graphic element, and (iii) each of the coordinates predicted by the predicting means is a function of the product of the rate of change of per scan line of such coordinate times the number of scan lines over which the prediction is being made.
21. An apparatus according to claim 20, wherein the prescribed measure of accuracy is a variable function having values between two preselected numbers, and wherein the processing means includes means for determining the prescribed measure of accuracy as a function of the width of the first graphic element at a preceding scan line.
22. An apparatus according to claim 18, wherein the prescribed measure of accuracy is a variable function having values between two preselected numbers, and wherein the processing means includes means for determining the prescribed measure of accuracy as a function of the width of the first graphic element at a preceding scan line.
CA000451901A 1984-04-12 1984-04-12 Method and apparatus for vectorizing documents and for symbol recognition Expired CA1256214A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA000451901A CA1256214A (en) 1984-04-12 1984-04-12 Method and apparatus for vectorizing documents and for symbol recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA000451901A CA1256214A (en) 1984-04-12 1984-04-12 Method and apparatus for vectorizing documents and for symbol recognition

Publications (1)

Publication Number Publication Date
CA1256214A true CA1256214A (en) 1989-06-20

Family

ID=4127648

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000451901A Expired CA1256214A (en) 1984-04-12 1984-04-12 Method and apparatus for vectorizing documents and for symbol recognition

Country Status (1)

Country Link
CA (1) CA1256214A (en)

Similar Documents

Publication Publication Date Title
EP0120387B1 (en) Method and apparatus for vectorizing documents and symbol recognition
US4759075A (en) Method and apparatus for vectorizing documents and symbol recognition
EP0138079B1 (en) Character recognition apparatus and method for recognising characters associated with diacritical marks
US4153897A (en) Method and device for detecting the similarity between standard and unknown patterns
AU706080B2 (en) Character recognition system identification of scanned and real time handwritten characters
US4183013A (en) System for extracting shape features from an image
EP0006476B1 (en) Character recognition apparatus using curve vector processing
EP0063454A2 (en) Method for recognizing machine encoded characters
JP3065104B2 (en) Fingerprint matching device
EP0551738A2 (en) Method and apparatus for connected and degraded text preprocessing
EP0325233A2 (en) Character string recognition system
EP0369761A2 (en) Character segmentation method
WO1991018368A1 (en) A polygon-based method for automatic extraction of selected text in a digitized document
US5802204A (en) Handwritten character translator using fuzzy logic
CN110490199A (en) A kind of method, apparatus of text identification, storage medium and electronic equipment
US5889889A (en) Method and apparatus for machine recognition of handwritten symbols from stroke-parameter data
US4975974A (en) Character recognition apparatus
US4799271A (en) Optical character reader apparatus
CA1256214A (en) Method and apparatus for vectorizing documents and for symbol recognition
US4138662A (en) Character reader
EP0042968B1 (en) Multi-registration in optical character recognition
RU2707320C1 (en) Method of recognizing a symbol on a banknote and a coprocessor for a computing system of a banknote processing device
EP0201909A2 (en) Procedure for automatic reading of images and device for carrying out this same procedure
JP3747602B2 (en) Image processing method and image processing apparatus
JP3835097B2 (en) Specific pattern detection apparatus and method

Legal Events

Date Code Title Description
MKEX Expiry