CA2185827C

CA2185827C - Method and apparatus for connected and degraded text recognition

Info

Publication number: CA2185827C
Application number: CA002185827A
Authority: CA
Inventors: Chinmoy Bhusan Bose; Shyh-Shiaw Kuo
Original assignee: American Telephone and Telegraph Co Inc
Current assignee: AT&T Corp
Priority date: 1991-12-23
Filing date: 1992-10-26
Publication date: 1999-05-11
Anticipated expiration: 2012-10-26
Also published as: CA2185827A1

Abstract

The present invention provides a method and apparatus for enhancing and recognizing connected and degraded text. The enhancement process comprises filtering a scanned image to determine whether a binary image value of an image pixel should be complemented, determining whether complementing the value of the pixel reduces the sharpness of wedge-like figures in the image, and complementing the binary value of the pixel when doing so does not reduce sharpness. The recognition process may comprise determining primitive strokes in a scanned image, segmenting the scanned image into sub-character segments based on the primitive strokes, identifying features which characterize the sub-character segments, and comparing identified features to stochastic models of known characters and determining an optimum sequence of known characters based on the comparisons through the use of Viterbi scoring and level building procedures.

Description

Method and Apparatus for Connected and Degraded Text Recognition

This application is a division of Application Serial No. 2,081,406 filed October 26, 1992.

Field of the Invention

This invention relates generally to the field of optical text recognition, and specifically to the recognition of connected and degraded text.

Background of the Invention

In modern business office environments, many devices and systems are used to improve the speed and efficiency associated with the creation, processing, and dissemination of documents. Among these are text processing systems, fax machines, and photocopiers.
From time to time, it may be necessary to convert the text of a printed document to electronic form for text processing or communication purposes. Such a circumstance may arise, e.g., when a document created on one text processing system must be edited on another system with which there exists no electronic communication capability. The conversion process for such text may comprise optical scanning and image analysis processes. The aim of the conversion process is the generation of a computer text file, typically comprising ASCII characters, which reflects the printed text. If a printed document comprises clean, well-formed text, this conversion process may not present much difficulty.
Because of distortion effects associated with repeated photocopying and facsimile transmission, certain documents may include fuzzy, swollen (degraded) and overlapped (connected) characters which make the text conversion process problematic. The greater the degree of degradation and connectivity, the more difficult it is to accurately discern and identify printed text characters. Naturally, computer files which result from the conversion of documents which contain such text frequently include errors in their representation of the document's words and characters.

Summary of the Invention

The present invention provides a method for recognizing characters in a scanned text image, the method comprising the steps of: determining primitive strokes in the scanned text image; segmenting the scanned text image into one or more sub-character segments based on one or more determined primitive strokes; identifying one or more features characterizing a sub-character segment; and recognizing characters based on identified sub-character features.
The invention also provides a method for recognizing characters in a scanned text image, the method comprising the steps of: segmenting the scanned text image into sub-character segments; identifying one or more features characterizing a sub-character segment of a scanned text image; comparing identified sub-character features to stochastic models of known characters and determining a distance score based on each comparison; and determining an optimum sequence of known characters based on determined distance scores.
Also, the invention consists of a text recognition system, the system comprising: means for performing word image enhancement; means, coupled to the means for performing word image enhancement, for performing sub-character segmentation; means, coupled to the means for performing sub-character segmentation, for performing feature extraction based on sub-character segments; means, coupled to the means for performing feature extraction, for performing recognition of text based on a comparison of extracted sub-character features and stochastic models of known characters; and memory means, coupled to the means for performing recognition of text, for storing the results of text recognition.

Brief Description of the Drawings

Figure 1 presents an illustrative text recognition process according to the present invention.
Figure 2 presents line adjacency graph and compressed line adjacency graph representations of the character X.
Figure 3 presents line adjacency graph and compressed line adjacency graph representations of the character e.
Figure 4 presents the illustrative word pre-processing process presented in Figure 1.
Figure 5 presents a 3 x 3 window of pixels used in a nominal filtering process of a modified median filter.
Figure 6 presents an illustrative original noisy image of the character w.
Figure 7 presents a 3 x 7 window of pixels used in the modified median filter to preserve the sharpness of V-like shapes in an image.
Figure 8 presents a 3 x 7 window of pixels used in the modified median filter to preserve the sharpness of inverse V-like shapes in an image.
Figure 9 presents the character w having two noise-like pixels to be removed by a line adjacency graph filter process.
Figure 10 presents the result of word pre-processing on the character image presented in Figure 6.
Figure 11 presents a line adjacency graph for the character e and associated strokes representative thereof.
Figure 12 presents a line adjacency graph for the character x and associated strokes representative thereof.
Figure 13 presents the set of strokes associated with the word hello.
Figure 14 presents two adjacent strokes and the quantities related thereto used to determine whether such strokes should be merged.
Figure 15 presents an illustrative stroke and an illustrative arc.
Figure 16 presents several segments of a line adjacency graph for the characters ky.
Figure 17 presents a set of features extracted from un-preprocessed segments associated with the characters ky.
Figure 18 presents the features extracted from preprocessed segments of the characters ky.
Figure 19 presents a line adjacency graph for the character z and associated strokes representative thereof.
Figure 20 presents a set of primitive features extracted from segments associated with the word hello.
Figure 21 presents the line adjacency graphs, compressed line adjacency graphs, and associated primitive feature strokes for the character i.
Figure 22 presents an illustrative collection of 32 feature centers used by the illustrative embodiment of the present invention.
Figure 23 presents the Hidden Markov Models for the characters ju.
Figure 24 presents a trellis representative of the Viterbi scoring and level building techniques.
Figure 25 presents illustrative overlap and blur parameter data for use in generating a training data set.

Detailed Description

A. Introduction

Figure 1 presents an illustrative text recognition process 10 according to the present invention. Process 10, which receives binary pixel images of individual words to be recognized, comprises a word preprocessing process 100, a sub-character segmentation process 200, a feature extraction process 300, a training process 400, and a recognition process 500. The images received by process 10 may comprise connected (e.g., touching or overlapping) and degraded (i.e., noisy) characters. They are provided by text scanning and page preprocessing systems, 1 and 5, respectively. These systems 1,5 scan text from paper copies of documents, identify columns of printed text from the scanned images, identify lines within a column, and word boundaries within a line. Text scanning and page preprocessing systems known in the art may be employed for these purposes. See, e.g., H.S. Baird, Global-to-local layout analysis, Proc. IAPR Workshop on Syntactic and Structural Pattern Recog., (Sept. 1988); and S.N. Srihari and G.W. Zack, Document Image Analysis, Proc. 8th Int'l Conf. Pattern Recognition, 434-436 (Oct. 1986). In addition to providing images of scanned words, these systems provide estimates of character point size and base-line location.

Word preprocessing 100 performs filtering and other processing based on line adjacency graphs to reduce noise and retain word image sharpness. Sub-character segmentation 200 divides a preprocessed word image into a number of sub-character segments. These segments are defined using line adjacency graphs to identify strokes. Segments are defined based on the identified strokes. What results is a partitioning of the filtered pixel map received from word preprocessing 100 into a plurality of individual segment maps.
Following segmentation, feature extraction 300 is performed. Through feature extraction 300, each identified segment is characterized by one or more features which may be of either the stroke or arc variety. (If a word presented for recognition is constrained not to comprise connected or significantly degraded characters, it is possible to perform recognition based on comparing extracted features to feature models of known letters.)

With segments characterized by their features, process 10 may perform either of two processes: training 400 or recognition 500. By the training process 400, a Hidden Markov Model (HMM) is built for each text character to be recognized. Data associated with one or more trained HMMs (e.g., state transition probabilities) may be stored in a semiconductor memory (not shown), such as a Read Only Memory (ROM). Through the recognition process 500, stochastic distances of the sequences of unknown character segments are obtained based on the HMM state transition and associated bi-gram probabilities stored in memory. These distances are used to determine the most likely sequence of text characters which might have produced the unknown observed image segments. The most likely sequence of text characters may be saved in memory (not shown) for later retrieval.
Embodiments of the present invention may be used to augment the capabilities of conventional optical character recognition systems commercially available. This may be done by providing such systems with software performing the functions described below. Conventional systems would be required to perform the text scanning and page preprocessing tasks described above.
An illustrative set of software programs for an embodiment of the present invention written in the "C" language is provided in an Appendix attached hereto. The Appendix also provides a list of the programs associated with each of the word preprocessing 100, sub-character segmentation 200, feature extraction 300, training 400, and recognition 500 processes. These programs may be executed on computers marketed under the trademark SUN SPARCstation 1.

For clarity of explanation, the illustrative text recognition process 10 of the present invention is presented as comprising individual functional blocks. These functional blocks may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software. Illustrative embodiments may comprise digital signal processing (DSP) hardware, such as the AT&T DSP16 or DSP32C, and software performing the operations discussed below. Very large scale integration (VLSI) hardware embodiments of the present invention, as well as hybrid DSP/VLSI embodiments, may also be provided.
1. Line Adjacency Graphs

Line adjacency graphs (LAGs) are employed by several aspects of the illustrative process 10. As used in process 10, a LAG is stored in memory and represents run-lengths in a scanned and digitized image. Each "run" of consecutive black pixels on a scan line is denoted as a node of a LAG. The degree of a node is expressed as an ordered pair of numbers denoted as (a, b). The number a equals the number of nodes above and connected to a given node, while the number b equals the number of nodes below and connected to the node.
A junction is a node of a LAG having a or b greater than one. A path is a node having a and b less than or equal to one. The left-hand portions of Figures 2 and 3 are LAG representations of characters X and e, respectively. In these figures, paths and junctions are indicated by solid and dotted lines, respectively.
The LAGs presented in Figures 2 and 3 can also be represented in a compressed form referred to as a c-LAG. In a c-LAG, connected paths can be represented in a compressed form referred to as a c-path. The right-hand portions of Figures 2 and 3 present c-LAG representations of characters X and e, respectively. Junctions are represented as circles, while c-paths are represented by shaded circles.
Where one of the degrees of a junction is one, the junction is also included in the corresponding c-path connected to the junction, provided it is not an outlier compared to the nodes in the c-path. A junction may be considered to be an outlier if the width of the junction divided by the average width of the c-path exceeds a threshold, e.g., 1.4. For example, the two junctions of X in Figure 2 are included into the corresponding c-path. However, the junction in the middle part of e in Figure 3 is not included in the c-path connected to it, since it is an outlier.
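By way of illustration only, the construction of LAG nodes and their degrees may be sketched in Python as follows. The sketch assumes that two runs on adjacent scan lines are "connected" when their column spans overlap; the text above does not state the connectivity test, and the function names are hypothetical rather than taken from the Appendix programs.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]  # (row, first_col, last_col), columns inclusive


def find_runs(image: List[List[int]]) -> List[Run]:
    """Return every maximal run of consecutive black (1) pixels, row by row."""
    runs = []
    for r, row in enumerate(image):
        c = 0
        while c < len(row):
            if row[c]:
                start = c
                while c < len(row) and row[c]:
                    c += 1
                runs.append((r, start, c - 1))
            else:
                c += 1
    return runs


def touches(a: Run, b: Run) -> bool:
    """Assumed connectivity: adjacent rows and overlapping column spans."""
    return abs(a[0] - b[0]) == 1 and a[1] <= b[2] and b[1] <= a[2]


def node_degrees(runs: List[Run]) -> List[Tuple[int, int]]:
    """Degree (a, b) of each node: connected nodes above and below."""
    degrees = []
    for n in runs:
        above = sum(1 for m in runs if m[0] == n[0] - 1 and touches(n, m))
        below = sum(1 for m in runs if m[0] == n[0] + 1 and touches(n, m))
        degrees.append((above, below))
    return degrees


def is_junction(degree: Tuple[int, int]) -> bool:
    """A junction has a or b greater than one; any other node is a path."""
    a, b = degree
    return a > 1 or b > 1


# Two runs at the top that merge into a single run at the bottom.
image = [
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 1, 0],
]
runs = find_runs(image)
degrees = node_degrees(runs)
```

In this small example only the bottom run, which has two connected nodes above it, is classified as a junction; the remaining four nodes are paths.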

B. Word Preprocessing

Illustrative word preprocessing 100 is performed on a presented word image in order to reduce spurious noise prior to training or recognition. As shown in Figure 4, word preprocessing 100 comprises a modified median filter 120 and a LAG process 140. The modified median filter 120 reduces noise and preserves aspects of sharpness and connectivity while the LAG process 140 removes noise-like run-lengths in the image.
Modified median filter 120 comprises a nominal filtering process which is modified under certain circumstances. The nominal filtering process employs a 3x3 window of pixels, such as that shown in Figure 5. The nominal process centers the window (window element no. five) over a given pixel in the image and assigns to that pixel the binary value associated with the majority of the pixels in the window (i.e., the binary value held by at least five of the nine pixels).
Two rules modify this nominal filtering process. The first rule concerns a situation when, according to the nominal filtering process, an empty (i.e., white) pixel should be filled (i.e., made black), such as pixel (i, j) in Figure 6 (where i and j denote a specific row and column of the image). A 3x7 window, illustratively presented in Figure 7, is centered over pixel (i, j). If more than 14 of the shaded pixels are filled (i.e., black) and both pixels (i-1, j) and (i-2, j) are empty, then the pixel (i, j) shall not be filled. This first rule preserves the sharpness of wedge-like shapes in the image, which can enhance performance in subsequent processing steps. In this illustrative procedure, the wedge-like shapes preserved are V-like.
A like procedure may be performed using the 3x7 window of Figure 8 to preserve the sharpness of inverse V-like wedge shapes. In this procedure, if more than 14 of the shaded pixels are filled and both pixels (i+1, j) and (i+2, j) are empty, then the pixel (i, j) shall not be filled.
The second rule for modifying the nominal filtering process concerns the situation when, according to the nominal process, a filled pixel should be emptied. If such a pixel is an element of a sequence (or run-length) of at least five consecutive filled pixels, then the pixel is not emptied. This second rule may preserve connectivity of run-lengths which may be broken by the nominal filtering process of median filter 120.
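The nominal filtering process and the second rule may be sketched in Python as below. This is a minimal sketch under stated assumptions: pixels outside the image are treated as empty, a run-length is taken along the scan line (horizontally), and the first rule is omitted because the shaded-pixel layouts of Figures 7 and 8 are not reproduced in the text. The function names are hypothetical.

```python
def majority_3x3(image, i, j):
    """Nominal filter: 1 if at least five of the nine window pixels are filled.
    Pixels outside the image are assumed to be empty."""
    rows, cols = len(image), len(image[0])
    filled = 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols:
                filled += image[r][c]
    return 1 if filled >= 5 else 0


def run_length_at(image, i, j):
    """Length of the horizontal run of filled pixels containing pixel (i, j)."""
    row = image[i]
    left = j
    while left > 0 and row[left - 1]:
        left -= 1
    right = j
    while right < len(row) - 1 and row[right + 1]:
        right += 1
    return right - left + 1


def modified_median(image, min_run=5):
    """Nominal 3x3 majority filter plus the second rule (run preservation)."""
    out = [row[:] for row in image]
    for i in range(len(image)):
        for j in range(len(image[0])):
            new = majority_3x3(image, i, j)
            # Second rule: a filled pixel in a run of at least min_run
            # consecutive filled pixels is not emptied.
            if image[i][j] == 1 and new == 0 and run_length_at(image, i, j) >= min_run:
                new = 1
            out[i][j] = new
    return out


thin_line = [[0] * 8, [0, 1, 1, 1, 1, 1, 1, 0], [0] * 8]
short_line = [[0] * 5, [0, 1, 1, 1, 0], [0] * 5]
filtered_thin = modified_median(thin_line)
filtered_short = modified_median(short_line)
```

The six-pixel line survives because of the second rule, whereas the three-pixel line is removed by the nominal majority vote.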
Word preprocessing 100 further comprises a LAG filter process 140 to remove some noise-like run-lengths by determining and checking the LAGs associated with the image. Every path (i) located at the top or bottom of each blob having degree (0,1) or (1,0), respectively, and (ii) connected to a junction is removed, such as those two paths located at the top left of 'w' in Figure 9. Here, a blob is any set of one or more pixels, wherein each pixel in the set is connected to at least one other pixel in the set in any of the eight ways one pixel may be connected to an adjacent neighbor pixel (vertically: up, down; horizontally: left, right; and diagonally: up-right, up-left, down-right, and down-left). See Appendix, modules prep.c, lag.c, and clag.c.
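The eight-way connectivity that defines a blob can be illustrated with a short flood-fill labeling routine. The following Python sketch is illustrative only; `label_blobs` is a hypothetical name and the routine is not the Appendix module blob_extr.c.

```python
def label_blobs(image):
    """Label 8-connected sets of filled pixels; returns (labels, blob_count)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if image[r0][c0] and not labels[r0][c0]:
                count += 1
                labels[r0][c0] = count
                stack = [(r0, c0)]
                while stack:
                    r, c = stack.pop()
                    # Visit the eight neighbors: up, down, left, right
                    # and the four diagonals.
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            rr, cc = r + dr, c + dc
                            if (0 <= rr < rows and 0 <= cc < cols
                                    and image[rr][cc] and not labels[rr][cc]):
                                labels[rr][cc] = count
                                stack.append((rr, cc))
    return labels, count


image = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
]
labels, count = label_blobs(image)
```

The two diagonally adjacent pixels at the top form one blob, and the isolated pixel at the bottom forms a second.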
Figure 10 presents the character image w from Figure 6 after operation of word preprocessing 100.

C. Sub-Character Segmentation

The sub-character segmentation process 200 divides image information received from the word preprocessing 100 into segments which can, in turn, be used to characterize the image in terms of segment features. Such characterization is useful to both the training and recognition processes, 400 and 500, respectively. Segmentation is carried out by identifying the strokes present in an image.

1. Stroke Identification

In the illustrative process 200, stroke identification is performed by first determining the direction of dominant strokes in the image. Dominant strokes may be identified by scanning the pixel profiles within a preset range of angular directions at small intervals, and choosing the direction of tallest peaks in the profile. If the direction of the dominant strokes is not vertical, the direction may be normalized by rotating the pixels on an imaginary slanted line (based on the slant angle) such that the pixels fall on a vertical line.
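One way to picture the search for the dominant stroke direction is sketched below in Python. The sketch makes several assumptions not stated in the text: pixels are given as (row, column) pairs, the angle is measured from the vertical and is positive when columns increase with row number, the angular range and 5-degree step are arbitrary, and normalization is done by a horizontal shear rather than a true rotation. All names are hypothetical.

```python
import math
from collections import Counter


def profile_peak(pixels, angle_deg):
    """Tallest bin of the pixel profile taken along lines slanted
    angle_deg from the vertical."""
    t = math.tan(math.radians(angle_deg))
    profile = Counter(round(c - r * t) for r, c in pixels)
    return max(profile.values())


def dominant_direction(pixels, max_angle=45, step=5):
    """Scan a preset range of angles and keep the one with the tallest peak."""
    angles = range(-max_angle, max_angle + 1, step)
    return max(angles, key=lambda a: profile_peak(pixels, a))


def deslant(pixels, angle_deg):
    """Shear the pixels so that lines at angle_deg become vertical."""
    t = math.tan(math.radians(angle_deg))
    return [(r, round(c - r * t)) for r, c in pixels]


diagonal = [(r, r) for r in range(10)]   # a stroke slanted 45 degrees
vertical = [(r, 3) for r in range(10)]   # an upright stroke
slant = dominant_direction(diagonal)
upright = deslant(diagonal, slant)
```

After the shear, every pixel of the slanted stroke falls in the same column, which is the normalized (vertical) orientation described above.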
Primitive strokes are identified to provide useful structural information for segmentation. The primitive strokes in a word can be identified by generating and analyzing a c-LAG of the image to be recognized. See Appendix, modules lag.c and clag.c. A primitive stroke is identified by its endpoints in the two-dimensional plane in which it lies, $(x_1, y_1)$ and $(x_2, y_2)$, where values for x and y are related to the top left corner pixel of a rectangle surrounding a scanned blob. The rectangle has dimensions equal to the height and width of the blob. See Appendix, module blob_extr.c.

Each c-path of the c-LAG is analyzed according to its own characteristics and its neighborhood information. The first step of analyzing a c-path is to divide it into one or more groups of nodes which have similar width and collinear centers.
Consecutive nodes (indicated by $i$ and $i+1$) will be considered to have dissimilar widths, $w(i)$ and $w(i+1)$, if all the following tests are satisfied:

(i) $|w(i) - w(i+1)| > \alpha$;

(ii) an "either ... or" test comparing the ratio of $w(i)$ and $w(i+1)$ with $\beta$ [exact inequalities illegible]; and

(iii) a test involving the widths $w(i-1)$, $w(i)$, $w(i+1)$, and $w(i+2)$ and a third threshold [expression illegible],

wherein, e.g., $\alpha = 2.0$, $\beta = 0.7$, and the third threshold is 0.15. For a group which comprises nodes of similar widths, collinearity of node centers is determined by defining a line through the centers of the first and last nodes in the group and determining the maximum distance of any node center in the group from this line. If this maximum distance is, e.g., less than 2.6 pixel units, the nodes of the group are said to have collinear centers.
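The collinearity test for node centers can be sketched in Python as follows. The 2.6-pixel tolerance is the illustrative value given in the text; the function names and the representation of centers as (x, y) pairs are assumptions made for this sketch.

```python
import math


def max_deviation(centers):
    """Largest perpendicular distance of any node center from the line
    through the first and last centers of the group."""
    (x1, y1), (x2, y2) = centers[0], centers[-1]
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        # Degenerate group: fall back to distance from the single point.
        return max(math.hypot(x - x1, y - y1) for x, y in centers)
    return max(
        abs((x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)) / length
        for x, y in centers
    )


def has_collinear_centers(centers, tolerance=2.6):
    """True when the maximum deviation is below the tolerance (in pixels)."""
    return max_deviation(centers) < tolerance


straight = [(5, 0), (6, 1), (5, 2), (5, 3)]   # small wiggle of one pixel
bent = [(5, 0), (9, 1), (5, 2)]               # center four pixels off the line
```

For `straight` the largest deviation is one pixel, so the centers count as collinear; for `bent` it is four pixels, so they do not.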
Strokes corresponding to each group with nodes of similar widths and collinear centers are identified (or returned) according to the following rules:
i. When the ratio of group height over average width of the group, denoted as $R_{h/w}$, is larger than a threshold (e.g., 1.50), a vertical stroke is returned which is a line fitting centers of the nodes.
ii. When the ratio $R_{h/w}$ is smaller than a threshold (e.g., 0.65), a horizontal stroke is returned which lies in the middle of the group.
iii. When a group is adjacent to a much wider junction or path of another group either at top or bottom, a vertical stroke is returned (see Figure 11). A group is said to be much wider than another if its width at the point of adjacency divided by the average width of the other group is greater than a threshold, e.g., 1.7.

iv. If a c-path contains only one group and that group connects to two c-paths at both top and bottom, two crossed strokes are returned (see Figure 12).
Each vertical stroke is characterized by a width which is used in segmentation. The width of a vertical stroke is defined as the average width of nodes in the group from which the stroke is returned. Horizontal strokes, on the other hand, are not characterized by widths since such width information is not used by the segmentation process 200.
In order to avoid ambiguity in segmentation process 200, strokes are not returned from ambiguous c-paths, i.e., those c-paths which do not satisfy any of the rules (i-iv) for returning strokes. For example, in Figure 13, no stroke is returned for the part of the image where "lo" touches and the upper right part of "o."
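Rules i to iii for returning strokes reduce to a few threshold comparisons, sketched here in Python. The order in which the rules are tried is an assumption, rule iv (crossed strokes) is left out because it depends on c-path topology, and the function and parameter names are hypothetical.

```python
def classify_group(height, avg_width, adjacent_width=None,
                   vertical_min=1.50, horizontal_max=0.65, wider_min=1.7):
    """Return 'vertical', 'horizontal', or None for an ambiguous group.

    height / avg_width is the ratio of group height over average width.
    adjacent_width is the width of a neighboring junction or path at the
    point of adjacency, if any.
    """
    ratio = height / avg_width
    if ratio > vertical_min:            # rule i
        return "vertical"
    if ratio < horizontal_max:          # rule ii
        return "horizontal"
    if adjacent_width is not None and adjacent_width / avg_width > wider_min:
        return "vertical"               # rule iii
    return None                         # ambiguous: no stroke is returned


tall = classify_group(height=12, avg_width=3)
flat = classify_group(height=2, avg_width=8)
squat = classify_group(height=4, avg_width=4)
squat_under_bar = classify_group(height=4, avg_width=4, adjacent_width=9)
```

A group with a ratio between the two thresholds returns no stroke unless it sits next to a much wider neighbor.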
The final step for stroke identification in the illustrative segmentation process 200 is the merging of certain adjacent strokes. Strokes may be merged if deviation to be incurred as a result of the merge is within a predetermined tolerance. Consider the example in Figure 14, where $E_1$, $E_2$, $E_3$ and $E_4$ are the endpoints of the two adjacent strokes. A new merged stroke is formed by connecting the starting point of the first stroke, $E_1$, to the end point of the second stroke, $E_4$. Then, five quantities are checked: three distances, $E_2P_2$, $E_3P_3$ and $E_2E_3$, and two ratios taken over the merged stroke length $E_1E_4$ [numerators illegible]. If all the distances and ratios are smaller than predetermined thresholds (e.g., 2.2, 2.2, 5.1, respectively, for the distances, and 1/7.4 for both ratios), the deviation is deemed acceptable and the two original strokes may be replaced by a new merged stroke. The threshold values are functions of scanning resolution (pixels per unit length - assumed known), font type and size (assumed available from the page layout preprocessor), and may be set with a look-up table.
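A possible reading of the merge test is sketched below in Python. It rests on two assumptions that the text does not confirm, since the points are defined only in Figure 14: that P2 and P3 are the feet of the perpendiculars from E2 and E3 onto the merged line E1E4, and that the two ratios are E2P2/E1E4 and E3P3/E1E4. The function names are hypothetical.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / length


def can_merge(e1, e2, e3, e4,
              dist_limits=(2.2, 2.2, 5.1), ratio_limit=1 / 7.4):
    """Check three distances and two ratios against the example thresholds."""
    merged_len = math.hypot(e4[0] - e1[0], e4[1] - e1[1])
    if merged_len == 0:
        return False
    d2 = point_line_distance(e2, e1, e4)   # assumed meaning of E2P2
    d3 = point_line_distance(e3, e1, e4)   # assumed meaning of E3P3
    gap = math.hypot(e3[0] - e2[0], e3[1] - e2[1])   # E2E3
    distances_ok = (d2 < dist_limits[0] and d3 < dist_limits[1]
                    and gap < dist_limits[2])
    ratios_ok = (d2 / merged_len < ratio_limit
                 and d3 / merged_len < ratio_limit)   # assumed ratios
    return distances_ok and ratios_ok


nearly_straight = can_merge((0, 0), (1, 10), (1, 12), (0, 22))
too_bent = can_merge((0, 0), (2, 5), (2, 6), (0, 10))
too_far_apart = can_merge((0, 0), (0, 2), (0, 20), (0, 22))
```

In the second case the distances pass but the ratio 2/10 exceeds 1/7.4, which shows why the ratios are checked in addition to the absolute distances.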

2. Segmentation Rules

Sub-character segmentation is achieved by applying a set of rules based on the returned strokes. The segment boundaries obtained by applying these rules partition the original pixel image into individual image segments. The segmentation rules are as follows:
i. A non-horizontal stroke without any vertical overlap with any other stroke identifies a non-horizontal segment, where vertical overlap refers to one stroke partially or wholly lying above or below another stroke as viewed from a vertical direction. The width of the segment is obtained from the width of its strokes.
ii. The space between two non-horizontal segments identifies a horizontal segment.
iii. The vertical overlap of two vertical (or near-vertical) strokes or two inclined strokes identifies a non-horizontal segment with a width determined by the overlapped width of the individual strokes. Specifically, non-horizontal segment width refers to the lateral distance traversed by one or more vertically overlapping non-horizontal strokes plus an additional distance added to each stroke end. This additional distance is a fraction of the average path width of the paths forming the strokes. This fraction depends on the angle which the stroke makes with the vertical. See Appendix, module blob2feat.c. Because there may be more than one stroke in a segment, different non-horizontal strokes may define the left and right edges of a segment. Consequently, the average path width added to define each edge may not be the same.
iv. The vertical overlap of a vertical stroke with any other non-vertical stroke provides the segment boundaries dictated by the vertical stroke.
v. The vertical overlap of an inclined stroke with a horizontal stroke provides the segment boundaries dictated by the inclined stroke.
vi. Two intersecting inclined strokes with slopes of opposite sign (e.g., strokes forming an 'x' pattern) provide a segment boundary at the point of intersection.
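Two of the building blocks used by these rules, the vertical-overlap test and the derivation of horizontal segments from the gaps between non-horizontal segments (rule ii), may be sketched in Python as follows. Strokes are assumed to be pairs of (x, y) endpoints and segments are assumed to be (left, right) column intervals; touching extents are treated as overlapping. These representations and names are illustrative assumptions.

```python
def x_extent(stroke):
    """Horizontal extent (min x, max x) of a stroke given by two endpoints."""
    (x1, _), (x2, _) = stroke
    return min(x1, x2), max(x1, x2)


def vertically_overlap(s, t):
    """True when one stroke lies partially or wholly above or below the
    other, i.e. when their horizontal extents intersect."""
    s_lo, s_hi = x_extent(s)
    t_lo, t_hi = x_extent(t)
    return s_lo <= t_hi and t_lo <= s_hi


def horizontal_segments(non_horizontal):
    """Rule ii: the space between consecutive non-horizontal segments."""
    ordered = sorted(non_horizontal)
    gaps = []
    for (_, right), (left, _) in zip(ordered, ordered[1:]):
        if left > right:
            gaps.append((right, left))
    return gaps


stem = ((2, 0), (2, 10))        # vertical stroke at x = 2
bar_below = ((0, 12), (5, 12))  # horizontal stroke spanning x = 0..5
bar_aside = ((6, 12), (9, 12))  # horizontal stroke spanning x = 6..9
gaps = horizontal_segments([(10, 14), (0, 4), (20, 23)])
```

Here the stem overlaps vertically with the bar beneath it but not with the bar to its right, and the three non-horizontal segments leave two horizontal segments between them.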

D. Feature Extraction

Once a pixel image is segmented by the segmentation process 200, the individual segments may be characterized by identifying in such segments one or more features. Such identified features may be used both in training 400 and character recognition 500.
In the illustrative feature extraction process 300, two types of features are identified within segments: strokes and arcs. Figure 15 presents an illustrative stroke and an illustrative arc. A stroke, which is a line-segment, is uniquely identified by its centroid, length, and slope, and may be represented by a 5-tuple $(x, y, r\sin 2\theta, r\cos 2\theta, d)$, where $(x, y)$ is its centroid measured with respect to base-line information provided by systems 1,5, $r$ is its length and $\theta$ is the slope angle. (Twice the slope angle is used for the purpose of maintaining continuity in the parametric representation, as the slope angle varies between -90 and +90 degrees.) The value $d$ is always 0 for a stroke. An arc may also be represented as the 5-tuple $(x, y, r\sin 2\theta, r\cos 2\theta, d)$, where the first four parameters represent a chord of the arc (in the same fashion as a stroke), and $d$ is the maximum perpendicular distance of the arc from the chord.
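The 5-tuple representation, and the reason for doubling the slope angle, can be seen in a short Python sketch. It assumes strokes are given by two (x, y) endpoints and that only the y coordinate of the centroid is offset by the base-line; the text does not spell out the reference for x. The function names are hypothetical.

```python
import math


def stroke_feature(p, q, baseline_y=0.0):
    """5-tuple (x, y, r sin 2θ, r cos 2θ, d) of a stroke with endpoints p, q."""
    (x1, y1), (x2, y2) = p, q
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2 - baseline_y        # centroid relative to base-line
    r = math.hypot(x2 - x1, y2 - y1)       # stroke length
    theta = math.atan2(y2 - y1, x2 - x1)   # slope angle
    # Doubling the angle makes the result independent of endpoint order
    # and continuous across the -90/+90 degree boundary.
    return (cx, cy, r * math.sin(2 * theta), r * math.cos(2 * theta), 0.0)


def arc_feature(p, q, d, baseline_y=0.0):
    """Same as a stroke on the chord p-q, with d the maximum perpendicular
    distance of the arc from the chord."""
    return stroke_feature(p, q, baseline_y)[:4] + (float(d),)


up = stroke_feature((0, 0), (0, 10))
down = stroke_feature((0, 10), (0, 0))
flat = stroke_feature((0, 0), (4, 0))
bowl = arc_feature((0, 0), (4, 0), 1.5)
```

A vertical stroke described from bottom to top (+90 degrees) and from top to bottom (-90 degrees) yields the same vector, which is the continuity property noted in the text.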

1. Segment Preprocessing

Prior to feature identification, the illustrative feature extraction process 300 preprocesses individual segments to remove certain noise-like pixels. For example, Figure 16 presents several segments of the character string ky. If the illustrative feature identification technique is applied directly to these segments, some undesired features will be extracted due to groups of noise-like pixels as indicated in Figure 17 by labels a-f.
Consider the group of noise-like pixels identified by the label e. Since this group is actually a small portion of a c-path of the letter y, it can be excluded from the second segment (which concerns a portion of the letter k). Exclusion of noise-like pixels is done by eliminating all pixel groups which are a portion of either (i) a path or (ii) a c-path in a neighboring segment. Figure 18 presents the features which are extracted from preprocessed segments of string ky. Strokes associated with noise-like pixel groups a-f are no longer present.

2. Feature Identification

A structural analysis similar to that described above for sub-character segmentation is employed for identifying (or extracting) segment features. The first step in this analysis involves representing each image segment by a c-LAG. Each c-path of a c-LAG is then analyzed to identify its features. If a segment is identified as horizontal, horizontal strokes are returned from each c-path of the segment. See Appendix, path_s.c. For non-horizontal segments, each c-path thereof is checked and subdivided into groups of nodes, if applicable.
The process of subdividing c-paths for feature identification is different from that performed for sub-character segmentation (where a c-path is subdivided based upon either a large width change between two adjacent nodes or non-collinear node centers). Here, groups are formed by checking for width change only as described above.

Two adjacent groups in a segment will be merged into a single group if the following two conditions are satisfied:

i. $|w_1 - w_2| < \alpha$; and

ii. a two-sided bound involving $w_1$ and $\beta$ [expression illegible],

where $w_1$ and $w_2$ denote the average widths of the two adjacent potential groups, and $\alpha$ and $\beta$ are predetermined constants (e.g., 3.0 and 0.6, respectively).
The purpose of conditional merging of groups is to preserve arc-features within segments. As shown in Figure 18, there are two potential groups which might be identified in the first segment of the character y due to significant node width changes (at the bottom of the character). However, because the change of widths between the two potential adjacent groups is not large enough, as determined by the above conditions i and ii, the c-path contains only one group. Thus an arc is able to be extracted from that segment according to the criterion discussed below. In contrast, Figure 19 shows a LAG for the letter z wherein the c-path beginning at the top of the letter is subdivided into two groups which cannot be merged under conditions i and ii. Therefore, separate corresponding strokes may be identified according to the rule discussed below.
Arc and stroke features are identified sequentially in each group of nodes within a segment. Arcs may be identified by constructing a line connecting the centers of the first and last nodes in a group. The center of a node, within the group, located at the greatest distance from the line is then determined. If the ratio of this largest distance over the length of the line is larger than a threshold (e.g., 0.1), an arc is identified and returned from the group. For example, in Figure 20, an arc - indicated by a triangle - is returned from the 4th and 10th segments. The three vertices are the centers of the first and last nodes, and the node center located at the maximum distance from the line.
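The arc test just described may be sketched in Python as follows. The default threshold of 0.1 is an assumed reading of the example value, node centers are assumed to be (x, y) pairs, and `find_arc` is a hypothetical name.

```python
import math


def find_arc(centers, threshold=0.1):
    """Return the three vertices (first center, farthest center, last center)
    if the group qualifies as an arc, otherwise None."""
    first, last = centers[0], centers[-1]
    chord = math.hypot(last[0] - first[0], last[1] - first[1])
    if chord == 0:
        return None

    def distance_from_chord(p):
        return abs((last[0] - first[0]) * (p[1] - first[1])
                   - (last[1] - first[1]) * (p[0] - first[0])) / chord

    apex = max(centers, key=distance_from_chord)
    # Ratio of the largest distance over the length of the line.
    if distance_from_chord(apex) / chord > threshold:
        return first, apex, last
    return None


curved = [(0, 0), (2, 3), (3, 5), (2, 7), (0, 10)]
nearly_straight = [(0, 0), (0.5, 5), (0, 10)]
arc = find_arc(curved)
```

For `curved` the farthest center lies 3 units from a chord of length 10 (ratio 0.3), so the triangle of vertices is returned; for `nearly_straight` the ratio is 0.05 and no arc is returned.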
Generally, the same rules discussed above used in defining strokes for purposes of sub-character segmentation may be used here. As such, a stroke may be identified based on the ratio of height over average width of the group ($R_{h/w}$). Unlike stroke definition for segmentation, however, looser thresholds may be used in order to return strokes from most of the groups (e.g., 1.2 and 0.85 are used instead of 1.5 and 0.65, respectively).

Special rules may be used for those unclear groups, such as:
i. For an isolated single c-path, such as the top part of the character 'i' in Figure 21, return a vertical stroke if $R_{h/w}$ is larger than 0.9. Otherwise, return a horizontal stroke.
ii. If any vertical stroke identified during segmentation is contained in a segment, each ambiguous group within this segment returns a vertical stroke as a feature.
iii. If a junction is the first or last node in a c-LAG, a horizontal stroke is returned. See, e.g., the horizontal stroke at the bottom of character 'z' in Figure 19.
Each identified feature is represented as a 5-tuple in a continuous vector space. In both the recognition and training modes, these vectors are mapped to a discrete space defined by a clustering algorithm (see the section on Training).
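Mapping a continuous feature vector to a discrete symbol amounts to choosing the nearest cluster center. The Python sketch below assumes Euclidean distance and uses three made-up centers for brevity; the distance measure and the center values are illustrative assumptions, not taken from the text or from Figure 22.

```python
import math


def quantize(feature, centers):
    """Index of the cluster center nearest to the 5-tuple feature vector."""
    return min(range(len(centers)),
               key=lambda k: math.dist(feature, centers[k]))


# Hypothetical centers: a horizontal stroke, a vertical stroke, and an arc.
centers = [
    (0.0, 0.0, 0.0, 10.0, 0.0),
    (0.0, 0.0, 0.0, -10.0, 0.0),
    (0.0, 0.0, 0.0, 0.0, 5.0),
]
symbol = quantize((1.0, 0.0, 0.5, -9.0, 0.0), centers)
```

The observed vector is closest to the second center, so it is mapped to discrete symbol 1.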

E. Training

1. Introduction

Identifying unknown connected and degraded character images is accomplished by illustrative process 10 by relating observed features, extracted as described above, to known features of known characters. The "closer" observed features are to a stochastic model of a given known character, the more confidently the image which produced the features can be identified as the known character.
Depending on the appearance of characters in an image, features extracted from a given segment of a given character image may not always be the same (for any two samples of the character to be identified). Characters may appear differently, for example, due to varying connectivity with neighboring characters and varying character degradation (or blur). Furthermore, the starting and ending points of the individual characters become obscured.
Observing features of connected and degraded character images and determining the characters to which such features correspond depends upon a doubly embedded stochastic process. That is, one which has an underlying observable stochastic process concerning which features might be extracted from image segments, and another stochastic process, not directly observable, concerning which stochastic features might be associated with extracted features.

Illustrative process 10 represents the doubly embedded stochastic processes associated with connected and degraded text recognition through the use of Hidden Markov Models (HMM). Unlike the discrete observable Markov Model, wherein each model state corresponds to an observable event, the states of a HMM are not directly observable. Rather, observations are probabilistic functions of the state to be determined. A HMM is provided for each character to be recognized. See Appendix, modules recinit.c and ndorec.c. Each state of a HMM represents one segment of a character. Thus, the number of states in a model depends upon the number of segments needed to represent a character.
Each HMM of illustrative process 10, $\lambda$, may be described generally as follows:
i. Each model comprises a set of states: $\Omega = \{\omega_j : 1 \le j \le J\}$, where $J$ is the number of states in the model. Each state is a stochastic representation of a segment of a character to be recognized.
ii. Each model has associated with it a matrix of state transition probabilities: $A = \{a_{jm} : 1 \le j, m \le J\}$, where $a_{jm} = P(\omega_m \text{ at } i+1 \mid \omega_j \text{ at } i)$. These probabilities represent the likelihood that, for a given model, one state (or segment), $\omega_m$, will follow a given state (or segment), $\omega_j$, in time.
iii. For each state of a model, a vector of observation probabilities for observation $X_i$: $B = \{b_j(X_i)\}$, where $b_j(X_i) = P(X_i \mid \omega_j \text{ at } i)$. These probabilities represent the likelihood that a given observed segment vector, $X_i$, is associated with a given state, $\omega_j$ (see section 4, below).
iv. Associated with each state of a model is an initial state probability: $\Pi = \{\pi_j\}$, where $\pi_j = P(\omega_j \text{ at } i = 1)$. These probabilities represent the likelihood that a given model state will be the initial state from which the first state transition will be made.
v. As part of the recognition process 500 discussed below, each state of each HMM is compared against each segment vector in an observation vector sequence: $X = \{X_i : 1 \le i \le I\}$, where $I$ is the number of observations. This vector sequence represents the series of binary segment vectors representing the features extracted sequentially from the image of a character string.

In addition to the use of probabilities associated with each model of a character (i.e., state transition probabilities, $a_{jm}$; observation probabilities, $b_j(X_i)$; and initial state probabilities, $\pi_j$), illustrative process 10 employs measures of likelihood associated with the succession of characters in an image. Process 10 utilizes bi-gram probabilities to reflect the likelihood that one character will follow another in a word presented for recognition. Bi-gram probabilities provide contextual information to aid in the process of character and word recognition.
The training process 400 supplies the HMMs of the illustrative process 10 with information which may be used to analyze observations, $X_i$, to determine a maximum likelihood solution to the problem of identifying connected and degraded characters. That is, training 400 provides not only the state transition probabilities, $a_{jm}$, observation probabilities, $b_j(X_i)$, and initial state probabilities, $\pi_j$, but also the bi-gram probabilities for contextual analysis. Given an observation sequence $X$ and the model parameters determined through training 400, a recognition process 500 may be employed to determine the optimal state sequence associated with the observations, $\{\omega_{ji} : 1 \le j \le J,\ 1 \le i \le I\}$. In other words, recognition 500 may determine the most likely sequence of characters which may be postulated given the set of observations.

2. Training Data Set
In order to have an appropriate training data set for deriving the HMM parameters, it is preferred that a character data set be generally representative of the expected characters in the words presented for recognition. A pseudo-random character generator of the type described by H. S. Baird, Document image defect models, Proc. IAPR Workshop on Syntactic and Structural Pattern Recog., (June 1990), may be used to obtain a set of each of the characters for training. For example, the character generator may provide a training set comprising the lower case Roman alphabet (a - z), printed in Times Roman font (point size 10) and scanned (simulated) at 300 pixels per inch. The character generator should provide the two major sources of noise in printed text -- overlap and blur. Figure 25 presents illustrative overlap and blur parameter data for use with the character generator. In the figure, points indicated by an "S" indicate separate characters within a word while those indicated by a "T" indicate that characters are touching lightly. Overlap and blur are not orthogonal parameters; that is, a certain amount of blur may produce overlap. Nonetheless, it may be preferable to perform training with a certain amount of overlap not accomplished by blur parameters. A set of approximately 550 non-overlapping training characters at several blur levels may be generated by the character generator, keeping other parameters of the above-referenced character generator constant.

3. Clustering of Features
The training set of characters should be segmented and their features extracted, as described above, to produce, for example, a set of 1400 segments consisting of a total of approximately 2000 features. These features may be clustered using a k-means algorithm, such as that described by A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Chapter 3 (1988), and J. A. Hartigan, Clustering Algorithms, Chapter 4 (1975). The algorithm may be started with a set of 15 visibly distinct cluster centers chosen from one of the (a - z) training sets. A "compactness of cluster" index may be defined as:

$$C_c = \frac{\text{Mean weighted distance to other cluster centers}}{\text{Standard deviation of the cluster members}}$$

At the end of iterations of the clustering algorithm for the current number of centers, the mean of $C_c$ over all clusters should be determined (the distances in the numerator are weighted by the number of members). The number of clusters should be allowed to increase if the mean "compactness" increases, unless a predetermined number of clusters is reached. A new cluster center may be chosen for a new iteration as the most distant member of the worst (in the sense of the "compactness" criterion) cluster. For example, the algorithm may return a set of 32 cluster (or feature) centers from the set of approximately 2000 features. Figure 22 presents an illustrative collection of 32 feature centers (the arcs are represented by triangles with the apex indicating the maximum deviation of the arc from the chord). Each dot in the field in which each feature center lies represents a corner of a pixel. The "+" represents the intersection of a vertical center-line and the character base-line. The feature centers are provided given 10-point scanned characters. They may be advantageously scaled based on input character point size information from systems 1,5. See Appendix, modules cluster.c and quant.c.

4. Vector Representation of Segments and Observation Probability
The clustering of features provides a way of partitioning the continuous feature space to a discrete feature space. A training character segment may be represented by a 32-bit binary segment vector, where each bit which is set identifies a given feature center of the discrete feature space which is closest to an identified feature in the segment.
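The mapping from continuous features to a binary segment vector can be sketched as follows. This is a minimal illustration, not the appendix code (quant.c is not reproduced here): the function names, the flat array layout, and the use of squared Euclidean distance as the "closest" criterion are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define FEAT_DIM 5      /* each feature is a 5-tuple */
#define NUM_CENTERS 32  /* one bit per feature center */

/* Index of the center nearest to feature f. Squared Euclidean distance
 * is an assumed metric. centers is a flat ncenters * FEAT_DIM array. */
static int nearest_center(const double *f, const double *centers, int ncenters)
{
    int best = 0;
    double best_d = -1.0;
    for (int c = 0; c < ncenters; c++) {
        double d = 0.0;
        for (int k = 0; k < FEAT_DIM; k++) {
            double diff = f[k] - centers[c * FEAT_DIM + k];
            d += diff * diff;
        }
        if (best_d < 0.0 || d < best_d) {
            best_d = d;
            best = c;
        }
    }
    return best;
}

/* Bit n of the result is set when center n is the nearest center of at
 * least one feature in the segment. feats is a flat nfeats * FEAT_DIM array. */
uint32_t segment_vector(const double *feats, int nfeats,
                        const double *centers, int ncenters)
{
    assert(ncenters > 0 && ncenters <= NUM_CENTERS);
    uint32_t v = 0;
    for (int i = 0; i < nfeats; i++) {
        int c = nearest_center(feats + i * FEAT_DIM, centers, ncenters);
        v |= (uint32_t)1 << c;
    }
    return v;
}
```

With three centers and two features that fall nearest to centers 0 and 2, the function returns a vector with bits 0 and 2 set.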
Through the training process 400, observation probabilities are estimated with use of a Bayesian distortion measure using binary features, such as that described by R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Sec. 2 (1973), under the assumption of class-conditional statistical independence among the features. Training for each of the character models is performed by segmenting samples of each character, mapping features extracted from each segment to a binary segment vector, and associating with each feature extracted -- each bit of the vector -- a probability of occurrence (i.e., a probability that the feature associated with the bit position will be observed).
Each segment of a HMM may be labeled. For instance, the segments for the character 'u' may be labeled as 'u0', 'u1' and 'u2'. These labels may also be used to represent the corresponding states of the HMM for each character.
Each state of a character model is characterized by the binary probability distribution associated with each bit location. If $p_n$ is an estimate of the probability density function at bit location $n$, where $1 \le n \le N$ (e.g., $N = 32$), then

$$p_n = p(x_n = 1 \mid \omega_{jk}) = \frac{y_{jk}(n)}{Y_{jk}}$$

where $x_n$ is the binary value of bit $n$ in vector $X = \{x_n : 1 \le n \le N\}$, $\omega_{jk}$ is the event of the state being $j$ in model $k$, $y_{jk}(n)$ is the total number of times bit $n$ of the segment vector for state $j$, model $k$ was set during training, and $Y_{jk}$ is the total number of times state $j$ of model $k$ appeared during training.
Naturally, density function $p_n$ is but an estimate of the density, and would approach the real density when the sample size is large. Note that many of the bit probabilities in a character model may be zero after a training run, due to the difference in features corresponding to different states and different models. In order to resolve any computational problems which might result from this situation, a small probability may be assigned in place of all zero probabilities.
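As a rough sketch of this estimate, the per-bit probabilities for one state can be computed by counting how often each bit is set across that state's training vectors. The function name and the `floor_p` parameter (the "small probability" substituted for zeros) are hypothetical; the text does not give a value for it.

```c
#include <assert.h>
#include <stdint.h>

#define NBITS 32

/* vectors: the 'count' binary segment vectors observed for one state of
 * one model during training (count plays the role of Y_jk).
 * p[n] receives y_jk(n) / Y_jk, with zeros replaced by floor_p. */
void estimate_bit_probs(const uint32_t *vectors, int count,
                        double p[NBITS], double floor_p)
{
    assert(count > 0);
    for (int n = 0; n < NBITS; n++) {
        int y = 0;                          /* y_jk(n): times bit n was set */
        for (int s = 0; s < count; s++)
            if ((vectors[s] >> n) & 1u)
                y++;
        p[n] = (double)y / (double)count;
        if (p[n] == 0.0)
            p[n] = floor_p;                 /* avoid zero probabilities */
    }
}
```

A practical implementation would probably also keep $p_n$ strictly below 1, since the log form of the observation probability uses $\log(1 - p_n)$; that clamp is not stated in the text and is left out here.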
The observation probability for an observation $X$ is:

$$b_j(X) = P(X \mid \omega_{jk}) = \prod_{n=1}^{N} p_n^{x_n} (1 - p_n)^{1 - x_n}$$

assuming class-conditional statistical independence between features. Taking the logarithm of the above expression (which simplifies the product to a sum, but retains the relative distance relationships), and redefining $b_j(X)$:

$$b_j(X) = \log \prod_{n=1}^{N} p_n^{x_n} (1 - p_n)^{1 - x_n} = \sum_{n=1}^{N} x_n \log \frac{p_n}{1 - p_n} + \sum_{n=1}^{N} \log (1 - p_n)$$

This observation (log) probability serves as a Bayesian measure of distortion or distance of the observation vector $X$ with respect to a model state.
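The log form is a weighted sum over the set bits plus a constant, so it is cheap to evaluate once the per-state terms are tabulated. The sketch below assumes (this is not stated in the text) that the weights $w_n = \log(p_n / (1 - p_n))$ and the constant $c = \sum_n \log(1 - p_n)$ have been precomputed for the state after training; the function and parameter names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define NBITS 32

/* Log observation probability of binary segment vector x for one state.
 * w[n] = log(p_n / (1 - p_n)) and c = sum over n of log(1 - p_n) are
 * assumed to be precomputed from the trained bit probabilities. */
double log_obs_prob(uint32_t x, const double w[NBITS], double c)
{
    assert(w != 0);
    double s = c;
    for (int n = 0; n < NBITS; n++)
        if ((x >> n) & 1u)      /* x_n = 1 contributes its weight */
            s += w[n];
    return s;
}
```

Negating the result gives the non-negative "distance" used later in recognition.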

5. State Transition Probability
Within a specific HMM for a character, the state transition probability is defined as:

$$a_{jm} = P(\omega_m \text{ at } i+1 \mid \omega_j \text{ at } i),$$

where $i$ indexes the observation sequence, $1 \le j, m \le J$ and $m \ge j$. Given the physical ordering of the states within a model, a left-right sequence of HMM states is preferred. As such, the HMM for the character 'u' where $J = 3$ is: $\omega_1$ = u0, $\omega_2$ = u1, and $\omega_3$ = u2. The state transition probabilities within a character may be estimated as:

$$a_{jm} = \frac{z_j(m)}{z_j},$$

where $z_j(m)$ is the total number of transitions from state $\omega_j$ to $\omega_m$, and $z_j$ is the total number of transitions from state $\omega_j$. Based on observations of connected characters, it is preferred to skip at most one state during the state transitions, i.e., $m - j \le 2$.
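The counting estimate for the transition probabilities can be illustrated with a short sketch. It assumes the training data is available as labeled state sequences (one per training character sample), stored back to back in a flat array; that layout and the names below are assumptions for the example only.

```c
#include <assert.h>

#define MAX_STATES 8

/* states: state indices (0-based) of all training sequences, concatenated.
 * seq_len[s]: length of sequence s. a[j][m] receives z_j(m) / z_j. */
void estimate_transitions(const int *states, const int *seq_len, int nseq,
                          int J, double a[MAX_STATES][MAX_STATES])
{
    assert(J > 0 && J <= MAX_STATES);
    int z[MAX_STATES][MAX_STATES] = {{0}};  /* z_j(m) */
    int ztot[MAX_STATES] = {0};             /* z_j    */
    int pos = 0;
    for (int s = 0; s < nseq; s++) {
        for (int i = 0; i + 1 < seq_len[s]; i++) {
            int j = states[pos + i];
            int m = states[pos + i + 1];
            z[j][m]++;
            ztot[j]++;
        }
        pos += seq_len[s];
    }
    for (int j = 0; j < J; j++)
        for (int m = 0; m < J; m++)
            a[j][m] = ztot[j] ? (double)z[j][m] / ztot[j] : 0.0;
}
```

For the two sequences (0, 1, 2) and (0, 0, 2), state 0 is left three times, once to each of states 0, 1 and 2, so each of those transitions is estimated at 1/3.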
Meaningful transition probabilities between states of the same character model should be determined from a large representative training data set. For the illustrative training data set character models, the transition probabilities are highly dependent on the degree of connectedness (overlap), the connected pair of characters, and the amount of noise existing in the test samples. The degree of overlap and blur in the training data set should be representative of that expected for the system in operation so that transition probabilities will accurately reflect the likelihood of state (segment) succession in real character images to be identified. Therefore, in lieu of the transition probabilities, penalty functions are added to the cumulative distance measures, to penalize for skipping a state or for staying in the same state. See Appendix, module ndorec.c. Performance may be enhanced by providing penalty functions which are tuned (or graded) for (i) different characters or (ii) important states in a given character model. The skipping of a state may be induced by the overlapping of two characters (for instance, overlap of the last segment in the segment string {j0, j1} with the first segment in the segment string {u0, u1, u2} in the character string "..ju.." in Figure 23). It could also be induced by a missing segment due to deformation of the character. The decision to stay in the same state may be caused by an extra segment generated because of deformation of the character.
The transition probability between character models -- the bi-gram probabilities -- may be determined from the statistical studies of the type of text material the recognizer is expected to handle. For general English text, the statistical results of previous studies performed on the transition probabilities between characters may be used, such as those provided by A. G. Konheim, Cryptography: A Primer, Sec. 2.3 (1981), which reports first order transition probabilities between two successive letters in the English language. See Appendix, module recinit.c. These probabilities are used in a level building technique described below. Although the illustrative process 10 employs bi-gram probabilities, an embodiment may employ n-gram probabilities (n>2) without requiring much in the way of additional computational burden.
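Where published letter-transition tables are not suitable, bi-gram probabilities of the same kind could be estimated directly from sample text of the expected material. The sketch below is one such estimate under stated assumptions: only lower-case ASCII letters are counted, any other character breaks the pair, and the function name is illustrative. It is not the method used in recinit.c.

```c
#include <assert.h>
#include <ctype.h>

#define NLET 26

/* P[k][p] receives the relative frequency with which letter p follows
 * letter k in text (rows with no observations are left at zero). */
void bigram_probs(const char *text, double P[NLET][NLET])
{
    assert(text != 0);
    int cnt[NLET][NLET] = {{0}};
    int row[NLET] = {0};
    int prev = -1;
    for (const char *c = text; *c; c++) {
        int cur = islower((unsigned char)*c) ? *c - 'a' : -1;
        if (prev >= 0 && cur >= 0) {
            cnt[prev][cur]++;
            row[prev]++;
        }
        prev = cur;     /* a non-letter resets the pair */
    }
    for (int k = 0; k < NLET; k++)
        for (int p = 0; p < NLET; p++)
            P[k][p] = row[k] ? (double)cnt[k][p] / row[k] : 0.0;
}
```

For use as the inter-model term in level building, each probability would be converted to a negative log distance, consistent with the other distances in the recognizer.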

6. Initial State Probabilities
Each character is represented by a HMM with its own initial probability which is assigned to the first and second states in a left-right model. (A second state initial probability is assigned to address the skipping of the model's first state.) Initial state probabilities apply to the model corresponding to the first character of a character string. This probability may be used at the start of the level building algorithm (described below) to discriminate between probable character strings. Again, the initial state probability may be estimated from the statistical studies of the type of text material the recognizer is expected to handle. In case of general English text, useful data is provided by A. Kundu, Y. He, and P. Bahl, Recognition of handwritten word: First and second order Hidden Markov Model based approach, Vol. 22, no. 3, Pattern Recognition (1989). This data is based on the dictionary entries of English words starting with each character.

F. Recognition
1. Introduction
For recognition of a character string, the segmentation technique for separating sub-character segments described above may be used. Using the process described above for the training mode, the binary feature vectors which correspond to each segment are found. The Bayesian distortion measure (defined above) for finding the distance of the observed segment from the statistical models for the trained segments is used. Modified Viterbi scoring is used to match the unknown connected segments against the single character HMMs. A level building procedure keeps track of the path yielding minimum distance (maximum probability) for the string up to any segment. Parallel processing techniques may be preferred for the recognition process 500 to minimize processing time. See Appendix, modules nrec.c and ndorec.c.

2. Viterbi Scoring
Let the states corresponding to $I$ observations be defined as $Q = \{q_1, q_2, \ldots, q_i, \ldots, q_I\}$. The best state sequence (that is, the one which maximizes $P(Q \mid X)$, where $X$ is the input observation sequence) is given by application of the Viterbi scoring procedure defined below:

i. Initialization:
$$\delta_1(j) = \pi_j\, b_j(X_1), \quad 1 \le j \le 2$$
$$\psi_1(j) = 0$$
where $\delta_i(j)$ is the best score (highest probability) along a single path at observation $i$, and $\psi_i(j)$ keeps track of the optimal states which provide such score.

ii. Recursion:
$$\delta_i(m) = \max_{m-2 \le j \le m} \left[\delta_{i-1}(j)\, a_{jm}\right] b_m(X_i), \quad 2 \le i \le I,\ 1 \le m \le J$$
$$\psi_i(m) = \arg\max_{m-2 \le j \le m} \left[\delta_{i-1}(j)\, a_{jm}\right], \quad 2 \le i \le I,\ 1 \le m \le J$$

iii. Termination:
$$P = \max_{1 \le j \le J} \left[\delta_I(j)\right]$$
$$q_I = \arg\max_{1 \le j \le J} \left[\delta_I(j)\right]$$

iv. State sequence backtracking:
$$q_i = \psi_{i+1}(q_{i+1}), \quad i = I-1, I-2, \ldots, 1$$

A trellis structure, such as that presented in Figure 20 and described by S. E. Levinson, L. R. Rabiner, M. M. Sondhi, An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition, Vol. 62, no. 4, Bell Syst. Tech. Journal, 1035-1074, (April 1983), explains the implementation of Viterbi scoring (and the level-building technique).
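The scoring procedure for a single left-right model can be sketched in C as below. This is an illustration under several assumptions, not the code of ndorec.c: scores are kept as distances (negative log probabilities) so that products become sums and maximization becomes minimization; transition probabilities are replaced by two hypothetical additive penalties, `stay_pen` and `skip_pen`; initial-state distances are taken as zero; and states and observations are indexed from 0.

```c
#include <assert.h>

#define MAX_OBS 64
#define MAX_STATES 8
#define UNREACHABLE 1.0e300

/* dist[i * J + j]: distance of observation i from state j.
 * path[0..I-1] receives the best state sequence; the return value is
 * its cumulative distance. */
double viterbi_left_right(const double *dist, int I, int J,
                          double stay_pen, double skip_pen, int *path)
{
    static double delta[MAX_OBS][MAX_STATES];
    static int psi[MAX_OBS][MAX_STATES];
    assert(I > 0 && I <= MAX_OBS && J > 0 && J <= MAX_STATES);

    /* Initialization: a model may start in its first or second state. */
    for (int j = 0; j < J; j++) {
        delta[0][j] = (j < 2) ? dist[j] : UNREACHABLE;
        psi[0][j] = -1;
    }

    /* Recursion: predecessors are m (stay), m-1 (next), m-2 (skip one). */
    for (int i = 1; i < I; i++) {
        for (int m = 0; m < J; m++) {
            double best = UNREACHABLE;
            int arg = -1;
            for (int j = (m >= 2 ? m - 2 : 0); j <= m; j++) {
                if (delta[i - 1][j] >= UNREACHABLE)
                    continue;
                double pen = (m == j) ? stay_pen
                           : (m - j == 2) ? skip_pen : 0.0;
                double cand = delta[i - 1][j] + pen;
                if (cand < best) { best = cand; arg = j; }
            }
            delta[i][m] = (arg >= 0) ? best + dist[i * J + m] : UNREACHABLE;
            psi[i][m] = arg;
        }
    }

    /* Termination: best final state. */
    int q = 0;
    for (int j = 1; j < J; j++)
        if (delta[I - 1][j] < delta[I - 1][q])
            q = j;

    /* State sequence backtracking. */
    path[I - 1] = q;
    for (int i = I - 1; i > 0; i--)
        path[i - 1] = psi[i][path[i]];
    return delta[I - 1][q];
}
```

With three observations that each match one of three states exactly, the sketch returns the path 0, 1, 2 at zero cost; with two observations matching the first and third states, it skips the middle state and pays only the skip penalty.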
Each character is represented by a HMM, denoted as $\lambda_k$ ($1 \le k \le K$), where $K$ is the total number of models, which may be greater than the number of characters to be identified. The HMMs allow state transitions (between two consecutive observations) either to the same state, or to the next state, or to the next higher state (skipping a state). The restrictions discussed above concerning penalty functions and state skipping are part of the model characteristics and reflect the nature of the character deformation. A character model is expected to start at the first or the second state. For each observation $i$, $\delta_i(m)$ is calculated for each state of each model. Based on the cumulative measure or score, a decision is made recursively on the optimum previous state at the previous observation (see Figure 24). The above expressions for the termination and backtracking for a single model case have been modified in the level building algorithm described in the following section. For easier manipulations, the probabilities have been replaced by negative log-probabilities in the calculations of $b_m(X_i)$ and $\delta_i(m)$. (These are also referred to as "distances" herein.)
3. Level Building
In illustrative process 10 and recognition process 500, recognition is based on individual character models and is achieved through a determination of the optimum sequence of character models that best matches (in a maximum likelihood sense) the unknown character string (probably deformed and connected). The level building technique introduced above is applied to solve for such an optimum sequence of character models.

The level building algorithm is presented in Figure 24. In this figure, $i$ is an observation point corresponding to the observation $X_i$, $j$ is a state of the HMM $\lambda_k$, $l$ is a level of the stacked models (level corresponds to character position within the string), and $k$ (axis perpendicular to the plane of the page) is the character corresponding to the model $\lambda_k$. At each observation, the cumulative distances for each model and each state are updated for each level. (The operation may be trimmed considerably by noting that, based on the slope of the trellis and the maximum number of states among the HMMs, some areas of the trellis cannot be reached.) At the end of each level $l$ for the same observation, a minimization of the cumulative distance is performed over all $k$ for identifying the best model at that observation with respect to each succeeding model.
If the cumulative distance at the end of level $l$ for observation $i$ is defined as $D_l^k(i)$, then the best model at observation $i$ for the next character model $p$ ($d(k,p)$ is the transition probability from model $k$ to model $p$) is as follows:

$$D^B_{lp}(i) = \min_{1 \le k \le K} \left(D_l^k(i) + d(k,p)\right), \quad 1 \le i \le I;$$
$$C^B_{lp}(i) = \arg\min_{1 \le k \le K} \left(D_l^k(i) + d(k,p)\right), \quad 1 \le i \le I;$$
$$P^B_{lp}(i) = i - i_l, \quad 1 \le i \le I;$$

where $C^B_{lp}(i)$ stores the value of $k$ corresponding to the best character model at observation $i$, level $l$ and for the next character $p$. $P^B_{lp}(i)$ stores the backpointer to the best model at the previous level corresponding to $C^B_{lp}(i)$. $i_l$ indicates the length (number of observations) of the current level for the model $C^B_{lp}(i)$. Since skipping of a state within a model is allowed, $D_l^k(i)$ holds the minimum of the cumulative distances at the last and next to last state of model $\lambda_k$.
The initial best probability (shortest cumulative distance) at each new level $l$ for each model $p$ is obtained from the stored value of $D^B_{lp}(i-1)$. The Viterbi score is incremented by matching the character models beginning at the new level. The best character string of length $l$ may be identified at each observation $i$ by backtracking the pointer $P^B_{lp}(i)$ to $l = 1$. This process continues recursively until the end of the maximum expected levels. The overall best string is obtained from $\min_{1 \le l \le L} D^B_{lp}(I)$, where $L$ is the maximum expected number of characters in the string, $I$ is the last observation and $d(k,p) = 0$. See Appendix, modules nrec.c and ndorec.c.
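The minimization performed at the end of each level can be sketched as a small helper. It covers only that one step (choosing the best preceding model $k$ for a given next model $p$ at one observation), not the full level building procedure, and it assumes all quantities are already distances (negative log probabilities). The function name and the flat layout of the inter-model distance table are illustrative.

```c
#include <assert.h>

/* D_end[k]: cumulative distance at the end of the current level for
 *           model k at one observation.
 * d[k * K + p]: distance (negative log bi-gram probability) for model k
 *               being followed by model p.
 * Returns the minimum of D_end[k] + d(k,p) over k and stores the
 * minimizing k in *best_k. */
double best_model_at_level_end(const double *D_end, const double *d,
                               int K, int p, int *best_k)
{
    assert(K > 0 && p >= 0 && p < K);
    int arg = 0;
    double best = D_end[0] + d[0 * K + p];
    for (int k = 1; k < K; k++) {
        double cand = D_end[k] + d[k * K + p];
        if (cand < best) {
            best = cand;
            arg = k;
        }
    }
    *best_k = arg;
    return best;
}
```

Calling this once per candidate next model $p$ at each observation yields the stored distance and the best-model index for the level; the backpointer would be recorded alongside.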

4. Context by Lexicon
In addition to the context provided by the use of bi-gram probabilities, context may be provided through the use of a dictionary or lexicon. Words which are identified by the Viterbi/level building techniques may be compared with a lexicon of words to see if such identified words are present. If not, the closest word in the lexicon may be used in place of the identified word or, the closest word may simply be noted for future use by an operator.
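The text does not define how "closest" is measured. As one possible reading, the sketch below uses Levenshtein edit distance (an assumption, not the patent's stated method) to pick the nearest lexicon entry; the function names and the word-length limit are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORD 64

/* Levenshtein distance between a and b, using two rolling rows. */
static int edit_distance(const char *a, const char *b)
{
    int prev[MAX_WORD + 1], cur[MAX_WORD + 1];
    size_t la = strlen(a), lb = strlen(b);
    assert(la <= MAX_WORD && lb <= MAX_WORD);
    for (size_t j = 0; j <= lb; j++)
        prev[j] = (int)j;
    for (size_t i = 1; i <= la; i++) {
        cur[0] = (int)i;
        for (size_t j = 1; j <= lb; j++) {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            int del = prev[j] + 1;
            int ins = cur[j - 1] + 1;
            int m = sub < del ? sub : del;
            cur[j] = m < ins ? m : ins;
        }
        memcpy(prev, cur, (lb + 1) * sizeof(int));
    }
    return prev[lb];
}

/* Return the lexicon entry with the smallest edit distance to word
 * (the first such entry on ties). */
const char *closest_word(const char *word, const char *const *lexicon, int n)
{
    assert(n > 0);
    int best = 0;
    int best_d = edit_distance(word, lexicon[0]);
    for (int i = 1; i < n; i++) {
        int d = edit_distance(word, lexicon[i]);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return lexicon[best];
}
```

A recognized word already in the lexicon has distance zero and is returned unchanged; a distance weighted by the recognizer's own character confusions would be a natural refinement.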

Appendix
1. Word Preprocessing
a. ftr.h b. fe.c c. prep.c d. clag.c e. lag.c f. dwln.c g. grylvl.c
2. Sub-Character Segmentation
a. ftr.h b. fe.c c. lag.c d. clag.c e. path_analy.c f. wx_detect.c g. blob_extr.c h. blobth.c i. y_detect.c j. line_fit.c k. merge.c l. blob2feat.c m. dwln.c n. grylvl.c
3. Feature Extraction
a. ftr.h b. fe.c c. seg2vec.c d. path_s.c e. dwln.c f. grylvl.c
4. Training
a. cluster.h b. cluster.c c. quant.c
5. Recognition
a. nrec.h b. nrec.c c. ndorec.c d. recinit.c

blob_extr.c
/* extract separated blobs from input image */
#include <ftr.h>
void blob_extr(rows,cols,node,no_node,blob_list,no_blob,clagimage)
int rows,cols,no_node;
struct node *node;
struct blob **blob_list;
int *no_blob;
int **clagimage;
{
    struct clagnode *clagnode;
    struct blob *add_blob(), *blob_t;
    void sort_blob();
    int no_clagnode, s_node, left,rgt;
    int **blob;
    int i,j,m,n,k;
    int clag_ct=0,col_start,col_end;

    if((*blob_list=(struct blob *)calloc(1,sizeof(struct blob))) == (struct blob *)0 )
        printf("calloc fail in blob_list\n");
    (*blob_list)->next = (struct blob *)NULL;
    s_node=1;
    *no_blob = 0;
    do {
        blob_t = add_blob(*blob_list,s_node);
        *no_blob += 1;
        clagnode = (struct clagnode *)calloc((unsigned) no_node, sizeof(struct clagnode));
        clagnode -= 1; /* let clagnode number start from ONE */
        no_clagnode=0;
        clag(rows, node, s_node, clagnode, &no_clagnode);
        /* look for the starting node for next blob */
        for(i=1;i<=no_node;i++) {
            if((node+i)->mark != 1) {
                s_node = i;
                break;
            }
            else s_node=0;
        }
        /* find start and end column for each blob and form clag-image */
        left = cols;
        rgt = 0;
        for(i=1;i<=no_clagnode;i++) {
            if((clagnode+i)->class=='p') clag_ct += 1;
            for(j=1;j<=((clagnode+i)->number);j++) {
                col_start = (node+(clagnode+i)->node[j])->col_start;
                if(left>col_start) left = col_start;
                col_end = (node+(clagnode+i)->node[j])->col_end;
                if(rgt<col_end) rgt=col_end;
                if((clagnode+i)->class=='p')
                    for(n=col_start;n<=col_end;n++)
                        clagimage[(node+(clagnode+i)->node[j])->rowth][n] = clag_ct;
            }
        }
        blob_t->col_start = left;
        blob_t->col_end = rgt;
        for(i=1;i<=no_clagnode;i++) free((char*)((clagnode+i)->node+1));
        for(i=1;i<=no_clagnode;i++)
            if((clagnode+i)->no_vector!=0) free((char*)((clagnode+i)->vctr+1));
        free((char*)(clagnode+1));
    } while (s_node > 0);
    for(i=1;i<=no_node;i++) {
        (node+i)->clagnode=0;
        (node+i)->clagnode_p=0;
        (node+i)->mark = 0;
    }
    /* sort the blob_list */
    blob = imatrix(1,(*no_blob),1,3);
    blob_t = *blob_list;
    for(i=1;i<=(*no_blob);i++) {
        blob[i][1] = blob_t->s_node;
        blob[i][2] = blob_t->col_start;
        blob[i][3] = blob_t->col_end;
        blob_t = blob_t->next;
    }
    sort_blob(blob,(*no_blob));
    blob_t = *blob_list;
    for(i=1;i<=(*no_blob);i++) {
        blob_t->s_node = blob[i][1];
        blob_t->col_start = blob[i][2];
        blob_t->col_end = blob[i][3];
        blob_t = blob_t->next;
    }
    free_imatrix(blob,1,(*no_blob),1,3);
}

/* rearrange f from min. to max. (selection sort on column 2, the start column) */
void sort_blob(f,n)
int **f;
int n ;
{
    int ith,i,k ;
    int min,s_node,end ;
    for(k=1; k<=n; ++k) {
        /* start from row k itself, so ith is always defined and no
           out-of-range sentinel constant is needed */
        ith = k ;
        min = f[k][2] ;
        for(i=k+1; i<=n; ++i) {
            if ( f[i][2] < min ) {
                ith = i ;
                min = f[i][2] ;
            }
        }
        s_node = f[ith][1];
        end = f[ith][3];
        f[ith][1] = f[k][1] ;
        f[ith][2] = f[k][2] ;
        f[ith][3] = f[k][3] ;
        f[k][1] = s_node ;
        f[k][2] = min ;
        f[k][3] = end ;
    }
}

/* add new entry to end of blob_list */
struct blob *add_blob(blob_list,s_node)
struct blob *blob_list;
int s_node;
{
    while( blob_list->next != (struct blob *)NULL )
        blob_list = blob_list->next;
    /* reserve a space for the next new entry */
    if((blob_list->next=(struct blob *)calloc(1,sizeof(struct blob))) == (struct blob *)0 )
        printf("calloc fail in add_vctr(): 1\n");
    /* add NULL to the new end of the list */
    (blob_list->next)->next = (struct blob *)NULL;
    blob_list->s_node = s_node;
    return(blob_list);
}

blob2feat.c
/* Conversion of Blob to Feature Vectors.
 *
 */
#include <math.h>
#include <ftr.h>
#include <cluster.h>

#define GAP 2.0 /* minimum width of a segment */
#define MINVEC 2.0 /* minimum length of a vector */
#define NX 0 /* no intersect */
#define NV 1 /* x/v-intersect */
#define LV 2 /* intersect with left vector vertical */
#define RV 3 /* intersect with right vector vertical */
/*
typedef struct {
float start;
float end;
char type;
} BSEG;
*/

void blob2feat(bimage, bcol, vctr_list, bname, base, outl, dspl_s, dspl, col_start,in,lagimg,clagimg)
int **bimage, bcol, **dspl_s, **dspl, col_start,**lagimg,**clagimg;
struct vector *vctr_list;
char bname;
int base;
FILE *outl;
PICFILE *in;
{
    extern int Rows, Cols;
    struct vector **vppt, *vecpt, *vec, vtemp;
    static BSEG seg[20];
    FEAT feat;
    int i, j, k, vcount, scount, flag, vec_id, x1;
    void sort(), find_x(), vec2param();
    float x, y, z, laststart, lastend, *xvec, vwi, vwj, wid, width();
    float isect, vend, len;
    struct vector *v_list, *vctr;
    char dum[20];
    int type;

    /* Find the total count of vectors in the blob */
    /* and sort them based on their x-location */
    for (vcount = 0, vecpt = vctr_list; (vecpt->next !=
        (struct vector *) NULL); vecpt = vecpt->next) vcount++;
    if (!vcount) {
        seg[0].start = 0;
        seg[0].end = (float) bcol;
        seg[0].type = 'h';

    /* Find if the intersect forms a "vertical V" */

    if ((hyp1>5) && (fabs(m1)<0.2)) *type = LV; /* left vector vert */

    else if ((hyp2>5) && (fabs(m2)<0.2)) *type = RV; /* right vector vert */

    else *type = NV; /* no vert */
}

void vec2param(base, beg, end, vpt, fpt)
float base, beg, end;
struct vector *vpt;
FEAT *fpt;
{
    float x1, y1, x2, y2, x, y;
    float sint, cost, sin2t, cos2t, hyp;
    /* fpt->d = 3.0 * (vpt->dst); normalize wrt other parameters */
    fpt->d = vpt->dst;
    if (fpt->d) {
        x1 = vpt->ax[0] - (end + beg)/2.0;
        y1 = base - vpt->ay[0];
        x2 = vpt->ax[1] - (end + beg)/2.0;
        y2 = base - vpt->ay[1];
    }
    else {
        x1 = vpt->x[0] - (end + beg)/2.0;
        y1 = base - vpt->y[0];
        x2 = vpt->x[1] - (end + beg)/2.0;
        y2 = base - vpt->y[1];
    }

    /* Make x1 <= x2 for limiting theta bet. -90 and +90 */

    if (x1 > x2) {
        x = x1, y = y1;
        x1 = x2, y1 = y2;
        x2 = x, y2 = y;
    }

    hyp = sqrt(SQR(y2-y1)+SQR(x2-x1));
    if (hyp < MINVEC) {
        fpt->px = 0;
        fpt->py = 0;
    }
    else {
        sint = (y2-y1)/hyp;
        cost = (x2-x1)/hyp;
        sin2t = 2*sint*cost;
        cos2t = 2*SQR(cost) - 1;
        fpt->x = (x2+x1) / 2.0;
        fpt->y = (y2+y1) / 2.0;
        fpt->px = hyp * sin2t;
        fpt->py = hyp * cos2t;
    }

    if (fpt->d) {
        x = vpt->ax[2] - (end + beg)/2.0;
        y = base - vpt->ay[2];
        if (x <= x1) fpt->d = -fpt->d;
        else if (x2 > x1)
            if (fabs((y-y1)/(x-x1)) > fabs((y2-y1)/(x2-x1))) fpt->d = -fpt->d;
    }
}

float width(vpt)
struct vector *vpt;
{
    float x1, y1, x2, y2, x, y, hyp, m, sint;
    if (vpt->type == 'h') return(0);
    x1 = vpt->x[0];
    y1 = vpt->y[0];
    x2 = vpt->x[1];
    y2 = vpt->y[1];
    /* Make x1 <= x2 for limiting theta bet. -90 and +90 */
    if (x1 > x2) {
        x = x1, y = y1;
        x1 = x2, y1 = y2;
        x2 = x, y2 = y;
    }

    m = (x2 - x1)/(y2 - y1);
    hyp = sqrt(SQR(y2-y1)+SQR(x2-x1));
    if ((fabs(m) < 0.23) && (hyp > 4.0)) {
        vpt->type = 'I';
        return( vpt->width/2.0);
    }
    else if ((vpt->small_v == 'c') && (hyp < MINVEC))
        return((vpt->width < 8.0) ? (vpt->width/2.0) : 4.0);
    else return( vpt->width/4.0 );
}

        scount = 1;
        goto do_vec;
}

    vec = (struct vector *) calloc((unsigned)vcount, sizeof(struct vector ));

    /* Copy the vector list for manipulation */
    /* Interchange vector ends so that x1 < x2 */
    /* Place x[0] of vectors in an array for sorting */
    xvec = (float *) calloc((unsigned)vcount, sizeof(float));
    for (i=0, vecpt=vec, vctr=vctr_list; i<vcount; i++, vecpt++) {
        *vecpt = *vctr;

        /* flag very short vectors */
        /* do not flag vectors of type 'c', generated around small holes, such as in char 'e' */
        len = sqrt(SQR(vecpt->y[1]-vecpt->y[0]) +
            SQR(vecpt->x[1]-vecpt->x[0]));

        if ((len < MINVEC) && ((vecpt->small_v != 'c') || (len == 0))) vecpt->type = 'e';
        if (vecpt->x[0] > vecpt->x[1]) {
            vecpt->x[0] = vctr->x[1];
            vecpt->y[0] = vctr->y[1];
            vecpt->x[1] = vctr->x[0];
            vecpt->y[1] = vctr->y[0];
        }

        wid = width(vecpt);

        vecpt->l = MAX(0, vecpt->x[0] - wid);
        vecpt->r = MIN(vecpt->x[1] + wid, (float) bcol);
        vecpt->id = -1;
        xvec[i] = vecpt->l;
        vctr = vctr->next;
    }
    if (vcount > 1) sort(vcount, xvec-1);
    vppt = (struct vector **) calloc((unsigned)vcount, sizeof(struct vector *));
    for (j=0; j<vcount; j++)
        for (i=0, vecpt=vec; i<vcount; i++, vecpt++)
            if((vecpt->l == xvec[j]) && (vecpt->id == -1)) {
                vppt[j] = vecpt;
                vecpt->id = i;
                break;
            }

#ifdef DBUG5
    fprintf(stderr, "V_id V_l V_start V_end V_r V_type\n\n");
    for (i=0; i<vcount; i++)
        fprintf(stderr, "%d %0.2f %0.2f %0.2f %0.2f %c\n", i, vppt[i]->l, vppt[i]->x[0], vppt[i]->x[1], vppt[i]->r, vppt[i]->type);
#endif
    /* Do segmentation sequentially based on the spread of the "vertical" vectors.
       Find the starting segment */

    isect = vend = 0;
    flag = 0;
    for (i=0, scount=0; i<vcount; i++) {
        if ((vppt[i]->type == 'v') || (vppt[i]->type == 'I')) {
            seg[scount].start = vppt[i]->l;
            /* Check for segment overlap */
            if (scount && ((seg[scount].start - seg[0].start) <= GAP)) { /* merge with last seg */
                scount = 0;
            }
            else if (scount) { /* modify end of prev. seg */
                seg[0].end = MIN(seg[0].end, (seg[1].start-0.1));
            }
            seg[scount].end = 0;
            seg[scount].type = 'v';
            break;
        }
        else if ((vppt[i]->type == 'h') && !flag) {
            seg[0].start = vppt[i]->l;
            seg[0].end = vppt[i]->r;
            seg[0].type = 'h';
            scount = 1;
            flag = 1;
        }
        else continue;
    }

    if (i == vcount) {
        seg[0].start = vppt[0]->l;
        seg[0].end = (float) bcol;
        seg[0].type = 'h';
        scount = 1;
        goto do_vec;
    }
    while (1) {
        vend = MAX(vend, vppt[i]->r);
        seg[scount].type = 'v';
        if (vppt[i]->type == 'I') {
            seg[scount].end = vppt[i]->r;
            seg[scount+1].start = MIN((float) bcol, seg[scount].end + 0.1);
            scount++;
        }
        /* Find next "vertical" vector */
        for (j=i+1; j<vcount; j++)
            if ((vppt[j]->type == 'v') || (vppt[j]->type == 'I')) break;

        if (j == vcount) {
            if (isect && ((isect - seg[scount].start) > GAP) ) {
                seg[scount].end = isect - 0.1;
                seg[scount].type = 'v';
                scount++;
                seg[scount].start = isect;
            }
            break;
        }

        z = vppt[j]->l;
        /* Set seg.end if an intersect was detected */
        if (isect && (seg[scount].start < isect) && (z > isect))
            if ((isect - seg[scount].start) > GAP) {
                seg[scount].end = isect - 0.1;
                seg[scount].type = 'v';
                scount++;
                seg[scount].start = isect;
            }

        /* Set seg.end if next vert is clear */
        if (z > vend) {
            if ((vend - seg[scount].start) > GAP) {
                seg[scount].end = vend;
                seg[scount].type = 'v';
                scount++;
                seg[scount].start = vend + 0.1;
            }
            if ((z - seg[scount].start) > GAP ) {
                seg[scount].end = z - 0.1;
                seg[scount].type = 'h';
                scount++;
                seg[scount].start = z;
            }
            else seg[scount].start = z;
        }
        else if (vppt[j]->type == 'I') {
            if ((z - seg[scount].start) > GAP) {
                seg[scount].end = z - 0.1;
                seg[scount].type = 'v';
                scount++;
                seg[scount].start = z;
            }
            else if ((z - seg[scount].start) < 0) scount--; /* merge with prev. segment */
        }

        /* Find intersect of i with all other vectors and set isect */
        if (vppt[i]->type != 'I')
            for (k = i+1; k < vcount; k++) {
                /* check, if vectors i and k intersect */
                if (vppt[k]->l > vppt[i]->r) break;
                find_x(vppt[i], vppt[k], &x, &y, &type);
                if (type != NX) {
                    isect = x;
                    break;
                }
            }
        i = j;
    }

seg[scount].end = vend;
seg[scount].type = 'v';
if (scount && ((seg[scount].end-seg[scount].start) <= GAP)) seg[scount-l].end = seg[scount].end;
scount--;

if ((bcol-seg[scount].end) > GAP) scount++;
seg[scount].start = seg[scount-l].end + 0.1;
seg[scount].end = bcol;
seg~scount].type = 'h';

for (i=O; i<scount; i++) seg[i].end = MIN(seg[i].end, seg[i+l~.start - 0.1);
seg[i].end = MAX(seg[i].start, seg[i].end);

scount++;
do_vec:
#ifdef DBUG5
fprintf(stderr, "scount, s_start s_end\n\n");
for (i=0; i<scount; i++) fprintf(stderr, "%d %0.1f %0.1f\n", i, seg[i].start, seg[i].end);
#endif
/* Convert segments to feature vectors and parameterize */

feat.lab[0] = ((bname > 0x20) ? bname : '?'); /* symbol name */
feat.lab[3] = 0;
for (i=0, k=0; i<scount; i++) {
    if ((seg[i].end - seg[i].start) <= GAP) continue;
    seg2vec(bimage, Rows, bcol, &seg[i], &v_list, col_start, lagimg, dspl_s, clagimg);
#ifdef DEMO
    /* show segment cutting line on vctr.pic */
    x1 = (int) ((seg[i].start+col_start)*10.0);
    if(x1<0) x1=0;
    dwln(x1,0,x1,Rows*10,90,dspl);
    dwln((x1+1),0,(x1+1),Rows*10,90,dspl);
    if(x1 != 0) dwln((x1-1),0,(x1-1),Rows*10,90,dspl);
    /* show segment cutting line on vctrs.pic */
    x1 = (int)(((int)(seg[i].start+0.4)+col_start)*10.0);
    if(x1<0) x1=0;
    dwln(x1,0,x1,Rows*10,90,dspl_s);
    dwln((x1+1),0,(x1+1),Rows*10,90,dspl_s);
    if(x1 != 0) dwln((x1-1),0,(x1-1),Rows*10,90,dspl_s);
#endif
    feat.lab[1] = k + 0x30;
    feat.num = (int) (seg[i].start + 0.5);
    vecpt = v_list;
    j = 0;
    while (vecpt->next != (struct vector *) NULL) {
        vec2param((float)base, seg[i].start, seg[i].end, vecpt, &feat);
        /* Ignore very small vectors */
        if (!feat.px && !feat.py) {
            vecpt = vecpt->next;
            continue;
        }

        feat.lab[2] = j + 0x30;

#ifdef DEMO
        outimg_s(dspl_s,vecpt,col_start);
#endif
#ifdef DBUG5
        fprintf(stderr, "%d %s %f %f %f %f %f\n", feat.num, feat.lab, feat.x, feat.y, feat.px, feat.py, feat.d);
#endif
        fprintf(out1, "%d %s %f %f %f %f %f\n", feat.num, feat.lab, feat.x, feat.y, feat.px, feat.py, feat.d);
        vecpt = vecpt->next;
        if (!j) k++;
        j++;
    }
    while( v_list->next != (struct vector *)NULL ) {
        vctr = v_list;
        v_list = v_list->next;
        free(vctr);
    }
    free(v_list);
}

free((char*)vec);
free((char*)xvec);
}

void find_x(vpti, vptj, xpt, ypt, type)
struct vector *vpti, *vptj;
float *xpt, *ypt;
int *type;
{
    float x1, x2, x3, x4, y1, y2, y3, y4, m1, m2;
    float hyp1, hyp2, z1, z2;

    x1 = vpti->x[0];
    y1 = vpti->y[0];
    x2 = vpti->x[1];
    y2 = vpti->y[1];
    x3 = vptj->x[0];
    y3 = vptj->y[0];
    x4 = vptj->x[1];
    y4 = vptj->y[1];
    /* m1, m2 are inverse slopes (dx/dy): each line is x = x0 + m*(y - y0) */
    m1 = (x2 - x1)/(y2 - y1);
    m2 = (x4 - x3)/(y4 - y3);
    *ypt = (x3 - x1 - m2*y3 + m1*y1)/(m1 - m2);
    *xpt = x1 + m1*(*ypt - y1);
    hyp1 = sqrt(SQR(y2-y1)+SQR(x2-x1));
    hyp2 = sqrt(SQR(y4-y3)+SQR(x4-x3));
    z1 = (fabs(*xpt-x1) > fabs(*xpt-x2) ? (*xpt-x1) : (*xpt-x2));
    z2 = (fabs(*xpt-x3) > fabs(*xpt-x4) ? (*xpt-x3) : (*xpt-x4));
    /* Determine if the vectors intersect */
    if (((fabs(m1) < 0.26) && (hyp1 <= 4.0)) || ((fabs(m2) < 0.26) && (hyp2 <= 4.0))) {
        *type = NX;
        return;
    }
    else if ((*xpt>=x1) && (*xpt<=x2) && (*xpt>=x3) && (*xpt<=x4) &&
             (hyp1 > MINVEC) && (hyp2 > MINVEC)) *type = NV;
    else if ((m1*m2 < 0) && (z1*z2 < 0) && ((x3-x2) <= GAP)) *type = NV;
    else *type = NX;
    return;
}
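
The intersection step of find_x can be sketched in isolation as follows. This is a minimal illustration, not the listing's code: the type `seg2` and the function `intersect_xy` are hypothetical names, and the guards for horizontal or parallel segments are additions (find_x itself divides without checking). As in find_x, each line is written as x = x0 + m(y - y0) with m = dx/dy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment type: end points (x[0],y[0]) and (x[1],y[1]). */
typedef struct { double x[2]; double y[2]; } seg2;

/* Intersect the infinite lines through a and b, each written as
 * x = x0 + m*(y - y0) with m = dx/dy (inverse slope, as in find_x).
 * Returns 1 and fills *xpt, *ypt on success; returns 0 when either
 * segment is horizontal (dy == 0) or the two inverse slopes are equal. */
int intersect_xy(const seg2 *a, const seg2 *b, double *xpt, double *ypt)
{
    double dy1, dy2, m1, m2;

    assert(a != NULL && b != NULL && xpt != NULL && ypt != NULL);
    dy1 = a->y[1] - a->y[0];
    dy2 = b->y[1] - b->y[0];
    if (dy1 == 0.0 || dy2 == 0.0) return 0;
    m1 = (a->x[1] - a->x[0]) / dy1;
    m2 = (b->x[1] - b->x[0]) / dy2;
    if (m1 == m2) return 0;             /* parallel: no single intersection */
    /* Equate x1 + m1*(y - y1) = x3 + m2*(y - y3) and solve for y. */
    *ypt = (b->x[0] - a->x[0] - m2 * b->y[0] + m1 * a->y[0]) / (m1 - m2);
    *xpt = a->x[0] + m1 * (*ypt - a->y[0]);
    return 1;
}
```

A caller would still have to decide, as find_x does with its NV/NX classification, whether the returned point lies within both segments.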

blobth.c
/* retrieve an image containing separated blobs and return a vector list containing all vectors inside this blob */
#include <ftr.h>
void blobth(blob_list,gnode,rows,bimage,bcol,b_col_start,b_vctr_list,
            b_clagnode,b_no_clagnode,dspl)
struct blob **blob_list;
struct node *gnode; /* global lag node */
int rows,**dspl;
int ***bimage, *bcol, *b_col_start;
struct vector **b_vctr_list;
struct clagnode **b_clagnode;
int *b_no_clagnode;
{
int **image,**image_loc,cols,no_clagnode=0,**group;
int s_node,no_blob=1, no_node=0, max_no_node=0, node_ct=0;
int i,j,m,n,k,col_start,col_end,s_clagnode,c_node;
int clagnode_p,no,count,count_b,b_node;
int x1,x2,x3,y1,y2,y3;
int flag, find_theta();
float *x,*y,width,ratio;
struct clagnode *clagnode;
struct node *node;
struct vector *vctr_list, *vctr, *vctr1, *vctr2;
struct blob *blobin,*blob_max;
if((vctr_list=(struct vector *)calloc(1,sizeof(struct vector)))==
   (struct vector *) NULL) printf("calloc fail in vctr_list\n");
vctr_list->next = (struct vector *) NULL;
blobin = *blob_list;
col_start = blobin->col_start;
col_end = blobin->col_end;
for( ; (blobin->next!=(struct blob *) NULL) ; ) {
    blobin = blobin->next;
    if( blobin->next==(struct blob *) NULL ) break;
    else if( ((col_start-2)<blobin->col_start &&
              (col_end+2)>blobin->col_end) ) {
        if(blobin->col_start<col_start) col_start=blobin->col_start;
        if(blobin->col_end>col_end) col_end=blobin->col_end;
        no_blob += 1;
    }
    else if( (col_start>(blobin->col_start-2) &&
              col_end<(blobin->col_end+2)) ) {
        if(blobin->col_start<col_start) col_start=blobin->col_start;
        if(blobin->col_end>col_end) col_end=blobin->col_end;
        no_blob += 1;
    }
    else break;
}
cols = col_end - col_start + 1 ;
image = imatrix(0,(rows-1),0,(cols-1));
image_loc = imatrix(0,(rows-1),0,(cols-1));
/* initialize background to white */
for(i=0;i<rows;i++) for(j=0;j<cols;j++) {
    image_loc[i][j] = 255;
    image[i][j] = 255;
}

/* form the image */
clagnode = (struct clagnode *)calloc((unsigned)(cols*rows/2), sizeof(struct clagnode));
clagnode -= 1; /* let clagnode number start from ONE */
blobin = *blob_list;
for(i=1;i<=no_blob;i++) {
    s_clagnode = no_clagnode+1;
    s_node = blobin->s_node;
    no_node=0;
    node_ct=0;
    clag(rows, gnode, s_node, clagnode, &no_clagnode);
    for(j=s_clagnode;j<=no_clagnode;j++) for(m=1;m<=((clagnode+j)->number);m++) {
        clagnode_p = (gnode+((clagnode+j)->node[m]))->clagnode_p;
        if(clagnode_p==0 || (clagnode_p!=0 && clagnode_p!=j)) no_node += 1;
    }

    blobin->no_node = no_node;
    if((blobin->node=(int*)calloc(no_node,sizeof(int))) == (int*)0) printf("calloc fail in blobin->node\n");
    blobin->node -= 1;
    for(j=s_clagnode;j<=no_clagnode;j++) for(m=1;m<=((clagnode+j)->number);m++) {
        clagnode_p = (gnode+((clagnode+j)->node[m]))->clagnode_p;
        if(clagnode_p==0 || (clagnode_p!=0 && clagnode_p!=j)) {
            node_ct += 1;
            blobin->node[node_ct]=(clagnode+j)->node[m];
        }
    }
    if(no_node != node_ct) printf("error in blob->node finding\n");
    if(no_node > max_no_node) {
        max_no_node = no_node;
        blob_max = blobin;
    }
    width = 0.0;
    ratio=0.0;
    for(j=1;j<=no_node;j++) {
        c_node = blobin->node[j];
        width += (float)((gnode+c_node)->col_end -(gnode+c_node)->col_start+1);
        for(m=((gnode+c_node)->col_start);
            m<=((gnode+c_node)->col_end);m++) image[(gnode+c_node)->rowth][m-col_start] = 0;
    }

    width /= (float)no_node;
    ratio = (float)no_node / width;
    blobin->ratio = ratio;
    blobin = blobin->next;
}
for(i=1;i<=no_clagnode;i++) free((char*)((clagnode+i)->node+1));

free((char*)(clagnode+1));

/* vectorize only the largest blob and blobs that have dominant verticals */
for(j=1;j<=max_no_node;j++) {
    c_node = blob_max->node[j];
    for(m=((gnode+c_node)->col_start);
        m<=((gnode+c_node)->col_end);m++) image_loc[(gnode+c_node)->rowth][m-col_start] = 0;
}
if(no_blob>1) {
    blobin = *blob_list;
    for(i=1;i<=no_blob;i++) {
        if(blobin!=blob_max && blobin->ratio>3.0) for(j=1;j<=(blobin->no_node);j++) {
            c_node = blobin->node[j];
            for(m=((gnode+c_node)->col_start);
                m<=((gnode+c_node)->col_end);m++) image_loc[(gnode+c_node)->rowth][m-col_start] = 0;
        }
        blobin = blobin->next;
    }
}
#ifdef DEMO
/* write the blob image into the displaying-purpose array */
for(i=0;i<(10*rows);i++) for(j=0;j<(10*cols);j++) dspl[i][j+10*col_start] = image_loc[i/10][j/10];
#endif
node = (struct node *) calloc((unsigned)(cols*rows),sizeof(struct node));
if(node == (struct node *) NULL) {
    fprintf(stderr, "calloc failed for node\n");
    exit (1);
}
node -= 1; /* let node number start from ONE */
lag(image_loc, rows, cols, node, &no_node);
clagnode = (struct clagnode *)calloc((unsigned) (no_node), sizeof(struct clagnode));
clagnode -= 1; /* let clagnode number start from ONE */
no_clagnode = 0;
s_node = 1;
do {
    clag(rows, node, s_node, clagnode, &no_clagnode);
    /* look for the starting node for next blob */
    for(i=1;i<=no_node;i++) if((node+i)->mark != 1) {
        s_node = i;
        break;
    }
    else s_node=0;
} while (s_node > 0);

/* Analyze each clag-path-node */
for(i=1;i<=no_clagnode;i++)
    if((clagnode+i)->class=='p') path_analy(image_loc,rows,cols,node,clagnode,i,vctr_list);
/* merging vectors in adjacent clag-path-nodes */
do {
    merge(vctr_list,clagnode,no_clagnode,&flag);
} while(flag == 1);
/* delete the 'v' vectors if they are not merged with other vectors or the merged 'v' has large angle */
for(i=1;i<=no_clagnode;i++) if((clagnode+i)->type == 'v') {
    if((clagnode+i)->no_vector != 2) printf("error in blobth(1)\n");
    vctr1 = (clagnode+i)->vctr[1];
    vctr2 = (clagnode+i)->vctr[2];
    if(vctr1->no_clag==1 && vctr2->no_clag==1) {
        if(vctr1->clagnode[0]!=i || vctr2->clagnode[0]!=i ) printf("error in blobth(2)\n");
        dlt_vctr(&vctr_list,vctr1);
        dlt_vctr(&vctr_list,vctr2);
        (clagnode+i)->no_vector = 0;
        free((char*)((clagnode+i)->vctr+1));
    }
    /* CAUTION: if the following block is implemented, the vector information inside clagnode structure is wrong */
    else if( find_theta(vctr1,vctr2) ) {
        dlt_vctr(&vctr_list,vctr1);
        dlt_vctr(&vctr_list,vctr2);
        (clagnode+i)->no_vector = 0;
        free((char*)((clagnode+i)->vctr+1));
    }
}

/* prepare output data */
for(i=1;i<=no_blob;i++) *blob_list = (*blob_list)->next;
*bimage = image;
*bcol = cols;
*b_vctr_list = vctr_list;
*b_clagnode = clagnode;
*b_no_clagnode = no_clagnode;
*b_col_start = col_start;
free_imatrix(image_loc,0,(rows-1),0,(cols-1));
free((char*)(node+1));
}

find_theta(vpti, vptj)
struct vector *vpti, *vptj;
{
    float x1, x2, x3, x4, y1, y2, y3, y4;
    float sint1, sint2, cost1, cost2;
    float hyp1, hyp2;

    x1 = vpti->x[0];
    y1 = vpti->y[0];
    x2 = vpti->x[1];
    y2 = vpti->y[1];
    x3 = vptj->x[0];
    y3 = vptj->y[0];
    x4 = vptj->x[1];
    y4 = vptj->y[1];
    if (x2==x1 || x4==x3) return (-1);
    hyp1 = sqrt((y2-y1)*(y2-y1)+(x2-x1)*(x2-x1));
    hyp2 = sqrt((y4-y3)*(y4-y3)+(x4-x3)*(x4-x3));
    sint1 = fabs((x2-x1)/hyp1);
    cost1 = fabs((y2-y1)/hyp1);
    sint2 = fabs((x4-x3)/hyp2);
    cost2 = fabs((y4-y3)/hyp2);
    /* cos(t1 + t2) > 0: the two angles from the vertical sum to less than 90 degrees */
    if ((cost1*cost2 - sintl_placeholder) > 0.0) return (0);
    else return (1);
}
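
The test in find_theta is the sign of cos(t1 + t2) = cos t1 cos t2 - sin t1 sin t2, where t1 and t2 are the unsigned angles the two vectors make with the vertical axis. A minimal sketch of the same decision, under the assumption that only the sign matters: multiplying through by the (positive) product of the two lengths removes the square roots. The function name `angles_sum_wide` and the helper `dabs` are hypothetical, and the sketch does not reproduce the -1 return that find_theta gives when either vector is vertical.

```c
#include <assert.h>

/* Absolute value helper, so the sketch needs no math library. */
static double dabs(double v) { return v < 0.0 ? -v : v; }

/* For direction vectors (dx1,dy1) and (dx2,dy2), let t1 and t2 be the
 * unsigned angles each makes with the vertical (y) axis. The sign of
 * cos(t1 + t2) equals the sign of |dy1||dy2| - |dx1||dx2|.
 * Returns 1 when t1 + t2 >= 90 degrees (a "large angle"), else 0. */
int angles_sum_wide(double dx1, double dy1, double dx2, double dy2)
{
    /* Zero-length vectors have no direction. */
    assert(dx1 != 0.0 || dy1 != 0.0);
    assert(dx2 != 0.0 || dy2 != 0.0);
    return (dabs(dy1) * dabs(dy2) - dabs(dx1) * dabs(dx2)) > 0.0 ? 0 : 1;
}
```

Two strokes that each lean 45 degrees from the vertical sit exactly on the boundary and are reported as wide, matching the strict `> 0.0` comparison in the listing.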

clag.c
/* convert LAG to c-LAG */
#include <ftr.h>
#define JOUT 1.4 /* threshold for outlier in including junction to path */
void clag(rows, node, s_node, clagnode, no_clagnode)
int rows;
int s_node; /* starting node in the lag to do clag converting */
int *no_clagnode;
struct node *node;
struct clagnode *clagnode;
{
void pushval(),popval();
int c_node, p_node, nbr_node;
int i,j;
int count,find,touch;
int s_clagnode;
struct stack *ST;
int *PATH;
float width,width_j;
int flag;
s_clagnode = *no_clagnode + 1;
ST = (struct stack *)NULL ;
if( (PATH=(int *)calloc((unsigned)rows,sizeof(int))) == (int *)0 ) printf("calloc fail in PATH \n");
PATH -= 1; /* index start from ONE */
pushval(&ST,s_node,0);
while( ST != (struct stack *)0 ) /* step 2 */
{
    popval(&ST,&c_node,&p_node);
    if( (node+c_node)->mark == 1 ) continue;
    count=1;
    PATH[count]=c_node;
    /* similar to "do B" loop */
    do {
        c_node=PATH[count];
        find=0;
        touch =0;
        for(i=1;i<=((node+c_node)->above);i++) {
            nbr_node=(node+c_node)->a_node[i-1];
            touch += 1;
            if(nbr_node == p_node) continue;
            if( ((node+nbr_node)->mark) != 1) {
                find += 1;
                if(find==1)
                {
                    count += 1;
                    PATH[count]=nbr_node;
                }
                else pushval(&ST,nbr_node,c_node);
            }
        }
        for(i=1;i<=((node+c_node)->below);i++) {
            nbr_node=(node+c_node)->b_node[i-1];
            touch += 1;
            if(nbr_node == p_node) continue;
            if( ((node+nbr_node)->mark) != 1) {
                find += 1;
                if(find==1) {
                    count += 1;
                    PATH[count]=nbr_node;
                }
                else pushval(&ST,nbr_node,c_node);
            }
        }
        /* similar to step 14 (create a junction and/or a path) */
        if(find>0 && ((node+c_node)->above>=2 || (node+c_node)->below>=2)) {
            if(count>2) {
                *no_clagnode += 1;
                (clagnode+*no_clagnode)->class='p';
                (clagnode+*no_clagnode)->number = count-2;
                if( ((clagnode+*no_clagnode)->node =
                     (int *)calloc((unsigned)(count-2),sizeof(int))) == (int *)0 ) printf("calloc fail in clagnode->node: 'p'\n");
                (clagnode+*no_clagnode)->node -= 1; /* start from ONE */
                for(i=1;i<=(count-2);i++) {
                    (clagnode+*no_clagnode)->node[i] = PATH[i];
                    (node+PATH[i])->mark = 1;
                    (node+PATH[i])->clagnode = *no_clagnode;
                }
            }
            *no_clagnode += 1;
            (clagnode+*no_clagnode)->class='j';
            (clagnode+*no_clagnode)->number = 1;
            if( ((clagnode+*no_clagnode)->node =
                 (int *)calloc(1,sizeof(int))) == (int *)0 ) printf("calloc fail in clagnode->node: 'j'\n");
            (clagnode+*no_clagnode)->node -= 1; /* start from ONE */
            (clagnode+*no_clagnode)->node[1]=PATH[count-1];
            (node+PATH[count-1])->mark = 1;
            (node+PATH[count-1])->clagnode = *no_clagnode;
            PATH[1] = PATH[count];
            count = 1;
        }
        p_node=c_node;
    } while(find != 0);
    /* similar to step 15 & 16 */
    /* create a path node if paths exist */
    if(((node+c_node)->above >= 2) || ((node+c_node)->below >= 2)) count -= 1;
    if(count>0) {
        *no_clagnode += 1;
        (clagnode+*no_clagnode)->class='p';
        (clagnode+*no_clagnode)->number = count;
        if( ((clagnode+*no_clagnode)->node =
             (int *)calloc((unsigned)count,sizeof(int))) == (int *)0 ) printf("calloc fail in clagnode->node: 'p'\n");
        (clagnode+*no_clagnode)->node -= 1; /* start from ONE */
        for(i=1;i<=count;i++) {
            (clagnode+*no_clagnode)->node[i] = PATH[i];
            (node+PATH[i])->mark = 1;
            (node+PATH[i])->clagnode = *no_clagnode;
        }
    }
    /* create a junction if it exists */
    if(((node+c_node)->above >= 2) || ((node+c_node)->below >= 2)) {
        *no_clagnode += 1;
        (clagnode+*no_clagnode)->class='j';
        (clagnode+*no_clagnode)->number = 1;
        if( ((clagnode+*no_clagnode)->node =
             (int *)calloc(1,sizeof(int))) == (int *)0 ) printf("calloc fail in clagnode->node: 'j'\n");
        (clagnode+*no_clagnode)->node -= 1; /* start from ONE */
        (clagnode+*no_clagnode)->node[1]=PATH[count+1];
        (node+PATH[count+1])->mark = 1;
        (node+PATH[count+1])->clagnode = *no_clagnode;
    }
}

/* sort nodes inside each path_node in increasing order */
for(i=s_clagnode;i<=*no_clagnode;i++) if((clagnode+i)->class == 'p' && (clagnode+i)->number > 1) {
    if((clagnode+i)->node[2] < (clagnode+i)->node[1]) {
        count=(clagnode+i)->number ;
        for(j=1;j<=count;j++) PATH[count-j+1] = (clagnode+i)->node[j] ;
        for(j=1;j<=count;j++) (clagnode+i)->node[j] = PATH[j] ;
    }
}
/* modify c-lag: when one of the degrees of a junction is 1, the junction is included in the path connected to the junction if the junction is not an outlier */
for(i=s_clagnode;i<=*no_clagnode;i++) if((clagnode+i)->class == 'p') {
    count = 0;
    c_node=(clagnode+i)->node[1];
    if((node+c_node)->above == 1 ) {
        p_node=(node+c_node)->a_node[0];
        /* check whether the junction is an outlier or not */
        width_j=(float)((node+p_node)->col_end -(node+p_node)->col_start+1);
        flag=0;
        for(j=1;j<=((clagnode+i)->number);j++) {
            c_node=(clagnode+i)->node[j];
            width = ((float)((node+c_node)->col_end -(node+c_node)->col_start + 1)) ;
            if(width_j < JOUT*width ) flag=1;
        }
        if((node+p_node)->below == 1 && flag==1 ) {
            count += 1;
            PATH[count]=p_node;
            (node+p_node)->clagnode_p = i;
        }
    }
    for(j=1;j<=((clagnode+i)->number);j++) {
        count += 1;
        PATH[count]=(clagnode+i)->node[j];
    }

    c_node=(clagnode+i)->node[(clagnode+i)->number];
    if((node+c_node)->below == 1 ) {
        p_node=(node+c_node)->b_node[0];
        /* check whether the junction is an outlier or not */
        width_j=(float)((node+p_node)->col_end -(node+p_node)->col_start+1);
        flag=0;
        for(j=1;j<=((clagnode+i)->number);j++) {
            c_node=(clagnode+i)->node[j];
            width = ((float)((node+c_node)->col_end -(node+c_node)->col_start + 1)) ;
            if(width_j < JOUT*width ) flag=1;
        }

        if((node+p_node)->above == 1 && flag==1 ) {
            count += 1;
            PATH[count]=p_node;
            (node+p_node)->clagnode_p = i;
        }
    }
    free((char*)((clagnode+i)->node+1));
    if( ((clagnode+i)->node =
         (int *)calloc((unsigned)count,sizeof(int))) == (int *)0 ) printf("calloc fail in clagnode->node: 'p'\n");
    (clagnode+i)->node -= 1; /* start from ONE */
    for(j=1;j<=count;j++) (clagnode+i)->node[j]=PATH[j];
    (clagnode+i)->number=count;
}
/* connect adjacent c-lag nodes */
for(i=s_clagnode;i<=*no_clagnode;i++) if((clagnode+i)->class == 'j')
{
    c_node=(clagnode+i)->node[1];
    for(j=1;j<=((node+c_node)->above);j++) {
        nbr_node=(node+c_node)->a_node[j-1];
        (clagnode+i)->a_clagnode[j-1]=(node+nbr_node)->clagnode;
        if((clagnode+(node+nbr_node)->clagnode)->class=='p') (clagnode+(node+nbr_node)->clagnode)->b_clagnode[0]=i;
    }

    for(j=1;j<=((node+c_node)->below);j++) {
        nbr_node=(node+c_node)->b_node[j-1];
        (clagnode+i)->b_clagnode[j-1]=(node+nbr_node)->clagnode;
        if((clagnode+(node+nbr_node)->clagnode)->class=='p') (clagnode+(node+nbr_node)->clagnode)->a_clagnode[0]=i;
    }
}

free((char*) (PATH+1) );
free((char*)ST);
}

#undef JOUT
/* push a node into stack */
void pushval(stack,node,p_node)
int node,p_node;
struct stack **stack;
{
    struct stack *np;
    np=(struct stack *) malloc(sizeof(struct stack));
    if( np == (struct stack *) 0 ) printf("malloc fail in pushval()\n");
    np->node = node;
    np->p_node = p_node;
    /* push operation */
    np->next = *stack;
    *stack = np;
}

/* pop a node from stack */
void popval(stack,node,p_node)
int *node, *p_node;
struct stack **stack;
{
    struct stack *out;
    /* pop operation */
    if( (out=*stack) == (struct stack *) 0 ) printf("STACK empty\n");
    else *stack = out->next;
    out->next = (struct stack *)0;
    /* get the value */
    *node = out->node;
    *p_node = out->p_node;
    free((char*)out);
}
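
The stack used by clag holds (node, parent node) pairs so that a traversal can resume from a branch and know which neighbour it came from. Below is a minimal ANSI C sketch of the same push/pop pair. The names `stack_item`, `stack_push` and `stack_pop` are hypothetical, and the error returns are an assumption added here: the listing's pushval and popval print a message and then continue.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical ANSI C counterpart of struct stack in ftr.h. */
struct stack_item {
    struct stack_item *next;
    int node;    /* node to visit */
    int p_node;  /* node it was reached from */
};

/* Push (node, p_node); returns 0 on success, -1 if malloc fails. */
int stack_push(struct stack_item **top, int node, int p_node)
{
    struct stack_item *np;

    assert(top != NULL);
    np = malloc(sizeof *np);
    if (np == NULL) return -1;
    np->node = node;
    np->p_node = p_node;
    np->next = *top;
    *top = np;
    return 0;
}

/* Pop into *node and *p_node; returns 0 on success, -1 if the stack is empty. */
int stack_pop(struct stack_item **top, int *node, int *p_node)
{
    struct stack_item *out;

    assert(top != NULL && node != NULL && p_node != NULL);
    out = *top;
    if (out == NULL) return -1;
    *top = out->next;
    *node = out->node;
    *p_node = out->p_node;
    free(out);
    return 0;
}
```

Returning a status lets a caller stop cleanly on an empty stack instead of dereferencing a null pointer.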

/****************************************************************/
/*                                                              */
/* CLUSTER.C                                                    */
/*                                                              */
/****************************************************************/

/*
 * cluster: find cluster centers for a set of feature vectors
 * usage: cluster [-i file.fv] [-s seed.fv] [-c file.cv]
 * usage: cluster [-q file.sv]
 */

#include <cluster.h>
#define K 32 /* max no of clusters */
#define S 10 /* initial no. of seeds if seed file is unavailable*/

main(argc, argv)
int argc;
char *argv[];
{
register int i, j, jj, k, n;
int nfeat = 0; /* no. of features for clustering */
int nclust = 0; /* no. of active clusters */
int old_nclust;
FILE *f1, *f2, *f3, *f4;
static FEAT feat[2200], old_feat[2200], *fpt, *fpt1;
static CLUST clust[K], old_clust[K], new_cl, *new_clust, *cpt, *cpt1;
char file1[40], file2[40], file3[40], file4[40];
short cent = 0, quantize = 0, seed = 0, go_end = 0;
short change, mid, bigclust, minclust;
float gindex, old_gindex = 0, mean_var;
float mval, mdist, fdist(), cdist(), dum1;
float *data, adev;
void pr_clust(), rs_ex(), quant();

sprintf(file2,"temp.cv"); /* default file */
sprintf(file3,"temp.sv"); /* default file */
while(argc-- > 1) if(argv[argc][0] == '-') switch(argv[argc][1]) {
case 'u':
    printf("usage: cluster [-i file.fv] [-s seed.fv] [-c file.cv] [-q file.sv] \n");
    exit(1);
case 'i': /* input file with feature vectors */
    sprintf(file1,"%s",argv[argc+1]);
    cent = 1;
    break;
case 's': /* input file with cluster seeds */
    sprintf(file4,"%s",argv[argc+1]);
    seed = 1;
    break;
case 'c': /* output/input file with cluster centers */
    sprintf(file2,"%s",argv[argc+1]);
    break;
case 'q': /* output file with training vectors */
    sprintf(file3,"%s",argv[argc+1]);
    quantize = 1;
    break;
default:
    break;
}
fpt = feat;
cpt = clust;
if (!quantize) {

if ((f1 = fopen (file1, "r") ) == (FILE *) NULL) {
    fprintf(stderr, "*** clust.fv file cannot be opened\n");
    exit(1);
}

/* Convert the label to index and increment corresponding entries in the segstat structure */

/* Read total no. of features */
/* fscanf(f1, "%d", &nfeat); */
/* clust.v file format: [serial label x y r t d] */
fpt = feat, i = 0;

while (fscanf(f1,"%d %s %f %f %f %f %f", &fpt->num, fpt->lab, &fpt->x, &fpt->y, &fpt->px, &fpt->py, &fpt->d) != EOF) {
    if (*(fpt->lab+2) != '*') {fpt++, i++;}
}

nfeat = i;
minclust = MAX(2, nfeat*2/100);
#ifdef DBUG
fprintf(stderr, "No. of entries in file.fv read = %d\n", nfeat);
#endif
/* If seed file is available, initialize the cluster points and set nclust (no. of clusters), else it is 0 */
if (seed) {
    if ((f4 = fopen (file4, "r") ) == (FILE *) NULL) {
        fprintf(stderr, "*** clust.v file cannot be opened\n");
        exit(1);
    }

    /* fscanf(f4, "%d", &nclust); Read total no. of seed clusters */
    /* seed.v file format: [serial label x y r t d] */
    cpt = clust, i = 0;

    while (fscanf(f4,"%*s %*s %f %f %f %f %f", &cpt->x, &cpt->y, &cpt->px, &cpt->py, &cpt->d) != EOF) {
        cpt++, i++;
    }
    old_nclust = nclust = i;
}
else /* assume the first S entries as seeds */
{
    old_nclust = nclust = MIN(nfeat, S);

    for (i = 0, cpt = clust, fpt = feat; i < nclust; i++, cpt++, fpt++) {
        cpt->x = fpt->x;
        cpt->y = fpt->y;
        cpt->px = fpt->px;
        cpt->py = fpt->py;
        cpt->d = fpt->d;
    }
}
/* Assign all points to the clusters, find cluster centroids, reiterate. Exit when no reassignment */
new_clust = &new_cl;
new_clust->x = 0, new_clust->y = 0;
new_clust->px = 0;
new_clust->py = 0;
new_clust->d = 0;
n = 0; /* no. of new cluster allocations */
while(1) {
for (k=0, change=1; (change && (k < 25)); k++) {
    /* Assign features to clusters */
    for (i=0, fpt=feat, mid=0, change=0; i < nfeat; i++, fpt++) {
        for (j=0, cpt1=clust, mval=MEG; j < nclust; j++, cpt1++) {
            dum1 = fdist(fpt, cpt1);
            if (dum1 < mval) {mval = dum1, mid = j;}
        }

        /* Store cluster info. in feature */
        if (fpt->c_id != mid) change = 1;
        fpt->c_id = mid;
        fpt->c_dist = mval;
    }

    /* Store cumulative distances in the cluster */
    /* and find centroids */
    for (j = 0, cpt = clust; j < nclust; j++, cpt++) {
        cpt->f_sum = 0;
        cpt->x = 0;
        cpt->y = 0;
        cpt->px = 0;
        cpt->py = 0;
        cpt->d = 0;
        cpt->var = 0;
    }

    for (i=0, fpt=feat, mval = MEG, mid=0; i < nfeat;
         i++, fpt++) {
        cpt = clust + fpt->c_id;
        (cpt->f_sum)++;
        cpt->x += fpt->x;
        cpt->y += fpt->y;
        cpt->px += fpt->px;
        cpt->py += fpt->py;
        cpt->d += fpt->d;
    }

    /* Find centroid of clusters */
    for (j = 0, cpt = clust; j < nclust; j++, cpt++) {
        cpt->x = cpt->x / cpt->f_sum;
        cpt->y = cpt->y / cpt->f_sum;
        cpt->px = cpt->px / cpt->f_sum;
        cpt->py = cpt->py / cpt->f_sum;
        cpt->d = cpt->d / cpt->f_sum;
    }

    /*** #ifdef DBUG
    if (nclust == 11) for (i=0, cpt=clust+10, fpt=feat;
    i<nfeat; fpt++, i++) if (fpt->c_id == 0) fprintf(stderr, "**** For Debug Only (Cluster 11) ****\nfeat[%d]....
    #endif **/
    /* Remove any cluster with less than 2% population, */
    /***
    for (j = 0, cpt = clust; j < nclust; j++, cpt++) if (cpt->f_sum < minclust) {

    for (i = j; i < nclust; i++) *(clust+i) = *(clust+i+1);

    nclust--, j--;
    change = 1;
    }
    ***/
} /* End of k-loop */
/* Find variance of clusters */
for (i=0, fpt=feat; i < nfeat; i++, fpt++) {
    cpt = clust + fpt->c_id;
    cpt->var += fpt->c_dist;
}

for (j = 0, mean_var = 0; j < nclust; j++) {
    cpt = clust + j;
    cpt->var = cpt->var / cpt->f_sum;
    if (cpt->var < 0.01) continue;

    mean_var += cpt->var;
    /***
    if (cpt->var < EPS) {
    #ifdef DBUG
    fprintf(stderr, "Var = %.3f too low for Cluster %d\n", cpt->var, j);
    #endif
    go_end = 1;
    }
    **/
}
mean_var = mean_var / nclust;
data = (float *) calloc(nclust, sizeof (float));
for (j = 0; j < nclust; j++) {
    cpt = clust + j;
    if (cpt->var < 0.01) continue;
    *(data + j) = cpt->var / mean_var;
}

if (nclust >= 2) find_sigma(data, nclust, &adev);
free(data);
#ifdef DBUG
fprintf(stderr, "Deviation of (Var/Mean_var) = %.3f\n", adev);
#endif
if (go_end) {
    if (n) {
        gindex = old_gindex;
        nclust = old_nclust;
        rs_ex(clust, old_clust, feat, old_feat, nclust, nfeat);
    }
    break;
}
/* Find compactness index for each cluster (weighted av.
   sq. distance to another cluster / variance of the cluster) */
for (j = 0, cpt = clust; j < nclust; j++, cpt++) {
    for (jj=0, mval=0, cpt1=clust; jj < nclust; jj++, cpt1++)
        if (jj != j) {mval += ((cdist(cpt1, cpt)) * cpt1->f_sum);}
    cpt->cindex = (mval / (nfeat-cpt->f_sum)) / cpt->var;
}

/* If min Cindex exceeds 15, exit */

for (j = 0, cpt = clust, mval = MEG; j < nclust; j++, cpt++) if (cpt->cindex < mval) mval = cpt->cindex;
if ((mval > 15) && (adev < 0.4)) {
#ifdef DBUG
    fprintf(stderr, "All Cindex exceed 15\n");
#endif
#ifdef DBUG
    fprintf(stderr, "Deviation of (Var/Mean_var) is satisfactory\n");
#endif
    if (n) break;
}

/* Find the wt. av. cindex for all clusters and compare with the previous iteration (for nclust-1). If worse, return to previous configuration, else grow until nclust = K is reached */
for (j = 0, cpt = clust, gindex = 0; j < nclust; j++, cpt++) {
    if (cpt->var < 0.01) continue;
    gindex += (cpt->cindex * cpt->f_sum);
}

gindex = gindex / nfeat;
if (gindex <= old_gindex) {
#ifdef DBUG
    fprintf(stderr, "New Gindex %.3f < Old Gindex %.3f\n", gindex, old_gindex);
#endif
    go_end = 1;
}
if (go_end) {
    if (n) {
        gindex = old_gindex;
        nclust = old_nclust;
        rs_ex(clust, old_clust, feat, old_feat, nclust, nfeat);
    }
    break;
}
/* Find min. dist to another cluster, must exceed 4(var) */
/*
for (j = 0, cpt = clust; j < nclust; j++, cpt++) {
    for (jj=0, mval=MEG, cpt1=clust; jj < nclust; jj++, cpt1++) if ((jj != j) && (mdist = cdist(cpt1, cpt)) < mval) mval = mdist;
    if ((mval / cpt->var) < 4) {
#ifdef DBUG
        fprintf(stderr, "Dist. of Clust %d to Clust %d is %.3f (less than 4)\
#endif
        go_end = 1;
    }
}
if (go_end) {
    if (n) {
        gindex = old_gindex;
        nclust = old_nclust;
        rs_ex(clust, old_clust, feat, old_feat, nclust, nfeat);
    }
    break;
}
*/
/* Find min cindex, quit if greater than 0.5 * gindex */

for (j = 0, cpt = clust, mval = MEG; j < nclust; j++, cpt++) {
    if (cpt->var < 0.01) continue;
    if (mval > cpt->cindex) {
        mval = cpt->cindex;
        mid = j;
    }
}

if (mval > (gindex * 0.5)) {
#ifdef DBUG
    fprintf(stderr, "Min. Cindex = %.3f for Cluster %d ( > 0.5*Gindex = %.3f)\n", mval, mid, gindex);
#endif
    break;
}

if (nclust == K) {
#ifdef DBUG
    fprintf(stderr, "Cluster size is %d (max)\n", K);
#endif
    break;
}
#ifdef DBUG
fprintf(stderr, "No. of clust = %d, Iter = %d, Mean Var = %.3f, Gindex = %f\n\n", nclust, k, mean_var, gindex);
/***
pr_clust(feat, clust, nfeat, nclust);
**/
fprintf(stderr, "C_id Var Var/Mvar Cindex Center \n");
for (i = 0, cpt = clust; i < nclust; i++, cpt++) fprintf(stderr, "%d %.3f %.3f %.3f %.3f %.3f %.3f %.3f %.3f\n", i, cpt->var, *(data+i), cpt->cindex, cpt->x, cpt->y, cpt->px, cpt->py, cpt->d);
fprintf(stderr, "------------------------\n");
#endif
/* Save the current configuration */

old_gindex = gindex;
old_nclust = nclust;
for (j=0, cpt=clust, cpt1=old_clust; j<nclust; j++, cpt++, cpt1++) *cpt1 = *cpt;
for (i=0, fpt=feat, fpt1=old_feat; i<nfeat; i++, fpt++, fpt1++) *fpt1 = *fpt;
/* Find the cluster with min. cindex */
for (j = 0, cpt = clust, mval = MEG; j < nclust; j++, cpt++) if (cpt->cindex < mval) {
    mval = cpt->cindex, mid = j;
}

bigclust = mid;
/* Find the feature at the farthest dist in min. Cindex cluster */
for (i=0, fpt=feat, mval = EPS; i < nfeat; i++, fpt++) {
    if ((fpt->c_id == bigclust) && (fpt->c_dist > mval)) {
        mid = i;
        mval = fpt->c_dist;
    }
}
#ifdef DBUG
fprintf(stderr, "Worst Center ID = %d, Farthest Feature ID = %d\n", bigclust, mid);
#endif
/* If this feature was the new center at the previous iteration, exit */
/****
if (old_nclust == nclust) fpt = feat + mid;
if ((new_clust->x == fpt->x) && (new_clust->y == fpt->y) &&
    (new_clust->px == fpt->px) && (new_clust->py == fpt->py) &&
    (new_clust->d == fpt->d)) {
#ifdef DBUG
fprintf(stderr, "Cluster Centers same as in prev. iter.\n");
#endif
break;
}
**/
/* Save the current configuration */
old_gindex = gindex;
old_nclust = nclust;
for (j=0, cpt=clust, cpt1=old_clust; j<nclust; j++, cpt++, cpt1++) *cpt1 = *cpt;
for (i=0, fpt=feat, fpt1=old_feat; i<nfeat; i++, fpt++, fpt1++) *fpt1 = *fpt;
/* Use this feature as center for the new cluster */

n++;
nclust++;
cpt = clust + nclust - 1;
fpt = feat + mid;
new_clust->x = cpt->x = fpt->x;
new_clust->y = cpt->y = fpt->y;
new_clust->px = cpt->px = fpt->px;
new_clust->py = cpt->py = fpt->py;
new_clust->d = cpt->d = fpt->d;
} /* End of while(1) loop */
#ifdef DBUG
fprintf(stderr, "FINAL RESULTS:\n");
fprintf(stderr, "No. of clust = %d, Iter = %d, Gindex = %f\n", nclust, k, gindex);
pr_clust(feat, clust, nfeat, nclust);
#endif
if ((f2 = fopen (file2, "w")) == (FILE *) NULL) {
    fprintf(stderr, "*** seed.cv file cannot be opened\n");
    exit(1);
}

fprintf(f2,"%d\n", nclust);
for (i = 0, cpt = clust; i < nclust; i++, cpt++) fprintf(f2, "%f %f %f %f %f\n", cpt->x, cpt->y, cpt->px, cpt->py, cpt->d);

fclose(f1);
fclose(f2);
if (seed) fclose(f4);
} /* End of if(!quantize) block */
/* Merge current clusters two at a time and check if it reduces the compactness measure....... */

/* The following part converts a continuous vector to a bit vector for representing a segment */
else {
    /* Open file.fv file */
    if ((f1 = fopen (file1, "r")) == (FILE *) NULL) {
        fprintf(stderr, "*** file.fv cannot be opened\n");
        exit(1);
    }

    /* Open file.cv file */
    if ((f2 = fopen (file2, "r")) == (FILE *) NULL) {
        fprintf(stderr, "*** file.cv cannot be opened\n");
        exit(1);
    }

    /* Open file.sv file */
    if ((f3 = fopen (file3, "w")) == (FILE *) NULL) {
        fprintf(stderr, "*** file.sv file cannot be opened\n");
        exit(1);
    }

    /* Call quantizer program to convert a set of continuous feature vectors for a segment to a binary bit vector */
    quant(f1, f2, f3);
    fclose(f1);
    fclose(f2);
    fclose(f3);
}
}
/****************************************/
/* Cluster center to feature distance */

float fdist(fpt, cpt)
FEAT *fpt;
CLUST *cpt;
{
    return(SQR(fpt->x - cpt->x) + SQR(fpt->y - cpt->y) +
           SQR(fpt->px - cpt->px) + SQR(fpt->py - cpt->py) +
           SQR(fpt->d - cpt->d));
}

/****************************************/
/* Cluster center to cluster center distance */

float cdist(cpt1, cpt)
CLUST *cpt, *cpt1;
{
    return(SQR(cpt1->x - cpt->x) + SQR(cpt1->y - cpt->y) +
           SQR(cpt1->px - cpt->px) + SQR(cpt1->py - cpt->py) +
           SQR(cpt1->d - cpt->d));
}
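
fdist and cdist both return a squared Euclidean distance over the five feature parameters (x, y, px, py, d); the clustering loop uses fdist to assign each feature to its nearest center. A minimal sketch of that assignment step, assuming features and centers are stored as plain arrays of five doubles rather than FEAT and CLUST structures. The names `NDIM`, `sq_dist` and `nearest_center` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NDIM 5  /* x, y, px, py, d */

/* Squared Euclidean distance between two NDIM-vectors (no square root,
 * matching fdist and cdist). */
double sq_dist(const double *a, const double *b)
{
    double s = 0.0;
    size_t i;

    for (i = 0; i < NDIM; i++) {
        double t = a[i] - b[i];
        s += t * t;
    }
    return s;
}

/* Index of the center closest to feat. centers holds ncenters consecutive
 * NDIM-vectors; ties go to the lowest index. If dist_out is non-NULL it
 * receives the winning squared distance. */
size_t nearest_center(const double *feat, const double *centers,
                      size_t ncenters, double *dist_out)
{
    size_t j, best = 0;
    double best_d;

    assert(feat != NULL && centers != NULL && ncenters > 0);
    best_d = sq_dist(feat, centers);
    for (j = 1; j < ncenters; j++) {
        double d = sq_dist(feat, centers + j * NDIM);
        if (d < best_d) { best_d = d; best = j; }
    }
    if (dist_out != NULL) *dist_out = best_d;
    return best;
}
```

Skipping the square root is safe for this purpose because it does not change which center is closest.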

/****************************************/
/* Print cluster members */

void pr_clust(feat, clust, nfeat, nclust)
FEAT *feat;
CLUST *clust;
short nfeat, nclust;
{
    FEAT *fpt;
    CLUST *cpt;
    short i, j;
    fprintf(stderr, "C_id Var Cindex Center \n");
    for (i = 0, cpt = clust; i < nclust; i++, cpt++) fprintf(stderr, "%d %.3f %.3f %.3f %.3f %.3f %.3f %.3f\n", i, cpt->var, cpt->cindex, cpt->x, cpt->y, cpt->px, cpt->py, cpt->d);
    fprintf(stderr, "------------------------\n");
    for (i = 0, cpt = clust; i < nclust; i++, cpt++) {
        fprintf(stderr, "\nCluster id = %d: Cindex = %.3f Center = %.3f %.3f %.3f %.3f %.
        i, cpt->cindex, cpt->x, cpt->y, cpt->px, cpt->py, cpt->d);
        fprintf(stderr, "F_id F_num F_lab C_dist\n");
        for (j = 0, fpt = feat; j < nfeat; j++, fpt++) if (fpt->c_id == i) fprintf(stderr, "feat[%d] %d %s %.3f\n", j, fpt->num, fpt->lab, fpt->c_dist);
        fprintf(stderr, "***************\n");
    }
}
/****************************************/
/* Restore the old configuration */

void rs_ex(clust, old_clust, feat, old_feat, nclust, nfeat)
CLUST *clust, *old_clust;
FEAT *feat, *old_feat;
int nclust, nfeat;
{
    register int i, j;
    FEAT *fpt, *fpt1;
    CLUST *cpt, *cpt1;

    for (j=0, cpt=clust, cpt1=old_clust; j<nclust; j++, cpt++, cpt1++) *cpt = *cpt1;
    for (i=0, fpt=feat, fpt1=old_feat; i<nfeat; i++, fpt++, fpt1++) *fpt = *fpt1;
}

/****************************************/
find_sigma(data, n, adev)
float *data, *adev;
int n;
{
    int j;
    float s, av;
    if (n <= 1) return (-1);
    else {
        for (j=0, s=0.0; j<n; j++) s += *(data+j);
        av = s/n;
        for (j=0, *adev=0.0; j<n; j++) *adev += SQR((*(data+j)) - av);
        *adev = sqrt(*adev/n);
        return(0);
    }
}
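
find_sigma computes the population standard deviation of its input: the mean first, then the root of the mean squared deviation, dividing by n rather than n - 1. The sketch below shows the same two-pass computation but stops at the variance, an assumption made here to keep the example free of the math library; passing the result to `sqrt` from `<math.h>` gives the value find_sigma stores in `*adev`. The name `pop_variance` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Population variance of data[0..n-1] (divides by n, as find_sigma does).
 * Two passes: the mean first, then the mean squared deviation. */
double pop_variance(const double *data, size_t n)
{
    double sum = 0.0, mean, acc = 0.0;
    size_t j;

    assert(data != NULL && n > 0);
    for (j = 0; j < n; j++) sum += data[j];
    mean = sum / (double)n;
    for (j = 0; j < n; j++) {
        double dev = data[j] - mean;
        acc += dev * dev;
    }
    return acc / (double)n;
}
```

In the clustering loop the input values are the per-cluster ratios var / mean_var, so a small deviation indicates clusters of similar spread.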

/****************************************************************/
/*                                                              */
/* CLUSTER.H                                                    */
/*                                                              */
/****************************************************************/
#include <stdio.h>
#include <math.h>
#define EPS 1.0e-6 /* Small value */
#define MEG 1.0e+6 /* Large value */

#define SQR(a_) ((a_) * (a_))
#define MIN(x_,y_) (((x_) < (y_)) ? (x_) : (y_))
#define MAX(x_,y_) (((x_) > (y_)) ? (x_) : (y_))
typedef struct {
    int num; /* feature identifier, used for display */
    char lab[4]; /* feature label */
    float x; /* feature parameters */
    float y;
    float px;
    float py;
    float d;
    int c_id; /* feature cluster id */
    float c_dist; /* dist from cluster center */
} FEAT;
typedef struct {
    float x; /* cluster parameters */
    float y;
    float px;
    float py;
    float d;
    int f_sum; /* no. of features in cluster */
    float var; /* deviation of cluster */
    float cindex; /* compactness and isolation index */
} CLUST;

dwln.c
/* given two end-points, draw a line in image plane */
#include <math.h>
void dwln(x1,y1,x2,y2,greylevel,image)
int x1,y1,x2,y2,greylevel,**image;
{
    int length,i,j;
    float x,y,xincrement,yincrement;
    length = abs(x2-x1);
    if( abs(y2-y1) > length ) length=abs(y2-y1);
    xincrement = (float)(x2-x1) / (float)length ;
    yincrement = (float)(y2-y1) / (float)length ;
    x = x1 + 0.5;
    y = y1 + 0.5;
    for(i=1;i<=length;i++) {
        image[((int)y)][((int)x)]=greylevel;
        x += xincrement;
        y += yincrement;
    }
}
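
dwln steps along the longer axis one pixel at a time and advances the other coordinate by a fractional increment, in the manner of a digital differential analyzer. A minimal sketch of the same stepping with two additions that are assumptions of this example, not part of the listing: a guard for zero-length lines and a bounds check against the image size. The name `draw_line` and the row-major `int *` image layout are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Bounds-checked variant of the dwln stepping: draws from (x1,y1) up to but
 * not including (x2,y2) into a rows x cols image stored row-major. Pixels
 * falling outside the image are skipped. Returns the number of pixels set. */
int draw_line(int x1, int y1, int x2, int y2, int grey,
              int *image, int rows, int cols)
{
    int length, i, drawn = 0;
    double x, y, xinc, yinc;

    assert(image != NULL && rows > 0 && cols > 0);
    length = abs(x2 - x1);
    if (abs(y2 - y1) > length) length = abs(y2 - y1);
    if (length == 0) return 0;          /* avoids the division by zero */
    xinc = (double)(x2 - x1) / length;
    yinc = (double)(y2 - y1) / length;
    x = x1 + 0.5;                       /* sample at pixel centres */
    y = y1 + 0.5;
    for (i = 0; i < length; i++) {
        int xi = (int)x, yi = (int)y;
        if (x >= 0.0 && y >= 0.0 && xi < cols && yi < rows) {
            image[yi * cols + xi] = grey;
            drawn++;
        }
        x += xinc;
        y += yinc;
    }
    return drawn;
}
```

Like dwln, the loop plots `length` pixels, so the end point itself is left for the next connected segment to draw.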

#include <math.h> ftr. h #include <~tdio.h>
#include <picfile sig.h>
#include <nrutil.h>
#define SYMIN 12.
#define SXMIN 7.
struct stack {
struct stack *next;
int node;
int p_node; ~;
struct queue {
struct stack *front;
struct stack *rear; };
struct vector {
    struct vector *next;
    int id;
    float width; /* average width for vertical-vector */
    int no_lag; /* total number of effective lag nodes counted for width */
    char type; /* vertical('v') or horizontal('h') vector or arc('a') or dominant diagonal vector for s/z ('s') */
    char small_v; /* the vertical vector adjacent to long run-lengths */
    float x[2]; /* x location for start and end points */
    float y[2]; /* y location for start and end points */
    float ax[3]; /* x location for start, end and the farthest points of arc */
    float ay[3]; /* y location for start, end and the farthest points of arc */
    float dst; /* In first phase, dst=200 if this is a vector in the middle part of y. In second phase, dst is the distance from the farthest point to the chord of an arc. */
    float l; /* left-most pixel after subtracting half width */
    float r; /* right-most pixel after adding half width */
    int no_clag; /* total no. of clagnodes covered by this vector */
    int clagnode[5]; /* clagnodes covered by this vector */
};
struct blob {
    struct blob *next;
    int s_node; /* starting node for this blob */
    int col_start;
    int col_end;
    int no_node; /* total no. of nodes inside this blob */
    int *node; /* lag-nodes inside this blob (start from ONE) */
    float ratio; /* height to width ratio */
};
struct node {
    int rowth;
    int col_start;
    int col_end;
    int above; /* above degree */
    int a_node[10]; /* lag-node number connected above */
    int below; /* below degree */
    int b_node[10]; /* lag-node number connected below */
    int mark;
    int clagnode; /* belong to which clagnode */
    int clagnode_p; /* probably belong to the other clagnode */
};
struct clagnode {
    char class; /* 'p' for PATH, 'j' for junction */
    char type; /* type of returned vectors (x or v) from wx_detect() */
    int number; /* total number of lag-nodes inside this clag-node */
    int *node; /* lag-node number start from ONE */
    int a_clagnode[10]; /* clag-node number connected above */
    int b_clagnode[10]; /* clag-node number connected below */
    int group; /* total no. of groups inside this clag-node */
    int no_vector; /* total no. of vectors in this clag-node */
    struct vector **vctr; /* pointers to vectors in this clagnode (start from ONE) */
};
typedef struct {
    float start;
    float end;
    char type; /* vert = 'v', hor = 'h' */
} BSEG;
void lag(), clag(), dwln(), path_analy(), merge();
void line_fit(),grylvl(), blob_extr(), blobth();
void seg2vec(), prep(), mdn();
float p2line();
int collinear(),collinear_9(),wx_detect(),y_detect(),arc_check();
int dn_change();
int find_degree();
struct vector *add_vctr(),*dlt_vctr();
void outimg(),outimg_s();
void path_s();

fe.c

/* Extract features from a character image
 * Usage: fe [-i image.pic]
 *
 * Default output file is temp.fv */
#include <ftr.h>
int Rows, Cols;

main(argc, argv)
int argc;
char *argv[];
{
    PICFILE *in1,*out2,*out3,*out4,*out5,*out6;
    FILE *out1, *out7;
    unsigned char *temp;
    int **image, **block, **dspl, **dspl_s, **bimage;
    int **clagimg, **lagimg;
    int bcol, cols, rows;
    int no_blob, col_start;
    int no_node, no_clagnode;
    int blob_ct=0, str_ct=0;
    int i,j;
    char buf1[50],buf2[50],dum1[20],dum2[30];
    struct node *node;
    struct clagnode *clagnode;
    struct vector *vctr_list, *vctr;
    struct blob *blob_list, *blob_t;
    int base;
    char *str, *charpt;
    if(argc==1) {
        printf("usage: fe [-i image.pic] [-o file.fv]\n");
        exit(1);
    }
    sprintf(buf2, "temp.fv"); /* default output file */
    while(argc-- > 1) {
        if(argv[argc][0] == '-') {
            switch(argv[argc][1]) {
            case 'i': /* input image .pic file */
                sprintf(buf1,"%s",argv[argc+1]);
                if( (in1 = picopen_r(buf1)) == 0 ) {
                    perror(buf1);
                    exit(1);
                }
                break;

            case 'o': /* output file.fv file */
                sprintf(buf2,"%s",argv[argc+1]);
                break;

            default:
                fprintf(stderr,"command line option error\n");
                fprintf(stderr, "usage: fe [-i image.pic] [-o file.fv]\n");
                break;
            }
        }
    }

    if( (out1 = fopen(buf2,"w")) == (FILE *) NULL ) {
        fprintf(stderr,"%s : cannot open for writing\n","file.fv");
        exit(1);
    }
    Cols = cols = in1->width ;
    Rows = rows = in1->height ;
    base = atoi(picgetprop("BASL", in1));
    str = picgetprop("STRING", in1);
    str_ct = strlen(str);

#ifdef DEMO
    fprintf(stderr,"\n");
#else
    printf(" %s ",str);
    if( (out7 = fopen("temp.res","w")) == (FILE *) NULL ) {
        fprintf(stderr,"%s : cannot open for writing\n","file.fv");
        exit(1);
    }

    fprintf(out7, "%s ", str);
    fclose(out7);
#endif
    /* reserve space for input character image */
    image = imatrix(0,(rows-1),0,(cols-1));
    block = imatrix(0,(rows-1),0,(cols-1));
    lagimg = imatrix(0,(rows-1),0,(cols-1));
    clagimg = imatrix(0,(rows-1),0,(cols-1));
    /* initialize background to white */
    for(i=0;i<rows;i++)
        for(j=0;j<cols;j++)
            image[i][j] = 255;
    /* read the input character image */
    temp = (unsigned char *) calloc( (unsigned)cols , sizeof(unsigned char));
    if(temp == (unsigned char *) NULL) {
        fprintf(stderr, "calloc failed for temp\n");
        exit (1);
    }
    for (i=0; i<rows; i++) {
        picread(in1, temp);
        for (j=0; j<cols; j++)
            image[i][j] = (int)temp[j];
    }
    free((char*)temp);
    /* preprocess the input image by median filter and a mask */
    prep(image,rows,cols);
#ifdef DEMO
    /* output the processed image */
    sprintf(dum1,"prep.pic");
    if( (out6 = picopen_w(dum1,PIC_SAMEARGS(in1))) == 0 ) {
        perror(dum1);
        exit(1);
    }
    grylvl(image, rows, cols, out6);
    /* enlarge the input image in order to display vectors and segmentation */
    dspl = imatrix(0,(10*rows-1),0,(10*cols-1));
    for(i=0;i<10*rows;i++)
        for(j=0;j<10*cols;j++)
            dspl[i][j] = 255;
    dspl_s = imatrix(0,(10*rows-1),0,(10*cols-1));
    for(i=0;i<10*rows;i++)
        for(j=0;j<10*cols;j++)
            dspl_s[i][j] = 255;
#endif
    node = (struct node *) calloc((unsigned)(cols*rows), sizeof(struct node));
    if(node == (struct node *) NULL) {
        fprintf(stderr, "calloc failed for node\n");
        exit (1);
    }
    node -= 1; /* let node number start from ONE */
    lag(image, rows, cols, node, &no_node);
    for(i=1;i<=no_node;i++)
        for(j=((node+i)->col_start);j<=((node+i)->col_end);j++)
            lagimg[(node+i)->rowth][j] = i;
    blob_extr(rows,cols,node,no_node,&blob_list,&no_blob,clagimg);
    charpt = str;
    blob_t = blob_list;
    while( blob_t->next != (struct blob *) NULL ) {
        ++blob_ct;
#ifdef DEMO
        printf("\n ");
        printf("--- WORKING ON BLOB %d ---\n", blob_ct );
        printf("\n ");
#endif
        blobth(&blob_t,node,rows,&bimage,&bcol,&col_start,&vctr_list,
               &clagnode,&no_clagnode,dspl);

        for(i=0;i<rows;i++)
            for(j=0;j<bcol;j++)
                if(bimage[i][j]==0) block[i][j+col_start] = blob_ct;

#ifdef DEMO
        sprintf(dum1,"vctr.pic");
        outimg(dspl,vctr_list,col_start,in1,dum1);
#endif
        /* blob_t advanced in blobth() */

        blob2feat(bimage, bcol, vctr_list, *charpt, base, out1, dspl_s, dspl,
                  col_start,in1,lagimg,clagimg);

        for(i=1;i<=no_clagnode;i++)
            free((char*)((clagnode+i)->node+1));
        for(i=1;i<=no_clagnode;i++)
            if((clagnode+i)->no_vector!=0) free((char*)((clagnode+i)->vctr+1));
        free((char*)(clagnode+1));
        while( vctr_list->next != (struct vector *)NULL ) {
            vctr=vctr_list;
            vctr_list = vctr_list->next;
            free(vctr);
        }
        free(vctr_list);
        free_imatrix(bimage,0,(rows-1),0,(bcol-1));
        charpt++;
    }
    fclose(out1);
#if defined (DEMO) && defined (SKUO)
    for(i=0;i<rows;i++) dwln(0,(i*10),(cols*10-1),(i*10),220,dspl);
    for(j=0;j<cols;j++) dwln((j*10),0,(j*10),(rows*10-1),220,dspl);
#elif defined (DEMO)
    printf ("\n");
#else
    if(blob_ct != str_ct) printf(" (****T****) ");
    else printf(" ");
#endif
#ifdef DEMO
    /* output the vector image */
    sprintf(dum1,"vctrs.pic");
    if((out3 = picopen_w(dum1,in1->type,0,0,Cols*10,Rows*10,
                         in1->chan,argv,in1->cmap)) == 0 ) {
        perror(dum1);
        exit(1);
    }

    grylvl(dspl_s,(out3->height),(out3->width),out3);
    sprintf(dum1,"vctr.pic");
    if((out2 = picopen_w(dum1,in1->type,0,0,Cols*10,Rows*10,
                         "rgb",argv,in1->cmap)) == 0 ) {
        perror(dum1);
        exit(1);
    }

    grylvl(dspl,(out2->height),(out2->width),out2);
    /* output the blob image */
    sprintf(dum1,"blob.pic");
    if( (out4 = picopen_w(dum1,PIC_SAMEARGS(in1))) == 0 ) {
        perror(dum1);
        exit(1);
    }
    grylvl(block, rows, cols, out4);
    /* output the lag image */
    sprintf(dum1,"lag.pic");
    if( (out5 = picopen_w(dum1,PIC_SAMEARGS(in1))) == 0 ) {
        perror(dum1);
        exit(1);
    }
    grylvl(lagimg, rows, cols, out5);
    /* output the clag image */
    sprintf(dum1,"clag.pic");
    if( (out6 = picopen_w(dum1,PIC_SAMEARGS(in1))) == 0 ) {
        perror(dum1);
        exit(1);
    }
    grylvl(clagimg, rows, cols, out6);
#endif
    free_imatrix(block,0,(rows-1),0,(cols-1));
    free_imatrix(lagimg,0,(rows-1),0,(cols-1));
    free_imatrix(clagimg,0,(rows-1),0,(cols-1));
    free_imatrix(image,0,(rows-1),0,(cols-1));
    free((char*)(node+1));
    while( blob_list->next != (struct blob *) NULL ) {
        blob_t = blob_list;
        blob_list = blob_list->next;
        free(blob_t);
    }
    free(blob_list);
#ifdef DEMO
    free_imatrix(dspl,0,(10*rows-1),0,(10*cols-1));
    free_imatrix(dspl_s,0,(10*rows-1),0,(10*cols-1));
#endif
}

void outimg(dspl,vctr_list,col_start,in,dum)
int **dspl,col_start;
struct vector *vctr_list;
PICFILE *in;
char *dum;
{
    int count,x1,x2,y1,y2;
    int i,j;
    PICFILE *out;
    char **dum1 = (char **) NULL;
    if((out = picopen_w(dum,in->type,0,0,Cols*10,Rows*10,
                        in->chan,dum1,in->cmap)) == 0 ) {
        perror(dum);
        exit(1);
    }
    count = 0;
    while( (vctr_list->next) != NULL ) {
        if(vctr_list->type != 'a') {
            x1=(int)(10*(vctr_list->x[0]+col_start));
            x2=(int)(10*(vctr_list->x[1]+col_start));
            y1=(int)(10*(vctr_list->y[0]));
            y2=(int)(10*(vctr_list->y[1]));
            dwln(x1,y1,x2,y2,220,dspl);
            for(i=(y1-1);i<=(y1+1);i++)
                for(j=(x1-1);j<=(x1+1);j++) dspl[i][j]=220;
            for(i=(y2-1);i<=(y2+1);i++)
                for(j=(x2-1);j<=(x2+1);j++) dspl[i][j]=220;
            /* make the line fatter */
            if(y1 == y2) {
                dwln(x1,(y1+1),x2,(y2+1),220,dspl);
                dwln(x1,(y1-1),x2,(y2-1),220,dspl);
            }
            else {
                dwln((x1+1),y1,(x2+1),y2,220,dspl);
                dwln((x1-1),y1,(x2-1),y2,220,dspl);
            }
        }
        else {
            x1=(int)(10*(vctr_list->ax[0]+col_start));
            x2=(int)(10*(vctr_list->ax[1]+col_start));
            y1=(int)(10*(vctr_list->ay[0]));
            y2=(int)(10*(vctr_list->ay[1]));
            dwln(x1,y1,x2,y2,180,dspl);
            dwln((x1+1),y1,(x2+1),y2,180,dspl);
            dwln((x1-1),y1,(x2-1),y2,180,dspl);
            for(i=(y1-1);i<=(y1+1);i++)
                for(j=(x1-1);j<=(x1+1);j++) dspl[i][j]=170;
            for(i=(y2-1);i<=(y2+1);i++)
                for(j=(x2-1);j<=(x2+1);j++) dspl[i][j]=170;
            x2=(int)(10*(vctr_list->ax[2]+col_start));
            y2=(int)(10*(vctr_list->ay[2]));
            dwln(x1,y1,x2,y2,180,dspl);
            dwln((x1+1),y1,(x2+1),y2,180,dspl);
            dwln((x1-1),y1,(x2-1),y2,180,dspl);
            for(i=(y2-1);i<=(y2+1);i++)
                for(j=(x2-1);j<=(x2+1);j++) dspl[i][j]=170;
            x1=(int)(10*(vctr_list->ax[1]+col_start));
            y1=(int)(10*(vctr_list->ay[1]));
            dwln(x1,y1,x2,y2,180,dspl);
            dwln((x1+1),y1,(x2+1),y2,180,dspl);
            dwln((x1-1),y1,(x2-1),y2,180,dspl);
        }

        vctr_list = (vctr_list)->next;
    }

    /* grylvl(dspl,(out->height),(out->width),out);*/
    picclose(out);
}

void outimg_s(dspl,vctr_list,col_start)
int **dspl,col_start;
struct vector *vctr_list;
{
    int count,x1,x2,y1,y2;
    int i,j;
    if(vctr_list->type != 'a') {
        x1=(int)(10*(vctr_list->x[0]+col_start));
        x2=(int)(10*(vctr_list->x[1]+col_start));
        y1=(int)(10*(vctr_list->y[0]));
        y2=(int)(10*(vctr_list->y[1]));
        dwln(x1,y1,x2,y2,220,dspl);
        for(i=(y1-1);i<=(y1+1);i++)
            for(j=(x1-1);j<=(x1+1);j++) dspl[i][j]=220;
        for(i=(y2-1);i<=(y2+1);i++)
            for(j=(x2-1);j<=(x2+1);j++) dspl[i][j]=220;
        /* make the line fatter */
        if(y1==y2) {
            dwln(x1,(y1+1),x2,(y2+1),220,dspl);
            dwln(x1,(y1-1),x2,(y2-1),220,dspl);
        }
        else {
            dwln((x1+1),y1,(x2+1),y2,220,dspl);
            dwln((x1-1),y1,(x2-1),y2,220,dspl);
        }
    }
    else {
        x1=(int)(10*(vctr_list->ax[0]+col_start));
        x2=(int)(10*(vctr_list->ax[1]+col_start));
        y1=(int)(10*(vctr_list->ay[0]));
        y2=(int)(10*(vctr_list->ay[1]));
        dwln(x1,y1,x2,y2,180,dspl);
        dwln((x1+1),y1,(x2+1),y2,180,dspl);
        dwln((x1-1),y1,(x2-1),y2,180,dspl);
        for(i=(y1-1);i<=(y1+1);i++)
            for(j=(x1-1);j<=(x1+1);j++) dspl[i][j]=170;
        for(i=(y2-1);i<=(y2+1);i++)
            for(j=(x2-1);j<=(x2+1);j++) dspl[i][j]=170;
        x2=(int)(10*(vctr_list->ax[2]+col_start));
        y2=(int)(10*(vctr_list->ay[2]));
        dwln(x1,y1,x2,y2,180,dspl);
        dwln((x1+1),y1,(x2+1),y2,180,dspl);
        dwln((x1-1),y1,(x2-1),y2,180,dspl);
        for(i=(y2-1);i<=(y2+1);i++)
            for(j=(x2-1);j<=(x2+1);j++) dspl[i][j]=170;
        x1=(int)(10*(vctr_list->ax[1]+col_start));
        y1=(int)(10*(vctr_list->ay[1]));
        dwln(x1,y1,x2,y2,180,dspl);
        dwln((x1+1),y1,(x2+1),y2,180,dspl);
        dwln((x1-1),y1,(x2-1),y2,180,dspl);
    }
}

grylvl.c

/* rescale the grey level for displaying */
#include <math.h>
#include <stdio.h>
#include <picfile_sig.h>
void grylvl(image, rows, cols, outpt)
int **image,rows,cols;
PICFILE *outpt;
{
    float TLO = 999999.0;
    float THI = -999999.0;
    float SLOPE;
    unsigned char *temp ;
    int i,j;
    temp = (unsigned char *)calloc((unsigned)cols,sizeof(unsigned char));
    if(temp == (unsigned char *) NULL) {
        fprintf(stderr, "calloc failed in grylvl()\n");
        exit (1);
    }

    for (i=0; i<rows; i++)
        for (j=0; j<cols; j++) {
            if ( (float)image[i][j] > THI ) THI=(float)image[i][j];
            if ( (float)image[i][j] < TLO ) TLO=(float)image[i][j];
        }

#ifdef SKUO
    printf("Extreme value of image: min=%f max=%f \n", TLO, THI );
#endif
    SLOPE = 255.0/(THI-TLO);
    for (i=0; i<rows; i++) {
        for (j=0; j<cols; j++)
            temp[j] = (unsigned char)((short int)(SLOPE*(image[i][j]-TLO)));
        picwrite(outpt, temp);
    }

    free((char*)temp);
    picclose(outpt);
}
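The grylvl() routine above stretches the integer values of an image linearly onto the displayable range 0..255 using the observed minimum and maximum. The sketch below applies the same mapping to a one-dimensional buffer. The name rescale_grey is hypothetical, and the handling of a flat image (maximum equal to minimum) is an added assumption; the listing itself does not guard that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: linearly map in[0..n-1] onto 0..255. */
void rescale_grey(const int *in, unsigned char *out, size_t n)
{
    int lo, hi;
    double slope;
    size_t j;

    assert(in != NULL && out != NULL && n > 0);
    lo = hi = in[0];
    for (j = 1; j < n; j++) {
        if (in[j] < lo) lo = in[j];
        if (in[j] > hi) hi = in[j];
    }
    /* Assumption: a flat image maps to 0 instead of dividing by zero. */
    slope = (hi > lo) ? 255.0 / (double)(hi - lo) : 0.0;
    for (j = 0; j < n; j++)
        out[j] = (unsigned char)(slope * (double)(in[j] - lo));
}
```

As in the listing, the scaled value is truncated rather than rounded, so the midpoint of the input range maps to 127.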

lag.c

/* find the line adjacency graph (LAG) of image[0,rows-1][0,cols-1].
   node is the output, no_node is the total number of nodes */
#include <ftr.h>

void lag(image, rows, cols, node, no_node)
int **image, rows, cols, *no_node;
struct node *node;
{
    int find=0, row, count;
    int i,j;
    /* find start and end point for each node */
    *no_node = 1;
    for(i=0;i<rows;i++)
        for(j=0;j<cols;j++) {
            if(image[i][j] == 0 && find == 0) {
                (node + *no_node)->rowth = i;
                (node + *no_node)->col_start = j;
                find = 1 ;
            }
            if( image[i][j]==255 && find == 1) {
                (node + *no_node)->col_end = (j-1);
                find = 0 ;
                *no_node += 1;
            }
            if( j==(cols-1) && find == 1) {
                (node + *no_node)->col_end = j ;
                find = 0;
                *no_node += 1;
            }
        }
    *no_node -= 1;
    /* find degree for each node */
    for(i=1; i<=*no_node; i++) {
        row = (node+i)->rowth ;
        if(row == 0) (node+i)->above=0;
        else {
            find=0;
            count=0;
            for(j=((node+i)->col_start);j<=((node+i)->col_end);j++) {
                if( image[row-1][j] == 0 && find==0) {
                    find=1;
                    count += 1;
                }
                if( image[row-1][j] == 255 && find==1) find=0;
            }
            if(count>10) printf("Not enough space for a_node in struct node\n");
            (node+i)->above=count;
        }

        if(row == (rows-1)) (node+i)->below=0;
        else {
            find=0;
            count=0;
            for(j=((node+i)->col_start);j<=((node+i)->col_end);j++) {
                if( image[row+1][j]==0 && find==0) {
                    find=1;
                    count += 1;
                }
                if( image[row+1][j]==255 && find==1) find=0;
            }
            if(count>10) printf("Not enough space for b_node in struct node\n");
            (node+i)->below=count;
        }
    }

    /* connect adjacent nodes */
    for(i=1; i<=*no_node; i++) {
        row = (node+i)->rowth ;
        if( (node+i)->above > 0 ) {
            count=0;
            for(j=(i-1); j>0; j--) {
                if( (node+j)->rowth < (row-1) ) break;
                if( (node+j)->rowth == (row-1) &&
                    (node+j)->col_start <= (node+i)->col_end &&
                    (node+j)->col_end >= (node+i)->col_start ) {
                    (node+i)->a_node[count] = j;
                    count += 1;
                }
            }
            if(count != (node+i)->above) printf("error in a_degree of node %d\n",i);
        }
        if( (node+i)->below > 0 ) {
            count=0;
            for(j=(i+1); j<=*no_node; j++) {
                if( (node+j)->rowth > (row+1) ) break;
                if( (node+j)->rowth == (row+1) &&
                    (node+j)->col_start <= (node+i)->col_end &&
                    (node+j)->col_end >= (node+i)->col_start ) {
                    (node+i)->b_node[count] = j;
                    count += 1;
                }
            }
            if(count != (node+i)->below) printf("error in b_degree of node %d\n",i);
        }
    }
}
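The lag() routine above builds a line adjacency graph: every horizontal run of black pixels becomes a node, and two nodes are connected when they sit on neighboring rows and their column ranges overlap. The sketch below isolates those two steps for a single row and a single pair of runs. The type Run and the functions find_runs and runs_touch are hypothetical names, and the sketch treats any non-zero pixel as white, whereas the listing tests for 255 exactly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: one horizontal run of black (0) pixels. */
typedef struct {
    int row;
    int col_start;
    int col_end;
} Run;

/* Collect the black runs of one image row; returns the number found. */
size_t find_runs(const unsigned char *pixels, int cols, int row,
                 Run *runs, size_t max_runs)
{
    size_t count = 0;
    int j, in_run = 0;

    assert(pixels != NULL && runs != NULL && cols >= 0);
    for (j = 0; j < cols; j++) {
        if (pixels[j] == 0 && !in_run) {
            if (count == max_runs)
                return count;              /* output buffer is full */
            runs[count].row = row;
            runs[count].col_start = j;
            in_run = 1;
        } else if (pixels[j] != 0 && in_run) {
            runs[count].col_end = j - 1;   /* run ended on previous pixel */
            count++;
            in_run = 0;
        }
    }
    if (in_run) {                          /* run reaches the right border */
        runs[count].col_end = cols - 1;
        count++;
    }
    return count;
}

/* Two runs are adjacent when on neighboring rows with overlapping columns. */
int runs_touch(const Run *a, const Run *b)
{
    int dr = a->row - b->row;
    return (dr == 1 || dr == -1)
        && a->col_start <= b->col_end
        && a->col_end >= b->col_start;
}
```

The overlap test is the same pair of inequalities the listing uses when it fills a_node[] and b_node[].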

line_fit.c

/* fit a line to a bunch of points start from x[start] y[start]
   to x[end] y[end] */
#include <ftr.h>
#define SQR(a_) ((a_) * (a_))
int ndatat=0; /* defining declaration */
float *xt=0,*yt=0,aa=0.0,abdevt=0.0; /* defining declaration */
void line_fit(group,x, y, start, end, x1, x2, start_new, end_new)
int **group;
float *x,*y,*x1,*x2;
int start, end, *start_new, *end_new;
{

    float *xp,*yp,c,m,abdev,slope,width=0.0;
    int ndata, invert=0,start_o,end_o,i;
    void medfit();
    start_o = start;
    end_o = end;
    /* check and discard the start and end lag-path node if it is an outlier */
    if(abs(end-start+1)>=3) {
        ndata = end_o - start_o - 1;
        for(i=(start_o+1); i<=(end_o-1); i++) width += (float) group[i][2];
        width /= (float)ndata ;
        if((float)group[start_o][2] < 1.4*width &&
           (float)group[start_o][2] > 0.6*width ) {
            width = width*((float)ndata) + (float)group[start_o][2];
            ndata += 1;
            width /= (float)ndata;
        }
        if((float)group[end_o][2] < 1.4*width &&
           (float)group[end_o][2] > 0.6*width ) {
            width = width*((float)ndata) + (float)group[end_o][2];
            ndata += 1;
            width /= (float)ndata;
        }
        if((float)group[start_o][2] > 2.0*width || (float)group[start_o][2] < 0.4*width ) start = start_o+1;
        if((float)group[end_o][2] > 2.0*width || (float)group[end_o][2] < 0.4*width ) end = end_o-1;
        if(ndata==1) {
            start=start_o;
            end=end_o;
        }
    }
    *start_new = start;
    *end_new = end;
    ndata = end - start + 1;
    slope=fabs((y[end]-y[start])/(x[end]-x[start]));
    if ( slope < 1. ) {
        xp = &x[start] - 1;
        yp = &y[start] - 1;
    } else {
        invert = 1;
        yp = &x[start] - 1;
        xp = &y[start] - 1;
    }
    medfit(xp,yp,ndata,&c,&m,&abdev);
    if (invert) {
        *x1 = m * y[start_o] + c;
        *x2 = m * y[end_o] + c;
    } else {
        *x1 = (y[start_o] - c) / m;
        *x2 = (y[end_o] - c) / m;
    }
}
void medfit(x,y,ndata,a,b,abdev)
float *x,*y,*a,*b,*abdev;
int ndata;
{
    int j,count=0;
    float bb,b1,b2,del,f,f1,f2,sigb,temp,bbu,abdevtu,aau;
    float sx=0.0,sy=0.0,sxy=0.0,sxx=0.0,chisq=0.0;
    float rofunc();
    ndatat=ndata;
    xt=x;
    yt=y;
    for (j=1;j<=ndata;j++) {
        sx += x[j];
        sy += y[j];
        sxy += x[j]*y[j];
        sxx += x[j]*x[j];
    }
    del=ndata*sxx-sx*sx;
    aa=(sxx*sy-sx*sxy)/del;
    bb=(ndata*sxy-sx*sy)/del;
    for (j=1;j<=ndata;j++)
        chisq += (temp=y[j]-(aa+bb*x[j]),temp*temp);
    if (chisq == 0.) {
        *a = aa;
        *b = bb;
        *abdev=abdevt/ndata;
        return;
    }

    sigb=sqrt(chisq/del);

    bbu=bb;
    f1=rofunc(bbu);
    abdevtu=abdevt;
    aau = aa;
    b1=bb;
    f1=rofunc(b1);
    b2=bb+((f1 > 0.0) ? fabs(3.0*sigb) : -fabs(3.0*sigb));
    f2=rofunc(b2);
    while (f1*f2 > 0.0) {
        bb=2.0*b2-b1;
        b1=b2;
        f1=f2;
        b2=bb;
        f2=rofunc(b2);
        count += 1;
        if(count >20) {
            *a=aau;
            *b=bbu;
            *abdev=abdevtu/ndata;
            return;
        }
    }

    sigb=0.01*sigb;
    while (fabs(b2-b1) > sigb) {
        bb=0.5*(b1+b2);
        if (bb == b1 || bb == b2) break;
        f=rofunc(bb);
        if (f*f1 >= 0.0) {
            f1=f;
            b1=bb;
        } else {
            f2=f;
            b2=bb;
        }
    }
    *a=aa;
    *b=bb;
    *abdev=abdevt/ndata;
    if(abdevtu<abdevt) {
        *a=aau;
        *b=bbu;
        *abdev=abdevtu/ndata;
    }
}

float rofunc (b)
float b;
{
    int j,n1,nmh,nml;
    float *arr,d,sum=0.0,*vector();
    void sort(),free_vector();
    arr=vector(1,ndatat);
    n1=ndatat+1;
    nml=n1/2;
    nmh=n1-nml;
    for (j=1;j<=ndatat;j++) arr[j]=yt[j]-b*xt[j];
    sort(ndatat,arr);
    aa=0.5*(arr[nml]+arr[nmh]);
    abdevt=0.0;
    for (j=1;j<=ndatat;j++) {
        d=yt[j]-(b*xt[j]+aa);
        abdevt += fabs(d);
        sum += d > 0.0 ? xt[j] : -xt[j];
    }
    free_vector(arr,1,ndatat);
    return sum;
}
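For a fixed slope b, rofunc() above picks the intercept aa as the median of the residuals y[j] - b*x[j], which is the value that minimizes the sum of absolute deviations. This is what makes the fit in medfit() robust to outlying points. The sketch below isolates that step. The name median_intercept, the zero-based arrays and the use of the standard qsort() in place of the listing's own sort() are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Ascending comparison for qsort(). */
static int cmp_double(const void *p, const void *q)
{
    double a = *(const double *)p, b = *(const double *)q;
    return (a > b) - (a < b);
}

/* Hypothetical helper: for a fixed slope b, store in *a the intercept that
 * minimizes sum |y[i] - (a + b*x[i])|, i.e. the median of y[i] - b*x[i].
 * Returns 0 on success, -1 on empty input or allocation failure. */
int median_intercept(const double *x, const double *y, size_t n,
                     double b, double *a)
{
    double *r;
    size_t i;

    assert(x != NULL && y != NULL && a != NULL);
    if (n == 0)
        return -1;
    r = malloc(n * sizeof *r);
    if (r == NULL)
        return -1;
    for (i = 0; i < n; i++)
        r[i] = y[i] - b * x[i];
    qsort(r, n, sizeof *r, cmp_double);
    /* For even n this averages the two middle residuals, as rofunc does. */
    *a = 0.5 * (r[(n - 1) / 2] + r[n / 2]);
    free(r);
    return 0;
}
```

With the points (0,1), (1,3), (2,5) and an outlier (3,100), a slope of 2 leaves residuals 1, 1, 1 and 94, so the median intercept stays at 1 where a mean would be pulled far upward.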

void sort(n,ra)
int n;
float ra[];
{
    int l,j,ir,i;
    float rra;
    l=(n >> 1)+1;
    ir=n;
    for (;;) {
        if (l > 1)
            rra=ra[--l];
        else {
            rra=ra[ir];
            ra[ir]=ra[1];
            if (--ir == 1) {
                ra[1]=rra;
                return;
            }
        }
        i=l;
        j=l << 1;
        while (j <= ir) {
            if (j < ir && ra[j] < ra[j+1]) ++j;
            if (rra < ra[j]) {
                ra[i]=ra[j];
                j += (i=j);
            }
            else j=ir+1;
        }
        ra[i]=rra;
    }
}

merge.c

/* check and merge vectors in adjacent two clag-path-node */
#include <ftr.h>
#define DT 2.2 /* distance threshold for merging two vectors */
#define DT_2e 5.1 /* distance threshold of two adjacent ending point for merging two vectors */
#define DT_y 3.0 /* distance threshold for merging two vectors if one of them is a vector of y */
#define DT_r 1./7.4 /* distance ratio threshold for merging two vectors */
#define DT_2e_r 1./5. /* distance ratio threshold of two adjacent ending point for merging two vectors */
#define MINL 4.0 /* minimum length for a vector to be merged */
#define INIQ(h) (h)->front=(h)->rear=(struct stack *) NULL;
void merge(vctr_list,clagnode,no_clagnode,flag)
struct vector *vctr_list;
struct clagnode *clagnode;
int no_clagnode,*flag;
{

    int i,count,count_1;
    int c_node,j_node,jj_node;
    struct queue *PATH,*JCT;
    void merge_test(), addq(), dumpq();
    *flag = 0;
    PATH = (struct queue *)calloc(1,sizeof(struct queue));
    JCT = (struct queue *)calloc(1,sizeof(struct queue));
    for(i=1;i<=no_clagnode;i++) {
        if((clagnode+i)->class == 'p') {
            INIQ(PATH);
            INIQ(JCT);
            if((clagnode+i)->a_clagnode[0]!=0) {
                addq(&JCT,(clagnode+i)->a_clagnode[0]);
                while( (JCT->front)!=(struct stack *)NULL ) {
                    dumpq(&JCT,&j_node);
                    count = 0;
                    while( (clagnode+j_node)->a_clagnode[count]!=0 ) {
                        c_node = (clagnode+j_node)->a_clagnode[count];
                        if((clagnode+c_node)->class == 'p') {
                            addq(&PATH,c_node);
                            if((clagnode+c_node)->a_clagnode[0]!=0)
                                addq(&JCT,(clagnode+c_node)->a_clagnode[0]);
                        }
                        else if((clagnode+c_node)->class=='j') addq(&JCT,c_node);
                        else printf("error in merge(1) \n");
                        count += 1;
                    }
                }
            }
            if( (JCT->front!=(struct stack *)NULL) || JCT->rear!=(struct stack *)NULL )
                printf("error in merge(2) \n");
            INIQ(JCT);
            if((clagnode+i)->b_clagnode[0]!=0) {
                addq(&JCT,(clagnode+i)->b_clagnode[0]);
                while( (JCT->front)!=(struct stack *)NULL ) {
                    dumpq(&JCT,&j_node);
                    count = 0;
                    while( (clagnode+j_node)->b_clagnode[count]!=0 ) {
                        c_node = (clagnode+j_node)->b_clagnode[count];
                        if((clagnode+c_node)->class=='p') {
                            addq(&PATH,c_node);
                            if((clagnode+c_node)->b_clagnode[0]!=0)
                                addq(&JCT,(clagnode+c_node)->b_clagnode[0]);
                        }
                        else if((clagnode+c_node)->class=='j') addq(&JCT,c_node);
                        else printf("error in merge(3) \n");
                        count += 1;
                    }
                }
            }
            if( (JCT->front!=(struct stack *)NULL) || JCT->rear!=(struct stack *)NULL )
                printf("error in merge(4) \n");
            while( (PATH->front)!=(struct stack *)NULL ) {
                dumpq(&PATH,&c_node);
                merge_test(vctr_list,clagnode,i,c_node,flag);
            }
        }
    }
}
void merge_test(vctr_list,clagnode,c_node,b_node,flag)
struct vector *vctr_list;
struct clagnode *clagnode;
int c_node, b_node, *flag;
{
    struct vector *vctr_a, *vctr_b, *vctr_temp;
    int i,j,m,n,no_lag;
    float a[3], L, dq1, dq2, dp, L1, L2;
    for(i=1;i<=((clagnode+c_node)->no_vector);i++)
        for(j=1;j<=((clagnode+b_node)->no_vector);j++) {
            vctr_a = (clagnode+c_node)->vctr[i] ;
            vctr_b = (clagnode+b_node)->vctr[j] ;
            if(vctr_b == vctr_a || !(vctr_b->type == 'v' && vctr_a->type=='v') ) continue;
            /* sort two vectors in above and below order */
            if(vctr_a->y[0] > vctr_b->y[0]) {
                vctr_temp = vctr_b;
                vctr_b = vctr_a;
                vctr_a = vctr_temp;
            }

            a[0] = vctr_a->y[0] - vctr_b->y[1];
            a[1] = vctr_b->x[1] - vctr_a->x[0];
            a[2] = vctr_b->y[1]*vctr_a->x[0] - vctr_a->y[0]*vctr_b->x[1];
            L = sqrt( a[0]*a[0] + a[1]*a[1] );
            dq1 = fabs((a[0]*vctr_a->x[1] + a[1]*vctr_a->y[1] + a[2])/L);
            dq2 = fabs((a[0]*vctr_b->x[0] + a[1]*vctr_b->y[0] + a[2])/L);
            dp=sqrt((vctr_b->x[0]-vctr_a->x[1])*(vctr_b->x[0]-vctr_a->x[1])+
                    (vctr_b->y[0]-vctr_a->y[1])*(vctr_b->y[0]-vctr_a->y[1]));
            L1=sqrt((vctr_b->x[0]-vctr_b->x[1])*(vctr_b->x[0]-vctr_b->x[1])+
                    (vctr_b->y[0]-vctr_b->y[1])*(vctr_b->y[0]-vctr_b->y[1]));
            L2=sqrt((vctr_a->x[0]-vctr_a->x[1])*(vctr_a->x[0]-vctr_a->x[1])+
                    (vctr_a->y[0]-vctr_a->y[1])*(vctr_a->y[0]-vctr_a->y[1]));
#ifdef SKUO
            if( ((dq1/L)<DT_r && (dq2/L)<DT_r && (dp/L)<DT_2e_r &&
                 dq1<DT && dq2<DT && dp<DT_2e &&
                 (L1>=MINL || L2>=MINL)) ||
                ( vctr_b->dst==200. && dq1<DT_y && dq2<DT_y && dp<(DT_y+1.0) ))
                printf("dq1=%.5f dq2=%.5f dp=%.5f\n",dq1/L,dq2/L,dp/L);
            else
                printf(" dq1=%.5f dq2=%.5f dp=%.5f\n",dq1/L,dq2/L,dp/L);
#endif
            if( ((dq1/L)<DT_r && (dq2/L)<DT_r && (dp/L)<DT_2e_r &&
                 dq1<DT && dq2<DT && dp<DT_2e &&
                 (L1>=MINL || L2>=MINL)) ||
                ( vctr_b->dst==200. && dq1<DT_y && dq2<DT_y && dp<(DT_y+1.0) )) {

                *flag = 1;
                vctr_a->x[1] = vctr_b->x[1];
                vctr_a->y[1] = vctr_b->y[1];
                for(m=0;m<(vctr_b->no_clag);m++) {
                    vctr_a->no_clag += 1;
                    vctr_a->clagnode[vctr_a->no_clag-1] = vctr_b->clagnode[m];
                    for(n=1;n<=((clagnode+vctr_b->clagnode[m])->no_vector);n++)
                        if(((clagnode+vctr_b->clagnode[m])->vctr[n]) == vctr_b)
                            (clagnode+vctr_b->clagnode[m])->vctr[n] = vctr_a;
                }

                if(vctr_a->no_lag!=0 && vctr_b->no_lag!=0) {
                    no_lag = vctr_a->no_lag + vctr_b->no_lag;
                    vctr_a->width=((float)(vctr_a->no_lag) * vctr_a->width
                                   +(float)(vctr_b->no_lag) * vctr_b->width)/(float)no_lag;
                    vctr_a->no_lag = no_lag;
                }
                else if(vctr_b->no_lag!=0) {
                    vctr_a->width = vctr_b->width;
                    vctr_a->no_lag = vctr_b->no_lag;
                }
                while( (vctr_list->next) != vctr_b ) vctr_list = vctr_list->next;
                vctr_list->next = vctr_b->next;
                free((char*)vctr_b);
}

}
}

#undef DT
#undef DT_2e
#undef DT_r
#undef DT_2e_r
#undef DT_y
#undef MINL
#undef INIQ

/* add a node to queue */
void addq(head,node)
int node;
struct queue **head;
{
    struct stack *np;

    np=(struct stack *) malloc(sizeof(struct stack));
    if( np == (struct stack *) 0 ) printf("malloc fail in addq()\n");
    np->next = (struct stack *) NULL ;
    np->node = node;
    /* add operation */
    if((*head)->front == (struct stack *) NULL) (*head)->front = np;
    else (*head)->rear->next = np;
    (*head)->rear = np;
}

/* dump a node from queue */
void dumpq(head,node)
int *node;
struct queue **head;
{
    struct stack *out;
    if( (out=(*head)->front) == (struct stack *) NULL ) printf("STACK empty\n");
    else {
        if( (*head)->rear == (*head)->front ) (*head)->rear=out->next;
        (*head)->front = out->next;
    }
    *node = out->node;
    free((char*)out);
}
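The addq() and dumpq() routines above implement a first-in, first-out queue as a singly linked list with front and rear pointers. The sketch below shows the same structure in ANSI C. The names QNode, Queue, queue_init, queue_push and queue_pop are hypothetical, and unlike the listing this version reports an empty queue or a failed allocation through its return value instead of continuing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types, modeled on struct stack and struct queue. */
typedef struct QNode {
    struct QNode *next;
    int value;
} QNode;

typedef struct {
    QNode *front;
    QNode *rear;
} Queue;

void queue_init(Queue *q)
{
    assert(q != NULL);
    q->front = q->rear = NULL;
}

/* Append a value at the rear; returns 0 on success, -1 if malloc fails. */
int queue_push(Queue *q, int value)
{
    QNode *np = malloc(sizeof *np);

    assert(q != NULL);
    if (np == NULL)
        return -1;
    np->next = NULL;
    np->value = value;
    if (q->front == NULL)
        q->front = np;
    else
        q->rear->next = np;
    q->rear = np;
    return 0;
}

/* Remove the front value; returns 0 on success, -1 if the queue is empty. */
int queue_pop(Queue *q, int *value)
{
    QNode *out;

    assert(q != NULL && value != NULL);
    out = q->front;
    if (out == NULL)
        return -1;
    q->front = out->next;
    if (q->front == NULL)
        q->rear = NULL;          /* queue became empty */
    *value = out->value;
    free(out);
    return 0;
}
```

The merge routine uses two such queues, one for path nodes and one for junction nodes, to walk the graph outward from each path node.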

/****************************************************************/
/*                                                              */
/*                           NDOREC                             */
/*                                                              */
/****************************************************************/

#include <nrec.h>
#include <math.h>
#define PEN0 5.0 /* Penalty for staying in the same state */
#define PEN2 5.0 /* Penalty for skipping a state */
typedef struct {
    int pframe; /* last frame of prev. model */
    int pmodel; /* best model in current level */
} BPT; /* Back-pointer */
extern char *sname[];
extern unsigned short **pdim2();
extern double Tran[KM][KM]; /* transition prob. */
void ndorec(file1, file2, rows, cols)
char file1[40], file2[40];
int rows, cols;
{
    double **pdimf2(), **sprob;
    double bitprob, minprob, sum, smin, smin_old, tr_dist, val0, val1, val2;
    unsigned short *invec;
    char file3[40];
    FILE *f1, *f2, *f3;
    char dum[5], str[6], *spt;
    int hval, loc, segnum, lev_len;
    int i, j, k, l, j1, k1, j_high, jk, I, K, K_old, L;
    short t[KM+1][JM+1]; /* translates [char,seg] to [snum] */
    double d[KM+1][JM+1]; /* dist of the new vector */
    double dt[KM+1][JM+1]; /* dist of the prev vector */
    double D[LM+1][KM+1][JM+1]; /* cum dist */
    unsigned short B[LM+1][KM+1][JM+1]; /* no of segments in the path */
    double Dt[5*LM+1][LM+1][KM+1]; /* cum dist at prev seg, prev char */
    unsigned short stay[LM+1][KM+1][JM+1], Ostay[LM+1][KM+1][JM+1];
    /* Flag for staying in the prev. state */
    double dist_k;
    unsigned short **sval;
    BPT bpt[5*LM+1][LM+1][KM+1]; /* Back-pointer */
    /* In bpt[I][l][k], I = current frame, l = current level,
       k = next model in frame I+1,
       pframe = last frame of prev. model
       pmodel = best model in current level for transition
       to model k in next frame */

    if ((f1 = fopen (file1, "r")) == (FILE *) NULL) {
        fprintf(stderr, "*** statfile cannot be opened\n");
        exit(1);
    }
#ifdef DBUG1
    fprintf(stderr, "Creating %d x %d array\n", rows, cols);
#endif
    sval = pdim2(rows, cols); /* create 2D array */
    /* read in from statfile */
    if(fread(sval[0], sizeof(unsigned short), cols*rows, f1)== 0) {
        fprintf(stderr, "Cannot read from statfile\n");
        exit(1);
    }

    sprob = pdimf2(rows, cols); /* create 2D array to */
                                /* store bit-dep. probabilities */
    minprob = 1.0 / (2.0 * (double) cols);
    for (i=0; i<rows; i++) {
        for (j = 0, sum = 0; j < (cols-1); j++) {
            if (!sval[i][cols-1]) sval[i][cols-1] = 1;
            bitprob = ((float) sval[i][j])/((float) sval[i][cols-1]);

            if (bitprob == 0.0) bitprob = minprob;
            if (bitprob == 1.0) bitprob = 1.0 - minprob;
            sprob[i][j] = log(bitprob/(1-bitprob));
            sum = sum + log(1-bitprob);
        }

        sprob[i][cols-1] = sum;
    }

#ifdef DBUG1
    fprintf(stderr, "label=p0, ind=38:\n");
    for (j=0; j<cols; j++)
        fprintf(stderr,"[%d]=%1.1e%c",j,sprob[38][j], (j+1)%6 ? ' ' : '\n');
    fprintf(stderr, "\n");
#endif
    /* Allocate memory for the incoming vector */
    invec = (unsigned short *) calloc(N, sizeof(unsigned short));
/* Initialize the sstat[] of structures */

/* Initialize translation table t[k][j] */
/* Max JM states (segments) plus no. of states in each row */

    for (j=0; j<=JM; j++) t[0][j] = -1;
    for (i=0, j=k=1, jk=1; i<rows; i++) {
        t[k][j] = i;
        if ((sname[i+1][1] == '0') || (i == (rows-1))) {
            while (++j <= JM) t[k][j] = -1; /* unused space */
            t[k][0] = jk; /* store J(k), max state for k */
            jk = 1, j = 1, k++;
        }
        else j++, jk++;
    }

#ifdef DBUG1
    for (i=0; i<=KM; i++) {
        for (j=0; j<=JM; j++) fprintf(stderr, "%d ", t[i][j]);
        fprintf(stderr, "\n");
    }
#endif
    if ((f2 = fopen (file2, "r")) == (FILE *) NULL) {
        fprintf(stderr, "*** Vector_file cannot be opened\n");
        exit(1);
    }

    /* Initialize the arrays */

    for (l=LM; l>=0; l--)
        for (k=KM; k>=0; k--) {
            B[l][k][0] = 0;
            for (j=JM; j>=0; j--) {
                D[l][k][j] = MEG;
                stay[l][k][j] = 0;
                Ostay[l][k][j] = 0;
            }
        }

    for (j=LM; j>=0; j--)
        for (k=1; k<=KM; k++) {
            Dt[0][j][k] = MEG;
            bpt[0][j][k].pframe = 0;
            bpt[0][j][k].pmodel = 0;
            bpt[1][j][k].pframe = 0;
            bpt[1][j][k].pmodel = 0;
        }

    for (i=5*LM; i>=0; i--)
        for (k=1; k<=KM; k++) {
            Dt[i][0][k] = MEG;
            bpt[i][0][k].pframe = 0;
            bpt[i][0][k].pmodel = 0;
        }

    for (k=1; k<=KM; k++) Dt[0][0][k] = 0;

    for (k=1; k<=KM; k++) {
        jk = t[k][0];
        for (j=1; j<=jk; j++) dt[k][j] = MEG;
    }

    /* Read the vector_file for the segment vectors */
    I = 0;
    while (fscanf(f2, "%s", dum) != EOF) {
        I++;
        i=0, j=0;
        while(i < (N/8)) {
            if (!(fscanf(f2, "%x", &hval))) {
                fprintf(stderr, "*** Unexpected end of Vector file");
                exit(1);
            }
#ifdef DBUG1
            fprintf(stderr, "hval = %x\n", hval);
#endif
            for (k=7; k>=0; k--) {
                invec[j+k] = (hval & 0x01);
                hval = hval >> 1;
            }
            j = (++i)*8;
        }

        if (i = N%8) {
            if (!(fscanf(f2, "%x", &hval))) {
                fprintf(stderr, "*** Unexpected end of Vector file");
                exit(1);
            }
#ifdef DBUG1
            fprintf(stderr, "hval = %x\n", hval);
#endif
            for (k=i-1, j=(N/8)*8; k>=0; k--) {
                invec[j+k] = (hval & 0x01);
                hval = hval >> 1;
            }
        }

#ifdef DBUG1
        fprintf(stderr, "Stat()-> ");
        for (i=0; i<N; i++) fprintf(stderr, "%d ",invec[i]);
        fprintf(stderr, "\n\n");
#endif
        /* Level-building algorithm using HMM (scoring with Viterbi algorithm) */

        /* Find the Bayesian distance score of the observed seg from
           the learned vectors */
        for (k=1; k<=KM; k++) {
            jk = t[k][0];
            for (j=1; j<=jk; j++) {
                l = t[k][j];
                for (i=0, sum=0; i<(cols-1); i++)
                    sum = sum - (double) invec[i] * sprob[l][i];
                d[k][j] = sum - sprob[l][cols-1];
            }
        }
#ifdef DBUG1
        fprintf(stderr, "\na0=%f,a1=%f,a2=%f",d[1][1],d[1][2],d[1][3]);
#endif
        /* Do for each level, each model and each state */
        for (l=1; l<=LM; l++) {
            if (l > I) continue;
            for (k=1; k<=KM; k++) {
                D[l][k][0] = Dt[I-1][l-1][k];
                jk = t[k][0];
                for (j=jk; j>0; j--) {
                    if ((j>2) || ((j==2) && (l!=1))) {
                        /* three possibilities- from same state with penalty,
                           from prev. state, or skipping a state with penalty */
                        val0 = D[l][k][j] + (Ostay[l][k][j] ? 1.5*PEN0 :
                                             PEN0);
                        val1 = D[l][k][j-1] + d[k][j];
                        if (j == 2)
                            val2 = D[l][k][j-2] + d[k][j] +
                                   (Ostay[l-1][k][0] ? PEN2/2.0 : PEN2);
                        else
                            val2 = D[l][k][j-2] + d[k][j] +
                                   (Ostay[l][k][j-2] ? 0 : PEN2);
                        if ((val1 <= val0) && (val1 <= val2)) { /* from prev. state */
                            D[l][k][j] = val1;
                            B[l][k][j] = B[l][k][j-1] +1;
                        }
                        else if (val2 <= val0) { /* penalize for skipping a state */
                            D[l][k][j] = val2;
                            B[l][k][j] = B[l][k][j-2] +1;
                        }
                        else { /* penalize for staying in same state */
                            D[l][k][j] = val0;
                            B[l][k][j] = B[l][k][j] +1;
                            stay[l][k][j] = 1;
                        }
                    }
                    else if (j==2) {
                        /* two possibilities at level 1 - from prev. state,
                           or from the same state */
                        val0 = D[l][k][2] + (Ostay[l][k][2] ? 1.5*PEN0 :
                                             PEN0);
                        val1 = D[l][k][1] + d[k][2];
                        if (val1 <= val0) {
                            D[l][k][2] = val1;
                            B[l][k][2] = B[l][k][1] + 1;
                        }
                        else {
                            D[l][k][2] = val0;
                            B[l][k][2] = B[l][k][2] + 1;
                            stay[l][k][j] = 1;
                        }
                    }
                    else if (j==1) {
                        /* two possibilities- from the same state,
                           or from previous level */
                        val0 = D[l][k][1] + (Ostay[l][k][1] ? 1.5*PEN0 :
                                             PEN0);
                        val1 = D[l][k][0] + d[k][1];
                        if (val1 <= val0) {
                            D[l][k][1] = val1;
                            B[l][k][1] = 1;
                        }
                        else {
                            D[l][k][1] = val0;
                            B[l][k][1] = B[l][k][1] + 1;
                            stay[l][k][j] = 1;
                        }
                    }
                } /* end of j-loop */
            } /* end of k-loop */

/* Ignore frame 1. level 1, if the min. distance is large (noise) */
if (I == 1 && 1 == 1) smin = MEG * 2;
for (k=KM ; k>O; k--) if (D[l]Ek][l] < smin~
smin = D[l][k][l];

if (smin > PENO) I = O;

-goto bad_frame;
}
            /* Find the min. dist. among the last two states of all models of the current level, including the tran. prob.
               to the next model j, and store in Dt[I][l][j] and store the corresponding best model in K and the state in j_high */
            for (j=0; j<=KM; j++) {
                smin = MEG * 2;
                for (k=KM, K=0; k>0; k--) {
                    tr_dist = (j ? (-log(Tran[k-1][j-1]))/2.0 : 0);
                    jk = t[k][0];
                    if ((D[l][k][jk] + tr_dist) < smin) {
                        smin = D[l][k][jk] + tr_dist;
                        K = k;
                    }
                }
                K_old = K;
                smin_old = smin;
                for (k=KM, K=0; k>=1; k--) {
                    tr_dist = (j ? (-log(Tran[k-1][j-1]))/2.0 : 0);
                    jk = t[k][0] - 1;
                    if (jk == 0) continue;
                    /* Add penalty for skipping a state, unless continued in the previous state */
                    dist_k = D[l][k][jk] + tr_dist + (stay[l][k][jk] ? 0 : PEN2);
                    if (dist_k < smin) {
                        smin = dist_k;
                        K = k;
                    }
                }
                if (smin >= smin_old) {
                    smin = smin_old;
                    K = K_old;
                    j_high = t[K][0];
                }
                else j_high = t[K][0] - 1;

                lev_len = B[l][K][j_high]; /* no. of frames in model K, giving min dist in final or semi-final state */

                Dt[I][l][j] = smin; /* store best distance for j */
                bpt[I][l][j].pframe = I - lev_len;
                bpt[I][l][j].pmodel = K;
                stay[l][j][0] = stay[l][K][j_high]; /* store penalty info for NEXT level */

                /* Print the best string so far */
#ifdef DBUG
                if (!j) {
                    str[l] = 0;
                    i = I, k1 = 0;
                    for (j1 = l; j1 > 0; j1--) {
                        K_old = bpt[i][j1][k1].pmodel;
                        if (j1 <= i) {
                            jk = t[K_old][1];
                            str[j1-1] = sname[jk][0];
                        }
                        else str[i-1] = ' ';
                        i = bpt[i][j1][k1].pframe;
                        k1 = K_old;
                    }
                    for (spt = str; *spt != 0; spt++) if (*spt == 'S') *spt = 's';
                    fprintf(stderr, "Fr = %d, Lev = %d, ", I, l);
                    fprintf(stderr, "D = %.2f, Model = %d/%c%d, PFr = %d, Str = %s\n", Dt[I][l][0], K, str[l-1], j_high-1, bpt[I][l][0].pframe, str);
                }
#endif
            }
        } /* end of level */

#ifdef DBUG
        fprintf(stderr, "\n");
#endif
        /* Reinitialize for the next frame */
        for (l=LM; l>=0; l--)
            for (k=KM; k>=0; k--)
                for (j=JM; j>=0; j--) {
                    Ostay[l][k][j] = stay[l][k][j];
                    stay[l][k][j] = 0;
                }

bad_frame: continue;
    } /* end of frame ( while(fscanf) loop ) */
    /* Find the min. over all levels at the final frame */
    smin = MEG;
    for (l=LM, L=0; l>=1; l--) {
        if (l > I) continue;
        if (Dt[I][l][0] < smin) {
            smin = Dt[I][l][0];
            L = l;
        }
    }

    /* Trace the back pointers to recover the recognized string */
    str[L] = 0;
    i = I, k = 0;
    for (j = L; j > 0; j--) {
        K = bpt[i][j][k].pmodel;
        if (i >= 1) {
            jk = t[K][1];
            str[j-1] = sname[jk][0];
        }
        else str[i-1] = ' ';
        i = bpt[i][j][k].pframe;
        k = K;
    }

    for (spt = str; *spt != 0; spt++) if (*spt == 'S') *spt = 's';
#ifdef DEMO
    fprintf(stderr, "\nRecognized string is: %s (Dist = %f)\n", str, Dt[I][L][0]);
#else
    printf("%s ", str);
    if ((f3 = fopen("temp.res", "a")) == (FILE *) NULL) {
        fprintf(stderr, "%s : cannot open for writing\n", "file.fv");
        exit(1);
    }

    fprintf(f3, "%s ", str);
    fclose(f3);
#endif
    if (fscanf(f2, "%x", &hval) != EOF) {
        fprintf(stderr, "*** Vector file longer than expected\n");
        exit(1);
    }
    fclose(f2);
}
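
The per-state update in the listing above chooses among three transitions (stay in the same state, advance from the previous state, or skip one state) and penalizes staying and skipping. The following is a minimal sketch of that three-way choice for a single model and a single frame. The names `viterbi_step`, `NSTATES`, `PEN_STAY` and `PEN_SKIP` are hypothetical, the penalty values are illustrative, and the sketch omits the listing's level building, its frame counts and its increased penalty for repeated stays.

```c
#include <assert.h>
#include <float.h>

#define NSTATES  4     /* hypothetical number of states in one model */
#define PEN_STAY 1.0   /* hypothetical penalty for staying in a state */
#define PEN_SKIP 2.0   /* hypothetical penalty for skipping a state */

/*
 * Update one left-to-right model for a single frame.
 * prev[j]  : best accumulated distance ending in state j at the previous frame
 * local[j] : distance between the current frame and state j
 * cur[j]   : best accumulated distance ending in state j at this frame
 * from[j]  : 0 = stayed in j, 1 = came from j-1, 2 = skipped from j-2
 */
void viterbi_step(const double *prev, const double *local,
                  double *cur, int *from)
{
    int j;

    assert(prev && local && cur && from);
    for (j = 0; j < NSTATES; j++) {
        /* Transitions that do not exist for small j get an "infinite" cost. */
        double v0 = prev[j] + PEN_STAY;
        double v1 = (j >= 1) ? prev[j - 1] + local[j] : DBL_MAX;
        double v2 = (j >= 2) ? prev[j - 2] + local[j] + PEN_SKIP : DBL_MAX;

        /* Same preference order as the listing: advance, then skip, then stay. */
        if (v1 <= v0 && v1 <= v2) { cur[j] = v1; from[j] = 1; }
        else if (v2 <= v0)        { cur[j] = v2; from[j] = 2; }
        else                      { cur[j] = v0; from[j] = 0; }
    }
}
```

Writing the result into a separate `cur` array avoids the need to iterate the states in descending order, which an in-place update would require.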
/****************************************************/
double **pdimf2(row, col) /* creates 2D array of doubles */
int row, col;
{
    int i;
    register double **prow, *pdata;

    pdata = (double *) calloc(row * col, sizeof (double));
    if (pdata == (double *) NULL) {
        fprintf(stderr, "No memory space for data\n");
        exit(1);
    }
    prow = (double **) calloc(row, sizeof (double *));
    if (prow == (double **) NULL) {
        fprintf(stderr, "No memory space for row pointers\n");
        exit(1);
    }
    /* point each row at its slice of the contiguous data block */
    for (i = 0; i < row; i++) {
        prow[i] = pdata;
        pdata += col;
    }
    return prow;
}
/****************************************************/
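
The allocator above builds a two-dimensional array from one contiguous data block plus a table of row pointers. A minimal sketch of the same layout in ANSI C follows; `alloc2d` and `free2d` are hypothetical names, and unlike the listing the sketch reports failure by returning `NULL` rather than exiting.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a rows x cols array of doubles, zero-initialized.
 * All elements live in one contiguous block; row[i] points at row i. */
double **alloc2d(size_t rows, size_t cols)
{
    double *data;
    double **row;
    size_t i;

    if (rows == 0 || cols == 0)
        return NULL;
    assert(rows <= (size_t)-1 / cols);   /* rows * cols must not overflow */

    data = calloc(rows * cols, sizeof *data);
    if (data == NULL)
        return NULL;
    row = malloc(rows * sizeof *row);
    if (row == NULL) {
        free(data);
        return NULL;
    }
    for (i = 0; i < rows; i++)
        row[i] = data + i * cols;
    return row;
}

/* Release an array obtained from alloc2d(). */
void free2d(double **row)
{
    if (row != NULL) {
        free(row[0]);   /* the data block */
        free(row);      /* the row-pointer table */
    }
}
```

Because the data block is contiguous, the whole array can be written or read with a single `fwrite` or `fread` on `row[0]`, which is how the listing transfers its statistics array to and from the statfile.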

/*************************************************************/
/* NREC.C */
/*
 * rec: recognize the input word
 * usage: rec [-i statfile] [-l statfile] [-r statfile]
 *            [-p statfile label] [-v file.sv]
 */
#include <nrec.h>
#include <math.h>

main(argc, argv)
int argc;
char *argv[];
{
    void ndorec();
    unsigned short **pdim2();
    void pfree2(); /* frees the memory space */
    void pr_stat();
    int rows, cols, ind, hval, segnum, veclen;
    char *dum, *calloc(), *malloc();
    short init=0, learn=0, pr=0, rec=0;
    register int i, j, k;
    FILE *f1, *f2;
    void stat_init();
    short snum();
    unsigned short **sval;
    char file1[40], file2[40], lab[5], getlab[5];
    extern char *sname[];

    sprintf(file1,"junk1"); /* default statfile */
    sprintf(file2,"temp.sv"); /* default Vector_file */
    veclen = N/8 + (N%8 ? 1 : 0);
    dum = (char *) malloc(2*veclen*sizeof(char));
    while(argc-- > 1) if(argv[argc][0] == '-') switch(argv[argc][1]) {
    case 'u':
        printf("usage: rec [-i statfile.b] [-l statfile.b] [-r statfile.b]\n");
        printf(" [-p statfile.b label]\n");
        exit(1);
    case 'i': /* initialize mode */
        sprintf(file1,"%s",argv[argc+1]);
        init = 1;
        break;
    case 'l': /* learn mode */
        sprintf(file1,"%s",argv[argc+1]);
        learn = 1;
        break;
    case 'r': /* recognition mode */
        sprintf(file1,"%s",argv[argc+1]);
        rec = 1;
        break;
    case 'p': /* print */
        sprintf(file1,"%s",argv[argc+1]);
        sprintf(lab,"%s",argv[argc+2]);
        pr = 1;
        break;
    case 'v': /* vector file */
        sprintf(file2,"%s",argv[argc+1]);
        break;
    default:
        break;
    }

    cols = N + 1; /* bit-sums plus total */
    rows = (int) snum("xx");
    /* "xx" not included in rows, to be used during manual labeling for discarding a seg vector */
    if (init) {
        if ((f1 = fopen (file1, "w")) == (FILE *) NULL) {
            fprintf(stderr, "*** statfile cannot be opened\n");
            exit(1);
        }

        fprintf(stderr, "Creating %d x %d array\n", cols, rows);
        sval = pdim2(rows, cols); /* create 2D array */

        if(fwrite(sval[0], sizeof(unsigned short), cols*rows, f1)== 0) {
            fprintf(stderr, "Cannot write into statfile\n");
            exit(1);
        }

        pfree2(sval);
        fclose(f1);
    }

    else if (learn) {

        if ((f1 = fopen (file1, "r+")) == (FILE *) NULL) {
            fprintf(stderr, "*** statfile cannot be opened\n");
            exit(1);
        }

        fprintf(stderr, "Xfering statfile to %d x %d array\n", cols, rows);
        sval = pdim2(rows, cols); /* create 2D array */
        /* read in from statfile */
        if(fread(sval[0], sizeof(unsigned short), cols*rows, f1)== 0) {
            fprintf(stderr, "Cannot read from statfile\n");
            exit(1);
        }

        /* Establish segstat structures and initialize pointers */

        /**** REQD. DURING RECOG. ONLY ****
        segstat = (Segstat *) calloc(rows, sizeof (Segstat));
        for (i=0; i<rows; i++) {
            segstat[i]->bcountp = sval[i];
            segstat[i]->tcount = sval[i][N];
        }
        ***********************************/
        /* Prompt for the vector file and the label file */

        while(1) {

            printf("Vector_file / q: ");
            scanf("%s", file2);
            printf("\n");
#ifdef DBUG
            fprintf(stderr, "Vector file = %s\n", file2);
#endif
            if (!(strcmp(file2, "q"))) {
                printf("Store the statfile? [y/n]:");
                scanf("%s", lab);
                printf("\n");
                /* Write into statfile */
                if (!(strcmp(lab, "y"))) {
                    rewind(f1);
                    if(fwrite(sval[0], sizeof(unsigned short), cols*rows, f1)== 0) {
                        fprintf(stderr, "Cannot write into statfile\n");
                        exit(1);
                    }
                }
                fclose(f1);
                exit(0);
            }

            if ((f2 = fopen (file2, "r")) == (FILE *) NULL) {
                fprintf(stderr, "*** Vector_file cannot be opened\n");
                continue;
            }
            /* Convert the label to index and increment corresponding entries in the segstat structure */

            /* Vector-file format: [label hex-byte hex-byte hex-byte hex-byte] */
            while (fscanf(f2, "%s", lab) != EOF) {
                /***
#ifdef DBUG
                fprintf(stderr, "label = %s\n", lab);
#endif
                ***/
                /* If the label is xx, throw away the vector and advance to the next vector */
                if (!(strcmp(lab, "xx"))) {
                    for (i=veclen; i>0; i--) fscanf(f2, "%x", &hval);
                    continue;
                }

                if((ind = snum(lab)) == -1) {
#ifdef DBUG
                    fprintf(stderr, "Unknown label = %s\n", lab);
#endif
                    exit(1);
                }

                /***
#ifdef DBUG
                fprintf(stderr, "ind = %d\n", ind);
#endif
                **/
                i=0, j=0;
                while(i < (N/8)) {
                    if (!(fscanf(f2, "%x", &hval))) {
                        fprintf(stderr, "*** Unexpected end of Vector file");
                        exit(1);
                    }
#ifdef DBUG1
                    fprintf(stderr, "hval = %x\n", hval);
#endif
                    /* least significant bit of the byte maps to index j+7 */
                    for (k=7; k>=0; k--) {
                        if (hval & 0x01) sval[ind][j+k]++;
                        hval = hval >> 1;
                        /* *((segstat[ind]->bcountp)+j+k)++; */
                    }
                    j = (++i)*8;
                }
                if (i = N%8) { /* assignment intended: remaining bits, if any */
                    if (!(fscanf(f2, "%x", &hval))) {
                        fprintf(stderr, "*** Unexpected end of Vector file");
                        exit(1);
                    }
#ifdef DBUG1
                    fprintf(stderr, "hval = %x\n", hval);
#endif
                    for (k=i-1, j=(N/8)*8; k>=0; k--) {
                        if (hval & 0x01) sval[ind][j+k]++;
                        hval = hval >> 1;
                        /* *((segstat[ind]->bcountp)+j+k)++; */
                    }
                }
                sval[ind][N]++; /* Increment total for the seg. class */

#ifdef DBUG1
                fprintf(stderr, "Stat(%s)-> ",lab);
                for (i=0; i<=N; i++) fprintf(stderr, "%d ",sval[ind][i]);
                fprintf(stderr, "\n\n");
#endif
            }
            fclose(f2);
        }
    }
    else if (rec) {
        ndorec(file1, file2, rows, cols);
        exit(0);
    }
    else if (pr) {
        if ((f1 = fopen (file1, "r")) == (FILE *) NULL) {
            fprintf(stderr, "*** statfile cannot be opened\n");
            exit(1);
        }
        sval = pdim2(rows, cols); /* create 2D array */
        /* read in from statfile */
        if(fread(sval[0], sizeof(unsigned short), cols*rows, f1)== 0) {
            fprintf(stderr, "Cannot read from statfile\n");
            exit(1);
        }
        while(1) {
            printf("Vector file Label / q q: ");
            scanf("%s %s", file2, lab);
            printf("\n");
#ifdef DBUG
            fprintf(stderr, "Vector file = %s\n", file2);
#endif
            if (!(strcmp(file2, "q"))) {
                fclose(f1);
                exit(0);
            }
            if ((f2 = fopen (file2, "r")) == (FILE *) NULL) {
                fprintf(stderr, "*** Vector_file cannot be opened\n");
                exit(1);
            }
            /* Convert the label to index and increment corresponding entries in the segstat structure */
            while (fscanf(f2, "%s", getlab) != EOF) {

                /* Vector-file format: [label hex-byte hex-byte hex-byte hex-byte] */
                /* If the label is XX, throw away the vector and advance to the next vector */
                if (strcmp(lab, getlab)) {
                    for (i=veclen; i>0; i--) fscanf(f2, "%x", &hval);
                    continue;
                }
                if((ind = snum(lab)) == -1) continue;

#ifdef DBUG
                fprintf(stderr, "\nlabel = %s, ind = %d\n", lab, ind);
#endif
                i=0, j=0;
                while(i < (N/8)) {
                    if (!(fscanf(f2, "%x", &hval))) {
                        fprintf(stderr, "*** Unexpected end of Vector file");
                        exit(1);
                    }
#ifdef DBUG1
                    fprintf(stderr, "hval = %x\n", hval);
#endif
                    for (k=7; k>=0; k--) {
                        if ((hval & 0x01) || (sval[ind][j+k])) {
                            fprintf(stderr,"stat[%s][%d] = %d ", lab, (j+k), sval[ind][j+k]);
                            fprintf(stderr,"vec[%d] = %d\n", (j+k), (hval & 0x01));
                        }
                        hval = hval >> 1;
                    }
                    /* *((segstat[ind]->bcountp)+j+k)++; */
                    j = (++i)*8;
                }
                if (i = N%8) { /* assignment intended: remaining bits, if any */
                    if (!(fscanf(f2, "%x", &hval))) {
                        fprintf(stderr, "*** Unexpected end of Vector file");
                        exit(1);
                    }
#ifdef DBUG1
                    fprintf(stderr, "hval = %x\n", hval);
#endif
                    for (k=i-1, j=(N/8)*8; k>=0; k--) {
                        if ((hval & 0x01) || (sval[ind][j+k])) {
                            fprintf(stderr,"stat[%s][%d] = %d ", lab, (j+k), sval[ind][j+k]);
                            fprintf(stderr,"vec[%d] = %d\n", (j+k), (hval & 0x01));
                        }
                        hval = hval >> 1;
                        /* *((segstat[ind]->bcountp)+j+k)++; */
                    }
                }
                fprintf(stderr,"stat[%s][N] = %d\n", lab, sval[ind][N]);
            }

            fclose(f2);
        }
    }
}
/****************************************************/
unsigned short **pdim2(row, col) /* creates 2D array of integers */
int row, col;
{
    int i;
    register unsigned short **prow, *pdata;

    pdata = (unsigned short *) calloc(row * col, sizeof (unsigned short));
    if (pdata == (unsigned short *) NULL) {
        fprintf(stderr, "No memory space for data\n");
        exit(1);
    }
    prow = (unsigned short **) calloc(row, sizeof (unsigned short *));
    if (prow == (unsigned short **) NULL) {
        fprintf(stderr, "No memory space for row pointers\n");
        exit(1);
    }
    for (i = 0; i < row; i++) {
        prow[i] = pdata;
        pdata += col;
    }
    return prow;
}
/****************************************************/
void pfree2(prow) /* frees the memory space */
unsigned short **prow;
{
    void free();

    free(*prow); /* free the data space */
    free(prow); /* free the pointer space */
}
/****************************************************/
/* snum() converts a pointer to a segname, to a value */
short snum(sp)
char *sp;
{
    short i;
    extern char *sname[];

    for (i=0; sname[i] != (char *) NULL; i++) if (!(strcmp(sp, sname[i]))) return i;
#ifdef DBUG
    fprintf(stderr, "****Unknown segname %s\n",sp);
#endif
    return -1;
}
/****************************************************/
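
In learn mode, the program above unpacks each hex byte of a feature vector and increments one counter per set bit, plus a total count for the segment class. The following is a minimal sketch of that accumulation for one vector. `accumulate_bits` and `NBITS` are hypothetical names, and the sketch assumes the vector length is a multiple of 8, so it omits the listing's handling of leftover bits.

```c
#include <assert.h>

#define NBITS 32   /* hypothetical vector length, a multiple of 8 */

/*
 * Add one bit vector, given as NBITS/8 bytes, to the per-bit counters.
 * counts[0..NBITS-1] hold how often each bit was set;
 * counts[NBITS] holds the number of vectors seen.
 * As in the listing, the least significant bit of byte i maps to
 * index 8*i + 7 and the most significant bit to index 8*i.
 */
void accumulate_bits(const unsigned char *bytes, unsigned short *counts)
{
    int i, k;

    assert(bytes && counts);
    for (i = 0; i < NBITS / 8; i++) {
        unsigned v = bytes[i];
        for (k = 7; k >= 0; k--) {
            if (v & 1u)
                counts[i * 8 + k]++;
            v >>= 1;
        }
    }
    counts[NBITS]++;
}
```

Dividing `counts[i]` by `counts[NBITS]` then gives an estimate of how often bit `i` is set for that class.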

nrec.h

#include <stdio.h>
#define N 32 /* Dimension of bit-vector */
#define JM 5 /* Col_max in t[k][j] */
#define KM 27 /* Row_max in t[k][j] */
#define LM 5 /* Max no of expected chars in the word */
#define EPS 1.0e-6 /* Small value */
#define MEG 1.0e+6 /* Large value */
#define MIN(x,y) (((x) < (y)) ? (x) : (y))
#define Z(z) (-log(z))

typedef struct Segstat {
    char *name; /* segment name */
    short loc; /* location within char */
    /* first=0, int=1, end=2; */
    double *probp; /* input dependent bit probabilities, */
    /* ln(p/1-p) */
    double nprob; /* Sum(ln(1-p)) */
    float segprob; /* ln(char_probability) */
    /* store in the first seg */
    /* unsigned short *bcountp; bit_sum of 1's during learning */
    /* unsigned short tcount; total occurrences of this seg */
} Segstat;
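
The comments on `probp` and `nprob` suggest that each segment class is described by independent per-bit probabilities. Under that assumption, which the header itself does not state, the log-likelihood of an N-bit vector $b$ for a segment class $s$ separates into a sum over the set bits and a constant. This is one plausible reason for storing the two quantities separately; here $p_i$ denotes the probability that bit $i$ is set for class $s$:

```latex
\ln P(b \mid s)
  = \sum_{i=1}^{N} \bigl[\, b_i \ln p_i + (1 - b_i) \ln (1 - p_i) \,\bigr]
  = \sum_{i=1}^{N} b_i \ln \frac{p_i}{1 - p_i} \;+\; \sum_{i=1}^{N} \ln (1 - p_i)
```

The first sum would use the `ln(p/1-p)` entries only for the bits that are set, and the second sum matches the `Sum(ln(1-p))` comment and does not depend on the input.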

path_analy.c

/* Analyze the path node */
#include <ftr.h>
#define width_cg 2 /* width change between two adjacent node */
#define LO 0.7 /* threshold for width change */
#define LO_r 0.8 /* lo for ramp like node */
#define EPSILON 0.15
#define RATIO_v 1.5 /* threshold of the height-width-ratio for vertical vector */
#define RATIO_v_s 3.0 /* threshold of the height-width-ratio for vertical vector when it's a single isolated path */
#define RATIO_v_c 1.2 /* threshold of the height-width-ratio for vertical vector when collinear checking fails */
#define RATIO_h 0.65 /* threshold of the height-width-ratio for horizontal vector */
#define DT 1.5 /* distance threshold for merging v-h-v vectors */
void path_analy(image,rows,cols,node,clagnode,ith,vctr_list)
int **image,rows,cols;
struct node *node;
struct clagnode *clagnode;
struct vector *vctr_list;
int ith;
{
    int flag_a, flag_b, flag_c, flag_s, flag_ii, flag_ii_a;
    int flag_sz, flag_x, flag_xx, flag_wx;
    int i,j,number,number_x,no_group=1,no_lag;
    int c_node,p_node, *merge;
    int ibeg,iend,max,gp_count,center;
    int start,end;
    int **group,no;
    float *x,*y;
    float epsilon,lo,lo1,width,ratio,ratio_a,ratio_b;
    float x1,x2,y1,y2,dst;
    float a[3], L, dq1, dq2, dp;
    float width_x,ratio_x;
    int m,clagnode_x,x_node;
    struct vector *vctr, *vctr_b;

    number = (clagnode+ith)->number ;
    group = imatrix(1,number,1,4);
    /* column 1: lag-node no. */
    /* column 2: lag-node's width */
    /* column 3: group's no. */
    /* column 4: feature category */
    /* (1: v-vector, 2: h-vector, 3: arc) */
    x=vector(1,number);
    y=vector(1,number);
    for(j=1;j<=number;j++) {
        c_node = (clagnode+ith)->node[j];
        x[j]=((float)(node+c_node)->col_start +
              (float)(node+c_node)->col_end ) / 2.0;
        y[j]=(node+c_node)->rowth;
    }
    for(j=1;j<=number;j++) {
        group[j][1] = (clagnode+ith)->node[j]; /* node */
        group[j][2] = (node+group[j][1])->col_end /* node's width */
                      - (node+group[j][1])->col_start + 1 ;
    }
    ibeg = 1;
    /* check whether this is a single isolated path-clagnode */
    /* if( (clagnode+ith)->a_clagnode[0] == 0 &&
           (clagnode+ith)->b_clagnode[0] == 0 ) flag_s=1;
       else flag_s=0;*/
    /* separate the path into similar groups */
    group[1][3] = no_group;
    for(i=1;i<number;i++) {
        /* check width change */
        flag_a=0;
        flag_b=0;
        flag_c=0;
        if(abs(group[i][2]-group[i+1][2])>width_cg) flag_a=1;
        if(i>1&&i<(number-1)) {
            epsilon = fabs((float)group[i][2]/(float)group[i+1][2] - (float)group[i-1][2]/(float)group[i][2]);
            epsilon += fabs((float)group[i][2]/(float)group[i+1][2] - (float)group[i+1][2]/(float)group[i+2][2]);
            if(epsilon>EPSILON) flag_b=1;
        }
        lo=(float)group[i][2]/(float)group[i+1][2];
        if(lo<LO || lo>(1./LO)) flag_c=1;
        if(flag_a==1&&flag_b==1&&flag_c==1) {

            iend = i;
            /* check collinearity */
            if( abs(iend-ibeg)>2 && collinear(x,y,ibeg,iend,&max,&dst)==1 && dn_change(x,ibeg,iend) != 2 ) {
                no_group += 1;
                for(j=max;j<=iend;j++) group[j][3] = no_group;
                for(j=ibeg;j<=iend;j++) group[j][4] = 3;
            }

            ibeg = i+1 ;
            no_group += 1;
            /* detect and skip ramp-like path node */
            if(i<(number-1)) {
                lo1=(float)group[i+1][2]/(float)group[i+2][2];
                if((lo<LO_r&&lo1<LO_r) || (lo>(1./LO_r)&&lo1>(1./LO_r))) {
                    i += 1;
                    ibeg = i+1 ;
                    group[i][3] = no_group;
                }
            }
        }
        group[i+1][3] = no_group;
    }

    iend = number;
    /* detect special cases for wx and return two vectors */
    flag_wx = 0;
    if( no_group==1 && number > 3 ) flag_wx = wx_detect(image,rows,cols,node, clagnode,ith,group,x,y,vctr_list);
    if(flag_wx == 1) {
        (clagnode+ith)->group = 1;
        goto output;
    }
    /* check collinearity */
    if( abs(iend-ibeg)>2 && collinear(x,y,ibeg,iend,&max,&dst)==1 && dn_change(x,ibeg,iend) != 2 ) {
        no_group += 1;
        for(j=max;j<=iend;j++) group[j][3] = no_group;
        for(j=ibeg;j<=iend;j++) group[j][4] = 3;
    }
    (clagnode+ith)->group = no_group;
    /* retrieve the vector for each group */
    merge = ivector(1,(no_group+1));
    gp_count = group[1][3];
    ibeg = 1;
    for(i=1;i<=number;i++) {
        if(group[i][3]==gp_count && i!=number) continue;
        if(i==number) iend=i;
        else iend = i-1;
        if(abs(ibeg-iend)>1) line_fit(group,x, y,ibeg,iend,&x1,&x2,&start,&end);
        else { x1=x[ibeg]; x2=x[iend]; start=ibeg; end=iend; }
        width=0.0;
        for(j=start;j<=end;j++) width += group[j][2];
        no_lag = abs(end-start+1);
        width /= (float)no_lag;
        ratio = (float)(iend-ibeg+1)/width;
        /* detect the special condition for x-arms type */
        flag_x=0;
        /* if( ((node+group[ibeg][1])->above==0 &&
                (node+group[iend][1])->below==1 &&
                (node+(node+group[iend][1])->b_node[0])->below==1) ||
               ((node+group[iend][1])->below==0 &&
                (node+group[ibeg][1])->above==1 &&
                (node+(node+group[ibeg][1])->a_node[0])->above==1) ) flag_x = 1;*/
        flag_xx=0;
        width_x = 0.0;
        ratio_x = 0.0;
        if( ((node+group[ibeg][1])->above==0 &&
             (node+group[iend][1])->below==1 &&
             (node+(node+group[iend][1])->b_node[0])->above==2 &&
             (node+(node+group[iend][1])->b_node[0])->below==1) ) {

            clagnode_x =
                (clagnode+(clagnode+ith)->b_clagnode[0])->b_clagnode[0];
            number_x = (clagnode+clagnode_x)->number;
            for(m=1;m<=number_x;m++) {
                x_node = (clagnode+clagnode_x)->node[m];
                width_x += (float)((node+x_node)->col_end - (node+x_node)->col_start + 1);
            }
            width_x /= (float)number_x;
            ratio_x = (float)number_x / width_x;
            if(ratio_x > 0.5 && ratio_x<2.2) {
                flag_x=1;
                if((node+(clagnode+clagnode_x)->node[number_x])->below==2) flag_xx=1;
            }
        }

        else if ( ((node+group[iend][1])->below==0 &&
                   (node+group[ibeg][1])->above==1 &&
                   (node+(node+group[ibeg][1])->a_node[0])->below==2 &&
                   (node+(node+group[ibeg][1])->a_node[0])->above==1) ) {

            clagnode_x =
                (clagnode+(clagnode+ith)->a_clagnode[0])->a_clagnode[0];
            number_x = (clagnode+clagnode_x)->number;
            for(m=1;m<=number_x;m++) {
                x_node = (clagnode+clagnode_x)->node[m];
                width_x += (float)((node+x_node)->col_end - (node+x_node)->col_start + 1);
            }
            width_x /= (float)number_x;
            ratio_x = (float)number_x / width_x;
            if(ratio_x > 0.5 && ratio_x<2.2) {
                flag_x=1;
                if((node+(clagnode+clagnode_x)->node[1])->above==2) flag_xx=1;
            }
        }
        /* detect condition (ii) in page 119 */
        flag_ii=0;
        if((node+group[ibeg][1])->above==1 && (node+group[iend][1])->below==1) {
            p_node = (node+group[ibeg][1])->a_node[0];
            ratio_a = (float)abs((node+p_node)->col_end - (node+p_node)->col_start + 1) / width;
            p_node = (node+group[iend][1])->b_node[0];
            ratio_b = (float)abs((node+p_node)->col_end - (node+p_node)->col_start + 1) / width;
            if(ratio_a>1.7 || ratio_b>1.7) flag_ii=1;
        }
        /* detect condition (ii) in page 119, but one side instead ( for returning a vector in top-right corner of c ) */
        flag_ii_a=0;
        if((node+group[1][1])->above==1 && no_group==1 &&
           (node+group[number][1])->below==0 &&
           ((clagnode+(clagnode+ith)->a_clagnode[0])->a_clagnode[0]==0 ||
            ((clagnode+(clagnode+ith)->a_clagnode[0])->class=='j' &&
             (clagnode+(clagnode+ith)->a_clagnode[0])->a_clagnode[1]==0 &&
(clagnode+(clagnode+(clagnode+ith)->a_clagnode[0])->a_clagnode[0])->a_clagnode[0]=
        {
            p_node = (node+group[1][1])->a_node[0];
            ratio_a = (float)abs((node+p_node)->col_end - (node+p_node)->col_start + 1) / width;
            if(ratio_a>1.7) flag_ii_a=1;
        }
        else if((node+group[1][1])->above==0 && no_group==1 &&
                (node+group[number][1])->below==1 &&
                ((clagnode+(clagnode+ith)->b_clagnode[0])->b_clagnode[0]==0 ||
                 ((clagnode+(clagnode+ith)->b_clagnode[0])->class=='j' &&
                  (clagnode+(clagnode+ith)->b_clagnode[0])->b_clagnode[1]==0 &&
(clagnode+(clagnode+(clagnode+ith)->b_clagnode[0])->b_clagnode[0])->b_clagnode[0]=
        {
            p_node = (node+group[iend][1])->b_node[0];
            ratio_b = (float)abs((node+p_node)->col_end - (node+p_node)->col_start + 1) / width;
            if(ratio_b>1.7) flag_ii_a=1;
        }
        /* return vectors */
        if(ratio>=RATIO_v ||
           ((flag_ii==1 || flag_ii_a==1) && ((float)no_lag*width) <= 50.) ||
           (ratio>=0.7 && flag_x==1) || flag_xx==1 ||
           (group[start][4]==3 && ratio>=RATIO_v_c) ) {

            vctr=add_vctr(vctr_list,clagnode,ith,x1,y[ibeg],x2,y[iend]);
            vctr->type = 'v';
            if( flag_ii==1 || flag_ii_a==1) vctr->small_v = 'c';
            if(flag_ii==1 && no_lag==1) {
                vctr->y[0] -= 0.01;
                vctr->y[1] += 0.01;
            }

            vctr->no_lag = no_lag;
            vctr->width = width;
            if( y_detect(image,rows,cols,node,clagnode,ith,ibeg,iend) == 1 ) {
                vctr->dst=200.0;
                for(j=ibeg;j<=iend;j++) group[j][4] = 4; /* vertical vector for y */
                merge[gp_count]=4;
            }
            else {
                for(j=ibeg;j<=iend;j++) group[j][4] = 1; /* vertical vector */
                merge[gp_count]=1;
            }
        }
        else if ( ratio<=RATIO_h && number > 1) {
            center=(int) ((ibeg+iend)/2.0+0.5);
            x1 = (float)((node+group[center][1])->col_start);
            x2 = (float)((node+group[center][1])->col_end);
            y1 = y2 = (float)((node+group[center][1])->rowth);
            (add_vctr(vctr_list,clagnode,ith,x1,y1,x2,y2))->type='h';
            for(j=ibeg;j<=iend;j++) group[j][4] = 2; /* horizontal vector */
            merge[gp_count]=2;
        }
        else merge[gp_count] = 0;
        ibeg = i;
        gp_count += 1;
    }
    /* if nothing returned for an entire path and the x-, y-spread of two ending points of the line_fit() returned vector are large, then return a vertical vector (made for fixing clsz.pic) */
    flag_sz = 0;
    for(i=1;i<=no_group;i++) if(merge[i]) flag_sz = 1;
    if(flag_sz==0 && (float)number >= SYMIN) {
        line_fit(group,x, y,1,number,&x1,&x2,&start,&end);
        if(fabs(x1-x2)>=SXMIN) {
            width=0.0;
            for(j=start;j<=end;j++) width += group[j][2];
            no_lag = abs(end-start+1);
            width /= (float)no_lag;
            vctr=add_vctr(vctr_list,clagnode,ith,x1,y[1],x2,y[number]);
            vctr->type = 'v';
            vctr->no_lag = no_lag;
            vctr->width = width;
        }
    }
    /* merge v-h-v vectors or v-v vectors */
    merge[no_group+1]=0;
    for(i=1;i<=(no_group-1);i++) {
        if(merge[i]==1&&(((merge[i+1]==2||merge[i+1]==0)&&merge[i+2]==1) || merge[i+1]==1)) {
            vctr = *((clagnode+ith)->vctr) ;
            j=1;
            while(merge[j]!=0 && j<i) {
                vctr = vctr->next;
                j += 1;
            }
            if(merge[i+1]==0||merge[i+1]==1) vctr_b = vctr->next;
            else vctr_b = (vctr->next)->next;
            /* line through the start of vctr and the end of vctr_b */
            a[0] = vctr->y[0] - vctr_b->y[1];
            a[1] = vctr_b->x[1] - vctr->x[0];
            a[2] = vctr_b->y[1]*vctr->x[0] - vctr->y[0]*vctr_b->x[1];
            L = sqrt( a[0]*a[0] + a[1]*a[1] );
            /* distances of the two inner end points from that line */
            dq1 = fabs((a[0]*vctr->x[1] + a[1]*vctr->y[1] + a[2])/L);
            dq2 = fabs((a[0]*vctr_b->x[0] + a[1]*vctr_b->y[0] + a[2])/L);
            dp=sqrt((vctr_b->x[0]-vctr->x[1])*(vctr_b->x[0]-vctr->x[1])+
                    (vctr_b->y[0]-vctr->y[1])*(vctr_b->y[0]-vctr->y[1]));
            if( dq1<DT && dq2<DT ) {

                vctr->x[1] = vctr_b->x[1];
                vctr->y[1] = vctr_b->y[1];
                no_lag = vctr->no_lag + vctr_b->no_lag;
                vctr->width=((float)(vctr->no_lag) * vctr->width +(float)(vctr_b->no_lag) * vctr_b->width)/
                            (float)no_lag;
                vctr->no_lag = no_lag;
                vctr = vctr_list;
                (clagnode+ith)->no_vector -= 1;
                /* remove the discarded vector from link */
                while( (vctr->next) != vctr_b ) vctr = vctr->next;
                vctr->next = vctr_b->next;
                free((char*)vctr_b);
            }
        }
    }
    free_ivector(merge,1,(no_group+1));
output:
    no = (clagnode+ith)->no_vector ;
    if( no > 0 ) {
        vctr = *((clagnode+ith)->vctr);
        free( (char*)(clagnode+ith)->vctr );
        if( ((clagnode+ith)->vctr=(struct vector **) calloc((unsigned)no,sizeof(struct vector*))) ==
            (struct vector **)0 ) printf("calloc fail in (clagnode+%d)->vctr\n",i);
        (clagnode+ith)->vctr -= 1; /* start from ONE */
        for(j=1;j<=no;j++) {
            (clagnode+ith)->vctr[j] = vctr;
            vctr = vctr->next;
        }
    }
    free_vector(x,1,number);
    free_vector(y,1,number);
    free_imatrix(group,1,number,1,4);
}
#undef width_cg
#undef LO
#undef LO_r
#undef EPSILON
#undef RATIO_h
#undef RATIO_v
#undef RATIO_v_c
#undef DT
/* add new entry to end of vector-list */
struct vector *add_vctr(vctr_list,clagnode,ith,x1,y1,x2,y2)
struct vector *vctr_list;
struct clagnode *clagnode;
int ith;
float x1,x2,y1,y2;
{
    while( vctr_list->next != (struct vector *)NULL ) vctr_list = vctr_list->next;
    /* reserve a space for the next new entry */
    if((vctr_list->next=(struct vector *)calloc(1,sizeof(struct vector))) == (struct vector *)0 ) printf("calloc fail in add_vctr(): 1\n");
    /* add NULL to the new end of the list */
    (vctr_list->next)->next = (struct vector *)NULL;
    if( (clagnode+ith)->no_vector == 0 ) {
        if( ((clagnode+ith)->vctr=(struct vector **) calloc(1,sizeof(struct vector*))) == (struct vector **)0 ) printf("calloc fail in add_vctr(): (clagnode+%d)->vctr\n",ith);
        *((clagnode+ith)->vctr) = vctr_list;
    }
    (clagnode+ith)->no_vector += 1;
    vctr_list->x[0]=x1+0.5;
    vctr_list->x[1]=x2+0.5;
    vctr_list->y[0]=y1+0.5;
    vctr_list->y[1]=y2+0.5;
    vctr_list->no_clag += 1;
    vctr_list->clagnode[(vctr_list->no_clag -1)] = ith;
    return(vctr_list);
}
/* check collinearity */
#define T0 2.6 /* threshold for collinear */
int collinear(x,y,ibeg,iend,max,distance)
float *x,*y;
int ibeg,iend,*max;
float *distance;
{
    float a[3],L,T,d;
    int i;

    /* line through the first and last points: a[0]*x + a[1]*y + a[2] = 0 */
    a[0] = y[ibeg] - y[iend];
    a[1] = x[iend] - x[ibeg];
    a[2] = y[iend]*x[ibeg] - y[ibeg]*x[iend];
    L = sqrt( a[0]*a[0] + a[1]*a[1] );
    *max = ibeg;
    T=0.0;
    /* find the interior point farthest from that line */
    for(i=(ibeg+1);i<=(iend-1);i++) {
        d = a[0]*x[i] + a[1]*y[i] + a[2];
        if(fabs(d)>T) {
            T = fabs(d);
            *max = i;
        }
    }
    T = T/L;
    *distance = T;
    if (T>=T0 ) return(1);
    else return(0);
}
#undef T0

dn_change(xarray, start, end)
float *xarray;
int start, end;
{
    float *xp;
    int *cp, last, val, change, count, i;
    int left, right;

    count = end - start + 1;
    xp = xarray + start + 1;
    last = change = val = 0;
    for (i = 1; i < (count-1); i++, xp++) {
        if (*xp > *(xp-1)) left = 1;
        else if (*xp < *(xp-1)) left = -1;
        else left = 0;
        if (*xp < *(xp+1)) right = 1;
        else if (*xp > *(xp+1)) right = -1;
        else right = 0;
        if ((right + left) > 0) val = 1;
        else if ((right + left) < 0) val = -1;
        else val = last;
        if ((last * val) < 0) change ++;
        last = val;
    }
    return (change);
}

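
The collinearity check above measures how far the interior points of a path stray from the straight line joining its two end points. The following is a minimal sketch of that measurement. `max_dev_sq` is a hypothetical name, and the sketch works with squared distances so that it needs no square root; a caller would then compare the result against the square of a threshold such as `T0`.

```c
#include <assert.h>

/*
 * Return the squared maximum distance of the points ibeg+1 .. iend-1
 * from the line through (x[ibeg], y[ibeg]) and (x[iend], y[iend]).
 * *imax receives the index of the farthest point, or ibeg if no
 * interior point lies off the line.
 */
double max_dev_sq(const double *x, const double *y,
                  int ibeg, int iend, int *imax)
{
    /* line coefficients: a*x + b*y + c = 0 */
    double a = y[ibeg] - y[iend];
    double b = x[iend] - x[ibeg];
    double c = y[iend] * x[ibeg] - y[ibeg] * x[iend];
    double len_sq = a * a + b * b;
    double best = 0.0;
    int i;

    assert(len_sq > 0.0);   /* the end points must be distinct */
    *imax = ibeg;
    for (i = ibeg + 1; i < iend; i++) {
        double d = a * x[i] + b * y[i] + c;
        if (d * d > best) {
            best = d * d;
            *imax = i;
        }
    }
    return best / len_sq;
}
```

Note that the listing's `collinear()` returns 1 when the deviation reaches the threshold, that is, when the path bends enough to be treated as an arc candidate.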
path_s.c

/* Analyze the path node during the feature extraction stage */
#include <ftr.h>
#define width_cg 3 /* width change between two adjacent node */
#define LO 0.6 /* threshold for width change */
#define LO_r 0.8 /* lo for ramp like node */
#define EPSILON 0.15
#define RATIO_v 1.2 /* threshold of the height-width-ratio for vertical vector */
#define RATIO_h 0.85 /* threshold of the height-width-ratio for horizontal vector */
#define RATIO_v_one 0.9 /* threshold of the height-width-ratio for vertical vector if clag has ONLY one path */
#define RATIO_v_i 0.7 /* threshold of the height-width-ratio for vertical vector if clag has ONLY one path and is most likely the top of i */
#define DT 1.5 /* distance threshold for merging v-h-v vectors */
void path_s(image,rows,node,clagnode,no_clagnode,ith,type,vctr_list)
int **image,rows, no_clagnode;
struct node *node;
struct clagnode *clagnode;
struct vector *vctr_list;
char type;
int ith;
{
    int flag_a, flag_b, flag_c, flag_s,flag_i,flag_ii_a, flag_sz;
    int i,j,number,no_group=1,no_lag;
    int c_node,p_node, *merge;
    int ibeg,iend,max,gp_count,center,count;
    int start,end;
    int **group,no;
    float *x,*y;
    float epsilon,lo,lo1,width,ratio,ratio_a,ratio_b;
    float x1,x2,y1,y2,dst,dst_lnth;
    float a[3], L, dq1, dq2, dp;
    float **gp_width, loend;
    struct vector *vctr, *vctr_b;

    number = (clagnode+ith)->number ;
    group = imatrix(1,number,1,4);
    /* column 1: lag-node no. */
    /* column 2: lag-node's width */
    /* column 3: group's no. */
    /* column 4: feature category */
    /* (1: v-vector, 2: h-vector, 3: arc) */
    x=vector(1,number);
    y=vector(1,number);
    for(j=1;j<=number;j++) {
        c_node = (clagnode+ith)->node[j];
        x[j]=((float)(node+c_node)->col_start +
              (float)(node+c_node)->col_end ) / 2.0;
        y[j]=(node+c_node)->rowth;
    }
    for(j=1;j<=number;j++) {
        group[j][1] = (clagnode+ith)->node[j]; /* node */
        group[j][2] = (node+group[j][1])->col_end /* node's width */
                      - (node+group[j][1])->col_start + 1 ;
    }
    ibeg = 1;
    /* check whether this is a single path-clagnode */
    if( (clagnode+ith)->a_clagnode[0] == 0 &&
        (clagnode+ith)->b_clagnode[0] == 0 ) flag_s=1;
    else flag_s=0;
    /* check whether the single path is the top of i */
    width=0.0;
    if(flag_s==1 && ith<no_clagnode && no_clagnode == 2) {
        for(i=1;i<=((clagnode+2)->number);i++) width += (float)(node+(clagnode+2)->node[i])->col_end - (float)(node+(clagnode+2)->node[i])->col_start + 1. ;
        if((float)((clagnode+2)->number) / width > 4.5) flag_i=1;
    }
    /* return if it is a noise-like single path-clagnode */
    if(flag_s == 1) {
        width=0.0;
        for(j=1;j<=number;j++) width += group[j][2];
        if (width<=4.0) return;
    }
    /* check width change and separate them into groups with similar width */
    group[1][3] = no_group;
    for(i=1;i<number;i++) {
        flag_a=0;
        flag_b=0;
        flag_c=0;
        if(abs(group[i][2]-group[i+1][2])>width_cg) flag_a=1;
        if(i>1&&i<(number-1)) {
            epsilon = fabs((float)group[i][2]/(float)group[i+1][2] - (float)group[i-1][2]/(float)group[i][2]);
            epsilon += fabs((float)group[i][2]/(float)group[i+1][2] - (float)group[i+1][2]/(float)group[i+2][2]);
            if(epsilon>EPSILON) flag_b=1;
        }

        lo=(float)group[i][2]/(float)group[i+1][2];
        if(lo<=LO || lo>=(1./LO)) flag_c=1;
        if(flag_a==1 && ( (i==1 && group[1][2]>8) || flag_b==1 || (i==(number-1) && group[number][2]>8) ) && flag_c==1 ) {

            ibeg = i+1 ;
            no_group += 1;
            /* detect and skip ramp-like path node */
            if(i<(number-1)) {
                lo1=(float)group[i+1][2]/(float)group[i+2][2];
                if((lo<LO_r&&lo1<LO_r) || (lo>(1./LO_r)&&lo1>(1./LO_r))) {
                    i += 1;
                    group[i][3] = no_group;
                }
            }
        }
        group[i+1][3] = no_group;
    }
    /* recombine groups if their AVERAGE widths don't have big change */
    if(no_group>=2) {
        gp_width = matrix(1,no_group,1,3);
        gp_count = group[1][3];
        ibeg = 1;
        lo1 = (float)group[1][2] / (float)group[2][2];
        loend = (float)group[number][2] / (float)group[number-1][2] ;
        /* find average width for each group */
        for(i=1;i<=number;i++) {

            if(group[i][3]==gp_count && i!=number) continue;
            if(i==number) iend=i;
            else iend = i-1;
            count = iend-ibeg+1;
            gp_width[gp_count][1] = (float)ibeg;
            gp_width[gp_count][2] = (float)iend;
            for(j=ibeg;j<=iend;j++) gp_width[gp_count][3] += (float)group[j][2];
            /* don't count the first or last run length if it is an outlier */
            if(ibeg==1 && count>1 && abs(group[1][2]-group[2][2])>width_cg &&
               (lo1<=LO || lo1>=(1./LO)) )
                gp_width[gp_count][3] = (gp_width[gp_count][3] - (float)group[1][2]) / (float)(count-1);
            else if(iend==number && count>1 &&
                    abs(group[number][2]-group[number-1][2])>width_cg &&
                    (loend<=LO || loend>=(1./LO)) )
                gp_width[gp_count][3] = (gp_width[gp_count][3] - (float)group[number][2]) /
                                        (float)(count-1);
            else gp_width[gp_count][3] /= (float)count;
            gp_count += 1;
            ibeg = i;
        }

        /* checking and recombining */
        gp_count = no_group;
        for(i=1;i<gp_count;i++) {

            lo = gp_width[i][3]/gp_width[i+1][3];
            if(!(fabs(gp_width[i][3]-gp_width[i+1][3])>=(float)width_cg &&
                 (lo<=LO || lo>=(1./LO))) ) {

                for(j=((int)gp_width[i+1][1]);j<=number;j++) group[j][3] -= 1;
                no_group -= 1;
            }
        }
        free_matrix(gp_width,1,no_group,1,3);
    }

    (clagnode+ith)->group = no_group;
    /* retrieve the vector for each group */
    merge = ivector(1,(no_group+1));
    gp_count = group[1][3];
    ibeg = 1;
    for(i=1;i<=number;i++) {
        if(group[i][3]==gp_count && i!=number) continue;
        if(i==number) iend=i;
        else iend = i-1;
        width=0.0;
        for(j=ibeg;j<=iend;j++) width += group[j][2];
        no_lag = iend-ibeg+1;
        width /= (float)no_lag;
        ratio = (float)(iend-ibeg+1)/width;

        /* skip noise-like group */
        if( (width*(float)no_lag)<=4.&&width<=2. ) {
            merge[gp_count] = 0;
            ibeg = i;
            gp_count += 1;
            continue;
        }
        /* detect condition (ii) in page 119, but one side instead ( for returning a vector in top-right corner of c ) */
        flag_ii_a=0;
        if((node+group[1][1])->above==1 && no_group==1 &&
           (node+group[number][1])->below==0 &&
           ((clagnode+(clagnode+ith)->a_clagnode[0])->a_clagnode[0]==0 ||
            ((clagnode+(clagnode+ith)->a_clagnode[0])->class=='j' &&
             (clagnode+(clagnode+ith)->a_clagnode[0])->a_clagnode[1]==0 &&
(clagnode+(clagnode+(clagnode+ith)->a_clagnode[O])->a_clagnode[O])->a_clagnode[O]=
        {
            p_node = (node+group[1][1])->a_node[0];
            ratio_a = (float)abs((node+p_node)->col_end - (node+p_node)->col_start + 1) / width;
            if(ratio_a>1.7) flag_ii_a=1;
        }
        else if((node+group[1][1])->above==0 && no_group==1 &&
                (node+group[number][1])->below==1 &&
                ((clagnode+(clagnode+ith)->b_clagnode[0])->b_clagnode[0]==0 ||
                 ((clagnode+(clagnode+ith)->b_clagnode[0])->class=='j' &&
                  (clagnode+(clagnode+ith)->b_clagnode[0])->b_clagnode[1]==0 &&
(clagnode+(clagnode+(clagnode+ith)->b_clagnode[O])->b_clagnode[O])->b_clagnode[O]=-( p_node = (node+group[iend][l])->b_node[O];
            ratio_b = (float)abs((node+p_node)->col_end - (node+p_node)->col_start + 1) / width;
            if(ratio_b>1.7) flag_ii_a=1;
        }

/* return vectors *~
/* return a vertical if it is satisfied the qz criterian */
flag_sz = O;
if( ((float)abs(iend-ibeg)) >= SYMIN ) ( line_fit(group,x, y, ibeg, iend, &xl, &x2,&start,&end);
if( ratio>2.5 && fabs(xl-x2) >= SXMIN) flag_sz = l;
.
if( flag_ 9Z ) vctr=add_vctr(vctr_list,clagnode,ith,xl,y[ibeg],x2,y[iend]);
vctr->width = width;
vctr->no_lag = no_lag;
vctr->type = 's';
}

/* check arc tendency */
else if( abs(iend-ibeg)>2 &&
         arc_check(node,group,x,y,ibeg,iend,&max,&dst)==1 ) {
    line_fit(group,x, y, ibeg, iend, &x1, &x2,&start,&end);
    vctr=add_vctr(vctr_list,clagnode,ith,x1,y[ibeg],x2,y[iend]);
    vctr->width = width;
    vctr->no_lag = no_lag;
    dst_lnth = dst/sqrt((x[iend]-x[ibeg])*(x[iend]-x[ibeg])+
                        (y[iend]-y[ibeg])*(y[iend]-y[ibeg])) ;
    if( dst_lnth <= 1./8. ) merge[gp_count]=1; /* still check for merging possibility */
    else merge[gp_count]=3; /* definitely arc */
    vctr->ax[0]=x[ibeg]+0.5;
    vctr->ax[1]=x[iend]+0.5;
    vctr->ax[2]=x[max]+0.5;
    vctr->ay[0]=y[ibeg]+0.5;
    vctr->ay[1]=y[iend]+0.5;
    vctr->ay[2]=y[max]+0.5;
    vctr->dst = dst;
    vctr->type = 'a';
}
/* check collinearity */
else if( abs(iend-ibeg)>2 &&
         collinear_s(x,y,ibeg,iend,&max,&dst) == 1 ) {
    line_fit(group,x, y, ibeg, iend, &x1, &x2, &start,&end);
    vctr=add_vctr(vctr_list,clagnode,ith,x1,y[ibeg],x2,y[iend]);
    vctr->width = width;
    vctr->no_lag = no_lag;
    dst_lnth = dst/sqrt((x[iend]-x[ibeg])*(x[iend]-x[ibeg])+
                        (y[iend]-y[ibeg])*(y[iend]-y[ibeg])) ;
    if( dst_lnth <= 1./8. ) merge[gp_count]=1; /* still check for merging possibility */
    else merge[gp_count]=3; /* definitely arc */
    vctr->ax[0]=x[ibeg]+0.5;
    vctr->ax[1]=x[iend]+0.5;
    vctr->ax[2]=x[max]+0.5;
    vctr->ay[0]=y[ibeg]+0.5;
    vctr->ay[1]=y[iend]+0.5;
    vctr->ay[2]=y[max]+0.5;
    vctr->dst = dst;
    vctr->type = 'a';
}

/* return vertical vectors */
else if(ratio>=RATIO_v || flag_ii_a==1 || (flag_i && ratio>=RATIO_v_i) ||
        (flag_s==1 && ratio>=RATIO_v_one)) {
    if(abs(ibeg-iend)>1) line_fit(group,x, y, ibeg, iend, &x1, &x2,&start,&end);
    else { x1=x[ibeg]; x2=x[iend]; }
    vctr=add_vctr(vctr_list,clagnode,ith,x1,y[ibeg],x2,y[iend]);
    vctr->type = 'v';
    vctr->width = width;
    vctr->no_lag = no_lag;
    merge[gp_count]=1;
}
/* return horizontal vectors */
else if ( ratio<=RATIO_h || flag_s==1 ) {
    center=(int) ((ibeg+iend)/2.0+0.5);
    x1 = (float)((node+group[center][1])->col_start);
    x2 = (float)((node+group[center][1])->col_end);
    y1 = y2 = (float)((node+group[center][1])->rowth);
    (add_vctr(vctr_list,clagnode,ith,x1,y1,x2,y2))->type='h';
    merge[gp_count]=2;
}

/* return a v-vector for an undetermined group if the segment is marked as 'v' */
else if ( type=='v' ) {
    if(abs(ibeg-iend)>1) line_fit(group,x, y, ibeg, iend, &x1, &x2,&start,&end);
    else { x1=x[ibeg]; x2=x[iend]; }
    vctr=add_vctr(vctr_list,clagnode,ith,x1,y[ibeg],x2,y[iend]);
    vctr->type = 'v';
    vctr->width = width;
    vctr->no_lag = no_lag;
    merge[gp_count]=1;
}
else merge[gp_count] = 0;
ibeg = i;
gp_count += 1;

/* merge v-h-v vectors or v-v vectors */
merge[no_group+1]=0;
for(i=1;i<=(no_group-1);i++)
    if(merge[i]==1&&(((merge[i+1]==2||merge[i+1]==0)&&merge[i+2]==1)|| merge[i+1]==1)) {
        vctr = *((clagnode+ith)->vctr) ;
        j=1;
        while(merge[j]!=0 && j<i) {
            vctr = vctr->next;
            j += 1;
        }

        if(merge[i+1]==0||merge[i+1]==1) vctr_b = vctr->next;
        else vctr_b = (vctr->next)->next;
        /* line through the start of vctr and the end of vctr_b: a[0]*x + a[1]*y + a[2] = 0 */
        a[0] = vctr->y[0] - vctr_b->y[1];
        a[1] = vctr_b->x[1] - vctr->x[0];
        a[2] = vctr_b->y[1]*vctr->x[0] - vctr->y[0]*vctr_b->x[1];
        L = sqrt( a[0]*a[0] + a[1]*a[1] );
        /* distances of the two inner end points from that line */
        ds1 = fabs((a[0]*vctr->x[1] + a[1]*vctr->y[1] + a[2])/L);
        ds2 = fabs((a[0]*vctr_b->x[0] + a[1]*vctr_b->y[0] + a[2])/L);
        dp=sqrt((vctr_b->x[0]-vctr->x[1])*(vctr_b->x[0]-vctr->x[1])+
                (vctr_b->y[0]-vctr->y[1])*(vctr_b->y[0]-vctr->y[1]));
        if( ds1<DT && ds2<DT ) {
            vctr->x[1] = vctr_b->x[1];
            vctr->y[1] = vctr_b->y[1];
            no_lag = vctr->no_lag + vctr_b->no_lag;
            vctr->width=((float)(vctr->no_lag) * vctr->width +(float)(vctr_b->no_lag) * vctr_b->width)/
                        (float)no_lag;
            vctr->no_lag = no_lag;
            vctr->type='v';
            vctr->dst=0.0;
            vctr = vctr_list;
            (clagnode+ith)->no_vector -= 1;
            /* remove the discarded vector from link */
            while( (vctr->next) != vctr_b ) vctr = vctr->next;
            vctr->next = vctr_b->next;
            free((char*)vctr_b);
        }
    }
free_ivector(merge,1,(no_group+1));
no = (clagnode+ith)->no_vector ;
if( no > 0 ) {
    vctr = *((clagnode+ith)->vctr);
    free( (char*)(clagnode+ith)->vctr );
    if( ((clagnode+ith)->vctr=(struct vector **) calloc((unsigned)no,sizeof(struct vector*))) ==
        (struct vector **)0 ) printf("calloc fail in (clagnode+%d)->vctr\n",i);
    (clagnode+ith)->vctr -= 1; /* start from ONE */
    for(j=1;j<=no;j++) {
        (clagnode+ith)->vctr[j] = vctr;
        vctr = vctr->next;
    }
}
free_vector(x,1,number);
free_vector(y,1,number);
free_imatrix(group,1,number,1,4);
}
#undef width_cg
#undef LO
#undef LO_r
#undef EPSILON
#undef RATIO_h
#undef RATIO_v
#undef RATIO_v_one
#undef DT
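
The merging step in the listing joins two neighbouring vectors when both inner end points lie close to the line drawn from the start of the first to the end of the second, and it averages the widths weighted by the number of runs. The following is only a minimal sketch of that idea in standard C. The struct stroke type, the function names and the tol parameter are illustrative assumptions, not part of the original program, and squared distances are compared so that no square root is needed.

```c
#include <assert.h>

/* Hypothetical stroke record: end points, mean run width, and the number
 * of scan-line runs (the listing calls this no_lag) behind the estimate. */
struct stroke {
    float x[2], y[2];
    float width;
    int   no_lag;
};

/* Squared distance from (px,py) to the infinite line through
 * (x0,y0)-(x1,y1); the caller guarantees the two points differ. */
static double dist2_to_line(double x0, double y0, double x1, double y1,
                            double px, double py)
{
    double a = y0 - y1;
    double b = x1 - x0;
    double c = y1 * x0 - y0 * x1;
    double d = a * px + b * py + c;
    return d * d / (a * a + b * b);
}

/* Fold b into a when a's end and b's start both lie within tol of the line
 * joining a's start to b's end. Returns 1 on merge, 0 otherwise. */
int merge_if_collinear(struct stroke *a, const struct stroke *b, double tol)
{
    double x0 = a->x[0], y0 = a->y[0], x1 = b->x[1], y1 = b->y[1];
    int total = a->no_lag + b->no_lag;

    assert(tol >= 0.0);
    if ((x0 == x1 && y0 == y1) || total <= 0) return 0;
    if (dist2_to_line(x0, y0, x1, y1, a->x[1], a->y[1]) >= tol * tol) return 0;
    if (dist2_to_line(x0, y0, x1, y1, b->x[0], b->y[0]) >= tol * tol) return 0;

    /* run-count weighted mean of the two widths */
    a->width = (a->no_lag * a->width + b->no_lag * b->width) / (float)total;
    a->no_lag = total;
    a->x[1] = b->x[1];
    a->y[1] = b->y[1];
    return 1;
}
```

Unlike the listing, this sketch leaves unlinking and freeing of the absorbed vector to the caller.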

/* check arc tendency */
int arc_check(node,group,x,y,ibeg,iend,max,distance)
struct node *node;
int **group;
float *x,*y;
int ibeg,iend,*max;
float *distance;
{
    float a[3],L,T,d;
    float dl,dr;
    int i;
    int flag=0,sign,sign_p,fst_sign=10,sign_cl=0,sign_cr=0,flag_sc=0;
    int beg, end;

    for(i=ibeg;i<iend;i++) {
        /* shift of the left and right run ends between consecutive rows */
        dl = (float)((node+group[i][1])->col_start) -(float)((node+group[i+1][1])->col_start);
        dr = (float)((node+group[i][1])->col_end) -(float)((node+group[i+1][1])->col_end);
        if( (dl*dr) < 0.0 ) return(0);
        else if(dl>0.) sign=1;
        else if(dl==0.0) sign=0;
        else sign=-1;
        if(dl!=0.0) sign_cl += 1;
        if(dr!=0.0) sign_cr += 1;
        if(i!=ibeg && sign==sign_p) continue;
        else sign_p=sign;
        if(flag==0 && sign!=0) {
            flag=1;
            fst_sign=sign;
            continue;
        }
        else if(flag==1 && fst_sign == (-1*sign) ) flag_sc=1;
        if(flag_sc==1 && fst_sign==sign) return(0);
    }
    if( sign_cl <= 2 && sign_cr <= 2 && (sign_cl+sign_cr)<4 ) return(0);

    /* chord through the first and last center points: a[0]*x + a[1]*y + a[2] = 0 */
    a[0] = y[ibeg] - y[iend];
    a[1] = x[iend] - x[ibeg];
    a[2] = y[iend]*x[ibeg] - y[ibeg]*x[iend];
    L = sqrt( a[0]*a[0] + a[1]*a[1] );
    *max = ibeg;
    T=0.0;
    for(i=(ibeg+1);i<=(iend-1);i++) {
        d = a[0]*x[i] + a[1]*y[i] + a[2];
        if(fabs(d)>T) {
            T = fabs(d);
            *max = i;
        }
    }

    T = T/L; /* largest distance from the chord */
    *distance = T;
    if( T/L > 1./10. ) return(1); /* distance relative to the chord length */
    else return(0);
}

/* check collinearity */
int collinear_s(x,y,ibeg,iend,max,distance)
float *x,*y;
int ibeg,iend,*max;
float *distance;
{
    float a[3],L,T,d;
    int i;
    a[0] = y[ibeg] - y[iend];
    a[1] = x[iend] - x[ibeg];
    a[2] = y[iend]*x[ibeg] - y[ibeg]*x[iend];
    L = sqrt( a[0]*a[0] + a[1]*a[1] );
    *max = ibeg;
    T=0.0;
    for(i=(ibeg+1);i<=(iend-1);i++) {
        d = a[0]*x[i] + a[1]*y[i] + a[2];
        if(fabs(d)>T) {
            T = fabs(d);
            *max = i;
        }
    }
    T = T/L;
    *distance = T;
    if(T/L > 1./8.) return(1);
    else return(0);
}
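
Both arc_check and collinear_s measure how far the interior center points bulge away from the chord joining the first and last point, and compare that bulge, divided by the chord length, with a fixed fraction (1/10 and 1/8 respectively). As a rough illustration only, the same measure can be written in standard C without a square root by working with squared quantities. The function name chord_bulge2 is an assumption made for this sketch and does not appear in the listing.

```c
#include <assert.h>

/* Returns the square of (largest perpendicular distance / chord length) for
 * the interior points first < i < last, and stores the index of the farthest
 * point in *imax (left at `first` when no interior point deviates).
 * Returns 0.0 when the two end points coincide. */
double chord_bulge2(const float *x, const float *y,
                    int first, int last, int *imax)
{
    /* chord as a*x + b*y + c = 0, same coefficients as in the listing */
    double a = (double)y[first] - y[last];
    double b = (double)x[last] - x[first];
    double c = (double)y[last] * x[first] - (double)y[first] * x[last];
    double len2 = a * a + b * b;   /* squared chord length */
    double best = 0.0;
    int i;

    assert(first <= last);
    *imax = first;
    if (len2 == 0.0) return 0.0;
    for (i = first + 1; i < last; i++) {
        double d = a * x[i] + b * y[i] + c;
        if (d < 0.0) d = -d;
        if (d > best) { best = d; *imax = i; }
    }
    /* distance = best / sqrt(len2); dividing again by the chord length and
     * squaring gives best^2 / len2^2 */
    return (best * best) / (len2 * len2);
}
```

A caller would compare the result with the square of the chosen fraction, for example chord_bulge2(x, y, ibeg, iend, &m) > (1.0/8.0)*(1.0/8.0).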

prep.c
/* noise reduction based on median filter idea and a mask */
#include <ftr.h>
#define TV 14
void prep(o_image,rows,cols)
int **o_image,rows,cols;
{
    int **f_image, **image, **mask, count, npix=4;
    int vedge_up, vedge_dw, hsum;
    int i,j,m,n;
    int flag_up, flag_dw;
    int n_node, c_node,width_n,width_c,no_node,no_clagnode,s_node;
    struct clagnode *clagnode;
    struct node *node;

    f_image = imatrix(-2,rows+3,-2,cols+3);
    image = imatrix(-2,rows+3,-2,cols+3);
    mask = imatrix(-1,1,-3,3);
    for(i=-1;i<=1;i++) for(j=-3;j<=3;j++) mask[i][j] = 1;
    mask[-1][-1] = mask[-1][0] = mask[-1][1] = mask[0][0] = 0;
    /* black pixels (value 0) of the input become 1 in the working images */
    for (i=1; i<=rows; i++) {
        for (j=1; j<=cols; j++) if( o_image[i-1][j-1] == 0 ) {
            f_image[i][j] = 1;
            image[i][j] = 1;
        }
    }

    for(i=1;i<=rows;i++) for(j=1;j<=cols;j++) {
        count=0;
        for(m=(i-1);m<=(i+1);m++) for(n=(j-1);n<=(j+1);n++) count += image[m][n];
        if(count<=npix) {
            /* remove the pixel if it does not belong to a long scan line */
            hsum=0;
            for(n=(j-1);n<=(j+1);n++) hsum += image[i][n];
            if( !(hsum==3 && ((image[i][j-2]==1 && image[i][j-3]==1) ||
                              (image[i][j+2]==1 && image[i][j+3]==1) ||
                              (image[i][j-2]==1 && image[i][j+2]==1))) ) f_image[i][j]=0;
        }
        else if( count<=7 && image[i][j]==0 ) {
            /* add the pixel if it is not in a V-edge */
            flag_up = 0;
            flag_dw = 0;
            vedge_up = vedge_dw = 0;
            for(m=-1;m<=1;m++) for(n=-3;n<=3;n++) vedge_up += image[i+m][j+n]*mask[m][n];
            if(image[i-1][j]==0 && image[i-2][j]==0) flag_up = 1;
            for(m=-1;m<=1;m++) for(n=-3;n<=3;n++) vedge_dw += image[i+m][j+n]*mask[-m][n];
            if(image[i+1][j]==0 && image[i+2][j]==0) flag_dw = 1;
            if( !((vedge_up>TV && flag_up) || (vedge_dw>TV && flag_dw)) ) f_image[i][j]=1;
        }
        else f_image[i][j]=1;
    }

    for (i=1; i<=rows; i++) for (j=1; j<=cols; j++) {
        if(f_image[i][j]==0) o_image[i-1][j-1] = 255;
        else o_image[i-1][j-1] = 0;
    }

    free_imatrix(f_image,-2,rows+3,-2,cols+3);
    free_imatrix(image,-2,rows+3,-2,cols+3);
    free_imatrix(mask,-1,1,-3,3);

    /* remove some noise-like node from the top or bottom of each blob */
    node = (struct node *) calloc((unsigned)(cols*rows),sizeof(struct node));
    if(node == (struct node *) NULL) {
        fprintf(stderr, "calloc failed for node\n");
        exit (1);
    }
    node -= 1; /* let node number start from ONE */
    lag(o_image, rows, cols, node, &no_node);
    clagnode = (struct clagnode *)calloc((unsigned) (no_node), sizeof(struct clagnode));
    clagnode -= 1; /* let clagnode number start from ONE */
    no_clagnode = 0;
    s_node = 1;
    do {
        clag(rows, node, s_node, clagnode, &no_clagnode);
        /* look for the starting node for next blob */
        for(i=1;i<=no_node;i++) {
            if((node+i)->mark != 1) {
                s_node = i;
                break;
            }
            else s_node=0;
        }
    } while (s_node > 0);
    /* remove the clag-path-node which has only one run length and connects to some other clagnode from the image */
    for(i=1;i<=no_clagnode;i++) if((clagnode+i)->class == 'p' && (clagnode+i)->number == 1) {
        c_node = (clagnode+i)->node[1];
        if(((node+c_node)->above+(node+c_node)->below) == 1)
            for(j=((node+c_node)->col_start);j<=((node+c_node)->col_end);j++) o_image[(node+c_node)->rowth][j]=255;
    }
    /* remove path-node with degree (0,1) or (1,0) if it is noise-like */
    for(i=1;i<=no_node;i++) {
        if((node+i)->above==0 && (node+i)->below==1) {
            n_node = (node+i)->b_node[0];
            width_n = (node+n_node)->col_end - (node+n_node)->col_start + 1 ;
            width_c = (node+i)->col_end - (node+i)->col_start + 1 ;
            if( (width_n-width_c)>=3 && ((float)width_n/(float)width_c)>=3.0)
                for(j=((node+i)->col_start);j<=((node+i)->col_end);j++) o_image[(node+i)->rowth][j]=255;
        }
        else if((node+i)->above==1 && (node+i)->below==0) {
            n_node = (node+i)->a_node[0];
            width_n = (node+n_node)->col_end - (node+n_node)->col_start + 1 ;
            width_c = (node+i)->col_end - (node+i)->col_start + 1 ;
            if( (width_n-width_c)>=3 && ((float)width_n/(float)width_c)>=3.0)
                for(j=((node+i)->col_start);j<=((node+i)->col_end);j++) o_image[(node+i)->rowth][j]=255;
        }
    }

    for(i=1;i<=no_clagnode;i++) free((char*)((clagnode+i)->node+1));
    free((char*)(clagnode+1));
    free((char*)(node+1));
}

/* median filtering */
void mdns(o_image,rows,cols)
int **o_image,rows,cols;
{
    int **f_image, **image, count, npix=4;
    int vedge_up, vedge_dw, hsum;
    int i,j,m,n;
    f_image = imatrix(-2,rows+3,-2,cols+3);
    image = imatrix(-2,rows+3,-2,cols+3);
    for (i=1; i<=rows; i++) for (j=1; j<=cols; j++)
        if( o_image[i-1][j-1] == 0 ) image[i][j] = f_image[i][j] = 1;

    /* a pixel stays black only if more than npix pixels of its 3x3 window are black */
    for(i=1;i<=rows;i++) for(j=1;j<=cols;j++) {
        count=0;
        for(m=(i-1);m<=(i+1);m++) for(n=(j-1);n<=(j+1);n++) count += image[m][n];
        if(count<=npix) f_image[i][j]=0;
        else f_image[i][j]=1;
    }

    for (i=1; i<=rows; i++) for (j=1; j<=cols; j++)
        if(f_image[i][j]==0) o_image[i-1][j-1] = 255;
        else o_image[i-1][j-1] = 0;

    free_imatrix(f_image,-2,rows+3,-2,cols+3);
    free_imatrix(image,-2,rows+3,-2,cols+3);
}
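
For a binary image, the median of a 3x3 window is a majority vote: the output pixel is black when more than four of the nine cells are black, which is what the count<=npix test with npix=4 expresses. The sketch below restates that rule on a flat row-major byte array instead of the padded imatrix arrays of the listing. The function name majority3x3, the 1 = ink convention and the treatment of cells outside the image as background are assumptions of this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* img holds rows*cols bytes, 1 = ink, 0 = background, row-major.
 * Each pixel becomes ink when more than npix cells of its 3x3 neighbourhood
 * are ink; cells outside the image count as background.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int majority3x3(unsigned char *img, int rows, int cols, int npix)
{
    size_t total;
    unsigned char *out;
    int r, c, dr, dc;

    assert(img != NULL && rows > 0 && cols > 0);
    total = (size_t)rows * (size_t)cols;
    out = malloc(total);
    if (out == NULL) return -1;

    for (r = 0; r < rows; r++) {
        for (c = 0; c < cols; c++) {
            int count = 0;
            for (dr = -1; dr <= 1; dr++) {
                for (dc = -1; dc <= 1; dc++) {
                    int rr = r + dr, cc = c + dc;
                    if (rr >= 0 && rr < rows && cc >= 0 && cc < cols)
                        count += img[(size_t)rr * cols + cc];
                }
            }
            out[(size_t)r * cols + c] = (unsigned char)(count > npix);
        }
    }
    memcpy(img, out, total);   /* write back only after all votes are taken */
    free(out);
    return 0;
}
```

Writing into a separate buffer and copying back afterwards mirrors the listing's use of image for reading and f_image for writing, so that earlier decisions do not influence later ones.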

/****************************************************************/
/*                                                              */
/* QUANT.C                                                      */
/*                                                              */
/****************************************************************/

/*
 * quant: convert a continuous vector to a bit vector for representing
 * a segment.
 *
 */
#include "cluster.h"

void quant(f1, f2, f3)
FILE *f1, *f2, *f3;
{
    int nclust, nfeat, veclen, r_count, count_feat, count_seg;
    CLUST *clust, *cpt;
    FEAT feat, *fpt;
    short i, j, k;
    char oldlab[3];
    int *qpt, *hpt, hval, mid;
    float mval, dum, fdist();

    fpt = &feat;
    fscanf(f2, "%d", &nclust);
    veclen = nclust/8 + (nclust%8 ? 1 : 0); /* length of bit-vector in bytes */
    if ((clust = (CLUST *) calloc(nclust, sizeof (CLUST))) == (CLUST *) NULL) {
        fprintf(stderr, "No memory space for data\n");
        exit(1);
    }

    /* Read the clust.cent file */
    for (i = 0, cpt = clust; i < nclust; i++, cpt++)
        if (fscanf(f2, "%f %f %f %f %f", &cpt->x, &cpt->y, &cpt->px, &cpt->py, &cpt->d) == EOF) {
            fprintf(stderr, "*** clust.cent file unexpected termination\n");
            exit(1);
        }

    /* Create an array for temp. storage of binary vector in bits */
    if ((qpt = (int *) calloc(nclust, sizeof (int))) == (int *) NULL) {
        fprintf(stderr, "No memory space for data\n");
        exit(1);
    }

    /* Create an array for temp. storage of binary vector in bytes */
    if ((hpt = (int *) calloc(veclen, sizeof (int))) == (int *) NULL) {
        fprintf(stderr, "No memory space for data\n");
        exit(1);
    }

    /* Read the clust.v file, one entry at a time, and determine the bit vector
       for a segment (look for the label ending with a 0, marking the start of a segment) */
    /*
    fscanf(f1, "%d", &nfeat);
    */
    count_feat = 0;
    count_seg = 0;
    while (1) {
        r_count = fscanf(f1,"%d %s %f %f %f %f %f", &fpt->num, fpt->lab, &fpt->x, &fpt->y, &fpt->px, &fpt->py, &fpt->d);
        if (*(fpt->lab+2) == '*') continue;
        /* At the end of a segment, store the binary vector and write in xxx.sv file */
        if (((*(fpt->lab+2)=='0') || (r_count==EOF)) && count_feat) {
            i=0, j=0;
            /* pack eight cluster flags per byte, first flag in the top bit */
            while(i < (nclust/8)) {
                for (k=7, hval = 0; k>=0; k--) {
                    hval = hval | *(qpt+j);
                    j++;
                    hval = hval << 1;
                }
                *(hpt+i) = (hval >> 1) & 0xFF;
                i++;
            }

            /* pack the remaining nclust%8 flags; the assignment to i is intended */
            if (i = nclust%8) {
                for (k=i-1, j=(nclust/8)*8, hval=0; k>=0; k--) {
                    hval = hval | *(qpt+j);
                    j++;
                    hval = hval << 1;
                }
                *(hpt+nclust/8) = (hval >> 1) & 0xFF;
            }

            oldlab[2] = 0;
            fprintf(f3, "%s ", oldlab);
            for (i=0; i<veclen; i++) {
                fprintf(f3, " %x", *(hpt + i));
                *(hpt + i) = 0;
            }
            fprintf(f3, "\n");

            for (i=0; i<nclust; i++) *(qpt+i) = 0;
            strcpy(oldlab, fpt->lab);
            count_feat = 0;
            count_seg++; /* cumulative no. of segments */
        }

        if (r_count == EOF) return;
        if (!count_seg) strcpy(oldlab, fpt->lab);
        /* Assign features to clusters */
        mid=0;
        for (j=0, cpt=clust, mval=MEG; j < nclust; j++, cpt++) {
            dum = fdist(fpt, cpt);
            if (dum < mval) {mval = dum, mid = j;}
        }

        *(qpt+mid) = 1;
        count_feat++; /* cumulative no. of features in the current segment */
    }
}
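
The packing loops in quant turn one 0/1 flag per cluster into bytes: every full group of eight flags fills a byte with the first flag in the most significant bit, and a trailing group of fewer than eight flags ends up right-aligned in the last byte. The sketch below reproduces that layout in standard C. The function name pack_flags and the use of an unsigned char output buffer (the listing stores each byte in an int) are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Pack n flags (zero / non-zero) into out and return the number of bytes
 * written. Full groups of eight are stored first-flag-in-top-bit; a final
 * group of n%8 flags is stored right-aligned. out needs (n+7)/8 bytes. */
size_t pack_flags(const int *flags, size_t n, unsigned char *out)
{
    size_t full = n / 8, rem = n % 8, i, k;

    assert(flags != NULL && out != NULL);
    for (i = 0; i < full; i++) {
        unsigned v = 0;
        for (k = 0; k < 8; k++)
            v = (v << 1) | (flags[i * 8 + k] ? 1u : 0u);
        out[i] = (unsigned char)v;
    }
    if (rem != 0) {
        unsigned v = 0;
        for (k = 0; k < rem; k++)
            v = (v << 1) | (flags[full * 8 + k] ? 1u : 0u);
        out[full] = (unsigned char)v;
        return full + 1;
    }
    return full;
}
```

Shifting before OR-ing, as done here, avoids the extra shift that the listing undoes afterwards with a right shift by one.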

#include <nrec.h>

char *sname[] = {naOn naln "a2n nbOnt nbln~ "b2n, nCOn~ nCln~ ~C2 ~
ndOn~ ndln~ "d2n, neo n ~ ~el n ~ ~e2 nfon nfln gO, gln, "g2n, nhon~ nhln~ "h2n, ion~
n jon n jln nkOn~ nkln~ "k2n, "lon, nmon nmln~ nm2n~ ~m3n~ ~m4 nnOn~ nnln~ "n2n, nOOn~ noln~ "o2n, "pOn, ~pln~ ~p2n~
~qOn~ ~qln~
nron nrln ~sOn~
nton ntln nuon~ nUln~ ~U2 nvon nVln nwon nWln~ "W2n~ "w3 nxo n nXl n nyo n nyl n "zon, SO ~ S 1 r n~xn~(char *) NULL~i .

/* Initialize transition prob. table */
double Tran[27][27] = {
/*A*/ {(.0011),(.0193),(.0388),(.0469),(.0020),(.0100),(.0233),(.0020),(.0480),
       (.0020),(.0103),(.1052),(.0281),(.1878),(.0008),(.0222),(EPS),  (.1180),
       (.1001),(.1574),(.0137),(.0212),(.0057),(.0026),(.0312),(.0023),(.1001)},
/*B*/ {(.0931),(.0057),(.0016),(.0008),(.3219),(EPS),  (EPS),  (EPS),  (.0605),
       (.0057),(EPS),  (.1242),(.0049),(EPS),  (.0964),(EPS),  (EPS),  (.0662),
       (.0229),(.0049),(.0727),(.0016),(EPS),  (EPS),  (.1168),(EPS),  (.0229)},
/*C*/ {(.1202),(EPS),  (.0196),(.0004),(.1707),(EPS),  (EPS),  (.1277),(.0761),
       (EPS),  (.0324),(.0369),(.0015),(.0011),(.2283),(EPS),  (.0004),(.0426),
       (.0087),(.0893),(.0347),(EPS),  (EPS),  (EPS),  (.0094),(EPS),  (.0087)},
/*D*/ {(.1044),(.0020),(.0026),(.0218),(.3778),(.0007),(.0132),(.0007),(.1803),
       (.0033),(EPS),  (.0125),(.0178),(.0053),(.0733),(EPS),  (.0007),(.0324),
       (.0495),(.0013),(.0601),(.0099),(.0040),(EPS),  (.0264),(EPS),  (.0495)},
/*E*/ {(.0660),(.0036),(.0433),(.1194),(.0438),(.0142),(.0125),(.0021),(.0158),
       (.0005),(.0036),(.0456),(.0340),(.1381),(.0040),(.0192),(.0034),(.1927),
       (.1231),(.0404),(.0048),(.0215),(.0205),(.0152),(.0121),(.0004),(.1231)},
/*F*/ {(.0838),(EPS),  (EPS),  (EPS),  (.1283),(.0924),(EPS),  (EPS),  (.1608),
       (EPS),  (EPS),  (.0299),(.0009),(.0009),(.2789),(EPS),  (EPS),  (.1215),
       (.0026),(.0496),(.0462),(EPS),  (EPS),  (EPS),  (.0043),(EPS),  (.0026)},
/*G*/ {(.1078),(EPS),  (EPS),  (.0018),(.2394),(EPS),  (.0177),(.1281),(.0839),
       (EPS),  (EPS),  (.0203),(.0027),(.0451),(.1140),(EPS),  (EPS),  (.1325),
       (.0256),(.0247),(.0512),(EPS),  (EPS),  (EPS),  (.0053),(EPS),  (.0256)},
/*H*/ {(.1769),(.0005),(.0014),(.0008),(.5623),(EPS),  (EPS),  (.0005),(.1167),
       (EPS),  (EPS),  (.0016),(.0016),(.0038),(.0786),(EPS),  (EPS),  (.0153),
       (.0027),(.0233),(.0085),(EPS),  (.0011),(EPS),  (.0041),(EPS),  (.0027)},
/*I*/ {(.0380),(.0082),(.0767),(.0459),(.0437),(.0129),(.0280),(.0002),(.0016),
       (EPS),  (.0050),(.0567),(.0297),(.2498),(.0893),(.0100),(.0008),(.0342),
       (.1194),(.1135),(.0011),(.0250),(EPS),  (.0023),(.0002),(.0079),(.1194)},
/*J*/ {(.1259),(EPS),  (EPS),  (EPS),  (.1818),(EPS),  (EPS),  (EPS),  (.0350),
       (EPS),  (EPS),  (EPS),  (EPS),  (EPS),  (.3147),(EPS),  (EPS),  (.0070),
       (EPS),  (EPS),  (.3357),(EPS),  (EPS),  (EPS),  (EPS),  (EPS),  (EPS)},
/*K*/ {(.0395),(.0028),(EPS),  (.0028),(.5282),(.0028),(EPS),  (.0198),(.1582),
       (EPS),  (.0113),(.0198),(.0028),(.0565),(.0198),(EPS),  (EPS),  (.0082),
       (.1102),(.0028),(.0028),(EPS),  (EPS),  (EPS),  (.0113),(EPS),  (.1102)},
/*L*/ {(.1342),(.0019),(.0022),(.0736),(.1918),(.0105),(.0108),(EPS),  (.1521),
       (EPS),  (.0079),(.1413),(.0082),(.0004),(.0778),(.0041),(EPS),  (.0034),
       (.0389),(.0254),(.0269),(.0056),(.0011),(EPS),  (.0819),(EPS),  (.0389)},
/*M*/ {(.1822),(.0337),(.0026),(EPS),  (.2975),(.0010),(EPS),  (EPS),  (.1345),
       (EPS),  (EPS),  (.0010),(.0654),(.0042),(.1246),(.0722),(EPS),  (.0026),
       (.0244),(.0005),(.0337),(.0005),(EPS),  (EPS),  (.0192),(EPS),  (.0244)},
/*N*/ {(.0550),(.0004),(.0621),(.1681),(.1212),(.0102),(.1391),(.0013),(.0665),
       (.0009),(.0066),(.0073),(.0104),(.0194),(.0528),(.0004),(.0007),(.0011),
       (.0751),(.1641),(.0124),(.0068),(.0018),(.0002),(.0157),(.0004),(.0751)},
/*O*/ {(.0082),(.0101),(.0162),(.0231),(.0037),(.1299),(.0082),(.0025),(.0092),
       (.0014),(.0078),(.0416),(.0706),(.2190),(.0222),(.0292),(EPS),  (.1530),
       (.0357),(.0396),(.0947),(.0334),(.0345),(.0012),(.0041),(.0004),(.0357)},
/*P*/ {(.1359),(EPS),  (.0006),(EPS),  (.1747),(EPS),  (EPS),  (.0237),(.0423),
       (EPS),  (EPS),  (.0812),(.0073),(.0006),(.1511),(.0581),(EPS),  (.2306),
       (.0180),(.0287),(.0457),(EPS),  (EPS),  (EPS),  (.0017),(EPS),  (.0180)},
/*Q*/ {(EPS),  (EPS),  (EPS),  (EPS),  (EPS),  (EPS),  (EPS),  (EPS),  (EPS),
       (EPS),  (EPS),  (EPS),  (EPS),  (EPS),  (EPS),  (EPS),  (EPS),  (EPS),
       (EPS),  (EPS),  (1.000),(EPS),  (EPS),  (EPS),  (EPS),  (EPS),  (EPS)},
/*R*/ {(.1026),(.0033),(.0172),(.0282),(.2795),(.0031),(.0175),(.0017),(.1181),
       (EPS),  (.0205),(.0164),(.0303),(.0325),(.1114),(.0055),(EPS),  (.0212),
       (.0655),(.0596),(.0192),(.0142),(.0017),(.0002),(.0306),(EPS),  (.0655)},
/*S*/ {(.0604),(.0012),(.0284),(.0027),(.1795),(.0024),(EPS),  (.0561),(.1177),
       (EPS),  (.0091),(.0145),(.0112),(.0021),(.0706),(.0386),(.0009),(.0027),
       (.0836),(.2483),(.0579),(EPS),  (.0039),(EPS),  (.0081),(EPS),  (.0836)},
/*T*/ {(.0619),(.0003),(.0036),(.0002),(.1417),(.0007),(.0002),(.3512),(.1406),
       (EPS),  (EPS),  (.0101),(.0044),(.0015),(.1229),(.0003),(EPS),  (.0479),
       (.0418),(.0213),(.0195),(.0005),(.0088),(EPS),  (.0203),(.0005),(.0418)},
/*U*/ {(.0344),(.0415),(.0491),(.0243),(.0434),(.0052),(.0382),(.0010),(.0258),
       (EPS),  (.0014),(.0197),(.0329),(.1517),(.0019),(.0386),(EPS),  (.1460),
       (.1221),(.1255),(.0029),(.0012),(EPS),  (.0010),(.0014),(.0005),(.1221)},
/*V*/ {(.0749),(EPS),  (EPS),  (.0023),(.6014),(EPS),  (EPS),  (EPS),  (.2569),
       (EPS),  (EPS),  (EPS),  (.0012),(EPS),  (.0530),(EPS),  (EPS),  (EPS),
       (.0023),(EPS),  (.0012),(.0012),(EPS),  (EPS),  (.0058),(EPS),  (.0023)},
/*W*/ {(.2291),(.0008),(EPS),  (.0032),(.1942),(EPS),  (EPS),  (.1422),(.2104),
       (EPS),  (EPS),  (.0041),(EPS),  (.0357),(.1292),(EPS),  (EPS),  (.0106),
       (.0366),(.0016),(EPS),  (EPS),  (EPS),  (EPS),  (.0024),(EPS),  (.0366)},
/*X*/ {(.0672),(EPS),  (.1119),(EPS),  (.1269),(EPS),  (EPS),  (.0075),(.1119),
       (EPS),  (EPS),  (EPS),  (.0075),(EPS),  (.0075),(.3507),(EPS),  (EPS),
       (EPS),  (.1716),(EPS),  (EPS),  (EPS),  (.0373),(EPS),  (EPS),  (EPS)},
/*Y*/ {(.0586),(.0034),(.0103),(.0069),(.2897),(EPS),  (EPS),  (EPS),  (.0690),
       (EPS),  (.0034),(.0172),(.0379),(.0172),(.2207),(.0310),(EPS),  (.0310),
       (.1517),(.0172),(.0138),(EPS),  (.0103),(EPS),  (.0069),(.0034),(.1517)},
/*Z*/ {(.2278),(EPS),  (EPS),  (EPS),  (.4557),(EPS),  (EPS),  (EPS),  (.2152),
       (EPS),  (EPS),  (.0127),(EPS),  (EPS),  (.0506),(EPS),  (EPS),  (EPS),
       (EPS),  (EPS),  (.0127),(EPS),  (EPS),  (EPS),  (EPS),  (.0253),(EPS)},
/*S*/ {(.0604),(.0012),(.0284),(.0027),(.1795),(.0024),(EPS),  (.0561),(.1177),
       (EPS),  (.0091),(.0145),(.0112),(.0021),(.0706),(.0386),(.0009),(.0027),
       (.0836),(.2483),(.0579),(EPS),  (.0039),(EPS),  (.0081),(EPS),  (.0836)}};
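
A table of this kind gives, for each character, the probability of each possible next character. How the recognizer consults Tran is not shown in this part of the listing, so the following is only a hedged illustration of the general idea: the likelihood of a character sequence under a first-order model is the product of the transition probabilities along it. The three-symbol alphabet, the sample values used in testing, and the names NSYM, sym_index and chain_prob are all invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define NSYM 3   /* hypothetical alphabet: 'a', 'b', 'c' */

/* Map a character to its row/column index, or -1 if outside the alphabet. */
static int sym_index(char ch)
{
    return (ch >= 'a' && ch < 'a' + NSYM) ? ch - 'a' : -1;
}

/* Product of tran[from][to] over consecutive character pairs of word.
 * Returns 1.0 for words with fewer than two characters and 0.0 as soon as
 * a pair contains a character outside the alphabet. */
double chain_prob(const char *word, const double tran[NSYM][NSYM])
{
    double p = 1.0;
    size_t i;

    assert(word != NULL);
    for (i = 0; word[i] != '\0' && word[i + 1] != '\0'; i++) {
        int from = sym_index(word[i]);
        int to = sym_index(word[i + 1]);
        if (from < 0 || to < 0) return 0.0;
        p *= tran[from][to];
    }
    return p;
}
```

For long sequences a practical implementation would usually add logarithms instead of multiplying probabilities, to avoid underflow; the product form is kept here only for brevity.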

seg2vec.c
/* vectorize each segment */
#include <ftr.h>
#define SMIN 8. /* minimum spread in x and y direction for determining dominant diagonal vector for s */
#define SPIX 9 /* minimum no. of columns for testing the existence of s */
void seg2vec(bimage,rows,bcol,seg,s_v_list,col_start,lagimg,dspl,clagimg)
int **bimage,rows,bcol,col_start,**lagimg,**dspl,**clagimg;
struct vector **s_v_list;
BSEG *seg;
{
    extern int Cols;
    int s_mdf();
    int **image,cols,no_clagnode=0,**group,*prof;
    int s_node, no_node=0;
    int i,j,m,n,c_node,no;
    int start,end,path,pixel,path_g,path_l;
    int flag, flag_b,flag_m,flag_h,ibeg,iend,jbeg,jend;
    int path_beg, path_end;
    float x1,x2,y1,y2;
    float *x,*y, ratio, width;
    struct clagnode *clagnode;
    struct node *node;
    struct vector *vctr_list, *vctr, *add_h();
    void dv_check();

    if((vctr_list=(struct vector *)calloc(1,sizeof(struct vector)))==
       (struct vector *)NULL) printf("calloc fail in vctr_list\n");
    vctr_list->next = (struct vector *)NULL;
    start=(int)(seg->start + 0.4);
    if(start<0) start=0;
    end=(int)(seg->end + 0.6)-1;
    if(end>Cols) end=Cols;
    cols = end - start + 1 ;
    image = imatrix(0,(rows-1),0,(cols-1));
    /* form the segment image */
    for(i=0;i<rows;i++) for(j=0;j<cols;j++) image[i][j] = bimage[i][j+start];
    /* remove wrong cutting pixels */
    for(i=0;i<rows;i++) for(j=0;j<cols;j++) if(image[i][j]==0) {
        /* find the horizontal run of black pixels starting at column j */
        jbeg = j;
        pixel = 1;
        if(j==(cols-1)) jend = j;
        else {
            while(image[i][j+1]==0) {
                j += 1;
                pixel += 1;
                if(j==(cols-1)) break;
            }
            jend = j;
        }

        if(pixel<=2 && !(cols<=3 && seg->type=='h') ) {
            path_g=0;
            path_beg=-1;
            path_end=Cols+1;
            path=lagimg[i][jbeg+start+col_start];
            for(n=col_start;n<=(col_start+bcol);n++) if(lagimg[i][n]==path) {
                path_g += 1;
                if(path_beg==-1) path_beg = n;
                if(n<(col_start+bcol)) {
                    if(lagimg[i][n+1]!=path) path_end = n;
                }
                else path_end = n;
            }

            ratio = (float)pixel / (float)path_g;
            if(ratio<1./5. && (path_beg==(jbeg+start+col_start) || path_end == (jend+start+col_start)) )
                for(n=jbeg;n<=jend;n++) image[i][n] = 255;
            else {
                path_g=0;
                path_l=0;
                path_beg=Cols+1;
                path_end=-1;
                path=clagimg[i][jbeg+start+col_start];
                for(m=0;m<rows;m++) {
                    flag_b = 0;
                    for(n=col_start;n<=(col_start+bcol);n++) if(clagimg[m][n]==path) {
                        path_g += 1;
                        if(flag_b==0) {
                            if(path_beg>n) path_beg = n;
                            flag_b = 1;
                        }

                        if(n<(col_start+bcol)) {
                            if(clagimg[m][n+1]!=path && path_end<n) path_end = n;
                        }
                        else path_end = n;
                    }

                    for(n=(start+col_start);n<=(end+col_start);n++) if(clagimg[m][n]==path) path_l += 1;
                }
                ratio = (float)path_l / (float)path_g;
                if(ratio<0.2 && (path_beg>=(start+col_start) || path_end<=(end+col_start)) )
                    for(n=jbeg;n<=jend;n++) image[i][n] = 255;
            }
        }
    }

    /* find the clag */
    node = (struct node *) calloc((unsigned)(cols*rows),sizeof(struct node));
    if(node == (struct node *) NULL) {
        fprintf(stderr, "calloc failed for node\n");
        exit (1);
    }

    node -= 1; /* let node number start from ONE */
    lag(image, rows, cols, node, &no_node);
    clagnode = (struct clagnode *)calloc((unsigned) (no_node), sizeof(struct clagnode));
    clagnode -= 1; /* let clagnode number start from ONE */
    no_clagnode = 0;
    s_node = 1;
    do {
        clag(rows, node, s_node, clagnode, &no_clagnode);
        /* look for the starting node for next blob */
        for(i=1;i<=no_node;i++) {
            if((node+i)->mark != 1) {
                s_node = i;
                break;
            }
            else s_node=0;
        }
    } while (s_node > 0);
    /* remove the clag-path-node which has only one run length and connects to some other clagnode from the image (for fixing top of z or s) */
    for(i=1;i<=no_clagnode;i++) if((clagnode+i)->class == 'p' && (clagnode+i)->number == 1) {
        c_node = (clagnode+i)->node[1];
        if(((node+c_node)->above+(node+c_node)->below)==1)
            for(j=((node+c_node)->col_start);j<=((node+c_node)->col_end);j++) image[(node+c_node)->rowth][j]=255;
    }

    for(i=1;i<=no_clagnode;i++) free((char*)((clagnode+i)->node+1));
    free((char*)(clagnode+1));
    free((char*)(node+1));
#ifdef DEMO
    /* write the segment image into the displaying-purpose array */
    for(i=0;i<(10*rows);i++) for(j=0;j<(10*cols);j++) dspl[i][j+10*(start+col_start)] = image[i/10][j/10];
#endif
    /* rebuild the clag on the cleaned segment image */
    node = (struct node *) calloc((unsigned)(cols*rows),sizeof(struct node));
    if(node == (struct node *) NULL) {
        fprintf(stderr, "calloc failed for node\n");
        exit (1);
    }

    node -= 1; /* let node number start from ONE */
    lag(image, rows, cols, node, &no_node);
    clagnode = (struct clagnode *)calloc((unsigned) no_node, sizeof(struct clagnode));
    clagnode -= 1; /* let clagnode number start from ONE */
    s_node=1;
    no_clagnode=0;
    do {
        clag(rows, node, s_node, clagnode, &no_clagnode);
        /* look for the starting node for next blob */
        for(i=1;i<=no_node;i++) {
            if((node+i)->mark != 1) {
                s_node = i;
                break;
            }
            else s_node=0;
        }
    } while (s_node > 0);
    /* return horizontal vectors if it is indicated by seg->type and its
       height to width ratio is high */
    flag_h = 0;
    for(i=1;i<=no_clagnode;i++) if((clagnode+i)->class=='p' && (clagnode+i)->number>9) {
        width = 0.0;
        for(j=1;j<=((clagnode+i)->number);j++) width += ((node+(clagnode+i)->node[j])->col_end -(node+(clagnode+i)->node[j])->col_start);
        width /= (float)((clagnode+i)->number);
        if( (float)((clagnode+i)->number)/width > 2.5 ) flag_h=1;
    }
    if(seg->type == 'h' && flag_h==0) {
        prof = ivector(0,cols-1);
        for(flag_b=0,i=0;i<rows;i++) {
            flag = 0;
            for(j=0;j<cols;j++) if(image[i][j]==0) flag += 1;
            if(flag>1) flag=1; /* discard only one pixel wide element */
            else flag=0;
            if(flag_b==0&&flag==1) ibeg=i;
            if((flag_b==1&&flag==0) || (flag_b==1&&flag==1&&i==(rows-1))) {
                iend=i-1;
                y1 = y2 = ((float)(ibeg+iend))/2.0+0.5;
                for(n=0;n<cols;n++) {
                    prof[n] = 0;
                    for(m=ibeg;m<=iend;m++) if(image[m][n]==0) prof[n] += 1;
                }
                x1=(float)0;
                x2=(float)cols;
                for(n=0;n<(cols-1);n++) {
                    if(prof[n]==0 && prof[n+1]>0) x1=(float)(n+1);
                    if(prof[n]>0 && prof[n+1]==0) x2=(float)(n+1);
                }
                x1 = x1+(float)start ;
                x2 = x2+(float)start;
                if(x1<0.0) x1=0.0;
                if(x2>(float)bcol) x2 = (float)bcol - 0.000001;
                (add_h(vctr_list,x1,y1,x2,y2))->type='h';
            }
            flag_b = flag;
        }
        *s_v_list = vctr_list;
        free_imatrix(image,0,(rows-1),0,(cols-1));
        free_ivector(prof,0,cols-1);
        return;
    }

    /* Analyse each clag-path-node or a junction node if it is in the top or bottom of the whole clag */
    for(i=1;i<=no_clagnode;i++)
        if((clagnode+i)->class=='p' || ((clagnode+i)->class=='j' && ((clagnode+i)->a_clagnode[0]==0 || (clagnode+i)->b_clagnode[0]==0)) )
            path_s(image,rows,node,clagnode,no_clagnode,i,seg->type,vctr_list);
    /* merging vectors in adjacent clag-path-nodes */
    do {
        merge(vctr_list,clagnode,no_clagnode,&flag_m);
    } while(flag_m==1) ;
    /* if the dominant diagonal vector of s exists, delete all the other vertical vectors from the list */
    if(cols>=SPIX) s_mdf(&vctr_list);
    /* if there is a dominant straight vertical vector, delete all the other vectors from the list */
    /* dv_check(&vctr_list);*/ /* NOT complete yet (doesn't check overlap) */
    /* free reserved memory spaces */
    for(i=1;i<=no_clagnode;i++) free((char*)((clagnode+i)->node+1));
    for(i=1;i<=no_clagnode;i++) if((clagnode+i)->no_vector!=0) free((char*)((clagnode+i)->vctr+1));
    free((char*)(clagnode+1));
    free((char*)(node+1));
    /* prepare output data */
    *s_v_list = vctr_list;
    while(vctr_list->next != (struct vector *)NULL) {
        for(i=0;i<2;i++) {
            vctr_list->x[i] += start;
            if(vctr_list->x[i] < seg->start ) vctr_list->x[i] = seg->start;
            else if (vctr_list->x[i] > seg->end ) vctr_list->x[i] = seg->end;
        }

        if(vctr_list->type=='a') for(i=0;i<3;i++) {
            vctr_list->ax[i] += start;
            if(vctr_list->ax[i] < seg->start ) vctr_list->ax[i] = seg->start;
            else if (vctr_list->ax[i] > seg->end ) vctr_list->ax[i] = seg->end;
        }
        vctr_list = vctr_list->next;
    }
    free_imatrix(image,0,(rows-1),0,(cols-1));
}

/* delete all the other vectors if a dominant straight vertical exists */
void dv_check(vctr_list)
struct vector **vctr_list;
{
    float width(),length,Mlength;
    struct vector *vctr, *vctr_d;
    int flag;
    vctr = *vctr_list;
    flag = 0;
    Mlength = 0.0;
    length = 0.0;
    while(vctr->next != (struct vector *)NULL) {
        if(vctr->type=='v') {
            width(vctr);
            if(vctr->type=='I') {
                length = sqrt((vctr->x[0]-vctr->x[1])*
                              (vctr->x[0]-vctr->x[1]) +
                              (vctr->y[0]-vctr->y[1])*
                              (vctr->y[0]-vctr->y[1]) );
                flag += 1;
                vctr->type = 'v';
                if(Mlength < length) {
                    Mlength = length;
                    vctr_d = vctr;
                }
            }
        }
        vctr = vctr->next;
    }

    if(flag>0 && Mlength>5.0 ) {
        vctr = *vctr_list;
        while(vctr->next != (struct vector *)NULL) {
            if(vctr != vctr_d) vctr = dlt_vctr(vctr_list,vctr);
            else vctr = vctr->next;
        }
    }
}

/* modify the vector list if the dominant diagonal vector of s exists */
int s_mdf(vctr_list)
struct vector **vctr_list;
{
    struct vector *vctr_s, *vctr, *vctr_b;
    float ymax=0. ,xmax=0.;
    vctr = *vctr_list;
    while(vctr->next != (struct vector *)NULL) {
        if(vctr->type=='s') {
            vctr_s = vctr;
            xmax = fabs(vctr->x[0]-vctr->x[1]);
            ymax = fabs(vctr->y[0]-vctr->y[1]);
        }
        vctr = vctr->next;
    }

    if(xmax >= SMIN && ymax >= SMIN) {
        vctr = *vctr_list;
        while(vctr->next != (struct vector *)NULL)
            if((vctr->type=='a' || vctr->type=='v') && vctr != vctr_s) vctr = dlt_vctr(vctr_list,vctr);
            else vctr = vctr->next;
        return(1);
    }
    else return(0);
}

/* delete an entry from the vector-list */
struct vector *dlt_vctr(vctr_list,vctr)
struct vector **vctr_list,*vctr;
{
    struct vector *vctr_b;
    if( (*vctr_list) == vctr) {
        *vctr_list = vctr->next;
        free(vctr);
        return(*vctr_list);
    }
    else {
        vctr_b = *vctr_list;
        while(vctr_b->next != vctr) vctr_b = vctr_b->next;
        vctr_b->next = vctr->next;
        free(vctr);
        return(vctr_b->next);
    }
}

/* add new entry to end of vector-list */
struct vector *add_h(vctr_list,x1,y1,x2,y2)
struct vector *vctr_list;
float x1,x2,y1,y2;
{
    while( vctr_list->next != (struct vector *)NULL ) vctr_list = vctr_list->next;
    /* reserve a space for the next new entry */
    if((vctr_list->next=(struct vector *)calloc(1,sizeof(struct vector))) == (struct vector *)0 )
        printf("calloc fail in add_vctr(): 1\n");
    /* add NULL to the new end of the list */
    (vctr_list->next)->next = (struct vector *) NULL;
    vctr_list->x[0]=x1;
    vctr_list->x[1]=x2;
    vctr_list->y[0]=y1;
    vctr_list->y[1]=y2;
    return(vctr_list);
}
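
The vector list used throughout this file always ends in an empty placeholder node: add_h fills the current placeholder with the new data and then allocates a fresh placeholder behind it, which is why every traversal stops at vctr->next == NULL rather than at vctr == NULL. The sketch below shows that convention in isolation, in standard C. The struct seg type and the seg_list_* names are illustrative assumptions, and unlike the listing the sketch reports an allocation failure to the caller instead of printing a message and carrying on.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node; the last node of a list is always an unused
 * placeholder whose next pointer is NULL. */
struct seg {
    float x[2], y[2];
    char type;
    struct seg *next;
};

/* Create an empty list: just the placeholder. Returns NULL on failure. */
struct seg *seg_list_new(void)
{
    return calloc(1, sizeof(struct seg));
}

/* Store a segment in the current placeholder and append a new placeholder.
 * Returns the node that now holds the data, or NULL if allocation fails
 * (the list is left unchanged in that case). */
struct seg *seg_list_append(struct seg *head,
                            float x1, float y1, float x2, float y2)
{
    struct seg *slot = head, *placeholder;

    assert(head != NULL);
    while (slot->next != NULL) slot = slot->next;
    placeholder = calloc(1, sizeof *placeholder);
    if (placeholder == NULL) return NULL;
    slot->x[0] = x1; slot->x[1] = x2;
    slot->y[0] = y1; slot->y[1] = y2;
    slot->next = placeholder;
    return slot;
}

/* Number of stored segments (the placeholder is not counted). */
size_t seg_list_length(const struct seg *head)
{
    size_t n = 0;
    while (head != NULL && head->next != NULL) { n++; head = head->next; }
    return n;
}

/* Release every node, including the placeholder. */
void seg_list_free(struct seg *head)
{
    while (head != NULL) {
        struct seg *next = head->next;
        free(head);
        head = next;
    }
}
```

One consequence of this layout is that the head pointer never changes on append, so callers can keep a single pointer to the list for its whole lifetime.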

#undef SMIN
#undef SPIX


wx_detect.c
/* detect two special cases, eg w & x, in Table 2 of Pavlidis CVGIP paper */

#include <ftr.h>

#define RATIO_up 2.2 /* upper bound for the ratio of height and width */
#define RATIO_low 0.6 /* lower bound for the ratio of height and width */
#define CNTR 1.2 /* threshold for identifying center point outliers */
#define VARM 5 /* minimum height of an arm of V */
#define XARM 3 /* minimum height of an arm of X */
int wx_detect(image,rows,cols,node,clagnode,ith,group,x,y,vctr_list)
int **image,rows,cols;
struct node *node;
struct clagnode *clagnode;
struct vector *vctr_list;
int ith; /* ith clagnode */
int **group;
float *x,*y; /* center points of the ith clag-path node */
{
    int jna,jnb,c_node,number,rowth_a,rowth_b,rowth;
    int i,j,jbeg,jend,flag,count_n;
    int count_nar,count_nal,count_nbr,count_nbl;
    int start,end;
    float xa[4],xb[4],width,ratio,width_2,width_3;
    float px[4],py[4];
    float center=0.0;
    float x1,x2;
    struct vector *vctr;
    number=(clagnode+ith)->number;
    width=0.0;
    for(j=1;j<=number;j++) {
        c_node = (clagnode+ith)->node[j];
        width += (float)((node+c_node)->col_end - (node+c_node)->col_start+1);
    }
    width /= (float)number;
    ratio = (float)number / width;
    if(ratio>RATIO_up || ratio < RATIO_low) return (0);
    c_node=(clagnode+ith)->node[1];
    if( (node+c_node)->above>1 && (node+c_node)->below==1 ) rowth_a=(node+c_node)->rowth-1;
    else if( (node+c_node)->above==1 && (node+c_node)->below==1 ) rowth_a=(node+(node+c_node)->a_node[0])->rowth - 1 ;
    else if( ((node+c_node)->above+(node+c_node)->below) == 1 ) rowth_a = -1 ;
    else return(0);
    jna= find_degree(image,rows,rowth_a,node,c_node,xa);
    if(jna==2) {
        /* trace the upper-left arm upward, row by row */
        rowth = rowth_a;
        jbeg=xa[0];
        jend=xa[1];
        flag=1;
        count_n = 1;
        while(flag==1) {
            if(jbeg>0) while(image[rowth][jbeg-1]==0) {
                jbeg -= 1;
                if(jbeg==0) break;
            }
            if(jend<(cols-1)) while(image[rowth][jend+1]==0) {
                jend += 1;
                if(jend==(cols-1)) break;
            }
            rowth -=1 ;
            if(rowth<0) break;
            flag=0;
            for(j=(jbeg-1 > 0 ? jbeg-1 : 0);j<=jend;j++) if(image[rowth][j]==0) flag=1;
            if(flag==1) count_n += 1;
        }
        count_nal = count_n;

        /* trace the upper-right arm upward, row by row */
        rowth=rowth_a;
        jbeg=xa[2];
        jend=xa[3];
        flag=1;
        count_n = 1;
        while(flag==1) {
            if(jbeg>0) while(image[rowth][jbeg-1]==0) {
                jbeg -= 1;
                if(jbeg==0) break;
            }
            if(jend<(cols-1)) while(image[rowth][jend+1]==0) {
                jend += 1;
                if(jend==(cols-1)) break;
            }
            rowth -=1 ;
            if(rowth<0) break;
            flag=0;
            for(j=jbeg;j<=(jend+1 < cols-1 ? jend+1 : cols-1);j++) if(image[rowth][j]==0) flag=1;
            if(flag==1) count_n += 1;
        }
        count_nar = count_n;
    }

/* row just below the last node of the path */
c_node=(clagnode+ith)->node[number];
if( (node+c_node)->above==1 && (node+c_node)->below>1 )
    rowth_b=(node+c_node)->rowth+1;
else if( (node+c_node)->above==1 && (node+c_node)->below==1 )
    rowth_b=(node+(node+c_node)->b_node[0])->rowth + 1 ;
else if( ((node+c_node)->above+(node+c_node)->below) == 1 )
    rowth_b = rows ;
else return(0);
jnb= find_degree(image,rows,rowth_b,node,c_node,xb);
if(jnb==2) {
    /* trace the lower-left arm downward from row rowth_b */
    rowth=rowth_b;
    jbeg=xb[0];
    jend=xb[1];
    flag=1;
    count_n = 1;
    while(flag==1) {
        if(jbeg>0) while(image[rowth][jbeg-1]==0) {
            jbeg -= 1;
            if(jbeg==0) break;
        }
        if(jend<(cols-1)) while(image[rowth][jend+1]==0) {
            jend += 1;
            if(jend==(cols-1)) break;
        }
        rowth +=1 ;
        if(rowth>(rows-1)) break;
        flag=0;
        for(j=(jbeg-1 > 0 ? jbeg-1 : 0);j<=jend;j++)
            if(image[rowth][j]==0) flag=1;
        if(flag==1) count_n += 1;
    }
    count_nbl = count_n;

    /* trace the lower-right arm downward from row rowth_b */
    rowth=rowth_b;
    jbeg=xb[2];
    jend=xb[3];
    flag=1;
    count_n = 1;
    while(flag==1) {
        if(jbeg>0) while(image[rowth][jbeg-1]==0) {
            jbeg -= 1;
            if(jbeg==0) break;
        }
        if(jend<(cols-1)) while(image[rowth][jend+1]==0) {
            jend += 1;
            if(jend==(cols-1)) break;
        }
        rowth +=1 ;
        if(rowth>(rows-1)) break;
        flag=0;
        for(j=jbeg;j<=(jend+1 < cols-1 ? jend+1 : cols-1);j++)
            if(image[rowth][j]==0) flag=1;
        if(flag==1) count_n += 1;
    }
    count_nbr = count_n;
}

if(jna==0 && (clagnode+ith)->a_clagnode[0]!=0) return(0);
if(jnb==0 && (clagnode+ith)->b_clagnode[0]!=0) return(0);
/* return 0 if outliers of center points exist in the 'v' situation */
if( jna==0 && jnb==2 && number >= 5 && count_nbr>=VARM && count_nbl>=VARM) {
    line_fit(group,x, y, 1, number, &x1, &x2, &start, &end);
    for(i=(number-2);i<=number;i++)
        if( p2line(x1,y[1],x2,y[number],x[i],y[i]) > CNTR ) return(0);
}
else if( jna==2 && jnb==0 && number >= 5 &&
         count_nar>=VARM && count_nal>=VARM) {
    line_fit(group,x, y, 1, number, &x1, &x2, &start, &end);
    for(i=1;i<=3;i++)
        if( p2line(x1,y[1],x2,y[number],x[i],y[i]) > CNTR ) return(0);
}

/* return 2 if x-arm is not long enough in the 'x' situation */
if( jna == 2 && jnb == 2 && (count_nar<XARM || count_nal<XARM ||
    count_nbr<XARM || count_nbl<XARM ) ) return(2);
/* return 1 if a normal 'v' or 'x' situation exists */
if( jna==0&&jnb==2 && count_nbr>=VARM && count_nbl>=VARM ) {
    px[0]=px[1]= x[1];
    py[0]=py[1]= y[1];
    px[2]= (xb[0]+xb[1])/2.;
    px[3]= (xb[2]+xb[3])/2.;
    py[2]=py[3]= y[number];
    (clagnode+ith)->type = 'v';
    width_2 = fabs(xb[1]-xb[0]);
    width_3 = fabs(xb[3]-xb[2]);
}
else if( jna==2&&jnb==0 && count_nar>=VARM && count_nal>=VARM) {
    px[0]= (xa[0]+xa[1])/2.;
    px[1]= (xa[2]+xa[3])/2.;
    py[0]=py[1]= y[1];
    px[2]=px[3]= x[number];
    py[2]=py[3]= y[number];
    (clagnode+ith)->type = 'v';
    width_3 = fabs(xa[1]-xa[0]);
    width_2 = fabs(xa[3]-xa[2]);
}
else if( jna==2&&jnb==2 ) {
    px[0]= (xa[0]+xa[1])/2.;
    px[1]= (xa[2]+xa[3])/2.;
    py[0]=py[1]= y[1];
    px[2]= (xb[0]+xb[1])/2.;
    px[3]= (xb[2]+xb[3])/2.;
    py[2]=py[3]= y[number];
    (clagnode+ith)->type = 'x';
    width_2 = fabs(xa[3]-xa[2]) < fabs(xb[1]-xb[0]) ?
              fabs(xa[3]-xa[2]) : fabs(xb[1]-xb[0]);
    width_3 = fabs(xa[1]-xa[0]) < fabs(xb[3]-xb[2]) ?
              fabs(xa[1]-xa[0]) : fabs(xb[3]-xb[2]) ;
}
else return(0);
/* since vctr->no_lag=0, we don't use width_2 & width_3 during merging */
vctr = add_vctr(vctr_list,clagnode,ith,px[0],py[0],px[3],py[3]);
vctr->type='v';
vctr->width = width_3;
vctr = add_vctr(vctr_list,clagnode,ith,px[1],py[1],px[2],py[2]);
vctr->type='v';
vctr->width = width_2;
(clagnode+ith)->group = 1;
return (1);
}

#undef RATIO_up
#undef RATIO_low
#undef CNTR
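The arm-tracing loops in wx_detect() can be read as one repeated step: widen the current black run on a row, move one row up or down, and keep counting while the next row still has a black pixel touching that span. The following is a minimal sketch of that idea in ANSI C, not the listing's own routine. The name arm_height, the flat row-major image buffer, and the symmetric one-column diagonal search are assumptions made for illustration (the listing searches one extra column only on the outer side of each arm).

```c
#include <assert.h>

#define BLACK 0  /* foreground value, as the listing's ==0 tests suggest */

/* Count the rows an arm spans, starting from the run [jbeg, jend] on `row`
   and moving by `step` (-1 = upward, +1 = downward).
   img is a hypothetical row-major buffer of rows*cols pixels. */
int arm_height(const unsigned char *img, int rows, int cols,
               int row, int jbeg, int jend, int step)
{
    int j, lo, hi, found, count = 1;

    assert(step == 1 || step == -1);
    assert(row >= 0 && row < rows);
    assert(jbeg >= 0 && jend < cols && jbeg <= jend);

    for (;;) {
        /* widen the span to every connected black pixel on this row */
        while (jbeg > 0 && img[row * cols + jbeg - 1] == BLACK) jbeg--;
        while (jend < cols - 1 && img[row * cols + jend + 1] == BLACK) jend++;

        row += step;
        if (row < 0 || row >= rows) break;

        /* look for a black pixel under the span, one column of slack each side */
        lo = (jbeg - 1 > 0) ? jbeg - 1 : 0;
        hi = (jend + 1 < cols - 1) ? jend + 1 : cols - 1;
        found = 0;
        for (j = lo; j <= hi; j++)
            if (img[row * cols + j] == BLACK) found = 1;
        if (!found) break;
        count++;
    }
    return count;
}
```

A caller would compare the returned height against a minimum arm length such as VARM or XARM before accepting a 'v' or 'x' junction.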

/* find jna & jnb for wx_detect() */
int find_degree(image,rows,rowth,node,c_node,xa)
int **image;
struct node *node;
int c_node,rows,rowth;
float xa[4];
{
    int j, count=-1, jna=0, find;
    if(rowth>=0 && rowth<rows) {
        find=0;
        for(j=((node+c_node)->col_start);j<=((node+c_node)->col_end);j++) {
            /* start of a black run */
            if(image[rowth][j]==0 && find == 0) {
                count += 1;
                if(count<4) xa[count]=(float)j;
                find = 1 ;
            }
            /* end of a black run at the first white pixel */
            if( image[rowth][j]==255 && find == 1) {
                count += 1;
                if(count<4) xa[count] = (float)(j-1) ;
                find = 0 ;
                jna += 1;
            }
            /* run still open at the last column of the node */
            if( j==((node+c_node)->col_end) && find == 1) {
                count += 1;
                if(count<4) xa[count] = (float)j;
                jna += 1;
            }
        }
    }
    return jna;
}
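The scan in find_degree() amounts to listing the black runs of one image row between two columns. The following is a minimal sketch of that scan in ANSI C under stated assumptions: the name find_runs, the separate starts/ends arrays, and the max_runs parameter are hypothetical, and any non-black value is treated as background, whereas the listing tests for 255 explicitly.

```c
#include <assert.h>
#include <stddef.h>

#define BLACK 0  /* foreground value, as the listing's ==0 tests suggest */

/* Scan row[col_start..col_end] and record the first max_runs black runs.
   Returns the total number of runs found, which may exceed max_runs. */
int find_runs(const int *row, int col_start, int col_end,
              int starts[], int ends[], int max_runs)
{
    int j, n = 0, in_run = 0;

    assert(row != NULL && col_start <= col_end);

    for (j = col_start; j <= col_end; j++) {
        if (row[j] == BLACK && !in_run) {
            /* a run begins at column j */
            if (n < max_runs) starts[n] = j;
            in_run = 1;
        } else if (row[j] != BLACK && in_run) {
            /* the run ended at the previous column */
            if (n < max_runs) ends[n] = j - 1;
            in_run = 0;
            n++;
        }
    }
    if (in_run) {
        /* a run reaching the last column is closed there */
        if (n < max_runs) ends[n] = col_end;
        n++;
    }
    return n;
}
```

A return value of 2 corresponds to the two-arm case that wx_detect() looks for above or below a path.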

/* find the distance from a point to a line indicated by two end points */
float p2line(x1,y1,x2,y2,xin,yin)
float x1,y1,x2,y2;
float xin,yin;
{
    float a[3],L,T,d;
    int i;
    /* line through the end points in the form a[0]*x + a[1]*y + a[2] = 0 */
    a[0] = y1 - y2;
    a[1] = x2 - x1;
    a[2] = y2*x1 - y1*x2;
    L = sqrt( a[0]*a[0] + a[1]*a[1] );
    d = a[0]*xin + a[1]*yin + a[2];
    return( fabs(d)/L );
}
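The point-to-line routine above uses the standard distance formula for a line written in implicit form. As a worked check, with end points and a test point chosen purely for illustration (they do not come from the source):

```latex
\[
a_0 = y_1 - y_2,\qquad a_1 = x_2 - x_1,\qquad a_2 = y_2 x_1 - y_1 x_2,
\qquad
\mathrm{dist} = \frac{\lvert a_0 x_{in} + a_1 y_{in} + a_2 \rvert}{\sqrt{a_0^2 + a_1^2}}
\]
\[
(x_1,y_1)=(0,0),\quad (x_2,y_2)=(3,4),\quad (x_{in},y_{in})=(3,0):
\qquad
a_0=-4,\; a_1=3,\; a_2=0,
\qquad
\mathrm{dist}=\frac{\lvert -12 \rvert}{5}=2.4
\]
```

A center point at this distance would exceed the CNTR threshold of 1.2 and so count as an outlier.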

y_detect.c

/* detect the 'v' situation in the middle part of y */
#include <ftr.h>
int y_detect(image,rows,cols,node,clagnode,ith,beg,end)
int **image,rows,cols,beg,end;
struct node *node;
struct clagnode *clagnode;
int ith; /* ith clagnode */
{
    int jna,c_node,rowth_a,rowth;
    int i,j,jbeg,jend,flag,count_n;
    float xa[4];
    c_node=(clagnode+ith)->node[end];
    if( !( (node+c_node)->above==1 && (node+c_node)->below [illegible] ) )
        return(0);
    c_node=(clagnode+ith)->node[beg];
    if( (node+c_node)->above>1 && (node+c_node)->below==1 )
        rowth_a=(node+c_node)->rowth-1;
    else if( (node+c_node)->above==1 && (node+c_node)->below==1 )
        rowth_a=(node+(node+c_node)->a_node[0])->rowth - 1 ;
    else return(0);
    jna= find_degree(image,rows,rowth_a,node,c_node,xa);
    if(jna != 2) return(0);
    else {
        /* trace the left arm upward; it must span more than 3 rows */
        rowth = rowth_a;
        jbeg=xa[0];
        jend=xa[1];
        flag=1;
        count_n = 1;
        while(flag==1) {
            if(jbeg>0) while(image[rowth][jbeg-1]==0) {
                jbeg -= 1;
                if(jbeg==0) break;
            }
            if(jend<(cols-1)) while(image[rowth][jend+1]==0) {
                jend += 1;
                if(jend==(cols-1)) break;
            }
            rowth -=1 ;
            if(rowth<0) break;
            flag=0;
            for(j=jbeg;j<=jend;j++) if(image[rowth][j]==0) flag=1;
            if(flag==1) count_n += 1;
        }
        if(count_n <= 3) return(0);

        /* trace the right arm upward; it must span more than 3 rows */
        rowth=rowth_a;
        jbeg=xa[2];
        jend=xa[3];
        flag=1;
        count_n = 1;
        while(flag==1) {
            if(jbeg>0) while(image[rowth][jbeg-1]==0) {
                jbeg -= 1;
                if(jbeg==0) break;
            }
            if(jend<(cols-1)) while(image[rowth][jend+1]==0) {
                jend += 1;
                if(jend==(cols-1)) break;
            }
            rowth -=1 ;
            if(rowth<0) break;
            flag=0;
            for(j=jbeg;j<=jend;j++) if(image[rowth][j]==0) flag=1;
            if(flag==1) count_n += 1;
        }
        if(count_n <= 3) return(0);
    }
    return(1);
}

Claims

1. A method for recognizing characters in a scanned text image, the method comprising the steps of:
determining primitive strokes in the scanned text image;
segmenting the scanned text image into one or more sub-character segments based on one or more determined primitive strokes;
identifying one or more features characterizing a sub-character segment; and
recognizing characters based on identified sub-character features.

2. The method of claim 1 wherein the step of determining primitive strokes comprises:
representing a scanned text image with one or more compressed line adjacency graphs, a compressed line adjacency graph comprising one or more compressed paths;
dividing a compressed path of a compressed line adjacency graph into two or more groups of nodes based on node width and center location information; and
determining one or more strokes for a group based upon a set of one or more stroke identification rules.

3. The method of claim 2 wherein the step of determining one or more strokes comprises the step of merging adjacent strokes based on a set of one or more stroke merging rules.

4. The method of claim 1 wherein a feature characterizing a sub-character segment comprises a stroke.

5. The method of claim 1 wherein a feature characterizing a sub-character segment comprises an arc.

6. The method of claim 1 wherein a feature characterizing a sub-character segment is represented by a 5-tuple.

7. The method of claim 1 wherein the step of identifying features comprises the steps of:
representing a sub-character segment with a compressed line adjacency graph, the compressed line adjacency graph comprising one or more compressed paths; and
analyzing a compressed line adjacency graph to determine one or more features.

8. The method of claim 7 wherein the step of identifying features further comprises the step of excluding from a segment one or more pixels associated with a compressed path from a neighboring segment.

9. The method of claim 7 wherein the step of analyzing a compressed line adjacency graph comprises the step of identifying a horizontal stroke for a compressed path of a horizontal segment.

10. The method of claim 7 wherein the step of analyzing a compressed line adjacency graph comprises the step of defining for a compressed path of a non-horizontal segment one or more groups of nodes based on node width information.

11. The method of claim 10 wherein the step of analyzing a compressed line adjacency graph further comprises the step of merging two adjacent groups into a single group based on average group width information.

12. The method of claim 10 wherein the step of analyzing a compressed line adjacency graph further comprises the step of identifying an arc feature within a group.

13. The method of claim 12 wherein the step of identifying an arc feature within a group comprises the steps of:
defining a line segment connecting the centers of the first and last nodes in a group;
determining a node center within the group which is the greatest distance from the line; and
identifying an arc feature defined by the centers of the first and last nodes and the determined node center when the greatest distance divided by the length of the line segment exceeds a threshold.

14. The method of claim 10 wherein the step of analyzing a compressed line adjacency graph further comprises the step of identifying a stroke feature within a group.

15. The method of claim 1 wherein the step of recognizing characters based on identified sub-character features comprises the step of comparing sub-character features to known features of known characters.

16. A method for recognizing characters in a scanned text image, the method comprising the steps of:
segmenting the scanned text image into sub-character segments;
identifying one or more features characterizing a sub-character segment of a scanned text image;
comparing identified sub-character features to stochastic models of known characters and determining a distance score based on each comparison; and
determining an optimum sequence of known characters based on determined distance scores.

17. The method of claim 16 further comprising training a stochastic model based on identified sub-character features of known characters.

18. The method of claim 17 wherein training comprises performing a k-means clustering of feature vectors to adaptively partition a feature space.

19. The method of claim 18 wherein training further comprises representing a segment vector in a binary N-dimensional space, where N is a number of feature clusters.

20. The method of claim 16 wherein the stochastic model of a character comprises a Hidden Markov Model.

21. The method of claim 20 further comprising the steps of training the Hidden Markov Model by determining probabilities for states of the model.

22. The method of claim 20 wherein the Hidden Markov Model comprises penalty functions for skipping a model state.

23. The method of claim 20 wherein a Hidden Markov Model comprises penalty functions for remaining in a model state.

24. The method of claim 16 wherein the step of determining a distance score comprises determining a Bayesian distance score.

25. The method of claim 16 wherein the step of determining an optimum sequence of known characters is further based on a model of context.

26. The method of claim 25 wherein the model of context comprises a stochastic model for a sequence of characters.

27. The method of claim 26 wherein the stochastic model for a sequence of characters comprises n-gram probabilities.

28. The method of claim 25 wherein the model of context comprises a lexicon of sequences of text characters.

29. The method of claim 16 wherein the step of determining an optimum sequence of known characters comprises the step of performing Viterbi scoring.

30. The method of claim 29 wherein the step of performing Viterbi scoring comprises performing a level building process.

31. A text recognition system, the system comprising:
means for performing word image enhancement;
means, coupled to the means for performing word image enhancement, for performing sub-character segmentation;
means, coupled to the means for performing sub-character segmentation, for performing feature extraction based on sub-character segments;
means, coupled to the means for performing feature extraction, for performing recognition of text based on a comparison of extracted sub-character features and stochastic models of known characters; and
memory means, coupled to the means for performing recognition of text, for storing the results of text recognition.

32. The system of claim 31 further comprising a scanner for scanning a paper copy of a document and producing a pixel image thereof.

33. The system of claim 32 further comprising a page preprocessor, coupled to the scanner, for determining pixel images of words based on a scanned pixel image of a document.

34. The system of claim 31 further comprising a means for training a stochastic model of a known character.
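
The arc test recited in claim 13 can be sketched in C as follows. This is an illustrative reading of the claim language only, not the patented implementation: the Point type, the name find_arc_apex, and any threshold value passed to it are assumptions, since the claim gives no numeric threshold. The sketch uses the fact that the greatest distance divided by the chord length equals the unnormalized line residual divided by the squared chord length, so no square root is needed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y; } Point;  /* hypothetical node-center type */

/* Given the node centers c[0..n-1] of one group, return the index of the
   center farthest from the chord c[0]-c[n-1] when (greatest distance /
   chord length) exceeds `threshold`; return -1 when no arc is indicated. */
int find_arc_apex(const Point *c, int n, double threshold)
{
    double a0, a1, a2, len2, d, best = 0.0;
    int i, apex = -1;

    assert(c != NULL && n >= 2);

    /* chord in implicit form a0*x + a1*y + a2 = 0 */
    a0 = c[0].y - c[n - 1].y;
    a1 = c[n - 1].x - c[0].x;
    a2 = c[n - 1].y * c[0].x - c[0].y * c[n - 1].x;
    len2 = a0 * a0 + a1 * a1;           /* squared chord length */
    if (len2 == 0.0) return -1;         /* degenerate chord */

    for (i = 1; i < n - 1; i++) {
        d = a0 * c[i].x + a1 * c[i].y + a2;
        if (d < 0.0) d = -d;            /* absolute residual */
        if (d > best) { best = d; apex = i; }
    }
    /* distance = best / len and chord length = len, so ratio = best / len2 */
    return (best / len2 > threshold) ? apex : -1;
}
```

When an index is returned, the arc feature of claim 13 would be defined by the first center, the last center, and the center at that index.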