MXPA95003295A - System and method for automated interpretation of input expressions using novel a posteriori probability measures and optimally trained information processing networks - Google Patents

System and method for automated interpretation of input expressions using novel a posteriori probability measures and optimally trained information processing networks

Info

Publication number
MXPA95003295A
MXPA95003295A MXPA/A/1995/003295A MX9503295A
Authority
MX
Mexico
Prior art keywords
image
interpretation
input
symbols
records
Prior art date
Application number
MXPA/A/1995/003295A
Other languages
Spanish (es)
Inventor
John Burges Christopher
Steward Denker John
Original Assignee
Lucent Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lucent Technologies Inc filed Critical Lucent Technologies Inc
Publication of MXPA95003295A publication Critical patent/MXPA95003295A/en

Abstract

The present invention relates to a method and system for forming an interpretation of an input expression, where the input expression is expressed in a medium, the interpretation is a sequence of symbols, and each symbol belongs to a set of known symbols. In general, the system processes an acquired input data set representative of the input expression to form a set of segments, which are then used to specify a set of co-segmentations. Each co-segmentation and each possible interpretation for the input expression is represented in a data structure. The data structure can be represented graphically by a graph comprising a two-dimensional array of nodes arranged in rows and columns and selectively connected by directed arcs. Each path extending through the nodes and along the directed arcs represents a co-segmentation and a possible interpretation for the input expression. All of the co-segmentations and all of the possible interpretations for the input expression are represented by the set of paths extending through the graph. For each row of nodes in the graph, a set of scores is produced for the set of known symbols using a complex of optimally trained neural information processing networks. Thereafter, the system computes an a posteriori probability for one or more symbol sequence interpretations. By deriving each a posteriori probability solely through analysis of the acquired input data set, highly reliable probabilities are produced for the competing interpretations of the input expression. The principles of the present invention can be practiced with cursively written character strings of arbitrary length and can be readily adapted for use in speech recognition systems.

Description

/ "SYSTEM AND METHOD FOR THE AUTOMATED INTERPRETATION OF ENTRY EXPRESSIONS, USING MEASUREMENTS / - NOVELTIES OF PROBABILITY TO POSTERIORI AND INFORMATION PROCESSING NETWORKS OPTIMALLY TRAINED" Inventors: CHRISTOPHER JOHN BURGES, North American, domiciled in 11 Andorra Terrace, Freehold, New Jersey 07728, E.U.A. and JOHN STE ARD DENKER, North American, domiciled at 6 Koosman Drive, Leonardo, New Jersey 07737, E.U.A.
Causaire: AT & T CORP., New York State Corporation, E.U.A. domiciled at 32 Avenue of the Americas, New York, New York 10013-2412, E.U.A.
FIELD OF THE INVENTION The present invention relates in general to an automated method and system for interpreting input expressions, such as handwritten characters, using novel a posteriori probability measurements and optimally trained neural information processing networks.
BRIEF DESCRIPTION OF THE PRIOR ART Currently, there is great commercial interest in building machines that can correctly interpret (that is, recognize) strings of possibly connected alphanumeric characters recorded on various media. For example, the U.S. Postal Service will very soon rely on such machines to correctly recognize ZIP Codes handwritten on mail pieces during mail sorting and routing operations across the country. A number of prior art character recognition systems have been developed for use in various environments. A variety of such systems and related techniques are described in the following technical publications: • Y. Le Cun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, and L.D. Jackel, "Handwritten Digit Recognition with a Back-Propagation Network", pp. 396-404 in Advances in Neural Information Processing 2, David Touretzky, ed., Morgan Kaufman (1990); • J.S. Bridle, "Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition", in Neuro-computing: Algorithms, Architectures and Applications, F. Fogelman and J. Hérault, eds., Springer-Verlag (1989); • J.S. Bridle, "Training Stochastic Model Recognition Algorithms as Networks Can Lead to Maximum Mutual Information Estimation of Parameters", in Advances in Neural Information Processing 2, David Touretzky, ed., Morgan Kaufman (1990); • O. Matan, J. Bromley, C.J.C. Burges, J.S. Denker, L.D. Jackel, Y. LeCun, E.P.D. Pednault, W.D. Satterfield, C.E. Stenard, and T.J. Thompson, "Reading Handwritten Digits: A ZIP Code Recognition System", IEEE Computer 25 (7) 59-63 (July 1992); • C.J.C. Burges, O. Matan, Y. LeCun, J.S. Denker, L.D. Jackel, C.E. Stenard, C.R. Nohl, J.I. Ben, "Shortest Path Segmentation: A Method for Training a Neural Network to Recognize Character Strings", IJCNN Conference Proceedings 3, pp. 165-172 (June 1992); • C.J. Burges, O. Matan, J. Bromley, C.E. Stenard, "Rapid Segmentation and Classification of Handwritten Postal Delivery Addresses Using Neural Network Technology", Interim Report, Task Order Number 104230-90-C-2456, USPS Reference Library, Washington D.C. (August 1991); • Edwin P.D. Pednault, "A Hidden Markov Model for Resolving Segmentation and Interpretation Ambiguities in Unconstrained Handwriting Recognition," Bell Labs Technical Memo 11352-090929-01 M (1992); and • Ofer Matan, Christopher J.C. Burges, Yann LeCun and John S. Denker, "Multi-Digit Recognition Using a Space Displacement Neural Network", in Neural Information Processing Systems 4, J.M. Moody, S.J. Hanson and R.P. Lippmann, eds., Morgan Kaufmann (1992).
Although the prior art systems described in the above references can be distinguished from one another, they are best characterized by the structural and functional features they share in common. Specifically, each prior art system acquires at least one image I of a string of possibly connected characters, which is to be interpreted by the system. In general, for a given alphabet, the number of possible interpretations from which the system must select the "best" interpretation is equal to the number of possible character strings that can be serially assembled using the characters in the alphabet and the applicable morphological restrictions. In ZIP Code recognition applications, each permissible interpretation is restricted by the length of the ZIP Code; specifically, it must have five or nine digits. Conventionally, the acquired image of the character string is typically preprocessed to remove underlines, spatial noise, and the like. The preprocessed image is then "cut" or divided into sub-images of manageable size. The sub-images between each pair of adjacent cut lines are referred to as the "cells" of the image. In some cases, the boundary between two cells is determined to be a "definitive cut," which definitely falls between two characters, while in other cases the cut is considered undefined and the determination of whether the cut falls between two characters is deferred until further processing is performed. Adjacent image cells are then combined to form "segments" of the image. After this, the image segments are serially placed together, from left to right, to form acceptable "co-segmentations" of the image, which account for substantially all of the pixels of the preprocessed image. Specifically, a directed acyclic graph is used to construct a model of the acceptable image co-segmentations. Typically, this model is constructed by associating each image segment with a node in a directed acyclic graph. The nodes in the graph are then connected with directed arcs. In general, two nodes in the graph are connected if and only if the image segments they represent legally abut in an acceptable co-segmentation of the image. When the graph is completely constructed, each path through it corresponds to a co-segmentation of the preprocessed image, and each possible image co-segmentation corresponds to a particular path through the graph. After the graph has been constructed, recursive pruning techniques are used to remove from the graph any node that corresponds to an image segment that straddles a definitive cut line through the preprocessed image. After the graph has been pruned, each image segment associated with a node that remains in the pruned graph is sent to a neural network recognizer for classification and scoring. Based on such classification and scoring, each node in the pruned graph is assigned a "score" (mark or record) which is derived from the recognizer score assigned to the associated image segment. Typically, each recognizer score is converted into a probability by a computation procedure that involves normalizing the recognizer score.
After this, a path score (that is, a joint probability) is calculated for each path through the pruned graph by simply multiplying the scores assigned to the nodes along the path. According to this multiple character recognition (MCR) scheme, the highest-scoring path through the pruned graph corresponds to the "best" co-segmentation of the image and the best character string interpretation for the acquired image. The details of such techniques are described in Application Serial No. 07/816,414, filed on December 31, 1991, entitled "Alphanumeric Image Co-segmentation Scheme," and Application Serial No. 07/816,415, filed on December 31, 1991, entitled "Graphic System for Automated Co-segmentation and Recognition for Image Recognition Systems," both incorporated herein by reference. Although the methods of the prior art have been useful in the design of commercial and experimental character recognition systems, the operation of such systems has been less than ideal, particularly in real-time, highly demanding applications. In particular, prior art MCR systems generally operate by identifying only one co-segmentation that supports a given interpretation. This approach is based on the notion that there is only one "best" co-segmentation. In accordance with such prior art approaches, the score of this single "best" co-segmentation is the only score considered during the recognition process. Consequently, prior art MCR systems employ methods that are equivalent to assuming, incorrectly, that the correct image co-segmentation is known. Based on this assumption, the individual character scores are normalized to compute probabilities for the particular characters in the allowed alphabet or code. This irretrievably discards valuable information about how well the segmentation algorithm worked on the particular image segment. Prior art MCR systems based on such assumptions are frequently referred to as "maximum likelihood sequence estimation" (MLSE) machines. In addition to choosing an interpretation of the image, some prior art MCR systems also provide a score which is meant to offer some indication of the likelihood that the chosen interpretation is correct. In many applications, it is desirable to have a score that can be interpreted as an accurate probability, to facilitate combining the results of the MCR system with other sources of information. However, prior art MCR systems have tended to emphasize the choice of the "best" interpretation while de-emphasizing accurate scoring. Consequently, their scores frequently contain systematic errors of many orders of magnitude. Accordingly, there is a great need in the art for an improved method and system for interpreting expressed symbol sequences represented in various media.
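The character-by-character scoring style criticized above can be made concrete with a short sketch. The following is a minimal illustration of the prior-art MLSE approach, not any particular system's code; the data layout (a path as a list of (node, digit) pairs, and per-node raw recognizer scores) is an assumption made only for illustration.

```python
# Minimal sketch of prior-art MLSE-style scoring (assumed data layout):
# node_scores[node] is a list of ten raw recognizer scores for that node,
# and a path is a list of (node, digit) pairs.
def normalize_per_character(raw_scores):
    """Character-by-character normalization: each node's raw scores become a
    probability distribution, discarding how plausible the segment itself is."""
    total = sum(raw_scores)
    return [s / total for s in raw_scores]

def mlse_path_score(path, node_scores):
    """Product of the normalized per-character scores along one path
    (one co-segmentation plus one interpretation)."""
    score = 1.0
    for node, digit in path:
        score *= normalize_per_character(node_scores[node])[digit]
    return score

def best_single_path(paths, node_scores):
    """Prior-art systems keep only the single highest-scoring path, i.e. they
    assume the best co-segmentation is the correct one."""
    return max(paths, key=lambda p: mlse_path_score(p, node_scores))
```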
OBJECTS OF THE PRESENT INVENTION Accordingly, a general object of the present invention is to provide an improved method and system for interpreting input expressions, such as character strings expressed in or recorded on a medium using, for example, either printed or cursive (that is, handwritten) writing techniques. A further object of the present invention is to provide such an automated character interpretation method and system, which uses a posteriori probabilities for the selection of the best interpretation of the character string. A further object of the present invention is to provide such an automated character string interpretation method and system, wherein each a posteriori probability is derived inductively, combining a priori information with known, exemplary pixel images. A further object of the present invention is to provide such an automated character string interpretation method and system, which is capable of interpreting character strings of arbitrary length, and can be easily adapted for use in conjunction
with automated sentence interpretation systems and the like. A further object of the present invention is to provide a multiple character recognition system, in which a complex of optimally trained neural computing networks is used during the image co-segmentation and character string interpretation processes. A further object of the present invention is to provide such a multiple character recognition method, in which image co-segmentation and character string interpretation are combined in a single adaptive learning process, carried out by a complex of optimally trained neural computing networks, so as to maximize the score of the correct interpretation of the character string. A further object of the present invention is to provide such a system which uses a novel data structure, based on a specially modified directed acyclic graph, in which each path therethrough represents both an image co-segmentation and a character string interpretation. A further object of the present invention is to assign scores to selected possible interpretations of the image, in particular, scores which can be interpreted as accurate estimates of the probability of the selected interpretations. A further object of the present invention is to provide such a system, in which the a posteriori probability assigned to each particular character string interpretation is defined as a ratio, in which the numerator of the ratio is computed by summing the path scores over all of the paths through the graph that represent the same character string interpretation, and the denominator of the ratio is computed by summing the path scores over all of the paths through the graph that represent all of the possible character string interpretations having the same number of characters. A still further object of the present invention is to provide such a multi-character handwriting recognition system, which can be realized as a portable device. A further object of the present invention is to provide a character string interpretation method, wherein the Viterbi Algorithm is used to identify which character string interpretation has the best path score; wherein the Forward Algorithm is then used to compute the exact sum of all of the path scores that represent the character string interpretation identified by the Viterbi Algorithm; and wherein the Forward Algorithm is used to compute a normalization constant for the exactly computed sum, by summing all of the path scores through the graph that represent all of the possible character string interpretations. A further object of the present invention is to provide a character string interpretation method, wherein the Beam Search Algorithm is used to identify a number of competing character string interpretations having the best set of path scores; wherein the Forward Algorithm is used to compute, for each character string interpretation, the exact sum of all of the path
scores that represent the competing character string interpretation identified using the Viterbi Algorithm; and wherein the Forward Algorithm is then used to compute a single normalization constant for each of the competing character string interpretations, by summing all of the path scores through the graph that represent all of the possible character string interpretations. A further object of the present invention is to provide a system for interpreting input expressions, with a learning mode of operation, in which both the graph and the complex of neural information processing networks are used to train the system by optimally adjusting the parameters of the neural networks during one or more training sessions. A still further object of the present invention is to provide such a system, in which a sensitivity analysis is used during the training of the neural networks, so that each adjustable parameter in the neural networks is adjusted in a direction that increases the a posteriori probability of the character string interpretation known to be correct and decreases the a posteriori probability of the interpretations known to be incorrect. A further object of the present invention is to provide such a system for interpreting input expressions, wherein, during its learning mode of operation, the Baum-Welch Algorithm is used to compute how sensitively the scores produced by the overall system change in response to incremental changes made to each adjustable parameter of the neural networks. These and other objects of the present invention will become apparent hereinafter and in the Claims of the Invention.
BRIEF DESCRIPTION OF THE PRESENT INVENTION According to a first aspect of the present invention, a method and system are provided for forming an interpretation of an input expression expressed in a medium using either printed or cursive writing techniques. In general, the system acquires a set of input data representative of the input expression. The acquired input data set is divided into a set of segments, which are then used to specify a set of co-segmentations. The system then uses a novel data structure to implicitly represent each co-segmentation and each possible interpretation for the input expression. The data structure can be represented as a directed acyclic graph comprising a two-dimensional array of nodes arranged in rows and columns and selectively connected by directed arcs. Each path extending through the nodes and along the directed arcs represents a co-segmentation and a possible interpretation for the input expression. All of the co-segmentations and all of the possible interpretations for the input expression are implicitly represented by the set of paths extending through the graph. For each row of nodes in the graph, a set of scores is produced for the set of known symbols using, for example, an optimally trained neural information processing network. In conjunction with the graph, these scores implicitly assign a path score to every path through the graph. Using these path scores, the system identifies the best symbol sequence interpretations and computes a posteriori probabilities for them. By deriving each a posteriori probability from an analysis of substantially the complete acquired input data set, a highly reliable probability is produced for each symbol sequence interpretation. The principles of the present invention can be practiced with virtually any expressed sequence of symbols, including cursively written character strings of arbitrary length. The system can also be easily adapted for use in conjunction with automated sentence interpretation systems. In a first illustrative embodiment of the present invention involving the recognition of character sequences, the system determines the character string interpretation that has the highest-scoring path through the graph. To determine whether this interpretation is reliable, the system also produces as an output the a posteriori probability for this character string interpretation. This probability is computed as the ratio of a numerator portion to a denominator portion. The numerator portion is equal to the sum of the path scores for all paths through the graph that represent the given character string interpretation. The denominator portion is equal to the sum of the path scores for all paths through the graph that represent all of the possible character string interpretations. If the probability is less than a predetermined threshold, it cannot be guaranteed that this interpretation is reliable, and the user is therefore informed that other steps must be taken before further action. In a second illustrative embodiment of the present invention, the system first finds a set of paths through the graph having the highest set of path scores.
For each path in this set, the system identifies the corresponding character string interpretation, and evaluates the a posteriori probability of this interpretation (including the contributions of other paths with the same interpretation). The system identifies the set of possible character string interpretations represented by the set of paths found. The a posteriori probabilities for the set of possible character string interpretations are then computed. The system analyzes the computed set of a posteriori probabilities to determine which possible character string interpretation has the highest a posteriori probability. Based on this analysis, the system produces as output (i) one or more character string interpretations having high a posteriori probabilities, and (ii) an accurate estimate of the a posteriori probability for each such character string interpretation. In the second illustrative embodiment, the a posteriori probability for each competing character string interpretation is computed as the ratio of a numerator portion to a denominator portion. The numerator portion is equal to the sum of the path scores for all of the paths through the graph that represent the competing character string interpretation. The denominator portion is equal to the sum of the path scores for all of the paths through the graph that represent all of the possible character string interpretations. According to a second aspect of the present invention, novel means and a novel method are provided for optimally training the symbol sequence interpretation system of the present invention. This is achieved by providing the system with a unique learning mode of operation. In its learning mode of operation, the system processes a large number of training images representative of known input expressions. For each processed training image, the system incrementally adjusts the set of adjustable parameters that characterize the operation of each neural network. The direction of each incremental adjustment is such that the average probability for the character string interpretation known to be correct increases, while the average probability for the symbol sequence interpretations known to be incorrect decreases. The system and method of the present invention can be used to interpret character strings which have been expressed in virtually any way, including graphic recording on electrically passive media such as paper, plastic or fabric, or on electrically active media, such as pressure-sensitive writing surfaces and "touch screen" display and writing surfaces, all well known in the art.
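As a rough illustration of how the two embodiments consume these probabilities, the following sketch is offered; the threshold value and the function names are assumptions made for illustration only and are not part of the patent.

```python
# Illustrative decision logic for the two embodiments (assumed names/values).
ACCEPT_THRESHOLD = 0.95  # hypothetical application-dependent reliability threshold

def first_embodiment_decision(best_interpretation, posterior):
    """Return the highest-scoring interpretation only if its a posteriori
    probability clears the threshold; otherwise signal that it is unreliable."""
    return best_interpretation if posterior >= ACCEPT_THRESHOLD else None

def second_embodiment_ranking(posteriors_by_interpretation):
    """Rank the competing interpretations by their a posteriori probabilities,
    highest first, so each is reported with an accurate probability estimate."""
    return sorted(posteriors_by_interpretation.items(),
                  key=lambda item: item[1], reverse=True)
```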
BRIEF DESCRIPTION OF THE DRAWINGS For a more complete understanding of the Objects of the Present Invention, the following Detailed Description of the Illustrative Embodiments should be read in conjunction with the attached drawings, wherein: Figure 1 is a system block diagram illustrating the various components used to realize the character string interpretation system of the illustrative embodiments of the present invention; Figure 2 is a block diagram of the character string interpretation system of the present invention; Figure 3 is a preprocessed image of a ZIP Code written by hand using a cursive writing technique; Figures 4A to 4F are a set of preprocessed images of the ZIP Code of Figure 3, each shown with a superimposed set of cut lines generated during the image cell generation stage of the character string interpretation method of the present invention; Figure 5 is a table of image "cells" (i.e., sub-images) formed between the cut lines shown in Figure 4; Figure 6 is a table of image "segments" formed by combining adjacent image cells shown in Figures 4 and 5; Figure 7 is a table listing three of the many legal image "co-segmentations," consisting of connected sets of spatially contiguous image segments shown in Figure 6; Figure 8 is a graphical representation of the novel data structure of the present invention, which is used to graphically represent the image segments, the possible image co-segmentations formed from them, the possible character string interpretations, and the scores assigned to the possible character string interpretations; Figure 9 is a schematic representation of the character string interpretation system of the present invention, shown adaptively configured for the recognition of a ZIP Code image that has been parsed into eleven image segments; Figures 10A and 10B, taken together, show a high-level flow diagram illustrating the steps carried out during the character string interpretation method according to the first illustrative embodiment of the present invention; Figures 11A and 11B, taken together, show a high-level flow diagram illustrating the steps performed during the character string interpretation method according to the second illustrative embodiment of the present invention; Figure 12 is a schematic representation of a hand-held realization of the character string interpretation system of the present invention; and Figures 13A and 13B, taken together, show a high-level flow diagram illustrating the steps performed during the method of training the character string interpretation system of the present invention.
DETAILED DESCRIPTION OF THE ILLUSTRATIVE EMBODIMENTS OF THE PRESENT INVENTION As illustrated in Figure 1, the symbol sequence interpretation (i.e., "recognition") system 1 of the illustrative embodiment is shown comprising a number of integrated system components. In particular, the system comprises one or more processing units 2 (e.g., microprocessors) controlled by programs stored in program memory storage 3. The program memory storage 3 also stores an operating system program, application programs, and the various image processing routines of the present invention. Data storage memory 4 is provided to store the data associated with the data structure of the present invention. In general, the system includes an input data set acquisition device 5 for acquiring an input data set representative of an expressed sequence of symbols. In the illustrative embodiment, this device is realized as an image detector for acquiring color or grayscale images of possibly connected strings of alphanumeric characters recorded on a recording medium 6, as shown in Figure 2. In the illustrative embodiment, character strings may be recorded on electrically passive recording surfaces such as paper, plastic, wood, fabric, etc., or on electrically active recording surfaces, such as pressure-sensitive writing surfaces or "touch screen" LCD display and writing surfaces well known in the art. The character strings can be expressed using conventional printed or cursive (i.e., handwritten) writing techniques. As shown in Figure 1, the system of the illustrative embodiment comprises a random access data storage memory (e.g., VRAM) 7 for buffering acquired images of the character strings to be interpreted. Mass data storage memory 8 is provided for long-term storage of these images. Preferably, the system also includes a visual display unit 9 having a display surface (e.g., LCD); a keyboard or other data entry device 10; a device 11 for pointing to, tracking, and selecting icons displayed on the display screen; an input/output device 12; and a system interface 13 for interconnection with one or more external host systems 14 that use the information provided by the system 1. Preferably, the system components 2, 3, 4, 7, and 8 are enclosed in a compact housing suitable for the particular application at hand. The other components may have their own housings. As shown, each of these components is operatively associated with the processors 2 by means of one or more system data buses 15, in a manner well known in the art. In ZIP Code recognition applications, the system of the present invention is suitably interfaced with conventional mail routing and sorting machinery 14, well known in the postal art. As illustrated in Figure 2, the character string interpretation system 1 performs a number of functions to arrive at an interpretation of the graphically recorded "character string," denoted by C_i, based on analysis of the pixel information contained in the acquired image I of a graphically recorded character string. These image processing steps will be described sequentially in detail below, with reference to the other identified figure drawings.
In general, the system and method of the present invention can be applied to machine-printed or handwritten character strings of arbitrary length. Accordingly, the present invention will be useful in handwriting recognition applications, where the writer is allowed to write one or more words on a writing surface of one kind or another, or to write one or more sentences for automated recognition. For clarity of exposition only, the first and second illustrative embodiments of the present invention shown in Figures 10A through 11B will consider the problem of interpreting (i.e., classifying) handwritten ZIP Codes, a case in which the length of the character string is generally known to be either five or nine digits. However, in alternative embodiments, the method and system of the present invention can be used to interpret character strings (i.e., words) of arbitrary length, as in the larger context of automated sentence recognition systems known in the art. In Figure 2, Blocks A through I schematically represent the various steps carried out during the character string interpretation process of the present invention. As indicated at Block A in Figure 2, the first stage of the process involves capturing an image I of a character string. Typically, each image I acquired by the system 1 comprises a pixel array. Each pixel in the image array has a gray-scale brightness representative of the intensity of the image at that pixel's location in the image.
In addition, the saturation of the pixels can be represented. Each acquired image is stored in the image buffer circuit(s) 7. As indicated by Block B, the second stage of the process involves "preprocessing" the stored image I. The image preprocessing operations performed by the processor 2 typically include: locating the "region of interest"; removing underlines; deskewing and deslanting the image; removing fly specks (that is, small connected components) and imposing strokes; and normalizing the image to a standard size (for example, 20 pixels in height, with the width chosen so that the aspect ratio of the image remains unchanged). Notably, the image normalization is performed so that the preprocessed image I' can be sent to subsequent stages of the system without requiring further image normalization. Frequently, the resampling performed during the normalization procedure yields an effectively gray-scale image even if the original image was black and white. The upper and lower contours of the normalized image are then used to align long character strings in both the vertical and horizontal image directions. Additional details regarding the above image preprocessing operations are described in the Applicant's copending Application Serial No. 07/816,414, entitled "Alphanumeric Images Segmentation Scheme," filed on December 31, 1991, supra. The next stage of the recognition process, indicated at Block C, involves cutting the preprocessed image I' into sub-images, called "cells." The purpose of generating image cells is so that the image cells can be combined to form image "segments" S during the image segment formation stage indicated at Block D. In accordance with the present invention, the image cells are generated by first performing a "Connected Component Analysis" on the preprocessed image to detect the presence of large "connected components." After this, a cut-line drawing process applies "undulating" cut lines to those sub-images containing the large connected components. Both the Connected Component Analysis subprocess and the undulating cut-line drawing subprocess are carried out by the programmed processors 2 using the associated RAM 4. More specifically, the Connected Component Analysis analyzes the intensity of the pixels comprising the preprocessed image to determine the presence of character components (i.e., groups of pixels) which are connected or linked together. Notably, the Connected Component Analysis searches along the vertical and horizontal image directions for groups of pixels that have a predetermined range of intensity values and which appear to form large character components that are linked together, and which will probably be associated with one or more characters. Examples of connected character components are indicated in the second and third ZIP Code images presented in Figures 4A to 4F. It is possible that more than one character is contained within the sub-image that contains a large connected component. It is important to determine where a cut line should be drawn through such a sub-image so that no more than one character is represented by the pixels of an image cell.
This is achieved by generating "undulating" cut lines through the large, identified connected components. In general, this cut-line generation process is permitted to subdivide a character represented by a large connected component into two or more image cells, simply by drawing cut lines through the group of pixels representing the character. The number of ways in which adjacent image cells can be combined to construct an image segment grows rapidly with the number of image cells generated during this stage of the recognition process. The system of the present invention avoids cutting the preprocessed image into excessively small image cells by using complex heuristics which identify a set of good cut lines and remove redundant and similar lines. The operation of this subprocess is illustrated by the cut lines drawn on, and selectively removed from, the preprocessed image shown in Figures 4A to 4F. At the end of this subprocess, the pixels between each adjacent pair of remaining cut lines define an image "cell." The image cells produced during an exemplary image cell generation process are tabulated in Figure 5. As shown in this table, each image cell is identified by a cell number, for example, 0, 1, 2, 3, 4, etc. Additional details regarding the automated generation of cut lines during this stage of the recognition process are described in copending Application Serial No. 07/816,414, supra. As indicated at Block D in Figure 2, the subsequent stage of the process involves combining contiguous (i.e., consecutive) image cells, in order from left to right, to produce a set of image "segments," as shown in the table of Figure 6 (a minimal sketch of this segment formation appears below). As shown in this table, each image segment is identified by combining the numbers assigned to its constituent image cells, for example, 0, 01, 1, 2, 23, etc. Ideally, each image segment contains pixels that represent one and only one character. However, this will not always be the case. It is important that the final set of image segments contain the correct image segments. Complex heuristics are used to determine how many image cells, and which image cells, should be combined to build image segments. In general, the heuristics are expressed in terms of "definitive" cuts, "connected component" cuts, and so on. The parameters and adjustment factors of these heuristics are determined empirically. Each image segment consists of a set of image pixels which will be analyzed by the assigned neural information processing network invoked by the system. As will be explained in greater detail hereinafter, the function of each neural network is to analyze the set of pixels of each image segment to which it has been assigned and to produce as output a score for each of the ten (10) possible numeric characters (i.e., symbols) that the pixel set could actually represent or could possibly be classified as. The next stage of the process, generally represented at Block E, involves serially placing consecutive image "segments," from left to right, to produce a set of acceptable (i.e., legal) image "co-segmentations."
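The following is the minimal sketch of segment formation referred to above; the fixed maximum run length is an assumed stand-in for the patent's empirically tuned heuristics.

```python
# A minimal sketch of image-segment formation from consecutive cells.
MAX_CELLS_PER_SEGMENT = 3  # assumption; the patent uses empirical heuristics instead

def form_segments(num_cells, max_run=MAX_CELLS_PER_SEGMENT):
    """Return candidate segments as tuples of consecutive cell numbers,
    e.g. (0,), (0, 1), (1,), (2, 3), ... mirroring labels such as 0, 01, 1, 23."""
    segments = []
    for start in range(num_cells):
        for length in range(1, max_run + 1):
            if start + length <= num_cells:
                segments.append(tuple(range(start, start + length)))
    return segments
```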
Each such image co-segmentation must account for all of the pixels in the preprocessed image I'. Preferably, it is desirable to consider as few image co-segmentations as possible, while still ensuring that the correct co-segmentation is contained in the set of all image co-segmentations constructed from the generated image cells. In the table of Figure 7, three of the many legal image co-segmentations for the exemplary five-character ZIP Code are shown. As indicated at Block E, the co-segmentations are implicitly formed by the "directed acyclic alignment graph" of Figure 8. The structure of the graph ensures that each of these image co-segmentations consists of five image segments. To capture the reality of the spatial structure of the input image I, there are rules which govern the way in which the image segments can be serially placed together. For example, the right edge of a segment must be contiguous with the left edge of the subsequent image segment (i.e., it is not permissible to skip a group of pixels and/or to combine the pixels in the wrong spatial order). However, if desired, some of these restrictions may be relaxed under appropriate conditions. Additional details regarding the serial placement of consecutive image segments S are described in copending Application Serial No. 07/816,415, entitled "Graphic System for Automated Co-segmentation and Recognition for Image Recognition Systems," supra. If desired, the selected image co-segmentations can be explicitly displayed at Block F. Notably, the directed acyclic graph of the present invention also provides a novel means for simultaneously modeling (i.e., representing) both the set of possible image co-segmentations {S} of the preprocessed image I' and the set of serially placed character interpretations (i.e., classifications) {C} that are allowed by the character alphabet and possibly restricted by the syntax of the language or code in which the recorded character string has been expressed. As will be described in detail with reference to Figure 8, this data structure, graphically expressed as a "directed acyclic graph," is used by the system of the present invention to formulate, in a unified manner, both the image co-segmentation problem and the character string interpretation problem as a single "best path through the graph" problem. Intuitively, this formulation of the problem has geometric appeal. The alignment graph, the data structure that implements it, and the processes that use it will be described in detail below. After this, the processes that use this graph will be described in greater detail in connection with the Image Segment Analysis Stage indicated at Block G in Figure 2, the Path Score and Probability Computation Stage indicated at Block H, and the Character String Interpretation Stage indicated at Block I thereof. As shown in Figure 8, the graph of the present invention comprises a two-dimensional arrangement of nodes which, at a high level of description, is similar to prior art graphs referred to as lattice or trellis diagrams. As will become apparent hereinafter, the alignment graph of the present invention is implemented by a data structure which performs a number of important modeling functions.
In a manner well known in the programming art, this data structure is created, modified, and managed by the programmed processors 2. Each node in the alignment graph is realized as a separate data structure, which is a substructure of the "main data structure." The data structure for each node has a number of labeled "local" information fields specifically adapted for storage of the following information items: a unique node identifier (i.e., a code that identifies the column/row address of the node); the computed scores for each of the possible numeric characters that the pixels of the associated image segment can represent; the "unnormalized" scores computed for each of the possible numeric characters that the pixels of the associated image segment can represent; the node identifiers of its predecessor nodes; and the node identifiers of its descendant nodes. To store the information produced during each instance of the process, the main data structure has a number of labeled "global" information fields specifically adapted to store the following information items, including: a set of codes that identify which particular image segment is represented by the nodes of each particular row in the alignment graph; a set of addresses that identify where each image segment is stored in memory; and the sums of the scores along selected paths and groups of paths through the alignment graph that represent the same character string interpretation.
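The following sketch suggests one way such node and main data structures might be laid out in code; the field names and types are illustrative assumptions, not the patent's.

```python
# A minimal sketch of the node and main ("alignment graph") data structures
# described above; all field names are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Node:
    node_id: Tuple[int, int]                       # (row index, column index)
    scores: List[float] = field(default_factory=lambda: [0.0] * 10)
    unnormalized_scores: List[float] = field(default_factory=lambda: [0.0] * 10)
    predecessors: List[Tuple[int, int]] = field(default_factory=list)
    descendants: List[Tuple[int, int]] = field(default_factory=list)

@dataclass
class AlignmentGraph:
    segment_of_row: Dict[int, int] = field(default_factory=dict)    # row -> segment code
    segment_address: Dict[int, int] = field(default_factory=dict)   # segment -> storage address
    nodes: Dict[Tuple[int, int], Node] = field(default_factory=dict)
    group_score_sums: Dict[str, float] = field(default_factory=dict)  # interpretation -> summed path score
```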
In the first illustrative embodiment, the number of columns in the alignment graph is equal to the number of characters in the possible character string interpretations (for example, 5 for the ZIP Code in Figure 3). Also, the number of rows in the alignment graph is equal to the number of image segments constructed during the image segment generation stage of the process. Accordingly, the size of the alignment graph will typically vary in its number of rows for each image I that has been acquired for interpretation (i.e., analysis and classification). Consequently, for each acquired image I, the programmed processor 2 routinely generates a graph of the type shown in Figure 8 which is particularly adapted to the acquired image. Each of these alignment graphs is physically implemented by creating a corresponding data structure, which is stored in RAM 4. The information relating to the image co-segmentations for image I and its possible character string interpretations is stored in the information fields of the data structures created particularly for this image. Finally, this organized information is used by the programmed processor 2 to select the most likely character string interpretation C from the set of candidates. As illustrated in Figure 8, the alignment graph of the present invention has a number of finer structural features. The main part of the graph contains rows and columns. Each column corresponds to a character position in the interpretation C of the character string. Because the example deals with a 5-character ZIP Code, 5 columns are required, as shown. Each row corresponds to an image segment. Because the example contains 11 segments, 11 rows are required, as shown. At each intersection of a row with a column there is a node, represented by a pair of points (• •). The left point represents the "morning" part of the node and the right point represents the "night" part of the node. Each such node can be specified by its row index and its column index. In addition, there is a special start node 17, located before the first character position and to the left of the leftmost image segment. There is also a special end node 18, located to the right of the last character position and below the rightmost image segment. As illustrated in Figure 8, there are ten recognition arcs that connect the morning and night portions of each node. For clarity, only three of the ten recognition arcs are visible in Figure 8. During the interpretation process, each recognition arc 19 is labeled with an "r-score" that is assigned to the character represented by that recognition arc. In the exemplary embodiment, these recognition arcs represent unnormalized r-scores assigned to the numeric characters comprising ZIP Codes. However, in word and sentence recognition applications, these recognition arcs will typically represent unnormalized scores assigned to the symbols in a predetermined alphabet or vocabulary. As shown in Figure 8, a joining arc 19 is also drawn between each night portion of a node and the morning portion of its immediately adjacent node to represent ancestry and descent between such nodes. Unlike the recognition arcs, the joining arcs in this example are not assigned r-scores by a neural network.
In other embodiments, more complex joining-arc scores can be used, but a simple scheme is used in this embodiment: allowed arcs are assigned a score of 1.0 and are retained, while disallowed arcs are assigned a score of 0.0 and are discarded from the alignment graph. As shown, the morning portion of any node can have more than one joining arc entering it. Similarly, the night portion of any node can have more than one joining arc leaving it. As a result of the restrictions imposed on the construction of the image co-segmentations, there may be joining arcs in the alignment graph that make sense locally but do not make sense globally. Consequently, certain joining arcs can be removed or pruned to improve the computational efficiency of the interpretation process. The following "joining arc" pruning procedure can be performed on the graph prior to advancing to the Image Segment Analysis Stage indicated at Block G in Figure 2. The first step of the joining-arc pruning procedure involves computing the "forward cone" of nodes that are descendants of the start node, by iteratively marking the descendants of nodes that are already marked as elements of the forward cone. The second step of the procedure involves computing the "backward cone" of nodes that are ancestors of the end node, by iteratively marking the ancestors of nodes that are already marked as elements of the backward cone. The third step of the procedure involves determining which nodes are not in the logical intersection of these two cones, and then marking those nodes as "dead." After this, any joining arcs extending to or from a node marked "dead" are erased (i.e., pruned) from the list of allowed joining arcs. Each node within the intersection of these cones is considered "alive," and will have scores assigned to its set of recognition arcs during the image segment analysis stage. Notably, after satisfying this global constraint, there will typically be many nodes in the upper right and lower left corners of the alignment graph that have no legal ancestors or descendants. This fact is represented by the absence of entering and leaving joining arcs in these regions of the alignment graph, as illustrated in Figure 8. In addition, if necessary or desired, the alignment graph can be pruned using the presence of definitive cuts.
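The three-step pruning procedure just described can be summarized in a short sketch; the adjacency-list representation and function names below are assumptions made for illustration.

```python
# A minimal sketch of the joining-arc pruning described above, assuming the
# graph is given as adjacency lists of joining arcs.
def prune_joining_arcs(successors, predecessors, start, end):
    """Keep only joining arcs whose endpoints lie in the intersection of the
    forward cone of `start` and the backward cone of `end`."""
    def cone(root, neighbors):
        marked, frontier = {root}, [root]
        while frontier:
            node = frontier.pop()
            for nxt in neighbors.get(node, ()):
                if nxt not in marked:
                    marked.add(nxt)
                    frontier.append(nxt)
        return marked

    forward = cone(start, successors)       # descendants of the start node
    backward = cone(end, predecessors)      # ancestors of the end node
    alive = forward & backward              # nodes lying on some start-to-end path
    pruned = {
        node: [nxt for nxt in nxts if nxt in alive]
        for node, nxts in successors.items() if node in alive
    }
    return pruned, alive
```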
Each path through the alignment graph represents both a co-segmentation and an interpretation. The joining arcs in the path specify the co-segmentation, while the recognition arcs in the path specify the interpretation. To understand how the process of the present invention selects the "correct" character string interpretation, either from the complete set of possible character string interpretations or from a much smaller set of competing character string interpretations, it is first necessary to understand the various subprocesses that precede the final selection of the "correct" character string interpretation. The first subprocess concerns the computation of the unnormalized scores assigned to the recognition arcs at each node. The second subprocess concerns the computation of the sum of the r-scores associated with all of the paths through the alignment graph that represent the same character string interpretation. These subprocesses will be described below. As illustrated in Figure 9, the image segment analysis stage of the interpretation process makes use of a complex of neural computing networks 21. The primary function of each neural computing network is to analyze the pixels of the image segment associated with a given row in the alignment graph, and to compute a set of "scores" (i.e., r-scores) which are assigned to the recognition arcs at each node in that row. There is one segment, and therefore one neural network, per row, and all the nodes in the same row receive the same set of ten r-scores. For reasons of clarity, only three of the ten recognition arcs are shown in Figure 9 for each node. In essence, each neural network maps its input (a group of pixels represented by a set of numbers) to a set of ten numbers r_0, r_1, ..., r_9, called r-scores. The architecture of the network guarantees that these r-scores are positive, allowing their interpretation as unnormalized probabilities. A large value for r_0 represents a high probability that the input segment represents the digit "0", and similarly the other nine r-scores correspond to the other nine digits, respectively. A large r-score also reflects a high probability that the input segment is part of a correct image co-segmentation; conversely, if a segment was formed by splitting a digit in half (as can sometimes happen), all ten r-scores for this segment should be small, to reflect the undesirable nature of the segment. By convention, the mapping function of each neural computing network is characterized by a set of adjustable parameters that can be represented in vector form as a weight vector W with components w_1, w_2, ..., w_m.
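A toy network of the kind just described might look as follows. This is a minimal sketch with an assumed architecture (the patent's preferred embodiment uses LeNet, described below), whose only purpose is to show how exponential output units keep the ten r-scores strictly positive so they can be read as unnormalized probabilities.

```python
# A minimal sketch (not the patent's LeNet recognizer) of a network that
# maps a flattened pixel segment to ten positive, unnormalized r-scores.
import numpy as np

class TinyScorer:
    """Illustrative scorer; weight shapes and layer sizes are assumptions."""
    def __init__(self, n_pixels, n_hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(n_pixels, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(scale=0.1, size=(n_hidden, 10))
        self.b2 = np.zeros(10)

    def r_scores(self, pixels):
        """pixels: 1-D array of length n_pixels; returns ten positive scores."""
        hidden = np.tanh(pixels @ self.w1 + self.b1)
        # Exponential output units keep every r-score strictly positive,
        # so the scores can be treated as unnormalized probabilities.
        return np.exp(hidden @ self.w2 + self.b2)
```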
Initially, the set of adjustable parameters of each neural network is set to a set of initial values. However, as will be described in greater detail hereinafter, a Neural Network Parameter Adjustment Stage, indicated at Block J in Figure 2, is provided so that, during one or more learning sessions, these parameters can be incrementally adjusted so that the input/output mapping function of each neural network is conditioned to conform to a reliable set of training data. In the preferred embodiment of the present invention, this training data set consists of several hundred thousand valid training images taken from ZIP Codes that have been written by hand by different people across the country. In the exemplary embodiment, the r-scores produced by the i-th neural computing network are expressed in vector form as r = (r_0, r_1, ..., r_9) and assigned to the ten corresponding recognition arcs (that is, the information fields) at all of the nodes in the i-th row of the alignment graph. In general, each neural network can be realized as a computer program, an electronic circuit, or any microscopic or macroscopic device capable of implementing the input/output mapping function of a neural network. In the preferred embodiment, however, each neural network is implemented by executing the well-known LeNet computer program, described in greater detail in the technical article entitled "Handwritten Digit Recognition with a Back-Propagation Network," by Y. Le Cun, et al., published on pages 396-404 in Advances in Neural Information Processing 2 (David Touretzky, Editor), Morgan Kaufman (1990). A further description regarding the construction and training of neural computing networks can be found in the article "Automatic Learning, Rule Extraction, and Generalization" by John Denker, et al., published on pages 877-922 in Complex Systems, Vol. 1, October 1987. In the alignment graph, two or more paths (representing different co-segmentations) may exist that represent the same character string interpretation. In accordance with the principles of the present invention, the paths representing a given interpretation should be considered as a "group." The score assigned to the given interpretation must depend on the sum of the scores of all of the paths in the group. This is in contrast to prior art recognizers, which generally consider the score of only one path in this group, without considering the contributions of the other paths in the group. In the case of images containing five digits, there will be 10^5 possible different interpretations, and the number of paths through the alignment graph can be even larger. Therefore, it is impractical to display them explicitly, or to consider each possibility individually. The data structure and algorithms of the present invention allow the system to identify certain important groups of paths, such as the group of paths that correspond to a given interpretation, or the group of all paths, and to efficiently evaluate the group score, that is, the sum of the scores of the paths in the group.
The system of the present invention analyzes the pixels of the acquired image I and calculates the sum over all of the trajectories through the graph that represent the candidate interpretation (that is, the classification) for which the probability is being calculated. Each term in the sum is the product of the records assigned to the arcs that make up one trajectory through the alignment graph. Normalization is carried out only after the sum is calculated. We call this normalization "series by series". In contrast, prior-art recognizers that calculate probabilities generally normalize records relatively early in the process, typically in a way that is equivalent to "character by character" normalization, whereby valuable information about the quality of the co-segmentation is lost. It is important that the neural computing network training process described below trains the complex of neural networks to produce records r that contain information about the probability that a given co-segmentation is correct, and not only the probability that a given character interpretation of the segment is correct. The normalized record produced by the system and method of the present invention represents an estimate of the a posteriori probability P(C|I). In contrast, maximum-likelihood sequence estimation, as used in prior-art multiple character recognition (MCR) systems, generally uses a priori probabilities (that is, likelihoods) of the form P(I|C). This is acceptable for many purposes, since these different probability measures can be related to one another given some additional information. The real advantage of the a posteriori formulation is that the internal calculations of the method and system of the present invention depend on estimates of the joint posterior probability of the interpretation and the co-segmentation, P(C, S|I). The corresponding a priori (likelihood) expression P(I|C, S) cannot easily be related to the useful a posteriori form, since it is generally not feasible to estimate the marginal probability P(S). As a result, prior-art recognizers, although able to identify the highest-record interpretations, are typically unable to assign records that are properly normalized. The well-normalized records of the present invention can be more readily interpreted as probabilities and can therefore be more readily combined with information from other sources. In general, the objective of the procedure described in Figure 10 is to calculate an a posteriori probability P(C|I) for each of the competing character-string interpretations represented by the alignment graph illustrated in Figure 9. Notably, each such probability is calculated as a ratio, expressed as a numerator portion divided by a denominator portion. Mathematically, the probability measure of the present invention is expressed as:

P(C|I) = [ Σ_S Π r(C, S, I) ] / [ Σ_{C'} Σ_{S'} Π r(C', S', I) ]

Notably, the term Π r(C, S, I) in the numerator portion represents the product of the records r along the arcs of one trajectory (C, S), and the complete numerator portion Σ_S Π r(C, S, I) represents the sum of such trajectory-record products over all of the trajectories (that is, over all of the co-segmentations S) that represent the same character-string interpretation C.
The expression Σ_{S'} Π r(C', S', I) in the denominator portion represents the sum of the trajectory-record products over all of the trajectories that represent one and the same character-string interpretation C', and the complete denominator portion Σ_{C'} Σ_{S'} Π r(C', S', I) represents the sum of the trajectory-record products over all of the character-string interpretations {C'} represented by the alignment graph. Since the denominator portion includes the contributions of all possible interpretations, its value depends only on the acquired image I, and not on the particular interpretation C. The purpose of the denominator portion is to ensure that the probability is properly normalized, such that the sum (over all C) of P(C|I) is equal to unity (that is, 1), in accordance with the general principles of probability. Once the numerator portion has been calculated for a particular character-string interpretation, the probability for that character-string interpretation is obtained simply by dividing its calculated numerator by the common denominator. There are a number of different ways in which the probability computation procedure described above can be used, for example by incorporating it into a larger procedure, to arrive at the "correct" character-string interpretation. One approach is illustrated in the flow diagram of Figures 10A and 10B, while an alternative approach is illustrated in the flow diagram of Figures 11A and 11B. These two alternative approaches are described in detail below. The steps of the first character-string interpretation method of the present invention are described in the flow diagram of Figures 10A and 10B. As indicated in Block A, the first step of this procedure involves the use of the complex of neural computing networks shown in Figure 9 to compute the set of records r for each of the nodes along the i-th row of the graph. Then, as indicated in Block B, the procedure uses the well-known Viterbi Algorithm to identify (as a sequence of the codes that represent the union arcs and the recognition arcs) the trajectory through the alignment graph that has the maximum trajectory record. The processor then identifies the character-string interpretation that corresponds to this trajectory. Since the trajectory record for this character-string trajectory is only an approximation and, by itself, is not a reliable measure, it is discarded. Only the information that identifies the character-string interpretation, denoted C(v) below, represented by this trajectory (for example, 35733 for a five-character ZIP Code) is retained. Then, as indicated in Block C in Figure 10A, the procedure uses the well-known Forward Algorithm to calculate the common denominator portion D(I) of the probability measure for the identified character-string interpretation. This number is then stored in the main data structure used to implement the alignment graph of Figure 8. The use of the Forward Algorithm provides a precise value for the sum (over all of the trajectories) of the product (along each trajectory) of the unnormalized records r, for all of the possible character-string interpretations represented by the alignment graph.
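A minimal sketch of the Viterbi step of Block B follows, under the assumption that the alignment graph is reduced to a small directed acyclic graph whose arcs each carry a symbol and an unnormalized record; the node names and record values are invented for the example.

```python
# Toy illustration only: the max-product (Viterbi) trajectory over a small DAG
# whose arcs each carry (next_node, symbol, record).  The best trajectory is
# recovered by backtracking, and its symbols form the candidate interpretation.
def viterbi(arcs, order, start, end):
    best = {v: 0.0 for v in order}
    back = {}
    best[start] = 1.0
    for v in order:                                # topological order
        for nxt, symbol, record in arcs.get(v, []):
            candidate = best[v] * record
            if candidate > best[nxt]:
                best[nxt] = candidate
                back[nxt] = (v, symbol)
    symbols, node = [], end                        # trace the best trajectory back
    while node != start:
        node, symbol = back[node]
        symbols.append(symbol)
    return best[end], "".join(reversed(symbols))

arcs = {                                           # two segments, two positions
    "s":  [("n1", "3", 0.9), ("n1", "8", 0.1)],
    "n1": [("e", "5", 0.8), ("e", "6", 0.3)],
}
print(viterbi(arcs, ["s", "n1", "e"], "s", "e"))   # (0.72, '35')
```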
In some embodiments, the union arcs contribute to the trajectory records only by virtue of their presence or absence. In more complex embodiments, records can be assigned to the union arcs (as well as to the recognition arcs), and all such records are included as factors in the product along each trajectory. As indicated in Block D in Figure 10A, after the computation of the common denominator portion D(I) described above, the procedure uses the Forward Algorithm to compute the numerator portion N(C(v), I) of the probability measure for the character-string interpretation C(v) previously identified by the Viterbi Algorithm. This number is then stored in the main data structure used to implement the alignment graph of Figure 8. Here, the Forward Algorithm accepts as input the code that identifies the character-string interpretation selected by the Viterbi Algorithm, and produces as output a precise numerical value (that is, a restricted sum) for this selected character-string interpretation C(v). Notably, the numerator portion computed for the character-string interpretation is equal to the sum (over the trajectories) of the product of the unnormalized records r along each trajectory through the alignment graph that represents the character-string interpretation C(v). During this computation of the numerator portion, the union arcs are treated in the same way as during the computation of the denominator. As indicated in Block E in Figure 10A, after the denominator portion and the numerator portion have been computed, the probability P(C(v)|I) is computed for the character-string interpretation C(v). This probability is then stored in the main data structure. Finally, as indicated in Block F in Figure 10B, the processor determines whether the probability computed in Block E is greater than a threshold. If it is, the processor concludes that the character-string interpretation selected by the Viterbi Algorithm is the character-string interpretation of highest probability for the analyzed image I. Thereafter, in Block G, the processor produces as output of the system both (i) the character-string interpretation (for example, 35733) and (ii) the computed probability associated with it. Together, these two items can be used (in conjunction with other information) as the basis for deciding how to direct the mail piece. At this stage of the processing, there may be one or more reasons for which it is advantageous to perform additional calculations to identify additional high-record interpretations. For one, it may be desirable to ensure that the interpretation of highest probability has been identified even if, in Block F, the probability assigned to C(v) is less than 0.5. In this case, a set of competing character-string interpretations is identified, and the probability is computed for each element of the set. Also, the present invention can be used as part of a larger system in which multiple interpretations (and the probabilities thereof) are used in subsequent processing.
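A corresponding sketch of Blocks C through F, under the same toy-graph assumption, is given below: an unrestricted forward pass yields the denominator D(I), the same pass restricted to arcs that spell a chosen interpretation yields the numerator, and their ratio is the normalized probability compared against the threshold.

```python
# Toy illustration only: forward sums over the same kind of lattice.  The
# unrestricted pass gives the denominator D(I); restricting each recognition
# arc to the symbol demanded by a chosen interpretation gives the numerator.
def forward_sum(arcs, order, start, end, interpretation=None):
    total = {v: 0.0 for v in order}
    depth = {v: 0 for v in order}      # character position filled by arcs leaving v
    total[start] = 1.0                 # (assumes all paths to v have equal length)
    for v in order:                    # topological order
        for nxt, symbol, record in arcs.get(v, []):
            if interpretation is not None and symbol != interpretation[depth[v]]:
                continue               # restricted (numerator) pass
            total[nxt] += total[v] * record
            depth[nxt] = depth[v] + 1
    return total[end]

arcs = {
    "s":  [("n1", "3", 0.9), ("n1", "8", 0.1)],
    "n1": [("e", "5", 0.8), ("e", "6", 0.3)],
}
order = ["s", "n1", "e"]
D = forward_sum(arcs, order, "s", "e")                       # all interpretations
N = forward_sum(arcs, order, "s", "e", interpretation="35")  # one interpretation
P = N / D
print(P, P > 0.5)   # ~0.65, accepted if the threshold of Block F is 0.5
```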
Specifically, an interpretation which is given a high probability by the present invention, based on the acquired pixel image, may be excluded by later stages of the larger system, after which an alternative interpretation may be needed. For these and other reasons, the alternative procedure described in the flow diagram of Figures 11A and 11B can be used. As indicated in Block A in Figure 11A, the first step of this procedure likewise involves the use of the complex of neural computing networks to compute the set of records r for each of the nodes along the i-th row of the graph. Then, as indicated in Block B, the procedure uses a Beam Search algorithm to identify (as sequences of the codes that represent the union arcs and the recognition arcs) a relatively small set of trajectories through the alignment graph. Thereafter, the set of competing character-string interpretations {C_j} that corresponds to this set of trajectories is identified. As indicated in Block C of Figure 11A, the processor uses the well-known Forward Algorithm to compute the denominator D(I), which serves as the denominator portion of the probability P(C|I) for each interpretation C in the set {C_j} of competing interpretations. This number is stored in the main data structure. Again, the Forward Algorithm provides a precise value for the sum (over the trajectories) of the product of the unnormalized records r for the arcs along each trajectory; in the case of the denominator portion, the sum runs over all of the possible interpretations. To compute the records for the identified interpretations, as indicated in Block D, the processor uses the Forward Algorithm to compute the numerator portion N(C_j, I) of the probability for each competing character-string interpretation C_j. These numbers are then stored in the main data structure. The Forward Algorithm provides a precise value for the sum (over the trajectories) of the product of the unnormalized records r for the arcs along each trajectory. Note that the sum computed by the Forward Algorithm is a sum over trajectories, one of which is the trajectory identified by the Beam Search algorithm in Block B; indeed, that trajectory produces the largest term in the sum. If it could be assumed that the sum is well approximated by its largest term, it would be unnecessary to run the Forward Algorithm to evaluate the numerator; the results of the Beam Search Algorithm would suffice. This is called the "one-term sum" approximation. However, the sum is not always well approximated by its largest term, and it is therefore advantageous to discard the records computed by the Beam Search Algorithm, to retain the interpretations identified by the Beam Search Algorithm, and to evaluate the records of the retained interpretations using the Forward Algorithm. It is typically not feasible to compute the numerator for all possible interpretations, which is why it is advantageous to identify, in Block B, a relatively small set of interpretations which, because of their large "one-term" records, can be expected to have large numerators and, ipso facto, large probabilities.
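The Beam Search step of Block B in Figure 11A can be sketched as follows, again over the same kind of toy lattice and with a hypothetical beam width of two; only the surviving interpretations would then be re-evaluated exactly with the Forward Algorithm, as described above.

```python
# Toy illustration only: a beam search that keeps the top-k partial trajectories
# at each node while sweeping the lattice in topological order.
import heapq

def beam_search(arcs, order, start, end, beam_width=2):
    beams = {v: [] for v in order}                 # (partial record, symbols)
    beams[start] = [(1.0, "")]
    for v in order:
        beams[v] = heapq.nlargest(beam_width, beams[v])    # prune to the beam
        for record, symbols in beams[v]:
            for nxt, symbol, arc_record in arcs.get(v, []):
                beams[nxt].append((record * arc_record, symbols + symbol))
    return heapq.nlargest(beam_width, beams[end])

arcs = {
    "s":  [("n1", "3", 0.9), ("n1", "8", 0.1)],
    "n1": [("e", "5", 0.8), ("e", "6", 0.3)],
}
# Yields [(0.72, '35'), (0.27, '36')]: the "one-term" records are discarded and
# only the surviving interpretations are passed to the exact Forward evaluation.
print(beam_search(arcs, ["s", "n1", "e"], "s", "e"))
```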
The foregoing description describes the operation of the system after it has been trained. The learning mode of the system will now be described. To achieve optimal performance, the character-string interpretation system of the present invention is provided with an automatic learning mode of operation, which makes it possible for the system to be trained automatically during one or more learning sessions. This mode of operation will be described in detail below with reference to Block J in Figure 2 and to the system illustrated in Figure 9. As indicated in Block J in Figure 2 and illustrated in Figure 9, the character-string interpretation system of the present invention includes a Neural Network Parameter Adjustment Module 29, which interacts with both the graph 30 and the complex of neural networks 21 of the system shown in Figure 9. In general, the training process of the present invention is based on the concept of supervised learning; that is, for each image I* in the training set there is an attributed interpretation C*. In the illustrative embodiment, the Neural Network Parameter Adjustment Module is designed to ensure that the expected (that is, average) probability P(C*|I*) for the correct character-string interpretation is increased during the processing of all of the images I* in the training set, while the expected probability P(C|I) for each of the incorrect interpretations is decreased. In short, the objective of the learning mode, and therefore of the Parameter Adjustment Module, is to ensure that the average probability of incorrect interpretations is minimized while the probability of the correct character-string interpretation C* is maximized. In the illustrative embodiment, log[P(C|I)] is chosen as the objective function, because the log function has a steeper slope near zero. This causes the training process to emphasize the low-record pixel configurations (that is, the image segments), which is advantageous because these are the ones that are most problematic and require the most training. To optimize the chosen objective function, the processor uses its gradient, which is expressed as:

∂ log P(C|I) / ∂w = [ ∂ log P(C|I) / ∂r_i ] · [ ∂r_i / ∂w ]

where w = w1, w2, ..., wm is the m-dimensional neural-network Weighting Vector and r_i = r_i1, r_i2, ..., r_in is the n-dimensional record vector produced as the output of the i-th neural network. Typically, the Weighting Vector w has 10,000 or more components. The vector r_i has exactly ten components for digit recognition. The dot product on the right-hand side of the above gradient expression implies a sum over the components of r_i. In general, there is a gradient expression of this form for each of the neural networks, that is, for each row in the alignment graph. Sometimes it may be advantageous to control more than one network using the same Weighting Vector w, in which case the gradient with respect to w contains contributions from each such network. In the exemplary embodiment, as shown in Figure 9, the Weighting Vector is stored in a register 31 which provides the same Weighting Vector to each and every neural network in the system. Before the multiple character recognizer training process described here is started, the neural-network Weighting Vector must be initialized. It can be initialized with random values drawn from some reasonable distribution, or it can be initialized with chosen values believed to be especially appropriate.
In many cases it is advantageous to temporarily disconnect the neural network from the alignment graph and to pre-train it on manually segmented images, as if it were to be used as a single-character recognizer. The resulting Weighting Vector values then serve as a starting point for the multiple character recognizer training process described hereinafter. The left-hand side of the gradient expression is called the System Sensitivity Vector because it is a gradient that provides information about the sensitivity of the output of the complete system with respect to changes in the Weighting Vector w. Each component of the System Sensitivity Vector corresponds to a component of the Weighting Vector. Specifically, if a given component of the System Sensitivity Vector is greater than zero, a small increase in the corresponding component of the Weighting Vector will cause an increase in the probability P(C|I) that the system assigns to the interpretation C for the image I in question. Briefly, the System Sensitivity Vector can be used to optimize the objective function described above. For a deeper understanding of the principles underlying the training process, it is useful to appreciate the nature of the quantities that make up the gradient expression. According to the above formula, the System Sensitivity Vector is computed as the dot product (a vector-matrix product) of two different quantities appearing on the right-hand side of the formula. The first such quantity is a vector, ∂ log P / ∂r, which provides information about the sensitivity of the output of the graph with respect to changes in the records r1, r2, ..., rn provided at its input; it can be thought of as the Graph Sensitivity Vector. The second quantity is a matrix, ∂r / ∂w, which provides information about the sensitivity of the outputs of the neural network with respect to changes in the Weighting Vector that controls all of the neural networks; it can be thought of as the Neural Network Sensitivity Matrix. For conceptual purposes only, the three foregoing terms can be thought of as being functionally interrelated as follows. During the processing of each training image I*, the evaluated Neural Network Sensitivity Matrix is used to transform the evaluated Graph Sensitivity Vector, to produce an evaluated System Sensitivity Vector. In turn, the individual components of the evaluated System Sensitivity Vector adjust the corresponding components of the Weighting Vector so that the objective function P(C*|I*) of the Parameter Adjustment Module is optimized. In theory, the System Sensitivity Vector can be obtained by numerically evaluating the terms on the right-hand side of the gradient expression and then performing the mathematical operation specified by it. However, during the training session there is a simpler way to evaluate the System Sensitivity Vector operationally for each image/interpretation pair {I*, C*}. As will be explained below with reference to the flow diagram of Figure 13, the well-known Back-Propagation Algorithm (Back-Prop) can be used to evaluate the System Sensitivity Vector in a computationally efficient manner, without having to evaluate the Neural Network Sensitivity Matrix explicitly.
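The chain-rule structure of the System Sensitivity Vector can be checked numerically on stand-in functions. In the sketch below the two-output "recognizer" and the scalar "graph" function are invented placeholders, and finite differences replace Back-Propagation; the only point illustrated is that the dot product of the Graph Sensitivity Vector with the Neural Network Sensitivity Matrix equals the gradient of the composite function.

```python
# Numerical check of the chain rule behind the System Sensitivity Vector, using
# invented stand-ins: a two-output "recognizer" r(w) = exp(W x) and a scalar
# "graph" function logP(r) = log(r0 / (r0 + r1)).
import numpy as np

x = np.array([0.5, -1.0, 2.0])

def records(w):                        # w is a flat 6-vector, reshaped to 2 x 3
    return np.exp(w.reshape(2, 3) @ x)

def log_p(r):
    return np.log(r[0] / (r[0] + r[1]))

def num_grad(f, v, eps=1e-6):          # central finite differences
    g = np.zeros_like(v)
    for j in range(v.size):
        d = np.zeros_like(v); d[j] = eps
        g[j] = (f(v + d) - f(v - d)) / (2 * eps)
    return g

w = np.array([0.1, -0.2, 0.3, 0.0, 0.2, -0.1])
graph_sensitivity = num_grad(log_p, records(w))               # d logP / d r
jacobian = np.array([num_grad(lambda wv, k=k: records(wv)[k], w)
                     for k in range(2)])                      # d r / d w
system_sensitivity = graph_sensitivity @ jacobian             # chain-rule product
direct = num_grad(lambda wv: log_p(records(wv)), w)           # gradient taken directly
print(np.allclose(system_sensitivity, direct, atol=1e-4))     # True
```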
When the system of the present invention is operated in its learning mode, the training process of Figure 13 is performed for each image I* in the training database indicated in Block K in Figure 2. Each image I* has associated with it a known character-string interpretation C*. Typically, a large number (for example, tens of thousands) of image/interpretation pairs {I*, C*} are used to train the system during the course of a particular training session.
As indicated in Block B in Figure 2, each image I* is preprocessed in essentially the same way as during the interpretation process of the present invention. Likewise, as indicated in Blocks C through E in Figure 2, the image segments and the image co-segmentations are constructed for the image I* in essentially the same way as during the interpretation process of the present invention. Then, as indicated in Block F in Figure 2, a graph model is constructed for the generated image co-segmentations and for the possible character-string interpretations associated with the image I*. At this stage of the training process, the training method of the illustrative embodiment exploits the following facts. First, each probability P(C*|I*) has a numerator portion N(C*, I*) and a common denominator portion D(I*). Second, using well-known properties of logarithms and derivatives, the Graph Sensitivity Vector (that is, the partial derivative of log[P(C*|I*)] with respect to the record variables r) can be re-expressed as follows:

∂ log P(C*|I*) / ∂r = [ 1 / N(C*, I*) ] · ∂N(C*, I*) / ∂r - [ 1 / D(I*) ] · ∂D(I*) / ∂r

Importantly, the Graph Sensitivity Vector, which appears on the left-hand side of this equality, can readily be evaluated by the procedure described in Figures 13A and 13B, as described below. As indicated in Block A in Figure 13A, the processor executes the Forward Algorithm once to numerically evaluate the numerator portion of the probability P(C*|I*) for the image/interpretation pair {I*, C*}; this value is then stored. Notably, during this step of the process the Forward Algorithm exploits the fact that the graph constructed for the image/interpretation pair {I*, C*} implicitly represents the analytic (that is, algebraic) expressions used to express mathematically the numerator and denominator portions of the associated probability P(C*|I*). In Block B of Figure 13A, the processor executes the well-known Baum-Welch Algorithm to numerically evaluate the partial derivative of the numerator portion of the probability P(C*|I*) with respect to the record variables r. In Block C, the processor uses the Forward Algorithm to calculate the value of the denominator portion of the probability P(C*|I*). In Block D, the processor executes the well-known Baum-Welch Algorithm to numerically evaluate the partial derivative of the denominator portion of the probability P(C*|I*) with respect to the record variables r. Then, in Block E in Figure 13A, the processor uses the evaluated numerator and denominator portions, and the evaluated partial derivatives thereof, to numerically evaluate the Graph Sensitivity Vector according to the formula set out above. To evaluate the System Sensitivity Vector efficiently for the image/interpretation pair {I*, C*}, the training process then sets the gradient vector of the output layer of each neural network equal to the corresponding components of the numerically evaluated Graph Sensitivity Vector, as indicated in Block F in Figure 13B. Then, in Block G, the processor uses the Back-Propagation Algorithm to compute the components of the System Sensitivity Vector according to the formula described above. The details of the process by which Back-Propagation is used to calculate the desired result can be found in the article "Automatic Learning, Rule Extraction, and Generalization" by Denker et al., supra.
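The role played by the Baum-Welch evaluations of Blocks B and D can be illustrated on the same toy lattice used earlier (an assumption, not the patent's data structure): because the forward path sum is linear in each individual arc record, its derivative with respect to an arc's record is the product of the forward value at the arc's tail node and the backward value at its head node.

```python
# Toy illustration only: derivative of a forward path sum with respect to each
# arc record, via one forward and one backward sweep.  Run on the full graph
# this gives dD/dr; run on the interpretation-restricted graph it gives dN/dr,
# so the Graph Sensitivity Vector is dN/dr / N - dD/dr / D, arc by arc.
def forward_backward(arcs, order, start, end):
    fwd = {v: 0.0 for v in order}; fwd[start] = 1.0
    bwd = {v: 0.0 for v in order}; bwd[end] = 1.0
    for v in order:
        for nxt, _, record in arcs.get(v, []):
            fwd[nxt] += fwd[v] * record
    for v in reversed(order):
        for nxt, _, record in arcs.get(v, []):
            bwd[v] += record * bwd[nxt]
    derivs = {(v, nxt, sym): fwd[v] * bwd[nxt]
              for v in order for nxt, sym, _ in arcs.get(v, [])}
    return fwd[end], derivs

arcs = {
    "s":  [("n1", "3", 0.9), ("n1", "8", 0.1)],
    "n1": [("e", "5", 0.8), ("e", "6", 0.3)],
}
D, dD = forward_backward(arcs, ["s", "n1", "e"], "s", "e")
print(D, dD[("n1", "e", "5")])   # 1.1 and 1.0
```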
Notably, the Back-Propagation Algorithm is not used to evaluate the Neural Network Sensitivity Matrix explicitly, but rather to evaluate the matrix-vector product of the Neural Network Sensitivity Matrix and the Graph Sensitivity Vector. The result is an explicit evaluation of the total System Sensitivity Vector, which indicates an advantageous direction in which each of the components of the Weighting Vector of each neural network is to be updated. As indicated in Block H, after processing each image I*, the processor uses the individual components of the numerically evaluated System Sensitivity Vector to update the individual components of the Weighting Vector. A preferred updating procedure is described below. Before the update, the i-th component of the Weighting Vector is denoted w_i, and after the update it is denoted w_i'. After the processing of each image I*, the Weighting Vector is updated according to the following expression:

w_i' = w_i + δ_i · ∂ log P(C*|I*) / ∂w_i

where δ_i is the "step-size control parameter", w_i' denotes the updated component of the Weighting Vector, and ∂ log P(C*|I*) / ∂w_i is the partial derivative of log P(C*|I*) with respect to w_i. In principle there is a multitude of distinct step-size control parameters δ_i, one for each component of the Weighting Vector, but in practice it may be convenient to set them all equal. In general, the value of the step-size control parameter depends on (i) the normalization factors chosen for the pixel inputs to the neural networks and (ii) the normalization factors chosen for the intermediate values of the neural networks (that is, the outputs passed from one layer to the next within the neural networks), and it can be re-estimated during training. In essence, there are two main concerns when an appropriate value is selected for the step-size control parameter. If the value selected for this control parameter is too small, the convergence of the Weighting Vector w toward its optimum value proceeds very slowly. On the other hand, if the value selected for this control parameter is too large, there is a strong likelihood that the training process will step past and beyond the optimum value for w. This phenomenon in the weighting space W is referred to as "oscillatory divergence"; it tends to degrade the overall quality of system operation and can completely disrupt the training procedure.
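A skeleton of the per-example update of Block H, with a single shared step-size parameter, is sketched below; the quadratic stand-in objective and its gradient are invented solely to show the loop converging, and compute_system_sensitivity is a placeholder for the Forward/Baum-Welch/Back-Propagation machinery described above.

```python
# Skeleton of the per-example update of Block H (illustrative only).
import numpy as np

def train(weights, training_pairs, compute_system_sensitivity, step_size=0.01):
    # Gradient ascent on log P(C*|I*):  w <- w + delta * d logP / dw,
    # applied once per image/interpretation pair {I*, C*}.
    for image, interpretation in training_pairs:
        gradient = compute_system_sensitivity(weights, image, interpretation)
        weights = weights + step_size * gradient
    return weights

# Invented stand-in objective -(w - target)^2 / 2, whose gradient is target - w.
target = np.array([0.5, -1.0])
toy_gradient = lambda w, image, interp: target - w
w = train(np.zeros(2), [(None, None)] * 200, toy_gradient, step_size=0.1)
print(w)   # close to [0.5, -1.0]; too large a step size would oscillate instead
```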
The training process described above is repeated for each image/interpretation pair {I*, C*} in the training set. As more and more training data are processed by the system in its learning mode, the values of the individual components of the neural-network Weighting Vector converge toward the optimum values that satisfy the objective function governing the training process of the present invention. Note that during the training process there is no need to use the Beam Search Algorithm or the Viterbi Algorithm. Once the training process has produced a satisfactory Weighting Vector, the system can perform its recognition and record-assignment tasks without further reference to the training database. This means that in some embodiments the training can be carried out "in the laboratory" while the recognition and record assignment are carried out "in the field"; the product deployed in the field need not have provision for storing the training database or the training algorithms. In other embodiments, it may be desirable for the product deployed in the field to be capable of retraining or of incremental training, in which case some provision may be necessary for storing selected training examples. In particular, for "personal" recognizers such as those shown in Figure 12, the performance of the system can be maximized by retraining it to accommodate the idiosyncrasies of a single user or of a small set of users, on the basis of the examples they provide. When the method and system of the present invention are embodied in a portable handwriting recognition device, the bitmapped images of the words, number strings and the like that have been validated by the user are preferably stored in a non-volatile memory structure in the device. The function of this memory structure is to store both the bitmapped information and the ASCII-formatted information corresponding to the image/interpretation pairs {I*, C*}. Over a period of use of the device, a training data set is built up from the information so collected. When the training data set is of sufficient size, the portable device can be operated in its "learning mode". In this mode of operation, the images in the training data set are processed according to the training process of the present invention. After each image/interpretation pair is processed, the individual components of the Weighting Vector are incrementally adjusted so that the objective function described above is optimized. Large classes of additional embodiments of the present invention can readily be constructed. For example, instead of preprocessed images derived from pixel information, the input to the system could be preprocessed images derived from pen-trajectory information, or lists (not in image form) derived from such trajectory information. As another example, the input could consist of preprocessed information derived from an audio signal, for example speech signals. Similarly, other forms of output can be implemented: the output symbols could represent not only digits, but also alphanumeric characters, phonemes, complete words, abstract symbols, or groups thereof. It is easy to contemplate applications such as the decoding of error-correcting coded symbols transmitted over a noisy communication channel. In alternative embodiments, the function performed by the neural network complex can be carried out by any device capable of (1) accepting an input; (2) producing, according to a set of parameters, an output that can be interpreted as a vector of records; and (3) on the basis of a given gradient (derivative) vector, adjusting the set of parameters in a direction that will change the output in the direction specified by that vector. In alternative embodiments of the present invention, the function performed by the "alignment graph" can be carried out by a conventional dynamic programming network, or by any device that processes the sequence information in the required manner, specifically one that can: (1) accept records describing various entities that can be part of a sequence; (2) efficiently identify several high-record sequences and the corresponding interpretations; (3) efficiently calculate the total record for all sequences consistent with a given interpretation; and (4) efficiently calculate the sensitivity of its results with respect to the input records. Also, the number of modules in the processing chain can exceed two. Each module must have (i) sensitivity outputs (if any preceding module has adjustable parameters), (ii) sensitivity inputs (if the module itself or any preceding module has adjustable parameters), and (iii) ordinary data inputs and data outputs. The probabilities described here need not be represented in the processor and the memory by numbers between zero and one. For example, it may be advantageous to store them as log probabilities in the range between some large negative number and zero, and to adapt accordingly the computing steps that describe series and parallel combinations of probabilities.
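One common way, assumed here rather than prescribed by the text, of adapting the series and parallel combinations of probabilities to a log representation is to replace products by sums of logs and sums by a numerically stable log-sum-exp:

```python
# Illustrative only: log-domain combination of record products ("series") and
# sums over trajectories ("parallel"), so that very small records do not
# underflow.
import math

def log_product(log_records):
    return sum(log_records)                      # product -> sum of logs

def log_sum(log_values):
    m = max(log_values)                          # stable log-sum-exp
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in log_values))

# Two trajectories whose ordinary records would underflow if multiplied directly.
path_a = [-700.0, -10.0]                         # log arc records along path a
path_b = [-705.0, -12.0]
total = log_sum([log_product(path_a), log_product(path_b)])
print(total)                                     # about -710, with no underflow
```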
The method and system of the present invention can be used to interpret input expressions that have been expressed in, on, within, or through any of a wide variety of media, including, for example, electrically passive (graphic) recording media such as paper, wood, glass, and the like; electrically active recording media such as pressure-sensitive writing surfaces and touch-screen writing and display surfaces; phonological recording media such as vocal signals produced by humans and machines; and media such as air, in which the traced trajectories of a pen or stylus moving through it are encoded (by electrically active, non-contacting schemes, for example RF position detection, optical position detection, or capacitive position detection) and are then transmitted, recorded and/or recognized using the system and method of the present invention. Notably, in the last-described application it is not necessary for the sequence of symbols to be represented graphically on a surface; it need only be expressed. The system and method of the present invention can also be used with a conventional speech recognition system. In an example of such an application, the input data set is an utterance recorded as a speech signal (that is, a phonological signal) represented in the time domain. In accordance with the present invention, the recorded speech-signal utterance is divided into small speech-signal samples (for example, speech-signal cells), each of very short duration. Each speech-signal cell is preprocessed, and the cells are then combined to form "speech-signal segments", each containing spectral information representative of at least one phoneme in the speech-signal utterance. These speech-signal segments are then combined to form the co-segmentations, which are represented using the acyclic graph of the present invention. Then, using the co-segmentations and the set of all possible phoneme-string interpretations, the system and method of the present invention proceed to compute the a posteriori probability for the phoneme-string interpretation of highest record. The finer details of this speech-signal recognition process will be readily apparent to those skilled in the art of speech recognition. It should be understood that modifications to the illustrative embodiments of the present invention will be readily apparent to those of ordinary skill in the art. All such modifications and variations are deemed to be within the spirit and scope of the present invention as defined by the appended Claims. It is noted that, in relation to this date, the best method known to the applicant for carrying out the aforementioned invention is that which is clear from the present description of the invention.
Having described the invention as above, the content of the following claims is claimed as property:

Claims (39)

C L A I M S
1. A system for analyzing an input expression and for assigning records to possible interpretations of the input expression, the system is characterized in that it comprises: segment producing means for analyzing a set of input data representative of an input expression and dividing the set of input data into a plurality of segments, each segment having specifiable limits and being classifiable as possibly representing any of a plurality of symbols in a predetermined set of symbols; segment recording means for analyzing each segment in the plurality of segments, and assigning a record to each possible classification of the segment associated with a particular symbol in the predetermined symbol set; representation means for representing a plurality of possible interpretations for the input expression, and a plurality of co-segmentations of the image, wherein each of the possible interpretations consists of a different sequence of the symbols, and each of the co-segmentations consists of a different sequence of the segments; co-segmentation recording means for assigning records to the plurality of co-segmentations based on the records assigned to the segments; candidate interpretation identifying means for identifying one or more candidate symbol interpretations from the plurality of possible interpretations based on the records assigned to the plurality of segments; symbol sequence recording means for assigning records to the one or more candidate interpretations based on the records assigned to one or more of the plurality of segments; first record evaluation means for evaluating the records assigned to the one or more candidate interpretations; second record evaluation means for evaluating the records assigned to the plurality of possible interpretations; and normalized record producing means for producing a normalized record for each candidate interpretation, using the record evaluated for the plurality of possible interpretations.
2. The system according to claim 1, characterized in that the input data set comprises a set of pixels associated with an image acquired from a graphically represented symbol sequence, and wherein the segment producing means analyze the set of pixels and divide the set of pixels into a plurality of image segments, such that each image segment has specifiable limits and is classifiable as possibly representing any one or more of the plurality of characters in a predetermined character set.
3. The system according to claim 2, characterized in that the segment recording means analyze each image segment in the plurality of image segments and assign a record to each possible classification of the image segment, wherein each assigned record is associated with a particular character in the predetermined character set.
4. The system according to claim 3, characterized in that the representation means represent a plurality of possible character sequences and a plurality of image co-segmentations, wherein each of the possible character sequences consists of a sequence of characters, and each of the co-segmentations consists of a sequence of the image segments.
5. The system according to claim 4, characterized in that the co-segmentation recording means assign records to the plurality of image co-segmentations based on the records assigned to the image segments, and wherein the candidate interpretation identifying means identify one or more candidate character sequences based on the records assigned to the image segments.
6. The system according to claim 5, characterized in that the symbol sequence recording means assign records to the one or more candidate character sequences based on the records assigned to the image segments, and wherein the first record evaluation means evaluate the records assigned to the one or more candidate character sequences.
7. The system according to claim 6, characterized in that the second record evaluation means evaluate the records assigned to the plurality of possible character sequences, and the normalized record producing means normalize the records assigned to each candidate character sequence using the record evaluated for the plurality of possible character sequences.
8. The system according to claim 7, characterized in that the representation means comprise a data structure representable by a graph comprising a two-dimensional array of nodes distributed in rows and columns, and selectively connected by directed arcs, wherein each of the columns of nodes is indexed by a character position, and each of the rows of nodes is indexed by an image segment in an order that corresponds to the spatial structure of the acquired image, and wherein each path that extends through the nodes and along the directed arcs represents an image co-segmentation and a possible character sequence, and substantially all of the image co-segmentations and substantially all of the possible character sequences are represented by the set of trajectories that extends through the graph.
9. The system according to claim 8, characterized in that each of the nodes further comprises a set of recognition arcs, and each of the recognition arcs represents a character and is associated with an assigned record.
10. The system according to claim 1, characterized in that the means of representation implicitly represent the plurality of possible interpretations and the plurality of co-segmentations of the image.
11. The system according to claim 10, characterized in that the representation means comprise a data structure representable by a graph comprising a two-dimensional array of nodes arranged or distributed in rows and columns, and selectively connected by directed arcs, wherein each of the columns of nodes is indexed by a symbol position, and each of the rows of nodes is indexed by a segment in an order that substantially corresponds to the sequential structure of the input data set, and wherein each path that extends through the nodes and along the directed arcs represents a co-segmentation and a possible interpretation for the input expression, and substantially all of the co-segmentations and substantially all of the possible interpretations are represented by the set of trajectories that extend through the graph.
12. The system according to claim 1, characterized in that the representation means comprise a data structure representable by a graph comprising a two-dimensional arrangement of nodes distributed in rows and columns and selectively connected by directed arcs, wherein each of the columns of nodes is indexed by a symbol position, and each of the rows of nodes is indexed by a segment in an order that substantially corresponds to the sequential structure of the input data set, and wherein each path extending through the nodes and along the directed arcs represents a co-segmentation and a possible interpretation for the input expression, and the totality of the co-segmentations and the totality of the possible interpretations are represented by the set of trajectories that extend through the graph.
13. A method for forming an interpretation of an input expression, wherein the input expression is expressed in a medium, the interpretation is a sequence of symbols, and each symbol is an element in a predetermined set of symbols, the method is characterized in that it comprises the steps of: (a) acquiring an input data set representative of the input expression; (b) processing the input data set to form a set of segments, each of the segments being at least a partial subset of the acquired input data set and being classifiable as representing any of the symbols in the predetermined set of symbols; (c) forming a data structure that represents a set of co-segmentations and a set of possible interpretations for the input expression, each of the co-segmentations consisting of a set of segments which collectively represent the input data set and are arranged or distributed in an order that substantially preserves the sequential structure of the input data set, each of the possible interpretations for the input expression consisting of a possible sequence of symbols, and each symbol in the possible sequence of symbols being selected from the predetermined symbol set and occupying a symbol position in the possible sequence of symbols, the data structure being graphically representable by a graph comprising a two-dimensional array of nodes arranged or distributed in rows and columns and selectively connected by directed arcs, each of the columns of nodes being indexable by a symbol position, and each of the rows of nodes being indexable by a segment of the image in an order corresponding to the logical structure of the acquired input data set, and each trajectory extending through the nodes and along the directed arcs representing a co-segmentation and a possible interpretation for the input expression, and the totality of the co-segmentations and the totality of the possible interpretations for the input expression being represented by the set of trajectories that extends through the graph; (d) for each row of nodes in the graph, producing a set of records for the predetermined set of symbols represented by each node in the row, wherein the production of each set of records includes the analysis of the segment that indexes the row of nodes for which the record set is produced; (e) implicitly or explicitly attributing a trajectory record to trajectories through the graph; and (f) analyzing the trajectory records attributed to the trajectories through the graph in step (e) to select one or more possible interpretations for the input expression.
14. The method according to claim 13, characterized in that each node further comprises a set of recognition arcs, and each recognition arc represents a predetermined symbol and is associated with a record produced during step (d).
15. The method according to claim 14, characterized in that step (d) comprises using a plurality of adjustable parameters to produce the set of records.
16. The method according to claim 15, characterized in that information processing means, characterized by the plurality of adjustable parameters, are used during step (d) to analyze each of the segments and to produce the set of records therefor.
17. The method according to claim 14, characterized in that step (f) further comprises: for at least one possible interpretation for the input expression, calculating a quantity corresponding to an a posteriori probability, wherein each quantity is calculated as the ratio of a numerator portion to a denominator portion, wherein the numerator portion corresponds to the sum of trajectory records for substantially all of the trajectories through the graph representing a possible interpretation for the input expression, wherein each of the trajectory records corresponds to the product of records associated with the recognition arcs along a trajectory, and wherein the denominator portion corresponds to the sum of trajectory records for substantially all of the trajectories through the graph that represent substantially all of the possible interpretations for the input expression, wherein each of the trajectory records corresponds to the product of the records associated with the recognition arcs along one of the trajectories.
18. The method according to claim 17, characterized in that it further comprises, during step (f): (i) determining the trajectory through the graph that has the highest trajectory record, (ii) identifying the possible interpretation for the input expression which is represented by the trajectory determined in sub-step (i), (iii) calculating the quantity for the possible interpretation for the input expression identified in sub-step (ii); and (iv) providing as output the quantity calculated in sub-step (iii) and an index or mark representative of the possible interpretation for the input expression identified in sub-step (ii).
19. The method according to claim 17, characterized in that it further comprises, during step (f): (i) determining a set of trajectories through the graph that has a high set of trajectory records, (ii) identifying the set of possible interpretations for the input expression which is represented by the set of trajectories determined in sub-step (i), (iii) calculating a set of quantities for the set of possible interpretations for the input expression identified in sub-step (ii); (iv) analyzing the set of quantities calculated in sub-step (iii) to determine which of the possible interpretations for the input expression has a high-record a posteriori probability; and (v) providing as output the possible interpretation for the input expression identified in sub-step (ii) and a mark or index representative of the high-record a posteriori probability determined in sub-step (iv).
20. The method according to claim 17, characterized in that each a posteriori probability is calculated as the ratio of a numerator portion to a denominator portion, and wherein step (f) further comprises: (i) determining a set of trajectories through the graph that has a high set of trajectory records, (ii) identifying the set of possible interpretations for the input expression which is represented by the set of trajectories determined in sub-step (i), (iii) calculating a set of quantities for the set of possible interpretations for the input expression identified in sub-step (ii); and (iv) providing as output the set of possible interpretations for the input expression identified in sub-step (ii), and the quantities calculated in sub-step (iii).
21. The method according to claim 15, characterized in that during step (d), the set of adjustable parameters specifies the relationship between the segment provided to the information processing means for analysis, and the set of records produced from the information processing means.
22. The method according to claim 21, characterized in that it further comprises training the information processing means by: (i) processing a number of known symbol sequences using the information processing means, and (ii) for each known sequence, incrementally adjusting the set of adjustable parameters so that, on average, the probability assigned to the correct interpretation is increased and the probability assigned to incorrect interpretations is decreased.
23. The method according to claim 22, characterized in that the information processing means comprise a neural information processing network.
24. The method according to claim 13, characterized in that the input expression is expressed using printed or cursive writing techniques, and is recorded graphically on a recording medium.
25. A system for forming an interpretation of an input expression, wherein the input expression is expressed in a medium, the interpretation is a sequence of symbols, and each of the symbols is an element in a predetermined set of symbols, the system is characterized in that it comprises: data set acquiring means for acquiring the input data set representative of the input expression; data processing means for processing the acquired data set to produce a plurality of segments, each of the segments having specifiable limits and being classifiable as possibly representing any of a plurality of symbols in a predetermined set of symbols; co-segmentation specifying means for producing data specifying a set of co-segmentations, each of the co-segmentations consisting of a set of segments that collectively represent the acquired input data set and are arranged or distributed in an order that substantially preserves the sequential structure of the acquired input data set; symbol sequence interpretation specifying means for producing data specifying a set of possible interpretations for the input expression, each of the possible interpretations for the input expression consisting of a possible sequence of symbols, and each of the symbols in the possible sequence of symbols being selected from the predetermined symbol set and occupying a symbol position in the possible sequence of symbols; data storage means for storing, in a data structure, the produced data representative of each co-segmentation and each possible interpretation for the input expression, wherein the data structure is graphically representable by a graph comprising a two-dimensional array of nodes distributed in rows and columns and selectively connected by directed arcs, and in which each of the columns of nodes is indexable by a symbol position and each of the rows of nodes is indexable by one of the segments in an order corresponding to the sequential structure of the acquired input data set, wherein each trajectory extending through the nodes and along the directed arcs represents one of the co-segmentations and one of the possible interpretations for the input expression, and wherein the set of co-segmentations and the set of possible interpretations for the input expression are represented by the set of trajectories that extends through the graph; segment analyzing means for analyzing the data in each of the segments and producing, for each row of nodes in the graph, a set of records for the set of symbols represented by each node in the row; trajectory record computing means for computing a trajectory record for each of the trajectories through the graph; and trajectory record analyzing means for analyzing the computed trajectory records to select one or more of the possible interpretations for the input expression.
26. The system according to claim 25, characterized in that each of the nodes further comprises a set of recognition arcs, and each of the recognition arcs represents one of the known symbols and is associated with one of the computed records.
27. The system according to claim 26, characterized in that the trajectory record analyzing means further comprise means for calculating a quantity corresponding to an a posteriori probability for each of the possible interpretations for the input expression.
28. The system according to claim 27, characterized in that each quantity is calculated as the ratio of a numerator portion to a denominator portion, wherein the numerator portion corresponds to the sum of the trajectory records for substantially all of the trajectories through the graph that represent a possible interpretation for the input expression, wherein each of the trajectory records corresponds to the product of records associated with the recognition arcs along one of the trajectories, and wherein the denominator portion corresponds to the sum of trajectory records for substantially all of the trajectories through the graph that represent substantially all of the possible interpretations for the input expression, wherein each of the trajectory records corresponds to the product of records associated with the recognition arcs along one of the trajectories.
29. The system according to claim 25, characterized in that it further comprises: means for determining the trajectory through the graph having the record of the highest trajectory, means for identifying the possible interpretation for the input expression which is represented by the determined trajectory that has the record of the highest trajectory, means to calculate the quantity for each of the possible interpretations for the input expression; and means to provide as output, marks or indices representing the calculated quantity and possible interpretation for the input expression.
30. The system according to claim 29, characterized in that the trajectory record analyzing means further comprise: means for determining a set of trajectories through the graph that has the highest set of trajectory records, means for identifying the set of possible interpretations for the input expression which is represented by the determined set of trajectories, means for calculating a set of quantities for the identified set of possible interpretations for the input expression, means for analyzing the calculated set of quantities and for determining which possible interpretation for the input expression has the highest a posteriori probability; and means for providing as output, marks or indices representing the possible interpretation for the input expression having the highest a posteriori probability, and the determined highest a posteriori probability. 31. The system according to claim 27, characterized in that the segment analyzing means comprise a set of adjustable parameters which specify the relationship between the segment provided for analysis and the set of records produced from the segment analyzing means.
32. The system according to claim 31, characterized in that it further comprises system training means for training the system using a plurality of training data sets, each of the training data sets including a set of data acquired from an input expression and an interpretation that is known to be correct for that input expression, the system training means including parameter adjusting means for incrementally adjusting the set of adjustable parameters so that a measure of the average probability for the interpretation that is known to be correct is increased and a measure of the average probability for the set of interpretations that are known to be incorrect is decreased.
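Claim 32 recites adjusting the parameters so that the average probability of the interpretation known to be correct rises while that of the incorrect interpretations falls, but it does not fix an update rule. A simple gradient-ascent step on the log of the normalized probability of the correct interpretation is one plausible reading, sketched below with hypothetical names; grad_log_prob_correct is assumed to be supplied by the surrounding system.

def training_step(params, grad_log_prob_correct, learning_rate=0.01):
    # Assumed sketch only. Because every competing interpretation shares the
    # same denominator (the sum over all trajectories), raising the
    # normalized probability of the correct interpretation lowers, on
    # average, the probability assigned to the incorrect ones.
    grads = grad_log_prob_correct(params)   # d log P(correct) / d parameter
    return [p + learning_rate * g for p, g in zip(params, grads)]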
33. The system according to claim 27, characterized in that the input expression is recorded graphically on a recording medium.
34. The system according to claim 33, characterized in that the input expression is expressed using printed or cursive writing techniques.
35. A system for forming an interpretation of an input expression, wherein the input expression is expressed in a medium, the interpretation is a sequence of symbols, and each of the symbols is an element of a predetermined set of symbols, the system being characterized in that it comprises:
image acquiring means for acquiring an image of the input expression;
image processing means for processing the image to form a set of image segments, each of the image segments being a sub-image of the acquired image;
image co-segmentation specifying means for producing data specifying a set of image co-segmentations, each of the image co-segmentations consisting of a set of the image segments that collectively represent the acquired image and that are arranged in an order substantially preserving the spatial structure of the acquired image;
symbol sequence interpretation specifying means for producing data specifying a set of possible interpretations for the input expression, each of the possible interpretations for the input expression consisting of a sequence of symbols, each symbol in the sequence of symbols being selected from the predetermined set of symbols and occupying a symbol position in the sequence of symbols;
data storing means for storing in a data structure the produced data representative of each of the image co-segmentations and each of the possible interpretations for the input expression, wherein the data structure is graphically representable by a directed acyclic graph comprising a two-dimensional arrangement of nodes distributed in rows and columns and selectively connected by directed arcs, wherein each of the columns of nodes is indexable by a symbol position and each of the rows of nodes is indexable by an image segment in an order corresponding to the spatial structure of the acquired image, wherein each path extending through the nodes and along the directed arcs represents one of the image co-segmentations and one of the possible interpretations for the input expression, and wherein the totality of the image co-segmentations and the totality of the possible interpretations for the input expression are represented by the set of trajectories extending through the graph;
image segment analyzing means for analyzing each of the image segments and for producing, for each of the rows of nodes in the graph, a set of records for the predetermined set of symbols represented by each node in the row;
trajectory record computing means for computing a trajectory record for each of the trajectories through the graph; and
trajectory record analyzing means for analyzing the computed trajectory records to select one or more of the possible interpretations for the input expression.
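Claim 35 leaves open how the set of image co-segmentations is specified. One simple, purely hypothetical scheme is to enumerate ordered groupings of consecutive image segments, which automatically preserves the spatial structure of the acquired image; the sketch below (co_segmentations and max_group are assumed names, not claim terms) illustrates the idea.

def co_segmentations(segments, max_group=3):
    # Assumed sketch only: yield every ordered partition of the spatially
    # ordered image segments into groups of at most max_group consecutive
    # segments; each yielded list is one co-segmentation.
    if not segments:
        yield []
        return
    for k in range(1, min(max_group, len(segments)) + 1):
        head = [tuple(segments[:k])]
        for rest in co_segmentations(segments[k:], max_group):
            yield head + rest

For three segments s1, s2, s3 with max_group=2, this produces the co-segmentations [(s1,), (s2,), (s3,)], [(s1,), (s2, s3)] and [(s1, s2), (s3,)].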
36. The system according to claim 35, characterized in that the input expression is recorded graphically on an electrically passive medium.
37. The system according to claim 35, characterized in that the input expression is recorded graphically on an electrically active medium.
38. The system according to claim 35, characterized in that the input expression is written using printed or cursive writing techniques.
39. A system for forming an interpretation of an input expression, wherein the input expression is expressed in a medium, the interpretation is a sequence of symbols, and each of the symbols is an element of a predetermined set of symbols, the system being characterized in that it comprises:
means for providing a set of input data and a verified symbol sequence for each of a plurality of known input expressions;
segment producing means for analyzing each of the input data sets and for dividing the input data set into a plurality of segments, each of the segments having specifiable limits and being classifiable as representing any one of a plurality of symbols in the predetermined symbol set;
segment recording means characterized by one or more adjustable parameters and having means for analyzing each of the segments and for assigning a set of records to each possible classification of the segment in a manner dependent on the one or more adjustable parameters, wherein each record in each of the assigned sets of records is associated with a particular symbol in the predetermined symbol set;
representation means for representing a plurality of possible symbol sequences and a plurality of co-segmentations, wherein each of the possible symbol sequences consists of a different sequence of the symbols and each of the co-segmentations consists of a different sequence of the segments;
co-segmentation recording means for assigning records to the plurality of co-segmentations based on the records assigned to the segments;
symbol sequence recording means for assigning a record to each of the verified symbol sequences based on the records assigned to one or more of the plurality of co-segmentations;
first record evaluating means for evaluating the records assigned to the verified sequence of symbols;
second record evaluating means for evaluating the records assigned to the plurality of possible symbol sequences;
normalized record producing means for producing a normalized record for the verified sequence of symbols, using the evaluated records for the plurality of possible symbol sequences;
sensitivity estimating means for estimating the sensitivity of the produced normalized records with respect to the one or more adjustable parameters; and
parameter adjusting means for adjusting one or more of the adjustable parameters to increase the average probability that each of the segments is correctly classified and to decrease the average probability that each of the segments is incorrectly classified.

In testimony of which I sign this in Mexico City, D.F., on July 31, 1995. Attorney.

SUMMARY OF THE INVENTION

The present invention relates to a method and system for forming an interpretation of an input expression, wherein the input expression is expressed in a medium, the interpretation is a sequence of symbols, and each symbol is a symbol in a set of known symbols. In general, the system processes a set of acquired input data, representative of the input expression, to form a set of segments, which are then used to specify a set of co-segmentations. Each co-segmentation and each possible interpretation for the input expression is represented in a data structure. The data structure is graphically representable by a graph comprising a two-dimensional arrangement of nodes distributed in rows and columns, and selectively connected by directed arcs.
Each trajectory, which extends through the nodes and along the directed arcs, represents one co-segmentation and one possible interpretation for the input expression. The totality of the co-segmentations and the totality of the possible interpretations for the input expression are represented by the set of trajectories that extends through the graph. For each row of nodes in the graph, a set of records is produced for the set of known symbols, using a complex of optimally trained neural information processing networks. After this, the system calculates an a posteriori probability for one or more symbol sequence interpretations. By deriving each of the a posteriori probabilities solely through analysis of the acquired input data set, highly reliable probabilities are produced for the competing interpretations of the input expression. The principles of the present invention can be practiced with cursively written character strings of arbitrary length and can be readily adapted for use in speech recognition systems.
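As a final editorial sketch (hypothetical names; the trajectory records are assumed to have been computed already as products of arc records), the normalization just described, in which an interpretation's summed trajectory records are divided by the sum over all trajectories, can be written directly:

def posterior_probabilities(trajectory_records_by_interpretation):
    # Assumed sketch only. The argument maps each candidate interpretation
    # (for example a tuple of symbols) to the list of records of the
    # trajectories that represent it.
    total = sum(r for records in trajectory_records_by_interpretation.values()
                for r in records)
    return {interp: sum(records) / total
            for interp, records in trajectory_records_by_interpretation.items()}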
MXPA/A/1995/003295A 1994-08-04 1995-07-31 System and method for automated interpretation of input expressions using novel a posteriori probability measures and optimally trained information processing networks MXPA95003295A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US08286147 1994-08-04

Publications (1)

Publication Number Publication Date
MXPA95003295A true MXPA95003295A (en) 2001-12-04

Family

ID=

Similar Documents

Publication Publication Date Title
EP0696013B1 (en) System and method for automated interpretation of input expressions using novel a posteriori probability measures and optimally trained information processing networks
Mahdavi et al. ICDAR 2019 CROHME+ TFD: Competition on recognition of handwritten mathematical expressions and typeset formula detection
US11715014B2 (en) System and method of character recognition using fully convolutional neural networks with attention
Chen et al. Variable duration hidden Markov model and morphological segmentation for handwritten word recognition
US7756335B2 (en) Handwriting recognition using a graph of segmentation candidates and dictionary search
Senior A combination fingerprint classifier
US6052481A (en) Automatic method for scoring and clustering prototypes of handwritten stroke-based data
US6513005B1 (en) Method for correcting error characters in results of speech recognition and speech recognition system using the same
JP2667951B2 (en) Handwriting recognition device and method
JP5217127B2 (en) Collective place name recognition program, collective place name recognition apparatus, and collective place name recognition method
EP0608708A2 (en) Automatic handwriting recognition using both static and dynamic parameters
EP1362322A2 (en) Holistic-analytical recognition of handwritten text
EP3539052A1 (en) System and method of character recognition using fully convolutional neural networks with attention
JP2003524258A (en) Method and apparatus for processing electronic documents
Sundaram et al. Bigram language models and reevaluation strategy for improved recognition of online handwritten Tamil words
Ganai et al. Projection profile based ligature segmentation of Nastaleeq Urdu OCR
Nath et al. Improving various offline techniques used for handwritten character recognition: a review
JPH0954814A (en) Analysis of input signal expression and scoring system of possible interpretation of input signal expression
El-Mahallawy A large scale HMM-based omni front-written OCR system for cursive scripts
Malik A Graph Based Approach for Handwritten Devanagri Word Recogntion
MXPA95003295A (en) System and method for automated interpretation of input expressions using novel a posteriori probability measures and optimally trained information processing networks
Rodríguez-Serrano et al. Handwritten word image retrieval with synthesized typed queries
Frinken et al. Self-training strategies for handwriting word recognition
Mehta et al. Optical music notes recognition for printed piano music score sheet
Arica An off-line character recognition system for free style handwriting