JPH0954814A

JPH0954814A - Analysis of input signal expression and scoring system of possible interpretation of input signal expression

Info

Publication number: JPH0954814A
Application number: JP7218266A
Authority: JP
Inventors: Christopher John Burges (ジョンバーグスクリストファー); John Stewart Denker (スチュワードデンカージョン)
Original assignee: AT&T Corp
Current assignee: AT&T Corp
Priority date: 1995-08-04
Filing date: 1995-08-04
Publication date: 1997-02-25

Abstract

PROBLEM TO BE SOLVED: To analyze an input symbolic expression by dividing an input data set representing the expression into a plurality of segments, evaluating the scores assigned to the identified candidate interpretations, and generating normalized scores. SOLUTION: Image acquisition A acquires an image I of a character string, such as handwritten characters on a recording medium 6, and stores it in a frame buffer. Image processing B performs preprocessing such as image normalization, and image segment formation D divides the image into a plurality of classifiable segments whose boundaries can be specified on the input symbolic expression. Image segment analysis G analyzes each segment and assigns a score to each class that associates the segment with a particular symbol of a predetermined input symbol set. Character string interpretation I identifies candidate symbol interpretations based on the assigned scores, evaluates the scores assigned to these candidate interpretations, and uses the scores evaluated over the plurality of possible analyses to generate a normalized score for each candidate interpretation.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a method and system for automatically interpreting input symbolic expressions, such as handwritten characters, using a novel a posteriori probability measure and optimally trained neural information processing networks.

[0002]

[Description of the Related Art] There is currently strong market demand for a device that can accurately interpret (i.e., recognize) strings of suitably connected alphabetic characters recorded on various media. For example, the United States Postal Service urgently wants such a device in order to accurately recognize ZIP codes (i.e., postal codes) handwritten on mail pieces during mail sorting and nationwide routing operations.

[0003] Many character recognition systems have been developed for use in a variety of environments. Such systems and related techniques are disclosed in the following technical documents:

(1) Y. Le Cun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, and L.D. Jackel, "Handwritten Digit Recognition with a Back-Propagation Network", pp. 396-404 in Advances in Neural Information Processing 2, David Touretzky, ed., Morgan Kaufmann (1990).
(2) J.S. Bridle, "Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition", in Neuro-Computing: Algorithms, Architectures and Applications, F. Fogelman and J. Herault, eds., Springer-Verlag (1989).
(3) J.S. Bridle, "Training Stochastic Model Recognition Algorithms as Networks Can Lead To Maximum Mutual Information Estimation of Parameters", in Advances in Neural Information Processing 2, David Touretzky, ed., Morgan Kaufmann (1990).
(4) O. Matan, J. Bromley, C.J.C. Burges, J.S. Denker, L.D. Jackel, Y. Le Cun, E.P.D. Pednault, W.D. Satterfield, C.E. Stenard, and T.J. Thompson, "Reading Handwritten Digit: A ZIP code Recognition System", IEEE Computer 25(7), 59-63 (July 1992).
(5) C.J.C. Burges, O. Matan, Y. Le Cun, J.S. Denker, L.D. Jackel, C.E. Stenard, C.R. Nohl, and J.I. Ben, "Shortest Path Segmentation: A Method for Training a Neural Network to Recognize Character Strings", IJCNN Conference Proceedings 3, pp. 165-172 (June 1992).
(6) C.J.C. Burges, O. Matan, J. Bromley, and C.E. Stenard, "Rapid Segmentation and Classification of Handwritten Postal Delivery Addresses using Neural Network Technology", Interim Report, Task Order Number 104230-90-C-2456, USPS Reference Library, Washington D.C. (August 1991).
(7) Edwin P.D. Pednault, "A Hidden Markov Model For Resolving Segmentation and Interpretation Ambiguities in Unconstrained Handwriting Recognition", Bell Labs Technical Memorandum 11352-090929-01TM (1992).
(8) Ofer Matan, C.J.C. Burges, Y. Le Cun, and J.S. Denker, "Multi-Digit Recognition Using a Space Displacement Neural Network", in Neural Information Processing System 4, J.M. Moody, S.J. Hanson and R.P. Lippman, eds., Morgan Kaufmann (1990).

[0004] The conventional systems described in the above documents can be distinguished from one another, but they are best characterized by the structural and functional features they share.

[0005] In particular, each prior-art system acquires at least one image I of a string of possibly connected characters that the system is to interpret. In general, for a given alphabet, the number of possible interpretations, among which the system must select the "best" one, equals the number of character strings that can be formed by stringing together characters of the alphabet subject to the applicable morphological constraints. In ZIP code (postal code) recognition applications, each admissible interpretation is constrained by the length of the ZIP code: it must have either five or nine digits.

[0006] According to the prior art, the acquired image of the character string is generally preprocessed to remove underlines, spatial noise, and the like. The preprocessed image I is then "cut", or divided, into sub-images of manageable size. The sub-image between each pair of adjacent cut lines is called an image "cell". In some cases the boundary between two cells is determined to be a "definite cut" that clearly falls between two characters. In other cases the cut is regarded as ambiguous, and the decision whether it falls between two characters is deferred until further processing has been performed.

[0007] Adjacent image cells are then combined to produce image "segments". The image segments are in turn strung together, from left to right, to produce admissible image "consegmentations", each of which accounts for almost all pixels of the preprocessed image. In particular, a directed acyclic graph is used to model the admissible image consegmentations. In general, this model is built by associating each image segment with a node of the directed acyclic graph.

[0008] The nodes of the graph are then connected by directed arcs. In general, two nodes are connected only if the image segments they represent are legally adjacent in an admissible image consegmentation.

[0009] Once the graph is fully constructed, every path through the graph corresponds to an image consegmentation of the preprocessed image, and every possible image consegmentation corresponds to a particular path through the graph. After the graph has been constructed, a recursive pruning technique is used to remove from the graph the nodes corresponding to image segments that straddle a definite cut line through the preprocessed image.
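The construction in the preceding paragraphs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of the cited systems: cells are numbered left to right, a segment is a run of at most `max_cells` adjacent cells, a consegmentation must cover every cell (the text says "almost all pixels"), and pruning is done by simple filtering rather than recursively. The names `build_segments` and `consegmentations` are hypothetical.

```python
def build_segments(n_cells, max_cells, definite_cuts=()):
    """List candidate segments as (start, end) cell ranges, end exclusive.

    A segment that straddles a definite cut (a cut position strictly inside
    the range) is left out, which plays the role of the pruning step.
    """
    segments = []
    for start in range(n_cells):
        last_end = min(start + max_cells, n_cells)
        for end in range(start + 1, last_end + 1):
            if any(start < cut < end for cut in definite_cuts):
                continue
            segments.append((start, end))
    return segments


def consegmentations(n_cells, segments):
    """Enumerate every path: a left-to-right chain of abutting segments."""
    by_start = {}
    for seg in segments:
        by_start.setdefault(seg[0], []).append(seg)

    def extend(pos):
        # Reaching the right edge closes one complete consegmentation.
        if pos == n_cells:
            yield []
            return
        # An arc exists only to segments that start where the last one ended.
        for seg in by_start.get(pos, []):
            for rest in extend(seg[1]):
                yield [seg] + rest

    return list(extend(0))
```

For three cells and segments of at most two cells, `consegmentations(3, build_segments(3, 2))` yields three paths; declaring position 1 a definite cut removes the segment `(0, 2)` and leaves two.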

[0010] After the graph has been pruned, each image segment associated with a node remaining in the pruned graph is sent to a neural network recognizer for classification and scoring. Based on this classification and scoring, each node of the pruned graph is assigned a "score" derived from the recognizer scores assigned to its associated image segment.

[0011] In general, each recognizer score is converted into a probability by a computational procedure that normalizes the recognizer scores. A path score (i.e., a joint probability) is then computed for each path through the pruned graph, for example simply by multiplying together the "scores" assigned to the nodes along the path. In this multi-character recognition (MCR) scheme, the highest-scoring path through the pruned graph corresponds to the "best" image consegmentation and character-string interpretation of the acquired image.
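The conventional scoring just described can be illustrated as follows. The sketch assumes non-negative recognizer scores that are normalized by dividing by their sum (the text does not say which normalization the prior-art systems use), and it represents each path simply as an interpretation string plus the list of node probabilities along it. The function names are hypothetical.

```python
from math import prod


def normalize(scores):
    """Turn non-negative recognizer scores for one segment into probabilities."""
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


def path_score(node_probs):
    """Joint probability of a path: the product of its node probabilities."""
    return prod(node_probs)


def best_path(paths):
    """Pick the highest-scoring path from (interpretation, node_probs) pairs."""
    return max(paths, key=lambda p: path_score(p[1]))
```

Note that `best_path` looks at one path at a time, which is exactly the single-consegmentation behavior of the conventional scheme.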

[0012] A detailed description of these techniques is given in U.S. Patent Application Nos. 07/816414 and 07/816415, filed December 31, 1991.

[0013] Conventional methods are useful for designing commercial and experimental character recognition systems, but the performance of such systems is less than ideal, particularly in demanding real-time applications. In particular, conventional MCR systems generally operate by identifying only a single consegmentation that supports a given interpretation. This approach presupposes that one unique "best" consegmentation exists.

[0014] In such conventional methods, the score of this single "best" consegmentation is the only score considered during the recognition process. Conventional MCR systems therefore use a method equivalent to assuming, incorrectly, that the correct image consegmentation is known. Relying on this assumption, the individual character scores are normalized in order to compute probabilities for the particular characters of the admissible code or alphabet.

[0015] This irrecoverably discards useful information about how the segmentation algorithm performed on particular segments of the image. Conventional MCR systems based on this assumption are often called "maximum likelihood sequence estimation" (MLSE) machines.

[0016] In addition to selecting an interpretation of the image, some conventional MCR systems also provide a score that is meant to give some indication of the probability that the selected interpretation is correct. In many applications it is desirable to have a score that can be interpreted as an accurate probability, because this makes it easier to combine the result of the MCR system with other sources of information. However, conventional MCR systems have tended to emphasize selecting the "best" interpretation rather than scoring it accurately. As a result, their scores often contain large systematic errors, sometimes of several orders of magnitude.

[0017] Accordingly, there is a strong need for an improved method and system for interpreting symbol sequences presented on various media.

[0018]

[Problems to Be Solved by the Invention] Accordingly, a general object of the present invention is to provide an improved method and system for interpreting input symbolic expressions, such as character strings displayed or recorded on a medium by, for example, printing or cursive writing techniques.

[0019] Another object of the present invention is to provide a method and system for automated character-string interpretation that uses a posteriori probabilities to select the best character-string interpretation.

[0020] Another object of the present invention is to provide a method and system for automated character-string interpretation in which each a posteriori probability is derived inductively by combining a priori information with the pixel images of known examples.

[0021] Another object of the present invention is to provide a method and system for automated character-string interpretation that can interpret character strings of arbitrary length and can readily be used together with automatic sentence interpretation systems and the like.

[0022] Another object of the present invention is to provide a method of multi-character recognition in which image consegmentation and character-string interpretation are combined into a single adaptive learning process, carried out by a complex of neural computing networks optimally trained to maximize the score of the correct character-string interpretation.

[0023] Another object of the present invention is to provide a system that uses a novel data structure based on a specially modified directed acyclic graph, in which each path through the graph represents both an image consegmentation and a character-string interpretation.

[0024] Another object of the present invention is to assign a score to a selected possible interpretation of the image, in particular a score that can be interpreted as an accurate estimate of the probability of the selected interpretation.

[0025] Another object of the present invention is to provide a system in which the a posteriori probability assigned to each particular character-string interpretation is defined as a ratio. The numerator of the ratio is computed by summing the path scores of all paths through the graph that represent that same character-string interpretation. The denominator is computed by summing the path scores of all paths through the graph that represent any possible character-string interpretation with the same number of characters.
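The ratio can be made concrete with a small sketch. It assumes the paths through the graph have already been enumerated as `(interpretation, path_score)` pairs, all for interpretations of the same length; a practical system would compute the sums without enumerating paths. The function name is hypothetical.

```python
from collections import defaultdict


def posterior_by_interpretation(paths):
    """Map each interpretation to (sum of its path scores) / (sum of all path scores)."""
    totals = defaultdict(float)
    for interpretation, score in paths:
        totals[interpretation] += score      # numerator, one per interpretation
    normalizer = sum(totals.values())        # denominator, shared by all
    return {interp: s / normalizer for interp, s in totals.items()}
```

With the illustrative paths `[("85", 0.375), ("35", 0.25), ("35", 0.25)]`, the single best path reads "85", yet "35" receives the higher probability (0.5 / 0.875, about 0.571) because two consegmentations support it.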

[0026] Another object of the present invention is to provide a multi-character handwriting recognition system that can be realized as a portable device.

[0027] Another object of the present invention is to provide a method of character-string interpretation that comprises: using the Viterbi algorithm to identify which character-string interpretation has the best path score; then using the forward algorithm to compute the exact sum of all path scores that represent the interpretation identified by the Viterbi algorithm; and using the forward algorithm to compute the normalization constant for that exactly computed sum by summing the path scores of all paths through the graph that represent all possible character-string interpretations.
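A minimal sketch of these three computations follows. It rests on several assumptions that are not taken from the text: segments are `(start, end)` cell ranges, `scores[(start, end)]` maps each character to a non-negative score for that segment, a path uses exactly `length` abutting segments covering cells `0..n_cells`, and a path score is the product of the chosen character scores. The names `viterbi` and `forward_sum` are hypothetical.

```python
def viterbi(segments, scores, n_cells, length):
    """Return (best path score, its interpretation) over all complete paths."""
    # best[k][pos]: best partial path using k segments and ending at cell pos
    best = [dict() for _ in range(length + 1)]
    best[0][0] = (1.0, "")
    for k in range(length):
        for start, end in segments:
            if start not in best[k]:
                continue
            prev_score, prev_word = best[k][start]
            char, s = max(scores[(start, end)].items(), key=lambda kv: kv[1])
            cand = (prev_score * s, prev_word + char)
            if end not in best[k + 1] or cand[0] > best[k + 1][end][0]:
                best[k + 1][end] = cand
    return best[length].get(n_cells)


def forward_sum(segments, scores, n_cells, length, word=None):
    """Sum of path scores over all complete paths.

    With `word` given, only paths spelling that interpretation are summed
    (the numerator); with word=None every interpretation of this length
    is included (the normalization constant).
    """
    # alpha[k][pos]: summed score of partial paths using k segments, ending at pos
    alpha = [dict() for _ in range(length + 1)]
    alpha[0][0] = 1.0
    for k in range(length):
        for start, end in segments:
            if start not in alpha[k]:
                continue
            seg = scores[(start, end)]
            emit = seg[word[k]] if word is not None else sum(seg.values())
            alpha[k + 1][end] = alpha[k + 1].get(end, 0.0) + alpha[k][start] * emit
    return alpha[length].get(n_cells, 0.0)
```

The probability of the Viterbi interpretation `w` is then `forward_sum(..., word=w) / forward_sum(...)`. Both sums cost time proportional to the number of arcs times the string length, rather than to the number of paths.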

[0028] Another object of the present invention is to provide a method of character-string interpretation that comprises: using a beam search algorithm to identify a number of competing character-string interpretations having the best set of path scores; using the forward algorithm to compute, for each competing interpretation thus identified, the exact sum of all path scores that represent it; and then using the forward algorithm to compute a single normalization constant for the competing interpretations by summing the path scores of all paths through the graph that represent all possible character-string interpretations.
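The beam search step can be sketched as below, under assumptions that are not from the text: segments are `(start, end)` cell ranges, `scores[(start, end)]` maps characters to non-negative scores, a complete path uses exactly `length` abutting segments ending at `n_cells`, and at most `beam_width` partial paths are kept after each step. The function name and the fixed-width pruning rule are hypothetical; each interpretation returned would afterwards be scored by summing over all of its paths and dividing by the shared normalization constant.

```python
import heapq


def beam_search(segments, scores, n_cells, length, beam_width):
    """Return up to beam_width complete paths as (score, interpretation), best first."""
    beam = [(1.0, "", 0)]  # (score so far, characters so far, cell position reached)
    for _ in range(length):
        candidates = []
        for score, word, pos in beam:
            for start, end in segments:
                if start != pos:
                    continue  # only segments abutting the current position
                for char, s in scores[(start, end)].items():
                    candidates.append((score * s, word + char, end))
        # Keep only the highest-scoring partial paths.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return [(score, word) for score, word, pos in beam if pos == n_cells]
```

Several returned paths may spell the same interpretation; collecting the distinct strings, for example with `{word for _, word in result}`, gives the set of competing interpretations.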

[0029] Another object of the present invention is to provide an input symbolic expression interpretation system having a learning mode of operation in which both the graph and the complex of neural information processing networks are used to train the system by optimally adjusting the parameters of the neural networks during one or more training sessions.

[0030] Another object of the present invention is to provide a system in which sensitivity analysis is used during neural network training to adjust each adjustable parameter of the neural networks in the direction that increases the a posteriori probability of the character-string interpretation known to be correct and decreases the a posteriori probabilities of interpretations known to be incorrect.

[0031] Another object of the present invention is to provide an input symbolic expression interpretation system that, during its learning mode of operation, uses the Baum-Welch algorithm to compute the sensitivity of these scores, that is, how they change overall in response to an incremental change made to each adjustable parameter of the neural networks.

[0032]

[Means for Solving the Problems] To solve the above problems, the present invention provides a method and system for generating interpretations of input symbolic expressions rendered on a medium using printing or cursive writing techniques.

[0033] In general, the system of the present invention acquires an input data set representing the input symbolic expression. The acquired data set is divided into a set of segments, which is then used to specify a set of consegmentations. The system then uses a novel data structure to represent implicitly every consegmentation and every possible interpretation of the input symbolic expression.

[0034] The data structure can be represented as a directed acyclic graph consisting of a two-dimensional array of nodes, arranged in rows and columns and selectively connected by directed arcs. Each path that runs through the nodes along the directed arcs represents one consegmentation and one possible interpretation of the input symbolic expression. All consegmentations and all possible interpretations of the input symbolic expression are represented implicitly by the set of paths through the graph.

[0035] For each row of nodes in the graph, a set of scores is generated for a known symbol set, for example by an optimally trained neural information processing network. Together with the graph, these scores implicitly assign a path score to every path through the graph. Using these path scores, the system identifies the best symbol-sequence interpretations and computes their a posteriori probabilities.

[0036] Because each a posteriori probability is derived by analyzing almost the entire acquired input data set, a highly reliable probability is generated for each symbol-sequence interpretation. The principles of the present invention can be applied to almost any sequence of symbolic expressions, such as cursively written character strings of arbitrary length. The system can also be readily adapted for use with automatic sentence interpretation systems.

[0037] The system of the present invention determines the character-string interpretation that has the highest-scoring path through the graph. To determine whether this interpretation is reliable, the system also outputs the a posteriori probability of the interpretation. This probability is computed as the ratio of a numerator to a denominator. The numerator equals the sum of the path scores of all paths through the graph that represent the given character-string interpretation.

[0038] The denominator equals the sum of the path scores of all paths through the graph that represent any possible character-string interpretation. If the probability falls below a predetermined threshold, the user is notified that the interpretation cannot be guaranteed reliable and that other steps must therefore be taken before proceeding further.
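The threshold test amounts to a one-line check. In the sketch below the threshold value 0.9 and the function name are assumptions chosen for illustration; the text only says that the threshold is predetermined.

```python
def screen_interpretation(interpretation, posterior, threshold=0.9):
    """Return (interpretation, reliable); reliable is False below the threshold."""
    reliable = posterior >= threshold
    return interpretation, reliable
```

A caller would route results flagged `False` to whatever fallback the application provides, such as manual inspection.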

[0039] In another embodiment of the present invention, the system first finds the set of paths through the graph that have the highest path scores. For each path in this set, the system identifies the corresponding character-string interpretation and evaluates the a posteriori probability of that interpretation (including the contributions of other paths that have the same interpretation). The system thus identifies the set of possible character-string interpretations represented by the discovered paths.

[0040] The a posteriori probabilities of this set of possible character-string interpretations are then computed. The system analyzes the computed probabilities to determine which possible interpretation has the highest a posteriori probability. Based on this analysis, the system outputs (i) one or more character-string interpretations having high a posteriori probability and (ii) an accurate estimate of the a posteriori probability of each.

[0041] The a posteriori probability of each competing character-string interpretation is computed as the ratio of a numerator to a denominator. The numerator equals the sum of the path scores of all paths through the graph that represent the competing interpretation. The denominator equals the sum of the path scores of all paths through the graph that represent any possible character-string interpretation.

[0042] A novel method and system for optimally training the symbol-sequence interpretation system of the present invention are also provided. This is achieved by giving the system a unique learning mode of operation.

[0043] In this learning mode of operation, the system processes a large number of training images representing known input symbolic expressions. For each training image processed, the system incrementally adjusts the set of adjustable parameters that characterize the function of each neural network. The direction of each incremental adjustment is such that the average probability of the character-string interpretation known to be correct increases, while the average probability of symbol-sequence interpretations known to be incorrect decreases.
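The direction of adjustment can be illustrated with a toy sketch. It is a stand-in under stated assumptions rather than the training procedure of the invention: each path is given as an interpretation plus the indices of the parameters it depends on, a path score is the exponential of the sum of those parameters, and the sensitivity of the log probability to each parameter is estimated by a finite difference instead of by back-propagation or the Baum-Welch algorithm. All names and the learning rate are hypothetical.

```python
import math


def posterior(params, paths, target):
    """Probability of `target`: its summed path scores over the sum for all paths."""
    num = den = 0.0
    for word, idxs in paths:
        score = math.exp(sum(params[i] for i in idxs))
        den += score
        if word == target:
            num += score
    return num / den


def training_step(params, paths, correct, lr=0.1, eps=1e-6):
    """Nudge every parameter in the direction that raises log P(correct)."""
    base = math.log(posterior(params, paths, correct))
    grads = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += eps  # incremental change to one parameter
        grads.append((math.log(posterior(bumped, paths, correct)) - base) / eps)
    return [p + lr * g for p, g in zip(params, grads)]
```

Because the probabilities of all interpretations sum to one, raising the probability of the correct interpretation necessarily lowers the combined probability of the incorrect ones.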

[0044] The system and method of the present invention can be used to interpret character strings presented in virtually any manner, including graphical recording on electrically passive media, such as paper, plastic, or fabric, and on electrically active media, such as pressure-sensitive writing surfaces and "touch screen" writing and display surfaces, all of which are well known to those skilled in the art.

[0045]

[Embodiments of the Invention] FIG. 1 is a block diagram of a symbol sequence interpretation (i.e., "recognition") system 1 made up of a number of integrated system components. In particular, the system has one or more processing units 2 (e.g., microprocessors) controlled by programs stored in a program memory storage device 3. The program memory storage device 3 also stores an operating system program, application programs, and the various image processing routines of the present invention. The system also has a data storage memory 4 for storing the data associated with the data structure of the present invention.

[0046] In general, the system has an input data set acquisition device 5 for acquiring an input data set representing an expressed sequence of symbols. This device is implemented as an image detector for acquiring a gray-scale or color image of a string of possibly connected alphabetic characters recorded on a recording medium 6, as shown in FIG. 2.

[0047] The character string can be recorded on an electrically passive recording surface, such as paper, plastic, wood, or fabric, or on an electrically active recording surface, such as a pressure-sensitive digitizing surface or a "touch screen" LCD writing and display surface well known to those skilled in the art. The character string can be rendered using conventional printing or cursive (i.e., handwriting) techniques.

[0048] The system of the present invention has a random access data storage memory (e.g., VRAM) 7 for buffering acquired images of the character strings to be interpreted. A mass data storage memory 8 is provided for long-term storage of these images.

[0049] The system of the present invention also includes a visual display unit 9 having a visual display screen or surface (LCD); a keyboard or other data entry device 10; a device 11 for pointing to, dragging, and selecting graphical icons displayed on the display screen; an input/output device 12; and a system interface 13 for interfacing with one or more external host systems 14 that use the information provided by system 1.

【００５０】システム構成部品２，３，４，７及び８は
手元の特定用途に適したコンパクトなハウジング内に収
納されている。その他の構成部品はそれぞれのハウジン
グ内に収納されている。これら各構成部品は、１本以上
のシステムバス１５を介してプロセッサ２に付随して動
作することができる。ＺＩＰコード（いわゆる、郵便番
号）認識用途では、本発明のシステムは、常用の郵便蓄
積及び経路指定装置１４により適当にインターフェース
される。The system components 2, 3, 4, 7 and 8 are housed in a compact housing suitable for the particular application at hand. The other components are housed in their respective housings. Each of these components can operate in association with the processor 2 via one or more system buses 15. For ZIP code (so-called postal code) recognition applications, the system of the present invention is suitably interfaced with a conventional mail storage and routing device 14.

【００５１】図２に示されるように、文字列解釈システ
ム１は、図形的に記録された文字の列の取得画像Ｉに包
含される画素情報の分析に基づき、図形的に記録された
“文字の列”（Ｃｉにより示される）の解釈に到達する
ために、多数の機能を果たす。これらの画像処理ステー
ジは、他の関連図面を参照しながら、下記で順に説明す
る。As shown in FIG. 2, the character string interpretation system 1 performs a number of functions in order to arrive at an interpretation of a graphically recorded "string of characters" (denoted by Ci), based on analysis of the pixel information contained in the acquired image I of the graphically recorded string. These image processing stages are described in order below with reference to the other related figures.

【００５２】一般的に、本発明のシステム及び方法は任
意の長さの機械印刷又は手書文字列に適用できる。従っ
て、本発明は手書認識用途で有用である。この場合、手
書きする筆者は、様々な種類の書込画面に１つ以上の文
字を書き込むことができ、あるいは、自動認識用に一つ
以上の文章を書き込むことができる。In general, the system and method of the present invention are applicable to machine-printed or handwritten character strings of arbitrary length. The present invention is therefore useful in handwriting recognition applications, in which a writer can write one or more characters, or one or more sentences, on various types of writing surfaces for automatic recognition.

【００５３】図１４及び図１７に示された実施例は手書
きＺＩＰコードの解釈（すなわち、分類）の問題を検討
するものであり、この場合、文字列長さは５又は９桁で
ある。しかし、本発明の方法及びシステムは、当業者に
公知の自動文章認識システムの長い文脈のような、任意
の長さの文字列（すなわち、単語群）を解釈するために
も使用できる。The embodiments shown in FIGS. 14 and 17 address the problem of interpreting (i.e., classifying) handwritten ZIP codes, in which case the string length is 5 or 9 digits. However, the method and system of the present invention can also be used to interpret character strings (i.e., groups of words) of arbitrary length, for example in the broader context of automatic sentence recognition systems known to those skilled in the art.

【００５４】図２において、ブロックＡ〜Ｉは、本発明
の文字列解釈処理中に行われる様々なステージを模式的
に示す。図２におけるブロックＡに示されるように、処
理の第１ステージは、文字列の画像Ｉを取得することで
ある。一般的に、システム１により取得される各画像Ｉ
は画素のマトリックスからなる。In FIG. 2, blocks A to I schematically show various stages performed during the character string interpretation processing of the present invention. As shown in block A in FIG. 2, the first stage of the process is to obtain an image I of the string. Generally, each image I acquired by the system 1 consists of a matrix of pixels.

【００５５】画像マトリックス中の各画素は画像内の画
素位置における画像の強度を示す濃度階調輝度を有す
る。更に、画素の飽和を示すこともできる。各取得画像
はフレームバッファ７に記憶される。ブロックＢにより
示されるように、処理の第２ステージは、記憶画像Ｉの
“前処理”である。プロセッサ２により行われる適当な
画像前処理操作は、“所望領域”の位置決め，下線の除
去，画像のデスランティング(deslanting)及びデスキュ
ーイング(deskewing)，小さい点（すなわち、微小な接
続成分）と侵入ストロークの除去，及び標準サイズへの
画像の正規化（例えば、画像のアスペクト比が変更され
ないように選択された幅と共に、２０画素高さに正規化
する）などである。Each pixel in the image matrix has a gray-scale brightness value that indicates the intensity of the image at that pixel location. The saturation of the pixel can also be represented. Each acquired image is stored in the frame buffer 7. As indicated by block B, the second stage of the process is "preprocessing" of the stored image I. Suitable image preprocessing operations performed by processor 2 include locating the "region of interest", removing underlines, deslanting and deskewing the image, removing small specks (i.e., tiny connected components) and intruding strokes, and normalizing the image to a standard size (e.g., normalizing to a height of 20 pixels, with the width chosen so that the aspect ratio of the image is unchanged).
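By way of illustration only, the size normalization mentioned above (a fixed height of 20 pixels with the aspect ratio preserved) can be sketched as follows. The function name `normalized_size` and the rounding rule are assumptions made for this sketch; the specification does not state how the width is rounded or how resampling is performed.

```python
def normalized_size(width, height, target_height=20):
    """Return (new_width, new_height) for a height-normalized image.

    The width is scaled by the same factor as the height, so the aspect
    ratio is preserved up to rounding to a whole number of pixels.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Assumed rounding rule: nearest integer, but never fewer than 1 pixel.
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


# A 300 x 60 pixel image is scaled by 1/3 in both directions.
print(normalized_size(300, 60))  # (100, 20)
```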

【００５６】画像正規化は、前処理画像Ｉ’が、更なる
画像正規化を必要とすることなく、システムのその後の
ステージに送ることができるようにするために行われ
る。正規化処理中に行われる再サンプリングは、原画像
が白黒であったとしても、効果的な濃度階調画像を生じ
る。その後、正規化画像の上部及び下部輪郭を使用し、
水平及び垂直画像両方向における文字のロングテールを
クリップする。前記の画像前処理操作に関する更に詳細
な説明は、１９９１年１２月３１日に出願された米国特
許出願第０７／８１６４１４号明細書に開示されてい
る。Image normalization is performed so that the preprocessed image I' can be passed to subsequent stages of the system without requiring further image normalization. The resampling performed during normalization effectively yields a gray-scale image even if the original image was black and white. The upper and lower contours of the normalized image are then used to clip long character tails in both the horizontal and vertical image directions. A more detailed description of the above image preprocessing operations is disclosed in U.S. patent application Ser. No. 07/816,414, filed December 31, 1991.

【００５７】ブロックＣで示される認識処理の次のステ
ージは、前処理画像Ｉ’を“セル”と呼ばれる副画像へ
切り出すことである。画像セルを生成する目的は、この
画像セルを結合し、ブロックＤで示される画像セグメン
ト生成ステージ中に画像“セグメント”Ｓ_iを生成でき
るようにするためである。The next stage of the recognition process, indicated by block C, is to cut the preprocessed image I' into sub-images called "cells". The purpose of generating image cells is so that they can be combined to produce image "segments" S_i during the image segment generation stage indicated by block D.

【００５８】本発明によれば、画像セルは先ず、大きな
“結合成分”の存在を検出するために、前処理画像に対
して“結合成分分析”を行うことにより生成される。そ
の後、これらの大きな結合成分含有副画像に対して“ス
マート”波動カットライン作図処理を行う。結合成分分
析及びスマート波動カットライン作図副処理の両方と
も、付属ＲＡＭ４を用いてプログラム化プロセッサ２に
より行われる。According to the present invention, image cells are generated by first performing "connected component analysis" on the preprocessed image in order to detect the presence of large "connected components". A "smart" wavy cut-line drawing process is then applied to the sub-images containing these large connected components. Both the connected component analysis and the smart wavy cut-line drawing sub-processes are performed by the programmed processor 2 using the associated RAM 4.

【００５９】更に詳細には、結合成分分析は、一緒に結
合された文字成分（すなわち、画素群）の存在を決定す
るために、前処理画像からなる画素の強度を分析する。
結合成分分析は、垂直及び水平画像方向に沿って、所定
の範囲内の強度値を有し、一緒に結合される大きな文字
成分を生成するために現れ、１個以上の文字に付随する
と思われる、画素クラスタをサーチする。結合文字成分
は例えば、図４〜図８に示された２番目及び３番目のＺ
ＩＰコード画像などである。More specifically, connected component analysis examines the intensities of the pixels making up the preprocessed image to determine the presence of character components (i.e., groups of pixels) that are connected together. Along the vertical and horizontal image directions, it searches for clusters of pixels that have intensity values within a predetermined range, that appear to form a large connected character component, and that are likely to be associated with one or more characters. Examples of connected character components are the second and third ZIP code images shown in FIGS. 4 to 8.

【００６０】大きな結合成分を包含する副画像内に２個
以上の文字が包含されることがある。１個以下の文字が
画像セルの画素により示されるようにするため、カット
ラインをこのような副画像を通して引くことが重要であ
る。これは、被識別大結合成分中に“波動”カットライ
ンを生成することにより行われる。A sub-image containing a large connected component may contain two or more characters. It is important to draw cut lines through such sub-images so that the pixels of each image cell represent no more than one character. This is done by generating "wavy" cut lines within each identified large connected component.

【００６１】一般的に、このカットライン生成処理によ
り、単に文字を示す画素群中にカットラインを引くだけ
で、大結合成分により示される文字を２個以上の画像セ
ルに細分ことができる。隣接画像セルを結合し画像セグ
メントを構成する方法の数は、この認識処理ステージ中
に生成される画像セルの数と共に急速に増大する。In general, this cut-line generation process can subdivide the characters represented by a large connected component into two or more image cells, simply by drawing cut lines through the pixel group representing the characters. The number of ways in which adjacent image cells can be combined to form image segments grows rapidly with the number of image cells generated during this stage of the recognition process.
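The growth noted above can be made concrete with a short counting sketch. It rests on simplifying assumptions not stated in the specification: every run of adjacent cells is treated as a candidate segment (optionally limited to a maximum number of cells), and a consegmentation is any division of the ordered cells into a fixed number of non-empty contiguous runs. The function names are hypothetical.

```python
from math import comb


def count_segments(n_cells, max_cells_per_segment=None):
    """Number of runs of adjacent cells that could serve as segments."""
    if max_cells_per_segment is None:
        limit = n_cells
    else:
        limit = min(max_cells_per_segment, n_cells)
    # There are (n_cells - length + 1) runs of each given length.
    return sum(n_cells - length + 1 for length in range(1, limit + 1))


def count_consegmentations(n_cells, n_segments):
    """Ways to split n_cells ordered cells into n_segments contiguous runs."""
    if n_segments < 1 or n_segments > n_cells:
        return 0
    # Choose the (n_segments - 1) cut positions among (n_cells - 1) gaps.
    return comb(n_cells - 1, n_segments - 1)


for n in (6, 10, 15):
    print(n, count_segments(n), count_consegmentations(n, 5))
```

Under these assumptions, going from 6 to 15 cells raises the number of five-segment divisions from 5 to 1001, which is one reason to keep the number of cells small.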

【００６２】本発明のシステムは、一連の良好なカット
ラインを識別し、冗長なラインなどを除去する複雑なヒ
ューリスティック（発見的方法）を使用することによ
り、前処理画像を微小画像セルにカットすることを避け
る。この副処理の操作は、図４〜図８に示された前処理
画像に対して引かれ、かつ、選択的に除去されたカット
ラインにより例証される。The system of the present invention avoids cutting the preprocessed image into tiny image cells by using complex heuristics that identify a set of good cut lines and remove redundant lines. The operation of this sub-process is illustrated by the cut lines drawn, and selectively removed, on the preprocessed images shown in FIGS. 4 to 8.

【００６３】この副処理の最後に、残りのカットライン
の各隣接対間の画素は画像“セル”を画成する。画像セ
ル生成処理中に生成された画像セルは図９の表に示され
る。この表に示されるように、各画像セルはセル番号
（例えば、０，１，２，３，４等）により識別される。
認識処理のこのステージ中のカットライン自動生成に関
する更に詳細な説明は、米国特許出願第０７／８１６４
１４号明細書に開示されている。At the end of this sub-process, the pixels between each adjacent pair of remaining cut lines define an image "cell". The image cells generated during the image cell generation process are shown in the table of FIG. 9. As shown in this table, each image cell is identified by a cell number (e.g., 0, 1, 2, 3, 4, etc.). A more detailed description of automatic cut-line generation during this stage of the recognition process is disclosed in U.S. patent application Ser. No. 07/816,414.

【００６４】図２のブロックＤに示されるように、処理
の次のステージは、隣接する（すなわち、連続的な）画
像セルを左から右の順序で結合し、図１０の表に示され
るような一連の画像“セグメント”を生成する。この表
に示されるように、各画像セグメントはその構成画像セ
ルに割当てられた番号（例えば、０，０１，１，２，２
３等）を結合することにより識別される。理想的には、
各画像セグメントは一つだけの文字を示す画素を包含す
る。しかし、いつもこのように上手くいくわけではな
い。As shown in block D of FIG. 2, the next stage of the process is to combine adjacent (i.e., consecutive) image cells in left-to-right order to produce a set of image "segments" as shown in the table of FIG. 10. As shown in this table, each image segment is identified by concatenating the numbers assigned to its constituent image cells (e.g., 0, 01, 1, 2, 23, etc.). Ideally, each image segment contains pixels representing exactly one character. However, this is not always the case.
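The segment naming convention described above (a segment is identified by concatenating the numbers of its constituent cells) can be sketched as follows. This is only an illustration: the function `make_segments` and the `max_cells` limit are assumptions, and the sketch simply emits every run of adjacent cells, whereas the actual system uses heuristics to decide which cells are combined.

```python
def make_segments(cell_ids, max_cells=2):
    """List segment identifiers built from runs of adjacent cells.

    Cells are combined strictly left to right; each identifier is the
    concatenation of the cell numbers in the run (e.g. cells 2 and 3
    give "23"). Identifiers are ambiguous once cell numbers reach 10,
    which this sketch does not attempt to handle.
    """
    segments = []
    for start in range(len(cell_ids)):
        for length in range(1, max_cells + 1):
            end = start + length
            if end > len(cell_ids):
                break
            run = cell_ids[start:end]
            segments.append("".join(str(c) for c in run))
    return segments


print(make_segments([0, 1, 2, 3]))
# ['0', '01', '1', '12', '2', '23', '3']
```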

【００６５】画像セグメントの最終セットが正しい画像
セグメントを包含することが重要である。複雑な発見的
方法を使用し、画像セルの個数及びどの画像セルを結合
して画像セグメントを構成すべきか決定する。一般的
に、発見的方法は、“明確な限界のある”カット、“相
互接続成分”カット、“内部接続成分”カットなどの用
語で表現される。これらの発見的方法のパラメータ及び
調整ファクタは経験的に決定される。It is important that the final set of image segments include the correct image segments. Complex heuristics are used to determine how many image cells, and which image cells, should be combined to form each image segment. In general, the heuristics are expressed in terms of "definite" cuts, "inter-connected-component" cuts, "intra-connected-component" cuts, and the like. The parameters and tuning factors of these heuristics are determined empirically.

【００６６】各画像セグメントは一連の画像画素からな
る。この画像画素は、システムに包含される割当てニュ
ーラル情報処理ネットワークにより分析される。下記で
詳細に説明するように、各ニューラルネットワークの機
能は、割当てられた各画像セグメントの一連の画素を分
析すること、及び、画素セットが実際に示されるか又は
可能性として分類される可能な数字文字（すなわち、記
号）の各々に関するスコアを出力として生成することで
ある。Each image segment consists of a set of image pixels. These pixels are analyzed by an assigned neural information processing network included in the system. As described in detail below, the function of each neural network is to analyze the set of pixels of each image segment assigned to it and to produce as output a score for each of the possible numeric characters (i.e., symbols) that the pixel set may actually represent or as which it may be classified.

【００６７】ブロックＥで示される処理の次のステージ
は、一連の許容しうる（すなわち、合法な）画像“コン
セグメンテーション”を生成するために、連続的画像
“セグメント”を左から右へ一緒に数珠繋ぎにすること
である。このような各画像コンセグメンテーションは、
前処理画像Ｉ’内の全ての画素を説明しなければならな
い。The next stage of the process, indicated by block E, is to string consecutive image "segments" together from left to right in order to produce a set of acceptable (i.e., legal) image "consegmentations". Each such image consegmentation must account for all of the pixels in the preprocessed image I'.

【００６８】できるだけ少数のコンセグメンテーション
を検討することが望ましい。これでも確実に、正しいコ
ンセグメンテーションが生成画像セルから構成された一
連の全画像コンセグメンテーション中に包含される。図
１１の表において、５文字のＺＩＰコード例に関する合
法画像コンセグメンテーションが３例示されている。ブ
ロックＥで示されるように、コンセグメンテーションは
図１２の“有向非環式割当グラフ”により生成される。It is desirable to consider as few consegmentations as possible, while still ensuring that the correct consegmentation is included in the set of all image consegmentations constructed from the generated image cells. The table of FIG. 11 illustrates three legal image consegmentations for the five-character ZIP code example. As indicated by block E, the consegmentations are generated by the "directed acyclic assignment graph" of FIG. 12.

【００６９】このグラフの構造は、これら画像コンセグ
メンテーションの各々が５個の画像セグメントからなる
ことを保証する。入力画像Ｉの空間構造の実体を獲得す
るために、画像セグメントを一緒に数珠繋ぎにすること
ができる方法を支配する規則が存在する。例えば、或る
セグメントの右端を次の画像セグメントの左端に接触し
なければならない。（すなわち、一束の画素をスキップ
する、及び／又は、誤った空間順序で画素を結合するこ
とは許されない。）The structure of this graph guarantees that each of these image consegmentations consists of five image segments. There are rules governing the ways in which image segments can be strung together, so as to capture the essence of the spatial structure of the input image I. For example, the right edge of one segment must touch the left edge of the next image segment. (That is, it is not permitted to skip a group of pixels and/or to combine pixels in the wrong spatial order.)
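The rules above (a fixed number of segments, each starting where the previous one ends, with no pixels skipped) can be illustrated with a small enumeration sketch. It assumes that each segment is described by the indices of its first and last cells; the function `consegmentations` and the six-cell example are hypothetical and are not taken from the figures.

```python
def consegmentations(segments, n_cells, n_chars):
    """Enumerate legal consegmentations.

    segments: list of (first_cell, last_cell) tuples, inclusive.
    A legal consegmentation uses exactly n_chars segments, starts at
    cell 0, ends at the last cell, and each segment begins at the cell
    immediately after the previous segment's last cell.
    """
    by_start = {}
    for seg in segments:
        by_start.setdefault(seg[0], []).append(seg)

    results = []

    def extend(next_cell, chosen):
        if len(chosen) == n_chars:
            if next_cell == n_cells:  # every cell has been accounted for
                results.append(tuple(chosen))
            return
        for seg in by_start.get(next_cell, []):
            extend(seg[1] + 1, chosen + [seg])

    extend(0, [])
    return results


# Hypothetical example: six cells, single-cell segments plus three pairs.
example_segments = [(i, i) for i in range(6)] + [(0, 1), (2, 3), (4, 5)]
legal = consegmentations(example_segments, n_cells=6, n_chars=5)
for path in legal:
    print(path)
```

In this made-up example exactly three consegmentations are legal, one for each two-cell segment that can absorb the extra cell.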

【００７０】しかし、所望により、これら制約の幾つか
は、適当な条件下で緩和させることができる。連続的な
画像セグメントＳ_iを一緒に数珠繋ぎにすることに関す
る更に詳細な説明は、米国特許出願第０７／８１６４１
５号明細書に開示されている。所望により、被選択画像
コンセグメンテーションをブロックＦで明快に表示する
ことができる。However, if desired, some of these constraints can be relaxed under appropriate conditions. A more detailed description of stringing consecutive image segments S_i together is disclosed in U.S. patent application Ser. No. 07/816,415. If desired, the selected image consegmentations can be displayed explicitly at block F.

【００７１】本発明の有向非環式（鎖状）グラフは、前
処理画像Ｉ’の可能な画像コンセグメンテーション群
｛Ｓ｝と、アルファベット文字により可能にされるか又
は被記録文字列が表示された言語又はコードのシンタッ
クスにより制約される文字列解釈（すなわち、分類）群
｛Ｃ｝の両方を同時にモデル化する新規な手段も提供す
る。The directed acyclic graph of the present invention also provides a novel means of simultaneously modeling both the set {S} of possible image consegmentations of the preprocessed image I' and the set {C} of character string interpretations (i.e., classifications) that are permitted by the alphabet or constrained by the syntax of the language or code in which the recorded string is expressed.

【００７２】図１２に関連して詳細に説明されるよう
に、“有向非環式グラフ”として表現できるこのデータ
構造は、画像コンセグメンテーション及び文字列解釈問
題の両方を“グラフ内の最適パス”問題として統一的な
方法で公式化するために、本発明のシステムにより使用
される。直感的に、この問題公式化は形状的アピールを
有する。As described in detail in connection with FIG. 12, this data structure, which can be represented as a "directed acyclic graph", is used by the system of the present invention to formulate both the image consegmentation problem and the character string interpretation problem in a unified manner as a "best path through a graph" problem. Intuitively, this problem formulation has geometric appeal.

【００７３】アライメントグラフ、このグラフを実現す
るデータ構造及びこのグラフを使用する方法について以
下詳細に説明する。その後、このグラフを使用する方法
を、図２のブロックＧで示される画像セグメント分析ス
テージ、ブロックＨで示されるパススコア及び確率計算
ステージ及びブロックＩで示される文字列解釈ステージ
において詳細に説明する。The alignment graph, the data structure that implements it, and the method of using it are described in detail below. The use of this graph is then described in detail for the image segment analysis stage indicated by block G, the path score and probability computation stage indicated by block H, and the character string interpretation stage indicated by block I of FIG. 2.

【００７４】図１２に示されるように、本発明のグラフ
はノードの二次元アレーからなる。このグラフは高レベ
ルの記述において、格子又はトレリスダイアグラムと呼
ばれる従来のグラフに類似する。本発明のアライメント
グラフは、多数の重要なモデル化機能を行うデータ構造
により実現される。このデータ構造はプログラム業界で
周知の方法により、プログラム化プロセッサ２により作
成、変更及び管理される。As shown in FIG. 12, the graph of the present invention consists of a two-dimensional array of nodes. This graph, in a high-level description, resembles a conventional graph called a lattice or trellis diagram. The alignment graph of the present invention is implemented with a data structure that performs a number of important modeling functions. This data structure is created, modified and managed by the programmed processor 2 in a manner well known in the programming industry.

【００７５】アライメントグラフにおける各ノードは個
別的なデータ構造として実現される。これは“主データ
構造”の副構造である。各ノードに関するデータ構造は
多数の“ローカル”情報欄を有する。この情報欄は次の
ような情報事項を記憶することができ、かつ、特別の標
識が付されている。ユニークなノード識別子（すなわ
ち、ノードの列／行アドレスを識別するコード），付随
画像セグメントの画素を示すことができる可能数字文字
の各々に関する算定スコア，付随画像セグメントの画素
を示すことができる可能数字文字の各々に関する算定
“非正規化”スコア，先祖ノードのノード識別子，及び
子孫ノードのノード識別子。Each node in the alignment graph is implemented as an individual data structure, which is a substructure of the "main data structure". The data structure for each node has a number of "local" information fields. These fields can store the following items of information and are specially labeled: a unique node identifier (i.e., a code identifying the column/row address of the node); the computed scores for each of the possible numeric characters that the pixels of the associated image segment may represent; the computed "unnormalized" scores for each of those possible numeric characters; the node identifiers of the ancestor nodes; and the node identifiers of the descendant nodes.

【００７６】この方法の各ステージで生成された情報を
記憶するために、主データ構造は多数の“グローバル”
情報欄を有する。この情報欄は次のような情報事項を記
憶することができ、かつ、特別の標識が付されている。
どの特定の画像セグメントがアライメントグラフ内の各
特定行のノードにより示されるか識別する一連のコー
ド，各画像セグメントがメモリ内に記憶される場所を識
別する一連のアドレス，及び被選択パスと、同じ文字列
解釈を示すアライメントグラフ中の一連のパスに沿った
スコアの和。In order to store the information generated at each stage of the method, the main data structure has a number of "global" information fields. These fields can store the following items of information and are specially labeled: a set of codes identifying which particular image segment is represented by the nodes in each particular row of the alignment graph; a set of addresses identifying where each image segment is stored in memory; and the sum of the scores along the selected path and along the set of paths through the alignment graph that represent the same character string interpretation.
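A minimal sketch of how the per-node "local" fields and the "global" fields of the main data structure might be laid out is given below. All class and field names are hypothetical, as are the 11 x 5 dimensions and the placeholder codes and addresses used in the example; the specification lists the kinds of information stored but does not prescribe a concrete layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

NodeId = Tuple[int, int]  # assumed encoding: (row index, column index)


@dataclass
class Node:
    """Local fields kept for one node of the alignment graph."""
    node_id: NodeId
    scores: List[float] = field(default_factory=lambda: [0.0] * 10)
    unnormalized_scores: List[float] = field(default_factory=lambda: [0.0] * 10)
    ancestors: List[NodeId] = field(default_factory=list)
    descendants: List[NodeId] = field(default_factory=list)


@dataclass
class AlignmentGraph:
    """Main data structure holding the global fields and all nodes."""
    n_rows: int                    # one row per image segment
    n_cols: int                    # one column per character position
    row_segment_codes: List[str]   # which segment each row represents
    segment_addresses: List[int]   # where each segment is stored
    nodes: Dict[NodeId, Node] = field(default_factory=dict)
    selected_path_score: Optional[float] = None
    interpretation_score_sum: Optional[float] = None

    def __post_init__(self):
        # Create one node per (row, column) position.
        for r in range(self.n_rows):
            for c in range(self.n_cols):
                self.nodes[(r, c)] = Node(node_id=(r, c))


# Illustrative instance with placeholder codes and addresses.
graph = AlignmentGraph(
    n_rows=11,
    n_cols=5,
    row_segment_codes=[f"seg{i}" for i in range(11)],
    segment_addresses=list(range(11)),
)
print(len(graph.nodes))  # 55
```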

【００７７】アライメントグラフ内の列の数は、可能文
字列解釈内の文字の桁数（例えば、図３のＺＩＰコード
では５桁）に等しい。また、アライメントグラフ内の行
の数は、本発明の方法の画像セグメント生成ステージ中
に構築される画像セグメントの数に等しい。例えば、ア
ライメントグラフのサイズは一般的に、解釈（すなわ
ち、分析及び分類）のために取得される各画像Ｉに関す
る行サイズを変える。The number of columns in the alignment graph is equal to the number of character positions in the possible character string interpretations (e.g., five for the ZIP code of FIG. 3). The number of rows in the alignment graph is equal to the number of image segments constructed during the image segment generation stage of the method of the present invention. Consequently, the size of the alignment graph, specifically its number of rows, generally varies with each image I acquired for interpretation (i.e., analysis and classification).

【００７８】従って、各取得画像Ｉの場合、プログラム
化プロセッサ２は型通りに、取得画像に対して特別に作
製された図１２に示されるタイプのグラフを生成する。
このような各アライメントグラフは、ＲＡＭ４に記憶さ
れるものに対応するデータ構造を生成することにより物
理的に実現される。Thus, for each acquired image I, the programmed processor 2 routinely generates a graph of the type shown in FIG. 12, constructed specifically for that acquired image. Each such alignment graph is physically realized by creating a corresponding data structure stored in RAM 4.

【００７９】画像Ｉ及びその可能文字列解釈に関する画
像コンセグメンテーションに関する情報は、この情報に
ついて特別に生成されたデータ構造の情報欄に記憶され
る。最後に、この被編成情報は、解釈の候補群｛Ｃ｝か
ら最有望文字列解釈Ｃを選択するために、プログラム化
プロセッサ２により使用される。Information about the image consegmentations of image I and about its possible character string interpretations is stored in the information fields of the data structure created specifically for that image. Finally, this organized information is used by the programmed processor 2 to select the most probable character string interpretation C from the set of candidate interpretations {C}.

【００８０】図１２に示されるように、本発明のアライ
メントグラフは多数の精密な構造特徴を有する。グラフ
の主要部分は行と列を有する。各列は文字列解釈Ｃ内の
一つの文字位置に対応する。この事例は５文字のＺＩＰ
コードに関するものなので、図示されるように、５列必
要である。各行は画像セグメントに対応する。この事例
は１１セグメントを有するので、図示されるように、１
１行必要である。As shown in FIG. 12, the alignment graph of the present invention has a number of detailed structural features. The main part of the graph has rows and columns. Each column corresponds to one character position in the character string interpretation C. Because this example concerns a five-character ZIP code, five columns are required, as shown. Each row corresponds to an image segment. Because this example has 11 segments, 11 rows are required, as shown.

【００８１】或る列の或る行の各解釈には、一対のドッ
ト（・・）により示されるノードが存在する。左のド
ットはノードの“モーニング”部分を示し、右のドット
はノードの“イブニング”部分を示す。このような各ノ
ードは、その行インデックスと列インデックスにより特
定される。更に、最初の文字位置の前で、最も左側の画
像セグメントの左端に配置された特殊開始ノード１７が
存在する。同様に、最後の文字位置に右側で、最も左側
の画像セグメントの下側に配置された特殊終了ノード１
８が存在する。At each intersection of a row and a column there is a node, indicated by a pair of dots (・・). The left dot represents the "morning" portion of the node, and the right dot represents the "evening" portion. Each such node is specified by its row index and column index. In addition, there is a special start node 17 located before the first character position, at the left edge of the leftmost image segment. Similarly, there is a special end node 18 located to the right of the last character position, below the leftmost image segment.

【００８２】図１２に示されるように、各ノードのモー
ニング及びイブニング部分を接続する１０個の認識弧が
存在する。明瞭化のために、図１２には１０個の認識弧
の内３個しか図示されていない。解釈処理中に、各認識
弧１９は“ｒ−スコア”で標識化される。As shown in FIG. 12, there are 10 recognition arcs connecting the morning and evening parts of each node. For clarity, only three of the ten recognition arcs are shown in FIG. During the interpretation process, each recognition arc 19 is labeled with an "r-score".

【００８３】この“ｒ−スコア”は認識弧により示され
る文字に割当てられる。これらの認識弧は、ＺＩＰコー
ドを構成する数字文字に割当てられた非正規化ｒ−スコ
アを示す。しかし、単語及び文章認識用途では、これら
の認識弧は一般的に、所定のアルファベット又は語彙中
の記号に割当てられた非正規化スコアを示す。図１２に
示されるように、このようなノード間のノード子孫系統
及び先祖系統を示すために、或るノードの各イブニング
部分とその直ぐ隣のノードのモーニング部分との間に、
直線化グルー弧(glue-arc)１９も引かれる。This "r-score" is assigned to the character represented by the recognition arc. These recognition arcs carry the unnormalized r-scores assigned to the numeric characters that make up a ZIP code. In word and sentence recognition applications, however, the recognition arcs would generally carry unnormalized scores assigned to the symbols of a given alphabet or vocabulary. As shown in FIG. 12, straight glue arcs 19 are also drawn between the evening portion of each node and the morning portion of its immediately neighboring nodes, in order to indicate the descendant and ancestor relationships among such nodes.

【００８４】認識弧と異なり、このグルー弧はニューラ
ルネットワークによりｒ−スコアは割当てられない。そ
の他の実施例では、複雑なグルー弧スコアを使用するこ
ともできるが、この実施例の場合、単純なシステムが使
用される。すなわち、許容弧にはスコア１．０が割当て
られ、かつ、保持される。しかし、非許容弧にはスコア
０．０が割当てられ、アライメントグラフから廃棄され
る。Unlike the recognition arcs, the glue arcs are not assigned r-scores by the neural networks. Other embodiments may use more elaborate glue-arc scores, but this embodiment uses a simple scheme: allowed arcs are assigned a score of 1.0 and retained, whereas disallowed arcs are assigned a score of 0.0 and discarded from the alignment graph.
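The simple 1.0 / 0.0 glue-arc scheme can be sketched as follows, under the assumption that an arc is allowed exactly when the destination segment begins at the cell immediately after the source segment's last cell (the "right edge touches left edge" rule). The representation of segments as (first_cell, last_cell) pairs, the (row, column) node addressing, and both function names are assumptions made for this sketch.

```python
def glue_arc_score(src_segment, dst_segment):
    """Return 1.0 if dst starts right after src ends, otherwise 0.0."""
    return 1.0 if dst_segment[0] == src_segment[1] + 1 else 0.0


def allowed_glue_arcs(segments, n_cols):
    """List the glue arcs that survive the 1.0 / 0.0 scoring.

    Row i of the graph is assumed to correspond to segments[i]. An arc
    joins node (i, col) to node (k, col + 1); arcs scoring 0.0 are
    simply never added, which corresponds to discarding them.
    """
    arcs = []
    for i, src in enumerate(segments):
        for k, dst in enumerate(segments):
            if glue_arc_score(src, dst) == 1.0:
                for col in range(n_cols - 1):
                    arcs.append(((i, col), (k, col + 1)))
    return arcs


# Hypothetical segments over three cells, two character positions.
demo_segments = [(0, 0), (0, 1), (1, 1), (2, 2)]
print(allowed_glue_arcs(demo_segments, n_cols=2))
```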

【００８５】ノードのモーニング部分はここに進入する
２つ以上のグルー弧を有することもできる。同様に、ノ
ードのイブニング部分はここから出る２つ以上のグルー
弧を有することもできる。画像コンセグメンテーション
の構成にインポーズされる制約の結果として、ローカル
的には道理にかなうが、グローバル的には道理にかなわ
ないグルー弧がアライメントグラフ中に存在することも
ある。従って、解釈処理の計算効率を改善するために、
特定のグルー弧を除去又は剪定することができる。図２
のブロックＧに示される画像セグメント分析ステージに
進む前に、アライメントグラフに対して次ぎの“グルー
弧”剪定処理を行うことができる。The morning portion of a node can have two or more glue arcs entering it. Similarly, the evening portion of a node can have two or more glue arcs leaving it. As a result of the constraints imposed on the construction of image consegmentations, the alignment graph may contain glue arcs that make sense locally but not globally. Therefore, to improve the computational efficiency of the interpretation process, such glue arcs can be removed, or pruned. The following glue-arc pruning process can be performed on the alignment graph before proceeding to the image segment analysis stage indicated by block G of FIG. 2.

【００８６】グルー弧剪定処理の第１工程は、順方向コ
ーンのメンバーとしてすでにマークされたノードの子孫
を反復的にマークすることにより、開始ノードの子孫で
あるノードの“順方向コーン”を計算することである。
処理の第２工程は、逆方向コーンのメンバーとしてすで
にマークされたノードの先祖を反復的にマークすること
により、終了ノードの先祖であるノードの“逆方向コー
ン”を計算することである。The first step of the glue-arc pruning process is to compute the "forward cone" of nodes that are descendants of the start node, by iteratively marking the descendants of nodes already marked as members of the forward cone. The second step is to compute the "backward cone" of nodes that are ancestors of the end node, by iteratively marking the ancestors of nodes already marked as members of the backward cone.

【００８７】処理の第３工程は、どのノードがこれら２
つのコーンの論理的共通部分に存在しないか決定し、次
いで、これらノードを“デッド”としてマークする。そ
の後、“デッド”とマークされたノードへ延びる又はこ
のノードから延びるグルー弧を許容グルー弧のリストか
ら削除（すなわち、剪定）する。これらコーンの共通部
分内の各ノードは“アライブ”と見做され、画像セグメ
ント分析ステージ中にその一連の認識弧に割当てられる
スコアを有する。The third step of the process is to determine which nodes are not in the logical intersection of these two cones and to mark those nodes as "dead". Glue arcs extending to or from a node marked "dead" are then deleted (i.e., pruned) from the list of allowed glue arcs. Each node within the intersection of the cones is considered "alive" and has scores assigned to its set of recognition arcs during the image segment analysis stage.
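The three pruning steps (forward cone, backward cone, intersection) can be sketched as an ordinary graph reachability computation. The function `prune_glue_arcs`, the representation of glue arcs as (source, destination) pairs, and the node labels in the example are assumptions; for convenience, this sketch also counts the start and end nodes as members of their own cones.

```python
def prune_glue_arcs(glue_arcs, start, end):
    """Return (alive_nodes, kept_arcs) after cone-based pruning.

    glue_arcs: list of (src, dst) pairs of hashable node identifiers.
    """
    children, parents = {}, {}
    for src, dst in glue_arcs:
        children.setdefault(src, []).append(dst)
        parents.setdefault(dst, []).append(src)

    def reach(root, neighbors):
        # Iteratively mark everything reachable from root.
        seen, stack = {root}, [root]
        while stack:
            node = stack.pop()
            for nxt in neighbors.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    forward_cone = reach(start, children)   # step 1: descendants of start
    backward_cone = reach(end, parents)     # step 2: ancestors of end
    alive = forward_cone & backward_cone    # step 3: intersection
    kept = [(s, d) for s, d in glue_arcs if s in alive and d in alive]
    return alive, kept


# Node "b" cannot reach the end and node "c" cannot be reached from the
# start, so both are "dead" and their arcs are pruned.
arcs = [("S", "a"), ("a", "E"), ("S", "b"), ("c", "E")]
alive_nodes, kept_arcs = prune_glue_arcs(arcs, "S", "E")
print(sorted(alive_nodes), kept_arcs)
```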

【００８８】このグローバルな制約を満たすことによ
り、合法的な先祖又は子孫を有しない、アライメントグ
ラフの右手上方コーナー及び左手下方コーナーに多数の
ノードが存在することとなる。この事実は、図１２に示
されるように、アライメントグラフのこれらの領域内に
入力グルー弧及び出力グルー弧が存在しないことにより
示される。更に、必要により、又は所望により、アライ
メントグラフは明確なカットの存在を用いることにより
剪定することもできる。Satisfying this global constraint leaves a large number of nodes in the upper right-hand and lower left-hand corners of the alignment graph with no legal ancestors or descendants. This is shown in FIG. 12 by the absence of incoming and outgoing glue arcs in these regions of the alignment graph. In addition, if necessary or desired, the alignment graph can be pruned further by exploiting the presence of definite cuts.

【００８９】グラフ内の各パスはコンセグメンテーショ
ン及び解釈の両方を示す。パス内のグルー弧はコンセグ
メンテーションを指定し、パス内の認識弧は解釈を指定
する。本発明の方法がどのようにして、可能文字列解釈
の全ての群又は競合する文字列解釈の少数群の何れかか
ら“正しい”文字列解釈を選択するのかを理解ために、
先ず最初に、“正しい”文字列解釈の最終的選択に先行
する幾つかのサブプロセスを理解しなければならない。Each path through the graph represents both a consegmentation and an interpretation. The glue arcs in a path specify the consegmentation, and the recognition arcs in the path specify the interpretation. To understand how the method of the present invention selects the "correct" character string interpretation, either from the set of all possible string interpretations or from a small set of competing string interpretations, it is first necessary to understand several sub-processes that precede the final selection of the "correct" string interpretation.

【００９０】第１のサブプロセスは、各ノードの認識弧
に割当てられた非正規化ｒ−スコアの計算に関する。第
２のサブプロセスは、同じ文字列解釈を示すアライメン
トグラフを通る全ての文字列パスに付随するｒ−スコア
の和の計算に関する。これらのサブプロセスについて以
下説明する。The first sub-process concerns the computation of the unnormalized r-scores assigned to the recognition arcs of each node. The second sub-process concerns the computation of the sum of the r-scores associated with all paths through the alignment graph that represent the same character string interpretation. These sub-processes are described below.

【００９１】図１３に示されるように、解釈処理の画像
セグメント分析ステージは複雑なニューラル計算ネット
ワーク２１を使用する。各ｉ番目のニューラル計算ネッ
トワークの基本的機能は、グラフ内のｉ番目の行と同時
インデックス化された画像セグメントＳ_iの画素を分析
し、グラフ内のｉ番目の行内の各ノードにおける認識弧
に割当てられる一連の“スコア”（すなわち、ｒ−スコ
ア）を計算することである。As shown in FIG. 13, the image segment analysis stage of the interpretation process uses a complex of neural computing networks 21. The basic function of each i-th neural computing network is to analyze the pixels of the image segment S_i that is co-indexed with the i-th row of the graph, and to compute the set of "scores" (i.e., r-scores) assigned to the recognition arcs at each node in the i-th row of the graph.

【００９２】一つのセグメントが存在すると、行当たり
の一つのニューラルネットワーク及び同じ行内の全ての
ノードは同じ１０個組のｒ−スコアを受信する。明確化
のために、各ノードについて、１０個の認識弧の内の３
個だけしか図１３には図示されていない。要するに、各
ニューラル計算ネットワークはその入力（一連の番号に
より示される一群の画素）を、ｒ−スコアと呼ばれる１
０個組の番号ｒ₀，ｒ₁，．．．ｒ₉にマップする。There is one segment, and hence one neural network, per row, and all nodes in the same row receive the same 10-tuple of r-scores. For clarity, only three of the ten recognition arcs are shown for each node in FIG. 13. In short, each neural computing network maps its input (a group of pixels represented by a set of numbers) to a 10-tuple of numbers r_0, r_1, ..., r_9 called r-scores.

【００９３】ネットワークのアーキテクチャは、これら
のｒ−スコアがポジティブであり、これらの解釈を非正
規化確率として容認することを保証する。ｒ₀が大きな
値であることは、入力セグメントが数字の“０”を示す
高い確率を示し、同様に、他の９個のｒ−スコアはそれ
ぞれ他の９個の数字に対応する。また、大きなｒ−スコ
アは、入力セグメントが画像の正しいコンセグメンテー
ションの一部である高い確率を反映する。The architecture of the network ensures that these r-scores are positive, so that they can be interpreted as unnormalized probabilities. A large value of r_0 indicates a high probability that the input segment represents the digit "0", and likewise the other nine r-scores correspond to the other nine digits. A large r-score also reflects a high probability that the input segment is part of the correct consegmentation of the image.

【００９４】逆に言えば、数字を半分にカットすること
によりセグメントが生成される場合（時々起こることが
ある）、このセグメントに関する１０個全てのｒ−スコ
アは、セグメントの望ましからざる特性の検出を示すた
めに、小さくなければならない。Conversely, if a segment is produced by cutting a digit in half (as sometimes happens), all ten r-scores for that segment should be small, indicating that the undesirable nature of the segment has been detected.

【００９５】本発明によれば、各ニューラル計算ネット
ワークのマッピング機能は、成分を有する重みベクトル
Ｗ₁，Ｗ₂，．．．Ｗ_mとしてベクトル形で示すことがで
きる一連の調整可能パラメータを特徴とする。初めに、
各ニューラル計算ネットワークの一連の調整可能パラメ
ータを一連の初期値に調整する。According to the present invention, the mapping function of each neural computing network is characterized by a set of adjustable parameters, which can be expressed in vector form as a weight vector with components W_1, W_2, ..., W_m. Initially, the set of adjustable parameters of each neural computing network is set to a set of initial values.

【００９６】しかし、下記で詳細に説明するように、図
２のブロックＪで示されるニューラルネットワークパラ
メータ調整ステージは、１つ以上の学習セッション中
に、各ニューラル計算ネットワークの入力／出力マッピ
ング機能を一連の学習データに順応させるように条件付
けするような方法で、これらのパラメータを増分的に調
整できるために設けられている。この学習データセット
は、国中の異なる人々により手書きされたＺＩＰコード
を有する数十万の有効化学習画像からなる。However, as described in detail below, the neural network parameter adjustment stage indicated by block J of FIG. 2 is provided so that these parameters can be adjusted incrementally during one or more training sessions, in such a way that the input/output mapping function of each neural computing network is conditioned to conform to a set of training data. This training data set consists of several hundred thousand validated training images of ZIP codes handwritten by different people across the country.

【００９７】各ｉ番目のニューラル計算ネットワークか
ら生成されたｒ−スコアは、ｒ＝ｒ₁，ｒ₂，．．．，ｒ
_Nとしてベクトル形で表示され、アライメントグラフの
ｉ番目の行内の全てのノードにおける１０個の対応する
認識弧（すなわち、情報欄）に割当てられる。The r-scores produced by each i-th neural computing network are expressed in vector form as r = r_1, r_2, ..., r_N and are assigned to the ten corresponding recognition arcs (i.e., information fields) at every node in the i-th row of the alignment graph.

【００９８】一般的に、各ニューラル計算ネットワーク
は、コンピュータプログラム、電気回路、又はニューラ
ル計算ネットワークの入力／出力マッピング機能を実現
できる微視的又は巨視的デバイスとして実現できる。し
かし、各ニューラル計算ネットワークは周知のLeNet
（登録商標）コンピュータプログラムを実行することに
より実現される。In general, each neural computing network can be realized as a computer program, an electrical circuit, or any microscopic or macroscopic device capable of implementing the input/output mapping function of the neural computing network. However, each neural computing network is realized by executing the well-known LeNet (registered trademark) computer program.

【００９９】このLeNet（登録商標）コンピュータプロ
グラムは、Y.Le Cun et al., "Handwritten Digit Reco
gnition with a Back-Propagation Network", pp 396-4
04, Advances in Neural Information Processing 2,
(David Touretzky, Editor), Morgan Kaufman (1990)
に詳述されている。更に、ニューラル計算ネットワーク
の構成及び学習に関する詳細な説明は、John Denker et
al., "Automatic Learning , Rule Extraction, and G
eneralization", pp 877-922, Complex Systems,Vol.
1, October,1987に開示されている。This LeNet (registered trademark) computer program is described in detail in Y. Le Cun et al., "Handwritten Digit Recognition with a Back-Propagation Network", pp. 396-404, Advances in Neural Information Processing 2 (David Touretzky, Editor), Morgan Kaufman (1990). A detailed description of the construction and training of neural computing networks is also given in John Denker et al., "Automatic Learning, Rule Extraction, and Generalization", pp. 877-922, Complex Systems, Vol. 1, October 1987.

【０１００】アライメントグラフにおいて、同じ文字列
解釈を示す２個以上のパス（異なるコンセグメンテーシ
ョンを示す）が存在することもある。所定の解釈を示す
パスは“グループ”と見做さなければならない。所定の
解釈に割当てられたスコアはグループ内の全てのパスの
スコアの和に依存しなければならない。これは、このグ
ループ内の一つだけのパスに関するスコアを一般的に考
慮する従来の認識器と異なり、グループ内の他のパスの
寄与を無視する。In the alignment graph, there may be two or more paths (representing different consegmentations) that represent the same character string interpretation. The paths representing a given interpretation must be regarded as a "group", and the score assigned to a given interpretation must depend on the sum of the scores of all paths in the group. This differs from conventional recognizers, which generally consider the score of only one path in the group and ignore the contributions of the other paths in the group.

[0101] For an image containing five digits, there are in general 10^5 possible distinct interpretations, and the number of paths through the alignment graph may be even larger. It is therefore impractical to enumerate them explicitly or to consider each probability individually. The data structures and algorithms of the present invention allow the system to identify certain important groups of paths (for example, the group of paths corresponding to a given interpretation, or the group of all paths) and to evaluate efficiently the score of such a group (that is, the sum of the scores of the paths in the group).
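
As an illustration only, the grouping of paths by interpretation can be sketched in Python. The toy graph, its arc scores, and the function names below are hypothetical and do not come from the specification; the sketch enumerates paths by brute force, which is feasible only for very small graphs.

```python
from collections import defaultdict

# Hypothetical toy alignment graph: arcs are (src, dst, label, r_score).
# Node 0 is the start node and node 3 is the end node.
ARCS = [
    (0, 1, "3", 0.6), (0, 1, "8", 0.3),
    (1, 3, "5", 0.5),
    (0, 2, "3", 0.4),
    (2, 3, "5", 0.7), (2, 3, "6", 0.9),
]

def enumerate_paths(arcs, start, end):
    """Yield (interpretation, path_score) for every start-to-end path."""
    outgoing = defaultdict(list)
    for src, dst, label, score in arcs:
        outgoing[src].append((dst, label, score))
    stack = [(start, "", 1.0)]
    while stack:
        node, text, score = stack.pop()
        if node == end:
            yield text, score
            continue
        for dst, label, s in outgoing[node]:
            stack.append((dst, text + label, score * s))

def group_scores(arcs, start, end):
    """Return (sum of path scores, best single path score) per interpretation."""
    totals = defaultdict(float)
    best = defaultdict(float)
    for text, score in enumerate_paths(arcs, start, end):
        totals[text] += score
        best[text] = max(best[text], score)
    return dict(totals), dict(best)

if __name__ == "__main__":
    totals, best = group_scores(ARCS, 0, 3)
    print("group sums:", totals)
    print("best single paths:", best)
```

On this toy graph the single highest-scoring path spells "36", whereas the interpretation with the largest group sum is "35", because two different consegmentations both support it.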

[0102] The system of the present invention analyzes the pixels of the acquired image I and computes a sum over all paths through the graph that represent the candidate interpretation (i.e., classification) whose probability is being computed. Each term in the sum is the product of the scores assigned to the arcs that make up a particular path in the alignment graph. Normalization is performed only after the sum has been computed. This is called "string-by-string" normalization.

[0103] In contrast, conventional recognizers that compute probabilities generally normalize scores at a relatively early stage of processing, which in a sense amounts to "character-by-character" normalization. This discards valuable information about the quality of the consegmentation. It is important that the neural computation network training process described below trains the complex of neural networks to produce r-scores that contain information about the probability that a given consegmentation is correct, rather than the probability that a given character interpretation of a segment is correct.

[0104] The normalized scores produced by the system and method of the present invention represent estimates of the posterior probability P(C|I). In contrast, the maximum likelihood sequence estimation used in prior-art multi-character recognition (MCR) systems generally employs prior probabilities of the form P(I|C). Because these different probability measures can be related to one another given some additional information, either is acceptable for many purposes. The practical advantage of the posterior form is that the internal computations of the present system and method rely on estimates of the joint posterior probability P(C, S|I) of interpretation and consegmentation.

[0105] The corresponding prior (maximum likelihood) expression P(I|C, S) cannot easily be related to a useful posterior form, because in general the marginal probability P(S) is not easy to estimate. As a result, conventional recognizers can identify the highest-scoring interpretation but cannot assign it a properly normalized score. The correctly normalized scores of the present invention can be interpreted very readily as probabilities, and can therefore be combined very readily with information from other sources.

[0106] In general, the goal of the procedure shown in FIG. 14 is to compute the novel posterior probability P(C|I) for each competing string interpretation represented by the alignment graph illustrated in FIG. 13. Each such probability is computed as a ratio, expressed as a numerator part divided by a denominator part. Mathematically, the probability measure of the present invention is given by the following equation.

[0107]

[Equation 1]

[0108] The first term of the numerator part, [Equation 2], denotes the successive multiplication of the r-scores along the arcs of each path (S_i'), and the full numerator part, [Equation 3], denotes the summation of these path-score products over all paths (i.e., consegmentations S') that represent the same string interpretation.

[0109] The first expression of the denominator part, [Equation 4], denotes the sum of path-score products over all paths that represent the same string interpretation, and the full denominator part, [Equation 5], denotes the summation of all path-score products over all strings {C} represented by the alignment graph.

[0110] Because the denominator part includes contributions from all possible interpretations, its value depends only on the acquired image I and not on the particular interpretation C. The purpose of the denominator part is to ensure that the probabilities are properly normalized, so that, in accordance with the general principles of probability, the sum of P(C_i|I) over all C_i equals 1.
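
The equation images are not reproduced in this text. Based solely on the verbal description in the surrounding paragraphs, the probability measure can be written in the following form. The symbols \mathcal{S}(C) (the set of paths, i.e. consegmentations, that spell the interpretation C) and r_a (the r-score of arc a) are notation assumed here for illustration and may differ from the notation of the original equations.

```latex
P(C \mid I) = \frac{N(C \mid I)}{D(I)}, \qquad
N(C \mid I) = \sum_{S' \in \mathcal{S}(C)} \; \prod_{a \in S'} r_a, \qquad
D(I) = \sum_{C'} N(C' \mid I)
```

With D(I) defined as the sum of the numerators over all interpretations, the normalization property follows directly:

```latex
\sum_{C} P(C \mid I) = \frac{1}{D(I)} \sum_{C} N(C \mid I) = 1
```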

[0111] Once the numerator part has been computed for a particular string interpretation, the probability for that interpretation is obtained by dividing the computed numerator by the common denominator. There are many different ways in which this probability computation can be used, for example by incorporating it into a larger processing procedure, to arrive at the "correct" string interpretation. One example is shown in the flow charts of FIGS. 14 and 15, and another in the flow charts of FIGS. 16 and 17. These two methods are described in detail below.

[0112] The steps of the first string interpretation procedure of the present invention are shown in the flow charts of FIGS. 14 and 15. As indicated at block A, the first step of the procedure is to use the i-th neural computation network shown in FIG. 13 to compute the set of r-scores for each node along the i-th row of the graph. Then, as indicated at block B, the procedure uses the well-known Viterbi algorithm to identify the path through the alignment graph (as a sequence of codes denoting glue arcs and recognition arcs) that has the maximum path score.
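
The Viterbi step at block B can be sketched as follows. This is a minimal illustration under assumed conventions, not the disclosed implementation: the toy graph, the tuple layout (src, dst, label, r_score), the use of an empty label for a glue arc, and the assumption that node numbers are in topological order are all hypothetical.

```python
# Hypothetical toy alignment graph: arcs are (src, dst, label, r_score).
# Node ids are in topological order; label "" marks a glue arc.
ARCS = [
    (0, 1, "3", 0.6), (0, 1, "8", 0.3),
    (1, 3, "5", 0.5),
    (0, 2, "3", 0.4),
    (2, 3, "5", 0.7), (2, 3, "6", 0.9),
    (3, 4, "", 1.0),
]

def viterbi(arcs, start, end):
    """Return (interpretation, score) of the single highest-scoring path."""
    best = {start: (1.0, "")}            # node -> (best score, labels so far)
    # Sorting by source node visits arcs in topological order, so the best
    # score of a node is final before any arc leaving that node is used.
    for src, dst, label, score in sorted(arcs):
        if src not in best:
            continue
        candidate = best[src][0] * score
        if dst not in best or candidate > best[dst][0]:
            best[dst] = (candidate, best[src][1] + label)
    score, text = best[end]
    return text, score

if __name__ == "__main__":
    interpretation, score = viterbi(ARCS, 0, 4)
    # Only the interpretation would be retained; the path score is discarded.
    print(interpretation)
```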

[0113] The processor then identifies the string interpretation corresponding to this path. The path score of this path is only an approximation and is not by itself a reliable measure, so the score is discarded. Only the information identifying the string interpretation C_V represented by the path (for example, the five-character ZIP code 35733) is retained.

[0114] Next, as indicated at block C of FIG. 14, the procedure uses the well-known "forward algorithm" to compute the common denominator part D(I) of the probability measure for the identified string interpretation. This number is stored in the main data structure used to implement the alignment graph of FIG. 12.

[0115] The forward algorithm yields the exact value of the sum (over all paths) of the products (along each path) of the unnormalized r-scores, for all possible string interpretations represented by the alignment graph. Glue arcs contribute to a path score only through their presence or absence. Scores can also be assigned to glue arcs (as they are to recognition arcs), and all such scores are included as factors in the product along each path.
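
The forward computation of the common denominator D(I) can be sketched in a few lines. The graph below and its conventions (topologically numbered nodes, an empty label for a glue arc, a glue score of 1.0) are assumptions made for this illustration only.

```python
from collections import defaultdict

# Hypothetical toy alignment graph: arcs are (src, dst, label, r_score).
# Node ids are in topological order; label "" marks a glue arc.
ARCS = [
    (0, 1, "3", 0.6), (0, 1, "8", 0.3),
    (1, 3, "5", 0.5),
    (0, 2, "3", 0.4),
    (2, 3, "5", 0.7), (2, 3, "6", 0.9),
    (3, 4, "", 1.0),
]

def forward_total(arcs, start, end):
    """Sum, over all start-to-end paths, of the product of arc scores."""
    alpha = defaultdict(float)   # node -> sum of partial path products
    alpha[start] = 1.0
    # Arcs are visited in order of source node, so alpha[src] is complete
    # before it is propagated along any arc leaving src.
    for src, dst, _label, score in sorted(arcs):
        alpha[dst] += alpha[src] * score
    return alpha[end]

if __name__ == "__main__":
    print(forward_total(ARCS, 0, 4))
```

The cost grows with the number of arcs rather than with the number of paths, which is why the sum can be evaluated exactly without enumerating the paths.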

[0116] After the common denominator part D(I) has been computed, as indicated at block D of FIG. 14, the "forward algorithm" is used to compute the numerator part N(C_V|I) of the probability measure for the string interpretation C_V already identified by the Viterbi algorithm. This number is then stored in the main data structure used to implement the alignment graph of FIG. 12.

[0117] The forward algorithm accepts as input the code identifying the selected string interpretation found by the Viterbi algorithm, and produces as output the exact numerator value (i.e., the restricted sum) for that interpretation C_V. The computed numerator part equals the sum (over paths) of the products of the unnormalized r-scores along each path through the alignment graph that represents the interpretation C_V. During this computation, glue arcs are treated in the same way as in the computation of the denominator part.

[0118] As indicated at block E of FIG. 14, once the denominator and numerator parts have been computed, the improved probability P(C_V|I) is computed for the string interpretation C_V and stored in the main data structure. Finally, as indicated at block F of FIG. 15, the processor determines whether the probability computed at block E exceeds a threshold.
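
Blocks C to F can be illustrated together with a small sketch. The restricted sum is computed here by tracking, at each node, how many characters of the target string have been matched so far; this state layout, the toy graph, the function names, and the threshold value of 0.5 are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy alignment graph: arcs are (src, dst, label, r_score).
# Node ids are in topological order; label "" marks a glue arc.
ARCS = [
    (0, 1, "3", 0.6), (0, 1, "8", 0.3),
    (1, 3, "5", 0.5),
    (0, 2, "3", 0.4),
    (2, 3, "5", 0.7), (2, 3, "6", 0.9),
    (3, 4, "", 1.0),
]

def forward_total(arcs, start, end):
    """Denominator D(I): sum of path-score products over all paths."""
    alpha = defaultdict(float)
    alpha[start] = 1.0
    for src, dst, _label, score in sorted(arcs):
        alpha[dst] += alpha[src] * score
    return alpha[end]

def forward_restricted(arcs, start, end, target):
    """Numerator N(C|I): sum over paths whose labels spell `target`."""
    alpha = defaultdict(float)   # (node, chars matched) -> partial sum
    alpha[(start, 0)] = 1.0
    for src, dst, label, score in sorted(arcs):
        for k in range(len(target) + 1):
            mass = alpha.get((src, k), 0.0)
            if mass == 0.0:
                continue
            if label == "":                      # glue arc emits nothing
                alpha[(dst, k)] += mass * score
            elif target.startswith(label, k):    # recognition arc must match
                alpha[(dst, k + len(label))] += mass * score
    return alpha[(end, len(target))]

def posterior(arcs, start, end, target):
    """P(target | image) as the ratio N(target | I) / D(I)."""
    return forward_restricted(arcs, start, end, target) / forward_total(arcs, start, end)

if __name__ == "__main__":
    THRESHOLD = 0.5   # hypothetical acceptance threshold
    for text in ("35", "36", "85"):
        p = posterior(ARCS, 0, 4, text)
        print(text, round(p, 3), "accept" if p > THRESHOLD else "below threshold")
```

On this toy graph the single best path spells "36", yet its posterior is only about 0.33, while "35" reaches about 0.53.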

[0119] If it does, the processor is confident that the string interpretation selected by the Viterbi algorithm is the highest-probability string interpretation for the analyzed image I. Then, at block G, the processor produces as output of the system both (i) the string interpretation (e.g., 35733) and (ii) its associated computed probability. These two items can be used together (along with other information) as the basis for deciding how to route the mail piece.

[0120] At this stage of processing, there are many reasons why it may be desirable to perform additional computation to identify other high-scoring interpretations. For example, it may be desirable to identify the highest-probability interpretation reliably even when the probability assigned to C_V at block F is less than 0.5. In that case, a set of competing string interpretations is identified and the probability of each member of the set is computed.

[0121] The present invention can also be used as part of a larger system in which multiple interpretations (and their probabilities) are used in later processing. In particular, an interpretation to which the invention assigns a high probability on the basis of the acquired pixel image may be ruled out by a later stage of the larger system, in which case alternative interpretations are needed. For this or other reasons, the alternative procedure shown in the flow charts of FIGS. 16 and 17 is used.

[0122] As indicated at block A of FIG. 16, the first step of this procedure likewise uses the i-th computation network to compute the set of r-scores for each node along the i-th row of the graph. Then, as indicated at block B, the procedure uses a beam search algorithm to identify a relatively small set of paths through the alignment graph (as sequences of codes denoting glue arcs and recognition arcs). The set of competing string interpretations {C_j} corresponding to this set of paths is then identified.
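
A beam search over the alignment graph can be sketched as follows. The pruning rule used here (keep the `beam_width` best partial paths at each node) is one common variant chosen for illustration; the specification does not state which variant is used, and the toy graph and names are hypothetical.

```python
import heapq
from collections import defaultdict

# Hypothetical toy alignment graph: arcs are (src, dst, label, r_score).
# Node ids are in topological order; label "" marks a glue arc.
ARCS = [
    (0, 1, "3", 0.6), (0, 1, "8", 0.3),
    (1, 3, "5", 0.5),
    (0, 2, "3", 0.4),
    (2, 3, "5", 0.7), (2, 3, "6", 0.9),
    (3, 4, "", 1.0),
]

def beam_search(arcs, start, end, beam_width):
    """Return up to beam_width (score, interpretation) paths reaching `end`."""
    beams = defaultdict(list)        # node -> list of (score, labels so far)
    beams[start] = [(1.0, "")]
    for node in sorted({arc[0] for arc in arcs}):
        # Prune: only the best partial paths at this node are extended.
        survivors = heapq.nlargest(beam_width, beams[node])
        for src, dst, label, score in arcs:
            if src != node:
                continue
            for partial_score, text in survivors:
                beams[dst].append((partial_score * score, text + label))
    return heapq.nlargest(beam_width, beams[end])

if __name__ == "__main__":
    paths = beam_search(ARCS, 0, 4, beam_width=2)
    # The path scores would be discarded; only the distinct interpretations
    # are kept as the competing set {C_j}.
    print(sorted({text for _score, text in paths}))
```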

[0123] As indicated at block C of FIG. 16, the processor uses the well-known forward algorithm to compute the denominator D(I), which serves as the denominator part of the probability P(C_j|I) for each interpretation C_j in the set of competing interpretations {C_j}. This number is stored in the main data structure. The forward algorithm gives the exact value of the sum (over paths) of the products of the unnormalized r-scores of the arcs along each path. For the denominator part, the sum runs over all possible interpretations.

[0124] To compute the scores of the identified interpretations, as indicated at block D, the processor uses the forward algorithm to compute the numerator part N(C_j|I) of the probability of each competing string interpretation C_j. These numbers are stored in the main data structure. The forward algorithm gives the exact value of the sum (over paths) of the products of the unnormalized r-scores of the arcs along each path.

[0125] The sum computed by the forward algorithm is a sum over paths. One of these paths is the path identified by the beam search algorithm at block B, and it in fact produces the largest term in the sum. If the sum is considered to be adequately approximated by its largest term, there is no need to run the forward algorithm to evaluate the numerator; the result of the beam search algorithm suffices. This is called the "one-term sum" approximation.

[0126] However, the sum is not always adequately approximated by its largest term. It is therefore preferable to discard the scores computed by the beam search algorithm, retain the interpretations it identified, and evaluate the scores of the retained interpretations using the forward algorithm.

[0127] It is generally impossible to compute the numerator for every possible interpretation. This is why it is preferable to identify a relatively small set of interpretations at block B. Because of their large "one-term" scores, these interpretations are expected to have large numerators and hence large probabilities.

[0128] The foregoing describes the operation of the system after it has been trained. The training mode of the system is described next.

[0129] To obtain optimal performance, the string interpretation system of the present invention is provided with an automatic training mode of operation, in which the system can be trained automatically during one or more training sessions. This mode of operation is described in detail below with reference to block J of FIG. 2 and the system illustrated in FIG. 13.

[0130] As shown at block J of FIG. 2 and in FIG. 13, the string interpretation system of the present invention has a neural network parameter adjustment module 29. This module 29 interacts with both the graph 30 and the complex of neural computation networks 21 of the system shown in FIG. 13. In general, the training process of the present invention is based on the concept of supervised learning.

[0131] That is, for each image I* in the training set there is an interpretation C* assigned in advance. The neural network parameter adjustment module is designed to ensure that, as the images I* in the training set are processed, the expected (i.e., average) probability P(C*|I*) of the correct string interpretation increases, while the expected probability P(C|I) of each incorrect string interpretation decreases over the course of training.

[0132] In short, the purpose of the training mode, and hence of the neural network parameter adjustment module, is to maximize the probability of the correct string interpretation C while minimizing the average probability of incorrect interpretations. Because the logarithm function is steep near zero, log[P(C|I)] is chosen as the objective function.

[0133] As a result, the training process emphasizes low-scoring pixel patterns (i.e., image segments). This emphasis is desirable because these patterns are the most problematic and therefore the ones most in need of training. To implement the chosen objective function, the processor uses the gradient of the function, given by Equation 6 below.

[0134]

[Equation 6]

[0135] In Equation 6, W = W_1, W_2, ..., W_m is the m-dimensional neural network weight vector, and r_i = r_1, r_2, ..., r_n is the i-th n-dimensional r-score vector, produced as output by the i-th neural network. Typically the weight vector W has 10,000 or more components. The r-score vector has exactly 10 components for digit recognition. The dot product on the right side of the gradient expression denotes a sum over all components of r.
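
The equation image is not reproduced in this text. From the description of its two sides given in the specification (a gradient with respect to W on the left, and a dot product summed over the components of r on the right), the gradient expression can be written in chain-rule form as follows. The index names j and k are assumed here for illustration.

```latex
\frac{\partial \log P(C \mid I)}{\partial W}
  = \frac{\partial \log P(C \mid I)}{\partial r} \cdot \frac{\partial r}{\partial W},
\qquad
\frac{\partial \log P(C \mid I)}{\partial W_k}
  = \sum_{j=1}^{n} \frac{\partial \log P(C \mid I)}{\partial r_j}\,
    \frac{\partial r_j}{\partial W_k}, \quad k = 1, \dots, m
```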

[0136] In general, there is a gradient expression of this form for each neural network, that is, for each row of the alignment lattice. It is sometimes preferable to control two or more networks with the same weight vector W, in which case the gradient with respect to W includes a contribution from each such network. As shown in FIG. 13, the weight vector is stored in register 31, which supplies the same weight vector to each and every neural network in the system.

[0137] Before the training process of the multi-character recognizer described here begins, the neural network weight vector must be initialized. It can be initialized with random values drawn from some reasonable distribution, or with chosen values believed a priori to be particularly suitable.

[0138] In many cases it is preferable to detach the neural network from the alignment graph temporarily, as if it were to be used as a single-character recognizer, and to pretrain it on manually segmented images. The resulting weight vector values serve as the starting point for the multi-character recognizer training process described below.

[0139] The left side of the gradient expression is called the system sensitivity vector, because it is a gradient that gives information about the sensitivity of the output of the whole system to changes in the weight vector W. Each component of the system sensitivity vector pertains to the corresponding component of the weight vector.

[0140] In particular, if a given component of the system sensitivity vector is greater than zero, a small increase in the corresponding component of the weight vector increases the probability P(C|I) that the system assigns to the interpretation C for the image I. In short, the system sensitivity vector can be used to optimize the objective function described above.

[0141] To understand more deeply the principles underlying this training process, it is useful to appreciate the nature of the quantities that make up the gradient expression.

[0142] According to the above formula, the system sensitivity vector is computed as the dot product (vector-matrix product) of the two other quantities shown on the right side of the formula. The first of these quantities is the vector ∂log P/∂r, which gives information about the sensitivity of the graph output to changes in the r-scores r_1 ... r_n supplied as its input.

[0143] This can be regarded as the graph sensitivity vector. The second quantity is the n × m matrix ∂r/∂W, which gives information about the sensitivity of the neural network outputs to changes in the weight vector that controls all the neural networks.

[0144] These three quantities can be regarded as functionally related in the following way. During the processing of each training image I*, the evaluated neural network sensitivity matrix is used to transform the evaluated graph sensitivity vector into the evaluated system sensitivity vector. The individual components of the evaluated system sensitivity vector are then used to adjust the corresponding components of the weight vector, so that the objective function P(C*|I*) of the parameter adjustment module is optimized.

[0145] In theory, the system sensitivity vector can be obtained by evaluating numerically the quantities on the right side of the gradient expression and then performing the indicated mathematical operation. However, there is a simpler way to evaluate the system sensitivity vector operationally for each image/interpretation pair {I*, C*} during a training session.

[0146] As described below with reference to the flow charts of FIGS. 19 and 20, the well-known back-propagation ("Back-Prop") algorithm can be used to evaluate the system sensitivity vector in a computationally efficient manner, without the need to evaluate the neural network sensitivity matrix explicitly.

[0147] When the system of the present invention is operated in its training mode, the training process of FIGS. 19 and 20 is carried out for each image I* in the training set database, as indicated at block K of FIG. 2. Each image I* is associated with a known string interpretation C*. In general, a very large number (e.g., tens of thousands) of image/interpretation pairs {I*, C*} is used to train the system over the course of a particular training session.

[0148] As indicated at block B of the figure, each image I* is preprocessed in essentially the same way as during the interpretation process of the present invention. Likewise, as indicated at blocks C to E of FIG. 2, image segments and image consegmentations are created for the image I* in essentially the same way as during the interpretation process.

[0149] Then, as indicated at block F of FIG. 2, a graph model is created for the generated image consegmentations and the possible string interpretations associated with the image I*. At this stage of the training process, the training method of the present invention exploits the following facts.

[0150] First, each probability P(C*|I*) has a numerator part N(C*|I*) and a common denominator part D(I*). Second, using well-known properties of logarithms and derivatives, the graph sensitivity vector (that is, the partial derivative of log[P(C*|I*)] with respect to the r-score variables) can be re-expressed by Equation 7 below.

[0151]

[Equation 7]

[0152] The graph sensitivity vector, which appears on the left side of the equation, can readily be evaluated numerically by the procedure shown in FIGS. 19 and 20 and described below.
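
The equation image is not reproduced in this text. Since P(C*|I*) is the ratio of N(C*|I*) to the common denominator, written here as D(I*), taking the logarithm and differentiating with respect to the r-scores gives the following identity, which matches the verbal description of the graph sensitivity vector. The layout is a reconstruction and may differ from the original figure.

```latex
\log P(C^* \mid I^*) = \log N(C^* \mid I^*) - \log D(I^*)
\;\;\Longrightarrow\;\;
\frac{\partial \log P(C^* \mid I^*)}{\partial r}
  = \frac{1}{N(C^* \mid I^*)}\,\frac{\partial N(C^* \mid I^*)}{\partial r}
  - \frac{1}{D(I^*)}\,\frac{\partial D(I^*)}{\partial r}
```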

[0153] As indicated at block A of FIG. 19, the processor executes the forward algorithm to evaluate numerically the numerator part of the probability P(C*|I*) for the image/interpretation pair {I*, C*}, and likewise to evaluate its denominator part. These values are then stored. At this step of the process, the forward algorithm exploits the fact that the graph created for the image/interpretation pair {I*, C*} implicitly represents the analytical (i.e., algebraic) expressions used to express mathematically the numerator and denominator parts of the associated probability P(C*|I*).

[0154] At block B of FIG. 19, the processor executes the well-known Baum-Welch algorithm to evaluate numerically the partial derivatives of the numerator part of the probability P(C*|I*) with respect to the variables r. At block C, the processor uses the forward algorithm to compute the value of the denominator part of P(C*|I*).

[0155] At block D, the processor executes the well-known Baum-Welch algorithm to evaluate numerically the partial derivatives of the denominator part of P(C*|I*) with respect to the variables r. Then, at block E of FIG. 20, the processor uses the evaluated numerator and denominator parts and their partial derivatives to evaluate the graph sensitivity vector numerically in accordance with the equation above.
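
Blocks A to E can be illustrated with a forward-backward recursion, which is the core of the Baum-Welch procedure: the derivative of a sum of path products with respect to one arc score equals the forward sum into the arc times the backward sum out of it. The sketch below is an assumption-laden illustration (toy graph, state layout, and function names are hypothetical) and treats each arc score as an independent variable.

```python
import math
from collections import defaultdict

# Hypothetical toy alignment graph: arcs are (src, dst, label, r_score).
# Node ids are in topological order; label "" marks a glue arc.
ARCS = [
    (0, 1, "3", 0.6), (0, 1, "8", 0.3),
    (1, 3, "5", 0.5),
    (0, 2, "3", 0.4),
    (2, 3, "5", 0.7), (2, 3, "6", 0.9),
    (3, 4, "", 1.0),
]

def expand(arcs, target):
    """List lattice transitions (arc_index, src_state, dst_state).

    With target=None every arc is allowed. Otherwise a state also records how
    many characters of `target` are matched, and mismatching arcs are dropped.
    """
    transitions = []
    for idx, (src, dst, label, _score) in enumerate(arcs):
        if target is None:
            transitions.append((idx, (src, 0), (dst, 0)))
            continue
        for k in range(len(target) + 1):
            if label == "":
                transitions.append((idx, (src, k), (dst, k)))
            elif target.startswith(label, k):
                transitions.append((idx, (src, k), (dst, k + len(label))))
    return transitions

def sum_and_grad(arcs, start, end, target=None):
    """Return (S, dS/d(arc score)) where S sums path-score products."""
    transitions = sorted(expand(arcs, target), key=lambda t: t[1])
    final = (end, 0 if target is None else len(target))
    alpha = defaultdict(float)
    alpha[(start, 0)] = 1.0
    for idx, s, d in transitions:              # forward pass
        alpha[d] += alpha[s] * arcs[idx][3]
    beta = defaultdict(float)
    beta[final] = 1.0
    for idx, s, d in reversed(transitions):    # backward pass
        beta[s] += arcs[idx][3] * beta[d]
    grad = [0.0] * len(arcs)
    for idx, s, d in transitions:
        grad[idx] += alpha[s] * beta[d]
    return alpha[final], grad

def log_posterior(arcs, start, end, target):
    """log P(target | image) = log N - log D."""
    n, _ = sum_and_grad(arcs, start, end, target)
    d, _ = sum_and_grad(arcs, start, end)
    return math.log(n / d)

def graph_sensitivity(arcs, start, end, target):
    """d log P / d(arc score) = (1/N) dN/dr - (1/D) dD/dr, per arc."""
    n, dn = sum_and_grad(arcs, start, end, target)
    d, dd = sum_and_grad(arcs, start, end)
    return [a / n - b / d for a, b in zip(dn, dd)]

if __name__ == "__main__":
    print(graph_sensitivity(ARCS, 0, 4, "35"))
```

For the target "35", the component for the first "3" arc is positive and the component for the competing "8" arc is negative, so raising the former or lowering the latter increases the probability of the target interpretation.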

[0156] To evaluate efficiently the system sensitivity vector for the image/interpretation pair {I*, C*}, as indicated at block F of FIG. 20, the training process sets the output-layer gradient vector of each neural network equal to the corresponding components of the evaluated graph sensitivity vector.

[0157] Then, at block G, the processor uses the back-propagation algorithm to compute the components of the system sensitivity vector in accordance with the equation above. A detailed description of the back-propagation procedure used to compute the desired result is given in Denker et al., "Automatic Learning, Rule Extraction, and Generalization", cited above.

[0158] The back-propagation algorithm is not used to evaluate the neural network sensitivity matrix explicitly; rather, it is used to evaluate the vector-matrix product of the graph sensitivity vector and the neural network sensitivity matrix.
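
The idea that back-propagation yields the vector-matrix product directly, without forming the sensitivity matrix, can be shown on a deliberately tiny example. The one-layer sigmoid network below is a stand-in chosen for brevity and is not the LeNet architecture; all names and values are hypothetical.

```python
import math

def forward(weights, x):
    """One-layer toy network: r_j = sigmoid(sum_k W[j][k] * x[k])."""
    return [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, x))))
            for row in weights]

def backprop(weights, x, upstream):
    """Return d(objective)/dW given upstream = d(objective)/dr.

    The matrix dr/dW is never built: the upstream signal placed on each
    output unit is pushed back through that unit's nonlinearity and then
    distributed over the weights feeding the unit.
    """
    r = forward(weights, x)
    delta = [g * rj * (1.0 - rj) for g, rj in zip(upstream, r)]
    return [[dj * xk for xk in x] for dj in delta]

if __name__ == "__main__":
    W = [[0.1, -0.2], [0.3, 0.4]]     # hypothetical weights
    x = [1.0, 2.0]                    # hypothetical input
    graph_sens = [0.5, -1.5]          # stands in for d log P / d r
    print(backprop(W, x, graph_sens)) # stands in for d log P / d W
```

Here `upstream` plays the role of the output-layer gradient vector that is set equal to the graph sensitivity vector, and the returned list plays the role of the system sensitivity vector.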

[0159] The result is an explicit evaluation of the overall system sensitivity vector, which indicates the useful direction in which to update each component of each neural network's weight vector. As indicated at block H, after each image I* has been processed, the processor uses the individual components of the evaluated system sensitivity vector to update the individual components of the weight vector. The preferred update procedure is described below.

[0160] Before the update, the i-th component of the weight vector is denoted W_i; after the update, it is denoted W_i'. After each image I^* is processed, the weight vector is updated according to Equation 8 below.

【０１６１】[0161]

【数８】 (Equation 8)

[0162] In Equation 8, δ_i is the "step size control parameter", W_i' is the i-th component of the updated weight vector, and ∂log P(C^*/I^*)/∂W_i is the partial derivative of log P(C^*/I^*) with respect to W_i. In principle, a different step size control parameter δ_i may exist for each component of the weight vector, but in practice it is preferable to set them all equal.
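The body of Equation 8 is not reproduced in this text. A form consistent with the definitions given in this paragraph is sketched below. It assumes that the update is a gradient-ascent step on the log probability of the correct interpretation, so the plus sign and the explicit P are assumptions rather than a quotation of the original equation.

```latex
W_i' = W_i + \delta_i \, \frac{\partial \log P(C^* / I^*)}{\partial W_i}
```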

[0163] In general, the value of the step size control parameter depends on (i) the normalization factor chosen for the pixel inputs to the neural network and (ii) the normalization factor chosen for the intermediate values of the neural network (that is, the outputs passed from one layer of the network to the next), and it can be re-estimated during learning.

[0164] In short, two main concerns arise when choosing an appropriate value for the step size control parameter. If the chosen value is too small, the weight vector w converges to its optimum value very slowly. If the chosen value is too large, the learning process is very likely to overshoot the optimum value of w. This phenomenon in the weight space W is called "oscillatory divergence". It tends to degrade the overall quality of system performance and can cause the learning procedure to fail completely.
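Both failure modes can be seen on a toy problem. The sketch below assumes a hypothetical one-dimensional objective f(w) = -(w - 3)^2 in place of the log probability; the function name `ascend` and the three step values are chosen only for illustration.

```python
def ascend(step, w0=0.0, n_iter=20):
    """Gradient ascent on f(w) = -(w - 3)^2, whose maximum is at w = 3."""
    w = w0
    trace = [w]
    for _ in range(n_iter):
        grad = -2.0 * (w - 3.0)   # df/dw
        w = w + step * grad        # step plays the role of the step size control parameter
        trace.append(w)
    return trace

too_small = ascend(step=0.001)   # creeps toward 3 very slowly
reasonable = ascend(step=0.1)    # settles close to 3
too_large = ascend(step=1.1)     # overshoots 3 on every step, by a growing amount
```

With the large step, successive iterates land on alternating sides of the optimum and move farther away each time, which is the oscillating, diverging behavior the paragraph warns about.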

[0165] The learning process described above is repeated for each image/interpretation pair {I^*, C^*} in the learning set. As the system in learning mode processes more and more learning data, the values of the individual components of the neural network weight vectors converge toward optimum values that satisfy the objective function governing the learning process of the present invention. There is no need to run a beam search algorithm or the Viterbi algorithm during the learning process.

[0166] Once the learning process has produced a satisfactory weight vector, the system can perform its recognition and scoring tasks without further reference to the learning database. This means that learning can be done "in the lab", while recognition and scoring can be done "in the field".

[0167] A system deployed in the field need not have facilities for storing the learning database or the learning algorithm. In some cases, however, it may be desirable for the deployed system to be capable of re-learning or incremental learning. In such cases, facilities for storing selected learning examples may be needed.

[0168] In particular, for a "personal" recognizer such as the one shown in FIG. 18, system performance can be maximized by retraining the system, based on the specific examples the recognizer supplies, so that it adapts to the idiosyncrasies of a single user or a small group of users.

[0169] When the method and system of the present invention are implemented in a portable handwriting recognition device, bitmapped images of words, digit strings, and the like that have been confirmed by the user are preferably stored in a non-volatile memory structure within the device. The function of this memory structure is to store both the bitmapped and the ASCII-formatted information corresponding to the image/interpretation pairs {I^*, C^*}. Over the period of use of the device, a learning data set is built up from the information collected in this way.

[0170] Once the learning data set is of sufficient size, the portable device can be operated in its "learning mode". After each image/interpretation pair {I^*, C^*} is reprocessed, the individual components of the weight vector are incrementally adjusted in a manner that satisfies the objective function described above.

[0171] Many additional embodiments of the present invention can readily be constructed. For example, instead of a preprocessed image derived from image information, the input to the system may be a preprocessed image derived from pen stroke information, or a list (not in image form) derived from stroke information. In another example, the input may consist of preprocessed information derived from an audio signal (for example, speech).

[0172] Similarly, other forms of output can be realized. The output symbols can represent not only digits but also alphabetic characters, phonemes, whole words, abbreviation symbols, or groups of these. It is easy to imagine applications such as decoding and error correction of coded symbols transmitted over a noisy communication channel.

[0173] In another embodiment, the function performed by the complex of neural networks can be performed by any device that can (1) receive an input; (2) produce, according to a set of parameters, an output that can be interpreted as a score or a vector of scores; and (3) adjust its parameters, given a derivative vector, in such a way as to change the output in the direction specified by the derivative vector.

[0174] The function performed by the "alignment graph" can be performed by a conventional dynamic programming lattice, or by any device that processes sequence information in the required manner. In particular, such a device (1) receives scores describing the various entities that may be part of a sequence; (2) efficiently identifies various high-scoring sequences and their corresponding interpretations; (3) efficiently computes the total score for all sequences consistent with a given interpretation; and (4) efficiently computes the sensitivity of its results to the input scores.
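Functions (2) and (3) can be sketched with a small dynamic programming lattice. The example below is a simplified illustration, not the alignment graph of the invention: the three-cell lattice, the segment scores, and the function names are all hypothetical. A segment (i, j) spans cells i through j-1, a path score is the product of the arc scores along the path, and function (4), the sensitivity computation, is omitted.

```python
# Hypothetical toy lattice: 3 image cells, 5 candidate segments,
# each carrying illustrative recognition scores per symbol.
SEG_SCORES = {
    (0, 1): {"1": 0.6, "7": 0.2},
    (1, 2): {"1": 0.3, "0": 0.1},
    (2, 3): {"0": 0.5, "6": 0.2},
    (0, 2): {"4": 0.4, "1": 0.1},
    (1, 3): {"0": 0.7},
}
N_CELLS = 3

def total_score(n_cells, seg_scores):
    """Sum of path scores over every consegmentation and every symbol string."""
    alpha = [0.0] * (n_cells + 1)
    alpha[0] = 1.0
    for j in range(1, n_cells + 1):
        for (i, end), scores in seg_scores.items():
            if end == j:
                alpha[j] += alpha[i] * sum(scores.values())
    return alpha[n_cells]

def interpretation_score(n_cells, seg_scores, symbols):
    """Sum of path scores over every consegmentation that spells `symbols`."""
    # alpha[p][j]: total score of reading symbols[:p] from cells[:j]
    alpha = [[0.0] * (n_cells + 1) for _ in range(len(symbols) + 1)]
    alpha[0][0] = 1.0
    for p, sym in enumerate(symbols, start=1):
        for (i, j), scores in seg_scores.items():
            alpha[p][j] += alpha[p - 1][i] * scores.get(sym, 0.0)
    return alpha[len(symbols)][n_cells]

def best_path(n_cells, seg_scores):
    """Viterbi search: score of the best single path and the string it spells."""
    best = [(0.0, "")] * (n_cells + 1)
    best[0] = (1.0, "")
    for j in range(1, n_cells + 1):
        for (i, end), scores in seg_scores.items():
            if end == j:
                for sym, s in scores.items():
                    cand = (best[i][0] * s, best[i][1] + sym)
                    if cand[0] > best[j][0]:
                        best[j] = cand
    return best[n_cells]

denominator = total_score(N_CELLS, SEG_SCORES)
best_score, best_string = best_path(N_CELLS, SEG_SCORES)
numerator = interpretation_score(N_CELLS, SEG_SCORES, best_string)
posterior = numerator / denominator  # normalized score of the best interpretation
```

In this toy lattice the string "10" is spelled by two different consegmentations, so its total score is larger than the score of its single best path; dividing by the total over all paths gives a normalized score between zero and one.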

[0175] The number of modules in the processing chain can also be two or more. Each module must have (i) a sensitivity output (if a preceding module has adjustable parameters), (ii) a sensitivity input (if this module or a preceding module has adjustable parameters), and (iii) the usual data input and data output.
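The module requirements listed above can be sketched as a chain of objects that pass data forward and sensitivities backward. The classes `Gain` and `Square` and the helper functions are hypothetical and stand in for arbitrary modules. Note that `Square` has no adjustable parameter of its own but still needs a sensitivity input and output, because a module ahead of it in the chain does.

```python
class Gain:
    """Hypothetical module: y = a * x, with one adjustable parameter a."""
    def __init__(self, a):
        self.a = a
        self.grad_a = 0.0

    def forward(self, x):           # data input -> data output
        self.x = x
        return self.a * x

    def backward(self, dy):         # sensitivity input -> sensitivity output
        self.grad_a = dy * self.x   # sensitivity with respect to this module's parameter
        return dy * self.a          # sensitivity handed to the preceding module


class Square:
    """Hypothetical parameter-free module: y = x^2."""
    def forward(self, x):
        self.x = x
        return x * x

    def backward(self, dy):
        return dy * 2.0 * self.x


def run_forward(chain, x):
    for module in chain:
        x = module.forward(x)
    return x


def run_backward(chain, dy=1.0):
    for module in reversed(chain):
        dy = module.backward(dy)
    return dy


chain = [Gain(2.0), Square(), Gain(0.5)]
y = run_forward(chain, 3.0)   # 0.5 * (2 * 3)^2
dx = run_backward(chain)      # sensitivity of y to the chain input
```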

[0176] The probabilities described here need not be represented in the processor and memory as numbers between zero and one. For example, it may be preferable to store a probability as a log probability, in the range between some large negative number and zero, and to adapt accordingly the computational steps that describe series and parallel combinations of probabilities.
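One common way to adapt the combination steps is sketched below, under the assumption that a series combination multiplies probabilities along a path and a parallel combination adds probabilities over alternative paths. The function names and the numeric values are illustrative only.

```python
import math

def series(*log_ps):
    """Probabilities multiplied along a path: add their logarithms."""
    return sum(log_ps)

def parallel(*log_ps):
    """Probabilities summed over alternative paths: stable log-sum-exp."""
    m = max(log_ps)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(lp - m) for lp in log_ps))

# Two alternative two-arc paths with illustrative arc probabilities.
path_a = series(math.log(0.6), math.log(0.7))
path_b = series(math.log(0.1), math.log(0.5))
both = parallel(path_a, path_b)     # log of (0.6 * 0.7 + 0.1 * 0.5)

# Values this small underflow to zero as plain probabilities,
# but remain finite and usable as log probabilities.
tiny = parallel(-1000.0, -1001.0)
```

Subtracting the maximum before exponentiating keeps the parallel combination accurate even when every individual probability is too small to represent directly.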

[0177] The system and method of the present invention can be used to interpret input symbolic representations. Such input symbolic representations may be expressed in a wide variety of media, for example: electrically passive (graphical) recording media such as paper, wood, and glass; electrically active recording media such as pressure-sensitive writing surfaces and touch-screen writing and display surfaces; audio recording media such as human speech and machine-generated speech; and media such as air, in which pen strokes waved through the air are encoded by an electrically active non-contact method such as RF position sensing, optical position sensing, or capacitive position sensing. They may then be transmitted, stored, and/or recognized using the system and method of the present invention. In such applications, the symbol sequence need not be expressed graphically on a surface; it need only be expressed.

[0178] The system and method of the present invention can also be used in conventional speech recognition systems. In such an application, for example, the input data set is a recorded speech utterance (that is, a speech signal) represented in the time domain. According to the present invention, the recorded utterance is divided into small speech samples (for example, speech cells), each of very short duration. Each speech cell is preprocessed and divided into velocity cells.

[0179] The speech cells are then combined to form "speech segments". Each segment contains spectral information representing at least one phoneme in the utterance. These speech segments are then combined to form consegmentations, which are represented using the acyclic graph of the present invention. Then, using these consegmentations and the set of all possible phoneme-string interpretations, the system and method of the present invention compute the a posteriori probability of the highest-scoring phoneme-string interpretation. The details of this speech recognition process will be apparent to those skilled in the art of speech recognition.

【０１８０】[0180]

[Effects of the Invention] As described above, the present invention provides an excellent method and system for interpreting input symbolic representations, such as character strings expressed or recorded on a medium using printed or cursive writing techniques. According to the present invention, a posteriori probabilities are used to select the best character string interpretation; each a posteriori probability is derived inductively by combining a priori information with the pixel images of known examples, and character strings of arbitrary length can be interpreted accurately.

[Brief description of drawings]

FIG. 1 is a system block diagram showing the various components used to implement a character string interpretation system according to one embodiment of the present invention.

FIG. 2 is a block diagram of the character string interpretation system of the present invention.

FIG. 3 is a view of a preprocessed image of a ZIP code handwritten using a cursive writing technique.

FIG. 4 is a view of the preprocessed image of the ZIP code of FIG. 3, on which is superimposed a series of cut lines generated during the image cell generation stage of the character string interpretation method of the present invention.

FIG. 5 is a view of the preprocessed image of the ZIP code of FIG. 3, on which is superimposed a series of cut lines generated during the image cell generation stage of the character string interpretation method of the present invention.

FIG. 6 is a view of the preprocessed image of the ZIP code of FIG. 3, on which is superimposed a series of cut lines generated during the image cell generation stage of the character string interpretation method of the present invention.

FIG. 7 is a view of the preprocessed image of the ZIP code of FIG. 3, on which is superimposed a series of cut lines generated during the image cell generation stage of the character string interpretation method of the present invention.

FIG. 8 is a view of the preprocessed image of the ZIP code of FIG. 3, on which is superimposed a series of cut lines generated during the image cell generation stage of the character string interpretation method of the present invention.

FIG. 9 is a table of the image "cells" (that is, sub-images) formed between the cut lines shown in FIGS. 4 to 8.

FIG. 10 is a table of the image "segments" formed by combining adjacent image cells shown in FIGS. 4 to 9.

FIG. 11 is a table showing three of the many legal image "consegmentations" formed by combining sets of the spatially contiguous image segments shown in FIG. 10.

FIG. 12 is a schematic diagram graphically depicting the novel data structure of the present invention, which is used to graphically represent the image segments, the possible image consegmentations formed from them, the possible character string interpretations, and the scores assigned to the possible character string interpretations.

FIG. 13 is a schematic diagram of the character string interpretation system of the present invention, suitably configured to recognize a ZIP code image that has been parsed into eleven image segments.

FIG. 14 is a high-level flow chart illustrating the steps performed in the method of interpreting a character string according to the present invention; it is to be read together with FIG. 15 below.

FIG. 15 is a high-level flow chart illustrating the steps performed in the method of interpreting a character string according to the present invention; it is to be read together with FIG. 14 above.

FIG. 16 is a high-level flow chart illustrating the steps performed in the method of interpreting a character string according to the present invention; it is to be read together with FIG. 17 below.

FIG. 17 is a high-level flow chart illustrating the steps performed in the method of interpreting a character string according to the present invention; it is to be read together with FIG. 16 above.

FIG. 18 is a schematic perspective view of a hand-held embodiment of the character string interpretation system of the present invention.

FIG. 19 is a high-level flow chart illustrating the steps performed in the method of training the character string interpretation system of the present invention; it is to be read together with FIG. 20 below.

FIG. 20 is a high-level flow chart illustrating the steps performed in the method of training the character string interpretation system of the present invention; it is to be read together with FIG. 19 above.

[Explanation of symbols]

1 Symbol sequence interpretation system of the present invention
2 Processor
3 Program storage memory
4 Data storage memory
5 Image acquisition device
7 Frame buffer
8 Mass storage memory
9 Visual display device
10 Keyboard
11 Pointing device (mouse)
12 Input/output device
13 System interface
14 Host system
15 System bus

Continuation of front page (72) Inventor: John Stewart Denker, 6 Coosman Drive, Leonardo, New Jersey 07737, USA

Claims

[Claims]

1. A system for analyzing an input symbolic representation and scoring possible interpretations of the input symbolic representation, comprising:
segment generating means for analyzing an input data set representing the input symbolic representation and dividing the input data set into a plurality of segments, each segment having a definable boundary and being classifiable as representing any one of a plurality of symbols in a predetermined symbol set;
segment scoring means for analyzing each segment of the plurality of segments and assigning a score to each possible classification of the segment, each score being associated with a particular symbol in the predetermined symbol set;
display means for representing a plurality of possible interpretations of the input symbolic representation and a plurality of image consegmentations, each possible interpretation consisting of a different sequence of the symbols and each consegmentation consisting of a different sequence of the segments;
consegmentation scoring means for assigning scores to the plurality of consegmentations based on the scores assigned to the segments;
candidate interpretation identifying means for identifying one or more candidate symbol interpretations from the plurality of possible interpretations based on the scores assigned to the plurality of segments;
symbol sequence scoring means for assigning a score to the one or more candidate interpretations based on the scores assigned to one or more of the plurality of segments;
first score evaluating means for evaluating the scores assigned to the one or more candidate interpretations;
second score evaluating means for evaluating the scores assigned to the plurality of possible interpretations; and
normalized score generating means for generating a normalized score for each candidate interpretation using the evaluated scores for the plurality of possible interpretations.

2. The system of claim 1, wherein the input data set comprises a series of pixels associated with an acquired image of a graphically represented symbol sequence, and the segment generating means analyzes the series of pixels and divides it into a plurality of image segments, whereby each image segment has a definable boundary and can be classified as representing any one or more of a plurality of characters in a predetermined character set.

3. The system of claim 2, wherein the segment scoring means analyzes each image segment of the plurality of image segments and then assigns a score to each possible classification of the image segment, each assigned score being associated with a particular character in the predetermined character set.

4. The system of claim 3, wherein the display means represents a plurality of possible character sequences and a plurality of image consegmentations, each possible character sequence consisting of a sequence of the characters and each image consegmentation consisting of a sequence of the image segments.

5. The system of claim 4, wherein the consegmentation scoring means assigns scores to the plurality of image consegmentations based on the scores assigned to the image segments, and the candidate symbol sequence identifying means identifies one or more candidate character sequences based on the scores assigned to the image segments.

6. The system of claim 5, wherein the symbol sequence scoring means assigns scores to the one or more candidate character sequences based on the scores assigned to the image segments, and the first score evaluating means evaluates the scores assigned to the one or more candidate character sequences.

7. The system of claim 6, wherein the second score evaluating means evaluates the scores assigned to the plurality of possible character sequences, and the score normalizing means uses the evaluated scores for the plurality of possible character sequences to normalize the scores assigned to the candidate character sequences.

8. The system of claim 7, wherein the display means comprises a data structure that can be represented by a graph consisting of a two-dimensional array of nodes arranged in columns and rows and selectively connected by directed arcs; each node column is indexed by one character position; each node row is indexed by one of the image segments, in an order corresponding to the spatial structure of the acquired image; each path that passes through the nodes along the directed arcs represents one of the image consegmentations and one of the possible character sequences; and substantially all of the image consegmentations and substantially all of the possible character sequences are represented by the series of paths extending through the graph.

9. The system of claim 8, wherein each node further comprises a series of recognition arcs, each recognition arc representing one of the characters and being associated with one of the assigned scores.

10. The system of claim 1, wherein said display means is indicative of said plurality of possible interpretations and said plurality of image consegmentations.

11. The system of claim 10, wherein the display means comprises a data structure that can be represented by a graph consisting of a two-dimensional array of nodes arranged in columns and rows and selectively connected by directed arcs; each node column is indexed by one symbol position; each node row is indexed by one of the segments, in an order generally corresponding to the sequential structure of the input data set; each path that passes through the nodes along the directed arcs represents one of the consegmentations and one of the possible interpretations of the input symbolic representation; and substantially all of the consegmentations and substantially all of the possible interpretations are represented by the series of paths extending through the graph.

12. The display means has a data structure that can be displayed by a graph that is arranged in columns and rows and that is composed of a two-dimensional node array that is selectively connected by directed arcs. Pointed to by one symbolic position, each said node row being pointed by one said segment in an order generally corresponding to the sequential structure of said input data set, passing through said node and along said directed arc. 3. Each path extending represents one said segmentation and one said possible interpretation for said input symbolic representation, all said consegmentations and all said possible interpretations being indicated by a series of paths extending through said graph. system.

13. A method of generating an interpretation of an input symbolic representation, the input symbolic representation being expressed in a medium, the interpretation being a sequence of symbols, and each symbol being an element of a predetermined symbol set, the method comprising:
(a) acquiring an input data set representing the input symbolic representation;
(b) processing the input data set to generate a series of segments, each segment being at least a partial subset of the acquired input data set and being classifiable as representing any one symbol in the predetermined symbol set;
(c) generating a data structure representing a series of consegmentations and a series of possible interpretations of the input symbolic representation, wherein each consegmentation consists of a series of the segments that collectively represent the input data set and are arranged in an order that generally preserves the sequential structure of the input data set; each possible interpretation of the input symbolic representation consists of a possible symbol sequence, each symbol in the possible symbol sequence being selected from the predetermined symbol set and occupying a symbol position in the possible symbol sequence; the data structure can be represented graphically by a graph consisting of a two-dimensional array of nodes arranged in columns and rows and selectively connected by directed arcs; each node column is indexed by one of the symbol positions; each node row is indexed by one of the image segments, in an order corresponding to the logical structure of the acquired input data set; each path that passes through the nodes along the directed arcs represents one of the consegmentations and one of the possible interpretations of the input symbolic representation; and all of the consegmentations and all of the possible interpretations of the input symbolic representation are represented by the series of paths extending through the graph;
(d) generating, for each node row in the graph, a series of scores for the predetermined symbol set represented by each node in the row, wherein generating the series of scores includes analyzing the segment that indexes the node row;
(e) implicitly or explicitly assigning a path score to each path through the graph; and
(f) analyzing the path scores assigned in step (e) to the paths through the graph in order to select one or more possible interpretations of the input symbolic representation.

14. The method of claim 13, wherein each node further comprises a series of recognition arcs, each recognition arc representing one of the characters and being associated with one of the scores generated in step (d).

15. The method of claim 14, wherein step (d) comprises using a plurality of adjustable parameters to generate the series of scores.

16. The method of claim 15, wherein information processing means characterized by the plurality of adjustable parameters is used in step (d) to analyze each segment and to generate the series of scores for that segment.

17. The method of claim 14, wherein step (f) comprises computing a quantity corresponding to an a posteriori probability for at least one of the possible interpretations of the input symbolic representation, wherein each quantity is computed as the ratio of a numerator part to a denominator part; the numerator part corresponds to the sum of the path scores for substantially all paths through the graph that represent one of the possible interpretations of the input symbolic representation, each path score corresponding to the product of the scores associated with the recognition arcs along one of the paths; and the denominator part corresponds to the sum of the path scores for substantially all paths through the graph that represent substantially all of the possible interpretations of the input symbolic representation, each path score corresponding to the product of the scores associated with the recognition arcs along one of the paths.

18. The method of claim 17, wherein step (f) further comprises: (1) determining the path through the graph having the highest path score; (2) identifying the possible interpretation of the input symbolic representation represented by the path determined in substep (1); (3) computing the quantity for the possible interpretation of the input symbolic representation identified in substep (2); and (4) providing as output the quantity computed in substep (3) and an indication of the possible interpretation of the input symbolic representation identified in substep (2).

19. The method of claim 17, wherein step (f) further comprises: (1) determining a series of paths through the graph having a series of high path scores; (2) identifying the series of possible interpretations of the input symbolic representation represented by the series of paths determined in substep (1); (3) computing a series of the quantities for the series of possible interpretations of the input symbolic representation identified in substep (2); (4) analyzing the series of quantities computed in substep (3) to determine which of the possible interpretations of the input symbolic representation has a high-scoring a posteriori probability; and (5) providing as output an indication of the possible interpretation of the input symbolic representation identified in substep (2) and an indication of the high-scoring a posteriori probability determined in substep (4).

20. The method of claim 17, wherein each a posteriori probability is computed as the ratio of the numerator part to the denominator part, and step (f) further comprises: (1) determining a series of paths through the graph having a series of high path scores; (2) identifying the series of possible interpretations of the input symbolic representation represented by the series of paths determined in substep (1); (3) computing a series of the quantities for the series of possible interpretations of the input symbolic representation identified in substep (2); and (4) providing as output the series of possible interpretations of the input symbolic representation identified in substep (2) and the series of quantities computed in substep (3).

21. The method of claim 15, wherein in step (d) the series of adjustable parameters specifies the relationship between the segment supplied to the information processing means for analysis and the series of scores generated by the information processing means.

22. The method of claim 21, further comprising training the information processing means by (1) processing a large number of known symbol sequences using the information processing means, and (2) incrementally adjusting the set of adjustable parameters for each known sequence, thereby increasing the average probability assigned to the correct interpretations and decreasing the average probability assigned to the incorrect interpretations.

23. The method of claim 22, wherein said information processing means comprises a neural information processing network.

24. The method of claim 13, wherein the input symbolic representation is displayed using a printing or cursive writing technique and is graphically recorded on a recording medium.

25. A system for generating an interpretation of an input symbolic representation, the input symbolic representation being represented in a medium, the interpretation being a sequence of symbols, each symbol being an element of a predetermined symbol set, the system comprising:
(a) data set acquisition means for acquiring an input data set representing the input symbolic representation;
(b) data processing means for processing the acquired input data set to generate a plurality of segments, wherein each segment has a specifiable boundary and can be classified as representing any one of a plurality of symbols in the predetermined symbol set;
(c) consegmentation specifying means for generating data specifying a series of consegmentations, wherein each consegmentation collectively represents the acquired input data set and consists of a series of the segments arranged in an order that generally preserves the sequential structure of the acquired input data set;
(d) symbol sequence interpretation designating means for generating data designating a series of possible interpretations of the input symbolic representation, wherein each possible interpretation consists of a possible sequence of symbols, and each symbol in the possible sequence of symbols is selected from the predetermined symbol set and occupies a symbol position within the possible sequence of symbols;
(e) data storage means for storing, in a data structure, the generated data representing each consegmentation and each possible interpretation of the input symbolic representation, wherein the data structure is graphically represented by a graph consisting of a two-dimensional array of nodes arranged in columns and rows and selectively connected by directed arcs, each column of nodes can be indexed by one of the symbol positions, each row of nodes can be indexed by one of the image segments in an order corresponding to the sequential structure of the acquired input data set, each path extending through the nodes and along the directed arcs represents one of the consegmentations and one of the possible interpretations of the input symbolic representation, and the set of consegmentations and the set of possible interpretations of the input symbolic representation are represented by the set of paths extending through the graph;
(f) segment analysis means for analyzing the data in each segment and generating, for each row of nodes in the graph, a series of scores for the symbol set represented by each node in the row;
(g) path score calculating means for calculating a path score for each path through the graph; and
(h) path score analyzing means for analyzing the calculated path scores in order to select one or more of the possible interpretations of the input symbolic representation.

26. The system of claim 25, wherein each node further comprises a series of recognition arcs, each recognition arc representing one of the known symbols and associated with one of the calculated scores.

27. The system of claim 26, wherein said path score analysis means further comprises means for calculating a quantity corresponding to an inductive probability of each of said possible interpretations of said input symbolic representation.

28. The system of claim 27, wherein each said quantity is calculated as a ratio of a numerator part to a denominator part, the numerator part corresponding to the sum of the path scores for substantially all paths through the graph that represent one said possible interpretation of said input symbolic representation, each path score corresponding to the product of the scores associated with the recognition arcs along one of the paths, and the denominator part corresponding to the sum of the path scores for substantially all paths through the graph that represent substantially all of the possible interpretations of the input symbolic representation, each path score corresponding to the product of the scores associated with the recognition arcs along one of the paths.
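
The ratio recited in claim 28 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the segment identifiers, the scores, and the two consegmentations are hypothetical toy values, and every path is enumerated by brute force rather than through a graph data structure. Each path score is the product of the recognition scores along the path, the numerator sums the path scores that spell one interpretation, and the denominator sums the path scores of all paths.

```python
from itertools import product
from math import prod

# Hypothetical recognition scores: segment id -> {symbol: score}.
SEGMENT_SCORES = {
    "a":  {"1": 0.9, "7": 0.1},
    "bc": {"0": 0.8, "6": 0.2},
    "ab": {"1": 0.3, "4": 0.7},
    "c":  {"0": 0.5, "1": 0.5},
}
# Two hypothetical consegmentations of the same input.
CONSEGMENTATIONS = [("a", "bc"), ("ab", "c")]


def enumerate_paths(consegmentations, segment_scores):
    """Yield (interpretation, path_score) for every path in the toy graph."""
    for conseg in consegmentations:
        # One symbol choice per segment defines one path.
        for symbols in product(*(segment_scores[seg] for seg in conseg)):
            score = prod(segment_scores[seg][sym]
                         for seg, sym in zip(conseg, symbols))
            yield "".join(symbols), score


def normalized_quantities(consegmentations, segment_scores):
    """Return {interpretation: numerator / denominator}."""
    numerators = {}
    for interpretation, score in enumerate_paths(consegmentations, segment_scores):
        numerators[interpretation] = numerators.get(interpretation, 0.0) + score
    denominator = sum(numerators.values())  # sum over all paths
    return {interp: num / denominator for interp, num in numerators.items()}


quantities = normalized_quantities(CONSEGMENTATIONS, SEGMENT_SCORES)
```

In this toy data the interpretation "10" is reached through both consegmentations, so its numerator collects two path scores, and the quantities over all interpretations sum to one.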

29. The system of claim 25, further comprising: (1) means for determining the path through the graph having the highest path score; (2) means for identifying the possible interpretation of the input symbolic representation represented by the determined path having the highest path score; (3) means for calculating the quantity for each of the possible interpretations of the input symbolic representation; and (4) means for providing as output an indication of the calculated quantity and of the possible interpretation of the input symbolic representation.

30. The system of claim 29, wherein the path score analysis means further comprises: (1) means for determining a series of paths through the graph having the highest path scores; (2) means for identifying a set of possible interpretations of the input symbolic representation represented by the determined series of paths; (3) means for calculating the series of quantities for the identified set of possible interpretations of the input symbolic representation; (4) means for analyzing the calculated series of quantities to determine which of the possible interpretations of the input symbolic representation has the highest inductive probability; and (5) means for providing, as an output, an indication of the possible interpretation of the input symbolic representation having the highest inductive probability and of the determined highest inductive probability.
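
The selection procedure of claim 30 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the scores, segment identifiers, and the choice of `k` are hypothetical, paths are enumerated by brute force, and `heapq.nlargest` stands in for whatever search finds the highest-scoring paths. The sketch keeps the `k` best paths, collects the interpretations they spell, computes the normalized quantity for each candidate, and returns the candidate with the largest quantity.

```python
import heapq
from itertools import product
from math import prod

# Hypothetical recognition scores: segment id -> {symbol: score}.
SEGMENT_SCORES = {
    "a":  {"1": 0.45, "7": 0.55},
    "bc": {"0": 0.9, "6": 0.1},
    "ab": {"1": 0.5, "4": 0.5},
    "c":  {"0": 0.6, "9": 0.4},
}
CONSEGMENTATIONS = [("a", "bc"), ("ab", "c")]


def enumerate_paths(consegmentations, segment_scores):
    """Yield (interpretation, path_score) for every path in the toy graph."""
    for conseg in consegmentations:
        for symbols in product(*(segment_scores[seg] for seg in conseg)):
            score = prod(segment_scores[seg][sym]
                         for seg, sym in zip(conseg, symbols))
            yield "".join(symbols), score


def select_interpretation(consegmentations, segment_scores, k):
    """Pick, among the interpretations of the k best paths, the most probable one."""
    paths = list(enumerate_paths(consegmentations, segment_scores))
    top_paths = heapq.nlargest(k, paths, key=lambda p: p[1])      # substep (1)
    candidates = {interp for interp, _ in top_paths}              # substep (2)
    denominator = sum(score for _, score in paths)
    quantities = {                                                # substep (3)
        cand: sum(score for interp, score in paths if interp == cand) / denominator
        for cand in candidates
    }
    best = max(quantities, key=quantities.get)                    # substep (4)
    return best, quantities[best]                                 # substep (5)


best_path_interpretation = max(
    enumerate_paths(CONSEGMENTATIONS, SEGMENT_SCORES), key=lambda p: p[1])[0]
best, quantity = select_interpretation(CONSEGMENTATIONS, SEGMENT_SCORES, k=2)
```

With these toy values the single best path spells "70", but "10" is reached by two paths whose scores add up, so the normalized quantity ranks "10" first. This is the situation in which ranking by the quantity differs from ranking by the best path alone.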

31. The system of claim 27, wherein the segment analysis means comprises a set of adjustable parameters that specify a relationship between the segments supplied to the information processing means for analysis and the set of scores generated by the information processing means.

32. The system of claim 31, further comprising system learning means for training the system using a plurality of learning data sets, wherein each learning data set includes an acquired data set of an input symbolic representation and a known correct interpretation of that input symbolic representation, the system learning means including parameter adjustment means for incrementally adjusting the adjustable parameters so as to increase the average interpretation measure for the known correct interpretations and decrease the average interpretation measure for the set of known incorrect interpretations.
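
One incremental parameter adjustment of the kind recited in claim 32 might look like the Python sketch below. Several things here are assumptions made only for illustration and are not taken from the claims: each score is modeled as the exponential of one adjustable parameter, the objective is the logarithm of the normalized quantity of the known correct interpretation, and the gradient is estimated by finite differences. All names and values are hypothetical.

```python
from itertools import product
from math import exp, log, prod

# Hypothetical adjustable parameters: segment id -> {symbol: parameter}.
PARAMS = {
    "a":  {"1": 0.0, "7": 0.0},
    "bc": {"0": 0.0, "6": 0.0},
    "ab": {"1": 0.0, "4": 0.0},
    "c":  {"0": 0.0, "9": 0.0},
}
CONSEGMENTATIONS = [("a", "bc"), ("ab", "c")]


def log_quantity(params, consegmentations, target):
    """Log of the normalized quantity of `target`; each score is exp(parameter)."""
    numerator = denominator = 0.0
    for conseg in consegmentations:
        for symbols in product(*(params[seg] for seg in conseg)):
            score = prod(exp(params[seg][sym]) for seg, sym in zip(conseg, symbols))
            denominator += score
            if "".join(symbols) == target:
                numerator += score
    return log(numerator / denominator)


def training_step(params, consegmentations, target, lr=0.5, eps=1e-6):
    """Apply one small gradient-ascent update, using finite-difference gradients."""
    base = log_quantity(params, consegmentations, target)
    grads = {}
    for seg in params:
        for sym in params[seg]:
            params[seg][sym] += eps  # perturb one parameter
            grads[(seg, sym)] = (log_quantity(params, consegmentations, target) - base) / eps
            params[seg][sym] -= eps  # restore it
    for (seg, sym), grad in grads.items():
        params[seg][sym] += lr * grad
    return params


before = log_quantity(PARAMS, CONSEGMENTATIONS, "10")
training_step(PARAMS, CONSEGMENTATIONS, "10")
after = log_quantity(PARAMS, CONSEGMENTATIONS, "10")
```

Because the quantity is normalized over all interpretations, raising it for the known correct interpretation necessarily lowers the total assigned to the incorrect ones.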

33. A system for generating an interpretation of an input symbolic representation, the input symbolic representation being represented in a medium, the interpretation being a sequence of symbols, each symbol being an element of a predetermined symbol set, the system comprising:
(a) image acquisition means for acquiring an image of the input symbolic representation;
(b) image processing means for processing the image to generate a series of image segments, wherein each image segment is a sub-image of the acquired image;
(c) image consegmentation specifying means for generating data specifying a series of image consegmentations, wherein each image consegmentation collectively represents the acquired image and consists of a series of the image segments arranged in an order that preserves the spatial structure of the acquired image;
(d) symbol sequence interpretation designating means for generating data designating a series of possible interpretations of the input symbolic representation, wherein each possible interpretation of the input symbolic representation consists of a possible sequence of symbols, and each symbol is selected from the predetermined symbol set and occupies a symbol position within the symbol sequence;
(e) data storage means for storing, in a data structure, the generated data representing each of the image consegmentations and each of the possible interpretations of the input symbolic representation, wherein the data structure is graphically represented by a directed acyclic graph consisting of a two-dimensional array of nodes arranged in columns and rows and selectively connected by directed arcs, each column of nodes can be indexed by one of the symbol positions, each row of nodes can be indexed by one of the image segments in an order corresponding to the spatial structure of the acquired image, each path extending through the nodes and along the directed arcs represents one of the image consegmentations and one of the possible interpretations of the input symbolic representation, and all of the image consegmentations and all of the possible interpretations of the input symbolic representation are represented by the series of paths extending through the graph;
(f) image segment analysis means for analyzing each of the image segments and generating, for each row of nodes in the graph, a series of scores for the predetermined symbol set represented by each node in the row;
(g) path score calculating means for calculating a path score for each path through the graph; and
(h) path score analyzing means for analyzing the calculated path scores to select one or more of the possible interpretations of the input symbolic representation.

34. A system for generating an interpretation of an input symbolic representation, the input symbolic representation being represented in a medium, the interpretation being a sequence of symbols, each symbol being an element of a predetermined symbol set, the system comprising:
(a) means for providing an input data set and a confirmed symbol sequence for each of a plurality of known input symbolic representations;
(b) segment generation means for analyzing each input data set and dividing the input data set into a plurality of segments, wherein each segment has a specifiable boundary and can be classified as representing any one of a plurality of symbols in the predetermined symbol set;
(c) segment scoring means for analyzing each segment and assigning a series of scores to each segment, the segment scoring means being characterized by one or more adjustable parameters such that the assigned scores depend on the one or more adjustable parameters, wherein each score in each assigned series of scores is associated with a particular symbol in the predetermined symbol set;
(d) display means for showing a plurality of possible symbol sequences and a plurality of consegmentations, wherein each possible symbol sequence consists of a different sequence of the symbols and each consegmentation consists of a different sequence of the segments;
(e) consegmentation scoring means for assigning scores to the plurality of consegmentations based on the scores assigned to the segments;
(f) symbol sequence scoring means for assigning a score to each of the confirmed symbol sequences based on the scores assigned to one or more of the plurality of consegmentations;
(g) first score evaluating means for evaluating the scores assigned to the confirmed symbol sequences;
(h) second score evaluating means for evaluating the scores assigned to the plurality of possible symbol sequences;
(i) normalized score generating means for generating a normalized score for each confirmed symbol sequence using the evaluated scores for the plurality of possible interpretations;
(j) sensitivity estimating means for estimating the sensitivity of the generated normalized score with respect to the one or more adjustable parameters; and
(k) parameter adjusting means for adjusting the one or more adjustable parameters so as to increase the average probability that each segment is correctly classified and to decrease the average probability that each segment is incorrectly classified.