JPH01231186A - Character recognizing system - Google Patents

Character recognizing system

Info

Publication number
JPH01231186A
Authority
JP
Japan
Prior art keywords
character
line
center line
classification
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP63056221A
Other languages
Japanese (ja)
Inventor
Kaoru Suzuki
薫 鈴木
Shuichi Tsujimoto
辻本 修一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1988-03-11
Filing date
1988-03-11
Publication date
1989-09-14
Application filed by Toshiba Corp
Priority to JP63056221A (published as JPH01231186A)
Priority to US07/321,268 (published as US4998285A)
Priority to EP19890302416 (published as EP0332471A3)
Publication of JPH01231186A

Landscapes

  • Character Input (AREA)

Abstract

PURPOSE: To classify characters that cannot be discriminated by shape alone, to detect line pitch, and to compare, integrate, and classify lines based on line pitch and center line position, by extracting a center line from one character line, evaluating the height position of each character relative to it, and classifying the character accordingly. CONSTITUTION: A document image inputted from an input means 1 passes through a character line extracting part 2 and a character segmenting part 3, which extract a series of character circumscribed rectangles and character patterns. A center line extracting part 4 extracts a center line from this series of circumscribed rectangles, and a classifying part 5 uses the obtained center line to classify each character based on the position of its circumscribed rectangle. A pattern recognizing part 6 matches each character pattern against a pattern collation dictionary 7 and outputs the collation result. An interpreting part 8 evaluates both the classification result and the collation result and outputs a recognition result, which an output means 9 then outputs.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Object of the Invention]

(Field of Industrial Application) The present invention relates to a character recognition method for cutting out character strings from a document image and recognizing each character.

(Prior Art) In the recognition of European-language text, some distinctions cannot be decided from the shape of a character alone, for example the distinction between a comma and an apostrophe, or the meaning of the "2" in X² versus X₂.

These characters are easy to tell apart once attention is paid to where they appear, but doing so requires a positional reference against which to evaluate them. Furthermore, to cope with a variety of documents, it is desirable to extract such a reference from the document itself.

The problem was therefore what kind of reference to extract from the document, and how to extract it.

Conventional methods of classifying the characters in a document by position include, for example, the method of JP-A-62-187988, in which four sublines are set in a character line and the characters are classified while the sublines are corrected. Such methods, however, were not sufficient.

In document recognition, moreover, being able to compare, integrate, and classify mutually related character lines helps in understanding the structure of the document. This requires extracting parameters for comparison from the character lines.

(Problems to be Solved by the Invention) As described above, the prior art faced three problems: distinguishing characters that cannot be told apart by shape alone, extracting from the document a reference for making that distinction, and extracting from character lines the parameters needed to understand the structure of the document.

An object of the present invention is to solve all of the above problems by extracting a center line from one character line, evaluating the height position of each character on the basis of that center line, and classifying the characters.

[Structure of the Invention]

(Means for Solving the Problems) To achieve the above object, the present invention provides a character recognition method having two functions: a function of extracting a center line using a function obtained by operating, along the line direction, on numerical values or functions defined by the position and extent of each character pattern; and a function of classifying each character in the character line by its amount of deviation from the obtained center line in the direction perpendicular to the line.

(Operation) According to the present invention, a center line can be extracted stably for each character line, and the characters can be classified using the extracted center line. This classification makes differences in character position explicit, so distinctions that are difficult to make by evaluating shape alone can be decided. Furthermore, by evaluating the positions and spacing of the extracted center lines, the line pitch can be extracted and lines can be compared, integrated, and classified.

(Embodiment) An embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the overall configuration of a character recognition device according to the present invention. A document image inputted from the input means 1 passes through a character line extraction unit 2 and a character segmentation unit 3, which extract a series of character circumscribed rectangles and character patterns. The center line extraction unit 4 extracts a center line from this series of circumscribed rectangles. The classification unit 5 uses the obtained center line to classify each character based on the position of its circumscribed rectangle. The pattern recognition unit 6 matches each character pattern against the pattern matching dictionary 7 and outputs the matching result. The interpretation unit 8 evaluates both the classification result and the matching result and outputs a recognition result. The output means 9 outputs that recognition result.

Next, the center line extraction unit 4 will be described in detail. FIG. 2 is a flowchart of one processing system of the center line extraction unit 4 in FIG. 1. The operation of each block in FIG. 2 is described below. Function definition 10 defines, for each character in the character line, a function that depends on the character's position and on its size in the direction perpendicular to the line. In this embodiment the function is triangular, as shown in FIG. 4. The triangle is an isosceles triangle whose base is the side lying on the X axis (the direction perpendicular to the character line) indicated by 13 in FIG. 3, and whose height is a fixed value H.

Histogram creation 11 creates a histogram by adding the functions defined in block 10 along the line direction over one character line.

Peak detection 12 detects the highest peak in the resulting histogram. This peak gives the position of the center line.
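The three blocks above (function definition, histogram creation, peak detection) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the names `triangle` and `extract_center_line`, the representation of each character as a `(top, bottom)` extent on the X axis, and the sampling of the histogram at integer positions are all assumptions made for the example.

```python
from typing import List, Tuple


def triangle(x: float, top: float, bottom: float, height: float = 1.0) -> float:
    """Isosceles triangle whose base spans [top, bottom] on the X axis
    (perpendicular to the line) and whose apex has the fixed height H."""
    if bottom <= top or x < top or x > bottom:
        return 0.0
    mid = (top + bottom) / 2.0
    half = (bottom - top) / 2.0
    return height * (1.0 - abs(x - mid) / half)


def extract_center_line(extents: List[Tuple[float, float]],
                        x_max: int,
                        height: float = 1.0) -> int:
    """Add one triangle per character over the whole line, then return the
    X position of the highest peak (assumed here to be sampled at 0..x_max)."""
    hist = [sum(triangle(x, t, b, height) for t, b in extents)
            for x in range(x_max + 1)]
    # First index holding the maximum value of the histogram.
    return max(range(len(hist)), key=hist.__getitem__)


# Hypothetical line: two x-height characters, one with an ascender,
# one with a descender (extents are made-up pixel coordinates).
line = [(10, 20), (10, 20), (0, 20), (10, 30)]
center = extract_center_line(line, x_max=40)
```

Because every triangle has the same height regardless of character size, each character contributes one "vote" that is sharpest at its own vertical middle, which is presumably why the peak is stable across lines with mixed character heights.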

Next, the operation of the classification unit 5 in FIG. 1 will be described. In this embodiment, characters are classified into the categories shown in FIGS. 4(a) and 4(b). The classification unit classifies each character by its amount of deviation from the center line in the direction perpendicular to the line. FIG. 5 shows the definition of this deviation amount. The difference between L1 and L2 in the figure is taken as L, and according to the magnitude of the absolute value of L the character is classified in one of three ways: as a character for which this value is small, medium, or large. In terms of the category numbers of FIG. 4, these three classes correspond to three groups of categories: one group containing a single category and two groups containing two categories each. Characters belonging to the second and third groups are further classified by the sign of L. That is, with L defined as L = L1 - L2, a character is assigned to one category of its group when L is positive and to the other category of its group when L is negative.

FIG. 6 shows the conditions for the above classification. The numbers in parentheses in the figure are the category numbers of FIG. 4, and th1, th2, and th3 are threshold values.

The interpretation unit 8 in FIG. 1 takes the candidate characters obtained by pattern matching, each of which carries a likelihood, and selects as the recognition result the candidate with the highest likelihood among those that agree with the classification result.
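The classification and interpretation steps might look like the sketch below. Several things here are assumptions rather than the patent's definitions: FIG. 5 and FIG. 6 are not reproduced, so the meaning of L1 and L2, the exact threshold conditions, and the sign convention are guessed; the two hypothetical thresholds `small_max` and `medium_max` stand in for the figure's th1 to th3; and the category labels (`"small"`, `"medium+"`, and so on) and the `char_category` lookup table are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple


def classify(l1: float, l2: float, small_max: float, medium_max: float) -> str:
    """Classify a character by its deviation L = L1 - L2 from the center line.

    |L| selects one of three groups; for the medium and large groups the
    sign of L selects one of two categories, giving five labels in total.
    """
    l = l1 - l2
    size = abs(l)
    if size < small_max:
        return "small"
    group = "medium" if size < medium_max else "large"
    return group + ("+" if l > 0 else "-")


def interpret(candidates: List[Tuple[str, float]],
              char_category: Dict[str, str],
              category: str) -> Optional[str]:
    """Among (character, likelihood) candidates from pattern matching, keep
    those whose expected category agrees with the classification result and
    return the one with the highest likelihood (None if none agree)."""
    agreeing = [(ch, p) for ch, p in candidates
                if char_category.get(ch) == category]
    if not agreeing:
        return None
    return max(agreeing, key=lambda cp: cp[1])[0]


# Hypothetical example: shape matching slightly prefers a comma, but the
# mark sits well to one side of the center line, so the apostrophe wins.
table = {",": "large-", "'": "large+"}
label = classify(l1=15.0, l2=0.0, small_max=2.0, medium_max=8.0)
result = interpret([(",", 0.9), ("'", 0.8)], table, label)
```

The point of the design is that position acts as a filter on the shape-based candidates, so a high-likelihood but positionally impossible candidate is never selected.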

Modifications of this embodiment include the following.

In the above embodiment, character classification may be performed before pattern matching in order to narrow down the category to which the character belongs, so that pattern matching refers only to a pattern matching dictionary consisting solely of the characters in that category.

In the above embodiment, the histogram need not be the one obtained by adding, along the line direction, the isosceles triangles defined by each character's position and extent; for example, a histogram of the center positions of the character circumscribed rectangles may be used instead.

The histogram may also be processed further, for example by blurring it.
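A rough sketch of this variant is shown below. The patent does not say how the histogram is blurred, so the moving-average (box) filter here is an assumption, as are the function names and the integer `(top, bottom)` extents.

```python
from typing import List, Tuple


def center_histogram(extents: List[Tuple[int, int]], x_max: int) -> List[float]:
    """Histogram of the center positions of the circumscribed rectangles,
    measured along the axis perpendicular to the line."""
    hist = [0.0] * (x_max + 1)
    for top, bottom in extents:
        c = (top + bottom) // 2
        if 0 <= c <= x_max:
            hist[c] += 1.0
    return hist


def blur(hist: List[float], radius: int = 1) -> List[float]:
    """Moving-average blur; windows are truncated at both ends."""
    n = len(hist)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out


def peak(hist: List[float]) -> int:
    """Index of the first maximum of the histogram."""
    return max(range(len(hist)), key=hist.__getitem__)


# Three characters whose centers differ by one pixel, plus one small mark
# near the top of the line (made-up coordinates).
extents = [(10, 20), (11, 21), (9, 19), (0, 6)]
raw = center_histogram(extents, x_max=30)
smoothed = blur(raw, radius=1)
```

In this made-up example the raw histogram has four bins of equal height, so its peak is ambiguous; after blurring, the three neighboring centers reinforce one another and the peak falls in the middle of the cluster.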

In the above embodiment, the center line may be extracted after very small characters have been removed or skew has been corrected.

In the classification unit of FIG. 1 in the above embodiment, the category of a character need not be determined uniquely; instead, several categories may be output, for example each with a likelihood attached. In this case the interpretation unit evaluates the likelihoods of both the output categories and the output pattern matching results, and determines the final result from all of them.
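One way to picture this variant is the sketch below. The patent says only that both likelihoods are evaluated; multiplying them is an assumption made here for illustration, and the names `interpret_soft` and `char_category` and the category labels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def interpret_soft(category_likelihoods: Dict[str, float],
                   candidates: List[Tuple[str, float]],
                   char_category: Dict[str, str]) -> Optional[str]:
    """Score each candidate character by (matching likelihood) x (likelihood
    of the category that character is expected to fall into) and return the
    best-scoring character, or None if every score is zero."""
    best: Optional[str] = None
    best_score = 0.0
    for ch, match_likelihood in candidates:
        cat = char_category.get(ch, "")
        score = match_likelihood * category_likelihoods.get(cat, 0.0)
        if score > best_score:
            best, best_score = ch, score
    return best


# Hypothetical values: the classifier leans toward "large+", the matcher
# leans toward the comma; the combined score favors the apostrophe.
choice = interpret_soft(
    {"large+": 0.7, "large-": 0.3},
    [(",", 0.6), ("'", 0.4)],
    {",": "large-", "'": "large+"},
)
```

Compared with a hard category decision, this keeps a candidate alive when the position evidence is only weakly against it.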

In the above embodiment, the system may hold knowledge of the categories into which a given character in a given font can be classified. Some characters fall into different classification categories depending on the font even though they are the same character. By checking the classification results for such characters against this knowledge, the range of fonts being processed can be estimated, and subsequent processing can be adapted to that range of fonts.

In the above embodiment, the same processing may be performed using information from many lines rather than from a single line.

The center lines extracted by the above embodiment can also be used to detect line pitch and to compare, integrate, and classify lines. That is, the distance between the center lines of lines that are adjacent in the direction perpendicular to the line direction gives the distance between adjacent lines (the line pitch). Lines with similar line pitch can be integrated as belonging to the same article. In addition, lines that are adjacent in the line direction and whose center lines are at about the same position can be integrated into a single line.
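The line pitch computation and the two kinds of integration can be sketched as follows. The tolerance parameters, the greedy grouping strategy, and the function names are assumptions for illustration; the patent states only that lines with "similar" pitch or center line position can be integrated.

```python
from typing import List, Optional


def line_pitches(centers: List[float]) -> List[float]:
    """Distances between the center lines of vertically adjacent lines."""
    ordered = sorted(centers)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def group_by_pitch(centers: List[float], tolerance: float) -> List[List[float]]:
    """Greedily group lines into blocks (for example, articles) in which
    successive line pitches stay within `tolerance` of the block's first pitch."""
    ordered = sorted(centers)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    ref: Optional[float] = None  # pitch of the current block, once known
    for prev, cur in zip(ordered, ordered[1:]):
        pitch = cur - prev
        if ref is None or abs(pitch - ref) <= tolerance:
            groups[-1].append(cur)
            if ref is None:
                ref = pitch
        else:
            groups.append([cur])  # pitch changed: start a new block
            ref = None
    return groups


def same_line(center_a: float, center_b: float, tolerance: float) -> bool:
    """Two horizontally adjacent segments are treated as one line when their
    center lines are at about the same position."""
    return abs(center_a - center_b) <= tolerance


# Made-up center line positions: four lines at pitch 30, then three at pitch 20.
blocks = group_by_pitch([100, 130, 160, 190, 260, 280, 300], tolerance=3)
```

A greedy pass like this is order-dependent and treats the first line after a pitch change as the start of a new block; a real layout analyzer would likely need something more robust.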

In short, the present invention can be modified and used in various ways without departing from its gist.

[Effect of the Invention]

This invention makes it possible to classify characters that cannot be distinguished by shape alone, to detect line pitch, and to compare, integrate, and classify lines based on line pitch and center line position.

[BRIEF DESCRIPTION OF THE DRAWINGS]

FIG. 1 is a block diagram of a character recognition device according to the present invention. FIG. 2 is a flowchart of the center line extraction processing in an embodiment of the character recognition device shown in FIG. 1. FIG. 3 shows the function defined by the position and extent of a character circumscribed rectangle in that embodiment. FIG. 4 shows the character classification categories in that embodiment. FIG. 5 shows the definition of the deviation amount in that embodiment. FIG. 6 shows the classification conditions in that embodiment.

1 ... input means, 2 ... character line extraction unit, 3 ... character segmentation unit, 4 ... center line extraction unit, 5 ... classification unit, 6 ... pattern recognition unit, 7 ... pattern matching dictionary, 8 ... interpretation unit, 9 ... output means.

Agents: Patent Attorney Kensuke Norichika (則近 憲佑), Patent Attorney Mitsuyuki Matsuyama (松山 光之)

[Drawings not reproduced: FIG. 1, FIG. 2, FIG. 3, FIG. 4 (panels (a) and (b), labeled by category number).]

Claims (2)

[Claims]

(1) A character recognition method characterized by cutting out a character line from an image containing character strings, defining a numerical value or function from the position or extent of each character in the character line, and extracting the center line of the character line using a function obtained by operating on these defined values or functions for each character along the line direction.

(2) The character recognition method according to claim 1, wherein the extracted center line is used to classify each character in the character line according to the character's position and its amount of deviation from the center line in the direction perpendicular to the line.
JP63056221A 1988-03-11 1988-03-11 Character recognizing system Pending JPH01231186A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP63056221A JPH01231186A (en) 1988-03-11 1988-03-11 Character recognizing system
US07/321,268 US4998285A (en) 1988-03-11 1989-03-09 Character recognition apparatus
EP19890302416 EP0332471A3 (en) 1988-03-11 1989-03-10 Character recognition apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63056221A JPH01231186A (en) 1988-03-11 1988-03-11 Character recognizing system

Publications (1)

Publication Number Publication Date
JPH01231186A true JPH01231186A (en) 1989-09-14

Family

ID=13021050

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63056221A Pending JPH01231186A (en) 1988-03-11 1988-03-11 Character recognizing system

Country Status (1)

Country Link
JP (1) JPH01231186A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04344585A (en) * 1991-05-21 1992-12-01 Oki Electric Ind Co Ltd Optical character reader
JPH04344584A (en) * 1991-05-21 1992-12-01 Oki Electric Ind Co Ltd Optical character reader
US5369715A (en) * 1990-04-27 1994-11-29 Sharp Kabushiki Kaisha Optical character recognition system

