JP2995809B2 - In-table character recognition device - Google Patents

In-table character recognition device

Info

Publication number
JP2995809B2
JP2995809B2 JP2188922A JP18892290A JP2995809B2 JP 2995809 B2 JP2995809 B2 JP 2995809B2 JP 2188922 A JP2188922 A JP 2188922A JP 18892290 A JP18892290 A JP 18892290A JP 2995809 B2 JP2995809 B2 JP 2995809B2
Authority
JP
Japan
Prior art keywords
character
characters
unit
recognition
cell
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP2188922A
Other languages
Japanese (ja)
Other versions
JPH0475187A (en)
Inventor
昇 中村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP2188922A
Publication of JPH0475187A
Application granted
Publication of JP2995809B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Character Input (AREA)
  • Character Discrimination (AREA)

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a character recognition device that recognizes characters in a table quickly and accurately.

(Prior Art) Conventionally, characters in a table (in its cells) have been treated as ordinary characters, and the character type has been determined without using the vertical and horizontal relationships among the characters in the table. As a result, the determination of the character type has been inaccurate and slow.

(Problem to be Solved by the Invention) An object of the present invention is to provide an in-table character recognition device that can recognize the character type of characters in a table more accurately and at higher speed.

(Means for Solving the Problem) In general, looking at the cells stacked vertically in a column of a table, English text tends to occupy several character strings within a cell, Japanese text tends to consist of a single character string, and numerals tend to have aligned right end positions but unaligned left end positions, so the character type can be estimated from the position of the characters within the cells. Based on this finding, the present invention achieves the object by using these properties of in-table characters. The gist of the invention is an in-table character recognition device comprising: a recognition command unit that issues a character recognition command in response to an instruction from a keyboard, mouse, or the like; an image input unit that inputs the characters in a table, together with the table, as an image; an image memory that stores the image data from the image input unit; a table structure extraction unit that extracts the table structure from the image data and decomposes it into cells; a character string extraction unit that takes the character strings out of each cell and extracts their left end and right end positions; a character type prediction unit that estimates the character type from the number of character strings in the cells and the character positions extracted by the character string extraction unit; a character recognition unit that determines each character by comparing sample characters with the characters in the image data, giving priority to the character type estimated by the character type prediction unit; a character pattern dictionary that stores the sample characters; an external output unit that outputs the characters determined by the character recognition unit; and a CPU that controls these units.
The character type prediction unit estimates English characters if a cell contains a plurality of character strings. Otherwise, it estimates Japanese characters if the right end positions of the character strings in the cells other than the first cell are not aligned, or if both the right and left ends are aligned, and it estimates numerals if the right ends are aligned and the left ends are not. The character recognition unit then compares and determines the characters with priority given to the estimated character type.

(Operation) The present invention focuses on the fact that, in a table, items arranged in the vertical direction are likely to use the same character type. The table structure extraction unit extracts cells from the image data, the character string extraction unit obtains the number of character strings in each cell and their right and left end positions, and the character type prediction unit estimates the character type using the above properties of in-table characters. The order of matching against the recognition dictionary is thereby changed to give priority to the estimated character type, which reduces the number of matching operations against the recognition dictionary and raises both the recognition rate and the recognition speed for characters in the table.
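The effect of reordering the dictionary search can be illustrated with a minimal Python sketch. The patent does not specify the template format, the similarity measure, or the dictionary layout, so `Template`, `distance`, `recognize`, `SAMPLES`, and the pixel-difference threshold below are all hypothetical choices made only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: a glyph is a flattened binary bitmap; the patent does not say.
Template = Tuple[int, ...]
# Assumption: the dictionary is grouped by character type.
Dictionary = Dict[str, List[Tuple[str, Template]]]


def distance(a: Template, b: Template) -> int:
    """Number of differing pixels between two equally sized templates."""
    return sum(x != y for x, y in zip(a, b))


def recognize(glyph: Template, dictionary: Dictionary, predicted_type: str,
              accept_distance: int) -> Tuple[Optional[str], int]:
    """Match samples of the predicted type first, then the remaining types.

    Returns (label, comparisons). The search stops at the first sample whose
    distance to the glyph is within accept_distance.
    """
    order = [predicted_type] + [t for t in dictionary if t != predicted_type]
    comparisons = 0
    for char_type in order:
        for label, template in dictionary.get(char_type, []):
            comparisons += 1
            if distance(glyph, template) <= accept_distance:
                return label, comparisons
    return None, comparisons


# Toy 3x3 samples, invented for this example.
SAMPLES: Dictionary = {
    "english": [("O", (1, 1, 1, 1, 0, 1, 1, 1, 1))],
    "japanese": [],
    "digit": [("0", (0, 1, 0, 1, 0, 1, 0, 1, 0))],
}
glyph: Template = (1, 1, 0, 1, 0, 1, 0, 1, 0)  # a slightly noisy zero
```

With these toy samples, `recognize(glyph, SAMPLES, "digit", 3)` accepts the digit sample on the first comparison, whereas searching with `"english"` first accepts the letter sample instead, which is the kind of confusion the prioritized order is meant to avoid.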

(Embodiment) An embodiment of the present invention will now be described with reference to the drawings.

FIG. 1 is a block diagram of the in-table character recognition device.
1 is a recognition command unit that issues a command to perform character recognition in response to an instruction from a keyboard, mouse, or the like. 2 is an image memory that stores the image data to be recognized; 3 is a table structure extraction unit that extracts the table structure from the image data and decomposes it into cells;
4 is a character string extraction unit that takes the character strings out of each cell and extracts their left end and right end positions; 5 is a character type prediction unit that estimates the character type of each table column; 6 is a character recognition unit that compares the input against sample characters in order, starting with the estimated character type, and outputs a sample character as the result if its similarity is closer than a predetermined value;
7 is a character pattern dictionary that stores the sample characters; 8 is an external output unit that outputs the characters recognized from the table; and 14 is an image input unit that inputs the characters in the table, together with the table, as an image.
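The patent does not describe how the character string extraction unit locates strings inside a cell. One common approach, shown here purely as an assumption, is a projection profile: consecutive rows containing ink form one character string, and the leftmost and rightmost ink columns of that run give its end positions. The names `Cell` and `extract_strings` are hypothetical.

```python
from typing import List, Tuple

Cell = List[List[int]]  # binary cell image: 1 = ink, 0 = background


def extract_strings(cell: Cell) -> List[Tuple[int, int]]:
    """Return the (left, right) ink column bounds of each text line in a cell.

    Assumption: a text line is a maximal run of consecutive rows that each
    contain at least one ink pixel.
    """
    strings: List[Tuple[int, int]] = []
    run: List[List[int]] = []
    for row in cell + [[]]:  # the empty sentinel row flushes the last run
        if any(row):
            run.append(row)
        elif run:
            cols = [x for r in run for x, v in enumerate(r) if v]
            strings.append((min(cols), max(cols)))
            run = []
    return strings


example_cell: Cell = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
```

The length of the returned list is the number of character strings in the cell, and each pair holds the left and right end positions that the character type prediction unit consumes.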

FIG. 2 is a hardware configuration diagram of the in-table character recognition device.
9 is a RAM, which stores the image memory, the table structure, the predicted character types, and so on. 10 is a ROM, which stores the character pattern dictionary and the program. 11 is an RS232-C interface, through which recognition commands are received and characters are output. 12 is a CPU, which controls the operation of the program. 13 is a scanner, which constitutes the image input unit 14 that captures an image into the image memory.

The overall processing flow is described with reference to the flowchart of FIG. 3. The table structure extraction unit 3 obtains, from the image data captured by the scanner 13, a table format represented by horizontal and vertical line segments (step 1). Each rectangular region delimited vertically and horizontally by this table format is called a cell. If there is no column to process, processing ends (step 2); otherwise the character string extraction unit 4 obtains the character strings for every cell of the column except the first (step 3). Next, if even one of these cells contains a plurality of character strings, the character type prediction unit 5 predicts that the column is occupied by English characters (steps 4, 5, 6). If each cell contains a single character string, the left end and right end positions of the character strings are obtained (step 7). If the right end positions of the cells other than the first cell of the column are not aligned, the character strings in the cells of this column are estimated to be Japanese characters (steps 8, 9). If the right end positions are aligned and the left end positions are not, they are estimated to be numerals (step 11); if both the right and left ends are aligned, they are estimated to be Japanese characters (step 9). The character recognition unit 6 then recognizes the characters in the cells of the column in accordance with the estimated character type (step 12).
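The per-column decision in the flow above can be sketched in Python as follows. The rule itself follows the text; the data layout (`CellStrings`), the function names, and in particular the pixel `tolerance` used to decide whether positions are "aligned" are assumptions, since the patent does not define a numerical alignment criterion.

```python
from typing import List, Sequence, Tuple

# One cell = list of (left, right) extents, one pair per character string.
CellStrings = List[Tuple[int, int]]


def aligned(values: Sequence[int], tolerance: int) -> bool:
    """True if all values lie within `tolerance` pixels of each other.

    An empty sequence counts as aligned (assumption).
    """
    return not values or max(values) - min(values) <= tolerance


def predict_column_type(column: List[CellStrings], tolerance: int = 2) -> str:
    """Predict 'english', 'japanese' or 'digit' for one table column."""
    body = column[1:]  # the first cell of the column is excluded
    # Steps 4-6: any cell with several character strings means English.
    if any(len(cell) > 1 for cell in body):
        return "english"
    # Step 7: collect the left and right end positions of the single strings.
    extents = [cell[0] for cell in body if cell]
    lefts = [left for left, _ in extents]
    rights = [right for _, right in extents]
    # Steps 8-9: right ends not aligned means Japanese.
    if not aligned(rights, tolerance):
        return "japanese"
    # Step 11: right ends aligned but left ends not means numerals.
    if not aligned(lefts, tolerance):
        return "digit"
    # Step 9: both ends aligned means Japanese.
    return "japanese"
```

For example, a column whose body cells span (40, 90), (55, 90), and (25, 91) has aligned right ends and scattered left ends, so it is classified as `"digit"` under the default tolerance.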

The description is supplemented with a concrete example of the overall processing. First, the table structure extraction unit 3 obtains horizontal and vertical line segments from the normalized image data (see FIG. 4) and extracts the cells formed by them (FIG. 5).
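As a rough illustration of turning ruled lines into cells, the following Python sketch assumes a fully ruled, regular grid in which the y coordinates of the horizontal lines and the x coordinates of the vertical lines are already known. Line detection itself is not shown, and `CellBox` and `grid_cells` are hypothetical names; the patent does not describe the extraction algorithm at this level.

```python
from typing import List, NamedTuple


class CellBox(NamedTuple):
    row: int
    col: int
    left: int
    top: int
    right: int
    bottom: int


def grid_cells(h_lines: List[int], v_lines: List[int]) -> List[CellBox]:
    """Build one cell per pair of adjacent horizontal and vertical lines.

    Assumption: every line spans the whole table, so there are no merged cells.
    """
    ys = sorted(set(h_lines))
    xs = sorted(set(v_lines))
    return [
        CellBox(r, c, xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(len(ys) - 1)
        for c in range(len(xs) - 1)
    ]


def column_cells(cells: List[CellBox], col: int) -> List[CellBox]:
    """Cells of one column, top to bottom, ready for per-column prediction."""
    return sorted((c for c in cells if c.col == col), key=lambda c: c.row)
```

Three horizontal lines and four vertical lines yield a 2 x 3 grid of six cells; grouping them by column gives the vertical cell sequences on which the character type is predicted.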

Next, since each of these cells contains exactly one character string, the left end and right end positions of the character strings are obtained (FIG. 6 shows the memory state holding the coordinate values of the left and right end positions). Then, in the character type prediction unit 5, the second and third columns are estimated to be numerals because, excluding the first cell, their right end positions are aligned and their left end positions are not. In accordance with these estimates, the character recognition unit 6 changes the search order of the dictionary and lowers the discrimination level used during recognition. As a result, similar-looking characters such as "O" and "0" are no longer confused, and because fewer sample characters in the dictionary have to be searched, the characters can be determined in less processing time.

(Effect of the Invention) As described above, the present invention determines the character type from the number of character strings of the in-table characters and the features of their right and left end positions, and uses that character type preferentially, thereby achieving both a high recognition rate and a high recognition speed.

BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a block diagram of an embodiment of the present invention; FIG. 2 is a hardware configuration diagram of the embodiment; FIG. 3 is a flowchart of the embodiment; FIG. 4 is an explanatory diagram showing an example of a table to be recognized; FIG. 5 is an explanatory diagram showing how each cell is named; FIG. 6 is an explanatory diagram showing the coordinate positions of the left and right ends of the character string in each cell; and FIG. 7 is an explanatory diagram showing the character type predicted for each column.

Claims (1)

(57) [Claims]
1. An in-table character recognition device comprising: a recognition command unit that issues a character recognition command in response to an instruction from a keyboard, mouse, or the like; an image input unit that inputs the characters in a table, together with the table, as an image; an image memory that stores the image data from the image input unit; a table structure extraction unit that extracts the table structure from the image data and decomposes it into cells; a character string extraction unit that takes the character strings out of each cell and extracts their left end and right end positions; a character type prediction unit that estimates the character type from the number of character strings in the cells and the character positions extracted by the character string extraction unit; a character recognition unit that determines each character by comparing sample characters with the characters in the image data, giving priority to the character type estimated by the character type prediction unit; a character pattern dictionary that stores the sample characters; an external output unit that outputs the characters determined by the character recognition unit; and a CPU that controls these units, wherein the character type prediction unit estimates English characters if a cell contains a plurality of character strings, otherwise estimates Japanese characters if the right end positions of the character strings other than that of the first cell are not aligned or if both the right and left ends are aligned, and estimates numerals if the right ends are aligned and the left ends are not, and wherein the character recognition unit compares and determines the characters with priority given to the estimated character type.
JP2188922A 1990-07-17 1990-07-17 In-table character recognition device Expired - Lifetime JP2995809B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2188922A JP2995809B2 (en) 1990-07-17 1990-07-17 In-table character recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2188922A JP2995809B2 (en) 1990-07-17 1990-07-17 In-table character recognition device

Publications (2)

Publication Number Publication Date
JPH0475187A JPH0475187A (en) 1992-03-10
JP2995809B2 true JP2995809B2 (en) 1999-12-27

Family

ID=16232234

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2188922A Expired - Lifetime JP2995809B2 (en) 1990-07-17 1990-07-17 In-table character recognition device

Country Status (1)

Country Link
JP (1) JP2995809B2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680688B (en) * 2020-06-10 2023-08-08 创新奇智(成都)科技有限公司 Character recognition method and device, electronic equipment and storage medium
CN112070087B (en) * 2020-09-14 2023-06-02 成都主导软件技术有限公司 Train number identification method and device with end bit and readable storage medium

Also Published As

Publication number Publication date
JPH0475187A (en) 1992-03-10

Similar Documents

Publication Publication Date Title
CN106940799B (en) Text image processing method and device
JP5211334B2 (en) Handwritten symbol recognition method and apparatus
JP4308785B2 (en) Digital ink question retrieval apparatus and method
Ma et al. Joint layout analysis, character detection and recognition for historical document digitization
US5335289A (en) Recognition of characters in cursive script
JP2995809B2 (en) In-table character recognition device
JP2926066B2 (en) Table recognition device
Modi et al. Text line detection and segmentation in Handwritten Gurumukhi Scripts
CN110263736A (en) A kind of component identification method, apparatus, storage medium and system
US5596657A (en) Method of sorting out candidate characters in character recognition system
Sarkar et al. Text line extraction from handwritten document pages based on line contour estimation
CN112667771A (en) Answer sequence determination method and device
JP4466241B2 (en) Document processing method and document processing apparatus
JPH0638276B2 (en) Pattern identification device
Han et al. Coarse classification of Chinese characters via stroke clustering method
KR19990049667A (en) Korean Character Recognition Method
JP2675303B2 (en) Character recognition method
Nimsuk et al. Offline Handwriting Recognition of Thai Characters Using Multiple Deep Neural Networks
KR100332752B1 (en) Method for recognizing character
KR950011065B1 (en) A character recognition method
JPH0589190A (en) Drawing information checking system
JPH1166230A (en) Device, method, and medium for document recognition
JPH05346974A (en) Character recognizing device
JPH03126188A (en) Character recognizing device
JPH11120291A (en) Pattern recognition system

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081029

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091029

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101029

Year of fee payment: 11

EXPY Cancellation because of completion of term