JPS62120586A - Character recognizing device - Google Patents

Character recognizing device

Info

Publication number
JPS62120586A
Authority
JP
Japan
Prior art keywords
character
dictionary
written
characters
vertical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP60260646A
Other languages
Japanese (ja)
Inventor
Koichi Ejiri
公一 江尻
Akira Sakurai
彰 桜井
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd
Priority to JP60260646A
Publication of JPS62120586A

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To recognize horizontally written characters, such as alphabetic characters, contained in a vertically written document by rotating the image of a rejected character by 90 degrees and then recognizing it with a dictionary for horizontally written characters. CONSTITUTION: Image data from a scanner 2 is stored in a buffer 4, which is accessed by a vertical/horizontal determination unit 6 and a line segmentation unit 8. If the input document is determined to be vertically written, each character image is input to a feature extraction unit 14 without being rotated by 90 degrees. The resulting feature vector is input to a matching unit 16, which computes its distances to the feature vectors in a common dictionary D1 and a vertically written character dictionary D3. If no character with a sufficiently small minimum distance is found, so that the character is rejected, a rotation unit 12 rotates the input character by 90 degrees. The rotated image data is sent to the feature extraction unit 14, and the matching unit 16 matches the new feature vector against the common dictionary D1 and the horizontally written character dictionary D2. Finally, the code of the character with the minimum distance is output as the recognition result.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical Field]

The present invention relates to a character recognition device, and more particularly to a character recognition device capable of recognizing characters in vertically written documents.

[Prior Art]

Character recognition devices are generally designed for horizontally written documents, although some devices for vertically written documents have also been developed.

However, when a vertically written document contains horizontally written characters, conventional character recognition devices for vertically written documents cannot recognize those characters.

For example, vertically written documents such as the one shown in FIG. 3 are not uncommon, yet the characters of the Roman-alphabet string in the second line of that document could not be recognized.

[Object]

An object of the present invention is to provide a character recognition device that can also correctly recognize horizontally written characters, such as Roman-alphabet text, contained in a vertically written document.

[Constitution]

To achieve this object, the character recognition device of the present invention comprises a dictionary for vertically written characters, a dictionary for horizontally written characters, and means for rotating a character image by 90 degrees. When a rejected character occurs while the characters of a vertically written document are being recognized with the dictionary for vertically written characters, the image of the rejected character is rotated by 90 degrees and recognition is then attempted using the dictionary for horizontally written characters.

[Embodiment]

An embodiment of the present invention will be described below with reference to the drawings.

FIG. 1 is a schematic block diagram showing an embodiment of the character recognition device of the present invention. In this figure, 2 denotes a scanner that reads an input document by resolving it into pixels and outputs binary black-and-white image data. This image data is temporarily stored in a buffer 4, which is accessed by a vertical/horizontal determination unit 6 and a line segmentation unit 8.

The vertical/horizontal determination unit 6 determines whether the input document is vertically or horizontally written. This determination can be made by various methods; in this embodiment it is made as follows.

Examining the run characteristics of a document image shows that short vertical white runs occur frequently in vertically written text, whereas short horizontal white runs occur frequently in horizontally written text. The vertical/horizontal determination unit 6 therefore accesses the buffer 4, reads the image data of the input document, measures the frequencies of short vertical white runs and short horizontal white runs, and determines from the measurement whether the document is vertically or horizontally written. This determination is described in detail in Japanese Patent Application Laid-Open No. 56-149674, "Identification Method of Image Characteristics".
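
The run-based decision can be illustrated with a minimal Python sketch. It is not taken from the cited application: the function names, the `max_len` cutoff for a "short" run, the rule of counting only white runs bounded by black pixels on both sides, and the tie-breaking toward "horizontal" are all assumptions made for illustration. The image is a list of rows with 1 for black and 0 for white.

```python
from typing import List


def count_short_white_runs(lines: List[List[int]], max_len: int) -> int:
    """Count white runs (0s) of length <= max_len that lie between two black pixels."""
    count = 0
    for line in lines:
        run = 0
        seen_black = False
        for pixel in line:
            if pixel == 1:
                # A white run just ended; count it only if it was enclosed and short.
                if seen_black and 0 < run <= max_len:
                    count += 1
                seen_black = True
                run = 0
            else:
                run += 1
    return count


def classify_orientation(image: List[List[int]], max_len: int = 2) -> str:
    """Return "vertical" if short vertical white runs outnumber short horizontal ones."""
    rows = image
    cols = [list(col) for col in zip(*image)]  # scan the same image column by column
    horizontal_runs = count_short_white_runs(rows, max_len)
    vertical_runs = count_short_white_runs(cols, max_len)
    # Ties fall back to "horizontal"; this choice is arbitrary.
    return "vertical" if vertical_runs > horizontal_runs else "horizontal"


# Toy page: two columns of marks with narrow vertical gaps and a wide horizontal gap.
vertical_doc = [
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
]
print(classify_orientation(vertical_doc))
```

A real page would need a cutoff tuned to the scan resolution and character pitch, which the text does not specify.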

The line segmentation unit 8 accesses the buffer 4 to read the image data, cuts out the lines of the input document, and inputs the image data of each line to a character segmentation unit 10. This line segmentation is performed, for example, by the well-known projection method.

Because the direction in which lines are cut out must differ between vertical and horizontal writing, the vertical/horizontal determination unit 6 notifies the line segmentation unit 8 of the determination result.
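
As a rough illustration of the projection method and of why the cutting direction depends on the writing direction, the following Python sketch sums black pixels per row (horizontal writing) or per column (vertical writing) and treats each maximal stretch of non-empty positions as one line. The function name `cut_lines`, the "any black pixel" criterion, and the half-open `(start, end)` spans are assumptions; the source only names the projection method.

```python
from typing import List, Tuple


def cut_lines(image: List[List[int]], vertical: bool) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index spans of text lines found by projection.

    image is a list of rows with 1 = black and 0 = white.
    For horizontal writing the profile is taken per row; for vertical writing per column.
    """
    if vertical:
        profile = [sum(col) for col in zip(*image)]
    else:
        profile = [sum(row) for row in image]

    spans = []
    start = None
    for index, value in enumerate(profile):
        if value > 0 and start is None:
            start = index                 # a line begins at the first non-empty position
        elif value == 0 and start is not None:
            spans.append((start, index))  # a line ends at the first empty position
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


page = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
print(cut_lines(page, vertical=True))
print(cut_lines(page, vertical=False))
```

The same routine, applied along the other axis inside one cut-out line, would serve as a simple character segmentation step. Spans are returned in increasing index order; the reading order of vertical lines is not handled here.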

The character segmentation unit 10 cuts out the image data of individual characters from the line-by-line image data input from the line segmentation unit 8; this is performed, for example, by the well-known projection method. The cut-out character image data is input to a feature extraction unit 14 via a 90-degree rotation unit 12.

The 90-degree rotation unit 12 normally passes the character image data to the feature extraction unit 14 unchanged (without rotating it by 90 degrees). When the matching unit 16 instructs a 90-degree rotation, however, it supplies the feature extraction unit 14 with image data obtained by rotating the character image by 90 degrees.

FIG. 2 shows the configuration of the 90-degree rotation unit 12. As shown in this figure, the 90-degree rotation unit 12 consists mainly of a character image memory 30 capable of storing the image data of one character and X and Y address counters 31 and 32. When image data is written to the character image memory 30, the X address counter 31 is incremented sequentially from zero; when it reaches its maximum value, the X address counter 31 is cleared to zero and the Y address counter 32 is incremented by one. This address updating is repeated until the Y address counter 32 reaches its maximum value.

Image data is normally read from the character image memory 30 while incrementing the X and Y address counters 31 and 32 in the same way as for writing.

When a 90-degree rotation is instructed, by contrast, the Y address counter 32 is decremented from its maximum value, and each time it reaches zero the X address counter 31 is incremented by one, starting from zero. Updating the addresses in this way sends the image data of the character rotated by 90 degrees to the feature extraction unit 14.
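
The two readout orders can be mimicked in software. The Python sketch below models the character image memory as a list of rows indexed `memory[y][x]` and assumes that each sweep of the inner counter forms one output row; both the function names and that row convention are assumptions. Under this convention the rotated readout is a clockwise rotation; which of the two 90-degree directions the hardware produces depends on the memory's coordinate convention, which the text does not state.

```python
from typing import List


def read_normal(memory: List[List[int]]) -> List[List[int]]:
    """Read out in write order: X counts up fastest, then Y counts up."""
    height, width = len(memory), len(memory[0])
    out = []
    for y in range(height):
        out.append([memory[y][x] for x in range(width)])
    return out


def read_rotated(memory: List[List[int]]) -> List[List[int]]:
    """Read out with Y counting down from its maximum, then X stepping up by one."""
    height, width = len(memory), len(memory[0])
    out = []
    x = 0
    while x < width:
        row = []
        y = height - 1               # Y address counter starts at its maximum value
        while y >= 0:
            row.append(memory[y][x])
            y -= 1                   # decrement Y until it passes zero
        out.append(row)
        x += 1                       # then increment the X address counter
    return out


glyph = [
    [1, 2, 3],
    [4, 5, 6],
]
print(read_rotated(glyph))
```

No pixel data is moved inside the memory; the rotation comes entirely from the order in which addresses are generated.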

The feature extraction unit 14 extracts the features of the input character image data to create a feature vector and sends it to the matching unit 16. The matching unit 16 performs a matching operation between that feature vector and the feature vector of each character registered in a dictionary 18, and searches for the character with the minimum distance.

The dictionary 18 basically consists of a dictionary for vertically written characters and a dictionary for horizontally written characters. In this embodiment, however, to reduce the dictionary size, characters whose feature vectors are the same in vertical and horizontal writing are consolidated into a single common dictionary D1, and the remaining characters are separated into a dictionary D2 for horizontally written characters and a dictionary D3 for vertically written characters.

Because the dictionaries used differ between vertically and horizontally written documents, the vertical/horizontal determination unit 6 notifies the matching unit 16 of the determination result.

Next, the overall operation will be described, beginning with the case where the vertical/horizontal determination unit 6 determines that the input document is horizontally written.

In this case, line segmentation and character segmentation are performed for a horizontally written document. The image data of each cut-out character passes through the 90-degree rotation unit 12 unchanged to the feature extraction unit 14, and the feature vector of the character is sent to the matching unit 16.

The matching unit 16 operates in horizontal writing mode: it computes the distances between the feature vector of the input character and the feature vectors in the common dictionary D1 and the dictionary D2 for horizontally written characters, and searches for the character with the minimum distance. If that distance is at or below a predetermined value, it outputs the code of that character; otherwise it outputs a reject code.
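
The minimum-distance search with a reject threshold might look like the following Python sketch. The source does not say which distance measure or feature vectors are used, so the Euclidean distance, the dictionary layout (character code mapped to a reference vector), the function name `match`, and the use of `None` as the reject code are all assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence

REJECT = None  # stand-in for the reject code


def match(feature: Sequence[float],
          dictionaries: List[Dict[str, Sequence[float]]],
          threshold: float) -> Optional[str]:
    """Return the code of the nearest dictionary entry, or REJECT if it is too far away."""
    best_code, best_dist = REJECT, math.inf
    for dictionary in dictionaries:
        for code, reference in dictionary.items():
            dist = math.dist(feature, reference)  # Euclidean distance (assumed measure)
            if dist < best_dist:
                best_code, best_dist = code, dist
    # Accept only when the minimum distance is at or below the predetermined value.
    return best_code if best_dist <= threshold else REJECT


# Hypothetical two-dimensional feature vectors, for illustration only.
common_d1 = {"A": (0.0, 0.0)}
horizontal_d2 = {"B": (1.0, 1.0)}

print(match((0.9, 1.0), [common_d1, horizontal_d2], threshold=0.5))
print(match((5.0, 5.0), [common_d1, horizontal_d2], threshold=0.5))
```

In horizontal writing mode the search list would be D1 and D2; in vertical writing mode it would be D1 and D3.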

Next, the operation when the input document is determined to be vertically written will be described. In this case, line segmentation and character segmentation are performed for a vertically written document, and the matching unit 16 operates in vertical writing mode.

The image data of each cut-out character is input to the feature extraction unit 14 without being rotated by 90 degrees, and its feature vector is input to the matching unit 16.

The matching unit 16 performs a matching operation between the feature vector of the input character and the feature vectors in the common dictionary D1 and the dictionary D3 for vertically written characters, and searches for a character whose minimum distance is at or below a predetermined value. If such a character is found, the matching unit 16 outputs its character code and proceeds to the next character.

If no such character is found, that is, if a rejected character occurs, the matching unit 16 issues a retry instruction. In response, the 90-degree rotation unit 12 rotates the rejected input character by 90 degrees and sends the rotated image data to the feature extraction unit 14. The feature extraction unit 14 extracts the features of the rotated character and sends the feature vector to the matching unit 16. This time the matching unit 16 matches the feature vector against the common dictionary D1 and the dictionary D2 for horizontally written characters. If this second matching operation finds a character whose minimum distance is at or below the predetermined value, the matching unit 16 outputs its character code as the recognition result and proceeds to the next character. If no such character is found this time either, the matching unit 16 finally judges the input character to be a rejected character, outputs a reject code, and proceeds to the next character.
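
The two-pass flow for a vertically written document can be summarized in a short Python sketch. Everything concrete in it is a placeholder: the toy feature extractor (flattened pixels), the Euclidean distance, the 3x3 glyphs, the dictionary contents, the clockwise rotation direction, and the function names are assumptions for illustration, not the features or dictionaries of the actual device.

```python
import math
from typing import Dict, List, Optional

Image = List[List[int]]
Dictionary = Dict[str, List[float]]


def rotate_90(image: Image) -> Image:
    """Rotate a character image by 90 degrees (clockwise is assumed here)."""
    return [list(row) for row in zip(*image[::-1])]


def extract_features(image: Image) -> List[float]:
    """Toy feature extractor: the flattened pixel values."""
    return [float(pixel) for row in image for pixel in row]


def nearest(feature: List[float], dictionaries: List[Dictionary],
            threshold: float) -> Optional[str]:
    """Minimum-distance search; None plays the role of the reject code."""
    best_code, best_dist = None, math.inf
    for dictionary in dictionaries:
        for code, reference in dictionary.items():
            dist = math.dist(feature, reference)
            if dist < best_dist:
                best_code, best_dist = code, dist
    return best_code if best_dist <= threshold else None


def recognize_in_vertical_document(image: Image, d1: Dictionary, d2: Dictionary,
                                   d3: Dictionary, threshold: float) -> Optional[str]:
    # First pass: unrotated image against the common and vertical dictionaries.
    code = nearest(extract_features(image), [d1, d3], threshold)
    if code is not None:
        return code
    # Retry: rotate the rejected character and use the common and horizontal dictionaries.
    rotated = rotate_90(image)
    return nearest(extract_features(rotated), [d1, d2], threshold)  # None = final reject


# Hypothetical glyphs and dictionaries.
letter_l = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
box = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
bar = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
d1 = {"box": extract_features(box)}
d2 = {"L": extract_features(letter_l)}
d3 = {"bar": extract_features(bar)}

# An "L" lying on its side, as it might appear inside a vertical line of text.
sideways_l = rotate_90(rotate_90(rotate_90(letter_l)))
print(recognize_in_vertical_document(sideways_l, d1, d2, d3, threshold=0.5))
```

Because the retry is triggered only by a reject, characters that already match in the first pass incur no extra rotation or matching cost.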

Thus, the characters of a vertically written document are recognized using the dictionaries for vertical writing (in this embodiment, the dictionary D3 for vertically written characters and the common dictionary D1). When a rejected character occurs in that recognition, recognition is attempted again on the input character rotated by 90 degrees, using the dictionaries for horizontal writing (in this embodiment, the dictionary D2 for horizontally written characters and the common dictionary D1).

For example, the Roman-alphabet portion in the second line of the vertically written document shown in FIG. 3 cannot be recognized in the first recognition pass using the dictionary for vertically written characters, and rejected characters occur. In the second recognition pass, however, in which the input character is rotated by 90 degrees and the dictionary for horizontally written characters is used, each character of such a Roman-alphabet portion can clearly be recognized correctly.

[Effect]

As described in detail above, the present invention comprises a dictionary for vertically written characters, a dictionary for horizontally written characters, and means for rotating a character image by 90 degrees. When a rejected character occurs while the characters of a vertically written document are being recognized with the dictionary for vertically written characters, the image of the rejected character is rotated by 90 degrees and recognition is then attempted using the dictionary for horizontally written characters. A character recognition device can therefore be realized that also recognizes horizontally written characters, such as Roman-alphabet text, contained in vertically written documents.

[Brief Description of the Drawings]

FIG. 1 is a schematic block diagram showing an embodiment of the character recognition device of the present invention, FIG. 2 is a schematic block diagram of the 90-degree rotation unit, and FIG. 3 is a diagram showing an example of a vertically written document containing horizontally written character strings.

2: scanner; 4: buffer; 6: vertical/horizontal determination unit; 8: line segmentation unit; 10: character segmentation unit; 12: 90-degree rotation unit; 14: feature extraction unit; 16: matching unit; 18: dictionary; D1: common dictionary; D2: dictionary for horizontally written characters; D3: dictionary for vertically written characters.

(FIG. 1 and FIG. 2: drawings not reproduced.)

Claims (1)

[Claims]

(1) A character recognition device comprising a dictionary for vertically written characters, a dictionary for horizontally written characters, and means for rotating a character image by 90 degrees, wherein, when a rejected character occurs while the characters of a vertically written document are being recognized using the dictionary for vertically written characters, the image of the rejected character is rotated by 90 degrees and recognition is then attempted using the dictionary for horizontally written characters.
JP60260646A 1985-11-20 1985-11-20 Character recognizing device Pending JPS62120586A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP60260646A JPS62120586A (en) 1985-11-20 1985-11-20 Character recognizing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP60260646A JPS62120586A (en) 1985-11-20 1985-11-20 Character recognizing device

Publications (1)

Publication Number Publication Date
JPS62120586A true JPS62120586A (en) 1987-06-01

Family

ID=17350805

Family Applications (1)

Application Number Title Priority Date Filing Date
JP60260646A Pending JPS62120586A (en) 1985-11-20 1985-11-20 Character recognizing device

Country Status (1)

Country Link
JP (1) JPS62120586A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0325233A2 (en) * 1988-01-18 1989-07-26 Kabushiki Kaisha Toshiba Character string recognition system
JPH0279184A (en) * 1988-09-16 1990-03-19 Hitachi Ltd Method for discriminating normal picture of picture information device

Similar Documents

Publication Publication Date Title
JPH1139428A (en) Direction correcting method for document video
JPH01112388A (en) Character recognizing method
JPH0424781A (en) Document processor
JPS62120586A (en) Character recognizing device
JPH11272800A (en) Character recognition device
JP2675303B2 (en) Character recognition method
JPS59158482A (en) Character recognizing device
JPS62166479A (en) Recognizing method for visiting card
JPH0728935A (en) Document image processor
JP2722549B2 (en) Optical character reader
JP2963474B2 (en) Similar character identification method
JPH0714000A (en) Table recognizing device
JPS6386089A (en) Character recognizing device
JP2972443B2 (en) Character recognition device
JP2000187704A (en) Character recognition device, its method and storage medium
JPH03217993A (en) Character size recognizer
JPH05189604A (en) Optical character reader
JPH113433A (en) Table closing line intersection correcting device
JPS5816371A (en) Electronic interpreter
JPH11203400A (en) Character inputting device and method therefor, and machine readable recording medium for recording program for allowing computer to execute the same method
JPH03217994A (en) Document processor
JPH03219384A (en) Character recognizing device
JPH06139277A (en) Electronic dictionary device
JPS60254282A (en) Character recognizing system
JPS63131287A (en) Character recognition system