JPS62120586A

JPS62120586A - Character recognizing device

Info

Publication number: JPS62120586A
Application number: JP60260646A
Authority: JP
Inventors: Koichi Ejiri; 公一江尻; Akira Sakurai; 彰桜井
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1985-11-20
Filing date: 1985-11-20
Publication date: 1987-06-01

Abstract

PURPOSE: To recognize horizontally written characters, such as Roman letters, contained in a vertically written document by rotating the image of a rejected character by 90 degrees and then recognizing it with a dictionary for horizontally written characters. CONSTITUTION: Image data from a scanner 2 is stored in a buffer 4, which is accessed by a vertical/horizontal determination unit 6 and a line segmentation unit 8. If the input document is determined to be vertically written, each character is input to a feature extraction unit 14 without being rotated by 90 degrees. Its feature vector is passed to a matching unit 16, which calculates the distances between that vector and the feature vectors in a common dictionary D1 and a dictionary D3 for vertically written characters. If a reject occurs because no character lies within the predetermined distance, a rotation unit 12 rotates the input character by 90 degrees. The rotated image data is sent to the feature extraction unit 14, and the matching unit 16 calculates the distances between its feature vector and those in the common dictionary D1 and a dictionary D2 for horizontally written characters. Finally, the code of the character at the minimum distance is output as the recognition result.

Description

[Detailed Description of the Invention]

[Technical Field]

The present invention relates to a character recognition device, and more particularly to a character recognition device capable of recognizing characters in vertically written documents.

[Prior Art]

Character recognition devices are generally designed for horizontally written documents, but some devices for vertically written documents have also been developed.

However, when a vertically written document contains horizontally written characters, conventional character recognition devices for vertically written documents cannot recognize those characters.

For example, vertically written documents such as the one shown in FIG. 3 are not uncommon, but the characters of the Roman-alphabet string in the second line of such a document could not be recognized.

[Purpose]

An object of the present invention is to provide a character recognition device that can also correctly recognize horizontally written characters, such as Roman text, contained in a vertically written document.

[Constitution]

To achieve this object, the character recognition device of the present invention comprises a dictionary of vertically written characters, a dictionary of horizontally written characters, and means for rotating a character image by 90 degrees. When a reject character occurs during recognition of a vertically written document using the dictionary of vertically written characters, the image of the rejected character is rotated by 90 degrees and recognition is then attempted using the dictionary of horizontally written characters.

[Embodiment]

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a schematic block diagram showing an embodiment of the character recognition device of the present invention. In this figure, reference numeral 2 denotes a scanner that reads the input document pixel by pixel and outputs binary (black-and-white) image data. This image data is temporarily stored in a buffer 4, which is accessed by a vertical/horizontal determination unit 6 and a line segmentation unit 8.

The vertical/horizontal determination unit 6 determines whether the input document is written vertically or horizontally. Various methods can be used for this determination; in this embodiment, the following method is used.

Examining the run characteristics of a document image shows that short vertical white runs occur frequently in regions of vertically written characters, whereas short horizontal white runs occur frequently in regions of horizontally written characters. The vertical/horizontal determination unit 6 therefore accesses the buffer 4, reads the image data of the input document, measures the frequencies of short vertical and short horizontal white runs, and determines from the result whether the document is written vertically or horizontally. This determination method is described in detail in Japanese Patent Application Laid-Open No. 56-149674, "Method for Identifying Image Characteristics."
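The run-frequency test can be sketched as follows. This is a minimal illustration, not the method of the cited publication: it assumes a binary image as nested lists (1 = black, 0 = white), counts only white runs bounded by black pixels on both sides, and uses a hypothetical length cutoff `max_len` for "short". The function names are illustrative.

```python
from typing import List

Image = List[List[int]]  # assumption: 1 = black pixel, 0 = white pixel


def white_runs(line: List[int]) -> List[int]:
    """Lengths of white runs bounded by black pixels on both sides (margins are ignored)."""
    runs: List[int] = []
    current, seen_black = 0, False
    for px in line:
        if px == 0:
            current += 1
        else:
            if seen_black and current > 0:
                runs.append(current)
            current, seen_black = 0, True
    return runs


def count_short_white_runs(img: Image, max_len: int, vertical: bool) -> int:
    """Count white runs of length <= max_len along columns (vertical) or rows (horizontal)."""
    lines = [list(col) for col in zip(*img)] if vertical else img
    return sum(1 for line in lines for r in white_runs(line) if r <= max_len)


def classify_orientation(img: Image, max_len: int = 2) -> str:
    """Vertical writing if short vertical white runs outnumber short horizontal ones."""
    v = count_short_white_runs(img, max_len, vertical=True)
    h = count_short_white_runs(img, max_len, vertical=False)
    return "vertical" if v > h else "horizontal"  # ties default to horizontal (arbitrary)


# Two ink columns whose strokes are separated by 1-pixel vertical gaps.
page = [
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
]
print(classify_orientation(page))                           # vertical
print(classify_orientation([list(r) for r in zip(*page)]))  # horizontal
```

A real implementation would tune the cutoff to the scan resolution and character size; the tie-breaking rule here is an arbitrary choice.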

The line segmentation unit 8 accesses the buffer 4, reads the image data, extracts the lines of the input document, and passes the image data of each line to a character segmentation unit 10. Line segmentation is performed, for example, by the well-known projection method.

Because the direction in which lines are extracted differs between vertical and horizontal writing, the vertical/horizontal determination unit 6 notifies the line segmentation unit 8 of its determination result.
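Projection-based line segmentation, with the projection axis chosen from the orientation result, might look like the sketch below. It assumes a clean binary image (1 = black) in which lines are separated by completely blank pixel rows or columns; `ink_bands` and `cut_lines` are hypothetical names, not taken from the patent.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumption: 1 = black pixel, 0 = white pixel


def ink_bands(profile: List[int]) -> List[Tuple[int, int]]:
    """Half-open [start, end) ranges where the projection profile is non-zero."""
    bands: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # entering an inked band
        elif not value and start is not None:
            bands.append((start, i))       # leaving it at a blank gap
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


def cut_lines(img: Image, orientation: str) -> List[Image]:
    """Split a page into line images; the projection axis depends on the orientation result."""
    if orientation == "horizontal":
        # Horizontal writing: rows of text are separated by blank pixel rows.
        profile = [sum(row) for row in img]
        return [img[a:b] for a, b in ink_bands(profile)]
    # Vertical writing: columns of text are separated by blank pixel columns.
    profile = [sum(col) for col in zip(*img)]
    return [[row[a:b] for row in img] for a, b in ink_bands(profile)]


page = [[1, 1, 0, 1],
        [0, 0, 0, 0],
        [1, 0, 0, 1]]
print(cut_lines(page, "horizontal"))  # [[[1, 1, 0, 1]], [[1, 0, 0, 1]]]
print(cut_lines(page, "vertical"))    # [[[1, 1], [0, 0], [1, 0]], [[1], [0], [1]]]
```

The vertical branch returns columns left to right; since vertical Japanese text is read right to left, a practical system would reverse that list before recognition.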

The character segmentation unit 10 extracts the image data of individual characters from the line image data supplied by the line segmentation unit 8; this is also performed, for example, by the well-known projection method. The extracted character image data is input to a feature extraction unit 14 via a 90-degree rotation unit 12.
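Character segmentation applies the same projection idea one level down, this time projecting along the line direction. The sketch below assumes characters within a line are separated by at least one blank pixel row (vertical lines) or column (horizontal lines); `ink_spans` and `cut_characters` are illustrative names.

```python
from itertools import groupby
from typing import List, Tuple

Image = List[List[int]]  # assumption: 1 = black pixel, 0 = white pixel


def ink_spans(profile: List[int]) -> List[Tuple[int, int]]:
    """[start, end) spans of consecutive non-zero projection values."""
    spans: List[Tuple[int, int]] = []
    pos = 0
    for has_ink, group in groupby(profile, key=bool):
        length = len(list(group))
        if has_ink:
            spans.append((pos, pos + length))
        pos += length
    return spans


def cut_characters(line_img: Image, orientation: str) -> List[Image]:
    """Split one line image into character images by projecting along the line direction."""
    if orientation == "horizontal":
        # In a horizontal line, characters are separated by blank columns.
        profile = [sum(col) for col in zip(*line_img)]
        return [[row[a:b] for row in line_img] for a, b in ink_spans(profile)]
    # In a vertical line, characters are separated by blank rows.
    profile = [sum(row) for row in line_img]
    return [line_img[a:b] for a, b in ink_spans(profile)]


print(cut_characters([[1, 1], [0, 0], [0, 1], [1, 1]], "vertical"))  # [[[1, 1]], [[0, 1], [1, 1]]]
print(cut_characters([[1, 0, 1, 1]], "horizontal"))                 # [[[1]], [[1, 1]]]
```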

Normally, the 90-degree rotation unit 12 passes the character image data to the feature extraction unit 14 unchanged (without rotating it). When the matching unit 16 instructs it to rotate, however, it supplies the feature extraction unit 14 with image data of the character image rotated by 90 degrees.

FIG. 2 shows the configuration of the 90-degree rotation unit 12. As shown, its main elements are a character image memory 30, which can store the image data of one character, and X and Y address counters 31 and 32. When image data is written into the character image memory 30, the X address counter 31 is incremented sequentially from zero; when it reaches its maximum value, it is cleared to zero and the Y address counter 32 is incremented by 1. This address update is repeated until the Y address counter 32 reaches its maximum value.

Normally, the image is read from the character image memory 30 while the X and Y address counters 31 and 32 are incremented in the same way as for writing.

When a 90-degree rotation is instructed, by contrast, the Y address counter 32 is decremented from its maximum value, and each time it reaches zero the X address counter 31 is incremented by 1, starting from zero. Reading with this address sequence sends the image data of the character rotated by 90 degrees to the feature extraction unit 14.
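The counter-driven read order can be modeled in software to check what it produces. This is a hypothetical model, not the circuit: it assumes counter 31 addresses X and counter 32 addresses Y (the counter labels in the source text are partly garbled), and it stores integers instead of bits so the reordering is visible. Under those assumptions the rotated read turns the image 90 degrees clockwise.

```python
from typing import List


class CharImageMemory:
    """Hypothetical model of character image memory 30 with address counters 31 (X) and 32 (Y)."""

    def __init__(self, width: int, height: int) -> None:
        self.width, self.height = width, height
        self.cells = [[0] * width for _ in range(height)]  # cells[y][x]

    def write(self, pixels: List[int]) -> None:
        """Store a raster-ordered pixel stream: X counts up, and each X overflow advances Y."""
        stream = iter(pixels)
        for y in range(self.height):        # Y address counter 32
            for x in range(self.width):     # X address counter 31
                self.cells[y][x] = next(stream)

    def read(self, rotate: bool = False) -> List[List[int]]:
        """Normal read repeats the write order; a rotated read yields the image turned 90 degrees."""
        if not rotate:
            return [row[:] for row in self.cells]
        rotated = []
        for x in range(self.width):                    # X counter 31 counts up from zero
            column = [self.cells[y][x]                 # Y counter 32 counts down from its maximum
                      for y in range(self.height - 1, -1, -1)]
            rotated.append(column)                     # each X step becomes one output row
        return rotated


mem = CharImageMemory(width=3, height=2)
mem.write([1, 2, 3, 4, 5, 6])
print(mem.read())             # [[1, 2, 3], [4, 5, 6]]
print(mem.read(rotate=True))  # [[4, 1], [5, 2], [6, 3]]
```

Because rotation is done purely by changing the address sequence, no second image buffer is needed; the same memory serves both the pass-through and the rotated read.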

The feature extraction unit 14 extracts features from the input character image data, creates a feature vector, and sends it to the matching unit 16. The matching unit 16 performs a matching operation between this feature vector and the feature vectors of the characters registered in a dictionary 18, and searches for the character at the minimum distance.
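Minimum-distance matching with a reject threshold (the threshold is introduced in the operating description of each mode) can be sketched as below. The patent does not name a distance measure; Euclidean distance via `math.dist` (Python 3.8+) is an assumption, and `REJECT` is a hypothetical stand-in for the reject code.

```python
import math
from typing import Dict, Optional, Sequence

REJECT: Optional[str] = None  # hypothetical stand-in for the reject code


def match(feature: Sequence[float],
          dictionary: Dict[str, Sequence[float]],
          threshold: float) -> Optional[str]:
    """Return the code of the nearest dictionary entry, or REJECT if it is farther than threshold."""
    best_code, best_dist = REJECT, math.inf
    for code, reference in dictionary.items():
        d = math.dist(feature, reference)  # Euclidean distance (an assumption)
        if d < best_dist:
            best_code, best_dist = code, d
    return best_code if best_dist <= threshold else REJECT


refs = {"A": (0.0, 0.0), "B": (3.0, 4.0)}
print(match((0.0, 1.0), refs, threshold=2.0))    # A
print(match((10.0, 10.0), refs, threshold=2.0))  # None (reject)
```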

The dictionary 18 basically consists of a dictionary for vertically written characters and a dictionary for horizontally written characters. In this embodiment, however, to reduce dictionary size, the entries for characters whose feature vectors are the same in vertical and horizontal writing are merged into a single common dictionary D1, and the entries for the remaining characters are separated into a dictionary D2 for horizontally written characters and a dictionary D3 for vertically written characters.

Because vertically and horizontally written documents use different dictionaries, the vertical/horizontal determination unit 6 also notifies the matching unit 16 of its determination result.

Next, the overall operation is described, starting with the case where the vertical/horizontal determination unit 6 determines that the input document is written horizontally.

In this case, lines and characters are segmented as for a horizontally written document. The image data of each segmented character passes through the 90-degree rotation unit 12 unchanged into the feature extraction unit 14, and the character's feature vector is sent to the matching unit 16.

Here the matching unit 16 operates in horizontal-writing mode. It calculates the distances between the feature vector of the input character and the feature vectors in the common dictionary D1 and the dictionary D2 for horizontally written characters, and finds the character at the minimum distance. If that distance is at or below a predetermined value, it outputs the code of that character; otherwise, it outputs a reject code.

Next, the operation when the input document is determined to be written vertically is described. In this case, lines and characters are segmented as for a vertically written document, and the matching unit 16 operates in vertical-writing mode.

The image data of each segmented character is input to the feature extraction unit 14 without being rotated by 90 degrees, and its feature vector is input to the matching unit 16.

The matching unit 16 performs a matching operation between the feature vector of the input character and the feature vectors in the common dictionary D1 and the dictionary D3 for vertically written characters, and searches for a character whose minimum distance is at or below the predetermined value. If such a character is found, the matching unit 16 outputs its character code and proceeds to the next character.

If no such character is found, that is, if a reject character occurs, the matching unit 16 issues a retry instruction. In response, the 90-degree rotation unit 12 rotates the rejected input character by 90 degrees and sends the rotated image data to the feature extraction unit 14. The feature extraction unit 14 extracts the features of the rotated character and sends its feature vector to the matching unit 16.

The matching unit 16 then matches this feature vector against the common dictionary D1 and the dictionary D2 for horizontally written characters. If this second matching operation finds a character whose minimum distance is at or below the predetermined value, that character code is output as the recognition result and recognition proceeds to the next character. If again no such character is found, the matching unit 16 finally judges the input character to be a reject character, outputs a reject code, and proceeds to the next character.
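The vertical-mode flow with its single retry can be put together in one small sketch. Everything here is a stand-in chosen for illustration: the feature extractor is hypothetical (a flattened pixel grid, which assumes size-normalized, equally sized character images), distance is Euclidean, and `None` plays the role of the reject code. The rotation follows the read order described for FIG. 2 (columns left to right, each read bottom to top).

```python
import math
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]                 # assumption: 1 = black, 0 = white
FeatureDict = Dict[str, Tuple[float, ...]]


def rotate_90(img: Image) -> Image:
    """Read columns left to right, each bottom to top (the rotated read order of unit 12)."""
    return [list(col)[::-1] for col in zip(*img)]


def features(img: Image) -> Tuple[float, ...]:
    """Hypothetical feature extractor: the flattened pixel grid of a size-normalized image."""
    return tuple(float(p) for row in img for p in row)


def nearest(vec: Tuple[float, ...], dictionary: FeatureDict, threshold: float) -> Optional[str]:
    """Minimum-distance match; None plays the role of the reject code."""
    if not dictionary:
        return None
    code, ref = min(dictionary.items(), key=lambda kv: math.dist(vec, kv[1]))
    return code if math.dist(vec, ref) <= threshold else None


def recognize_vertical(img: Image, d1: FeatureDict, d2: FeatureDict,
                       d3: FeatureDict, threshold: float) -> Optional[str]:
    # First pass (vertical-writing mode): unrotated image against D1 + D3.
    code = nearest(features(img), {**d1, **d3}, threshold)
    if code is not None:
        return code
    # Retry: rotate the rejected character 90 degrees and match against D1 + D2.
    return nearest(features(rotate_90(img)), {**d1, **d2}, threshold)  # None = final reject


d1 = {"dot": (0.0, 0.0, 0.0, 1.0)}
d2 = {"A": (1.0, 1.0, 0.0, 0.0)}
d3 = {"v": (0.0, 1.0, 0.0, 1.0)}
print(recognize_vertical([[0, 1], [0, 1]], d1, d2, d3, 0.5))  # v
print(recognize_vertical([[1, 0], [1, 0]], d1, d2, d3, 0.5))  # A (found on the retry)
print(recognize_vertical([[1, 1], [1, 1]], d1, d2, d3, 0.5))  # None (final reject)
```

Rotation is attempted only after a reject, so ordinary vertically written characters pay no extra cost; only characters that fail the first pass go through a second feature extraction and match.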

Thus, characters in a vertically written document are recognized using the vertical-writing dictionaries (in this embodiment, the dictionary D3 for vertically written characters and the common dictionary D1). If a reject character occurs, the rejected input character is rotated by 90 degrees, and recognition of the rotated character is attempted using the horizontal-writing dictionaries (in this embodiment, the dictionary D2 for horizontally written characters and the common dictionary D1).

For example, the Roman-alphabet portion in the second line of the vertically written document shown in FIG. 3 cannot be recognized in the first recognition pass, which uses the dictionary for vertically written characters, so reject characters occur. In the second pass, however, where the input characters are rotated by 90 degrees and the dictionary for horizontally written characters is used, each character of that Roman-alphabet portion can clearly be recognized correctly.

[Effect]

As described in detail above, the present invention comprises a dictionary of vertically written characters, a dictionary of horizontally written characters, and means for rotating a character image by 90 degrees. When a reject character occurs during recognition of a vertically written document using the dictionary of vertically written characters, the image of the rejected character is rotated by 90 degrees and recognition is then attempted using the dictionary of horizontally written characters. This makes it possible to realize a character recognition device that can also recognize horizontally written characters, such as Roman text, contained in vertically written documents.

[Brief Description of the Drawings]

FIG. 1 is a schematic block diagram showing an embodiment of the character recognition device of the present invention; FIG. 2 is a schematic block diagram of the 90-degree rotation unit; and FIG. 3 shows an example of a vertically written document containing a horizontally written character string.

Description of reference numerals:
2: Scanner
4: Buffer
6: Vertical/horizontal determination unit
8: Line segmentation unit
10: Character segmentation unit
12: 90-degree rotation unit
14: Feature extraction unit
16: Matching unit
18: Dictionary
D1: Common dictionary
D2: Dictionary for horizontally written characters
D3: Dictionary for vertically written characters

Claims

[Claims]

(1) A character recognition device comprising a dictionary of vertically written characters, a dictionary of horizontally written characters, and means for rotating a character image by 90 degrees, characterized in that, when a reject character occurs during character recognition of a vertically written document using the dictionary of vertically written characters, the image of the rejected character is rotated by 90 degrees and recognition is then attempted using the dictionary of horizontally written characters.