JPS62295193A - Type recognizing method

Info

Publication number: JPS62295193A
Application number: JP61138164A
Authority: JP
Inventors: Koichi Ejiri; 公一江尻; Hajime Sato; 元佐藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-06-16
Filing date: 1986-06-16
Publication date: 1987-12-22

Abstract

PURPOSE: To recognize printed European-language characters with high accuracy by recognizing an unknown pattern using codes that specify, for each character, its horizontal size and the start position of the pattern that follows it, and then checking the size. CONSTITUTION: The dictionary (template) of character patterns is provided with codes specifying, for each character, its horizontal size and the start position of the pattern that follows it. An unknown pattern is recognized using these codes, and its size is then checked. When a pattern cut out at a segmentation candidate does not match the character size, it is rejected. The segmentation pointer is then shifted to the next candidate position, and feature extraction and matching are executed again. As before, this pass matches the character size as well as the feature vectors.

Description

3. Detailed Description of the Invention

■ Technical Field

The present invention relates to character recognition devices (OCR), and in particular to the recognition of typeset characters in printed matter such as magazines and, for typewritten text, the recognition of proportional-pitch characters.

■ Prior Art

OCR presupposes that characters are segmented (cut out) one by one.

Segmentation can fail when characters touch each other.

To prevent this, the conventional approach prepared multiple segmentation candidates from the local minima of the histogram obtained by projecting the pattern in the vertical direction, attempted recognition at each candidate position, and adopted the result with the smallest similarity distance.
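The candidate search described above can be sketched as follows. This is a minimal illustration only: the binary-image layout, the function names `column_histogram` and `cut_candidates`, and the exact local-minimum test are assumptions made for the example and are not specified in the patent.

```python
from typing import List


def column_histogram(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def cut_candidates(hist: List[int]) -> List[int]:
    """Return interior column indices where the histogram has a local minimum."""
    candidates = []
    for x in range(1, len(hist) - 1):
        # Assumed test: not higher than the left neighbour, lower than the right.
        if hist[x] <= hist[x - 1] and hist[x] < hist[x + 1]:
            candidates.append(x)
    return candidates


# Toy 3x6 binary image (1 = ink); illustrative data only.
image = [
    [1, 1, 0, 1, 1, 1],
    [1, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
hist = column_histogram(image)
print(hist, cut_candidates(hist))
```

Each returned index is a possible cut position. In the conventional method, recognition is then attempted for every candidate and the closest match is kept.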

For example, for a pattern such as that in Fig. 1, candidate points P2, P3, and P4 were prepared with P1 as the end point, and, among the resulting intervals, the candidate "o" showing the best match over P1 to P2 was output as the result.

However, when the pattern has a defect such as blurring, as in Fig. 2 (1), a new segmentation candidate P5 arises. In that case the pattern cut out at P5 is recognized as "c", which adversely affects subsequent recognition.

Furthermore, when adjacent characters overlap, as in Fig. 2 (2), the segmentation itself was extremely difficult.

An application has already been filed for handling such cases by judging the pattern unrecognizable (reject) and redoing the segmentation, so its description is omitted here.

In ordinary printed matter and typeset output, the character size and the left-edge position of the next character are fixed for each character. Even if P5 in Fig. 2 (1) is a segmentation candidate, the character size of "c" normally differs from the P1 to P5 interval, so the recognition result never becomes "c".

■ Purpose

The object of the present invention is to recognize printed European-language characters with high accuracy. Typeset characters in particular often touch each other, and they must be recognized with high accuracy.

■ Configuration

Fig. 3 shows the overall processing flow of the present invention. When the pattern is cut out at P5 in Fig. 2 (1), the character size does not match, and the result is a reject.

The segmentation pointer is therefore shifted to the next candidate position P2, and feature extraction and matching are performed again.

As before, this pass matches not only the feature vectors but also the character size.
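The reject-and-shift loop can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation: the entry layout, the squared-distance measure, the thresholds `size_tol` and `max_dist`, and the toy feature lookup are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical entry layout: (output code, horizontal size, feature vector).
Entry = Tuple[str, int, Sequence[float]]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize_from(
    left: int,
    candidates: List[int],
    extract: Callable[[int, int], Sequence[float]],
    dictionary: List[Entry],
    size_tol: int = 1,
    max_dist: float = 1.0,
) -> Optional[Tuple[str, int]]:
    """Try each candidate cut in order; reject and move to the next one when
    either the feature distance or the character size does not match."""
    for right in candidates:
        features = extract(left, right)
        code, size, vector = min(dictionary, key=lambda e: distance(features, e[2]))
        if distance(features, vector) > max_dist:
            continue  # reject: no sufficiently close feature match
        if abs((right - left) - size) > size_tol:
            continue  # reject: width of the cut disagrees with the stored size
        return code, right
    return None


# Toy data: a blurred "o" cut too early (at x=7) yields features resembling "c".
dictionary = [("c", 5, [1.0, 0.0]), ("o", 10, [1.0, 1.0])]
toy_features = {(0, 7): [1.0, 0.0], (0, 10): [1.0, 1.0]}
result = recognize_from(0, [7, 10], lambda l, r: toy_features[(l, r)], dictionary)
print(result)
```

In the toy run, the cut at 7 matches the features of "c" but is 7 units wide against a stored size of 5, so it is rejected and the pointer moves on to the cut at 10.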

Fig. 4 shows an example of the dictionary of feature vectors used for character recognition.

In the figure, 1 is the registration code, 2 is the output code, 3 is the size, 4 is the start shift, and 5 is the feature vector.

That is, each entry consists of: registration code 1 (typically ASCII for English text and JIS code for Japanese text); output code 2 (this corresponds to the combined (ligature) type used in typesetting: "fi" in Fig. 2 (2), for example, is printed with a single piece of type, but the output is separated into "f" and "i"); size 3 (a numerical value indicating the horizontal extent of the character, in arbitrary units); start shift 4 (when the right-hand pattern intrudes into the left-hand pattern, as in Fig. 2 (2), the value indicating the extent of the intrusion); and the ordinary feature vector 5.
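The five fields of a dictionary entry could be modelled roughly as follows. The class name, field types, and all numeric values are assumptions for illustration; the patent gives only the field names and their meaning.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DictionaryEntry:
    registration_code: str              # 1: key under which the pattern is registered
    output_code: Tuple[str, ...]        # 2: characters emitted on a match
    size: int                           # 3: horizontal extent, arbitrary units
    start_shift: int                    # 4: how far the next pattern intrudes
    feature_vector: Tuple[float, ...]   # 5: ordinary feature vector


# Illustrative values only; they are not taken from the patent.
entry_fi = DictionaryEntry("fi", ("f", "i"), 14, 0, (0.2, 0.7, 0.1))
entry_f = DictionaryEntry("f", ("f",), 9, 2, (0.3, 0.6, 0.0))

print("".join(entry_fi.output_code), entry_f.size, entry_f.start_shift)
```

Keeping the output code separate from the registration code is what lets a ligature registered as a single pattern be reported as two characters.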

In Fig. 5, A is the size 3 and B is the start shift 4.

To handle overlapping character patterns (Fig. 2 (2)), it is not enough to prepare the two-character dictionary entries described above; a separate recognition method is also used to prevent segmentation errors.

When a pattern of width A in Fig. 5 is matched against each registered code, for any code whose start shift value is greater than 0, the right end of the pattern is removed by that amount before the matching process is executed.
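The trimming step can be illustrated with a small sketch. The binary-image representation and the pixel-mismatch count standing in for the matching process are assumptions; the patent does not say how the trimmed pattern is compared.

```python
from typing import List

Pattern = List[List[int]]  # rows of 0/1 pixels (assumed representation)


def trim_right(pattern: Pattern, start_shift: int) -> Pattern:
    """Drop the rightmost `start_shift` columns; leave the pattern as-is for 0."""
    if start_shift <= 0:
        return pattern
    return [row[:-start_shift] for row in pattern]


def mismatch(pattern: Pattern, template: Pattern) -> int:
    """Count differing pixels between two equally sized patterns."""
    return sum(
        p != t
        for prow, trow in zip(pattern, template)
        for p, t in zip(prow, trow)
    )


# Toy data: a 5-column cut whose last 2 columns contain the intruding neighbour.
cut_pattern = [[0, 1, 1, 0, 1], [1, 1, 0, 1, 1]]
template = [[0, 1, 1], [1, 1, 0]]  # template stored without the overlap region
trimmed = trim_right(cut_pattern, 2)
print(trimmed, mismatch(trimmed, template))
```

With the intruding columns removed, the remaining pixels can be compared against a template that was registered without the neighbouring character.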

Fig. 6 explains in detail the "pattern segmentation" and "feature extraction & matching" parts of Fig. 3.

In the above description, the character size must be given in advance. This value, however, differs for each font.

For this reason, a separate memory is provided, and data such as that shown in the table is held in it.

Accordingly, at the start of recognition the data for a frequently used font is registered in the dictionary, and if many rejects occur, this data is swapped to search for the best-fitting set. In this way a wide variety of fonts can be handled.
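One possible way to organize the font data swap is sketched below. The table contents, the font names, the reject threshold, and the fallback to the table with the fewest rejects are all assumptions; the patent states only that the data is swapped when many rejects occur.

```python
from typing import Callable, Dict, Optional

SizeTable = Dict[str, int]  # character -> horizontal size (assumed layout)


def choose_font_table(
    tables: Dict[str, SizeTable],
    count_rejects: Callable[[SizeTable], int],
    max_rejects: int,
) -> Optional[str]:
    """Try tables in order (most frequently used first). Return the first one
    whose reject count is acceptable, else the one with the fewest rejects."""
    best_name, best_rejects = None, None
    for name, table in tables.items():
        rejects = count_rejects(table)
        if rejects <= max_rejects:
            return name
        if best_rejects is None or rejects < best_rejects:
            best_name, best_rejects = name, rejects
    return best_name


# Toy data: hypothetical fonts and measured (character, width) pairs.
tables = {"FontA": {"c": 5, "o": 10}, "FontB": {"c": 6, "o": 12}}
observed = [("c", 6), ("o", 12), ("o", 12)]


def count_rejects(table: SizeTable) -> int:
    """Count observations whose width disagrees with the table."""
    return sum(table[ch] != width for ch, width in observed)


chosen = choose_font_table(tables, count_rejects, max_rejects=1)
print(chosen)
```

Ordering the tables by how often each font is used keeps the common case cheap, since the first table is usually accepted without trying the others.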

When an overlapping pattern is recognized, matching is first performed on the overlapping pattern as a whole, and a specific registration code is taken as the candidate.

Next, part of the pattern is deleted (on the right or on the left) based on Ss, and each remaining part is matched against the feature vector of the character corresponding to the registration code.

When the input pattern is "fi", "fi" becomes the candidate, and recognition is then performed for "f" and "i".

Recognition accuracy is raised in this way.
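The two-stage recognition of an overlapping pair can be sketched as follows. It is a simplified interpretation: column-ink profiles stand in for feature vectors, the overlap columns are simply dropped from both halves, and every name and number is hypothetical.

```python
from typing import List, Optional, Sequence


def distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared distance between profiles; infinite if the widths differ."""
    if len(a) != len(b):
        return float("inf")
    return float(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical single-character profiles and one ligature entry.
SINGLES = {"f": [1, 4, 2, 1], "i": [1, 3, 3]}
LIGATURES = {
    "fi": {"parts": ("f", "i"), "start_shift": 1, "profile": [1, 4, 2, 2, 3, 3]},
}


def recognize_ligature(profile: List[int], max_dist: float = 1.0) -> Optional[List[str]]:
    # Stage 1: match the whole overlapping pattern to pick a candidate code.
    name = min(LIGATURES, key=lambda k: distance(profile, LIGATURES[k]["profile"]))
    entry = LIGATURES[name]
    if distance(profile, entry["profile"]) > max_dist:
        return None
    left_code, right_code = entry["parts"]
    shift = entry["start_shift"]
    left_size = len(SINGLES[left_code])
    # Stage 2: delete the overlap region and match each part separately.
    left_part = profile[: left_size - shift]   # right end removed
    right_part = profile[left_size:]           # left end removed
    ok_left = distance(left_part, SINGLES[left_code][: left_size - shift]) <= max_dist
    ok_right = distance(right_part, SINGLES[right_code][shift:]) <= max_dist
    return [left_code, right_code] if ok_left and ok_right else None


print(recognize_ligature([1, 4, 2, 2, 3, 3]))
```

The second stage acts as a confirmation: the candidate is accepted only if both constituent characters also match once the overlap has been removed.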

■ Effect

As described above, the present invention has the effect of recognizing printed characters with high accuracy.

[Brief explanation of the drawings]

Fig. 1 shows an example of a conventional pattern. Fig. 2 illustrates the problems of character recognition for conventional characters. Fig. 3 shows the overall processing flow of the present invention. Fig. 4 shows an example of the dictionary of feature vectors used for character recognition. Fig. 5 shows an example of overlapping character patterns. Fig. 6 explains in detail the "pattern segmentation" and "feature extraction & matching" parts of Fig. 3.

1: registration code, 2: output code, 3: size, 4: start shift, 5: feature vector.

Patent applicant: Ricoh Co., Ltd.

Claims

(1) A type recognition method for a device that optically recognizes printed characters, characterized in that the dictionary (template) of character patterns is provided with codes specifying, for each character, its horizontal size and the start position of the pattern that follows it, an unknown pattern is recognized using these codes, and a size check is then performed.

(2) The printed character recognition method according to claim (1), characterized in that, for character patterns that partially overlap when projected in the vertical direction, part of the pattern is deleted before the feature vectors are matched.

(3) The printed character recognition method according to claim (1), characterized in that the character size and numerical values specifying the starting position of the next character are stored in separate tables.

(4) The printed character recognition method according to claim (2), characterized in that when performing matching, only specific characters constituting an overlapping pattern are recognized.