JPH04130979A - Character picture segmenting method

Info

Publication number: JPH04130979A
Application number: JP2253850A
Authority: JP
Inventors: Akiko Nakajima; 明子中島
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1990-09-21
Filing date: 1990-09-21
Publication date: 1992-05-01
Anticipated expiration: 2014-07-12
Also published as: JP2918666B2

Abstract

PURPOSE: To obtain the character frame of a specific character composed only of line segments perpendicular to the line direction, by evaluating conditions on the lengths and the integrated width of a character element circumscribed rectangle narrower than a prescribed value and the character element circumscribed rectangles that follow it. CONSTITUTION: In a character image segmentation processing part 2, connections of black pixels in the document image are searched, each group of connected black pixels is extracted as a character element, and the coordinates of the circumscribed rectangle of the character element (for example, the coordinates of its diagonal vertices) are extracted as character element data. In addition, the horizontal and vertical distances between the circumscribed rectangles are calculated, the group of rectangles for which both distances are smaller than a threshold is segmented as one line, and that data is also stored in a character element data memory 7. A parameter setting part 8 surveys all the character element data in the character element data memory 7, sets relative parameters used for character segmentation (character element integration and character pattern synthesis), and stores them in a parameter memory 9. Based on condition decisions using the parameters in the parameter memory 9, a character element integration part 10 integrates the character element circumscribed rectangles regarded as belonging to the same character, starting from the character element at the head of the line, and corrects the character element data in the character element data memory 7 accordingly.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a method for cutting out character images from a document image input as a binary image in a character recognition system.

[Conventional technology]

In a character recognition system, lines are cut out from a document image input as a binary image by a scanner or the like, character images are cut out from each line, and character recognition is performed on them.

Methods for cutting out character images are broadly divided into methods based on projection and methods based on the circumscribed rectangles of connected black pixels.

However, with either method, a single block of the projection or a single circumscribed rectangle does not necessarily correspond to the data of one character. A technique is therefore needed to combine or split such data to obtain the data of a single character.

Conventionally, a method is known (Japanese Patent Laid-Open No. 61-117670) in which the circumscribed rectangles of connected black pixels (character elements) are extracted, the width of a single circumscribed rectangle or of a combination of several circumscribed rectangles is compared with a preset standard character width, and a rectangle or combined region whose width can be regarded as one character is used as a character frame to cut out a character image.

A method is also known in which a single circumscribed rectangle or a combined region is regarded as a character image, character recognition is performed on it, and the character image is determined by evaluating the certainty of the result (Japanese Patent Application No. 133424 of Showa 63 (1988), by the same applicant).

[Problem to be solved by the invention]

Characters consisting only of line segments perpendicular to the line direction, such as the Chinese numerals 「一」, 「二」 and 「三」 in a vertically written line or the kanji 「川」 in a horizontally written line, often do not have the width of a full character. Moreover, when such characters appear consecutively, a plausible recognition result is obtained wherever they are split.

Therefore, when such characters appear consecutively, the conventional methods described above are prone to segmentation errors even if the widths of the preceding and following blank spaces are taken into account.

An object of the present invention is to provide a character image segmentation method that can cut out characters accurately even when character images such as Chinese numerals, consisting only of line segments perpendicular to the line direction, appear consecutively.

[Means for Solving the Problems]

The present invention is a character image segmentation method in which character frames are obtained by integrating character element circumscribed rectangles extracted from a line image, and character images are cut out from the line image. The method is characterized in that attention is paid to a character element circumscribed rectangle whose width is smaller than a predetermined value; a condition judgment is made on the lengths of the rectangle of interest and the subsequent character element circumscribed rectangles and on their width when integrated; and the integration of the rectangle of interest with the subsequent rectangles is controlled according to the result of the judgment, thereby obtaining the character frame of a specific character consisting only of line segments perpendicular to the line direction.

[Operation]

For a specific character consisting only of line segments perpendicular to the line direction, such as the Chinese numerals 「一二三」 in a vertically written line, each line segment of each character is extracted as a character element. The circumscribed rectangle of such a character element is sufficiently narrow, and the lengths of the line segments have a ratio relationship specific to the character. The overall width of the character is also specific to the character.

Therefore, according to the present invention described above, taking the character string 「一二三」 in a vertically written line as an example, the characters can be cut out accurately as follows.

First, attention is paid to the character element circumscribed rectangle of 「一」, because its width is sufficiently small. Its width when integrated with the next character element circumscribed rectangle (corresponding to the upper stroke of 「二」) is too large for a standard character width, because an inter-character space lies between them. The rectangle of interest is therefore judged to belong to a different character from the next rectangle and is not integrated with it, and the image of 「一」 can be cut out using the rectangle of interest alone as the character frame.

Next, attention is paid to the character element circumscribed rectangle corresponding to the upper stroke of 「二」. Its width when integrated with the next character element circumscribed rectangle (corresponding to the lower stroke) is not excessive as a standard character width, because there is no inter-character space between them. In addition, the length of the rectangle of interest (the length of the stroke) is shorter than the length of the next rectangle. From these conditions the two character element circumscribed rectangles are judged to constitute 「二」 and are integrated, and the image of 「二」 can be cut out using the integrated rectangle as the character frame.

Next, attention is paid to the character element circumscribed rectangle corresponding to the top stroke of 「三」. Its width when integrated with the next character element circumscribed rectangle is not excessive as a standard character width, but the relationship between the lengths is the opposite of that for 「二」. The width when the rectangle after that is also integrated is still not excessive as a standard character width, and the last rectangle is longer than the middle one. From these conditions the three character element circumscribed rectangles are judged to constitute 「三」 and are integrated, and the image of 「三」 can be cut out using the integrated rectangle as the character frame.

[Example]

FIG. 1 is a block diagram of a character recognition system according to an embodiment of the present invention.

The document image to be processed is input as binary image data from a scanner or the like and stored in the original image memory 1. The character image segmentation processing unit 2 cuts out the character images from the document image in the original image memory 1 and stores them in the character image memory 3. The character recognition processing unit 4 recognizes characters by normalizing each character image, extracting its features and matching them against a dictionary, and stores the recognition results in the recognition result memory 5.

In the character image segmentation processing unit 2, the line extraction unit 6 extracts the character elements of one line from the document image and stores the character element data in the character element data memory 7. For example, connections of black pixels in the document image are searched, each block of connected black pixels is extracted as a character element (a group of black pixels constituting a character or a part of a character), and the coordinates of the circumscribed rectangle of the character element (for example, the coordinates of its diagonal vertices) are extracted as character element data. In addition, the horizontal and vertical distances between the character element circumscribed rectangles are obtained, a set of rectangles for which the distances in both directions are smaller than a certain threshold is cut out as one line, and that data is also stored in the character element data memory 7. Line extraction of this kind is more robust to document skew and the like than line extraction based on projection in the line direction.
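The extraction of character elements described above can be illustrated with a minimal sketch. It assumes a binary image given as a list of rows (1 = black, 0 = white) and uses 8-connectivity; the source does not state which connectivity is used, and the function name `extract_character_elements` is hypothetical. Grouping rectangles into lines by distance thresholds is not shown.

```python
from collections import deque


def extract_character_elements(image):
    """Return circumscribed rectangles (x_start, y_start, x_end, y_end)
    of connected black-pixel groups, in raster-scan order of discovery.

    Assumption: 8-connectivity; the patent text does not specify it.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over one connected group of black pixels.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


sample = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
boxes = extract_character_elements(sample)
```

Storing only the diagonal vertices of each rectangle matches the character element data described in the text; everything else about the data layout is an illustrative choice.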

The parameter setting unit 8 surveys all the character element data in the character element data memory 7, sets relative parameters to be used for character segmentation (character element integration and character pattern synthesis), and stores them in the parameter memory 9. Character element integration is performed several times, and after each pass the parameters are set again on the basis of the corrected character element data in the character element data memory 7.

The parameters to be set are as follows.

(a) Line height (the distance from the bottom edge to the top edge of a horizontally written line, or from the left edge to the right edge of a vertically written line)

This is the maximum height of the character element circumscribed rectangles constituting the line. Here, the height of a character element circumscribed rectangle is the distance from the bottom edge (for a horizontally written line) or from the left edge (for a vertically written line) to the farthest side of the rectangle.

(b) Standard character width (width means the size in the line direction)

This is a value slightly larger than the line height, for example [line height] + 1 (dot).

(c) Minimum character spacing

This is the minimum spacing between adjacent character element circumscribed rectangles.

However, it is determined only from character element circumscribed rectangles having at least a certain width, for example at least half of the [standard character width].

Before the first character element integration pass, it is set to the following initial value.

[initial value] = −[line height] / 30

(A negative value means that the character element circumscribed rectangles overlap.)

(d) Maximum character width

This is the maximum width among the character element circumscribed rectangles constituting the line.

(e) Knth

This is the maximum width of a character element circumscribed rectangle that can be regarded as a character element consisting of only one line segment perpendicular to the line direction. It is set to, for example, [line height] / 8.
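As a rough illustration of parameters (a) to (e), the following sketch computes them for a horizontally written line. Several details are assumptions rather than statements from the source: rectangles are tuples `(x_start, y_start, x_end, y_end)` with y growing downward, the bottom of the line is taken as the largest `y_end`, widths are coordinate differences, and the minimum spacing after the first pass is taken over neighbouring pairs among the sufficiently wide rectangles. The name `set_parameters` is hypothetical.

```python
def set_parameters(boxes, first_pass=True):
    """Compute the relative parameters for one horizontally written line."""
    # (a) Line height: largest distance from the line bottom to a rectangle's far side.
    line_bottom = max(b[3] for b in boxes)
    line_height = max(line_bottom - b[1] for b in boxes)
    # (b) Standard character width: slightly larger than the line height.
    standard_width = line_height + 1
    # (d) Maximum character width: widest rectangle in the line.
    max_width = max(b[2] - b[0] for b in boxes)
    # (e) Knth: widest rectangle still treated as a single perpendicular stroke.
    knth = line_height / 8
    # (c) Minimum character spacing.
    initial_spacing = -line_height / 30  # negative value = rectangles overlap
    if first_pass:
        min_spacing = initial_spacing
    else:
        # Assumption: only rectangles at least half a standard width wide count.
        wide = sorted(b for b in boxes if b[2] - b[0] >= standard_width / 2)
        gaps = [nxt[0] - cur[2] for cur, nxt in zip(wide, wide[1:])]
        min_spacing = min(gaps) if gaps else initial_spacing
    return {
        "line_height": line_height,
        "standard_width": standard_width,
        "min_spacing": min_spacing,
        "max_width": max_width,
        "knth": knth,
    }


line = [(0, 0, 20, 30), (25, 5, 45, 30), (50, 10, 52, 30)]
initial = set_parameters(line, first_pass=True)
later = set_parameters(line, first_pass=False)
```

Because every value is derived from the line height, the parameters scale with the character size of each line instead of being fixed in advance.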

The character element integration unit 10 uses condition judgments based on the parameters in the parameter memory 9 to integrate the character element circumscribed rectangles regarded as belonging to the same character, starting from the character element at the head of the line, and corrects the character element data in the character element data memory 7. This character element integration processing consists of a normal processing mode and an exception processing mode.

First, the exception processing mode is described.

FIG. 2 is a flowchart for explaining the exception processing mode. In it, W1 is the width of the character element of interest, W2 is the width of the next character element, and W3 is the width of the character element after that. L1, L2 and L3 are the lengths of the same three character elements. W1,2 is the width when the character element of interest is integrated with the next character element, and W1,2,3 is the width when it is integrated with the following two character elements.

Specific characters such as 「川」 in a horizontally written line or 「一二三」 in a vertically written line are decomposed into character elements each consisting of one line segment perpendicular to the line direction, with wide gaps between those character elements. Such characters can be cut out in the normal processing mode if their character element gaps are sufficiently wider than those of other characters, but few documents satisfy that condition. The exception processing mode performs character element integration so that such specific characters are cut out correctly.

The width W1 of the character element 1 of interest (the size of its circumscribed rectangle in the line direction) is compared with the parameter Knth. If W1 < Knth, the character element 1 of interest is regarded as consisting of only one line segment perpendicular to the line direction, and the exception processing mode is entered.

On entering the exception processing mode, the width W1,2 obtained when the character element 1 of interest and the next character element 2 are integrated is first compared with the standard character width. If W1,2 ≥ [standard character width], the character element 1 of interest is fixed as a character element on its own. For example, in the character string 「一二三」 of a vertically written line shown in FIG. 3, when attention is paid to the first character element 「一」, W1,2 ≥ [standard character width] holds, so it is fixed as a character element on its own.

When W1,2 < [standard character width] in that comparison, the length L1 of the character element 1 of interest (the size of its circumscribed rectangle in the direction perpendicular to the line direction) is compared with the length L2 of the next character element 2. If L2 > L1, character elements 1 and 2 are integrated and fixed as one character element. For example, when the upper stroke of 「二」 in FIG. 3 is the character element 1 of interest, this condition is satisfied, so the two character elements constituting 「二」 are definitively integrated into one character element.

When L2 ≤ L1 in that comparison, the width W1,2,3 obtained by integrating the character element 1 of interest, the next character element 2 and the following character element 3 is compared with the standard character width. If W1,2,3 < [standard character width], the length L2 of character element 2 is compared with the length L3 of character element 3, and if L3 > L2, character elements 1, 2 and 3 are definitively integrated. For example, when the top stroke of 「三」 in FIG. 3 is the character element 1 of interest, these conditions are satisfied, so the three character elements constituting 「三」 are definitively integrated.

As is clear from the above, for a character string such as that of FIG. 3, which is difficult to segment in the normal processing mode, the character element integration result shown in FIG. 4 is obtained, so each character can be cut out correctly.
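The decision flow of the exception processing mode can be sketched as follows. This is an illustrative reading of the text, not the flowchart of FIG. 2 itself. Each character element is assumed to be a tuple `(start, end, length)`, where `start` and `end` are positions along the line direction and `length` is the size perpendicular to it. The function name is hypothetical, and returning `None` for cases the text does not cover (no following element, or no condition satisfied) is an assumption.

```python
def exception_mode_group_size(elements, i, knth, standard_width):
    """Return how many elements starting at index i form one character
    (1, 2 or 3), or None when the exception mode does not decide."""
    start1, end1, len1 = elements[i]
    if end1 - start1 >= knth:
        return None  # W1 >= Knth: exception mode is not entered
    if i + 1 >= len(elements):
        return None  # assumption: case not covered by the text
    _, end2, len2 = elements[i + 1]
    if end2 - start1 >= standard_width:
        return 1  # W1,2 too wide: element i stands alone
    if len2 > len1:
        return 2  # L2 > L1: integrate two elements
    if i + 2 < len(elements):
        _, end3, len3 = elements[i + 2]
        if end3 - start1 < standard_width and len3 > len2:
            return 3  # W1,2,3 fits and L3 > L2: integrate three elements
    return None


# Invented stroke data for a vertically written line: one, then two, then
# three strokes, with an inter-character space between the three characters.
strokes = [
    (0, 2, 28),
    (36, 38, 16), (58, 60, 28),
    (70, 72, 26), (82, 84, 18), (96, 98, 28),
]
groups = []
i = 0
while i < len(strokes):
    # Assumption: an undecided element is kept on its own in this demo.
    size = exception_mode_group_size(strokes, i, knth=3.75, standard_width=31) or 1
    groups.append(list(range(i, i + size)))
    i += size
```

With these invented coordinates the strokes are grouped as one, two and three elements, which corresponds to the kind of result the text attributes to FIG. 4.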

Next, the normal processing mode is described with reference to FIGS. 5 to 9. For convenience, a horizontally written line is assumed here.

1) When two adjacent character element circumscribed rectangles a and b satisfy conditional expressions (1) and (2) simultaneously, they are integrated into one character element circumscribed rectangle (see FIG. 5).

Wab ≤ [standard character width]   ... conditional expression (1)

where Wab is the width of the rectangle obtained by integrating a and b.

Xsb − Xea < [minimum character spacing]   ... conditional expression (2)

where Xsb is the X coordinate of the start point of b and Xea is the X coordinate of the end point of a.

Conditional expression (1) indicates that the width Wab of the character element circumscribed rectangle after integration does not exceed the [standard character width], and conditional expression (2) indicates that the gap Gab between a and b is smaller than the [minimum character spacing].

However, when conditional expression (3) applies, the rectangles are integrated even if the standard character width is exceeded.

Xeb < Xea   ... conditional expression (3)

where Xeb is the X coordinate of the end point of b. As for the positional relationship of a and b, a is on the left (toward the head of the line) and b is on the right.

That is, as shown in FIG. 6, two completely overlapping character element circumscribed rectangles a and b for which Dab < 0 are integrated even if Wab ≥ [standard character width].
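Rule 1) of the normal processing mode can be sketched as a single predicate. Rectangles are assumed to be tuples `(x_start, y_start, x_end, y_end)` with a to the left of b. Conditional expression (3) is garbled in the source text; it is read here as Xeb < Xea (b ends before a ends, so b lies within the extent of a), which is an interpretation consistent with the "completely overlapping" case of FIG. 6. The function name is hypothetical.

```python
def should_merge_basic(a, b, standard_width, min_spacing):
    """Rule 1): merge a and b if (1) and (2) hold, or if (3) holds."""
    xsa, _, xea, _ = a
    xsb, _, xeb, _ = b
    # (3) Assumed reading: b lies entirely within the x-extent of a.
    if xeb < xea:
        return True
    # (1) Width of the integrated rectangle must not exceed the standard width.
    wab = max(xea, xeb) - min(xsa, xsb)
    # (2) Gap between a and b must be smaller than the minimum spacing.
    gap = xsb - xea
    return wab <= standard_width and gap < min_spacing


touching = should_merge_basic((0, 0, 10, 30), (9, 0, 20, 30), 31, 0)
separated = should_merge_basic((0, 0, 10, 30), (15, 0, 25, 30), 31, 0)
nested = should_merge_basic((0, 0, 40, 30), (10, 5, 30, 20), 31, 0)
```

Checking (3) before (1) lets a nested rectangle be merged even when the outer rectangle is already wider than the standard character width, as the text requires.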

2) When conditional expression (1) in 1) above is satisfied but conditional expression (2) is not, and the right-hand character element circumscribed rectangle (call it b) satisfies all of the following conditional expressions (4) to (7), b is regarded as the circumscribed rectangle of a dakuten (voiced sound mark) or handakuten (semi-voiced sound mark) and is integrated as an exception (see FIG. 7).

Wb ≤ fth   ... conditional expression (4)

Hb ≤ fth   ... conditional expression (5)
(Hb is the height of b.)

Yeb − Ysa ≤ fth   ... conditional expression (6)
(Yeb is the Y coordinate of the end point of b, and Ysa is the Y coordinate of the start point of a.)

Xsb − Xea < [minimum character spacing] + α   ... conditional expression (7)

where fth (threshold) = [line height] / β. α and β are empirical values; for example, α = 2 and β = 3 are chosen.
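Conditional expressions (4) to (7) can be sketched as a predicate on the right-hand rectangle b. The sketch assumes rectangles `(x_start, y_start, x_end, y_end)` in image coordinates with y growing downward, so that the start point is the top-left corner and the end point is the bottom-right corner; the source does not state the axis direction. The function name is hypothetical, and it does not re-check that conditional expression (1) holds and (2) fails.

```python
def is_voiced_mark(a, b, line_height, min_spacing, alpha=2, beta=3):
    """True if b looks like a dakuten or handakuten attached to a."""
    _, ysa, xea, _ = a
    xsb, ysb, xeb, yeb = b
    fth = line_height / beta
    return (
        (xeb - xsb) <= fth                    # (4) b is narrow
        and (yeb - ysb) <= fth                # (5) b is short
        and (yeb - ysa) <= fth                # (6) b sits near the top of a
        and (xsb - xea) < min_spacing + alpha # (7) b is close to a
    )


body = (0, 0, 20, 30)
mark_top_right = is_voiced_mark(body, (21, 0, 27, 8), line_height=30, min_spacing=0)
mark_too_low = is_voiced_mark(body, (21, 20, 27, 28), line_height=30, min_spacing=0)
```

The relaxed gap in (7) is what allows a small mark to be merged even though the normal gap test (2) has failed.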

3) Even when conditional expressions (1) and (2) in 1) above are satisfied, if the right-hand character element circumscribed rectangle (call it b) satisfies the following conditional expressions (8) to (10), it is regarded as a punctuation mark and, as an exception, is not integrated (see FIG. 8).

Wb ≤ fth   ... conditional expression (8)

Hb ≤ fth   ... conditional expression (9)

base − Yeb ≤ fth / γ   ... conditional expression (10)

where base is the Y coordinate of the baseline, and γ is an empirical value chosen as, for example, 4.
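Conditional expressions (8) to (10) can be sketched in the same style. The sketch assumes image coordinates with y growing downward and writes the empirical divisor of expression (10) as `gamma` with the example value 4 from the text; the symbol itself is garbled in the source, so the name is an assumption. The function name is hypothetical.

```python
def is_punctuation(b, line_height, base, beta=3, gamma=4):
    """True if b looks like a punctuation mark sitting on the baseline."""
    xsb, ysb, xeb, yeb = b
    fth = line_height / beta
    return (
        (xeb - xsb) <= fth               # (8) b is narrow
        and (yeb - ysb) <= fth           # (9) b is short
        and (base - yeb) <= fth / gamma  # (10) bottom of b is near the baseline
    )


on_baseline = is_punctuation((21, 24, 27, 30), line_height=30, base=30)
near_top = is_punctuation((21, 0, 27, 8), line_height=30, base=30)
```

A rectangle for which this predicate holds would be left unmerged even when the basic rule of 1) would otherwise integrate it.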

Each time one pass of the integration processing in the normal processing mode described above is completed for the character element data of one line (excluding character elements fixed in the exception processing mode), the parameter setting unit 8 sets the parameters again on the basis of the corrected character element data, and the parameter memory 9 is rewritten. Using the rewritten parameters, the character element integration unit 10 executes the normal processing mode again.

In this way, the character element integration processing is repeated while the parameters are modified dynamically. The repetition ends, for example, when the parameters no longer change or when the amount of change in the parameters falls to or below a certain threshold.
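The repetition described above can be sketched as a small driver loop. The two callables `set_parameters` and `integrate_once` are placeholders supplied by the caller, not functions defined in the source, and `max_passes` is a safety bound added for this sketch. The special initial value of the minimum character spacing before the first pass is left to the caller's `set_parameters`.

```python
def integrate_until_stable(boxes, set_parameters, integrate_once,
                           tolerance=0.0, max_passes=20):
    """Repeat integration passes until the parameters stop changing."""
    params = set_parameters(boxes)
    for _ in range(max_passes):
        boxes = integrate_once(boxes, params)
        new_params = set_parameters(boxes)
        # Largest absolute change of any parameter in this pass.
        change = max(abs(new_params[k] - params[k]) for k in params)
        params = new_params
        if change <= tolerance:
            break
    return boxes, params


# Toy stand-ins: the only "parameter" is the number of intervals, and one
# pass merges the first two intervals while more than two remain.
def toy_parameters(intervals):
    return {"count": len(intervals)}


def toy_pass(intervals, params):
    if len(intervals) > 2:
        return [(intervals[0][0], intervals[1][1])] + intervals[2:]
    return intervals


final_boxes, final_params = integrate_until_stable(
    [(0, 1), (2, 3), (4, 5), (6, 7)], toy_parameters, toy_pass)
```

Setting `tolerance` above zero corresponds to the alternative stopping rule in the text, where the loop ends once the parameter change is small enough.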

FIG. 9 shows an example of character element integration in the normal processing mode for a horizontally written line. From the line image shown in (a), the line extraction unit 6 extracts the character elements shown in (b). These character elements are integrated as shown in (c) by the first integration pass, and as shown in (d) by the second integration pass.

At the stage where the integration processing is complete, as shown in (d), most of the character element circumscribed rectangles correspond to the circumscribed rectangle (character frame) of one character. That is, the circumscribed rectangles of most characters have been generated. However, character elements that should have been integrated may remain separate, as with 「小」 in this example.

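Because such incompletely integrated character elements may remain, the character image segmentation processing unit 2 does not simply cut out character images from the original image using each character element circumscribed rectangle after integration as a character frame. Instead, in the character pattern synthesis unit 11, consecutive character element circumscribed rectangles are combined to the extent that the [maximum character width] (stored in the parameter memory 9) is not exceeded, and each of the combined rectangles (including single character element circumscribed rectangles) is used as a character frame to cut out character image candidates from the original image, which are output to the character image memory 3. However, a character element fixed in the exception processing mode of the character element integration processing is cut out only as a character image on its own.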
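The generation of character frame candidates can be sketched as follows. Rectangles are assumed to be `(x_start, y_start, x_end, y_end)` tuples for a horizontally written line, sorted left to right with non-decreasing `x_end`, and `fixed` is a hypothetical set of indices of character elements that were fixed in the exception processing mode. Each candidate is returned as an inclusive index range `(first, last)`.

```python
def candidate_frames(boxes, max_width, fixed=frozenset()):
    """List index ranges of consecutive rectangles usable as character frames."""
    candidates = []
    for i in range(len(boxes)):
        if i in fixed:
            candidates.append((i, i))  # fixed elements are cut out alone
            continue
        for j in range(i, len(boxes)):
            if j in fixed:
                break  # never combine across a fixed element
            combined_width = boxes[j][2] - boxes[i][0]
            if j > i and combined_width > max_width:
                break  # combination would exceed the maximum character width
            candidates.append((i, j))
    return candidates


parts = [(0, 0, 8, 30), (10, 0, 18, 30), (22, 0, 50, 30)]
all_candidates = candidate_frames(parts, max_width=28)
with_fixed = candidate_frames(parts, max_width=28, fixed={0})
```

Overlapping candidates such as `(0, 0)` and `(0, 1)` are intentional: the choice between them is left to the recognition stage, as the following paragraph of the text describes.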
In the character pattern synthesis section 11, the character image extraction processing unit 2 does not simply cut out a character image from the original image using each character element circumscribing rectangle after the integration process as a character frame, but instead extracts the character image from the original image by using continuous character element circumscribing rectangles as [maximum The characters are combined to the extent that they do not exceed the character width (stored in the parameter memory 9), and each of the combined rectangles (including the circumscribed rectangle of one grapheme) is used as a character frame to cut out character image candidates from the original image. Output to image memory 3. However, for a character element determined in the exception processing mode of character element integration processing, only character image extraction is performed for that character element alone.

In the character recognition processing unit 4, character recognition is performed on the character image candidates stored in the character image memory 3. For character image candidates that share character elements, the respective recognition results are evaluated, the candidate with the highest likelihood of being a character is selected (that is, the character segmentation is decided from the recognition results), and only its recognition result is output to the recognition result memory 5.

The method of recognizing combinations of character elements in this way and deciding the character segmentation by evaluating the recognition results is described in detail in the specification and drawings of the above-mentioned Japanese Patent Application No. 133424 of Showa 63 (1988).

[Effect of the invention]

As described in detail above, according to the present invention, accurate character image segmentation is possible even when specific characters consisting only of line segments perpendicular to the line direction (for example, the Chinese numerals 「一二三」 in a vertically written line) appear consecutively.

[Brief explanation of drawings]

FIG. 1 is a block diagram of a character recognition system according to an embodiment of the present invention. FIG. 2 is a flowchart for explaining the exception processing mode of character element integration. FIG. 3 is a diagram showing a character string subject to the exception processing mode. FIG. 4 is a diagram showing the result of the integration processing for the character string shown in FIG. 3. FIGS. 5 to 8 are explanatory diagrams of the integration conditions for the normal processing mode of character element integration. FIG. 9 is a diagram showing an example of character element integration.

1 ... original image memory, 2 ... character image segmentation processing unit, 3 ... character image memory, 4 ... character recognition processing unit, 5 ... recognition result memory, 6 ... line extraction unit, 7 ... character element data memory, 8 ... parameter setting unit, 9 ... parameter memory, 10 ... character element integration unit, 11 ... character pattern synthesis unit.

Claims

[Claims]

(1) A character image segmentation method in which character frames are obtained by integrating character element circumscribed rectangles extracted from a line image and character images are cut out from the line image, characterized in that attention is paid to a character element circumscribed rectangle whose width is smaller than a predetermined value, a condition judgment is made on the lengths of the character element circumscribed rectangle of interest and the subsequent character element circumscribed rectangles and on their width when integrated, and the integration of the character element circumscribed rectangle of interest with the subsequent character element circumscribed rectangles is controlled according to the result of the condition judgment, thereby obtaining the character frame of a specific character consisting only of line segments perpendicular to the line direction.