JPH02125389A

JPH02125389A - Space detecting method

Info

Publication number: JPH02125389A
Application number: JP63269239A
Authority: JP
Inventors: Hiroshi Nakayama; 寛中山
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1988-07-01
Filing date: 1988-10-25
Publication date: 1990-05-14

Abstract

PURPOSE: To generate the necessary number of space codes by calculating a line standard character width and a line minimum character width from the maximum value of the vertical projection of each line, and determining the number of spaces in a blank portion from the line standard character width, the line minimum character width, and the length of the blank portion.

CONSTITUTION: According to a line standard character width and line minimum character width calculation program 104c, a CPU 103 refers to the line projection, stored in a line projection storage area 102b, of the line to be processed (the current line) and finds its maximum value (the line maximum projection value). Using this value, it calculates the current line standard character width and the current line minimum character width and stores the results in storage areas 102c and 102d in a RAM 102. The line standard character width, the line minimum character width, and the length of the blank portion are then used to determine the number of spaces in the blank portion. Consequently, the number of spaces can be determined properly.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]
The present invention relates to a method for detecting spaces in each line of a document in a character recognition device.

[Prior Art]
The general flow of processing in a character recognition device is shown in FIG. 10: an image of the document to be processed is input (step 1001), lines are extracted from the document image (step 1002), characters are extracted from each line (step 1003), the extracted characters are recognized (step 1004), one or more space codes are generated for each blank portion of the line (step 1005), and the recognized character codes and the generated space codes are saved as the recognition result (step 1006).

In the space detection step 1005, the number of spaces is determined from the length of the blank portion of the line. Conventionally, the number of spaces has been determined by a specific calculation using the length of the blank portion and a value corresponding to a standard character width set in advance from outside.

[Problems to Be Solved by the Invention]
With the conventional space detection method described above, even for fixed-pitch documents, the standard character width had to be reset every time the pitch of the document changed, which was troublesome. Moreover, most English documents are variable pitch, and for such documents it was difficult to determine the number of spaces appropriately with the conventional method.

An object of the present invention is to provide a space detection method suitable for English documents and the like, regardless of whether the pitch is fixed or variable and regardless of character size.

[Means for Solving the Problems]
In the space detection method according to the present invention, for each line of the document to be processed, a line standard character width and a line minimum character width are calculated from the maximum value of the projection (number of black pixels) taken in the direction perpendicular to the line direction, and the number of spaces in a blank portion is determined by a specific calculation using the line standard character width, the line minimum character width, and the length of the blank portion. Other features of the present invention will be explained through the embodiments.
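The projection on which the method relies can be sketched as follows. This is a minimal Python illustration, assuming the line image is held as a list of pixel rows in which 1 marks a black pixel; the function name `vertical_projection` and the sample image are hypothetical and do not appear in the specification.

```python
def vertical_projection(line_image):
    """Count the black pixels in each pixel column of a binary line image.

    line_image is assumed to be a list of equal-length rows, 1 = black pixel.
    """
    # zip(*rows) iterates over columns; summing a column counts its black pixels.
    return [sum(column) for column in zip(*line_image)]


# Hypothetical 3-row, 5-column line image.
line_image = [
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
]

projection = vertical_projection(line_image)
# The largest column count is what the text calls the line maximum projection value.
line_max_projection = max(projection)
```

Columns whose count is zero, or below some threshold, are the candidates for blank portions; the threshold itself is left open by the text.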
[Function]
Consider the letter h shown in FIG. 9 as a model. Assuming that this is the only character in the line, the height Lh of the character is proportional to the maximum value of the vertical projection (number of black pixels) of the line, referred to as the line maximum projection value. In addition, the ratio of Lw to Lh does not change greatly even when the font or character size changes.

Therefore, if Lh, Lw and Ls are measured for a suitable character and set in advance, an appropriate line standard character width and line minimum character width can be calculated for each line from the line maximum projection value obtained for that line, for example by the following equations.

Line standard character width = line maximum projection value × Lw / Lh   (1)
Line minimum character width = line maximum projection value × Ls / Lh   (2)
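Equations (1) and (2) amount to scaling the model dimensions by the line maximum projection value. A minimal Python sketch, in which the function name and the numeric model values are hypothetical (the specification gives no concrete values for Lh, Lw or Ls):

```python
def line_character_widths(line_max_projection, lh, lw, ls):
    """Return (line standard character width, line minimum character width).

    lh, lw, ls are the dimensions measured in advance on a model character.
    """
    standard_width = line_max_projection * lw / lh  # equation (1)
    minimum_width = line_max_projection * ls / lh   # equation (2)
    return standard_width, minimum_width


# Hypothetical model measurements (in pixels) and a line whose tallest
# column contains 40 black pixels.
standard_width, minimum_width = line_character_widths(40, lh=20, lw=12, ls=4)
```

Because only the ratios Lw/Lh and Ls/Lh enter the result, the model character can be measured at any size.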
Note that the alphabet mixes characters that extend upward, such as h, with characters that extend downward, such as y and j, and that a certain amount of skew occurs in a scanned document image. The line width (the width of the range in which the horizontal projection exceeds a threshold) therefore does not necessarily reflect the character height accurately and is unstable, depending on the content of the document and the scanning conditions. Using the line maximum projection value instead eliminates this instability, so that an appropriate line standard character width and line minimum character width can be calculated. A specific calculation (magnitude comparison, division, and so on) using these widths then makes it possible to determine the number of spaces accurately even in variable-pitch English documents.

[Embodiments of the Invention]
Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a schematic block diagram showing the device configuration common to the embodiments. Reference numeral 101 denotes a scanner for inputting document images, 102 a RAM for storing images, processing-related data, and processing results, 103 a CPU that executes the processing, and 104 a ROM that stores the program for each process. The processing performed with this configuration is described below for each embodiment.

[Embodiment 1]
For the document image input by the scanner 101, the CPU 103 performs line extraction according to the line extraction program 104a and stores each extracted line image in the line image storage area 102a in the RAM 102. Line extraction is performed, for example, by measuring the horizontal projection of the document (assumed here to be written horizontally), that is, its projection onto the vertical axis, and extracting each range in which the projection value exceeds a specific threshold as a line. The line extraction method itself is arbitrary; for example, a line may be divided into a plurality of blocks and line extraction performed from the horizontal projection of each block.

At the same time as line extraction, the CPU 103 measures the vertical projection (number of black pixels) of the line image and stores the result, the line projection, in the storage area 102b.

The following description refers to FIGS. 2 and 3. According to the line standard character width and line minimum character width calculation program 104c, the CPU 103 refers to the line projection, stored in the line projection storage area 102b, of the line currently being processed (the current line) and finds its maximum value, the line maximum projection value (step 201 in FIG. 2). Using this value, it calculates the current line standard character width and the current line minimum character width by equations (1) and (2) and stores the results in the storage areas 102c and 102d in the RAM 102, respectively (step 202).

Next, according to the character extraction program 104b, the CPU 103 extracts characters from the current line image on the basis of the current line projection (step 203). This is done, for example, by treating each range in which the current line projection exceeds a specific threshold as a character range and every other range as a blank portion, although other methods may be used.

The CPU 103 recognizes each extracted character image using the character recognition program 104f (step 204). The dictionary required for character recognition is not shown in the figure. The code of each recognized character is stored in the recognition result storage area 102e (step 205). Feature extraction for character recognition, matching against the dictionary, and similar operations may instead be executed by dedicated hardware.

For each blank portion of the current line found by character extraction, the CPU 103 determines the number of spaces according to the space processing program 104e (step 206). This processing is shown in FIG. 3.

First, the length L of the blank portion is compared with the current line minimum character width (step 301). If L ≤ (current line minimum character width), the number of spaces N for the blank portion is finally set to 0 (step 302). If L > (current line minimum character width), the current line minimum character width is added to L and the sum is set as the new blank length L (step 303), and this L divided by the current line standard character width is set as N (step 304). N is then compared with 2 (step 305). If N > 2, the value of N, rounded by truncating or rounding off the fractional part, is finally set as the number of spaces N. If step 305 finds that N is 2 or less, the number of spaces N is finally set to 1 (step 306).

In this way, a calculation using the length of the blank portion, the current line standard character width, and the current line minimum character width determines the optimum number of spaces regardless of whether the pitch is fixed or variable and regardless of character size.

Next, the CPU 103 stores as many space codes as the determined number of spaces N in the recognition result storage area 102e (step 207). The processing described above is executed for every line of the document, which completes the processing for one page.

Alternatively, the recognized character codes and the space codes may be stored temporarily in the RAM 102, then merged at the end of processing of the current line and stored in the recognition result storage area 102e.

[Embodiment 2]
Embodiment 1 processes the blank portion at the beginning of each line without distinguishing it from other blank portions. In Embodiment 2, only the blank portion at the beginning of each line is distinguished from the others and given the special processing described below. The rest of the processing is the same as in Embodiment 1.

In Embodiment 2, the current line standard character width and current line minimum character width determined for the first line of the document are taken as the standard character width and the minimum character width and are stored in the storage areas 102i and 102j, respectively.

The processing for determining the number of spaces in a blank portion is basically the same as that shown in FIG. 3. However, when determining the number of spaces for the blank portion at the beginning of each line (the blank that appears before the first character), the standard character width and minimum character width stored in the storage areas 102i and 102j are used in place of the current line standard character width and current line minimum character width determined for that line. That is, the minimum character width replaces the current line minimum character width in steps 301 and 303 of FIG. 3, and the standard character width replaces the current line standard character width in step 304. For blank portions other than the one at the beginning of each line, the current line standard character width and current line minimum character width are used as in Embodiment 1. Naturally, for the first line of the document, the space-count determination is exactly the same as in Embodiment 1.

The effect specific to Embodiment 2 is as follows. In Embodiment 1, when the character size differs from line to line, the current line standard character width and current line minimum character width used to determine the number of spaces differ even if the leading blank portions have the same length. The number of spaces assigned to the leading blank portion therefore varies from line to line, and when the recognition result is output, the leading blanks vary in length and look unnatural. In Embodiment 2, by contrast, the current line standard character width and current line minimum character width of one specific line (here, the first line of the document) are used in common for the leading blank portions of all lines, so leading blank portions of the same length produce the same number of space codes. The unnatural appearance of the leading blanks described above is thus eliminated.

[Embodiment 3]
As in Embodiment 2, this embodiment distinguishes the blank portion at the beginning of each line from the other blank portions and gives it the special processing described below. The rest of the processing is the same as in Embodiment 1.

In Embodiment 3, the number of spaces is not determined immediately for the blank portion at the beginning of each line; instead, its length is stored sequentially in the storage area 102f in the RAM 102. For blank portions other than the leading one, the number of spaces is determined using the current line standard character width and current line minimum character width, as in Embodiment 1. The current line standard character width and current line minimum character width calculated for each line are also stored sequentially in the storage areas 102g and 102h in the RAM 102.

After character recognition and space code generation for the non-leading blank portions have been completed for the last line of the document, the average of all the current line standard character widths stored in the storage area 102g is calculated according to the program 104d and stored in the storage area 102i as the standard character width, and the average of all the current line minimum character widths stored in the storage area 102h is calculated and stored in the storage area 102j as the minimum character width.

Then, using the length of each leading blank portion stored in the storage area 102f and the standard character width and minimum character width stored in the storage areas 102i and 102j, the number of spaces for the leading blank portion of each line is determined by processing similar to that shown in FIG. 3, and that number of space codes is generated and stored at the required position in the recognition result storage area 102e.

Embodiment 3 makes the processing somewhat more complicated than Embodiment 2, but the results for the leading blank portions are more stable.

[Embodiment 4]
Embodiment 3 uses the averages of the current line standard character widths and of the current line minimum character widths to calculate the number of spaces for each leading blank portion. In Embodiment 4, by contrast, the value that occurs most often among the current line standard character widths is used as the standard character width when determining the number of spaces for the leading blank portions, and likewise the most frequent value among the current line minimum character widths is used as the minimum character width. Otherwise the processing is the same as in Embodiment 3.

Embodiment 4 makes the processing slightly more complicated than Embodiment 3, but yields more natural recognition results for the leading blank portions.

[Embodiment 5]
In this embodiment, in addition to the line standard character width and line minimum character width, the horizontal lengths of the blank portions, in which the vertical projection of the line is at or below a certain threshold, are also used in the calculation that determines the number of spaces.
In this respect it differs from the preceding embodiments.

FIG. 4 is a flowchart showing the overall flow of processing. It differs from the processing of the preceding embodiments shown in FIG. 2 in that step 403, which determines the line minimum blank width, a value reflecting the distribution of the horizontal lengths of the blank portions, has been added, and in that the content of step 405, which determines the number of spaces for each blank portion, has been changed. The other processing steps are unchanged.

FIG. 5 is a flowchart showing the processing of step 403. This processing determines, from the distribution of blank lengths in the current line, whether the line was printed with narrow or wide space intervals, and sets the line minimum blank width used in the space-count calculation accordingly: a small value for a line printed with narrow space intervals and a large value for a line printed with wide space intervals.

Specifically, in steps 501 to 507, the maximum width M is found among the inter-character blank portions of the current line (blanks between adjacent characters, excluding the blank portions at both ends of the line), leaving out any inter-character blank portion whose width is at least twice the line standard character width (such a blank is considered to contain two or more spaces to begin with). Next, in steps 508 and 509, the maximum width M is compared with the line standard character width to distinguish lines with wide space intervals from lines with narrow space intervals. Because some lines contain no spaces at all, when M is no greater than the line standard character width it is also compared with the line minimum character width to identify lines without spaces. For a line with wide space intervals or a line without spaces, step 510 sets the line minimum blank width to one half of the sum of the line standard character width and the line minimum character width. For any other line, step 511 sets it to one half of the sum of the line minimum character width and the maximum width M. The determined line minimum blank width is stored in area 102k of the RAM 102.
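The determination of the line minimum blank width (step 403, FIG. 5) might be sketched in Python as follows. The text does not state the direction of the comparisons in steps 508 and 509, so this sketch assumes that M greater than the line standard character width indicates a line with wide space intervals and that M no greater than the line minimum character width indicates a line without spaces. The function name and the sample values are hypothetical.

```python
def line_min_blank_width(inter_char_blanks, standard_width, minimum_width):
    """Sketch of step 403: choose the line minimum blank width for one line.

    inter_char_blanks holds the widths of the blanks between adjacent
    characters; the blanks at both ends of the line are assumed to have
    been excluded by the caller.
    """
    # Steps 501-507: ignore blanks at least twice the standard width (taken
    # to hold two or more spaces) and find the maximum M of the rest.
    candidates = [w for w in inter_char_blanks if w < 2 * standard_width]
    m = max(candidates, default=0)

    # Steps 508-509 (comparison directions are an assumption of this sketch).
    wide_spacing = m > standard_width
    no_spaces = m <= minimum_width

    if wide_spacing or no_spaces:
        # Step 510: half the sum of the standard and minimum widths.
        return (standard_width + minimum_width) / 2
    # Step 511: half the sum of the minimum width and M.
    return (minimum_width + m) / 2


# Hypothetical widths in pixels: standard width 10, minimum width 4.
wide = line_min_blank_width([2, 3, 12, 25], 10, 4)   # M = 12, wide spacing
narrow = line_min_blank_width([2, 3, 8], 10, 4)      # M = 8, narrow spacing
```

Under these assumptions a narrowly spaced line always receives a value no larger than that of a widely spaced line, which matches the stated intent of step 403.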
FIG. 6 is a flowchart showing the processing of the space-count determination step 405. It differs from the processing of the preceding embodiments shown in FIG. 3 in that step 601 compares the length of the blank portion with the line minimum blank width instead of the line minimum character width.

The blank portion at the beginning of a line may be handled without distinguishing it from inter-character blank portions, as in Embodiment 1, or separately from them, as in Embodiments 2, 3 and 4.

[Embodiment 6]
The overall flow of processing in this embodiment is the same as that of Embodiment 5 shown in FIG. 4, but the content of the space-count determination (step 405 in FIG. 4) differs. FIG. 7 is a flowchart of this space-count determination.

Step 701 detects inter-character blank portions whose width is no greater than the line minimum character width, and step 702 sets the number of spaces for them to 0. For inter-character blank portions wider than the line minimum character width, steps 703 and 704 provisionally determine the number of spaces in the same way as in the preceding embodiments. Next, the total width of the characters immediately before and after the blank portion is compared with the line standard character width (step 705). If the total is no greater than the line standard character width, the provisional number of spaces minus 1 is taken as the number of spaces (step 706); otherwise the number of spaces is set to 0 or 1 (steps 708 and 709). If the number of spaces determined in step 706 is less than 0, it is set to 0 (step 702).

In other words, this embodiment exploits the fact that, when narrow characters are printed consecutively at a fixed pitch, the inter-character blank portion between them is wider than when the neighboring characters have the standard character width. Step 705 identifies inter-character blank portions flanked by narrow characters, and for such blank portions the number of spaces provisionally determined from the line minimum character width and line standard character width is reduced by one.

The blank portion at the beginning of a line may be handled as in Embodiments 1, 2, 3 and 4.

[Embodiment 7]
This embodiment determines the number of spaces by a method that combines Embodiments 5 and 6; its flowchart is shown in FIG. 8. It differs from Embodiment 6 in that step 801, which corresponds to step 701 in FIG. 7, compares the width L of the inter-character blank portion with the line minimum blank width; the rest of the processing is the same as in Embodiment 6.

With the space detection method of each embodiment described above, spaces can be detected simply and quickly regardless of whether the pitch is fixed or variable and regardless of character size. Even if the character parameters set from the model differ from those of the characters actually being processed, the error is absorbed in the course of processing and an accurate number of spaces is obtained.

With the space detection methods of Embodiments 5 and 7, reflecting the distribution of inter-character blank widths in each line in the space-count determination enables more accurate space detection.

With the space detection methods of Embodiments 6 and 7, the number of spaces is determined taking into account the widths of the characters before and after each inter-character blank portion, which enables more accurate space detection particularly for documents in which narrow characters are printed at a fixed pitch.

[Effects of the Invention]
As is clear from the above description, the present invention makes it possible to determine the number of spaces in a blank portion accurately and to generate the required number of space codes regardless of whether the document to be processed is fixed pitch or variable pitch and regardless of character size. It also prevents the blank portion at the beginning of a line from having an unnatural length after recognition, and the space processing is relatively simple and can be performed at high speed.
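For reference, the basic space-count determination of FIG. 3 (steps 301 to 306), on which every embodiment builds, can be sketched in Python as follows. The text allows either truncation or rounding in the N > 2 branch; this sketch assumes truncation. The second function illustrates how Embodiments 3 and 4 might derive the common widths for leading blank portions from the per-line values. All function names and sample values are hypothetical.

```python
import math
from statistics import mean, mode


def count_spaces(blank_length, standard_width, minimum_width):
    """Sketch of FIG. 3: number of space codes for one blank portion."""
    # Steps 301-302: a blank no wider than the minimum width is not a space.
    if blank_length <= minimum_width:
        return 0
    # Step 303: add the minimum width to the blank length.
    padded_length = blank_length + minimum_width
    # Step 304: divide by the standard width.
    n = padded_length / standard_width
    # Step 305: compare N with 2.
    if n > 2:
        return math.floor(n)  # truncation assumed; rounding is also allowed
    # Step 306: otherwise exactly one space.
    return 1


def leading_blank_reference_widths(standard_widths, minimum_widths, use_mode=False):
    """Common widths for leading blanks: mean (Embodiment 3) or most
    frequent value (Embodiment 4) of the per-line widths."""
    pick = mode if use_mode else mean
    return pick(standard_widths), pick(minimum_widths)


# Hypothetical widths in pixels: standard width 10, minimum width 4.
spaces = [count_spaces(length, 10, 4) for length in (3, 10, 16, 30)]
```

For the leading blank portion of a line, the same `count_spaces` call would simply receive the common widths (those of the first line in Embodiment 2, or those returned by `leading_blank_reference_widths`) instead of the widths of the current line.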

[Brief explanation of the drawing]

FIG. 1 is a schematic block diagram showing the device configuration according to the embodiments of the present invention. FIGS. 2 and 3 are flowcharts of the overall processing and of the details of the space-count determination in one embodiment of the present invention. FIGS. 4, 5 and 6 are flowcharts of the overall processing, the minimum blank width determination, and the space-count determination in another embodiment of the present invention. FIG. 7 is a flowchart of the space-count determination in a further embodiment of the present invention. FIG. 8 is a flowchart of the space-count determination in yet another embodiment of the present invention. FIG. 9 shows the character model used to explain how the line standard character width and line minimum character width are determined from the line maximum projection value. FIG. 10 is a flowchart of the general processing of a character recognition device.

101: scanner, 102: RAM, 103: CPU, 104: ROM.

Claims

[Claims]

(1) A space detection method in a character recognition device, characterized in that, for each line of a document to be processed, a line standard character width and a line minimum character width are calculated from the maximum value of the projection in the direction perpendicular to the line direction, and the number of spaces in a blank portion is determined using the line standard character width, the line minimum character width, and the length of the blank portion.

(2) The space detection method according to claim (1), characterized in that the line standard character width and line minimum character width calculated for one line of the document to be processed are used in the calculation for determining the number of spaces in the blank portion at the beginning of each line of the document.

(3) The space detection method according to claim (1), characterized in that the respective average values of the line standard character widths and line minimum character widths calculated for the lines of the document to be processed are used in the calculation for determining the number of spaces in the blank portion at the beginning of each line of the document.

(4) The space detection method according to claim (1), characterized in that the most frequently occurring values of the line standard character width and line minimum character width calculated for the lines of the document to be processed are used in the calculation for determining the number of spaces in the blank portion at the beginning of each line of the document.

(5) The space detection method according to claim (1), characterized in that the distribution of the horizontal lengths of the blank portions in each line is used in the calculation for determining the number of spaces.

(6) The space detection method according to claim (1) or (5), characterized in that the number of spaces in a blank portion is corrected on the basis of the widths of the character portions before and after the blank portion.