JP4544691B2 - Character reader

Info

Publication number: JP4544691B2
Application number: JP2000110112A
Authority: JP
Inventors: 博一岩下; 和弘石川
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2000-04-12
Filing date: 2000-04-12
Publication date: 2010-09-15
Anticipated expiration: 2020-04-12
Also published as: JP2001297302A

Description

【０００１】
【発明の属する技術分野】
本発明は、読み取った文字、あるいは文字列に対して誤って文字を認識した可能性を判別し、誤認識したときは、読み取られた文字列について知識処理あるいは後処理を行うような文字読取装置に関する。
【０００２】
【従来の技術】
従来より、帳票イメージ上に記録された文字を読み取る文字読取装置が知られている。
【０００３】
従来の文字読取装置では、画像上の指定された領域を走査し、行切り出し処理によって行の座標を切り出し、文字切り出し処理によって行の座標内の文字の座標を検出する。そして、検出後、各文字の座標の画像についての文字認識処理を行う。
【０００４】
文字読取装置では、一般に、行の切り出し、文字の切り出しを行ってから文字認識が行われる。
行の切り出しを行うには、画像を横方向に走査し、黒画素数のヒストグラムを作成し、ヒストグラムの値が０になった箇所で各行を区切る。
【０００５】
また、文字の切り出しを行うには、画像を縦方向に走査し、黒画素数のヒストグラムを作成し、ヒストグラムの値が０になった箇所で区切り、各文字に分離する。
誤って文字を認識したときは、単語照合辞書等を用いた知識処理あるいは後処理と呼ばれる処理を行うことにより、誤読や不読の文字を置換して認識率を向上させる。
【０００６】
【発明が解決しようとする課題】
ところで、かかる従来の文字読取装置では、ヒストグラムを作成して行あるいは文字の切り出しを行うようにしているので、正しく行切り出し、文字切り出しを行えない場合がある。
【０００７】
図２は、かかる従来の説明図である。
例えば、図２（ａ）は、複数の文字に対して抹消線が施されている例を示す。尚、図中、破線で示す矩形は、１文字の正しい区分を示す。この場合、抹消線があるために文字間を正しく識別できなくなってしまう。従って、すべての文字を１つの文字と誤認識してしまい、正しく行切り出しを行えない。
【０００８】
また、図２（ｂ）は、複数の文字を丸囲いした例を示す。この場合、丸囲い線のために、２行の文字を縦長の１文字と認識してしまい、正しい行切り出しを行うことができない。
【０００９】
また、図２（ｃ）は、印字にずれが生じた例を示す。この場合、行間に間隙がなくなってしまい、行を正しく識別できず、２行の文字を縦長の１文字として切り出してしまう。
【００１０】
また、図２（ｄ）は、桁区切りの線が行に含まれている例を示す。この場合、文字認識処理には不要である桁線が文字として切り出されてしまう。
このように行切り出しや文字切り出し結果に誤りがあると、明らかに文字サイズや文字の位置が正しくなくても、検出された文字の矩形を各文字としてそのまま文字認識の処理がなされ、誤読が生じたり不要な文字が出力されたりして文字認識装置としての信頼性が低下する。
【００１１】
また、このような認識結果を修正するには、オペレータが手作業で行う必要があり、オペレータの負担も増大する。
従って、誤って文字を認識した可能性があるか否かを正しく判断し、誤って文字を認識したときは、後処理を自動的に行えるようにする必要がある。
【００１２】
【課題を解決するための手段】
本発明は以上の点を解決するため次の構成を採用する。
〈構成１〉
本発明の文字読取装置は、所定の用紙に記入された文字を画像データとして取得する画像入力手段と、前記取得した画像データから各文字を切り出し、各文字の位置及び大きさを文字座標から認識する認識手段と、同一行の中で隣接する文字の間隔が所定の条件を連続で満たす場合に、これらの認識文字を１つのブロックに設定し、該ブロックの位置及び大きさを当該各文字の文字座標からブロック座標として認識するブロック認識手段と、認識されたブロックのブロック座標を格納する第1の記憶手段と、ブロックの高さが所定値以上であることを判定するサイズ判定条件を格納する第２の記憶手段と、認識された前記ブロックのブロック座標より求まるブロックの高さが前記サイズ判定条件に該当するか否かを判定するサイズ判定手段と、前記サイズ判定条件に該当すると判定されたブロックに対し、後処理を行う制御手段とを備えることを特徴とする。
【００１３】
〈構成２〉
構成１の構成の文字読取装置では、誤切り出し判定条件は、スクリプトにより記述されることを特徴とする。
【００１４】
〈構成３〉
構成１の構成の文字読取装置では、制御手段は、前記サイズ判定の結果に応じて、当該文字を別の文字に置換する処理、未処理、削除処理のうち、いずれか１つを前記後処理として行うことを特徴とする。
【００２３】
【発明の実施の形態】
以下、本発明の実施の形態を具体例を用いて説明する。
〈具体例１〉
具体例１は、画像データから文字を認識し、認識された文字の位置及び大きさに対して、読み取り領域に適応したサイズ判定データを誤読判定条件として指定してサイズ判定を行い、誤読の可能性があるときは、不読等の後処理を行うようにしたものである。
【００２４】
図１は、具体例１の構成を示すブロック図である。
具体例１の文字読取装置は、画像入力部１と、表示部２と、入力部３と、制御部４と、認識部５と、サイズ判定部６と、画像メモリ２１と、読み取り領域情報格納メモリ２２と、認識結果格納メモリ２３と、判定データ格納メモリ２４と、判定結果格納メモリ２５と、参照座標格納メモリ２６と、を備えて構成されている。
【００２５】
画像入力部１は、イメージスキャナおよびＦＡＸ等のように、帳票上に記入された文字、図形を画像データとして入力する機能を有する画像入力手段である。
表示部２は、ディスプレイ等のように、オペレータに対して情報を表示する機能を有するものである。
入力部３は、キーボード、マウス等のように、オペレータからの入力を受け付ける機能を有するものである。
【００２６】
認識部５は、画像メモリ２１に格納されている画像データを参照して読み取り領域を走査し、行の切り出し及び文字の切り出しを行って各文字の位置及び大きさを文字座標で特定し、各文字の認識を行い、認識した文字を文字コードに変換する機能を有する認識手段である。
【００２７】
図３は文字座標の説明図である。
この図３に示すように、文字「あ」は認識対象の文字であって、この文字の位置及び大きさは、図中、破線で示すように文字「あ」を囲む矩形によって特定される。
この矩形は、所定の位置を原点とし、図中、左上の座標を（l,t）、右下の座標を（r,b）として、座標（l,t）−（r,b）で表され、この座標が文字座標となる。
【００２８】
サイズ判定部６は、認識部５によって認識された結果に対して、文字の位置及びサイズについてのサイズ判定を行い、これにより誤読の可能性を判定する機能を有する誤読判定処理手段である。
【００２９】
図４は、具体例１のサイズ判定に用いるサイズ判定データの一例を示す説明図である。
この図４に示すように、サイズ判定データには、複数の条件式及びその条件式に該当したときの処理方法が含まれている。
【００３０】
ここで、処理方法としての不読は、認識されて変換された文字コードを、例えば「？」などの認識結果として含まれるべきでない文字に置換してオペレータによる認識結果の修正作業を容易にするための処理である。
【００３１】
未処理は、予め後処理を行わないように指定条件が設定された文字を不読や削除などの処理から除外するための処理である。
削除は、不要と考えられる文字を削除する処理である。
尚、処理方法は条件式に応じて適宜、設定される。
【００３２】
制御部４は、文字読取装置の各ブロックを制御する機能を有するものである。
画像メモリ２１は、画像入力部１によって入力された画像データを格納するためのメモリである。
【００３３】
読み取り領域情報格納メモリ２２は、読み取り処理を行うための領域情報を格納するためのメモリである。
図５は具体例１の読み取り領域の説明図である。
一例として帳票を示す。この帳票には、文字が印字された読み取り領域Ａ〜Ｆが設定されている。認識対象の文字の大きさ、字体等は各領域毎に異なっており、サイズ判定は、この読み取り領域Ａ〜Ｆ毎に行われる。領域情報は、この領域の座標を指定するための領域指定情報とこの領域に適用されるサイズ判定用の条件式等を指定するためのサイズ判定データ指定情報とであり、読み取り領域Ａ〜Ｆ毎に格納されている。
【００３４】
認識結果格納メモリ２３は、各領域の座標及び、認識部５によって認識された文字座標及び文字コードを格納するものである。
判定データ格納メモリ２４は、図４に示すようなサイズ判定データを格納するためのメモリである。
【００３５】
判定結果格納メモリ２５は、サイズ判定を行った結果、最終的に得られた文字コード及びその文字座標を格納するためのメモリであり、この判定結果格納メモリ２５に格納されるこの文字コード及びその文字座標は、認識結果格納メモリ２３に格納されているデータ形式と同じ形式で格納される。
【００３６】
参照座標格納メモリ２６は、サイズ判定部６が認識結果格納メモリ２３を参照するための参照座標及び文字位置を格納するためのメモリである。具体例１では、参照座標として、認識部５によって認識された文字の文字座標が格納される。文字位置は、認識部５に格納されている文字のうち、参照する文字の位置を示すデータであり、参照する文字が例えば１文字目のときは１となる。
【００３７】
〈動作〉
次に具体例１の動作を説明する。
制御部４は各ブロックを制御して文字の読み取りを実行する。
【００３８】
図６は具体例１の動作を示すフローチャートである。
ステップ（図中、ステップを「Ｓ」と記す。）１では、画像入力部１が帳票等から画像データを読み込む。
画像データは画像メモリ２１に格納され、その領域情報は読み取り領域情報格納メモリ２２に格納される。
【００３９】
制御部４は、読み取り領域情報格納メモリ２２に領域情報が格納されているか否かを判定する（ステップ２）。
最初は、図５に示すように読み取り領域Ａが指定される。
認識部５は、画像メモリ２１に格納されている画像データを参照し、読み取り領域Ａの領域指定情報に基づいてこの読み取り領域Ａ内の画像イメージに対して行の切り出し、文字の切り出しを行い、各文字の座標を検出する。そして、この文字の認識を行う。認識された文字は文字コードに変換され、この文字座標及び変換された文字コードは認識結果格納メモリ２３に格納される（ステップ３）。
【００４０】
サイズ判定部６は、この認識結果に対してサイズ判定を行う（ステップ４）。
図７は、具体例１のサイズ判定部６が行うサイズ判定処理を示すフローチャートである。
ステップ１１では、認識結果格納メモリ２３から判定対象の文字座標及びその文字位置を取得し、取得した文字座標及び文字位置を参照座標格納メモリ２６に格納する。尚、最初の文字位置は１である。また、次の判定対象となる文字の文字座標及び文字位置がなければ、例えば文字座標及び文字位置をすべて０にした「矩形なし」の情報を参照座標格納メモリ２６に格納する。
【００４１】
ステップ１２では、参照座標格納メモリ２６に格納されている文字座標を参照し、判定対象となる次の文字座標の有無を判定する。
参照座標格納メモリ２６に「矩形なし」の情報が格納されていなければ、ステップ１３に進む。
【００４２】
ステップ１３では、サイズ判定データを用いて条件計算を行う。
サイズ判定の条件は便宜上、数式によって記述される。次式（１）〜（３）は、そのサイズ判定に用いる条件式の一例である。
（ｂ−ｔ＋１）＞Ｈth …（１）
（ｒ−ｌ＋１）＞Ｗth …（２）
ｔ＜ｔmin …（３）
但し、ｌ、ｒ：矩形のｘ座標
ｔ，ｂ：矩形のｙ座標
Ｗth：幅（ｘ）方向の矩形の下限値（例えば４０）
Ｈth：高さ（ｙ）方向の矩形の下限値（例えば４０）
ｔmin：座標ｔの最小値（例えば１２００）
【００４３】
式（１）は文字座標から得られる矩形の高さ（ｙ方向）による誤読判定条件を示し、式（２）は文字座標から得られる矩形の幅（ｘ方向）による誤読判定条件を示し、式（３）は文字座標自体を誤読判定条件にしたものである。
【００４４】
尚、次式（４）に示すように、２つ以上の条件式を論理積（ＡＮＤ）あるいは、論理和（ＯＲ）で複合化してもよい。
（ｂ−ｔ＋１）＞４０ＡＮＤ（ｒ−ｌ＋１）＞４０ …（４）
【００４５】
このサイズ判定データは判定データ格納メモリ２４に格納されている。読み取り領域情報格納メモリ２２に格納されているサイズ判定データ指定情報を参照し、このサイズ判定データ指定情報を用いて読み取り領域Ａに適応したサイズ判定データが指定される。文字座標は参照座標格納メモリ２６から取り出され、この文字座標にこの読み取り領域Ａのサイズ判定データを適用して誤読判定のための条件計算を行う。
【００４６】
この装置には、これらの条件式の真偽を計算する制御プログラムが格納されている。
条件計算は、まず、図４に示す最初の条件式に文字座標を代入することにより行われる。計算の結果、偽のとき、即ち、条件式を満足しないときは、次の条件式に文字座標を代入する。このように、順次、文字座標を条件式に代入し、真になったとき、そこで計算を終了させる。
【００４７】
例えば、図２（ａ）、（ｂ）に示すように、複数の文字に対して抹消線が施されている場合、複数の文字を丸囲いした場合には、式（１）、（２）を満足するようになる。また、図２（ｃ）、（ｄ）に示すように、印字がずれた場合、桁区切りの線が行に含まれている場合には、式（３）を満足するようになる。このような場合、各条件式の計算結果は真となる。
所定の条件式の計算結果が真となったとき、誤読の可能性があると判定され、それ以上の条件計算は行わない。
【００４８】
一方、すべての条件式についての計算結果が偽となったとき、このサイズ判定データによる条件計算は偽となる。このときは、誤読の可能性はないと判定される。
この計算結果は、判定結果格納メモリ２５に格納される。
【００４９】
ステップ１４では、まず、計算結果の真偽を判別する。
計算結果が偽のときは、ステップ１５に進む。
ステップ１５では、偽となった文字の文字位置を参照座標格納メモリ２６から取得し、誤読の可能性はないと判定されているので、認識結果格納メモリ２３に格納されているその文字位置の文字座標及び文字コードをそのまま判定結果格納メモリ２５へコピーする。
また、ステップ１４において、条件計算の結果が真となったときは、ステップ１６に進む。
【００５０】
ステップ１６では、処理方法を判別する。
例えば、図４において、認識対象文字の文字座標が条件式２を満足することにより計算結果が真になったとき、処理方法は未処理となる。
【００５１】
処理方法が未処理のときは、ステップ１５に進み、予め設定された指定条件を満足する文字を不読や削除などの処理から除外するために認識結果格納メモリ２３に格納されているその文字位置の文字座標及び文字コードをそのまま判定結果格納メモリ２５へコピーする。
【００５２】
また、条件式１を満足することにより計算結果が真になったとき、処理方法は不読となる。
処理方法が不読のときは、ステップ１７に進み、その文字の文字位置を参照座標格納メモリ２６から取得し、認識結果格納メモリ２３に格納されているその文字位置の文字座標を判定結果格納メモリ２５へコピーし、その文字の文字コードを、例えば「？」などの認識結果として含まれるべきでない文字に置換して判定結果格納メモリ２５に格納する。従って、オペレータは、この文字を視認することにより誤読の可能性を一目で識別できる。
また、処理方法が削除のときは、例えばゴミ等によってイメージ化され、誤読されたと考えられる不要な文字あるいは記号を削除する。
【００５３】
最初の文字についてのこのような処理が終了した後、ステップ１１に戻り、次の文字座標及び文字位置を取得して同じようにステップ１２〜１７を実行する。
そして、参照座標格納メモリ２６に「矩形なし」の情報が格納されたときは、ステップ１２において、読み取り領域Ａにおいて認識された全ての文字について、サイズ判定が行われたと判定し、ステップ２に戻る。
このような処理は、読み取り領域Ｂ〜Ｆについても行われ、全ての読み取り領域についてこのような処理が行われたとき、処理が完了する。
【００５４】
〈具体例１の効果〉
以上、説明したように具体例１によれば、文字認識対象の文字の読み取りを行うときに、各文字の座標および座標から求められる高さや幅等に対してサイズ判定データを設定し、サイズ判定を行うようにしたので、その文字座標から認識結果の各文字の誤読の可能性についての評価を適切に行うことができる。
【００５５】
また、処理方法が未処理のときは、予め不読処理を行わないように設定された指定条件を満足するような文字に対しては、未処理とすることにより、この文字を不読処理、削除処理から除外することができる。
【００５６】
また、認識結果に誤読の可能性が高いと考えられる文字に対しては、不読処理を行うことにより、その対象となった文字が「？」のような含まれるべきでない文字に置換されるので、オペレータは一目で視認でき、オペレータによる認識結果の修正作業が容易となる。
【００５７】
また、例えばゴミ等によってイメージ化されて誤読されたと考えられる明らかに不要な文字、記号に対しては、この文字を削除することにより、オペレータによる修正作業の負荷を軽減できる。
【００５８】
さらに、このようなサイズ判定を読み取り領域毎に行うようにしたので、読み取り領域毎に字体、文字種、大きさが異なっているような帳票においても各領域毎に適切な文字認識、サイズ判定を行うことができ、文字認識精度が向上する。
【００５９】
〈具体例２〉
具体例２は、文字座標に基づいて３つの文字の前後関係を算出し、この前後関係に対して誤読判定条件を設定し、サイズ判定を行うようにしたものである。
具体例２の文字読取装置は、具体例１と同様に、画像入力部１と、表示部２と、入力部３と、制御部４と、認識部５と、サイズ判定部６と、画像メモリ２１と、読み取り領域情報格納メモリ２２と、認識結果格納メモリ２３と、判定データ格納メモリ２４と、判定結果格納メモリ２５と、参照座標格納メモリ２６と、を備えて構成されている。
【００６０】
但し、参照座標格納メモリ２６には３つの格納エリアが備えられている。
図８は、その参照座標格納メモリ２６の説明図である。
格納エリアｂは、認識対象である現在文字の文字座標及び文字位置を格納するためのエリアであり、格納エリアａ，ｃは、それぞれ現在文字の１つ前の文字座標及び文字位置、その次の文字の文字座標及び文字位置を格納するためのエリアである。
尚、具体例１と同一要素については同一符号を付して説明を省略する。
【００６１】
〈動作〉
次に具体例２の動作を説明する。
具体例２においても、具体例１と同様に、図６のフローチャートを実行し、ステップ４においてサイズ判定処理を実施する。
【００６２】
図９は具体例２のサイズ判定処理を示すフローチャートである。
ステップ２１では、先頭文字の文字座標及び文字位置を取得する。
取得した文字座標及び文字位置は参照座標格納メモリ２６の格納エリアｃに格納され、格納エリアａ、ｂにはともに「矩形なし」の情報が格納される。
【００６３】
ステップ２２では、次の文字の文字座標及び文字位置を取得する。
次の文字座標及び文字位置が取得されたとき、参照座標格納メモリ２６の格納エリアｃに格納されていた先頭文字の文字座標及び文字位置は格納エリアｂに格納され、取得した次の文字座標及び文字位置が格納エリアｃに格納される。
【００６４】
以後、文字座標及び文字位置を取得する毎に、格納エリアｂ，ｃに格納されている文字座標及び文字位置をそれぞれ格納エリアａ，ｂに格納し、取得した文字座標及び文字位置を格納エリアｃに格納する。
【００６５】
尚、次にサイズ判定を行う文字の文字座標及び文字位置がなければ、例えば文字座標及び文字位置をすべて０にした「矩形なし」の情報を格納エリアｃに格納する。
【００６６】
ステップ２３では、判定対象の現在の文字の有無を判定する。
格納エリアｂに「矩形なし」の情報が格納されていないときは、判定対象の現在の文字があると判定してステップ２４に進む。
【００６７】
ステップ２４では、３つの文字の文字座標に対し、サイズ判定データを用いて条件計算を行う。
図１０は具体例２の説明図である。
３つの文字座標を図１０（ａ）に示すように設定する。
この３つの文字座標から文字間の間隔を算出し、この間隔に誤読判定条件としてのサイズ判定データを適用してサイズ判定を行う。
式（５）、（６）は、サイズ判定データとしての条件式の一例である。
ｌ−ｐｒ−１＝０ …（５）
ｎｌ−ｒ−１＝０ …（６）
【００６８】
式（５）は、図１０（ｂ）に示すように、現在の文字と前の文字の間隔が０となる条件式であり、式（６）は、図１０（ｃ）に示すように、現在の文字と次の文字の間隔が０となる条件式である。
【００６９】
尚、式（５）及び（６）を具体例１と同様に論理積（ＡＮＤ）あるいは論理和（ＯＲ）で複合化してもよい。
また、３つの文字の前後関係は各文字の間隔に限られるものではなく、３つの文字の大きさの関係等を条件にしてもよい。
【００７０】
このサイズ判定データは判定データ格納メモリ２４に格納されており、具体例１と同じように、読み取り領域情報格納メモリ２２に格納されているサイズ判定データ指定情報を参照し、このサイズ判定データ指定情報を用いて図５に示す読み取り領域Ａに適応したサイズ判定データを指定し、参照座標格納メモリ２６から３つの文字座標を取り出して、この読み取り領域Ａのサイズ判定データを適用して誤読判定のための条件計算を行う。
【００７１】
条件計算の方法は具体例１と同様であり、計算結果は判定結果格納メモリ２５に格納される。
但し、参照座標格納メモリ２６の格納エリアａ、または格納エリアｃに「矩形なし」の情報が格納されているときは、その条件式の計算結果は偽となる。
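The three-slot reference memory and the spacing conditions (5) and (6) of specific example 2 can be sketched as follows. This is only an illustration under assumptions: rectangles are `(l, t, r, b)` tuples, `None` stands in for the "no rectangle" information, and the function name `touching_flags` is hypothetical. A condition involving an empty slot a or c evaluates to false, as the text states.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (l, t, r, b)


def touching_flags(rects: List[Rect]) -> List[Tuple[bool, bool]]:
    """For each character, return (gap to previous is 0, gap to next is 0)."""
    flags = []
    a: Optional[Rect] = None                         # slot a: previous character
    b: Optional[Rect] = None                         # slot b: current character
    c: Optional[Rect] = rects[0] if rects else None  # slot c: next character (step 21)
    index = 1
    while True:
        # Step 22: shift b -> a and c -> b, then load the next character into c.
        a, b = b, c
        c = rects[index] if index < len(rects) else None
        index += 1
        if b is None:                                # step 23: no current character
            break
        # Equation (5): l - pr - 1 = 0, false when slot a is empty.
        prev_touch = a is not None and b[0] - a[2] - 1 == 0
        # Equation (6): nl - r - 1 = 0, false when slot c is empty.
        next_touch = c is not None and c[0] - b[2] - 1 == 0
        flags.append((prev_touch, next_touch))
    return flags
```

Keeping exactly three slots means each character is compared only with its immediate neighbours, which mirrors the storage areas a, b, and c of FIG. 8.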
【００７２】
ステップ２５では、計算結果の真偽を判別する。
計算結果が偽のときは、どの条件式にも該当しないので誤読の可能性はないと判定し、ステップ２６に進んで参照座標格納メモリ２６の格納エリアｂに格納されている文字位置を取得し、具体例１と同様に認識結果格納メモリ２３に格納されているその文字位置の文字座標及び文字コードを判定結果格納メモリ２５にそのままコピーする。
【００７３】
また、計算結果が真のときは、誤読の可能性があると判定し、ステップ２７に進んで条件式に対応する処理方法を判別し、処理方法が未処理のときは、格納エリアｂに格納されている現在の文字の文字座標及び文字コードをそのまま判定結果格納メモリ２５へコピーする（ステップ２６）。
【００７４】
処理方法が不読のときは、ステップ２８に進み、その文字の文字位置を参照座標格納メモリ２６から取得し、認識結果格納メモリ２３に格納されているその文字位置の文字座標を判定結果格納メモリ２５へコピーし、その文字の文字コードを、例えば「？」などのように認識結果として含まれるべきでない文字に置換して判定結果格納メモリ２５に格納する。
そして、処理方法が削除のときは、不要と考えられる文字あるいは記号を削除する。
【００７５】
このような処理を全ての文字について行い、全ての文字についてサイズ判別が行われたとき（ステップ２３）、このサイズ判定処理を終了させ、全ての読み取り領域についてこのような処理が行われたとき（ステップ２）、処理が完了する。
【００７６】
〈具体例２の効果〉
以上、説明したように具体例２によれば、３つの文字の前後関係に対してサイズ判定を行うようにしたので、具体例１と同様の効果が得られるだけでなく、誤読の可能性を、より的確に判別することができる。
【００７７】
〈具体例３〉
具体例３は、文字座標に基づいて算出された現在文字の行位置、文字位置、行先頭からの文字位置、その行の文字数に誤読判定条件を設定し、サイズ判定を行うようにしたものである。
【００７８】
図１１は、具体例３の構成を示すブロック図である。
具体例３の文字読取装置は、画像入力部１と、表示部２と、入力部３と、制御部４と、認識部５と、サイズ判定部６と、画像メモリ２１と、読み取り領域情報格納メモリ２２と、認識結果格納メモリ２３と、判定データ格納メモリ２４と、判定結果格納メモリ２５と、参照座標格納メモリ２６と、関連情報格納メモリ２７と、を備えて構成されている。
【００７９】
この関連情報格納メモリ２７は、現在文字に関する情報として、現在文字の行位置、文字位置、行先頭からの文字位置およびその行の文字数などの関連情報を格納するメモリである。
【００８０】
図１２は具体例３の関連情報の説明図である。
この行位置Ｌ、文字位置Ｉ、行先頭からの文字位置ＬＩ、その行の文字数ＬＮは１以上の値とする。
尚、具体例１及び具体例２と同一要素については同一符号を付して説明を省略する。
【００８１】
〈動作〉
次に具体例３の動作を説明する。
具体例３においても、具体例１と同様に、図６のフローチャートを実行し、ステップ４においてサイズ判定処理を実施する。
【００８２】
図１３は具体例３のサイズ判定処理を示すフローチャートである。
ステップ３１〜３３では、具体例２のステップ２１〜２３と同様に先頭文字及び次の文字の文字座標及び文字位置を取得し、それぞれ参照座標格納メモリ２６の格納エリアｂ，ｃに格納し、ステップ３４に進む。
【００８３】
ステップ３４では、判定対象である現在文字の関連情報を設定する。
即ち、認識結果格納メモリ２３を参照し、図１２に示すように、判定対象の文字についての行位置Ｌ、文字位置Ｉ、行先頭からの文字位置ＬＩ、その行の文字数ＬＮ等を取得する。そして、これらの関連情報を関連情報格納メモリ２７に格納する。
【００８４】
ステップ３５では、判定対象である現在の文字の文字座標及びその関連情報に対し、サイズ判定データを用いて条件計算を行う。
式（７）、（８）は、サイズ判定データとしての条件式の一例である。
Ｉ＝２ …（７）
ＬＮ＝３ …（８）
式（７）は２文字目の場合の条件式であり、式（８）は行文字数が３の場合の条件式である。
【００８５】
尚、具体例１、２と同様に、これら２つの条件式を論理積（ＡＮＤ）あるいは論理和（ＯＲ）で複合化してもよい。
このサイズ判定データは判定データ格納メモリ２４に格納されており、具体例１、２と同じように、読み取り領域情報格納メモリ２２に格納されているサイズ判定データ指定情報を参照し、このサイズ判定データ指定情報を用いて読み取り領域に適応したサイズ判定データを指定し、参照座標格納メモリ２６から３つの文字座標を取り出して、この読み取り領域のサイズ判定データを適用して誤読判定のための条件計算を行う。
【００８６】
条件計算の方法は具体例１と同様であり、計算結果は判定結果格納メモリ２５に格納される。
但し、具体例２と同様に、参照座標格納メモリ２６の格納エリアａ、または格納エリアｃに「矩形なし」の情報が格納されているときは、その条件式の計算結果は偽となる。
そして、ステップ３６〜３９では、具体例１，２と同様に後処理を行う。
【００８７】
このような処理を全ての文字について行い、全ての文字についてサイズ判別が行われたとき（ステップ３３）、このサイズ判定処理を終了させ、全ての読み取り領域についてこのような処理が行われたとき（ステップ２）、処理が完了する。
【００８８】
〈具体例３の効果〉
以上、説明したように具体例３によれば、現在文字と前後の文字との位置関係だけでなく、現在文字の関連情報として行位置Ｌ、文字位置Ｉ、行先頭からの文字位置ＬＩ、その行の文字数ＬＮに対してサイズ判定を行うようにしたので、具体例１，２の効果が得られるとともに、特定の行や文字について処理条件を設定でき、行切り出し処理や文字切り出し処理の誤りによる誤読や不要な文字への、より的確な処理を適用することができる。
【００８９】
〈具体例４〉
具体例４は、文字座標に基づいて３つの文字の行座標を算出し、この行座標に誤読判定条件を設定し、サイズ判定を行うようにしたものである。
【００９０】
具体例４の文字読取装置は、具体例３と同様に、画像入力部１と、表示部２と、入力部３と、制御部４と、認識部５と、サイズ判定部６と、画像メモリ２１と、読み取り領域情報格納メモリ２２と、認識結果格納メモリ２３と、判定データ格納メモリ２４と、判定結果格納メモリ２５と、参照座標格納メモリ２６と、関連情報格納メモリ２７と、を備えて構成されている。
尚、具体例１〜３と同一要素については同一符号を付して説明を省略する。
【００９１】
〈動作〉
次に具体例４の動作を説明する。
具体例４においても、具体例１と同様に、図６のフローチャートを実行し、ステップ４においてサイズ判定処理を実施する。
【００９２】
図１４は具体例４のサイズ判定処理を示すフローチャートである。
ステップ４１では、先頭行の矩形領域を作成する。
図１５は具体例４の行座標の作成方法を示す説明図である。
この図１５に示すように、破線で示す矩形領域▲１▼、▲２▼、▲３▼はそれぞれ１つの文字を囲む矩形領域を示す。
【００９３】
矩形領域▲１▼〜▲３▼は例えば、以下の文字座標によって表す。
矩形領域▲１▼の文字座標：（pl,pt）−（pr,pb）
矩形領域▲２▼の文字座標：（l,t）−（r,b）
矩形領域▲３▼の文字座標：（nl,nt）−（nr,nb）
【００９４】
先頭行の行座標を作成するには、この全文字を含むようにして最小の矩形領域▲４▼を設定する。この矩形領域▲１▼〜▲３▼の文字座標を認識結果格納メモリ２３から取り出して、行座標（nl,pt）−（nr,b）が作成される。
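The row coordinates are the smallest rectangle that contains every character rectangle of the row. A minimal sketch of that computation, assuming rectangles are `(l, t, r, b)` tuples with the origin at the upper left; the function name `bounding_rect` is hypothetical and the sample values are invented for illustration.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (l, t, r, b)


def bounding_rect(rects: List[Rect]) -> Rect:
    """Return the smallest rectangle that contains all given rectangles."""
    if not rects:
        raise ValueError("at least one rectangle is required")
    return (
        min(rect[0] for rect in rects),  # leftmost l
        min(rect[1] for rect in rects),  # topmost t
        max(rect[2] for rect in rects),  # rightmost r
        max(rect[3] for rect in rects),  # bottommost b
    )
```

Which character supplies each edge depends on the layout, so the corner values named in the text correspond to the particular arrangement drawn in FIG. 15.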
【００９５】
また、この行の最終文字の文字位置をこの行の文字位置として、作成された先頭行の行座標（nl,pt）−（nr,b）及びこの文字位置を参照座標格納メモリ２６の格納エリアｃに格納し、格納エリアａ，ｂには「矩形なし」の情報を格納する。
【００９６】
ステップ４２では、ステップ４１と同様に、次の行の矩形領域を作成する。
次の行の矩形領域が作成されたとき、参照座標格納メモリ２６の格納エリアｃに格納されていた先頭行の行座標及びその行の最終文字の文字位置は格納エリアｂに格納され、作成された次の行座標及びその行の最終文字の文字位置が格納エリアｃに格納される。
【００９７】
以後、行座標が作成される毎に、格納エリアｂ，ｃに格納されているデータをそれぞれ格納エリアａ，ｂに格納し、作成した行座標及びその行の最終文字の文字位置を格納エリアｃに格納する。
【００９８】
尚、次にサイズ判定を行うべき行の行座標がなければ、例えば行座標及びその行の最終文字の文字位置をすべて０にした「矩形なし」の情報を格納エリアｃに格納する。
【００９９】
ステップ４３では、判定対象である現在の行の有無を判定する。
格納エリアｂに「矩形なし」の情報が格納されていないときは、判定対象の現在の行があると判定してステップ４４に進む。
【０１００】
ステップ４４では、判定対象の現在行の関連情報を設定する。
即ち、現在行の最終文字を参照座標格納メモリ２６の格納エリアｂから取得し、その文字位置の文字に関して認識結果格納メモリ２３を参照し、その行位置Ｌ、文字位置Ｉ、行先頭からの文字位置ＬＩ、その行の文字数ＬＮ等を取得する。そして、これらの関連情報を関連情報格納メモリ２７に格納する。
【０１０１】
ステップ４５では、現在行及びその関連情報に対し、サイズ判定データを用いて条件計算を行う。
条件式については、現在行の位置関係、現在行の大きさ等について設定することができる。
また、具体例１〜３と同様に、２つの条件式を論理積（ＡＮＤ）あるいは論理和（ＯＲ）で複合化してもよい。
【０１０２】
サイズ判定データは判定データ格納メモリ２４に格納されており、具体例１〜３と同じように、読み取り領域情報格納メモリ２２に格納されているサイズ判定データ指定情報を参照し、このサイズ判定データ指定情報を用いて読み取り領域に適応したサイズ判定データを指定し、参照座標格納メモリ２６から３つの文字座標を取り出して、この読み取り領域のサイズ判定データを適用して誤読判定のための条件計算を行う。
【０１０３】
サイズ判定データには、具体例１（図４）と同じような条件式とそれに対応した処理方法が含まれている。
条件計算の方法は具体例１と同様であり、計算結果は判定結果格納メモリ２５に格納される。
【０１０４】
但し、具体例２と同様に、参照座標格納メモリ２６の格納エリアａ、または格納エリアｃに「矩形なし」の情報が格納されているときは、その条件式の計算結果は偽となる。
【０１０５】
ステップ４６では、計算結果を判別し、計算結果が偽のときは、ステップ４７に進む。
ステップ４７では、関連情報格納メモリ２７から現在行の文字位置Ｉ、即ち、現在行の最終文字位置とその行の文字数ＬＮを取得し、この行の開始文字位置（Ｉ−ＬＮ＋１）から最終文字位置Ｉまでの文字座標及び文字コードを認識結果格納メモリ２３から判定結果格納メモリ２５へそのままコピーする。
【０１０６】
また、計算結果が真のときは、誤読の可能性があると判定してステップ４８に進んで条件式に対応する処理方法を判別する。
処理方法が未処理のときは、ステップ４７に進み、判定結果が偽のときと同じ処理を行う。
処理方法が不読のときは、ステップ４９に進む。
【０１０７】
ステップ４９では、関連情報格納メモリ２７から現在行の文字位置Ｉ、即ち、最終文字位置とその行の文字数ＬＮを取得し、この行の開始文字位置（Ｉ−ＬＮ＋１）から最終文字位置Ｉまでの文字座標を認識結果格納メモリ２３から判定結果格納メモリ２５へコピーし、文字コードを、例えば「？」などのように認識結果として含まれるべきでない文字に置換して判定結果格納メモリ２５に格納する。
処理方法が削除のときは、その文字を削除する。
【０１０８】
このような処理を全ての行について行い、全ての行についてサイズ判別が行われたとき（ステップ４３）、このサイズ判定処理を終了させ、全ての読み取り領域についてこのような処理が行われたとき（ステップ２）、処理が完了する。
【０１０９】
〈具体例４の効果〉
以上、説明したように具体例４によれば、現在行前後の位置関係を算出し、この位置関係に対してサイズ判定を行うようにしたので、行単位で行の切り出し処理や文字の切り出し処理の誤りを判別し、後処理を行うことができる。
【０１１０】
〈具体例５〉
具体例５は、同一行で同じ位置条件の文字が連続したとき、これらの文字をブロックにまとめ、ブロック単位でサイズ判定を行うようにしたものである。
【０１１１】
具体例５の関連情報格納メモリ２７には、前後の文字間隔に基づいてブロックにまとめるための条件式が格納されている。
例えば、文字位置ｉ，ｉ＋１の文字座標をそれぞれ（Ｌ(i)，Ｔ(i)）−（Ｒ(i)，Ｂ(i)）、文字座標（Ｌ(i+1)，Ｔ(i+1)）−（Ｒ(i+1)，Ｂ(i＋1)）とすると、間隔Ｄは以下の式（９）によって計算される。
Ｄ＝Ｌ(i+1)−Ｒ(i)−１ …（９）
【０１１２】
式（１０）〜（１５）は、間隔Ｄに基づいてブロックを作成する条件を示す式である。
Ｄ＝Ｄthl …（１０）
Ｄ≠Ｄthl …（１１）
Ｄ＜Ｄthl …（１２）
Ｄ≦Ｄthl …（１３）
Ｄ＞Ｄthl …（１４）
Ｄ≧Ｄthl …（１５）
但し、Ｄthl：所定値
【０１１３】
これらの式（１０）〜（１５）が関連情報格納メモリ２７に格納されている。
具体例５の判定データ格納メモリ２４には、このブロックに対して適用されるサイズ判定データが格納されている。
【０１１４】
図１６は具体例５のサイズ判定データの一例を示す説明図である。
具体例５の参照座標格納メモリ２６は、具体例２と同様に３つの格納エリアａ〜ｃを有している。
尚、具体例１〜４と同一要素については同一符号を付して説明を省略する。
【０１１５】
〈動作〉
次に具体例５の動作を説明する。
具体例５においても、具体例１と同様に、図６のフローチャートを実行し、ステップ４においてサイズ判定処理を実施する。
【０１１６】
図１７は具体例５のサイズ判定処理を示すフローチャートである。
ステップ５１では、認識結果格納メモリ２３から取得したその領域の文字を先頭から参照して、その間隔Ｄを計算し、条件式（１０）〜（１５）を評価して、いずれかの条件が同一行で連続して該当するときは、これらの文字を含む最小の矩形領域を１つのブロックとする。
【０１１７】
図１８はこのブロックの説明図である。
この図１８に示すように、同一行に文字Ｐ，Ｑ，Ｒが並んでいる場合、文字Ｐ，Ｑの間隔Ｄは、前述のように式（９）によって表される。
【０１１８】
例えば、文字Ｐ，Ｑの間隔Ｄが式（１０）〜（１５）のいずれか１つに該当しているときは文字Ｐ，Ｑが１つのブロックにまとめられる。図１８の破線で示す領域がこのようにして作成された１つのブロックを示す。
【０１１９】
尚、文字が、図１５に示すように領域▲１▼、▲２▼、▲３▼に印字されているときは、実線で示す領域▲４▼が最小の矩形領域となり、これが１つのブロックになる。
このブロックはブロック座標（Ｌ(i)，Ｔ(i)）−（Ｒ(i+1)，Ｂ(i+1)）によって特定される。
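The block construction of specific example 5 can be sketched as follows, under stated assumptions: rectangles are `(l, t, r, b)` tuples for one row in reading order, the comparison (one of conditions (10) to (15)) is passed in as a function, and a character whose gap does not satisfy the condition starts a new block of its own. The names `gap`, `group_blocks`, `block_rect`, and `block_height` are hypothetical.

```python
import operator
from typing import Callable, List, Tuple

Rect = Tuple[int, int, int, int]  # (l, t, r, b)


def gap(left: Rect, right: Rect) -> int:
    """Equation (9): D = L(i+1) - R(i) - 1."""
    return right[0] - left[2] - 1


def group_blocks(rects: List[Rect],
                 compare: Callable[[int, int], bool],
                 d_thl: int) -> List[List[Rect]]:
    """Merge consecutive characters while compare(D, d_thl) holds."""
    blocks: List[List[Rect]] = []
    for rect in rects:
        if blocks and compare(gap(blocks[-1][-1], rect), d_thl):
            blocks[-1].append(rect)   # condition holds: extend the current block
        else:
            blocks.append([rect])     # otherwise start a new block
    return blocks


def block_rect(block: List[Rect]) -> Rect:
    """Block coordinates: the smallest rectangle containing the block's characters."""
    return (min(r[0] for r in block), min(r[1] for r in block),
            max(r[2] for r in block), max(r[3] for r in block))


def block_height(block: List[Rect]) -> int:
    """Height of the block, the quantity a block-level size rule would test."""
    _, top, _, bottom = block_rect(block)
    return bottom - top + 1
```

For example, `group_blocks(rects, operator.le, 2)` corresponds to condition (13) with a threshold of 2; the threshold value is illustrative only.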
【０１２０】
まず、最初、参照座標格納メモリ２６の格納エリアａ、ｂには、「矩形なし」の情報を格納し、格納エリアｃにこの先頭ブロックのブロック座標をそのブロックの最終文字位置とともに格納する。
【０１２１】
ステップ５２では、次のブロックをステップ５１と同じように作成する。
参照座標格納メモリ２６の格納エリアａ，ｂ，ｃに格納されている参照情報を１つずつ移動させ、次のブロックの参照座標を認識結果格納メモリ２３から取得し、このブロックの参照座標を格納エリアｃにそのブロックの最終文字位置とともに格納する。もし次のブロックがないときは、「矩形なし」の情報を格納する。
【０１２２】
ステップ５３では、サイズ判定を行うべきブロックの有無を判定する。
サイズ判定を行うべきブロックがあるときは、ステップ５４に進む。
ステップ５４では、判定対象である現在ブロックの関連情報を設定する。
この関連情報を設定するには、現在ブロックの最終文字の文字位置を、参照座標格納メモリ２６に格納されている現在ブロックの文字位置から取得し、その文字位置の文字について認識結果格納メモリ２３を参照し、行位置Ｌ、文字位置Ｉ、行先頭からの文字位置ＬＩ，その行の文字数ＬＮおよびブロック文字数ＢＮを取得し、関連情報格納メモリ２７に格納する。尚、Ｌ，Ｉ，ＬＩ，ＬＮ、ＢＮは１以上の値とする。
【０１２３】
ステップ５５では、参照座標格納メモリ２６のエリアａ，ｂ，ｃに格納されている現在ブロックの１つ前のブロック、現在ブロック、その次のブロックのブロック座標、及び関連情報格納メモリ２７に格納されているブロック関連情報を参照し、読み取り領域情報格納メモリ２２に格納されているその領域の領域情報に従って、判定データ格納メモリ２４に格納されているサイズ判定データを参照し、このサイズ判定データの条件式の真偽を計算する。
尚、条件式は、具体例１〜４と同じような条件式であってもよいし、論理積(ＡＮＤ)や論理和(ＯＲ)によって複合化させたものでもよい。
【０１２４】
サイズ判定データには、具体例１（図４）と同じような条件式とそれに対応した処理方法が含まれている。
条件計算の方法は具体例１と同様であり、計算結果は判定結果格納メモリ２５に格納される。
【０１２５】
但し、具体例２と同様に、参照座標格納メモリ２６の格納エリアａ、または格納エリアｃに「矩形なし」の情報が格納されているときは、その条件式の計算結果は偽となる。
【０１２６】
ステップ５６では、計算結果の真偽を判別し、計算結果が偽のときは、ステップ５７に進む。
ステップ５７では、関連情報格納メモリ２７から現在ブロックの文字位置Ｉ、即ち、現在ブロックの最終文字位置とそのブロックの文字数ＢＮを取得し、このブロックの開始文字位置（Ｉ−ＢＮ＋１）から最終文字位置Ｉまでの文字座標及び文字コードを認識結果格納メモリ２３から判定結果格納メモリ２５へそのままコピーする。
【０１２７】
また、計算結果が真のときは、誤読の可能性があると判定してステップ５８に進んで条件式に対応する処理方法を判別する。
処理方法が未処理のときは、ステップ５７に進み、判定結果が偽のときと同じ処理を行う。
処理方法が不読のときは、ステップ５９に進む。
【０１２８】
ステップ５９では、関連情報格納メモリ２７から現在ブロックの文字位置Ｉ、即ち、最終文字位置とそのブロックの文字数ＢＮを取得し、このブロックの開始文字位置（Ｉ−ＢＮ＋１）から最終文字位置Ｉまでの文字座標を認識結果格納メモリ２３から判定結果格納メモリ２５へコピーし、その文字コードを、例えば「？」などのように認識結果として含まれるべきでない文字に置換して判定結果格納メモリ２５に格納する。
処理方法が削除のときは、その文字を削除する。
【０１２９】
このような処理を全てのブロックについて行い、全てのブロックについてのサイズ判定が終了したとき（ステップ５３）、ステップ２に戻り、全ての領域情報について認識処理（ステップ３）、サイズ判定処理（ステップ４）が行われたとき（ステップ２）、すべての処理を終了させる。
【０１３０】
〈具体例５の効果〉
以上、説明したように具体例５によれば、同一行で同じ条件の文字が連続したとき、これらの文字をブロックにまとめ、このブロックに対してサイズ判定を行うようにしたので、ブロック単位で行の切り出し処理や文字の切り出し処理の誤りを判別し、後処理を行うことができる。
【０１３１】
〈具体例６〉
具体例６は、サイズ判定データをスクリプトデータで記述するようにしたものである。
【０１３２】
図１９は、具体例６の構成を示すブロック図である。
具体例６の文字読取装置は、画像入力部１と、表示部２と、入力部３と、制御部４と、認識部５と、サイズ判定部６と、スクリプトデータ解析部７と、画像メモリ２１と、読み取り領域情報格納メモリ２２と、認識結果格納メモリ２３と、判定データ格納メモリ２４と、判定結果格納メモリ２５と、参照座標格納メモリ２６と、関連情報格納メモリ２７と、スクリプトデータ格納メモリ２８と、を備えて構成されている。
【０１３３】
スクリプトデータ格納メモリ２８は、スクリプトで記述されたサイズ判定データを格納するメモリであり、このスクリプトはテキストで記述されている。
スクリプトデータ解析部７は、スクリプトデータ格納メモリ２８に格納されているサイズ判定データを参照し、構文解析を行い、サイズ判定部６が使用できる内部的なサイズ判定データに変換する機能を有する解析部である。
尚、具体例１〜５と同一要素については同一符号を付して説明を省略する。
【０１３４】
〈動作〉
次に具体例６の動作を説明する。
具体例６においても、具体例１と同様に、図６のフローチャートを実行し、ステップ４においてサイズ判定処理を実施する。
【０１３５】
図２０は具体例６のサイズ判定処理を示すフローチャートである。
ステップ６１では、スクリプトで記述されたサイズ判定データを解析する。
サイズ判定データを解析するには、読み取り領域情報格納メモリ２２に格納されているその領域の情報に従ってスクリプトデータ格納メモリ２８からスクリプトを取得する。
【０１３６】
スクリプトデータ解析部７はこのスクリプトを構文解析し、サイズ判定部６が使用できる内部的なサイズ判定データに変換し、変換されたサイズ判定データを判定データ格納メモリ２４に格納する。
【０１３７】
式（１６）は、このスクリプトで記述されたサイズ判定データの一例を示す式である。
処理単位，（条件１）処理１｜（条件２）処理２｜…｜（条件ｎ）処理ｎ…（１６）
【０１３８】
処理単位には、文字単位、行単位等の処理単位が記述され、条件１〜ｎには、例えば、条件式（１０）〜（１５）が記述される。そして、その条件１〜ｎに対応した処理１〜ｎを列挙する。
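To make the layout of expression (16) concrete, the sketch below parses a script of the form `unit,(condition 1)action 1|(condition 2)action 2|...` into a processing unit and an ordered rule list. The source does not give the concrete script grammar, so everything here is an assumption: the ASCII comma, parentheses, and `|` separators, the condition and action spellings, and the function name `parse_script`. The sketch only splits the text; it does not evaluate the conditions.

```python
import re
from typing import List, Tuple

# One rule: "(" condition ")" action. The greedy group lets the condition
# itself contain parentheses, e.g. "((b-t+1)>40)unread".
ITEM = re.compile(r"^\((.*)\)\s*(\w+)$")


def parse_script(script: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a rule script into (processing unit, [(condition, action), ...])."""
    unit, _, body = script.partition(",")   # the unit precedes the first comma
    rules = []
    for item in body.split("|"):            # rules are separated by "|"
        match = ITEM.match(item.strip())
        if match is None:
            raise ValueError(f"malformed rule: {item!r}")
        rules.append((match.group(1).strip(), match.group(2)))
    return unit.strip(), rules
```

Keeping the rules as plain text is what allows the conditions to be changed without modifying the program, which is the benefit the description attributes to the script form.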
【０１３９】
ステップ６２〜６６では、具体例５のステップ５２〜５５と同様に現在行の関連情報を設定し、判定データ格納メモリ２４に格納されているサイズ判定データを用いて条件計算を行う。
そして、ステップ６７〜７０では、具体例５と同じような後処理を行う。
【０１４０】
このような処理を全てのブロックについて行い、全てのブロックについてのサイズ判定が終了したとき（ステップ６４）、ステップ２に戻り、全ての領域情報について認識処理（ステップ３）、サイズ判定処理（ステップ４）が行われたとき（ステップ２）、すべての処理を終了させる。
【０１４１】
〈具体例６の効果〉
以上、説明したように具体例６によれば、サイズ判定データをスクリプトで記述することにより、具体例１〜５と同様の効果を得ることができるとともに、条件式を容易に定義できる。このため、サイズ判定データの誤り等による変更に容易に対応することができる。
【図面の簡単な説明】
【図１】具体例１の構成を示すブロック図である。
【図２】従来の説明図である。
【図３】具体例１の文字座標の説明図である。
【図４】具体例１のサイズ判定データの一例を示す説明図である。
【図５】具体例１の読み取り領域の説明図である。
【図６】具体例１の動作を示すフローチャートである。
【図７】具体例１のサイズ判定処理を示すフローチャートである。
【図８】具体例２の参照座標格納メモリの説明図である。
【図９】具体例２のサイズ判定処理を示すフローチャートである。
【図１０】具体例２の説明図である。
【図１１】具体例３の構成を示すブロック図である。
【図１２】具体例３の関連情報の説明図である。
【図１３】具体例３のサイズ判定処理を示すフローチャートである。
【図１４】具体例４のサイズ判定処理を示すフローチャートである。
【図１５】具体例４の行座標の作成方法を示す説明図である。
【図１６】具体例５のサイズ判定データの一例を示す説明図である。
【図１７】具体例５のサイズ判定処理を示すフローチャートである。
【図１８】具体例５のブロックの説明図である。
【図１９】具体例６の構成を示す説明図である。
【図２０】具体例６のサイズ判定処理を示すフローチャートである。
【符号の説明】
１画像入力部
４制御部
５認識部
６サイズ判定部
２２読み取り領域情報格納メモリ
２３認識結果格納メモリ
２４判定データ格納メモリ
２５判定結果格納メモリ
２６参照座標格納メモリ
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a character reading device that determines whether a read character or character string may have been misrecognized and, when it has, performs knowledge processing or post-processing on the read character string.
[0002]
[Prior art]
Conventionally, a character reading device that reads characters recorded on a form image is known.
[0003]
In a conventional character reading apparatus, a specified area on an image is scanned, line coordinates are extracted by line cutout processing, and character coordinates within the line coordinates are detected by character cutout processing. After the detection, character recognition processing is performed on the image of the coordinates of each character.
[0004]
In a character reading apparatus, generally, character recognition is performed after cutting out lines and characters.
In order to cut out rows, the image is scanned in the horizontal direction, a histogram of the number of black pixels is created, and each row is separated at a point where the value of the histogram becomes zero.
[0005]
To cut out characters, the image is scanned in the vertical direction, a histogram of the number of black pixels is created, and the image is split into individual characters at the points where the histogram value is 0.
When a character is misrecognized, a process called knowledge processing or post-processing, which uses a word matching dictionary or the like, replaces misread or unread characters to improve the recognition rate.
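The histogram-based segmentation described above can be sketched as follows. This is a minimal illustration rather than the device's implementation: the image is assumed to be a list of rows of 0/1 pixels (1 = black), and the function names `split_on_zero_runs`, `segment_lines`, and `segment_chars` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel


def split_on_zero_runs(histogram: List[int]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges where the histogram is non-zero."""
    spans = []
    start = None
    for i, count in enumerate(histogram):
        if count > 0 and start is None:
            start = i                      # a run of black pixels begins
        elif count == 0 and start is not None:
            spans.append((start, i - 1))   # the histogram dropped to 0: cut here
            start = None
    if start is not None:
        spans.append((start, len(histogram) - 1))
    return spans


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Horizontal scan: count black pixels per row and cut where the count is 0."""
    return split_on_zero_runs([sum(row) for row in image])


def segment_chars(image: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Vertical scan within one line: count black pixels per column and cut at 0."""
    width = len(image[0]) if image else 0
    columns = [sum(image[y][x] for y in range(top, bottom + 1)) for x in range(width)]
    return split_on_zero_runs(columns)
```

Because a cut is made only where the count is exactly 0, any stroke that bridges two characters or two lines keeps the histogram non-zero and merges them into one rectangle.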
[0006]
[Problems to be solved by the invention]
However, because such a conventional character reading device cuts out lines and characters by creating histograms, line segmentation and character segmentation sometimes cannot be performed correctly.
[0007]
FIG. 2 is a diagram for explaining such a conventional technique.
For example, FIG. 2A shows an example in which a strike-through line has been drawn across a plurality of characters. In the figure, each rectangle drawn with a broken line indicates the correct extent of one character. In this case, the strike-through line makes it impossible to identify the gaps between characters correctly. As a result, all the characters are misrecognized as a single character, and line segmentation cannot be performed correctly.
[0008]
FIG. 2B shows an example in which a plurality of characters are circled. In this case, because of the circled line, two lines of characters are recognized as one vertically long character, and correct line segmentation cannot be performed.
[0009]
FIG. 2C shows an example in which the printing is misaligned. In this case, the gap between the lines disappears, the lines cannot be identified correctly, and the characters on two lines are cut out as one vertically long character.
[0010]
FIG. 2D shows an example in which a line for separating digits is included in a row. In this case, a digit line that is unnecessary for character recognition processing is cut out as a character.
When the line or character segmentation result contains such an error, the detected rectangles are passed to character recognition as individual characters even if the character size or position is clearly wrong. This causes misreading or the output of unnecessary characters, and lowers the reliability of the character recognition device.
[0011]
Moreover, in order to correct such a recognition result, it is necessary for the operator to perform it manually, which increases the burden on the operator.
Therefore, it is necessary to correctly determine whether or not there is a possibility that the character has been erroneously recognized, and when the character is erroneously recognized, it is necessary to perform post-processing automatically.
[0012]
[Means for Solving the Problems]
  The present invention adopts the following configuration in order to solve the above points.
<Configuration 1>
  The character reading device according to the present invention comprises: image input means for acquiring characters entered on a predetermined sheet as image data; recognition means for cutting out each character from the acquired image data and recognizing the position and size of each character from its character coordinates; block recognition means for, when the spacing between adjacent characters in the same line satisfies a predetermined condition consecutively, setting these recognized characters as one block and recognizing the position and size of the block as block coordinates from the character coordinates of those characters; first storage means for storing the block coordinates of the recognized block; second storage means for storing a size determination condition for determining that the height of a block is equal to or greater than a predetermined value; size determination means for determining whether the height of the block, obtained from the block coordinates of the recognized block, satisfies the size determination condition; and control means for performing post-processing on a block determined to satisfy the size determination condition.
[0013]
<Configuration 2>
  In the character reading device having the configuration 1, the erroneous cutout determination condition is described by a script.
[0014]
<Configuration 3>
  In the character reading device of Configuration 1, the control means performs, as the post-processing, one of the following according to the result of the size determination: replacing the character with another character, leaving it unprocessed, or deleting it.
[0023]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described using specific examples.
<Specific example 1>
Specific example 1 recognizes characters from image data, performs size determination on the position and size of each recognized character by specifying size determination data adapted to the reading area as the misread determination condition, and, when misreading is possible, performs post-processing such as marking the character as unread.
[0024]
FIG. 1 is a block diagram illustrating the configuration of specific example 1.
The character reading device of specific example 1 includes an image input unit 1, a display unit 2, an input unit 3, a control unit 4, a recognition unit 5, a size determination unit 6, an image memory 21, a reading area information storage memory 22, a recognition result storage memory 23, a determination data storage memory 24, a determination result storage memory 25, and a reference coordinate storage memory 26.
[0025]
The image input unit 1 is an image input unit having a function of inputting characters and figures written on a form as image data, such as an image scanner and a FAX.
The display unit 2 has a function of displaying information to the operator, such as a display.
The input unit 3 has a function of receiving input from an operator, such as a keyboard and a mouse.
[0026]
The recognition unit 5 is recognition means that scans the reading area with reference to the image data stored in the image memory 21, performs line segmentation and character segmentation to specify the position and size of each character as character coordinates, recognizes each character, and converts the recognized character into a character code.
[0027]
FIG. 3 is an explanatory diagram of character coordinates.
As shown in FIG. 3, the character “あ” is the character to be recognized, and its position and size are specified by the rectangle surrounding it, drawn with a broken line in the figure.
This rectangle is expressed, with a predetermined position as the origin, by its upper-left coordinates (l, t) and lower-right coordinates (r, b) as (l, t)-(r, b); these are the character coordinates.
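A small sketch of the character coordinates as a data structure. The class name `CharRect` and its properties are hypothetical; the width and height follow the inclusive pixel counts `r - l + 1` and `b - t + 1` that the conditional expressions of the description use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharRect:
    """Character coordinates (l, t)-(r, b) measured from a fixed origin."""
    l: int  # x of the upper-left corner
    t: int  # y of the upper-left corner
    r: int  # x of the lower-right corner
    b: int  # y of the lower-right corner

    @property
    def width(self) -> int:
        # Both edge pixels belong to the rectangle, hence the +1.
        return self.r - self.l + 1

    @property
    def height(self) -> int:
        return self.b - self.t + 1
```

For instance, `CharRect(10, 20, 29, 59)` describes a rectangle 20 pixels wide and 40 pixels high.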
[0028]
The size determination unit 6 is misread determination processing means having a function of performing size determination on the position and size of characters with respect to the result recognized by the recognition unit 5 and thereby determining the possibility of misreading.
[0029]
FIG. 4 is an explanatory diagram illustrating an example of size determination data used for size determination in the first specific example.
As shown in FIG. 4, the size determination data includes a plurality of conditional expressions and, for each, the processing method to apply when that conditional expression is satisfied.
[0030]
Here, the unread processing method replaces the recognized and converted character code with a character that should not appear in a recognition result, such as “?”, which makes it easier for the operator to correct the recognition result.
[0031]
Unprocessed is a process that excludes from processes such as unreading and deletion any character for which a designated condition has been set in advance so that no post-processing is performed.
Deletion is a process of deleting characters that are considered unnecessary.
The processing method is appropriately set according to the conditional expression.
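The size determination data can be pictured as an ordered table of (conditional expression, processing method) pairs. The sketch below evaluates the pairs in order and stops at the first one that is true, which is how the description of the size determination step characterizes the calculation. The contents of FIG. 4 are not reproduced in the text, so the three rules, their thresholds, and all names here are hypothetical.

```python
from typing import Callable, List, Tuple

Rect = Tuple[int, int, int, int]            # (l, t, r, b)
Rule = Tuple[Callable[[Rect], bool], str]   # (conditional expression, processing method)

UNREAD, UNPROCESSED, DELETE = "unread", "unprocessed", "delete"


def width(rect: Rect) -> int:
    return rect[2] - rect[0] + 1


def height(rect: Rect) -> int:
    return rect[3] - rect[1] + 1


# Illustrative rule table only; real tables are chosen per reading area.
RULES: List[Rule] = [
    (lambda rc: rc[1] < 100, UNPROCESSED),                    # exempt a top band
    (lambda rc: height(rc) > 40 or width(rc) > 40, UNREAD),   # implausibly large
    (lambda rc: width(rc) < 3 and height(rc) < 3, DELETE),    # speck of dirt
]


def apply_rules(chars: List[Tuple[Rect, str]], rules: List[Rule]) -> List[Tuple[Rect, str]]:
    """Return the post-processed (rectangle, character code) list."""
    result = []
    for rect, code in chars:
        method = None
        for condition, candidate in rules:
            if condition(rect):
                method = candidate
                break                        # first true condition ends the calculation
        if method == UNREAD:
            result.append((rect, "?"))       # keep the coordinates, replace the code
        elif method == DELETE:
            continue                         # drop the character entirely
        else:
            result.append((rect, code))      # no match or unprocessed: copy as is
    return result
```

Because evaluation stops at the first match, the order of the rules matters: in this sketch the exemption is listed first so that it takes precedence over the size rule.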
[0032]
The control unit 4 has a function of controlling each block of the character reading device.
The image memory 21 is a memory for storing the image data input by the image input unit 1.
[0033]
The reading area information storage memory 22 is a memory for storing area information for performing reading processing.
FIG. 5 is an explanatory diagram of the reading area of the first specific example.
A form is shown as an example. Reading areas A to F, in which characters are printed, are set on this form. The size, font, and other attributes of the characters to be recognized differ from area to area, and size determination is performed for each of the reading areas A to F. The area information consists of area designation information, which specifies the coordinates of the area, and size determination data designation information, which specifies the size determination conditional expressions and the like applied to the area; it is stored for each of the reading areas A to F.
[0034]
The recognition result storage memory 23 stores the coordinates of each region and the character coordinates and character codes recognized by the recognition unit 5.
The determination data storage memory 24 is a memory for storing size determination data as shown in FIG.
[0035]
The determination result storage memory 25 is a memory for storing the character codes finally obtained as a result of the size determination, together with their character coordinates. These character codes and character coordinates are stored in the same data format as that used in the recognition result storage memory 23.
[0036]
The reference coordinate storage memory 26 is a memory for storing reference coordinates and character positions for the size determination unit 6 to refer to the recognition result storage memory 23. In the first specific example, the character coordinates of the character recognized by the recognition unit 5 are stored as reference coordinates. The character position is data indicating the position of the character to be referred to among the characters stored in the recognition unit 5, and is 1 when the character to be referred to is, for example, the first character.
[0037]
<Operation>
Next, the operation of Example 1 will be described.
The control unit 4 controls each block to carry out character reading.
[0038]
FIG. 6 is a flowchart showing the operation of the first specific example.
In step (step is indicated as “S” in the figure) 1, the image input unit 1 reads image data from a form or the like.
The image data is stored in the image memory 21, and the area information is stored in the reading area information storage memory 22.
[0039]
The control unit 4 determines whether or not area information is stored in the reading area information storage memory 22 (step 2).
Initially, a reading area A is designated as shown in FIG.
The recognition unit 5 refers to the image data stored in the image memory 21, performs line segmentation and character segmentation on the image in the reading area A based on the area designation information of the reading area A, and detects the coordinates of each character. Each character is then recognized and converted into a character code, and the character coordinates and the converted character code are stored in the recognition result storage memory 23 (step 3).
[0040]
The size determination unit 6 performs size determination on the recognition result (step 4).
FIG. 7 is a flowchart illustrating the size determination process performed by the size determination unit 6 according to the first specific example.
In step 11, the character coordinates to be determined and their character positions are acquired from the recognition result storage memory 23, and the acquired character coordinates and character positions are stored in the reference coordinate storage memory 26. The first character position is 1. If there is no character coordinate and character position of the next character to be determined, for example, “no rectangle” information in which the character coordinate and character position are all 0 is stored in the reference coordinate storage memory 26.
[0041]
In step 12, the character coordinates stored in the reference coordinate storage memory 26 are referred to and the presence / absence of the next character coordinates to be determined is determined.
If “no rectangle” information is not stored in the reference coordinate storage memory 26, the process proceeds to step 13.
[0042]
In step 13, condition calculation is performed using the size determination data.
The size determination conditions are described by mathematical formulas for convenience. The following expressions (1) to (3) are examples of conditional expressions used for the size determination.
(b - t + 1) > Wth   (1)
(r - l + 1) > Hth   (2)
t < tmin   (3)
where
l, r: x coordinates of the rectangle
t, b: y coordinates of the rectangle
Wth: lower limit value of the rectangle in the width (x) direction (for example, 40)
Hth: lower limit value of the rectangle in the height (y) direction (for example, 40)
tmin: minimum value of the coordinate t (for example, 1200)
[0043]
Expression (1) is a misreading determination condition based on the height (y direction) of the rectangle obtained from the character coordinates, and Expression (2) is a misreading determination condition based on the width (x direction) of the rectangle obtained from the character coordinates. Expression (3) uses the character coordinates themselves as the misreading determination condition.
[0044]
In addition, as shown in the following expression (4), two or more conditional expressions may be combined with logical product (AND) or logical sum (OR).
(b - t + 1) > 40 AND (r - l + 1) > 40   (4)
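Expressions (1) to (4) can be sketched in code as follows. This is only an illustration under stated assumptions: the class name Rect, the function names, and the sample rectangle are hypothetical, and the thresholds simply reuse the example values 40 and 1200 given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Character rectangle; field names follow the symbols l, t, r, b in the text."""
    l: int  # x coordinate of the left edge (assumed to be the smaller x)
    t: int  # y coordinate of the top edge (assumed to be the smaller y)
    r: int  # x coordinate of the right edge
    b: int  # y coordinate of the bottom edge


def expr1(rc: Rect, threshold: int = 40) -> bool:
    """Expression (1): the height b - t + 1 exceeds the threshold."""
    return (rc.b - rc.t + 1) > threshold


def expr2(rc: Rect, threshold: int = 40) -> bool:
    """Expression (2): the width r - l + 1 exceeds the threshold."""
    return (rc.r - rc.l + 1) > threshold


def expr3(rc: Rect, tmin: int = 1200) -> bool:
    """Expression (3): the top coordinate t is smaller than tmin."""
    return rc.t < tmin


def expr4(rc: Rect) -> bool:
    """Expression (4): logical product (AND) of expressions (1) and (2)."""
    return expr1(rc) and expr2(rc)


# Hypothetical rectangle: 21 pixels wide, 51 pixels high, top edge at y = 1300.
tall = Rect(l=100, t=1300, r=120, b=1350)
print(expr1(tall), expr2(tall), expr3(tall), expr4(tall))
# prints: True False False False
```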
[0045]
This size determination data is stored in the determination data storage memory 24. The size determination data designation information stored in the reading area information storage memory 22 is referred to, and size determination data adapted to the reading area A is designated using the size determination data designation information. The character coordinates are extracted from the reference coordinate storage memory 26, and the condition determination for misreading determination is performed by applying the size determination data of the reading area A to the character coordinates.
[0046]
This apparatus stores a control program for calculating the truth of these conditional expressions.
The condition calculation is performed by first substituting the character coordinates into the first conditional expression shown in FIG. If the result of the calculation is false, that is, if the conditional expression is not satisfied, character coordinates are substituted into the next conditional expression. In this way, the character coordinates are sequentially substituted into the conditional expression, and when it becomes true, the calculation is terminated there.
[0047]
For example, as shown in FIGS. 2A and 2B, when a plurality of characters are struck out or circled, expressions (1) and (2) are satisfied. Further, as shown in FIGS. 2C and 2D, when printing is shifted or when a line for separating digits is included in a row, expression (3) is satisfied. In such cases, the calculation result of each conditional expression is true.
When the calculation result of the predetermined conditional expression becomes true, it is determined that there is a possibility of misreading, and no further conditional calculation is performed.
[0048]
On the other hand, when the calculation results for all the conditional expressions are false, the condition calculation based on the size determination data is false. At this time, it is determined that there is no possibility of misreading.
This calculation result is stored in the determination result storage memory 25.
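The sequential condition calculation described above can be sketched as follows. This is a minimal illustration, not the apparatus's control program: the Rule type, the evaluate function, and the two-entry RULES table are hypothetical stand-ins for size determination data, and the pairing of each expression with a processing method follows the example given in the text for conditional expressions 1 and 2.

```python
from typing import Callable, NamedTuple, Optional, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (l, t, r, b)


class Rule(NamedTuple):
    """One entry of size determination data: a condition and its processing method."""
    condition: Callable[[Rect], bool]
    method: str  # "unprocessed", "unread", or "delete"


def evaluate(rect: Rect, rules: Sequence[Rule]) -> Optional[str]:
    """Substitute the coordinates into each expression in order.

    Evaluation stops at the first expression that is true and returns its
    processing method. None means every expression was false, i.e. there is
    no possibility of misreading.
    """
    for rule in rules:
        if rule.condition(rect):
            return rule.method
    return None


# Hypothetical rule table: expression 1 (height) -> unread,
# expression 2 (width) -> unprocessed.
RULES = [
    Rule(lambda rc: (rc[3] - rc[1] + 1) > 40, "unread"),
    Rule(lambda rc: (rc[2] - rc[0] + 1) > 40, "unprocessed"),
]

print(evaluate((0, 0, 10, 50), RULES))  # height 51 -> "unread"
print(evaluate((0, 0, 10, 10), RULES))  # nothing satisfied -> None
```

Returning the method of the first true expression mirrors the statement that no further condition calculation is performed once a result is true.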
[0049]
In step 14, first, whether the calculation result is true or false is determined.
When the calculation result is false, the process proceeds to step 15.
In step 15, since it has been determined that there is no possibility of misreading, the character position of the character is acquired from the reference coordinate storage memory 26, and the character coordinates and character code at that character position stored in the recognition result storage memory 23 are copied to the determination result storage memory 25 as they are.
If the result of the condition calculation is true in step 14, the process proceeds to step 16.
[0050]
In step 16, the processing method is determined.
For example, in FIG. 4, when the calculation result becomes true because the character coordinates of the recognition target character satisfy conditional expression 2, the processing method is unprocessed.
[0051]
When the processing method is unprocessed, the process proceeds to step 15. This method is used to exclude characters that satisfy a preset designated condition from processing such as unread or deletion, so the character coordinates and character code at the character position stored in the recognition result storage memory 23 are copied directly to the determination result storage memory 25.
[0052]
Further, when the calculation result becomes true because conditional expression 1 is satisfied, the processing method is unread.
When the processing method is unread, the process proceeds to step 17, where the character position of the character is acquired from the reference coordinate storage memory 26 and the character coordinates at that character position stored in the recognition result storage memory 23 are copied to the determination result storage memory 25, while the character code of the character is replaced with a character that should not appear as a recognition result, such as “?”, and stored in the determination result storage memory 25. The operator can therefore identify the possibility of misreading at a glance by seeing this character.
Further, when the processing method is deletion, unnecessary characters or symbols that are imaged with, for example, dust and are considered misread are deleted.
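The three processing methods can be sketched as a small dispatch routine. The function name post_process, the tuple layout, and the use of a Python list for the determination result storage are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]   # (l, t, r, b)
Char = Tuple[Rect, str]            # character coordinates and character code


def post_process(char: Char, method: Optional[str], result: List[Char]) -> None:
    """Apply one processing method and write the outcome to the result list.

    method is None when no conditional expression was satisfied.
    """
    rect, code = char
    if method is None or method == "unprocessed":
        result.append((rect, code))   # copy coordinates and code as they are
    elif method == "unread":
        result.append((rect, "?"))    # keep coordinates, replace the code
    elif method == "delete":
        pass                          # the character is dropped
    else:
        raise ValueError(f"unknown processing method: {method}")


result: List[Char] = []
box = (0, 0, 5, 5)
post_process((box, "A"), None, result)
post_process((box, "B"), "unread", result)
post_process((box, "C"), "delete", result)
print(result)  # [((0, 0, 5, 5), 'A'), ((0, 0, 5, 5), '?')]
```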
[0053]
After such processing for the first character is completed, the process returns to step 11 to acquire the next character coordinate and character position and execute steps 12 to 17 in the same manner.
When the “no rectangle” information is stored in the reference coordinate storage memory 26, it is determined in step 12 that the size determination has been performed for all characters recognized in the reading area A, and the process returns to step 2.
Such processing is also performed for the reading areas B to F, and when such processing is performed for all the reading areas, the processing is completed.
[0054]
<Effect of specific example 1>
As described above, according to the first specific example, when the characters to be recognized are read, size determination data is set for the coordinates of each character and for the height and width obtained from those coordinates, and size determination is performed. Therefore, the possibility of misreading each character of the recognition result can be evaluated appropriately from the character coordinates.
[0055]
In addition, by setting the processing method to unprocessed, characters that satisfy a designated condition set in advance can be excluded from the unread and deletion processes.
[0056]
In addition, for characters that are considered to be likely to be misread in the recognition result, the unread process is performed to replace the target character with a character that should not be included, such as “?”. Therefore, the operator can visually recognize at a glance, and the operator can easily correct the recognition result.
[0057]
Further, for example, for an obviously unnecessary character or symbol that is considered to be misread due to being imaged by dust or the like, the burden of correction work by the operator can be reduced by deleting this character.
[0058]
Furthermore, since such size determination is performed for each reading area, appropriate character recognition and size determination are performed for each area even in a form in which the font, character type, and size differ for each reading area, and character recognition accuracy is improved.
[0059]
<Specific example 2>
Specific example 2 calculates the context of three characters based on the character coordinates, sets misreading determination conditions for the context, and performs size determination.
Similar to the first specific example, the character reading device of the second specific example includes an image input unit 1, a display unit 2, an input unit 3, a control unit 4, a recognition unit 5, a size determination unit 6, an image memory 21, a reading area information storage memory 22, a recognition result storage memory 23, a determination data storage memory 24, a determination result storage memory 25, and a reference coordinate storage memory 26.
[0060]
However, the reference coordinate storage memory 26 is provided with three storage areas.
FIG. 8 is an explanatory diagram of the reference coordinate storage memory 26.
The storage area b is an area for storing the character coordinates and character position of the current character to be determined, and the storage areas a and c are areas for storing the character coordinates and character positions of the characters immediately before and immediately after the current character, respectively.
Elements identical to those of specific example 1 are given the same reference numerals, and their description is omitted.
[0061]
<Operation>
Next, the operation of the specific example 2 will be described.
In the second specific example, as in the first specific example, the flowchart of FIG. 6 is executed, and the size determination process is performed in step 4.
[0062]
FIG. 9 is a flowchart showing the size determination process of the second specific example.
In step 21, the character coordinates and character position of the first character are acquired.
The acquired character coordinates and character positions are stored in the storage area c of the reference coordinate storage memory 26, and information on “no rectangle” is stored in the storage areas a and b.
[0063]
In step 22, the character coordinates and character position of the next character are acquired.
When the next character coordinate and character position are acquired, the character coordinate and character position of the first character stored in the storage area c of the reference coordinate storage memory 26 are stored in the storage area b, and the acquired next character coordinate and The character position is stored in the storage area c.
[0064]
Thereafter, each time character coordinates and a character position are acquired, the character coordinates and character positions stored in the storage areas b and c are moved to the storage areas a and b, respectively, and the newly acquired character coordinates and character position are stored in the storage area c.
[0065]
If there is no character coordinate and character position of the character whose size is to be determined next, for example, “no rectangle” information in which the character coordinate and the character position are all 0 is stored in the storage area c.
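The shifting of the three storage areas a, b, and c in steps 21 to 23 can be sketched as a generator. This is an illustrative assumption: None stands in for the “no rectangle” information (which the text describes as all-zero data), and the function name sliding_triples is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def sliding_triples(
    chars: Iterable[T],
) -> Iterator[Tuple[Optional[T], T, Optional[T]]]:
    """Yield (a, b, c) = (previous, current, next) for every current character.

    None plays the role of the "no rectangle" information.
    """
    a: Optional[T] = None
    b: Optional[T] = None
    # A trailing None is appended so that the last character also becomes
    # the current character, with "no rectangle" as its successor.
    for c in list(chars) + [None]:
        if b is not None:          # step 23: a current character exists
            yield (a, b, c)
        a, b = b, c                # shift b -> a and c -> b


print(list(sliding_triples(["P", "Q", "R"])))
# [(None, 'P', 'Q'), ('P', 'Q', 'R'), ('Q', 'R', None)]
```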
[0066]
In step 23, it is determined whether or not there is a current character to be determined.
If the information “No rectangle” is not stored in the storage area b, it is determined that there is a current character to be determined, and the process proceeds to Step 24.
[0067]
In step 24, condition calculation is performed on the character coordinates of the three characters using the size determination data.
FIG. 10 is an explanatory diagram of the second specific example.
Three character coordinates are set as shown in FIG.
An interval between characters is calculated from these three character coordinates, and size determination is performed by applying size determination data as a misread determination condition to the interval.
Expressions (5) and (6) are examples of conditional expressions as size determination data.
l - pr - 1 = 0   (5)
nl - r - 1 = 0   (6)
[0068]
Expression (5) is a conditional expression in which the interval between the current character and the previous character is 0, as shown in FIG. 10 (b), and Expression (6) is as shown in FIG. 10 (c). This is a conditional expression in which the interval between the current character and the next character is zero.
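Expressions (5) and (6) can be sketched as two small predicates. The function names are hypothetical; the symbols pr, l, r, and nl are read here as the right x coordinate of the previous character, the left and right x coordinates of the current character, and the left x coordinate of the next character. A missing neighbor (None, standing for “no rectangle”) is treated as false, which is how this specific example handles that case.

```python
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (l, t, r, b)


def touches_previous(prev: Optional[Rect], cur: Rect) -> bool:
    """Expression (5): l - pr - 1 = 0, i.e. no gap to the previous character."""
    if prev is None:              # "no rectangle": the expression is false
        return False
    return cur[0] - prev[2] - 1 == 0


def touches_next(cur: Rect, nxt: Optional[Rect]) -> bool:
    """Expression (6): nl - r - 1 = 0, i.e. no gap to the next character."""
    if nxt is None:
        return False
    return nxt[0] - cur[2] - 1 == 0


prev, cur, nxt = (0, 0, 9, 20), (10, 0, 19, 20), (25, 0, 30, 20)
print(touches_previous(prev, cur))  # True: 10 - 9 - 1 == 0
print(touches_next(cur, nxt))       # False: 25 - 19 - 1 == 5
```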
[0069]
It should be noted that equations (5) and (6) may be combined with logical product (AND) or logical sum (OR) in the same manner as in the first specific example.
Further, the context of the three characters is not limited to the interval between the characters, and the relationship between the sizes of the three characters may be used as a condition.
[0070]
This size determination data is stored in the determination data storage memory 24. As in the first specific example, the size determination data designation information stored in the reading area information storage memory 22 is referred to, and the size determination data adapted to the reading area A shown in FIG. 5 is specified using this designation information. The three character coordinates are then extracted from the reference coordinate storage memory 26, and the condition calculation for misreading determination is performed by applying the size determination data of the reading area A.
[0071]
The condition calculation method is the same as in the first specific example, and the calculation result is stored in the determination result storage memory 25.
However, when “no rectangle” information is stored in the storage area a or the storage area c of the reference coordinate storage memory 26, the calculation result of the conditional expression is false.
[0072]
In step 25, it is determined whether the calculation result is true or false.
When the calculation result is false, it is determined that there is no possibility of misreading because it does not correspond to any conditional expression, and the process proceeds to step 26 to acquire the character position stored in the storage area b of the reference coordinate storage memory 26. Similarly to the first specific example, the character coordinates and the character code of the character position stored in the recognition result storage memory 23 are copied to the determination result storage memory 25 as they are.
[0073]
If the calculation result is true, it is determined that there is a possibility of misreading, and the process proceeds to step 27 to determine the processing method corresponding to the conditional expression. If the processing method is unprocessed, the character coordinates and character code of the current character stored in the storage area b are copied directly to the determination result storage memory 25 (step 26).
[0074]
When the processing method is unread, the process proceeds to step 28, where the character position of the character is acquired from the reference coordinate storage memory 26 and the character coordinates at that character position stored in the recognition result storage memory 23 are copied to the determination result storage memory 25, while the character code of the character is replaced with a character that should not appear as a recognition result, such as “?”, and stored in the determination result storage memory 25.
When the processing method is deletion, characters or symbols considered unnecessary are deleted.
[0075]
When such processing has been performed and size determination has been completed for all characters (step 23), the size determination process ends. When such processing has been performed for all reading areas (step 2), the process is complete.
[0076]
<Effect of specific example 2>
As described above, according to the second specific example, size determination is performed on the context of three characters. Therefore, not only is the same effect as in the first specific example obtained, but the possibility of misreading can also be determined more accurately.
[0077]
<Specific example 3>
Specific example 3 performs size determination by setting misreading determination conditions on the line position, the character position, the character position from the beginning of the line, and the number of characters in the line, all calculated from the character coordinates.
[0078]
FIG. 11 is a block diagram illustrating a configuration of the third specific example.
The character reading device of specific example 3 includes an image input unit 1, a display unit 2, an input unit 3, a control unit 4, a recognition unit 5, a size determination unit 6, an image memory 21, a reading area information storage memory 22, a recognition result storage memory 23, a determination data storage memory 24, a determination result storage memory 25, a reference coordinate storage memory 26, and a related information storage memory 27.
[0079]
The related information storage memory 27 is a memory for storing related information such as the current character line position, character position, character position from the beginning of the line, and the number of characters in the line, as information on the current character.
[0080]
FIG. 12 is an explanatory diagram of related information of the third specific example.
The line position L, the character position I, the character position LI from the head of the line, and the number of characters LN in the line are set to values of 1 or more.
Elements identical to those of specific examples 1 and 2 are given the same reference numerals, and their description is omitted.
[0081]
<Operation>
Next, the operation of specific example 3 will be described.
Also in the specific example 3, as in the specific example 1, the flowchart of FIG. 6 is executed, and the size determination process is performed in step 4.
[0082]
FIG. 13 is a flowchart showing the size determination process of the third specific example.
In steps 31 to 33, as in steps 21 to 23 of the second specific example, the character coordinates and character positions of the first character and the next character are acquired and stored in the storage areas b and c of the reference coordinate storage memory 26, respectively, and the process proceeds to step 34.
[0083]
In step 34, the related information of the current character that is the determination target is set.
That is, referring to the recognition result storage memory 23, the line position L, the character position I, the character position LI from the head of the line, the number of characters LN in the line, and the like are acquired for the character to be determined, as shown in FIG. 12. The related information is then stored in the related information storage memory 27.
[0084]
In step 35, condition calculation is performed on the character coordinates of the current character to be determined and the related information using the size determination data.
Expressions (7) and (8) are examples of conditional expressions as size determination data.
I = 2 (7)
LN = 3 (8)
Expression (7) is a conditional expression that is satisfied when the character is the second character, and Expression (8) is a conditional expression that is satisfied when the number of characters in the line is three.
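The related information and expressions (7) and (8) can be sketched as follows. The exact definitions of L, I, LI, and LN are given by FIG. 12, which is not reproduced here, so this sketch assumes that I counts characters through the whole reading area starting at 1 and that LI counts characters within each line starting at 1. The names Related and related_info are hypothetical.

```python
from typing import List, NamedTuple


class Related(NamedTuple):
    """Related information of one character (all values are 1 or more)."""
    L: int   # line position
    I: int   # character position (assumed to run through the whole area)
    LI: int  # character position from the head of the line
    LN: int  # number of characters in the line


def related_info(lines: List[List[str]]) -> List[Related]:
    """Build the related information for every character, line by line."""
    out: List[Related] = []
    i = 0
    for line_no, line in enumerate(lines, start=1):
        for pos_in_line in range(1, len(line) + 1):
            i += 1
            out.append(Related(L=line_no, I=i, LI=pos_in_line, LN=len(line)))
    return out


info = related_info([["A", "B", "C"], ["D", "E"]])
second = info[1]
print(second)                          # Related(L=1, I=2, LI=2, LN=3)
print(second.I == 2, second.LN == 3)   # expressions (7) and (8): True True
```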
[0085]
Similar to the specific examples 1 and 2, these two conditional expressions may be combined with logical product (AND) or logical sum (OR).
This size determination data is stored in the determination data storage memory 24. As in specific examples 1 and 2, the size determination data designation information stored in the reading area information storage memory 22 is referred to, and the size determination data adapted to the reading area is specified using this designation information. The three character coordinates are then extracted from the reference coordinate storage memory 26, and the condition calculation for misreading determination is performed by applying the size determination data of the reading area.
[0086]
The condition calculation method is the same as in the first specific example, and the calculation result is stored in the determination result storage memory 25.
However, as in the second specific example, when “no rectangle” information is stored in the storage area a or the storage area c of the reference coordinate storage memory 26, the calculation result of the conditional expression is false.
In steps 36 to 39, post-processing is performed in the same manner as in specific examples 1 and 2.
[0087]
When such processing has been performed and size determination has been completed for all characters (step 33), the size determination process ends. When such processing has been performed for all reading areas (step 2), the process is complete.
[0088]
<Effect of specific example 3>
As described above, according to specific example 3, size determination is performed not only on the positional relationship between the current character and the preceding and following characters but also on the related information of the current character, namely the line position L, the character position I, the character position LI from the beginning of the line, and the number of characters LN in the line. Therefore, in addition to the effects of specific examples 1 and 2, processing conditions can be set for a specific line or character, and more accurate processing can be applied to misread or unnecessary characters caused by errors in line segmentation and character segmentation.
[0089]
<Specific Example 4>
In the fourth specific example, line coordinates of three characters are calculated based on the character coordinates, a misreading determination condition is set to the line coordinates, and size determination is performed.
[0090]
As in the third specific example, the character reading device of the fourth specific example includes an image input unit 1, a display unit 2, an input unit 3, a control unit 4, a recognition unit 5, a size determination unit 6, an image memory 21, a reading area information storage memory 22, a recognition result storage memory 23, a determination data storage memory 24, a determination result storage memory 25, a reference coordinate storage memory 26, and a related information storage memory 27.
Elements identical to those of specific examples 1 to 3 are given the same reference numerals, and their description is omitted.
[0091]
<Operation>
Next, the operation of the specific example 4 will be described.
Also in specific example 4, as in specific example 1, the flowchart of FIG. 6 is executed, and the size determination process is performed in step 4.
[0092]
FIG. 14 is a flowchart illustrating the size determination process of the fourth specific example.
In step 41, a rectangular area in the first row is created.
FIG. 15 is an explanatory diagram showing a method of creating line coordinates according to the fourth specific example.
As shown in FIG. 15, the rectangular areas (1), (2), and (3) indicated by broken lines each represent a rectangular area surrounding one character.
[0093]
The rectangular areas (1) to (3) are represented by the following character coordinates, for example.
Character coordinates of rectangular area (1): (pl, pt)-(pr, pb)
Character coordinates of rectangular area (2): (l, t)-(r, b)
Character coordinates of rectangular area (3): (nl, nt)-(nr, nb)
[0094]
In order to create the line coordinates of the first line, the minimum rectangular area (4) is set so as to include all the characters. The character coordinates of the rectangular areas (1) to (3) are extracted from the recognition result storage memory 23, and the line coordinates (nl, pt)-(nr, b) are created.
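A general way to obtain a minimum rectangle enclosing several character rectangles is sketched below. This is an assumption about how the “minimum rectangular area” could be computed for an arbitrary layout; the specific corner symbols quoted in the text depend on the arrangement drawn in FIG. 15 and are not derived here. The function name bounding_rect and the sample coordinates are hypothetical.

```python
from typing import Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (l, t, r, b)


def bounding_rect(rects: Sequence[Rect]) -> Rect:
    """Return the smallest rectangle that contains every given rectangle."""
    if not rects:
        raise ValueError("at least one rectangle is required")
    return (
        min(rc[0] for rc in rects),  # leftmost x
        min(rc[1] for rc in rects),  # topmost y
        max(rc[2] for rc in rects),  # rightmost x
        max(rc[3] for rc in rects),  # bottommost y
    )


# Three hypothetical character rectangles on one line.
line = [(10, 5, 20, 30), (22, 8, 32, 28), (34, 6, 44, 31)]
print(bounding_rect(line))  # (10, 5, 44, 31)
```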
[0095]
Further, the character position of the last character of this line is taken as the character position of the line. The created line coordinates (nl, pt)-(nr, b) of the first line and this character position are stored in the storage area c of the reference coordinate storage memory 26, and “no rectangle” information is stored in the storage areas a and b.
[0096]
In step 42, as in step 41, a rectangular area of the next row is created.
When the rectangular area of the next line is created, the line coordinates of the first line and the character position of its last character, which are stored in the storage area c of the reference coordinate storage memory 26, are moved to the storage area b, and the created line coordinates of the next line and the character position of its last character are stored in the storage area c.
[0097]
Thereafter, every time line coordinates are created, the data stored in the storage areas b and c are moved to the storage areas a and b, respectively, and the created line coordinates and character position are stored in the storage area c.
[0098]
If there is no line coordinate of the line to be subjected to size determination next, for example, “no rectangle” information in which the line coordinates and the character position of the last character of the line are all 0 is stored in the storage area c.
[0099]
In step 43, the presence / absence of the current line to be determined is determined.
If the information “No rectangle” is not stored in the storage area b, it is determined that there is a current line to be determined, and the process proceeds to Step 44.
[0100]
In step 44, the related information of the current line to be determined is set.
That is, the character position of the last character of the current line is acquired from the storage area b of the reference coordinate storage memory 26, the recognition result storage memory 23 is referred to for the character at that character position, and the line position L, the character position I, the character position LI from the head of the line, the number of characters LN in the line, and the like are acquired. The related information is then stored in the related information storage memory 27.
[0101]
In step 45, condition calculation is performed on the current line and related information using size determination data.
As for the conditional expression, the positional relationship of the current line, the size of the current line, etc. can be set.
Further, as in the first to third examples, the two conditional expressions may be combined with logical product (AND) or logical sum (OR).
[0102]
The size determination data is stored in the determination data storage memory 24. As in specific examples 1 to 3, the size determination data designation information stored in the reading area information storage memory 22 is referred to, and the size determination data adapted to the reading area is specified using this designation information. The three sets of coordinates are then extracted from the reference coordinate storage memory 26, and the condition calculation for misreading determination is performed by applying the size determination data of the reading area.
[0103]
The size determination data includes a conditional expression similar to the specific example 1 (FIG. 4) and a processing method corresponding to the conditional expression.
The condition calculation method is the same as in the first specific example, and the calculation result is stored in the determination result storage memory 25.
[0104]
However, as in the second specific example, when “no rectangle” information is stored in the storage area a or the storage area c of the reference coordinate storage memory 26, the calculation result of the conditional expression is false.
[0105]
In step 46, the calculation result is determined. If the calculation result is false, the process proceeds to step 47.
In step 47, the character position I of the current line, that is, the final character position of the current line, and the number of characters LN of that line are obtained from the related information storage memory 27, and the character coordinates and character codes from the start character position (I - LN + 1) of this line to the final character position I are copied from the recognition result storage memory 23 to the determination result storage memory 25 as they are.
[0106]
If the calculation result is true, it is determined that there is a possibility of misreading, and the process proceeds to step 48 to determine the processing method corresponding to the conditional expression.
When the processing method is unprocessed, the process proceeds to step 47, and the same processing as when the determination result is false is performed.
If the processing method is unread, the process proceeds to step 49.
[0107]
In step 49, the character position I of the current line, that is, the final character position, and the number of characters LN of that line are obtained from the related information storage memory 27. For the characters from the start character position (I - LN + 1) of this line to the final character position I, the character coordinates are copied from the recognition result storage memory 23 to the determination result storage memory 25, and each character code is replaced with a character that should not appear as a recognition result, such as “?”, and stored in the determination result storage memory 25.
When the processing method is deletion, the characters are deleted.
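The line-unit processing of steps 47 and 49 can be sketched as follows. The dictionaries standing in for the recognition result and determination result storage memories, the function name process_line, and the sample data are assumptions for illustration. The character range I - LN + 1 to I is taken from the text.

```python
from typing import Dict, Optional, Tuple

Rect = Tuple[int, int, int, int]   # (l, t, r, b)
Char = Tuple[Rect, str]            # character coordinates and character code


def process_line(
    recognized: Dict[int, Char],
    i_last: int,
    ln: int,
    method: Optional[str],
    result: Dict[int, Char],
) -> None:
    """Apply one processing method to every character of the current line.

    The line covers character positions (I - LN + 1) .. I, where I is the
    position of the last character and LN the number of characters.
    """
    for pos in range(i_last - ln + 1, i_last + 1):
        rect, code = recognized[pos]
        if method == "delete":
            continue                               # drop the character
        new_code = "?" if method == "unread" else code
        result[pos] = (rect, new_code)             # coordinates are kept


box = (0, 0, 5, 5)
recognized = {5: (box, "A"), 6: (box, "B"), 7: (box, "C")}
result: Dict[int, Char] = {}
process_line(recognized, i_last=7, ln=3, method="unread", result=result)
print([result[p][1] for p in (5, 6, 7)])  # ['?', '?', '?']
```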
[0108]
When such processing has been performed and size determination has been completed for all lines (step 43), the size determination process ends. When such processing has been performed for all reading areas (step 2), the process is complete.
[0109]
<Effect of specific example 4>
As described above, according to the fourth specific example, the positional relationship between the current line and the lines before and after it is calculated, and size determination is performed on this positional relationship, so that post-processing can be performed accordingly.
[0110]
<Specific example 5>
In the specific example 5, when characters having the same position condition continue in the same line, these characters are grouped into blocks, and the size is determined in units of blocks.
[0111]
The related information storage memory 27 of the fifth specific example stores conditional expressions for grouping into blocks based on the preceding and following character intervals.
For example, when the character coordinates at character positions i and i + 1 are (L(i), T(i))-(R(i), B(i)) and (L(i + 1), T(i + 1))-(R(i + 1), B(i + 1)), respectively, the interval D is calculated by the following equation (9).
D = L(i + 1) - R(i) - 1   (9)
[0112]
Expressions (10) to (15) are expressions indicating conditions for creating a block based on the interval D.
D = Dthl ... (10)
D ≠ Dthl ... (11)
D < Dthl ... (12)
D ≦ Dthl ... (13)
D > Dthl ... (14)
D ≧ Dthl ... (15)
where Dthl is a predetermined value.
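As a minimal sketch of equation (9) and conditions (10) to (15), the interval and the six comparisons can be written as below. The function names `interval` and `satisfies` and the string keys for the operators are illustrative assumptions, not part of the described device. The sketch also assumes inclusive pixel coordinates, under which the "- 1" in equation (9) makes D the number of blank columns between two characters.

```python
import operator

# Comparison operators corresponding to expressions (10) to (15).
# The string keys are hypothetical labels chosen for this sketch.
CONDITIONS = {
    "=": operator.eq,   # (10) D = Dthl
    "!=": operator.ne,  # (11) D != Dthl
    "<": operator.lt,   # (12) D < Dthl
    "<=": operator.le,  # (13) D <= Dthl
    ">": operator.gt,   # (14) D > Dthl
    ">=": operator.ge,  # (15) D >= Dthl
}


def interval(left_next: int, right_cur: int) -> int:
    """Equation (9): D = L(i+1) - R(i) - 1."""
    return left_next - right_cur - 1


def satisfies(d: int, op: str, dthl: int) -> bool:
    """Evaluate one of the conditions (10) to (15) for interval d."""
    return CONDITIONS[op](d, dthl)
```

For example, a character whose right edge is at column 10 followed by one whose left edge is at column 14 gives `interval(14, 10) == 3`, that is, columns 11 to 13 are blank.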
[0113]
These expressions (10) to (15) are stored in the related information storage memory 27.
The determination data storage memory 24 of the specific example 5 stores size determination data applied to this block.
[0114]
FIG. 16 is an explanatory diagram illustrating an example of the size determination data of the fifth specific example.
The reference coordinate storage memory 26 of the specific example 5 has three storage areas a to c as in the specific example 2.
Elements identical to those of the specific examples 1 to 4 are given the same reference numerals, and their description is omitted.
[0115]
<Operation>
Next, the operation of the specific example 5 will be described.
In the specific example 5, as in the specific example 1, the flowchart of FIG. 6 is executed, and the size determination process is performed in step 4.
[0116]
FIG. 17 is a flowchart showing the size determination process of the fifth specific example.
In step 51, the characters in the area acquired from the recognition result storage memory 23 are examined in order from the first character, the interval D is calculated, and the conditional expressions (10) to (15) are evaluated. When any of the conditions is satisfied continuously within the same line, the minimum rectangular area including these characters is taken as one block.
[0117]
FIG. 18 is an explanatory diagram of this block.
As shown in FIG. 18, when the characters P, Q, and R are arranged on the same line, the interval D between the characters P and Q is expressed by the equation (9) as described above.
[0118]
For example, when the interval D between the characters P and Q corresponds to any one of the equations (10) to (15), the characters P and Q are combined into one block. A region indicated by a broken line in FIG. 18 indicates one block created in this way.
[0119]
When characters are printed in the areas (1), (2), and (3) as shown in FIG. 15, the area (4) indicated by the solid line is the smallest rectangular area containing them, and this becomes one block.
This block is specified by the block coordinates (L(i), T(i))-(R(i+1), B(i+1)).
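The block creation of step 51 can be sketched as follows. This is an illustration under stated assumptions, not the device's implementation: rectangles are (L, T, R, B) tuples, the characters belong to one line and are ordered left to right, and condition (13), D ≦ Dthl, is picked as the grouping condition. The name `group_into_blocks` is hypothetical. The block rectangle is computed as the minimum rectangle enclosing all member characters.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (L, T, R, B)


def group_into_blocks(chars: List[Rect], dthl: int) -> List[Tuple[Rect, int]]:
    """Group consecutive characters of one line while D <= dthl holds.

    Returns (block_rect, number_of_characters) pairs, where block_rect is
    the minimum rectangle enclosing the characters of the block.
    """
    blocks: List[Tuple[Rect, int]] = []
    if not chars:
        return blocks
    cur = list(chars[0])
    count = 1
    for prev, nxt in zip(chars, chars[1:]):
        d = nxt[0] - prev[2] - 1  # equation (9)
        if d <= dthl:  # condition (13), chosen for illustration
            # Extend the current block to enclose the next character.
            cur = [min(cur[0], nxt[0]), min(cur[1], nxt[1]),
                   max(cur[2], nxt[2]), max(cur[3], nxt[3])]
            count += 1
        else:
            # The condition is broken: close the block and start a new one.
            blocks.append((tuple(cur), count))
            cur, count = list(nxt), 1
    blocks.append((tuple(cur), count))
    return blocks
```

The character count returned with each rectangle plays the role of the number of block characters BN used later when the characters of a block are copied or replaced.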
[0120]
First, information “no rectangle” is stored in the storage areas a and b of the reference coordinate storage memory 26, and the block coordinates of the first block are stored in the storage area c together with the last character position of the block.
[0121]
In step 52, the next block is created in the same manner as in step 51.
The reference information stored in the storage areas a, b, and c of the reference coordinate storage memory 26 is shifted by one position, the reference coordinates of the next block are acquired from the recognition result storage memory 23, and these coordinates are stored in the storage area c together with the last character position of the block. If there is no next line, the information “no rectangle” is stored.
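The handling of the storage areas a, b, and c in steps 51 and 52 behaves like a three-slot sliding window over the blocks. A minimal sketch, assuming that `None` stands in for the “no rectangle” information and that the slots hold the previous, current, and next block in that order (the generator name `reference_windows` is hypothetical):

```python
from typing import Iterator, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (L, T, R, B)
NO_RECT = None  # stands in for the "no rectangle" information


def reference_windows(
    blocks: List[Rect],
) -> Iterator[Tuple[Optional[Rect], Rect, Optional[Rect]]]:
    """Yield (a, b, c) = (previous, current, next) for every block."""
    if not blocks:
        return
    # Step 51: a and b hold "no rectangle", c holds the first block.
    a, b, c = NO_RECT, NO_RECT, blocks[0]
    for nxt in list(blocks[1:]) + [NO_RECT]:
        # Step 52: shift the slots by one and load the next block into c.
        a, b, c = b, c, nxt
        yield a, b, c
```

With three blocks, the first window has no previous rectangle and the last window has no next rectangle, which is the situation the “no rectangle” rule in the condition calculation has to cover.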
[0122]
In step 53, it is determined whether there is a block whose size should be determined.
When there is a block whose size is to be determined, the process proceeds to step 54.
In step 54, related information of the current block that is the determination target is set.
To set this related information, the character position of the last character of the current block is obtained from the character position of the current block stored in the reference coordinate storage memory 26. For the character at that position, the line position L, the character position I, the character position LI from the head of the line, the number of characters LN of the line, and the number of characters BN of the block are acquired from the recognition result storage memory 23 and stored in the related information storage memory 27. Note that L, I, LI, LN, and BN have values of 1 or more.
[0123]
In step 55, the truth value of the conditional expression of the size determination data is calculated. The calculation uses the block coordinates of the block immediately before the current block, the current block, and the next block, which are stored in the storage areas a, b, and c of the reference coordinate storage memory 26, together with the related information stored in the related information storage memory 27. The size determination data is the data stored in the determination data storage memory 24 that corresponds to the area information stored in the reading area information storage memory 22.
The conditional expressions may be the same conditional expressions as in the first to fourth examples, or may be compounded by logical product (AND) or logical sum (OR).
[0124]
The size determination data includes a conditional expression similar to the specific example 1 (FIG. 4) and a processing method corresponding to the conditional expression.
The condition calculation method is the same as in the first specific example, and the calculation result is stored in the determination result storage memory 25.
[0125]
However, as in the second specific example, when “no rectangle” information is stored in the storage area a or the storage area c of the reference coordinate storage memory 26, the calculation result of the conditional expression is false.
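The condition calculation of step 55, including compound conditions and the “no rectangle” rule, might be sketched as below. The concrete conditions `at_least` and `taller_than_neighbours` are hypothetical examples, since the actual expressions are given in the size determination data of the figures. The height formula B - T + 1 assumes inclusive coordinates, and `None` again stands in for “no rectangle”.

```python
from typing import Callable, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (L, T, R, B)
Condition = Callable[[Rect, Rect, Rect], bool]


def height(r: Rect) -> int:
    # Assumption: inclusive coordinates, so the height is B - T + 1.
    return r[3] - r[1] + 1


def all_of(*conds: Condition) -> Condition:
    """Logical product (AND) of several conditions."""
    return lambda a, b, c: all(f(a, b, c) for f in conds)


def any_of(*conds: Condition) -> Condition:
    """Logical sum (OR) of several conditions."""
    return lambda a, b, c: any(f(a, b, c) for f in conds)


def evaluate(cond: Condition, a: Optional[Rect], b: Rect,
             c: Optional[Rect]) -> bool:
    """Return False when slot a or c holds "no rectangle" (None)."""
    if a is None or c is None:
        return False
    return cond(a, b, c)


def at_least(h_min: int) -> Condition:
    # Hypothetical condition: the current block is at least h_min high.
    return lambda a, b, c: height(b) >= h_min


def taller_than_neighbours(a: Rect, b: Rect, c: Rect) -> bool:
    # Hypothetical condition: the current block is taller than both neighbours.
    return height(b) > height(a) and height(b) > height(c)
```

For instance, `evaluate(all_of(at_least(40), taller_than_neighbours), a, b, c)` is true only when both sub-conditions hold and both neighbouring rectangles exist.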
[0126]
In step 56, the true / false of the calculation result is determined. If the calculation result is false, the process proceeds to step 57.
In step 57, the character position I of the current block, that is, the final character position of the current block, and the number of characters BN of the block are obtained from the related information storage memory 27. The character coordinates and character codes from the start character position (I-BN+1) of this block to the final character position I are copied from the recognition result storage memory 23 to the determination result storage memory 25 as they are.
[0127]
If the calculation result is true, it is determined that there is a possibility of misreading, and the process proceeds to step 58 to determine the processing method corresponding to the conditional expression.
When the processing method is unprocessed, the process proceeds to step 57, and the same processing as when the determination result is false is performed.
If the processing method is unread, the process proceeds to step 59.
[0128]
In step 59, the character position I of the current block, that is, the final character position, and the number of characters BN of the block are obtained from the related information storage memory 27. The character coordinates from the start character position (I-BN+1) of this block to the final character position I are copied from the recognition result storage memory 23 to the determination result storage memory 25, and each character code is replaced with a character that cannot occur as a recognition result, such as “?”, and stored in the determination result storage memory 25.
When the processing method is deletion, the characters are deleted.
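Steps 56 to 59 amount to a dispatch on the determination result and the processing method. The following sketch assumes that recognized characters are held as (character code, rectangle) pairs in a Python list and that the processing methods are named by the strings "unprocessed", "unread", and "delete". These names and the function `post_process` are illustrative assumptions. Character positions are 1-based as in the text, so the block covers positions I-BN+1 to I.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (L, T, R, B)
Char = Tuple[str, Rect]           # (character code, character coordinates)


def post_process(chars: List[Char], last_pos: int, block_len: int,
                 result: bool, method: str, reject: str = "?") -> List[Char]:
    """Return the entries to store in the determination result for one block.

    last_pos is the 1-based final character position I of the block and
    block_len is the number of block characters BN.
    """
    start = last_pos - block_len + 1   # I - BN + 1 (1-based)
    block = chars[start - 1:last_pos]  # convert to 0-based slicing
    if not result or method == "unprocessed":
        # Step 57: copy coordinates and codes unchanged.
        return list(block)
    if method == "unread":
        # Step 59: keep the coordinates, replace each code by the reject mark.
        return [(reject, rect) for _, rect in block]
    if method == "delete":
        # Deletion: nothing is stored for this block.
        return []
    raise ValueError(f"unknown processing method: {method!r}")
```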
[0129]
When this processing has been performed for all the blocks and the size determination is complete for all the blocks (step 53), the process returns to step 2. When the recognition process (step 3) and the size determination process (step 4) have been performed for all the area information (step 2), all processing is terminated.
[0130]
<Effect of Specific Example 5>
As described above, according to the specific example 5, when characters satisfying the same condition are consecutive in the same line, these characters are grouped into blocks and the size determination is performed on the blocks, so that errors in the line cut-out processing and the character cut-out processing can be discriminated and post-processing can be performed.
[0131]
<Specific Example 6>
In specific example 6, the size determination data is described as script data.
[0132]
FIG. 19 is a block diagram illustrating the configuration of the sixth specific example.
The character reading device of specific example 6 includes an image input unit 1, a display unit 2, an input unit 3, a control unit 4, a recognition unit 5, a size determination unit 6, a script data analysis unit 7, an image memory 21, a reading area information storage memory 22, a recognition result storage memory 23, a determination data storage memory 24, a determination result storage memory 25, a reference coordinate storage memory 26, a related information storage memory 27, and a script data storage memory 28.
[0133]
The script data storage memory 28 is a memory for storing size determination data described in a script, and the script is described in text.
The script data analysis unit 7 has a function of referring to the size determination data stored in the script data storage memory 28, performing a syntax analysis, and converting the data into internal size determination data that can be used by the size determination unit 6.
Elements identical to those of the specific examples 1 to 5 are given the same reference numerals, and their description is omitted.
[0134]
<Operation>
Next, the operation of the specific example 6 will be described.
In the specific example 6, as in the specific example 1, the flowchart of FIG. 6 is executed, and the size determination process is performed in step 4.
[0135]
FIG. 20 is a flowchart showing the size determination process of the sixth specific example.
In step 61, the size determination data described in the script is analyzed.
In order to analyze the size determination data, a script is acquired from the script data storage memory 28 according to the information of the area stored in the reading area information storage memory 22.
[0136]
The script data analysis unit 7 parses the script, converts it into internal size determination data that can be used by the size determination unit 6, and stores the converted size determination data in the determination data storage memory 24.
[0137]
Expression (16) is an expression showing an example of the size determination data described by this script.
Processing unit, (condition 1) processing 1 | (condition 2) processing 2 | ... | (condition n) processing n ... (16)
[0138]
The processing unit field describes the unit of processing, such as character units or line units, and conditions 1 to n describe conditional expressions such as (10) to (15). The processes 1 to n corresponding to conditions 1 to n are listed alongside them.
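The syntax analysis performed by the script data analysis unit 7 on a line shaped like expression (16) could look like the sketch below. Only the general shape (a processing unit, a comma, and "|"-separated "(condition) processing" pairs) comes from the text. The concrete spellings used in the example, such as "block", "D <= 3", and "unread", and the function name `parse_script` are assumptions made for illustration. Conditions are kept as strings, and conditions containing nested parentheses or the "|" character are not handled.

```python
import re
from typing import List, Tuple

# One rule: a parenthesized condition followed by the name of a process.
RULE = re.compile(r"^\((?P<cond>[^()]*)\)\s*(?P<proc>\S.*)$")


def parse_script(script: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Parse 'unit, (cond 1) proc 1 | (cond 2) proc 2 | ...'.

    Returns (processing_unit, [(condition, processing), ...]).
    """
    unit, sep, body = script.partition(",")
    if not sep:
        raise ValueError("missing processing unit")
    rules: List[Tuple[str, str]] = []
    for part in body.split("|"):
        m = RULE.match(part.strip())
        if m is None:
            raise ValueError(f"malformed rule: {part!r}")
        rules.append((m.group("cond").strip(), m.group("proc").strip()))
    return unit.strip(), rules
```

Under these assumptions, `parse_script("block, (D <= 3) unread | (D > 20) delete")` yields the unit "block" and two (condition, processing) pairs, which a later stage would convert into the internal size determination data.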
[0139]
In steps 62 to 66, the related information of the current line is set as in steps 52 to 55 of the specific example 5, and the condition calculation is performed using the size determination data stored in the determination data storage memory 24.
In steps 67 to 70, post-processing similar to that in the specific example 5 is performed.
[0140]
When this processing has been performed for all the blocks and the size determination is complete for all the blocks (step 64), the process returns to step 2. When the recognition process (step 3) and the size determination process (step 4) have been performed for all the area information (step 2), all processing is terminated.
[0141]
<Effect of Specific Example 6>
As described above, according to the specific example 6, by describing the size determination data in a script, the same effects as the specific examples 1 to 5 can be obtained, and the conditional expressions can be easily defined. For this reason, when the size determination data contains an error and must be changed, the change can be made easily.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a specific example 1;
FIG. 2 is an explanatory diagram of the conventional art.
FIG. 3 is an explanatory diagram of character coordinates of a specific example 1;
FIG. 4 is an explanatory diagram showing an example of size determination data of specific example 1;
FIG. 5 is an explanatory diagram of a reading area in specific example 1;
FIG. 6 is a flowchart showing the operation of the first specific example.
FIG. 7 is a flowchart showing a size determination process of specific example 1;
FIG. 8 is an explanatory diagram of a reference coordinate storage memory of a specific example 2;
FIG. 9 is a flowchart illustrating size determination processing according to the second specific example.
FIG. 10 is an explanatory diagram of specific example 2.
FIG. 11 is a block diagram showing a configuration of a specific example 3;
FIG. 12 is an explanatory diagram of related information of specific example 3;
FIG. 13 is a flowchart illustrating a size determination process according to a specific example 3;
FIG. 14 is a flowchart illustrating a size determination process according to a specific example 4;
FIG. 15 is an explanatory diagram illustrating a method for creating line coordinates according to a fourth specific example;
FIG. 16 is an explanatory diagram showing an example of size determination data in specific example 5;
FIG. 17 is a flowchart illustrating size determination processing according to a specific example 5;
FIG. 18 is an explanatory diagram of a block of specific example 5;
FIG. 19 is an explanatory diagram showing a configuration of a specific example 6;
FIG. 20 is a flowchart illustrating size determination processing according to a specific example 6;
[Explanation of symbols]
1 Image input unit
4 Control unit
5 Recognition unit
6 Size determination unit
22 Reading area information storage memory
23 Recognition result storage memory
24 Determination data storage memory
25 Determination result storage memory
26 Reference coordinate storage memory

Claims

A character reading apparatus comprising:
image input means for acquiring, as image data, characters entered on a predetermined sheet;
recognition means for cutting out each character from the acquired image data and recognizing the position and size of each character from its character coordinates;
block recognition means for, when the intervals between adjacent characters in the same line continuously satisfy a predetermined condition, setting these recognized characters as one block and recognizing the position and size of the block as block coordinates from the character coordinates of the characters;
first storage means for storing the block coordinates of the recognized block;
second storage means for storing a size determination condition for determining that the height of a block is equal to or greater than a predetermined value;
size determination means for determining whether or not the height of the block obtained from the block coordinates of the recognized block satisfies the size determination condition; and
control means for performing post-processing on a block determined to satisfy the size determination condition.

The character reading apparatus according to claim 1, wherein the erroneous clipping determination condition is described by a script.

The character reading apparatus according to claim 1, wherein the control means performs, as the post-processing, any one of a process of replacing the character with another character, no processing, and a deletion process, according to the size determination result.