JP4162195B2

JP4162195B2 - Image processing apparatus and image processing program

Info

Publication number: JP4162195B2
Application number: JP2002250449A
Authority: JP
Inventors: 秀明山形
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2002-08-29
Filing date: 2002-08-29
Publication date: 2008-10-08
Anticipated expiration: 2022-08-29
Also published as: JP2004094292A

Description

【０００１】
【発明の属する技術分野】
本発明は、OCR（光学的文字読み取り装置）等に利用される文字認識処理に関し、より特定すると、文書原稿から読み取った文字列画像をもとに文字認識の対象となる文字候補を抽出する処理の前段で用いられる文字行の切り出しにおいて、文字行の中、例えば、本文行に対するルビ行のように一つの認識対象群から除外したい行を検出し、検出結果を用いて利用する文字行を出力することを可能にする手段を有する画像処理装置、及び画像処理プログラムに関する。
【０００２】
【従来の技術】
従来のOCR（光学的文字読み取り装置）においては、スキャナーにより文書原稿から読み取った画像に基づいて原稿に記された文字を認識する処理を行っている。この処理を行う際に、読み取った画像に含まれる文字列画像をもとに文字認識の対象となる文字候補を抽出するために文字単位の切り出しを行うが、その手順として、複数行の文字列画像から文字行を切り出す処理を前段で行う。この行切り出しは、認識対象を規定することになるので、認識の精度を保証するために適正な切り出しが必要になる。
文字行の切り出しにおいて、従来から知られている方法は、いわば“外接矩形統合法”と呼ぶべき方法である。この方法は、まず、入力文字列画像から図形としてまとまりのある黒画素の連結パターンを抽出し、抽出された各々のパターンについて、その外接矩形を求め、次に、これらの矩形を一つの行を構成する要素と判断する統合規則（例えば、矩形相互の水平、垂直方向の距離が所定範囲内にあれば統合）に従い統合し、得られる行矩形により行の切り出しを行っている（特許2895122号、参照）。
【０００３】
この“外接矩形統合法”による行の切り出しの際、対象とする原稿中に本文を構成する通常の文字行にルビ等の注に相当する行が付加されている場合に、これまではルビ行等も通常の行と同様に切り出されるのが普通であった。
ところで、近年の文字認識装置においては、パターンマッチング法により得られた文字認識結果に対して、何らかの言語処理による修正を施して、文書としてもっともらしい形態を持つ認識結果を最終的に出力する場合が多い。このような言語処理を施すにあたって、ルビ行が通常の行と同じように切り出されてしまうと、ルビ行の前後で文章的なつながりが無くなるため、言語処理による修正の精度が大きく低下する。
例えば、図９に示すようなルビ行が付加された画像が入力された場合に、ルビ行を通常の行と同様に切り出した場合、言語処理には「本日は晴天なりあしたどんてん明日は曇天なり」という文章が対象になるので、正しい言語処理が行えず、文字認識装置の認識性能の低下につながってしまう。
【０００４】
【発明が解決しようとする課題】
そこで、本文にルビが混入することがないように、特開平８−１０１８８６（文字認識装置）では、ルビ行を取り除く方法を提案している。特開平８−１０１８８６に示されている方法では、除去の対象となるルビ文字行が行間に書き加えられたものであり、従って最終行は本文行であるという前提をおいて、最終行を基準として最終行から一つ前の文字行と、先頭行に向けて逆順にルビ行の検出を行っている。しかしながら、この前提条件は常に成り立つものではなく、最終行が必ずしも通常の行であるとは限らない。例えば、脚注などが存在する原稿においては、最終行にルビと同程度の大きさの文字が配置される場合もある。従って、特開平８−１０１８８６は、一つの認識対象群（本文行群）から除外したい、或いは別に扱いたいルビや脚注といった行が、最終行にある場合に対応して、これらの行の検出をすることができない。
このように、従来技術は、ルビや脚注の入った原稿の文字列認識（切り出し）精度向上に対する要求に十分に応えるものではない、という問題を抱えている。本発明は、上述の従来技術の問題に鑑みてなされたものであり、その目的は、本文行に対するルビ行や脚注行といった行のように、一つの認識対象群（本文行群）として扱いたくない、或いは一つの認識対象群（本文行群）とは別に扱いたい行を検出する場合に、対象とする行が原稿上のどの文字行に在っても（特開平８−１０１８８６のような前提条件を置かずに、無条件で）検出ができるようにすることを可能にする画像処理装置、及び画像処理プログラムを提供することにある。
【０００５】
【課題を解決するための手段】
請求項１の発明は、文字列画像から文字行を出力する画像処理装置であって、複数の文字行から基準行を選択する手段と、選択された基準行の有する形状値に基づいて各文字行がルビであるかを判定する手段とを備え、前記選択する手段は、行幅、および行高さを変数とするメンバシップ関数による評価値に応じて基準行を選択することを特徴とする。
【０００６】
請求項２の発明は、請求項１に記載された画像処理装置において、前記選択する手段は、行幅を変数とするメンバシップ関数による評価値、および行高さを変数とするメンバシップ関数による評価値の和に応じて基準行を選択することを特徴とするものである。
【０００８】
請求項４の発明は、コンピュータに、画像データに含まれる複数の文字行から、行幅、および行高さを変数とするメンバシップ関数による評価値に応じて基準行を選択するステップと、選択された基準行の有する形状値に基づいて各文字行がルビであるかを判定するステップとを実行させることを特徴とする画像処理プログラムである。
【０００９】
請求項５の発明は、請求項４に記載された画像処理プログラムにおいて、前記選択するステップは、行幅を変数とするメンバシップ関数による評価値、および行高さを変数とするメンバシップ関数による評価値の和に応じて基準行を選択することを特徴とするものである。
【００２２】
【発明の実施の形態】
本発明が構成要件とする、文字列認識（切り出し）精度の向上を図るための文字行データの出力手段は、処理対象として入力された複数行の文字列画像に含まれる文字行の中、本文行に対するルビ行や脚注行といった行のように、一つの認識対象群（本文行群）として扱いたくない、或いは一つの認識対象群（本文行群）とは別に扱いたい行（以下、単に「ルビ行」という）を検出し、本文行、ルビ行それぞれの文字行データとして区別し、出力することを可能にし、そのための手段（手順）を提供するものである。
以下に示す本発明の各実施形態では、複数行の文字列画像に含まれる文字行全部の行切り出しを行い、その中から本文行、ルビ行それぞれを検出可能とする。その検出手順は、切り出された全行の中から所定の規則に従い基準行（標準的な本文行とみなせる行）を抽出し、抽出された基準行の有する形状値に基づいて、切り出された各々の行が本文行に属する行であるか、否（即ち、ルビ行）かを判定し、その判定結果を用いて、切り出された行データの出力を行うという手順による。
図１は、各実施形態の実施に共通に用いる処理装置（システム）の構成を示すブロック図である。
図１を参照すると、１は例えばスキャナ等の原稿画像を読み取り、その画像を入力する画像入力部、３は入力された複数行の文字列画像に含まれる文字行全部の行切り出しを行う文字行切り出し部、５は切り出された各々の行が一つの認識対象群（本文行群）に属する行であるか、否か、その属性を判定する文字行判定部、７は判定結果を用いて、切り出された文字行データを出力する行出力部である。
なお、以下の各実施形態には、本発明を特徴付ける文字行データの出力に関する手順を中心に実施に係わる形態を例示する。従って、図１に示すブロック図にも、文字認識装置のうちの、行切り出しに係わる部分のみを示し、その他の構成部分については省略し、文字認識処理全体、即ち、対象画像の入力から最終的に文字認識の最適解を得るまでの手順（手段）全体の説明をしないが、文字認識処理全体については、文字認識に必要な基本的な手順として従前から知られている手順を適用することにより、その実施が可能である。
【００２３】
「実施形態１」
本実施形態は、図１に示した処理システムにより実行される文字行データの出力（検出）処理に係わるものである。ここに示す文字行データの出力処理は、基準行（標準の文字行とみなせる行）を選択する規則として、最大行幅を用い、また、選択・抽出された基準行の有する形状値としての高さに基づいて、切り出された各々の行が本文行に属する行であるか、ルビ行か、その属性を判定し、ルビ行と判定された行については行データを削除して、切り出された行データの出力を行うという手順による処理プロセスの実施形態を示す。
図２は、本実施形態の文字行データの出力処理のフローチャートを示す。
図２を参照すると、本実施形態フローでは、先ず、画像入力部１により認識対象となる複数行の文字列画像を文字行切り出し部３に入力する（Ｓ１１）。なお、この入力の際、画像と共に、認識対象領域のデータを与えても良い。認識対象領域が与えられた場合には、与えられた領域内のみを行切り出しの対象とすればよい。
次に、文字行切り出し部３は、従来提案されている手法を適用して文字行を切り出す（Ｓ１２）。文字行の切り出しには、射影を用いる方法などさまざまな手法が提案されているが、ここでは、上記「従来の技術」の項に示した“外接矩形統合法”を用いるものとする。例えば、特許2895122号に示す手法で行切り出しを行った場合、統合により得られる行矩形の座標と、行内の矩形（統合の基になる黒画素連結成分の外接矩形）の座標が文字行切り出し部３から出力され、文字行判定部５に送られる。なお、このステップで切り出した行に関する全ての行データを記憶部２に格納する。
【００２４】
次に、行切り出し結果を受け取る文字行判定部５は、切り出された各々の行が本文行に属する行であるか、否か（即ち、ルビ行であるか）を判定する。この手順として、先ず、文字行切り出し部３から送られてきた全ての切り出し行の中から、一つの基準行を選択し、これを判定の基準として定める。基準行の選択にあたっては、行矩形の座標を用いて全ての行矩形のうち、その幅の最も広い行を基準行とする（Ｓ１３）。この基準行の定め方によると、通常、ルビ行の幅がルビを付与されている本文行の幅より広くなることは無いので、この基準で選択すれば、標準的な本文行とみなせる行が選択され、ルビ行が選択されることは無い。基準行を定めた後、判定に用いる基準値を設定するための手順として、基準行として定めた最大行幅を持つ行の高さ値：Shを取得し、取得した行高さ値の半分：Sh／2を判定の基準値として設定する（Ｓ１４）。
次いで、各切り出し行の判定は、各行の高さ：HがSh／2より低い行をルビ行と判定し、それ以外を本文行と判定する。また、本実施形態では、ルビ行と判定した行データを削除するという処理を行う。従って、この処理の手順としては、各行の高さHがSh／2より低い行であるか、否かを判定し（Ｓ１５）、Sh／2より低い行である場合には（Ｓ１５-YES）、このルビ行のデータを先に記憶部２に格納した行データから削除する（Ｓ１６）。なお、このルビ行判定・行データ削除処理は、各切り出し行毎に全部の行について、判定を行うので、ステップＳ１５，Ｓ１６の処理は、行数分繰り返し実行する。
ルビ行判定・行データ削除処理を各切り出し行に適用した後、ルビ行データが削除され、それ以外の本文行にあたる行の行矩形、行内矩形の情報を含む行データを行出力部７を通じて、文字認識処理を行うための後段の処理部へ出力し（Ｓ１７）、この処理を終了する。
【００２５】
「実施形態２」
本実施形態は、図１に示した処理システムにより実行される文字行データの出力（検出）処理に係わるものである。ここに示す文字行データの出力処理は、基準行（標準の文字行とみなせる行）を選択する規則として、行幅と行高さを変数とするメンバシップ関数を導入し、この関数により基準行としての評価値を算出する。
図５は、メンバシップ関数の一例を示す線図であり、図４は、メンバシップ関数を設定するためのパラメータに用いる切り出し行矩形の形状値を説明する図である。
このメンバシップ関数は、下記(1)、(2)の条件、
(1) 行幅が広いほど評価値が高い。
(2) 行高さが低いほど評価値が高い。
に従った設定とする。
ここでは、上記(1)を満足する関数として、図５(A)の例に示すように、最大行幅：MaxWの評価値を最大値：1とする一次関数を用いる。
また、上記(2)を満足する関数として、図５(B)の例に示すように、最大行高さ：MaxHの評価値を最小値：0とする一次関数を用いる。ただし、行高さについては誤って線分のみの行やノイズのみの微小行を選択しないように、又、ルビ行が基準行として選択されないように、所定のしきい値：Thignoreより小さい場合には評価値が“0”となるようにしている。また、メンバシップ関数の連続性を考慮して、最大行高さMaxHの半分の高さMaxH／2で評価値を最大値：1としている。
このメンバシップ関数を用いて、対象となる行各々の評価値を算出する。評価値の算出方法は、ここでは、行高さのメンバシップ関数から求まる評価値と、行幅のメンバシップ関数から求まる評価値の和を各行の評価値とし、評価値最大の行を基準行として選択する（後述の図３に示す処理フローの説明、参照）。
また、選択・抽出された基準行の有する形状値としての高さに基づいて、切り出された各々の行が本文行に属する行であるか、ルビ行か、その属性を判定し、ルビ行と判定された行については行データを削除して、切り出された行データの出力を行うという手順により、文字行データの出力処理プロセスを実行する。
【００２６】
図３は、本実施形態の文字行データの出力処理のフローチャートを示す。
図３を参照すると、本実施形態フローでは、先ず、画像入力部１により認識対象となる複数行の文字列画像を文字行切り出し部３に入力する（Ｓ２１）。なお、この入力の際、画像と共に、認識対象領域のデータを与えても良い。認識対象領域が与えられた場合には、与えられた領域内のみを行切り出しの対象とすればよい。
次に、文字行切り出し部３は、文字行を切り出しを行う（Ｓ２２）。文字行の切り出しの手法は、上記した「実施形態１」に示したと同様に、“外接矩形統合法”を適用することにより実施する。文字行の切り出し結果として得られる行矩形の座標と、行内の矩形（統合の基になる黒画素連結成分の外接矩形）の座標は、文字行切り出し部３から出力され、文字行判定部５に送られる。なお、このステップで切り出した行に関する全ての行データを記憶部２に格納する。
次に、行切り出し結果を受け取る文字行判定部５は、切り出された各々の行が本文行に属する行であるか、否か（即ち、ルビ行であるか）を判定する。この手順として、先ず、文字行切り出し部３から送られてきた全ての切り出し行の中から、一つの基準行を選択し、これを判定の基準として定める。
基準行の選択にあたっては、上記したメンバシップ関数を適用して評価値を求め、評価値最大の行を基準行として選択する。
【００２７】
図６は、この基準行の選択処理を説明するための図である。同図の(A)は認識処理の対象となる複数の行S1〜S5を示し、同図の(B)、(C)は上記で説明した方法（図４，５参照）により設定されたメンバシップ関数、及び(A)に示した対象行へのメンバシップ関数の適用時の操作状態を示す。
基準行の選択処理の手順としては、まず、メンバシップ関数を設定する（Ｓ２３）。このために、認識処理の対象となる複数の行S1〜S5の中から最大行幅MaxW及び最大行高さMaxHを抽出する（図６(A)参照）。抽出した最大行幅MaxWをパラメータとして行幅に対するメンバシップ関数（図６(B)参照）を設定し、抽出した最大行高さMaxHをパラメータとして行高さに対するメンバシップ関数（図６(C)参照）を設定する。
この後、設定されたメンバシップ関数を用いて、対象となる行各々の評価値：メンバシップ値Vを算出し、その最大値Vmaxをとる行を基準行として選択する。従って、まず、Vmax＝0として、この処理における初期条件を設定する（Ｓ２４）。
次いで、対象となる複数の行S1〜S5の各行にメンバシップ関数を適用してメンバシップ値Vを算出する（Ｓ２５）。対象となる複数の行S1〜S5の各行の行幅値、行高さ値それぞれに対し、図６の(B)、(C)の例に示すように、関数に従ったメンバシップ値を得るが、ここでは行幅値、行高さ値それぞれに対するメンバシップ値の和を算出し、最終的に求めるメンバシップ値Vとする。
さらに、最大値Vmaxとなる行を選択するので、各行毎に順次求められるメンバシップ値Vを、これまでに求めた行の最大値Vmaxと比較し（Ｓ２６）、その結果により、即ち最大値Vmaxが変更される場合（Ｓ２６-YES）、変更後の最大値Vmaxの行データ（後段で利用する最大行幅を持つ行の高さ値：Sh）を更新する（Ｓ２７）。この基準行の選択処理は、各切り出し行毎にS1〜S5全部の行について、判定を行うので、ステップＳ２５〜Ｓ２７の処理は、行数分繰り返し実行する。
【００２８】
基準行の選択処理により基準行を定めた後、切り出された各々の行が本文行に属する行であるか、否か（即ち、ルビ行であるか）を判定する。判定に用いる基準値は、前段のステップＳ２７で取得しておいた基準行が持つ行データとしての行高さ値Shを用い、この行高さ値の半分：Sh／2を判定の基準値として設定する。
各切り出し行の判定は、各行の高さ：HがSh／2より低い行をルビ行と判定し、それ以外を本文行と判定する。また、本実施形態では、ルビ行と判定した行データを削除するという処理を行う。従って、この処理の手順としては、各行の高さHがSh／2より低い行であるか、否かを判定し（Ｓ２８）、Sh／2より低い行である場合には（Ｓ２８-YES）、このルビ行のデータを先に記憶部２に格納した行データから削除する（Ｓ２９）。なお、このルビ行判定・行データ削除処理は、各切り出し行毎に全部の行について、判定を行うので、ステップＳ２８，Ｓ２９の処理は、行数分繰り返し実行する。
ルビ行判定・行データ削除処理を各切り出し行に適用した後、ルビ行データが削除され、それ以外の本文行にあたる行の行矩形、行内矩形の情報を含む行データを行出力部７を通じて、文字認識処理を行うための後段の処理部へ出力し（Ｓ３０）、この処理を終了する。
【００２９】
「実施形態３」
本実施形態は、図１に示した処理システムにより実行される文字行データの出力（検出）処理に係わるものである。ここに示す文字行データの出力処理は、上記した「実施形態２」の改良に係わるものである。改良点は、ルビ行の過検出を抑制することを可能とするものであり、ルビ行と同様の行矩形の高さ（上記の各実施形態に即していうと、H＜Sh／2となる高さ）を有する行に属するものの中に、ルビ行ではなく、本文行と見なした方が適当である、即ちルビ行として削除すると悪影響が生じる場合があり、このような行高さによるチェックで過検出となる行を、本文行として扱うことができるようにする処理を付加する。このための手段として、行高さのチェックでルビ行と判定されても、基準行の高さと比較して前後の行との間隔が広い場合、つまりルビ行と明らかに判定ができない場合（なお、本来のルビ行やノイズ行などでは、前後の行との間隔が非常に狭くなる場合が殆どなので、この条件を追加してもルビ行の検出には影響がない）には、本文行と見なし、ルビ行としての扱いをするものから除外する処理手段を用いる。
なお、基準行（標準の文字行とみなせる行）を選択する規則として、行幅と行高さを変数とするメンバシップ関数を導入し、この関数により基準行としての評価値を算出するという点では、「実施形態２」と変わりがない。
【００３０】
図７は、本実施形態の文字行データの出力処理のフローチャートを示す。
図７を参照すると、本実施形態フローでは、メンバシップ関数による評価により基準行を選択し、基準行が持つ行高さ値Shを、ルビ行判定の基準値として設定するまでのステップＳ３１〜Ｓ３７の処理手順は、上記した「実施形態２」の手順（図３のステップＳ２１〜Ｓ２７）と同様に実施する。従って、上記した「実施形態２」のステップＳ２１〜Ｓ２７の処理手順の説明を参照することとし、ここでは、この処理手順の記述を省略する。
メンバシップ関数による評価値が最大となる行を基準行とする基準行選択処理（Ｓ３５〜３７）により基準行を定めた後、切り出された各々の行が本文行に属する行であるか、否か、その属性を判定する。本実施形態では、行高さによるルビ行の判定と、ルビ行の過検出を補正するために行う前後（或いは上下）の行との間隔による判定の２段階でこの判定を行う。
ここでは、行高さによるルビ行の判定に用いる基準値は、前段のステップＳ３７で取得しておいた基準行が持つ行データとしての行高さ値Shを用い、この行高さ値の半分：Sh／2を判定の基準値として設定し、各行の高さ：HがSh／2より低い行をルビ行と判定する。また、前後の行との間隔による判定は、基準行の高さShと比較して前後の行との間隔（前行との間隔＋次行との間隔）：Bの方が広い場合に、本文行と見なすようにする。
２段階の各切り出し行の判定の結果により、本文行或いは本文行と見なされた行の行データを出力し、それ以外のルビ行と判定した行データを削除するという処理を行う。
【００３１】
従って、この処理フローにおける手順としては、まず、各行の前後の行との間隔（前行との間隔＋次行との間隔）Bを算出する（Ｓ３８）。
次いで、各行の高さHが基準行の高さの半分Sh／2より低い行であるか、否かを判定し（Ｓ３９）、Sh／2より低い行である場合には（Ｓ２８-YES）、さらにステップＳ３８で算出した前後の行との間隔Bが基準行の高さShより広いか、否かを判定する（Ｓ４０）。
ここで、前後の行との間隔Bが基準行の高さShより狭い場合（Ｓ４０-YES）、過検出のないルビ行と判定されるので、この行のデータを先に記憶部２に格納した行データから削除する（Ｓ４１）。なお、このルビ行判定・行データ削除処理は、各切り出し行毎に全部の行について、判定を行うので、ステップＳ３８〜Ｓ４１の処理は、行数分繰り返し実行する。
ルビ行判定・行データ削除処理を各切り出し行に適用した後、過検出のないルビ行と判定されたルビ行データが削除され、それ以外の本文行或いは本文行と見なされた行の行矩形、行内矩形の情報を含む行データを行出力部７を通じて、文字認識処理を行うための後段の処理部へ出力し（Ｓ４２）、この処理を終了する。
【００３２】
「実施形態４」
本実施形態は、図１に示した処理システムにより実行される文字行データの出力（検出）処理に係わるものである。ここに示す文字行データの出力処理は、上記した「実施形態３」を改変するものである。改変する点は、「実施形態３」では、過検出を抑制して、明らかなルビ行の判定を行い、判定されたルビ行について行データを削除する処理を行っているが、このルビ行についてのデータ削除を行わずに、本文行とは別系統のデータとして、後段の文字認識処理に用いることを可能にするための出力処理を行うようにした点にある。
このルビ行の出力処理は、ルビ行であることを示す情報を追加して、行出力部７を通じて後段の処理へ行データを出力する。後段の処理では、追加されたルビ行であることを示す情報により、ルビ行を無視して言語処理等の後処理を行うことが可能になる。その上、その処理とは別に、各ルビ行を独立に処理して認識結果を得、最終的に本文行の認識結果と合成して文字認識装置の処理結果として出力することも可能になる。出力は、RTFなどルビに対応したフォーマットで、ルビの部分も含めた認識結果を出力する等、利用に適した形態による方法を採用すればよい。
【００３３】
図８は、本実施形態の文字行データの出力処理のフローチャートを示す。
図８を参照すると、本実施形態フローでは、メンバシップ関数による評価により基準行を選択し、基準行が持つ行高さ値Shを、ルビ行判定の基準値として設定し、前後の行との間隔Bを求めて過検出を抑制して、明らかなルビ行の判定を行うまでのステップＳ５１〜Ｓ６０の処理手順は、上記した「実施形態３」の手順（図７のステップＳ３１〜Ｓ４０）と同様に実施する。従って、上記した「実施形態３」のステップＳ３１〜Ｓ４０の処理手順の説明を参照することとし、ここでは、この処理手順の記述を省略する。
ステップＳ５９に至るまでの処理を経てルビ行と判定された行に対し、前後の行との間隔Bが基準行の高さShより狭いか、否かの判定を行い（Ｓ６０）、前後の行との間隔Bが基準行の高さShより狭ければ、明らかな（過検出のない）ルビ行と判定される（Ｓ６０-YES）。ここで、明らかなルビ行であると判定された切り出し行に対して、上記「実施形態３」におけるように行データの削除をしないで、明らかなルビ行であるとした判定結果を行データ（行の行矩形、行内矩形の情報を含む）に追加する（Ｓ６１）。
ルビ行判定・行データ追加処理を各切り出し行に適用した後、明らかなルビ行と判定されたルビ行について、判定結果の情報が追加され、又、明らかなルビ行以外の本文行或いは本文行と見なされた行については、本来の行矩形、行内矩形の情報を含む行データを行出力部７を通じて、文字認識処理を行うための後段の処理部へ出力し（Ｓ６２）、この処理を終了する。
【００３４】
「実施形態５」
本実施形態は、本発明に係わる文字認識装置の他の実施形態を示すものである。
上記した「実施形態１」〜「実施形態４」に示した文字行データの出力処理手順を含む処理を実行する手段として、汎用のコンピュータを利用して構成される装置を例示するものである。
汎用のコンピュータにより実施するものであるから、構成要素として、スキャナ、キーボード、マウス等の入力装置に対する入力部I/F、CPU、記憶装置、ハードディスクドライブ等の補助記憶装置、ディスプレイ等への出力装置への出力I/F、リムーバブルな記憶媒体のドライブ、リムーバブルな記憶媒体、ネットワークを介して他機と通信するためのコントローラなど通常のコンピュータが備える構成要素を備え、これらをバス接続して装置（システム）を構成する。
また、記憶装置、ハードディスクドライブ等の補助記憶装置、ドライブが用いる記憶媒体の一部には、本発明に係わる文字列認識（切り出し）機能を実現するための、上記「実施形態１」〜「実施形態４」に示した文字行データの出力処理手順を含む文字認識方法に示した各処理手順を実行するためのプログラム（ソフトウェア）が記録されている。
処理対象の文字列画像は、スキャナー等の入力装置による原稿読み取りで入力され、例えばハードディスクなどに格納されているものである。CPUは、記憶手段が有する記録媒体から上記した処理手順を実現するプログラムを読み出し、プログラムに従う処理を対象文字列画像に実行し、その処理結果等をディスプレイに出力する。
なお、本発明に係わる文字認識装置を、ネットワークコントローラによりネットワークを介して、外部の装置と接続して、機能の一部をネットワーク上に持つような形態で実施してもよい。
【００３５】
【発明の効果】
複数の文字行からより正確に基準行を選択することができる。
【図面の簡単な説明】
【図１】本発明に係わる文字列認識（切り出し）処理システムの構成を示すブロック図である。
【図２】「実施形態１」に係わる文字行データの出力処理のフローチャートを示す。
【図３】「実施形態２」に係わる文字行データの出力処理のフローチャートを示す。
【図４】基準行を求めるためのメンバシップ関数を設定するためのパラメータを説明する図である。
【図５】図４のパラメータを用いて設定されたメンバシップ関数の一例を示す線図である。
【図６】メンバシップ値による基準行の選択処理を説明するための図である。
【図７】「実施形態３」に係わる文字行データの出力処理のフローチャートを示す。
【図８】「実施形態４」に係わる文字行データの出力処理のフローチャートを示す。
【図９】ルビ行が付加された画像の一例を示す。
【符号の説明】
１…画像入力部、２…記憶部、
３…文字行切り出し部、５…文字行判定部、
７…行出力部。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to character recognition processing used in an OCR (optical character reader) or the like. More specifically, it relates to an image processing apparatus and an image processing program having means that, in the character line segmentation performed as a stage preceding the extraction of character candidates for recognition from a character string image read from a document, detect lines to be excluded from a single recognition target group (for example, ruby lines attached to body text lines) and use the detection result to output the character lines to be used.
[0002]
[Prior art]
In a conventional OCR (optical character reader), characters written on a document are recognized on the basis of an image read from the document by a scanner. In this processing, characters are segmented one by one in order to extract character candidates for recognition from the character string image contained in the read image; as a preceding step, character lines are segmented from the multi-line character string image. Because this line segmentation defines the recognition target, it must be performed properly to guarantee recognition accuracy.
A conventionally known method for segmenting character lines is what may be called the “circumscribed rectangle integration method”. In this method, connected patterns of black pixels that form coherent shapes are first extracted from the input character string image, and the circumscribed rectangle of each extracted pattern is obtained. These rectangles are then integrated according to an integration rule that decides which of them are elements of the same line (for example, rectangles are integrated if their mutual horizontal and vertical distances are within predetermined ranges), and lines are segmented using the resulting line rectangles (see Japanese Patent No. 2895122).
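The integration rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of Japanese Patent No. 2895122: the rectangle layout `(left, top, right, bottom)`, the function names, and the thresholds `max_dx` and `max_dy` are all hypothetical, and the connected-component extraction that produces the input rectangles is assumed to have been done already.

```python
from typing import List, Tuple

# Hypothetical layout: (left, top, right, bottom) in pixel coordinates.
Rect = Tuple[int, int, int, int]


def gap(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> int:
    """Distance between two 1-D intervals; 0 when they touch or overlap."""
    return max(0, max(a_lo, b_lo) - min(a_hi, b_hi))


def should_merge(a: Rect, b: Rect, max_dx: int, max_dy: int) -> bool:
    """Assumed rule: merge when both the horizontal and vertical gaps are small."""
    dx = gap(a[0], a[2], b[0], b[2])
    dy = gap(a[1], a[3], b[1], b[3])
    return dx <= max_dx and dy <= max_dy


def union(a: Rect, b: Rect) -> Rect:
    """Circumscribed rectangle of two rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_into_lines(rects: List[Rect], max_dx: int, max_dy: int) -> List[Rect]:
    """Repeatedly merge rectangle pairs that satisfy the rule until none remain."""
    lines = list(rects)
    merged = True
    while merged:
        merged = False
        for i in range(len(lines)):
            for j in range(i + 1, len(lines)):
                if should_merge(lines[i], lines[j], max_dx, max_dy):
                    lines[i] = union(lines[i], lines[j])
                    del lines[j]
                    merged = True
                    break
            if merged:
                break
    return lines


if __name__ == "__main__":
    # Three character boxes on one line, two on another (made-up values).
    boxes = [(0, 0, 10, 10), (12, 0, 22, 10), (24, 1, 34, 9),
             (0, 30, 10, 40), (13, 30, 23, 40)]
    print(merge_into_lines(boxes, max_dx=5, max_dy=0))
```

The quadratic restart loop is chosen for readability; a practical implementation would likely sort the rectangles or use a spatial index.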
[0003]
When lines are segmented by this “circumscribed rectangle integration method” and the target document contains lines corresponding to annotations, such as ruby, attached to the ordinary character lines that make up the body text, the ruby lines and the like have until now usually been segmented in the same way as ordinary lines.
In recent character recognition devices, however, the character recognition result obtained by pattern matching is in many cases corrected by some form of language processing, so that a recognition result having a plausible form as a document is finally output. If ruby lines are segmented in the same way as ordinary lines when such language processing is applied, the text loses its continuity before and after each ruby line, and the accuracy of correction by language processing drops significantly.
For example, when an image with ruby lines added as shown in FIG. 9 is input and the ruby line is segmented in the same way as an ordinary line, the language processing is applied to the text 「本日は晴天なりあしたどんてん明日は曇天なり」, in which the ruby reading 「あしたどんてん」 is inserted between the body text 「本日は晴天なり」 and 「明日は曇天なり」. Correct language processing cannot be performed on such text, which lowers the recognition performance of the character recognition device.
[0004]
[Problems to be solved by the invention]
To prevent ruby from being mixed into the body text, Japanese Patent Laid-Open No. 8-101886 (character recognition device) proposes a method for removing ruby lines. The method disclosed there assumes that the ruby lines to be removed are written between the lines and that the final line is therefore a body text line; using the final line as the reference, it detects ruby lines in reverse order, starting from the line immediately before the final line and proceeding toward the first line. This assumption does not always hold, however, and the final line is not necessarily an ordinary line. For example, in a document containing footnotes, characters about the same size as ruby may be placed on the final line. Consequently, the method of Japanese Patent Laid-Open No. 8-101886 cannot detect lines such as ruby or footnotes, which should be excluded from a single recognition target group (the body text line group) or handled separately, when such lines are on the final line.
Thus, the prior art does not sufficiently meet the demand for improved character string recognition (segmentation) accuracy for documents containing ruby or footnotes. The present invention has been made in view of the above problems of the prior art. Its object is to provide an image processing apparatus and an image processing program that, when detecting lines that should not be treated as part of a single recognition target group (the body text line group) or that should be handled separately from it, such as ruby lines or footnote lines relative to body text lines, can detect such a line regardless of which character line of the document it is on (unconditionally, without a precondition such as that of Japanese Patent Laid-Open No. 8-101886).
[0005]
[Means for Solving the Problems]
The invention of claim 1 is an image processing apparatus that outputs character lines from a character string image, comprising means for selecting a reference line from a plurality of character lines and means for determining whether each character line is ruby on the basis of a shape value of the selected reference line, wherein the selecting means selects the reference line according to an evaluation value given by a membership function having line width and line height as variables.
[0006]
The invention of claim 2 is the image processing apparatus according to claim 1, wherein the selecting means selects the reference line according to the sum of an evaluation value given by a membership function having line width as a variable and an evaluation value given by a membership function having line height as a variable.
[0008]
The invention of claim 4 is an image processing program that causes a computer to execute a step of selecting a reference line from a plurality of character lines contained in image data according to an evaluation value given by a membership function having line width and line height as variables, and a step of determining whether each character line is ruby on the basis of a shape value of the selected reference line.
[0009]
The invention of claim 5 is the image processing program according to claim 4, wherein the selecting step selects the reference line according to the sum of an evaluation value given by a membership function having line width as a variable and an evaluation value given by a membership function having line height as a variable.
[0022]
DETAILED DESCRIPTION OF THE INVENTION
The character line data output means for improving character string recognition (segmentation) accuracy, which is a constituent feature of the present invention, detects, among the character lines contained in a multi-line character string image input as the processing target, lines that should not be treated as part of a single recognition target group (the body text line group) or that should be handled separately from it, such as ruby lines or footnote lines relative to body text lines (hereinafter simply called “ruby lines”). It makes it possible to distinguish body text lines and ruby lines and output each as character line data, and provides the means (procedure) for doing so.
In each embodiment of the present invention described below, all character lines contained in a multi-line character string image are segmented, and body text lines and ruby lines can each be detected among them. The detection procedure is as follows: a reference line (a line that can be regarded as a standard body text line) is extracted from all segmented lines according to a predetermined rule; on the basis of a shape value of the extracted reference line, each segmented line is determined to be either a line belonging to the body text or not (that is, a ruby line); and the segmented line data is output using the determination result.
FIG. 1 is a block diagram showing a configuration of a processing apparatus (system) used in common for the implementation of each embodiment.
Referring to FIG. 1, 1 is an image input unit, such as a scanner, that reads a document image and inputs the image; 3 is a character line cutout unit that segments all character lines contained in the input multi-line character string image; 5 is a character line determination unit that determines the attribute of each segmented line, that is, whether or not it belongs to a single recognition target group (the body text line group); and 7 is a line output unit that outputs the segmented character line data using the determination result.
The following embodiments illustrate modes of implementation centered on the procedure for outputting character line data, which characterizes the present invention. Accordingly, the block diagram of FIG. 1 shows only the part of the character recognition device concerned with line segmentation and omits the other components. The overall character recognition processing, that is, the whole procedure (means) from input of the target image to finally obtaining the optimal character recognition solution, is not described, but it can be implemented by applying the procedures long known as the basic procedures required for character recognition.
[0023]
“Embodiment 1”
The present embodiment relates to the character line data output (detection) processing executed by the processing system shown in FIG. 1. In the processing shown here, the maximum line width is used as the rule for selecting the reference line (a line that can be regarded as a standard character line). On the basis of the height, taken as the shape value of the selected and extracted reference line, the attribute of each segmented line is determined, that is, whether it belongs to the body text or is a ruby line; the line data of lines determined to be ruby lines is deleted, and the segmented line data is then output.
FIG. 2 shows a flowchart of the character line data output process of this embodiment.
Referring to FIG. 2, in the flow of the present embodiment, the image input unit 1 first inputs the multi-line character string image to be recognized to the character line cutout unit 3 (S11). Recognition target area data may be supplied together with the image at this point. When a recognition target area is supplied, only the inside of that area needs to be subjected to line segmentation.
Next, the character line cutout unit 3 segments the character lines by applying a conventionally proposed method (S12). Various methods, such as methods using projection, have been proposed for segmenting character lines; here, the “circumscribed rectangle integration method” described in the “Prior art” section above is used. For example, when line segmentation is performed by the method of Japanese Patent No. 2895122, the coordinates of the line rectangles obtained by integration and the coordinates of the rectangles within each line (the circumscribed rectangles of the black pixel connected components on which the integration is based) are output from the character line cutout unit 3 and sent to the character line determination unit 5. All line data for the lines segmented in this step is stored in the storage unit 2.
[0024]
Next, the character line determination unit 5, which receives the line segmentation result, determines whether or not each segmented line belongs to the body text (that is, whether it is a ruby line). As the first step, one reference line is selected from all the segmented lines sent from the character line cutout unit 3 and is taken as the basis for the determination. In selecting the reference line, the coordinates of the line rectangles are used, and the line with the greatest width among all line rectangles is taken as the reference line (S13). Because a ruby line is normally never wider than the body text line to which the ruby is attached, selecting by this criterion picks a line that can be regarded as a standard body text line, and a ruby line is never selected. After the reference line is determined, the reference value used for the determination is set: the height value Sh of the line with the maximum line width, which was determined as the reference line, is acquired, and half of the acquired line height value, Sh/2, is set as the reference value for the determination (S14).
Each segmented line is then judged: a line whose height H is lower than Sh/2 is determined to be a ruby line, and any other line is determined to be a body text line. In the present embodiment, the line data of lines determined to be ruby lines is deleted. The procedure is therefore to determine whether or not the height H of each line is lower than Sh/2 (S15) and, if it is (S15-YES), to delete the data of this ruby line from the line data previously stored in the storage unit 2 (S16). Because this ruby line determination and line data deletion processing is performed on every segmented line, steps S15 and S16 are repeated as many times as there are lines.
After the ruby line determination and line data deletion processing has been applied to every segmented line, the ruby line data has been deleted, and the line data of the remaining lines, which correspond to body text lines and include the line rectangle and in-line rectangle information, is output through the line output unit 7 to the subsequent processing unit that performs character recognition (S17), and the processing ends.
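Steps S13 to S17 of Embodiment 1 can be sketched as follows. This is a minimal illustration that assumes horizontally written text and line rectangles already produced by the segmentation step; the `Line` type and the function name are hypothetical and do not come from the patent.

```python
from typing import List, NamedTuple


class Line(NamedTuple):
    """Hypothetical line rectangle in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.bottom - self.top


def remove_ruby_lines(lines: List[Line]) -> List[Line]:
    """Return the lines kept as body text under the Embodiment 1 rule."""
    if not lines:
        return []
    # S13: the widest line is taken as the reference line.
    reference = max(lines, key=lambda ln: ln.width)
    # S14: half of the reference line height Sh is the decision threshold.
    threshold = reference.height / 2
    # S15/S16: lines with H < Sh/2 are treated as ruby lines and dropped.
    return [ln for ln in lines if not ln.height < threshold]


if __name__ == "__main__":
    # One short ruby line above two body lines (made-up values).
    sample = [Line(40, 0, 120, 10), Line(0, 12, 400, 36), Line(0, 50, 380, 74)]
    for kept in remove_ruby_lines(sample):
        print(kept)
```

Returning a new list instead of deleting from stored data is a simplification; the text describes deletion from the line data held in the storage unit 2.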
[0025]
“Embodiment 2”
The present embodiment relates to the character line data output (detection) processing executed by the processing system shown in FIG. 1. In the processing shown here, a membership function having line width and line height as variables is introduced as the rule for selecting the reference line (a line that can be regarded as a standard character line), and an evaluation value as a reference line is calculated with this function.
FIG. 5 is a diagram showing an example of the membership function, and FIG. 4 is a diagram for explaining the shape values of the segmented line rectangle that are used as parameters for setting the membership function.
This membership function is set according to the following conditions (1) and (2):
(1) The wider the line width, the higher the evaluation value.
(2) The lower the line height, the higher the evaluation value.
Here, as a function satisfying (1) above, a linear function that assigns the maximum evaluation value, 1, to the maximum line width MaxW is used, as shown in the example of FIG. 5(A).
As a function satisfying (2) above, a linear function that assigns the minimum evaluation value, 0, to the maximum line height MaxH is used, as shown in the example of FIG. 5(B). However, so that a line consisting only of a line segment or a minute line consisting only of noise is not selected by mistake, and so that a ruby line is not selected as the reference line, the evaluation value is set to 0 when the line height is smaller than a predetermined threshold Thignore. In addition, in consideration of the continuity of the membership function, the evaluation value takes its maximum, 1, at the height MaxH/2, half the maximum line height MaxH.
Using these membership functions, the evaluation value of each target line is calculated. Here, the evaluation value of each line is the sum of the evaluation value obtained from the line height membership function and the evaluation value obtained from the line width membership function, and the line with the largest evaluation value is selected as the reference line (see the description of the processing flow of FIG. 3 below).
Then, on the basis of the height, taken as the shape value of the selected and extracted reference line, the attribute of each segmented line is determined, that is, whether it belongs to the body text or is a ruby line; the line data of lines determined to be ruby lines is deleted, and the segmented line data is output. The character line data output processing is executed by this procedure.
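The two membership functions and their sum can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the width function is assumed to pass through the origin (the text fixes only the value 1 at MaxW), and the height function is assumed to stay at 1 between Thignore and MaxH/2 (the text fixes only the value 0 below Thignore, 1 at MaxH/2, and 0 at MaxH). The exact curves of FIG. 5 may differ.

```python
def width_membership(width: float, max_w: float) -> float:
    """Linear in width, reaching 1 at MaxW (assumed to be 0 at width 0)."""
    if max_w <= 0:
        return 0.0
    return max(0.0, min(1.0, width / max_w))


def height_membership(height: float, max_h: float, th_ignore: float) -> float:
    """0 below Thignore, 1 up to MaxH/2 (assumed), then linear down to 0 at MaxH."""
    if max_h <= 0 or height < th_ignore:
        return 0.0
    half = max_h / 2
    if height <= half:
        return 1.0
    return max(0.0, (max_h - height) / (max_h - half))


def evaluation(width: float, height: float,
               max_w: float, max_h: float, th_ignore: float) -> float:
    """Evaluation value of one line: sum of the two membership values."""
    return (width_membership(width, max_w)
            + height_membership(height, max_h, th_ignore))


if __name__ == "__main__":
    # (width, height) of four made-up lines; the third plays the role of a ruby line.
    rows = [(400, 24), (380, 24), (80, 10), (390, 40)]
    max_w = max(w for w, _ in rows)
    max_h = max(h for _, h in rows)
    for w, h in rows:
        print(w, h, evaluation(w, h, max_w, max_h, th_ignore=12))
```

With these made-up values, the wide line of ordinary height scores highest, while the short ruby-like line and the tallest line score low.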
[0026]
FIG. 3 shows a flowchart of the character line data output process of this embodiment.
Referring to FIG. 3, in the flow of the present embodiment, the image input unit 1 first inputs the multi-line character string image to be recognized to the character line cutout unit 3 (S21). Recognition target area data may be supplied together with the image at this point. When a recognition target area is supplied, only the inside of that area needs to be subjected to line segmentation.
Next, the character line cutout unit 3 segments the character lines (S22). As in “Embodiment 1”, the segmentation is carried out by applying the “circumscribed rectangle integration method”. The coordinates of the line rectangles obtained as the segmentation result and the coordinates of the rectangles within each line (the circumscribed rectangles of the black pixel connected components on which the integration is based) are output from the character line cutout unit 3 and sent to the character line determination unit 5. All line data for the lines segmented in this step is stored in the storage unit 2.
Next, the character line determination unit 5, which receives the line segmentation result, determines whether or not each segmented line belongs to the body text (that is, whether it is a ruby line). As the first step, one reference line is selected from all the segmented lines sent from the character line cutout unit 3 and is taken as the basis for the determination.
In selecting the reference line, the membership function described above is applied to obtain evaluation values, and the line with the maximum evaluation value is selected as the reference line.
[0027]
FIG. 6 is a diagram for explaining the reference row selection processing. (A) in the figure shows the plurality of rows S1 to S5 to be subjected to recognition processing, and (B) and (C) in the figure show the membership functions set by the method described above (see FIGS. 4 and 5) and how they operate when applied to the target rows shown in (A).
To select a reference row, the membership functions are first set (S23). For this purpose, the maximum row width MaxW and the maximum row height MaxH are extracted from the plurality of rows S1 to S5 that are the targets of recognition processing (see FIG. 6A). A membership function for the row width (see FIG. 6B) is set using the extracted maximum row width MaxW as a parameter, and a membership function for the row height (see FIG. 6C) is set using the extracted maximum row height MaxH as a parameter.
Thereafter, the evaluation value of each target row, namely its membership value V, is calculated using the set membership functions, and the row having the maximum value Vmax is selected as the reference row. To this end, Vmax = 0 is first set as the initial condition of this process (S24).
Next, a membership value V is calculated by applying the membership functions to each of the target rows S1 to S5 (S25). As shown in the examples of FIGS. 6B and 6C, a membership value according to each function is obtained for the row width value and the row height value of each of the target rows S1 to S5. Here, the sum of the membership values for the row width value and the row height value is taken as the final membership value V.
Further, since the row having the maximum value Vmax is to be selected, the membership value V obtained for each row in turn is compared with the maximum value Vmax of the rows processed so far (S26). When the result changes the maximum value Vmax (S26-YES), the row data associated with the new maximum value Vmax (the height value Sh of that row, which is used in the subsequent stage) is updated (S27). Because this reference row selection process evaluates every one of the cut-out rows S1 to S5, steps S25 to S27 are repeated for the number of rows.
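The reference row selection of steps S23 to S27 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the shape of the membership functions is defined by FIGS. 4 and 5, which are not reproduced in this passage, so a simple linear ramp that reaches 1 at the maximum value is assumed here. The names `Row`, `ramp_membership`, and `select_reference_row` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """A cut-out line rectangle (hypothetical structure for illustration)."""
    width: float
    height: float


def ramp_membership(value: float, maximum: float) -> float:
    """Assumed membership function: linear ramp from 0 up to 1 at `maximum`.

    The real function shape is the one given by FIGS. 4 and 5.
    """
    if maximum <= 0:
        return 0.0
    return max(0.0, min(1.0, value / maximum))


def select_reference_row(rows: list[Row]) -> Row:
    """Return the row with the largest membership value V (S23 to S27)."""
    if not rows:
        raise ValueError("no rows to evaluate")
    # S23: MaxW and MaxH parameterize the two membership functions.
    max_w = max(r.width for r in rows)
    max_h = max(r.height for r in rows)
    # S24: initial condition Vmax = 0.
    v_max = 0.0
    reference = rows[0]
    for row in rows:
        # S25: V is the sum of the width and height membership values.
        v = ramp_membership(row.width, max_w) + ramp_membership(row.height, max_h)
        # S26/S27: when Vmax changes, remember the row that produced it.
        if v > v_max:
            v_max = v
            reference = row
    return reference


rows = [Row(100, 20), Row(40, 8), Row(98, 21), Row(30, 8), Row(60, 20)]
reference = select_reference_row(rows)
sh = reference.height  # Sh, used later as the basis of the ruby threshold
```

Under the assumed ramp, a row scores highly only when it is both wide and tall relative to the other rows, so a short ruby row or a narrow final row is unlikely to be chosen as the reference.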
[0028]
After the reference line is determined by the reference line selection process, it is determined whether or not each cut-out line belongs to the body text (that is, whether it is a ruby line). The determination uses the line height value Sh, the line data of the reference line acquired in the preceding step S27: half of this line height value, Sh / 2, is set as the reference value for the determination.
In the determination of each cut-out line, a line whose height H is lower than Sh / 2 is determined to be a ruby line, and any other line is determined to be a body line. In the present embodiment, the line data determined to be a ruby line is deleted. Accordingly, as the procedure of this processing, it is determined whether or not the height H of each line is lower than Sh / 2 (S28), and if it is (S28-YES), the data of that ruby line is deleted from the line data previously stored in the storage unit 2 (S29). Because this ruby line determination and line data deletion process evaluates every cut-out line, steps S28 and S29 are repeated for the number of lines.
After the ruby line determination and line data deletion process has been applied to every cut-out line, the ruby line data has been deleted, and the line data for the remaining body lines, including the information on the line rectangles and in-line rectangles, is output through the line output unit 7 to the subsequent processing unit that performs character recognition (S30), and this processing ends.
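The height check and deletion of steps S28 to S30 amount to a simple filter, sketched below in Python. The `LineData` structure and the function name `drop_ruby_lines` are hypothetical and serve only as an illustration; the threshold Sh / 2 and the strict "lower than" comparison are taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class LineData:
    """Line data kept in storage (hypothetical structure for illustration)."""
    label: str
    height: float


def drop_ruby_lines(lines: list[LineData], sh: float) -> list[LineData]:
    """Keep only body lines: delete every line whose height H < Sh / 2."""
    threshold = sh / 2
    # S28: compare H with Sh / 2; S29: drop the line when H is lower.
    return [line for line in lines if not line.height < threshold]


stored = [LineData("body-1", 20), LineData("ruby", 8), LineData("body-2", 19)]
body_lines = drop_ruby_lines(stored, sh=20)  # S30: passed on for recognition
```

A line whose height is exactly Sh / 2 is kept, because only lines strictly lower than the reference value are treated as ruby lines.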
[0029]
“Embodiment 3”
The present embodiment relates to character line data output (detection) processing executed by the processing system shown in FIG. The character line data output process shown here is an improvement of the above-described “Embodiment 2”. The improvement makes it possible to suppress over-detection of ruby lines. Among lines whose line rectangle has a height similar to that of a ruby line (lines with a height H < Sh / 2 in the above embodiments), there are lines that are more appropriately regarded as body lines than as ruby lines, that is, lines whose deletion as ruby lines may have an adverse effect. A process is therefore added so that such over-detected lines can be treated as body lines. As the means for this, even when a line is determined to be a ruby line by the line height check, it is excluded from the lines treated as ruby lines if the interval between it and the preceding and following lines is wider than the reference line height, that is, if it cannot be clearly determined to be a ruby line. (For genuine ruby lines and noise lines, the interval to the preceding and following lines is usually very narrow, so adding this condition does not affect the detection of ruby lines.)
As in “Embodiment 2”, the rule for selecting the reference line (a line that can be regarded as a standard character line) introduces membership functions with the line width and line height as variables, and the evaluation value as a reference line is calculated by these functions.
[0030]
FIG. 7 shows a flowchart of the character line data output process of this embodiment.
Referring to FIG. 7, in the flow of the present embodiment, the processing of steps S31 to S37, up to the point where a reference row is selected by evaluation using the membership functions and the row height value Sh of the reference row is set as the reference value for ruby row determination, is performed in the same manner as the procedure of “Embodiment 2” described above (steps S21 to S27 in FIG. 3). Accordingly, the description of steps S21 to S27 of “Embodiment 2” applies here, and the description of this procedure is omitted.
After the reference line has been determined by the reference line selection process (S35 to S37), in which the line with the maximum evaluation value given by the membership functions is taken as the reference line, it is determined whether or not each cut-out line belongs to the body text, that is, its attribute is determined. In the present embodiment, this determination is performed in two stages: ruby line determination based on the line height, and a determination based on the interval to the preceding and following (or upper and lower) lines, performed to correct over-detection of ruby lines.
Here, the ruby line determination based on the line height uses the line height value Sh, the line data of the reference line acquired in the preceding step S37. Half of this line height value, Sh / 2, is set as the reference value for the determination, and a line whose height H is lower than Sh / 2 is determined to be a ruby line. The determination based on the interval to the preceding and following lines compares that interval (interval to the preceding line + interval to the following line) with the height Sh of the reference line, and a line whose interval is wider than Sh is regarded as a body line.
Based on the result of this two-stage determination of each cut-out line, the line data of body lines and of lines regarded as body lines is output, and the line data of the remaining lines, which are determined to be ruby lines, is deleted.
[0031]
Therefore, as the procedure in this processing flow, the interval B to the preceding and following rows (interval to the preceding row + interval to the following row) is first calculated for each row (S38).
Next, it is determined whether or not the height H of each row is lower than half the height of the reference row, Sh / 2 (S39). If the height is lower than Sh / 2 (S39-YES), the interval B to the preceding and following rows calculated in step S38 is further compared with the height Sh of the reference row (S40).
Here, when the interval B to the preceding and following rows is narrower than the height Sh of the reference row (S40-YES), the row is determined not to be an over-detection, and its data is deleted from the row data previously stored in the storage unit 2 (S41). Because this ruby line determination and line data deletion process evaluates every cut-out line, steps S38 to S41 are repeated for the number of lines.
After the ruby line determination and line data deletion process has been applied to every cut-out line, the data of the lines determined to be ruby lines without over-detection has been deleted, and the line data of the remaining lines, which are body lines or lines regarded as body lines, including the information on the line rectangles and in-line rectangles, is output through the line output unit 7 to the subsequent processing unit that performs character recognition (S42), and this processing ends.
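The two-stage determination of steps S38 to S41 can be sketched in Python as below. This is an illustrative sketch under several assumptions that the text does not state: the lines are horizontal and sorted from top to bottom, a missing neighbour (for the first or last line) contributes an interval of 0, and the intervals are measured on the original cut-out lines rather than after earlier deletions. The names `Line`, `neighbor_interval`, and `filter_lines` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Vertical extent of a cut-out line (hypothetical structure)."""
    top: float
    bottom: float

    @property
    def height(self) -> float:
        return self.bottom - self.top


def neighbor_interval(lines: list[Line], i: int) -> float:
    """S38: B = interval to the preceding line + interval to the following line.

    Assumption: a missing neighbour contributes 0 to B.
    """
    gap_prev = lines[i].top - lines[i - 1].bottom if i > 0 else 0.0
    gap_next = lines[i + 1].top - lines[i].bottom if i + 1 < len(lines) else 0.0
    return gap_prev + gap_next


def filter_lines(lines: list[Line], sh: float) -> list[Line]:
    """Delete only lines that are both low (S39) and tightly spaced (S40)."""
    kept = []
    for i, line in enumerate(lines):
        is_low = line.height < sh / 2                 # S39: H < Sh / 2
        is_tight = neighbor_interval(lines, i) < sh   # S40: B narrower than Sh
        if is_low and is_tight:
            continue                                  # S41: delete as ruby line
        kept.append(line)                             # body line or regarded as one
    return kept


# Body line, ruby line squeezed between body lines, body line,
# and an isolated small line (for example a footnote) far below.
page = [Line(0, 20), Line(22, 30), Line(32, 52), Line(90, 98)]
result = filter_lines(page, sh=20)
```

In this example the squeezed line has B = 4 and is deleted, while the isolated small line has B = 38 and is kept as a line regarded as a body line, which is the over-detection case the embodiment is meant to prevent.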
[0032]
“Embodiment 4”
The present embodiment relates to character line data output (detection) processing executed by the processing system shown in FIG. The character line data output process shown here is a modification of “Embodiment 3” described above. In “Embodiment 3”, over-detection is suppressed, obvious ruby lines are determined, and the line data of the determined ruby lines is deleted. In the present embodiment, the line data of the ruby lines is not deleted; instead, it is output so that it can be used in the subsequent character recognition process as data of a separate stream from the body lines.
In this ruby line output process, information indicating that the line is a ruby line is added to the line data, and the line data is output to the subsequent process through the line output unit 7. Using this added information, the subsequent processing can ignore the ruby lines when performing post-processing such as language processing. Separately, each ruby line can be processed independently to obtain a recognition result, which is finally combined with the recognition result of the body lines and output as the processing result of the character recognition device. For the output, a method suited to the intended use may be adopted, such as outputting the recognition result including the ruby portions in a format that supports ruby, such as RTF.
[0033]
FIG. 8 shows a flowchart of the character line data output process of this embodiment.
Referring to FIG. 8, in the flow of this embodiment, the processing of steps S51 to S60, in which a reference row is selected by evaluation using the membership functions, the row height value Sh of the reference row is set as the reference value for ruby row determination, the interval B is obtained, and obvious ruby lines are determined with over-detection suppressed, is performed in the same manner as the procedure of “Embodiment 3” described above (steps S31 to S40 in FIG. 7). Accordingly, the description of steps S31 to S40 of “Embodiment 3” applies here, and the description of this procedure is omitted.
For the rows determined to be ruby rows by the processing up to step S59, it is determined whether or not the interval B to the preceding and following rows is narrower than the height Sh of the reference row (S60). If the interval is narrower than the reference row height Sh, the row is determined to be an obvious ruby line (no over-detection) (S60-YES). For a cut-out line determined to be an obvious ruby line, the line data is not deleted as in “Embodiment 3” described above; instead, the determination result indicating that the line is an obvious ruby line is added to the line rectangle data of the line (which includes the information on the in-line rectangles) (S61).
After the ruby line determination and line data addition process has been applied to every cut-out line, the line data is output through the line output unit 7 to the subsequent processing unit that performs character recognition (S62), and this processing ends. For lines determined to be obvious ruby lines, the output line data carries the added determination result; for body lines and lines regarded as body lines, it consists of the original line rectangle and in-line rectangle information.
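The difference from “Embodiment 3” is that an obvious ruby line is tagged rather than deleted. A minimal Python sketch of steps S59 to S62 follows; the `LineRecord` structure, its `is_ruby` flag, and the function name `tag_ruby_lines` are hypothetical, and the interval B is assumed to have been computed already.

```python
from dataclasses import dataclass


@dataclass
class LineRecord:
    """Line data with its precomputed interval B (hypothetical structure)."""
    height: float
    interval: float        # B: interval to the preceding + following line
    is_ruby: bool = False  # determination result added in S61


def tag_ruby_lines(lines: list[LineRecord], sh: float) -> list[LineRecord]:
    """Mark obvious ruby lines instead of deleting them, then return all lines."""
    for line in lines:
        # S59: H < Sh / 2, and S60: B narrower than Sh.
        if line.height < sh / 2 and line.interval < sh:
            line.is_ruby = True  # S61: add the determination result
    return lines                 # S62: every line is output


records = [LineRecord(20, 4), LineRecord(8, 4), LineRecord(8, 38)]
tagged = tag_ruby_lines(records, sh=20)

# A later stage could then route the two streams separately.
body_stream = [r for r in tagged if not r.is_ruby]
ruby_stream = [r for r in tagged if r.is_ruby]
```

Keeping the ruby lines in the output with a flag lets the later stage decide how to use them, for example skipping them during language processing while still recognizing them separately.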
[0034]
“Embodiment 5”
This embodiment shows another embodiment of the character recognition apparatus according to the present invention.
An apparatus configured using a general-purpose computer is shown as an example of means for executing the processing, including the character line data output processing procedures shown in “Embodiment 1” to “Embodiment 4” described above.
Because it is implemented on a general-purpose computer, the apparatus (system) has the components of an ordinary computer, connected by a bus: an input I/F for input devices such as a scanner, a keyboard, and a mouse; a CPU; a storage device; an auxiliary storage device such as a hard disk drive; an output I/F for an output device such as a display; a removable storage medium drive; a removable storage medium; and a controller for communicating with other devices via a network.
In addition, a program (software) for executing each processing procedure of the character recognition method, including the character line data output processing procedures shown in “Embodiment 1” to “Embodiment 4” above, is recorded on part of the storage medium used by the storage device or by the auxiliary storage device such as a hard disk drive, in order to realize the character string recognition (cutout) function according to the present invention.
The character string image to be processed is input by reading a document with an input device such as a scanner, and is stored in, for example, a hard disk. The CPU reads a program that realizes the processing procedure described above from a recording medium included in the storage unit, executes processing according to the program on the target character string image, and outputs the processing result and the like to the display.
Note that the character recognition apparatus according to the present invention may also be implemented so that part of its functions is provided over a network, by connecting to an external apparatus through the network controller.
[0035]
[Effect of the invention]
The reference line can be selected more accurately from multiple character lines.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a character string recognition (cutout) processing system according to the present invention.
FIG. 2 is a flowchart of character line data output processing according to “Embodiment 1”.
FIG. 3 is a flowchart of character line data output processing according to “Embodiment 2”.
FIG. 4 is a diagram illustrating parameters for setting a membership function for obtaining a reference row.
FIG. 5 is a diagram showing an example of a membership function set using the parameters shown in FIG.
FIG. 6 is a diagram for explaining reference row selection processing based on membership values.
FIG. 7 is a flowchart of character line data output processing according to “Embodiment 3”.
FIG. 8 is a flowchart of character line data output processing according to “Embodiment 4”.
FIG. 9 shows an example of an image to which a ruby line is added.
[Explanation of symbols]
1 ... image input unit, 2 ... storage unit,
3 ... character line segmentation unit, 5 ... character line determination unit,
7 ... line output unit.

Claims

An image processing apparatus that outputs character lines from a character string image, comprising:
means for selecting a reference line from a plurality of character lines; and
means for determining whether each character line is ruby based on a shape value of the selected reference line,
wherein the selecting means selects the reference line according to an evaluation value given by a membership function having a line width and a line height as variables.

The image processing apparatus according to claim 1, wherein the selecting means selects the reference line according to the sum of an evaluation value given by a membership function having the line width as a variable and an evaluation value given by a membership function having the line height as a variable.

An image processing program for causing a computer to execute:
a step of selecting a reference line from a plurality of character lines included in image data, according to an evaluation value given by a membership function having a line width and a line height as variables; and
a step of determining whether each character line is ruby based on a shape value of the selected reference line.

The image processing program according to claim 3, wherein the selecting step selects the reference line according to the sum of an evaluation value given by a membership function having the line width as a variable and an evaluation value given by a membership function having the line height as a variable.