JP2017010187A

JP2017010187A - Image processing device and image processing program

Info

Publication number: JP2017010187A
Application number: JP2015123513A
Authority: JP
Inventors: 正和福永; Masakazu Fukunaga
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2015-06-19
Filing date: 2015-06-19
Publication date: 2017-01-12

Abstract

PROBLEM TO BE SOLVED: To provide an image processing device that, when extracting delimiters that divide a region surrounded by ruled lines, extracts the delimiters more accurately than a technique that extracts as delimiters anything standing short and upright from the lower frame line of a field frame. SOLUTION: First extraction means of the image processing device extracts ruled lines and non-ruled lines from an image. Second extraction means extracts the non-ruled lines as delimiters for dividing a region surrounded by the ruled lines when the positions of the non-ruled lines on the ruled lines are regularly arranged. SELECTED DRAWING: Figure 1

Description

The present invention relates to an image processing apparatus and an image processing program.

Patent Document 1 addresses the problem of providing a character recognition device and a character frame line detection method that can detect character frame lines regardless of their inclination or line type (solid, broken, and so on). It discloses a character frame line detection unit that comprises a field frame line detection unit, a character delimiter line detection unit, and a whisker line detection unit, and that detects character frame lines according to their nature. Specifically, the field frame line detection unit detects field frame lines (field delimiter lines), which are one type of character frame line constituting a field. The character delimiter line detection unit detects character delimiter lines, another type of character frame line constituting a field, using the detected field delimiter lines as a reference. The whisker line detection unit likewise detects whisker lines, a further type of character frame line constituting a field, using the field delimiter lines as a reference.

Patent Document 2 addresses the problem of extracting elements such as ruled lines and frames as accurately as possible, suppressing both missed extraction and over-extraction, and reducing manual figure input. It discloses extracting ruled lines from a read form image, extracting ruled line frames from the ruled line information, erasing from the form image the image data corresponding to the extracted ruled line frames, extracting ruled lines again from the resulting frame-erased image, erasing the image data corresponding to those ruled lines, and extracting character strings from the resulting ruled-line-erased image.

JP 2001-155113 A
JP 2000-172780 A

In the prior art, whisker lines are detected. A whisker line is a vertical line that stands short and upright from the lower frame line of a field frame and partitions the field into entry areas of one character each.
However, even non-ruled lines that are not regularly arranged are detected as delimiting whisker lines if they stand short and upright from the lower frame line of the field frame.
An object of the present invention is to provide an image processing apparatus and an image processing program that, when extracting delimiters for dividing a region surrounded by ruled lines, can extract the delimiters more accurately than a technique that extracts as a delimiter anything standing short and upright from the lower frame line of a field frame.

The gist of the present invention for achieving this object lies in the inventions of the following claims.
The invention of claim 1 is an image processing apparatus comprising: first extraction means for extracting ruled lines and non-ruled lines from an image; and second extraction means for extracting the non-ruled lines as delimiters for dividing a region surrounded by the ruled lines when the positions of the non-ruled lines on the ruled lines are regularly arranged.

The invention of claim 2 is the image processing apparatus according to claim 1, wherein the second extraction means extracts a non-ruled line as the delimiter when the non-ruled line lies at a position that divides the ruled line into equal parts, with a predetermined width as the minimum width.

The invention of claim 3 is the image processing apparatus according to claim 1 or 2, wherein the second extraction means targets non-ruled lines whose height and width are within predetermined ranges.

The invention of claim 4 is the image processing apparatus according to any one of claims 1 to 3, wherein the second extraction means classifies the non-ruled lines extracted by the second extraction means into first non-ruled lines, which exceed the size of a delimiter, and the remaining second non-ruled lines, and divides the first non-ruled lines using the size of the second non-ruled lines.

The invention of claim 5 is the image processing apparatus according to any one of claims 1 to 4, wherein, when the region surrounded by the ruled lines contains already printed characters, the second extraction means extracts non-ruled lines as delimiters for dividing the ruled line portion that forms the area other than those characters.

The invention of claim 6 is the image processing apparatus according to any one of claims 1 to 5, wherein, when a plurality of regions surrounded by ruled lines are arranged, the second extraction means extracts the non-ruled lines serving as delimiters in each region using the non-ruled lines serving as delimiters in the other regions.

The invention of claim 7 is an image processing program for causing a computer to function as: first extraction means for extracting ruled lines and non-ruled lines from an image; and second extraction means for extracting the non-ruled lines as delimiters for dividing a region surrounded by the ruled lines when the positions of the non-ruled lines on the ruled lines are regularly arranged.

According to the image processing apparatus of claim 1, when extracting delimiters for dividing a region surrounded by ruled lines, the delimiters can be extracted more accurately than with a technique that extracts as a delimiter anything standing short and upright from the lower frame line of a field frame.

According to the image processing apparatus of claim 2, a non-ruled line located at a position that divides the ruled line into equal parts, with a predetermined width as the minimum width, can be extracted as a delimiter.

According to the image processing apparatus of claim 3, delimiters can be extracted from among non-ruled lines whose height and width are within predetermined ranges.

According to the image processing apparatus of claim 4, even when a non-ruled line that is not a delimiter touches a non-ruled line that is a delimiter, the two can be divided and the delimiter extracted.

According to the image processing apparatus of claim 5, delimiters can be extracted even when the region surrounded by the ruled lines contains already printed characters.

According to the image processing apparatus of claim 6, when a plurality of regions surrounded by ruled lines are arranged, the non-ruled lines serving as delimiters in each region can be extracted using the non-ruled lines serving as delimiters in the other regions.

According to the image processing program of claim 7, when extracting delimiters for dividing a region surrounded by ruled lines, the delimiters can be extracted more accurately than with a technique that extracts as a delimiter anything standing short and upright from the lower frame line of a field frame.

FIG. 1 is a conceptual module configuration diagram of a configuration example of the present embodiment.
FIG. 2 is an explanatory diagram showing a system configuration example using the present embodiment.
FIG. 3 is an explanatory diagram showing an example of a target image in the present embodiment.
FIG. 4 is an explanatory diagram showing an example in which ruled lines and non-ruled lines are extracted in the present embodiment.
FIG. 5 is an explanatory diagram showing an example of character recognition applied to the non-ruled line portion.
FIG. 6 is a flowchart showing a processing example according to the present embodiment.
FIG. 7 is an explanatory diagram showing a processing example according to the present embodiment.
FIG. 8 is an explanatory diagram showing a processing example according to the present embodiment.
FIG. 9 is a flowchart showing a processing example according to the present embodiment.
FIG. 10 is an explanatory diagram showing a processing example according to the present embodiment.
FIG. 11 is an explanatory diagram showing a processing example according to the present embodiment.
FIG. 12 is an explanatory diagram showing a processing example according to the present embodiment.
FIG. 13 is a flowchart showing a processing example according to the present embodiment.
FIG. 14 is an explanatory diagram showing a processing example according to the present embodiment.
FIG. 15 is an explanatory diagram showing a processing example according to the present embodiment.
FIG. 16 is an explanatory diagram showing a processing example according to the present embodiment.
FIG. 17 is a block diagram showing a hardware configuration example of a computer that implements the present embodiment.

Hereinafter, an example of a preferred embodiment for realizing the present invention will be described with reference to the drawings.
FIG. 1 shows a conceptual module configuration diagram of a configuration example of the present embodiment.
A module generally refers to a logically separable component such as software (a computer program) or hardware. Accordingly, a module in the present embodiment refers not only to a module in a computer program but also to a module in a hardware configuration. The present embodiment therefore also serves as a description of a computer program for causing a computer to function as those modules (a program for causing a computer to execute each procedure, a program for causing a computer to function as each means, or a program for causing a computer to realize each function), as well as of a system and a method. For convenience of explanation, "store", "cause to store", and equivalent wording are used; when the embodiment is a computer program, these mean storing in a storage device or performing control so that storage in a storage device takes place. Modules may correspond one-to-one to functions, but in implementation one module may be configured as one program, a plurality of modules may be configured as one program, or, conversely, one module may be configured as a plurality of programs. A plurality of modules may be executed by one computer, or one module may be executed by a plurality of computers in a distributed or parallel environment. One module may include another module. In the following, "connection" is used for logical connections (data exchange, instructions, reference relationships between data, and so on) as well as physical connections. "Predetermined" means determined before the target process. It covers not only determination before processing according to the present embodiment starts but also, even after that processing has started, determination according to the situation or state at that time, or up to that time, as long as it precedes the target process. When there are a plurality of "predetermined values", they may differ from one another, or two or more of them (including, of course, all of them) may be the same. A statement of the form "if A, do B" means "determine whether A holds, and do B if it is determined that A holds", except where the determination of whether A holds is unnecessary.
A system or apparatus includes a configuration in which a plurality of computers, hardware units, devices, and the like are connected by communication means such as a network (including one-to-one communication connections), and also a configuration realized by a single computer, hardware unit, device, or the like. "Apparatus" and "system" are used synonymously. Of course, "system" does not include what is merely a social "mechanism" (a social system), which is an artificial arrangement.
For each process performed by a module, or for each process when a module performs a plurality of processes, the target information is read from a storage device, the process is performed, and the result is written to the storage device. Descriptions of reading from the storage device before processing and writing to it after processing may therefore be omitted. The storage device here may include a hard disk, RAM (Random Access Memory), an external storage medium, a storage device accessed via a communication line, a register in a CPU (Central Processing Unit), and the like.

The image information processing apparatus 100 according to the present embodiment extracts delimiters for dividing a region surrounded by ruled lines. As shown in the example of FIG. 1, it includes a data reading module 110, a line segment detection module 120, a cell configuration detection module 130, a non-ruled line element connection calculation module 140, and a non-line-segment delimiter determination module 150.

The data reading module 110 is connected to the line segment detection module 120. The data reading module 110 reads an image and passes it to the line segment detection module 120. Reading an image includes, for example, capturing an image with a scanner, camera, or the like; receiving an image from an external device over a communication line by fax or the like; and reading out an image stored on a hard disk (including one built into the computer as well as one connected via a network). The image may be a binary image or a multi-valued image (including a color image). One image or a plurality of images may be read. The content of the image may be, for example, a business document such as a form, or an advertising pamphlet. Character frame lines are printed on the original from which the image is read. A character frame line is provided so that characters are printed or handwritten within it. In the read image, characters may already be printed or written (hereinafter, "entered") within the character frame lines, or the frame may be left blank. When characters are entered, they are recognized, for example, in post-processing after the image information processing apparatus 100. When no characters are entered, the result is used, for example, for template creation in that post-processing.

A region surrounded by character frame lines is a region surrounded by ruled lines. In many cases, delimiters that divide the region are added to define the positions of the characters. When these delimiters are solid lines, conventional techniques can separate the ruled lines (straight line segments), including the delimiters, from pixel blocks other than the ruled lines (non-ruled lines, for example, characters). A pixel block is a region of pixels that are contiguous under 4-connectivity or 8-connectivity.
However, as shown in the example of FIG. 3, some delimiters (also called cell delimiters or whisker lines) are short lines extending from the upper or lower ruled line toward the opposite side (downward or upward). These are what Patent Document 1 describes as standing "short and upright from the lower frame line of the field frame". For horizontal writing, the delimiters extend toward the opposite side from the upper or lower ruled line; for vertical writing, they extend toward the opposite side (rightward or leftward) from the left or right ruled line.
The image information processing apparatus 100 mainly targets such delimiters.
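As an illustration of the pixel block notion above (a region of pixels contiguous under 4- or 8-connectivity), the following is a minimal sketch of connected-component grouping on a binary image given as nested lists. The function name `label_pixel_blocks` and the list-of-lists image representation are assumptions made for this example; the patent does not specify how pixel blocks are computed.

```python
from collections import deque


def label_pixel_blocks(image, connectivity=8):
    """Group foreground pixels (truthy values) into pixel blocks.

    Returns a list of blocks; each block is a list of (row, col) tuples.
    Hypothetical helper for illustration only.
    """
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blocks = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first search from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            block = []
            while queue:
                y, x = queue.popleft()
                block.append((y, x))
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blocks.append(block)
    return blocks
```

Two diagonally adjacent pixels form one block under 8-connectivity and two blocks under 4-connectivity, which is the only difference between the two settings.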

The line segment detection module 120 is connected to the data reading module 110, the cell configuration detection module 130, and the non-ruled line element connection calculation module 140. The line segment detection module 120 extracts line segments. These line segments are candidates for the ruled lines that form character frame lines. For example, a conventional technique for extracting straight line segments may be used.
The cell configuration detection module 130 is connected to the line segment detection module 120, the non-ruled line element connection calculation module 140, and the non-line-segment delimiter determination module 150. From the line segments extracted by the line segment detection module 120, the cell configuration detection module 130 detects the configuration of the ruled lines that are character frame lines. For example, it analyzes the ruled line structure formed by horizontal and vertical line segments and detects the cell components of the character frames. A table, which is a set of character frames, may also be included in the analysis. The analysis includes extracting the coordinates of the start and end points of each ruled line (the end points of the line segment). The thickness, color, and other attributes of the line segment may also be extracted.

The non-ruled line element connection calculation module 140 is connected to the line segment detection module 120, the cell configuration detection module 130, and the non-line-segment delimiter determination module 150. The non-ruled line element connection calculation module 140 separates the ruled lines that are character frame lines, as detected by the cell configuration detection module 130, from the non-ruled lines, which are pixel blocks other than those ruled lines. The delimiters inside the character frame described above are extracted as non-ruled lines.
The module then calculates the positions of the non-ruled lines on the ruled line; that is, it extracts the positions of the non-ruled lines using the ruled line as a coordinate axis. For example, the lower ruled line of the character frame may be taken as the X axis (for example, with the left end at 0), and the position (X coordinate) of each non-ruled line extracted. In this case, the non-ruled lines may include pixel blocks such as characters in addition to delimiters. Alternatively, only the coordinates of non-ruled lines touching the ruled line may be extracted. Even then, because characters may touch the ruled line, the non-ruled lines may include pixel blocks such as characters in addition to delimiters.
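The position calculation described above can be sketched as follows, taking the lower ruled line as the X axis with its left end at 0. The `Box` type, the use of each block's horizontal center as its position, and the `touch_gap` parameter are assumptions introduced for this sketch; the text only states that an X coordinate is extracted and that the extraction may optionally be limited to non-ruled lines touching the ruled line.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of a non-ruled pixel block (inclusive image coordinates)."""
    left: int
    top: int
    right: int
    bottom: int


def positions_on_rule(rule_left, rule_right, rule_y, boxes,
                      touching_only=False, touch_gap=1):
    """Return sorted X positions of non-ruled blocks relative to the rule's left end.

    rule_left, rule_right, rule_y describe the lower ruled line.
    touching_only keeps only blocks whose bottom edge lies within
    touch_gap pixels of the ruled line (an assumed contact criterion).
    """
    positions = []
    for box in boxes:
        center_x = (box.left + box.right) / 2
        if not (rule_left <= center_x <= rule_right):
            continue  # block is outside the span of this ruled line
        if touching_only and abs(rule_y - box.bottom) > touch_gap:
            continue  # block does not reach the ruled line
        positions.append(center_x - rule_left)
    return sorted(positions)
```

As the text notes, either variant can still return positions of characters or noise, so a later regularity check is needed to single out the delimiters.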

The non-line-segment delimiter determination module 150 is connected to the cell configuration detection module 130 and the non-ruled line element connection calculation module 140. When the positions of the non-ruled lines on a ruled line are regularly arranged, the non-line-segment delimiter determination module 150 extracts those non-ruled lines as delimiters for dividing the region surrounded by the ruled lines. It uses the positions of the non-ruled lines on the ruled line calculated by the non-ruled line element connection calculation module 140 and extracts, as delimiter non-ruled lines, those whose positions are regularly arranged.
The non-line-segment delimiter determination module 150 may extract a non-ruled line as a delimiter when it lies at a position that divides the ruled line into equal parts, with a predetermined width as the minimum width.
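One way to read the equal-division rule above is sketched below: try every division count whose cell width is at least the minimum width, and accept a count only if a non-ruled line is found near every division point. The `tolerance` parameter and the choice to prefer the division that explains the most delimiters are assumptions of this sketch, not details given in this part of the description.

```python
def find_regular_delimiters(positions, rule_length, min_width, tolerance=2.0):
    """Pick the non-ruled line positions that divide the ruled line equally.

    positions: X positions of non-ruled lines along the ruled line (left end = 0).
    rule_length: length of the ruled line.
    min_width: predetermined minimum cell width.
    tolerance: assumed allowable deviation from an ideal division point.
    Returns the matched positions, or an empty list if no regular division fits.
    """
    best = []
    max_cells = int(rule_length // min_width)
    for cells in range(2, max_cells + 1):
        pitch = rule_length / cells
        matched = []
        for k in range(1, cells):
            expected = k * pitch
            near = [p for p in positions if abs(p - expected) <= tolerance]
            if not near:
                break  # a division point has no non-ruled line: reject this count
            matched.append(min(near, key=lambda p: abs(p - expected)))
        else:
            # Every division point was matched; keep the finest such division.
            if len(matched) > len(best):
                best = matched
    return best
```

For example, with positions 12, 25, 50, 61, and 75 on a ruled line of length 100 and a minimum width of 10, only the positions at 25, 50, and 75 are kept; the isolated marks at 12 and 61 do not fit any complete equal division and are left as non-delimiters.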

The non-line-segment delimiter determination module 150 may also target non-ruled lines whose height and width are within predetermined ranges. The detailed processing will be described later using the example of FIG. 12.
The non-line-segment delimiter determination module 150 may also classify the non-ruled lines extracted by the above processing into first non-ruled lines, which exceed the size of a delimiter, and the remaining second non-ruled lines, and divide the first non-ruled lines using the size of the second non-ruled lines. The detailed processing will be described later using the examples of FIGS. 13 and 14.
When the region surrounded by the ruled lines contains already printed characters, the non-line-segment delimiter determination module 150 may also extract non-ruled lines as delimiters for dividing the ruled line portion that forms the area other than those characters. The detailed processing will be described later using the example of FIG. 15.
When a plurality of regions surrounded by ruled lines are arranged, the non-line-segment delimiter determination module 150 may also extract the non-ruled lines serving as delimiters in each region using the non-ruled lines serving as delimiters in the other regions. Here, "a plurality of regions surrounded by ruled lines are arranged" may refer to regions that touch one another (for example, when the character frame lines form a table) or to regions that are apart. The processing may also be applied only when the regions touch one another. Conversely, when the regions are apart, the non-ruled lines serving as delimiters may be extracted independently of the delimiter positions in the other regions. The detailed processing will be described later using the example of FIG. 16.
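The size-based options above (targeting non-ruled lines within a height and width range, and dividing oversized non-ruled lines using the size of the ordinary ones) might look roughly like the sketch below. All names and thresholds are hypothetical, and the cropping rule assumes delimiters that rise from the lower ruled line; the actual procedure is the one described later with FIGS. 12 to 14 and may differ.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Box:
    """Bounding box of a non-ruled pixel block (inclusive image coordinates)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self):
        return self.right - self.left + 1

    @property
    def height(self):
        return self.bottom - self.top + 1


def filter_by_size(boxes, max_width, min_height, max_height):
    """Keep only blocks whose width and height fall in the predetermined ranges."""
    return [b for b in boxes
            if b.width <= max_width and min_height <= b.height <= max_height]


def split_oversized(boxes, max_delimiter_height):
    """Classify blocks and crop the oversized ones to delimiter height.

    Returns (second, cropped): the ordinary delimiter-sized blocks, and the
    oversized blocks cut down to the typical height of the ordinary ones,
    measured upward from the bottom edge (assumed lower-rule delimiters).
    """
    second = [b for b in boxes if b.height <= max_delimiter_height]
    first = [b for b in boxes if b.height > max_delimiter_height]
    if not second:
        return second, []  # no reference size available, so nothing is cropped
    typical = int(median(b.height for b in second))
    cropped = [Box(b.left, b.bottom - typical + 1, b.right, b.bottom)
               for b in first]
    return second, cropped
```

This sketch only trims height; separating the delimiter from a touching character horizontally would need the position information from the regularity check.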

FIG. 2 is an explanatory diagram showing a system configuration example using the present embodiment.
The example shown in FIG. 2(a) consists of an image reading apparatus 210, the image information processing apparatus 100, and a form processing apparatus 230.
The image reading apparatus 210 is connected to the image information processing apparatus 100. The image reading apparatus 210 is a scanner that reads forms on which character frames with delimiters are printed.
The image information processing apparatus 100 is connected to the image reading apparatus 210 and the form processing apparatus 230. The image information processing apparatus 100 receives the image read by the image reading apparatus 210, extracts the delimiter non-ruled lines from the character frames, and makes the character portions within the character frames identifiable.
The form processing apparatus 230 is connected to the image information processing apparatus 100. The form processing apparatus 230 performs character recognition on the character portions within the character frames made identifiable by the image information processing apparatus 100 and uses the results for various kinds of form processing.
The image information processing apparatus 100 may be integrated with either the image reading apparatus 210 or the form processing apparatus 230, or the image reading apparatus 210, the image information processing apparatus 100, and the form processing apparatus 230 may all be integrated into one unit.

In the example shown in FIG. 2(b), the image information processing apparatus 100, an image reading apparatus 210A, an image reading apparatus 210B, an image reading apparatus 210C, a form processing apparatus 230A, and a form processing apparatus 230B are connected to one another via a communication line 290. The communication line 290 may be wireless, wired, or a combination of the two, and may be, for example, the Internet or an intranet serving as communication infrastructure.
The image reading apparatus 210 here may be a scanner or a fax machine. The functions of the image information processing apparatus 100 may be provided as a cloud service. For example, form images may be sent from image reading apparatuses 210 installed at a plurality of sites, processed by the image information processing apparatus 100, and then handled by the form processing apparatus 230, which performs integrated form processing on those forms.

FIG. 3 is an explanatory diagram showing an example of a target image in the present embodiment, namely an image read by the data reading module 110. In this example, the upper and lower ruled lines each carry three delimiters so that four characters are written in the character frame. The characters are handwritten digits; some of them touch the lower ruled line or a delimiter, and noise is present.
FIG. 4 is an explanatory diagram showing an example in which ruled lines and non-ruled lines are extracted in the present embodiment. It shows the processing result of the line segment detection module 120. FIG. 4(a) shows only the extracted ruled lines, and FIG. 4(b) shows only the pixel blocks other than ruled lines (non-ruled lines).
If the image shown in FIG. 4(b) is subjected to character recognition as it is (that is, without the processing by the cell configuration detection module 130, the non-ruled line element connection calculation module 140, and the non-line segment delimiter determination module 150 of the present embodiment), the character recognition result is as shown in the example of FIG. 5. That is, the image is recognized as ".3.'7.'2"'9..". For example, the delimiters are misrecognized as [.'], the noise as [. "], and the digit that overlaps a delimiter (originally [8]) as [9].

FIG. 6 is a flowchart showing a processing example according to the present embodiment. As a specific example, processing of the image shown in the example of FIG. 3 is described.
In step S602, the data reading module 110 reads image data, such as the image shown in the example of FIG. 3.
In step S604, the line segment detection module 120 extracts straight line segments. Ruled lines are extracted as shown in the example of FIG. 4(a). As a natural consequence of extracting the ruled lines, the non-ruled lines are also obtained, as shown in the example of FIG. 4(b).

In step S606, the cell configuration detection module 130 extracts cell components. Taking the ruled lines shown in the example of FIG. 4(a) as input, it extracts ruled lines 710, 720, 730, and 740, which form a rectangle, as shown in the example of FIG. 7. For example, the coordinates of the start point and end point of each ruled line are extracted. The module also determines that this example consists of a single character frame.
In step S608, the non-ruled line element connection calculation module 140 extracts the connection relationships between the ruled lines and the non-ruled line elements. The figure shows the extracted relationship between ruled line 720 (the upper ruled line) and ruled line 740 (the lower ruled line) shown in the example of FIG. 7 and the non-ruled lines shown in the example of FIG. 4(b). The non-ruled lines (non-ruled line elements 810 to 836) are a mixture of characters, delimiters, noise, and so on. The non-ruled lines touching ruled line 720 are non-ruled line elements 810 to 818, those touching ruled line 740 are non-ruled line elements 820 to 828 and 832 to 836, and non-ruled line element 830 (the digit "2") touches neither.
In step S610, the non-line segment delimiter determination module 150 extracts delimiter elements. The details of step S610 are described using the flowchart shown in the example of FIG. 9.

FIG. 9 is a flowchart showing a processing example according to the present embodiment (non-line segment delimiter determination module 150).
In step S902, the coordinates of the non-ruled line elements within the span of a ruled line are extracted. FIG. 10 shows an example. FIG. 10(a) is the same as FIG. 8 and shows the relationship between ruled line 720 and non-ruled line element 810 and the others, and between ruled line 740 and non-ruled line element 820 and the others. FIG. 10(b) shows the positions of non-ruled line element 820 and the others on ruled line 740. For example, the position of each non-ruled line is expressed on a coordinate axis along ruled line 740 whose left end is [0] (the right end is [600]; the width of ruled line 740 is W := 600). Specifically, non-ruled line element 820 is at 50, element 822 at 80, element 824 at 150, element 826 at 225, element 828 at 300, element 832 at 500, element 834 at 570, and element 836 at 570. This sequence of values is denoted {Yj}, j = 1 to N (N = 8 in this example). The unit may be pixels, mm, or the like. Naturally, the same is done for ruled line 720, the upper side.

In step S904, the regularly arranged non-ruled line elements are extracted. FIG. 11 shows an example. Let Tmin be the minimum character width. Tmin is a predetermined value, for example Tmin := 40. From Tmin and the length of ruled line 740, the maximum number of divisions Smax of this character frame (cell, or ruled line 720) can be calculated. In this example, Smax := 15 (= 600/40). Let Smin be the minimum number of divisions. Smin is a predetermined value, for example Smin := 3, and can be set to any value of 2 or more. The minimum number of divisions may instead be determined after a maximum character width has been set. In this example, the following search is performed over the range 3 (Smin) to 15 (Smax).
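The derivation of the search range can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `division_range` is hypothetical, and the use of floor division for Smax is an assumption (the text only gives the exact case 600/40 = 15).

```python
def division_range(rule_length, t_min, s_min=3):
    """Return (s_min, s_max), the range of division counts to search.

    rule_length: length W of the ruled line (600 in the example).
    t_min: minimum character width Tmin (40 in the example).
    s_min: minimum number of divisions Smin; the text requires 2 or more.
    Floor division for s_max is an assumption of this sketch.
    """
    if t_min <= 0:
        raise ValueError("t_min must be positive")
    if s_min < 2:
        raise ValueError("s_min must be 2 or more")
    s_max = rule_length // t_min
    return s_min, s_max


print(division_range(600, 40))  # (3, 15)
```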

The coordinate sequence extracted in step S902 is tested for goodness of fit, starting from the minimum number of divisions (Smin).
For the coordinate sequence expected for a division number S ({Xi}, where Xi := i * W / S for i = 1 to S-1), the number of expected positions for which a value of the actual coordinate sequence {Yj} lies within a predetermined neighborhood δ is counted. If the count equals S-1, that division number is adopted. Here, {Xi} gives the delimiter positions for division number S. The actual coordinate sequence {Yj} should contain values that match {Xi} for some division number S, and the non-ruled lines at the matching positions are extracted as delimiters.
In the example of FIG. 11, for three divisions there is no {Yj} at the positions {Xi} (within the neighborhood δ), so it is determined that the delimiters do not divide ruled line 740 into three. Next, for four divisions, a {Yj} is present at every position {Xi} (within the neighborhood δ), so it is determined that the delimiters divide ruled line 740 into four, and the non-ruled lines at those positions are extracted as delimiters.
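The fit test above can be sketched as follows. This is an illustrative sketch rather than the embodiment's implementation: the function name `find_delimiters`, the choice of the nearest element when several fall inside the neighborhood, and the sample positions and δ are all assumptions (the sample positions are not the values of FIG. 10).

```python
def find_delimiters(positions, width, s_min, s_max, delta):
    """Search division counts from s_min upward and return the first fit.

    positions: measured coordinates {Yj} of non-ruled elements on the ruled line.
    width: length W of the ruled line.
    delta: neighborhood around each expected position Xi = i * W / S.
    Returns (S, matched positions) for the first S whose S-1 expected
    positions all have a measured element nearby, or None if no S fits.
    """
    for s in range(s_min, s_max + 1):
        expected = [i * width / s for i in range(1, s)]
        matched = []
        for x in expected:
            near = [y for y in positions if abs(y - x) <= delta]
            if not near:
                break  # one expected delimiter is missing: reject this S
            # Assumption: take the closest element when several are nearby.
            matched.append(min(near, key=lambda y: abs(y - x)))
        else:
            return s, matched  # count equals S-1: adopt this division number
    return None


# Hypothetical positions: ticks at 150, 300, 450 plus characters and noise.
sample = [50, 80, 150, 225, 300, 450, 570]
print(find_delimiters(sample, 600, 3, 15, 10))  # (4, [150, 300, 450])
```

With these sample values, three divisions are rejected because nothing lies near 200, and four divisions are adopted because all of 150, 300, and 450 are matched.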

In the above example, the search proceeded from the smallest number of divisions upward, but it may instead proceed from the largest number of divisions downward, or start from a predetermined number of divisions (a number determined from the length of the ruled line; specifically, for example, the length of the ruled line divided by a typical character width).
Also, in the above example, the division number whose count equals S-1 was adopted, but the division number with the largest number of matches may instead be adopted after all division numbers have been searched.
The ruled lines targeted here may be the upper or lower ruled line in the case of horizontal writing, and the left or right ruled line in the case of vertical writing.

FIG. 12 is an explanatory diagram showing a processing example according to the present embodiment. A size check is performed on the non-ruled lines connected to the ruled lines forming the character frame (ruled lines 710 to 740 in the above example), and those that do not satisfy the conditions are deleted. In this size check, the width of a non-ruled line is its extent along the direction of the ruled line (horizontal in the example of FIG. 11), and its height is its extent in the direction in which it is connected to the ruled line (upward in the example of FIG. 11).
Elements that cannot be regarded as delimiters (a delimiter being a whisker-like protrusion) are deleted as in the following processing example.
As shown in the example of FIG. 12, a non-ruled line is deleted if its width and height fall within a specified region of the two-dimensional width-height space (the dark region in the example of FIG. 12). In this example, narrow non-ruled lines are basically kept as delimiter candidates, and the wider a non-ruled line is, the more likely it is to be judged not a delimiter candidate. This can be expected to keep, for example, elongated noise runs produced during scanning or fax transmission out of the delimiter candidates.
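A size check of this kind could look like the sketch below. The exact shape of the deletion region in FIG. 12 is not reproduced here; the rectangular rule (delete when the width or the height exceeds a limit), the threshold values, and the function name `keep_delimiter_candidates` are assumptions made only for illustration.

```python
def keep_delimiter_candidates(elements, max_width=8, max_height=20):
    """Keep only the elements small enough to be whisker-like delimiters.

    elements: list of (label, width, height) tuples, where width is measured
    along the ruled line and height away from it.
    max_width, max_height: hypothetical limits standing in for the deletion
    region of the width-height space.
    """
    return [
        (label, width, height)
        for (label, width, height) in elements
        if width <= max_width and height <= max_height
    ]


elements = [("tick", 4, 12), ("noise run", 40, 3), ("digit", 30, 60)]
print(keep_delimiter_candidates(elements))  # [('tick', 4, 12)]
```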

Furthermore, when the positions of the non-ruled lines are regularly arranged, it may also be determined whether those non-ruled lines have the same size and so on. That is, if their sizes and so on are uniform, they are adopted as delimiters. Non-ruled lines that deviate may then be corrected. This is described using the examples of FIGS. 13 and 14.
FIG. 14 is an explanatory diagram showing a processing example according to the present embodiment. As described above, this example has four divisions, and non-ruled line elements 824, 828, and 832 have been extracted as delimiters. However, in non-ruled line element 832 the digit "8" touches the delimiter. If non-ruled line element 832 itself were extracted as a delimiter, the digit "8" would be excluded from later processing such as character recognition. Conversely, if non-ruled line element 832 itself were subjected to character recognition, the digit might be misrecognized because it touches the delimiter.
Note that the circumscribed rectangles in FIG. 14 (circumscribed rectangles 1424, 1428, and 1432) are drawn larger than the actual circumscribed rectangles for ease of viewing.

FIG. 13 is a flowchart showing a processing example according to the present embodiment (non-line segment delimiter determination module 150).
In step S1302, the sizes of the partial images recognized as delimiters are measured, and their size uniformity and their basic point-likeness or line-segment-likeness are calculated. For example, as shown in FIG. 14, the sizes of circumscribed rectangle 1424 of non-ruled line element 824, circumscribed rectangle 1428 of non-ruled line element 828, and circumscribed rectangle 1432 of non-ruled line element 832 are measured. Size uniformity means, for example, that the differences among the elements in circumscribed-rectangle width, circumscribed-rectangle height, or width-to-height ratio are within a predetermined range. Point-likeness or line-segment-likeness means that the circumscribed-rectangle width, the circumscribed-rectangle height, or the width-to-height ratio is within a predetermined range.
In step S1304, elements of unexpected size are separated out. That is, the non-ruled lines that do not satisfy size uniformity, or that do not satisfy point-likeness or line-segment-likeness, are extracted.

In step S1306, clustering is performed within a predetermined threshold range. For example, the non-ruled lines within a predetermined range are clustered in the two-dimensional space of circumscribed-rectangle width and height. Clustering by size difference may also be used.
In step S1308, the cluster of small size is determined to be the delimiter candidates.
In step S1310, the elements belonging to the other clusters are decomposed so that the delimiter part matches the size of the delimiter candidates. That is, each such element is separated into a delimiter and the remaining character part. Specifically, using the size and positional relationship of the non-ruled lines in the delimiter-candidate cluster (the positional relationship being the difference between the position Yi where the non-ruled line actually is and the position Xi where the delimiter should be), the pixel block at that positional relationship is removed from the non-ruled line of unexpected size, to the extent of the delimiter-candidate size. In the example of FIG. 14, a portion of the same size as circumscribed rectangle 1424 and the others is deleted from within circumscribed rectangle 1432, at the location where the positional relationship between X3 and Y6 (the position of non-ruled line element 832) becomes the same as that between X1 and Y3 (the position of non-ruled line element 824) and between X2 and Y5 (the position of non-ruled line element 828). The pixel block of the original digit "8" is thereby extracted.
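Steps S1306 to S1310 can be sketched on bounding boxes as follows. This is a simplified illustration, not the embodiment's implementation: the `Box` type, the greedy clustering with a single tolerance `tol`, the use of average size and average offset, and the sample coordinates are all assumptions. A real implementation would then erase the returned rectangle from the pixel data and keep the rest as the character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int    # x coordinate of the left edge
    bottom: int  # y coordinate of the edge touching the ruled line
    width: int
    height: int


def cluster_by_size(boxes, tol):
    """Greedy clustering: a box joins the first cluster whose seed box
    differs from it by at most tol in both width and height."""
    clusters = []
    for box in boxes:
        for cluster in clusters:
            seed = cluster[0]
            if abs(box.width - seed.width) <= tol and abs(box.height - seed.height) <= tol:
                cluster.append(box)
                break
        else:
            clusters.append([box])
    return clusters


def delimiter_regions(boxes, expected_xs, tol=3):
    """For each element accepted as a delimiter, return the rectangle that
    should be treated as the delimiter itself.

    boxes[i] is the element matched to expected position expected_xs[i].
    Elements in the smallest-size cluster are returned unchanged; for the
    others, a delimiter-sized rectangle is placed at the same offset from
    the expected position as the well-formed delimiters have.
    """
    clusters = cluster_by_size(boxes, tol)
    small = min(clusters, key=lambda c: c[0].width * c[0].height)
    width = round(sum(b.width for b in small) / len(small))
    height = round(sum(b.height for b in small) / len(small))
    offsets = [b.left - x for b, x in zip(boxes, expected_xs) if b in small]
    offset = round(sum(offsets) / len(offsets))
    return [
        box if box in small else Box(x + offset, box.bottom, width, height)
        for box, x in zip(boxes, expected_xs)
    ]


# Hypothetical data: two clean ticks and one tick merged with a digit.
boxes = [Box(148, 100, 4, 12), Box(298, 100, 4, 12), Box(440, 100, 40, 60)]
print(delimiter_regions(boxes, [150, 300, 450])[2])
# Box(left=448, bottom=100, width=4, height=12)
```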

FIG. 15 is an explanatory diagram showing a processing example according to the present embodiment. As shown in the example of FIG. 15(a), when a preprinted character string is included in the character frame (for example, the name of the character frame or of the information to be entered in it, such as "Price", "Date of birth", or "Name"), whether a non-ruled line is a delimiter may be judged on the assumption that the space excluding the region of that character string (that is, the region where characters are to be written) is divided equally.
When a preprinted character string is included in the character frame, that string may implicitly mark an area where entries cannot be written. Such cases cannot be handled by the processing described above.
Therefore, when a preprinted character string is detected, the following processing is performed.
(1) Detect the preprinted portion.
A region within the character frame that contains non-ruled lines of at least a predetermined size is extracted as the preprinted portion. As shown in the example of FIG. 15(b), preprinted portion 1510 is extracted.
(2) Set the right end with a margin on the right side of the preprinted portion.
The preprinted portion is extended by adding a margin of predetermined width to its right side. As shown in the example of FIG. 15(b), the right end 1520 of the preprint frame is set. This amounts to assuming that a vertical ruled line exists at the position of the right end 1520 of the preprint frame.
(3) Perform the delimiter extraction processing in the remaining region.
In the example of FIG. 15(b), the delimiter extraction processing is performed on region 1530 other than the preprint.
Note that this processing example assumes layout registration of a blank form. If handwritten entries are already present, processing that distinguishes the preprinted portion from the handwritten entries may be performed, for example by discriminating printed characters from handwritten characters. Specifically, when the preprinted characters are known, the preprinted portion may be identified by pattern matching against those characters or the like.
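Steps (1) to (3) above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the criterion that an element is preprinted when the larger of its width and height reaches `min_size`, and the sample numbers are hypothetical and are not taken from the embodiment.

```python
def remaining_span(frame_left, frame_right, elements, min_size, margin):
    """Return the (left, right) span of the frame left over for handwriting.

    elements: list of (left, right, height) for non-ruled elements in the frame.
    min_size: elements whose width or height is at least this are treated as
        preprinted characters (an assumption of this sketch).
    margin: blank width added to the right of the preprinted portion.
    """
    preprint_rights = [
        right for (left, right, height) in elements
        if max(right - left, height) >= min_size
    ]
    if not preprint_rights:
        return frame_left, frame_right
    right_end = max(preprint_rights) + margin  # assumed vertical ruled line
    return min(right_end, frame_right), frame_right


def expected_positions(span, s):
    """Expected delimiter positions when the span is divided equally into s."""
    left, right = span
    return [left + i * (right - left) / s for i in range(1, s)]


# Hypothetical frame 0..600 with two preprinted characters and one tick.
elements = [(10, 40, 30), (45, 75, 30), (218, 222, 10)]
span = remaining_span(0, 600, elements, min_size=20, margin=25)
print(span)                         # (100, 600)
print(expected_positions(span, 4))  # [225.0, 350.0, 475.0]
```

The expected positions are then measured from the assumed right end of the preprint frame rather than from the left ruled line, so the regular-spacing test runs only on the region where characters are written.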

FIG. 16 is an explanatory diagram showing a processing example according to the present embodiment. In the case of a tabular format, when delimiter candidates are detected in a plurality of rows or columns, the delimiters detected in the respective rows or columns may be judged together.
In a tabular document, as shown in the example of FIG. 16, rows containing delimiters may be laid out in multiple tiers. In this case, vertically related cells often have entry fields with the same delimiters.
Therefore, instead of using the delimiter extraction result for each character frame directly, a preferred number of divisions is determined collectively from the results for the character frames of multiple rows. This improves accuracy.
The delimiter extraction described above is performed on each row. However, the extraction result here is treated as delimiter candidates, not as the final delimiter extraction result. For example, n1 divisions is the candidate for row 1610, n2 divisions for row 1620, ni divisions for row 1630, and nN divisions for row 1640.
The number of divisions may then be decided by majority vote over these candidates. Alternatively, the number of divisions may be decided by a probability calculation using ratios or the like.
Not only the number of divisions but also the division positions (delimiter positions) and the delimiter sizes may be unified across all character frames.
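The majority decision over rows can be sketched as follows. The function name `vote_division_count`, the handling of rows without a candidate (`None`), and the tie-break toward the smaller division number are assumptions of this sketch; the text only states that a majority decision may be used.

```python
from collections import Counter


def vote_division_count(candidates):
    """Pick one division number for all rows by majority vote.

    candidates: per-row candidate division numbers; None marks a row where
    no candidate was found. Ties are broken toward the smaller number
    (an assumption of this sketch).
    """
    counts = Counter(c for c in candidates if c is not None)
    if not counts:
        return None
    best = max(counts.values())
    return min(s for s, n in counts.items() if n == best)


print(vote_division_count([4, 4, 8, 4]))  # 4
```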

A hardware configuration example of the image processing apparatus of the present embodiment is described with reference to FIG. 17. The configuration shown in FIG. 17 is implemented by, for example, a personal computer (PC), and is a hardware configuration example that includes a data reading unit 1717 such as a scanner and a data output unit 1718 such as a printer.

The CPU (Central Processing Unit) 1701 is a control unit that executes processing according to a computer program describing the execution sequence of the modules described in the above embodiment, namely the data reading module 110, the line segment detection module 120, the cell configuration detection module 130, the non-ruled line element connection calculation module 140, the non-line segment delimiter determination module 150, and so on.

The ROM (Read Only Memory) 1702 stores programs, calculation parameters, and the like used by the CPU 1701. The RAM (Random Access Memory) 1703 stores programs used in execution by the CPU 1701 and parameters that change as appropriate during that execution. These are interconnected by a host bus 1704 consisting of a CPU bus or the like.

The host bus 1704 is connected via a bridge 1705 to an external bus 1706 such as a PCI (Peripheral Component Interconnect/Interface) bus.

The keyboard 1708 and the pointing device 1709, such as a mouse, are input devices operated by an operator. The display 1710 is a liquid crystal display, a CRT (Cathode Ray Tube), or the like, and displays various kinds of information as text or images.

The HDD (Hard Disk Drive) 1711 contains a hard disk (which may be flash memory or the like), drives the hard disk, and records or reproduces programs executed by the CPU 1701 and information. The hard disk stores the read image data, data representing ruled lines, data representing non-ruled lines, and the like. It also stores various other data and various computer programs.

The drive 1712 reads data or programs recorded on a mounted removable recording medium 1713, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, and supplies them to the RAM 1703, which is connected via the interface 1707, the external bus 1706, the bridge 1705, and the host bus 1704. The removable recording medium 1713 can also be used as a data recording area in the same way as the hard disk.

The connection port 1714 is a port for connecting an external connection device 1715 and has a connection unit such as USB or IEEE 1394. The connection port 1714 is connected to the CPU 1701 and other components via the interface 1707, the external bus 1706, the bridge 1705, the host bus 1704, and so on. The communication unit 1716 is connected to a communication line and executes data communication with external devices. The data reading unit 1717 is, for example, a scanner and executes document reading processing. The data output unit 1718 is, for example, a printer and executes document data output processing.

The hardware configuration of the image processing apparatus shown in FIG. 17 is only one configuration example. The present embodiment is not limited to the configuration shown in FIG. 17, and any configuration capable of executing the modules described in the present embodiment may be used. For example, some modules may be implemented in dedicated hardware (for example, an Application Specific Integrated Circuit (ASIC)), some modules may reside in an external system and be connected via a communication line, and a plurality of the systems shown in FIG. 17 may be connected to one another via communication lines and operate in cooperation. In particular, the configuration may be incorporated not only in a personal computer but also in a portable information communication device (including mobile phones, smartphones, mobile devices, and wearable computers), an information appliance, a robot, a copier, a fax machine, a scanner, a printer, or a multifunction device (an image processing apparatus having two or more of the functions of a scanner, printer, copier, fax machine, and the like).

In the description of the above embodiment, where a comparison with a predetermined value is expressed as "greater than or equal to", "less than or equal to", "greater than", or "less than", these may be read as "greater than", "less than", "greater than or equal to", and "less than or equal to", respectively, as long as no contradiction arises in the combination.

The program described above may be provided stored on a recording medium, or may be provided by communication means. In that case, for example, the program described above may be regarded as an invention of a "computer-readable recording medium on which a program is recorded".
A "computer-readable recording medium on which a program is recorded" is a computer-readable recording medium on which a program is recorded and which is used for installing, executing, or distributing the program.
Recording media include, for example: digital versatile discs (DVD), namely "DVD-R, DVD-RW, DVD-RAM, etc.", which are standards established by the DVD Forum, and "DVD+R, DVD+RW, etc.", which are standards established by DVD+RW; compact discs (CD), namely read-only memory (CD-ROM), CD recordable (CD-R), CD rewritable (CD-RW), and the like; Blu-ray (registered trademark) Disc; magneto-optical disk (MO); flexible disk (FD); magnetic tape; hard disk; read-only memory (ROM); electrically erasable and rewritable read-only memory (EEPROM (registered trademark)); flash memory; random access memory (RAM); and SD (Secure Digital) memory cards.
The program, or a part of it, may be recorded on such a recording medium for storage, distribution, and the like. It may also be transmitted by communication, using a transmission medium such as a wired network used for a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), the Internet, an intranet, an extranet, or the like, a wireless communication network, or a combination of these, and it may be carried on a carrier wave.
Furthermore, the program may be a part of another program, or may be recorded on a recording medium together with separate programs. It may be divided and recorded on a plurality of recording media. It may also be recorded in any form, such as compressed or encrypted, as long as it can be restored.

Description of symbols
100 ... Image information processing apparatus
110 ... Data reading module
120 ... Line segment detection module
130 ... Cell configuration detection module
140 ... Non-ruled line element connection calculation module
150 ... Non-line segment delimiter determination module
210 ... Image reading apparatus
230 ... Form processing apparatus
290 ... Communication line

Claims

An image processing apparatus comprising:
first extraction means for extracting ruled lines and non-ruled lines from an image; and
second extraction means for extracting the non-ruled lines as delimiters that divide an area surrounded by the ruled lines when the positions of the non-ruled lines on the ruled lines are regularly arranged.

The image processing apparatus according to claim 1, wherein the second extraction means extracts the non-ruled line as the delimiter when the non-ruled line is present at a position that equally divides the ruled line, with a predetermined width as the minimum width.
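The equal-division condition in the preceding claim can be illustrated with a minimal Python sketch. The function name `divides_equally`, the pixel tolerance `tol`, the use of x coordinates for the non-ruled line positions, and the requirement that every division point be occupied are all assumptions made for illustration; none of them appears in the specification.

```python
def divides_equally(line_start, line_end, tick_xs, min_width, tol=2.0):
    """Hypothetical check: do the non-ruled lines at tick_xs divide the
    ruled line [line_start, line_end] into equal parts that are at least
    min_width wide?  Assumes every division point carries a tick."""
    ticks = sorted(tick_xs)
    if not ticks:
        return False
    n_parts = len(ticks) + 1                      # n ticks -> n + 1 parts
    pitch = (line_end - line_start) / n_parts     # width of one part
    if pitch < min_width:                         # parts narrower than the minimum
        return False
    # Each tick must sit within tol of its ideal equal-division position.
    return all(
        abs(x - (line_start + (i + 1) * pitch)) <= tol
        for i, x in enumerate(ticks)
    )
```

Under these assumptions, ticks at 25, 50, and 75 on a ruled line running from 0 to 100 would be accepted for a minimum width of 10, and rejected for a minimum width of 30.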

The image processing apparatus according to claim 1, wherein the second extraction means targets non-ruled lines whose height and width are within a predetermined range.
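A size filter of the kind described in the preceding claim might look like the following Python sketch. The `Box` type, the function name `filter_candidates`, and the inclusive range bounds are hypothetical choices for illustration, not details taken from the specification.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical bounding box of one non-ruled line element."""
    x: int
    y: int
    width: int
    height: int


def filter_candidates(boxes, width_range, height_range):
    """Keep only the boxes whose width and height both fall inside the
    given inclusive (min, max) ranges."""
    w_min, w_max = width_range
    h_min, h_max = height_range
    return [
        b for b in boxes
        if w_min <= b.width <= w_max and h_min <= b.height <= h_max
    ]
```

Elements that are too wide (for example, a character stroke) or too short (for example, noise) are dropped before the regularity of the remaining positions is examined.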

The image processing apparatus according to any one of claims 1 to 3, wherein the second extraction means classifies the non-ruled lines extracted by the second extraction means into first non-ruled lines, which exceed the size of a delimiter, and other, second non-ruled lines, and divides the first non-ruled lines by using the size of the second non-ruled lines.
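One possible reading of the classification and division step in the preceding claim is sketched below in Python. It is an assumption-laden illustration: the function name `classify_and_split`, the `(x, width)` representation, the use of width as the relevant size, and the choice of the median as the representative size are all hypothetical and are not stated in the specification.

```python
from statistics import median


def classify_and_split(elements, max_width):
    """elements: list of (x, width) tuples for non-ruled line elements.
    Elements wider than max_width are treated as oversized and are cut
    into pieces roughly as wide as the median regular element."""
    oversized = [e for e in elements if e[1] > max_width]
    regular = [e for e in elements if e[1] <= max_width]
    if not regular:
        # No reference size is available, so nothing can be divided.
        return sorted(elements)
    unit = median(w for _, w in regular)          # representative size
    result = list(regular)
    for x, w in oversized:
        n = max(1, round(w / unit))               # number of pieces
        piece = w / n
        result.extend((x + i * piece, piece) for i in range(n))
    return sorted(result)
```

With regular elements of width 2 and an oversized element of width 6 starting at x = 50, this sketch yields three pieces of width 2 starting at 50, 52, and 54.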

The image processing apparatus according to claim 1, wherein, when a pre-printed character is included in the area surrounded by the ruled lines, the second extraction means extracts the non-ruled line as a delimiter that divides the part of the ruled line forming the area other than the character.

The image processing apparatus according to claim 1, wherein, when a plurality of areas surrounded by the ruled lines are arranged, the second extraction means extracts the non-ruled lines serving as delimiters in each area by using the non-ruled lines serving as delimiters in another area.
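The idea of reusing delimiters found in another area can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `match_by_reference`, the use of offsets from each area's left edge, and the tolerance `tol` are hypothetical and do not come from the specification.

```python
def match_by_reference(cell_left, candidate_xs, reference_offsets, tol=2.0):
    """Keep the candidates in one area whose offset from that area's
    left edge matches, within tol, a delimiter offset already found in
    another area (reference_offsets)."""
    return [
        x for x in candidate_xs
        if any(abs((x - cell_left) - off) <= tol for off in reference_offsets)
    ]
```

For an area whose left edge is at x = 100 and reference offsets of 10, 20, and 30, candidates at 110 and 131 would be kept, while a candidate at 117 would be discarded.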

An image processing program for causing a computer to function as:
first extraction means for extracting ruled lines and non-ruled lines from an image; and
second extraction means for extracting the non-ruled lines as delimiters that divide an area surrounded by the ruled lines when the positions of the non-ruled lines on the ruled lines are regularly arranged.