JP2023003887A

JP2023003887A - Document image processing system, document image processing method, and document image processing program

Info

Publication number: JP2023003887A
Application number: JP2021105251A
Authority: JP
Inventors: 福光齊藤; Fukumitsu Saito
Original assignee: Net Smile Inc
Current assignee: Net Smile Inc
Priority date: 2021-06-24
Filing date: 2021-06-24
Publication date: 2023-01-17

Abstract

To accurately identify an attribute of a cell in a table without using template data. SOLUTION: A table detection unit 22 detects a table in a document image and a cell in the table, and generates cell geometry data for the cell. A text object detection unit 23 detects a text object in the document image and generates text object geometry data for the text object. A cell attribute identification unit 25 identifies the text object in the cell based on the cell geometry data and the text object geometry data, generates, for each cell, node data including the cell geometry data, the text object geometry data of the text object in the cell, and text data of the text object in the cell, and executes predetermined classification processing on a node data set including the node data corresponding to the table to identify an attribute of the cell. SELECTED DRAWING: Figure 1

Description

The present invention relates to a document image processing system, a document image processing method, and a document image processing program.

In one form identification system, a form format table is created in advance by a user. The form format table contains field information indicating the position, size, character type, and so on of each character recognition target area specified by the user. Character information (text data) in the form is then obtained from the image data of the form image based on this form format (that is, the field information) (see, for example, Patent Document 1).

One image recognition device cuts out a partial image from a target image, recognizes the characters and numbers in the partial image, and executes extraction processing that extracts the characters and numbers satisfying a predetermined condition (see, for example, Patent Document 2). In the extraction processing, the image recognition device determines, for example, whether the recognized characters include a bank name set in advance, and if they do, extracts those characters and the numbers within a predetermined distance of them as a pair consisting of a bank name and an account number.

Patent Document 1: JP 2016-48444 A
Patent Document 2: JP 2020-170264 A

However, the form identification system described above uses template data that specifies the layout of a document such as a form (for example, information on the position where each attribute is written). To process multiple documents with different layouts, template data must therefore be created in advance for each layout, which requires laborious preparatory work. Moreover, for a document whose layout is unknown, it is difficult for that technique to accurately detect the attribute value of a given attribute in the document image.

The image recognition device described above does not need template data, but the character strings to be extracted (the bank names mentioned above) must be set in advance, and character strings that have not been set are not extracted. In addition, the image recognition device extracts numbers within a predetermined distance of the bank name as the account number. However, two text objects may be unrelated even when the distance between them is short, and may be related even when the distance between them is long, so the desired character string may not be extracted correctly.

FIG. 5 is a diagram showing an example of a document image including a table. For example, as shown in FIG. 5, a document image 101 of a health checkup report includes a table 111 showing the results of the health checkup. The table 111 contains combinations of a label indicating an examination item and a value that is the result for that examination item. In the table 111, cells are arranged two-dimensionally, with or without ruled lines, and each cell contains a label or a value corresponding to a label (a numerical value or a non-numeric character string). A label and a value may be unrelated even when the distance between them is short, and may be related even when the distance between them is long.

The present invention has been made in view of the above problems, and its object is to provide a document image processing system, a document image processing method, and a document image processing program that accurately identify the attributes of cells in a table without using template data.

A document image processing system according to the present invention includes: a table detection unit that detects a table in a document image, detects at least a cell in the table, and generates cell geometry data indicating the position and size of the cell; a text object detection unit that detects a text object in the document image and generates text object geometry data indicating the position and size of the text object; a character recognition processing unit that performs character recognition processing on the text object to generate text data corresponding to the text object; and a cell attribute identification unit that (a) identifies the text object in the cell based on the cell geometry data and the text object geometry data, (b) generates, for each cell, node data including the cell geometry data, the text object geometry data of the text object in the cell, and the text data of the text object in the cell, and generates a node data set including the node data corresponding to the table, and (c) executes predetermined classification processing on the node data set to identify, for each cell, an attribute of the cell.

A document image processing method according to the present invention includes the steps of: detecting a table in a document image, detecting at least a cell in the table, and generating cell geometry data indicating the position and size of the cell; detecting a text object in the document image and generating text object geometry data indicating the position and size of the text object; performing character recognition processing on the text object to generate text data corresponding to the text object; and (a) identifying the text object in the cell based on the cell geometry data and the text object geometry data, (b) generating, for each cell, node data including the cell geometry data, the text object geometry data of the text object in the cell, and the text data of the text object in the cell, and generating a node data set including the node data corresponding to the table, and (c) executing predetermined classification processing on the node data set to identify, for each cell, an attribute of the cell.

A document image processing program according to the present invention causes a computer to function as the table detection unit, the text object detection unit, the character recognition processing unit, and the cell attribute identification unit described above.

According to the present invention, a document image processing system, a document image processing method, and a document image processing program are obtained that accurately identify the attributes of cells in a table without using template data that specifies where a given attribute is written in the document image.

The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description taken together with the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a document image processing system according to an embodiment of the present invention.
FIG. 2 is a diagram explaining the node data generated in the document image processing system according to the embodiment of the present invention.
FIG. 3 is a diagram explaining the classification processing in the document image processing system according to the embodiment of the present invention.
FIG. 4 is a flowchart explaining the operation of the document image processing system shown in FIG. 1.
FIG. 5 is a diagram showing an example of a document image including a table.

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a document image processing system according to an embodiment of the present invention. The document image processing system shown in FIG. 1 consists of a single information processing device (a personal computer, a server, or the like), but the processing units described later may be distributed among multiple information processing devices capable of data communication with each other. Such information processing devices may include a GPU (Graphics Processing Unit) that performs specific computations in parallel.

The document image processing system shown in FIG. 1 includes a storage device 1, a communication device 2, an image reading device 3, and an arithmetic processing device 4.

The storage device 1 is a non-volatile storage device such as a flash memory or a hard disk, and stores various data and programs.

Here, the storage device 1 stores an image processing program 11 and, as necessary, system setting data (such as the coefficient settings of the neural networks used by the processing units described later). The image processing program 11 may be stored on a portable computer-readable recording medium such as a CD (Compact Disc). In that case, for example, the image processing program 11 is installed from the recording medium onto the storage device 1. The image processing program 11 may be a single program or a collection of multiple programs.

The communication device 2 is a device capable of data communication, such as a network interface, a peripheral device interface, or a modem, and communicates with other devices as necessary. The image reading device 3 optically reads a document image from a document and generates image data (raster image data or the like) of the document image. The communication device 2 and the image reading device 3 are provided as necessary.

The arithmetic processing device 4 is a computer including a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and so on. It operates as various processing units by loading a program from the ROM, the storage device 1, or the like into the RAM and executing it on the CPU.

Here, by executing the image processing program 11, the arithmetic processing device 4 operates as a document image acquisition unit 21, a table detection unit 22, a text object detection unit 23, a character recognition processing unit 24, a cell attribute identification unit 25, a data output unit 26, and a machine learning processing unit 27.

The document image acquisition unit 21 acquires a document image as image data such as raster image data. The document image is an image of a document that contains, in a table, attribute labels (text such as headings) and attribute values (text such as numerical values and other character strings) for one or more attributes (such as entry items). Examples include forms such as receipts, invoices, and delivery slips, flyers such as advertisements and notices, completed questionnaires, and health checkup reports. For example, the document image acquisition unit 21 reads a document image stored as image data in the storage device 1, acquires a document image received as image data by the communication device 2 over a communication path such as a network, or acquires a document image generated as image data by the image reading device 3.

The table detection unit 22 detects a table in the acquired document image without using template data and generates table geometry data indicating the position and size of the table. It also detects at least the cells in the table and generates cell geometry data indicating the position and size of each cell.

In this embodiment, the table detection unit 22 detects at least one of the rows and the columns of the table along with its cells, and generates at least one (here, both) of row geometry data indicating the position and size of each row and column geometry data indicating the position and size of each column.

The table detection unit 22 detects the table in the document image and the rows, columns, and cells in the table, and generates their geometry data, according to an existing technique that uses a neural network.

The text object detection unit 23 detects text objects in the acquired document image without using template data and generates text object geometry data indicating the position and size of each text object.

Specifically, the text object detection unit 23 (a) detects character objects in the document image while excluding non-character objects (photo objects, graphic objects, ruled line objects, and so on), and (b) groups the character objects into "words" based on their positions to extract text objects.
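The grouping in step (b) can be illustrated with a minimal sketch. The source does not specify the grouping rule, so the gap threshold `max_gap`, the `CharBox` type, and the single-line assumption below are all hypothetical choices made for illustration only.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """Bounding box of one detected character (hypothetical type)."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def group_into_words(chars: List[CharBox], max_gap: float) -> List[List[CharBox]]:
    """Group characters on one text line into words by horizontal gap."""
    words: List[List[CharBox]] = []
    for ch in sorted(chars, key=lambda c: c.x):
        if words:
            prev = words[-1][-1]
            gap = ch.x - (prev.x + prev.width)
            if gap <= max_gap:  # close enough: same word
                words[-1].append(ch)
                continue
        words.append([ch])      # otherwise start a new word
    return words


def bounding_box(word: List[CharBox]) -> CharBox:
    """Smallest box enclosing all characters of a word (the text object)."""
    left = min(c.x for c in word)
    top = min(c.y for c in word)
    right = max(c.x + c.width for c in word)
    bottom = max(c.y + c.height for c in word)
    return CharBox(left, top, right - left, bottom - top)


chars = [CharBox(0, 0, 10, 12), CharBox(11, 0, 10, 12), CharBox(40, 0, 10, 12)]
words = group_into_words(chars, max_gap=3)
word_boxes = [bounding_box(w) for w in words]
```

Here the first two characters are merged into one text object and the third becomes its own text object; a real system would also have to separate text lines before grouping.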

The text object detection unit 23 extracts the character objects in the document image using existing technology (for example, region separation processing or a machine-learned deep neural network).

The character recognition processing unit 24 performs character recognition processing on each detected text object (a raster image) to generate text data (a character code string) corresponding to the text object. Existing technology is used for this character recognition processing.

The cell attribute identification unit 25 (a) identifies the text objects in each cell based on the cell geometry data and the text object geometry data described above, (b) generates, for each cell, node data including the cell geometry data, the text object geometry data of the text objects in the cell, and the text data of the text objects in the cell, and generates a node data set including the node data corresponding to the table, and (c) executes predetermined classification processing on the node data set to identify the attribute of each cell.

The positions of cells, rows, columns, and text objects in the node data are coordinate values in a two-dimensional coordinate system, and their sizes are the lengths along each of the two coordinate axes. A position is given by the position of a predetermined part (one of the four corners, the center, or the like) of the rectangular region of the cell, row, column, or text object, and may be expressed relative to a predetermined part (one of the four corners, the center, or the like) of the table. This relative position is derived from the absolute position of the table within the document image (the position in the table geometry data) and the absolute position of the cell, row, column, or text object (the position in its original geometry data).
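As a small illustration of the relative position described above, the following sketch subtracts the table's top-left corner from a cell's top-left corner. The `Box` type and the choice of the top-left corner as the "predetermined part" are assumptions; the source allows any corner or the center.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: top-left corner (x, y) plus width and height."""
    x: float
    y: float
    width: float
    height: float


def to_table_relative(box: Box, table: Box) -> Box:
    """Express a box position relative to the table's top-left corner.

    The size is unchanged; only the origin moves.
    """
    return Box(box.x - table.x, box.y - table.y, box.width, box.height)


table = Box(100.0, 200.0, 400.0, 300.0)   # absolute position in the image
cell = Box(150.0, 260.0, 80.0, 20.0)      # absolute position in the image
relative_cell = to_table_relative(cell, table)
```

Using table-relative coordinates makes the node data independent of where the table happens to sit on the page.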

Specifically, for each detected cell, the cell attribute identification unit 25 determines that a text object is a text object in that cell when the region of the text object (its bounding box), identified from the text object geometry data, is contained in the region of the cell identified from the cell geometry data.
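The containment test can be sketched as follows. The `Box` type and function names are hypothetical, and the sketch assumes strict containment of the whole bounding box with no tolerance, which is one possible reading of the description.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def contains(outer: Box, inner: Box) -> bool:
    """True when the inner rectangle lies entirely within the outer one."""
    return (
        outer.x <= inner.x
        and outer.y <= inner.y
        and inner.x + inner.width <= outer.x + outer.width
        and inner.y + inner.height <= outer.y + outer.height
    )


def text_objects_in_cell(cell: Box, text_boxes: List[Box]) -> List[int]:
    """Return the indices of the text objects whose boxes fall inside the cell."""
    return [i for i, box in enumerate(text_boxes) if contains(cell, box)]


cell = Box(0, 0, 100, 30)
texts = [Box(5, 5, 40, 12), Box(90, 5, 40, 12), Box(200, 5, 40, 12)]
inside = text_objects_in_cell(cell, texts)
```

Only the first text object is assigned to the cell; the second overlaps the cell border and the third lies outside it.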

FIG. 2 is a diagram explaining the node data generated in the document image processing system according to the embodiment of the present invention. FIG. 3 is a diagram explaining the classification processing in the document image processing system according to the embodiment of the present invention.

In this embodiment, as shown in FIG. 2, the node data for a cell includes, in addition to the cell geometry data of the cell, the text object geometry data of the text objects in the cell, and the text data of the text objects in the cell, at least one (here, both) of the row geometry data of the row to which the cell belongs and the column geometry data of the column to which the cell belongs.
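One way to picture such node data is as a small record type. The field names and the `(x, y, width, height)` tuple layout below are assumptions for illustration; the source does not define a concrete data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical layout: (x, y, width, height), relative to the table origin.
Geometry = Tuple[float, float, float, float]


@dataclass
class NodeData:
    """Node data for one cell, following the items listed in the description."""
    cell: Geometry                       # cell geometry data
    row: Geometry                        # geometry of the row the cell belongs to
    column: Geometry                     # geometry of the column the cell belongs to
    text_geometries: List[Geometry] = field(default_factory=list)
    texts: List[str] = field(default_factory=list)   # OCR text per text object


node = NodeData(
    cell=(50.0, 60.0, 80.0, 20.0),
    row=(0.0, 60.0, 400.0, 20.0),
    column=(50.0, 0.0, 80.0, 300.0),
    text_geometries=[(55.0, 63.0, 30.0, 14.0)],
    texts=["161.0"],
)
node_data_set = [node]   # one entry per detected cell of the table
```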

In this embodiment, as shown in FIG. 3, the cell attribute identification unit 25 inputs the node data set described above as input data to a machine-learned graph neural network (GNN) and identifies the output data of the GNN as the attributes of the cells. An existing GNN and existing machine learning techniques for it can be used.

The attribute of a cell includes at least the cell type of the cell, and the cell type is either label or attribute value (that is, in this case, each cell is classified as either a label cell or an attribute value cell). For each node data item, the output data of the GNN is a probability (a numerical value in the range 0 to 1) for each possible value of the cell type (here, label and attribute value) of the cell corresponding to that node data. Based on these probabilities, the cell type is determined to be one of its possible values (here, label or attribute value), for example by classification using a threshold.
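The threshold-based decision can be sketched as below. The threshold of 0.5, the function name, and returning `None` for an unreliable result are assumptions; the source says only that the cell type is decided from the probabilities, for example with a threshold.

```python
from typing import Dict, Optional


def decide_cell_type(
    probabilities: Dict[str, float], threshold: float = 0.5
) -> Optional[str]:
    """Pick the most probable cell type if it reaches the threshold.

    Returns None when no cell type is probable enough, so that the caller
    can treat the result as unreliable.
    """
    best_type = max(probabilities, key=probabilities.get)
    return best_type if probabilities[best_type] >= threshold else None


# Hypothetical per-node outputs of the classifier.
confident = decide_cell_type({"label": 0.91, "value": 0.09})
uncertain = decide_cell_type({"label": 0.48, "value": 0.45})
```

In this sketch `confident` is `"label"` and `uncertain` is `None`.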

The cell attribute identification unit 25 may instead cluster the node data based on feature values derived from the node data, classify the node data into clusters, and identify the attributes of the cells based on those clusters. For example, the cell attribute identification unit 25 may identify the cell attributes by this clustering instead of the GNN, or may fall back to this clustering when the reliability of the attributes identified by the GNN is low.

For example, the feature value is a feature vector corresponding to the node data (all or a specific part of it), generated by an existing technique such as Word2vec (Skip-Gram model). A large number of combinations of node data and the cell type value for that node data are collected in advance, and a center value (the mean of the feature vectors) is determined for each cell type value (here, label or attribute value) as the center of the corresponding cluster. The cell type value (here, label or attribute value) whose center is closest to the position indicated by the feature vector of the node data to be classified is then selected as the cell type value for that node data.
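The nearest-center selection can be sketched with toy two-dimensional vectors. Real feature vectors would come from an embedding such as Word2vec; the vectors, names, and the use of Euclidean distance here are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def centroids(samples: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Mean feature vector per cell type, used as the cluster center."""
    return {
        cell_type: [sum(column) / len(vectors) for column in zip(*vectors)]
        for cell_type, vectors in samples.items()
    }


def nearest_cell_type(vector: Sequence[float], centers: Dict[str, List[float]]) -> str:
    """Cell type whose center is closest (Euclidean distance) to the vector."""
    return min(centers, key=lambda cell_type: math.dist(vector, centers[cell_type]))


# Toy labelled feature vectors (hypothetical data).
training = {
    "label": [[0.9, 0.1], [0.7, 0.3]],
    "value": [[0.1, 0.8], [0.3, 1.0]],
}
centers = centroids(training)
predicted = nearest_cell_type([0.2, 0.7], centers)
```

The query vector lies much closer to the "value" center, so `predicted` is `"value"`. `math.dist` requires Python 3.8 or later.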

The data output unit 26 adds to each node data item the cell attribute corresponding to that node data, and stores the node data set in the storage device 1 in a predetermined data format or transmits it via the communication device 2.

With this output data (the node data set), it is possible, for example, to identify the label cells and attribute value cells in a column, or the label cells and attribute value cells in a row.

The machine learning processing unit 27 executes machine learning processing that trains the GNN in the cell attribute identification unit 25 described above. The machine learning processing unit 27 is not essential and may be provided as necessary. When the machine learning of the cell attribute identification unit 25 (the GNN) has been completed, the machine learning processing unit 27 need not be provided.

Next, the operation of the document image processing system according to this embodiment will be described. FIG. 4 is a flowchart explaining the operation of the document image processing system shown in FIG. 1.

First, the document image acquisition unit 21 acquires a document image (step S1).

Next, the table detection unit 22 detects a table in the document image without using template data, detects the cells, columns, and rows in the table, and generates cell geometry data, column geometry data, and row geometry data (step S2).

The text object detection unit 23 also detects text objects in the document image without using template data and generates text object geometry data (step S2). The character recognition processing unit 24 performs character recognition processing on each detected text object and generates the text data of that text object.

The processing by the table detection unit 22 and the processing by the text object detection unit 23 described above may be executed in parallel, or, when they are executed sequentially, in either order.
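Because the two detection steps do not depend on each other, they can be submitted as independent tasks. The sketch below uses Python's standard `concurrent.futures`; the two detector functions are hypothetical stand-ins that return fixed results, not the detectors of the described system.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_table(image):
    """Hypothetical stand-in for table detection; returns cell boxes."""
    return {"cells": [(0, 0, 100, 30)]}


def detect_text_objects(image):
    """Hypothetical stand-in for text object detection; returns text boxes."""
    return [(5, 5, 40, 12)]


def run_detection(image):
    """Run both detectors concurrently and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        table_future = pool.submit(detect_table, image)
        text_future = pool.submit(detect_text_objects, image)
        return table_future.result(), text_future.result()


table, text_objects = run_detection(image=None)
```

Whether threads, processes, or separate machines are appropriate depends on how the actual detectors are implemented.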

Then, for each detected cell, the cell attribute identification unit 25 identifies the text objects in the cell based on the cell geometry data and the text object geometry data described above, and generates node data including the cell geometry data, the text object geometry data of the text objects in the cell, the text data of the text objects in the cell, and so on (step S3).

Next, for each table, the cell attribute identification unit 25 generates a node data set from the node data corresponding to all the cells detected in that table, executes predetermined classification processing on the node data set, and identifies the attribute of each cell (here, the cell type, which is label or attribute value) (step S4).

The data output unit 26 then, for example, adds the attribute of each cell to the node data corresponding to that cell and, for each table, stores the node data set as output data in the storage device 1 in a predetermined data format or transmits it via the communication device 2 (step S5).
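A minimal sketch of this output step follows. The source says only "a predetermined data format", so JSON, the dictionary keys, and the function name are all assumptions made for illustration.

```python
import json

# Hypothetical node data for two cells and the cell types found for them.
node_data_set = [
    {"cell": [50, 60, 80, 20], "text": "Height"},
    {"cell": [130, 60, 80, 20], "text": "161.0"},
]
cell_types = ["label", "value"]


def attach_attributes(nodes, attributes):
    """Return copies of the node data with the cell attribute added."""
    return [dict(node, cell_type=attr) for node, attr in zip(nodes, attributes)]


# ensure_ascii=False keeps non-ASCII text (e.g. Japanese labels) readable.
output = json.dumps(attach_attributes(node_data_set, cell_types), ensure_ascii=False)
```

The resulting string could then be written to storage or sent over a network.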

In this way, cell-level attribute data is generated for each table in the document image.

The data output unit 26 may also accept a search request in a predetermined format and, in accordance with the search request, search the node data set generated as described above for the attribute value corresponding to the label specified by the request, and output the combination of the label and the attribute value. For example, the cell, row, and column containing the label to be searched for are first identified in the generated node data set, and whichever of that row or column contains attribute value cells is identified. If that row or column contains one attribute value cell, its attribute value is identified as the attribute value corresponding to the label. If it contains multiple attribute value cells, those attribute values are identified as the attribute values corresponding to the label, and the label of the row or column of each attribute value cell (the column when the attribute value cells were found in the row containing the searched label, the row when they were found in the column containing the searched label) may be attached to the respective attribute value.

For example, when "height" is specified as the search target in the node data set of the table 111 in FIG. 5, three attribute values, "161.0", "161.2", and "161.1", are detected in the row containing "height"; "161.0" is given the label "this time", "161.2" the label "previous time", and "161.1" the label "time before last".
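The search just described can be sketched as follows. For brevity the sketch assumes each node already carries integer row and column indices instead of row and column geometry data, and the `Node` type, the function name, and the English label strings are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    row: int
    col: int
    cell_type: str   # "label" or "value"
    text: str


def find_values(nodes: List[Node], label: str) -> List[Tuple[Optional[str], str]]:
    """Return (cross label, attribute value) pairs for the searched label."""
    target = next(n for n in nodes if n.cell_type == "label" and n.text == label)
    # Look for attribute value cells in the label's row first, then its column.
    values = [n for n in nodes if n.cell_type == "value" and n.row == target.row]
    along_row = bool(values)
    if not along_row:
        values = [n for n in nodes if n.cell_type == "value" and n.col == target.col]
    results = []
    for value in values:
        # Values found in the row get their column's label, and vice versa.
        cross_labels = [
            n.text for n in nodes
            if n.cell_type == "label" and n is not target
            and (n.col == value.col if along_row else n.row == value.row)
        ]
        results.append((cross_labels[0] if cross_labels else None, value.text))
    return results


nodes = [
    Node(0, 1, "label", "this time"),
    Node(0, 2, "label", "previous time"),
    Node(0, 3, "label", "time before last"),
    Node(1, 0, "label", "height"),
    Node(1, 1, "value", "161.0"),
    Node(1, 2, "value", "161.2"),
    Node(1, 3, "value", "161.1"),
]
heights = find_values(nodes, "height")
```

For this toy table the result pairs each height with the label of its column, mirroring the example in the text.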

A search request may also be allowed to specify two labels as the labels to be searched for. In that case, attribute values are detected for one of the two labels in the same way as described above, and among the detected attribute values, the one whose cell belongs to a row or column whose label matches the other label is determined to be the attribute value corresponding to the two labels and is detected.
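A two-label search can be sketched as an intersection lookup: the attribute value cell that shares a row with one label and a column with the other. This collapses the two-step procedure of the text into a single test and again assumes hypothetical integer row and column indices on each node.

```python
from typing import List, NamedTuple, Optional


class Node(NamedTuple):
    row: int
    col: int
    cell_type: str   # "label" or "value"
    text: str


def find_value(nodes: List[Node], label_a: str, label_b: str) -> Optional[str]:
    """Attribute value at the intersection of the two labels, if any."""
    labels = {n.text: n for n in nodes if n.cell_type == "label"}
    a, b = labels.get(label_a), labels.get(label_b)
    if a is None or b is None:
        return None
    # The value shares a row with one label and a column with the other.
    wanted = {(a.row, b.col), (b.row, a.col)}
    for n in nodes:
        if n.cell_type == "value" and (n.row, n.col) in wanted:
            return n.text
    return None


nodes = [
    Node(0, 1, "label", "this time"),
    Node(0, 2, "label", "previous time"),
    Node(1, 0, "label", "height"),
    Node(1, 1, "value", "161.0"),
    Node(1, 2, "value", "161.2"),
]
previous_height = find_value(nodes, "height", "previous time")
```

The lookup does not depend on the order in which the two labels are given.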

As described above, according to the above embodiment, the table detection unit 22 detects a table in a document image, detects at least the cells in the table, and generates cell geometry data indicating the position and size of each cell. The text object detection unit 23 detects text objects in the document image and generates text object geometry data indicating the position and size of each text object. The character recognition processing unit 24 performs character recognition processing on each text object to generate text data corresponding to the text object. The cell attribute identification unit 25 (a) identifies the text objects in each cell based on the cell geometry data and the text object geometry data, (b) generates, for each cell, node data including the cell geometry data, the text object geometry data of the text objects in the cell, and the text data of the text objects in the cell, and generates a node data set including the node data corresponding to the table, and (c) executes predetermined classification processing on the node data set to identify the attribute of each cell.

As a result, the attributes of the cells in a table are accurately identified without using template data. In addition, because the attribute value corresponding to a label is detected without considering the (Euclidean) distance between the label and the attribute value, the combinations of labels and attribute values in the table are accurately identified regardless of that distance.

Various changes and modifications to the embodiment described above will be apparent to those skilled in the art. Such changes and modifications may be made without departing from the spirit and scope of the subject matter and without diminishing its intended advantages. That is, such changes and modifications are intended to be covered by the claims.

For example, in the above embodiment, the image data of the document image may be deleted from the system immediately after the processing described above is completed.

In the above embodiment, the number of nodes in the input data of the GNN is set to a predetermined value that is equal to or greater than the maximum number of node data items in a node data set (that is, the maximum number of cells in a table). When the number of node data items is smaller than the number of nodes in the GNN input data, a fixed value is used as the node data for the missing nodes, and the output data corresponding to those nodes is discarded.
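The padding scheme can be sketched as below. This is an illustration under assumptions: the function names, the feature-vector representation of node data, and the choice of 0.0 as the fixed value are all hypothetical; the text only says that a fixed value fills the missing nodes and that the matching outputs are discarded.

```python
from typing import List, Sequence, Tuple


def pad_node_features(
    features: Sequence[Sequence[float]],
    max_nodes: int,
    dim: int,
    pad_value: float = 0.0,  # assumed fixed value for the missing node data
) -> Tuple[List[List[float]], int]:
    """Pad the node feature list up to the fixed GNN input size."""
    real_count = len(features)
    if real_count > max_nodes:
        raise ValueError("table has more cells than the GNN input allows")
    padded = [list(f) for f in features]
    padded += [[pad_value] * dim for _ in range(max_nodes - real_count)]
    return padded, real_count


def drop_padding_outputs(outputs: Sequence[str], real_count: int) -> List[str]:
    """Discard the output data that corresponds to padded nodes."""
    return list(outputs[:real_count])


features = [[0.1, 0.2], [0.3, 0.4]]           # two real cells
padded, n = pad_node_features(features, max_nodes=4, dim=2)
model_outputs = ["label", "value", "label", "label"]  # stand-in for GNN output
print(drop_padding_outputs(model_outputs, n))  # ['label', 'value']
```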

In the above embodiment, the attribute of a cell is its cell type, and the cell type is either a label or an attribute value. However, the cell type may take further values in addition to label and attribute value, such as header (for example, the title of the table) and other.

In the above embodiment, rows and columns need not be detected, and the node data need not include row geometry data or column geometry data. Even in that case, the attribute value corresponding to a label can be detected by searching for attribute-value cells in the horizontal and vertical directions from the position of the label cell, based on the cell positions in the cell geometry data.
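A search that uses only cell geometry can be sketched as follows. This is an illustration under assumptions: the `Cell` type, the `"label"`/`"value"` type strings, the restriction to cells to the right of and below the label, and the overlap test used to decide that a cell lies "along" a direction are all hypothetical choices, not details given in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    x: float
    y: float
    w: float
    h: float
    cell_type: str  # "label" or "value" (assumed encoding of the cell type)
    text: str


def _overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    # Length of the intersection of intervals [a0, a1] and [b0, b1].
    return min(a1, b1) - max(a0, b0)


def find_values_for_label(label: Cell, cells: List[Cell]) -> List[Cell]:
    values = [c for c in cells if c.cell_type == "value"]
    # Horizontal search: value cells to the right that share vertical extent.
    right = [
        c for c in values
        if c.x >= label.x + label.w
        and _overlap(label.y, label.y + label.h, c.y, c.y + c.h) > 0
    ]
    # Vertical search: value cells below that share horizontal extent.
    below = [
        c for c in values
        if c.y >= label.y + label.h
        and _overlap(label.x, label.x + label.w, c.x, c.x + c.w) > 0
    ]
    return sorted(right, key=lambda c: c.x) + sorted(below, key=lambda c: c.y)


cells = [
    Cell(0, 0, 50, 20, "label", "Total"),
    Cell(50, 0, 50, 20, "value", "1,200"),
    Cell(0, 20, 50, 20, "label", "Date"),
    Cell(50, 20, 50, 20, "value", "2021-06-24"),
]
found = find_values_for_label(cells[0], cells)
print([c.text for c in found])  # ['1,200']
```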

The present invention is applicable, for example, to recognition processing of document images such as forms.

4 Arithmetic processing unit (an example of a computer)
11 Image processing program (an example of a document image processing program)
22 Table detection unit
23 Text object detection unit
24 Character recognition processing unit
25 Cell attribute identification unit

Claims

1. A document image processing system comprising:
a table detection unit that detects a table in a document image, detects at least a cell in the table, and generates cell geometry data indicating the position and size of the cell;
a text object detection unit that detects a text object in the document image and generates text object geometry data indicating the position and size of the text object;
a character recognition processing unit that performs character recognition processing on the text object to generate text data corresponding to the text object; and
a cell attribute identification unit that (a) identifies the text object in the cell based on the cell geometry data and the text object geometry data, (b) generates, for each cell, node data including the cell geometry data, the text object geometry data of the text object in the cell, and the text data of the text object in the cell, and generates a node data set including the node data corresponding to the table, and (c) executes a predetermined classification process on the node data set to identify, for each cell, an attribute of the cell.

2. The document image processing system according to claim 1, wherein the cell attribute identification unit inputs the node data set as input data to a machine-learned graph neural network and identifies the attribute of the cell from the output data of the graph neural network.

3. The document image processing system according to claim 1, wherein the cell attribute identification unit performs clustering on the node data based on feature amounts indicated by the node data, classifies the node data into clusters, and identifies the attribute of the cell based on the clusters.

4. The document image processing system according to any one of claims 1 to 3, wherein:
the table detection unit detects at least one of rows and columns together with the cells in the table, and generates at least one of row geometry data indicating the position and size of the row and column geometry data indicating the position and size of the column; and
the node data of a cell further includes at least one of the row geometry data of the row to which the cell belongs and the column geometry data of the column to which the cell belongs.

5. The document image processing system according to any one of claims 1 to 4, wherein:
the attribute of the cell includes at least a cell type of the cell; and
the cell type includes a label and an attribute value.

6. A document image processing method comprising:
detecting a table in a document image, detecting at least a cell in the table, and generating cell geometry data indicating the position and size of the cell;
detecting a text object in the document image and generating text object geometry data indicating the position and size of the text object;
performing character recognition processing on the text object to generate text data corresponding to the text object; and
(a) identifying the text object in the cell based on the cell geometry data and the text object geometry data, (b) generating, for each cell, node data including the cell geometry data, the text object geometry data of the text object in the cell, and the text data of the text object in the cell, and generating a node data set including the node data corresponding to the table, and (c) executing a predetermined classification process on the node data set to identify, for each cell, an attribute of the cell.

7. A document image processing program that causes a computer to function as:
a table detection unit that detects a table in a document image, detects at least a cell in the table, and generates cell geometry data indicating the position and size of the cell;
a text object detection unit that detects a text object in the document image and generates text object geometry data indicating the position and size of the text object;
a character recognition processing unit that performs character recognition processing on the text object to generate text data corresponding to the text object; and
a cell attribute identification unit that (a) identifies the text object in the cell based on the cell geometry data and the text object geometry data, (b) generates, for each cell, node data including the cell geometry data, the text object geometry data of the text object in the cell, and the text data of the text object in the cell, and generates a node data set including the node data corresponding to the table, and (c) executes a predetermined classification process on the node data set to identify, for each cell, an attribute of the cell.