JP7317886B2

JP7317886B2 - Information processing device and information processing method

Info

Publication number: JP7317886B2
Application number: JP2021067197A
Authority: JP
Inventors: 直志綿貫; 孝治菊池; 正也大島; 真福沢; 東谷内
Original assignee: Primagest Inc
Current assignee: Primagest Inc
Priority date: 2021-04-12
Filing date: 2021-04-12
Publication date: 2023-07-31
Anticipated expiration: 2041-04-12
Also published as: JP2022162380A

Description

The present invention relates to an information processing device, an information processing method, and a program.

Conventionally, there are techniques that use image data representing a form to automatically recognize the character strings present on the form and convert them into data, including the correspondences between the character strings (see, for example, Patent Document 1). After a form has been recognized, subsequent processing is usually performed, in which an operator checks the recognition results and corrects them as necessary.

Patent Document 1: JP 2009-122722 A

In conventional techniques, including that of Patent Document 1, all character strings on a form are recognized together with their correspondences, and the recognition results are passed to subsequent processing. However, it is not always necessary to check every recognition result in the subsequent processing.

Depending on the application, configuration, and so on, a form may contain character strings that need not be checked at all, or that have little need to be checked, and the subsequent processing may have to check the recognition results of such a form. In addition, because of the burden on the operator or time constraints, situations may arise in which only character strings of relatively high importance need to be checked in the subsequent processing. In either situation, the operator must search the recognition results for those that should be checked, which makes it difficult to carry out the subsequent processing quickly.

The present invention has been made in view of such circumstances, and an object thereof is to provide a technique that enables an operator to more quickly perform subsequent processing, appropriate to the situation, for checking the recognition results of a form.

To achieve the above object, an information processing device according to one aspect of the present invention includes:
character string recognition means for using data of an image representing a form to recognize a plurality of character strings present on the form, each consisting of one or more consecutive characters, and for specifying position information, in the image, of each of the recognized character strings;
correspondence determination means for determining, using the recognition result of each of the plurality of character strings by the character string recognition means and the position information of the plurality of character strings, the correspondence between predetermined pairs of character strings among the plurality of character strings present on the form; and
identification means for identifying, from among the predetermined pairs of character strings whose correspondence has been determined by the correspondence determination means, a pair of character strings that satisfies a predetermined condition.
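The three means above can be illustrated with a minimal Python sketch. It assumes the recognition step has already produced each string's text and a bounding box in pixels (the `RecognizedString` type is hypothetical). The pairing rule used here (link each string to its nearest right-hand neighbor on the same line) and the "predetermined condition" (the first string of the pair belongs to a set of keys) are both placeholders chosen for illustration; the source does not specify either rule at this point, and the sample strings other than "納期限" are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognizedString:
    """A recognized character string and its position in the form image (pixels)."""
    text: str
    x: int
    y: int
    width: int
    height: int


def determine_pairs(strings):
    """Placeholder correspondence rule: pair each string with the nearest string
    that starts at or to the right of its right edge and overlaps it vertically."""
    pairs = []
    for s in strings:
        candidates = [
            t for t in strings
            if t is not s
            and t.x >= s.x + s.width                            # t lies to the right
            and t.y < s.y + s.height and s.y < t.y + t.height   # same line
        ]
        if candidates:
            nearest = min(candidates, key=lambda t: t.x - (s.x + s.width))
            pairs.append((s, nearest))
    return pairs


def identify_pairs(pairs, keys):
    """Placeholder 'predetermined condition': the first string is one of `keys`."""
    wanted = set(keys)
    return [(k, v) for k, v in pairs if k.text in wanted]


strings = [
    RecognizedString("納期限", 10, 100, 60, 20),
    RecognizedString("平成26年12月20日", 100, 100, 150, 20),
    RecognizedString("金額", 10, 140, 40, 20),
    RecognizedString("1,000", 100, 140, 60, 20),
]
selected = identify_pairs(determine_pairs(strings), {"納期限"})
print([(k.text, v.text) for k, v in selected])  # [('納期限', '平成26年12月20日')]
```

Filtering after pairing, rather than recognizing fewer strings, keeps the full recognition result available while handing the operator only the pairs that need checking.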

According to the present invention, the operator can more quickly perform subsequent processing, appropriate to the situation, for checking the recognition results of a form.

FIG. 1 is a diagram explaining an overview of a first service that can be realized by a form recognition device according to an embodiment of the information processing device of the present invention.
FIG. 2 is a diagram explaining an example of the extraction results of tables, cells, and character string regions extracted from a form image.
FIG. 3 is a diagram showing display examples of the various extraction results of tables, cells, and character string regions in a display area.
FIG. 4 is a diagram explaining an overview of a second service that can be realized by the form recognition device according to the embodiment.
FIG. 5 is a diagram showing an example of a pair of two character strings identified from a form.
FIG. 6 is a diagram showing a configuration example of an information processing system built using the form recognition device according to the embodiment.
FIG. 7 is a block diagram showing an example of the hardware configuration of the form recognition device according to the embodiment.
FIG. 8 is a functional block diagram showing an example of the functional configuration implemented on the form recognition device according to the embodiment.
FIG. 9 is a flowchart showing an example of cell detection processing executed by the form recognition device according to the embodiment.
FIG. 10 is a diagram showing examples of components identified as possibly not being properly connected to other components.
FIG. 11 is a diagram explaining a first example of the processing executed in step S13.
FIG. 12 is a diagram explaining a second example of the processing executed in step S13.
FIG. 13 is a diagram showing an example of a form image, the cells detected from it, and the grouping of the detected cells.
FIG. 14 is a flowchart showing an example of perimeter acquisition processing executed by the form recognition device according to the embodiment.
FIG. 15 is a diagram showing an example of cell detection results and the node information obtained by converting those results.
FIG. 16 is a diagram showing an example of grouped node information, and an example of extracting, from that node information, the outer perimeter of a table containing cells.
FIG. 17 is a functional block diagram showing an example of a functional configuration actually implemented as part of a key-value extraction unit on the form recognition device according to the embodiment.
FIG. 18 is a diagram explaining examples of the various types of information generated by a graph information generation unit.
FIG. 19 is a diagram showing a detailed example of a graph construction unit.
FIG. 20 is a diagram explaining another specific example of a pair of character strings having a key-value relationship.

Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 is a diagram explaining an overview of a first service (hereinafter referred to as "the first service") that can be realized by a form recognition device according to an embodiment of the information processing device of the present invention.

The first service is a service that can be realized by a form recognition device (see FIG. 6, described later). A service provider develops, for example, dedicated application software (hereinafter referred to as the "dedicated application") and provides the first service to individuals or organizations that have purchased it. Here, it is assumed that the dedicated application was purchased by an organization. This organization is hereinafter referred to as the "purchasing company", and a person within the purchasing company who actually uses the dedicated application is referred to as an "operator".

By installing the dedicated application on any information processing device within the purchasing company, for example, the operator can use that device as a form recognition device, that is, can use the first service.

The form recognition device is used to convert forms into data, that is, to digitize them. For example, when a paper form is read by a scanner, the scanner outputs data of an image representing the form. Hereinafter, an image representing a form is referred to as a "form image" as appropriate, and the data of such an image is referred to as "form image data". The form recognition device receives as input the form image data output from the scanner.

The form recognition device provided by the first service has a function of analyzing such form image data to individually extract each of the cells, tables, and character string regions present on the form. This function is hereinafter referred to as the "layout analysis function". In other words, the first service can be described as providing a form recognition device capable of performing the layout analysis function.

Here, a cell is a portion of a form enclosed by ruled lines. Most of the character strings on a form are usually located within cells.

A character string region is a region in which a character string, that is, a sequence of one or more characters expressing some meaning, exists.
A character string is distinguishable from other character strings spatially or by content. A character is a symbol formed using at least one of lines and dots. Hiragana, katakana, kanji, numerals, alphabetic letters, and various marks are all characters.

A table is the range occupied by one or more cells that are identified, from the cells and their positions, as forming a group. Accordingly, a plurality of cells forming a tabular structure are regarded as one group and treated as one table.
In the first service, when one or more character strings are arranged in a range where no cell exists, a single character string region is extracted as a table. A table may also be extracted as a group consisting of one or more cells and one or more character string regions (see FIG. 2, described later).

Specifically, for example, in the first service the form recognition device performs the following processing on the form image data by exercising the layout analysis function.
That is, the form recognition device recognizes cells from the ruled lines present in the form image and extracts each cell by specifying its position in the form image.
Further, based on the one or more cells thus extracted and their positions, the form recognition device identifies one or more cells that form a group in the form image and extracts the range in which the identified cells exist as a table. Identifying the cells that form a group in the form image also specifies the position of the table in the form image.
The form recognition device also recognizes the character string regions present on the form and extracts each character string region by specifying its position in the form image.
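Because a table's position follows from the members grouped into it, one simple way to compute a table's position information is to take the smallest rectangle enclosing all of its members. The sketch below assumes the (x, y, width, height) pixel representation that this document uses for position information; `Box` and `table_box` are hypothetical names, and the source does not state that the table region is computed exactly this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int       # distance from the left edge of the form image (pixels)
    y: int       # distance from the top edge of the form image (pixels)
    width: int
    height: int


def table_box(members):
    """Smallest axis-aligned rectangle enclosing every member of a group."""
    if not members:
        raise ValueError("a table needs at least one member")
    left = min(b.x for b in members)
    top = min(b.y for b in members)
    right = max(b.x + b.width for b in members)
    bottom = max(b.y + b.height for b in members)
    return Box(left, top, right - left, bottom - top)


cells = [Box(10, 20, 100, 30), Box(110, 20, 50, 30), Box(10, 50, 150, 40)]
print(table_box(cells))  # Box(x=10, y=20, width=150, height=70)
```

A single isolated cell yields a table whose box equals the cell's own box, which matches the one-cell tables described for FIG. 2.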

A specific example of the layout analysis function is described below with reference to FIGS. 1 to 3.

FIG. 1 shows a display screen DS on which the processing results of the layout analysis function of the first service can be displayed.
As shown in FIG. 1, the display screen DS contains a display area DS1, a display area DS2, and a display area DS3.

The display area DS1 is a display area for letting the operator select the form image data (file) to be subjected to layout analysis.
That is, one or more pieces of form image data have been input to the form recognition device in advance in the form of files. In the display area DS1, the operator selects, from among them, the form image data to be subjected to layout analysis.

The display area DS2 is an area for displaying the form image represented by the form image data selected for analysis in the display area DS1. The display area DS2 is also an area in which the results of layout analysis, that is, the extraction results of cells, tables, and character string regions, are superimposed on the form image independently of one another.
Thumbnail images of various images, including the form image to be analyzed (original image), are displayed in the upper part of the display area DS2. In addition to the original image, these include a binarized image of the original image, a saliency map image focusing on the ruled lines in the original image, and a saliency map image focusing on both the ruled lines and the character string regions in the original image. The two saliency map images are generated using, for example, a DNN (Deep Neural Network).

The display area DS3 is an area for displaying the details of the layout analysis results shown in the display area DS2.

In the upper part of the display area DS3 is an area DS31 (hereinafter referred to as the "button display area DS31") containing a button BT1 labeled "Table", a button BT2 labeled "Cell", and a button BT3 labeled "Text".
The button BT1 is used to instruct that the table extraction results be superimposed on the form image being processed.
The button BT2 is used to instruct that the cell extraction results be superimposed on the form image being processed.
The button BT3 is used to instruct that the character string region extraction results be superimposed on the form image being processed.

By clicking, for example, the button among BT1 to BT3 corresponding to the extraction results the operator wishes to check, the operator can check only those extraction results in the display area DS2. Here, the extraction results the operator wishes to check are the extraction results of at least one of tables, cells, and character string regions.
Immediately after layout analysis is executed, all three buttons BT1 to BT3 are in the selected state. In this case, the operator can check all of the extraction results for tables, cells, and character string regions (text).

Immediately below the button display area DS31 are a tab T1 labeled "Table", a tab T2 labeled "Cell", a tab T3 labeled "Text", and a tab T4 labeled "Recognition Result (1)". Below these four tabs T1 to T4 is a detail display area DS32 reserved for displaying the details of the analysis results.

The tab T1 is used to instruct that the details of the table extraction results be displayed as a result of layout analysis. In the first service, the position information and an excerpt of the image of each table are displayed as the details of the table extraction results.

The position information is expressed in xy coordinates whose origin is the upper-left point of the form image. The shape of a table, for its part, is basically assumed to be a rectangle enclosed by two lines parallel to the x-axis and two lines parallel to the y-axis. Accordingly, in the first service, the following are extracted as position information: the distances along the x-axis and along the y-axis between the origin and the upper-left point of the table, the width (the length along the x-axis), and the height (the length along the y-axis). All of these are in units of pixels. The position information of cells and character string regions contains the same items.
Hereinafter, the direction parallel to the x-axis is also referred to as the "horizontal direction", and the direction parallel to the y-axis as the "vertical direction". Unless otherwise specified, "position information" means information representing both the position and the shape of a region.
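The position information described above maps naturally onto a small record type. This sketch (the class name `PositionInfo` is hypothetical) stores the four values and derives the right and bottom edges, which comparisons between tables, cells, and character string regions need. Since distances are measured from the upper-left point of the image, y is taken to increase downward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionInfo:
    """Position and shape of a table, cell, or character string region.

    Origin: upper-left point of the form image. x increases to the right
    (horizontal direction), y increases downward (vertical direction).
    All values are in pixels.
    """
    x: int       # x-axis distance from the origin to the region's upper-left point
    y: int       # y-axis distance from the origin to the region's upper-left point
    width: int   # length along the x-axis
    height: int  # length along the y-axis

    @property
    def right(self) -> int:
        return self.x + self.width

    @property
    def bottom(self) -> int:
        return self.y + self.height

    @classmethod
    def from_corners(cls, left: int, top: int, right: int, bottom: int) -> "PositionInfo":
        """Build position information from upper-left and lower-right corners."""
        return cls(left, top, right - left, bottom - top)


p = PositionInfo(40, 60, 200, 30)
print(p.right, p.bottom)  # 240 90
```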

The "Cell" tab T2 is used to instruct that the details of the cell extraction results be displayed as analysis results. In the first service, the position information of each cell and an excerpt of the image of each cell are displayed.
The "Text" tab T3 is used to instruct that the details of the character string region extraction results be displayed as analysis results. In the first service, the position information of each character string region and an excerpt of the image of each character string region are displayed.
The "Recognition Result (1)" tab T4 is used to instruct that the recognition results of the character strings (text) be displayed as analysis results. In the first service, the recognition result of each character string is displayed in association with an excerpt of the image of that character string.

By clicking one of the tabs T1 to T4, the operator can check only the details of the desired analysis results in the detail display area DS32. For example, as shown in FIG. 1, by clicking the "Recognition Result (1)" tab T4, the operator can check the recognition result of each character string on the form against the image of that character string. Because an image excerpt is displayed in addition to the position information, the operator can also check the analysis results, including the character string recognition results, more quickly and easily.
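Pairing each recognition result with an image excerpt amounts to cropping the form image with the region's position information. A minimal sketch, assuming the image is held as a list of pixel rows (row index = y, column index = x); a real implementation would more likely use an image library, and the function names are hypothetical:

```python
def crop_excerpt(image, x, y, width, height):
    """Return the part of `image` covered by a region's position information.

    `image` is a list of pixel rows, indexed image[row][column], i.e. y first.
    """
    return [row[x:x + width] for row in image[y:y + height]]


def recognition_rows(image, results):
    """Pair each recognized text with its image excerpt, roughly as tab T4 lists them.

    `results` is a list of (text, (x, y, width, height)) tuples.
    """
    return [(text, crop_excerpt(image, x, y, w, h)) for text, (x, y, w, h) in results]


# A 4-row by 5-column toy "image" whose pixel values encode their own coordinates.
image = [[10 * r + c for c in range(5)] for r in range(4)]
print(crop_excerpt(image, 1, 2, 3, 2))  # [[21, 22, 23], [31, 32, 33]]
```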

A "Save Data" button BT5 is arranged below the detail display area DS32 in the display area DS3. The button BT5 is used by the operator to instruct that the analysis results, including the character string recognition results and the various extraction results, be saved. As shown in FIG. 1, when the button BT5 is clicked, a pop-up menu DS321 is displayed that lets the operator select the data storage format. The menu DS321 contains a "Save Data" button BT6. Thus, by selecting the desired storage format and then clicking the "Save Data" button BT6, the operator can save the analysis results, including the character string recognition results, in the desired format.
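The source does not name the storage formats offered in the pop-up menu DS321. Assuming, purely for illustration, that JSON and CSV are among them, saving the analysis results could look like the following sketch; the field names and the `serialize` function are hypothetical.

```python
import csv
import io
import json

# Hypothetical record layout: one row per extraction result.
FIELDS = ["kind", "text", "x", "y", "width", "height"]


def serialize(results, fmt):
    """Serialize a list of result dicts (keys = FIELDS) in the chosen format."""
    if fmt == "json":
        # ensure_ascii=False keeps Japanese text readable in the output file.
        return json.dumps(results, ensure_ascii=False, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(results)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


results = [
    {"kind": "text", "text": "納期限", "x": 10, "y": 100, "width": 60, "height": 20},
]
print(serialize(results, "csv"))
```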

FIG. 2 is a diagram explaining an example of the extraction results of tables, cells, and character string regions extracted from a form image.
In the excerpt of the form image FI shown in FIG. 2, a total of five tables TB are extracted. In four of the five tables TB, a single cell has been extracted as one table TB. The remaining table groups eleven cells CE together with one character string region TX that is vertically adjacent to one of them.

As shown in FIG. 2, each of the four tables TB containing only one cell CE is located away from all other cells CE and character string regions TX. Such a cell CE is therefore regarded as having no logical correspondence with any other cell CE or character string region TX, and is extracted on its own as one table TB.

On the other hand, in the table TB containing eleven cells CE and one character string region TX, each cell CE is horizontally or vertically adjacent to at least one other cell CE. The eleven cells CE are therefore treated as elements forming a tabular structure and are grouped together.

The character string region TX grouped with the eleven cells CE is not located within any cell CE. However, it is adjacent to two cells CE, one above it and one to its right. Because of this positional relationship, the character string region TX is regarded as corresponding to one of the character strings located in those two cells CE. As a result, this character string region TX, although not located within a cell CE, is grouped with the eleven cells CE as an element of one table TB. Indeed, the character string "納期限" (payment deadline) in this character string region TX has a semantic correspondence with the character string "平成２６年１２月２０日" (December 20, 2014) in the cell CE to its right. Hereinafter, "TX" is also used to refer to the character string itself.
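The grouping rule illustrated by FIG. 2 (elements that are horizontally or vertically adjacent end up in the same table, and an isolated cell or character string region becomes a table of its own) can be sketched with an adjacency test and a union-find structure. This is a stand-in for illustration, not the device's actual procedure: the `gap` tolerance and all names are assumptions, and the list of drawings mentions node information and perimeter acquisition, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    width: int
    height: int


def _overlap(a0, a1, b0, b1):
    """True if the intervals [a0, a1) and [b0, b1) overlap."""
    return a0 < b1 and b0 < a1


def adjacent(a, b, gap=5):
    """True if a and b sit side by side or one above the other, at most `gap` px apart."""
    h_gap = max(b.x - (a.x + a.width), a.x - (b.x + b.width))
    v_gap = max(b.y - (a.y + a.height), a.y - (b.y + b.height))
    side_by_side = 0 <= h_gap <= gap and _overlap(a.y, a.y + a.height, b.y, b.y + b.height)
    stacked = 0 <= v_gap <= gap and _overlap(a.x, a.x + a.width, b.x, b.x + b.width)
    return side_by_side or stacked


def group_into_tables(boxes, gap=5):
    """Union-find over adjacency; each resulting group of indices is one table."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if adjacent(boxes[i], boxes[j], gap):
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


boxes = [
    Box(0, 0, 100, 30),     # cell
    Box(100, 0, 100, 30),   # cell sharing a ruled line with the first
    Box(0, 35, 80, 20),     # text region just below the first cell
    Box(400, 400, 50, 20),  # isolated cell -> a table of its own
]
print(group_into_tables(boxes))  # [[0, 1, 2], [3]]
```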

Taking as an example the table TB whose elements are eleven cells CE and one character string region TX, FIG. 2 shows examples of the position information of that table TB, of one cell CE within it, and of one character string region TX.
The cell CE whose position information is shown is the second from the top among the cells CE on the left side of the table TB. Its x-axis distance therefore matches that of the table TB, whereas its y-axis distance does not.

The character string region TX whose position information is shown is the one located within the second cell CE from the top among the cells CE on the left side of the table TB. Its x-axis and y-axis distances are therefore both larger than those of that cell CE, while its width and height are both smaller than those of the cell CE in which it is located.
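The relationships in FIG. 2 between the table, the cell, and the character string region reduce to simple comparisons of position information. The sketch below uses made-up coordinates that follow the described pattern (the cell shares the table's x distance but not its y distance, and the text region lies inside the cell); the numbers are not taken from the figure, and `contains` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    width: int
    height: int


def contains(outer, inner):
    """True if `inner` lies entirely within `outer` (edges may touch)."""
    return (
        outer.x <= inner.x
        and outer.y <= inner.y
        and inner.x + inner.width <= outer.x + outer.width
        and inner.y + inner.height <= outer.y + outer.height
    )


table = Box(50, 100, 600, 300)
cell = Box(50, 130, 120, 30)   # left column, second from the top
text = Box(55, 135, 80, 20)    # character string region inside that cell

print(cell.x == table.x, cell.y == table.y)  # True False
print(contains(table, cell), contains(cell, text))  # True True
```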

FIG. 3 is a diagram showing display examples of the various extraction results of tables, cells, and character string regions in the display area.
As described above, there are three types of extraction results: tables TB, cells CE, and character string regions TX. The three buttons BT1 to BT3 arranged in the button display area DS31 make it possible to display one of them selectively. In addition, in the first service, all three types of extraction results are displayed simultaneously as the initial display. Accordingly, FIGS. 3(A) to 3(D) each show a different set of displayed extraction results. Specifically, FIG. 3(A) shows the initial display, that is, an example in which the extraction results of tables TB, cells CE, and character string regions TX are all displayed, and FIGS. 3(B) to 3(D) show examples in which only the extraction results of tables TB, cells CE, and character string regions TX, respectively, are displayed. Thus, in FIG. 3(A) all of the "Table", "Cell", and "Text" buttons BT1 to BT3 are selected, whereas in FIGS. 3(B) to 3(D) only one of the three buttons BT1 to BT3 is selected.

As shown in FIGS. 3(A) to 3(D), each extraction result for tables TB, cells CE, and character string regions TX is indicated by a rectangular frame. The frame color differs among tables TB, cells CE, and character string regions TX, so the operator can tell the extraction results of each type apart by frame color.
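Selective display with the buttons BT1 to BT3 and per-type frame colors can be modeled as a filter over the extraction results. In this sketch the color values and names are hypothetical (the source only says that the frame colors differ by type), and the default selection mirrors the initial display in which all three buttons are selected.

```python
# Hypothetical RGB frame colors; the source only states that they differ by type.
FRAME_COLORS = {"table": (255, 0, 0), "cell": (0, 160, 0), "text": (0, 0, 255)}

# Initial display: buttons BT1 (table), BT2 (cell), and BT3 (text) all selected.
ALL_KINDS = frozenset(FRAME_COLORS)


def frames_to_draw(results, selected=ALL_KINDS):
    """Return (kind, box, color) for each extraction result whose button is selected.

    `results` is a list of (kind, (x, y, width, height)) tuples.
    """
    return [(kind, box, FRAME_COLORS[kind]) for kind, box in results if kind in selected]


results = [("table", (0, 0, 10, 10)), ("cell", (0, 0, 5, 5)), ("text", (1, 1, 3, 3))]
print(frames_to_draw(results, {"cell"}))  # [('cell', (0, 0, 5, 5), (0, 160, 0))]
```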

As described above, in the first service, a single cell CE located away from other cells CE is extracted as one table TB (see FIG. 2). Similarly, a single character string region TX located away from all other cells CE and character string regions TX is also extracted as one table TB. Thus, as shown in FIG. 3(A), the character strings "平成２９年度収支計算書" (FY2017 income and expenditure statement) and "収入の部" (income section), each in its own character string region TX, are each also extracted as one table TB and one cell CE. The character string "自．平成２９年４月１日至．平成３０年０３月３１日" (from April 1, 2017 to March 31, 2018) in a character string region TX is divided into two phrases, "自．平成２９年４月１日" (from April 1, 2017) and "至．平成３０年０３月３１日" (to March 31, 2018), and the region in which each phrase exists is extracted as one character string region TX, one cell CE, and one table TB.

In the first service, as shown in FIGS. 3(A) to 3(D), the operator can check the desired extraction results individually by clicking any of the three buttons BT1 to BT3. This makes it easy for the operator to check the extraction results of tables TB, cells CE, and character string regions TX.

Not every character string TX can always be recognized properly. When a form contains a handwritten character string TX, the probability that it is recognized properly is usually relatively low. Even on a form printed with a PC (Personal Computer) or the like from data rendered as an image, not every character string TX is necessarily recognized properly, because of possible defects that occurred during printing, dirt or dust adhering to the form when it was converted into image data, writing added to the form, damage to the form, and so on.

For this reason, after a form has been converted into data, that is, digitized, an operator normally performs subsequent processing in which they at least check the character-string recognition results and correct any recognition errors. Subsequent processing often covers a large number of forms in a wide variety of formats. The more form formats there are, the longer it takes the operator to check the character-string recognition results and their correspondences. The purchasing company therefore wants the operator's subsequent processing to be more efficient.

In the first service, the layout analysis described above makes it possible to save analysis results that include not only the character-string recognition results but also the extraction results for the cells CE, character string areas TX, and tables TB. The operator can therefore also visually inspect each of these extraction results.

Most character strings on a form lie inside cells. Therefore, making either the cells CE or the character string areas TX visible lets the operator check the recognition results of the character strings TX, and the correspondences between them, more easily and quickly.

On the other hand, making the tables visible lets the operator check the recognition results of the character strings TX, and their correspondences, table by table. As described above, a table TB is a region covering at least one cell CE (or one character string area TX); depending on the form, the tables divide the form spatially and logically into several parts in a way that reflects its table structure. Checking the recognition results table by table is therefore, on average across forms, easier and faster for the operator. This is because dividing the form into tables TB reduces the number of character strings TX to consider at once and makes the table structure more apparent, so it becomes easier to anticipate the correct correspondences between character strings TX. When the operator can anticipate the correct correspondences, they can check more easily and quickly not only whether each character string TX was recognized correctly but also whether the correspondences identified between the character strings TX are correct.
As a result, the operator can carry out subsequent processing more reliably and efficiently.

FIG. 4 is a diagram outlining a second service (hereinafter "the second service") that can be realized by a form recognition device according to an embodiment of the information processing device of the present invention.

The second service is also realized by the form recognition device (see FIG. 6, described later). As with the first service, the service provider develops a dedicated application, for example, and provides the second service to individuals or organizations that purchase it. Here again the purchaser is assumed to be an organization, hereinafter called the "purchasing company", and the person in the purchasing company who actually uses the dedicated application is called the "operator".

For example, by installing the dedicated application on any information processing device in the purchasing company, the operator can use that device as a form recognition device, that is, use the second service.

The form recognition device provided by the second service processes form image data to recognize the character strings TX on a form and to identify pairs of character strings TX that satisfy a predetermined condition. This function is hereinafter called the "key-value extraction function". In other words, the second service can be described as providing a form recognition device capable of performing key-value extraction.

Here, the key is the character string TX that is logically higher in the correspondence within a pair of character strings TX, and the value is the one that is logically lower. A key is usually a character string TX representing an identifier for the specific content expressed by the value string.

Specifically, for example, in the second service the form recognition device uses the key-value extraction function to perform the following processing on form image data.
That is, using data of an image representing the form, the device recognizes a plurality of character strings TX, each consisting of one or more consecutive characters, in the form image, and identifies the position information of each recognized character string TX in the form image.
Next, using the recognition result and position information of each character string TX, the device determines correspondences between predetermined pairs of character strings TX among those present in the form.
Finally, among the pairs of character strings TX whose correspondence has been determined, the device identifies the pairs that satisfy a predetermined condition.

A specific example of the key-value extraction function is described below with reference to FIGS. 4 and 5.

FIG. 4 shows, as an overview of the second service, an example of the flow from recognizing character strings TX in a form image FI to identifying pairs of character strings TX.
In the second service, each character string TX recognized from the form image FI is treated as a node ND of a graph. The graph connects nodes ND with lines, and each line represents a correspondence between two nodes ND, that is, an edge ED. Identifying a pair of character strings TX therefore amounts to extracting, from the graph, two nodes ND joined by an edge ED.
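A minimal sketch of this graph view follows. The class and method names (`Node`, `FormGraph`, `connect`, `string_pairs`) are hypothetical and not taken from the patent; the sketch only illustrates that, once correspondences are stored as undirected edges between string nodes, identifying pairs of strings reduces to reading off the two endpoints of each edge.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A recognized character string TX treated as a graph node ND (hypothetical type)."""
    node_id: int
    text: str


@dataclass
class FormGraph:
    """Undirected graph whose edges ED record correspondences between strings."""
    nodes: dict[int, Node] = field(default_factory=dict)
    edges: set[frozenset[int]] = field(default_factory=set)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def connect(self, a: int, b: int) -> None:
        # An edge is an unordered pair of node ids; self-loops are rejected.
        if a == b:
            raise ValueError("an edge needs two distinct nodes")
        self.edges.add(frozenset((a, b)))

    def string_pairs(self) -> list[tuple[str, str]]:
        """Identifying a pair of strings = reading off the two endpoints of an edge."""
        pairs = []
        for edge in self.edges:
            a, b = sorted(edge)
            pairs.append((self.nodes[a].text, self.nodes[b].text))
        return sorted(pairs)
```

Storing edges as `frozenset`s keeps them order-independent, so connecting node 3 to node 2 and node 2 to node 3 yields the same single edge.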

In the graph on the right side of FIG. 4, each node ND is labeled with one of ND1 to ND3 and each edge ED with one of ED1 to ED3. ND1 to ND3 denote different node attributes, and ED1 to ED3 denote different combinations of the attributes of the nodes ND that an edge connects. Where no distinction is needed, "ND" is used for nodes and "ED" for edges.

Specifically, ND1 denotes a node classified as a key, ND2 a node classified as a value, and ND3 a node classified as neither, that is, as "other". ED1 denotes an edge connecting a key node ND1 and a value node ND2. Similarly, ED2 denotes an edge connecting a key node ND1 and an "other" node ND3, and ED3 denotes an edge connecting two nodes ND in any other combination.
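The labeling scheme above can be encoded as a small lookup from the two endpoint labels to an edge category. This sketch assumes the edge is undirected and that its category is fully determined by the labels of its endpoints; it encodes only the definitions of ND1 to ND3 and ED1 to ED3, not how the patent actually predicts them. The enum and function names are hypothetical.

```python
from enum import Enum


class NodeLabel(Enum):
    KEY = "ND1"    # node classified as a key
    VALUE = "ND2"  # node classified as a value
    OTHER = "ND3"  # node classified as neither


class EdgeLabel(Enum):
    KEY_VALUE = "ED1"  # connects a key node and a value node
    KEY_OTHER = "ED2"  # connects a key node and an "other" node
    REST = "ED3"       # any other combination of node labels


def edge_label(a: NodeLabel, b: NodeLabel) -> EdgeLabel:
    """Derive the edge category from its two endpoint labels (order-independent)."""
    ends = {a, b}
    if ends == {NodeLabel.KEY, NodeLabel.VALUE}:
        return EdgeLabel.KEY_VALUE
    if ends == {NodeLabel.KEY, NodeLabel.OTHER}:
        return EdgeLabel.KEY_OTHER
    return EdgeLabel.REST
```

Comparing sets rather than ordered tuples means a value-to-key edge is classified the same way as a key-to-value edge, while key-key edges fall into ED3.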

FIG. 5 is a diagram showing examples of pairs of character strings identified from form images.
FIGS. 5(A) and 5(B) show, for two different forms, examples of pairs of character strings TX identified from each form image.
In the form image of FIG. 5(A), the identified pairs include "assessment year" (character string TX11) with "Heisei 26" (TX13), and "target year" (TX12) with "Heisei 26" (TX14). The pairs "notification number" and "6200100001", "period/month" and "July", and "payment deadline" and "December 20, 2014" are also identified.

Similarly, in the form image of FIG. 5(B), "total amount" (character string TX21) and "53,999" (TX23) are identified as a pair, and "amount received" (TX22) and "53,999" (TX24) are also identified as a pair. The following are also identified as pairs: "customer number" and "072-0000028-005", "period of use" and "August 1, 2014 to September 30, 2014", "wastewater volume" and "44", "usage amount" and "53,999", "reminder fee" and "0", "amount already paid" and "0", and "payment deadline" and "March 31, 2015".

In general, a form recognition device recognizes every character string TX on a form and saves the recognition results, and the operator's subsequent processing basically checks all of those results.

However, the character strings TX on a form can differ in importance. For example, character strings that form a table structure often correspond to other character strings TX, so the operator usually has a relatively strong need to check them. By contrast, a character string TX such as the form's title usually has no correspondence with other character strings TX, so the need to check it is usually relatively low.

Correspondences can also hold among three or more character strings TX. Among these, the most important string is often the one at the end of the table structure, more specifically at its right or bottom end. Character strings TX that correspond directly to such an end string are also often relatively important. The more important a character string TX is, the more carefully it needs to be checked.

In addition, the operator's workload or time constraints may create situations in which only the relatively important character strings TX need to be checked in subsequent processing.
When only the recognition results of relatively important character strings TX are to be checked, the operator must search all of the recognition results for the ones to check, which makes it hard to carry out subsequent processing quickly.

In contrast, the second service can restrict which of the character strings TX recognized in the form image are passed to subsequent processing. For example, only the pairs of character strings TX considered especially important can be saved, or they can be kept separately from the rest.
This makes it easier for the operator to check those pairs properly, and it also helps the operator avoid serious mistakes in subsequent processing. As a result, the operator can more quickly carry out subsequent processing appropriate to the situation.

FIG. 6 is a diagram showing a configuration example of an information processing system built using the form recognition device according to an embodiment of the information processing device of the present invention.
The form recognition device 1 is an information processing device prepared by the purchasing organization, on which the purchased dedicated application is installed. The form recognition device 1 is connected to a scanner 2, which can read an image of a form, digitize it, and output the resulting form image data. The information processing system thus consists of the form recognition device 1 connected to the scanner 2. Here, the form recognition device 1 is assumed to be the device the operator uses for subsequent processing.

FIG. 7 is a block diagram showing an example of the hardware configuration of the form recognition device according to an embodiment of the information processing device of the present invention.
The form recognition device 1 includes a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a bus 14, an input/output interface 15, an output unit 16, an input unit 17, a storage unit 18, a communication unit 19, and a drive 20.

The CPU 11 executes various processes according to a program recorded in the ROM 12 or various programs loaded from the storage unit 18 into the RAM 13. These programs include the two dedicated applications described above. By executing them, the CPU 11 makes the information processing device function as the form recognition device 1.
The RAM 13 also stores, as appropriate, data that the CPU 11 needs to execute the various processes, including the programs executed by the CPU 11.

The CPU 11, ROM 12, and RAM 13 are interconnected via the bus 14. The input/output interface 15 is also connected to the bus 14. The output unit 16, input unit 17, storage unit 18, communication unit 19, and drive 20 are connected to the input/output interface 15.

The output unit 16 includes a display such as a liquid crystal display. Under the control of the CPU 11, the output unit 16 displays various images, including a display screen DS such as the one shown in FIG. 1.
The input unit 17 includes various hardware buttons, such as a keyboard, through which the operator can input various information. The input unit 17 may also include several input devices such as a pointing device and a touch panel.

The storage unit 18 is an auxiliary storage device such as a hard disk drive or an SSD (Solid State Drive), and stores large volumes of data.
The communication unit 19 enables communication with the scanner 2.

The drive 20 is provided as needed. A removable medium 31 such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory card can be inserted into and removed from the drive 20. When a removable medium 31 on which a program is recorded is inserted into the drive 20, the program can be stored in the storage unit 18. The removable medium 31 can also serve as a destination for copying or moving the various data stored in the storage unit 18.

By controlling the hardware resources of the form recognition device 1 with the various programs, the form recognition device 1 can provide the first and second services to operators belonging to the company that purchased the dedicated application. The various processes described below are realized by the CPU 11 executing these programs.

FIG. 8 is a functional block diagram showing an example of the functional configuration realized on the form recognition device according to an embodiment of the information processing device of the present invention.

As shown in FIG. 8, assuming that both dedicated applications are running, a layout analysis unit 101 and a key-value extraction unit 102 function in the CPU 11 of the form recognition device 1. The CPU 11 thus corresponds to the computer that controls the form recognition device 1, which is an information processing device.

The layout analysis unit 101 performs the layout analysis function provided by the first service. Form image data sent from the scanner 2 is received by the communication unit 19 and passed from it to the CPU 11. When form image data is input to the CPU 11, a cell extraction unit 111, a table extraction unit 112, a character string recognition unit 113, a display control unit 114, and an input control unit 115 function in the enabled layout analysis unit 101, as shown in FIG. 8.

The cell extraction unit 111 uses the form image data to recognize the cells in the form and extracts each cell by identifying its position in the form image.
Specifically, for example, the cell extraction unit 111 extracts the cells CE by detecting the ruled lines in the form image FI represented by the form image data. For each extracted cell CE, it also extracts the range occupied by the cell CE in the form image as a cell region, together with the position information of that region (see FIG. 2). As described above, the x and y distances in the position information are measured from the upper-left corner of the rectangle detected in the form image as the form itself. The same applies to the character string areas TX and the tables TB.
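The coordinate convention just described can be sketched as a translation by the form rectangle's upper-left corner. The sketch assumes the detected form rectangle is axis-aligned (that is, rotation and distortion have already been corrected) and that regions are stored as (x, y, width, height) in pixels; `Box` and `to_form_coords` are hypothetical names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixels: top-left (x, y), width w, height h."""
    x: int
    y: int
    w: int
    h: int


def to_form_coords(region: Box, form: Box) -> Box:
    """Express a region's position relative to the upper-left corner of the form rectangle."""
    # Only the origin moves; the region's size is unchanged.
    return Box(region.x - form.x, region.y - form.y, region.w, region.h)
```

For example, if the form rectangle is detected at (40, 25) in the scanned image, a cell at (140, 85) with size 200 by 30 is recorded at (100, 60) in form coordinates, so positions do not depend on where the sheet happened to sit on the scanner.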

The character string recognition unit 113 recognizes the character string areas TX in the form image FI and extracts each one by identifying its position (position information) in the form image FI. A character string TX is a sequence of one or more characters. The character string recognition unit 113 uses the data of the image representing the form image FI to recognize the character strings TX in the form. When it functions as part of the layout analysis unit 101, it may be limited to recognizing the character string areas TX and identifying their positions; that is, recognizing the character strings TX themselves is not essential there.

Based on the one or more cells CE extracted by the cell extraction unit 111 and their positions, the table extraction unit 112 identifies one or more cells CE that form a group in the form image FI, and extracts the range occupied by the identified cells CE as a table TB. A table TB may also be extracted from a character string area TX, or may include one. The table extraction unit 112 therefore extracts tables TB by referring both to the position information extracted for each cell CE by the cell extraction unit 111 and to the position information extracted for each character string area TX by the character string recognition unit 113. For each extracted table TB, the table extraction unit 112 also extracts the position information of that table TB (see FIG. 2).
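The patent does not spell out the rule for grouping cells into tables. One plausible sketch, assuming that cells of the same table touch or nearly touch (within a hypothetical pixel tolerance `tol`), is a union-find over the boxes of cells and character string areas, taking the bounding box of each group as a table. As with the isolated cells and strings in FIG. 3(A), a box with no neighbors becomes a table of its own. All names here are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in form coordinates: top-left (x, y), width w, height h."""
    x: int
    y: int
    w: int
    h: int


def _near(a: Box, b: Box, tol: int) -> bool:
    """True if the boxes overlap or are at most `tol` pixels apart on both axes."""
    gap_x = max(a.x, b.x) - min(a.x + a.w, b.x + b.w)  # negative when x-ranges overlap
    gap_y = max(a.y, b.y) - min(a.y + a.h, b.y + b.h)
    return gap_x <= tol and gap_y <= tol


def group_into_tables(boxes: list[Box], tol: int = 2) -> list[Box]:
    """Group cell/text boxes into tables; an isolated box becomes a one-box table."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge every pair of boxes that touch or nearly touch.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if _near(boxes[i], boxes[j], tol):
                parent[find(i)] = find(j)

    groups: dict[int, list[Box]] = {}
    for i, b in enumerate(boxes):
        groups.setdefault(find(i), []).append(b)

    # Each table is the bounding box of its group.
    tables = []
    for members in groups.values():
        x0 = min(b.x for b in members)
        y0 = min(b.y for b in members)
        x1 = max(b.x + b.w for b in members)
        y1 = max(b.y + b.h for b in members)
        tables.append(Box(x0, y0, x1 - x0, y1 - y0))
    return sorted(tables)
```

The pairwise loop is quadratic in the number of boxes, which is acceptable for a single form; a spatial index would be the natural refinement for very dense forms.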

The cell extraction unit 111, table extraction unit 112, and character string recognition unit 113 generate the information needed to display a display screen DS such as the one shown in FIG. 1. The information they produce is stored, as analysis results, in an analysis result storage unit 182 allocated in the storage unit 18. The form image data sent from the scanner 2 and input to the CPU 11 via the communication unit 19 is stored in an image storage unit 183 allocated in the storage unit 18.

The display control unit 114 controls the display of an image in which the positions of the extracted cells CE, tables TB, and character strings TX are visible. In this way, the display control unit 114 can cause the output unit 16 to display a display screen DS such as the one shown in FIG. 1. To do so, the display control unit 114 reads the corresponding form image data from the image storage unit 183 and the corresponding analysis results from the analysis result storage unit 182, and generates display data for the display screen DS. The generated display data is output to the output unit 16, which then displays the display screen DS.

The input control unit 115 recognizes and processes operations performed by the operator on the input unit 17, and performs control according to the result. That is, the input control unit 115 handles the operator's clicks on the three buttons BT1 to BT3, the "Save data" buttons BT5 and BT6, and the four tabs T1 to T4, as well as the operator's selection of a save format in the menu DS321. The operator can therefore view the desired extraction result in the display area DS2 by clicking one of the buttons BT1 to BT3, and can view the details of the desired extraction result, or the recognition results of the character strings TX, in the display area DS3 by clicking one of the tabs T1 to T4. To this end, the input control unit 115 sends the display control unit 114 an instruction corresponding to the button or tab whose operation it recognized.

When the operator selects a save format in the menu DS321 and then clicks the "Save data" button BT6, the input control unit 115 sends the display control unit 114 an instruction corresponding to that click. The display control unit 114 then causes the output unit 16 to display, for example, the display screen DS as it was before the menu DS321 was displayed.

Meanwhile, the input control unit 115 passes information indicating the selected save format to the table extraction unit 112 and instructs it to save the analysis results in that format.
In response, the table extraction unit 112 stores the analysis results, in the format selected by the operator, in an analysis result saving unit 181 allocated in the storage unit 18.
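The patent does not say which save formats the menu DS321 offers. As a minimal sketch, assuming the analysis results are flat records (one per extracted cell, table, or character string area, with its type and position) and that JSON and CSV are among the offered formats, the save step could serialize them as follows. `serialize_results` and the record fields are hypothetical.

```python
import csv
import io
import json


def serialize_results(results: list[dict], fmt: str) -> str:
    """Serialize analysis results in the operator-selected format ("json" or "csv" here)."""
    if fmt == "json":
        # ensure_ascii=False keeps Japanese text readable in the output.
        return json.dumps(results, ensure_ascii=False, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        # Column order follows the keys of the first record.
        writer = csv.DictWriter(buf, fieldnames=list(results[0]) if results else [])
        writer.writeheader()
        writer.writerows(results)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")
```

Writing the returned string to the analysis result saving unit 181 (for example, a file in the storage unit 18) is omitted; keeping serialization separate from storage makes it easy to add further formats.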

The key-value extraction unit 102 performs the key-value extraction function provided by the second service. In the enabled key-value extraction unit 102, the character string recognition unit 113, a correspondence determination unit 116, and an identification unit 117 function, as shown in FIG. 8.

When the key-value extraction unit 102 is enabled, the character string recognition unit 113 uses the data of the image representing the form image FI to recognize a plurality of character strings TX, each a sequence of one or more characters, in the form, and identifies the position information of each recognized character string TX in the image.

The correspondence determination unit 116 uses the recognition result of each character string TX from the character string recognition unit 113, together with the position information of the character strings TX, to determine correspondences between predetermined pairs of character strings TX among those in the form image FI (see FIG. 5).
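The patent states only that recognition results and position information are used; it does not give the rule for determining correspondences (FIG. 4 frames them as graph edges). The sketch below substitutes a simple geometric heuristic, not the patent's method: each string is paired with its nearest neighbor to the right on the same row, otherwise with the nearest one directly below, which is a common layout for label and value. It ignores the recognized text entirely, and all names are hypothetical.

```python
from typing import NamedTuple, Optional


class Text(NamedTuple):
    """A recognized character string TX with its box (top-left x, y, width w, height h)."""
    text: str
    x: int
    y: int
    w: int
    h: int


def _overlap(a0: int, a1: int, b0: int, b1: int) -> bool:
    """True if the intervals [a0, a1) and [b0, b1) overlap."""
    return min(a1, b1) > max(a0, b0)


def nearest_partner(i: int, items: list[Text]) -> Optional[int]:
    """Index of the closest string to the right on the same row, else directly below."""
    a = items[i]
    right = [
        (b.x - (a.x + a.w), j) for j, b in enumerate(items)
        if j != i and b.x >= a.x + a.w and _overlap(a.y, a.y + a.h, b.y, b.y + b.h)
    ]
    if right:
        return min(right)[1]
    below = [
        (b.y - (a.y + a.h), j) for j, b in enumerate(items)
        if j != i and b.y >= a.y + a.h and _overlap(a.x, a.x + a.w, b.x, b.x + b.w)
    ]
    return min(below)[1] if below else None


def candidate_pairs(items: list[Text]) -> list[tuple[int, int]]:
    """Pair every string with its geometric partner, if it has one."""
    pairs = []
    for i in range(len(items)):
        j = nearest_partner(i, items)
        if j is not None:
            pairs.append((i, j))
    return pairs
```

A heuristic like this proposes candidate edges only; deciding which of them are key-value pairs is the job of the identification step that follows.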

The identification unit 117 identifies, among the pairs of character strings TX whose correspondence has been determined by the correspondence determination unit 116, the pairs that satisfy a predetermined condition (see FIG. 5). A pair of character strings TX in a key-value relationship is one example.
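With a key-value relationship as the predetermined condition, the identification step can be sketched as a filter over the determined correspondences. The sketch assumes every string already carries a label "key", "value", or "other" (the ND1 to ND3 classification of FIG. 4); how those labels are produced is not shown, and `select_key_value` is a hypothetical name. The ids in the usage below mirror TX11 to TX14 from FIG. 5(A).

```python
from typing import Iterable


def select_key_value(
    edges: Iterable[tuple[int, int]], labels: dict[int, str]
) -> list[tuple[int, int]]:
    """Keep only edges joining a "key" node and a "value" node, oriented as (key, value)."""
    selected = []
    for i, j in edges:
        pair = (labels.get(i), labels.get(j))
        if pair == ("key", "value"):
            selected.append((i, j))
        elif pair == ("value", "key"):
            selected.append((j, i))  # normalize so the key always comes first
        # key-key, key-other, value-other, etc. are dropped
    return selected
```

For instance, with `labels = {11: "key", 12: "key", 13: "value", 14: "value"}`, the edges `(11, 13)` and `(14, 12)` are kept as `(11, 13)` and `(12, 14)`, while a key-key edge `(11, 12)` is discarded. Only the kept pairs would then be saved or set apart for the operator.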

An outline of the processing executed to provide the first service is described below with reference to FIGS. 9 to 16.

FIG. 9 is a flowchart showing an example of cell detection processing executed by the form recognition device according to an embodiment of the information processing device of the present invention.
The cell extraction unit 111 shown in FIG. 8 is realized by the CPU 11 executing this cell detection processing, so the cell extraction unit 111 is described here as the entity that performs the processing.

First, in step S11, the cell extraction unit 111 converts the form image FI into a single-channel grayscale image, binarizes it, and separates the ruled lines in the binarized form image FI into horizontal line components and vertical line components.
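The conversion and binarization in step S11 can be sketched as follows. The patent does not say how the grayscale conversion or thresholding is done; this sketch assumes ITU-R BT.601 luma weights and a fixed global threshold (128 is a hypothetical default), and uses plain Python lists to stay dependency-free. A production system might instead use adaptive or Otsu thresholding. `to_binary` is a hypothetical name.

```python
def to_binary(rgb: list[list[tuple[int, int, int]]], threshold: int = 128) -> list[list[int]]:
    """Convert an RGB image (rows of (R, G, B) tuples) to a 0/1 ink mask."""
    binary = []
    for row in rgb:
        out_row = []
        for r, g, b in row:
            gray = 0.299 * r + 0.587 * g + 0.114 * b  # ITU-R BT.601 luma weights
            out_row.append(1 if gray < threshold else 0)  # dark pixel -> ink (1)
        binary.append(out_row)
    return binary
```

Representing ink as 1 makes the later morphological and logical-AND steps operate directly on the ruled lines and characters rather than on the paper background.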

In this embodiment, the ruled lines are separated into horizontal and vertical line components by applying morphological transformations to the binarized form image FI. A separately prepared structuring element (morphological filter) is used for each of the two components. The morphological transformations are also used to thicken line components and to fill in missing parts of them, through dilation and erosion of the horizontal and vertical line components.
The morphological transformation is performed using a DNN. The two saliency map images displayed as thumbnails in the upper part of display area DS2 (FIG. 1) contain ruled lines. Those ruled lines are identified from the horizontal and vertical line components separated with the DNN, so the DNN is used to generate the two saliency map images.

In step S11, the cell extraction unit 111 also generates two separated images, one containing only the horizontal line components and one containing only the vertical line components. It does this by taking the pixel-wise logical AND of each component image with the original form image FI.
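The separation and masking can be sketched with classical morphology instead of the DNN mentioned above. The sketch assumes binary images stored as nested lists with ink = 1, and a hypothetical minimum line length `min_len` in pixels. Opening a binary image with a 1 × k line element keeps exactly the pixels that lie on a horizontal run of at least k ink pixels (and likewise for a k × 1 element and vertical runs), so `_keep_runs` implements the opening directly. The optional `radius` thickens the line masks before the pixel-wise AND with the original image, loosely mirroring the thickening described above. With the default `radius=0`, the AND leaves the masks unchanged.

```python
def _keep_runs(bits, min_len):
    """1-D opening: keep only runs of ink (1s) that are at least min_len long."""
    out = [0] * len(bits)
    i = 0
    while i < len(bits):
        if not bits[i]:
            i += 1
            continue
        j = i
        while j < len(bits) and bits[j]:
            j += 1
        if j - i >= min_len:
            out[i:j] = [1] * (j - i)
        i = j
    return out


def _transpose(mask):
    return [list(col) for col in zip(*mask)]


def _thicken_rows(mask, radius):
    """Dilate vertically by `radius` pixels (thickens horizontal strokes)."""
    h = len(mask)
    return [[1 if any(mask[k][x] for k in range(max(0, y - radius), min(h, y + radius + 1))) else 0
             for x in range(len(mask[0]))]
            for y in range(h)]


def separate_lines(binary, min_len, radius=0):
    """Return (horizontal_only, vertical_only) images for a binarized form image."""
    horizontal = [_keep_runs(row, min_len) for row in binary]
    vertical = _transpose([_keep_runs(col, min_len) for col in _transpose(binary)])
    horizontal = _thicken_rows(horizontal, radius)
    vertical = _transpose(_thicken_rows(_transpose(vertical), radius))
    # Pixel-wise logical AND with the original binarized image.
    h_only = [[a & b for a, b in zip(hr, br)] for hr, br in zip(horizontal, binary)]
    v_only = [[a & b for a, b in zip(vr, br)] for vr, br in zip(vertical, binary)]
    return h_only, v_only
```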

The scanner 2 does not always read the form correctly, and the scanned form itself may be distorted. In practice, therefore, step S11 must be preceded by preprocessing such as rotating the form image FI and correcting distortion in it.

Next, in step S12, the cell extraction unit 111 processes the two separated images to detect the horizontal and vertical line components that enclose rectangular areas. It also extracts intersection information, that is, position information on the points where the detected components intersect.
The cell extraction unit 111 then groups the rectangular areas enclosed by the detected components into sets regarded as connected, that is, adjacent. Rectangular areas that are adjacent in any of the up, down, left, or right directions at a distance equal to or less than a predetermined threshold are combined into one group. The extracted intersection information is referenced when grouping the rectangular areas.
The cell extraction unit 111 also identifies any detected components that may not be properly connected to other components, and manipulates them as necessary.
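A simplified sketch of the adjacency grouping, using a union-find structure. It assumes rectangles are given as hypothetical `(x0, y0, x1, y1)` tuples and decides adjacency from the boxes alone; the device described here also consults the intersection information. The `threshold` parameter stands in for the unspecified predetermined distance.

```python
def group_rectangles(rects, threshold):
    """Group rectangles that are within `threshold` of each other horizontally or vertically."""
    parent = list(range(len(rects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def adjacent(a, b):
        ax0, ay0, ax1, ay1 = a
        bx0, by0, bx1, by1 = b
        x_overlap = ax0 < bx1 and bx0 < ax1
        y_overlap = ay0 < by1 and by0 < ay1
        x_gap = max(bx0 - ax1, ax0 - bx1)  # negative when the x-ranges overlap
        y_gap = max(by0 - ay1, ay0 - by1)
        # Side by side (left/right) or stacked (up/down) within the threshold.
        return (y_overlap and x_gap <= threshold) or (x_overlap and y_gap <= threshold)

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if adjacent(rects[i], rects[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(rects)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Grouping is transitive: two rectangles that are not adjacent to each other still end up in one group if both touch a third one.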

FIG. 10 is a diagram showing examples of components identified as possibly not being properly connected to other components.
As shown in FIG. 10, a part La of a horizontal or vertical line component L may protrude beyond another component L that it intersects. In the first service, such a protruding part La is handled on the assumption that it may need to be connected to another component L lying in the direction in which it protrudes. Specifically, suppose that in that direction there is another component L whose own part La protrudes back toward the first. In that case, the two parts La are replaced with a single component L that contains both of them, and are thereafter treated as one ruled line connecting them. Hereinafter, such a part La is referred to as a "protruding portion La".

This replacement requires that two different components L have protruding portions La that face each other. The reasoning is that, in that case, the middle of a single component L has most likely been lost for some reason, leaving two facing protruding portions La in its place. In the first service, the presence of two facing protruding portions La is treated as evidence of this. Because of this condition, none of the protruding portions La shown in FIG. 10 is connected to another component L.
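The pairing rule for facing protruding portions can be illustrated for horizontal stubs. Each stub is a hypothetical `(y, x_root, x_tip)` tuple, where `x_root` is the point at which it leaves the component it crosses. `y_tol` is an assumed tolerance for treating two stubs as lying on the same line. A stub with no facing partner is left unconnected, as in FIG. 10. Vertical stubs would be handled the same way with x and y swapped.

```python
def join_facing_protrusions(stubs, y_tol=2):
    """Join pairs of horizontal protruding portions that point toward each other.

    stubs: list of (y, x_root, x_tip). Returns (joined_segments, unpaired_stubs),
    where each joined segment is (y, x_start, x_end) spanning both roots.
    """
    right = [i for i, s in enumerate(stubs) if s[2] > s[1]]  # pointing right
    left = [i for i, s in enumerate(stubs) if s[2] < s[1]]   # pointing left
    paired = set()
    joined = []
    for i in sorted(right, key=lambda k: stubs[k][1]):
        y, root, _ = stubs[i]
        # Left-pointing stubs on (about) the same row whose root lies further right.
        facing = [j for j in left
                  if j not in paired and abs(stubs[j][0] - y) <= y_tol and stubs[j][1] > root]
        if not facing:
            continue  # no facing partner: leave this stub unconnected
        j = min(facing, key=lambda k: stubs[k][1])  # nearest facing stub
        paired.update((i, j))
        joined.append((y, root, stubs[j][1]))
    unpaired = [s for k, s in enumerate(stubs) if k not in paired]
    return joined, unpaired
```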

For cells CE with rounded corners, the intersection information at a rounded corner points to a position away from the component L. This happens even for adjacent cells CE that share the same component L, which makes it difficult to group cells correctly using the intersection information.

However, when the ruled lines are separated into horizontal and vertical line components, the rounded corner portions of a cell CE are ignored. Both components L leading into a rounded corner are then identified as components L that may not be properly connected to other components L. Each of these two components L, if extended, intersects another component L. Both are therefore regarded as components L that should be connected to other components L, and each is changed into a component L that extends up to its intersection with the other component L. As a result, every rounded corner of a cell CE is reshaped into a right-angled corner, and grouping based on intersection information works correctly regardless of corner shape.

A single component L may be detected as several components L because part of it has faded out or because dust adhered to the form when it was scanned. Each of these fragments faces another fragment along the same direction, so the fragments are replaced with a single component L.
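Merging the fragments of one broken component can be sketched as follows. The sketch assumes horizontal fragments given as `(y, x0, x1)` with `x0 < x1`, and row coordinates already snapped to exact values. The `max_gap` limit on how far apart two fragments may be is also an assumption, since the description does not give one.

```python
def merge_fragments(segments, max_gap):
    """Merge collinear horizontal fragments (y, x0, x1) separated by at most max_gap pixels."""
    merged = []
    for y, x0, x1 in sorted(segments, key=lambda s: (s[0], s[1])):
        if merged and merged[-1][0] == y and x0 - merged[-1][2] <= max_gap:
            # Same row and close enough: extend the previous fragment.
            _, px0, px1 = merged[-1]
            merged[-1] = (y, px0, max(px1, x1))
        else:
            merged.append((y, x0, x1))
    return merged
```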

When these operations on the components L create a new intersection point, the intersection information for that point is extracted. Consequently, step S12 extracts most of the intersection information for the rectangular areas that may be cells CE.

In step S13, the cell extraction unit 111 converts each horizontal line component into a horizontal line component with no height, and each vertical line component into a vertical line component with no width. The converted information is aggregated as grid coordinate information, with the grid divided at the intersections of the horizontal and vertical line components. For each element of this grid, the unit determines whether a ruled line is present and saves the result as ruled-line presence data for that element.

FIG. 11 is a diagram explaining a first example of the processing executed in step S13.
The horizontal line components LH and the vertical line components LV both actually have thickness. Each horizontal line component LH is converted into a component with no height, and each vertical line component LV into a component with no width. As a result, as shown in FIG. 11, the unit creates grid data representing a grid formed by horizontal and vertical line components of zero thickness.

Treating the horizontal and vertical line components as having zero thickness prevents differences in thickness from producing different coordinates, as shown in FIG. 11. For example, two vertical line components at different positions along the y-axis and with different thicknesses can be given the same x-coordinate, which simplifies processing.
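Collapsing thick components to zero-thickness coordinates might look like the following. This sketch rests on two assumptions, neither of which the description specifies. First, a component's coordinate is the centre of its bounding box. Second, coordinates within a tolerance `tol` are snapped to a shared value, so that lines of different thickness end up on the same grid line.

```python
def center_x(box):
    """Zero-width x-coordinate of a vertical component's bounding box (x0, y0, x1, y1).

    A horizontal component would use (y0 + y1) / 2 instead.
    """
    x0, _, x1, _ = box
    return (x0 + x1) / 2


def snap(coords, tol):
    """Cluster coordinates lying within tol of their neighbour; map each to its cluster mean."""
    ordered = sorted(set(coords))
    mapping, cluster = {}, []
    for c in ordered + [None]:  # None flushes the final cluster
        if cluster and (c is None or c - cluster[-1] > tol):
            mean = sum(cluster) / len(cluster)
            for member in cluster:
                mapping[member] = mean
            cluster = []
        if c is not None:
            cluster.append(c)
    return [mapping[c] for c in coords]
```

For example, vertical components with boxes `(100, 0, 102, 50)` and `(99, 60, 104, 90)` have centres 101.0 and 101.5, and `snap` with `tol=2` places both on x = 101.25.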

Each element of the grid is a segment of a component L delimited by the horizontal and vertical line components. For example, the vertical line component whose x-coordinate is x₁ is divided by four horizontal line components into three parts, each of which is an element. These are the ranges of y from y₀ to y₁, from y₁ to y₂, and from y₂ to y₃.

Ruled-line presence is determined element by element. In the example shown in FIG. 11, only the element with x-coordinate x₁ and y-coordinates from y₁ to y₂ is judged to have no ruled line, and "0" is stored as its ruled-line presence data. All other elements are judged to have a ruled line, and "1" is stored as their ruled-line presence data.

In the element with x-coordinate x₂ and y-coordinates from y₂ to y₃, the vertical line component LV does not span the entire range. However, it is present over most of the element, so this element is also judged to have a ruled line.

In this way, the first service determines the presence or absence of a ruled line for each grid element. An element that contains one or more components is judged to have a ruled line even if no single component covers the whole element. Those components are then completed into a single appropriate component.
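The per-element presence test can be sketched as a coverage check. The function measures how much of a grid element `[start, end]` is covered by the union of the component intervals on that grid line. The `min_coverage=0.5` cut-off is an assumed reading of "present over most of the element". All coordinates are hypothetical pixel values.

```python
def element_has_rule(segments, start, end, min_coverage=0.5):
    """Return 1 if the union of intervals (s0, s1) covers at least min_coverage of [start, end]."""
    covered = 0.0
    cursor = start  # everything left of cursor has already been counted
    for s0, s1 in sorted(segments):
        lo, hi = max(s0, cursor), min(s1, end)
        if hi > lo:
            covered += hi - lo
            cursor = hi
    return 1 if covered >= min_coverage * (end - start) else 0


def presence_for_line(segments, grid_coords, min_coverage=0.5):
    """Ruled-line presence data (0/1) for every element of one grid line."""
    return [element_has_rule(segments, a, b, min_coverage)
            for a, b in zip(grid_coords, grid_coords[1:])]
```

For example, `presence_for_line([(0, 30), (60, 90)], [0, 30, 60, 90])` gives `[1, 0, 1]`, like the x₁ line of FIG. 11. A component ending short of the last grid line, as in `[(0, 60), (60, 85)]`, still yields `[1, 1, 1]`, like the x₂ line.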

FIG. 12 is a diagram explaining a second example of the processing executed in step S13.
As described above, step S12 identifies protruding portions La that extend beyond other components L. FIG. 12 shows an example in which four protruding portions La have been identified: two protruding in the same horizontal direction, one protruding in the opposite horizontal direction, and one protruding vertically.

As shown in FIG. 12, each protruding portion La is extended in its protruding direction until it reaches the first component L in that direction that is perpendicular to it. Step S13 thus manipulates the components L detected in step S12 as shown in FIG. 12. Where two components L at a corner do not intersect, the element-by-element ruled-line presence determination makes them meet at a right angle.
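Extending a protruding portion to the first perpendicular component can be sketched for horizontal stubs. Stubs are hypothetical `(y, x_root, x_tip)` tuples and vertical components are `(x, y0, y1)`. Leaving a stub unchanged when nothing perpendicular lies in its protruding direction is an assumption about that edge case.

```python
def extend_stub(stub, verticals):
    """Extend a horizontal stub (y, x_root, x_tip) to the first vertical (x, y0, y1) ahead of its tip."""
    y, root, tip = stub
    step = 1 if tip > root else -1  # protruding direction
    # Verticals that cross row y and lie beyond the tip in the protruding direction.
    ahead = [x for x, y0, y1 in verticals if y0 <= y <= y1 and (x - tip) * step > 0]
    if not ahead:
        return stub
    target = min(ahead, key=lambda x: (x - tip) * step)  # nearest one
    return (y, root, target)
```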

Each protruding portion La is extended in its protruding direction because a component L to which it may need to be connected lies in that direction. In other words, part of the component L that forms the boundary, or outer perimeter, of the table TB may have gone undetected.

These operations can be expected to fill in the components that make up the ruled lines appropriately. They also bring the rectangular areas that can be regarded as cells CE, and the regions where such areas cluster, closer to true rectangles.

Table-structured parts of a form are usually rectangular as a whole. Making the rectangular areas, and the regions where they cluster, more nearly rectangular is therefore expected to help extract the cells CE and tables TB more accurately.

In step S14, the cell extraction unit 111 refers to the grid coordinate information and the ruled-line presence data for each grid element. It detects as cells CE all rectangular areas that contain an area not included in any other rectangular area, and extracts position information from every detected cell CE. Once the cells CE have been detected and their position information extracted, the cell detection process ends.

FIG. 13 is a diagram showing an example of a form image, the cells detected from it, and the grouping of the detected cells.
As shown in FIG. 13, every rectangular area on the form image FI that is enclosed by ruled lines and contains no area belonging to another rectangular area is detected as a cell CE. The detected cells CE are grouped according to their positional relationships, including shape, and the cells CE in one group are treated as the components of a table TB. A closed ruled line CL that encloses a non-rectangular area is not detected as a cell CE and is discarded.
Discarding such lines fixes the set of valid ruled lines, which in turn fixes the ruled lines on the two saliency map images. The character string areas TX placed on one of those images are extracted and identified separately from the ruled lines.
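Step S14 and the exclusion of non-rectangular closed lines can be sketched over the grid presence data. The sketch assumes the presence data is split into two hypothetical arrays, `h_rule` and `v_rule`, indexed as described in the docstring. It flood-fills grid elements that are not separated by a ruled line. A region is kept as a cell only if it exactly fills its bounding box and is closed on all four sides; any other region is discarded. This is one plausible reading of the step, not the device's actual procedure.

```python
def detect_cells(h_rule, v_rule):
    """Detect cells from ruled-line presence data.

    h_rule[i][c]: 1 if horizontal grid line i has a ruled line over column c (i = 0..rows).
    v_rule[r][j]: 1 if vertical grid line j has a ruled line over row r (j = 0..cols).
    Returns cells as (top_row, left_col, bottom_row, right_col) in grid indices.
    """
    rows, cols = len(v_rule), len(h_rule[0])
    seen = [[False] * cols for _ in range(rows)]
    cells = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            # Flood-fill grid elements not separated by a ruled line.
            seen[r][c] = True
            stack, region = [(r, c)], []
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                steps = []
                if x + 1 < cols and not v_rule[y][x + 1]:
                    steps.append((y, x + 1))
                if x > 0 and not v_rule[y][x]:
                    steps.append((y, x - 1))
                if y + 1 < rows and not h_rule[y + 1][x]:
                    steps.append((y + 1, x))
                if y > 0 and not h_rule[y][x]:
                    steps.append((y - 1, x))
                for ny, nx in steps:
                    if not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            top, bottom = min(y for y, _ in region), max(y for y, _ in region)
            left, right = min(x for _, x in region), max(x for _, x in region)
            # Non-rectangular regions (e.g. an L shape) are discarded.
            rectangular = len(region) == (bottom - top + 1) * (right - left + 1)
            enclosed = (all(h_rule[top][x] and h_rule[bottom + 1][x] for x in range(left, right + 1))
                        and all(v_rule[y][left] and v_rule[y][right + 1] for y in range(top, bottom + 1)))
            if rectangular and enclosed:
                cells.append((top, left, bottom, right))
    return cells
```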

FIG. 14 is a flowchart showing an example of the perimeter acquisition process executed by the form recognition device according to an embodiment of the information processing device of the present invention.
The table extraction unit 112 shown in FIG. 8 is implemented by the CPU 11 executing this perimeter acquisition process. The table extraction unit 112 is therefore described here as the entity that executes the process.
The cell detection process detects the cells CE and groups them. The perimeter acquisition process then extracts the extent of each table TB by identifying the perimeter of each group of cells CE.

First, in step S21, the table extraction unit 112 converts the position information of the cells CE into node information. Suppose that the four corners of each cell CE are identified from its position information, and that horizontal and vertical lines are drawn through those corners. Node information then gives the positions of the points where the ruled lines forming the cells CE intersect these horizontal or vertical lines.

FIG. 15 is a diagram showing an example of cell detection results and the node information obtained by converting them.
When the cells CE are detected or grouped as shown on the left side of FIG. 15, horizontal and vertical lines through the corners of each cell CE are assumed to be drawn as grid lines GD. A node P is a point where a ruled line forming a cell CE intersects a horizontal or vertical grid line GD. Node information is the position information of a node P.

Part of a ruled line forming a cell CE is attached to each node P, and at least two such parts meet at every node P. These parts are hereinafter collectively referred to as "links LK".

For each node P, link information describing each link LK connected to it is generated along with the node information. Link information expresses the positional relationship between a node P and an adjacent node P. In the first service, for each of the four directions (left, right, up, and down) from a node P, it records whether an adjacent node P exists and, if so, the distance to it in pixels. The link information therefore tells, for each of the four directions, whether another node P is adjacent and, if it is, how far away it is.
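The link information could be stored as a small per-node table of distances. This sketch assumes that nodes are `(x, y)` pixel tuples and that links are axis-aligned node pairs. `None` marks a direction with no adjacent node.

```python
def link_info(nodes, links):
    """Build {node: {"left"|"right"|"up"|"down": distance in pixels or None}}."""
    info = {p: {"left": None, "right": None, "up": None, "down": None} for p in nodes}
    for a, b in links:
        (ax, ay), (bx, by) = a, b
        if ay == by:  # horizontal link
            left, right = (a, b) if ax < bx else (b, a)
            d = abs(bx - ax)
            info[left]["right"] = d
            info[right]["left"] = d
        elif ax == bx:  # vertical link (y grows downward)
            top, bottom = (a, b) if ay < by else (b, a)
            d = abs(by - ay)
            info[top]["down"] = d
            info[bottom]["up"] = d
        else:
            raise ValueError("links must be horizontal or vertical")
    return info
```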

In step S22, the table extraction unit 112 groups the generated node information and link information according to the grouping performed when the cells CE were detected.

Then, in step S23, the table extraction unit 112 processes each group in turn. Using that group's node information and link information, it extracts the perimeter of the region covered by the grouped cells CE as the perimeter of the table TB. After the perimeters of all tables TB have been extracted, the perimeter acquisition process ends.

FIG. 16 is a diagram showing an example of grouped node information, and an example of extracting from that node information the perimeter of a table containing cells.
The nodes P are introduced in order to identify the perimeter of the table TB. The perimeter is found by a search that starts at a chosen node P and repeatedly moves the node of interest to an adjacent node P. The search ends when it returns to a node P that has already been visited.

In the first service, the search starts from the leftmost node P identified from the grouped node information, taking the uppermost one if several share that position. The possible directions for advancing to the next node P are prioritized relative to the direction of travel by which the current node P was reached. Left has the highest priority, followed by straight ahead (the current direction of travel), then right, and finally backward. Thus, in the example shown in FIG. 16, the search starts from the node marked "P", passes only through the nodes P on the outer boundary, and extracts the perimeter of the table TB.
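The perimeter search with left, forward, right, back priority is essentially a left-hand wall-following walk over the links. This minimal sketch assumes image coordinates with y increasing downward, nodes as `(x, y)` tuples, and axis-aligned links. The initial heading of north is also an assumption. Because the start node is the leftmost (and, among those, uppermost) node, nothing is linked to its west or north. The first move is therefore east if possible, and the walk then proceeds clockwise on screen, turning left whenever it can so that it stays on the outermost links.

```python
# Headings as (dx, dy) in image coordinates, where y grows downward.
NORTH, EAST, SOUTH, WEST = (0, -1), (1, 0), (0, 1), (-1, 0)


def turn_left(d):
    dx, dy = d
    return (dy, -dx)


def turn_right(d):
    dx, dy = d
    return (-dy, dx)


def reverse(d):
    dx, dy = d
    return (-dx, -dy)


def trace_perimeter(links):
    """Walk the outer boundary of a group of nodes joined by axis-aligned links."""
    adj = {}
    for a, b in links:
        d = ((b[0] > a[0]) - (b[0] < a[0]), (b[1] > a[1]) - (b[1] < a[1]))
        adj.setdefault(a, {})[d] = b
        adj.setdefault(b, {})[reverse(d)] = a
    start = min(adj)  # tuples compare by x, then y: leftmost, then uppermost
    node, heading = start, NORTH
    path, visited = [start], {start}
    while True:
        # Priority relative to the current heading: left, forward, right, back.
        for d in (turn_left(heading), heading, turn_right(heading), reverse(heading)):
            if d in adj[node]:
                heading, node = d, adj[node][d]
                break
        if node in visited:  # returned to an already-visited node: done
            return path
        visited.add(node)
        path.append(node)
```

Consider a table of two side-by-side cells with nodes at x = 0, 10, 20 and y = 0, 5. The walk returns `[(0, 0), (10, 0), (20, 0), (20, 5), (10, 5), (0, 5)]` and never takes the interior link between `(10, 0)` and `(10, 5)`.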

FIG. 17 is a functional block diagram showing an example of a functional configuration actually implemented, as part of the key-value extraction unit, on the form recognition device according to an embodiment of the information processing device of the present invention.

As shown in FIG. 17, when the key-value extraction unit 102 operates in the CPU 11 of the form recognition device 1, a graph information generation unit 121 and a graph construction unit 122 function. Together, these two units provide the functions of the correspondence determination unit 116 and the identifying unit 117.

The graph assumed in the second service is a data structure consisting of nodes ND and edges ED. Here, a node ND is a recognized character string TX, and an edge ED is data representing the correspondence, that is, the relationship, between nodes ND. Hereinafter, "node" refers to a recognized character string TX.

The graph information generation unit 121 receives three inputs from the character string recognition unit 113: an image (data) DT1 representing the form image FI, position information DT2 of the character string areas in which the recognized character strings TX exist, and recognition results DT3 for the character strings TX.

As shown in FIG. 17, a graph modeler unit 1211 and a feature calculation unit 1212 function in the graph information generation unit 121.
The graph modeler unit 1211 refers to the image DT1 and the position information DT2 of the character string areas TX, and generates relative position information, that is, distance information, between the character string areas TX.
From the recognition results DT3, the feature calculation unit 1212 generates two kinds of information. The first is attribute information representing the attributes of each character string TX. The second is NLP information, which is the result of applying natural language processing (NLP) to the character strings.

FIG. 18 is a diagram explaining examples of the various information generated by the graph information generation unit.
The relative position information between character string areas TX represents spatial features. For each recognized character string TX, or more precisely for each character string area in which a character string TX exists, the graph modeler unit 1211 generates relative position information with respect to the adjacent character string areas TX. In the second service, as shown in FIG. 18, the surroundings are divided into four directions (up, down, left, and right), and position information for the adjacent character string area TX is generated for each direction. For a vertically adjacent character string area TX, the position information is the distance between the two areas divided by the height of the target character string area TX. For a horizontally adjacent area, it is that distance divided by the width of the target character string area TX.

Dividing the distance between character string areas TX by the size (shape) of the target character string area TX normalizes that distance relative to the size of the area. The distance between character string areas TX is important for identifying correspondences between nodes. Normalizing it makes those correspondences easier to identify reliably, whatever the size of the character string areas TX.
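The normalized relative positions can be sketched as follows. Boxes are hypothetical `(x, y, w, h)` tuples. "Adjacent in a direction" is taken to mean the nearest box whose projection overlaps the target box on the other axis. That definition is an assumption, since the description does not say how neighbours are chosen.

```python
def relative_positions(boxes):
    """For each (x, y, w, h) box, return the normalized gap to its nearest neighbour
    in each direction (None when there is no neighbour). y grows downward."""
    features = []
    for i, (x, y, w, h) in enumerate(boxes):
        nearest = {"up": None, "down": None, "left": None, "right": None}
        for j, (x2, y2, w2, h2) in enumerate(boxes):
            if i == j:
                continue
            cols_overlap = x < x2 + w2 and x2 < x + w  # shares some x-range
            rows_overlap = y < y2 + h2 and y2 < y + h  # shares some y-range
            if cols_overlap and y2 >= y + h:
                key, gap = "down", (y2 - (y + h)) / h    # vertical gap / own height
            elif cols_overlap and y2 + h2 <= y:
                key, gap = "up", (y - (y2 + h2)) / h
            elif rows_overlap and x2 >= x + w:
                key, gap = "right", (x2 - (x + w)) / w   # horizontal gap / own width
            elif rows_overlap and x2 + w2 <= x:
                key, gap = "left", (x - (x2 + w2)) / w
            else:
                continue  # diagonal or overlapping box
            if nearest[key] is None or gap < nearest[key]:
                nearest[key] = gap
        features.append(nearest)
    return features
```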

The feature calculation unit 1212 determines the attribute of each character string TX according to a predetermined classification and outputs it as Boolean information. As shown in FIG. 18, a character string TX is classified as, for example, a date, a numerical value, or other. The Boolean attribute information indicates, for each class, whether the character string belongs to it.
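This is a rule-based sketch of the Boolean attributes. The regular expressions for dates and plain numbers are assumptions chosen for illustration. They cover Western `YYYY/MM/DD`-style dates and Japanese era dates such as 平成２９年４月１日. NFKC normalization first folds full-width digits such as ２ into ASCII.

```python
import re
import unicodedata

# Illustrative patterns only; real forms would need more date and number formats.
DATE_RE = re.compile(r"^(?:\d{4}[/.-]\d{1,2}[/.-]\d{1,2}|(?:令和|平成|昭和)?\d{1,4}年\d{1,2}月\d{1,2}日)$")
NUMBER_RE = re.compile(r"^[-+]?\d[\d,]*(?:\.\d+)?$")


def boolean_attributes(text):
    """Return one Boolean per class: date, number, other (mutually exclusive)."""
    s = unicodedata.normalize("NFKC", text).strip()  # full-width digits -> ASCII
    is_date = bool(DATE_RE.match(s))
    is_number = not is_date and bool(NUMBER_RE.match(s))
    return {"date": is_date, "number": is_number, "other": not (is_date or is_number)}
```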

NLP information is obtained by applying natural language processing to the recognized character strings TX. As shown in FIG. 18, for example, it allows several words separated by blanks to be treated as one character string TX. It also allows the string "自．平成２９年４月１日至．平成３０年０３月３１日" ("From April 1, Heisei 29 (2017) to March 31, Heisei 30 (2018)") shown in FIG. 1 to be treated as two character strings TX: "自．平成２９年４月１日" and "至．平成３０年０３月３１日".
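The period-splitting behaviour can be imitated with a simple rule, offered only as a stand-in for the NLP step. The rule splits before 至 ("to") and keeps each part's marker. Splitting on the zero-width lookahead `(?=至)` requires Python 3.7 or later.

```python
import re


def split_period(text):
    """Split a '自 ... 至 ...' (from ... to ...) period string into its two parts."""
    parts = re.split(r"(?=至)", text)  # zero-width split keeps 至 at the start of part two
    return [p.strip() for p in parts if p.strip()]
```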

The various information generated by the graph modeler unit 1211 is passed to the graph construction unit 122.
As shown in FIG. 17, five units function in the graph construction unit 122: a feature extraction unit 1221, a node classification unit 1222, a node extraction unit 1223, an edge classification unit 1224, and a combining unit 1225. The feature extraction unit 1221, the node classification unit 1222, and the edge classification unit 1224 use an MLP (multilayer perceptron).

The feature extraction unit 1221 repeatedly learns the feature values of each node ND. For example, suppose a node ND is an image of the character string TX "合計額" ("total amount"). By comparison with the training data, that node ND is recognized as a character string TX meaning "total amount".

The node classification unit 1222 classifies each node ND using the feature values extracted from it and outputs the result as a node classifier. Here, each node ND is classified as a key, a value, or other. The node classifier corresponds to the attribute information of the node ND.

Some nodes ND (character strings TX) on the form image FI could be classified as either a key or a value. In the second service, learning with training data causes each such node ND to be classified as only one of the two. This makes it possible to identify only the pairs of nodes ND that should be identified, namely pairs of a node ND1 and a node ND2.

Correspondences (relationships) need to be identified only for nodes ND classified as keys or values. Nodes ND3 classified as other are excluded from correspondence identification. The node extraction unit 1223 therefore extracts from all nodes ND only the nodes ND1 and ND2 classified as keys or values.

The edge classification unit 1224 generates edges ED by predicting, for each node ND classified as a key or a value, which nodes ND it corresponds to. Correspondence determination, that is, edge ED prediction, is limited to nodes ND classified as keys or values. This reduces both the computational load and the required memory compared with processing all nodes ND.

The node classifier of each node ND and the edge ED predictions are passed to the combining unit 1225. From the predicted (determined) edges ED, the combining unit 1225 extracts only the edges ED1 that connect a node ND1 whose node classifier is key with a node ND2 whose node classifier is value. Extracting these edges ED1 identifies the key-value pairs of nodes ND1 and ND2.
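Node extraction and the key-value filter in the combining unit can be sketched as small functions. They use the 0/1/2 node classifier encoding described for FIG. 19 (0 key, 1 value, 2 other). The node identifiers and edges are hypothetical. `candidate_pair_count` only illustrates how restricting edge prediction to key and value nodes shrinks the number of unordered candidate pairs, n(n − 1)/2.

```python
KEY, VALUE, OTHER = 0, 1, 2  # node classifier values


def extract_key_value_nodes(node_classes):
    """Node extraction: keep only nodes classified as key or value."""
    return {n for n, c in node_classes.items() if c in (KEY, VALUE)}


def key_value_pairs(node_classes, edges):
    """Keep only predicted edges joining a key node to a value node, as (key, value)."""
    candidates = extract_key_value_nodes(node_classes)
    pairs = []
    for a, b in edges:
        if a not in candidates or b not in candidates:
            continue
        ca, cb = node_classes[a], node_classes[b]
        if (ca, cb) == (KEY, VALUE):
            pairs.append((a, b))
        elif (ca, cb) == (VALUE, KEY):
            pairs.append((b, a))
    return pairs


def candidate_pair_count(n):
    """Number of unordered node pairs an edge classifier would have to score."""
    return n * (n - 1) // 2
```

In the test data below, five nodes give 10 candidate pairs, but only the four key or value nodes need scoring, giving 6.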

FIG. 19 is a diagram showing a detailed example of the graph construction unit.
As shown in FIG. 19, the feature extraction unit 1221 is a three-layer NN (neural network). The feature extraction unit 1221 receives from the graph information generation unit 121 a graph representing the set of nodes ND and the set of edges ED, together with the number of nodes, the number of edges, the node attributes, the node labels, an adjacency matrix, a mapping matrix, and so on.

A node label is, for example, information that allows a node ND to be identified. The mapping matrix is a multi-dimensional matrix representing the relationships between nodes ND, and it is updated according to the edge ED prediction results. The node attributes are logical-type information, as shown in FIG. 18.

As described above, the feature extraction unit 1221 extracts the features of each node ND and identifies its meaning. Based on this identification, the node extraction unit 1223 is notified of the nodes ND whose node classifier is "other".

As shown in FIG. 19, the node classification unit 1222 is also a three-layer NN. The node classification unit 1222 generates a node classifier for each node using the node attributes and node labels. Because each node ND is classified as a key, a value, or other, the node classifier is represented by a numerical value from 0 to 2: 0 represents a key, 1 a value, and 2 other. The node classifier of each node is passed to the node extraction unit 1223.
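The node classifier encoding above (0 = key, 1 = value, 2 = other) can be paired with a three-layer network that maps a per-node feature vector to three class scores. This is a minimal sketch with several assumptions. The node attributes and labels are assumed to be already encoded as numeric feature vectors. The layer widths, ReLU activations and the absence of biases are illustrative choices. The weights are random and untrained, whereas in the described system they come from learning. `ThreeLayerClassifier` is a hypothetical name.

```python
from enum import IntEnum

import numpy as np


class NodeClass(IntEnum):
    """Node classifier values as described: 0 = key, 1 = value, 2 = other."""
    KEY = 0
    VALUE = 1
    OTHER = 2


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


class ThreeLayerClassifier:
    """Untrained three-layer MLP mapping node features to three class scores.

    Widths, activations and random initialization are illustrative assumptions.
    """

    def __init__(self, in_dim: int, hidden: int = 32, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(in_dim, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, hidden))
        self.w3 = rng.normal(scale=0.1, size=(hidden, len(NodeClass)))

    def __call__(self, features: np.ndarray) -> list[NodeClass]:
        # features: one row per node ND, one column per encoded feature.
        h = relu(features @ self.w1)
        h = relu(h @ self.w2)
        logits = h @ self.w3
        # The class with the highest score becomes the node classifier.
        return [NodeClass(int(i)) for i in logits.argmax(axis=1)]
```

For example, `ThreeLayerClassifier(8)(np.random.default_rng(1).random((5, 8)))` returns one `NodeClass` per node. The specific values are meaningless until the weights are trained.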

The node classification unit 1222 has been trained with training data for generating node classifiers. This training data is based on graph theory: for example, each node ND is labeled as either a key or a value, and the strength of the relationship between nodes is annotated as link information. The node classification unit 1222 therefore takes other nodes ND into account when generating the node classifier of each node ND. A node ND that could be classified as both a key and a value is classified as only one of them. In addition, when a certain character string TX is classified as a key, all character strings TX that are its synonyms are also classified as keys.

As shown in FIG. 19, the node extraction unit 1223 includes a node deletion unit 12231 and a node selection unit 12232.
The node deletion unit 12231 is notified of the nodes ND classified as other and receives the node classifier of each node. The node deletion unit 12231 then removes the nodes ND classified as other from all the nodes ND. As a result, only the nodes ND classified as a key or a value are passed on to the node selection unit 12232.
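The node deletion and node selection steps amount to a filter over the node classifiers. The sketch below assumes, hypothetically, that node classifiers are held in a dictionary from an integer node ID to a class value. The function names are illustrative and are not the names of units 12231 and 12232.

```python
from enum import IntEnum


class NodeClass(IntEnum):
    KEY = 0
    VALUE = 1
    OTHER = 2


def delete_other_nodes(classifiers: dict[int, NodeClass]) -> dict[int, NodeClass]:
    """Node deletion step: drop every node whose classifier is OTHER."""
    return {node: cls for node, cls in classifiers.items() if cls != NodeClass.OTHER}


def select_nodes(remaining: dict[int, NodeClass]) -> list[int]:
    """Node selection step: pass the surviving node IDs on, in ascending order."""
    return sorted(remaining)


if __name__ == "__main__":
    # Hypothetical classifiers for six nodes (node ID -> class).
    classifiers = {0: NodeClass.KEY, 1: NodeClass.VALUE, 2: NodeClass.OTHER,
                   3: NodeClass.KEY, 4: NodeClass.OTHER, 5: NodeClass.VALUE}
    print(select_nodes(delete_other_nodes(classifiers)))  # [0, 1, 3, 5]
```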

The node selection unit 12232 selects the nodes ND notified by the node deletion unit 12231 and notifies the edge classification unit 1224 of the selection result.
The edge classification unit 1224 then, for only the nodes ND classified as a key or a value, identifies the nodes ND considered to have a correspondence with each of them, and updates the mapping matrix.

The mapping matrix is, for example, a matrix that indicates with 0 or 1 whether a correspondence exists between nodes ND: 0 indicates no correspondence and 1 indicates a correspondence. Accordingly, the edge classification unit 1224 changes from 0 to 1 the value of each element corresponding to a pair of nodes ND for which a correspondence is recognized.
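A 0/1 mapping matrix update like the one just described could look as follows. This is a sketch under stated assumptions. Row and column indices are taken to be node IDs. Predicted correspondences arrive as index pairs. Keeping the matrix symmetric is a hypothetical choice, since the description only fixes the meaning of 0 and 1. `update_mapping_matrix` is an illustrative name.

```python
import numpy as np


def update_mapping_matrix(matrix: np.ndarray,
                          predicted_pairs: list[tuple[int, int]],
                          symmetric: bool = True) -> np.ndarray:
    """Set element (i, j) from 0 to 1 for each node pair predicted to correspond.

    0 means "no correspondence" and 1 means "correspondence". Mirroring the
    entry to (j, i) when `symmetric` is True is an assumption.
    """
    updated = matrix.copy()  # leave the caller's matrix untouched
    for i, j in predicted_pairs:
        updated[i, j] = 1
        if symmetric:
            updated[j, i] = 1
    return updated
```

For example, `update_mapping_matrix(np.zeros((4, 4), dtype=np.int8), [(0, 1), (2, 3)])` yields a matrix with ones at (0, 1), (1, 0), (2, 3) and (3, 2), and zeros elsewhere.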

The edge classification unit 1224 is a two-layer NN that implements a natural language processing function. The edge classification unit 1224 therefore regards an edge ED as existing between any two nodes ND that are in a key-value relationship in the table structure (see FIG. 5). However, those two nodes ND are not necessarily classified as a key and a value, respectively. For this reason, the combining unit 1225 refers to the mapping matrix and the node classifiers and identifies the edges ED1 connecting two nodes ND1 and ND2 whose node classifiers indicate a key and a value. The combining unit 1225 thereby extracts only the pairs of nodes ND1 and ND2 connected by an edge ED1. All such pairs of nodes ND1 and ND2 are stored as identification results in the identification result storage unit 184 of the storage unit 18.
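The combining step just described, which keeps only edges whose endpoints are classified as key and value, can be sketched with a 0/1 mapping matrix and a list of node classifiers. The sketch assumes that matrix indices coincide with list positions. It also assumes that a (key, value) ordering is the desired output. Both are illustrative assumptions, as is the name `combine`.

```python
from enum import IntEnum

import numpy as np


class NodeClass(IntEnum):
    KEY = 0
    VALUE = 1
    OTHER = 2


def combine(mapping: np.ndarray, classifiers: list[NodeClass]) -> list[tuple[int, int]]:
    """Return (key_node, value_node) pairs joined by an edge in the mapping matrix.

    An edge is kept only when one endpoint is KEY and the other is VALUE;
    edges such as key-key or value-value are discarded. For a symmetric
    matrix, each kept pair appears once because only the (KEY, VALUE)
    orientation is accepted.
    """
    pairs = []
    rows, cols = np.nonzero(mapping)
    for i, j in zip(rows.tolist(), cols.tolist()):
        if classifiers[i] == NodeClass.KEY and classifiers[j] == NodeClass.VALUE:
            pairs.append((i, j))
    return pairs
```

With symmetric edges 0-1, 0-2 and 2-3 and classifiers `[KEY, VALUE, KEY, VALUE]`, the key-key edge 0-2 is dropped and the result is `[(0, 1), (2, 3)]`.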

An embodiment of the present invention has been described above. However, embodiments to which the present invention is applied may also be, for example, as follows.

For example, some forms contain a plurality of nodes ND2 classified as values for a single node ND1 (character string TX) classified as a key. In the form shown in FIG. 20, for example, the character string TX31 "氏名" (full name) corresponds to a node ND1 classified as a key. The character strings TX41 and TX42 "山田太郎" (Taro Yamada) correspond to nodes ND2 classified as values that correspond to that node ND1. Therefore, in addition to the character string TX41, the character string TX42 may also be associated with the character string TX31. Similarly, the character strings TX43 and TX44 may be associated with the character string TX32, the character strings TX45 and TX46 with the character string TX33, and the character strings TX47 and TX48 with the character string TX34.
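Associating several values with one key, as in the FIG. 20 example, amounts to grouping the identified (key, value) pairs by key. The sketch below assumes the pairs are given as string IDs in the order they were identified. The IDs are placeholders modeled on the example, and `group_values_by_key` is a hypothetical name.

```python
from collections import defaultdict


def group_values_by_key(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collect every value string associated with the same key string.

    Values keep the order in which their pairs were supplied.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


if __name__ == "__main__":
    pairs = [("TX31", "TX41"), ("TX31", "TX42"), ("TX32", "TX43"), ("TX32", "TX44")]
    print(group_values_by_key(pairs))
    # {'TX31': ['TX41', 'TX42'], 'TX32': ['TX43', 'TX44']}
```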

The character strings TX43 to TX47 are also associated with the character string TX41 "Taro Yamada". Accordingly, the character string TX41 and the character strings TX43 to TX47 may be associated together as one set. Such a set can be handled as structured data.
In addition, although the form image data representing the form image FI is acquired from the scanner 2, it may instead be acquired from a terminal connectable via a network. In other words, the device from which the form image data is acquired is not particularly limited. Likewise, the device that displays the display screen DS shown in FIG. 1 is not particularly limited.

The screen layout of the display screen DS is also not limited to that shown in FIG. 1. For example, a display area that always shows the extraction result of the table TB may be reserved, and the extraction results of the cell areas and the character string areas may be selectively shown in another display area. The operator may also be allowed to correct the recognition result of a character string TX shown in the detail display area DS32.
Although an NN is used to extract pairs of two character strings TX in a key-value relationship, an NN need not be used. Using an NN, however, has the advantage that the system can be adapted to a wide variety of forms relatively easily by preparing suitable training data.
A table TB may contain multiple types of ruled lines, that is, ruled lines distinguished by, for example, different thicknesses, different numbers of lines forming one ruled line, or different colors. Ruled lines are usually made different for some purpose. For this reason, the result of determining the ruled-line type may be reflected in the extraction of the table TB or in determining the correspondences between character strings TX (nodes ND).
A form has a tabular structure, which is why forms are the target for extracting the table TB and determining the correspondences between character strings TX. Accordingly, anything that has a table structure can be a target; that is, the target may be something other than a form.

The first service and the second service have been described as being provided by separate dedicated applications, but both services may be provided by a single dedicated application. FIG. 8 assumes that the dedicated application providing the second service uses some of the functions of the dedicated application providing the first service (the function that implements the character string recognition unit 113).

In summary, an information processing apparatus to which the present invention is applied and which can provide the first service needs only the following configuration, and various embodiments can be adopted.
That is, an information processing apparatus capable of providing the first service (for example, the form recognition device 1 shown in FIG. 6) comprises:
cell extraction means (for example, the cell extraction unit 111 shown in FIG. 8) for extracting cells by recognizing, using image data representing a form, the cells present in the form and specifying the positions of the recognized cells in the image;
table extraction means (for example, the table extraction unit 112 shown in FIG. 8) for identifying, based on the one or more cells extracted by the cell extraction means and their positions, one or more cells forming a group in the image, and extracting the range in which the identified one or more cells exist as a table; and
character string extraction means (for example, the character string recognition unit 113 shown in FIG. 8) for extracting character string regions by recognizing the character string regions present in the form and specifying the positions of the recognized character string regions in the image.
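The table extraction means groups cells and takes the range they occupy as a table. The text does not state how cells are grouped, so the following minimal sketch makes two assumptions. Each extracted cell is an axis-aligned bounding box in image pixels. Cells belong to the same group when their boxes overlap or lie within a small tolerance of each other. Groups are formed with union-find, and each table range is the union bounding box of its group. `Box`, `touches`, `extract_tables` and the tolerance are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def touches(a: Box, b: Box, tol: float = 2.0) -> bool:
    """True if two boxes overlap or lie within `tol` pixels of each other."""
    return (a.x0 <= b.x1 + tol and b.x0 <= a.x1 + tol
            and a.y0 <= b.y1 + tol and b.y0 <= a.y1 + tol)


def extract_tables(cells: list[Box], tol: float = 2.0) -> list[Box]:
    """Group touching cells with union-find and return each group's bounding range."""
    parent = list(range(len(cells)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            if touches(cells[i], cells[j], tol):
                parent[find(i)] = find(j)

    groups: dict[int, list[Box]] = {}
    for i, cell in enumerate(cells):
        groups.setdefault(find(i), []).append(cell)

    # One table per group: the smallest box enclosing all of its cells.
    return [Box(min(c.x0 for c in g), min(c.y0 for c in g),
                max(c.x1 for c in g), max(c.y1 for c in g))
            for g in groups.values()]
```

For example, three adjacent cells `Box(0, 0, 50, 20)`, `Box(50, 0, 100, 20)` and `Box(0, 20, 50, 40)` merge into one table `Box(0, 0, 100, 40)`. An isolated cell far away becomes a table of its own. The pairwise comparison is quadratic in the number of cells, which is acceptable for the cell counts typical of a single form.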

This allows the operator to check the extraction results for cells, character string regions, and tables. Most cells have a character string placed inside them, so the cell extraction results show where most of the character strings are located.
The table extraction results make it possible to perform subsequent processing that checks the character string recognition results and their correspondences table by table. Because checking can be done per table, fewer character strings need to be considered at once, and checking the correspondences between character strings becomes easier.
For these reasons, the operator can perform appropriate subsequent processing more easily and more quickly.

The information processing apparatus may further comprise display control means (for example, the display control unit 114 shown in FIG. 8) for executing control to display the image in a form in which the positions of the cells, the table, and the character string regions extracted by the cell extraction means, the table extraction means, and the character string extraction means, respectively, are visible.
This allows the operator to check the positions of the cells, tables, and character string regions in the displayed image.

In the information processing apparatus, the display control means may execute control, based on an operation by the operator, to display the image in a form in which the positions of one or more of the table, the cells, and the character string regions are selectively visible.
This allows the operator to selectively check the position of any of the cells, tables, and character string regions in the displayed image, which makes checking positions easier.

Further, an information processing apparatus to which the present invention is applied and which can provide the second service needs only the following configuration, and various embodiments can be adopted.
That is, an information processing apparatus capable of providing the second service (for example, the form recognition device 1 shown in FIG. 6) comprises:
character string recognition means (for example, the character string recognition unit 113 shown in FIG. 8) for recognizing, using image data representing a form, a plurality of character strings present in the form, each consisting of one or more consecutive characters, and specifying position information of each of the recognized character strings in the image;
correspondence determination means (for example, the correspondence determination unit 116 shown in FIG. 8) for determining the correspondence between two predetermined character strings among the plurality of character strings present in the form, using the recognition result of each of the plurality of character strings by the character string recognition means and the position information of the plurality of character strings; and
identification means (for example, the identification unit 117 shown in FIG. 8) for identifying, among the two predetermined character strings whose correspondence has been determined by the correspondence determination means, pairs of two character strings that satisfy a predetermined condition.
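The correspondence determination means uses the position information of the recognized character strings, and the claims mention generating relative position information between character strings. One way such features could be computed is sketched below. `TextBox`, the center-offset definition and the right/below heuristic are hypothetical illustrations chosen for readability. They are not the NN-based method described above.

```python
from typing import NamedTuple


class TextBox(NamedTuple):
    """A recognized character string and its bounding box in image pixels."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def relative_position(a: TextBox, b: TextBox) -> tuple[float, float]:
    """Offset (dx, dy) from the center of a to the center of b."""
    (ax, ay), (bx, by) = a.center, b.center
    return (bx - ax, by - ay)


def relation_hint(a: TextBox, b: TextBox) -> str:
    """Coarse, assumed heuristic: is b to the right of a on its line, or below a?"""
    dx, dy = relative_position(a, b)
    same_row = abs(dy) <= (a.y1 - a.y0) / 2
    if same_row and dx > 0:
        return "right"
    if dy > 0 and abs(dx) <= max(a.x1 - a.x0, b.x1 - b.x0):
        return "below"
    return "other"
```

In a key-value table layout, a value typically sits to the right of or below its key. Offsets like these could therefore serve as input features to a learned correspondence model rather than as a final decision rule.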

This allows the operator to easily restrict checking to pairs of two character strings of particularly high importance among the character strings present in the form, and avoids the need to search the character string recognition results for the ones that should be checked. Put differently, it becomes easy to avoid checking recognition results for character strings that do not need to be checked or that have a relatively low need for checking.
For these reasons, the operator can perform appropriate subsequent processing more quickly.

In the information processing apparatus, the correspondence determination means may adopt, from among the plurality of character strings, only character strings having any one of one or more predetermined attributes as each of the two predetermined character strings, and determine the correspondence between the two predetermined character strings.
This makes it possible to identify the desired pairs of character strings through the attributes of the character strings that are adopted.
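Adopting only character strings with one of the predetermined attributes presupposes an attribute check. The claims name the attributes date, number and other. The sketch below assumes, hypothetically, that the attribute is decided by simple regular expressions on the recognized text. The patterns cover only a few formats (for example `2021/04/12`, `1,200`, `3.5`) and are illustrative, as are the names `attribute_of` and `adopt`.

```python
import re
from enum import Enum

# Illustrative patterns only; real forms need many more date and number formats.
DATE_RE = re.compile(r"^\d{4}[/-]\d{1,2}[/-]\d{1,2}$")
NUMBER_RE = re.compile(r"^[+-]?\d{1,3}(,\d{3})*(\.\d+)?$|^[+-]?\d+(\.\d+)?$")


class Attribute(Enum):
    DATE = "date"
    NUMBER = "number"
    OTHER = "other"


def attribute_of(text: str) -> Attribute:
    """Classify a recognized string as a date, a number, or other."""
    s = text.strip()
    if DATE_RE.match(s):
        return Attribute.DATE
    if NUMBER_RE.match(s):
        return Attribute.NUMBER
    return Attribute.OTHER


def adopt(strings: list[str], allowed: set[Attribute]) -> list[str]:
    """Keep only strings whose attribute is one of the predetermined attributes."""
    return [s for s in strings if attribute_of(s) in allowed]
```

For example, `adopt(["2021/04/12", "Total", "1,200"], {Attribute.DATE, Attribute.NUMBER})` keeps `"2021/04/12"` and `"1,200"` and drops `"Total"`.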

In the information processing apparatus, the predetermined condition may include the condition that one character string is a key and the other character string is a value.
This allows pairs of two character strings having that relationship to be identified.

1 form recognition device, 2 scanner, 11 CPU, 12 ROM, 13 RAM, 14 bus, 15 input/output interface, 16 output unit, 17 input unit, 18 storage unit, 19 communication unit, 20 drive, 31 removable media, 111 cell extraction unit, 112 table extraction unit, 113 character string recognition unit, 114 display control unit, 115 input control unit, 116 correspondence determination unit, 117 identification unit

Claims

An information processing apparatus comprising:
character string recognition means for recognizing, using image data representing a form, a plurality of character strings present in the form, each consisting of one or more consecutive characters, and specifying position information of each of the recognized character strings in the image;
correspondence determination means for determining a correspondence between two predetermined character strings among the plurality of character strings present in the form, using the recognition result of each of the plurality of character strings by the character string recognition means, the position information of the plurality of character strings, a determination of an attribute based on each of the plurality of character strings, and a classifier generated using the determined attribute; and
identification means for identifying, from among the two predetermined character strings whose correspondence has been determined by the correspondence determination means, a pair of two character strings satisfying a predetermined condition,
wherein the attribute represents one of a date, a number, and other, and
the classifier represents one of a key located logically on the upper side of the correspondence between the two predetermined character strings, a value located logically on the lower side of the correspondence, and other.

The information processing apparatus according to claim 1, wherein the correspondence determination means adopts, from among the plurality of character strings, only character strings having any one of one or more predetermined classifiers as each of the two predetermined character strings, and determines the correspondence between the two predetermined character strings.

The information processing apparatus according to claim 1 or 2, wherein
the correspondence determination means generates relative position information between the character strings using the position information, identifies correspondences between two character strings using the generated relative position information, and determines, as the two predetermined character strings, two character strings whose classifier is either the key or the value from among the two character strings whose correspondence has been identified, and
the identification means identifies a pair of the two character strings in which the classifier of one character string is the key and the classifier of the other character string is the value.

An information processing method executed by an information processing apparatus, the method comprising:
a character string recognition step of recognizing, using image data representing a form, a plurality of character strings present in the form, each consisting of one or more consecutive characters, and specifying position information of each of the recognized character strings in the image;
a correspondence determination step of determining a correspondence between two predetermined character strings among the plurality of character strings present in the form, using the recognition result of each of the plurality of character strings obtained in the character string recognition step, the position information of the plurality of character strings, a determination of an attribute based on each of the plurality of character strings, and a classifier generated using the determined attribute; and
an identification step of identifying, from among the two predetermined character strings whose correspondence has been determined in the correspondence determination step, a pair of two character strings satisfying a predetermined condition,
wherein the attribute represents one of a date, a number, and other, and
the classifier represents one of a key located logically on the upper side of the correspondence between the two predetermined character strings, a value located logically on the lower side of the correspondence, and other.