JP7365835B2

JP7365835B2 - Structure recognition system, structure recognition device, structure recognition method, and program

Info

Publication number: JP7365835B2
Application number: JP2019179710A
Authority: JP
Inventors: 美恵大串; 貴広馬場; 陽太 ▲高▼岡; 英雄寺田
Original assignee: Open Stream Inc
Current assignee: Open Stream Inc
Priority date: 2019-09-30
Filing date: 2019-09-30
Publication date: 2023-10-20
Anticipated expiration: 2039-09-30
Also published as: JP2021056796A

Description

The present invention relates to a structure recognition system, a structure recognition device, a structure recognition method, and a program.

There are techniques for extracting the character information in a scanned image that is created by reading a document such as a form with a scanner or the like (see, for example, Patent Documents 1 and 2). Patent Document 1 discloses a technique that makes errors in character information easier to correct by structuring the characters in an image based on their positions. Patent Document 1 describes structuring as grouping character information into sets of related information and then identifying and expressing the hierarchical relationship among those sets. For example, when character information such as a title, a document creator, and a document creation date is extracted from an image, the structured data shows the title at the top level of the hierarchy and the document creator and the document creation date at the level below it. The technique of Patent Document 2 extracts the character information in an image together with feature information indicating the features of ruled lines. As a result, a document search can specify the features of the ruled lines in a document in addition to the characters written in it, which makes the search efficient.

Meanwhile, with the recent development of computer and communication network technology, information on paper is increasingly being replaced with electronic data. Digitizing forms makes work more efficient and saves resources by going paperless. It also lets users fill in the entry fields of a document through an electronic device such as a smartphone, which improves user convenience.

When a form is digitized, its layout is often changed. The paper on which a form is printed and the screen of an electronic device such as a smartphone have different aspect ratios. If a paper form is displayed on the screen of an electronic device without changing its layout, part of the document may not be displayed at some display scales, or displaying the whole image may shrink it so much that the characters become hard to read. When the layout is changed, the content of the form before conversion must be reflected in the converted form with nothing missing and nothing added. One conceivable approach is to change the layout of the form by applying the techniques of Patent Documents 1 and 2. With these techniques, the layout can be changed while preserving the structure of the characters and the features of the ruled lines on the form.

Patent Document 1: JP 2019-82814 A
Patent Document 2: JP 2008-40834 A

However, even if the layout is changed while preserving the structure of the characters and the features of the ruled lines, the content of the form before conversion cannot be fully and exactly reflected in the converted form. Many forms contain entry frames for filling in required items. Most of these entry frames are drawn as simple rectangles that contain no characters, so no character information can be extracted from the entry frame itself. With the technique of Patent Document 1, it is therefore difficult to determine the hierarchical structure of every item on a form that includes rectangles such as entry frames. Likewise, even if Patent Document 2 is used to preserve the features of the original ruled lines in the form after the layout change, the layout cannot be converted appropriately unless it is known which of the regions delimited by the ruled lines should contain characters and which should be left empty as entry frames. Thus, simply using the conventional techniques as they are makes it difficult to change the layout while preserving the semantic connections (structure) among the items on the form, rectangles included.

The present invention has been made in view of this situation, and provides a structure recognition system, a structure recognition device, a structure recognition method, and a program that can extract the information necessary for converting the layout of a document that includes rectangles.

To solve the above problem, the present invention is a structure recognition system comprising: an image data acquisition unit that acquires image data of a target image including characters and rectangles; a region determination unit that determines the respective regions of the characters and the rectangles in the target image; and a structure determination unit that determines a hierarchical structure of the rectangles included in the target image based on region data on the regions determined by the region determination unit, wherein the structure determination unit determines, for a rectangle of interest whose hierarchical structure is to be determined among the rectangles included in the target image, the rectangle or character to which the rectangle of interest is subordinate.

The present invention is also a structure recognition system comprising: an image data acquisition unit that acquires image data of a target image including characters and rectangles; a region determination unit that determines the respective regions of the characters and the rectangles in the target image; a structure determination unit that determines a hierarchical structure of the rectangles included in the target image based on region data on the regions determined by the region determination unit; and a preprocessing unit that uses the region data indicating a character region to generate semantic tag information including a specific second character corresponding to a first character shown in that region, wherein the structure determination unit determines the hierarchical structure using a trained model, and the trained model is a model trained to output the hierarchical structure of the rectangles included in an input image, using a training data set that associates the semantic tag information and the region data indicating rectangular regions in a training image including characters and rectangles with the hierarchical structure of the rectangles included in the training image.

Further, in the structure recognition system described above, the present invention further comprises a preprocessing unit that uses the region data indicating a character region to generate semantic tag information including a specific second character corresponding to a first character shown in that region, and the structure determination unit determines the hierarchical structure based on the semantic tag information and the region data indicating rectangular regions.

Further, in the structure recognition system described above, the structure determination unit determines the hierarchical structure using a trained model, and the trained model is a model trained to output the hierarchical structure of the rectangles included in an input image, using a training data set that associates the semantic tag information and the region data indicating rectangular regions in a training image including characters and rectangles with the hierarchical structure of the rectangles included in the training image.

Further, in the structure recognition system described above, the structure determination unit selects, in the target image, a rectangle of interest whose hierarchical structure is to be determined; acquires a neighboring semantic tag group, which is the semantic tag information located within a predetermined first range from the position of the selected rectangle of interest; acquires a neighboring rectangle group, which is the region data of the rectangles located within a predetermined second range from the position of the selected rectangle of interest; and determines the order of the input data to be fed to the trained model by sorting the rectangle of interest, the neighboring semantic tag group, and the neighboring rectangle group according to their positions.
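The selection and ordering step above can be sketched as follows. This is a minimal illustration and not the patented implementation: the `Region` type, the use of center-to-center Euclidean distance for the "ranges", and the reading-order sort key (top to bottom, then left to right) are all assumptions, since the text only says that the elements are sorted according to their positions.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Region:
    ident: str   # hypothetical identifier, e.g. "K2" or "tag:name"
    kind: str    # "rect" for a rectangle, "tag" for semantic tag information
    x: float     # left edge
    y: float     # top edge
    w: float     # width
    h: float     # height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def distance(a, b):
    """Center-to-center Euclidean distance (an assumed definition of 'range')."""
    (ax, ay), (bx, by) = a.center, b.center
    return hypot(ax - bx, ay - by)


def build_model_input(focus, tags, rects, first_range, second_range):
    """Return the identifiers of the focus rectangle and its neighbors, ordered by position."""
    # Neighboring semantic tag group: tags within the first range of the focus rectangle.
    near_tags = [t for t in tags if distance(focus, t) <= first_range]
    # Neighboring rectangle group: other rectangles within the second range.
    near_rects = [r for r in rects if r != focus and distance(focus, r) <= second_range]
    # Assumed ordering: top to bottom, then left to right.
    ordered = sorted([focus, *near_tags, *near_rects], key=lambda r: (r.y, r.x))
    return [r.ident for r in ordered]
```

Sorting by position gives the model a consistent input order regardless of the order in which the regions were detected.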

The present invention is also a structure recognition device comprising: a region data acquisition unit that acquires region data on the respective regions of characters and rectangles in a target image; and a structure determination unit that determines a hierarchical structure of the rectangles included in the target image based on the region data and determines, for a rectangle of interest whose hierarchical structure is to be determined among the rectangles included in the target image, the rectangle or character to which the rectangle of interest is subordinate.

The present invention is also a structure recognition method in which a region data acquisition unit acquires region data on the respective regions of characters and rectangles in a target image, and a structure determination unit determines a hierarchical structure of the rectangles included in the target image based on the region data and determines, for a rectangle of interest whose hierarchical structure is to be determined among the rectangles included in the target image, the rectangle or character to which the rectangle of interest is subordinate.

The present invention is also a program for causing a computer to operate as the structure recognition device described above, the program causing the computer to function as each unit of the structure recognition device.

According to the present invention, the information necessary for converting the layout of a document that includes rectangles can be extracted.

FIG. 1 is a diagram showing a configuration example of the structure recognition system 1 according to the embodiment.
FIG. 2 is a diagram illustrating processing performed by the structure recognition system 1 according to the embodiment.
FIG. 3 is a block diagram showing a configuration example of the region dividing device 10 according to the embodiment.
FIG. 4 is a block diagram showing a configuration example of the structure recognition device 30 according to the embodiment.
FIG. 5 is a diagram showing a configuration example of the conversion table 360 according to the embodiment.
FIG. 6 is a diagram illustrating processing performed by the region dividing device 10 according to the embodiment.
FIG. 7 is a diagram illustrating processing performed by the structure recognition device 30 according to the embodiment.
FIG. 8 is a diagram showing an example of layout conversion to which the structure recognition system 1 according to the embodiment is applied.
FIG. 9 is a sequence diagram showing the flow of processing performed by the structure recognition system 1 according to the embodiment.

Embodiments of the invention are described below with reference to the drawings.

The structure recognition system 1 of this embodiment is a system that extracts the information necessary for converting the layout of a document that includes rectangles.

The following description uses a form as an example of the document whose layout is to be converted, but the invention is not limited to forms. The target of layout conversion may be any document that includes at least characters and rectangles, such as a questionnaire, a medical interview sheet, a test paper, a boilerplate template, or an idea sheet. A rectangle included in a document is a region of the document enclosed in a quadrilateral shape such as an oblong or a square. Rectangles include not only regions enclosed by solid lines but also rectangular regions enclosed by dotted lines or by specific symbols or figures, and rectangular regions set apart by the shade of the background color or the like. Characters included in a document include not only single characters but also character strings made up of multiple characters and groups of characters.

The information necessary for converting the layout is information indicating the hierarchical structure of the characters and rectangles included in the form (hereinafter referred to as structured data). If the hierarchical structure of the characters and rectangles in a form is known, the layout can be converted while that structure is preserved. The characters, entry fields, and other elements shown on the form, and their relative positional relationships, can therefore be maintained before and after the layout conversion. In other words, changing the layout while preserving what the form expresses requires extracting the structured data of the characters and rectangles included in the form.

An example of structured data is as follows. Consider a form that includes rectangular regions K1 to K5, as shown in FIG. 6. As shown in FIG. 7, the structured data of regions K1 to K3 is information indicating a hierarchical structure in which region K1 is at the upper level and regions K2 and K3 are subordinate to it. The structured data of regions K4 and K5 is information indicating a hierarchical structure in which region K4 is at the upper level and region K5 is subordinate to it.

The overall configuration of the structure recognition system 1 is described with reference to FIG. 1. FIG. 1 is a block diagram showing a configuration example of the structure recognition system 1 according to the embodiment. As shown in FIG. 1, the structure recognition system 1 includes, for example, a region dividing device 10, an OCR device 20, and a structure recognition device 30. These components of the structure recognition system 1 (the region dividing device 10, the OCR device 20, and the structure recognition device 30) are communicably connected.
Although FIG. 1 illustrates a case where the OCR device 20 performs character recognition, the character recognition function may instead be provided in the region dividing device 10 or the structure recognition device 30, in which case the OCR device 20 can be omitted.

The region dividing device 10 is a device that divides a form into regions such as the characters and rectangles shown on the form. The OCR device 20 is a device that performs character recognition processing to recognize the characters shown in an input image. The structure recognition device 30 is a device that determines the hierarchical structure of the rectangles shown on a form.

The following description uses as an example the case where the structure recognition device 30 identifies the hierarchical structure of the "rectangles" shown on a form. The same method can be applied to identifying the hierarchical structure of the "characters" shown on a form.

The following description also uses as an example the case where the hierarchical structure is determined as the identification information of the rectangle or character to which a rectangle on the form is subordinate (hereinafter referred to as the parent ID). In this case, the structured data is information that associates each rectangle with its parent ID. Determining the parent ID is preferable because it uniquely identifies the structure of the rectangles while limiting the growth in data volume. The invention is not limited to this, however. Another conceivable way to identify the hierarchical structure of a rectangle is to determine the identification information of the rectangles or characters that are subordinate to it (hereinafter referred to as child IDs). Because several characters or rectangles may be subordinate to a single rectangle, that approach requires a configuration in which multiple child IDs can be associated with one rectangle, which can increase the data volume. Any method may be used as long as the hierarchical structure can at least be identified: a parent ID may be associated with each rectangle, child IDs may be associated with each rectangle, both a parent ID and child IDs may be associated with each rectangle, or some other method may of course be used.
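The parent-ID form of the structured data can be illustrated with the regions K1 to K5 from the earlier example. This is a minimal sketch: the dictionary layout, the use of `None` for a region with no parent, and the helper names are assumptions for illustration, not the format used by the structure recognition device 30.

```python
from collections import defaultdict

# Parent-ID form: exactly one entry per region.
# None marks a top-level region (an assumed convention).
parent_of = {"K1": None, "K2": "K1", "K3": "K1", "K4": None, "K5": "K4"}


def children_from_parents(parent_of):
    """Derive the child-ID form (one list per parent) from the parent-ID form."""
    children = defaultdict(list)
    for region, parent in parent_of.items():
        if parent is not None:
            children[parent].append(region)
    return dict(children)


def depth(region, parent_of):
    """Number of ancestors of a region, found by following parent IDs upward."""
    level = 0
    while parent_of[region] is not None:
        region = parent_of[region]
        level += 1
    return level
```

The parent-ID form stores a single value per region, whereas the derived child-ID form needs a variable-length list per region, which is the data-volume trade-off described above.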

The processing performed by the structure recognition system 1 is described with reference to FIG. 2. FIG. 2 is a diagram illustrating processing performed by the structure recognition system 1 according to the embodiment. As shown in FIG. 2, reading the form T with a scanner (scan processing SC) creates information (scan image data SD) representing the image of the scanned form T (the scanned image). The scanned image is the image to be processed by the region dividing device 10. That is, the scanned image is an example of the "target image." The scan image data SD is input to the region dividing device 10, which thereby acquires the scan image data SD.

Based on the scan image data SD, the region dividing device 10 divides the form T into regions, one for each element such as a character or a rectangle. The region dividing device 10 outputs information indicating the character regions in the scanned image (character region data MD) and information indicating the rectangular regions (rectangular region data KD). The region dividing device 10 outputs the character region data MD to the OCR device 20. The character region data MD and the rectangular region data KD are each an example of "region data."

The OCR device 20 performs character recognition processing to recognize the characters indicated by the character region data MD acquired from the region dividing device 10. The OCR device 20 notifies the structure recognition device 30 of information indicating the content of the recognized characters (character recognition data MND). The character recognition data MND is input to the structure recognition device 30, which thereby acquires the character recognition data MND.

Meanwhile, the character region data MD and the rectangular region data KD output by the region dividing device 10 are input to the structure recognition device 30, which thereby acquires the character region data MD and the rectangular region data KD.

The structure recognition device 30 determines the hierarchical structure of the rectangles shown on the form based on the character region data MD and the rectangular region data KD acquired from the region dividing device 10 and the character recognition data MND acquired from the OCR device 20. The structure recognition device 30 uses the conversion table 360 to classify the character recognition data MND by the meaning of the characters. Using the structure determination unit 34, the structure recognition device 30 determines the hierarchical structure of the rectangles from the character region data MD, the rectangular region data KD, and the information obtained by classifying the character recognition data MND by meaning. The method by which the structure recognition device 30 determines the hierarchical structure of the rectangles is described in detail later. The structure recognition device 30 outputs information indicating the hierarchical structure of the rectangles (structured data KZD).

The configuration of the region dividing device 10 is described with reference to FIG. 3. FIG. 3 is a block diagram showing a configuration example of the region dividing device 10 according to the embodiment. As shown in FIG. 3, the region dividing device 10 includes, for example, an image data acquisition unit 11, a modulated image generation unit 12, a region determination unit 13, a region data output unit 14, and a storage unit 15.

The image data acquisition unit 11 acquires the scan image data SD. The scan image data SD is, for example, information that associates image-related information with each pixel, such as information indicating a grayscale value for each pixel or information indicating an RGB value for each pixel. The image data acquisition unit 11 outputs the acquired scan image data SD to the modulated image generation unit 12 and the region determination unit 13.

The modulated image generation unit 12 generates an enhanced image based on the scan image data SD acquired from the image data acquisition unit 11. The enhanced image is an image in which the pixel value (grayscale value or RGB value) of each pixel in the scanned image has been changed according to a predetermined modulation condition.

The modulated image generation unit 12 generates, for example, an image in which the edges of the scanned image have been emphasized as the enhanced image. In this case, the modulated image generation unit 12 detects the edges in the scanned image and emphasizes the detected edges. The modulated image generation unit 12 may detect the edges in the scanned image by any conventional method. One such method detects edges by taking the difference between the scanned image after median filtering and the scanned image after smoothing with a Gaussian filter or the like. Alternatively, edges may be detected by applying a Laplacian filter or a Sobel filter to the scanned image. The modulated image generation unit 12 generates the enhanced image by setting the pixels detected as edges to one specific pixel value (for example, the grayscale value or RGB value representing "black") and converting the pixels not detected as edges to another specific pixel value (for example, the grayscale value or RGB value representing "white"). The modulated image generation unit 12 outputs the image data of the generated enhanced image to the region determination unit 13.
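The Sobel variant of this edge-emphasis step can be sketched in plain Python as follows. The function name, the threshold of 128, the use of |gx| + |gy| as the gradient magnitude, and the treatment of border pixels as non-edges are assumptions chosen to keep the example short; the description does not specify them.

```python
def sobel_edge_image(gray, threshold=128, black=0, white=255):
    """Return a two-level image: edge pixels become `black`, all others `white`.

    `gray` is a list of rows of grayscale values (0 to 255). Border pixels are
    left white because the 3x3 Sobel kernels do not fit there (an assumed choice).
    """
    height, width = len(gray), len(gray[0])
    out = [[white] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            # Approximate gradient magnitude; pixels at or above the threshold are edges.
            if abs(gx) + abs(gy) >= threshold:
                out[y][x] = black
    return out
```

Because the result contains only two pixel values, faint ruled lines stand out as clearly as strong ones, which is presumably why the enhanced image is examined alongside the original scan.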

The region determination unit 13 determines the regions of characters, rectangles, and background (elements that are neither characters nor rectangles) in each of the scanned image and the enhanced image. The region determination unit 13 determines the character and rectangle regions in an image using, for example, the region determination model 150. The region determination model 150 is information stored in the storage unit 15 and is the result of training a learning model on a training data set that associates image data with the results of determining the character, rectangle, and background regions in the image. By learning such a training data set, the learning model is trained to output (predict) the character and rectangle regions of an input image with high accuracy. The learning model is, for example, a DCNN (Deep Convolutional Neural Network), but is not limited to this. Methods such as a CNN, a decision tree, hierarchical Bayes, or an SVM (Support Vector Machine), or an appropriate combination of these, may also be used as the learning model.

The region determination unit 13 finalizes the character, rectangle, and background regions in the scanned image based on the determination results for the scanned image and for the enhanced image. For example, for a region where the determination result for the scanned image matches the determination result for the enhanced image, the region determination unit 13 adopts the determination result for the scanned image as is.

For a region where the determination result for the scanned image and the determination result for the enhanced image do not match, the region determination unit 13 finalizes the character, rectangle, and background regions according to predetermined rules. For example, the region determination unit 13 finalizes as a character region in the scanned image any region that one of the two images classifies as a character and the other classifies as background. Similarly, the region determination unit 13 finalizes as a rectangular region in the scanned image any region that one of the two images classifies as a rectangle and the other classifies as background. The region determination unit 13 outputs information indicating the character regions in the scanned image (character region data MD) and information indicating the rectangular regions (rectangular region data KD) to the region data output unit 14.
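The reconciliation rules for the two determination results can be sketched as a small per-region merge function. The label names and function names are hypothetical. The description does not state what happens when one image says "character" and the other says "rectangle", so the fallback to the scanned image's result in that case is an assumption.

```python
TEXT, RECT, BACKGROUND = "text", "rect", "background"


def merge_labels(scan_label, enhanced_label):
    """Combine the labels that the scanned and enhanced images assign to one region."""
    if scan_label == enhanced_label:
        return scan_label            # results agree: keep the scanned image's result
    pair = {scan_label, enhanced_label}
    if pair == {TEXT, BACKGROUND}:
        return TEXT                  # character in one image, background in the other
    if pair == {RECT, BACKGROUND}:
        return RECT                  # rectangle in one image, background in the other
    # Character versus rectangle is not covered by the description.
    # Assumption: fall back to the scanned image's result.
    return scan_label


def merge_label_maps(scan_map, enhanced_map):
    """Apply merge_labels element-wise to two equally sized 2D label maps."""
    return [[merge_labels(s, e) for s, e in zip(scan_row, enhanced_row)]
            for scan_row, enhanced_row in zip(scan_map, enhanced_map)]
```

Under these rules a region becomes background only when both images agree on background, so the merge favors keeping characters and rectangles that either image detects.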

領域データ出力部１４は、文字領域データＭＤをＯＣＲ装置２０に出力する。領域データ出力部１４は、文字領域データＭＤ、及び矩形領域データＫＤを構造認識装置３０に出力する。記憶部１５は、領域判定モデル１５０を記憶する。 The area data output unit 14 outputs the character area data MD to the OCR device 20. The area data output unit 14 outputs the character area data MD and the rectangular area data KD to the structure recognition device 30. The storage unit 15 stores an area determination model 150.

The configuration of the structure recognition device 30 will now be described with reference to FIG. 4. FIG. 4 is a block diagram showing a configuration example of the structure recognition device 30 according to the embodiment. As shown in FIG. 4, the structure recognition device 30 includes, for example, an area data acquisition unit 31, a character recognition data acquisition unit 32, a preprocessing unit 33, a structure determination unit 34, a structure data output unit 35, and a storage unit 36.

領域データ取得部３１は、領域分割装置１０から領域データ（文字領域データＭＤ、及び矩形領域データＫＤ）を取得する。文字領域データＭＤは、例えば、文字の領域における位置を示す座標と、文字の領域であることを示す識別情報とが対応付けられた情報である。矩形領域データＫＤは、例えば、矩形の領域における位置を示す座標と、矩形の領域であることを示す識別情報とが対応付けられた情報である。ここで、領域における位置を示す座標とは、例えば、領域の形状が四角形である場合、当該四角形の四つの頂点のうち、対角線上に位置する二つの頂点の座標である。或いは、領域における位置を示す座標は、四角形の四つの頂点のうち予め定めた特定の頂点（例えば、左下の頂点）の座標と、縦横それぞれの長さを示す情報であってもよい。領域データ取得部３１は、取得した領域データを、構造判定部３４に出力する。 The area data acquisition unit 31 acquires area data (character area data MD and rectangular area data KD) from the area dividing device 10. The character area data MD is, for example, information in which coordinates indicating a position in a character area are associated with identification information indicating that the area is a character area. The rectangular area data KD is, for example, information in which coordinates indicating a position in a rectangular area are associated with identification information indicating that the area is a rectangular area. Here, the coordinates indicating the position in the region are, for example, when the shape of the region is a quadrangle, the coordinates of two vertices located diagonally among the four vertices of the quadrangle. Alternatively, the coordinates indicating the position in the region may be information indicating the coordinates of a predetermined specific vertex (for example, the lower left vertex) among the four vertices of the quadrangle, and the length and width of the rectangle. The area data acquisition unit 31 outputs the acquired area data to the structure determination unit 34.
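The two coordinate conventions described above (two diagonal vertices, or one vertex plus width and height) carry the same information. The sketch below shows one possible in-memory form. The `Region` class and its field names are hypothetical, and it assumes the given vertex is the one with the smallest x and y coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    kind: str      # identification information: "character" or "rectangle"
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @classmethod
    def from_diagonal(cls, kind, p, q):
        """Build a region from two diagonally opposite vertices."""
        (px, py), (qx, qy) = p, q
        return cls(kind, min(px, qx), min(py, qy), max(px, qx), max(py, qy))

    @classmethod
    def from_vertex_and_size(cls, kind, vertex, width, height):
        """Build a region from one vertex plus horizontal and vertical lengths.

        Assumption: `vertex` is the corner with the smallest x and y values.
        """
        x, y = vertex
        return cls(kind, x, y, x + width, y + height)

    def center(self):
        """Center point, usable as a representative coordinate."""
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)
```

Normalizing both conventions to one form means later steps can work on a single representation.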

文字認識データ取得部３２は、ＯＣＲ装置２０から文字認識データＭＮＤを取得する。文字認識データＭＮＤは、例えば、文字領域データに、その領域で認識された文字を示す文字認識結果が対応付けられた情報である。文字認識データ取得部３２は、取得した文字認識データＭＮＤを前処理部３３に出力する。 The character recognition data acquisition unit 32 acquires character recognition data MND from the OCR device 20. The character recognition data MND is, for example, information in which character area data is associated with character recognition results indicating characters recognized in that area. The character recognition data acquisition unit 32 outputs the acquired character recognition data MND to the preprocessing unit 33.

前処理部３３は、後述する構造判定部３４が、階層構造を判定し易くする目的で、判定処理に先立って、事前の処理（前処理）を行う。具体的に、前処理部３３は、文字認識データ取得部３２から取得した文字認識データＭＮＤを用いて、意味タグ情報を生成する。 The preprocessing unit 33 performs preliminary processing (preprocessing) prior to the determination process for the purpose of making it easier for the structure determination unit 34 (described later) to determine the hierarchical structure. Specifically, the preprocessing unit 33 uses the character recognition data MND acquired from the character recognition data acquisition unit 32 to generate semantic tag information.

Semantic tag information is character area data to which a tag corresponding to the meaning of the characters shown in that area (a semantic tag) has been attached. A semantic tag may be any information indicating that phrases are semantically equivalent. For example, a semantic tag is a representative phrase standing for a set of semantically equivalent phrases; more specifically, it is information indicating that phrases such as 「お住まい」, 「住所」, 「おところ」, and 「ご住所」 all mean 「住所」 (address). By generating semantic tag information, the preprocessing unit 33 can unify semantically equivalent phrases into a single phrase. Compared with leaving the wording as is, this simplifies the subsequent processing and makes it easier for the structure determination unit 34 to determine the hierarchical structure.

The preprocessing unit 33 generates semantic tag information by converting the character recognition results in the character recognition data MND into predetermined character strings using the conversion table 360 (see FIG. 5). The conversion table 360 is stored in the storage unit 36 and associates pre-conversion strings with post-conversion strings. The pre-conversion column lists, for example, strings that appear frequently in forms and whose notation may vary, such as 「住所」, 「おところ」, and 「ご住所」. The post-conversion column lists a single string chosen according to meaning, for example 「住所」 (address) for 「住所」, 「おところ」, and 「ご住所」.

The preprocessing unit 33 looks up the conversion table 360 using the character recognition result in the character recognition data MND. If the character recognition result appears among the pre-conversion strings of the conversion table 360, the preprocessing unit 33 obtains the post-conversion string associated with it and converts the character recognition result into that string. The preprocessing unit 33 then generates semantic tag information by associating the post-conversion string with the character area data, and outputs the generated semantic tag information to the structure determination unit 34. If the character recognition result does not appear among the pre-conversion strings of the conversion table 360, the preprocessing unit 33 generates semantic tag information by associating the unconverted character recognition result with the character area data.
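The table lookup can be sketched as follows. The dictionary layout and the function names are hypothetical. Row E0001 follows the FIG. 5 example, while the ID `E0002` for the address row is an assumption, since the text gives the address strings but not their ID.

```python
# Conversion table: semantic tag ID -> post-conversion string and its variants.
CONVERSION_TABLE = {
    "E0001": {"after": "氏名", "before": ["お名前", "名前", "おなまえ"]},
    # Assumption: the ID "E0002" is hypothetical; the strings come from the text.
    "E0002": {"after": "住所", "before": ["住所", "おところ", "ご住所"]},
}

# Reverse index so each recognized string is resolved with a single lookup.
BEFORE_TO_AFTER = {
    before: row["after"]
    for row in CONVERSION_TABLE.values()
    for before in row["before"]
}


def to_semantic_tag(recognized: str) -> str:
    """Return the unified string, or the input unchanged if it is not in the table."""
    return BEFORE_TO_AFTER.get(recognized, recognized)


def make_semantic_tag_info(character_recognition_data):
    """Attach a semantic tag to each character area (area data plus recognized text)."""
    return [
        {"area": item["area"], "tag": to_semantic_tag(item["text"])}
        for item in character_recognition_data
    ]
```

A string with no table entry, such as 「申込書」, passes through unchanged, which matches the fallback described above.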

構造判定部３４は、矩形領域データ、及び意味タグ情報を用いて、矩形の階層構造を判定する。構造判定部３４は、構造判定モデル３６１を用いて矩形の階層構造を判定する。構造判定モデル３６１は、矩形領域データ、及び意味タグ情報と、矩形の親ＩＤとを対応付けた学習用データセットを、学習モデルに学習させた学習結果である。このような学習用データセットを学習させることにより、学習モデルは、入力された矩形領域データ、及び意味タグ情報に対し、矩形の親ＩＤを、精度よく出力（予測）できるように学習される。学習モデルは、例えば、ＲＮＮ（Recurrent Neural Network）である。ＲＮＮを用いることにより、順序づけられた系列情報に基づく学習を実行することができる。 The structure determining unit 34 determines the hierarchical structure of a rectangle using the rectangular area data and semantic tag information. The structure determination unit 34 determines the hierarchical structure of the rectangle using the structure determination model 361. The structure determination model 361 is a learning result obtained by causing a learning model to learn a learning dataset in which rectangular area data, semantic tag information, and rectangular parent IDs are associated with each other. By learning such a training data set, the learning model is trained to accurately output (predict) the parent ID of a rectangle based on the input rectangular area data and semantic tag information. The learning model is, for example, an RNN (Recurrent Neural Network). By using RNN, learning based on ordered sequence information can be performed.

When an RNN is used as the learning model, the structure determination unit 34 generates the data to be input to the structure determination model 361 (hereinafter, input data) so that the order of the data carries information. The structure determination unit 34 selects a rectangle of interest in the scanned image. The rectangle of interest is the rectangle whose hierarchical structure is to be determined. The structure determination unit 34 extracts the rectangular area data lying within a predetermined range (hereinafter, the first range) of the rectangle of interest (hereinafter, the neighboring rectangle group). It also extracts the semantic tag information lying within a predetermined range (hereinafter, the second range) of the rectangle of interest (hereinafter, the neighboring semantic tag group). These ranges may be set arbitrarily. The first range and the second range may differ from each other or may be the same. They may be predetermined fixed values, or they may vary according to the size of the scanned image or the size of the rectangle of interest.
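A minimal sketch of the neighborhood extraction follows. The text leaves the definition of the range open, so measuring Euclidean distance between center points is an assumption, and the function names and the `(x_min, y_min, x_max, y_max)` box layout are hypothetical.

```python
import math


def center(box):
    """Center of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def within_range(target_box, boxes, max_distance, exclude=None):
    """Return the entries of `boxes` whose center lies within `max_distance`
    of the center of `target_box`, skipping the key `exclude` if given."""
    target_center = center(target_box)
    return {
        key: box
        for key, box in boxes.items()
        if key != exclude and math.dist(target_center, center(box)) <= max_distance
    }
```

Calling the function once with the rectangle areas and once with the character areas, each with its own `max_distance`, gives the neighboring rectangle group and the neighboring semantic tag group with independent first and second ranges.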

構造判定部３４は、着目矩形、近傍矩形群、近傍意味タグ群のそれぞれの代表座標（例えば、中心座標）をラスター順にソートしたデータを入力データとする。ここでのラスター順とは、二次元に配置された画素を読み込む（或いは、書込む）際における、所定の方向に沿った読み込み（書き込み）順序である。例えば、ラスター順は、画像における水平方向の左側から右側へ向かう方向に沿う順序であり、且つ垂直方向の上側から下側へ向かう方向である。しかしながら、ラスター順における所定の方向は、任意の方向であってよく、右側から左側へ向かう方向に沿う順序であってもよいし、下側から上側へ向かう方向に沿う順序であってもよい。 The structure determination unit 34 uses, as input data, data obtained by sorting the representative coordinates (for example, center coordinates) of each of the rectangle of interest, a group of neighboring rectangles, and a group of neighboring semantic tags in raster order. The raster order here refers to the reading (or writing) order along a predetermined direction when reading (or writing) pixels arranged two-dimensionally. For example, the raster order is an order along the horizontal direction from the left to the right in the image, and a vertical direction from the top to the bottom. However, the predetermined direction in the raster order may be any direction, and may be an order from the right side to the left side, or an order from the bottom side to the top side.
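Raster-order sorting of the representative coordinates can be sketched as below, assuming the left-to-right, top-to-bottom variant and image coordinates in which y grows downward. The `(label, (x, y))` item layout and the labels are hypothetical.

```python
def raster_sort(items):
    """Sort (label, (x, y)) pairs top to bottom, then left to right.

    Assumption: y increases downward, as in typical image coordinates.
    """
    return sorted(items, key=lambda item: (item[1][1], item[1][0]))


items = [
    ("K2", (50, 40)),   # rectangle center
    ("M2#", (10, 40)),  # semantic tag center on the same row
    ("M1#", (30, 5)),   # semantic tag center near the top of the page
]
ordered = [label for label, _ in raster_sort(items)]
```

This sketch compares y values exactly. Elements on the same printed row of a real scan rarely share an identical y, so a practical version would probably quantize y into row bands first; the text does not specify this.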

構造判定部３４は、生成した入力データを構造判定モデル３６１に入力させることにより得られる出力に基づいて、着目矩形の親ＩＤを判定する。構造判定部３４は、スキャン画像における全ての矩形を一つずつ着目矩形として選択し、上述した方法を繰り返し行うことにより、全ての矩形の親ＩＤを判定する。これにより、構造判定部３４は、矩形の階層構造を判定する。構造判定部３４は、判定した矩形の階層構造を示す情報、すなわち構造化データを構造データ出力部３５に出力する。構造データ出力部３５は、構造化データを出力する。記憶部３６は、変換テーブル３６０、及び構造判定モデル３６１を記憶する。 The structure determination unit 34 determines the parent ID of the rectangle of interest based on the output obtained by inputting the generated input data to the structure determination model 361. The structure determination unit 34 selects all the rectangles in the scanned image one by one as the rectangle of interest, and repeatedly performs the above-described method to determine the parent IDs of all the rectangles. Thereby, the structure determination unit 34 determines the hierarchical structure of the rectangle. The structure determination unit 34 outputs information indicating the determined hierarchical structure of the rectangle, that is, structured data, to the structure data output unit 35. The structured data output unit 35 outputs structured data. The storage unit 36 stores a conversion table 360 and a structure determination model 361.

なお、上述した入力データを入力させて構造判定モデル３６１に矩形の階層構造を出力させる場合、学習段階においても、同様な方法で学習用データセットにおける入力データを生成する必要がある。すなわち、学習用の画像から着目矩形を選択し、選択した着目矩形に対する近傍矩形群、及び近傍意味タグ群を抽出する。そして、着目矩形、近傍矩形群、近傍意味タグ群のそれぞれの代表座標（例えば、中心座標）をラスター順にソートしたデータを入力データとする。入力データを学習モデルに入力することにより得られる出力が、その着目矩形の親ＩＤとなるように学習させることにより、構造判定モデル３６１が生成される。 Note that when inputting the above-mentioned input data and causing the structure determination model 361 to output a rectangular hierarchical structure, it is necessary to generate the input data in the learning dataset using a similar method in the learning stage as well. That is, a rectangle of interest is selected from the learning image, and a group of neighboring rectangles and a group of neighboring semantic tags for the selected rectangle of interest are extracted. Then, input data is data obtained by sorting the representative coordinates (for example, center coordinates) of each of the rectangle of interest, a group of neighboring rectangles, and a group of neighboring semantic tags in raster order. A structure determination model 361 is generated by learning such that the output obtained by inputting input data to the learning model becomes the parent ID of the rectangle of interest.

FIG. 5 is a diagram showing a configuration example of the conversion table 360 according to the embodiment. The conversion table 360 includes items such as a semantic tag ID, a post-conversion string, and pre-conversion strings. The semantic tag ID is identification information that uniquely identifies the semantic tag. The post-conversion item shows the string after conversion, and the pre-conversion item shows the strings before conversion. In this example, semantic tag ID E0001 has 「氏名」 (full name) as the post-conversion string and 「お名前」, 「名前」, and 「おなまえ」 (variant notations of "name") as the pre-conversion strings.

FIG. 6 is a diagram illustrating processing performed by the area dividing device 10 according to the embodiment. FIG. 6 shows examples of the character and rectangle areas determined by the area dividing device 10. From the scanned image shown in FIG. 6, the area dividing device 10 extracts character areas M1 to M6 and rectangle areas K1 to K5. Area M1 shows the characters 「申込書」 (application form). Area M2 shows 「ご住所」 (address). Area M3 shows 「都道府県」 (prefecture). Area M4 shows 「お名前」 (name). Area M5 shows 「記入日」 (date of entry). Area M6 shows 「年月日」 (year, month, day). As this example shows, the area dividing device 10 may extract each character area as a rectangular (quadrilateral) area.

Area K1 is the area of the rectangle surrounding area M2. Area K2 is the area of the rectangle that surrounds area M3 such that area M3 sits at the right end of the frame. Area K3 is the area of the rectangle placed to the right of area K2. Area K4 is the area of the rectangle surrounding area M4. Area K5 is the area of the rectangle placed to the right of area K4.

図７は、実施形態に係る構造認識装置３０が行う処理を説明する図である。図７には、構造認識装置３０が判定した構造化データを、ツリー構造により可視化した例が示されている。図７において、領域Ｍ１＃は、文字の領域Ｍ１に示された文字が、前処理部３３により変換された後の領域を示している。領域Ｍ２＃～Ｍ６＃についても同様に、文字の領域Ｍ２～Ｍ６に示された文字が、前処理部３３により変換された後の領域を示している。 FIG. 7 is a diagram illustrating processing performed by the structure recognition device 30 according to the embodiment. FIG. 7 shows an example in which structured data determined by the structure recognition device 30 is visualized in a tree structure. In FIG. 7, area M1# indicates an area after the characters shown in character area M1 have been converted by the preprocessing unit 33. Similarly, the regions M2# to M6# indicate the regions after the characters shown in the character regions M2 to M6 have been converted by the preprocessing unit 33.

構造認識装置３０は、例えば、図６に示すスキャン画像における意味タグ情報、及び矩形領域データに基づいて、矩形の階層構造を判定する。構造認識装置３０は、領域Ｋ１の親（従属元）は、領域Ｋ２であると判定する。構造認識装置３０は、領域Ｋ４の親は、領域Ｋ２であると判定する。構造認識装置３０は、領域Ｋ５の親は、領域Ｋ３であると判定する。 The structure recognition device 30 determines the hierarchical structure of a rectangle based on the semantic tag information and rectangular area data in the scanned image shown in FIG. 6, for example. The structure recognition device 30 determines that the parent (dependent source) of the area K1 is the area K2. The structure recognition device 30 determines that the parent of the area K4 is the area K2. The structure recognition device 30 determines that the parent of the region K5 is the region K3.
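The parent determinations above are enough to build the kind of tree shown in FIG. 7. The sketch below groups children under their parents; the function name is hypothetical, and only the three parent relations stated in the text are used.

```python
from collections import defaultdict


def children_by_parent(parent_ids):
    """Invert a child -> parent mapping into a parent -> [children] mapping."""
    children = defaultdict(list)
    for child, parent in parent_ids.items():
        children[parent].append(child)
    return dict(children)


# Parent (dependent source) of each rectangle, as stated in the text.
parent_ids = {"K1": "K2", "K4": "K2", "K5": "K3"}
tree = children_by_parent(parent_ids)
```

Storing only a parent ID per rectangle keeps the model output simple, and the full hierarchy can be recovered afterwards by this kind of inversion.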

図８は、実施形態に係るレイアウト変換の例を示す図である。図８に示すように、図６に示す縦長の帳票を、横長のレイアウトに変換することを考える。この場合、構造認識装置３０により判定された矩形の階層構造を維持しつつレイアウトを変更する。すなわち、領域Ｋ１の親が領域Ｋ２となるように、領域Ｋ４の親が領域Ｋ２となるように、レイアウトを変換する。こうすることで、元の帳票に記載されていた必要事項を過不足なく、且つ元の帳票と同等な感覚で必要事項を記載させることができるようにレイアウトの変換を行うことが可能となる。なお、この例に示すように、必要に応じて領域Ｋ６、Ｋ７を補うようにしてもよい。領域Ｋ６は、「日付」の文字を内包する矩形の領域である。領域Ｋ７は、「年月日」の文字を内包する矩形の領域である。例えば、領域Ｋ７の親が領域Ｋ６であると判定された場合、その判定結果を用いることにより、図８に示すような適切な変換を行うことが可能となる。 FIG. 8 is a diagram illustrating an example of layout conversion according to the embodiment. As shown in FIG. 8, consider converting the vertically long form shown in FIG. 6 into a horizontally long layout. In this case, the layout is changed while maintaining the rectangular hierarchical structure determined by the structure recognition device 30. That is, the layout is converted so that the parent of area K1 becomes area K2 and the parent of area K4 becomes area K2. By doing this, it becomes possible to convert the layout so that the necessary items written in the original form can be written in just the right amount and in the same way as the original form. Note that, as shown in this example, regions K6 and K7 may be supplemented as necessary. Area K6 is a rectangular area containing characters of "date". The area K7 is a rectangular area containing the characters "Year Month Day". For example, if it is determined that the parent of area K7 is area K6, by using the determination result, it becomes possible to perform appropriate conversion as shown in FIG. 8.

図９は、実施形態に係る構造認識システム１が行う処理の流れを示すシーケンス図である。領域分割装置１０は、スキャン画像データを取得し（ステップＳ１０）、スキャン画像における文字及び矩形の領域を判定することにより、文字と矩形それぞれの領域データを生成する（ステップＳ１１）。構造認識装置３０は、文字の領域データ、及びＯＣＲ装置２０により文字認識された文字認識データを用いて、意味タグ情報を生成する（ステップＳ１２）。 FIG. 9 is a sequence diagram showing the flow of processing performed by the structure recognition system 1 according to the embodiment. The area dividing device 10 acquires scanned image data (step S10), and generates area data for each of the characters and rectangles by determining the areas of the characters and rectangles in the scanned image (step S11). The structure recognition device 30 generates semantic tag information using the character area data and the character recognition data obtained by character recognition by the OCR device 20 (step S12).

The structure recognition device 30 selects a rectangle of interest from the scanned image (step S13). The structure recognition device 30 acquires the neighboring semantic tag group for the rectangle of interest (step S14) and acquires the neighboring rectangle group (step S15). The structure recognition device 30 generates input data by sorting the representative coordinates of the rectangle of interest, the neighboring semantic tag group, and the neighboring rectangle group in raster order (step S16). The structure recognition device 30 determines the parent ID of the rectangle of interest based on the output obtained by inputting the input data to the structure determination model 361 (step S17). The structure recognition device 30 then checks whether the parent ID has been determined for every rectangle in the scanned image (step S18). If any rectangle remains without a determined parent ID, the process returns to step S13 and the parent ID determination is repeated.
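The loop over steps S13 to S18 can be sketched as follows. The two callables are hypothetical stand-ins: `build_input` represents steps S14 to S16, and `predict_parent` represents the structure determination model 361. The lookup table in the demonstration is a placeholder, not a trained model.

```python
def determine_parent_ids(rectangle_ids, build_input, predict_parent):
    """Determine a parent ID for every rectangle, one rectangle of interest at a time."""
    parent_ids = {}
    for rect_id in rectangle_ids:                         # S13; loop ends when S18 is satisfied
        input_data = build_input(rect_id)                 # S14 to S16: neighbors, then raster sort
        parent_ids[rect_id] = predict_parent(input_data)  # S17: model inference
    return parent_ids


# Placeholder stand-ins so the sketch runs without a trained model.
FAKE_MODEL_OUTPUT = {"K1": "K2", "K4": "K2", "K5": "K3"}
result = determine_parent_ids(
    ["K1", "K4", "K5"],
    build_input=lambda rect_id: [rect_id],
    predict_parent=lambda input_data: FAKE_MODEL_OUTPUT[input_data[0]],
)
```

Passing the two steps in as callables keeps the control flow separate from the feature construction and from whichever learning model is used.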

以上説明したように、実施形態の構造認識システム１は、画像データ取得部１１と、領域判定部１３と、構造判定部３４とを備える。画像データ取得部１１は、文字と矩形とを含むスキャン画像（「対象画像」の一例）における、画像データを取得する。領域判定部１３は、スキャン画像における文字と矩形のそれぞれの領域を判定する。構造判定部３４は、領域データに基づいて、前記対象画像に含まれる矩形の階層構造を判定する。これにより、実施形態の構造認識システム１では、矩形の階層構造を判定することができる。したがって、レイアウトの変更に必要な情報を取得することが可能である。 As described above, the structure recognition system 1 of the embodiment includes the image data acquisition section 11, the area determination section 13, and the structure determination section 34. The image data acquisition unit 11 acquires image data in a scanned image (an example of a "target image") including characters and rectangles. The area determining unit 13 determines each character and rectangular area in the scanned image. The structure determination unit 34 determines the hierarchical structure of rectangles included in the target image based on the area data. Thereby, the structure recognition system 1 of the embodiment can determine the hierarchical structure of rectangles. Therefore, it is possible to obtain information necessary for changing the layout.

The structure recognition system 1 of the embodiment further includes the preprocessing unit 33. Using the character area data, the preprocessing unit 33 generates semantic tag information that includes a character string set according to meaning (an example of a "specific second character") corresponding to the character recognition result shown in the area (an example of a "first character"). The structure recognition system 1 can thereby tag the characters indicated in the character area data according to their meaning, which makes the determination by the structure determination unit 34 simpler than it would be without tagging.

In the structure recognition system 1 of the embodiment, the structure determination unit 34 determines the hierarchical structure of rectangles using the structure determination model 361 (an example of a "trained model"). The structure determination model 361 is a model trained to output structured data for the rectangles included in an input image, using a training data set that associates the semantic tag information and rectangular area data of a learning image containing characters and rectangles with the structured data of the rectangles included in that learning image. The structure recognition system 1 can thereby recognize the hierarchical structure of rectangles by the simple method of inputting data to a trained model.

また、実施形態の構造認識システム１では、構造判定部３４は、スキャン画像において、着目矩形を選択し、着目矩形における近傍意味タグ群を取得し、着目矩形における近傍矩形群を取得し、取得した着目矩形、意味タグ群、及び近傍矩形群の位置に応じたソート（並べ替え）を行うことにより、構造判定モデル３６１に入力させる入力データの順序を決定する。これにより、実施形態の構造認識システム１では、入力データに意味（情報）を持たせることができ、ＲＮＮ系の学習モデルに基づく学習済みモデルを用いて、入力データの順序を考慮した予測、すなわち近傍にある文字や矩形との関係から、親ＩＤを予測させることができ、予測の精度向上が期待できる。 In the structure recognition system 1 of the embodiment, the structure determination unit 34 selects a rectangle of interest in the scan image, acquires a group of neighboring semantic tags for the rectangle of interest, obtains a group of neighboring rectangles of the rectangle of interest, and The order of input data to be input to the structure determination model 361 is determined by sorting according to the positions of the rectangle of interest, the group of meaning tags, and the group of neighboring rectangles. As a result, in the structure recognition system 1 of the embodiment, it is possible to give meaning (information) to the input data, and use a trained model based on an RNN-based learning model to perform predictions that take into account the order of the input data, i.e. The parent ID can be predicted from the relationship with nearby characters and rectangles, and an improvement in prediction accuracy can be expected.

また、実施形態の構造認識装置３０は、領域データ取得部３１と構造判定部３４とを備える。領域データ取得部３１は、スキャン画像における文字と矩形とのそれぞれの領域に関する領域データを取得する。構造判定部３４は、領域データに基づいて、スキャン画像に含まれる矩形の階層構造を判定する。これにより、上述した効果と同様の効果を奏する。 Further, the structure recognition device 30 of the embodiment includes a region data acquisition section 31 and a structure determination section 34. The area data acquisition unit 31 acquires area data regarding respective areas of characters and rectangles in the scanned image. The structure determination unit 34 determines the hierarchical structure of rectangles included in the scanned image based on the area data. This produces effects similar to those described above.

All or part of the structure recognition system 1 and the structure recognition device 30 in the embodiment described above may be implemented by a computer. In that case, a program for implementing the functions may be recorded on a computer-readable recording medium, and the functions may be implemented by having a computer system read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices. A "computer-readable recording medium" refers to a portable medium such as a flexible disk, magneto-optical disk, ROM, or CD-ROM, or to a storage device such as a hard disk built into a computer system. A "computer-readable recording medium" may also include something that holds a program dynamically for a short time, such as a communication line used when a program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and something that holds a program for a certain period, such as volatile memory inside the computer system acting as the server or client in that case. The program may implement only part of the functions described above, may implement them in combination with a program already recorded in the computer system, or may be implemented using a programmable logic device such as an FPGA.

以上、この発明の実施形態について図面を参照して詳述してきたが、具体的な構成はこの実施形態に限られるものではなく、この発明の要旨を逸脱しない範囲の設計等も含まれる。 Although the embodiments of the present invention have been described above in detail with reference to the drawings, the specific configuration is not limited to these embodiments, and includes designs within the scope of the gist of the present invention.

Description of symbols
1… Structure recognition system
10… Area dividing device
11… Image data acquisition unit
12… Modulated image generation unit
13… Area determination unit
14… Area data output unit
15… Storage unit
150… Area determination model
20… OCR device
30… Structure recognition device
31… Area data acquisition unit
32… Character recognition data acquisition unit
33… Preprocessing unit
34… Structure determination unit
35… Structure data output unit
36… Storage unit
360… Conversion table
361… Structure determination model

Claims

1. A structure recognition system comprising:
an image data acquisition unit that acquires image data of a target image including characters and rectangles;
an area determination unit that determines the respective areas of the characters and rectangles in the target image; and
a structure determination unit that determines a hierarchical structure of the rectangles included in the target image based on area data regarding the areas determined by the area determination unit, and that determines, for a rectangle of interest whose hierarchical structure is to be determined among the rectangles included in the target image, the rectangle or character that is the dependent source of the rectangle of interest.

2. A structure recognition system comprising:
an image data acquisition unit that acquires image data of a target image including characters and rectangles;
an area determination unit that determines the respective areas of the characters and rectangles in the target image;
a structure determination unit that determines a hierarchical structure of the rectangles included in the target image based on area data regarding the areas determined by the area determination unit; and
a preprocessing unit that uses the area data indicating a character area to generate semantic tag information including a specific second character corresponding to a first character shown in that area,
wherein the structure determination unit determines the hierarchical structure of the rectangles included in the target image using a trained model, and
the trained model is a model trained, using a training data set that associates the semantic tag information and the area data indicating rectangle areas in a learning image including characters and rectangles with the hierarchical structure of the rectangles included in the learning image, to output the hierarchical structure of the rectangles included in an input image.

3. The structure recognition system according to claim 1, further comprising a preprocessing unit that uses the area data indicating a character area to generate semantic tag information including a specific second character corresponding to a first character shown in that area,
wherein the structure determination unit determines the hierarchical structure based on the semantic tag information and the area data indicating rectangle areas.

4. The structure recognition system according to claim 3,
wherein the structure determination unit determines the hierarchical structure using a trained model, and
the trained model is a model trained, using a training data set that associates the semantic tag information and the area data indicating rectangle areas in a learning image including characters and rectangles with the hierarchical structure of the rectangles included in the learning image, to output the hierarchical structure of the rectangles included in an input image.

The structure determining unit selects a rectangle of interest whose hierarchical structure is to be determined in the target image; obtains a group of neighboring semantic tags, which is the semantic tag information located within a predetermined first range from the position of the selected rectangle of interest; obtains a group of neighboring rectangles, which is the area data of rectangles located within a predetermined second range from the position of the selected rectangle of interest; and determines the order of input data to be input to the trained model by rearranging the obtained rectangle of interest, the group of neighboring semantic tags, and the group of neighboring rectangles according to their positions.
The structure recognition system according to claim 2 or claim 4.
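
The input-ordering step in this claim can be sketched as follows. This is a minimal illustration under stated assumptions: each region is reduced to a center point, "within a predetermined range" is taken to mean Euclidean distance between centers, and "rearranging according to their positions" is taken to mean top-to-bottom, then left-to-right. None of these choices, nor the names `Region` and `build_input_sequence`, come from the claim itself.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Region:
    """A rectangle or a semantic tag, reduced to its center (hypothetical)."""
    kind: str    # "rect" or "tag"
    label: str   # identifier used only for illustration
    x: float     # center x
    y: float     # center y


def build_input_sequence(target, tags, rects, first_range, second_range):
    """Collect neighbors of the rectangle of interest and order them by position."""

    def dist(r):
        return hypot(r.x - target.x, r.y - target.y)

    # Neighboring semantic tags: within the first range of the target.
    near_tags = [t for t in tags if dist(t) <= first_range]
    # Neighboring rectangles: within the second range, excluding the target itself.
    near_rects = [r for r in rects if r != target and dist(r) <= second_range]

    # Assumed ordering: top-to-bottom, then left-to-right.
    items = [target, *near_tags, *near_rects]
    return sorted(items, key=lambda r: (r.y, r.x))
```

A fixed positional ordering of this kind gives the trained model a consistent input layout regardless of the order in which regions were detected.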

an area data acquisition unit that acquires area data regarding each area of characters and rectangles in the target image;
a structure determination unit that determines a hierarchical structure of rectangles included in the target image based on the area data, and determines the rectangle or character that is the dependency source of a rectangle of interest whose hierarchical structure is to be determined among the rectangles included in the target image;
A structure recognition device comprising:
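
One way to picture the output of the structure determination unit is as a mapping from each rectangle to the rectangle or character it depends on, from which the hierarchy follows. The sketch below assumes that representation (a plain dictionary with `None` for top-level rectangles) and assumes the mapping contains no cycles; the function names `build_hierarchy` and `depth` are hypothetical and not part of the claim.

```python
from collections import defaultdict


def build_hierarchy(parent_of):
    """Turn {rectangle: parent or None} into (top-level rectangles, children map)."""
    children = defaultdict(list)
    roots = []
    for node, parent in parent_of.items():
        if parent is None:
            roots.append(node)              # no parent: top of the hierarchy
        else:
            children[parent].append(node)   # parent may be a rectangle or a character
    return roots, dict(children)


def depth(node, parent_of):
    """Number of dependency steps from the node up to an element with no recorded parent."""
    d = 0
    while parent_of.get(node) is not None:
        node = parent_of[node]
        d += 1
    return d
```

Because a parent may be a character rather than a rectangle, parents are allowed to appear in the mapping only as values; `depth` stops as soon as it reaches such an element.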

An area data acquisition unit acquires area data regarding each area of characters and rectangles in the target image,
A structure determination unit determines a hierarchical structure of rectangles included in the target image based on the area data, and determines the rectangle or character that is the dependency source of a rectangle of interest whose hierarchical structure is to be determined among the rectangles included in the target image.
Structure recognition method.

A program for causing a computer to operate as the structure recognition device according to claim 6, the program for causing the computer to function as each section included in the structure recognition device.